Comparing two SQL snippets

V2EX = way to explore


This topic was created 1329 days ago; the information in it may have since developed or changed.

A: and (data_his_id = '0' or data_his_id in (select id from xx where business_type = 1 and states = 2 and audit_status = 2))

B: and (data_his_id = '0' or (select count(1) from xx where business_type = 1 and states = 2 and audit_status = 2 and id=data_his_id)>=1)

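I'm a newbie. A senior colleague says approach A is better because the subquery table is small, but A can't even use the primary key. I didn't dare push back. What do you all think?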
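Before arguing about speed, it helps to check whether A and B even return the same rows, which some replies dispute. Below is a minimal sketch that runs both on a tiny in-memory dataset. It assumes Python's built-in `sqlite3` as a stand-in for MySQL. The outer table name `t`, its rows and the rows in `xx` are made-up illustration data. The literal is written as integer `0` rather than `'0'`.

```python
import sqlite3

# Hypothetical toy data: t is the outer table, xx the lookup table from the question.
SCHEMA = """
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
INSERT INTO xx VALUES (1, 1, 2, 2), (2, 1, 2, 1), (3, 1, 2, 2);
INSERT INTO t VALUES (10, 0), (11, 1), (12, 2), (13, 3), (14, 99), (15, NULL);
"""

# A: uncorrelated IN subquery.
QUERY_A = """
SELECT id FROM t
WHERE data_his_id = 0
   OR data_his_id IN (SELECT id FROM xx
                      WHERE business_type = 1 AND states = 2 AND audit_status = 2)
ORDER BY id
"""

# B: correlated COUNT subquery (conceptually re-evaluated for each row of t).
QUERY_B = """
SELECT id FROM t
WHERE data_his_id = 0
   OR (SELECT count(1) FROM xx
       WHERE business_type = 1 AND states = 2 AND audit_status = 2
         AND xx.id = t.data_his_id) >= 1
ORDER BY id
"""


def run(query):
    """Load the toy data into a fresh in-memory database and return matching t.id values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


print(run(QUERY_A))  # [10, 11, 13]
print(run(QUERY_B))  # [10, 11, 13]
```

On this data both forms return the same rows, and both also drop the `NULL` row. That is consistent with the reply saying the result sets match, although one small example proves nothing about performance.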

20 replies, 2022-04-28 22:58:26 +08:00

ration

2022-04-28 11:12:41 +08:00 via Android

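My gut says A is better: B uses count and also correlates the two tables. To know for sure, look at the actual execution time and the execution plan.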

Hurriance

2022-04-28 11:19:00 +08:00

I suspect A and B end up with different result sets.

bthulu

2022-04-28 11:29:28 +08:00

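Connect with Navicat and generate test data: 100 million rows in the big table, 1 million in the small one. Then run each of your two SQL statements and you'll know.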

brader

2022-04-28 11:31:56 +08:00

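When theory can't settle the argument, try it for real. Build indexes for each idea and compare the actual query times at different table sizes.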

Also, be more careful: your WHERE clause compares an id against the string '0', even if MySQL is smart enough to convert it automatically.
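The replies above suggest generating data and measuring. A minimal benchmarking sketch for that could look like this. It again assumes SQLite as a stand-in for MySQL. The row counts (100,000 and 10,000) are far below the sizes suggested above so that it runs quickly. The index names and data distribution are hypothetical. The absolute timings say nothing about your MySQL server. Only running the same comparison there does.

```python
import random
import sqlite3
import time

FILTER = "business_type = 1 AND states = 2 AND audit_status = 2"

# Counting variants of A and B, so the benchmark measures filtering rather than row transfer.
QUERIES = {
    "A": f"SELECT count(*) FROM t WHERE data_his_id = 0 OR data_his_id IN (SELECT id FROM xx WHERE {FILTER})",
    "B": f"SELECT count(*) FROM t WHERE data_his_id = 0 OR (SELECT count(1) FROM xx WHERE {FILTER} AND xx.id = t.data_his_id) >= 1",
}


def build(conn, n_main=100_000, n_lookup=10_000, seed=42):
    """Fill hypothetical tables t and xx with reproducible random rows and add indexes."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT)")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT)")
    conn.executemany(
        "INSERT INTO xx VALUES (?, ?, ?, ?)",
        [(i, rng.randint(1, 2), rng.randint(1, 2), rng.randint(1, 2)) for i in range(1, n_lookup + 1)],
    )
    # Roughly half the outer rows use the 0 sentinel; the rest point at a random xx.id.
    conn.executemany(
        "INSERT INTO t (data_his_id) VALUES (?)",
        [(0 if rng.random() < 0.5 else rng.randint(1, n_lookup),) for _ in range(n_main)],
    )
    conn.execute("CREATE INDEX idx_xx_filter ON xx (business_type, states, audit_status)")
    conn.execute("CREATE INDEX idx_t_data_his_id ON t (data_his_id)")
    conn.commit()


def bench(conn, sql, repeat=3):
    """Return (row count, best wall-clock seconds over `repeat` runs)."""
    best = float("inf")
    count = None
    for _ in range(repeat):
        start = time.perf_counter()
        count = conn.execute(sql).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best


conn = sqlite3.connect(":memory:")
build(conn)
results = {name: bench(conn, sql) for name, sql in QUERIES.items()}
for name, (count, secs) in results.items():
    print(f"{name}: {count} rows, best of 3 = {secs * 1000:.1f} ms")
```

The harness takes the best of several runs to dampen noise. It also checks that both forms count the same rows before comparing their times.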

season8

2022-04-28 11:40:18 +08:00

Neither is good; switch to EXISTS:
and (data_his_id = '0' or exists (select 1 from xx where xx.id = data_his_id and business_type = 1 and states = 2 and audit_status = 2))
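The EXISTS version can be checked the same way. This is a minimal sketch on hypothetical toy data, with SQLite standing in for MySQL and `'0'` written as integer `0`. The outer table name `t` is an assumption, since the question never names it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
INSERT INTO xx VALUES (1, 1, 2, 2), (2, 1, 2, 1), (3, 1, 2, 2);
INSERT INTO t VALUES (10, 0), (11, 1), (12, 2), (13, 3), (14, 99);
"""

# EXISTS only has to find one matching xx row per outer row; its select list is irrelevant.
QUERY_EXISTS = """
SELECT id FROM t
WHERE data_his_id = 0
   OR EXISTS (SELECT 1 FROM xx
              WHERE xx.id = t.data_his_id
                AND business_type = 1 AND states = 2 AND audit_status = 2)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
matched = [row[0] for row in conn.execute(QUERY_EXISTS)]
conn.close()
print(matched)  # [10, 11, 13]
```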

season8

2022-04-28 11:51:57 +08:00

@season8 Sorry, I misread. EXISTS works for both large and small tables (a one-size-fits-all choice); if xx is a small table, approach A is fine too.

lipaa

2022-04-28 13:50:55 +08:00

@Hurriance They're the same.

lipaa

2022-04-28 13:51:13 +08:00

@bthulu 1

lipaa

2022-04-28 13:51:25 +08:00

@brader 1

lipaa

2022-04-28 13:51:35 +08:00

@season8 1

lipaa

2022-04-28 13:51:47 +08:00

Totally numb now.

xuanbg

2022-04-28 13:52:23 +08:00

Approach A is by the book; approach B is a bit too imaginative.

lipaa

2022-04-28 13:52:59 +08:00

@xuanbg I think so too.

lipaa

2022-04-28 13:54:45 +08:00

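My thinking was that A scans the whole table, while B can use the primary key index and should therefore perform better, especially once the table in A grows. But it seems I was wrong. I'll run more tests and brush up.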

wolfie

2022-04-28 15:05:35 +08:00

A: slow when the subquery table holds a lot of data.
B: roughly the same as EXISTS; awkward to read, but performance won't fall apart.

DonaldY

2022-04-28 16:48:50 +08:00

Neither is great.

Get rid of the OR.
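The reply doesn't spell out how to drop the OR. One common rewrite, offered here as an illustration and not necessarily what was meant, splits the two OR branches into separate queries joined with UNION ALL, so each branch can use its own index. The second branch excludes `data_his_id = 0`, so no row can be returned twice. This sketch uses hypothetical toy data and SQLite in place of MySQL. It relies on `xx.id` being a primary key, so the join matches at most one xx row per t row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
CREATE INDEX idx_t_data_his_id ON t (data_his_id);
INSERT INTO xx VALUES (0, 1, 2, 2), (1, 1, 2, 2), (2, 1, 2, 1), (3, 1, 2, 2);
INSERT INTO t VALUES (10, 0), (11, 1), (12, 2), (13, 3), (14, 99);
"""

# Branch 1: the sentinel rows, served by the index on t.data_his_id.
# Branch 2: rows pointing at a qualifying xx row (join on xx's primary key).
# "data_his_id <> 0" keeps the branches disjoint even though xx contains an id 0 here.
QUERY_NO_OR = """
SELECT id FROM t WHERE data_his_id = 0
UNION ALL
SELECT t.id FROM t JOIN xx ON xx.id = t.data_his_id
WHERE t.data_his_id <> 0
  AND xx.business_type = 1 AND xx.states = 2 AND xx.audit_status = 2
"""

QUERY_WITH_OR = """
SELECT id FROM t
WHERE data_his_id = 0
   OR data_his_id IN (SELECT id FROM xx WHERE business_type = 1 AND states = 2 AND audit_status = 2)
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
no_or = sorted(row[0] for row in conn.execute(QUERY_NO_OR))
with_or = sorted(row[0] for row in conn.execute(QUERY_WITH_OR))
conn.close()
print(no_or, with_or)  # [10, 11, 13] [10, 11, 13]
```

Without the `<> 0` guard, row 10 would come back twice, because the toy `xx` deliberately contains a qualifying row with id 0.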

LeegoYih

2022-04-28 17:03:59 +08:00

If circumstances allow, I'd suggest splitting it into 2 SQL statements:
1: select id from xx where business_type = 1 and states = 2 and audit_status = 2;
2: select * from t where any = ? and data_his_id in (ids);

After the first SQL runs, append a '0' to its results in code, then run the second SQL.
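The two-step approach is easy to sketch in application code. This version assumes Python's `sqlite3` in place of a MySQL driver. It leaves out the unspecified `any = ?` condition and appends integer `0` instead of the string `'0'`. It binds one `?` placeholder per id instead of concatenating values into the SQL. Very long IN lists can still hit driver or server limits; SQLite, for example, caps the number of bound parameters.

```python
import sqlite3


def fetch_matching(conn):
    """Two round trips: fetch qualifying xx ids, add the sentinel 0, then filter t with IN."""
    # Step 1: qualifying ids from the lookup table.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM xx WHERE business_type = 1 AND states = 2 AND audit_status = 2")]
    # Add the sentinel value in code, as the reply describes.
    ids.append(0)
    # Step 2: one placeholder per id keeps the query parameterised (ids is never empty).
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id FROM t WHERE data_his_id IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, ids)]


# Hypothetical toy data for trying it out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
INSERT INTO xx VALUES (1, 1, 2, 2), (2, 1, 2, 1), (3, 1, 2, 2);
INSERT INTO t VALUES (10, 0), (11, 1), (12, 2), (13, 3), (14, 99);
""")
print(fetch_matching(conn))  # [10, 11, 13]
```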

I'm not sure this is the most efficient option. I'd suggest checking each SQL's execution plan in a staging environment:

```sql
set optimizer_trace="enabled=on";
select * from xxx where xx = ?;
select * from information_schema.optimizer_trace;
set optimizer_trace="enabled=off";
```

LeegoYih

2022-04-28 17:13:26 +08:00

If it really has to be a single SQL statement, you can use union all; it won't hurt performance.

select * from t
where any = ?
and data_his_id in (
select 0
union all
select id from xx where business_type = 1 and states = 2 and audit_status = 2);
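The single-statement UNION ALL form can be checked against form A on the same kind of toy data. This is only a sketch. SQLite stands in for MySQL, and the unspecified `any = ?` filter is left out.

```python
import sqlite3

# Hypothetical toy data; the outer table name t follows the reply above.
SCHEMA = """
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
INSERT INTO xx VALUES (1, 1, 2, 2), (2, 1, 2, 1), (3, 1, 2, 2);
INSERT INTO t VALUES (10, 0), (11, 1), (12, 2), (13, 3), (14, 99);
"""

# The sentinel 0 becomes one more row of the IN list, so the OR disappears.
QUERY_UNION = """
SELECT id FROM t
WHERE data_his_id IN (
    SELECT 0
    UNION ALL
    SELECT id FROM xx WHERE business_type = 1 AND states = 2 AND audit_status = 2)
ORDER BY id
"""

# Form A from the question, for comparison.
QUERY_A = """
SELECT id FROM t
WHERE data_his_id = 0
   OR data_his_id IN (SELECT id FROM xx WHERE business_type = 1 AND states = 2 AND audit_status = 2)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
union_rows = [row[0] for row in conn.execute(QUERY_UNION)]
a_rows = [row[0] for row in conn.execute(QUERY_A)]
conn.close()
print(union_rows)  # [10, 11, 13]
```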

zlowly

2022-04-28 17:36:46 +08:00

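After the optimizer has done its work, the two may not differ much. If you really want to compare them, rewrite B as an EXISTS query. EXISTS lets the subquery stop at the first matching row instead of returning all of them, which should have a large effect on the execution plan.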

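In the real world, though, A's subquery on xx may return a large result set (say, tens of thousands of rows). In that case the optimizer may hash-join the outer table with that result, while B may run as a nested loop. It all depends on your table sizes, the selectivity of the column data, and so on.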

So for a performance comparison, asking strangers online is far less useful than reading the execution plan or simply running the queries.
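Reading a plan takes only a few lines. The sketch below prints SQLite's `EXPLAIN QUERY PLAN` output for A and B from Python. On MySQL, the starting point would be `EXPLAIN` (or `EXPLAIN ANALYZE` on 8.0.18+). SQLite implements joins only as nested loops, so it will never show the hash join described above. The exact plan text depends on your SQLite version, and the schema and index here are hypothetical.

```python
import sqlite3

FILTER = "business_type = 1 AND states = 2 AND audit_status = 2"
QUERIES = {
    "A": f"SELECT id FROM t WHERE data_his_id = 0 OR data_his_id IN (SELECT id FROM xx WHERE {FILTER})",
    "B": f"SELECT id FROM t WHERE data_his_id = 0 OR (SELECT count(1) FROM xx WHERE {FILTER} AND xx.id = t.data_his_id) >= 1",
}


def plan(conn, sql):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    # The detail string is the last column of each row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xx (id INTEGER PRIMARY KEY, business_type INT, states INT, audit_status INT);
CREATE TABLE t (id INTEGER PRIMARY KEY, data_his_id INT);
CREATE INDEX idx_t_data_his_id ON t (data_his_id);
""")
plans = {name: plan(conn, sql) for name, sql in QUERIES.items()}
for name, lines in plans.items():
    print(name)
    for line in lines:
        print("   ", line)
```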

pengtdyd

2022-04-28 22:58:26 +08:00

With today's SQL optimizers it basically makes no difference how you write it.....