转自http://yaanzy.itpub.net/post/1263/10217
在平时工作中,IN & EXISTS NOT IN & NOT EXISTS是使用频率比较高的SQL语句,
所以对它们的优化工作是很有必要的
测试环境:Oracle 9.2.0.1 for Windows2000
1、IN 和 EXISTS
IN和EXISTS的处理流程是不一样的:
IN的执行流程
select * from T1 where x in ( select y from T2 )
可以理解为:
select * from t1, ( select distinct y from t2 ) t2 where t1.x = t2.y;
EXISTS的执行流程
select * from t1 where exists ( select null from t2 where y = x )
可以理解为:
for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop
创建测试用例表big(4000 row)和small(400 row)
create table big as select * from dba_objects where rownum <= 10000;
insert into big select * from big;
insert into big select * from big;
commit;
create table small as select * from dba_objects where rownum <= 400;
当内层表为small,外层表为big时,两种语法的查询如下:
SQL> select count(1) from big a where a.object_id in
(select b.object_id from sall b);
COUNT(1)
----------
1600
已用时间: 00: 00: 00.56
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 MERGE JOIN
3 2 SORT (JOIN)
4 3 TABLE ACCESS (FULL) OF 'BIG'
5 2 SORT (JOIN)
6 5 VIEW OF 'VW_NSO_1'
7 6 SORT (UNIQUE)
8 7 TABLE ACCESS (FULL) OF 'SMALL'
Statistics
-----------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
SQL> select count(1) from big a where exists
(select 1 from small b where a.object_id=b.object_id);
COUNT(1)
----------
1600
已用时间: 00: 00: 03.10
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'BIG'
4 2 TABLE ACCESS (FULL) OF 'SMALL'
Statistics
-----------------------------------------------------
0 recursive calls
0 db block gets
312157 consistent gets
0 physical reads
当内层表为big,外层表为small时,两种语法的查询如下:
SQL> select count(1) from small a where a.object_id in
(select b.object_id from big b);
COUNT(1)
----------
400
已用时间: 00: 00: 00.56
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 MERGE JOIN
3 2 SORT (JOIN)
4 3 TABLE ACCESS (FULL) OF 'SMALL'
5 2 SORT (JOIN)
6 5 VIEW OF 'VW_NSO_1'
7 6 SORT (UNIQUE)
8 7 TABLE ACCESS (FULL) OF 'BIG'
Statistics
-----------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
SQL> select count(1) from small a where exists
(select null from big b where a.bject_id=b.object_id);
COUNT(1)
----------
400
已用时间: 00: 00: 00.25
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'SMALL'
4 2 TABLE ACCESS (FULL) OF 'BIG'
Statistics
-----------------------------------------------------
0 recursive calls
0 db block gets
2562 consistent gets
0 physical reads
在对表big、small进行分析后,发现CBO下两种语法的执行计划是一样的,都使用hash连接或者hash半连接
SQL> analyze table big compute statistics;
SQL> analyze table small compute statistics;
SQL> select count(1) from big a where a.object_id in
(select b.object_id from small b);
COUNT(1)
----------
1600
已用时间: 00: 00: 00.09
Execution Plan
-------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=58
Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=58 Card=1600 Bytes=12800)
3 2 SORT (UNIQUE)
4 3 TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
5 2 TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
Statistics
-------------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
SQL> select count(1) from big a where exists
(select 1 from small b where a.object_id=b.object_id);
COUNT(1)
----------
1600
已用时间: 00: 00: 00.09
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=58 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=58 Card=1600 Bytes=12800)
3 2 SORT (UNIQUE)
4 3 TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
5 2 TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
Statistics
--------------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
SQL> select count(1) from small a where a.object_id in
(select b.object_id from big b);
COUNT(1)
----------
400
已用时间: 00: 00: 00.09
Execution Plan
------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
3 2 TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
4 2 TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
Statistics
------------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
SQL> select count(1) from small a where exists
(select 1 from big b where a.object_id=b.object_id);
COUNT(1)
----------
400
已用时间: 00: 00: 00.09
Execution Plan
-------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
3 2 TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
4 2 TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
Statistics
-------------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
删除表分析,使用提示/*+ use_hash(a,b) */ 或者 /*+ hash_sj */,
两种语法都可以达到CBO的执行计划
SQL> analyze table big delete statistics;
SQL> analyze table small delete statistics;
SQL> select /*+ use_hash(a,b) */count(1) from big a where a.object_id in
(select b.object_id from small b);
SQL> select /*+ use_hash(a,b) */count(1) from big a where exists
(select 1 from small b where a.object_id=b.object_id);
SQL> select count(1) from small a where a.object_id in
(select /*+ hash_sj */ b.object_id from big b);
SQL> select count(1) from small a where exists
(select /*+ hash_sj */ 1 from big b where a.object_id=b.object_id);
下表列出了各种情况下的速度情况:
┌───────────┬──────────────────────────┬─────────────────────────┬─────────────┐
│ │ outer big,inner small │ outer small,inner big │ table rows │
├───────────┼──────────┬───────────────┼──────────┬──────────────┼─────────────┤
│ │ IN SQL │ EXISTS SQL │ IN SQL │ EXISTS SQL │ │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │ 0.56s │ 3.10s │ 0.56s │ 0.25s │ big=40000 │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤ and │
│ analyzed │ 0.09s │ 0.09s │ 0.09s │ 0.09s │ small=400 │
├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
│un-analyze │ 0.72s │ 3.53s │ 0.25s │ 2.97s │ big=5000 │
├───────────┼──────────┼───────────────┼──────────┼──────────────┤ and │
│ analyzed │ 0.09s │ 0.09s │ 0.09s │ 0.09s │ small=4000 │
└───────────┴──────────┴───────────────┴──────────┴──────────────┴─────────────┘
结论:在未对表进行分析前,若两个表数据量差异很大,则外层表是大表时使用IN较快,
外层表是小表时使用EXISTS较快;若两表数据量接近,则使用IN较快;
分析表后无论用IN还是EXISTS都变得更快,由于执行计划一样,所以速度一样;
所以:无论使用IN还是EXISTS,只要使用散列连接,即提示/*+ use_hash(a,b) */,
或者在子句中散列半连接提示/*+ hash_sj */, 就使其达到最优速度;
附注:半连接的提示有hash_sj、merge_sj、nl_sj
***********************************************************************************************************************
***********************************************************************************************************************
2、NOT IN 和 NOT EXISTS
NOT EXISTS的执行流程
select .....
from rollup R
where not exists ( select 'Found' from title T
where R.source_id = T.Title_ID);
可以理解为:
for x in ( select * from rollup )
loop
if ( not exists ( that query ) ) then
OUTPUT
end if;
end;
注意:NOT EXISTS 与 NOT IN 不能完全互相替换,看具体的需求。如果选择的列可以为空,则不能被替换。
对于not in 和 not exists的性能区别:
not in 只有当子查询中,select 关键字后的字段有not null约束或者有这种暗示时用not in,另外如果主查询中表大,
子查询中的表小但是记录多,则应当使用not in,并使用anti hash join.
如果主查询表中记录少,子查询表中记录多,并有索引,可以使用not exists,另外not in最好也可以用/*+ HASH_AJ */
或者外连接+is null,NOT IN 在基于成本的应用中较好
创建测试用例表big(40000 row)和small(1000 row):
truncate table big;
truncate table small;
insert into big select * from dba_objects where rownum <=20000;
insert into big select * from dba_objects where rownum <=20000;
insert into small select * from dba_objects where rownum <=1000;
commit;
基本句型:
<1> not in
SQL> select count(1) from big a where a.object_id not in (select b.object_id from small b);
COUNT(1)
----------
38000
已用时间: 00: 00: 12.56
Execution Plan
-----------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'BIG'
4 2 TABLE ACCESS (FULL) OF 'SMALL'
Statistics
分享到:
相关推荐
SQL语句优化——in,not in,exists,not exists, left join...on博客所需SQL语句.txt欢迎下载!
关于Exists,not Exists.in ,not in效率的说明。关于Exists,not Exists.in ,not in效率的说明。关于Exists,not Exists.in ,not in效率的说明。关于Exists,not Exists.in ,not in效率的说明。关于Exists,not Exists.in...
里面自己根据网上的资源整理出来的一份sql中in,exists,not in,not exists的使用方法以及注意事项等,有助于初学的朋友们借鉴。
sql语句优化之用EXISTS替代IN、用NOT EXISTS替代NOT IN的语句
SQL语句优化——in,not in,exists,not exists, left join...on博客所需SQL语句2.txt,欢迎下载!
NULL 博文链接:https://576017120.iteye.com/blog/1624774
exists 和 not exists的详细解释
MySQL优化之in,exists,not in,not exists的区别in与existsin查询过程结论:exists查询过程:结论:not in与not existsnot in查询过程:结论:not exists查询过程:结论: 首先我们使用两个用户表作为实例 insert ...
下面只是从理论上提出了一些建议,最好的原则是大家在上面的基础上,能够使用执行计划来分析,得出最佳的语句的写法。
1. EXISTS的执行流程 select * from t1 where exists ( select null from t2 where y = x ) 可以理解为: 代码如下: for x in ( select * from t1 ) loop if ( exists ( select null from t2 where y = x.x ) then ...
NOT EXISTS ⇔ NOT IN ⇒ NOT NULL 问题 结果不一样 疑问 如果等效
一个是问in exist的区别,一个是not in和not exists的区别
总结:exists 和not exists语句强调是否返回结果集,不要求知道返回什么,与in的区别就是,in只能返回一个字段值,exists允许返回多个字段 提醒:文章中提供了exists和not exists的常用示例,已经经过本人测试,...
非上传者作品。本人只是学习的时候百度到的比较的文档,拿出来和大家分享一下,希望对大家有所帮助!
系统要求进行SQL优化,对效率比较低的SQL进行优化,使其运行效率更高,其中要求对SQL中的部分in/not in修改为exists/not
NOT IN、JOIN、IS NULL、NOT EXISTS效率对比 语句一:select count(*) from A where A.a not in (select a from B) 语句二:select count(*) from A left join B on A.a = B.a where B.a is null 语句三:select ...
基本数据插入 except和intersect和exists和not exists和union和union all sql server
使用EXISTS(或NOT EXISTS)关键字引入一个子查询时,子查询实际上不产生任何数据;它只返回TRUE或FALS值。 指定一个子查询,检测行的存在。 如果子查询包含行,则返回 TRUE IN效率要差(返回子查询)