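Semi-joins still behave rather differently in MySQL and Oracle; judging from these tests, Oracle handles them more thoroughly.
Let's start with how to test this in MySQL. The MySQL tests draw on a number of blog posts by Haixiang, and I worked through his test approach from start to finish myself.
First, create the following table: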
create table users(
userid int(11) unsigned not null,
user_name varchar(64) default null,
primary key(userid)
)engine=innodb default charset=UTF8;
To load data, a stored procedure works well. Here we insert 20,000 rows of generated data.
delimiter $$
drop procedure if exists proc_auto_insertdata$$
create procedure proc_auto_insertdata()
begin
declare
init_data integer default 1;
while init_data<=20000 do
insert into users values(init_data,concat('user' ,init_data));
set init_data=init_data+1;
end while;
end$$
delimiter ;
call proc_auto_insertdata();
Setup is quick; the final step, the insert itself, took nearly 6 seconds.
[test]>source insert_proc.sql
Query OK, 0 rows affected (0.12 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (5.63 sec)
We then query the data with the following semi-join, which took about 6 seconds to run.
select u.userid,u.user_name from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
1999 rows in set (6.36 sec)
To simplify the test and its output, we use count for the comparison.
[test]>select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
+-----------------+
| count(u.userid) |
+-----------------+
| 1999 |
+-----------------+
1 row in set (6.38 sec)
Next, try the following form. The construct looks redundant, because userid is unsigned and no row can have userid<-1.
select count(u.userid) from users u
where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
+-----------------+
| count(u.userid) |
+-----------------+
| 1999 |
+-----------------+
1 row in set (0.06 sec)
Yet it performs far better.
Unsurprisingly, the execution plans of the two forms differ considerably.
The plan for the first, slower form is:
[test]>explain select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
+----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
| id | select_type  | table       | type  | possible_keys | key     | key_len | ref  | rows  | Extra                                              |
+----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
|  1 | SIMPLE       | <subquery2> | ALL   | NULL          | NULL    | NULL    | NULL |  NULL | NULL                                               |
|  1 | SIMPLE       | u           | ALL   | NULL          | NULL    | NULL    | NULL | 19762 | Using where; Using join buffer (Block Nested Loop) |
|  2 | MATERIALIZED | t           | range | PRIMARY       | PRIMARY | 4       | NULL |  1998 | Using where                                        |
+----+--------------+-------------+-------+---------------+---------+---------+------+-------+----------------------------------------------------+
3 rows in set (0.02 sec)
The plan for the second, faster form is:
[test]>explain select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                                               |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
|  1 | PRIMARY     | u     | ALL   | NULL          | NULL    | NULL    | NULL | 19762 | Using where                                         |
|  3 | SUBQUERY    | NULL  | NULL  | NULL          | NULL    | NULL    | NULL |  NULL | Impossible WHERE noticed after reading const tables |
|  2 | SUBQUERY    | t     | range | PRIMARY       | PRIMARY | 4       | NULL |  1998 | Using where                                         |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------------------------------+
3 rows in set (0.00 sec)
We won't go into the underlying mechanics in this test; the aim is only to compare.
For a closer comparison of execution efficiency, show status is useful.
First, flush status:
[test]>flush status;
Query OK, 0 rows affected (0.02 sec)
Then run the statement:
[test]>select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
+-----------------+
| count(u.userid) |
+-----------------+
| 1999 |
+-----------------+
1 row in set (6.22 sec)
Check the status counters, filtering on Handler_read:
[test]>show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 2 |
| Handler_read_key | 2 |
| Handler_read_last | 0 |
| Handler_read_next | 1999 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 22001 |
+-----------------------+-------+
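7 rows in set (0.04 sec)
Handler_read_key is the number of requests to read a row based on a key. A high value suggests that queries and tables are properly indexed.
Handler_read_next is the number of requests to read the next row in key order. It increases when an index column is queried with a range constraint or when an index scan is performed.
Handler_read_rnd_next is the number of requests to read the next row in the data file. It is high when many table scans are running, which usually means the tables are not properly indexed or the queries are not written to take advantage of the indexes.
Here Handler_read_rnd_next is high because the outer users table is scanned in full (about 20,000 rows); the remaining 2,000 or so reads are consistent with a scan of the materialized subquery result. Handler_read_next is 1999 because the subquery is a primary-key range scan for userid<2000.
Then run the second query; show status now reports: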
[test]>show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 2 |
| Handler_read_key | 20002 |
| Handler_read_last | 0 |
| Handler_read_next | 1999 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 20001 |
+-----------------------+-------+
7 rows in set (0.00 sec)
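Handler_read_key is clearly much higher here: about 20,000 key reads, consistent with one keyed lookup per row of u, while Handler_read_rnd_next drops by roughly 2,000. Going by the description above, this indicates that keyed lookups are doing the work, and this form performs far better than the first.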
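The counters for the second query are presumably collected the same way as for the first. A minimal sketch of that sequence, assuming the counters are reset beforehand so the two runs are not added together:

```sql
-- Reset the session status counters so earlier statements are not counted.
flush status;

-- Run the second (OR) form of the query.
select count(u.userid) from users u
where (u.user_name in (select t.user_name from users t where t.userid<2000)
    or u.user_name in (select t.user_name from users t where t.userid<-1));

-- Read the handler counters accumulated since the flush.
show status like 'Handler_read%';
```

Comparing the two result sets line by line is easier than comparing elapsed times, since the counters do not depend on caching or server load.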
MySQL also offers ways to get more detail on this.
One is explain extended.
[test]>explain extended select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
...
3 rows in set, 1 warning (0.00 sec)
show warnings then displays the details.
[test]>show warnings;
| Note | 1003 | /* select#1 */ select count(`test`.`u`.`userid`) AS `count(u.userid)` from `test`.`users` `u` semi join (`test`.`users` `t`) where ((`test`.`u`.`user_name` = `<subquery2>`.`user_name`) and (`test`.`t`.`userid` < 2000)) |
1 row in set (0.00 sec)
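The rewritten statement shows the IN subquery turned into a semi join. As a rough illustration of what that means (not a description of what the optimizer executes internally), a hand-written equivalent deduplicates the subquery result and then joins to it, so each row of u is returned at most once. The alias d is introduced here purely for illustration.

```sql
select count(u.userid)
from users u
join (
    -- distinct keeps duplicate user_name values from multiplying rows of u
    select distinct t.user_name
    from users t
    where t.userid < 2000
) d on u.user_name = d.user_name;
```

On the test data this should return the same count as the IN form. A plain join without the distinct would only be equivalent when user_name is unique in the subquery.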
For the second statement:
[test]>explain extended select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
3 rows in set, 1 warning (0.00 sec)
[test]>show warnings;
| Note | 1003 | /* select#1 */ select count(`test`.`u`.`userid`) AS `count(u.userid)` from `test`.`users` `u` where (<in_optimizer>(`test`.`u`.`user_name`,`test`.`u`.`user_name` in ( <materialize> (/* select#2 */ select `test`.`t`.`user_name` from `test`.`users` `t` where (`test`.`t`.`userid` < 2000) ), <primary_index_lookup>(`test`.`u`.`user_name` in <temporary table> on <auto_key> where ((`test`.`u`.`user_name` = `materialized-subquery`.`user_name`))))) or <in_optimizer>(`test`.`u`.`user_name`,`test`.`u`.`user_name` in ( <materialize> (/* select#3 */ select `test`.`t`.`user_name` from `test`.`users` `t` where 0 ), <primary_index_lookup>(`test`.`u`.`user_name` in <temporary table> on <auto_key> where ((`test`.`u`.`user_name` = `materialized-subquery`.`user_name`)))))) |
1 row in set (0.00 sec)
Another option is optimizer_trace, available in MySQL 5.6:
set optimizer_trace="enabled=on";
After running the statement, retrieve the trace with the following query.
select * from information_schema.optimizer_trace\G
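Put together, a tracing session might look like the sketch below. It assumes the mysql command-line client (for \G) and selects only the query and trace columns of the trace table.

```sql
-- Turn tracing on for the current session only.
set optimizer_trace="enabled=on";

-- Run the statement to be traced.
select count(u.userid) from users u
where u.user_name in (select t.user_name from users t where t.userid<2000);

-- Read the trace of the most recent statement; \G prints it vertically.
select query, trace from information_schema.optimizer_trace\G

-- Turn tracing off again to avoid the overhead on later statements.
set optimizer_trace="enabled=off";
```

The trace is a JSON document, so the semi-join strategy considered for each subquery can be found by searching it for the subquery's select number.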
The semi-join clearly does not perform well enough here. Can it be switched off selectively? One parameter controls this, optimizer_switch, so let's look at its current value.
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on |
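The row above has the shape of show variables output. A sketch of two ways to display the current value, assuming the mysql command-line client:

```sql
-- Show the session value as a one-row table.
show variables like 'optimizer_switch';

-- Or read the session variable directly; \G prints it vertically.
select @@session.optimizer_switch\G
```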
Turn the semi-join off:
>set optimizer_switch="semijoin=off";
Query OK, 0 rows affected (0.00 sec)
Re-running the SQL that previously took nearly 6 seconds, the execution time drops dramatically.
[test]> select count(u.userid) from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
+-----------------+
| count(u.userid) |
+-----------------+
| 1999 |
+-----------------+
1 row in set (0.05 sec)
Running the second statement gives:
[test]>select count(u.userid) from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
+-----------------+
| count(u.userid) |
+-----------------+
| 1999 |
+-----------------+
1 row in set (0.07 sec)
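A set optimizer_switch without a scope changes the current session only, so the setting lasts until the session ends. To put the semi-join back for later tests in the same session, something like the following could be used:

```sql
-- Re-enable the semi-join transformation for this session.
set session optimizer_switch='semijoin=on';

-- Alternatively, reset just this flag to its default value.
set session optimizer_switch='semijoin=default';

-- Confirm the flag in the current value.
select @@session.optimizer_switch\G
```

Changing the flag per session keeps the experiment from affecting other connections, which would not be the case with set global.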
References:
http://dbaplus.cn/news-11-133-1.html
http://blog.chinaunix.net/uid-16909016-id-214888.html
So how does Oracle fare?
Create the test table:
create table users(
userid number not null,
user_name varchar2(64) default null,
primary key(userid)
);
Loading the data takes a single SQL statement: a hierarchical (connect by) query, put to a slightly unusual use, does the job nicely.
insert into users select level,'user'||level from dual connect by level<=20000;
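The insert is not committed on its own unless the client is set to autocommit. A small follow-up, offered as a sanity check rather than part of the original test, commits the rows and confirms what was loaded:

```sql
-- Make the inserted rows permanent and visible to other sessions.
commit;

-- Expect 20000 rows with userid running from 1 to 20000.
select count(*), min(userid), max(userid) from users;
```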
Gather statistics:
exec dbms_stats.gather_table_stats(ownname=>'CYDBA',tabname=>'USERS',cascade=>true);
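CYDBA is the schema used in this test. Assuming the table lives in the schema of the connected user, the built-in USER function can stand in for the hard-coded name, and user_tables shows whether statistics were recorded:

```sql
-- USER returns the name of the current session user, so no schema name is hard-coded.
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'USERS', cascade=>true);

-- Check that statistics were recorded for the table.
select table_name, num_rows, last_analyzed from user_tables where table_name='USERS';
```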
Then run the same statements as in MySQL.
We use autotrace and list only the trace output.
SQL> set autot trace exp stat
SQL> select u.userid,u.user_name from users u where u.user_name in (select t.user_name from users t where t.userid<2000);
1999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 771105466
---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |  2003 | 52078 |    21   (5)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI        |              |  2003 | 52078 |    21   (5)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| USERS        |  1999 | 25987 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SYS_C0042448 |  1999 |       |     2   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL          | USERS        | 20000 |   253K|    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."USER_NAME"="T"."USER_NAME")
3 - access("T"."USERID"<2000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
205 consistent gets
0 physical reads
0 redo size
52196 bytes sent via SQL*Net to client
1983 bytes received via SQL*Net from client
135 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1999 rows processed
SQL> select u.userid,u.user_name from users u where (u.user_name in (select t.user_name from users t where t.userid<2000) or u.user_name in (select t.user_name from users t where userid<-1) );
1999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1012235795
------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |  2004 | 94188 |    22   (5)| 00:00:01 |
|*  1 |  HASH JOIN                      |              |  2004 | 94188 |    22   (5)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1     |  2000 | 68000 |     4   (0)| 00:00:01 |
|   3 |    HASH UNIQUE                  |              |  2000 | 26000 |     4  (25)| 00:00:01 |
|   4 |     UNION-ALL                   |              |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| USERS        |     1 |    13 |     1   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | SYS_C0042448 |     1 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS BY INDEX ROWID| USERS        |  1999 | 25987 |     3   (0)| 00:00:01 |
|*  8 |       INDEX RANGE SCAN          | SYS_C0042448 |  1999 |       |     2   (0)| 00:00:01 |
|   9 |   TABLE ACCESS FULL             | USERS        | 20000 |   253K|    17   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("U"."USER_NAME"="USER_NAME")
6 - access("USERID"<(-1))
8 - access("T"."USERID"<2000)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
207 consistent gets
0 physical reads
0 redo size
52196 bytes sent via SQL*Net to client
1983 bytes received via SQL*Net from client
135 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1999 rows processed
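Judging by these results, Oracle's support is considerably more complete: both forms get a hash-join plan with nearly the same cost and number of consistent gets (205 versus 207). There are many more ways to write a semi-join, such as exists, which are not covered here for reasons of space. Further analysis of the points raised by this comparison will be added over time.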
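As a sketch of the exists form mentioned above, the first test query could be written as a correlated subquery. It should return the same rows on this data; the plan each optimizer picks for it was not tested here.

```sql
select u.userid, u.user_name
from users u
where exists (
    -- correlated on user_name; the select list of an exists subquery is irrelevant
    select 1
    from users t
    where t.user_name = u.user_name
      and t.userid < 2000
);
```

The statement uses only standard syntax, so it can be run unchanged in both MySQL and Oracle for a like-for-like comparison.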