真实的国产乱ⅩXXX66竹夫人,五月香六月婷婷激情综合,亚洲日本VA一区二区三区,亚洲精品一区二区三区麻豆

成都創(chuàng)新互聯(lián)網(wǎng)站制作重慶分公司

利用sysschema解決一次詭異的語句hang問題

沃趣科技  羅小波


導讀
1、故事背景
2、復現(xiàn)與剖析
3、解決方法
4、總結

1、故事背景

  • 在開始之前,先列出數(shù)據(jù)庫的運行環(huán)境信息
    • 操作系統(tǒng):redhat 7.2 x8_64
    • 文件系統(tǒng):xfs
    • 數(shù)據(jù)庫版本:MySQL 5.7.17
    • 主機配置:
      * CPU:32 vcpus
      * 內(nèi)存:128 G
      * 磁盤:單盤intel SSD 320G(只存放mysql的data和binlog)
    • 主要配置參數(shù)設置:innodb_buffer_pool_size = 96G,innodb_log_file_size = 2G,innodb_flush_method = O_DIRECT,sync_binlog = 1,innodb_flush_log_at_trx_commit = 1,innodb_thread_concurrency = 32,innodb_io_capacity = 20000,innodb_read_io_threads = 4,innodb_write_io_threads = 12,transaction_isolation = READ-COMMITTED,performance_schema=ON,binlog_rows_query_log_events=ON
    • 該實例接入了高可用機制:HA心跳探測機制60S內(nèi)發(fā)現(xiàn)實例持續(xù)探測失敗時(每5秒探測一次),直接嘗試關閉探測失敗的MySQL實例和主機,進行高可用切換
  • 故事情節(jié):
    • 前些天某客戶反饋一個詭異的問題,一個MySQL實例一會可訪問,一會不可訪問,查看相關日志發(fā)現(xiàn)該實例反復執(zhí)行高可用切換,在進行初略排查時,通過監(jiān)控發(fā)現(xiàn)服務器故障時刻磁盤負載都較高,但是并沒有完全用滿。而在高可用切換之前,數(shù)據(jù)庫中跑著一些insert…select語句和大量心跳檢測語句,而心跳檢測語句與業(yè)務表是獨立的,且心跳檢測語句只固定更新一行數(shù)據(jù),怎么會被阻塞一堆心跳檢測語句呢 ?經(jīng)過了一番折騰總算把原因找到了,具體過程請看下文!

2、復現(xiàn)與剖析

  • 經(jīng)過初略的分析,雖然業(yè)務SQL和心跳SQL操作的是不同的表,不會出現(xiàn)鎖等待問題,但是從show processlist;的結果來看,業(yè)務SQL執(zhí)行時間最長,而且不斷有新的心跳語句被阻塞(當然,這里要先關掉HA的切換機制,否則待會復現(xiàn)操作時,過了60S就可能被切換了),看起來阻塞心跳SQL的很可能就是業(yè)務SQL。而且這些業(yè)務SQL操作的數(shù)據(jù)量多達3千萬行(因為是insert…select語句,所以從慢日志或者innodb_trx表、sys.session視圖中查看到的數(shù)據(jù)量實際上多達6KW,翻倍了)。但是兩者之間有具體有什么關聯(lián)無法直觀地看出來。我們按照如下步驟進行了復現(xiàn):

    成都創(chuàng)新互聯(lián)公司專注于菏澤企業(yè)網(wǎng)站建設,響應式網(wǎng)站建設,購物商城網(wǎng)站建設。菏澤網(wǎng)站建設公司,為菏澤等地區(qū)提供建站服務。全流程定制制作,專業(yè)設計,全程項目跟蹤,成都創(chuàng)新互聯(lián)公司專業(yè)和態(tài)度為您提供的服務

  • 2.1. 首先按照線上環(huán)境標準準備好復現(xiàn)環(huán)境,搭建了一臺測試機

  • 2.2. 創(chuàng)建一張心跳表,插入一行數(shù)據(jù)

點擊(此處)折疊或打開

  1. root@localhost :test:33: > CREATE TABLE `xx_heartbeat` (
  2.       `server_id` int(10) unsigned NOT NULL,
  3.       `hb_time` datetime NOT NULL,
  4.       PRIMARY KEY (`server_id`)
  5.     ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
  6.     root@localhost :test:33: > insert into xx_heartbeat values(@@server_id,now());
  7.     root@localhost :test:33: > select * from xx_heartbeat;
  8.     +-----------+---------------------+
  9.     | server_id | hb_time |
  10.     +-----------+---------------------+
  11.     | 3306103 | 2017-10-11 12:33:26 |
  12.     +-----------+---------------------+
  13.     2 rows in set (0.00 sec)
2.3. 打開performance_schema中的等待事件采集(為了省事,這里使用sys schema下的函數(shù)操作,而不使用UPDATE語句直接修改performance_schema的配置表)

點擊(此處)折疊或打開

  1. root@localhost :test:34: > use sys
  2.     Database changed
  3.     root@localhost : sys:36: > call ps_setup_enable_instrument('wait');
  4.     +-------------------------+
  5.     | summary |
  6.     +-------------------------+
  7.     | Enabled 303 instruments |
  8.     +-------------------------+
  9.     1 row in set (0.04 sec)
  10.     Query OK, 0 rows affected (0.04 sec)
  11.     root@localhost : sys:36: > call ps_setup_enable_consumer('wait');
  12.     +---------------------+
  13.     | summary |
  14.     +---------------------+
  15.     | Enabled 3 consumers |
  16.     +---------------------+
  17.     1 row in set (0.01 sec)
  18.     Query OK, 0 rows affected (0.01 sec)
2.4. 開4個MySQL會話連接,依次對4張不同的表執(zhí)行insert…select操作,表數(shù)據(jù)量都為3KW(測試數(shù)據(jù)為使用sysbench造的4張3KW數(shù)據(jù)的表)

點擊(此處)折疊或打開

  1. # 會話1
  2.     ADMIN@127.0.0.1 : (none):15: > use sbtest;
  3.     Database changed
  4.     ADMIN@127.0.0.1 : sbtest:37: > insert sbtest1(k,c,pad) select k,c,pad from sbtest1;
  5.     # 會話2
  6.     ADMIN@127.0.0.1 : (none):15: > use sbtest;
  7.     Database changed
  8.     ADMIN@127.0.0.1 : sbtest:37: > insert sbtest3(k,c,pad) select k,c,pad from sbtest3;
  9.     # 會話3
  10.     ADMIN@127.0.0.1 : (none):18: > use sbtest;
  11.     Database changed
  12.     ADMIN@127.0.0.1 : sbtest:37: > insert sbtest2(k,c,pad) select k,c,pad from sbtest2;
  13.     # 會話4
  14.     ADMIN@127.0.0.1 : (none):18: > use sbtest;
  15.     Database changed
  16.     ADMIN@127.0.0.1 : sbtest:37: > insert sbtest4(k,c,pad) select k,c,pad from sbtest4
2.5. 對心跳表進行更新,持續(xù)間隔每5秒一次執(zhí)行,數(shù)十分鐘之后,發(fā)現(xiàn)心跳SQL被阻塞了,如果HA切換機制沒關,阻塞心跳SQL超過60S就會發(fā)生切換了,關閉HA切換機制之后,自動心跳會停止檢測,所以這里人工模擬了一條心跳SQL,使用腳本循環(huán)每5秒更新一次,腳本在發(fā)現(xiàn)有阻塞時也會同時打印show processlist和sys.session視圖信息,腳本鏈接:http://5d096a11.wiz03.com/share/s/1t2mEh0a-kl_2c2NZ33kSiac2wQRmx1ykAEE23yLIT3aKVmx)

點擊(此處)折疊或打開

  1. # 心跳SQL語句(可手工每5秒執(zhí)行這句,不過肯定手會酸。。不過為了方便大家閱讀,在復現(xiàn)過程中采用了用腳本持續(xù)檢測,發(fā)現(xiàn)有語句阻塞時立即手工方式執(zhí)行心跳語句,以方便截?。?br />
  2.     ADMIN@127.0.0.1 :test:01: > update xx_heartbeat set hb_time=now() where server_id=@@server_id;
  3.     Query OK, 0 rows affected (0.00 sec)
  4.     Rows matched: 1 Changed: 0 Warnings: 0
  5.     ADMIN@127.0.0.1 :test:01: > update xx_heartbeat set hb_time=now() where server_id=@@server_id
2.6. 立即新開一個會話連接查看數(shù)據(jù)庫正在做什么,可以發(fā)現(xiàn)4個會話的insert…select語句,以及心跳SQL update xx_heartbeat…

點擊(此處)折疊或打開

  1. ADMIN@127.0.0.1 : sys:25: > show processlist;
  2.     +-----+-------+--------------------+--------+---------+------+-----------+----------------------------------------------------------------------+
  3.     | Id | User | Host | db | Command | Time | State | Info |
  4.     +-----+-------+--------------------+--------+---------+------+-----------+----------------------------------------------------------------------+
  5.     | 25 | ADMIN | 127.0.0.1:35458 | sys | Query | 0 | starting | show processlist |
  6.     | 26 | ADMIN | 127.0.0.1:35459 | sbtest | Query | 1353 | query end | insert sbtest1(k,c,pad) select k,c,pad from sbtest1 |
  7.     | 27 | ADMIN | 127.0.0.1:35460 | sbtest | Query | 1352 | query end | insert sbtest3(k,c,pad) select k,c,pad from sbtest3 |
  8.     | 30 | ADMIN | 127.0.0.1:35463 | sbtest | Query | 1352 | query end | insert sbtest2(k,c,pad) select k,c,pad from sbtest2 |
  9.     | 31 | ADMIN | 127.0.0.1:35464 | sbtest | Query | 1351 | query end | insert sbtest4(k,c,pad) select k,c,pad from sbtest4 |
  10.     | 52 | ADMIN | 127.0.0.1:35485 | test | Query | 62 | query end | update xx_heartbeat set hb_time=now() where server_id=@@server_id |
  11.     | 151 | qfha | 10.10.40.167:51328 | NULL | Query | 562 | starting | SHOW BINARY LOGS |
  12.     ......
  13.     | 160 | qfha | 10.10.40.167:51337 | NULL | Query | 22 | starting | SHOW BINARY LOGS |
  14.     +-----+-------+--------------------+--------+---------+------+-----------+----------------------------------------------------------------------+
  15.     16 rows in set (0.00 sec)
2.7. 查看服務器負載情況怎樣

點擊(此處)折疊或打開

  1. # top
  2.     top - 22:51:21 up 3 days, 6:54, 9 users, load average: 5.74, 3.39, 1.70
  3.     Tasks: 980 total, 1 running, 979 sleeping, 0 stopped, 0 zombie
  4.     Cpu(s): 0.4%us, 0.4%sy, 0.0%ni, 97.3%id, 1.9%wa, 0.0%hi, 0.0%si, 0.0%st
  5.     Mem: 131804428k total, 129297568k used, 2506860k free, 282052k buffers
  6.     Swap: 7974908k total, 106080k used, 7868828k free, 33743992k cached
  7.       PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  8.     22080 mysql 20 0 86.1g 71g 12m S 23.8 57.0 602:42.90 mysqld
  9.     29121 qemu 20 0 17.5g 8.2g 6160 S 6.9 6.5 140:51.21 qemu-kvm
  10.     20475 root 20 0 0 0 0 S 0.7 0.0 0:28.71 xfs-cil/dm-6
  11.     18272 root 20 0 15692 1952 924 R 0.3 0.0 0:00.33 top
  12.         1 root 20 0 19356 1408 1228 S 0.0 0.0 0:02.14 init
  13.         2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
  14.         3 root RT 0 0 0 0 S 0.0 0.0 0:00.28 migration/0
  15.     ......
  16.     # free
  17.     #free -m
  18.                 total used free shared buffers cached
  19.     Mem: 128715 121616 7099 0 275 28429
  20.     -/+ buffers/cache: 92911 35803
  21.     Swap: 7787 88 7699
  22.     # vmstat
  23.     [root@10-10-66-229 ~]# vmstat 1 10
  24.     procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  25.     r b swpd free buff cache si so bi bo in cs us sy id wa st
  26.     2 1 90392 2484712 281996 33758712 0 0 0 266326 14803 19717 9 1 87 3 0
  27.     7 1 90392 2454332 281996 33787004 0 0 0 263063 14746 18893 10 1 87 2 0
  28.     4 10 90392 2427820 281996 33814484 0 0 0 266977 13522 18440 10 1 87 2 0
  29.     4 0 90392 2399888 281996 33840928 0 0 0 254064 14727 18820 10 1 87 2 0
  30.     4 1 90392 2373260 281996 33865532 0 0 0 255281 15479 19072 10 0 88 2 0
  31.     4 1 90392 2344072 281996 33894036 0 0 16 261650 13296 18970 10 0 87 2 0
  32.     5 0 90392 2316840 281996 33920892 0 0 0 253947 12292 18442 10 1 88 2 0
  33.     4 0 90392 2289436 281996 33947748 0 0 0 273755 13894 19790 10 1 87 3 0
  34.     4 0 90392 2258064 281996 33977544 0 0 0 265602 12351 18488 10 0 87 2 0
  35.     5 1 90392 2230940 281996 34005292 0 0 0 269967 12720 19439 9 0 88 2 0
  36.     ......
  37.     # iostat
  38.     [root@10-10-66-229 ~]# vmstat -x 1 10
  39.     ......
  40.     avg-cpu: %user %nice %system %iowait %steal %idle
  41.               10.28 0.00 0.50 2.51 0.00 86.72
  42.     Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
  43.     sdc 0.00 1031.00 0.00 6041.00 0.00 537654.00 89.00 13.20 2.19 0.16 94.10
  44.     ......
  45.     avg-cpu: %user %nice %system %iowait %steal %idle
  46.               9.83 0.00 0.53 2.54 0.00 87.10
  47.     Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
  48.     sdc 0.00 891.00 0.00 6034.00 0.00 524171.00 86.87 12.78 2.12 0.16 94.70
  49.     ......
  • 2.8. 從2.6與2.7步驟中查看到的信息來說,服務器的CPU和內(nèi)存并不是瓶頸,磁盤負載雖然較高且偶爾會用到swap且有2~3%左右的IOWAIT,但是也不至于完全堵死一條單行更新的心跳SQL,而且業(yè)務SQL與心跳SQL是不同的表,也不存在表鎖和事務鎖相互阻塞的問題(使用sys.innodb_lock_waits視圖查過,鎖等待信息為空)。而從mysql的show processlist信息來看,看不出這些SQL有什么關聯(lián),只能看到這些SQL處于query end的狀態(tài)時間比較長,明顯不正常,另外還有一些show binary logs語句 ,該語句是監(jiān)控系統(tǒng)監(jiān)控binlog的大小使用的語句,而且這個語句不斷增多,看起來像是binlog的訪問出現(xiàn)了問題

  • 2.9. 這個時候,通常使用的排查手段已經(jīng)相形見拙了,我們啟用sys schema,新開一個會話連接,使用session視圖來查查這些活躍會話正在執(zhí)行的SQL到底正在做什么?

點擊(此處)折疊或打開

  1. ADMIN@127.0.0.1 : sys:25: > select * from session where conn_id!=connection_id()\G;
  2.     *************************** 1. row ***************************
  3.                     thd_id: 60
  4.                   conn_id: 26
  5.                       user: ADMIN@127.0.0.1
  6.                         db: sbtest
  7.                   command: Query
  8.                     state: query end
  9.                       time: 6355
  10.         current_statement: insert sbtest1(k,c,pad) select k,c,pad from sbtest1 # 操作sbtest1表的業(yè)務SQL
  11.         statement_latency: 22.58 m
  12.                   progress: NULL
  13.               lock_latency: 1.25 ms
  14.             rows_examined: 60000000
  15.                 rows_sent: 0
  16.             rows_affected: 0
  17.                 tmp_tables: 1
  18.           tmp_disk_tables: 1
  19.                 full_scan: YES
  20.             last_statement: NULL
  21.     last_statement_latency: NULL
  22.             current_memory: 0 bytes
  23.                 last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done # 發(fā)現(xiàn)連接ID為26的會話執(zhí)行insert...select語句時在等待這個事件(binlog的COND_done互斥對象)
  24.         last_wait_latency: Still Waiting #該字段值為"Still Waiting"表示這個線程目前一直在等待last_wait字段顯示的事件
  25.                     source: binlog.cc:1949 #發(fā)生等待事件的源碼文件和代碼行數(shù)在這里
  26.               trx_latency: NULL
  27.                 trx_state: NULL
  28.             trx_autocommit: NULL
  29.                       pid: 4571
  30.               program_name: mysql
  31.     *************************** 2. row ***************************
  32.                     thd_id: 61
  33.                   conn_id: 27
  34.     ......
  35.         current_statement: insert sbtest3(k,c,pad) select k,c,pad from sbtest3
  36.         statement_latency: 22.57 m
  37.       ......
  38.                 last_wait: wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log # 發(fā)現(xiàn)連接ID為27的會話執(zhí)行insert...select語句時在等待這個事件(binlog的LOCK_log互斥鎖)
  39.         last_wait_latency: Still Waiting # 解釋同第一行信息相同字段
  40.                     source: binlog.cc:8587 # 解釋同第一行信息相同字段
  41.     ......
  42.     *************************** 3. row ***************************
  43.                     thd_id: 64
  44.                   conn_id: 30
  45.     ......
  46.         current_statement: insert sbtest2(k,c,pad) select k,c,pad from sbtest2 #操作sbtest2表的業(yè)務SQL
  47.         statement_latency: 22.57 m
  48.     ......
  49.                 last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done # 發(fā)現(xiàn)連接ID為30的會話執(zhí)行insert...select語句時在等待這個事件(binlog的COND_done對象)
  50.         last_wait_latency: Still Waiting # 解釋同第一行信息相同字段
  51.                     source: binlog.cc:1949 # 解釋同第一行信息相同字段
  52.     ......
  53.     *************************** 4. row ***************************
  54.                     thd_id: 65
  55.                   conn_id: 31
  56.                       user: ADMIN@127.0.0.1
  57.                         db: sbtest
  58.                   command: Query
  59.                     state: query end
  60.                       time: 6353
  61.         current_statement: insert sbtest4(k,c,pad) select k,c,pad from sbtest4 #操作sbtest4表的業(yè)務SQL
  62.         statement_latency: 22.55 m
  63.                   progress: NULL
  64.               lock_latency: 1.55 ms
  65.             rows_examined: 60000000
  66.                 rows_sent: 0
  67.             rows_affected: 0
  68.                 tmp_tables: 1
  69.           tmp_disk_tables: 1
  70.                 full_scan: YES
  71.             last_statement: NULL
  72.     last_statement_latency: NULL
  73.             current_memory: 0 bytes
  74.                 # 發(fā)現(xiàn)連接ID為31的會話執(zhí)行insert...select語句時在等待這個事件(正在執(zhí)行binlog的文件IO操作,說明在寫binlog cache到binlog file,但這里只是寫到文件系統(tǒng)的buffer,并不是sync)
  75.                 last_wait: wait/io/file/sql/binlog
  76.         last_wait_latency: Still Waiting # 解釋同第一行信息相同字段
  77.                     source: mf_iocache.c:1566 # 解釋同第一行信息相同字段
  78.               trx_latency: NULL
  79.                 trx_state: NULL
  80.             trx_autocommit: NULL
  81.                       pid: 4881
  82.               program_name: mysql
  83.     ......
  84.     *************************** 5. row ***************************
  85.                     thd_id: 185
  86.                   conn_id: 151
  87.                       user: qfha@10.10.40.167
  88.                         db: sys
  89.                   command: Query
  90.                     state: starting
  91.                       time: 564
  92.         current_statement: SHOW BINARY LOGS
  93.         statement_latency: 9.40 m
  94.                   progress: NULL
  95.               lock_latency: 0 ps
  96.             rows_examined: 0
  97.                 rows_sent: 0
  98.             rows_affected: 0
  99.                 tmp_tables: 0
  100.           tmp_disk_tables: 0
  101.                 full_scan: NO
  102.             last_statement: NULL
  103.     last_statement_latency: NULL
  104.             current_memory: 0 bytes
  105.                 last_wait: wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log
  106.         last_wait_latency: Still Waiting
  107.                     source: rpl_master.cc:707
  108.               trx_latency: NULL
  109.                 trx_state: NULL
  110.             trx_autocommit: NULL
  111.                       pid: NULL
  112.               program_name: NULL
  113.     *************************** 6. row ***************************
  114.      ......
  115.     *************************** 11. row ***************************
  116.                     thd_id: 86
  117.                   conn_id: 52
  118.                       user: ADMIN@127.0.0.1
  119.                         db:test
  120.                   command: Query
  121.                     state: query end
  122.                       time: 244
  123.         current_statement: update xx_heartbeat set hb_ ... () where server_id=@@server_id #操作心跳表的SQL
  124.         statement_latency: 1.07 m
  125.                   progress: NULL
  126.               lock_latency: 219.00 us
  127.             rows_examined: 1
  128.                 rows_sent: 0
  129.             rows_affected: 0
  130.                 tmp_tables: 0
  131.           tmp_disk_tables: 0
  132.                 full_scan: NO
  133.             last_statement: NULL
  134.     last_statement_latency: NULL
  135.             current_memory: 0 bytes
  136.                 last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done # 發(fā)現(xiàn)連接ID為52的會話執(zhí)行心跳SQL語句時在等待這個事件
  137.         last_wait_latency: Still Waiting # 解釋同第一行信息相同字段
  138.                     source: binlog.cc:1949 # 解釋同第一行信息相同字段
  139.               trx_latency: NULL
  140.                 trx_state: NULL
  141.             trx_autocommit: NULL
  142.                       pid: 6772
  143.               program_name: mysql
  144.     **********
    文章題目:利用sysschema解決一次詭異的語句hang問題
    文章出自:http://weahome.cn/article/jedepj.html

在線咨詢

微信咨詢

電話咨詢

028-86922220(工作日)

18980820575(7×24)

提交需求

返回頂部