下文給大家?guī)黻P(guān)于MySQL semaphore crash的相關(guān)要點(diǎn)介紹,感興趣的話就一起來看看這篇文章吧,相信看完MySQL semaphore crash的相關(guān)要點(diǎn)介紹對(duì)大家多少有點(diǎn)幫助吧。
創(chuàng)新互聯(lián)主營(yíng)若羌網(wǎng)站建設(shè)的網(wǎng)絡(luò)公司,主營(yíng)網(wǎng)站建設(shè)方案,成都APP應(yīng)用開發(fā),若羌h5微信小程序開發(fā)搭建,若羌網(wǎng)站營(yíng)銷推廣歡迎若羌等地區(qū)企業(yè)咨詢
BA應(yīng)該對(duì)InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. 一點(diǎn)都不陌生,MySQL后臺(tái)線程srv_error_monitor_thread發(fā)現(xiàn)存在阻塞超過600s的latch鎖時(shí),如果連續(xù)10次檢測(cè)該鎖仍沒有釋放,就會(huì)觸發(fā)panic避免服務(wù)持續(xù)hang下去。
發(fā)生了什么
版本號(hào):MySQL 5.5.40
日志中持續(xù)輸出線程等待數(shù)據(jù)字典鎖,位置是dict0dict.c line 305,等待時(shí)間超過了900s。
持有鎖的線程是 139998697924352 ,其十六進(jìn)制是7f53fca8a700。
--Thread 139998393616128 has waited at dict0dict.c line 305 for 934.00 seconds the semaphore:X-lock on RW-latch at 0x105a1b8 created in file dict0dict.c line 748a writer (thread id 139998697924352) has reserved it in mode exclusivenumber of readers 0, waiters flag 1, lock_word: 0Last time read locked in file dict0dict.c line 302Last time write locked in file /pb2/build/sb_0-13157587-1410170252.03/rpm/BUILD/mysql-5.5.40/mysql-5.5.40/storage/innobase/dict/dict0dict.c line 305
上鎖線程 139998697924352 的事務(wù)信息,查詢數(shù)據(jù)字典表的操作。
---TRANSACTION 0, not started updating table statistics:MySQL thread id 14075, OS thread handle 0x7f53fca8a700, query id 110414021 21.14.5.139 jzjkusrSELECT ROUND(SUM(DATA_LENGTH+INDEX_LENGTH+DATA_FREE)/1024/1024/1024) AS MY_DB_TOTAL_SIZE FROM information_schema.TABLES
檢查下持鎖線程 139998697924352 是否存在其他鎖等待。
發(fā)現(xiàn)線程 139998697924352 ,self-lock 在 btr0sea.c line 1134,該鎖結(jié)構(gòu)和 AHI 相關(guān)。
--Thread 139998697924352 has waited at btr0sea.c line 1134 for 934.00 seconds the semaphore:X-lock (wait_ex) on RW-latch at 0x1eb06448 created in file btr0sea.c line 178a writer (thread id 139998697924352) has reserved it in mode wait exclusivenumber of readers 1, waiters flag 1, lock_word: ffffffffffffffffLast time read locked in file btr0sea.c line 1057Last time write locked in file /pb2/build/sb_0-13157587-1410170252.03/rpm/BUILD/mysql-5.5.40/mysql-5.5.40/storage/innobase/btr/btr0sea.c line 1134
接下來看下兩處鎖結(jié)構(gòu)分別在哪個(gè)函數(shù)內(nèi):
dict0dict.c line 305在dict_table_stats_lock函數(shù)內(nèi)
btr0sea.c line 1134在btr_search_drop_page_hash_index函數(shù)內(nèi)
什么情況會(huì)調(diào)用這些函數(shù)?
啟用 innodb_table_monitor,輸出日志時(shí)調(diào)用 dict_table_stats_lock 上 X 鎖,本案例未開啟。
啟用 innodb_stats_on_metadata 時(shí),查詢數(shù)據(jù)字典表會(huì)觸發(fā)統(tǒng)計(jì)信息的更新操作,會(huì)調(diào)用 dict_table_stats_lock 上 X 鎖。這與持鎖線程的事務(wù)信息匹配。
Adaptive hash index(AHI) 是 InnoDB 用來加速索引頁(yè)查找的 hash 表結(jié)構(gòu)。當(dāng)頁(yè)面訪問次數(shù)滿足一定條件后,這個(gè)頁(yè)面的地址將存入一個(gè) hash 表中,減少 B 樹查詢的開銷。
MySQL 5.5 版本 AHI 是由全局鎖 btr_search_latch 維護(hù) hash 表修改的一致性。
InnoDB buffer pool 狀態(tài)顯示 free buffer 基本保持0空閑。InnoDB buffer pool 驅(qū)逐數(shù)據(jù)頁(yè)時(shí),會(huì)調(diào)用 btr_search_drop_page_hash_index 函數(shù),從 AHI 中清理該數(shù)據(jù)頁(yè)。
-----------------------------BUFFER POOL AND MEMORY-----------------------------Total memory allocated 17582522368; in additional pool allocated 0
Dictionary memory allocated 4289681
Buffer pool size 1048576
Free buffers 0
Database pages 1040831
Old database pages 384193
Modified db pages 0
小結(jié)
AHI 的全局鎖 btr_search_latch 經(jīng)常會(huì)是競(jìng)爭(zhēng)熱點(diǎn)影響性能,5.7版本后有所改善與 InnoDB buffer 一樣做了多實(shí)例拆分。本案例在開啟 Innodb_stats_on_metadata 參數(shù),查詢?cè)獢?shù)據(jù)信息時(shí)觸發(fā)統(tǒng)計(jì)信息更新,上鎖數(shù)據(jù)字典,阻塞了了大量業(yè)務(wù)操作,又由于 buffer pool 空間不足,導(dǎo)致表驅(qū)逐舊頁(yè)觸發(fā) AHI 的 btr_search_latch 鎖競(jìng)爭(zhēng),最終導(dǎo)致信號(hào)量超時(shí) crash。
>> 彩蛋 <<
在動(dòng)輒幾兆的日志中分析 Semaphore crash,尋找鎖、線程、事務(wù)之間的關(guān)系,相當(dāng)令人抓狂的。借助 sed、awk、grep 三大法寶,雖有效率提升,但仍不夠高效。
為了偷懶寫了一個(gè)小程序,幫助DBA快速梳理出這些關(guān)系。
它的用法是這樣的:
hongbin@MBP ~> mysqldba doctor -f /Users/hongbin/workbench/mysqld_safe.log
目標(biāo)版本,查代碼時(shí)找對(duì)應(yīng)版本:
MySQL Server Version: '5.7.16-log’
日志中出現(xiàn)的 semaphore crash 次數(shù)和 mysql 啟動(dòng)次數(shù),如果啟動(dòng)次數(shù)大于 crash 次數(shù)說明可能是正常啟動(dòng)或其他 crash 造成:
********** MySQL service start count **********
MySQL Semaphore crash -> 3 times ["2018-08-13 23:12:18" "2018-08-14 12:13:43" "2018-08-16 13:42:36"]
MySQL Service start -> 3 times ["2018-08-13 23:12:59" "2018-08-14 12:15:20" "2018-08-16 13:46:37"]
線程主要在等待哪些 RW-latch,內(nèi)容包括:鎖位置、出現(xiàn)次數(shù)、線程 id (出現(xiàn)次數(shù)),重點(diǎn)關(guān)注出現(xiàn)次數(shù)較多的:
********** Which thread waited lock **********row0purge.cc:861 -> 58 140477266503424:(57) 140617703745280:(1)gi.cc:14791 -> 1 140477035656960:(1)trx0undo.ic:171 -> 1 140617682765568:(1)ha_innodb.cc:14791 -> 620 140617389913856:(58) 140202719565568:(58) 140202716903168:(57) 140477029533440:(56) 140617407219456:(55) 140477035656960:(52) 140477035124480:(29) 140477108467456:(29) 140477025539840:(26) 140477031130880:(25) 140477027669760:(22) 140617634944768:(21) 140617634146048:(21) 140477019948800:(21) 140477026604800:(20) 140477022078720:(18) 140477018883840:(16) 140477038585600:(15) 140477028734720:(10) 140477022877440:(9) 140477034325760:(1) 140477031663360:(1)srv0srv.cc:1968 -> 208 140477276993280:(185) 140617714235136:(23)ha_innodb.cc:5510 -> 601 140617398167296:(57) 140617409615616:(55) 140617392043776:(53) 140477110597376:(52) 140617395771136:(50) 140617636275968:(45) 140617632548608:(40) 140617634146048:(33) 140617639675648:(32) 140617397102336:(28) 140617639409408:(23) 140617635743488:(21) 140617637811968:(18) 140617638610688:(16) 140617399232256:(12) 140617638344448:(10) 140617638078208:(10) 140477033793280:(10) 140477029267200:(10) 140617397368576:(9) 140617635211008:(6) 140617393641216:(5) 140617637545728:(3) 140617402693376:(2) 140477037254400:(1)dict0dict.cc:1239 -> 136 140477122623232:(50) 140617392842496:(35) 140202726487808:(26) 140477123688192:(12) 140477038851840:(5) 140477030065920:(4) 140617634412288:(4)row0trunc.cc:1835 -> 1 140477109798656:(1)
上述鎖被哪些寫線程持有 X 鎖,重點(diǎn)關(guān)注出現(xiàn)次數(shù)較多的:
********** Which writer threads block at **********
140616681907968 -> 1 trx0undo.ic:171:(1)140477173069568 -> 243 srv0srv.cc:1968:(185) row0purge.cc:861:(57) row0trunc.cc:1835:(1)140617682765568 -> 29 srv0srv.cc:1968:(23) ha_innodb.cc:5510:(5) row0purge.cc:861:(1)
寫線程對(duì)應(yīng)的事務(wù)信息,也可能存在日志記錄沒有輸出事務(wù)信息:
********** These writer threads trx state **********MySQL thread id 83874, OS thread handle 140477173069568, query id 13139674 10.0.1.146 aml deleting from reference tables
統(tǒng)計(jì)寫線程持有 S 鎖情況:
****These writer threads at last time reads locked ****
140477173069568 -> 243 row0purge.cc:861:(243)140617682765568 -> 24 row0purge.cc:861:(24)140616681907968 -> 1 trx0undo.ic:190:(1)
統(tǒng)計(jì)寫線程持有 X 鎖情況:
****These writer threads at last time write locked ****
140477173069568 -> 243 dict0stats.cc:2366:(243)140617682765568 -> 24 dict0stats.cc:2366:(24)140616681907968 -> 1 buf0flu.cc:1198:(1)
通過事后日志分析,有可能出現(xiàn)線程的事務(wù)信息沒有輸出到日志中的情況,無法獲知事務(wù)具體執(zhí)行了什么操作。應(yīng)對(duì)這種情況,小程序加入了事中采集事務(wù)信息。
用法是這樣的:
hongbin@MBP ~> mysqldba -uxxx -pxxx doctor -w
它會(huì)監(jiān)視目標(biāo) mysql 的錯(cuò)誤日志,一旦出現(xiàn)“a writer (thread id 140616681907968) has reserved it in mode” 關(guān)鍵字就查詢 ps 中的事務(wù)信息,并保存下來。
看了以上關(guān)于MySQL semaphore crash詳細(xì)內(nèi)容,是否有所收獲。如果想要了解更多相關(guān),可以繼續(xù)關(guān)注我們的行業(yè)資訊板塊。