Symptoms:
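The MySQL instance hangs. Connections keep accumulating until they reach the connection limit, every operation that involves a transaction or a new connection is blocked, and CPU usage and load stay low.
Immediate handling:
Once the MySQL primary hangs it is unavailable. The only temporary workaround is to restart the instance or switch over to the standby so that the business stays available.
Root cause:
The error monitor thread (srv_error_monitor_thread) calls log_get_lsn, and that call is the problem: because of it, the thread cannot compensate for the problem that the memory barrier may cause when mutex_exit returns.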
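The hazard described above, a watchdog that needs the very mutex it is supposed to help release, can be illustrated outside InnoDB. The sketch below is plain Python, not MySQL code: `log_mutex`, `current_lsn`, `read_lsn_blocking`, and `peek_lsn` are hypothetical names introduced for illustration. The try-lock approach is shown only as one general way to keep a monitor from stalling; it is not presented as the content of the official patch.

```python
import threading

log_mutex = threading.Lock()   # hypothetical stand-in for log_sys->mutex
current_lsn = 1000             # hypothetical log sequence number


def read_lsn_blocking(timeout):
    """Blocking read: waits for the mutex, so the caller stalls while it is held.

    The timeout exists only so that this demo terminates; a truly blocking
    call would wait forever.
    """
    if log_mutex.acquire(timeout=timeout):
        try:
            return current_lsn
        finally:
            log_mutex.release()
    return None  # gave up: a real monitor would still be stuck here


def peek_lsn():
    """Non-blocking read: returns None at once if the mutex is busy."""
    if log_mutex.acquire(blocking=False):
        try:
            return current_lsn
        finally:
            log_mutex.release()
    return None


# Simulate a holder that never releases the mutex.
log_mutex.acquire()
stuck_blocking = read_lsn_blocking(timeout=0.1)  # stalls for the full timeout
stuck_peek = peek_lsn()                          # returns immediately
log_mutex.release()

# With the mutex free, the non-blocking variant reads the value.
free_peek = peek_lsn()
print(stuck_blocking, stuck_peek, free_peek)
```

A monitor built on the blocking variant joins the queue of waiters as soon as the mutex is stuck, while the non-blocking variant can skip the check and carry on with its other duties.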
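Fix:
Upgrade to version 5.6.29 or later, which resolves this problem. The official fix information is shown in the figure below:
[Figure: official fix information]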
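To check a fleet of servers against that threshold, the version string returned by `SELECT VERSION()` can be compared with 5.6.29. The following is a minimal sketch; the helper names `parse_version` and `needs_upgrade` are hypothetical, and the check deliberately judges only the 5.6 series, since that is the only series the fix note above mentions.

```python
FIXED_IN = (5, 6, 29)  # first fixed 5.6 release according to the text above


def parse_version(version):
    """Turn a string such as '5.6.27-log' into (5, 6, 27).

    Anything after the first '-' (build or vendor suffix) is ignored.
    """
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def needs_upgrade(version):
    """True for 5.6.x releases older than 5.6.29; other series are not judged."""
    parsed = parse_version(version)
    return parsed[:2] == FIXED_IN[:2] and parsed < FIXED_IN


for v in ("5.6.27-log", "5.6.29", "5.7.10"):
    print(v, needs_upgrade(v))
```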
Analysis:
The stack traces captured on site were aggregated with pt-pmp. The full output is in the file cti.pt_pmp; a brief walkthrough follows:
1456 __lll_lock_wait(libpthread.so.0),_L_lock_975(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:690),THD::release_resources(sql_class.cc:1559),one_thread_per_connection_end(mysqld.cc:2751),do_handle_one_connection(sql_connect.cc:989),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
1456 threads are blocked while acquiring Lock_status.
16 __lll_lock_wait(libpthread.so.0),_L_lock_975(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:690),fill_status(mysql_thread.h:690),do_fill_table(sql_show.cc:7416),get_schema_tables_result(sql_show.cc:7416),JOIN::prepare_result(sql_select.cc:823),JOIN::exec(sql_executor.cc:116),mysql_execute_select(sql_select.cc:1100),mysql_select(sql_select.cc:1100),handle_select(sql_select.cc:110),execute_sqlcom_select(sql_parse.cc:5134),mysql_execute_command(sql_parse.cc:2612),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
16 threads are blocked while acquiring Lock_status.
1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),srv_export_innodb_status(sync0sync.ic:218),innodb_export_status(ha_innodb.cc:12442),show_innodb_vars(ha_innodb.cc:12442),show_status_array(sql_show.cc:2597),fill_status(sql_show.cc:6749),do_fill_table(sql_show.cc:7416),get_schema_tables_result(sql_show.cc:7416),JOIN::prepare_result(sql_select.cc:823),JOIN::exec(sql_executor.cc:116),mysql_execute_select(sql_select.cc:1100),mysql_select(sql_select.cc:1100),handle_select(sql_select.cc:110),execute_sqlcom_select(sql_parse.cc:5134),mysql_execute_command(sql_parse.cc:2612),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
This thread holds Lock_status but is blocked while acquiring srv_innodb_monitor_mutex.
1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_print(sync0sync.ic:218),srv_printf_innodb_monitor(srv0srv.cc:1233),innodb_show_status(ha_innodb.cc:12491),innobase_show_status(ha_innodb.cc:12491),ha_show_status(handler.cc:6980),mysql_execute_command(sql_parse.cc:2836),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
This thread holds srv_innodb_monitor_mutex but is blocked while acquiring log_sys->mutex.
1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),mtr_add_dirtied_pages_to_flush_list(sync0sync.ic:218),mtr_log_reserve_and_write(mtr0mtr.cc:270),mtr_commit(mtr0mtr.cc:270),trx_purge_free_segment(trx0purge.cc:392),trx_purge_truncate_rseg_history(trx0purge.cc:392),trx_purge_truncate_history(trx0purge.cc:527),trx_purge_truncate(trx0purge.cc:527),trx_purge(trx0purge.cc:527),srv_do_purge(srv0srv.cc:2589),srv_purge_coordinator_thread(srv0srv.cc:2589),start_thread(libpthread.so.0),clone(libc.so.6)
This thread holds log_sys->mutex but is blocked while acquiring log_sys->log_flush_order_mutex. The cause is the memory barrier problem; for details see http://dbaplus.cn/news-11-718-1.html
1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),sync_array_wait_event(sync0arr.cc:424),mutex_spin_wait(sync0sync.cc:580),mutex_enter_func(sync0sync.ic:218),pfs_mutex_enter_func(sync0sync.ic:218),log_reserve_and_write_fast(sync0sync.ic:218),mtr_log_reserve_and_write(sync0sync.ic:218),mtr_commit(sync0sync.ic:218),trx_undo_report_row_operation(trx0rec.cc:1353),btr_cur_upd_lock_and_undo(btr0cur.cc:1706),btr_cur_update_in_place(btr0cur.cc:1706),btr_cur_optimistic_update(btr0cur.cc:2166),row_upd_clust_rec(row0upd.cc:2132),row_upd_clust_step(row0upd.cc:2435),row_upd(row0upd.cc:2521),row_upd_step(row0upd.cc:2521),row_update_for_mysql(row0mysql.cc:1779),ha_innobase::update_row(ha_innodb.cc:7091),handler::ha_update_row(handler.cc:7305),Rpl_info_table::do_flush_info(rpl_info_table.cc:207),flush_info(rpl_info_handler.h:92),Master_info::flush_info(rpl_info_handler.h:92),flush_master_info(rpl_slave.cc:871),handle_slave_io(rpl_slave.cc:4818),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
This thread holds lock_log and mi->data_lock but is blocked while acquiring log_sys->mutex, which in turn blocks other sessions running show slave status\G.
1 __lll_lock_wait(libpthread.so.0),_L_lock_791(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:690),next_event(mysql_thread.h:690),exec_relay_log_event(mysql_thread.h:690),handle_slave_sql(mysql_thread.h:690),pfs_spawn_thread(pfs.cc:1860),start_thread(libpthread.so.0),clone(libc.so.6)
The SQL thread is blocked while acquiring lock_log, because lock_log is held by the IO thread.
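The holder and waiter relationships in the walkthrough can be followed mechanically to see where every chain ends. The sketch below encodes them by hand: the lock names are copied from the analysis, while the thread labels are informal descriptions chosen for illustration, and `holder_of` and `wait_chain` are hypothetical helpers, not part of pt-pmp.

```python
# thread label -> (locks it holds, lock it is waiting for)
# Labels are informal; lock names are copied from the analysis above.
threads = {
    "sessions waiting for Lock_status": ([], "Lock_status"),
    "status query": (["Lock_status"], "srv_innodb_monitor_mutex"),
    "InnoDB status session": (["srv_innodb_monitor_mutex"], "log_sys->mutex"),
    "purge coordinator": (["log_sys->mutex"], "log_sys->log_flush_order_mutex"),
    "slave IO thread": (["lock_log", "mi->data_lock"], "log_sys->mutex"),
    "slave SQL thread": ([], "lock_log"),
}


def holder_of(lock):
    """Return the label of the thread holding `lock`, or None if no holder is known."""
    for name, (held, _wanted) in threads.items():
        if lock in held:
            return name
    return None


def wait_chain(start):
    """Follow 'waits for lock -> held by thread' links, starting at one thread."""
    chain = []
    visited = set()
    current = start
    while current is not None and current not in visited:
        visited.add(current)
        _held, wanted = threads[current]
        chain.append((current, wanted))
        current = holder_of(wanted)
    return chain


for start in ("sessions waiting for Lock_status", "slave SQL thread"):
    for name, wanted in wait_chain(start):
        print(f"{name} waits for {wanted}")
    print()
```

Both chains end at the purge coordinator waiting for log_sys->log_flush_order_mutex, a lock with no holder in the listed traces, which is consistent with the memory barrier explanation rather than an ordinary deadlock cycle.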