由于好多公司節(jié)約成本都沒(méi)有自己的DBA人員,大部分都是開(kāi)發(fā)或者運(yùn)維人員操作數(shù)據(jù)庫(kù),但數(shù)據(jù)庫(kù)是重中之重,等公司達(dá)到一定規(guī)模之后,數(shù)據(jù)庫(kù)一個(gè)不小心的事故,很有可能會(huì)讓公司回到解放前。所以在公司小規(guī)模的時(shí)候就應(yīng)該有一套自己的數(shù)據(jù)庫(kù)體系以及完善的數(shù)據(jù)庫(kù)架構(gòu),操作人員的重點(diǎn)在優(yōu)化和提升性能,而不是再去修改數(shù)據(jù)庫(kù)的整體架構(gòu)。
下面我給大家總結(jié)了三種數(shù)據(jù)庫(kù)事故和解決辦法一邊操作人員應(yīng)急使用。請(qǐng)做類(lèi)比參考,錯(cuò)誤類(lèi)型千變?nèi)f化,解決思路一成不變。
錯(cuò)誤:
成都創(chuàng)新互聯(lián)專(zhuān)注于企業(yè)營(yíng)銷(xiāo)型網(wǎng)站、網(wǎng)站重做改版、鼓樓網(wǎng)站定制設(shè)計(jì)、自適應(yīng)品牌網(wǎng)站建設(shè)、H5開(kāi)發(fā)、商城建設(shè)、集團(tuán)公司官網(wǎng)建設(shè)、成都外貿(mào)網(wǎng)站建設(shè)公司、高端網(wǎng)站制作、響應(yīng)式網(wǎng)頁(yè)設(shè)計(jì)等建站業(yè)務(wù),價(jià)格優(yōu)惠性?xún)r(jià)比高,為鼓樓等各大城市提供網(wǎng)站開(kāi)發(fā)制作服務(wù)。
1、InnoDB: Error: Table "MySQL"."innodb_table_stats" not found.問(wèn)題
2、Error 'Cannot add or update a child row: a foreign key constraint fails故障
3、SQL_ERROR 1032解決辦法
問(wèn)題明確,下面開(kāi)始逐條解決吧:
1、InnoDB: Error: Table "mysql"."innodb_table_stats" not found.問(wèn)題
在MySQL 5.6.30后臺(tái)日志報(bào)如下警告信息:
2016-05-27 12:25:27 7fabf86f7700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2016-05-27 12:25:27 7fabf86f7700 InnoDB: Error: Fetch of persistent statistics requested for table "hj_web"."wechat_res" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.2016-05-27 14:03:52 28585 [Warning] InnoDB: Cannot open table mysql/slave_master_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2016-05-27 14:03:52 28585 [Warning] InnoDB: Cannot open table mysql/slave_relay_log_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2016-05-27 14:03:52 28585 [Warning] InnoDB: Cannot open table mysql/slave_worker_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.~~
這幾張表確實(shí)是在mysql5.6中新入的
innodb_index_stats,
innodb_tables_stats,
slave_master_info,
slave_relay_log_info,
slave_worker_info
解決方法:
登錄數(shù)據(jù)庫(kù),進(jìn)入mysql庫(kù),執(zhí)行如下SQL刪除5張表
記住,一定要是drop table if exists
mysql> use mysql;
mysql> drop table if exists innodb_index_stats;
mysql> drop table if exists innodb_table_stats;
mysql> drop table if exists slave_master_info;
mysql> drop table if exists slave_relay_log_info;
mysql> drop table if exists slave_worker_info;
執(zhí)行完后,可以用show tables查看一下,看表的數(shù)據(jù)是否已經(jīng)比刪除之前減少了,如果減少了,說(shuō)明你成功了!
上一步操作完成后,停止數(shù)據(jù)庫(kù),并進(jìn)入到數(shù)據(jù)庫(kù)數(shù)據(jù)文件所在目錄,刪除上面5個(gè)表所對(duì)應(yīng)的idb文件,如下所示:
#/etc/init.d/mysqld stop
#cd /data/mysql/data/mysql/
#ls -l .ibd
-rw-rw---- 1 mysql mysql 98304 May 27 14:17 innodb_index_stats.ibd
-rw-rw---- 1 mysql mysql 98304 May 27 14:17 innodb_table_stats.ibd
-rw-rw---- 1 mysql mysql 98304 May 27 14:14 slave_master_info.ibd
-rw-rw---- 1 mysql mysql 98304 May 27 14:14 slave_relay_log_info.ibd
-rw-rw---- 1 mysql mysql 98304 May 27 14:14 slave_worker_info.ibd
#/bin/rm -rf .ibd
重新啟動(dòng)數(shù)據(jù)庫(kù),進(jìn)入到mysql庫(kù),重建上面被刪除的表結(jié)構(gòu):
#/etc/init.d/mysqld start
mysql> use mysql;
mysql> source /data/mysql/share/mysql_system_tables.sql(這個(gè)mysql_system_tables.sql是從別的mysql數(shù)據(jù)庫(kù)備份過(guò)來(lái)的,備份方法如下:mysqldump –u賬號(hào) –p密碼 mysql > mysql_system_tables.sql)
mysql> show tables;
+---------------------------+
| Tables_in_mysql |
+---------------------------+
| columns_priv |
| db |
| event |
| func |
| general_log |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| innodb_index_stats |
| innodb_table_stats |
| ndb_binlog_index |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| servers |
| slave_master_info |
| slave_relay_log_info |
| slave_worker_info |
| slow_log |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
+---------------------------+
28 rows in set (0.00 sec)
mysql> desc innodb_table_stats;
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
| database_name | varchar(64) | NO | PRI | NULL | |
| table_name | varchar(64) | NO | PRI | NULL | |
| last_update | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| n_rows | bigint(20) unsigned | NO | | NULL | |
| clustered_index_size | bigint(20) unsigned | NO | | NULL | |
| sum_of_other_index_sizes | bigint(20) unsigned | NO | | NULL | |
+--------------------------+---------------------+------+-----+-------------------+-----------------------------+
6 rows in set (0.00 sec)
mysql> desc slave_master_info;
+------------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------+-------+
| Number_of_lines | int(10) unsigned | NO | | NULL | |
| Master_log_name | text | NO | | NULL | |
| Master_log_pos | bigint(20) unsigned | NO | | NULL | |
| Host | char(64) | NO | PRI | | |
| User_name | text | YES | | NULL | |
| User_password | text | YES | | NULL | |
| Port | int(10) unsigned | NO | PRI | NULL | |
| Connect_retry | int(10) unsigned | NO | | NULL | |
| Enabled_ssl | tinyint(1) | NO | | NULL | |
| Ssl_ca | text | YES | | NULL | |
| Ssl_capath | text | YES | | NULL | |
| Ssl_cert | text | YES | | NULL | |
| Ssl_cipher | text | YES | | NULL | |
| Ssl_key | text | YES | | NULL | |
| Ssl_verify_server_cert | tinyint(1) | NO | | NULL | |
| Heartbeat | float | NO | | NULL | |
| Bind | text | YES | | NULL | |
| Ignored_server_ids | text | YES | | NULL | |
| Uuid | text | YES | | NULL | |
| Retry_count | bigint(20) unsigned | NO | | NULL | |
| Ssl_crl | text | YES | | NULL | |
| Ssl_crlpath | text | YES | | NULL | |
| Enabled_auto_position | tinyint(1) | NO | | NULL | |
+------------------------+---------------------+------+-----+---------+-------+
23 rows in set (0.00 sec)
說(shuō)明表都正常了,再次查看mysql報(bào)錯(cuò)日志,就會(huì)發(fā)現(xiàn)沒(méi)有了關(guān)于這5個(gè)表的報(bào)錯(cuò)日志。
2、Error 'Cannot add or update a child row: a foreign key constraint fails故障解決
一大早的,某從庫(kù)突然報(bào)出故障:SQL線(xiàn)程中斷!
查看從庫(kù)狀態(tài):
mysql> show slave status\G
Slave_IO_State: Waiting for master to send event
Master_Log_File: mysql-bin.026023
Read_Master_Log_Pos: 230415889
Relay_Log_File: relay-bin.058946
Relay_Log_Pos: 54632056
Relay_Master_Log_File: mysql-bin.026002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1452
Last_Error: Error 'Cannot add or update a child row: a foreign key constraint fails (zabbix
.trigger_discovery
, CONSTRAINTc_trigger_discovery_2
FOREIGN KEY (parent_triggerid
) REFERENCEStriggers
(triggerid
) ON DELETE CASCADE)' on query. Default database: 'zabbix'. Query: 'insert into trigger_discovery (triggerdiscoveryid,triggerid,parent_triggerid,name) values (1677,26249,22532,'Free inodes is less than 20% on volume {#FSNAME}'), (1678,26250,22532,'Free inodes is less than 20% on volume {#FSNAME}'), (1679,26251,22532,'Free inodes is less than 20% on volume {#FSNAME}')'
Exec_Master_Log_Pos: 54631910
重點(diǎn)關(guān)注報(bào)錯(cuò)信息,定位問(wèn)題,問(wèn)題是:Cannot add or update a child row:a foreign key constraint fails ,涉及到的外鍵是:c_trigger_discovery_2
那這個(gè)外鍵的定義是什么呢?
報(bào)錯(cuò)信息中也有列出:
trigger_discovery
, CONSTRAINT
c_trigger_discovery_2FOREIGN KEY (
parent_triggerid) REFERENCES
triggers(
triggerid`) ON DELETE CASCADE
那明白了,是表trigger_discovery中的列parent_triggerid
和表triggers中的列triggerid
有外鍵關(guān)聯(lián),現(xiàn)在這里的數(shù)據(jù)插入出現(xiàn)了問(wèn)題
那為什么會(huì)出現(xiàn)問(wèn)題?
繼續(xù)看報(bào)錯(cuò),錯(cuò)誤是從這里開(kāi)始的:
insert into trigger_discovery (triggerdiscoveryid,triggerid,parent_triggerid,name) values (1677,26249,22532,'Free inodes is less than 20% on volume {#FSNAME}')
上述外鍵對(duì)應(yīng)的列parent_triggerid的值是22532,難道這個(gè)值在表triggers中有問(wèn)題?
我們?nèi)ケ韙riggers中查看:
從庫(kù)
mysql> select * from triggers where triggerid=22532;
Empty set (0.00 sec)
主庫(kù)
mysql> select * from triggers where triggerid=22532;
+-----------+------------+--------------------------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------------+-------+
| triggerid | expression | description | url | status | value | priority | lastchange | comments | error | templateid | type | value_flags | flags |
+-----------+------------+--------------------------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------------+-------+
| 22532 | {23251}<20 | Free inodes is less than 20% on volume {#FSNAME} | | 0 | 0 | 2 | 0 | | | 13272 | 0 | 1 | 2 |
+-----------+------------+--------------------------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------------+-------+
row in set (0.00 sec)
果然,從庫(kù)中沒(méi)有這個(gè)值對(duì)應(yīng)的信息,但主庫(kù)中是有的,原來(lái)是主從不一致導(dǎo)致的,從庫(kù)中缺失這個(gè)值,主庫(kù)中順利插入了,但數(shù)據(jù)傳到從庫(kù)后,從庫(kù)的外鍵約束限制了這一插入操作,所以SQL線(xiàn)程阻塞
問(wèn)題找到了,那怎么解決?
首先,為了讓從庫(kù)盡快恢復(fù)運(yùn)行,就先把這個(gè)錯(cuò)誤跳過(guò)吧:
mysql>SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; #跳過(guò)一個(gè)事務(wù)
mysql>start slave;
接下來(lái)就是主從數(shù)據(jù)不一致的問(wèn)題,可以使用pt-table-checksum來(lái)檢查下不一致的數(shù)據(jù),再進(jìn)行同步,具體步驟如下:
在主庫(kù)執(zhí)行:
mysql>GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE,CREATE,DELETE,INSERT,UPDATE ON . TO 'USER'@'MASTER_HOST' identified by 'PASSWORD';
注:創(chuàng)建用戶(hù),這些權(quán)限都是必須的,否則會(huì)報(bào)錯(cuò)
shell> ./pt-table-checksum --host='master_host' --user='user' --password='password' --port='port' --databases=zabbix --ignore-tables=ignore_table --recursion-method=processlist
注:(1)因?yàn)樯婕暗降谋硖?,查看后發(fā)現(xiàn)很多表都有外鍵關(guān)聯(lián),錯(cuò)綜復(fù)雜,而且因?yàn)槭潜O(jiān)控表,即使丟失一些也沒(méi)什么關(guān)系,所以查出較大的且沒(méi)有外鍵關(guān)聯(lián)的表用ignore-tables選項(xiàng)排除,對(duì)其他表進(jìn)行比對(duì),如果表比較少的話(huà)直接指定--TABLES
(2)recursion-method如果不設(shè)的話(huà),會(huì)報(bào)錯(cuò):Diffs cannot be detected because no slaves were found. 其參數(shù)有四:processlist/hosts/dsn/no,用來(lái)決定查找slave的方式是show full processlist還是show slave hosts還是直接給出slave信息,具體用法在另一隨筆pt-table-checksum介紹中詳述
shell>./pt-table-sync --print --replicate=percona.checksums h=master_host,u=user,p=password,P=port h=slave_host,u=user,p=password,P=port --recursion-method=processlist >pt.log
注:最好使用--print,不要直接使用--execute,否則如果弄出問(wèn)題,就更麻煩了,打印出直接執(zhí)行的語(yǔ)句,去從庫(kù)執(zhí)行就好了
將pt.log傳到從庫(kù),直接執(zhí)行,然后再次在主庫(kù)上進(jìn)行一致性檢查,如果還有不一致的數(shù)據(jù),記得登錄mysql去把checksums表清空,然后再次進(jìn)行檢查同步,直到?jīng)]有不一致的數(shù)據(jù)。
當(dāng)然,如果主從數(shù)據(jù)反復(fù)出現(xiàn)不一致的話(huà),那就要先去檢查造成不一致的原因了,釜底抽薪才是硬道理。
3、SQL_ERROR 1032解決辦法
緣由:
在主主同步的測(cè)試環(huán)境,由于業(yè)務(wù)側(cè)沒(méi)有遵循同一時(shí)間只寫(xiě)一個(gè)點(diǎn)的原則,造成A庫(kù)上刪除了一條數(shù)據(jù),B庫(kù)上在同時(shí)更新這條數(shù)據(jù)。
由于異步和網(wǎng)絡(luò)延時(shí),B的更新event先到達(dá)A端執(zhí)行,造成A端找不到這條記錄,故SQL_THREAD報(bào)錯(cuò)1032,主從同步停止。
錯(cuò)誤說(shuō)明:
MySQL主從同步的1032錯(cuò)誤,一般是指要更改的數(shù)據(jù)不存在,SQL_THREAD提取的日志無(wú)法應(yīng)用故報(bào)錯(cuò),造成同步失敗
(Update、Delete、Insert一條已經(jīng)delete的數(shù)據(jù))。
1032的錯(cuò)誤本身對(duì)數(shù)據(jù)一致性沒(méi)什么影響,影響最大的是造成了同步失敗、同步停止。
如果主主(主從)有同步失敗,要第一時(shí)間查看并著手解決。因?yàn)椴煌?,?huì)造成讀取數(shù)據(jù)的不一致。應(yīng)在第一時(shí)間恢復(fù)同步,
盡量減少對(duì)業(yè)務(wù)的影響。然后再具體分析不同步的原因,手動(dòng)或者自動(dòng)修復(fù)數(shù)據(jù),并做pt-table-checksum數(shù)據(jù)一致性檢查。
目前業(yè)務(wù)一般是做主主同步,主主同步由于是異步更新,存在更新沖突的問(wèn)題,且很容易引起SQL ERROR 1032錯(cuò)誤。這個(gè)應(yīng)該在業(yè)務(wù)側(cè)解決,
保證同一時(shí)間只更新數(shù)據(jù)庫(kù)的一個(gè)點(diǎn),類(lèi)似單點(diǎn)寫(xiě)入。我們的解決辦法是:寫(xiě)一個(gè)底層數(shù)據(jù)庫(kù)調(diào)用庫(kù),可能涉及到更新沖突的操作,都調(diào)用這個(gè)庫(kù)。
在配置文件里,配2個(gè)點(diǎn)的數(shù)據(jù)庫(kù)A、B,保證一直都更新A庫(kù),如果A庫(kù)不可用,就去更新B庫(kù)。
另外,如果是對(duì)數(shù)據(jù)一致性要求較高的場(chǎng)景,比如涉及到錢(qián),建議用PXC(強(qiáng)一致性、真正同步復(fù)制)。
解決辦法:
MySQL5.6.30版本,binlog模式為ROW。
show slave status\G,可以看到如下報(bào)錯(cuò):
Slave_SQL_Running: NO
Last_SQL_Errno: 1032
Last_SQL_Error: Worker 3 failed executing transaction '' at master log mysql-bin.000003, end_log_pos 440267874;
Could not execute Delete_rows event on table db_test.tbuservcbgolog; Can't find record in 'tbuservcbgolog', Error_code: 1032;
handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000003, end_log_pos 440267874
從上可以看出,是SQL_THREAD線(xiàn)程出錯(cuò),錯(cuò)誤號(hào)碼1032。是在應(yīng)用delete db_test.tbuservcbgolog 表中一行數(shù)據(jù)的事件時(shí),由于這條數(shù)據(jù)
不存在而出錯(cuò)。此事件在主服務(wù)器Master binlog中的位置是 mysql-bin.000003, end_log_pos 440267874。(當(dāng)然可以在從服務(wù)器Slave的Relay
log中查找,具體方法見(jiàn)最后)
方法1:跳過(guò)錯(cuò)誤Event
先跳過(guò)這一條錯(cuò)誤(event),讓主從同步恢復(fù)正常。(或者N條event,一條一條跳過(guò))
stop slave;
set global sql_slave_skip_counter=1;
start slave;
方法2:跳過(guò)所有1032錯(cuò)誤
更改my.cnf文件,在Replication settings下添加:
slave-skip-errors?。健?032
并重啟數(shù)據(jù)庫(kù),然后start salve。
注意:因?yàn)橐貑?shù)據(jù)庫(kù),不推薦,除非錯(cuò)誤事件太多。
方法3:還原被刪除的數(shù)據(jù)
根據(jù)錯(cuò)誤提示信息,用mysqlbinlog找到該條數(shù)據(jù)event SQL并逆向手動(dòng)執(zhí)行。如delete 改成insert。
本例中,此事件在主服務(wù)器Master binlog中的位置是 mysql-bin.000003, end_log_pos 440267874。
1)利用mysqlbinlog工具找出440267874的事件
/usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-output=decode-rows -vv mysql-bin.000003 |grep -A 20 '440267874'
或者/usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-output=decode-rows -vv mysql-bin.000003 --stop-position=440267874 | tail -20
或者usr/local/mysql-5.6.30/bin/mysqlbinlog --base64-output=decode-rows -vv mysql-bin.000003 > decode.log
( 或者加上參數(shù)-d, --database=name 來(lái)進(jìn)一步過(guò)濾)
#160923 20:01:27 server id 1223307 end_log_pos 440267874 CRC32 0x134b2cbc Delete_rows: table id 319 flags: STMT_END_F
###DELETE FROM db_99ducj
.tbuservcbgolog
###WHERE
###@1=10561502 / INT meta=0 nullable=0 is_null=0 /
###@2=1683955 / INT meta=0 nullable=0 is_null=0 /
###@3=90003 / INT meta=0 nullable=0 is_null=0 /
###@4=0 / INT meta=0 nullable=0 is_null=0 /
###@5='2016-09-23 17:02:24' / DATETIME(0) meta=0 nullable=1 is_null=0 /
###@6=NULL / DATETIME(0) meta=0 nullable=1 is_null=1 /
#at 440267874
以上為檢索出來(lái)的結(jié)果,事務(wù)語(yǔ)句為:delete from db_99ducj.tbuservcbgolog where @1=10561502 and @2=1683955 ...
其中@1 @2 @3...分別對(duì)應(yīng)表tbuservcbgolog的列名,填補(bǔ)上即可。
我們可以逆向此SQL 將deleter 變成Insert,手動(dòng)在從庫(kù)上執(zhí)行此Insert SQL,之后restart slave就好了。
注:通過(guò)Relay Log查找event SQL http://www.tuicool.com/articles/6RvUnqV