After one database restart, I found that the database instance's services never registered with the listener; only the ASM instance's services were registered:
lsnrctl status

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 17-JAN-2020 19:43:44
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxxx)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_MGMT" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_OCR" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
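When the Services Summary is long, it can help to check mechanically whether anything other than ASM has registered. The following is a minimal Python sketch that parses the text of `lsnrctl status`; the function names and the shortened sample are illustrative assumptions, and the regular expressions only cover the line shapes shown in the output above.

```python
import re

# Matches either a service header or an instance line, in order of appearance.
TOKEN_RE = re.compile(
    r'Service "(?P<service>[^"]+)" has \d+ instance\(s\)\.'
    r'|Instance "(?P<instance>[^"]+)", status (?P<status>\w+)'
)


def parse_services(status_text):
    """Map each service name to a list of (instance, status) pairs."""
    services = {}
    current = None
    for match in TOKEN_RE.finditer(status_text):
        if match.group("service") is not None:
            current = match.group("service")
            services[current] = []
        elif current is not None:
            services[current].append(
                (match.group("instance"), match.group("status"))
            )
    return services


def non_asm_services(services):
    """Return the registered service names that do not belong to ASM."""
    return sorted(name for name in services if not name.startswith("+ASM"))


# Shortened sample taken from the listener output in this post.
SAMPLE = '''Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully'''

services = parse_services(SAMPLE)
print(non_asm_services(services))
```

An empty list here corresponds to the symptom in this post: the listener is up, but no database service has registered with it.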
In Oracle 12c, registering services with the listener is handled by the LREG process. I used strace to check whether the LREG process was misbehaving, and found that its poll calls kept returning timeouts:
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 31995}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32157}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32271}, ...}) = 0
open("/proc/loadavg", O_RDONLY) = 13
fstat(13, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1a71503000
read(13, "0.16 0.20 0.33 4/1395 210929\n", 1024) = 29
close(13) = 0
munmap(0x7f1a71503000, 4096) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32503}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
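poll is described as follows:
poll is a way of querying the state of files.
Function prototype: int poll(struct pollfd *fds, nfds_t nfds, int timeout);
fds points to the array of file descriptors to query;
nfds is the number of entries in fds;
timeout is how long, in milliseconds, the process may sleep when none of the expected events is pending;
Return value: the number of descriptors found in the expected state (0 on timeout, -1 on error).
How it works: when an application calls poll, each fd in fds is taken in turn and its driver's poll function is called to check whether the expected state has occurred. Once every entry has been checked, the number of descriptors in the expected state is known. If that number is 0, the process goes to sleep for at most the timeout passed to poll; if one of the descriptors reaches the expected state during the sleep, poll returns immediately, otherwise the process sleeps until the timeout expires and poll returns 0. If the number is greater than 0, poll returns the number of descriptors that satisfy the condition.
poll is comparable to a blocking read on a file opened with open("/dev/xxx", O_RDWR). The difference is that when the device file has no data to read, poll puts the program to sleep only for a bounded time, whereas a blocking read sleeps until data arrives.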
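To make the timeout semantics concrete, here is a minimal Python sketch using the standard `select.poll` wrapper, whose timeout is also given in milliseconds. The pipe and the helper name `wait_readable` are illustrative assumptions, not part of the original investigation. Note that a call with timeout 0, like the poll calls in the strace output above, returns immediately without sleeping, and strace labels a poll return value of 0 as (Timeout).

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Return True if fd becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # poll() returns a list of (fd, event_mask) pairs; an empty list
    # means the timeout expired with nothing ready.
    events = poller.poll(timeout_ms)
    return any(ready_fd == fd and mask & select.POLLIN
               for ready_fd, mask in events)


read_end, write_end = os.pipe()
try:
    # Timeout 0: a non-blocking check. Nothing has been written yet,
    # so this is the case strace would show as "= 0 (Timeout)".
    idle = wait_readable(read_end, 0)

    # Once data is pending, poll reports the descriptor right away.
    os.write(write_end, b"x")
    ready = wait_readable(read_end, 1000)
finally:
    os.close(read_end)
    os.close(write_end)

print(idle, ready)
```

`select.poll` is available on Linux and most Unix systems, but not on Windows.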
At this point I wondered whether the process itself was in a bad state, so I restarted the database from sqlplus and traced the LREG process again. The poll timeouts no longer appeared:
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11203}, ru_stime={0, 21388}, ...}) = 0
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11234}, ru_stime={0, 21447}, ...}) = 0
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11264}, ru_stime={0, 21505}, ...}) = 0
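However, the database instance still could not register with the listener.
Now I was baffled. The listener could register the ASM instance's services, so the listener itself should be fine. The database's LREG process kept running its registration loop, so registration itself should be fine. The problem had to lie somewhere between the two.
So I traced the LREG process with oradebug event 10257: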
*** 2020-01-17T20:17:21.365862+08:00 (CDB$ROOT(1))
kmlwait: status: succ=0, wait=0, fail=0
kmmlrl: update for process drop delta: 357 357 149 150 5999
kmmlrl: 149 processes
kmmlrl: instance load 2
kmmgdnu: O12DB goodness=0, delta=1, pdb=1, flags=0x104:unblocked/not overloaded, update=0x2:G/-/-
kmmgdnu: O12DBXDB goodness=0, delta=1, pdb=1, flags=0x105:unblocked/not overloaded, update=0x2:G/-/-
kmmlrl_network_hdlr_state: update
kmmlrl_network_hdlr_state: update for network '-oracledefault-'
kmmlrl_network_hdlr_state: beq handler: load=149, max=5999, flag=0x2002, upd=0x2
------------------------------
Start Registration Information
------------------------------
Last update: 53704792 (3 seconds ago)
Flag: 0x4, 0x0
State: succ=0, wait=0, fail=0
CDB: root pdb 1 last pdb 4098 open max pdb 2
Dispatcher configuration index: cur 1 max 1
Network '-oracledefault-' pdb 1 :
  Local listeners:
  Remote listeners:
  Handlers:
    Dedicated
      flg=0x2002, upd=0x2, srvl=1
      services=O12DB
      hdlr load=149, max=5999
      nam=DEDICATED
      adr=(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=/app/oracle/product/12.2.0/dbhome_1/bin/oracle)(ARGV0='oracle./O12DB1')(ARGS='(LOCAL=NO)'))
      inf=LOCAL SERVER
      pri=0x7fea7aa8a208
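Here the Local listeners: and Remote listeners: entries were both empty, which means LREG had no listener to register with. Checking the database's local_listener parameter revealed the anomaly: its current value was -oraagent-dummy-: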
SQL> show parameter local

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      -oraagent-dummy-
Checking the CRS resource status again, I found that instance 1 was OFFLINE. That was because I had started the database directly from sqlplus rather than through the cluster resource:
ora.o12db.db
1 ONLINE OFFLINE STABLE
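The row above reads as instance number, target, state, and state details: the target is ONLINE while the actual state is OFFLINE. As a rough illustration, the Python sketch below flags such a mismatch. The function names are hypothetical, the column order is assumed from the row shown here, and the second sample row (with a server name `node1`) is invented purely for contrast.

```python
def parse_resource_row(row):
    """Split one resource row into instance index, target, and state.

    Assumes the column order of the row shown in this post:
    index, TARGET, STATE, then optional server and state details.
    """
    fields = row.split()
    return {"index": int(fields[0]), "target": fields[1], "state": fields[2]}


def needs_attention(row):
    """True when Clusterware wants the resource ONLINE but it is not."""
    parsed = parse_resource_row(row)
    return parsed["target"] == "ONLINE" and parsed["state"] != "ONLINE"


# Row taken from this post: the instance was started outside Clusterware.
print(needs_attention("1 ONLINE OFFLINE STABLE"))

# Hypothetical healthy row, for contrast only.
print(needs_attention("1 ONLINE ONLINE node1 Open,STABLE"))
```

A mismatch like the first row suggests the instance is running but not under cluster management, which matches what happened here.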
So I started the instance with srvctl instead. The cluster resource returned to normal, and the database instance's services registered correctly with the listener:
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_MGMT" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_OCR" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "O12DB" has 1 instance(s).
  Instance "O12DB1", status READY, has 1 handler(s) for this service...
Service "O12DBXDB" has 1 instance(s).
  Instance "O12DB1", status READY, has 1 handler(s) for this service...
The command completed successfully