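How do you fix system jitter caused by frequent GC under the G1 garbage collector? Many people without hands-on experience get stuck on this, so this article walks through the cause of the problem and how to fix it, in the hope that it helps you resolve the same issue.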
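CPU fluctuation, as shown in the figure:
Memory fluctuation, as shown in the figure:
CPU usage regularly hit the alert threshold and triggered alarms. The first reaction was to check which thread in the Java process was consuming the most CPU, although the memory fluctuation pattern alone already suggests that GC is involved.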
    top -Hp PID

      PID    %CPU  %MEM  TIME+      COMMAND
      89011  70.9  31.8  154:44.25  java
      89001   7.7  31.8  216:48.06  java
      90049   7.7  31.8   57:37.76  java
Converting 89011 to hexadecimal gives 15bb3. Searching the thread snapshot exported with jstack for 15bb3 finds the corresponding thread:
    cat 01_jstack.txt | grep '15bb3' --color

    "Gang worker#0 (G1 Parallel Marking Threads)" os_prio=0 tid=0x00117f198005f000 nid=0x15bb3 runnable
So the CPU spikes come from G1 threads doing GC work, which matches our guess.
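The decimal-to-hex step used to match the thread ID from top against the nid field in the jstack output can be sketched in Java as follows. The class and method names are made up for illustration; only Integer.toHexString is a standard library call.

```java
// Converts a decimal OS thread ID (as printed by `top -Hp`) into the
// hexadecimal "nid" form that appears in jstack thread dumps.
// NidConverter and toNid are hypothetical names for this sketch.
class NidConverter {
    static String toNid(int osThreadId) {
        return "0x" + Integer.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        // 89011 is the hottest thread from the top output in this article
        System.out.println(toNid(89011)); // prints 0x15bb3
    }
}
```

The resulting string can be passed straight to grep against the jstack file.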
A quick look at the GC log was alarming:
    15:30:46.696+0800: 52753.211: [GC pause (G1 Humongous Allocation) (young) (to-space exhausted), 0.3541373 secs]
       [Parallel Time: 146.3 ms, GC Workers: 4]
          [GC Worker Start (ms): Min: 52753215.2, Avg: 52753215.2, Max: 52753215.2, Diff: 0.0]
          [Ext Root Scanning (ms): Min: 3.9, Avg: 4.2, Max: 4.7, Diff: 0.7, Sum: 16.7]
          [Update RS (ms): Min: 103.8, Avg: 104.3, Max: 104.4, Diff: 0.6, Sum: 417.1]
             [Processed Buffers: Min: 1512, Avg: 1551.5, Max: 1582, Diff: 70, Sum: 6206]
          [Scan RS (ms): Min: 1.6, Avg: 1.6, Max: 1.7, Diff: 0.1, Sum: 6.6]
          [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
          [Object Copy (ms): Min: 35.9, Avg: 36.1, Max: 36.2, Diff: 0.3, Sum: 144.2]
          [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
             [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
          [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
          [GC Worker Total (ms): Min: 146.2, Avg: 146.2, Max: 146.2, Diff: 0.1, Sum: 584.8]
          [GC Worker End (ms): Min: 52753361.4, Avg: 52753361.4, Max: 52753361.4, Diff: 0.0]
       [Code Root Fixup: 0.3 ms]
       [Code Root Purge: 0.0 ms]
       [Clear CT: 0.2 ms]
       [Other: 207.3 ms]
          [Evacuation Failure: 199.2 ms]
          [Choose CSet: 0.0 ms]
          [Ref Proc: 1.2 ms]
          [Ref Enq: 0.1 ms]
          [Redirty Cards: 0.2 ms]
          [Humongous Register: 0.2 ms]
          [Humongous Reclaim: 1.9 ms]
          [Free CSet: 0.2 ms]
       [Eden: 644.0M(1994.0M)->0.0B(204.0M) Survivors: 10.0M->0.0B Heap: 2886.0M(4096.0M)->1675.7M(4096.0M)]
     [Times: user=1.29 sys=0.00, real=0.36 secs]
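A humongous allocation failure occurred roughly every 5 seconds on average, and each resulting GC took more than 300 ms. This not only drove up CPU usage but also caused stop-the-world (STW) pauses and a large backlog of data.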
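To check how often such pauses occur and how long they take, the first line of each GC event can be picked apart with a regular expression. The following is a minimal sketch that assumes the JDK 8 style log line shown above; the class name, method names and pattern are illustrative and not part of any GC tooling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the GC cause and pause duration from a line such as
// "[GC pause (G1 Humongous Allocation) (young) (to-space exhausted), 0.3541373 secs]".
// The pattern is an assumption based on the single log sample in this article.
class GcPauseLine {
    private static final Pattern PAUSE =
            Pattern.compile("\\[GC pause \\(([^)]+)\\).*?, ([0-9.]+) secs\\]");

    // Returns the text of the first parenthesized group, or null if no match.
    static String cause(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Returns the pause in milliseconds, or -1 if the line does not match.
    static double pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find() ? Double.parseDouble(m.group(2)) * 1000.0 : -1.0;
    }

    public static void main(String[] args) {
        String line = "15:30:46.696+0800: 52753.211: [GC pause (G1 Humongous Allocation) "
                + "(young) (to-space exhausted), 0.3541373 secs]";
        System.out.println(cause(line) + " took " + pauseMillis(line) + " ms");
    }
}
```

Counting the lines whose cause is "G1 Humongous Allocation" over a time window gives the frequency described above.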
My first thought was that the heap was simply too small, so I quietly ran an experiment on one machine, increasing the heap from 4 GB to 6 GB:
-Xmx6144m -Xms6144m
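After a day of observation, the CPU and memory fluctuations were less pronounced than before but still present. The larger heap only stretched the cycle, so the interval between GCs grew longer. Clearly, adding this bit of memory treats the symptom, not the cause.
The GC cause is a humongous allocation failure, yet our reserved space is not small, so in principle allocation should not have been failing. The setting is: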
-XX:G1ReservePercent=25
The next step was to look at how the heap spaces change around each GC:
    jstat -gc PID 500 100

    S0C      S1C  S0U      S1U         EC        EU         OC         OU        MC        MU     CCSC     CCSU    YGC      YGCT  FGC   FGCT       GCT
    0.0  10240.0  0.0  10240.0  2232320.0  163840.0  1951744.0  1306537.7  143616.0  136650.1  16896.0  15666.3  66281  9351.321    0  0.000  9351.321
    0.0  10240.0  0.0  10240.0  2232320.0  227328.0  1951744.0  1431467.6  143616.0  136650.1  16896.0  15666.3  66282  9351.321    0  0.000  9351.321
    0.0   8192.0  0.0   8192.0  2226176.0   86016.0  1959936.0  1156258.9  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2226176.0  151552.0  1959936.0  1228965.3  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2226176.0  219136.0  1959936.0  1377447.6  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2226176.0  286720.0  1959936.0  1505449.5  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2226176.0  358400.0  1959936.0  1697964.5  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2226176.0  415744.0  1959936.0  1863855.0  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2160640.0  481280.0  2025472.0  2023601.4  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  2017280.0  546816.0  2168832.0  2167987.6  143616.0  136650.1  16896.0  15666.3  66282  9351.411    0  0.000  9351.411
    0.0   8192.0  0.0   8192.0  1935360.0  581632.0  2250752.0  2249908.9  143616.0  136650.1  16896.0  15666.3  66283  9351.411    0  0.000  9351.411
    0.0      0.0  0.0      0.0   221184.0   90112.0  3973120.0  1709604.2  143616.0  136650.1  16896.0  15666.3  66283  9351.727    0  0.000  9351.727
    0.0      0.0  0.0      0.0   221184.0  145408.0  3973120.0  1850918.4  143616.0  136650.1  16896.0  15666.3  66283  9351.727    0  0.000  9351.727
    0.0      0.0  0.0      0.0   221184.0  208896.0  3973120.0  1992232.5  143616.0  136650.1  16896.0  15666.3  66284  9351.727    0  0.000  9351.727
    0.0   4096.0  0.0   4096.0   217088.0   61440.0  3973120.0  1214197.0  143616.0  136650.1  16896.0  15666.3  66284  9351.797    0  0.000  9351.797
    0.0   4096.0  0.0   4096.0   217088.0  129024.0  3973120.0  1379063.5  143616.0  136650.1  16896.0  15666.3  66284  9351.797    0  0.000  9351.797
    0.0   4096.0  0.0   4096.0   217088.0  192512.0  3973120.0  1538809.9  143616.0  136650.1  16896.0  15666.3  66284  9351.797    0  0.000  9351.797
    0.0   6144.0  0.0   6144.0  2252800.0   57344.0  1935360.0  1060884.7  143616.0  136650.1  16896.0  15666.3  66285  9351.873    0  0.000  9351.873
    0.0   6144.0  0.0   6144.0  2252800.0  100352.0  1935360.0  1157142.1  143616.0  136650.1  16896.0  15666.3  66285  9351.873    0  0.000  9351.873
    0.0   6144.0  0.0   6144.0  2252800.0  182272.0  1935360.0  1335320.9  143616.0  136650.1  16896.0  15666.3  66285  9351.873    0  0.000  9351.873
    0.0   6144.0  0.0   6144.0  2252800.0  227328.0  1935360.0  1417242.1  143616.0  136650.1  16896.0  15666.3  66285  9351.873    0  0.000  9351.873
    0.0   8192.0  0.0   8192.0  2252800.0  116736.0  1933312.0  1203152.9  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2252800.0  186368.0  1933312.0  1201107.2  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2252800.0  256000.0  1933312.0  1346517.4  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2252800.0  313344.0  1933312.0  1492951.6  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2252800.0  378880.0  1933312.0  1660890.2  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2252800.0  440320.0  1933312.0  1806300.4  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2228224.0  499712.0  1957888.0  1956830.7  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  2095104.0  546816.0  2091008.0  2089952.7  143616.0  136650.1  16896.0  15666.3  66286  9351.970    0  0.000  9351.970
    0.0   8192.0  0.0   8192.0  1982464.0  593920.0  2203648.0  2203618.5  143616.0  136650.1  16896.0  15666.3  66287  9351.970    0  0.000  9351.970
    0.0      0.0  0.0      0.0   221184.0   75776.0  3973120.0  1607096.3  143616.0  136650.1  16896.0  15666.3  66287  9352.284    0  0.000  9352.284
    0.0      0.0  0.0      0.0   221184.0  114688.0  3973120.0  1681849.4  143616.0  136650.1  16896.0  15666.3  66287  9352.284    0  0.000  9352.284
    0.0      0.0  0.0      0.0   221184.0  200704.0  3973120.0  1896892.7  143616.0  136650.1  16896.0  15666.3  66287  9352.284    0  0.000  9352.284
    0.0   4096.0  0.0   4096.0   217088.0   63488.0  3973120.0  1134152.3  143616.0  136650.1  16896.0  15666.3  66288  9352.356    0  0.000  9352.356
    0.0   4096.0  0.0   4096.0   217088.0  116736.0  3973120.0  1279562.6  143616.0  136650.1  16896.0  15666.3  66288  9352.356    0  0.000  9352.356
    0.0   4096.0  0.0   4096.0   217088.0  180224.0  3973120.0  1407564.5  143616.0  136650.1  16896.0  15666.3  66288  9352.356    0  0.000  9352.356
    0.0   6144.0  0.0   6144.0  2244608.0   65536.0  1943552.0  1115144.2  143616.0  136650.1  16896.0  15666.3  66289  9352.454    0  0.000  9352.454
    0.0   6144.0  0.0   6144.0  2244608.0  131072.0  1943552.0  1256458.4  143616.0  136650.1  16896.0  15666.3  66289  9352.454    0  0.000  9352.454
    0.0   6144.0  0.0   6144.0  2244608.0  204800.0  1943552.0  1412108.8  143616.0  136650.1  16896.0  15666.3  66289  9352.454    0  0.000  9352.454
    0.0   8192.0  0.0   8192.0  2238464.0   77824.0  1947648.0  1108199.4  143616.0  136650.1  16896.0  15666.3  66290  9352.558    0  0.000  9352.558
    0.0   8192.0  0.0   8192.0  2238464.0  141312.0  1947648.0  1147881.6  143616.0  136650.1  16896.0  15666.3  66290  9352.558    0  0.000  9352.558
    0.0   8192.0  0.0   8192.0  2238464.0  212992.0  1947648.0  1281003.6  143616.0  136650.1  16896.0  15666.3  66290  9352.558    0  0.000  9352.558
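To put a number on how fast the old generation fills, the OU (old-generation used, in KB) values from the jstat output above can be turned into a growth rate. The sketch below uses nine consecutive OU samples copied from one cycle of that output and assumes the samples are exactly 500 ms apart, which jstat does not strictly guarantee. The class and method names are illustrative.

```java
// Estimates old-generation growth from consecutive jstat OU samples (in KB).
// Assumption: samples are evenly spaced at the requested jstat interval.
class OldGenGrowth {
    static double growthKbPerSecond(double[] ouKb, long intervalMillis) {
        double seconds = (ouKb.length - 1) * intervalMillis / 1000.0;
        return (ouKb[ouKb.length - 1] - ouKb[0]) / seconds;
    }

    public static void main(String[] args) {
        // OU values from one climb between two collections in the output above
        double[] ou = {
            1156258.9, 1228965.3, 1377447.6, 1505449.5, 1697964.5,
            1863855.0, 2023601.4, 2167987.6, 2249908.9
        };
        double kbPerSec = growthKbPerSecond(ou, 500);
        System.out.printf("old gen grows by about %.0f MB/s%n", kbPerSec / 1024.0);
    }
}
```

Under these assumptions the old generation grows by roughly 267 MB per second, which is consistent with it filling up again only a few seconds after each collection.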
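Old-generation usage is clearly climbing the whole time, and climbing fast. Once it reaches the total size allocated to the old generation, a GC fires, costing about 300 ms each time. For this many objects to land directly in the old generation, they must be larger than half of HeapRegionSize, which makes G1 place them straight into old-generation regions. If HeapRegionSize is not configured, G1 chooses the value automatically, and region sizes <1MB or >32MB are not used. Typical values are listed here: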
https://stackoverflow.com/questions/46786601/how-to-know-region-size-used-of-g1-garbage-collector
in general these are the region-sizes per heap-size range:
    <4GB  - 1MB
    <8GB  - 2MB
    <16GB - 4MB
    <32GB - 8MB
    <64GB - 16MB
    64GB+ - 32MB
For G1 GC, any object that is more than half a region size is considered a "Humongous object". Such an object is allocated directly in the old generation into "Humongous regions". These Humongous regions are a contiguous set of regions. StartsHumongous marks the start of the contiguous set and ContinuesHumongous marks the continuation of the set.
Reference: https://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
This confirms that large objects really were being allocated continuously, and that a GC pause was triggered each time they filled up the old generation.
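The "more than half a region" rule can be written down as a small check. This is a sketch under stated assumptions: the 16-byte array header and 8-byte alignment are typical for 64-bit HotSpot with compressed oops but are not taken from the article, and the class and method names are made up.

```java
// Checks whether a byte[] of a given length would count as a humongous
// object for a given G1 region size.
class HumongousCheck {
    // Assumption: 16-byte array header (64-bit HotSpot, compressed oops)
    static final long ARRAY_HEADER_BYTES = 16;

    // Approximate heap footprint of a byte[] with the given length,
    // rounded up to 8-byte alignment.
    static long byteArraySize(int length) {
        long raw = ARRAY_HEADER_BYTES + length;
        return (raw + 7) & ~7L;
    }

    // An object larger than half a region is treated as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long oneMbArray = byteArraySize(1048576);
        System.out.println("2MB regions: " + isHumongous(oneMbArray, 2 * mb));
        System.out.println("4MB regions: " + isHumongous(oneMbArray, 4 * mb));
    }
}
```

With 2 MB regions, a byte[] holding 1048576 elements is just over the half-region threshold once its header is counted, so it is humongous. With 4 MB regions it is not.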
This was a little awkward. Late that night I quietly asked the ops team for permission to run jmap, then picked a production machine and dumped several hprof files.
jmap -dump:live,format=b,file=xxx.hprof PID
Pulling those hprof files down from the server and comparing them locally turned up something startling.
byte[] objects took up more than half of the heap, and each array was almost exactly 1 MB: every byte[] was 1048579 bytes in size.
What was allocating so many byte arrays, 1 MB at a time?
Using jvisualvm, I exported the contents of a single byte[] array to a CSV file. It looked roughly like this:
112, 2, 0, 67, 48, 89, 111, 76, 15, 46, 97, 107, 16, 99, 12, 101, 46, 99, 97, 12, 101, 115, 46, 105, 119, 12, 118, 46, 68, 12, 108, 97, 119, 101, 12, 66, 120, 99, 15, 90, 118, 110, 14, 70, 111, 108, 100, 17, 114, -103, 10, 11, 120, 99, 10, 97, 110... <truncated>
Looking at these byte values, I figured a small tool could turn them into characters, since mapping them through the ASCII table is enough to show roughly what the data contains.
The tool's code is as follows:
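    import java.io.File;
    import java.io.IOException;

    import org.apache.commons.io.FileUtils; // Apache Commons IO

    public class BytesToText {
        public static void main(String[] args) throws IOException {
            // CSV exported from jvisualvm: comma-separated signed byte values
            File file = new File("1.csv");
            String str = FileUtils.readFileToString(file, "utf-8");
            String[] list = str.split(",");
            int i = 0;
            for (String s : list) {
                // trim() is required because the values are separated by ", "
                byte b = Byte.parseByte(s.trim());
                System.out.print((char) b);
                i++;
                // the first ~3000 bytes are enough to recognize the content
                if (i > 3000) {
                    break;
                }
            }
            System.out.println("hello end");
        }
    }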
I spot-checked several of the 1 MB objects, exported each one to a CSV byte file, and ran the conversion. The output looked roughly like this:
eId inFault outFault exception in out inHeader outHeader properties0:ID-dcn-15597-1561992626862-68-335522280FFNS {"action":"xxxx","actionTm":"2019-xx-xx 02:26:29","aType":X,"code":"xxxxx","Type":x,"bId":"xxxx","bMessage":{"bOrg":1,"boxName ......
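That made it immediately clear: these are the byte arrays of our Kafka message payloads. This particular Kafka message ranks in the top three in our system by concurrency, and its data volume is also among the highest.
The likely explanation is that a byte array is created for each message when it is produced or when it is received for consumption. I went straight to the core code of the producer and consumer base components and found this: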
    baos = new ByteArrayOutputStream(point.getMaxRequestSizeBytes());
    ......
    eData = baos.toByteArray();
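This is the producer code. When getMaxRequestSizeBytes is not configured, it defaults to 1048576, so every message is written into an array of that size, and an allocation that large goes straight into the old generation. The message is also high-volume, at around 3000 TPS, so old-generation usage shoots up almost instantly: allocation fails, GC runs, usage climbs again, allocation fails again, GC runs again, and so on.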
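The reason the constructor argument matters is that ByteArrayOutputStream allocates its internal buffer at the requested capacity up front, regardless of how small the message turns out to be, while toByteArray() returns a copy holding only the bytes actually written. The sketch below illustrates this with a smaller initial capacity. It is not the component's real code: the class name, the serialize method and the 8 KB figure are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrates sizing a ByteArrayOutputStream with a modest initial capacity
// instead of the maximum request size. All names and the 8 KB value are
// hypothetical choices for this sketch.
class MessageBuffer {
    static final int INITIAL_CAPACITY = 8 * 1024;

    static byte[] serialize(String payload) {
        // The internal buffer starts at INITIAL_CAPACITY and grows on demand
        ByteArrayOutputStream baos = new ByteArrayOutputStream(INITIAL_CAPACITY);
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        baos.write(bytes, 0, bytes.length);
        // toByteArray() copies only the bytes written, not the whole buffer
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = serialize("{\"action\":\"xxxx\"}");
        System.out.println("serialized length: " + data.length);
    }
}
```

The trade-off is that a message larger than the initial capacity forces the stream to grow and copy its buffer, so the starting size is best chosen close to the typical message size.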
With the root cause finally clear, there are two possible fixes:
1. Set getMaxRequestSizeBytes to 1024*256, shrinking the per-message array so that it no longer exceeds half of HeapRegionSize. The drawback is that every message must then fit within 1024*256 bytes.
2. Increase the heap to 8 GB or more and add -XX:G1HeapRegionSize=4M, so that the 1 MB message array is smaller than half of HeapRegionSize.
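Whichever fix is chosen, it helps to confirm the region size the JVM is actually running with. One way to do that from inside the process is the HotSpot diagnostic MXBean. The sketch below assumes a HotSpot JVM. On other JVMs, or if the option is unavailable, it falls back to "unknown". The class name is illustrative, and the reported value is only meaningful when G1 is the active collector.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the G1HeapRegionSize VM option of the running JVM, in bytes.
// Assumption: HotSpot JVM; returns "unknown" if the option cannot be read.
class RegionSizeProbe {
    static String regionSizeBytes() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("G1HeapRegionSize").getValue();
        } catch (RuntimeException e) {
            // e.g. IllegalArgumentException when the option does not exist
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("G1HeapRegionSize = " + regionSizeBytes());
    }
}
```

Running the JVM with -XX:+PrintFlagsFinal and searching the output for G1HeapRegionSize is a command-line alternative.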
Common HeapRegionSize values are:
    <4GB  - 1MB
    <8GB  - 2MB
    <16GB - 4MB
    <32GB - 8MB
    <64GB - 16MB
    64GB+ - 32MB
In general, G1 aims for about 2048 regions, so the region size is roughly the heap size divided by 2048.
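The heap-size-divided-by-2048 rule and the table above can be approximated in a few lines. This is a simplified sketch of the heuristic, assuming the result is rounded down to a power of two and kept between 1 MB and 32 MB. It is not the JVM's actual code, and the names are illustrative.

```java
// Approximates the default G1 region size for a given heap size:
// heap / 2048, rounded down to a power of two, clamped to [1 MB, 32 MB].
// Simplified sketch; the real JVM logic may differ between versions.
class RegionSizeEstimate {
    static final long MB = 1024L * 1024L;

    static long estimateRegionBytes(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long powerOfTwo = Long.highestOneBit(target); // round down to 2^n
        return Math.min(powerOfTwo, 32 * MB);
    }

    public static void main(String[] args) {
        long[] heapsMb = {4096, 6144, 8192};
        for (long heapMb : heapsMb) {
            long regionMb = estimateRegionBytes(heapMb * MB) / MB;
            System.out.println(heapMb + " MB heap -> " + regionMb + " MB regions");
        }
    }
}
```

Under this approximation both the original 4 GB heap and the 6 GB experiment end up with 2 MB regions, which would explain why adding memory did not stop the 1 MB arrays from being humongous, while an 8 GB heap reaches 4 MB regions.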