優化的最終目的是保障用戶體驗的同時,減少機器,節約成本。
為了更好的編寫本文,花費20美金。歡迎讚賞支持。
g1原理見附錄官方文檔,本文假設讀者對jvm gc和g1原理有基本的瞭解。
g1特點是內存分片(一般1024片),支持動態調整young區大小,old區使用mixed gc方式分成多次小gc,盡量減少單次gc STW(stop the world)暫停時間,讓gc對應用延遲的影響在預期範圍內。
對平均響應時間,最大響應時間有嚴格要求的應用系統,如hbase regionserver。
系統調優是從業務到實現,從整體到局部,從架構到具體組件的。在進行gc調優之前,我們應該確保業務層和應用層已經評估優化過。業務層和應用層的優化一般來說更容易有收益,我們不能指望一個架構設計有缺陷,應用層代碼有很多已知問題的系統,通過gc調優一勞永逸。
先來看一下衡量gc的指標有哪些。對應用吞吐量的影響(一般是gc對cpu的消耗),對延遲的影響,總內存佔用(gc觸發時留有內存業務可以繼續,留有內存做對象拷貝碎片整理等操作,不能oom)。
GC調優3選2原則: 在吞吐量、延遲、內存佔用上,我們只能選擇其中兩個進行調優,無法三者兼得。
在調優之前,必須要有明確的性能優化目標, 然後找到未達到該目標的性能瓶頸。再針對瓶頸做優化。通過各種監控和統計工具,確認調優後的應用是否已經達到相關目標。
hbase集羣啟用了group分組,重要業務有獨立的regionserver分組。
重要業務regionserver的調優目標是,在滿足業務延遲要求的基礎上,用盡量低的成本,滿足業務吞吐量的峯值需求。
也就是說,總吞吐量固定,延遲要求固定,單機cpu和內存固定,求最小機器數。
再轉換一下,對單機來說,延遲指標確定,將單機吞吐在單機cpu和內存允許的範圍內調整到最大。
需要說明的是,單機能承擔多少吞吐,跟業務訪問模型,region數,讀寫緩存參數,網路IO,磁碟IO都有關係。業務和hbase參數的調整應該在gc優化之前進行,網路和磁碟IO一般是應用層優化的。所以下文假設業務層和應用層已優化完畢,網路和磁碟都不是瓶頸,只聚焦在gc參數調優。
本文假設我們換算後的延遲目標是平均gc暫停時間100ms,最大暫停時間2s,gc時間佔比3%以內。實際達到這個目標後,還要通過regionserver監控確定請求的延時要是否在用戶用戶的要求範圍內。
gc的時間佔比。平均stw gc時間,頻率。毛刺stw gc時間,頻率。峯值stw gc時間,頻率。
一般來說,regionserver應該避免full gc。
新生代越大,單次young gc時間越長,頻率越低。
mixed gc受gc觸發時機,gc並發線程數,預期迭代次數,每個迭代回收分片比例等多個參數影響,詳見附錄官方文檔。
目前生產環境用1.8.0_77, 小米hbase環境用1.8.0_111, Oracle jdk的8最新版本是8u201。
intel性能測試見附錄,jdk7不同版本間g1性能差距很大。Jdk7u21升級到jdk7u60,gc以及stw gc的平均時間,最大時間,時間分佈都有大幅優化。
所以應該盡量用最新版本的JDK。
需要有方法論判斷當前是否應該繼續優化。
根據業務對延遲的需求,比較現在的請求延遲和gc情況,預估能接受的平均gc暫停時間,最大gc 暫停時間範圍。
關掉自動balance,給一臺regionserver少量增加region從而增加單機吞吐。當請求延遲超過用戶要求的警戒線後,分析gc日誌,找到瓶頸,優化降低gc延遲從而降低請求延遲,以便繼續增加region。
當單機region數過多(可以考慮合併region),cpu負載過高,請求延遲無法降下來,任意一個條件滿足,單機優化結束。穩定運行一段時間後,嘗試將優化推廣到整個業務分組。
要分析gc情況一定要有gc日誌。之前的日誌參數如下
-XX:+PrintGCDetails gc細節 -XX:+PrintGCDateStamps 時間戳 -Xloggc:${HBASE_LOG_DIR}/gc-`date +%Y%m%d%H%M` gc文件格式 -XX:+UseGCLogFileRotation gc文件循環 -XX:NumberOfGCLogFiles=10 文件數 -XX:GCLogFileSize=512M 文件大小 -XX:+HeapDumpOnOutOfMemoryError oom時堆dump -XX:HeapDumpPath=${HBASE_LOG_DIR}/hbase.heapdump dump目錄 -XX:ErrorFile=${HBASE_LOG_DIR}/hs_err_pid%p.log -XX:+PrintAdaptiveSizePolicy 列印自適應收集的大小 -XX:+PrintFlagsFinal 列印參數值
參考其他優化的文章,增加列印參數
-XX:+PrintGCApplicationStoppedTime 列印垃圾回收期間程序暫停的時間 -XX:+PrintTenuringDistribution https://www.jianshu.com/p/e634955f3bbb survivor分佈情況 -XX:+PrintHeapAtGC gc前後列印堆信息 -XX:+PrintSafepointStatistics https://blog.csdn.net/u011918260/article/details/70047159 分析安全點統計信息,優化gc參考 -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 列印每次GC前後內存碎片的統計信息,統計信息主要包括3個維度:Free Space、Max Chunk Size和Num Chunks。似乎cms更有用
gc日誌太多可能會影響性能,目前沒有日誌對性能影響的數據,暫不考慮日誌對性能的影響。
有很多gc可視化的工具。比如在線的gceasy https://gceasy.io/index.jsp#banner,上傳gc日誌可以進行分析。免費功能有各種圖表展示。20美金一個月可以使用高級功能。
本文用gceasy做例子,其他可視化工具介紹見附錄。
下面從前文優化過的節點開始,分析gc日誌。
不花錢怎麼變強?開啟gceasy高級功能。
單次gc暫停最長到5秒了。鏈接給出了優化建議,見附錄。
首先是優化程序降低對象創建速度。現在平均對象創建速度是1.22gb/sec。減少對象創建速度主要是更多用堆外內存,優化寫放大,本文重點是gc優化,不展開。
再是年輕代太小,增大年輕代。這規則適用於其他gc,而g1 young gc是動態的,mixed gc是分次迭代的,young gc不能太大,否則young gc反而會比單次mixed慢。
再就是設置期待的最大暫停時間,目前設置100ms,暫不修改。
伺服器上進程多,內存交換,伺服器無這個情況。
gc線程不夠多,有可能。但gc3秒以上的只有6次,差不多每天一次。如果是線程數明顯不夠次數應該更多,先放放。
伺服器高IO導致的等待時間長。有可能,有datanode,需要進一步分析gc日誌和IO監控。
沒有禁用system.gc()。gc case表格來看沒有相關調用。
堆太大。有可能,畢竟80GB堆。但大堆對吞吐有好處。
看5秒暫停gc的具體日誌,
{Heap before GC invocations=50103 (full 0): garbage-first heap total 83886080K, used 48760053K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000) region size 32768K, 128 young (4194304K), 5 survivors (163840K) Metaspace used 50945K, capacity 51410K, committed 52168K, reserved 53248K 2019-02-18T17:48:53.570+0800: 365568.567: [GC pause (G1 Evacuation Pause) (mixed) Desired survivor size 268435456 bytes, new threshold 1 (max 1) - age 1: 87997400 bytes, 87997400 total 365568.567: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3313913, predicted base time: 1151.36 ms, remaining time: 0.00 ms, target pause time: 100.00 ms] 365568.567: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 123 regions, survivors: 5 regions, predicted young region time: 50.17 ms] 365568.583: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 70 regions, max: 256 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %] 365568.583: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 70 regions, expensive: 70 regions, min: 80 regions, remaining time: 0.00 ms] 365568.583: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 123 regions, survivors: 5 regions, old: 70 regions, predicted pause time: 4353.85 ms, target pause time: 100.00 ms] 365573.866: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 40.96 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] 365573.866: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 407 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %] , 5.2999017 secs] [Parallel Time: 5227.7 ms, GC Workers: 16] [GC Worker Start (ms): Min: 365568584.3, Avg: 365568584.3, Max: 365568584.4, Diff: 0.1] [Ext Root Scanning (ms): Min: 1.3, Avg: 1.6, Max: 3.9, Diff: 2.7, Sum: 25.1] [Update RS (ms): Min: 1065.2, Avg: 1067.2, Max: 1067.8, Diff: 2.6, Sum: 17075.4] [Processed Buffers: Min: 739, Avg: 831.5, Max: 879, Diff: 140, Sum: 13304] [Scan RS (ms): Min: 3610.8, Avg: 3611.6, Max: 3612.6, Diff: 1.8, Sum: 57786.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.5] [Object Copy (ms): Min: 545.2, Avg: 546.3, Max: 547.2, Diff: 2.0, Sum: 8741.5] [Termination (ms): Min: 0.1, Avg: 0.6, Max: 0.8, Diff: 0.7, Sum: 9.8] [Termination Attempts: Min: 1, Avg: 465.3, Max: 552, Diff: 551, Sum: 7445] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.4] [GC Worker Total (ms): Min: 5227.3, Avg: 5227.4, Max: 5227.5, Diff: 0.2, Sum: 83638.8] [GC Worker End (ms): Min: 365573811.8, Avg: 365573811.8, Max: 365573811.8, Diff: 0.0] [Code Root Fixup: 0.4 ms] [Code Root Purge: 0.0 ms] [Clear CT: 3.4 ms] [Other: 68.4 ms] [Choose CSet: 16.3 ms] [Ref Proc: 3.6 ms] [Ref Enq: 0.2 ms] [Redirty Cards: 8.7 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 31.6 ms] [Eden: 3936.0M(3936.0M)->0.0B(3904.0M) Survivors: 160.0M->192.0M Heap: 46.5G(80.0G)->42.3G(80.0G)] Heap after GC invocations=50104 (full 0): garbage-first heap total 83886080K, used 44323410K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000) region size 32768K, 6 young (196608K), 6 survivors (196608K) Metaspace used 50945K, capacity 51410K, committed 52168K, reserved 53248K } [Times: user=85.30 sys=0.00, real=5.30 secs] 2019-02-18T17:48:58.871+0800: 365573.867: Total time for which application threads were stopped: 5.3131110 seconds, Stopping threads took: 0.0005771 seconds 2019-02-18T17:48:58.896+0800: 365573.893: Total time for which application threads were stopped: 0.0186168 seconds, Stopping threads took: 0.0005541 seconds 2019-02-18T17:48:58.906+0800: 365573.902: Total time for which application threads were stopped: 0.0078098 seconds, Stopping threads took: 0.0005988 seconds 2019-02-18T17:48:58.915+0800: 365573.912: Total time for which application threads were stopped: 0.0075271 seconds, Stopping threads took: 0.0003996 seconds 2019-02-18T17:48:58.923+0800: 365573.919: Total time for which application threads were stopped: 0.0072014 seconds, Stopping threads took: 0.0003114 seconds 2019-02-18T17:48:58.934+0800: 365573.930: Total time for which application threads were stopped: 0.0078706 seconds, Stopping threads took: 0.0005490 seconds 2019-02-18T17:48:58.942+0800: 365573.938: Total time for which application threads were stopped: 0.0073198 seconds, Stopping threads took: 0.0003927 seconds 2019-02-18T17:48:58.964+0800: 365573.960: Total time for which application threads were stopped: 0.0078810 seconds, Stopping threads took: 0.0007641 seconds 2019-02-18T17:48:58.972+0800: 365573.969: Total time for which application threads were stopped: 0.0075881 seconds, Stopping threads took: 0.0006277 seconds 2019-02-18T17:48:58.981+0800: 365573.978: Total time for which application threads were stopped: 0.0081290 seconds, Stopping threads took: 0.0011246 seconds 2019-02-18T17:48:58.992+0800: 365573.989: Total time for which application threads were stopped: 0.0076556 seconds, Stopping threads took: 0.0005358 seconds 2019-02-18T17:48:59.015+0800: 365574.011: Total time for which application threads were stopped: 0.0076750 seconds, Stopping threads took: 0.0005602 seconds 2019-02-18T17:48:59.026+0800: 365574.022: Total time for which application threads were stopped: 0.0089086 seconds, Stopping threads took: 0.0006000 seconds 2019-02-18T17:48:59.044+0800: 365574.041: Total time for which application threads were stopped: 0.0087554 seconds, Stopping threads took: 0.0006332 seconds 2019-02-18T17:48:59.054+0800: 365574.050: Total time for which application threads were stopped: 0.0084038 seconds, Stopping threads took: 0.0004326 seconds {Heap before GC invocations=50104 (full 0): garbage-first heap total 83886080K, used 48321106K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000) region size 32768K, 128 young (4194304K), 6 survivors (196608K) Metaspace used 50946K, capacity 51410K, committed 52168K, reserved 53248K
從日誌看,期望survivor區有大概256MB,實際只有32*5=160MB,可能會將eden區直接提到old區。
近期gc時間佔比過高,到40%了。
清理young 區的gc線程耗時過長,清理後可回收內存低於10%,mixed gc結束,其他gc線程空閑。說明mixed gc觸發閾值設置低了,young gc參數也需要調整。
看暫停時間大於2秒的其他日誌,都是觸發mixed gc,但只有一個清理young和個別old區region的線程工作。確實說明mixed gc觸發閾值設置低了,young gc過大了。
有41次gc的系統時間大於用戶時間,大概3小時一次。
操作系統層面有可能是操作系統層異常,或者系統太忙了。出現頻率大概3小時一次,看ganglia日誌問題出現時cpu負載並不高。並且系統時間只是略高於用戶時間不是明顯有問題的數量級差距。先不考慮操作系統和JVM有bug的情況,暫不處理。
還是說有5秒的gc,同上。
-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UnlockExperimentalVMOptions -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 -XX:G1NewSizePercent=5 -XX:G1MaxNewSizePercent=20 -XX:MaxTenuringThreshold=1 -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=16 -XX:InitiatingHeapOccupancyPercent=60 -XX:MaxDirectMemorySize=110g -XX:G1OldCSetRegionThresholdPercent=10 -XX:G1HeapWastePercent=10 -Xmx80g -Xms80g
從gc前後堆內存圖來看,-XX:InitiatingHeapOccupancyPercent=65,52GB更合適。
暫停時間平均135ms,從暫停時間圖來看,young gc時間較長,需要調小young gc區,小米經驗改初始值,由5改成2。-XX:G1NewSizePercent=2
mixed gc 大多數一輪不到8次,暫停時間基本都超過100ms了。期待每輪16次的參數沒起作用,這是由於每輪清理的region數過多。-XX:G1OldCSetRegionThresholdPercent=5,由10改成5,減少單次mixed gc時間。
survivor區應該大於young日誌中期待的,Set SurvivorRatio from 8 (default) to 4。
-XX:InitiatingHeapOccupancyPercent=65 -XX:G1NewSizePercent=2 -XX:G1OldCSetRegionThresholdPercent=5 -XX:SurvivorRatio=4
參考各類優化資料,增加以下參數
-XX:+UseStringDeduplication 字元串去重,提高性能 -XX:-ResizePLAB 減少gc線程間通信的東西,關閉動態提升本地buffer -XX:+PerfDisableSharedMem 關掉統計信息的內存映射。開啟在某些特殊場景下,會極大增加gc暫停時間
平均暫停時間,最大暫停時間,暫停時間分佈,gc吞吐佔比都有提高,優化成功。
接下來迭代提高這臺伺服器region數,做gc優化。
偶發系統時間大於用戶時間的問題,依然存在。發生時系統負載並不高,有可能操作系統和JVM層面有優化空間。
gceasy建議不要用-XX:+UseGCLogFileRotation,理由是歷史會看不到,文件順序不容易分辨,多個文件不利於gc分析。但這個參數意義是控制總文件大小,所以依然保留。
https://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
https://blog.csdn.net/qq_32447301/article/details/83067984
https://blog.csdn.net/xuelinmei_happy/article/details/51090115
https://blog.csdn.net/renfufei/article/details/56678064
https://blog.gceasy.io/2016/11/22/reduce-long-gc-pauses/
https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/
https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
https://blog.gceasy.io/2018/12/23/usestringdeduplication/
https://blog.gceasy.io/2019/01/29/try-to-avoid-xxusegclogfilerotation/
https://www.oracle.com/technetwork/cn/articles/java/g1gc-1984535-zhs.html
https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase
http://openinx.github.io/ppt/hbaseconasia2017_paper_18.pdf
https://blog.csdn.net/songhaifengshuaige/article/details/54177612
https://blog.csdn.net/maosijunzi/article/details/46562489
https://dzone.com/articles/g1gcgarbage-first-garbage-collector-tuning-flags-1
http://www.evanjones.ca/jvm-mmap-pause.html
曉哥:hbase優化之旅(一)探索regionserver參數優化?zhuanlan.zhihu.com 推薦閱讀: