The ultimate goal of optimization is to reduce the number of machines and save cost while preserving the user experience.

Writing this article better cost 20 US dollars. Appreciation and support are welcome.

Introduction to G1

G1 characteristics

For how G1 works, see the official documentation in the appendix; this article assumes the reader has a basic understanding of JVM GC and G1.

G1's distinguishing features: the heap is split into regions (typically around 1024 of them), the young generation can be resized dynamically, and the old generation is collected via mixed GC spread over several small collections. The goal is to minimize the single stop-the-world (STW) pause so that GC's impact on application latency stays within an expected bound.

Scenarios where G1 fits

Applications with strict requirements on average and maximum response time, such as HBase RegionServer.

Optimization principles

Optimize the business and application layers first

System tuning proceeds from the business down to the implementation, from the whole to the parts, from the architecture to individual components. Before tuning GC, we should make sure the business and application layers have already been reviewed and optimized. Optimizations at those layers usually pay off more readily; we cannot expect GC tuning to fix, once and for all, a system whose architecture is flawed and whose application code has many known problems.

The 2-out-of-3 rule of GC tuning

First, the metrics by which GC is measured: impact on application throughput (generally GC's CPU consumption), impact on latency, and total memory footprint (when GC triggers there must be memory left for the application to keep running, and for object copying and compaction, without hitting OOM).

The 2-out-of-3 rule of GC tuning: among throughput, latency, and memory footprint, we can only optimize for two; all three cannot be had at once.

Tuning goals for RegionServers of existing HBase workloads

Before tuning there must be a clear performance target. Then find the bottleneck that keeps that target from being met, optimize against the bottleneck, and use monitoring and statistics tools to confirm whether the tuned application has reached the target.

The HBase cluster has RegionServer grouping enabled, and important workloads get a dedicated RegionServer group.

The tuning goal for the important workloads' RegionServers is to satisfy their latency requirements while serving their peak throughput demand at the lowest possible cost.

In other words: with total throughput fixed, the latency requirement fixed, and per-machine CPU and memory fixed, minimize the number of machines.

Restated per machine: with the latency target fixed, push single-machine throughput as high as the machine's CPU and memory allow.
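The capacity-planning view above reduces to a small calculation. A minimal sketch; the throughput numbers below are hypothetical, purely for illustration:

```python
import math

def min_machines(total_peak_qps: float, per_machine_qps: float, redundancy: int = 0) -> int:
    """Minimum machines needed to serve the peak, plus optional redundancy."""
    return math.ceil(total_peak_qps / per_machine_qps) + redundancy

# Hypothetical numbers: 300k QPS at peak, 25k QPS per tuned RegionServer.
print(min_machines(300_000, 25_000))  # 12
```

Every extra bit of single-machine throughput squeezed out under the latency constraint directly lowers the machine count.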

Note that how much throughput one machine can carry depends on the workload's access pattern, region count, read/write cache parameters, network IO, and disk IO. Workload and HBase parameter adjustments should happen before GC tuning, and network and disk IO are generally optimized at the application layer. The rest of this article therefore assumes the business and application layers are already optimized and that neither network nor disk is a bottleneck, focusing solely on GC parameter tuning.

Assume our converted latency targets are: an average GC pause of 100 ms, a maximum pause of 2 s, and GC time share within 3%. Even once these are reached, we still have to confirm via RegionServer monitoring that request latency actually falls within the users' requirements.
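The 3% time-share budget and the 100 ms average pause together bound how often the application can afford to pause. A quick back-of-the-envelope check using the targets just stated:

```python
avg_pause_s = 0.100   # average GC pause target: 100 ms
gc_time_share = 0.03  # GC may consume at most 3% of wall-clock time

# Maximum tolerable pause frequency, and the implied minimum interval between pauses.
max_pauses_per_sec = gc_time_share / avg_pause_s
min_interval_s = 1 / max_pauses_per_sec
print(max_pauses_per_sec, min_interval_s)  # roughly 0.3 pauses/s, one pause every ~3.3 s
```

So at the target pause length, GCs that pause the application more often than about once every 3.3 seconds would blow the 3% budget.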

Factors affecting latency

GC's share of wall-clock time; average STW pause time and frequency; outlier STW pause time and frequency; peak STW pause time and frequency.

Generally, a RegionServer should avoid full GC altogether.

The larger the young generation, the longer each young GC takes and the lower its frequency.

Mixed GC is governed by several parameters: when it is triggered, the number of concurrent GC threads, the expected number of passes per cycle, and the fraction of regions reclaimed per pass. See the official documentation in the appendix.

About the JVM version

Production currently runs 1.8.0_77, Xiaomi's HBase environment runs 1.8.0_111, and the latest Oracle JDK 8 release is 8u201.

Intel's performance tests (see appendix) show large G1 performance differences across JDK 7 versions: upgrading from 7u21 to 7u60 greatly improved the average, maximum, and distribution of GC and STW pause times.

So use the newest JDK you can.

Optimization procedure

We need a methodology for deciding whether further optimization is warranted.

Based on the workload's latency requirements, compare current request latency with current GC behavior, and estimate the acceptable range for average and maximum GC pause times.

Turn off the automatic balancer and add a small number of regions to one RegionServer to raise its throughput. Once request latency crosses the users' red line, analyze the GC log, find the bottleneck, and reduce GC latency (and thus request latency) so that more regions can be added.

Single-machine optimization ends when any one of these holds: the machine has too many regions (consider merging regions), CPU load is too high, or request latency can no longer be brought down. After it has run stably for a while, try rolling the optimization out to the whole group.

Adding log output

Analyzing GC requires a GC log. The previous logging parameters were:

-XX:+PrintGCDetails GC details
-XX:+PrintGCDateStamps timestamps
-Xloggc:${HBASE_LOG_DIR}/gc-`date +%Y%m%d%H%M` GC log file name pattern
-XX:+UseGCLogFileRotation rotate GC log files
-XX:NumberOfGCLogFiles=10 number of files
-XX:GCLogFileSize=512M file size
-XX:+HeapDumpOnOutOfMemoryError heap dump on OOM
-XX:HeapDumpPath=${HBASE_LOG_DIR}/hbase.heapdump dump directory
-XX:ErrorFile=${HBASE_LOG_DIR}/hs_err_pid%p.log
-XX:+PrintAdaptiveSizePolicy print adaptive sizing decisions
-XX:+PrintFlagsFinal print final flag values

Following other tuning articles, these logging parameters were added:

-XX:+PrintGCApplicationStoppedTime print how long the application was stopped during GC
-XX:+PrintTenuringDistribution https://www.jianshu.com/p/e634955f3bbb survivor age distribution
-XX:+PrintHeapAtGC print heap info before and after each GC
-XX:+PrintSafepointStatistics https://blog.csdn.net/u011918260/article/details/70047159 safepoint statistics, a useful reference when tuning GC
-XX:PrintSafepointStatisticsCount=1
-XX:PrintFLSStatistics=1 print memory fragmentation statistics before and after each GC, along 3 dimensions: Free Space, Max Chunk Size, and Num Chunks; seems more useful for CMS

Heavy GC logging could itself affect performance. We have no data on that effect yet, so we ignore it for now.

Visualization

There are many GC visualization tools. For example the online gceasy (gceasy.io/index.jsp#): upload a GC log and it analyzes it. The free tier offers various charts; the premium features cost 20 USD per month.

This article uses gceasy as the example; other visualization tools are introduced in the appendix.

Below, we analyze the GC log, starting from the node tuned in the previous article.

Problem analysis

gceasy's recommendations

How do you get stronger without spending money? You don't: enable gceasy's premium features.

The longest single GC pause reached 5 seconds. The report links to optimization suggestions; see the appendix.

First: reduce the object allocation rate. It currently averages 1.22 GB/sec. Reducing allocation mainly means using more off-heap memory and optimizing write amplification; this article focuses on GC tuning, so we will not expand on that.

Second: the young generation is too small, enlarge it. That rule fits other collectors, but G1's young generation is sized dynamically and mixed GC proceeds in passes; the young generation must not be too large, or a young GC can end up slower than a single mixed GC.

Third: set the desired maximum pause time. It is currently 100 ms; leave it unchanged for now.

Too many processes on the server causing memory swapping: not happening on this server.

Too few GC threads: possible. But only 6 GCs exceeded 3 seconds, roughly one per day; if the thread count were clearly insufficient there would be more. Set this aside for now.

Long waits caused by high server IO: possible, since a DataNode is colocated; this needs further analysis of the GC log together with IO monitoring.

system.gc() not disabled: judging by the GC cause table, there are no such calls.

Heap too large: possible, it is an 80 GB heap after all. But a large heap helps throughput.

Here is the detailed log of the 5-second pause:

{Heap before GC invocations=50103 (full 0):
garbage-first heap total 83886080K, used 48760053K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
region size 32768K, 128 young (4194304K), 5 survivors (163840K)
Metaspace used 50945K, capacity 51410K, committed 52168K, reserved 53248K
2019-02-18T17:48:53.570+0800: 365568.567: [GC pause (G1 Evacuation Pause) (mixed)
Desired survivor size 268435456 bytes, new threshold 1 (max 1)
- age 1: 87997400 bytes, 87997400 total
365568.567: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3313913, predicted base time: 1151.36 ms, remaining time: 0.00 ms, target pause time: 100.00 ms]
365568.567: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 123 regions, survivors: 5 regions, predicted young region time: 50.17 ms]
365568.583: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 70 regions, max: 256 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %]
365568.583: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 70 regions, expensive: 70 regions, min: 80 regions, remaining time: 0.00 ms]
365568.583: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 123 regions, survivors: 5 regions, old: 70 regions, predicted pause time: 4353.85 ms, target pause time: 100.00 ms]
365573.866: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 40.96 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
365573.866: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 407 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %]
, 5.2999017 secs]
[Parallel Time: 5227.7 ms, GC Workers: 16]
[GC Worker Start (ms): Min: 365568584.3, Avg: 365568584.3, Max: 365568584.4, Diff: 0.1]
[Ext Root Scanning (ms): Min: 1.3, Avg: 1.6, Max: 3.9, Diff: 2.7, Sum: 25.1]
[Update RS (ms): Min: 1065.2, Avg: 1067.2, Max: 1067.8, Diff: 2.6, Sum: 17075.4]
[Processed Buffers: Min: 739, Avg: 831.5, Max: 879, Diff: 140, Sum: 13304]
[Scan RS (ms): Min: 3610.8, Avg: 3611.6, Max: 3612.6, Diff: 1.8, Sum: 57786.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.5]
[Object Copy (ms): Min: 545.2, Avg: 546.3, Max: 547.2, Diff: 2.0, Sum: 8741.5]
[Termination (ms): Min: 0.1, Avg: 0.6, Max: 0.8, Diff: 0.7, Sum: 9.8]
[Termination Attempts: Min: 1, Avg: 465.3, Max: 552, Diff: 551, Sum: 7445]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.4]
[GC Worker Total (ms): Min: 5227.3, Avg: 5227.4, Max: 5227.5, Diff: 0.2, Sum: 83638.8]
[GC Worker End (ms): Min: 365573811.8, Avg: 365573811.8, Max: 365573811.8, Diff: 0.0]
[Code Root Fixup: 0.4 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 3.4 ms]
[Other: 68.4 ms]
[Choose CSet: 16.3 ms]
[Ref Proc: 3.6 ms]
[Ref Enq: 0.2 ms]
[Redirty Cards: 8.7 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 31.6 ms]
[Eden: 3936.0M(3936.0M)->0.0B(3904.0M) Survivors: 160.0M->192.0M Heap: 46.5G(80.0G)->42.3G(80.0G)]
Heap after GC invocations=50104 (full 0):
garbage-first heap total 83886080K, used 44323410K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
region size 32768K, 6 young (196608K), 6 survivors (196608K)
Metaspace used 50945K, capacity 51410K, committed 52168K, reserved 53248K
}
[Times: user=85.30 sys=0.00, real=5.30 secs]
2019-02-18T17:48:58.871+0800: 365573.867: Total time for which application threads were stopped: 5.3131110 seconds, Stopping threads took: 0.0005771 seconds
2019-02-18T17:48:58.896+0800: 365573.893: Total time for which application threads were stopped: 0.0186168 seconds, Stopping threads took: 0.0005541 seconds
2019-02-18T17:48:58.906+0800: 365573.902: Total time for which application threads were stopped: 0.0078098 seconds, Stopping threads took: 0.0005988 seconds
2019-02-18T17:48:58.915+0800: 365573.912: Total time for which application threads were stopped: 0.0075271 seconds, Stopping threads took: 0.0003996 seconds
2019-02-18T17:48:58.923+0800: 365573.919: Total time for which application threads were stopped: 0.0072014 seconds, Stopping threads took: 0.0003114 seconds
2019-02-18T17:48:58.934+0800: 365573.930: Total time for which application threads were stopped: 0.0078706 seconds, Stopping threads took: 0.0005490 seconds
2019-02-18T17:48:58.942+0800: 365573.938: Total time for which application threads were stopped: 0.0073198 seconds, Stopping threads took: 0.0003927 seconds
2019-02-18T17:48:58.964+0800: 365573.960: Total time for which application threads were stopped: 0.0078810 seconds, Stopping threads took: 0.0007641 seconds
2019-02-18T17:48:58.972+0800: 365573.969: Total time for which application threads were stopped: 0.0075881 seconds, Stopping threads took: 0.0006277 seconds
2019-02-18T17:48:58.981+0800: 365573.978: Total time for which application threads were stopped: 0.0081290 seconds, Stopping threads took: 0.0011246 seconds
2019-02-18T17:48:58.992+0800: 365573.989: Total time for which application threads were stopped: 0.0076556 seconds, Stopping threads took: 0.0005358 seconds
2019-02-18T17:48:59.015+0800: 365574.011: Total time for which application threads were stopped: 0.0076750 seconds, Stopping threads took: 0.0005602 seconds
2019-02-18T17:48:59.026+0800: 365574.022: Total time for which application threads were stopped: 0.0089086 seconds, Stopping threads took: 0.0006000 seconds
2019-02-18T17:48:59.044+0800: 365574.041: Total time for which application threads were stopped: 0.0087554 seconds, Stopping threads took: 0.0006332 seconds
2019-02-18T17:48:59.054+0800: 365574.050: Total time for which application threads were stopped: 0.0084038 seconds, Stopping threads took: 0.0004326 seconds
{Heap before GC invocations=50104 (full 0):
garbage-first heap total 83886080K, used 48321106K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
region size 32768K, 128 young (4194304K), 6 survivors (196608K)
Metaspace used 50946K, capacity 51410K, committed 52168K, reserved 53248K

From the log, the desired survivor size is about 256 MB, but the actual size is only 32 MB × 5 = 160 MB, so eden objects may get promoted straight to the old generation.
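That survivor arithmetic checks out against the log above (region size 32768K, "5 survivors", "Desired survivor size 268435456 bytes"):

```python
region_mb = 32768 // 1024                  # 32768K regions -> 32 MB each
survivor_regions = 5                       # "5 survivors" in the log
actual_mb = region_mb * survivor_regions
desired_mb = 268435456 // (1024 * 1024)    # "Desired survivor size" from the log
print(actual_mb, desired_mb)               # 160 MB actual vs 256 MB desired
```

With survivor capacity well below what the JVM wants, surviving young objects that do not fit get promoted early, inflating the old generation.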

The recent GC time share is far too high, reaching 40%.

The GC threads cleaning the young regions run too long; because the reclaimable memory after cleanup falls below 10%, the mixed GC round ends and the other GC threads sit idle. This indicates the mixed GC trigger threshold is set too low, and the young GC parameters also need adjusting.

The other logs with pauses above 2 seconds all show mixed GC being triggered, with only the young regions plus a handful of old regions being collected. This confirms the mixed GC trigger threshold is too low and the young generation is too large.

There were 41 GCs whose system time exceeded user time, roughly one every 3 hours.

At the OS level this could be an operating-system anomaly or an overloaded machine. It occurs about once every 3 hours, and the Ganglia data show CPU load is not high when it happens; moreover, system time is only slightly above user time, not off by an obviously problematic order of magnitude. Setting aside the possibility of OS or JVM bugs, we leave this alone for now.

The report again flags the 5-second GC; same as above.

Adjusting the parameters

-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-XX:+UnlockExperimentalVMOptions
-XX:+ParallelRefProcEnabled
-XX:ConcGCThreads=4
-XX:ParallelGCThreads=16
-XX:G1NewSizePercent=5
-XX:G1MaxNewSizePercent=20
-XX:MaxTenuringThreshold=1
-XX:G1HeapRegionSize=32m
-XX:G1MixedGCCountTarget=16
-XX:InitiatingHeapOccupancyPercent=60
-XX:MaxDirectMemorySize=110g
-XX:G1OldCSetRegionThresholdPercent=10
-XX:G1HeapWastePercent=10
-Xmx80g
-Xms80g

Judging from the before/after heap occupancy charts, -XX:InitiatingHeapOccupancyPercent=65, i.e. 52 GB, is a better fit.
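The 52 GB figure is just the proposed occupancy threshold applied to the 80 GB heap:

```python
heap_gb = 80
ihop_percent = 65  # proposed InitiatingHeapOccupancyPercent

# Heap occupancy at which G1 starts a concurrent marking cycle.
threshold_gb = heap_gb * ihop_percent / 100
print(threshold_gb)  # 52.0
```

Raising IHOP from 60 to 65 delays the concurrent cycle until occupancy is higher, trading a little safety margin for fewer marking cycles.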

The average pause is 135 ms, and the pause-time chart shows young GCs running long, so the young generation should be shrunk. Following Xiaomi's experience, lower the initial size from 5% to 2%: -XX:G1NewSizePercent=2.

Most mixed GC rounds run fewer than 8 passes, with pauses mostly exceeding 100 ms, so the target of 16 passes per round has had no effect; the cause is that each pass collects too many regions. Change -XX:G1OldCSetRegionThresholdPercent from 10 to 5 to shorten a single mixed GC.
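To see why the per-pass cap matters, here is the arithmetic for this heap (80 GB heap, 32 MB regions; G1OldCSetRegionThresholdPercent caps the old regions added to the collection set per mixed GC, matching the "max: 256 regions" in the log above):

```python
heap_mb = 80 * 1024
region_mb = 32
total_regions = heap_mb // region_mb  # 2560 regions

def max_old_regions_per_pass(threshold_percent: int) -> int:
    """Upper bound on old regions one mixed GC may collect."""
    return total_regions * threshold_percent // 100

print(max_old_regions_per_pass(10))  # 256 regions per mixed GC (old setting)
print(max_old_regions_per_pass(5))   # 128 regions per mixed GC (new setting)
```

Halving the cap halves the worst-case old-region work per pause, at the cost of needing more passes to drain the same candidate set.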

The survivor space should be larger than the desired size reported in the young GC log, so set SurvivorRatio from 8 (default) to 4.

-XX:InitiatingHeapOccupancyPercent=65
-XX:G1NewSizePercent=2
-XX:G1OldCSetRegionThresholdPercent=5
-XX:SurvivorRatio=4
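A rough sketch of how halving SurvivorRatio doubles the survivor budget. This assumes G1 caps survivor space at about young_regions / SurvivorRatio and that the log's "desired" figure applies the default TargetSurvivorRatio of 50%; that is a simplification of the real heuristic, but it reproduces the 256 MB figure seen in the log above:

```python
young_regions = 128          # "128 young" in the log
region_mb = 32
target_survivor_ratio = 0.5  # JVM default TargetSurvivorRatio=50

def desired_survivor_mb(survivor_ratio: int) -> float:
    """Approximate survivor budget under the simplified G1 sizing model."""
    max_survivor_regions = young_regions / survivor_ratio
    return max_survivor_regions * region_mb * target_survivor_ratio

print(desired_survivor_mb(8))  # 256.0 MB with the default ratio, as in the log
print(desired_survivor_mb(4))  # 512.0 MB after the change
```

More survivor headroom means fewer objects spill into the old generation before reaching the tenuring threshold.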

Setting other necessary parameters

Following various tuning references, the following parameters were added:

-XX:+UseStringDeduplication deduplicate strings, improving performance
-XX:-ResizePLAB reduce inter-GC-thread communication by disabling dynamic resizing of promotion-local allocation buffers
-XX:+PerfDisableSharedMem disable memory-mapped performance statistics; when enabled, certain corner cases can hugely inflate GC pause times

Optimization results

Average pause time, maximum pause time, pause-time distribution, and GC throughput share all improved; the optimization succeeded.

Next step: iterate by raising this server's region count and tuning GC again.

Open issues

The occasional system-time-greater-than-user-time events remain. System load is low when they occur, so there may be room for optimization at the OS or JVM level.

gceasy advises against -XX:+UseGCLogFileRotation, arguing that history gets lost, file order is hard to tell apart, and multiple files hinder GC analysis. But the point of this flag is to bound total log size, so we keep it.

Appendix

Official G1 introduction

oracle.com/technetwork/

Other visualization tools

blog.csdn.net/qq_324473

blog.csdn.net/xuelinmei

blog.csdn.net/renfufei/

gceasy optimization suggestions

blog.gceasy.io/2016/11/

blog.gceasy.io/2016/12/

blog.gceasy.io/2016/12/

blog.gceasy.io/2018/12/

blog.gceasy.io/2019/01/

Tuning references

oracle.com/technetwork/

blogs.apache.org/hbase/

software.intel.com/en-u

openinx.github.io/ppt/h

blog.csdn.net/songhaife

blog.csdn.net/maosijunz

dzone.com/articles/g1gc

evanjones.ca/jvm-mmap-p

Earlier optimization work

曉哥: An HBase optimization journey (part 1): exploring RegionServer parameter tuning
