The ultimate goal of this optimization is to reduce machine count and cut costs while preserving user experience.
Writing this article cost me 20 US dollars (more on that below); tips and support are welcome.
For G1 internals, see the official documentation in the appendix; this article assumes the reader has a basic understanding of JVM GC and of G1.
G1's distinguishing features: the heap is divided into regions (typically around 1024), the young generation can be resized dynamically, and the old generation is reclaimed through mixed GCs split into several small collections. The aim is to keep each STW (stop-the-world) pause short, so that GC's impact on application latency stays within a predictable bound.
G1 suits application systems with strict requirements on both average and maximum response time, such as HBase RegionServer.
System tuning proceeds from business to implementation, from the whole to the parts, from architecture to individual components. Before GC tuning, we should make sure the business and application layers have already been evaluated and optimized; those layers generally yield returns more easily. We cannot expect GC tuning to fix, once and for all, a system with a flawed architecture and many known problems in its application code.
First, the metrics by which GC is judged: impact on application throughput (mostly GC's CPU consumption), impact on latency, and total memory footprint (enough headroom must remain when GC triggers for the application to keep running and for object copying and compaction, without hitting OOM).
The 2-out-of-3 rule of GC tuning: among throughput, latency, and memory footprint, we can optimize for at most two; all three cannot be had at once.
Before tuning, we must set a clear performance target, find the bottleneck that keeps us from that target, and optimize against that bottleneck. Monitoring and statistics tools then confirm whether the tuned application has actually reached the target.
Our HBase cluster uses group-based isolation; important workloads get their own RegionServer groups.
The tuning goal for an important group's RegionServers: meet the workload's latency requirements while serving its peak throughput at the lowest possible cost.
In other words: with total throughput fixed, latency requirements fixed, and per-machine CPU and memory fixed, minimize the number of machines.
Reframed for a single machine: with the latency target fixed, maximize single-machine throughput within the machine's CPU and memory budget.
To be clear, how much throughput a single machine can carry depends on the workload's access pattern, the region count, read/write cache settings, network I/O, and disk I/O. Business-side and HBase parameter adjustments should happen before GC tuning, and network and disk I/O are usually application-level concerns. So from here on we assume the business and application layers are already optimized and neither network nor disk is the bottleneck, and focus solely on GC parameter tuning.
Assume our translated latency target is: average GC pause time 100 ms, maximum pause time 2 s, GC time share under 3%. Once this target is reached, RegionServer monitoring must still confirm that request latency stays within the users' requirements.
Broken down, the latency metrics are: overall GC time share; average STW pause time and frequency; outlier (spike) STW pause time and frequency; and peak STW pause time and frequency.
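As a rough sketch, these pause statistics can be computed from the "Total time for which application threads were stopped" records that -XX:+PrintGCApplicationStoppedTime emits. The sample lines below are taken from the GC log later in this article; the regex is illustrative:

```python
import re

# Sample "application stopped" lines (copied from the GC log later in this
# article); -XX:+PrintGCApplicationStoppedTime produces these records.
log = """\
2019-02-18T17:48:58.871+0800: 365573.867: Total time for which application threads were stopped: 5.3131110 seconds, Stopping threads took: 0.0005771 seconds
2019-02-18T17:48:58.896+0800: 365573.893: Total time for which application threads were stopped: 0.0186168 seconds, Stopping threads took: 0.0005541 seconds
2019-02-18T17:48:58.906+0800: 365573.902: Total time for which application threads were stopped: 0.0078098 seconds, Stopping threads took: 0.0005988 seconds
"""

# Grab only the total stopped time, not the "Stopping threads took" part.
pat = re.compile(r"stopped: ([\d.]+) seconds")
pauses = [float(s) for s in pat.findall(log)]

print(f"count={len(pauses)} avg={sum(pauses)/len(pauses):.3f}s max={max(pauses):.3f}s")
```

Run over a full day's log, the same loop also yields the GC time share (sum of pauses divided by elapsed wall-clock time) and the pause-time distribution.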
Generally speaking, a RegionServer should avoid full GCs altogether.
The larger the young generation, the longer each young GC takes and the less frequently it runs.
Mixed GC behavior is governed by several parameters: the trigger threshold, the number of concurrent GC threads, the expected number of iterations per cycle, and the fraction of regions reclaimed per iteration. See the official documentation in the appendix.
Our production environment currently runs 1.8.0_77, Xiaomi's HBase environment runs 1.8.0_111, and the latest Oracle JDK 8 release is 8u201.
See the Intel benchmark in the appendix: G1 performance differs greatly between JDK 7 versions. Upgrading from 7u21 to 7u60 substantially improved the average time, maximum time, and time distribution of GC and of STW pauses.
So use the newest JDK version you can.
We need a methodology for deciding whether further tuning is worthwhile.
Based on the business latency requirements, compare current request latency and GC behavior, and estimate the acceptable range for average and maximum GC pause times.
Turn off automatic balancing and move a small number of extra regions onto one RegionServer to raise its throughput. Once request latency crosses the agreed threshold, analyze the GC logs, find the bottleneck, and reduce GC latency (and with it request latency) so that more regions can be added.
Single-node tuning ends when any of these holds: the node has too many regions (consider merging regions), CPU load is too high, or request latency can no longer be brought down. After the node has run stably for a while, try rolling the tuning out to the whole group.
Analyzing GC requires GC logs. Our previous logging flags were:
-XX:+PrintGCDetails                              # detailed GC records
-XX:+PrintGCDateStamps                           # wall-clock timestamps
-Xloggc:${HBASE_LOG_DIR}/gc-`date +%Y%m%d%H%M`   # log file name pattern
-XX:+UseGCLogFileRotation                        # rotate log files
-XX:NumberOfGCLogFiles=10                        # number of rotated files
-XX:GCLogFileSize=512M                           # size per file
-XX:+HeapDumpOnOutOfMemoryError                  # heap dump on OOM
-XX:HeapDumpPath=${HBASE_LOG_DIR}/hbase.heapdump # dump directory
-XX:ErrorFile=${HBASE_LOG_DIR}/hs_err_pid%p.log
-XX:+PrintAdaptiveSizePolicy                     # print adaptive sizing decisions
-XX:+PrintFlagsFinal                             # print final flag values
Following other tuning write-ups, we added these logging flags:
-XX:+PrintGCApplicationStoppedTime   # time the application was stopped at safepoints
-XX:+PrintTenuringDistribution       # survivor age distribution (https://www.jianshu.com/p/e634955f3bbb)
-XX:+PrintHeapAtGC                   # print heap state before and after each GC
-XX:+PrintSafepointStatistics        # safepoint statistics, useful for GC analysis (https://blog.csdn.net/u011918260/article/details/70047159)
-XX:PrintSafepointStatisticsCount=1
-XX:PrintFLSStatistics=1             # free-list statistics before/after each GC, in three dimensions: Free Space, Max Chunk Size, Num Chunks; probably more useful for CMS
Heavy GC logging could in principle affect performance, but we have no data on that overhead, so we ignore it for now.
Many GC visualization tools exist. For example gceasy (https://gceasy.io/index.jsp#banner) is an online service: upload a GC log and it produces an analysis. The free tier offers various charts; 20 US dollars a month unlocks the advanced features.
This article uses gceasy as the example; other visualization tools are listed in the appendix.
Below we analyze the GC log, starting from the node tuned in the previous article.
How to level up without spending money? You don't: enable gceasy's advanced features (that's where the 20 dollars went).
The longest single GC pause reached 5 seconds. The report links to tuning suggestions; see the appendix.
Suggestion one: optimize the application to reduce the object allocation rate, currently averaging 1.22 GB/sec. That mainly means using more off-heap memory and reducing write amplification; this article focuses on GC tuning, so we won't expand on it.
Suggestion two: the young generation is too small, enlarge it. That rule of thumb fits other collectors, but G1 sizes the young generation dynamically and spreads old-generation cleanup over mixed GC iterations; the young generation must not be too large, or a young GC ends up slower than a single mixed GC.
Suggestion three: set the expected maximum pause time. It is currently 100 ms; leave it unchanged for now.
Suggestion: too many processes on the host causing memory swapping. Not the case on this server.
Suggestion: not enough GC threads. Possible, but only 6 GCs exceeded 3 seconds, roughly one per day; if the thread count were clearly insufficient, it would happen far more often. Set aside for now.
Suggestion: long waits caused by heavy I/O on the host. Possible; a DataNode is co-located, so this needs further analysis of the GC logs and I/O monitoring.
Suggestion: System.gc() not disabled. The GC cause table shows no such calls.
Suggestion: the heap is too large. Possible, it is an 80 GB heap after all, but a large heap helps throughput.
Now look at the detailed log of the 5-second pause:
{Heap before GC invocations=50103 (full 0):
 garbage-first heap   total 83886080K, used 48760053K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
  region size 32768K, 128 young (4194304K), 5 survivors (163840K)
 Metaspace       used 50945K, capacity 51410K, committed 52168K, reserved 53248K
2019-02-18T17:48:53.570+0800: 365568.567: [GC pause (G1 Evacuation Pause) (mixed)
Desired survivor size 268435456 bytes, new threshold 1 (max 1)
- age   1:   87997400 bytes,   87997400 total
 365568.567: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3313913, predicted base time: 1151.36 ms, remaining time: 0.00 ms, target pause time: 100.00 ms]
 365568.567: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 123 regions, survivors: 5 regions, predicted young region time: 50.17 ms]
 365568.583: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 70 regions, max: 256 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %]
 365568.583: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 70 regions, expensive: 70 regions, min: 80 regions, remaining time: 0.00 ms]
 365568.583: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 123 regions, survivors: 5 regions, old: 70 regions, predicted pause time: 4353.85 ms, target pause time: 100.00 ms]
 365573.866: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 40.96 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
 365573.866: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 407 regions, reclaimable: 8557735352 bytes (9.96 %), threshold: 10.00 %]
, 5.2999017 secs]
   [Parallel Time: 5227.7 ms, GC Workers: 16]
      [GC Worker Start (ms): Min: 365568584.3, Avg: 365568584.3, Max: 365568584.4, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 1.3, Avg: 1.6, Max: 3.9, Diff: 2.7, Sum: 25.1]
      [Update RS (ms): Min: 1065.2, Avg: 1067.2, Max: 1067.8, Diff: 2.6, Sum: 17075.4]
         [Processed Buffers: Min: 739, Avg: 831.5, Max: 879, Diff: 140, Sum: 13304]
      [Scan RS (ms): Min: 3610.8, Avg: 3611.6, Max: 3612.6, Diff: 1.8, Sum: 57786.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.5]
      [Object Copy (ms): Min: 545.2, Avg: 546.3, Max: 547.2, Diff: 2.0, Sum: 8741.5]
      [Termination (ms): Min: 0.1, Avg: 0.6, Max: 0.8, Diff: 0.7, Sum: 9.8]
         [Termination Attempts: Min: 1, Avg: 465.3, Max: 552, Diff: 551, Sum: 7445]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.4]
      [GC Worker Total (ms): Min: 5227.3, Avg: 5227.4, Max: 5227.5, Diff: 0.2, Sum: 83638.8]
      [GC Worker End (ms): Min: 365573811.8, Avg: 365573811.8, Max: 365573811.8, Diff: 0.0]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 3.4 ms]
   [Other: 68.4 ms]
      [Choose CSet: 16.3 ms]
      [Ref Proc: 3.6 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 8.7 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 31.6 ms]
   [Eden: 3936.0M(3936.0M)->0.0B(3904.0M) Survivors: 160.0M->192.0M Heap: 46.5G(80.0G)->42.3G(80.0G)]
Heap after GC invocations=50104 (full 0):
 garbage-first heap   total 83886080K, used 44323410K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
  region size 32768K, 6 young (196608K), 6 survivors (196608K)
 Metaspace       used 50945K, capacity 51410K, committed 52168K, reserved 53248K
}
 [Times: user=85.30 sys=0.00, real=5.30 secs]
2019-02-18T17:48:58.871+0800: 365573.867: Total time for which application threads were stopped: 5.3131110 seconds, Stopping threads took: 0.0005771 seconds
2019-02-18T17:48:58.896+0800: 365573.893: Total time for which application threads were stopped: 0.0186168 seconds, Stopping threads took: 0.0005541 seconds
2019-02-18T17:48:58.906+0800: 365573.902: Total time for which application threads were stopped: 0.0078098 seconds, Stopping threads took: 0.0005988 seconds
2019-02-18T17:48:58.915+0800: 365573.912: Total time for which application threads were stopped: 0.0075271 seconds, Stopping threads took: 0.0003996 seconds
2019-02-18T17:48:58.923+0800: 365573.919: Total time for which application threads were stopped: 0.0072014 seconds, Stopping threads took: 0.0003114 seconds
2019-02-18T17:48:58.934+0800: 365573.930: Total time for which application threads were stopped: 0.0078706 seconds, Stopping threads took: 0.0005490 seconds
2019-02-18T17:48:58.942+0800: 365573.938: Total time for which application threads were stopped: 0.0073198 seconds, Stopping threads took: 0.0003927 seconds
2019-02-18T17:48:58.964+0800: 365573.960: Total time for which application threads were stopped: 0.0078810 seconds, Stopping threads took: 0.0007641 seconds
2019-02-18T17:48:58.972+0800: 365573.969: Total time for which application threads were stopped: 0.0075881 seconds, Stopping threads took: 0.0006277 seconds
2019-02-18T17:48:58.981+0800: 365573.978: Total time for which application threads were stopped: 0.0081290 seconds, Stopping threads took: 0.0011246 seconds
2019-02-18T17:48:58.992+0800: 365573.989: Total time for which application threads were stopped: 0.0076556 seconds, Stopping threads took: 0.0005358 seconds
2019-02-18T17:48:59.015+0800: 365574.011: Total time for which application threads were stopped: 0.0076750 seconds, Stopping threads took: 0.0005602 seconds
2019-02-18T17:48:59.026+0800: 365574.022: Total time for which application threads were stopped: 0.0089086 seconds, Stopping threads took: 0.0006000 seconds
2019-02-18T17:48:59.044+0800: 365574.041: Total time for which application threads were stopped: 0.0087554 seconds, Stopping threads took: 0.0006332 seconds
2019-02-18T17:48:59.054+0800: 365574.050: Total time for which application threads were stopped: 0.0084038 seconds, Stopping threads took: 0.0004326 seconds
{Heap before GC invocations=50104 (full 0):
 garbage-first heap   total 83886080K, used 48321106K [0x00007faec4000000, 0x00007faec6005000, 0x00007fc2c4000000)
  region size 32768K, 128 young (4194304K), 6 survivors (196608K)
 Metaspace       used 50946K, capacity 51410K, committed 52168K, reserved 53248K
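The decisive lines in this record are the G1Ergonomics messages: G1 predicted a 4353.85 ms pause against the 100 ms target but went ahead anyway because the old CSet had not reached its minimum region count. A small sketch for grepping such budget overruns out of a log (the regex is illustrative):

```python
import re

# One G1Ergonomics line from the record above (abridged to a single string).
line = ("365568.583: [G1Ergonomics (CSet Construction) finish choosing CSet, "
        "eden: 123 regions, survivors: 5 regions, old: 70 regions, "
        "predicted pause time: 4353.85 ms, target pause time: 100.00 ms]")

# Extract predicted vs target pause time to spot pauses over budget.
m = re.search(r"predicted pause time: ([\d.]+) ms, target pause time: ([\d.]+) ms", line)
predicted, target = float(m.group(1)), float(m.group(2))
print(f"predicted={predicted:.0f} ms vs target={target:.0f} ms "
      f"({predicted / target:.1f}x over budget)")
```

Applied to every "finish choosing CSet" line in a log, this quickly surfaces all the collections where G1's own prediction already blew past the pause target.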
From the log, the desired survivor size is about 256 MB, but the actual size is only 32×5 = 160 MB, so objects may be promoted from eden straight into the old generation.
The recent GC time share is far too high, reaching 40%.
The GC threads spent too long clearing the young regions; since reclaimable memory after cleanup fell below 10%, the mixed GC cycle ended and the other GC threads sat idle. This says the mixed GC trigger threshold is set too low, and the young generation parameters also need adjusting.
The other logs with pauses above 2 seconds show the same pattern: a mixed GC was triggered, but effectively only the young regions and a handful of old regions got cleaned. This confirms the mixed GC trigger threshold is too low and the young generation too large.
In 41 GCs, system time exceeded user time, roughly once every 3 hours.
That could be an OS-level anomaly, or the host being too busy. It happens about once every 3 hours, yet Ganglia shows CPU load was not high at those moments, and system time only slightly exceeds user time rather than by an order of magnitude. We set aside the possibility of OS or JVM bugs and take no action for now.
And again the 5-second GC, as discussed above.
-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+UnlockExperimentalVMOptions -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 -XX:G1NewSizePercent=5 -XX:G1MaxNewSizePercent=20 -XX:MaxTenuringThreshold=1 -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=16 -XX:InitiatingHeapOccupancyPercent=60 -XX:MaxDirectMemorySize=110g -XX:G1OldCSetRegionThresholdPercent=10 -XX:G1HeapWastePercent=10 -Xmx80g -Xms80g
Judging from the before/after heap occupancy charts, -XX:InitiatingHeapOccupancyPercent=65 (i.e. 52 GB of the 80 GB heap) is a better fit.
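The arithmetic behind that figure, spelled out:

```python
heap_gb = 80        # -Xmx80g
ihop = 65           # -XX:InitiatingHeapOccupancyPercent=65

# Concurrent marking starts once old-generation occupancy crosses this point.
threshold_gb = heap_gb * ihop / 100
print(threshold_gb)
```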
The average pause is 135 ms, and the pause-time chart shows young GCs running long, so the young generation should shrink. Following Xiaomi's experience, lower the initial size from 5% to 2%: -XX:G1NewSizePercent=2.
Most mixed GC cycles finish in fewer than 8 rounds, with pause times almost all beyond 100 ms. The target of 16 rounds per cycle is not taking effect because each round cleans too many regions. Lower -XX:G1OldCSetRegionThresholdPercent from 10 to 5 to shorten a single mixed GC.
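To see why, count the regions: an 80 GB heap with 32 MB regions has 2560 regions, so the per-round old-region cap drops from 256 (which matches "max: 256 regions" in the log above) to 128:

```python
heap_mb = 80 * 1024   # -Xmx80g
region_mb = 32        # -XX:G1HeapRegionSize=32m
regions = heap_mb // region_mb              # total regions in the heap

old_cap_10 = regions * 10 // 100            # G1OldCSetRegionThresholdPercent=10 (before)
old_cap_5 = regions * 5 // 100              # lowered to 5 (after)
print(regions, old_cap_10, old_cap_5)
```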
The survivor space should be larger than the desired size reported in the young GC log, so lower SurvivorRatio from the default 8 to 4.
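As a sanity check on the "Desired survivor size 268435456 bytes" line: my reading of G1's derivation (an assumption about HotSpot internals, not taken from the official docs) is that survivor capacity is capped at young_regions / SurvivorRatio regions, and the desired size is TargetSurvivorRatio percent (default 50) of that capacity. The sketch reproduces the 256 MB figure and shows what SurvivorRatio=4 would allow:

```python
region_mb = 32            # -XX:G1HeapRegionSize=32m
young_regions = 128       # from the log: "128 young (4194304K)"
target_survivor_pct = 50  # -XX:TargetSurvivorRatio default

def desired_survivor_mb(survivor_ratio):
    # Assumed derivation: survivor capacity is young_regions / SurvivorRatio
    # regions; desired size is TargetSurvivorRatio percent of that capacity.
    max_regions = young_regions // survivor_ratio
    return max_regions * region_mb * target_survivor_pct // 100

print(desired_survivor_mb(8))  # SurvivorRatio=8 (default): matches the log's 256 MB
print(desired_survivor_mb(4))  # SurvivorRatio=4: doubles the allowance
```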
-XX:InitiatingHeapOccupancyPercent=65 -XX:G1NewSizePercent=2 -XX:G1OldCSetRegionThresholdPercent=5 -XX:SurvivorRatio=4
Based on various tuning references, we also added:
-XX:+UseStringDeduplication   # deduplicate strings to save memory and improve performance
-XX:-ResizePLAB               # disable dynamic PLAB resizing, reducing communication between GC threads
-XX:+PerfDisableSharedMem     # turn off memory-mapped perf statistics; enabled, it can hugely lengthen GC pauses in some edge cases
Average pause time, maximum pause time, pause-time distribution, and GC throughput share all improved: the tuning succeeded.
Next we iterate: keep adding regions to this server and re-tune GC.
The occasional system-time-above-user-time issue remains. Load is not high when it happens; there may be room for optimization at the OS or JVM level.
gceasy advises against -XX:+UseGCLogFileRotation, arguing that history gets lost, file ordering is hard to tell apart, and multiple files complicate analysis. But the flag's purpose is to cap total log size, so we keep it anyway.
https://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
https://blog.csdn.net/qq_32447301/article/details/83067984
https://blog.csdn.net/xuelinmei_happy/article/details/51090115
https://blog.csdn.net/renfufei/article/details/56678064
https://blog.gceasy.io/2016/11/22/reduce-long-gc-pauses/
https://blog.gceasy.io/2016/12/11/sys-time-greater-than-user-time/
https://blog.gceasy.io/2016/12/22/total-time-for-which-application-threads-were-stopped/
https://blog.gceasy.io/2018/12/23/usestringdeduplication/
https://blog.gceasy.io/2019/01/29/try-to-avoid-xxusegclogfilerotation/
https://www.oracle.com/technetwork/cn/articles/java/g1gc-1984535-zhs.html
https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase
http://openinx.github.io/ppt/hbaseconasia2017_paper_18.pdf
https://blog.csdn.net/songhaifengshuaige/article/details/54177612
https://blog.csdn.net/maosijunzi/article/details/46562489
https://dzone.com/articles/g1gcgarbage-first-garbage-collector-tuning-flags-1
http://www.evanjones.ca/jvm-mmap-pause.html
Recommended reading: 晓哥 — hbase优化之旅(一)探索regionserver参数优化 (zhuanlan.zhihu.com)