In the previous article, I put together a first version of the hardware and software platform:

拓荒犬: Building a Machine Learning Platform on a 3A Platform (Part 1): Initial Hardware and Software Setup

zhuanlan.zhihu.com

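With the platform up and running, I wanted a sense of its overall performance, so I decided to run some benchmarks. TensorFlow has a large user base, so I focused on it alone; I don't have the energy to wrestle with other frameworks as well.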

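TensorFlow has its own official benchmarks, tensorflow/benchmarks. The tf_cnn_benchmarks suite in that repository covers models such as resnet50, resnet152, inception3, vgg16, googlenet, and alexnet. You pass a few parameters, the test starts, and it reports the platform's throughput.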

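This was my first time using TensorFlow, so I fell into quite a few pits, many of them caused by my own inexperience. I record all of them here. If you only want to see how the VEGA64 actually performs, skip straight to the results at the end.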


The first problem: after cloning TensorFlow benchmarks from git, running the following command failed with an error about a missing file.

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

The error reported that the gradients_util file was missing.

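I then noticed that the TensorFlow repository on git does contain gradients_util.py, so I suspected that AMD's tensorflow-rocm package was incomplete and added the file by hand. But once that file was in place, the script complained about another missing file, and this repeated several times. That was clearly not sustainable, so I wondered whether installing tensorflow-gpu could fill in the missing files. I knew that tensorflow-gpu explicitly requires CUDA and has to run on an NVIDIA GPU, but I decided to try anyway.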

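Installing tensorflow-gpu just to see what would happen
Once it was installed, it failed as expected, complaining that CUDA 9.0 was missing.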
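Rather than patching files in one at a time, a quicker way to see this kind of mismatch is to ask Python whether the module can be located at all. The following is a minimal sketch, not part of the original troubleshooting; the dotted path tensorflow.python.ops.gradients_util is an assumption based on where gradients_util.py sits in the TensorFlow repository.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError,
        # which is a subclass of ImportError.
        return False


# The second path is an assumption about where the module lives.
for name in ("tensorflow", "tensorflow.python.ops.gradients_util"):
    status = "found" if module_available(name) else "missing"
    print(name, "->", status)
```

If the package is found but the submodule is missing, the installed TensorFlow is most likely older than the code that is trying to import from it.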

So I uninstalled tensorflow-gpu and looked for other benchmarks that could run on AMD hardware. Searching git for AMD TensorFlow turned up this project:

noxouille/nvamdbench

github.com

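This project also uses the official TensorFlow benchmarks, but it runs them against the ROCm docker image, so presumably the two match and no files are missing. I had actually thought about matching the TensorFlow and benchmark versions before I installed tensorflow-gpu; there is a pit here that I will come back to shortly.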

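I had never used docker before, but from the introduction I gathered that it can be thought of as a packaged image, only leaner and more efficient than a traditional one. I will dig deeper when I have time. I installed docker and pulled rocm/tensorflow. The image is fairly large: the download is about 2 GB, and it shows up as 5.21 GB once pulled.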

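With docker set up, the test scripts finally worked. I ran the benchmarks and watched the GPU with rocm-smi, and noticed that utilization did not stay at 100% the whole time. That may be due to docker; I will look into it later. At first docker also seemed slower than running TensorFlow directly. However, when I later compared the docker ROCm results with the native ones, there was almost no performance gap: both were at the same level, and in some tests docker even scored a few points higher than TensorFlow under plain Python, though the numbers were all close. So docker is clearly a usable option. If installing TensorFlow on ROCm yourself seems like too much trouble, or you are worried about messing up your environment, you can pull a ROCm docker image and see how it performs first.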

With results in hand, I wanted to compare them with what others had posted on the ROCm git: Performance comparsion: AMD with ROCm vs NVIDIA with cuDNN? · Issue #173 · ROCmSoftwarePlatform/tensorflow-upstream

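While reading through other people's results, I found someone who had hit a problem like mine, an ImportError. The suggested fix was to run git checkout cnn_tf_v1.11_compatible in the benchmarks directory, that is, to check out the branch that matches the installed TensorFlow version. I already knew about this, because tf_cnn_benchmarks states explicitly that the versions must match; the latest benchmarks on git correspond to tf-nightly-gpu.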

Note that the master branch of tf_cnn_benchmarks requires the latest nightly version of TensorFlow. You can install the nightly version by running pip install tf-nightly-gpu in a clean environment, or by installing TensorFlow from source. We sometimes will create a branch of tf_cnn_benchmarks, in the form of cnn_tf_vX.Y_compatible, that is compatible with TensorFlow version X.Y For example, branch cnn_tf_v1.9_compatible works with TensorFlow 1.9.

So what had I actually done? I switched to v1.12 directly on the web page and then copied the address to clone it.

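I have no idea what I was thinking. I did not even notice that the address carried no version at all. I am simply too unfamiliar with github: I don't use it day to day, I only know basic clone and pull, and I had forgotten that switching versions requires checkout.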

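I immediately used checkout to switch the benchmarks to the v1.12 branch, and sure enough the error went away and everything ran normally. One small oversight cost me all that time. The lesson: when you hit a problem, calm down and analyze it carefully before trying to fix it. When an approach fails, it may not be that the approach doesn't work; you may simply not have carried it out correctly.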
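The naming rule quoted from the README, cnn_tf_vX.Y_compatible, can be turned into a small helper. This is a minimal sketch: the function name compatible_branch is hypothetical, and since the README only says such branches are created sometimes, a branch is not guaranteed to exist for every version.

```python
import re


def compatible_branch(tf_version):
    """Map a TensorFlow version such as '1.12.0' to 'cnn_tf_v1.12_compatible'."""
    match = re.match(r"(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError("unrecognized TensorFlow version: %r" % tf_version)
    major, minor = match.groups()
    return "cnn_tf_v%s.%s_compatible" % (major, minor)


# Prints: git checkout cnn_tf_v1.12_compatible
print("git checkout " + compatible_branch("1.12.0"))
```

The version string itself can be read from tf.__version__ in the environment where the benchmarks will run.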

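Now that it worked, it was time to get benchmark numbers for each model. There was one more blunder before testing: I set batch_size to 128 and TensorFlow raised ResourceExhaustedError. I learned that this error means the GPU ran out of resources. Batch size is the number of samples processed per batch, so in short I was processing too much data at once, and the GPU's meager 8 GB of memory was not enough.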
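A common way to cope with this is to halve the batch size until a run succeeds. The sketch below illustrates that idea and is not what I actually ran: run_once is a hypothetical callable, MemoryError stands in for the out-of-memory error, and the limit of 64 in fake_run is made up for the demonstration.

```python
def largest_batch_that_fits(run_once, start_batch, oom_errors=(MemoryError,), min_batch=1):
    """Halve the batch size until run_once(batch_size) stops raising an OOM error."""
    batch = start_batch
    while batch >= min_batch:
        try:
            run_once(batch)
            return batch
        except oom_errors:
            batch //= 2  # too large: retry with half the batch
    raise RuntimeError("no batch size >= %d fits" % min_batch)


def fake_run(batch_size):
    """Stand-in for a training step: pretend anything above 64 runs out of memory."""
    if batch_size > 64:
        raise MemoryError("out of memory at batch_size=%d" % batch_size)


print(largest_batch_that_fits(fake_run, 128))  # prints 64
```

With TensorFlow in the same process, oom_errors would be (tf.errors.ResourceExhaustedError,) instead of the stand-in.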

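Still, no harm done as long as the TensorFlow platform itself stays stable. My Machine Learning course has only just reached Backpropagation, so I had not met these concepts yet. I have little exposure to top-level applications and the software layer, so my experience there is thin. I am, however, fairly familiar with CPU and GPU microarchitecture at the low level: roughly which modules exist, what each one does, and how they work together. If I find time later, I may share some of what I know about the underlying hardware.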


The ROCm version reports as 2.1.96 and the OS is ubuntu18.04. The hardware information printed by TensorFlow is as follows:

name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.63
pciBusID 0000:43:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB

TensorFlow: 1.12

For every test below, each configuration was run 5 times and the best result is reported:

  • ResNet50

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50

Step Img/sec total_loss
1 images/sec: 190.0 +/- 0.0 (jitter = 0.0) 8.217
10 images/sec: 190.7 +/- 0.4 (jitter = 1.3) 8.124
20 images/sec: 190.7 +/- 0.4 (jitter = 0.9) 8.230
30 images/sec: 190.6 +/- 0.4 (jitter = 0.8) 8.256
40 images/sec: 190.4 +/- 0.3 (jitter = 0.7) 8.347
50 images/sec: 190.5 +/- 0.3 (jitter = 0.7) 8.003
60 images/sec: 190.6 +/- 0.2 (jitter = 0.7) 8.260
70 images/sec: 190.6 +/- 0.2 (jitter = 0.7) 8.289
80 images/sec: 190.6 +/- 0.2 (jitter = 0.6) 8.232
90 images/sec: 190.6 +/- 0.2 (jitter = 0.6) 8.307
100 images/sec: 190.6 +/- 0.2 (jitter = 0.6) 8.305
----------------------------------------------------------------
total images/sec: 190.58
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50

Step Img/sec total_loss
1 images/sec: 167.7 +/- 0.0 (jitter = 0.0) 8.458
10 images/sec: 171.9 +/- 0.6 (jitter = 0.9) 7.998
20 images/sec: 171.7 +/- 0.5 (jitter = 0.7) 8.260
30 images/sec: 171.7 +/- 0.4 (jitter = 0.7) 8.339
40 images/sec: 171.6 +/- 0.4 (jitter = 0.7) 8.186
50 images/sec: 171.5 +/- 0.4 (jitter = 0.7) 7.749
60 images/sec: 171.5 +/- 0.3 (jitter = 0.8) 8.064
70 images/sec: 171.6 +/- 0.3 (jitter = 0.9) 8.473
80 images/sec: 171.7 +/- 0.3 (jitter = 0.8) 8.303
90 images/sec: 171.7 +/- 0.2 (jitter = 0.8) 8.024
100 images/sec: 171.8 +/- 0.2 (jitter = 0.8) 7.985
----------------------------------------------------------------
total images/sec: 171.70
----------------------------------------------------------------

TF_ROCM_FUSION_ENABLE=1 python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50

Step Img/sec total_loss
1 images/sec: 206.4 +/- 0.0 (jitter = 0.0) 8.217
10 images/sec: 208.3 +/- 0.8 (jitter = 0.8) 8.120
20 images/sec: 208.7 +/- 0.5 (jitter = 1.1) 8.230
30 images/sec: 208.3 +/- 0.4 (jitter = 2.4) 8.266
40 images/sec: 208.5 +/- 0.3 (jitter = 2.0) 8.359
50 images/sec: 208.8 +/- 0.3 (jitter = 1.4) 7.999
60 images/sec: 208.8 +/- 0.3 (jitter = 1.4) 8.282
70 images/sec: 208.8 +/- 0.2 (jitter = 1.4) 8.312
80 images/sec: 208.7 +/- 0.2 (jitter = 1.4) 8.221
90 images/sec: 208.8 +/- 0.2 (jitter = 1.2) 8.310
100 images/sec: 208.8 +/- 0.2 (jitter = 1.1) 8.288
----------------------------------------------------------------
total images/sec: 208.78
----------------------------------------------------------------

TF_ROCM_FUSION_ENABLE=1 python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50

Step Img/sec total_loss
1 images/sec: 181.9 +/- 0.0 (jitter = 0.0) 8.458
10 images/sec: 187.6 +/- 0.9 (jitter = 1.2) 7.997
20 images/sec: 187.2 +/- 0.7 (jitter = 1.4) 8.261
30 images/sec: 187.5 +/- 0.5 (jitter = 1.3) 8.337
40 images/sec: 187.7 +/- 0.4 (jitter = 1.0) 8.181
50 images/sec: 187.8 +/- 0.4 (jitter = 1.1) 7.751
60 images/sec: 187.7 +/- 0.4 (jitter = 1.2) 8.063
70 images/sec: 187.7 +/- 0.3 (jitter = 1.2) 8.483
80 images/sec: 187.8 +/- 0.3 (jitter = 1.2) 8.306
90 images/sec: 187.8 +/- 0.3 (jitter = 1.2) 8.038
100 images/sec: 187.8 +/- 0.3 (jitter = 1.2) 8.000
----------------------------------------------------------------
total images/sec: 187.76
----------------------------------------------------------------

  • AlexNet

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet

Step Img/sec total_loss
1 images/sec: 1575.9 +/- 0.0 (jitter = 0.0) 7.200
10 images/sec: 1576.9 +/- 1.5 (jitter = 2.1) 7.198
20 images/sec: 1573.9 +/- 3.4 (jitter = 2.1) 7.198
30 images/sec: 1575.4 +/- 2.4 (jitter = 3.3) 7.200
40 images/sec: 1575.9 +/- 2.1 (jitter = 4.7) 7.199
50 images/sec: 1570.4 +/- 3.1 (jitter = 5.6) 7.199
60 images/sec: 1570.1 +/- 2.9 (jitter = 5.6) 7.199
70 images/sec: 1571.4 +/- 2.5 (jitter = 5.4) 7.199
80 images/sec: 1572.2 +/- 2.2 (jitter = 5.4) 7.199
90 images/sec: 1572.7 +/- 2.0 (jitter = 4.9) 7.201
100 images/sec: 1573.3 +/- 1.8 (jitter = 4.7) 7.199
----------------------------------------------------------------
total images/sec: 1573.01
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=256 --model=alexnet

Step Img/sec total_loss
1 images/sec: 1402.2 +/- 0.0 (jitter = 0.0) 7.200
10 images/sec: 1397.5 +/- 6.8 (jitter = 15.7) 7.201
20 images/sec: 1413.7 +/- 5.2 (jitter = 19.6) 7.200
30 images/sec: 1412.1 +/- 5.5 (jitter = 21.2) 7.200
40 images/sec: 1414.8 +/- 4.3 (jitter = 18.6) 7.196
50 images/sec: 1417.3 +/- 3.5 (jitter = 15.9) 7.198
60 images/sec: 1418.7 +/- 3.0 (jitter = 15.0) 7.199
70 images/sec: 1419.4 +/- 2.6 (jitter = 11.2) 7.199
80 images/sec: 1420.1 +/- 2.4 (jitter = 13.0) 7.199
90 images/sec: 1419.9 +/- 2.3 (jitter = 15.2) 7.200
100 images/sec: 1421.1 +/- 2.1 (jitter = 15.2) 7.199
----------------------------------------------------------------
total images/sec: 1420.65
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=alexnet

Step Img/sec total_loss
1 images/sec: 1316.0 +/- 0.0 (jitter = 0.0) 7.202
10 images/sec: 1331.5 +/- 4.2 (jitter = 17.9) 7.199
20 images/sec: 1337.5 +/- 3.0 (jitter = 12.9) 7.200
30 images/sec: 1338.9 +/- 2.3 (jitter = 12.7) 7.200
40 images/sec: 1341.2 +/- 1.9 (jitter = 12.2) 7.200
50 images/sec: 1342.5 +/- 1.6 (jitter = 11.1) 7.199
60 images/sec: 1344.0 +/- 1.5 (jitter = 10.2) 7.198
70 images/sec: 1343.9 +/- 1.4 (jitter = 10.7) 7.199
80 images/sec: 1344.3 +/- 1.5 (jitter = 10.4) 7.199
90 images/sec: 1345.5 +/- 1.4 (jitter = 10.8) 7.199
100 images/sec: 1346.6 +/- 1.3 (jitter = 11.4) 7.199
----------------------------------------------------------------
total images/sec: 1345.73
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=alexnet

Step Img/sec total_loss
1 images/sec: 1136.7 +/- 0.0 (jitter = 0.0) 7.197
10 images/sec: 1147.9 +/- 3.8 (jitter = 16.4) 7.200
20 images/sec: 1150.9 +/- 2.8 (jitter = 11.5) 7.200
30 images/sec: 1152.7 +/- 2.2 (jitter = 12.9) 7.200
40 images/sec: 1153.1 +/- 1.9 (jitter = 8.7) 7.200
50 images/sec: 1152.9 +/- 1.7 (jitter = 8.5) 7.200
60 images/sec: 1153.0 +/- 1.4 (jitter = 7.6) 7.200
70 images/sec: 1152.4 +/- 1.5 (jitter = 7.8) 7.199
80 images/sec: 1152.6 +/- 1.4 (jitter = 7.6) 7.199
90 images/sec: 1152.9 +/- 1.3 (jitter = 7.4) 7.200
100 images/sec: 1153.1 +/- 1.2 (jitter = 7.9) 7.200
----------------------------------------------------------------
total images/sec: 1151.98
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=alexnet

Step Img/sec total_loss
1 images/sec: 968.5 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 976.0 +/- 4.3 (jitter = 9.1) nan
20 images/sec: 975.0 +/- 2.9 (jitter = 10.5) nan
30 images/sec: 974.4 +/- 2.2 (jitter = 9.7) nan
40 images/sec: 974.6 +/- 1.8 (jitter = 9.1) nan
50 images/sec: 973.9 +/- 1.6 (jitter = 8.0) nan
60 images/sec: 974.1 +/- 1.4 (jitter = 7.0) nan
70 images/sec: 974.1 +/- 1.3 (jitter = 7.4) nan
80 images/sec: 973.2 +/- 1.7 (jitter = 7.4) nan
90 images/sec: 973.2 +/- 1.5 (jitter = 7.0) nan
100 images/sec: 973.4 +/- 1.4 (jitter = 7.0) nan
----------------------------------------------------------------
total images/sec: 971.85
----------------------------------------------------------------

  • Inception v3

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3

Step Img/sec total_loss
1 images/sec: 103.7 +/- 0.0 (jitter = 0.0) 7.421
10 images/sec: 104.2 +/- 0.1 (jitter = 0.3) 7.395
20 images/sec: 104.1 +/- 0.1 (jitter = 0.4) 7.483
30 images/sec: 103.9 +/- 0.1 (jitter = 0.5) 7.402
40 images/sec: 103.8 +/- 0.1 (jitter = 0.5) 7.352
50 images/sec: 103.9 +/- 0.1 (jitter = 0.6) 7.392
60 images/sec: 103.9 +/- 0.1 (jitter = 0.5) 7.394
70 images/sec: 103.8 +/- 0.1 (jitter = 0.6) 7.384
80 images/sec: 103.8 +/- 0.1 (jitter = 0.6) 7.353
90 images/sec: 103.8 +/- 0.1 (jitter = 0.5) 7.412
100 images/sec: 103.8 +/- 0.1 (jitter = 0.5) 7.355
----------------------------------------------------------------
total images/sec: 103.82
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=inception3

Step Img/sec total_loss
1 images/sec: 97.7 +/- 0.0 (jitter = 0.0) 7.401
10 images/sec: 98.4 +/- 0.4 (jitter = 0.9) 7.438
20 images/sec: 98.6 +/- 0.2 (jitter = 0.9) 7.350
30 images/sec: 98.5 +/- 0.2 (jitter = 0.9) 7.479
40 images/sec: 98.5 +/- 0.2 (jitter = 0.8) 7.359
50 images/sec: 98.5 +/- 0.1 (jitter = 0.8) 7.330
60 images/sec: 98.5 +/- 0.1 (jitter = 0.7) 7.402
70 images/sec: 98.5 +/- 0.1 (jitter = 0.7) 7.354
80 images/sec: 98.5 +/- 0.1 (jitter = 0.6) 7.342
90 images/sec: 98.5 +/- 0.1 (jitter = 0.7) 7.508
100 images/sec: 98.5 +/- 0.1 (jitter = 0.7) 7.443
----------------------------------------------------------------
total images/sec: 98.50
----------------------------------------------------------------

TF_ROCM_FUSION_ENABLE=1 python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3

Step Img/sec total_loss
1 images/sec: 109.2 +/- 0.0 (jitter = 0.0) 7.428
10 images/sec: 109.8 +/- 0.2 (jitter = 0.7) 7.415
20 images/sec: 109.9 +/- 0.1 (jitter = 0.5) 7.486
30 images/sec: 109.8 +/- 0.1 (jitter = 0.4) 7.405
40 images/sec: 109.8 +/- 0.1 (jitter = 0.4) 7.348
50 images/sec: 109.8 +/- 0.1 (jitter = 0.4) 7.397
60 images/sec: 109.7 +/- 0.1 (jitter = 0.3) 7.364
70 images/sec: 109.7 +/- 0.1 (jitter = 0.3) 7.413
80 images/sec: 109.7 +/- 0.1 (jitter = 0.3) 7.368
90 images/sec: 109.7 +/- 0.1 (jitter = 0.3) 7.421
100 images/sec: 109.7 +/- 0.0 (jitter = 0.3) 7.348
----------------------------------------------------------------
total images/sec: 109.66
----------------------------------------------------------------

TF_ROCM_FUSION_ENABLE=1 python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=inception3

Step Img/sec total_loss
1 images/sec: 105.3 +/- 0.0 (jitter = 0.0) 7.364
10 images/sec: 105.0 +/- 0.5 (jitter = 0.4) 7.371
20 images/sec: 105.2 +/- 0.3 (jitter = 0.7) 7.312
30 images/sec: 105.2 +/- 0.2 (jitter = 0.7) 7.501
40 images/sec: 105.3 +/- 0.2 (jitter = 0.6) 7.362
50 images/sec: 105.3 +/- 0.1 (jitter = 0.6) 7.338
60 images/sec: 105.2 +/- 0.1 (jitter = 0.6) 7.421
70 images/sec: 105.2 +/- 0.1 (jitter = 0.6) 7.309
80 images/sec: 105.2 +/- 0.1 (jitter = 0.6) 7.377
90 images/sec: 105.2 +/- 0.1 (jitter = 0.6) 7.520
100 images/sec: 105.2 +/- 0.1 (jitter = 0.6) 7.367
----------------------------------------------------------------
total images/sec: 105.20
----------------------------------------------------------------

  • VGG16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16

Step Img/sec total_loss
1 images/sec: 101.8 +/- 0.0 (jitter = 0.0) 7.291
10 images/sec: 101.8 +/- 0.2 (jitter = 0.7) 7.265
20 images/sec: 101.9 +/- 0.1 (jitter = 0.5) 7.277
30 images/sec: 102.0 +/- 0.1 (jitter = 0.2) 7.248
40 images/sec: 102.0 +/- 0.1 (jitter = 0.3) 7.282
50 images/sec: 102.0 +/- 0.1 (jitter = 0.3) 7.267
60 images/sec: 102.0 +/- 0.1 (jitter = 0.3) 7.269
70 images/sec: 102.0 +/- 0.0 (jitter = 0.2) 7.247
80 images/sec: 102.0 +/- 0.0 (jitter = 0.3) 7.273
90 images/sec: 102.0 +/- 0.0 (jitter = 0.3) 7.254
100 images/sec: 102.0 +/- 0.0 (jitter = 0.3) 7.275
----------------------------------------------------------------
total images/sec: 101.95
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=vgg16

Step Img/sec total_loss
1 images/sec: 91.2 +/- 0.0 (jitter = 0.0) 7.231
10 images/sec: 92.0 +/- 0.1 (jitter = 0.2) 7.219
20 images/sec: 92.0 +/- 0.1 (jitter = 0.2) 7.287
30 images/sec: 91.9 +/- 0.1 (jitter = 0.1) 7.204
40 images/sec: 91.9 +/- 0.0 (jitter = 0.2) 7.279
50 images/sec: 91.9 +/- 0.1 (jitter = 0.2) 7.289
60 images/sec: 91.9 +/- 0.0 (jitter = 0.2) 7.244
70 images/sec: 91.9 +/- 0.0 (jitter = 0.2) 7.251
80 images/sec: 91.8 +/- 0.0 (jitter = 0.2) 7.245
90 images/sec: 91.8 +/- 0.0 (jitter = 0.2) 7.278
100 images/sec: 91.8 +/- 0.0 (jitter = 0.2) 7.281
----------------------------------------------------------------
total images/sec: 91.80
----------------------------------------------------------------

  • GoogLeNet

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet

Step Img/sec total_loss
1 images/sec: 496.7 +/- 0.0 (jitter = 0.0) 7.097
10 images/sec: 499.8 +/- 1.1 (jitter = 1.1) 7.094
20 images/sec: 500.2 +/- 0.7 (jitter = 1.3) 7.087
30 images/sec: 499.6 +/- 0.8 (jitter = 2.3) 7.104
40 images/sec: 499.7 +/- 0.6 (jitter = 2.3) 7.102
50 images/sec: 499.4 +/- 0.5 (jitter = 2.2) 7.094
60 images/sec: 499.3 +/- 0.5 (jitter = 2.4) 7.101
70 images/sec: 499.2 +/- 0.4 (jitter = 2.4) 7.111
80 images/sec: 499.2 +/- 0.4 (jitter = 2.2) 7.083
90 images/sec: 499.0 +/- 0.3 (jitter = 2.3) 7.095
100 images/sec: 498.9 +/- 0.3 (jitter = 2.1) 7.094
----------------------------------------------------------------
total images/sec: 498.73
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=googlenet

Step Img/sec total_loss
1 images/sec: 474.7 +/- 0.0 (jitter = 0.0) 7.084
10 images/sec: 474.6 +/- 1.7 (jitter = 2.9) 7.046
20 images/sec: 475.9 +/- 1.0 (jitter = 3.0) 7.104
30 images/sec: 475.0 +/- 1.0 (jitter = 1.9) 7.093
40 images/sec: 474.9 +/- 0.8 (jitter = 2.5) 7.109
50 images/sec: 474.6 +/- 0.7 (jitter = 2.3) 7.101
60 images/sec: 474.4 +/- 0.7 (jitter = 2.7) 7.087
70 images/sec: 474.5 +/- 0.7 (jitter = 2.7) 7.081
80 images/sec: 474.3 +/- 0.7 (jitter = 2.4) 7.097
90 images/sec: 474.4 +/- 0.6 (jitter = 2.4) 7.096
100 images/sec: 474.3 +/- 0.6 (jitter = 2.5) 7.079
----------------------------------------------------------------
total images/sec: 474.07
----------------------------------------------------------------

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=googlenet

Step Img/sec total_loss
1 images/sec: 418.9 +/- 0.0 (jitter = 0.0) 7.190
10 images/sec: 424.7 +/- 2.9 (jitter = 7.4) 7.098
20 images/sec: 424.9 +/- 1.6 (jitter = 4.3) 7.095
30 images/sec: 425.7 +/- 1.3 (jitter = 4.1) 7.077
40 images/sec: 425.0 +/- 1.3 (jitter = 3.6) 7.082
50 images/sec: 424.6 +/- 1.1 (jitter = 3.7) 7.069
60 images/sec: 424.8 +/- 1.1 (jitter = 3.8) 7.114
70 images/sec: 424.8 +/- 1.0 (jitter = 3.9) 7.097
80 images/sec: 424.7 +/- 1.0 (jitter = 3.8) 7.102
90 images/sec: 424.7 +/- 0.9 (jitter = 3.6) 7.111
100 images/sec: 424.7 +/- 0.8 (jitter = 3.3) 7.118
----------------------------------------------------------------
total images/sec: 424.32
----------------------------------------------------------------

  • ResNet152

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152

Step Img/sec total_loss
1 images/sec: 68.4 +/- 0.0 (jitter = 0.0) 9.906
10 images/sec: 68.9 +/- 0.2 (jitter = 1.1) 9.657
20 images/sec: 68.7 +/- 0.1 (jitter = 0.3) 9.674
30 images/sec: 68.8 +/- 0.1 (jitter = 0.4) 9.938
40 images/sec: 68.9 +/- 0.1 (jitter = 0.5) 9.922
50 images/sec: 68.9 +/- 0.1 (jitter = 0.5) 10.064
60 images/sec: 68.7 +/- 0.1 (jitter = 0.7) 10.307
70 images/sec: 68.7 +/- 0.1 (jitter = 0.7) 10.006
80 images/sec: 68.7 +/- 0.1 (jitter = 0.7) 9.873
90 images/sec: 68.7 +/- 0.1 (jitter = 0.6) 10.233
100 images/sec: 68.7 +/- 0.1 (jitter = 0.6) 10.008
----------------------------------------------------------------
total images/sec: 68.71
----------------------------------------------------------------

TF_ROCM_FUSION_ENABLE=1 python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152

Step Img/sec total_loss
1 images/sec: 76.6 +/- 0.0 (jitter = 0.0) 9.889
10 images/sec: 76.2 +/- 0.2 (jitter = 0.4) 9.667
20 images/sec: 76.1 +/- 0.1 (jitter = 0.5) 9.706
30 images/sec: 76.0 +/- 0.1 (jitter = 0.6) 9.921
40 images/sec: 76.1 +/- 0.1 (jitter = 0.6) 9.963
50 images/sec: 76.1 +/- 0.1 (jitter = 0.6) 10.088
60 images/sec: 76.0 +/- 0.1 (jitter = 0.6) 10.259
70 images/sec: 75.9 +/- 0.1 (jitter = 0.8) 10.023
80 images/sec: 75.8 +/- 0.1 (jitter = 0.8) 9.911
90 images/sec: 75.8 +/- 0.1 (jitter = 0.8) 10.248
100 images/sec: 75.8 +/- 0.1 (jitter = 0.8) 10.052
----------------------------------------------------------------
total images/sec: 75.81
----------------------------------------------------------------

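After a series of tests, I found that the best result usually comes from the second run, started after the GPU has had time to cool down. The first run acts more or less as a warm-up (even though the benchmark has its own warm-up steps), the second run tends to give the best result, and later runs may lose some performance because back-to-back testing keeps the temperature from coming down. Either way, it pays to run each test several times: the gap between the best and worst result is sometimes noticeable.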
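Picking the best of several runs, and seeing how far apart the best and worst are, can be scripted by parsing the final total images/sec line of each run. The following is a minimal sketch with hypothetical helper names; the three sample strings are made-up placeholders, not measurements.

```python
import re

TOTAL_RE = re.compile(r"total images/sec:\s*([0-9.]+)")


def parse_total(output):
    """Extract the final 'total images/sec' figure from benchmark output text."""
    match = TOTAL_RE.search(output)
    if match is None:
        raise ValueError("no 'total images/sec' line found")
    return float(match.group(1))


def best_and_spread(outputs):
    """Return (best, worst, spread as a percentage of best) over several runs."""
    totals = [parse_total(text) for text in outputs]
    best, worst = max(totals), min(totals)
    return best, worst, (best - worst) / best * 100.0


# Placeholder outputs for illustration only; real runs would supply captured stdout.
runs = [
    "total images/sec: 100.00",
    "total images/sec: 98.00",
    "total images/sec: 99.00",
]
best, worst, spread = best_and_spread(runs)
print("best=%.2f worst=%.2f spread=%.1f%%" % (best, worst, spread))
```

In practice each string would be the captured output of one tf_cnn_benchmarks.py run, for example collected with subprocess.run and capture_output=True.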

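I also tried adding the --use_fp16 option and found the results barely differed from running without it. Perhaps the way ROCm handles this has changed in later versions; I will look into it when I have time.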

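Adding TF_ROCM_FUSION_ENABLE=1 does improve ResNet performance to some degree: in the runs above, ResNet50 and ResNet152 gain roughly 9% to 10%, and Inception v3 roughly 6% to 7%. If you want to dig deeper, see the Fusion Support section of the page below. For some models TF_ROCM_FUSION_ENABLE makes little difference, so for those models I did not include results with the option enabled.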

ROCmSoftwarePlatform/tensorflow-upstream

github.com
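To put a number on the effect of TF_ROCM_FUSION_ENABLE=1, the totals reported in the runs above can be compared pairwise. This is a small sketch of that arithmetic; the dictionary layout and the function name speedup_percent are my own choices for illustration.

```python
# (model, batch size) -> (total images/sec without fusion, with TF_ROCM_FUSION_ENABLE=1),
# copied from the benchmark runs reported above.
RESULTS = {
    ("resnet50", 64): (190.58, 208.78),
    ("resnet50", 32): (171.70, 187.76),
    ("inception3", 64): (103.82, 109.66),
    ("inception3", 32): (98.50, 105.20),
    ("resnet152", 32): (68.71, 75.81),
}


def speedup_percent(baseline, fused):
    """Relative throughput gain of the fused run over the baseline, in percent."""
    return (fused / baseline - 1.0) * 100.0


for (model, batch), (baseline, fused) in RESULTS.items():
    gain = speedup_percent(baseline, fused)
    print("%-11s bs=%-3d %+.1f%%" % (model, batch, gain))
```

Each figure is a best-of-5 total, so differences of a percent or so between configurations are within run-to-run noise.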

Summary of test results:
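The totals without fusion from the runs above can be laid out as one table per model and batch size. The sketch below is one way to tabulate them; the layout and the helper name format_table are my own, and cells marked "-" are configurations that were not tested.

```python
# Total images/sec without fusion, copied from the runs above:
# model -> {batch size: total images/sec}.
TOTALS = {
    "resnet50": {32: 171.70, 64: 190.58},
    "resnet152": {32: 68.71},
    "inception3": {32: 98.50, 64: 103.82},
    "vgg16": {32: 91.80, 64: 101.95},
    "googlenet": {32: 424.32, 64: 474.07, 128: 498.73},
    "alexnet": {32: 971.85, 64: 1151.98, 128: 1345.73, 256: 1420.65, 512: 1573.01},
}


def format_table(totals):
    """Render one row per model and one column per batch size."""
    batches = sorted({b for per_model in totals.values() for b in per_model})
    header = "%-11s" % "model" + "".join("%10s" % ("bs=%d" % b) for b in batches)
    rows = [header]
    for model, per_model in totals.items():
        cells = "".join(
            "%10s" % ("%.2f" % per_model[b] if b in per_model else "-")
            for b in batches
        )
        rows.append("%-11s" % model + cells)
    return "\n".join(rows)


print(format_table(TOTALS))
```

Note that the AlexNet run at batch size 32 reported a total_loss of nan, so its throughput figure should be read with that in mind.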

Later on I may analyze specific hardware resources and look at more detailed performance metrics, such as clk, power, bandwidth, latency, hit/miss rate, and so on.

