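Jetson TX2 is NVIDIA's AI supercomputer on a module, built on the NVIDIA Pascal™ architecture. It is powerful (1 TFLOPS), compact, and energy efficient (7.5 W), which makes it a good fit for intelligent edge devices such as self-driving vehicles, robots, smart cameras, and portable medical devices.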

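TensorFlow is an open-source deep learning framework from Google. This article describes how to build TensorFlow 1.6 from source on the Jetson TX2. The build needs about 10 GB of free disk space, and the whole installation is slow: it can take 5-6 hours, and you may run into some unexpected problems along the way.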
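Since the build needs roughly 10 GB, it can save time to check the free space before starting. The following is a minimal sketch; the helper names free_gb and has_free_gb are my own and are not part of the build script. Note that the script also creates its swap file (8 GB by default) inside the install directory, so more headroom than 10 GB may be needed.

```shell
#!/bin/bash
# Hypothetical helper: print the free space, in whole GB, on the
# filesystem that holds directory $1.
free_gb() {
    # -P forces one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeed only if directory $1 has at least $2 GB free.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

if has_free_gb "$PWD" 10; then
    echo "Enough free space for the build"
else
    echo "Less than 10 GB free in $PWD; free up space first"
fi
```

Run it from the directory where you plan to clone JetsonTFBuild, because that is where the script puts its install directory by default.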


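Environment

Because the environment matters a lot for this install, I will state it up front: my TX2 runs the new JetPack 3.2. The details are:

| Component | Version |
| L4T | 28.2 |
| CUDA Toolkit | 9.0 |
| cuDNN | 7.0.5 |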

I recommend re-flashing the Jetson TX2 to JetPack 3.2, because the installation may fail in a different environment. To update to JetPack 3.2, see my other article:

Coulson: NVIDIA Jetson TX2 JetPack 3.2 Flashing Tutorial (zhuanlan.zhihu.com)

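Installation

First, the scripts for installing TensorFlow live in the JetsonTFBuild repository, so clone it:

git clone https://github.com/JasonAtNvidia/JetsonTFBuild.git
cd JetsonTFBuild

Once the scripts are downloaded, the next step is to run the command that builds TensorFlow 1.6. The script is well packaged, so almost nothing else is required of us. Some other tutorials are quite complicated; this script simply bundles their many steps together, which makes the whole thing much more convenient. The run still takes 5-6 hours, depending on network speed and device performance.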

sudo bash BuildTensorFlow.sh -b r1.6

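If the script runs through to the end without errors, the installation is complete, which is of course the best case. I did hit some errors during my install, though, so below I record where they occurred and how I resolved them.

Script walkthrough:

First, open the script we are running, BuildTensorFlow.sh, and see what it contains: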

#!/bin/bash
# Install Tensorflow (meant for Jetson TX# models)
BRANCH=master
SWAPSIZE=8
# Log the location this was run from
whereami=$(pwd)
install_dir=$whereami/TensorFlow_Install

function usage
{
    echo "usage: sudo ./BuildTensorflow.sh [[-b branch ][-s swapsize][-d dir] | [-h]]"
    echo "-b | --branch <branchname> Github branch to clone, i.e r1.4 (default: master)"
    echo "-s | --swapsize <size> Size of swap file to create to assist building process in GB, i.e. 8"
    echo "-d | --dir <directory> Directory to download files and use for build process, default: pwd/TensorFlow_install"
    echo "-h | --help This message"
}

# Iterate through command line inputs
while [ "$1" != "" ]; do
    case $1 in
        -b | --branch )     shift
                            BRANCH=$1
                            ;;
        -s | --swapsize )   shift
                            SWAPSIZE=$1
                            ;;
        -d | --dir )        shift
                            install_dir=$1
                            ;;
        -h | --help )       usage
                            exit
                            ;;
        * )                 usage
                            exit 1
    esac
    shift
done

This part normally runs without problems. It defines the install directory and the swap size, and parses the command-line options.
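To see how that option loop behaves, here is a stripped-down sketch of the same while/case/shift pattern. It is not the script's own code: parse_args is a hypothetical function of mine, and it handles only the branch and swap-size options.

```shell
#!/bin/bash
# Defaults, mirroring the ones at the top of BuildTensorFlow.sh.
BRANCH=master
SWAPSIZE=8

# Hypothetical helper: parse -b/--branch and -s/--swapsize from its arguments.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -b | --branch)   shift; BRANCH=$1 ;;   # value follows the flag
            -s | --swapsize) shift; SWAPSIZE=$1 ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift   # move on to the next flag
    done
}

parse_args -b r1.6 -s 4
echo "branch=$BRANCH swap=${SWAPSIZE}G"   # prints: branch=r1.6 swap=4G
```

Each flag consumes its value with the first shift, and the shift at the bottom of the loop advances to the next flag. That is why the real script accepts the options in any order.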

if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root, use sudo "$0" instead" 1>&2
    exit 1
fi

echo "This bash script will install TensorFlow "
echo "branch on a Jetson system that has been setup "
echo "by Jetpack with CUDA and cuDNN already installed."
echo " "
echo "If this is not the case then this script will "
echo "likely fail "
echo " "
echo "Expect this script to take up to 6+ hours "
echo " "
echo "Writen by: Jason Tichy < [email protected] > "
echo "Version 1.0: Jan 3rd, 2018 "
echo "Version 1.1: Mar 30, 2018 Added TensorRT support"
echo " "
echo "Note: TF v 1.7.0 release contains a bug for arm"
echo "because of a hardcoded x86 path in the TensorRT"
echo "Bazel script, you will need to use master to "
echo "build with TensorRT support "
sleep 5s # Sleep for 3 seconds

# Regain CUDA in the PATH of root
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

# Set simple error failing
set -e

# Set Jetson Performance to Best
{ #try the following command
    nvpmodel -m 0
} || { #catch if failed
    echo "Performance function not available for this device"
}

# Update Repositories
apt-get update

# Create a directory to handle all of the install
cd $HOME
if [ ! -d "$install_dir" ]; then
    mkdir $install_dir
fi
cd $install_dir

# Install useful tools,
# htop is nice graphical version of top,
# ncdu is graphical disk utilization
# mlocate contains the locate command
apt-get install htop ncdu mlocate -y
updatedb

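This part prints some descriptive messages, sets the environment variables CUDA needs, refreshes the package lists, creates the install directory TensorFlow_Install, and so on. It normally causes no problems either. If running apt-get update on every invocation takes too long, you can comment it out in the script after the first run.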

##########################################
# Install Bazel                          #
# Check if Bazel was already installed   #
##########################################
{ # try in case youve run this before
bazel version
} || { # catch
## Install Prereqs for Bazel and Tensorflow
apt-get install openjdk-8-jdk -y
apt-get install zip unzip autoconf automake libtool curl zlib1g-dev maven -y
apt-get install python-numpy swig python-dev python-pip python-wheel -y
apt-get install python3-dev python3-pip python3-wheel python3-numpy -y
# Go out and get Bazel 0.9
wget --no-check-certificate https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-dist.zip
# Unzip and install Bazel
unzip bazel-0.11.1-dist.zip -d bazel
chmod -R ug+rwx bazel
cd bazel
./compile.sh
cp output/bazel /usr/local/bin
chown -R $(whoami) /usr/local/bin
# Cleanup and save disk space
cd $install_dir
rm -r -f bazel-0.11.1-dist.zip
rm -r -f bazel
}

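Next comes installing Bazel. Bazel is Google's build tool for reproducible builds. It is mainly used to build Google's own software, and it addresses the build problems that come up in Google's development environment, such as building at very large scale, sharing one code base, and building software from source.

The script first checks whether Bazel is already installed, so that we can re-run it without reinstalling software that is already present. Then comes a long wait while everything downloads and installs. I had no problems at this step.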
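The "skip it if it is already there" check can be tried on its own before a long run. The script does this by running bazel version inside a { ... } || { ... } block. The sketch below uses command -v instead, which only looks the program up on PATH without starting it. needs_install is a hypothetical helper name of mine.

```shell
#!/bin/bash
# Hypothetical helper: succeed if program $1 is NOT on PATH,
# i.e. it still needs to be installed.
needs_install() {
    ! command -v "$1" > /dev/null 2>&1
}

if needs_install bazel; then
    echo "bazel not found: the script would build it from source"
else
    echo "bazel already installed at $(command -v bazel)"
fi
```

Because the build script runs under sudo, the PATH that root sees is the one that matters, so it is worth running this check with sudo bash as well.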

#########################################
# Install Tensorflow
#########################################
# Go out and get TensorFlow source code
cd $install_dir
if [ ! -d "tensorflow" ]; then
    git clone https://github.com/tensorflow/tensorflow
fi
cd tensorflow/
git checkout $BRANCH

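Next, the script downloads the TensorFlow source code from GitHub.

While cloning the source, you may see the following error:

fatal: The remote end hung up unexpectedly

This usually means that git's buffer is too small or the network is unstable. You can exit the script and run the following commands to work around it: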

# Increase the git HTTP buffer size:
git config --global http.postBuffer 2000000000
# Compression setting
git config --global core.compression -1
# Turn on verbose git tracing to help diagnose the failure
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

I was stuck here for a long time, because git was slow and kept failing midway, which meant starting over each time (╯ ̄Д ̄)╯╘═╛
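If the clone keeps dying, retrying by hand gets old quickly. Below is a minimal retry wrapper, with a shallow single-branch clone shown as a commented-out usage example. The retry function is a hypothetical helper of mine, not part of JetsonTFBuild, and I did not use it during the install described in this article.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times,
# pausing $2 seconds between failed attempts.
retry() {
    local attempts=$1 delay=$2 n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (not run here): fetch only the tip of the r1.6 branch,
# which downloads far less than the full history.
# retry 5 30 git clone --depth 1 --branch r1.6 https://github.com/tensorflow/tensorflow
```

The build script skips its own git clone when a tensorflow directory already exists in the install directory, so a clone made this way inside TensorFlow_Install should be picked up on the next run. That is my reading of the script, not something I tested.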

With that problem solved, let's continue:

# Use the handy script to set Environment Variables
# Otherwise, configure will be an interactive process
source $whereami/helperscript
# Run the configure bash script (that runs configure.py)
source ./configure

# This goes much smoother with a swap space
# Tensorflow will use more than 8 gigs of memory
# while building itself
#Create a swapfile for Ubuntu at the current directory location
if [ ! -f $install_dir/swapfile.swap ] && [ ! -z "$SWAPSIZE" ]; then
    echo "Creating Swap for build process"
    fallocate -l $SWAPSIZE"G" $install_dir/swapfile.swap
    chmod 600 $install_dir/swapfile.swap
    mkswap $install_dir/swapfile.swap
fi
{ #try
    swapon $install_dir/swapfile.swap
} || { # catch
    echo "Looks like Swap not desired or is already in use"
}

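This step uses a second script, helperscript, to set all the required environment variables, and then sets up the swap file, 8 GB here. It also ran without problems for me.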
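To confirm from another terminal that the swap file is really active while the build runs, you can read the SwapTotal line of /proc/meminfo. This is a small sketch; swap_total_kb is a hypothetical helper name of mine.

```shell
#!/bin/bash
# Hypothetical helper: print total swap in kB from a meminfo-style file
# (defaults to /proc/meminfo).
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    kb=$(swap_total_kb)
    # Fall back to 0 if the SwapTotal line is missing.
    echo "Active swap: $(( ${kb:-0} / 1024 / 1024 )) GB"
fi
```

The commands swapon --show and free -h report the same information in a more readable form.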

# Execute bazel to build TensorFlow, this takes a long time
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
# Use build_pip_package to actually build the pip package
bazel-bin/tensorflow/tools/pip_package/build_pip_package $install_dir/tensorflow_pkg
# Sleep the system to ensure the system can find the wheel file
sleep 5s
chown $(whoami) $install_dir
# Install the Tensorflow into Python
for entry in $install_dir/tensorflow_pkg/*; do
    pip install $entry
done

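Now come the most important steps: we use the Bazel we installed earlier to compile the TensorFlow 1.6 source, produce a wheel (.whl) file, and then install it with pip.

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

While this command was running, I got the following error: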

ERROR: Skipping //tensorflow/tools/pip_package:build_pip_package: error loading package tensorflow/tools/pip_package: Encountered error while reading extension file build_defs.bzl: no such package @local_config_tensorrt//: Traceback (most recent call last):
    File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 163
        auto_configure_fail("TensorRT library (libnvinfer) v...")
    File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/gpus/cuda_configure.bzl", line 152, in auto_configure_fail
        fail(("
%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.
WARNING: Target pattern parsing failed.
ERROR: error loading package tensorflow/tools/pip_package: Encountered error while reading extension file build_defs.bzl: no such package @local_config_tensorrt//: Traceback (most recent call last):
    File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 163
        auto_configure_fail("TensorRT library (libnvinfer) v...")
    File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/gpus/cuda_configure.bzl", line 152, in auto_configure_fail
        fail(("
%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.
INFO: Elapsed time: 1.797s
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package

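My fix was to exit the script and go to the directory ???/TensorFlow_Install/tensorflow (the leading part of the path depends on where you ran the install), open a terminal there, and run the following command:

sudo bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

This step compiles build_pip_package and takes a long time.

After the long wait, if nothing went wrong, comment out the failing bazel build statement in BuildTensorFlow.sh and then run the script again.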

sudo bash BuildTensorFlow.sh -b r1.6
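Commenting out the failing bazel build line can be done in an editor, or with a one-line sed. The sketch below assumes that the line starts at column 0 with "bazel build" and mentions build_pip_package, which is how it appears in the listing above. comment_out_pip_build is a hypothetical helper name of mine.

```shell
#!/bin/bash
# Hypothetical helper: prefix '#' to every line of file $1 that starts
# with "bazel build" and mentions build_pip_package.
# A backup copy is left next to the file as <file>.bak.
comment_out_pip_build() {
    sed -i.bak '/^bazel build .*build_pip_package/ s/^/#/' "$1"
}

# Example (not run here):
# comment_out_pip_build BuildTensorFlow.sh
```

The pattern does not touch the later bazel-bin/.../build_pip_package line, which still has to run to produce the wheel, or the bazel build line for the C++ library.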

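As it continues, the script creates the tensorflow_pkg folder, which holds the TensorFlow 1.6 .whl file we are after. The script then runs pip automatically, and with that the TensorFlow Python API is installed.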

# Build the TensorFlow C++ API for fun
bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
mkdir /usr/local/include/tf
cp -r bazel-genfiles/ /usr/local/include/tf/
cp -r tensorflow /usr/local/include/tf/
cp -r third_party /usr/local/include/tf/
cp -r bazel-bin/tensorflow/libtensorflow_cc.so /usr/local/lib/

The next block builds and installs the TensorFlow C++ API. Few people write neural networks in C++, so you can simply comment this block out and skip it.

#######################################
# Clean up
#######################################
swapoff $install_dir/swapfile.swap
swapoff -a
rm $install_dir/swapfile.swap
# Back to where we came from
cd $whereami

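This last part is cleanup. It turns off and deletes the swap file created earlier; if it were left in place, about 8 GB of the TX2's storage would stay unavailable. This part also takes a while to run, so be patient.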


Testing

At this point TensorFlow 1.6 is installed on the Jetson TX2, and we can test it:

nvidia@tegra-ubuntu:~/Tensorflow-TX2/JetsonTFBuild$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print(tf.__version__)
1.6.0

As you can see, the output confirms that TensorFlow 1.6 is installed. Now you can happily start using it!
