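The Jetson TX2 is NVIDIA's single-module AI supercomputer built on the NVIDIA Pascal™ architecture. It is powerful (1 TFLOPS), compact, and energy efficient (7.5 W), which makes it well suited to intelligent edge devices such as autonomous vehicles, robots, smart cameras, and portable medical equipment.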

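TensorFlow is an open-source deep learning framework from Google. This article explains how to build TensorFlow 1.6 from source on the Jetson TX2. The build needs about 10 GB of free disk space and takes a long time, possibly 5-6 hours, and you may run into some unexpected problems along the way.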
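Before starting a build this long, it may be worth checking that the disk really has the roughly 10 GB mentioned above. The following is only a sketch: has_free_space is a helper name made up for this example, and it checks the current directory, on the assumption that the build will run from there.

```shell
#!/usr/bin/env bash
# has_free_space DIR MIN_GB (hypothetical helper):
# succeed if the filesystem holding DIR has at least MIN_GB GiB available.
has_free_space() {
    local dir="$1" min_gb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1 KiB blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

if has_free_space "$PWD" 10; then
    echo "enough free space for the build"
else
    echo "less than 10 GB free here; clean up before building"
fi
```

If the check fails, freeing space first is cheaper than discovering the shortage several hours into the build.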


Environment

The environment matters a lot for this build, so I will state mine first: my TX2 runs the new JetPack 3.2, with the following components:

| Component | Version |
| --- | --- |
| L4T | 28.2 |
| CUDA Toolkit | 9.0 |
| cuDNN | 7.0.5 |
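To see whether a board matches the L4T version in the table, a small check like the one below can help. It is a sketch under assumptions: it expects L4T to record its release in /etc/nv_tegra_release with a first line of the form "# R28 (release), REVISION: 2.0, ...", and l4t_version is a helper name invented for this example. It prints only the major release and the first revision digit.

```shell
#!/usr/bin/env bash
# l4t_version (hypothetical helper): read an nv_tegra_release header on stdin
# and print "MAJOR.MINOR", e.g. "28.2". Prints nothing if the format differs.
l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9][0-9]*\)\..*/\1.\2/p' | head -n 1
}

release_file=/etc/nv_tegra_release   # assumed location on L4T systems
if [ -r "$release_file" ]; then
    echo "L4T $(l4t_version < "$release_file")"
else
    echo "no $release_file found; this does not look like an L4T system"
fi
```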

I recommend reflashing the Jetson TX2 to JetPack 3.2, because the build may fail in a different environment. For the update to JetPack 3.2, see my other article:

Coulson: NVIDIA Jetson TX2 JetPack 3.2 Flashing Tutorial (zhuanlan.zhihu.com)

Installation

The scripts for installing TensorFlow live in the JetsonTFBuild repository. First clone it:

git clone https://github.com/JasonAtNvidia/JetsonTFBuild.git
cd JetsonTFBuild

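Once the scripts are downloaded, run the command that builds TensorFlow 1.6. The script is well packaged, so almost nothing else is required from us. Some other tutorials are quite complicated; this script simply bundles the many steps of those tutorials together, which makes the process much more convenient. The whole run still takes 5-6 hours, depending on network speed and device performance.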

sudo bash BuildTensorFlow.sh -b r1.6

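If the script runs through to the end, the installation is complete, which is of course the best case. I hit a few errors during my installation, though, so below I record where they occurred and how I resolved them.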


Script analysis:

First, let's open the script we are running, BuildTensorFlow.sh, and see what is inside:

#!/bin/bash
# Install Tensorflow (meant for Jetson TX# models)
BRANCH=master
SWAPSIZE=8
# Log the location this was run from
whereami=$(pwd)
install_dir=$whereami/TensorFlow_Install
function usage
{
    echo "usage: sudo ./BuildTensorflow.sh [[-b branch ][-s swapsize][-d dir] | [-h]]"
    echo "-b | --branch <branchname> Github branch to clone, i.e r1.4 (default: master)"
    echo "-s | --swapsize <size> Size of swap file to create to assist building process in GB, i.e. 8"
    echo "-d | --dir <directory> Directory to download files and use for build process, default: pwd/TensorFlow_install"
    echo "-h | --help This message"
}
# Iterate through command line inputs
while [ "$1" != "" ]; do
    case $1 in
        -b | --branch )     shift
                            BRANCH=$1
                            ;;
        -s | --swapsize )   shift
                            SWAPSIZE=$1
                            ;;
        -d | --dir )        shift
                            install_dir=$1
                            ;;
        -h | --help )       usage
                            exit
                            ;;
        * )                 usage
                            exit 1
    esac
    shift
done

This part usually runs without problems. It defines the install directory and the swap size, and parses the command-line arguments.

if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root, use sudo "$0" instead" 1>&2
    exit 1
fi
echo "This bash script will install TensorFlow "
echo "branch on a Jetson system that has been setup "
echo "by Jetpack with CUDA and cuDNN already installed."
echo " "
echo "If this is not the case then this script will "
echo "likely fail "
echo " "
echo "Expect this script to take up to 6+ hours "
echo " "
echo "Writen by: Jason Tichy < [email protected] > "
echo "Version 1.0: Jan 3rd, 2018 "
echo "Version 1.1: Mar 30, 2018 Added TensorRT support"
echo " "
echo "Note: TF v 1.7.0 release contains a bug for arm"
echo "because of a hardcoded x86 path in the TensorRT"
echo "Bazel script, you will need to use master to "
echo "build with TensorRT support "
sleep 5s # Sleep for 3 seconds
# Regain CUDA in the PATH of root
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
# Set simple error failing
set -e
# Set Jetson Performance to Best
{ #try the following command
    nvpmodel -m 0
} || { #catch if failed
    echo "Performance function not available for this device"
}
# Update Repositories
apt-get update
# Create a directory to handle all of the install
cd $HOME
if [ ! -d "$install_dir" ]; then
    mkdir $install_dir
fi
cd $install_dir
# Install useful tools,
# htop is nice graphical version of top,
# ncdu is graphical disk utilization
# mlocate contains the locate command
apt-get install htop ncdu mlocate -y
updatedb

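This part prints some descriptive messages, sets the environment variables CUDA needs, updates the package repositories, creates the install directory TensorFlow_Install, and so on. It usually causes no problems either. If apt-get update takes too long on every run of the script, you can comment it out in the script after it has run once.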
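Commenting a line out can be done in any editor, but a scripted version is handy when the script is rerun often. The sketch below is one possible way to do it: comment_out is a hypothetical helper, it assumes the statement sits alone on its own line of BuildTensorFlow.sh, and it keeps a .bak copy of the file so the change is easy to undo.

```shell
#!/usr/bin/env bash
# comment_out FILE TEXT (hypothetical helper):
# prefix every line of FILE that is exactly TEXT with "# ".
comment_out() {
    local file="$1" text="$2"
    cp "$file" "$file.bak"   # backup; restore with: cp FILE.bak FILE
    # Exact string comparison, so characters such as / or . need no escaping.
    awk -v t="$text" '$0 == t { print "# " $0; next } { print }' "$file.bak" > "$file"
}

# Only touch the script if it is present in the current directory.
if [ -f BuildTensorFlow.sh ]; then
    comment_out BuildTensorFlow.sh "apt-get update"
fi
```

Because the comparison is an exact match, lines such as apt-get install ... are left alone.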

##########################################
# Install Bazel #
# Check if Bazel was already installed #
##########################################
{ # try in case youve run this before
    bazel version
} || { # catch
    ## Install Prereqs for Bazel and Tensorflow
    apt-get install openjdk-8-jdk -y
    apt-get install zip unzip autoconf automake libtool curl zlib1g-dev maven -y
    apt-get install python-numpy swig python-dev python-pip python-wheel -y
    apt-get install python3-dev python3-pip python3-wheel python3-numpy -y
    # Go out and get Bazel 0.9
    wget --no-check-certificate https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-dist.zip
    # Unzip and install Bazel
    unzip bazel-0.11.1-dist.zip -d bazel
    chmod -R ug+rwx bazel
    cd bazel
    ./compile.sh
    cp output/bazel /usr/local/bin
    chown -R $(whoami) /usr/local/bin
    # Cleanup and save disk space
    cd $install_dir
    rm -r -f bazel-0.11.1-dist.zip
    rm -r -f bazel
}

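Next comes the Bazel installation. Bazel is Google's reproducible build tool. It was created mainly to build Google's own software and to handle the build problems that arise in Google's development environment, such as very large builds, a shared code repository, and building software from source.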

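The script first checks whether Bazel is already installed, so that it can be rerun without reinstalling software that is already present. Then comes a long wait while the software is downloaded and compiled. I had no problems at this step.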

#########################################
# Install Tensorflow
#########################################
# Go out and get TensorFlow source code
cd $install_dir
if [ ! -d "tensorflow" ]; then
    git clone https://github.com/tensorflow/tensorflow
fi
cd tensorflow/
git checkout $BRANCH

Next, the TensorFlow source is downloaded from GitHub.

While cloning the source, you may see the following error:

fatal: The remote end hung up unexpectedly

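This is usually caused by a git buffer that is too small or by an unstable network. You can stop the script and run the following commands to work around it: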

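# Increase the git HTTP post buffer size:
git config --global http.postBuffer 2000000000
# Compression setting
git config --global core.compression -1
# Enable verbose git tracing in this shell to help diagnose further failures
export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1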
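On a flaky connection it can also help to retry the clone automatically instead of restarting it by hand. The following is a sketch of that idea and not part of the original script: retry is a hypothetical helper, and RETRY_DELAY is an assumed variable for the pause between attempts. The commented example uses git's --depth and --branch options to fetch only the r1.6 branch, which shrinks the download; since BuildTensorFlow.sh skips cloning when a tensorflow directory already exists, such a clone would presumably be made inside the install directory beforehand.

```shell
#!/usr/bin/env bash
# retry MAX CMD... (hypothetical helper):
# run CMD until it succeeds, giving up after MAX attempts.
retry() {
    local max="$1" attempt=1
    shift
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"   # pause between attempts, 5 s by default
    done
}

# Example (not executed here): shallow clone of the r1.6 branch, up to 5 tries.
# retry 5 git clone --depth 1 --branch r1.6 https://github.com/tensorflow/tensorflow
```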

git was slow for me and often failed midway, forcing a restart, so I was stuck here for a long time (╯ ̄Д ̄)╯╘═╛

With this problem solved, we continue:

# Use the handy script to set Environment Variables
# Otherwise, configure will be an interactive process
source $whereami/helperscript
# Run the configure bash script (that runs configure.py)
source ./configure
# This goes much smoother with a swap space
# Tensorflow will use more than 8 gigs of memory
# while building itself
#Create a swapfile for Ubuntu at the current directory location
if [ ! -f $install_dir/swapfile.swap ] && [ ! -z "$SWAPSIZE" ]; then
    echo "Creating Swap for build process"
    fallocate -l $SWAPSIZE"G" $install_dir/swapfile.swap
    chmod 600 $install_dir/swapfile.swap
    mkswap $install_dir/swapfile.swap
fi
{ #try
    swapon $install_dir/swapfile.swap
} || { # catch
    echo "Looks like Swap not desired or is already in use"
}

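This step sets all required environment variables through another script, helperscript, and then creates and enables the swap file, 8 GB here. It also ran without problems.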
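Since the build relies on this swap space, it can be reassuring to confirm from a second terminal that the swap is actually active. A minimal sketch, assuming a Linux /proc/meminfo with a SwapTotal line in kB; swap_total_gib is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# swap_total_gib (hypothetical helper): read /proc/meminfo-style text on stdin
# and print the total swap size in GiB with one decimal place.
swap_total_gib() {
    awk '$1 == "SwapTotal:" { printf "%.1f\n", $2 / 1024 / 1024 }'
}

if [ -r /proc/meminfo ]; then
    echo "swap currently enabled: $(swap_total_gib < /proc/meminfo) GiB"
else
    echo "/proc/meminfo not available on this system"
fi
```

With the script's 8 GB swap file enabled, the reported value should be close to 8.0 plus any swap that was already configured.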

# Execute bazel to build TensorFlow, this takes a long time
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
# Use build_pip_package to actually build the pip package
bazel-bin/tensorflow/tools/pip_package/build_pip_package $install_dir/tensorflow_pkg
# Sleep the system to ensure the system can find the wheel file
sleep 5s
chown $(whoami) $install_dir
# Install the Tensorflow into Python
for entry in $install_dir/tensorflow_pkg/*; do
    pip install $entry
#
done

Next come the most important steps: use the Bazel we installed earlier to compile the TensorFlow 1.6 source, generate the wheel (.whl) file, and install it with pip.

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

The following error occurred while running this command:

ERROR: Skipping //tensorflow/tools/pip_package:build_pip_package: error loading package tensorflow/tools/pip_package: Encountered error while reading extension file build_defs.bzl: no such package @local_config_tensorrt//: Traceback (most recent call last):
  File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 163
    auto_configure_fail("TensorRT library (libnvinfer) v...")
  File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/gpus/cuda_configure.bzl", line 152, in auto_configure_fail
    fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.
WARNING: Target pattern parsing failed.
ERROR: error loading package tensorflow/tools/pip_package: Encountered error while reading extension file build_defs.bzl: no such package @local_config_tensorrt//: Traceback (most recent call last):
  File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/tensorrt/tensorrt_configure.bzl", line 163
    auto_configure_fail("TensorRT library (libnvinfer) v...")
  File "/home/nvidia/Tensorflow-TX2/JetsonTFBuild/TensorFlow_Install/tensorflow/third_party/gpus/cuda_configure.bzl", line 152, in auto_configure_fail
    fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: TensorRT library (libnvinfer) version is not set.
INFO: Elapsed time: 1.797s
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package

My fix was to stop the script, go to this folder (the exact path depends on where you installed) ???/TensorFlow_Install/tensorflow, open a terminal there, and run the following command:

sudo bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

This step compiles build_pip_package and takes a long time.

If the long wait ends without errors, comment out the failing bazel build statement in BuildTensorFlow.sh and then run the script again:

sudo bash BuildTensorFlow.sh -b r1.6

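As it continues, the script creates the tensorflow_pkg folder, which contains the TensorFlow 1.6 .whl file we want. The script then runs pip automatically, which completes the installation of the TensorFlow Python API.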
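If the script is interrupted after the wheel is built but before its pip step, the wheel can presumably be installed by hand. The sketch below only locates the file: newest_wheel is a hypothetical helper, and the default directory assumes the script was run from the current directory with its default install location.

```shell
#!/usr/bin/env bash
# newest_wheel DIR (hypothetical helper):
# print the most recently modified .whl file in DIR, or nothing if there is none.
newest_wheel() {
    ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

# Assumed default location; pass another directory as the first argument.
pkg_dir="${1:-$PWD/TensorFlow_Install/tensorflow_pkg}"
wheel=$(newest_wheel "$pkg_dir")
if [ -n "$wheel" ]; then
    echo "found wheel: $wheel"
    # pip install "$wheel"   # uncomment to install it manually
else
    echo "no wheel found in $pkg_dir"
fi
```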

# Build the TensorFlow C++ API for fun
bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
mkdir /usr/local/include/tf
cp -r bazel-genfiles/ /usr/local/include/tf/
cp -r tensorflow /usr/local/include/tf/
cp -r third_party /usr/local/include/tf/
cp -r bazel-bin/tensorflow/libtensorflow_cc.so /usr/local/lib/

This next section builds and installs the TensorFlow C++ API. Few people write neural networks in C++, so you can comment this section out and skip it.

#######################################
# Clean up
#######################################
swapoff $install_dir/swapfile.swap
swapoff -a
rm $install_dir/swapfile.swap
# Back to where we came from
cd $whereami

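This last section cleans up. It releases the swap file created earlier; if it is not released, about 8 GB of storage on your TX2 stays unavailable. This section also takes a while to run, so be patient.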


Testing

At this point TensorFlow 1.6 is installed on the Jetson TX2. We can test it:

nvidia@tegra-ubuntu:~/Tensorflow-TX2/JetsonTFBuild$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print(tf.__version__)
1.6.0

As you can see, TensorFlow 1.6 is installed successfully, and you can now start using it!
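The same check can be run without an interactive session, which is convenient in scripts. This is a small sketch: tf_version is a hypothetical helper, and it assumes the python command on the PATH is the interpreter TensorFlow was installed into.

```shell
#!/usr/bin/env bash
# tf_version (hypothetical helper): print the installed TensorFlow version,
# or "not installed" if python or the tensorflow module cannot be loaded.
tf_version() {
    python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null \
        || echo "not installed"
}

echo "TensorFlow: $(tf_version)"
```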
