TensorFlow Guide: Using GPUs

Hello everyone. Today we will walk through the rules for using GPUs with TensorFlow.

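Supported devices

A typical system has multiple computing devices. TensorFlow supports two device types, CPU and GPU, and each device is represented as a string. For example:

"/cpu:0": the CPU of your machine.
"/device:GPU:0": the GPU of your machine, if you have one.
"/device:GPU:1": the second GPU of your machine, and so on.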

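If a TensorFlow operation has both CPU and GPU implementations, GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels, so on a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.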
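The priority rule above can be pictured with a toy model. Everything in this sketch is an assumption made for illustration: choose_device, the set of kernel types, and the (type, index) tuples are hypothetical and do not reflect TensorFlow's actual placement code.

```python
def choose_device(kernel_types, available):
    """Pick a device for an op, preferring GPU over CPU.

    kernel_types: device types the op has a kernel for, e.g. {"CPU", "GPU"}.
    available: list of (device_type, index) tuples present on the machine.
    Hypothetical helper for illustration; not part of TensorFlow.
    """
    for dev_type in ("GPU", "CPU"):  # GPU is tried first
        if dev_type not in kernel_types:
            continue
        indices = sorted(i for t, i in available if t == dev_type)
        if indices:
            return dev_type, indices[0]  # lowest ID of that type
    raise ValueError("no available device has a kernel for this op")


devices = [("CPU", 0), ("GPU", 0)]
print(choose_device({"CPU", "GPU"}, devices))  # op with both kernels
print(choose_device({"CPU"}, devices))         # op with a CPU kernel only
```

An op that only has a CPU kernel stays on the CPU in this model even when a GPU is present, which is why the rule is stated in terms of available implementations.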

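Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.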

    import tensorflow as tf  # TensorFlow 1.x API

    # Creates a graph.
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))

You should see the following output:

    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
    b: /job:localhost/replica:0/task:0/device:GPU:0
    a: /job:localhost/replica:0/task:0/device:GPU:0
    MatMul: /job:localhost/replica:0/task:0/device:GPU:0
    [[ 22.  28.]
     [ 49.  64.]]
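The fully qualified names in this log end with the same device strings listed earlier. As a small illustration of how such names are structured, the sketch below splits one into its device type and index. parse_device is a hypothetical helper written for this guide, not a TensorFlow function, and it assumes every name ends in either cpu:N or device:TYPE:N.

```python
def parse_device(name):
    """Split a device string into (device_type, index).

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    # Keep only the last path segment, e.g. "device:GPU:1" or "cpu:0".
    last = name.rsplit("/", 1)[-1]
    parts = last.split(":")
    # Drop the optional "device" prefix used by the longer form.
    if parts[0].lower() == "device":
        parts = parts[1:]
    if len(parts) != 2 or not parts[1].isdigit():
        raise ValueError("unrecognized device string: %r" % name)
    return parts[0].upper(), int(parts[1])


print(parse_device("/cpu:0"))
print(parse_device("/device:GPU:1"))
print(parse_device("/job:localhost/replica:0/task:0/device:GPU:0"))
```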

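Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected automatically, you can use with tf.device to create a device context. All operations within that context are assigned to the same device.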

    import tensorflow as tf  # TensorFlow 1.x API

    # Creates a graph with the constants pinned to the CPU.
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    # The matmul is created outside the device context.
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))

You will see that a and b are now assigned to cpu:0. Because no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (gpu:0 in this example) and automatically copies tensors between devices as required.

    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
    b: /job:localhost/replica:0/task:0/cpu:0
    a: /job:localhost/replica:0/task:0/cpu:0
    MatMul: /job:localhost/replica:0/task:0/device:GPU:0
    [[ 22.  28.]
     [ 49.  64.]]

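Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of every GPU visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory on the devices more efficiently by reducing memory fragmentation.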
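Because the set of visible GPUs depends on CUDA_VISIBLE_DEVICES, one common way to restrict a process to a subset of GPUs is to set that variable before TensorFlow is imported. The following is a minimal sketch; the value "0" is an arbitrary choice for illustration.

```python
import os

# Set this before importing TensorFlow so that the CUDA runtime only
# exposes the listed GPUs to this process. "0" is an example value.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible_ids = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("GPUs visible to this process:", visible_ids)
```

The same variable can also be set in the shell that launches the script, which avoids depending on import order inside the program.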

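In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as the runtime requires. It starts out allocating very little memory, and as Sessions run and need more GPU memory, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since releasing it can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto as follows: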

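    import tensorflow as tf  # TensorFlow 1.x API

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    # Pass any other Session arguments alongside config as needed.
    session = tf.Session(config=config)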

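The second is the per_process_gpu_memory_fraction option, which sets the fraction of the total memory on each visible GPU that the process may allocate. This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.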
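A fixed bound of this kind can be sketched as follows, assuming the TensorFlow 1.x per_process_gpu_memory_fraction field on gpu_options. The helper name make_fraction_config, the default of 0.4 (roughly 40% of each visible GPU's memory), and the range check are additions for illustration only.

```python
def make_fraction_config(fraction=0.4):
    """Build a ConfigProto that caps per-GPU memory at `fraction`.

    Hypothetical helper for illustration; assumes the TensorFlow 1.x API.
    """
    # Reject values outside (0, 1] before touching TensorFlow.
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % fraction)
    import tensorflow as tf  # imported lazily; requires TensorFlow 1.x

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return config


# Usage (requires TensorFlow 1.x):
#   import tensorflow as tf
#   session = tf.Session(config=make_fraction_config(0.4))
```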

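Using a single GPU on a multi-GPU system

If your system has more than one GPU, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: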

    import tensorflow as tf  # TensorFlow 1.x API

    # Creates a graph pinned to the third GPU.
    with tf.device('/device:GPU:2'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))

If the device you have specified does not exist, you will get an InvalidArgumentError:

    InvalidArgumentError: Invalid argument: Cannot assign a device to node b:
    Could not satisfy explicit device specification /device:GPU:2
       [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
       values: 1 2 3...>, _device="/device:GPU:2"]()]]

If you would like TensorFlow to automatically choose an existing, supported device to run the operations when the specified one does not exist, set allow_soft_placement to True in the configuration options when creating the session.

    import tensorflow as tf  # TensorFlow 1.x API

    # Creates a graph pinned to the third GPU.
    with tf.device('/device:GPU:2'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    # Creates a session with allow_soft_placement and log_device_placement set
    # to True.
    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
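The fallback behavior can be pictured with a toy model. This is only a sketch under stated assumptions: resolve_device and its arguments are hypothetical, the fallback order (lowest-ID GPU, then CPU) is an assumption, and the real runtime raises InvalidArgumentError rather than ValueError.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on in this toy model.

    requested: (device_type, index) asked for via an explicit device scope.
    available: list of (device_type, index) tuples present on the machine.
    Hypothetical helper for illustration; not part of TensorFlow.
    """
    if requested in available:
        return requested
    if not allow_soft_placement:
        raise ValueError("Could not satisfy explicit device specification "
                         "%s:%d" % requested)
    # Soft placement: fall back to an existing device (GPU first, lowest ID).
    for dev_type in ("GPU", "CPU"):
        indices = sorted(i for t, i in available if t == dev_type)
        if indices:
            return dev_type, indices[0]
    raise ValueError("no devices available")


machine = [("CPU", 0), ("GPU", 0)]
# GPU:2 does not exist on this machine, so the request falls back.
print(resolve_device(("GPU", 2), machine, allow_soft_placement=True))
```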

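Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example: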

    import tensorflow as tf  # TensorFlow 1.x API

    # Creates a graph with one tower per GPU.
    c = []
    for d in ['/device:GPU:2', '/device:GPU:3']:
        with tf.device(d):
            a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
            b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
            c.append(tf.matmul(a, b))
    # Sums the tower results on the CPU.
    with tf.device('/cpu:0'):
        total = tf.add_n(c)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(total))

You will see the following output:

    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
    /job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
    /job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
    /job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
    Const_3: /job:localhost/replica:0/task:0/device:GPU:3
    Const_2: /job:localhost/replica:0/task:0/device:GPU:3
    MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
    Const_1: /job:localhost/replica:0/task:0/device:GPU:2
    Const: /job:localhost/replica:0/task:0/device:GPU:2
    MatMul: /job:localhost/replica:0/task:0/device:GPU:2
    AddN: /job:localhost/replica:0/task:0/cpu:0
    [[  44.   56.]
     [  98.  128.]]
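To see where these numbers come from, the arithmetic can be reproduced in plain Python without TensorFlow. The helper below is hypothetical and written only for this check; the inputs are the same constants used in the graph.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

tower = matmul(a, b)  # what each GPU tower computes
# Element-wise sum of the two identical tower results (the AddN step).
total = [[v + v for v in row] for row in tower]

print(tower)  # [[22.0, 28.0], [49.0, 64.0]]
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```

Each tower produces the same 2x2 product as the single-GPU examples, and the CPU-side sum doubles every entry.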
