[Work notes] Basic hardware and software concepts of GPUs and CUDA
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gc3551568f691baad5fb776b7656ecc05
Hardware side
SP, warp, SM, TPC
Software side
Thread, block, grid, kernel
This article explains them very clearly:
https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/36197/
This one goes deeper into the hardware:
https://www.cnblogs.com/effulgent/archive/2009/01/14/1375845.html
This CUDA programming tutorial has some examples showing how memory access patterns can be used to speed up parallel computation:
https://sites.google.com/a/kimicat.com/www/%E7%AC%AC%E4%B8%80%E5%80%8Bcuda%E7%A8%8B%E5%BC%8F
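Factors that affect the efficiency of parallel computation:
How threads and blocks are allocated
Which kind of GPU memory the data is accessed from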
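To make the thread/block allocation factor more concrete, here is a minimal sketch in plain Python (arithmetic only, not CUDA code). It assumes a 1D launch and a warp size of 32; the names `ceil_div`, `launch_config`, and `global_index` are made up for this note. It shows how many blocks a given block size needs, how many warp lanes sit idle when the block size is not a multiple of 32, and how a thread's global index is normally derived from `blockIdx.x`, `blockDim.x`, and `threadIdx.x`.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def launch_config(n_elements, threads_per_block):
    """Return (blocks, warps_per_block, idle_lanes_per_block, idle_tail_threads)."""
    # Enough blocks to cover every element at least once.
    blocks = ceil_div(n_elements, threads_per_block)
    # The hardware schedules whole warps, so a block is rounded up to warps.
    warps_per_block = ceil_div(threads_per_block, WARP_SIZE)
    # Lanes in the last warp of each block that have no thread assigned.
    idle_lanes_per_block = warps_per_block * WARP_SIZE - threads_per_block
    # Threads in the last block whose global index falls past the data.
    idle_tail_threads = blocks * threads_per_block - n_elements
    return blocks, warps_per_block, idle_lanes_per_block, idle_tail_threads


def global_index(block_idx, block_dim, thread_idx):
    """Python equivalent of blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


for tpb in (100, 128, 256):
    print(tpb, launch_config(1000, tpb))
```

With 1000 elements, a block size of 100 leaves 28 lanes idle in the last warp of every block, while 128 fills every warp and only leaves 24 threads in the final block, which a kernel would typically skip with a bounds check such as `if (i < n)`.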
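Some key points:
The number of threads per block should usually be a multiple of 32 (the warp size)
I still don't fully understand how GPU shared memory access works, or which access patterns are faster
GPU memory is divided into registers, shared, local, constant, texture, and global memory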
https://www.jianshu.com/p/3d4c9cc3a777
__constant__ --> declares a variable in constant memory
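On the shared memory question in the points above, one commonly described effect is the bank conflict. The following is a rough Python model, not a measurement: it assumes shared memory is split into 32 banks of 4-byte words, that consecutive words fall into consecutive banks, and that threads of a warp reading the same word are served by a broadcast. The names `conflict_degree` and `strided_access` are invented for this sketch, and real hardware has more details than this.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 banks, each one 4-byte word wide
WARP_SIZE = 32   # threads per warp


def conflict_degree(word_indices):
    """Worst-case number of serialized accesses for one warp.

    word_indices holds the 32-bit word index each thread touches.
    Distinct words in the same bank are serialized; identical words
    count once because they can be broadcast.
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def strided_access(stride):
    """Word index touched by each thread when thread t reads word t * stride."""
    return [tid * stride for tid in range(WARP_SIZE)]


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(strided_access(stride)))
```

Under this model a stride of 1 is conflict-free, a stride of 2 is a 2-way conflict, and a stride of 32 puts all 32 threads on the same bank. A stride of 33 is conflict-free again, which is why tutorials often pad a 32-column shared array to 33 columns.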
Docs for other CUDA commands
=======================
The lab's GPU server
2020-09-27 21:57:38.889179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.77GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-09-27 21:57:38.889645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:b3:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.77GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
(Yes, I actually installed tensorflow just to read the GPU specs, because nvidia-smi cut the GPU name off.)
$ nvidia-smi
Sun Sep 27 21:37:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01    Driver Version: 440.95.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:65:00.0 Off |                  N/A |
|  0%   53C    P8    31W / 300W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:B3:00.0 Off |                  N/A |
|  0%   42C    P8    11W / 300W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
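One thing worth noting about the TensorFlow log lines earlier in this section: `coreCount: 68` is the number of SMs, not the number of CUDA cores. The sketch below parses the properties line copied from that log and estimates the FP32 core count. It assumes 64 FP32 cores per SM for compute capability 7.5; the table `FP32_CORES_PER_SM` and the function names are hypothetical helpers for this note, not part of TensorFlow or CUDA.

```python
import re

# Properties line copied from the TensorFlow log for device 0.
LOG_LINE = (
    "coreClock: 1.77GHz coreCount: 68 deviceMemorySize: 10.76GiB "
    "deviceMemoryBandwidth: 573.69GiB/s"
)

# Assumption: FP32 cores per SM, keyed by compute capability.
FP32_CORES_PER_SM = {"7.5": 64}


def parse_properties(line):
    """Split 'key: value key: value ...' into a dict of strings.

    Only suitable for lines whose values contain no spaces.
    """
    return dict(re.findall(r"(\w+): (\S+)", line))


def total_fp32_cores(line, compute_capability):
    """Estimate CUDA core count as SM count times cores per SM."""
    props = parse_properties(line)
    return int(props["coreCount"]) * FP32_CORES_PER_SM[compute_capability]


print(parse_properties(LOG_LINE))
print(total_fp32_cores(LOG_LINE, "7.5"))
```

Under that assumption, 68 SMs times 64 cores gives 4352 FP32 cores per card.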