记录一下CUDA的学习

检查显卡的各项参数:

以我的为例，我的显卡参数如下：

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3080"
  Total amount of global memory:                 10240 MBytes (10736893952 bytes)
  GPU Max Clock rate:                            1710 MHz (1.71 GHz)
  L2 Cache Size:                                 5242880 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a block size (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size  (x,y,z): (2147483647, 65535, 65535)

所有的代码片段都是在该GPU上进行测试：采用NVIDIA Ampere架构，拥有8704个CUDA核心，核心加速频率1.71GHz，搭载10GB GDDR6X显存，整卡功耗为320W。

这些参数是关于CUDA（Compute Unified Device Architecture）编程环境中，NVIDIA GPU（显卡）的硬件规格和能力的详细信息。下面逐一解释这些参数的意义（感谢GPT）：

Detected 1 CUDA Capable device(s)
表示系统检测到了一块支持CUDA计算的设备（即GPU）。
Device 0: "NVIDIA GeForce RTX 3080"
表示检测到的设备是NVIDIA GeForce RTX 3080，这也是当前使用的GPU型号。
Total amount of global memory: 10240 MBytes (10736893952 bytes)
GPU的全局内存总量为10240 MB（即约10 GB）。全局内存是GPU用来存储数据的主要内存。
GPU Max Clock rate: 1710 MHz (1.71 GHz)
GPU的最大时钟频率为1710 MHz（即1.71 GHz）。时钟频率越高，GPU每秒钟可以执行的操作数就越多。
L2 Cache Size: 5242880 bytes
GPU的二级缓存（L2 Cache）大小为5242880字节（约5 MB）。L2缓存是用来加速数据访问的小型高速缓存。
Total amount of shared memory per block: 49152 bytes
每个CUDA块可使用的共享内存总量为49152字节（约48 KB）。共享内存是CUDA线程块内的线程之间共享的高速内存。
Total shared memory per multiprocessor: 102400 bytes
每个多处理器（Streaming Multiprocessor, SM）可使用的共享内存总量为102400字节（约100 KB）。
Total number of registers available per block: 65536
每个CUDA块可用的寄存器数量为65536。寄存器是最靠近计算单元的高速存储，用于存放线程操作所需的数据。
Warp size: 32
每个Warp（即CUDA中的一组线程）包含32个线程。Warp是CUDA调度线程的基本单位。
Maximum number of threads per multiprocessor: 1536
每个多处理器（SM）最大可运行1536个线程。更多的线程意味着能够处理更多的数据并行计算。
Maximum number of threads per block: 1024
每个CUDA块的最大线程数为1024。块中的线程越多，可以利用更多的并行计算能力。
Max dimension size of a block size (x,y,z): (1024, 1024, 64)
每个块的最大维度大小分别为：X方向1024，Y方向1024，Z方向64。这表示块的维度大小限制。
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
每个CUDA网格的最大维度大小分别为：X方向2147483647，Y方向65535，Z方向65535。网格由多个块组成，这是网格中块的维度大小限制。

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.vscode		.vscode
03_vectorAdd		03_vectorAdd
05_reduce		05_reduce
00_helloCuda.cu		00_helloCuda.cu
01_helloDevice.cu		01_helloDevice.cu
02_scalarAdd.cu		02_scalarAdd.cu
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

记录一下CUDA的学习

About

Releases

Packages

Languages

License

ccc-fire/cudaCourse

Folders and files

Latest commit

History

Repository files navigation

记录一下CUDA的学习

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages