21. GPU libraries: copy-paste to accelerate applications
NVIDIA cuBLAS
NVIDIA cuRAND
NVIDIA cuSPARSE
NVIDIA NPP
NVIDIA cuFFT
Vector Signal Image Processing
GPU Accelerated Linear Algebra
Matrix Algebra on GPU and Multicore
IMSL Library
Sparse Linear Algebra
Building-block Algorithms for CUDA
C++ STL Features for CUDA
22. OpenACC directives
Simple hints for the compiler: the compiler parallelizes the code.
Runs on multicore CPUs and massively parallel GPUs.
Your original C/Fortran code, with OpenACC hints for the compiler:
Program myscience
   ... serial code ...
!$acc kernels
   do k = 1,n1
      do i = 1,n2
         ... parallel code ...
      enddo
   enddo
!$acc end kernels
   ...
End Program myscience
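The Fortran skeleton above has a direct C counterpart. As a minimal sketch, assuming a compiler with OpenACC support is available (the function name `saxpy_acc` and the data clauses are illustrative choices, not taken from the slides), the same `kernels` directive can wrap a SAXPY loop. A compiler without OpenACC support ignores the pragma and runs the loop serially on the CPU, which is the single-source property the slide describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes y = a*x + y.
   The OpenACC directive asks the compiler to offload the loop;
   compilers without OpenACC support ignore it and run the loop serially. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling `saxpy_acc(n, 2.0f, x, y)` gives the same result with or without an accelerator; only the place where the loop executes changes.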
24. OpenACC workshop at the Pittsburgh Supercomputing Center
By the end of the second day: a 10x speedup of one of the atmospheric kernels, using 6 directives.
Technology Director
National Center for Atmospheric Research (NCAR)
25. The CUDA parallel programming model supports C, C++, and Fortran
GPU Computing Applications
Libraries and Middleware: cuFFT, cuBLAS, cuRAND, cuSPARSE, LAPACK, CULA, MAGMA, NPP, cuDPP, Thrust, VSIPL, SVM, OpenCurrent, PhysX, OptiX, iray, RealityServer, Video Rendering, Ray tracing, MATLAB, Mathematica
Languages: C++, C, Fortran, Java and Python wrappers, DirectCompute, OpenCL(tm)
NVIDIA GPU: CUDA Parallel Computing Architecture
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
26. C for CUDA: C + "syntactic sugar"
Standard C code:
void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}
// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);
Parallel C code:
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) y[i] = a*x[i] + y[i];
}
// Invoke parallel SAXPY kernel with 256 threads/block
int nblocks = (n + 255) / 256;
saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
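The launch arithmetic above can be checked on the host. The following plain C sketch is not CUDA: the function `emulate_saxpy_launch` and its nested loops are illustrative assumptions that replay the index computation `blockIdx.x*blockDim.x + threadIdx.x` serially. It shows that rounding up the block count covers every element and that the `i < n` guard discards the surplus threads in the last block.

```c
#include <assert.h>
#include <stddef.h>

/* Serial emulation of saxpy_parallel<<<nblocks, block_dim>>>(n, a, x, y).
   Returns the number of emulated threads that fail the i < n guard. */
int emulate_saxpy_launch(int n, int block_dim, float a, const float *x, float *y)
{
    assert(n >= 0 && block_dim > 0 && x != NULL && y != NULL);
    /* Round up, as in (n + 255) / 256 for 256 threads per block. */
    int nblocks = (n + block_dim - 1) / block_dim;
    int idle = 0;
    for (int block_idx = 0; block_idx < nblocks; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            int i = block_idx * block_dim + thread_idx;  /* global element index */
            if (i < n)
                y[i] = a * x[i] + y[i];
            else
                ++idle;  /* surplus thread in the last block: does nothing */
        }
    }
    return idle;
}
```

For n = 1000 and 256 threads per block, this emulates 4 blocks (1024 threads), of which 24 fail the guard. On a real GPU the two loops disappear: each (block, thread) pair runs as its own hardware thread.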
27. NVIDIA opens up the CUDA platform through LLVM
The CUDA back end is now available for the LLVM compiler.
The SDK includes documentation, examples, and a verifier.
CUDA support can be added to new languages and new processors.
Diagram: C, C++, Fortran, and new language support feed the LLVM compiler for CUDA, which targets NVIDIA GPUs, x86 CPUs, and new processor support.
Details: http://developer.nvidia.com/cuda-source
28. Kepler: full GPUDirect support for the first time
Diagram: Server 1 and Server 2 each contain a CPU with system memory, two GPUs (GPU1 and GPU2) with their own GDDR5 memory, and a network card, all attached over PCI-e. The two servers are connected through their network cards.
29. CUDA by the numbers:
>375,000,000 CUDA GPUs on the market
>1,000,000 SDK downloads
>120,000 active developers
>500 universities teach CUDA
31. CUDA for ARM
Research platform: CUDA GPU with a Tegra ARM CPU
Quad-core ARM-based NVIDIA Tegra 3 processor
NVIDIA CUDA GPU
Gbit network
CUDA SDK kit for developers
http://www.secoqseven.com/en/item/secocq7-mxm/
Available now