1) GPU accelerated computing provides significant performance benefits over CPU-only systems, with NAMD simulations running faster on GPU systems in all tests.
2) GPU acceleration results in a large performance boost for a small additional price of the GPU hardware.
3) Energy usage is reduced by half when using GPU acceleration compared to CPU-only systems.
2. Summary/Conclusions
Benefits of GPU Accelerated Computing
Faster than CPU only systems in all tests
Large performance boost with small marginal price increase
Energy usage cut in half
GPUs scale very well within a node and over multiple nodes
Tesla K20 GPU is our fastest and lowest power high performance GPU to date
Try GPU accelerated NAMD for free - www.nvidia.com/GPUTestDrive
3. Kepler - Our Fastest Family of GPUs Yet
Chart: ApoA1 (Apolipoprotein A1) running NAMD version 2.9. Y-axis: nanoseconds/day.
The blue node contains dual E5-2687W CPUs (8 cores per CPU).
The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and either 1x NVIDIA M2090, 1x K10 or 1x K20 for the GPU.
1 CPU Node: 1.37 ns/day
1 CPU Node + M2090: 2.63 ns/day (1.9x)
1 CPU Node + K10: 3.45 ns/day (2.5x)
1 CPU Node + K20: 3.57 ns/day (2.6x)
1 CPU Node + K20X: 4.00 ns/day (2.9x)
GPU speedup/throughput increased from 1.9x (with M2090) to 2.9x (with K20X)
when compared to a CPU only node
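The speedup factors on this slide are ratios of throughput in nanoseconds/day. A minimal sketch of that arithmetic follows. It assumes the bar values pair with the configurations as 1.37, 2.63, 3.45, 3.57 and 4.00 ns/day. The name `speedup_vs_baseline` is illustrative and is not part of NAMD.

```python
# Bar values in ns/day, paired with configurations as read from the chart.
# The pairing is an assumption; the numbers themselves appear on the slide.
NS_PER_DAY = {
    "1 CPU Node": 1.37,
    "1 CPU Node + M2090": 2.63,
    "1 CPU Node + K10": 3.45,
    "1 CPU Node + K20": 3.57,
    "1 CPU Node + K20X": 4.00,
}


def speedup_vs_baseline(throughput, baseline="1 CPU Node"):
    """Return each configuration's throughput divided by the baseline's."""
    base = throughput[baseline]
    return {name: value / base for name, value in throughput.items()}


if __name__ == "__main__":
    for name, factor in speedup_vs_baseline(NS_PER_DAY).items():
        print(f"{name}: {factor:.1f}x")
```

Rounded to one decimal place, the ratios reproduce the 1.9x to 2.9x labels quoted above.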
NAMD Benchmark Report, Revision 2.0, dated Nov. 5, 2012
4. Run NAMD 2.5x Faster with GPUs
Chart: speedup compared to CPU only for All Molecules, ApoA1 (Apolipoprotein A1), F1-ATPase, and STMV; bar labels range from 2.4 to 2.7.
Running NAMD 2.9 with CUDA 4.0, ECC off.
The blue node contains 2x Intel E5-2687W CPUs (8 cores per CPU).
Each green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 1x NVIDIA K20 GPU.
Gain 2.5x throughput/performance by adding just 1 GPU
when compared to dual CPU performance
5. Kepler - Universally Faster
Chart: speedup compared to CPU only for F1-ATPase, ApoA1, and STMV. Running NAMD version 2.9.
The CPU Only node contains dual E5-2687W CPUs (8 cores per CPU).
The Kepler nodes contain dual E5-2687W CPUs (8 cores per CPU) and one or two NVIDIA K10, K20, or K20X GPUs.
Average speedup by configuration: 1x K10 2.4x, 1x K20 2.6x, 1x K20X 2.9x, 2x K10 4.3x, 2x K20 4.7x, 2x K20X 5.1x
The Kepler GPUs accelerate all simulations, up to 5x
Average acceleration printed in bars
6. Outstanding Strong Scaling with Multi-STMV
Chart: 100 STMV on hundreds of nodes (a concatenation of 100 Satellite Tobacco Mosaic Virus systems). Y-axis: nanoseconds/day; x-axis: # of nodes (32, 64, 128, 256, 512, 640, 768). Series: Fermi XK6 and CPU XK6. Speedup labels on the chart: 2.7x, 2.9x, 3.6x, 3.8x.
Running NAMD version 2.9.
Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 cores per CPU) and an additional 1x NVIDIA X2090 GPU.
Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers
7. Replace 3 Nodes with 1 2090 GPU
Chart: F1-ATPase, nanoseconds/day and cost. Running NAMD version 2.9.
Each blue node contains 2x Intel Xeon X5550 CPUs (4 cores, $1000 per CPU).
The green node contains 2x Intel Xeon X5550 CPUs (4 cores, $1000 per CPU) and 1x NVIDIA M2090 GPU ($2000 each).
4 CPU Nodes: 0.63 ns/day, $8,000
1 CPU Node + 1x M2090 GPU: 0.74 ns/day, $4,000
Speedup of 1.2x for 50% the cost
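The "1.2x for 50% the cost" figure can be checked from the per-part prices quoted on this slide. The sketch below counts only CPU and GPU prices, as the slide does, and assumes that 0.63 ns/day belongs to the four CPU nodes and 0.74 ns/day to the GPU node. The helper name `hardware_cost` is illustrative.

```python
CPU_PRICE = 1000      # $ per Intel Xeon X5550, from the slide
GPU_PRICE = 2000      # $ per NVIDIA M2090, from the slide
CPUS_PER_NODE = 2


def hardware_cost(nodes, gpus=0):
    """CPU plus GPU price only; chassis, memory and network are ignored."""
    return nodes * CPUS_PER_NODE * CPU_PRICE + gpus * GPU_PRICE


# Throughput values are read from the chart; the pairing is an assumption.
cpu_only = {"cost": hardware_cost(nodes=4), "ns_per_day": 0.63}
gpu_node = {"cost": hardware_cost(nodes=1, gpus=1), "ns_per_day": 0.74}

speedup = gpu_node["ns_per_day"] / cpu_only["ns_per_day"]
cost_ratio = gpu_node["cost"] / cpu_only["cost"]

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x at {cost_ratio:.0%} of the cost")
```

Under these assumptions the costs come to $8,000 and $4,000, which matches the values on the chart.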
8. K20 - Greener: Twice The Science Per Watt
Chart: energy used in simulating 1 nanosecond of ApoA1. Y-axis: energy expended (kJ), lower is better. Bars: 1 Node, 1 Node + 2x K20.
Running NAMD version 2.9.
Each blue node contains Dual E5-2687W CPUs (95W, 4 Cores per CPU).
Each green node contains 2x Intel Xeon X5550 CPUs (95W, 4 Cores per CPU) and 2x NVIDIA K20 GPUs (225W per GPU).
Energy Expended = Power x Time
Cut down energy usage by half with GPUs
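The relation Energy Expended = Power x Time can be turned into energy per simulated nanosecond once a node's power draw and throughput are known. The sketch below is illustrative only. The wattages and ns/day values in the example are hypothetical placeholders and are not measurements from this report. The function name `energy_per_ns_kj` is likewise an assumption.

```python
SECONDS_PER_DAY = 86_400


def energy_per_ns_kj(node_power_watts, ns_per_day):
    """Energy in kJ to simulate 1 ns: power (W) x wall time (s) / 1000."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return node_power_watts * seconds_per_ns / 1000.0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    cpu_only = energy_per_ns_kj(node_power_watts=400, ns_per_day=1.0)
    with_gpus = energy_per_ns_kj(node_power_watts=600, ns_per_day=3.0)
    print(f"CPU only: {cpu_only:.0f} kJ, with GPUs: {with_gpus:.0f} kJ")
    print(f"ratio: {with_gpus / cpu_only:.2f}")
```

The point of the sketch is that a GPU node can draw more power and still use less energy per nanosecond, provided its throughput rises by a larger factor than its power draw.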
9. Kepler - Greener: Twice The Science/Joule
Chart: energy used in simulating 1 ns of STMV (Satellite Tobacco Mosaic Virus). Y-axis: energy expended (kJ), lower is better. Bars: CPU Only, CPU + 2 K10s, CPU + 2 K20s, CPU + 2 K20Xs.
Running NAMD version 2.9.
The blue node contains dual E5-2687W CPUs (150W each, 8 cores per CPU).
The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and 2x NVIDIA K10, K20, or K20X GPUs (235W each).
Energy Expended = Power x Time
Cut down energy usage by half with GPUs
10. Recommended GPU Node Configuration for NAMD Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+
CPU speed (GHz): 2.66+
System memory per socket (GB): 32
GPUs: Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
# of GPUs per CPU socket: 1-2
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 2.0 or higher
Server storage: 500 GB or higher
Network configuration: Gemini, InfiniBand
Scale to multiple nodes with same single node configuration
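As a rough illustration, the recommended configuration can be written down as data and used to check a candidate node. The sketch treats the table's numeric entries as minimums, which is an assumption. The field names, `check_node`, and the example node are hypothetical and do not come from NAMD or NVIDIA tooling.

```python
# Numeric entries from the table, treated here as minimums (an assumption).
RECOMMENDED_MINIMUMS = {
    "cpu_sockets": 2,
    "cores_per_socket": 6,
    "cpu_ghz": 2.66,
    "memory_gb_per_socket": 32,
    "storage_gb": 500,
}
SUPPORTED_GPUS = {"K10", "K20", "K20X", "M2090", "M2075", "C2075"}


def check_node(spec):
    """Return a list of shortfalls; an empty list means the spec fits."""
    problems = []
    for key, minimum in RECOMMENDED_MINIMUMS.items():
        if spec.get(key, 0) < minimum:
            problems.append(f"{key}: {spec.get(key, 0)} is below {minimum}")
    if spec.get("gpu_model") not in SUPPORTED_GPUS:
        problems.append(f"gpu_model: {spec.get('gpu_model')} is not listed")
    sockets = max(spec.get("cpu_sockets", 1), 1)
    gpus_per_socket = spec.get("gpus", 0) / sockets
    if not 1 <= gpus_per_socket <= 2:
        problems.append(f"gpus per socket: {gpus_per_socket} is outside 1-2")
    return problems


# Hypothetical node used only to exercise the check.
EXAMPLE_NODE = {
    "cpu_sockets": 2, "cores_per_socket": 8, "cpu_ghz": 3.1,
    "memory_gb_per_socket": 32, "storage_gb": 500,
    "gpu_model": "K20X", "gpus": 2,
}

if __name__ == "__main__":
    print(check_node(EXAMPLE_NODE) or "meets the recommended configuration")
```

GPU memory, the PCIe generation and the network fabric are left out of the check because the table states them as preferences or options and not as thresholds.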
11. Summary/Conclusions
Benefits of GPU Accelerated Computing
Faster than CPU only systems in all tests
Large performance boost with small marginal price increase
Energy usage cut in half
GPUs scale very well within a node and over multiple nodes
Tesla K20 GPU is our fastest and lowest power high performance GPU to date
Try GPU accelerated NAMD for free - www.nvidia.com/GPUTestDrive