
Summary/Conclusions
Benefits of GPU Accelerated Computing
Faster than CPU only systems in all tests

Large performance boost with small marginal price increase

Energy usage cut in half

GPUs scale very well within a node and over multiple nodes

Tesla K20 GPU is our fastest and lowest power high performance GPU to date

Try GPU accelerated NAMD for free: www.nvidia.com/GPUTestDrive

Kepler - Our Fastest Family of GPUs Yet
ApoA1 (Apolipoprotein A1) running NAMD version 2.9

The blue node contains dual E5-2687W CPUs (8 cores per CPU).
The green nodes contain dual E5-2687W CPUs (8 cores per CPU) and either 1x NVIDIA M2090, 1x K10 or 1x K20 for the GPU.

[Bar chart: throughput in nanoseconds/day, with speedup over the CPU-only node]
Configuration         Nanoseconds/Day   Speedup
1 CPU Node            1.37              (baseline)
1 CPU Node + M2090    2.63              1.9x
1 CPU Node + K10      3.45              2.5x
1 CPU Node + K20      3.57              2.6x
1 CPU Node + K20X     4.00              2.9x

GPU speedup/throughput increased from 1.9x (with M2090) to 2.9x (with K20X) when compared to a CPU-only node.

NAMD Benchmark Report, Revision 2.0, dated Nov. 5, 2012
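
The speedup factors quoted for ApoA1 are each GPU node's throughput divided by the CPU-only node's throughput. A minimal Python sketch of that arithmetic follows. It assumes the ns/day values on the chart (1.37, 2.63, 3.45, 3.57, 4.00) pair with the bars in left-to-right order; the dictionary keys and the function name are illustrative, not taken from the report.

```python
# Throughput in ns/day for ApoA1, as read from the chart. The pairing of
# values to node types is an assumption based on the order of the bars.
NS_PER_DAY = {
    "CPU only": 1.37,
    "CPU + M2090": 2.63,
    "CPU + K10": 3.45,
    "CPU + K20": 3.57,
    "CPU + K20X": 4.00,
}


def speedups(throughput, baseline="CPU only"):
    """Return each configuration's throughput relative to the baseline."""
    base = throughput[baseline]
    return {name: value / base for name, value in throughput.items()}


if __name__ == "__main__":
    for name, factor in speedups(NS_PER_DAY).items():
        print(f"{name:12s} {factor:.1f}x")
```

Rounded to one decimal place, the ratios reproduce the 1.9x to 2.9x range stated on the slide.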

Run NAMD 2.5x Faster with GPUs
Running NAMD 2.9 with CUDA 4.0, ECC off

The blue node contains 2x Intel E5-2687W CPUs (8 cores per CPU).
Each green node contains 2x Intel E5-2687W CPUs (8 cores per CPU) plus 1x NVIDIA K20 GPU.

[Bar chart: speedup compared to CPU only, for All Molecules, ApoA1, F1-ATPase and STMV; bar labels read 2.7, 2.6, 2.5 and 2.4]

Gain 2.5x throughput/performance by adding just 1 GPU when compared to dual-CPU performance.


Kepler - Universally Faster
Running NAMD version 2.9

The CPU-only node contains dual E5-2687W CPUs (8 cores per CPU).
The Kepler nodes contain dual E5-2687W CPUs (8 cores per CPU) and 1 or 2 NVIDIA K10, K20, or K20X GPUs.

[Bar chart: speedup compared to CPU only for F1-ATPase, ApoA1 and STMV; average acceleration printed in bars]
Configuration   Average speedup
1x K10          2.4x
1x K20          2.6x
1x K20X         2.9x
2x K10          4.3x
2x K20          4.7x
2x K20X         5.1x

The Kepler GPUs accelerate all simulations, up to 5x.

Outstanding Strong Scaling with Multi-STMV
100 STMV on Hundreds of Nodes

Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 cores per CPU) and an additional 1x NVIDIA X2090 GPU.

[Line chart: nanoseconds/day versus number of nodes (32, 64, 128, 256, 512, 640, 768) for the "Fermi XK6" and "CPU XK6" series, simulating a concatenation of 100 Satellite Tobacco Mosaic Virus systems; speedup labels read 3.8x, 3.6x, 2.9x and 2.7x]

Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers.

Replace 3 Nodes with 1 2090 GPU
F1-ATPase

Each blue node contains 2x Intel Xeon X5550 CPUs (4 cores, $1000 per CPU).
The green node contains 2x Intel Xeon X5550 CPUs (4 cores, $1000 per CPU) and 1x NVIDIA M2090 GPU ($2000 each).

[Bar chart: throughput and cost]
Configuration                Nanoseconds/Day   Cost
4 CPU Nodes                  0.63              $8,000
1 CPU Node + 1x M2090 GPU    0.74              $4,000

Speedup of 1.2x for 50% of the cost
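
The cost comparison for F1-ATPase can be checked with a few lines of arithmetic. The sketch below assumes, as the slide does, that a node's cost is just the listed CPU and GPU prices ($1000 per X5550, $2000 per M2090), and that the 0.63 and 0.74 ns/day figures belong to the four-node CPU cluster and the single GPU node respectively. The function and variable names are illustrative.

```python
CPU_PRICE_USD = 1000  # per Intel Xeon X5550, as listed on the slide
GPU_PRICE_USD = 2000  # per NVIDIA M2090, as listed on the slide


def node_cost(cpus, gpus=0):
    """Cost of one node, counting only CPU and GPU list prices."""
    return cpus * CPU_PRICE_USD + gpus * GPU_PRICE_USD


def compare(base_ns_day, base_cost, new_ns_day, new_cost):
    """Return (speedup, cost ratio, gain in throughput per dollar)."""
    speedup = new_ns_day / base_ns_day
    cost_ratio = new_cost / base_cost
    return speedup, cost_ratio, speedup / cost_ratio


cpu_cluster_cost = 4 * node_cost(cpus=2)    # four dual-CPU nodes
gpu_node_cost = node_cost(cpus=2, gpus=1)   # one dual-CPU node plus one M2090

speedup, cost_ratio, value_gain = compare(
    0.63, cpu_cluster_cost, 0.74, gpu_node_cost
)
print(f"speedup {speedup:.2f}x, cost ratio {cost_ratio:.0%}, "
      f"throughput per dollar {value_gain:.2f}x")
```

Under these assumptions the single GPU node is about 1.17x faster (the slide rounds this to 1.2x) at half the cost, which works out to roughly 2.3x more throughput per dollar.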

K20 - Greener: Twice The Science Per Watt
Energy Used in Simulating 1 Nanosecond of ApoA1

Each blue node contains Dual E5-2687W CPUs (95W, 4 Cores per CPU).
Each green node contains 2x Intel Xeon X5550 CPUs (95W, 4 Cores per CPU) and 2x NVIDIA K20 GPUs (225W per GPU).

Energy Expended = Power x Time

[Bar chart: energy expended (kJ) for 1 Node and for 1 Node + 2x K20; lower is better]

Cut down energy usage by half with GPUs
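
The relation Energy Expended = Power x Time can be turned into an energy-per-nanosecond figure once a node's power draw and throughput are known. The sketch below is illustrative only. It assumes node power is the sum of the component ratings printed on the slide (95 W per CPU, 225 W per K20), which ignores the rest of the system and actual measured draw, and the throughput values are hypothetical placeholders rather than benchmark results.

```python
SECONDS_PER_DAY = 86_400


def energy_kj_per_ns(power_watts, ns_per_day):
    """Energy in kJ to simulate 1 ns: power (W) x wall-clock time (s) / 1000."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_watts * seconds_per_ns / 1000.0


# Rated component power from the slide; summing it is a simplifying assumption.
cpu_node_watts = 2 * 95
gpu_node_watts = 2 * 95 + 2 * 225

# Hypothetical throughputs, chosen only to exercise the formula.
cpu_ns_per_day = 1.0
gpu_ns_per_day = 4.0

cpu_energy = energy_kj_per_ns(cpu_node_watts, cpu_ns_per_day)
gpu_energy = energy_kj_per_ns(gpu_node_watts, gpu_ns_per_day)
print(f"CPU node: {cpu_energy:.0f} kJ/ns")
print(f"GPU node: {gpu_energy:.0f} kJ/ns")
print(f"GPU/CPU energy ratio: {gpu_energy / cpu_energy:.2f}")
```

The energy ratio equals the power ratio divided by the speedup, so a GPU node uses less energy per nanosecond whenever its speedup exceeds its increase in power draw.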


Kepler - Greener: Twice The Science/Joule
Energy used in simulating 1 ns of STMV (Satellite Tobacco Mosaic Virus)

The blue node contains Dual E5-2687W CPUs (150W each, 8 Cores per CPU).
The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 2x NVIDIA K10, K20, or K20X GPUs (235W each).

Energy Expended = Power x Time

[Bar chart: energy expended (kJ) for CPU Only, CPU + 2 K10s, CPU + 2 K20s and CPU + 2 K20Xs; lower is better]

Cut down energy usage by half with GPUs

Recommended GPU Node Configuration for NAMD Computational Chemistry

Workstation or Single Node Configuration
# of CPU sockets                 2
Cores per CPU socket             6+
CPU speed (GHz)                  2.66+
System memory per socket (GB)    32
GPUs                             Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
# of GPUs per CPU socket         1-2
GPU memory preference (GB)       6
GPU to CPU connection            PCIe 2.0 or higher
Server storage                   500 GB or higher
Network configuration            Gemini, InfiniBand

Scale to multiple nodes with same single node configuration
