19. DEEP LEARNING INSIGHT
Chart: Pedestrian detection recall rate, traditional algorithm vs. deep learning (0-100%), across scenarios: overall, passenger channel, indoor public area, sunny day, rainy day, winter, summer.
Chart: Vehicle feature recognition accuracy improved by deep learning, traditional algorithm vs. deep learning (70-100), for vehicle color, brand, model, sun visor, seat belt, phone calling.
Surveillance cameras
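For context, a minimal sketch of the kind of model behind such attribute results: one shared CNN backbone with a small classification head per vehicle attribute. This is not the slide's actual system; the framework (PyTorch/torchvision), the backbone, and the attribute names and class counts are all illustrative assumptions.

```python
# Illustrative multi-attribute classifier: a shared CNN backbone with one head per
# vehicle attribute, the usual way deep learning replaces per-attribute hand-crafted
# detectors. Attribute names and class counts below are hypothetical, not from the slide.
import torch
import torch.nn as nn
from torchvision import models

ATTRIBUTES = {            # hypothetical label spaces
    "color": 12,
    "brand": 50,
    "sun_visor": 2,
    "seat_belt": 2,
    "phone_calling": 2,
}

class VehicleAttributeNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)          # pretrained weights optional
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop final fc
        self.heads = nn.ModuleDict(
            {name: nn.Linear(512, n) for name, n in ATTRIBUTES.items()}
        )

    def forward(self, x):
        f = self.features(x).flatten(1)                   # (N, 512) shared features
        return {name: head(f) for name, head in self.heads.items()}

model = VehicleAttributeNet()
logits = model(torch.randn(2, 3, 224, 224))               # dummy batch
print({k: v.shape for k, v in logits.items()})
```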
29. NVIDIA DGX-1
The world's first deep learning supercomputer
Designed for deep learning
170 TF FP16
8x Tesla P100 in a hybrid cube mesh
Accelerates the major AI frameworks
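As a quick sanity check, the 170 TF FP16 headline is simply the aggregate of eight P100 (SXM2) parts at 21.2 TFLOPS FP16 each; the per-GPU figure appears in the Tesla lineup table later in this deck.

```python
# Back-of-envelope check of the DGX-1 headline number: eight Tesla P100 (SXM2)
# at 21.2 TFLOPS FP16 each (see the Tesla lineup table) add up to roughly 170 TF.
p100_sxm2_fp16_tflops = 21.2
num_gpus = 8
print(num_gpus * p100_sxm2_fp16_tflops)   # 169.6 ~ "170 TF FP16"
```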
30. Strong Scaling
One strong node is faster than many weak nodes

Charts: VASP performance and Caffe AlexNet performance ("Single P100 PCIe Node vs Lots of Weak Nodes"). Each plots speed-up over 1 CPU server node against the number of CPU server nodes (up to 32 and 64), comparing a single node with 2x / 4x / 8x P100 against many weak CPU nodes.

CPU: dual-socket Intel E5-2680v3, 12 cores, 128 GB DDR4 per node, FDR IB.
VASP 5.4.1_05Feb16, Si-Huge dataset; 16- and 32-node results are estimated from the same scaling observed from 4 to 8 nodes.
Caffe AlexNet scaling data: https://software.intel.com/en-us/articles/caffe-training-on-multi-node-distributed-memory-systems-based-on-intel-xeon-processor-e5
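What both charts plot is strong-scaling speed-up: runtime on one CPU server node divided by runtime on the system under test. A small sketch of that calculation, using made-up timings rather than the measured values behind the charts:

```python
# Strong-scaling speed-up and parallel efficiency. The timings below are invented
# purely to illustrate the arithmetic; they are not the slide's measured numbers.
def speedup(t_baseline: float, t_system: float) -> float:
    return t_baseline / t_system

def parallel_efficiency(t_baseline: float, t_n_nodes: float, n_nodes: int) -> float:
    return speedup(t_baseline, t_n_nodes) / n_nodes

t_1_cpu_node = 100.0          # hypothetical runtime on 1 CPU server node
t_32_cpu_nodes = 12.5         # hypothetical: communication limits weak-node scaling
t_one_8x_p100_node = 9.0      # hypothetical single strong node

print(speedup(t_1_cpu_node, t_32_cpu_nodes))                   # 8.0x from 32 weak nodes
print(parallel_efficiency(t_1_cpu_node, t_32_cpu_nodes, 32))   # 0.25 efficiency
print(speedup(t_1_cpu_node, t_one_8x_p100_node))               # ~11.1x from one strong node
```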
31. INTRODUCING DGX SATURNV
World's Most Efficient AI Supercomputer

Fastest AI Supercomputer in the TOP500: 4.9 petaflops peak FP64 performance, 19.6 petaflops DL FP16 performance, 124 NVIDIA DGX-1 server nodes.
Most Energy-Efficient Supercomputer: #1 on the Green500 list, 9.5 GFLOPS per watt, 2x more efficient than a Xeon Phi system.
Rocket for Cancer Moonshot: CANDLE development platform, optimized frameworks, DGX-1 as a single common platform.
32. TESLA Product Lineup

|                    | K80              | M40            | M4         | P100 (SXM2)        | P100 (PCIe)     | P40            | P4         |
| GPU                | 2x GK210         | GM200          | GM206      | GP100              | GP100           | GP102          | GP104      |
| Peak FP64 (TFLOPS) | 2.9              | NA             | NA         | 5.3                | 4.7             | NA             | NA         |
| Peak FP32 (TFLOPS) | 8.7              | 7              | 2.2        | 10.6               | 9.3             | 12             | 5.5        |
| Peak FP16 (TFLOPS) | NA               | NA             | NA         | 21.2               | 18.7            | NA             | NA         |
| Peak INT8 (TOPS)   | NA               | NA             | NA         | NA                 | NA              | 47             | 22         |
| Memory size        | 2x 12 GB GDDR5   | 24 GB GDDR5    | 4 GB GDDR5 | 16 GB HBM2         | 16/12 GB HBM2   | 24 GB GDDR5    | 8 GB GDDR5 |
| Memory BW          | 480 GB/s         | 288 GB/s       | 80 GB/s    | 732 GB/s           | 732/549 GB/s    | 346 GB/s       | 192 GB/s   |
| Interconnect       | PCIe Gen3        | PCIe Gen3      | PCIe Gen3  | NVLink + PCIe Gen3 | PCIe Gen3       | PCIe Gen3      | PCIe Gen3  |
| ECC                | Internal + GDDR5 | GDDR5          | GDDR5      | Internal + HBM2    | Internal + HBM2 | GDDR5          | GDDR5      |
| Form factor        | PCIe dual slot   | PCIe dual slot | PCIe LP    | SXM2               | PCIe dual slot  | PCIe dual slot | PCIe LP    |
| Power              | 300 W            | 250 W          | 50-75 W    | 300 W              | 250 W           | 250 W          | 50-75 W    |
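One way to read the table is to compare memory bandwidth per unit of peak FP32 compute, which hints at whether a part leans toward bandwidth-bound or compute-bound workloads. The sketch below copies the relevant columns from the table; the ratio itself is just an illustrative metric, not something from the slide.

```python
# GB/s of memory bandwidth per peak FP32 TFLOP, computed from the lineup table above.
# For P100 (PCIe) the 16 GB variant (732 GB/s) is used.
TESLA = {
    "K80":         {"fp32_tflops": 8.7,  "mem_bw_gbs": 480},
    "M40":         {"fp32_tflops": 7.0,  "mem_bw_gbs": 288},
    "M4":          {"fp32_tflops": 2.2,  "mem_bw_gbs": 80},
    "P100 (SXM2)": {"fp32_tflops": 10.6, "mem_bw_gbs": 732},
    "P100 (PCIe)": {"fp32_tflops": 9.3,  "mem_bw_gbs": 732},
    "P40":         {"fp32_tflops": 12.0, "mem_bw_gbs": 346},
    "P4":          {"fp32_tflops": 5.5,  "mem_bw_gbs": 192},
}

for name, spec in TESLA.items():
    ratio = spec["mem_bw_gbs"] / spec["fp32_tflops"]   # GB/s per FP32 TFLOP
    print(f"{name:12s} {ratio:6.1f} GB/s per FP32 TFLOP")
```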
33. TEGRA JETSON TX1
A supercomputer on a module

Key specifications
GPU: 1 TFLOP/s, 256-core Maxwell
CPU: 64-bit ARM A57
Memory: 4 GB LPDDR4, 25.6 GB/s
Storage: 16 GB eMMC
Wi-Fi/BT: 802.11ac 2x2 / BT ready
Network: 1 Gigabit Ethernet
Size: 50 mm x 87 mm
Interface: 400-pin board-to-board connector
Power: 10 W max
Under 10 W for typical use cases
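From the spec numbers alone, a rough perf-per-watt figure for the module is peak FP16 throughput over the 10 W budget; peak vs. measured and FP16 vs. FP64 caveats mean this is only a ballpark, not comparable to Green500 numbers.

```python
# Ballpark perf-per-watt for Jetson TX1 from the spec sheet above:
# 1 TFLOP/s peak (256-core Maxwell GPU) within a 10 W power budget.
peak_tflops = 1.0          # from the spec list above
power_watts = 10.0         # "10 W max"
print(peak_tflops * 1000 / power_watts, "GFLOPS per watt (peak)")   # 100.0
```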
53. LIP READING SENTENCES IN THE WILD (Session 2)
Joon Son Chung et al., Department of Engineering Science, University of Oxford; Google DeepMind
https://arxiv.org/pdf/1611.05358v1.pdf
60. ACCURATE PREDICTION OF PROTEIN KINASE INHIBITORS WITH DEEP CONVOLUTIONAL NEURAL NETWORKS (Session 3)
Olexandr Isayev, Research Assistant Professor, University of North Carolina at Chapel Hill
71. STACKGAN: TEXT TO PHOTO-REALISTIC IMAGE SYNTHESIS WITH STACKED GENERATIVE ADVERSARIAL NETWORKS (Session 4)
Han Zhang et al., Department of Computer Science, Rutgers University et al.
https://arxiv.org/pdf/1612.03242v1.pdf
73. GENERATIVE ADVERSARIAL TEXT TO IMAGE SYNTHESIS
A GAN that generates images from text
ψ: text encoder (128-dimensional here)
Scott Reed et al., University of Michigan
https://arxiv.org/pdf/1605.05396v2.pdf
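A minimal sketch of the conditioning idea shared by Reed et al. and StackGAN's first stage: the 128-dimensional text embedding ψ(t) is compressed and concatenated with a noise vector z, and the generator maps [z, ψ(t)] to an image. This is an illustrative PyTorch toy with made-up layer sizes, not the papers' deconvolutional architectures.

```python
# Text-conditioned generator sketch: compress the 128-d text embedding, concatenate
# with noise z, and map to a 64x64 RGB image. Layer sizes are illustrative only.
import torch
import torch.nn as nn

TEXT_DIM, COND_DIM, Z_DIM = 128, 16, 100

class TextConditionedGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.compress = nn.Sequential(nn.Linear(TEXT_DIM, COND_DIM), nn.LeakyReLU(0.2))
        self.generate = nn.Sequential(
            nn.Linear(Z_DIM + COND_DIM, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64), nn.Tanh(),      # 64x64 RGB image in [-1, 1]
        )

    def forward(self, z, text_embedding):
        cond = self.compress(text_embedding)             # (N, 16) conditioning vector
        img = self.generate(torch.cat([z, cond], dim=1)) # (N, 3*64*64)
        return img.view(-1, 3, 64, 64)

g = TextConditionedGenerator()
fake = g(torch.randn(4, Z_DIM), torch.randn(4, TEXT_DIM))
print(fake.shape)   # torch.Size([4, 3, 64, 64])
```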