Slides presented at the 2nd NHN Technology Conference.
Reference: "LINE Storage: Storing billions of rows in Sharded-Redis and HBase per Month" (http://tech.naver.jp/blog/?p=1420), posted in March 2012.
Slides used for the session "AWSとDockerで実現するAI研究のためのPipeline as Code" at JAWS FESTA 2018 OSAKA ~Passionate~, held at Panasonic Stadium Suita on November 3, 2018.
It describes how 来栖川電算 provides AWS Batch- and Amazon SageMaker-like capabilities in on-premises and hybrid-cloud environments, and codifies its research process on top of them. Refining the research process should lead to better results.
Slides for a reading-group presentation.
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters
Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda
In PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 193-205.
[Paper introduction] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
(Reference) Sijie Yan, Yuanjun Xiong, Dahua Lin. "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition." Association for the Advancement of Artificial Intelligence (AAAI), 2018.
1) The document explores a new concept called error permissive computing that improves computing capabilities and reduces power consumption by allowing and managing hardware errors through system software instead of eliminating errors through general purpose hardware error correction.
2) It describes several approaches for implementing error permissive computing including a software framework called BITFLEX that enables approximate computing, an FPGA-based memory emulator for evaluating new system software mechanisms, and techniques for sparse and topology-aware communication that can accelerate large-scale deep learning and reduce communication costs.
3) The goal is to take a holistic approach across hardware and software layers to perform lightweight error correction at the software level while eliminating general purpose error correction in hardware for improved efficiency.
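Since the concept may be easier to see in code, here is a minimal sketch of software-managed, selective error protection (an illustration of the general idea only, not the BITFLEX API; all names and the simulated error model are assumptions):

```python
import random
import zlib

def protect(value: bytes):
    """Critical data gets a cheap software checksum (lightweight SW-level protection)."""
    return value, zlib.crc32(value)

def checked_load(value: bytes, checksum: int) -> bytes:
    """Verify critical data before use; on mismatch the software would recompute or re-fetch."""
    if zlib.crc32(value) != checksum:
        raise RuntimeError("corruption detected in critical data")
    return value

def flip_random_bit(buf: bytearray):
    """Simulate a hardware bit error in an unprotected (error-permissive) region."""
    i = random.randrange(len(buf) * 8)
    buf[i // 8] ^= 1 << (i % 8)

# Approximate data (e.g., pixel buffers, NN weights) simply tolerates the flip,
# while critical metadata is verified before every use.
approx_data = bytearray(b"\x80" * 1024)
flip_random_bit(approx_data)            # silently tolerated
meta, crc = protect(b"rows=32,cols=32")
checked_load(meta, crc)                 # raises only if the protected copy was corrupted
```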
Opportunities of ML-based data analytics in ABCI
This document discusses opportunities for using machine learning-based data analytics on the ABCI supercomputer system. It summarizes:
1) An introduction to the ABCI system and how it is being used for AI research.
2) How sensor data from the ABCI system and job logs could be analyzed using machine learning to optimize data center operation and improve resource utilization and scheduling.
3) Two potential use cases - using workload prediction to enable more efficient cooling system control, and applying machine learning to better predict job execution times to improve scheduling.
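As a hedged illustration of the second use case (job execution time prediction; the column names and model choice below are assumptions, not the ABCI team's actual pipeline), historical job logs could be used to train a regressor whose predictions feed the scheduler:

```python
# Minimal sketch: predict job execution time from accounting-log features.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

logs = pd.read_csv("job_log.csv")  # e.g., exported scheduler accounting data (hypothetical file)
features = ["requested_nodes", "requested_gpus", "requested_walltime", "user_id", "queue_id"]
X = pd.get_dummies(logs[features], columns=["user_id", "queue_id"])
y = logs["actual_runtime_sec"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out jobs:", model.score(X_test, y_test))
```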
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
AI Infrastructure for Everyone (democratizing AI) aims to build an AI infrastructure platform that is accessible to everyone from beginners to experts. The platform provides computing resources of up to 512 nodes, ready-to-use software, datasets, and pre-trained models. It also offers services such as an easy-to-use web-based IDE for beginners and an AI cloud with on-demand, reserved, and batch processing options. The goal is to accelerate AI research and promote the social implementation of AI technologies.
The document discusses the performance of three SPEC CPU2006 benchmarks - 483.xalancbmk, 462.libquantum, and 471.omnetpp - under different last-level cache (LLC) configurations and when subjected to LLC cache interference from a background workload. Key findings include reduced performance for the benchmarks when run with a smaller LLC size or when interfered with by a LLC jammer workload, but maintained performance when QoS techniques were applied to isolate the benchmark workload in the LLC.
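On a current Linux system, this kind of LLC isolation can be configured through the resctrl interface to Intel CAT. The sketch below is hedged: it assumes resctrl is already mounted at /sys/fs/resctrl, a single L3 cache domain with id 0, and arbitrary group names and way masks.

```python
import os

RESCTRL = "/sys/fs/resctrl"

def isolate_in_llc(group: str, pid: int, way_mask: str = "00f"):
    """Pin a process into its own last-level-cache partition (requires root and CAT support)."""
    gdir = os.path.join(RESCTRL, group)
    os.makedirs(gdir, exist_ok=True)
    # Restrict the group to the given cache ways on L3 cache id 0.
    with open(os.path.join(gdir, "schemata"), "w") as f:
        f.write(f"L3:0={way_mask}\n")
    # Move the benchmark process into the group.
    with open(os.path.join(gdir, "tasks"), "w") as f:
        f.write(str(pid))

# Example (as root): give a SPEC benchmark 4 of 16 ways, keeping a jammer workload out of them.
# isolate_in_llc("spec_bench", pid=12345, way_mask="00f")
```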
The document summarizes four presentations from the USENIX NSDI 2016 conference session on resource sharing:
1. "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics" proposes a framework that uses results from small training jobs to efficiently predict performance of data analytics workloads in cloud environments and reduce the number of required training jobs.
2. "Cliffhanger: Scaling Performance Cliffs in Web Memory Caches" presents algorithms to dynamically allocate memory across queues in Memcached to smooth out performance cliffs and potentially save memory usage.
3. "FairRide: Near-Optimal, Fair Cache Sharing" introduces a caching policy that provides isolation guarantees, prevents strategic behavior, and
This document discusses optimizations for TCP/IP networking performance on multicore systems. It describes several inefficiencies in the Linux kernel TCP/IP stack related to shared resources between cores, broken data locality, and per-packet processing overhead. It then introduces mTCP, a user-level TCP/IP stack that addresses these issues through a thread model with pairwise threading, batch packet processing from I/O to applications, and a BSD-like socket API. mTCP achieves a 2.35x performance improvement over the kernel TCP/IP stack on a web server workload.
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
1) The document proposes a new "flow-centric computing" data center architecture for the post-Moore era that focuses on data flows.
2) It involves disaggregating server components and reassembling them as "slices" consisting of task-specific processors and storage connected by an optical network to efficiently process data.
3) The authors expect optical networks to enable high-speed communication between processors, replacing general CPUs, and to potentially revolutionize how data is processed in future data centers.
A Look Inside Google’s Data Center Networks
1) Google has been developing their own data center network architectures using merchant silicon switches and centralized network control since 2005 to keep up with increasing bandwidth demands.
2) Their network designs have evolved from Firehose and Watchtower to the current Saturn and Jupiter networks, increasing port speeds from 1/10Gbps to 40/100Gbps and aggregate bandwidth from terabits to petabits per second.
3) Their network architectures employ Clos topologies with merchant silicon switches at the top-of-rack, aggregation, and spine layers and centralized control of traffic routing.
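To make the bandwidth scaling concrete, here is a small worked example using a textbook three-tier fat-tree built from identical k-port merchant-silicon switches (Google's actual Jupiter fabric differs in detail, so treat this as an order-of-magnitude illustration):

```python
def fat_tree_capacity(k_ports: int, link_gbps: float):
    """Hosts and aggregate host bandwidth of a 3-tier fat-tree of k-port switches."""
    hosts = k_ports ** 3 // 4          # k^3/4 hosts at full bisection bandwidth
    return hosts, hosts * link_gbps

# 64-port switches with 40 Gbps links: 65,536 hosts and ~2.6 Pbps aggregate,
# i.e., the petabit-per-second regime mentioned above.
print(fat_tree_capacity(64, 40))
```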
- Memory hardware such as DRAM and NAND flash is facing scaling challenges as density increases, which could impact performance and cost. New non-volatile memory (NVM) technologies may provide opportunities to address these challenges but require software and system architecture changes to realize their full potential. Key considerations include persistence, performance, and programming models.
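As a hedged illustration of the programming-model point, byte-addressable NVM is commonly exposed as a memory-mapped (DAX) file; the path below is hypothetical, and real persistent-memory libraries such as PMDK add cache-flush and ordering guarantees this sketch glosses over:

```python
import mmap
import os

path = "/mnt/pmem0/counter.bin"        # hypothetical DAX-mounted NVM device
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, 8)
buf = mmap.mmap(fd, 8)

# Update the value in place with ordinary loads and stores...
value = int.from_bytes(buf[:8], "little") + 1
buf[:8] = value.to_bytes(8, "little")
buf.flush()                            # ...then make the store durable (msync)
buf.close()
os.close(fd)
```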
AIST Super Green Cloud: lessons learned from the operation and the performanc...
This document discusses lessons learned from operating the AIST Super Green Cloud (ASGC), a fully virtualized high-performance computing (HPC) cloud system. It summarizes key findings from the first six months of operation, including performance evaluations of SR-IOV virtualization and HPC applications. It also outlines conclusions and future work, such as improving data movement efficiency across hybrid cloud environments.
The document summarizes the author's participation report at the IEEE CloudCom 2014 conference. Some key points include:
- The author attended sessions on virtualization and HPC on cloud.
- Presentations had a strong academic focus and many presenters were Asian.
- Eight papers on HPC on cloud covered topics like reliability, energy efficiency, performance metrics, and applications like Monte Carlo simulations.
Exploring the Performance Impact of Virtualization on an HPC Cloud
The document evaluates the performance impact of virtualization on high-performance computing (HPC) clouds. Experiments were conducted on the AIST Super Green Cloud, a 155-node HPC cluster. Benchmark results show that while PCI passthrough mitigates I/O overhead, virtualization still incurs performance penalties for MPI collectives as node counts increase. Application benchmarks demonstrate overhead is limited to around 5%. The study concludes HPC clouds are promising due to utilization improvements from virtualization, but further optimization of virtual machine placement and pass-through technologies could help reduce overhead.
From Rack scale computers to Warehouse scale computers
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
AIST Super Green Cloud: a high-performance and scale-out HPC cloud
The document contains configuration instructions for creating a cluster in a cloud computing environment called myCluster. It specifies creating a frontend node and 16 compute nodes using specified templates, compute and disk offerings. It also defines the cluster name, zone, network, and SSH key to use. The cluster can then be started and later destroyed along with a configuration file.
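A hypothetical sketch of what such a cluster description might look like (all field names, offerings, and the provisioning call are invented for illustration; the actual configuration format is not reproduced here):

```python
# Cluster description mirroring the items listed above: one frontend,
# 16 compute nodes, plus the zone, network, and SSH key to use.
my_cluster = {
    "name": "myCluster",
    "zone": "zone-1",
    "network": "cluster-net",
    "ssh_key": "my-keypair",
    "frontend": {"template": "frontend-template",
                 "compute_offering": "m.large", "disk_offering": "100GB"},
    "compute": {"count": 16, "template": "compute-template",
                "compute_offering": "hpc.xlarge", "disk_offering": "50GB"},
}

def create_cluster(spec: dict):
    """Placeholder for the real provisioning call (e.g., through the CloudStack API)."""
    print(f"creating {spec['name']}: 1 frontend + {spec['compute']['count']} compute nodes")

create_cluster(my_cluster)   # the cluster would then be started, and later destroyed
```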
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
The document describes Iris, an inter-cloud resource integration system that enables elastic cloud data centers. Iris uses nested virtualization technologies including nested KVM to construct a virtual infrastructure spanning multiple distributed data centers. It provides a new Hardware as a Service (HaaS) model for inter-cloud federation at the infrastructure provider level. The authors demonstrate Apache CloudStack can seamlessly manage resources across emulated inter-cloud environments using Iris.
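Since Iris builds on nested KVM, a quick host-side check is whether the KVM module reports nested virtualization support; the sysfs path below is the standard one for Intel hosts (AMD hosts use kvm_amd instead):

```python
def nested_kvm_enabled(param: str = "/sys/module/kvm_intel/parameters/nested") -> bool:
    """Return True if the loaded KVM module has nested virtualization enabled."""
    try:
        with open(param) as f:
            return f.read().strip() in ("Y", "1")
    except FileNotFoundError:
        return False          # module not loaded, or an AMD host (check kvm_amd instead)

print("nested KVM enabled:", nested_kvm_enabled())
```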
9. Fast VM migration
- Proposes guide-copy, a live migration scheme that is fast and puts little load on the network
  - A derivative of the post-copy approach
  - Optimizes page transfer by following hint information from a guide VM left at the migration source
  - cf. Yabusame, MiyakoDori
[Figure 3: The guide-copy migration's architecture with an example of a guided memory transfer scenario. (a) Guide-copy architecture; (b) Guided memory transfer mechanism.]
J. Kim (POSTECH), et al., “Guide-copy: fast and silent migration of virtual machine for data centers”
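A minimal sketch of the guided transfer idea, in Python with hypothetical names: the guide VM keeps running at the source and logs which pages it touches, and the migration manager pushes pages in that order so they arrive at the destination before the migrated VM faults on them, falling back to background copy for the rest.

```python
def guide_copy_transfer(access_log, all_pages, send_page):
    """Transfer pages in the order hinted by the guide VM's memory-access log (sketch)."""
    sent = set()
    # 1) Guided transfer: follow the source-side access log so pages arrive
    #    just before the migrated VM needs them, avoiding remote page faults.
    for page in access_log:
        if page not in sent:
            send_page(page)
            sent.add(page)
    # 2) Background copy: transfer whatever the log did not cover.
    for page in all_pages:
        if page not in sent:
            send_page(page)
            sent.add(page)

# Example with dummy data: pages 3, 1, 4, 5 are pushed first, then the remainder.
guide_copy_transfer([3, 1, 4, 1, 5], range(8), send_page=lambda p: print("send page", p))
```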
10. Fast VM migration (evaluation)
- Reduces page faults and their delay
- Reduces the network bandwidth used
- TCP buffering does not affect the guide-copy's performance
[Figure 5: Guide-copy's in-time memory transfer reducing the number of page faults and their service latency. (a) Page faults - 1 Gbps; (b) Delay - 1 Gbps; (c) Page faults - 5 Gbps; (d) Delay - 5 Gbps.]
[Figure 6: The execution time of workloads repeating back-to-back post-copy and guide-copy migrations with a 5 s interval.]
[Figure 7: The guide-copy migration delay with varying network bandwidth availability. (a) Delay - bzip2; (b) Delay - cactusADM.]
[Figure 8: Guide-copy's cost-effective adaptive migration (normalized to the baseline post-copy scheme).]
11. Cloud resource management
- Background and motivation
  - Tooling for creating virtual clusters on public clouds, e.g., StarCluster
  - Desire to compute cheaply by taking advantage of reserved instances
- Proposes the Semi-Elastic Cluster (SEC), in which cloud resources are jointly purchased and shared, Groupon-style
  - Dynamically adjusts the cluster size according to load (see the sketch below)
  - Realized as an extension of batch scheduling
  - Simulation experiments show a 61% cost reduction
[Figure 2: Semi-elastic cluster model. (a) Pure on-demand cloud; (b) Traditional local cluster; (c) Semi-elastic cluster.]
S. Niu (Tsinghua Univ.), et al., “Cost-effective Cloud HPC Resource Provisioning by Building Semi-Elastic Virtual Clusters”
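A minimal sketch of the dynamic sizing idea above, assuming a simple hourly policy (the function name, thresholds, and node parameters are hypothetical; the paper's scheduler extension is more elaborate): grow the cluster enough to drain the queued work within the next accounting hour, within fixed bounds.

```python
import math

def next_cluster_size(queued_core_hours: float, cores_per_node: int = 16,
                      min_nodes: int = 1, max_nodes: int = 64,
                      target_hours: float = 1.0) -> int:
    """Node count for the next hour: enough capacity to drain the queue in target_hours."""
    needed = math.ceil(queued_core_hours / (cores_per_node * target_hours))
    return max(min_nodes, min(max_nodes, needed))

# Example: 200 queued core-hours on 16-core nodes -> scale to 13 nodes for the next hour.
print(next_cluster_size(200))
```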