Slides presented at the 2nd NHN Technology Conference.
Reference: "LINE Storage: Storing billions of rows in Sharded-Redis and HBase per Month" (http://tech.naver.jp/blog/?p=1420), posted in March 2012.
Slides used for the session "AWSとDockerで実現するAI研究のためのPipeline as Code" at JAWS FESTA 2018 OSAKA ~Passionate~, held at Panasonic Stadium Suita on November 3, 2018.
It describes how 来栖川電算 provides AWS Batch- and Amazon SageMaker-like capabilities in on-premises and hybrid-cloud environments, and codifies its research process on top of them. Refining the research process should lead to better results.
Slides for a reading-group presentation.
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters
Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda
In PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 193-205.
[Paper introduction] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
(Reference) Sijie Yan, Yuanjun Xiong, Dahua Lin. "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition." Association for the Advancement of Artificial Intelligence (AAAI), 2018.
1) The document explores a new concept called error permissive computing that improves computing capabilities and reduces power consumption by allowing and managing hardware errors through system software instead of eliminating errors through general purpose hardware error correction.
2) It describes several approaches for implementing error permissive computing including a software framework called BITFLEX that enables approximate computing, an FPGA-based memory emulator for evaluating new system software mechanisms, and techniques for sparse and topology-aware communication that can accelerate large-scale deep learning and reduce communication costs.
3) The goal is to take a holistic approach across hardware and software layers to perform lightweight error correction at the software level while eliminating general purpose error correction in hardware for improved efficiency.
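Since the concept may be easier to see in code, here is a minimal sketch of software-managed, selective error protection (an illustration of the general idea only, not the BITFLEX API; all names and the simulated error model are assumptions):

```python
import random
import zlib

def protect(value: bytes):
    """Critical data gets a cheap software checksum (lightweight SW-level protection)."""
    return value, zlib.crc32(value)

def checked_load(value: bytes, checksum: int) -> bytes:
    """Verify critical data before use; on mismatch the software would recompute or re-fetch."""
    if zlib.crc32(value) != checksum:
        raise RuntimeError("corruption detected in critical data")
    return value

def flip_random_bit(buf: bytearray):
    """Simulate a hardware bit error in an unprotected (error-permissive) region."""
    i = random.randrange(len(buf) * 8)
    buf[i // 8] ^= 1 << (i % 8)

# Approximate data (e.g., pixel buffers, NN weights) simply tolerates the flip,
# while critical metadata is verified before every use.
approx_data = bytearray(b"\x80" * 1024)
flip_random_bit(approx_data)            # silently tolerated
meta, crc = protect(b"rows=32,cols=32")
checked_load(meta, crc)                 # raises only if the protected copy was corrupted
```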
Opportunities of ML-based data analytics in ABCI
This document discusses opportunities for using machine learning-based data analytics on the ABCI supercomputer system. It summarizes:
1) An introduction to the ABCI system and how it is being used for AI research.
2) How sensor data from the ABCI system and job logs could be analyzed using machine learning to optimize data center operation and improve resource utilization and scheduling.
3) Two potential use cases - using workload prediction to enable more efficient cooling system control, and applying machine learning to better predict job execution times to improve scheduling.
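As a hedged illustration of the second use case (job execution time prediction; the column names and model choice below are assumptions, not the ABCI team's actual pipeline), historical job logs could be used to train a regressor whose predictions feed the scheduler:

```python
# Minimal sketch: predict job execution time from accounting-log features.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

logs = pd.read_csv("job_log.csv")  # e.g., exported scheduler accounting data (hypothetical file)
features = ["requested_nodes", "requested_gpus", "requested_walltime", "user_id", "queue_id"]
X = pd.get_dummies(logs[features], columns=["user_id", "queue_id"])
y = logs["actual_runtime_sec"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out jobs:", model.score(X_test, y_test))
```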
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
AI Infrastructure for Everyone (democratizing AI) aims to build an AI infrastructure platform that is accessible to everyone from beginners to experts. The platform provides computing resources of up to 512 nodes, ready-to-use software, datasets, and pre-trained models. It also offers services such as an easy-to-use web-based IDE for beginners and an AI cloud with on-demand, reserved, and batch processing options. The goal is to accelerate AI research and promote the social implementation of AI technologies.
The document discusses the performance of three SPEC CPU2006 benchmarks - 483.xalancbmk, 462.libquantum, and 471.omnetpp - under different last-level cache (LLC) configurations and when subjected to LLC cache interference from a background workload. Key findings include reduced performance for the benchmarks when run with a smaller LLC size or when interfered with by a LLC jammer workload, but maintained performance when QoS techniques were applied to isolate the benchmark workload in the LLC.
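On a current Linux system, this kind of LLC isolation can be configured through the resctrl interface to Intel CAT. The sketch below is hedged: it assumes resctrl is already mounted at /sys/fs/resctrl, a single L3 cache domain with id 0, and arbitrary group names and way masks.

```python
import os

RESCTRL = "/sys/fs/resctrl"

def isolate_in_llc(group: str, pid: int, way_mask: str = "00f"):
    """Pin a process into its own last-level-cache partition (requires root and CAT support)."""
    gdir = os.path.join(RESCTRL, group)
    os.makedirs(gdir, exist_ok=True)
    # Restrict the group to the given cache ways on L3 cache id 0.
    with open(os.path.join(gdir, "schemata"), "w") as f:
        f.write(f"L3:0={way_mask}\n")
    # Move the benchmark process into the group.
    with open(os.path.join(gdir, "tasks"), "w") as f:
        f.write(str(pid))

# Example (as root): give a SPEC benchmark 4 of 16 ways, keeping a jammer workload out of them.
# isolate_in_llc("spec_bench", pid=12345, way_mask="00f")
```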
The document summarizes four presentations from the USENIX NSDI 2016 conference session on resource sharing:
1. "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics" proposes a framework that uses results from small training jobs to efficiently predict performance of data analytics workloads in cloud environments and reduce the number of required training jobs.
2. "Cliffhanger: Scaling Performance Cliffs in Web Memory Caches" presents algorithms to dynamically allocate memory across queues in Memcached to smooth out performance cliffs and potentially save memory usage.
3. "FairRide: Near-Optimal, Fair Cache Sharing" introduces a caching policy that provides isolation guarantees, prevents strategic behavior, and
This document discusses optimizations for TCP/IP networking performance on multicore systems. It describes several inefficiencies in the Linux kernel TCP/IP stack related to shared resources between cores, broken data locality, and per-packet processing overhead. It then introduces mTCP, a user-level TCP/IP stack that addresses these issues through a thread model with pairwise threading, batch packet processing from I/O to applications, and a BSD-like socket API. mTCP achieves a 2.35x performance improvement over the kernel TCP/IP stack on a web server workload.
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
1) The document proposes a new "flow-centric computing" data center architecture for the post-Moore era that focuses on data flows.
2) It involves disaggregating server components and reassembling them as "slices" consisting of task-specific processors and storage connected by an optical network to efficiently process data.
3) The authors expect optical networks to enable high-speed communication between processors, replacing general CPUs, and to potentially revolutionize how data is processed in future data centers.
A Look Inside Google’s Data Center Networks
1) Google has been developing their own data center network architectures using merchant silicon switches and centralized network control since 2005 to keep up with increasing bandwidth demands.
2) Their network designs have evolved from Firehose and Watchtower to the current Saturn and Jupiter networks, increasing port speeds from 1/10Gbps to 40/100Gbps and aggregate bandwidth from terabits to petabits per second.
3) Their network architectures employ Clos topologies with merchant silicon switches at the top-of-rack, aggregation, and spine layers and centralized control of traffic routing.
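To make the bandwidth scaling concrete, here is a small worked example using a textbook three-tier fat-tree built from identical k-port merchant-silicon switches (Google's actual Jupiter fabric differs in detail, so treat this as an order-of-magnitude illustration):

```python
def fat_tree_capacity(k_ports: int, link_gbps: float):
    """Hosts and aggregate host bandwidth of a 3-tier fat-tree of k-port switches."""
    hosts = k_ports ** 3 // 4          # k^3/4 hosts at full bisection bandwidth
    return hosts, hosts * link_gbps

# 64-port switches with 40 Gbps links: 65,536 hosts and ~2.6 Pbps aggregate,
# i.e., the petabit-per-second regime mentioned above.
print(fat_tree_capacity(64, 40))
```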
- Memory hardware such as DRAM and NAND flash is facing scaling challenges as density increases, which could impact performance and cost. New non-volatile memory (NVM) technologies may provide opportunities to address these challenges but require software and system architecture changes to realize their full potential. Key considerations include persistence, performance, and programming models.
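As a hedged illustration of the programming-model point, byte-addressable NVM is commonly exposed as a memory-mapped (DAX) file; the path below is hypothetical, and real persistent-memory libraries such as PMDK add cache-flush and ordering guarantees this sketch glosses over:

```python
import mmap
import os

path = "/mnt/pmem0/counter.bin"        # hypothetical DAX-mounted NVM device
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, 8)
buf = mmap.mmap(fd, 8)

# Update the value in place with ordinary loads and stores...
value = int.from_bytes(buf[:8], "little") + 1
buf[:8] = value.to_bytes(8, "little")
buf.flush()                            # ...then make the store durable (msync)
buf.close()
os.close(fd)
```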
AIST Super Green Cloud: lessons learned from the operation and the performanc...
This document discusses lessons learned from operating the AIST Super Green Cloud (ASGC), a fully virtualized high-performance computing (HPC) cloud system. It summarizes key findings from the first six months of operation, including performance evaluations of SR-IOV virtualization and HPC applications. It also outlines conclusions and future work, such as improving data movement efficiency across hybrid cloud environments.
The document summarizes the author's participation report at the IEEE CloudCom 2014 conference. Some key points include:
- The author attended sessions on virtualization and HPC on cloud.
- Presentations had a strong academic focus and many presenters were Asian.
- Eight papers on HPC on cloud covered topics like reliability, energy efficiency, performance metrics, and applications like Monte Carlo simulations.
Exploring the Performance Impact of Virtualization on an HPC Cloud
The document evaluates the performance impact of virtualization on high-performance computing (HPC) clouds. Experiments were conducted on the AIST Super Green Cloud, a 155-node HPC cluster. Benchmark results show that while PCI passthrough mitigates I/O overhead, virtualization still incurs performance penalties for MPI collectives as node counts increase. Application benchmarks demonstrate overhead is limited to around 5%. The study concludes HPC clouds are promising due to utilization improvements from virtualization, but further optimization of virtual machine placement and pass-through technologies could help reduce overhead.
From Rack scale computers to Warehouse scale computers
This document discusses the transition from rack-scale computers to warehouse-scale computers through the disaggregation of technologies. It provides examples of rack-scale architectures like Open Compute Project and Intel Rack Scale Architecture. For warehouse-scale computers, it examines HP's The Machine project using application-specific cores, universal memory, and photonics fabric. It also outlines UC Berkeley's FireBox project utilizing 1 terabit/sec optical fibers, many-core systems-on-chip, and non-volatile memory modules connected via high-radix photonic switches.
AIST Super Green Cloud: a high-performance and scale-out HPC cloud
The document contains configuration instructions for creating a cluster in a cloud computing environment called myCluster. It specifies creating a frontend node and 16 compute nodes using specified templates, compute and disk offerings. It also defines the cluster name, zone, network, and SSH key to use. The cluster can then be started and later destroyed along with a configuration file.
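A hypothetical sketch of what such a cluster description might look like (all field names, offerings, and the provisioning call are invented for illustration; the actual configuration format is not reproduced here):

```python
# Cluster description mirroring the items listed above: one frontend,
# 16 compute nodes, plus the zone, network, and SSH key to use.
my_cluster = {
    "name": "myCluster",
    "zone": "zone-1",
    "network": "cluster-net",
    "ssh_key": "my-keypair",
    "frontend": {"template": "frontend-template",
                 "compute_offering": "m.large", "disk_offering": "100GB"},
    "compute": {"count": 16, "template": "compute-template",
                "compute_offering": "hpc.xlarge", "disk_offering": "50GB"},
}

def create_cluster(spec: dict):
    """Placeholder for the real provisioning call (e.g., through the CloudStack API)."""
    print(f"creating {spec['name']}: 1 frontend + {spec['compute']['count']} compute nodes")

create_cluster(my_cluster)   # the cluster would then be started, and later destroyed
```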
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
The document describes Iris, an inter-cloud resource integration system that enables elastic cloud data centers. Iris uses nested virtualization technologies including nested KVM to construct a virtual infrastructure spanning multiple distributed data centers. It provides a new Hardware as a Service (HaaS) model for inter-cloud federation at the infrastructure provider level. The authors demonstrate Apache CloudStack can seamlessly manage resources across emulated inter-cloud environments using Iris.
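Since Iris builds on nested KVM, a quick host-side check is whether the KVM module reports nested virtualization support; the sysfs path below is the standard one for Intel hosts (AMD hosts use kvm_amd instead):

```python
def nested_kvm_enabled(param: str = "/sys/module/kvm_intel/parameters/nested") -> bool:
    """Return True if the loaded KVM module has nested virtualization enabled."""
    try:
        with open(param) as f:
            return f.read().strip() in ("Y", "1")
    except FileNotFoundError:
        return False          # module not loaded, or an AMD host (check kvm_amd instead)

print("nested KVM enabled:", nested_kvm_enabled())
```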
9. Fast VM migration
- Proposes guide-copy, a live migration scheme that is fast and puts little load on the network
  - A derivative of the post-copy approach
  - Optimizes page transfer by following hint information from a guide VM left at the migration source
  - cf. Yabusame, MiyakoDori
[Figure 3: The guide-copy migration's architecture with an example of a guided memory transfer scenario. (a) Guide-copy architecture; (b) Guided memory transfer mechanism.]
J. Kim (POSTECH), et al., “Guide-copy: fast and silent migration of virtual machine for data centers”
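A minimal sketch of the guided transfer idea, in Python with hypothetical names: the guide VM keeps running at the source and logs which pages it touches, and the migration manager pushes pages in that order so they arrive at the destination before the migrated VM faults on them, falling back to background copy for the rest.

```python
def guide_copy_transfer(access_log, all_pages, send_page):
    """Transfer pages in the order hinted by the guide VM's memory-access log (sketch)."""
    sent = set()
    # 1) Guided transfer: follow the source-side access log so pages arrive
    #    just before the migrated VM needs them, avoiding remote page faults.
    for page in access_log:
        if page not in sent:
            send_page(page)
            sent.add(page)
    # 2) Background copy: transfer whatever the log did not cover.
    for page in all_pages:
        if page not in sent:
            send_page(page)
            sent.add(page)

# Example with dummy data: pages 3, 1, 4, 5 are pushed first, then the remainder.
guide_copy_transfer([3, 1, 4, 1, 5], range(8), send_page=lambda p: print("send page", p))
```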
10. Fast VM migration (evaluation)
- Reduces page faults and their delay
- Reduces the network bandwidth used
- TCP buffering does not affect the guide-copy's performance
[Figure 5: Guide-copy's in-time memory transfer reducing the number of page faults and their service latency. (a) Page faults - 1 Gbps; (b) Delay - 1 Gbps; (c) Page faults - 5 Gbps; (d) Delay - 5 Gbps.]
[Figure 6: The execution time of workloads repeating back-to-back post-copy and guide-copy migrations with a 5 s interval.]
[Figure 7: The guide-copy migration delay with varying network bandwidth availability. (a) Delay - bzip2; (b) Delay - cactusADM.]
[Figure 8: Guide-copy's cost-effective adaptive migration (normalized to the baseline post-copy scheme).]
11. Cloud resource management
- Background and motivation
  - Tooling for creating virtual clusters on public clouds, e.g., StarCluster
  - Desire to compute cheaply by taking advantage of reserved instances
- Proposes the Semi-Elastic Cluster (SEC), in which cloud resources are jointly purchased and shared, Groupon-style
  - Dynamically adjusts the cluster size according to load (see the sketch below)
  - Realized as an extension of batch scheduling
  - Simulation experiments show a 61% cost reduction
[Figure 2: Semi-elastic cluster model. (a) Pure on-demand cloud; (b) Traditional local cluster; (c) Semi-elastic cluster.]
S. Niu (Tsinghua Univ.), et al., “Cost-effective Cloud HPC Resource Provisioning by Building Semi-Elastic Virtual Clusters”
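A minimal sketch of the dynamic sizing idea above, assuming a simple hourly policy (the function name, thresholds, and node parameters are hypothetical; the paper's scheduler extension is more elaborate): grow the cluster enough to drain the queued work within the next accounting hour, within fixed bounds.

```python
import math

def next_cluster_size(queued_core_hours: float, cores_per_node: int = 16,
                      min_nodes: int = 1, max_nodes: int = 64,
                      target_hours: float = 1.0) -> int:
    """Node count for the next hour: enough capacity to drain the queue in target_hours."""
    needed = math.ceil(queued_core_hours / (cores_per_node * target_hours))
    return max(min_nodes, min(max_nodes, needed))

# Example: 200 queued core-hours on 16-core nodes -> scale to 13 nodes for the next hour.
print(next_cluster_size(200))
```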