The document discusses optimization techniques for deep learning frameworks on Intel CPUs and on architectures such as Fugaku. It introduces oneDNN, a performance library for deep learning operations on Intel CPUs, discusses the limitations of a plain C++ implementation, and explains how just-in-time assembly generation with Xbyak addresses them by generating optimal code for the actual runtime parameters. It also introduces Xbyak_aarch64 for generating code optimized for Fugaku's Scalable Vector Extension (SVE) instructions.
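Xbyak itself is a C++ JIT assembler, so the following is only a loose Python analogy (hypothetical names, not Xbyak's API) of the idea: once parameters are known at run time, code can be generated with them baked in as constants, the way a JIT assembler emits kernels tuned to the actual shapes and strides.

```python
# Loose Python analogy (hypothetical, not Xbyak's API) for JIT
# specialization: generate code after the parameters are known, so the
# generated code hard-codes them as constants instead of branching on
# them at every call.

def make_scaled_sum(scale, n):
    """Generate a function specialized for a fixed scale and length."""
    # The generated source embeds scale and n directly.
    src = f"def f(xs): return {scale} * sum(xs[:{n}])"
    namespace = {}
    exec(src, namespace)
    return namespace["f"]

f = make_scaled_sum(scale=2, n=3)
print(f([1, 2, 3, 4]))  # 2 * (1 + 2 + 3) = 12
```

The real library generates machine code rather than Python source, but the payoff is the same: the specialized version avoids run-time decisions that a generic implementation would have to make.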
1) The document discusses recent advances in deep reinforcement learning algorithms for continuous control tasks. It examines factors like network architecture, reward scaling, random seeds, environments and codebases that impact reproducibility of deep RL results.
2) It analyzes the performance of algorithms like ACKTR, PPO, DDPG and TRPO on benchmarks like Hopper and HalfCheetah, and identifies unstable behaviors and unfair comparisons.
3) Simpler approaches like nearest neighbor policies are explored as alternatives to deep networks for solving continuous control tasks, especially in sparse reward settings.
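A minimal sketch of the nearest-neighbor-policy idea in 3), assuming a stored set of (state, action) pairs; the names and data are illustrative, not from any specific codebase:

```python
import math

# Hypothetical demonstration: a "policy" that returns the action of the
# closest stored state instead of querying a neural network.
stored = [
    ((0.0, 0.0), "left"),
    ((1.0, 1.0), "right"),
    ((0.0, 2.0), "jump"),
]

def nn_policy(state):
    """Pick the action whose stored state is nearest in Euclidean distance."""
    nearest_state, action = min(stored, key=lambda sa: math.dist(sa[0], state))
    return action

print(nn_policy((0.9, 1.2)))  # closest stored state is (1.0, 1.0) -> "right"
```

Such a policy has no parameters to train, which is part of its appeal as a baseline in sparse-reward settings.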
Deep learning reading club @ nimiri for SWEST
Kiyoshi Ogawa
We start from zero to learn deep learning with Python. Each member has their own goal; I will try natural language processing and gene/genome processing.
6. TVM stack
(Quoted from the paper) MAPL'18, June 18, 2018, Philadelphia, PA, USA
[Figure: the original TVM stack. Frameworks (CNTK, CoreML, ...) feed a Computational Graph; below it: High-level Dataflow Rewriting, Tensor Operator Description, Schedule; backends: LLVM, Accelerators, CUDA/Metal/OpenCL]
These graphs are easy to optimize
but require users to construct programs in a deeply-embedded domain-specific language (eDSL) without high-level abstractions. A more expressive style popularized by frameworks like Chainer, PyTorch, and Gluon allows construction of graphs with dynamic topologies that can depend on runtime data and support differentiable computations. This expressivity is useful to the user but has limited the ability for frameworks to optimize user-defined graphs. Moreover, it requires a Python interpreter, making deployment to accelerators and FPGAs extremely challenging. In summary, static graphs are efficient but lack the expressivity found in higher-level languages.
<Graph IR> * previously there was NNVM
Graph optimization (e.g., operator fusion)
Operator optimization (loop optimization, parallelization)
<Operator IR> * there is HalideIR
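The graph-level fusion listed above can be sketched in plain Python (a toy illustration, not TVM's actual pass API): fusing an add followed by a ReLU into one operator avoids materializing the intermediate tensor.

```python
# Toy illustration of operator fusion (not TVM's real pass API):
# add followed by relu, unfused vs. fused into a single "add_relu" op.

def run_unfused(x, y):
    t = [a + b for a, b in zip(x, y)]   # intermediate tensor is materialized
    return [max(0.0, v) for v in t]     # second pass over memory

def run_fused(x, y):
    # One loop, no intermediate buffer: the fused operator.
    return [max(0.0, a + b) for a, b in zip(x, y)]

x, y = [1.0, -3.0, 2.0], [0.5, 1.0, -5.0]
assert run_unfused(x, y) == run_fused(x, y)  # same result, fewer passes
print(run_fused(x, y))  # [1.5, 0.0, 0.0]
```

Real compilers do this transformation on the graph IR and then generate the fused loop, but the memory-traffic saving is the same idea.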
8. New TVM stack based on Relay
programs' computational expressivity. Frameworks like TensorFlow represent differentiable computation using static graphs, which are dataflow graphs with a fixed topology.
Figure 2. The new TVM stack integrated with Relay.
[Figure: Frameworks (CNTK, CoreML, ...) → Relay Python Decorator → Relay (Operators and Control; Fusion, Layout Change, Partial Eval, Traditional Optimizations; Relay runtime system) → Tensor Operator Description → Schedule → Hardware Implementation]
9. New TVM stack based on Relay
Rather than being a part of TVM, Relay looks like a new general-purpose graph IR, or a DSL for deep learning, with TVM becoming one of its backends (presenter's impression).
#4: Title: Toward new definitions of equivalence in verifying deep learning compilers
Author name: Takeo Imai
Author affiliation: LeapMind Inc.
(Currently, he is a freelance engineer and also at National Institute of Informatics)
Abstract (word count: 373/500):
A deep learning compiler is a compiler that takes a deep neural network (DNN) as input, optimizes it for efficient computation, and outputs code that runs on target hardware or a platform. Optimizations applied during compilation include graph optimizations, such as operator fusion, and tensor optimizations, such as loop optimizations for matrix multiplication and accumulation. In addition to these classical optimizations, a deep learning compiler often applies optimizations specific to deep learning accelerators. One common example is quantization, which reduces the bit length of a DNN's parameters and computations. A compiler may quantize 32-bit float values into n-bit integers, where n = 8 is most common and n = 1 or 2 for some specific hardware devices.
In this talk, we shed light on the difficulties of defining equivalence for deep learning compilers. For compilers of ordinary programming languages, the behavior of a program before and after compilation must be equivalent, regardless of which optimization passes are applied during compilation. It is commonly understood that having the "same" behavior does not mean producing exactly the same output values from the same input; under the ordinary equivalence criteria, an output value containing tiny errors, such as rounding errors in floating-point operations, is generally accepted. A deep learning compiler, however, sometimes produces code that does not preserve equivalence in the classical sense. For example, a tiny rounding error at a hidden layer may change a value from 0 to 1 after 1-bit quantization, which may yield a completely different final classification result compared with the original DNN's behavior, because the final output of a DNN is discrete-valued. The classical equivalence criteria do not take such tiny-error, big-difference cases into consideration.
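The tiny-error, big-difference phenomenon can be demonstrated concretely (an illustrative sketch, not any actual compiler's behavior): a hidden value sitting near a binarization threshold flips after a rounding-sized perturbation.

```python
def binarize(v, threshold=0.0):
    """1-bit quantization: all that survives is which side of the threshold."""
    return 1 if v >= threshold else 0

hidden = 0.0000001          # hidden-layer value in the original DNN
error = -0.0000002          # rounding-sized error introduced by an optimization

print(binarize(hidden))          # 1
print(binarize(hidden + error))  # 0: tiny error, completely different output
```

A classical tolerance on numeric error would accept a perturbation of 2e-7, yet here it changes the discrete output entirely, which is exactly why the classical criteria break down.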
This is a fundamental issue in the equivalence verification of deep learning compilers. We consider that we need to start by redefining the equivalence of DNN computation, or "relaxing" the equivalence criteria so that some differences in individual discrete-valued results become acceptable. We then need to propose new testing or verification methods according to the new equivalence criteria.
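One possible shape such a relaxed criterion could take (purely a sketch of the idea, not a proposal from the talk): treat two compiled DNNs as equivalent if their discrete outputs agree on at least a chosen fraction of test inputs.

```python
def relaxed_equivalent(outputs_a, outputs_b, min_agreement=0.99):
    """Accept a bounded rate of differing discrete outputs (illustrative only)."""
    assert len(outputs_a) == len(outputs_b)
    agree = sum(a == b for a, b in zip(outputs_a, outputs_b))
    return agree / len(outputs_a) >= min_agreement

ref = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]   # original DNN's classifications
opt = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]   # compiled DNN: one flipped result
print(relaxed_equivalent(ref, opt, min_agreement=0.9))   # True  (9/10 agree)
print(relaxed_equivalent(ref, opt, min_agreement=0.95))  # False
```

Choosing the threshold, and justifying it, is of course where the real verification problem lies.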
We present the issues around the correctness of deep learning compilers described above and offer a direction for our future work on DNN compilation.