How to make an FPGA accelerator
IBM POWER + CAPI edition
This adds an HDL simulation environment to the material at
http://www.slideshare.net/ssuser479fa3/ibm-capi .
ZynqMP Boot and Power Management (Mr. Vengineer)
This is the material I used at the Zynq UltraScale+ MPSoC study group on February 20, 2016.
Addendum (2016.05.08): Noted that an implementation for the Zynq UltraScale+ MPSoC has been added to the official ARM Trusted Firmware site.
I analyzed the code of TensorFlow XLA.
This document covers the AOT (ahead-of-time) part of TensorFlow XLA.
Inside SDSoC v2016.2 (Software, short edition)
A summary of my investigation of the software generated by Xilinx's SDSoC v2016.2.
2016.09.09: First published.
2016.09.10: Added a section on the files SDSoC generates on the SD card.
This document summarizes Intel Nervana Graph, a graph compiler developed by Nervana Systems and now maintained by Intel. It discusses how Nervana Graph can import models from frameworks such as Caffe, TensorFlow, and MXNet and convert them to an intermediate graph representation. It then describes how different transformers convert the graph into executable code for CPUs or GPUs. The document provides code examples for using Nervana Graph with Caffe and TensorFlow models and discusses the implementation of the graph transformations and compiler passes.
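The core idea above is a framework-neutral graph IR plus a "transformer" that turns the graph into something executable. A minimal Python sketch of that pattern follows; all class and function names here are illustrative stand-ins, not the real ngraph API.

```python
# A toy graph IR plus a transformer that compiles it to a Python callable,
# standing in for the CPU/GPU transformers that emit real executable code.

class Node:
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, tuple(inputs)

def placeholder():  return Node("placeholder")
def add(a, b):      return Node("add", (a, b))
def mul(a, b):      return Node("mul", (a, b))

class SimpleTransformer:
    """Walks the graph and returns a callable computation."""
    def computation(self, result, *params):
        def run(*args):
            env = dict(zip(params, args))
            def eval_node(n):
                if n.op == "placeholder":
                    return env[n]
                vals = [eval_node(i) for i in n.inputs]
                return vals[0] + vals[1] if n.op == "add" else vals[0] * vals[1]
            return eval_node(result)
        return run

x, y = placeholder(), placeholder()
graph = add(mul(x, x), y)                    # x*x + y as an intermediate graph
f = SimpleTransformer().computation(graph, x, y)
print(f(3.0, 4.0))                           # 13.0
```

A real transformer would instead lower the graph through compiler passes and emit code for the chosen backend; the interface shape (build graph, hand it to a transformer, get a callable) is the point here.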
The document discusses software-driven verification using Xilinx's xsim simulator. It describes the Xilinx Simulator Interface (XSI), which allows a C/C++ program to act as the testbench for an HDL design running in xsim. It details XSI functions for looking up port numbers, reading and writing port values, and running and controlling the simulation from C++. It also discusses calling the XSI functions through dynamic linking and, alternatively, using SystemVerilog DPI to access the DUT directly from C++.
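The XSI testbench flow described above boils down to: look up ports by name, drive inputs, step the simulator, read outputs. The Python sketch below mirrors that call sequence against a mock adder DUT; `MockXsim` is purely illustrative, while the real XSI entry points (functions such as `xsi_open`, `xsi_get_port_number`, `xsi_put_value`, `xsi_run`, `xsi_get_value`) live in a shared library and are called from C/C++.

```python
# Mock of a 2-input adder DUT driven with an xsim/XSI-style call sequence.

class MockXsim:
    def __init__(self):
        self._ports = {"a": 0, "b": 0, "sum": 0}
        self._names = {n: i for i, n in enumerate(self._ports)}
    def get_port_number(self, name):
        return self._names[name]
    def put_value(self, port, value):
        self._ports[list(self._ports)[port]] = value
    def get_value(self, port):
        return self._ports[list(self._ports)[port]]
    def run(self, steps):
        # "evaluate" the combinational adder when time advances
        self._ports["sum"] = self._ports["a"] + self._ports["b"]

sim = MockXsim()
a, b, s = (sim.get_port_number(n) for n in ("a", "b", "sum"))
sim.put_value(a, 5)        # testbench drives the inputs...
sim.put_value(b, 7)
sim.run(10)                # ...advances simulation time...
print(sim.get_value(s))    # ...and reads the DUT output: 12
```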
TVM uses Verilator and DPI to connect accelerator models written in Verilog or Chisel to Python code. It initializes the hardware model and controls simulation using methods like SimLaunch, SimWait, and SimResume. The Python code loads the accelerator module, allocates memory, and runs the accelerator by calling driver functions that go through the DPI layer to initialize, launch, and wait for completion of the accelerator. This allows accelerators developed in Verilog or Chisel to be tested from Python.
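The host-side driver flow described above can be sketched as follows. The thread stands in for the Verilator/DPI hardware model; the method names echo the SimLaunch/SimWait vocabulary in the text but are not the actual TVM API.

```python
# Host driver sketch: allocate buffers, launch the simulated accelerator,
# wait for completion, then read back the result.

import threading

class MockAccelerator:
    def __init__(self):
        self._done = threading.Event()
    def sim_launch(self, src, dst):
        def body():                      # the "hardware" doubles each element
            dst[:] = [2 * v for v in src]
            self._done.set()
        threading.Thread(target=body).start()
    def sim_wait(self):                  # block until the accelerator finishes
        self._done.wait()

acc = MockAccelerator()
src = [1, 2, 3]
dst = [0] * len(src)      # host-allocated output buffer
acc.sim_launch(src, dst)  # driver: initialize and launch
acc.sim_wait()            # driver: wait for completion
print(dst)                # [2, 4, 6]
```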
Cloud Deep Learning Chips: Training & Inference
This document summarizes various chips for deep learning training and inference in the cloud from companies such as Google, Intel, Habana Labs, Alibaba, and Graphcore. It provides information on the specs and capabilities of each chip, such as the memory type and TFLOPS, and links to product pages and documentation. It also discusses collaborations between companies on projects like Glow, ONNX, and OCP accelerator modules.
Glow is a compiler and execution engine for neural networks created by Facebook. It takes a high-level graph representation of a neural network and compiles it into efficient machine code for different hardware backends like CPU and OpenCL. The key steps in Glow include loading a model, optimizing the graph, lowering it to a low-level IR, scheduling operations to minimize memory usage, generating instructions for the backend, and performing optimizations specific to the target. Glow aims to provide a portable way to deploy neural networks across different hardware platforms.
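The "lowering" step above is the easiest part of the pipeline to illustrate: a high-level node is rewritten into simpler low-level instructions before backend code generation. The sketch below is a toy version of that idea with made-up node names; Glow's real IR and passes are implemented in C++.

```python
# Toy lowering pass: rewrite a high-level FullyConnected node into the
# lower-level MatMul + Add instructions a backend would actually emit.

HIGH_LEVEL = [("FullyConnected", "x", "w", "b", "y")]

def lower(graph):
    out = []
    for node in graph:
        if node[0] == "FullyConnected":
            _, x, w, b, y = node
            out.append(("MatMul", x, w, "tmp"))   # y_tmp = x @ w
            out.append(("Add", "tmp", b, y))      # y = y_tmp + b
        else:
            out.append(node)                      # pass other nodes through
    return out

print(lower(HIGH_LEVEL))
# [('MatMul', 'x', 'w', 'tmp'), ('Add', 'tmp', 'b', 'y')]
```

After lowering, Glow's later stages schedule these instructions to minimize live memory and hand them to the target backend.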
Bridge TensorFlow to run on Intel nGraph backends (v0.4)Mr. Vengineer
?
This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
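The cluster-then-encapsulate flow in the bullet points can be sketched on a linear toy graph: mark the nodes the backend supports, group maximal runs of supported nodes into clusters, and replace each cluster with a single encapsulating node (the analogue of NGraphEncapsulateOp). This is purely illustrative, not the real pass code.

```python
# Toy version of NGraphClusterPass + NGraphEncapsulatePass on a linear graph.

SUPPORTED = {"Add", "MatMul", "Relu"}   # ops the backend claims

def cluster(nodes):
    """Group maximal runs of supported ops; unsupported ops stay alone."""
    clusters, run = [], []
    for op in nodes:
        if op in SUPPORTED:
            run.append(op)
        else:
            if run:
                clusters.append(run)
                run = []
            clusters.append([op])       # stays on plain TensorFlow
    if run:
        clusters.append(run)
    return clusters

def encapsulate(clusters):
    """Replace each supported cluster with one encapsulating node."""
    return ["Encapsulate(" + "+".join(c) + ")" if c[0] in SUPPORTED else c[0]
            for c in clusters]

graph = ["Const", "MatMul", "Add", "Relu", "Print", "Add"]
print(encapsulate(cluster(graph)))
# ['Const', 'Encapsulate(MatMul+Add+Relu)', 'Print', 'Encapsulate(Add)']
```

The real pass works on a dataflow graph rather than a list, so clustering must also respect data dependencies, but the mark/cluster/encapsulate shape is the same.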
Bridge TensorFlow to run on Intel nGraph backends (v0.5)
The document describes how the nGraph TensorFlow bridge rewrites TensorFlow graphs to run on Intel nGraph backends. Optimization passes modify the graph in several phases: 1) capturing TensorFlow variables as nGraph variables, 2) marking, assigning, and deassigning nodes to clusters, and 3) encapsulating clusters into NGraphEncapsulateOp nodes so the subgraphs run on nGraph. Key classes and files such as NGraphVariableCapturePass and NGraphEncapsulatePass are described, along with how they implement the rewriting phases that prepare the graph for nGraph execution.
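The v0.5 bridge runs these rewrites as an ordered pipeline, each phase consuming the previous phase's output. The toy pipeline below mirrors that phase structure; the pass names follow the text and the bodies are stubs, not the real pass logic.

```python
# Ordered rewrite pipeline, phase by phase, as in the v0.5 bridge.

def capture_variables(g):
    g.append("vars_captured")     # phase 1: TF variables -> nGraph variables
    return g

def mark_clusters(g):
    g.append("clusters_marked")   # phase 2: mark/assign/deassign clusters
    return g

def encapsulate(g):
    g.append("encapsulated")      # phase 3: clusters -> NGraphEncapsulateOp
    return g

PIPELINE = [capture_variables, mark_clusters, encapsulate]

def rewrite(graph):
    for p in PIPELINE:            # each phase sees the previous phase's output
        graph = p(graph)
    return graph

print(rewrite([]))                # phases applied in order
```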
Inside TensorFlow XLA, the XLA Client is now available from Python.
I also added notes on the SysML paper from February 2018 (JAX@Google).
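Since the deck mentions the JAX paper, here is a minimal JAX example of the same idea: plain Python/NumPy-style code traced and compiled through XLA. This requires the `jax` package and is independent of the slide material.

```python
import jax
import jax.numpy as jnp

@jax.jit                      # stage the function out and compile it with XLA
def affine(x, w, b):
    return jnp.dot(x, w) + b

x = jnp.ones((2,))
w = jnp.eye(2)
b = jnp.zeros((2,))
print(affine(x, w, b))        # first call traces and compiles, then runs
```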
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
Tiramisu : A Code Optimization Framework for High Performance Systems
https://www.csail.mit.edu/research/tiramisu-framework-code-optimization-and-code-generation
This is an overview of Tiramisu.
Since there is almost no documentation, I analyzed the source code and examined what the sample programs do.
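A one-line illustration of why the polyhedral representation matters: it can describe non-rectangular iteration domains exactly. The sketch below enumerates the triangular domain {(i, j) : 0 <= i < n, 0 <= j <= i}, which a plain rectangular loop nest cannot express without a guard; this is generic Python, not Tiramisu code.

```python
# Enumerate a triangular (non-rectangular) iteration domain: the inner
# loop bound depends on the outer index, as polyhedral frameworks allow.

def triangular_domain(n):
    points = []
    for i in range(n):           # outer dimension: 0 <= i < n
        for j in range(i + 1):   # inner bound depends on i: 0 <= j <= i
            points.append((i, j))
    return points

print(len(triangular_domain(4)))  # 10 points, versus 16 for the 4x4 rectangle
```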