A Random Forest using a Multi-valued Decision Diagram on an FPGA
Hiroki Nakahara
Presentation slides from ISMVL (Int'l Symp. on Multiple-Valued Logic), May 22nd, 2017, Novi Sad, Serbia. The random forest is a kind of machine learning; this work realizes it with high performance and low power.
8. Classifier (Decision Tree)
• Classifies the feature map; referred to as a weak learner
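[Figure: a decision tree and the partition of the (X1, X2) plane it induces into classes C1 and C2. X1 axis ticks: 0.09, 0.63, 0.71, 1.00; X2 axis ticks: 0.00, 0.29, 0.53, 1.00. Tree tests, top to bottom: X2<0.53?; X2<0.29? and X1<0.09?; X1<0.63? and X1<0.71?. Each test branches Y/N; leaves are labeled C1 or C2.]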
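The decision tree on this slide can be sketched in a few lines of Python. This is only an illustration: the thresholds are the ones shown in the figure, but the leaf labels, the attachment of each subtree to a Y or N branch, and the names `Node`, `classify`, and `TREE` are assumptions, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Node:
    """Internal node that tests x[feature] < threshold."""
    feature: int                # index into the input vector (0 -> X1, 1 -> X2)
    threshold: float
    yes: Union["Node", str]     # subtree or class label used when the test is true
    no: Union["Node", str]      # subtree or class label used when the test is false


def classify(tree: Union[Node, str], x: Sequence[float]) -> str:
    """Walk from the root to a leaf and return the leaf's class label."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node


# Thresholds come from the slide. The leaf labels and branch layout below
# are assumed for illustration only.
TREE = Node(1, 0.53,
            yes=Node(1, 0.29,
                     yes="C1",
                     no=Node(0, 0.63, yes="C2", no="C1")),
            no=Node(0, 0.09,
                    yes="C1",
                    no=Node(0, 0.71, yes="C2", no="C1")))

# Input vectors are ordered as (X1, X2).
print(classify(TREE, [0.5, 0.4]))
```

Each classification touches at most three comparisons here, one per tree level, which is why a single tree is cheap but weak on its own.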
12. Random Forest (RF)
• A kind of ensemble learning
• Composed of multiple classifiers (weak learners)
• Supports both classification and regression
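[Figure: left, a binary decision tree (BDT), Tree 1, with tests X1<0.53?; X3<0.71? and X2<0.63?; X2<0.63? and X3<0.72?, Y/N branches, and leaves labeled C1, C2, or C3. Right, a random forest: the input feeds Tree 1, Tree 2, ..., Tree n, whose outputs (C1, C2, C1 in the example) go to a voter that outputs the class C1.]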
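The voting step of a random forest can be sketched as a majority vote over the labels returned by the individual trees. The sketch below assumes simple majority voting for classification; the three one-test trees (`tree_1`, `tree_2`, `tree_n`) are hypothetical stand-ins that reuse thresholds from the figure, and `forest_predict` is an illustrative name rather than anything from the slides.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A tree is modeled as any function that maps an input vector to a class label.
Tree = Callable[[Sequence[float]], str]


def forest_predict(trees: List[Tree], x: Sequence[float]) -> str:
    """Run every tree on x and return the most frequent class label."""
    votes = Counter(tree(x) for tree in trees)
    # On a tie, most_common keeps the label that was counted first.
    return votes.most_common(1)[0][0]


# Hypothetical single-test trees; x is ordered as (X1, X2, X3).
def tree_1(x: Sequence[float]) -> str:
    return "C1" if x[0] < 0.53 else "C2"


def tree_2(x: Sequence[float]) -> str:
    return "C2" if x[1] < 0.63 else "C1"


def tree_n(x: Sequence[float]) -> str:
    return "C1" if x[2] < 0.71 else "C3"


# The trees vote C1, C2, C1, so the voter outputs C1.
print(forest_predict([tree_1, tree_2, tree_n], [0.2, 0.4, 0.5]))
```

Because the trees do not depend on each other, they can be evaluated in parallel and only the vote needs their combined outputs.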
13. Applications of RF
• Key point matching [Lepetit et al., 2006]
• Object detector [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit&Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detector [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton 2011]
15. FPGA (Field Programmable Gate Array)
• Reconfigurable architecture
  • Look-up Table (LUT)
  • Configurable channel
• Advantages
  • Faster than a CPU
  • Lower power dissipation than a GPU
  • Shorter design time than an ASIC
22. Use of a System Design Tool
[Figure: design flow diagram with stages labeled ① to ④.]
1. Behavior design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by FPGA CAD tool
5. Middleware generation
↓ Automatically done
30. Comparison with Other Platforms
• Implemented the RF on the following devices
  • CPU: Intel Core i7 650
  • GPU: NVIDIA GeForce GTX Titan
  • FPGA: Terasic DE5-NET
• Measured dynamic power, including the host PC
• Test bench: 10,000 random vectors
• Execution time includes communication time between the host PC and the devices
[Figure: panels labeled GPU and FPGA.]
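A host-side version of such a test bench might look like the sketch below. It only times a software classifier on the host; the slide's measurements also include host-to-device communication and power, which this does not model. The feature count, the value range [0, 1), the fixed seed, and the names `make_test_bench` and `time_classifier` are all assumptions.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple


def make_test_bench(n_vectors: int = 10_000,
                    n_features: int = 4,
                    seed: int = 0) -> List[List[float]]:
    """Generate random input vectors with values in [0, 1).

    Only the vector count (10,000) comes from the slide; the feature count
    and value range are assumed.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_features)]
            for _ in range(n_vectors)]


def time_classifier(classify: Callable[[Sequence[float]], str],
                    vectors: List[List[float]]) -> Tuple[List[str], float]:
    """Classify every vector and return the labels and elapsed seconds."""
    start = time.perf_counter()
    labels = [classify(v) for v in vectors]
    elapsed = time.perf_counter() - start
    return labels, elapsed


# Hypothetical stand-in classifier; a real run would call the RF under test.
vectors = make_test_bench()
labels, elapsed = time_classifier(lambda v: "C1" if v[0] < 0.5 else "C2", vectors)
print(f"{len(labels)} vectors classified in {elapsed:.4f} s")
```

Fixing the seed keeps the same vectors across runs, so timings on different platforms are taken over identical inputs.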