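This is an update of the slides reflecting the latest status as of April 2019.
The main changes are the addition of the stochastic LUT model and the release of BinaryBrain version 3.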
English Version
/ryuz88/lutnetwork-revision2-english-version
BinaryBrain
https://github.com/ryuz/BinaryBrain
A digital spectrometer using an FPGA is proposed for use on a radio telescope. The spectrometer would provide high-resolution spectral analysis of wideband radio frequency signals received by the telescope. To achieve high throughput on the FPGA, a nested residue number system is used to implement the fast Fourier transforms in the spectrometer. This decomposes large moduli into smaller nested ones, allowing uniform circuit sizes and enabling fully parallel implementation of the arithmetic.
A Random Forest using a Multi-valued Decision Diagram on an FPGA
Hiroki Nakahara
Presentation slides from ISMVL (Int'l Symp. on Multiple-Valued Logic), May 22, 2017, Novi Sad, Serbia. The talk covers a kind of machine learning that aims at high performance and low power.
Scan Registration for Autonomous Mining Vehicles Using 3D-NDT
Kitsukawa Yuki
Presentation slides for a paper review given at our laboratory seminar.
Magnusson, M., Lilienthal, A. and Duckett, T. (2007), Scan registration for autonomous mining vehicles using 3D-NDT. J. Field Robotics, 24: 803–827. doi: 10.1002/rob.20204
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz...
Hiroki Nakahara
This document presents a method for high-throughput convolutional neural network (CNN) inference on an FPGA using customized JPEG compression. It decomposes convolutions using channel shift and pointwise operations, employs binary weight quantization, and uses a fully pipelined architecture. Experimental results show the proposed JPEG compression achieves an 82x speedup with 0.3% accuracy drop. When implemented on an FPGA, the CNN achieves 3,321 frames per second at 75 watts, providing over 100x and 10x speedups over CPU and GPU respectively.
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
Hiroki Nakahara
The document discusses implementing a deep neural network object detector called YOLOv2 on an FPGA using a technique called Nested Residue Number System (NRNS). Key points:
1. YOLOv2 is used for real-time object detection but requires high performance and low power.
2. NRNS decomposes large integer operations into smaller ones using a nested set of prime number moduli, enabling parallelization on FPGA.
3. The authors implemented a Tiny YOLOv2 model using NRNS on a NetFPGA-SUME board, achieving 3.84 FPS at 3.5W power and 1.097 FPS/W efficiency.
ISMVL2018: A Ternary Weight Binary Input Convolutional Neural Network
Hiroki Nakahara
This document summarizes a research paper that proposes a ternary weight binary input convolutional neural network (CNN).
The paper proposes using ternary (-1, 0, +1) weights instead of binary weights to improve recognition accuracy over binary CNNs. By setting many weights to zero, computations can be skipped, reducing operations. Experimental results show the ternary CNN model reduced non-zero weights to 5.3% while maintaining accuracy comparable to binary CNNs. Implementation on an ARM processor demonstrated the ternary CNN was 8 times faster than a binary CNN.
FPGA2018: A Lightweight YOLOv2: A binarized CNN with a parallel support vecto...
Hiroki Nakahara
This document presents a mixed-precision convolutional neural network (CNN) called a Lightweight YOLOv2 for real-time object detection on an FPGA. The network uses binary precision for the feature extraction layers and half precision for the localization and classification layers. An FPGA implementation of the network achieves 40.81 FPS for object detection, outperforming an embedded GPU and CPU. Future work will apply this approach to other CNN-based applications such as semantic segmentation and pose estimation.
FPT17: An object detector based on multiscale sliding window search using a f...
Hiroki Nakahara
1) The document describes an object detection system that uses a multiscale sliding window approach with fully pipelined binarized convolutional neural networks (BCNNs) implemented on an FPGA.
2) The system detects and classifies multiple objects in images by applying BCNNs to windows at different scales and locations, and suppresses overlapping detections.
3) Experimental results on a Zynq UltraScale+ MPSoC FPGA demonstrate that the proposed pipelined BCNN architecture can achieve higher accuracy than GPU-based detectors while using less than 5W of power.
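14. Residue Number System (RNS)
An integer $X$ is decomposed into its residues with respect to $L$ pairwise coprime integers $\{m_1, m_2, \ldots, m_L\}$ and represented as $X = \{x_1, x_2, \ldots, x_L\}$,
where
$x_i = |X|_{m_i}$, with $|X|_{m_i} = X \bmod m_i$
($X - |X|_{m_i}$ is a multiple of $m_i$, $m_i > 1$), that is, $|X|_{m_i}$ is the least non-negative residue.
Dynamic range: $M = \prod_{i=1}^{L} m_i$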
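The residue representation on this slide can be illustrated with a short Python sketch. The function names (`to_rns`, `from_rns`, `dynamic_range`) and the example moduli {3, 5, 7} are assumptions chosen for illustration and do not come from the slides. The reconstruction step uses the Chinese remainder theorem, which the slide does not show; it is included only to check that the representation is lossless within the dynamic range. Python 3.8 or later is assumed for `math.prod` and the modular inverse via `pow`.

```python
from math import gcd, prod


def dynamic_range(moduli):
    """Dynamic range M: the product of all moduli m_i."""
    return prod(moduli)


def to_rns(x, moduli):
    """Represent x by its least non-negative residues x_i = x mod m_i."""
    # The moduli must be pairwise coprime and each greater than 1.
    assert all(m > 1 for m in moduli)
    assert all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """Recover x in [0, M) from its residues (Chinese remainder theorem)."""
    M = dynamic_range(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m.
        x += r * Mi * pow(Mi, -1, m)
    return x % M


def rns_mul(a, b, moduli):
    """Multiply two RNS numbers digit by digit, with no carries between digits."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]


moduli = [3, 5, 7]                   # hypothetical example moduli
x = to_rns(23, moduli)               # residues of 23
y = to_rns(4, moduli)                # residues of 4
p = rns_mul(x, y, moduli)            # residues of 23 * 4 = 92 (< M = 105)
print(dynamic_range(moduli), x, from_rns(p, moduli))
```

Because each residue digit is processed independently, the per-modulus operations can run in parallel, which is the property that makes RNS attractive for hardware. The result is only correct while the true value stays below the dynamic range $M$.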