【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative Model (Deep Learning JP)
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
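As a rough sketch of the training objective only (not the authors' code), the per-scene objective can be written as a conditional-VAE evidence lower bound; `encoder`, `nerf_decoder`, the Gaussian likelihood, and all shapes below are illustrative assumptions.

```python
import torch

def elbo(encoder, nerf_decoder, views, cameras, target_view, target_camera):
    """One-sample Monte Carlo estimate of the evidence lower bound (ELBO).

    encoder      : maps (views, cameras) -> (mu, log_var) of the scene latent
    nerf_decoder : renders an image for a camera, conditioned on latent z
    All modules and shapes are illustrative placeholders.
    """
    mu, log_var = encoder(views, cameras)

    # Reparameterisation trick: z = mu + sigma * eps
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Reconstruction term: log-likelihood of a held-out target view
    # (a Gaussian likelihood here, i.e. an MSE up to constants).
    recon = nerf_decoder(z, target_camera)
    log_px = -((recon - target_view) ** 2).sum()

    # KL divergence between q(z | views) = N(mu, sigma^2) and the prior N(0, I)
    kl = 0.5 * (torch.exp(log_var) + mu ** 2 - 1.0 - log_var).sum()

    return log_px - kl   # maximise this (or minimise its negative)
```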
FCCM2020: High-Throughput Convolutional Neural Network on an FPGA by Customiz... (Hiroki Nakahara)
This document presents a method for high-throughput convolutional neural network (CNN) inference on an FPGA using customized JPEG compression. It decomposes convolutions using channel shift and pointwise operations, employs binary weight quantization, and uses a fully pipelined architecture. Experimental results show the proposed JPEG compression achieves an 82x speedup with 0.3% accuracy drop. When implemented on an FPGA, the CNN achieves 3,321 frames per second at 75 watts, providing over 100x and 10x speedups over CPU and GPU respectively.
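The shift-and-pointwise decomposition mentioned above replaces a k x k convolution with a parameter-free spatial shift of channel groups followed by a 1x1 convolution. A minimal PyTorch sketch under assumed layer sizes (and using a circular shift for brevity) might look like this:

```python
import torch
import torch.nn as nn

class ShiftPointwise(nn.Module):
    """Channel shift followed by a 1x1 (pointwise) convolution.

    Toy stand-in for the shift/pointwise decomposition: four channel groups are
    shifted by one pixel in four directions (the remaining channels stay put),
    then a 1x1 convolution mixes the channels. Group sizes are illustrative.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        g = x.shape[1] // 5                      # 4 shifted groups + 1 static group
        shifted = x.clone()
        shifted[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1,  dims=2)  # down
        shifted[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)  # up
        shifted[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1,  dims=3)  # right
        shifted[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)  # left
        return self.pw(shifted)

# Example: a 32-channel feature map mapped to 64 channels
y = ShiftPointwise(32, 64)(torch.randn(1, 32, 16, 16))
```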
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ... (Hiroki Nakahara)
The document discusses implementing a deep neural network object detector called YOLOv2 on an FPGA using a technique called Nested Residue Number System (NRNS). Key points:
1. YOLOv2 is used for real-time object detection but requires high performance and low power.
2. NRNS decomposes large integer operations into smaller ones using a nested set of prime moduli, enabling parallelization on the FPGA (a plain residue-number sketch follows after this list).
3. The authors implemented a Tiny YOLOv2 model using NRNS on a NetFPGA-SUME board, achieving 3.84 FPS at 3.5W power and 1.097 FPS/W efficiency.
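The residue arithmetic in point 2 can be illustrated with the plain (non-nested) RNS: a wide multiplication splits into independent small multiplications modulo pairwise co-prime moduli, and the product is recovered with the Chinese Remainder Theorem. The moduli below are arbitrary examples, not the ones used in the paper.

```python
from math import prod

MODULI = (13, 17, 19, 23)          # pairwise co-prime example moduli
M = prod(MODULI)                   # dynamic range of the representation

def to_rns(x):
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    """Multiply channel-wise: each small multiply is independent (parallel in HW)."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(residues):
    """Recover the integer with the Chinese Remainder Theorem."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): modular inverse (Python 3.8+)
        x %= M
    return x

assert from_rns(rns_mul(to_rns(1234), to_rns(567))) == (1234 * 567) % M
```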
ISMVL2018: A Ternary Weight Binary Input Convolutional Neural Network (Hiroki Nakahara)
This document summarizes a research paper that proposes a ternary weight binary input convolutional neural network (CNN).
The paper proposes using ternary (-1, 0, +1) weights instead of binary weights to improve recognition accuracy over binary CNNs. By setting many weights to zero, computations can be skipped, reducing operations. Experimental results show the ternary CNN model reduced non-zero weights to 5.3% while maintaining accuracy comparable to binary CNNs. Implementation on an ARM processor demonstrated the ternary CNN was 8 times faster than a binary CNN.
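A minimal NumPy sketch of the two ingredients described above, ternarization and zero-skipping, is given below; the threshold and sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize float weights to {-1, 0, +1}; values below the threshold become 0."""
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0
    return q.astype(np.int8)

def ternary_dot(x_bin, w_ter):
    """Dot product with binary inputs and ternary weights.

    Only non-zero weights contribute, so those positions are skipped entirely;
    with binary (+1/-1) inputs the 'multiply' is just a sign flip.
    """
    nz = w_ter != 0
    return int(np.sum(x_bin[nz] * w_ter[nz]))

w = ternarize(np.random.randn(64) * 0.1)
x = np.where(np.random.randn(64) >= 0, 1, -1).astype(np.int8)   # binary activations
print(ternary_dot(x, w), "using", int(np.count_nonzero(w)), "of", w.size, "weights")
```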
FPGA2018: A Lightweight YOLOv2: A binarized CNN with a parallel support vecto... (Hiroki Nakahara)
This document presents a mixed-precision convolutional neural network (CNN) called a Lightweight YOLOv2 for real-time object detection on an FPGA. The network uses binary precision for the feature extraction layers and half precision for the localization and classification layers. An FPGA implementation of the network achieves 40.81 FPS for object detection, outperforming an embedded GPU and CPU. Future work will apply this approach to other CNN-based applications such as semantic segmentation and pose estimation.
FPT17: An object detector based on multiscale sliding window search using a f... (Hiroki Nakahara)
1) The document describes an object detection system that uses a multiscale sliding window approach with fully pipelined binarized convolutional neural networks (BCNNs) implemented on an FPGA.
2) The system detects and classifies multiple objects in images by applying BCNNs to windows at different scales and locations, and suppresses overlapping detections (a generic sketch of the windowing and suppression follows after this list).
3) Experimental results on a Zynq UltraScale+ MPSoC FPGA demonstrate that the proposed pipelined BCNN architecture can achieve higher accuracy than GPU-based detectors while using less than 5W of power.
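As indicated in point 2, the detector enumerates windows over scales and positions and then suppresses overlaps. The following generic sketch illustrates those two steps; the window size, stride, scales, and IoU threshold are arbitrary choices, and the pipelined BCNN classifier itself is not shown.

```python
import numpy as np

def sliding_windows(img_h, img_w, win=64, stride=32, scales=(1.0, 0.5, 0.25)):
    """Enumerate (x, y, w, h) windows at several scales (values are illustrative)."""
    for s in scales:
        w = h = int(win / s)                     # larger windows at coarser scales
        for y in range(0, img_h - h + 1, int(stride / s)):
            for x in range(0, img_w - w + 1, int(stride / s)):
                yield (x, y, w, h)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the best box, drop heavy overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        ious = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[ious < iou_thr]
    return keep
```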
A Random Forest using a Multi-valued Decision Diagram on an FPGA (Hiroki Nakahara)
Presentation slides from ISMVL (Int'l Symp. on Multiple-Valued Logic), May 22nd, 2017, Novi Sad, Serbia. The random forest, realized with a multi-valued decision diagram on an FPGA, is a machine-learning implementation aimed at high performance and low power.
A digital spectrometer using an FPGA is proposed for use on a radio telescope. The spectrometer would provide high-resolution spectral analysis of wideband radio frequency signals received by the telescope. To achieve high throughput on the FPGA, a nested residue number system is used to implement the fast Fourier transforms in the spectrometer. This decomposes large moduli into smaller nested ones, allowing uniform circuit sizes and enabling fully parallel implementation of the arithmetic.
53. Comparison with existing implementation results
Implementation (Year)    Zhao et al. (2017) [1]   FINN (2017) [2]   Prost-Boucle et al. (2017) [3]   Ours (2019)
CNN                      Binary                   Binary            Ternary                          Noise
Clock (MHz)              143                      166               250                              199
#LUTs                    46900                    42823             67300                            40911
#18Kb BRAMs              94                       270               667                              228
#DSP48Es                 3                        32                0                                192
Accuracy (%)             87.73                    80.10             86.71                            92.35
Time [msec] (FPS)        5.94 (168)               2.24 (445)        2.36 (423)                       1.80 (557)
Power [W]                4.7                      2.5               6.8                              3.5
Faster and more accurate than the binary and ternary designs, but DSP blocks are required.
Evaluated with a VGG9-based CNN on the CIFAR-10 dataset.
[1] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, "Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs," ISFPGA, 2017, pp. 15-24.
[2] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference," ISFPGA, 2017.
[3] A. Prost-Boucle, A. Bourge, F. Pétrot, H. Alemdar, N. Caldwell, and V. Leroy, "Scalable High-Performance Architecture for Convolutional Ternary Neural Networks on FPGA," FPL, 2017, pp. 1-7.