
Jetson AGX Xavier: Introduction and Usage
??
Jemin Lee
https://leejaymin.github.io/index.html

Contents
1. Board overview
2. Installation: JetPack, TensorFlow
3. YOLOv3: running and optimization
4. NVDLA
5. ??? ???? ? ?? ??

• AI server performance in 30 W, 15 W, and 10 W power modes
• 512 Volta CUDA cores and 2x NVDLA
• 8-core CPU
• 32 DL TOPS
??? ?
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf

Model Number: Tegra194
Name: Xavier
• 8x Volta SM at 1377 MHz
• 512 CUDA cores, 64 Tensor Cores
• 22 TOPS INT8, 11 TFLOPS FP16
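The peak figures above can be roughly reproduced from the core counts and the clock. The sketch below assumes, beyond what the slide states, that each Volta Tensor Core performs 64 FP16 fused multiply-adds per clock and that INT8 runs at twice the FP16 rate; both are assumptions to check against NVIDIA's documentation.

```python
# Figures taken from the slide.
SM_COUNT = 8
GPU_CLOCK_HZ = 1377e6
CUDA_CORES = 512
TENSOR_CORES = 64

# Assumption (not on the slide): one Tensor Core does 64 FP16 FMAs per clock,
# and one FMA counts as 2 floating-point operations.
FMA_PER_TENSOR_CORE_PER_CLOCK = 64
OPS_PER_FMA = 2
# Assumption (not on the slide): INT8 throughput is twice the FP16 throughput.
INT8_SPEEDUP_OVER_FP16 = 2

cuda_cores_per_sm = CUDA_CORES // SM_COUNT
tensor_cores_per_sm = TENSOR_CORES // SM_COUNT

fp16_tflops = (TENSOR_CORES * FMA_PER_TENSOR_CORE_PER_CLOCK
               * OPS_PER_FMA * GPU_CLOCK_HZ) / 1e12
int8_tops = fp16_tflops * INT8_SPEEDUP_OVER_FP16

print(f"CUDA cores per SM:   {cuda_cores_per_sm}")
print(f"Tensor Cores per SM: {tensor_cores_per_sm}")
print(f"Peak FP16: {fp16_tflops:.2f} TFLOPS")
print(f"Peak INT8: {int8_tops:.2f} TOPS")
```

Under these assumptions the results land close to the 11 TFLOPS FP16 and 22 TOPS INT8 quoted on the slide.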
GPU
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf

??? ??
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf


Release: JetPack 4.1.1 Developer Preview (18.11.08)
Components
• OS image
- L4T 31.1: Ubuntu 18.04 (stability and security fixes)
• Libraries
- TensorRT 5.0.3.2-1 (the latest version: 5.0.4)
- cuDNN 7.3.1
- CUDA 10
- OpenCV, Multimedia API, VisionWorks
• Developer tools
- CUDA tools
- NVIDIA Nsight Systems 2018.1
  • Profiling on Jetson AGX Xavier
  • Ability to trace cuDNN, cuBLAS, and OS runtime library API calls
- NVIDIA Nsight Graphics 2018.6
  • Debugging and profiling
  • Resource monitoring
Jetpack ??

(1) Download JetPack installer to your Linux host computer.
(2) Connect your developer kit to the Linux host computer.
(3) Put your developer kit into Force Recovery Mode.
(4) Run JetPack installer to select and install desired components.
JetPack installation guide

Jetpack? ??? ?? Samples
[1] https://elinux.org/Jetson_AGX_Xavier

VisionWorks: Feature Tracker Demo

TensorFlow ?? ?? ???: Link, Blog
Jetpack ??? ??
• tf_gpu-1.12.1+nv19.1-py3
• tf_gpu-1.12.1+nv19.1-py2
• tf_gpu-1.12.1+nv18.12-py3
• tf_gpu-1.12.1+nv18.12-py2
• tf_gpu-1.12.0rc2+nv18.11-py3
• tf_gpu-1.12.0rc2+nv18.11-py2
?? ??
• Install JetPack 4.1.1 Developer Preview
• Install HDF5
- sudo apt-get install libhdf5-serial-dev hdf5-tools
• Install pip3 (python3.6)
• Install the following packages:
- pip3 install --upgrade pip
- sudo apt-get install zlib1g-dev zip libjpeg8-dev libhdf5-dev
- sudo pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor
TensorFlow installation (1)

Tensorflow-gpu ?? ? ??
• Installing TensorFlow
- pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v411 tensorflow-gpu
• Available versions can be checked at the following link.
- https://developer.download.nvidia.com/compute/redist/jp/v411/tensorflow-gpu/
- Latest stable version at the time of writing: 1.12
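The install command above can be assembled programmatically, for instance in a provisioning script. This is a minimal sketch; the helper names are hypothetical, and the rule that the path component is "v" followed by the JetPack version without dots is an assumption generalized from the single v411 URL on the slide.

```python
BASE_URL = "https://developer.download.nvidia.com/compute/redist/jp"


def jetpack_index_url(jetpack_version):
    """Return the pip index URL for a JetPack version string such as '4.1.1'."""
    # Assumption: the directory name is 'v' + version digits with dots removed,
    # inferred only from the v411 URL shown on the slide.
    digits = jetpack_version.replace(".", "")
    if not digits.isdigit():
        raise ValueError("unexpected JetPack version: %r" % jetpack_version)
    return "%s/v%s" % (BASE_URL, digits)


def pip_install_command(jetpack_version, package="tensorflow-gpu"):
    """Build the pip3 argument list used on the slide."""
    return ["pip3", "install", "--extra-index-url",
            jetpack_index_url(jetpack_version), package]


print(" ".join(pip_install_command("4.1.1")))
```

The resulting list can be passed to `subprocess.run` on the device; here it is only printed.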
TensorFlow installation (2)

NVPMODEL
• The default mode is Mode ID=2, which runs at 15 W
• Related commands
- sudo nvpmodel -q (for current mode); add the --verbose option for more detail
- sudo nvpmodel -m 0 (for changing mode, persists after reboot)
- sudo ~/tegrastats (for monitoring clocks & core utilization)
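For scripted benchmarking it can help to wrap the nvpmodel calls listed above. The sketch below only builds the argument lists from the flags shown on the slide and runs them when the tool is present; the function names are hypothetical, and no assumption is made about the format of the tool's output.

```python
import shutil
import subprocess


def nvpmodel_query_cmd(verbose=False):
    """Argument list for querying the current power mode."""
    cmd = ["sudo", "nvpmodel", "-q"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def nvpmodel_set_cmd(mode_id):
    """Argument list for switching power mode (persists after reboot)."""
    if mode_id < 0:
        raise ValueError("mode ID must be non-negative")
    return ["sudo", "nvpmodel", "-m", str(mode_id)]


def run_if_available(cmd):
    """Run cmd only when nvpmodel is on PATH; return stdout, else None."""
    if shutil.which("nvpmodel") is None:
        return None  # not on a Jetson, nothing to do
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


print(" ".join(nvpmodel_set_cmd(0)))
print(" ".join(nvpmodel_query_cmd(verbose=True)))
```

The `stdout=subprocess.PIPE` form is used instead of `capture_output` so the sketch also runs on Python 3.6.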
?? ?? ??

Configuration file: /etc/nvpmodel.conf
?? ?? ??

jetson_clocks.sh --show
?? ??

MNIST TF CNN ?? ?? ??? ??? ?? ??
• https://github.com/leejaymin/TensorFlowLecture/tree/master/5.CNN
?? ?? ???


Most accurate: Faster R-CNN with Inception-ResNet and 300 proposals (about 1 frame per second)
• An ensemble model would be better
Fastest: SSD with MobileNet, YOLOv3
? ??? Single shot multibox detection (SSD) ??
Object detection: speed and accuracy comparison

?? ? ??? ?? (??? ??)
Makefile changes
• Set OPENCV=1, GPU=1, and CUDNN=1
• -gencode arch=compute_72,code=[sm_72,compute_72]
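The -gencode entry above encodes Xavier's compute capability, 7.2. As a small illustration of how the flag is composed, the hypothetical helper below builds the same string from a capability given as "major.minor".

```python
def gencode_flag(compute_capability):
    """Build an nvcc -gencode flag from a 'major.minor' compute capability."""
    major, minor = compute_capability.split(".")
    cc = "%d%d" % (int(major), int(minor))
    # Same layout as the slide: native SASS (sm_XX) plus PTX (compute_XX).
    return "-gencode arch=compute_%s,code=[sm_%s,compute_%s]" % (cc, cc, cc)


print(gencode_flag("7.2"))
```

Passing the capability of another GPU yields the matching entry for a multi-target build.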
???
Yolov3 ?? ? ???

Test image: dog.jpg
• Xavier: 0.164729 sec
• GeForce GTX 1080: 0.051647 sec
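The two single-image timings above translate into frame rates and a relative slowdown as follows. This is plain arithmetic on the numbers from the slide; a single image is not a rigorous benchmark.

```python
# Per-image inference times from the slide, in seconds.
timings_sec = {"xavier": 0.164729, "geforce_1080": 0.051647}

# Frames per second is the reciprocal of the per-image latency.
fps = {name: 1.0 / seconds for name, seconds in timings_sec.items()}

# How many times longer Xavier takes than the desktop GPU.
slowdown = timings_sec["xavier"] / timings_sec["geforce_1080"]

for name in timings_sec:
    print(f"{name}: {fps[name]:.2f} frames/s")
print(f"Xavier takes {slowdown:.2f}x as long per image")
```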
?? ???

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights traffic.mp4
Live demo
??? ??

Deep Learning Inference Engine (TensorRT)
• High-performance deep learning inference runtime for production deployment
Deep Learning Primitives (cuDNN)
• High-performance building blocks for deep neural network applications, including convolutions, activation functions, and tensor transformations
Optimization with TensorRT

Compile and optimize neural networks, with support for every framework and optimization for each target platform
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning
• Mixed-precision INT8/FP16 support
TensorRT v5
• Volta GPU INT8 Tensor Cores (HMMA/IMMA)
• Early-access DLA FP16 support
• Fine-grained control of DLA layers and GPU fallback
TensorRT
[1] https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html

Using TensorRT in TensorFlow (TF-TRT)
• tensorflow/tensorflow/contrib/tensorrt/
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt
TensorRT? ?? ??

????
• https://github.com/vat-nvidia/deepstream-plugins#note
• sources/apps/trt-yolo
??
• Download the YOLO model weights and place them in data/yolo
Makefile.config
• CUDA_VER:=10.0
• PLATFORM:=Tegra
• cd sources/apps/trt-yolo && make && sudo make install
• config/yolov3.txt
• trt-yolo-app --flagfile=./config/yolov3.txt
Trt-YOLOv3 ??

--network_type=yolov3
--config_file_path=data/yolo/yolov3.cfg
--wts_file_path=data/yolo/yolov3.weights
--labels_file_path=data/yolo/labels.txt
--precision= [kINT8, kHALF, kFLOAT]
--calibration_table_path=data/yolo/calibration/yolov3-calibration.table
--engine_file_path=
--print_prediction_info=true
--print_perf_info=true
--batch_size=4
--view_detections=true
--save_detections=true
--save_detections_path=data/yolo/detections/
--decode=false
--seed
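A flagfile such as the one above is a list of `--name=value` lines. The sketch below is a stand-alone reader for sanity-checking a flagfile before launching the app; it is not the parser used by trt-yolo-app, and the helper names are hypothetical. The set of precisions is taken from the bracketed choices on the slide.

```python
ALLOWED_PRECISIONS = {"kINT8", "kHALF", "kFLOAT"}


def parse_flagfile(text):
    """Parse '--name=value' lines into a dict; bare '--name' maps to None."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith("--"):
            raise ValueError("not a flag line: %r" % raw)
        name, sep, value = line[2:].partition("=")
        flags[name.strip()] = value.strip() if sep else None
    return flags


def check_precision(flags):
    """Return the precision flag, or raise if it is not one of the choices."""
    precision = flags.get("precision")
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError("precision must be one of %s" % sorted(ALLOWED_PRECISIONS))
    return precision


# Hypothetical excerpt with one precision chosen from the slide's list.
EXAMPLE = """\
--network_type=yolov3
--precision=kHALF
--engine_file_path=
--batch_size=4
--seed
"""

flags = parse_flagfile(EXAMPLE)
print(flags)
print(check_precision(flags))
```

All values stay strings; converting `batch_size` to an integer is left to the caller.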
./config/yolov3.txt ??

FP32: full precision

FP16: half precision
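To make the FP32 and FP16 labels concrete, the sketch below rounds a few values to each format using only the Python standard library. It illustrates the number formats themselves, not what TensorRT does internally when it builds a reduced-precision engine.

```python
import struct


def to_fp32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 65504.0 is the largest finite half-precision value.
for value in (0.1, 1.0, 1000.1, 65504.0):
    print(f"{value!r:>8}  fp32={to_fp32(value)!r}  fp16={to_fp16(value)!r}")
```

Half precision keeps about three significant decimal digits, which is why weights and activations with a wide dynamic range are the first thing to check when FP16 accuracy drops.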


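NVDLA is a free and open architecture released by NVIDIA that promotes a standard way to design deep learning inference accelerators.
Hardware components
• Convolution Core: optimized high-performance convolution engine
• Single Data Point Processor: single-point lookup engine for activation functions; supports linear and non-linear functions
• Planar Data Processor: planar averaging engine for pooling
• Cross-Channel Data Processor: multi-channel averaging engine for local normalization
• Data Reshape Engines: memory-to-memory transformation acceleration for tensor reshape and copy operations, i.e. splitting, slicing, merging, contraction, and reshape-transpose
• Bridge DMA: moves data between system DRAM and the dedicated memory interface
FPGA: Amazon EC2 F1 environment (Verilog code available)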
??? ????
NVDLA
[1] http://nvdla.org/

Xavier? ????? open source NVDLA? ???
2x DLA engines: 5 TOPS INT8, 2.5 TFLOPS FP16 per DLA
Optimized for energy efficiency (500-1500 mW)
Xavier's NVDLA can be used through TensorRT v5
• DLA: supported layers
- Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale
• Verified networks: AlexNet, GoogLeNet, ResNet-50, LeNet for MNIST
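The per-DLA figures above appear to account for the gap between the GPU's 22 TOPS INT8 and the 32 DL TOPS quoted for the whole module. The sketch below just adds the numbers from the slides; treating the headline figure as GPU plus both DLAs is an interpretation, not something the slides state explicitly.

```python
# Figures from the slides.
GPU_INT8_TOPS = 22
GPU_FP16_TFLOPS = 11
DLA_COUNT = 2
DLA_INT8_TOPS_EACH = 5
DLA_FP16_TFLOPS_EACH = 2.5

# Interpretation (assumption): the module total is GPU plus all DLA engines.
total_int8_tops = GPU_INT8_TOPS + DLA_COUNT * DLA_INT8_TOPS_EACH
total_fp16_tflops = GPU_FP16_TFLOPS + DLA_COUNT * DLA_FP16_TFLOPS_EACH

print(f"INT8 total: {total_int8_tops} TOPS")
print(f"FP16 total: {total_fp16_tflops} TFLOPS")
```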
NVIDIA Deep Learning Accelerator (DLA)
[1] http://nvdla.org/primer.html

General restrictions
• Max batch size 32
• Input and output tensor data format FP16
Layer-specific restrictions
• Convolution and Deconvolution Layers
- Width and height of kernel size must be in the range [1, 32]
- Width and height of padding must be in the range [0, 31]
- Width and height of stride must be in the range [1, 8] for Convolution Layer and [1, 32] for Deconvolution Layer
- Number of output maps must be in the range [1, 8192]
- Axis must be 1
- Grouped and dilated convolution supported. Dilation values must be in the range [1, 32]
• Pooling Layer
- Operations supported: kMIN, kMAX, kAVERAGE
- Width and height of the window size must be in the range [1, 8]
- Width and height of padding must be in the range [0, 7]
- Width and height of stride must be in the range [1, 16]
• Activation Layer
- Functions supported: ReLU, Sigmoid, Hyperbolic Tangent
  • Negative slope not supported for ReLU
• ElementWise Layer
- Operations supported: Sum, Product, Max, and Min
• Scale Layer
- Mode supported: Uniform, Per-Channel, and Elementwise
• LRN (Local Response Normalization) Layer
- Window size is configurable to 3, 5, 7, or 9
- Normalization region supported is: ACROSS_CHANNELS
• Concatenation Layer
- DLA supports concatenation only along the channel axis
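The numeric ranges above lend themselves to a quick pre-check of a network description before trying to build a DLA engine. The sketch below covers only the convolution, deconvolution, and pooling ranges and the batch limit from the list; the function names are hypothetical, and TensorRT's own builder remains the authority on what actually runs on DLA.

```python
def _check_dims(problems, name, dims, low, high):
    """Append a message for every (height, width) entry outside [low, high]."""
    for axis, value in zip(("height", "width"), dims):
        if not low <= value <= high:
            problems.append("%s %s=%d outside [%d, %d]" % (name, axis, value, low, high))


def check_dla_convolution(kernel, padding, stride, num_output_maps,
                          dilation=(1, 1), deconvolution=False):
    """Return a list of violated restrictions (empty if the layer fits)."""
    problems = []
    _check_dims(problems, "kernel size", kernel, 1, 32)
    _check_dims(problems, "padding", padding, 0, 31)
    _check_dims(problems, "stride", stride, 1, 32 if deconvolution else 8)
    _check_dims(problems, "dilation", dilation, 1, 32)
    if not 1 <= num_output_maps <= 8192:
        problems.append("output maps=%d outside [1, 8192]" % num_output_maps)
    return problems


def check_dla_pooling(window, padding, stride):
    """Return a list of violated pooling restrictions."""
    problems = []
    _check_dims(problems, "window size", window, 1, 8)
    _check_dims(problems, "padding", padding, 0, 7)
    _check_dims(problems, "stride", stride, 1, 16)
    return problems


def check_dla_batch(batch_size):
    """True when the batch size is within the DLA limit of 32."""
    return 1 <= batch_size <= 32


# A 3x3, stride-1, pad-1 convolution with 1024 filters fits.
print(check_dla_convolution((3, 3), (1, 1), (1, 1), 1024))
# A stride of 16 is too large for convolution but allowed for deconvolution.
print(check_dla_convolution((3, 3), (1, 1), (16, 16), 64))
print(check_dla_convolution((3, 3), (1, 1), (16, 16), 64, deconvolution=True))
```

Note that passing these range checks is not sufficient on its own: YOLOv3, for example, uses leaky ReLU, and the activation entry above says a negative slope is not supported, so such layers would rely on GPU fallback.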
DLA Supported Layers
[1] http://nvdla.org/primer.html

Accessible through TensorRT
trtexec tool: command-line wrapper for TensorRT
Useful for benchmarking networks on random data and for generating serialized engines
Changes are listed in the TensorRT release notes
• useDLA -> useDLACore
• Index range changed from 1..N to 0..N-1
• ONNX models can be run on DLA through trtexec
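The renaming and re-indexing noted above is easy to get wrong when porting old scripts. The sketch below converts a 1-based useDLA index to the 0-based useDLACore index and assembles a partial trtexec argument list. The `--allowGPUFallback` and `--fp16` flags are written as recalled from the TensorRT 5 trtexec tool and should be confirmed with `trtexec --help`; the model arguments are deliberately left out, and the helper names are hypothetical.

```python
def dla_core_index(legacy_index):
    """Convert a 1-based useDLA index to the 0-based useDLACore index."""
    if legacy_index < 1:
        raise ValueError("legacy useDLA indices start at 1")
    return legacy_index - 1


def trtexec_dla_args(dla_core, num_cores=2):
    """Partial trtexec argument list for FP16 on one DLA core with GPU fallback."""
    # num_cores=2 reflects the two DLA engines on Xavier.
    if not 0 <= dla_core < num_cores:
        raise ValueError("DLA core must be in 0..%d" % (num_cores - 1))
    return ["trtexec",
            "--useDLACore=%d" % dla_core,
            "--allowGPUFallback",  # assumption: flag name as in TensorRT 5
            "--fp16"]              # assumption: flag name as in TensorRT 5


print(" ".join(trtexec_dla_args(dla_core_index(1))))
print(" ".join(trtexec_dla_args(dla_core_index(2))))
```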
DLA ??
[1] https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_topic

AlexNet: ImageNet challenge 2012 winner
• GPU-INT8
- Average over 100 runs is 4.86918 ms (host walltime is 4.88175 ms, 99% percentile time is 4.96976).
• GPU-FP16
• DLA0 and GPU fallback, FP16
• DLA1 and GPU fallback, FP16
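When comparing the configurations above, it is convenient to pull the numbers out of the timing line rather than copy them by hand. The sketch below parses the GPU-INT8 line quoted on the slide and derives a throughput; a batch size of 1 is an assumption, since the slide does not state the batch size used.

```python
import re

LOG_LINE = ("Average over 100 runs is 4.86918 ms (host walltime is 4.88175 ms, "
            "99% percentile time is 4.96976).")

PATTERN = re.compile(
    r"Average over (?P<runs>\d+) runs is (?P<avg_ms>[\d.]+) ms "
    r"\(host walltime is (?P<host_ms>[\d.]+) ms, "
    r"(?P<pct>\d+)% percentile time is (?P<pct_ms>[\d.]+)\)"
)


def parse_timing(line):
    """Extract run count and timings (in ms) from a trtexec-style summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized timing line")
    return {
        "runs": int(match.group("runs")),
        "avg_ms": float(match.group("avg_ms")),
        "host_ms": float(match.group("host_ms")),
        "percentile": int(match.group("pct")),
        "percentile_ms": float(match.group("pct_ms")),
    }


BATCH_SIZE = 1  # assumption: not stated on the slide

timing = parse_timing(LOG_LINE)
images_per_sec = BATCH_SIZE * 1000.0 / timing["avg_ms"]
print(timing)
print(f"{images_per_sec:.1f} images/s at batch size {BATCH_SIZE}")
```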
AlexNet ??

ResNet-50: https://github.com/KaimingHe/deep-residual-networks
• ImageNet challenge 2015 winner
• GPU-INT8
• GPU-FP16
• DLA0 and GPU fallback, FP16
• DLA1 and GPU fallback, FP16
ResNet


NVIDIA?? ???? 2? ??? ?? ????
??? ?? ?? ??
$ git clone https://github.com/dusty-nv/jetson-inference
Two Days to a Demo

NeurIPS Expo 2018 - Session 3: Inference and Quantization
• Mixed Precision Networks
NVIDIA at NeurIPS 2018, 2-8 Dec. 2018
[1] https://www.nvidia.com/en-us/events/neurips/
[2] https://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php

Auto-Tuner? ???
? ?? ??? ??? ??? Quantization ??? ??? [1]
? ???? ??? ??? [TensorRTv5. NIPS2018 workshop]
? ??? ??? precision ??? Calibration ???? ???? ??? ??
[TensorRT]
? Mixed Precision? ? ?? ??? ???: NIPS 2018.12 workshop
?? ??
? NVIDIA TensorRT
? Google Firebase? Cloud AutoML? ?? ?? (Alpha Test ?? ?)
- Custom On-Device ML Models with Learn2Compress
? ?? ??? ??
DLA (NPU) ??
? ?? ??? ?? operation? ???? ???? ?? ??
? Precision ?? ??? ???? ??? GPU ?? ?? ??
? TensorRT? ??? ????? ?? ???? ?
- NPU ??? ??? ??? SDK ??
- GPU? ?????? ??? ? ??? ????? ??? ?? ?? ?? ? ????
???? ??? ??? ???? ?? ?? ?? ???
??? NPU?? ?? ??? ? ??? ??
?? ??
[1] Value-aware Quantization for Training and Inference of Neural Networks, ECCV 2018
