Jetson AGX Xavier Introduction and Usage
Jemin Lee
https://leejaymin.github.io/index.html
Contents
1. Hardware Overview
2. Installation: JetPack, TensorFlow
3. YOLOv3 Execution and Optimization
4. NVDLA
5. Related Projects and Future Work
• AI Server Performance in 30W, 15W, and 10W
• 512 Volta CUDA Cores and 2x NVDLA
• 8-core CPU
• 32 DL TOPS
Hardware Overview
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf
Model Number: Tegra194
Name: Xavier
• 8x Volta SM @ 1377 MHz
• 512 CUDA cores, 64 Tensor Cores
• 22 TOPS INT8, 11 TFLOPS FP16
GPU
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf
(Figure slide: see [1] for details)
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf
Contents
1. Hardware Overview
2. Installation: JetPack, TensorFlow
3. YOLOv3 Execution and Optimization
4. NVDLA
5. Related Projects and Future Work
Version: JetPack 4.1.1 Developer Preview (18.11.08)
Components
• OS Image
- L4T 31.1: Ubuntu 18.04 (stability and security fixes)
• Libraries
- TensorRT 5.0.3.2-1 (the latest version: 5.0.4)
- cuDNN 7.3.1
- CUDA 10
- OpenCV, Multimedia API, VisionWorks
• Developer Tools
- CUDA tools
- NVIDIA Nsight Systems 2018.1
  - Profiling on Jetson AGX Xavier
  - Ability to trace cuDNN, cuBLAS, and OS runtime library API calls
- NVIDIA Nsight Graphics 2018.6
  - Debugging and profiling
  - Resource monitoring
JetPack components
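After flashing, the component versions listed above can be double-checked on the target. A minimal sketch; the exact file and package names are assumptions about a typical L4T/JetPack install:
# Package and file names below are assumptions; adjust to your install
$ head -n 1 /etc/nv_tegra_release          # L4T release string (31.1 for this JetPack)
$ /usr/local/cuda/bin/nvcc --version       # CUDA toolkit version
$ dpkg -l | grep -E "nvinfer|cudnn"        # TensorRT and cuDNN package versions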
(1) Download JetPack installer to your Linux host computer.
(2) Connect your developer kit to the Linux host computer.
(3) Put your developer kit into Force Recovery Mode.
(4) Run JetPack installer to select and install desired components.
JetPack installation steps
Samples installed with JetPack
[1] https://elinux.org/Jetson_AGX_Xavier
VisionWorks: Feature Tracker Demo
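A sketch of building and launching the Feature Tracker demo from the VisionWorks samples that JetPack installs; the sample script path, build output directory, and binary name are assumptions based on a typical VisionWorks layout:
$ /usr/share/visionworks/sources/install-samples.sh ~/
$ cd ~/VisionWorks-*-Samples && make -j$(nproc)
$ ./bin/aarch64/linux/release/nvx_demo_feature_tracker   # runs the feature tracker on the bundled sample video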
TensorFlow installation references: Link, Blog
Available wheel versions for JetPack:
• tf_gpu-1.12.1+nv19.1-py3
• tf_gpu-1.12.1+nv19.1-py2
• tf_gpu-1.12.1+nv18.12-py3
• tf_gpu-1.12.1+nv18.12-py2
• tf_gpu-1.12.0rc2+nv18.11-py3
• tf_gpu-1.12.0rc2+nv18.11-py2
Installation steps
• Install JetPack 4.1.1 Developer Preview
• Install HDF5
- apt-get install libhdf5-serial-dev hdf5-tools
• Install pip3 (python3.6)
• Install the following packages:
- pip3 install --upgrade pip
- sudo apt-get install zlib1g-dev zip libjpeg8-dev libhdf5-dev
- sudo pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker grpcio six mock requests gast h5py astor termcolor
TensorFlow Installation (1)
Installing and verifying tensorflow-gpu
• Installing TensorFlow:
- pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v411 tensorflow-gpu
• The available wheel versions can be browsed at:
- https://developer.download.nvidia.com/compute/redist/jp/v411/tensorflow-gpu/
- The latest stable version at the time of writing: 1.12
TensorFlow Installation (2)
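Once the wheel is installed, a quick sanity check (a sketch; it assumes the same python3 interpreter that pip3 installed into) confirms the GPU build is active:
$ python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"
# Expect the installed version (1.12.x) and True if the Volta GPU is visible to TensorFlow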
NVPMODEL
• The default mode is Mode ID=2, i.e., 15 W
• Related commands
- sudo nvpmodel -q (for current mode); add the --verbose option for more detail
- sudo nvpmodel -m 0 (for changing mode, persists after reboot)
- sudo ~/tegrastats (for monitoring clocks & core utilization)
Power mode configuration
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf
Mode definitions: /etc/nvpmodel.conf
Power mode configuration
[1] http://info.nvidia.com/rs/156-OFN-742/images/Jetson_AGX_Xavier_New_Era_Autonomous_Machines.pdf
jetson_clocks.sh --show
Clock settings
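Putting nvpmodel and jetson_clocks.sh together, a typical tuning sequence looks like the sketch below; the scripts are assumed to live in the home directory, as with tegrastats above:
$ sudo nvpmodel -q --verbose      # query the current power mode (verbose output)
$ sudo nvpmodel -m 0              # switch to the maximum-performance mode (persists across reboots)
$ sudo ~/jetson_clocks.sh         # pin clocks to the maximum allowed by the selected mode
$ sudo ~/jetson_clocks.sh --show  # display the resulting clock settings
$ sudo ~/tegrastats               # monitor clocks and core utilization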
Operation verified by running an MNIST TF CNN example
• https://github.com/leejaymin/TensorFlowLecture/tree/master/5.CNN
Basic functionality test
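One way to run that check on the board is sketched below; the script name mnist_cnn.py is hypothetical, so substitute the actual entry point in the 5.CNN directory:
$ git clone https://github.com/leejaymin/TensorFlowLecture
$ cd TensorFlowLecture/5.CNN
$ python3 mnist_cnn.py     # hypothetical entry point; use whichever script this directory provides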
Contents
1. Hardware Overview
2. Installation: JetPack, TensorFlow
3. YOLOv3 Execution and Optimization
4. NVDLA
5. Related Projects and Future Work
Most accurate: Faster R-CNN with Inception ResNet, 300 proposals (1 frame)
• An ensemble model would be better
Fastest: SSD with MobileNet, YOLOv3
• Here SSD is short for single-shot multibox detection
Object detection: speed and accuracy comparison
Getting the source and compiling it
Modify the Makefile
• Set the OpenCV, GPU, and cuDNN flags to 1
• -gencode arch=compute_72,code=[sm_72,compute_72]
Build (see the consolidated sketch below)
YOLOv3: source and build
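Those edits condense into the following sketch; the repository URL is an assumption (the widely used upstream darknet), while the flag values and the compute_72 gencode line come from the slide above:
$ git clone https://github.com/pjreddie/darknet && cd darknet
# Turn on GPU, cuDNN, and OpenCV support in the Makefile, as described above
$ sed -i -e 's/^GPU=0/GPU=1/' -e 's/^CUDNN=0/CUDNN=1/' -e 's/^OPENCV=0/OPENCV=1/' Makefile
# Add the Xavier (Volta, SM 7.2) entry to the ARCH list:
#   -gencode arch=compute_72,code=[sm_72,compute_72]
$ make -j$(nproc)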
Test image: dog.jpg
• Xavier: 0.164729 sec
• GeForce 1080: 0.051647 sec
Performance test
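The single-image timing can be reproduced with darknet's standard detector test command (a sketch; paths assume the default darknet tree, where dog.jpg ships under data/):
$ ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
# Prints the detected objects and the per-image inference time compared above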
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights traffic.mp4
Live demo
Video demo
Deep Learning Inference Engine (TensorRT)
• High-performance deep learning inference runtime for production deployment
Deep Learning Primitives (cuDNN)
• High-performance building blocks for deep neural network applications, including convolutions, activation functions, and tensor transformations
TensorRT and cuDNN
Compiles and optimizes neural networks; supports every major framework; optimizes for each target platform
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning
• Mixed-precision INT8/FP16 support
TensorRT 5
• Volta GPU INT8 Tensor Cores (HMMA/IMMA)
• Early-access DLA FP16 support
• Fine-grained control of DLA layers and GPU fallback
TensorRT
[1] https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html
Using TensorRT in TensorFlow (TF-TRT)
• tensorflow/tensorflow/contrib/tensorrt/
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt
How to use TensorRT
Source code
• https://github.com/vat-nvidia/deepstream-plugins#note
• sources/apps/trt-yolo
Preparation
• Download the YOLO model weights in advance and place them under data/yolo
Makefile.config
• CUDA_VER:=10.0
• PLATFORM:=Tegra
Build and run
• cd sources/apps/trt-yolo
• make && sudo make install
• config/yolov3.txt
• trt-yolo-app --flagfile=./config/yolov3.txt (see the consolidated sketch below)
Running Trt-YOLOv3
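The steps above condense into the following shell sketch; the weight/config source paths are placeholders, and any copy of yolov3.cfg and yolov3.weights placed in data/yolo works:
$ git clone https://github.com/vat-nvidia/deepstream-plugins
$ cd deepstream-plugins
# Place the YOLOv3 network definition and pretrained weights where the app expects them
$ cp /path/to/yolov3.cfg /path/to/yolov3.weights data/yolo/
# Edit Makefile.config: set CUDA_VER:=10.0 and PLATFORM:=Tegra, then build and run
$ cd sources/apps/trt-yolo && make && sudo make install
$ trt-yolo-app --flagfile=./config/yolov3.txt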
--network_type=yolov3
--config_file_path=data/yolo/yolov3.cfg
--wts_file_path=data/yolo/yolov3.weights
--labels_file_path=data/yolo/labels.txt
--precision= [kINT8, kHALF, kFLOAT]
--calibration_table_path=data/yolo/calibration/yolov3-calibration.table
--engine_file_path=
--print_prediction_info=true
--print_perf_info=true
--batch_size=4
--view_detections=true
--save_detections=true
--save_detections_path=data/yolo/detections/
--decode=false
--seed
Options in ./config/yolov3.txt
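For example, a minimal flag file for an FP16 (kHALF) run could combine the options above as follows; this is only a sketch, and the paths assume the data/yolo layout from the previous slide:
--network_type=yolov3
--config_file_path=data/yolo/yolov3.cfg
--wts_file_path=data/yolo/yolov3.weights
--labels_file_path=data/yolo/labels.txt
--precision=kHALF
--batch_size=4
--print_perf_info=true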
FP32: full precision
FP16: half precision
Contents
1. Hardware Overview
2. Installation: JetPack, TensorFlow
3. YOLOv3 Execution and Optimization
4. NVDLA
5. Related Projects and Future Work
NVIDIA's open-source accelerator design
NVDLA
An open-source deep learning accelerator architecture released by NVIDIA; the design is freely available for anyone to use and extend.
Hardware components
• Convolution Core: optimized high-performance convolution engine
• Single Data Point Processor: applies activation functions, supporting both linear and non-linear functions
• Planar Data Processor: used for pooling
• Cross-Channel Data Processor: used for local normalization
• Data Reshape Engines: memory-to-memory transformation acceleration for tensor reshape operations, e.g., splitting, slicing, merging, contraction, reshape-transpose
• Bridge DMA: moves data between the system DRAM and the accelerator's memory interface
Can be prototyped on an FPGA in the Amazon EC2 F1 environment (Verilog code is provided)
Open architecture
[1] http://nvdla.org/
Xavier's accelerator is an implementation of the open-source NVDLA
2x DLA engines: 5 TOPS INT8, 2.5 TFLOPS FP16 per DLA
Optimized for energy efficiency (500-1500 mW)
The Xavier NVDLA is accessed only through TensorRT 5
• DLA: supported layers
- Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale
• Validated networks: AlexNet, GoogLeNet, ResNet-50, LeNet for MNIST
NVIDIA Deep Learning Accelerator (DLA)
[1] http://nvdla.org/primer.html
General restrictions
• Max batch size 32
• Input and output tensor data format FP16
Per-layer restrictions
• Convolution and Deconvolution Layers
- Width and height of kernel size must be in the range [1, 32]
- Width and height of padding must be in the range [0, 31]
- Width and height of stride must be in the range [1, 8] for Convolution Layer and [1, 32] for Deconvolution Layer
- Number of output maps must be in the range [1, 8192]
- Axis must be 1
- Grouped and dilated convolution supported. Dilation values must be in the range [1, 32]
• Pooling Layer
- Operations supported: kMIN, kMAX, kAVERAGE
- Width and height of the window size must be in the range [1, 8]
- Width and height of padding must be in the range [0, 7]
- Width and height of stride must be in the range [1, 16]
• Activation Layer
- Functions supported: ReLU, Sigmoid, Hyperbolic Tangent
- Negative slope not supported for ReLU
• ElementWise Layer
- Operations supported: Sum, Product, Max, and Min
• Scale Layer
- Mode supported: Uniform, Per-Channel, and Elementwise
• LRN (Local Response Normalization) Layer
- Window size is configurable to 3, 5, 7, or 9
- Normalization region supported is: ACROSS_CHANNELS
• Concatenation Layer
- DLA supports concatenation only along the channel axis
DLA Supported Layers
[1] http://nvdla.org/primer.html
The DLA is used through TensorRT
trtexec tool: a command-line wrapper for TensorRT
• Useful for benchmarking networks without writing a dedicated application
• Can also generate a serialized engine from a model
• DLA-related changes are summarized in the TensorRT release notes:
- useDLA was renamed to useDLACore
- DLA cores are now numbered 0 to N-1 instead of 1 to N
- trtexec does not yet support running ONNX models on the DLA
Using the DLA
[1] https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_topic
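The AlexNet and ResNet-50 numbers on the next slides are in trtexec's reporting format; a representative invocation is sketched below. The prototxt and output-blob names are placeholders, and the flags follow the TensorRT 5 trtexec options described above:
# FP16 on DLA core 0, falling back to the GPU for layers the DLA cannot run
$ trtexec --deploy=alexnet_deploy.prototxt --output=prob --fp16 --useDLACore=0 --allowGPUFallback --avgRuns=100
# GPU-only INT8 run for comparison
$ trtexec --deploy=alexnet_deploy.prototxt --output=prob --int8 --avgRuns=100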
AlexNet: ImageNet challenge 2012 winner
• GPU-INT8
- Average over 100 runs is 4.86918 ms (host walltime is 4.88175 ms, 99% percentile time is 4.96976).
• GPU-FP16
- Average over 100 runs is 5.09872 ms (host walltime is 5.11733 ms, 99% percentile time is 6.23514).
• GPU DLA=0, GPU fallback, FP16
- Average over 100 runs is 43.8821 ms (host walltime is 44.1185 ms, 99% percentile time is 46.3073).
• GPU DLA=1, GPU fallback, FP16
- Average over 100 runs is 43.381 ms (host walltime is 43.5552 ms, 99% percentile time is 43.9859).
AlexNet results
ResNet-50: https://github.com/KaimingHe/deep-residual-networks
• ImageNet challenge 2015 winner
• GPU-INT8
- Average over 100 runs is 7.36345 ms (host walltime is 7.38333 ms, 99% percentile time is 8.55971).
• GPU-FP16
- Average over 100 runs is 12.3128 ms (host walltime is 12.3288 ms, 99% percentile time is 14.1207).
• DLA0 and GPU fallback, FP16
- Average over 100 runs is 48.9775 ms (host walltime is 49.0705 ms, 99% percentile time is 49.794).
• DLA1 and GPU fallback, FP16
- Average over 100 runs is 48.6207 ms (host walltime is 48.7205 ms, 99% percentile time is 49.832).
ResNet-50 results
Contents
1. Hardware Overview
2. Installation: JetPack, TensorFlow
3. YOLOv3 Execution and Optimization
4. NVDLA
5. Related Projects and Future Work
NVIDIA's "Two Days to a Demo" tutorial project
• Provides ready-to-run deep learning inference examples for Jetson
$ git clone https://github.com/dusty-nv/jetson-inference
Two Days to a Demo
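A sketch of building the repository on the Xavier itself; the cmake-based steps are an assumption drawn from the project's usual instructions rather than from this deck:
$ git clone --recursive https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference && mkdir build && cd build
$ cmake ../                      # the pre-build step fetches pretrained models (per the project README)
$ make -j$(nproc) && sudo make install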
NeurIPS Expo 2018 - Session 3: Inference and Quantization [Link]
• Mixed Precision Networks
NVIDIA at NeurIPS 2018, 2-8 Dec. 2018
[1] https://www.nvidia.com/en-us/events/neurips/
[2] https://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php
Why an auto-tuner is needed
• The best quantization configuration is model- and layer-dependent [1]
• Vendor tools are beginning to support mixed precision [TensorRT v5; NeurIPS 2018 workshop]
• Choosing a lower precision per layer relies on a calibration process to limit the accuracy loss [TensorRT]
• How to use mixed precision well is still an open question: NeurIPS workshop, Dec. 2018
Related work
• NVIDIA TensorRT
• Similar functionality from Google Firebase / Cloud AutoML (in alpha test)
- Custom On-Device ML Models with Learn2Compress
• Other related tools
DLA (NPU) limitations
• Only a limited set of operations is supported so far, and performance is still limited
• Even at reduced precision it runs slower than the GPU in the measurements above
• The DLA can only be reached indirectly through TensorRT
- There is no dedicated low-level SDK for the NPU itself
- Unlike the GPU, there is currently no supported way to write and tune custom kernels for it
• How to make effective use of such NPUs needs further study
Future work
[1] Value-aware Quantization for Training and Inference of Neural Networks, ECCV 2018