Literature review: YOLO series: v1-v5, X, F, and YOWO
Toru Tamaki
20220617_You_Only_Look_Once_Series.pdf
You Only Look Once: Unified, Real-Time Object Detection
https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Redmon_You_Only_Look_CVPR_2016_paper.html
YOLO9000: Better, Faster, Stronger
https://openaccess.thecvf.com/content_cvpr_2017/html/Redmon_YOLO9000_Better_Faster_CVPR_2017_paper.html
YOLOv3: An Incremental Improvement
https://arxiv.org/abs/1804.02767
YOLOv4: Optimal Speed and Accuracy of Object Detection
https://arxiv.org/abs/2004.10934
YOLOv5
https://github.com/ultralytics/yolov5
YOLOX: Exceeding YOLO Series in 2021
https://arxiv.org/abs/2107.08430
You Only Look One-Level Feature
https://openaccess.thecvf.com/content/CVPR2021/html/Chen_You_Only_Look_One-Level_Feature_CVPR_2021_paper.html
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
https://openaccess.thecvf.com/content/ICCV2021/html/Chen_Watch_Only_Once_An_End-to-End_Video_Action_Detection_Framework_ICCV_2021_paper.html
This document summarizes the DeepLab models for semantic image segmentation: DeepLab v1 used atrous convolution with VGG-16 as the backbone network. DeepLab v2 improved on this with atrous spatial pyramid pooling and added ResNet-101 as an option. DeepLab v3 removed dense CRFs and introduced multi-grid atrous convolution and bootstrapping. DeepLab v3+ uses an encoder-decoder architecture with Xception or ResNet-101 as the backbone and atrous separable convolutions.
Codetecon #KRK 3 - Object detection with Deep Learning
Matthew Opala
There's been enormous progress in object detection algorithms. From multi-stage methods like R-CNN to end-to-end ones like SSD or YOLO, accuracy has improved significantly. Current applications include pedestrian detection for cars and face detection on Facebook.
But that's just the beginning. I am going to show the algorithms for solving the problem, show what's currently possible, and what will be possible in the near future.