Old slides used in a lab reading group. A summary of object detection from its early days to single-shot detectors.
Some layout may be broken due to missing fonts (Tsukushi A Maru Gothic and Montserrat are used).
16. End-to-End Instance Segmentation and Counting with Recurrent Attention
• A neural network for instance segmentation
• At each step, it attends to a single object and segments its region
• Regions it has already seen are kept in memory
(inspired by the way humans count)
End-to-End Instance Segmentation and Counting with Recurrent Attention: https://arxiv.org/abs/1605.09410
17. End-to-End Instance Segmentation and Counting with Recurrent Attention
• Overall architecture of the model:
18. End-to-End Instance Segmentation and Counting with Recurrent Attention
The component that remembers regions already seen
19. End-to-End Instance Segmentation and Counting with Recurrent Attention
Decides where to attend next
20. End-to-End Instance Segmentation and Counting with Recurrent Attention
Segments the attended region
21. End-to-End Instance Segmentation and Counting with Recurrent Attention
Judges whether an object has been found
(terminates when the score drops to 0.5 or below)
22. End-to-End Instance Segmentation and Counting with Recurrent Attention
The region just seen is added to memory.
(The above steps are then repeated.)
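The step-by-step loop described on slides 16-22 (attend, segment, score, remember, repeat until the score falls to 0.5 or below) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `attend`, `segment`, and `score` are stand-ins for the paper's sub-networks, and the memory is modeled as a simple max-composited canvas.

```python
import numpy as np

def recurrent_instance_segmentation(image_features, attend, segment, score,
                                    max_steps=20, stop_threshold=0.5):
    """Segment one object per step, remembering regions already covered.

    Sketch of the loop from the slides; the three callables are placeholders
    for learned sub-networks, and names/shapes are illustrative only.
    """
    h, w = image_features.shape[:2]
    canvas = np.zeros((h, w))        # memory of regions seen so far
    masks = []
    for _ in range(max_steps):
        # 1. Decide where to attend, conditioned on what is already covered.
        glimpse = attend(image_features, canvas)
        # 2. Segment the attended region into a soft mask.
        mask = segment(glimpse)
        # 3. Judge whether an object was actually found; stop at <= 0.5.
        s = score(glimpse, mask)
        if s <= stop_threshold:
            break
        masks.append(mask)
        # 4. Add the new mask to the memory canvas, then repeat.
        canvas = np.maximum(canvas, mask)
    # One mask per detected instance; len(masks) is the object count.
    return masks
```

Counting falls out for free: the number of iterations before the stop score is reached is the predicted number of instances.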
23. End-to-End Instance Segmentation and Counting with Recurrent Attention
• Results (1): instance segmentation of leaves
24. End-to-End Instance Segmentation and Counting with Recurrent Attention
• Results (2): instance segmentation of vehicles