[DL Paper Reading] NeRF-VAE: A Geometry Aware 3D Scene Generative Model - Deep Learning JP
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) in a variational autoencoder (VAE). An encoder extracts a latent code from input views, and a NeRF-based decoder generates novel views conditioned on that code. Training maximizes the evidence lower bound (ELBO) to learn a latent space of scenes and enable novel view synthesis. NeRF-VAE aims to generate photorealistic novel views by leveraging NeRF's view-synthesis abilities within a generative modeling framework.
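The encoder/decoder/ELBO pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's architecture: all dimensions, the flat-pooling encoder, and the single-point decoder query are simplifying assumptions, and volume rendering along a ray is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT, HIDDEN = 8, 32  # toy sizes, chosen for illustration only

def linear_params(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Encoder: maps context views to a Gaussian q(z | views)
W_mu, b_mu = linear_params(48, LATENT)
W_lv, b_lv = linear_params(48, LATENT)

def encode(views):
    h = views.reshape(-1)            # naive pooling of the context views
    return h @ W_mu + b_mu, h @ W_lv + b_lv   # mean, log-variance

# Decoder: a NeRF-style MLP conditioned on the scene latent z.
# Input: 3D point x, view direction d, latent z -> (rgb, density)
W1, b1 = linear_params(3 + 3 + LATENT, HIDDEN)
W2, b2 = linear_params(HIDDEN, 4)

def decode(x, d, z):
    h = np.tanh(np.concatenate([x, d, z]) @ W1 + b1)
    out = h @ W2 + b2
    rgb = 1 / (1 + np.exp(-out[:3]))     # colors squashed into [0, 1]
    sigma = np.log1p(np.exp(out[3]))     # non-negative density
    return rgb, sigma

# One ELBO evaluation for a single query point / target pixel
views = rng.normal(size=48)              # stand-in for flattened context views
target_rgb = np.array([0.5, 0.5, 0.5])

mu, logvar = encode(views)
z = mu + np.exp(0.5 * logvar) * rng.normal(size=LATENT)   # reparameterization trick

rgb, sigma = decode(np.zeros(3), np.array([0.0, 0.0, 1.0]), z)
recon = -np.sum((rgb - target_rgb) ** 2)                   # Gaussian log-lik., up to a constant
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1 - logvar)     # KL(q(z|views) || N(0, I))
elbo = recon - kl
```

In the real model the decoder is queried at many points along each camera ray and the samples are composited by volume rendering; here a single query stands in for that step.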
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
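The masked-prediction idea in item 1 can be sketched in a few lines. This is a generic toy example, not any specific paper's recipe: the patch sizes, the zero mask token, and the single linear layer standing in for the ViT encoder are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes are illustrative)
N_PATCHES, PATCH_DIM, MASK_RATIO = 16, 12, 0.5

image_patches = rng.normal(size=(N_PATCHES, PATCH_DIM))

# 1. Randomly choose patches to mask out
n_mask = int(N_PATCHES * MASK_RATIO)
mask = np.zeros(N_PATCHES, dtype=bool)
mask[rng.choice(N_PATCHES, size=n_mask, replace=False)] = True

# 2. Replace masked patches with a mask token (here simply zeros)
corrupted = image_patches.copy()
corrupted[mask] = 0.0

# 3. "Encoder": a single linear layer standing in for the ViT
W = rng.normal(0, 0.1, (PATCH_DIM, PATCH_DIM))
predicted = corrupted @ W

# 4. Reconstruction loss computed only on the masked positions
loss = np.mean((predicted[mask] - image_patches[mask]) ** 2)
```

The key design point is step 4: the model is scored only where information was removed, which forces it to infer masked content from the visible context.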
[DL Paper Reading] Neural Radiance Flow for 4D View Synthesis and Video Processing (NeRF... - Deep Learning JP
Neural Radiance Flow (NeRFlow) extends Neural Radiance Fields (NeRF) to model dynamic scenes from video data. NeRFlow simultaneously learns two fields: a radiance field that reconstructs images as in NeRF, and a flow field that models how points in space move over time, supervised by optical flow. This allows it to render novel views at new time points. The model is trained end-to-end by minimizing a color-reconstruction loss from volume rendering and an optical-flow reconstruction loss. However, a separate model must be trained for each scene, and the method does not generalize to unseen scenes.
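The two-field setup can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the single-layer "MLPs", the Euler integration of the flow, and the color-consistency loss shown here are simplified stand-ins for NeRFlow's networks and losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(n_in, n_out):
    """Single random linear layer with tanh, standing in for a real MLP."""
    W = rng.normal(0, 0.1, (n_in, n_out))
    return lambda x: np.tanh(x @ W)

# Radiance field: (3D point, time) -> (rgb, density), i.e. NeRF plus a time input
radiance = mlp(3 + 1, 4)

# Flow field: (3D point, time) -> 3D velocity of that point
flow = mlp(3 + 1, 3)

def query(x, t):
    out = radiance(np.concatenate([x, [t]]))
    return out[:3], np.abs(out[3])          # rgb, non-negative density

def advect(x, t0, t1, n_steps=8):
    """Move a point from time t0 to t1 by Euler-integrating the flow field."""
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        v = flow(np.concatenate([x, [t0 + i * dt]]))
        x = x + v * dt
    return x

# Consistency: a point's color should be preserved as it follows the flow
x0, t0, t1 = np.zeros(3), 0.0, 1.0
rgb0, _ = query(x0, t0)
rgb1, _ = query(advect(x0, t0, t1), t1)
consistency_loss = np.sum((rgb1 - rgb0) ** 2)
```

In training, a loss of this kind is combined with the usual volume-rendering color loss and a term matching the projected flow to observed optical flow.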
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry... - Masaya Kaneko
Slides presented at a paper reading session on "Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM [ICCV19]", which proposes SfMLearner combined with keyframe selection.
Related work: SfM/SLAM (Structure from Motion / Simultaneous Localization and Mapping)
– From a set of images, simultaneously estimate the 3D positions of extracted feature points and each image's camera pose (together forming a 3D map)
– Using the 3D map, a camera pose can also be estimated from an image (Localization)
– Conversely, an image can be estimated from a pose (Rendering)
[Figure: image set → 3D map construction (3D point positions + camera poses), with Localization and Rendering mapping between camera poses and images]
[1] Building Rome in a Day [Agarwal+, ICCV2009]
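The pinhole camera model underlies both directions described above: Rendering projects 3D map points through a known pose to form an image, and Localization searches for the pose under which projections match observed pixels. Below is a minimal sketch of that projection; the intrinsic matrix values and the identity pose are made-up examples.

```python
import numpy as np

# Pinhole intrinsics: focal lengths (fx, fy) and principal point (cx, cy).
# These specific numbers are arbitrary, chosen for illustration.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Camera pose: rotation R and translation t (world frame -> camera frame)
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])   # camera looks down +z, 2 units from the origin

def project(X_world):
    """Project a 3D world point to 2D pixel coordinates."""
    X_cam = R @ X_world + t      # transform into the camera frame
    uvw = K @ X_cam              # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]      # perspective divide

pixel = project(np.array([0.0, 0.0, 0.0]))
# The world origin lies on the optical axis, so it projects to the
# principal point (320, 240).
```

SfM/SLAM systems minimize the reprojection error, the distance between `project(X)` and the detected feature location, jointly over the point positions and the camera poses.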