The document presents an overview of the research group 'Generations' focused on image generation and generative models, detailing their contributions to fields like unpaired image-to-image translation and domain adaptation. It highlights various studies and techniques, including CycleGAN and neural radiance fields, aimed at enhancing image translation while preserving contextual integrity. The group is actively seeking new members for collaboration on these innovative themes.
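The cycle-consistency idea behind CycleGAN can be sketched numerically. In the sketch below the two "generators" `G` and `F` are toy stand-in linear maps (hypothetical, for illustration only; in CycleGAN they are neural networks), and the loss is the L1 distance between an input and its round-trip reconstruction, which is what encourages translation to preserve content.

```python
import numpy as np

def cycle_consistency_loss(G, F, x):
    """L1 distance between x and its round trip F(G(x)).

    G maps domain A -> B and F maps B -> A. In CycleGAN both are
    learned networks; here they are arbitrary callables.
    """
    return np.mean(np.abs(F(G(x)) - x))

# Toy stand-in generators: G scales by 2, F scales by 0.5 (a perfect inverse pair).
G = lambda x: 2.0 * x
F = lambda x: 0.5 * x

x = np.array([1.0, -2.0, 3.0])
print(cycle_consistency_loss(G, F, x))  # 0.0 for a perfect inverse pair
```

When `F` is not the inverse of `G`, the loss is positive, which is the signal that training uses to keep the two mappings mutually consistent.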
This document outlines high-quality measurement point acquisition and automatic modeling technology for equipment and environments using 3D laser scanning. It discusses techniques for efficient point cloud processing, optimal scanner placement, and registration methods, emphasizing the importance of data quality and measurement efficiency. Various applications in urban and industrial settings are highlighted, showcasing significant improvements in model accuracy and operational efficiency.
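One core step of the registration methods mentioned above can be sketched: given corresponding points between two scans, the best rigid transform (rotation and translation) has a closed-form SVD solution (the Kabsch solution, used inside ICP-style pipelines). This is a minimal sketch, not the document's specific method; the point clouds and transform below are synthetic.

```python
import numpy as np

def rigid_align(src, dst):
    """Find rotation R and translation t that align src to dst,
    assuming known point correspondences (the Kabsch/SVD solution
    used as the inner step of ICP registration)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Recover a known rotation/translation from a synthetic point cloud.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true

R, t = rigid_align(src, dst)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

Full ICP alternates this closed-form step with re-estimating correspondences (nearest neighbors), which is where scanner placement and data quality affect convergence.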
The document summarizes recent research related to "theory of mind" in multi-agent reinforcement learning. It discusses three papers that propose methods for agents to infer the intentions of other agents by applying concepts from theory of mind:
1. The papers propose that in multi-agent reinforcement learning, being able to understand the intentions of other agents could help with cooperation and increase success rates.
2. The methods aim to estimate the intentions of other agents by modeling their beliefs and private information, using ideas from theory of mind in cognitive science. This involves inferring information about other agents that is not directly observable.
3. Bayesian inference is often used to reason about the beliefs, goals, and private information of other agents based on their observed actions.
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
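The masked-prediction idea (approach 1, as in MAE) can be sketched with plain arrays. Patch count, embedding size, and the mean-of-visible "prediction" below are illustrative stand-ins (a real model uses a ViT encoder on the visible patches and a decoder to reconstruct the rest); the 75% mask ratio follows the MAE paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": 16 patches, each an 8-dim vector (stand-ins for ViT
# patch embeddings; sizes chosen only for illustration).
patches = rng.normal(size=(16, 8))

# Mask 75% of the patches; the encoder sees only the visible ones.
n_masked = 12
masked_idx = rng.choice(16, size=n_masked, replace=False)
visible_idx = np.setdiff1d(np.arange(16), masked_idx)

# Stand-in "prediction": the mean of the visible patches, broadcast
# to every masked position (a real model predicts each patch).
pred = np.tile(patches[visible_idx].mean(0), (n_masked, 1))

# Key detail shared with MAE: the reconstruction loss is computed
# on the masked positions only, not on the visible ones.
loss = np.mean((pred - patches[masked_idx]) ** 2)
print(loss)
```

Restricting the loss to masked positions is what makes the task non-trivial: the model must infer missing content from context rather than copy its input.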
These are the slides used in the following video on the YouTube nnabla channel:
[Deep Learning Course] Fundamentals and Applications of Transformers -- Part 3: Applications of Transformers to Images
https://youtu.be/rkuayDInyF0
[References]
- Deep Residual Learning for Image Recognition
  https://arxiv.org/abs/1512.03385
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  https://arxiv.org/abs/2010.11929
- On the Relationship between Self-Attention and Convolutional Layers
  https://arxiv.org/abs/1911.03584
- Image Style Transfer Using Convolutional Neural Networks
  https://ieeexplore.ieee.org/document/7780634
- Are Convolutional Neural Networks or Transformers more like human vision?
  https://arxiv.org/abs/2105.07197
- How Do Vision Transformers Work?
  https://arxiv.org/abs/2202.06709
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
  https://arxiv.org/abs/1610.02391
- Quantifying Attention Flow in Transformers
  https://arxiv.org/abs/2005.00928
- Transformer Interpretability Beyond Attention Visualization
  https://arxiv.org/abs/2012.09838
- End-to-End Object Detection with Transformers
  https://arxiv.org/abs/2005.12872
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
  https://arxiv.org/abs/2105.15203
- Training data-efficient image transformers & distillation through attention
  https://arxiv.org/abs/2012.12877
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
  https://arxiv.org/abs/2103.14030
- Masked Autoencoders Are Scalable Vision Learners
  https://arxiv.org/abs/2111.06377
- Emerging Properties in Self-Supervised Vision Transformers
  https://arxiv.org/abs/2104.14294
- Scaling Laws for Neural Language Models
  https://arxiv.org/abs/2001.08361
- Learning Transferable Visual Models From Natural Language Supervision
  https://arxiv.org/abs/2103.00020
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  https://arxiv.org/abs/2403.03206
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  https://arxiv.org/abs/2402.17177
- SSII2024 Technology Map
  https://confit.atlas.jp/guide/event/ssii2024/static/special_project_tech_map
Paper introduction: TSM: Temporal Shift Module for Efficient Video Understanding (presented by Toru Tamaki)
- Ji Lin, Chuang Gan, Song Han; TSM: Temporal Shift Module for Efficient Video Understanding, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7083-7093
  https://openaccess.thecvf.com/content_ICCV_2019/html/Lin_TSM_Temporal_Shift_Module_for_Efficient_Video_Understanding_ICCV_2019_paper.html
Paper Introductions
- STN: Spatial Transformer Networks, M. Jaderberg et al. / NIPS 2015 (arXiv:1506.02025)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, C. R. Qi et al. / CVPR 2017 (arXiv:1612.00593)
(Please forgive that the slides are still in their original survey-slide format.)
Was PointNet really the breakthrough?
A paper handling point clouds with symmetric functions appeared shortly before PointNet:
- Deep Learning with Sets and Point Clouds: S. Ravanbakhsh, J. Schneider, B. Poczos / ICLR 2017 Workshop, posted 2016-11-14 (arXiv:1611.04500)
- PointNet: C. R. Qi, H. Su, K. Mo, L. J. Guibas / CVPR 2017, posted 2016-12-02 (arXiv:1612.00593)
- Deep Sets: M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, A. Smola / NIPS 2017, posted 2017-03-10 (arXiv:1703.06114)
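The symmetric-function idea shared by these papers can be sketched: apply the same feature map to every point, then pool with a permutation-invariant reduction (max in PointNet, sum in Deep Sets), so the output does not depend on point order. The weight shapes below are arbitrary toy choices.

```python
import numpy as np

def point_set_feature(points, W, pool="max"):
    """Symmetric function over a point set: a shared per-point map
    followed by permutation-invariant pooling (max as in PointNet,
    sum as in Deep Sets)."""
    h = np.maximum(points @ W, 0.0)   # shared single-layer ReLU map per point
    return h.max(0) if pool == "max" else h.sum(0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))          # toy weights, for illustration only
pts = rng.normal(size=(100, 3))
shuffled = pts[rng.permutation(100)]

# The feature is identical under any reordering of the input points.
print(np.allclose(point_set_feature(pts, W), point_set_feature(shuffled, W)))  # True
```

This invariance is exactly the shared contribution of the three papers above, which is why the priority question is about publication timing and presentation rather than the core mechanism.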
Novelty of the idea is not the only factor that makes a paper famous.
Other possible factors: venue, completeness of the work, evaluation, presentation, title, early citation count, and so on.
(A meta-level observation.)