Several recent papers have explored self-supervised learning methods for vision transformers (ViTs). Key approaches include:
1. Masked prediction tasks that reconstruct masked patches of the input image (see the first sketch after this list).
2. Contrastive learning, using techniques like MoCo, that learns representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image (see the second sketch after this list).
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
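To make item 1 concrete, here is a minimal sketch of a masked-patch prediction objective in PyTorch. The `patchify` helper, the tiny MLP encoder/decoder, and the 0.75 mask ratio are illustrative assumptions for this sketch, not the exact architecture or recipe of any particular paper.

```python
import torch
import torch.nn as nn

def patchify(images, patch_size=16):
    """Split (B, C, H, W) images into (B, N, patch_size*patch_size*C) flat patches."""
    B, C, H, W = images.shape
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size * patch_size)

def masked_prediction_loss(encoder, decoder, images, mask_ratio=0.75, patch_size=16):
    """Reconstruct randomly masked patches; the loss covers masked patches only."""
    patches = patchify(images, patch_size)                  # (B, N, D)
    B, N, D = patches.shape
    num_masked = int(N * mask_ratio)

    # Randomly choose which patches to mask in each image.
    noise = torch.rand(B, N, device=images.device)
    ids_masked = noise.argsort(dim=1)[:, :num_masked]
    mask = torch.zeros(B, N, dtype=torch.bool, device=images.device)
    mask.scatter_(1, ids_masked, True)

    # MAE encodes only the visible patches; for brevity this sketch
    # feeds all patches and zeroes out the masked ones instead.
    visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = decoder(encoder(visible))                        # (B, N, D)

    # Mean squared error, averaged over the masked patches only.
    loss = ((pred - patches) ** 2).mean(dim=-1)
    return (loss * mask).sum() / mask.sum()

# Toy usage: stand-in MLPs in place of real transformer blocks.
encoder = nn.Sequential(nn.Linear(16 * 16 * 3, 128), nn.GELU())
decoder = nn.Linear(128, 16 * 16 * 3)
images = torch.randn(4, 3, 224, 224)
print(masked_prediction_loss(encoder, decoder, images).item())
```

Restricting the loss to masked patches is the key design choice: the model cannot lower the loss by trivially copying visible pixels, so it must infer the missing content from context.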
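For item 3, here is a minimal DINO-style self-distillation sketch, also in PyTorch. The temperatures, EMA momentum, centering update, and stand-in MLP backbone are assumed defaults for illustration; the full DINO recipe additionally uses multi-crop augmentation and a projection head.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.996):
    """EMA update: the teacher slowly tracks the student's weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1 - momentum)

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between sharpened, centered teacher targets and student predictions."""
    teacher_probs = F.softmax((teacher_out - center) / t_teacher, dim=-1).detach()
    student_logp = F.log_softmax(student_out / t_student, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

# Toy usage with a stand-in backbone; the two views would come from random augmentations.
student = torch.nn.Sequential(torch.nn.Linear(384, 256), torch.nn.GELU(), torch.nn.Linear(256, 64))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

center = torch.zeros(64)
view1, view2 = torch.randn(8, 384), torch.randn(8, 384)

# The student sees one view, the teacher the other (and vice versa in the full recipe).
loss = dino_loss(student(view1), teacher(view2), center)
loss.backward()
update_teacher(teacher, student)

# The center is itself an EMA of teacher outputs, which helps prevent collapse.
with torch.no_grad():
    center = 0.9 * center + 0.1 * teacher(view2).mean(dim=0)
print(loss.item())
```

The asymmetric temperatures (sharper teacher than student) combined with centering are what keep the teacher targets from collapsing to a uniform or one-hot distribution, which is why DINO works without negative pairs.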