Semi-supervised, weakly-supervised, unsupervised, and active learning
Yusuke Uchida
An overview of semi-supervised learning, weakly-supervised learning, unsupervised learning, and active learning, focusing on recent deep learning-based image recognition approaches.
This document summarizes recent research on applying the self-attention mechanism of Transformers to domains other than language, such as computer vision. It discusses models that apply self-attention to images, including ViT, DeiT, and T2T, which run Transformers over divided image patches. It also covers more general attention modules such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities with frozen weights, showing that they can function as universal computation engines.
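To make the "divided image patches" idea concrete, here is a minimal PyTorch sketch of ViT-style patch embedding; the class name and the 224/16/768 sizes follow the common ViT-Base convention and are illustrative assumptions, not anything specific to the summarized slides.

```python
# A minimal sketch of ViT-style patch embedding (illustrative shapes only):
# the image is cut into non-overlapping patches, each flattened and projected
# to a token, so a standard Transformer can consume the resulting sequence.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        # A strided convolution is equivalent to flatten-and-project per patch
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, dim, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (B, 196, dim) token sequence

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```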
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image (a minimal loss sketch follows this list).
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
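Of these, the contrastive route is the easiest to make concrete. Below is a minimal PyTorch sketch of a MoCo-style InfoNCE loss; the function name, tensor shapes, temperature value, and the random queue in the usage example are illustrative assumptions, not any paper's exact implementation.

```python
# A minimal sketch of an InfoNCE-style contrastive loss (MoCo-flavoured),
# assuming the embeddings are already L2-normalised; names are illustrative.
import torch
import torch.nn.functional as F

def info_nce_loss(query, key, queue, temperature=0.07):
    """Contrast each query with its positive key against a queue of negatives.

    query: (N, D) embeddings of one augmented view
    key:   (N, D) embeddings of the other view (positives)
    queue: (K, D) embeddings of past keys (negatives)
    """
    # Positive logits: one per query, shape (N, 1)
    l_pos = torch.einsum("nd,nd->n", query, key).unsqueeze(-1)
    # Negative logits against the queue, shape (N, K)
    l_neg = torch.einsum("nd,kd->nk", query, queue)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive is always at index 0 of each row
    labels = torch.zeros(query.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy usage with random, normalised embeddings standing in for encoder outputs
q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)
neg = F.normalize(torch.randn(4096, 128), dim=1)
print(info_nce_loss(q, k, neg))
```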
Image Restoration with Union of Directional Orthonormal DWTs
Shogo Muramatsu
This document proposes using a union of directional symmetric orthonormal wavelet transforms (DirSOWTs) as a redundant dictionary for image restoration tasks such as deblurring, super-resolution, and inpainting. A DirSOWT provides a critically sampled, overlapping, orthonormal, symmetric, real-valued, and compactly supported basis capable of representing directional features. Taking a union of DirSOWTs with different orientations yields a dictionary that is both redundant and directional. The iterative shrinkage-thresholding algorithm (ISTA) can then be used to solve the resulting sparse representation problem for image restoration, provided the dictionary forms a tight frame. Simulation results applying this approach to various image restoration tasks are presented.
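As a concrete reference for the ISTA iteration, here is a minimal NumPy sketch on a generic overcomplete dictionary; the random matrix D stands in for the union of DirSOWTs, the degradation operator is dropped (identity), and the names and parameters (lam, n_iter) are illustrative assumptions rather than the paper's setup.

```python
# A minimal NumPy sketch of ISTA for sparse synthesis: minimise
# 0.5*||y - D a||^2 + lam*||a||_1 over the coefficient vector a.
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(y, D, lam=0.1, n_iter=200):
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)           # gradient of the data-fidelity term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Toy usage: recover a sparse coefficient vector from its synthesis
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128)) / 8.0   # 2x-overcomplete random dictionary
a_true = np.zeros(128)
a_true[[3, 40, 99]] = [1.0, -2.0, 0.5]
y = D @ a_true
a_hat = ista(y, D, lam=0.05)
print(np.nonzero(np.abs(a_hat) > 0.1)[0])  # indices of (approximately) recovered atoms
```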
Design Method of Directional GenLOT with Trend Vanishing Moments
Shogo Muramatsu
Proc. of Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 692-701, Biopolis, Singapore, Dec. 14-17, 2010.