PR-313 Training BatchNorm and Only BatchNorm: On the Expressive Power of Rand... (Sunghoon Joo)
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Jonathan Frankle, David J. Schwab, Ari S. Morcos
ICLR 2021
Paper link: https://arxiv.org/abs/2008.09093
Video presentation link: https://youtu.be/bI8ceHOoYxk
Reviewed by Sunghoon Joo
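The paper's core setup, freezing all weights at their random initialization and training only the BatchNorm affine parameters γ and β, can be illustrated with a toy sketch. It assumes a single scalar feature produced by one frozen random weight and a mean-squared-error target; `batch_norm` and `train_bn_only` are hypothetical names, and the gradients are derived by hand rather than taken from a framework.

```python
import random
from typing import List, Tuple

def batch_norm(h: List[float], gamma: float, beta: float,
               eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Normalize a batch of scalar activations, then apply the BN affine parameters."""
    n = len(h)
    mean = sum(h) / n
    var = sum((v - mean) ** 2 for v in h) / n  # biased batch variance, as in BN training mode
    h_hat = [(v - mean) / (var + eps) ** 0.5 for v in h]
    return h_hat, [gamma * v + beta for v in h_hat]

def train_bn_only(xs, targets, steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = rng.uniform(0.5, 1.5)  # hypothetical "random feature" weight: sampled once, never trained
    h = [w * x for x in xs]    # the feature is frozen, so it is computed only once
    gamma, beta = 1.0, 0.0     # the only trainable parameters (standard BN initialization)
    n = len(xs)
    for _ in range(steps):
        h_hat, y = batch_norm(h, gamma, beta)
        # Hand-derived gradients of the mean squared error w.r.t. gamma and beta only.
        grad_gamma = sum(2 * (yi - ti) * hi for yi, ti, hi in zip(y, targets, h_hat)) / n
        grad_beta = sum(2 * (yi - ti) for yi, ti in zip(y, targets)) / n
        gamma -= lr * grad_gamma
        beta -= lr * grad_beta
    _, y = batch_norm(h, gamma, beta)
    loss = sum((yi - ti) ** 2 for yi, ti in zip(y, targets)) / n
    return w, gamma, beta, loss
```

With `xs = [-2.0, -1.0, 0.0, 1.0, 2.0]` and targets `3 * x + 1`, the loss drops essentially to zero even though `w` never changes, because this toy target is an affine function of the normalized feature. The paper studies how far this kind of BN-only training goes in deep CNNs.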
PR-339: Maintaining discrimination and fairness in class incremental learning (Sunghoon Joo)
PR-339: Maintaining discrimination and fairness in class incremental learning
Paper link: http://arxiv.org/abs/1911.07053
Video presentation link: https://youtu.be/hptinxZIXT4
#class imbalance, #knowledge distillation, #class incremental learning
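As we understand the paper, its fairness fix is a post-hoc "weight aligning" step: after training on new classes, the fully connected layer's weight vectors for new classes tend to have larger norms than those for old classes (a symptom of class imbalance), so the new-class vectors are rescaled until their mean norm matches the old classes'. A minimal sketch, assuming the FC weights are given as a list of per-class vectors with the old classes first; `align_weights` is a hypothetical helper, not the authors' code.

```python
import math
from typing import List, Tuple

def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

def align_weights(fc_weights: List[List[float]],
                  num_old: int) -> Tuple[List[List[float]], float]:
    """Rescale new-class weight vectors so their mean norm equals the old classes' mean norm."""
    old, new = fc_weights[:num_old], fc_weights[num_old:]
    mean_old = sum(l2_norm(w) for w in old) / len(old)
    mean_new = sum(l2_norm(w) for w in new) / len(new)
    gamma = mean_old / mean_new  # < 1 when new-class weights are inflated
    aligned = [list(w) for w in old] + [[gamma * x for x in w] for w in new]
    return aligned, gamma
```

For weights `[[3, 4], [0, 5], [6, 8], [0, 10]]` with two old classes, γ = 5 / 10 = 0.5, so the new-class vectors shrink to the old classes' norm and their logits are no longer inflated relative to old-class logits.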
Imagination-Augmented Agents for Deep Reinforcement Learning
I will introduce the I2A (Imagination-Augmented Agents) architecture from DeepMind, described in the paper "Imagination-Augmented Agents for Deep Reinforcement Learning".
These slides were presented at the Deep Learning study group in DAVIAN LAB.
Paper link: https://arxiv.org/abs/1707.06203
[paper review] Eye in the sky & 3D human pose estimation in video with ... (Gyubin Son)
1. Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network
https://arxiv.org/abs/1806.00746
2. 3D human pose estimation in video with temporal convolutions and semi-supervised training
https://arxiv.org/abs/1811.11742
Unsupervised anomaly detection using style distillation (LEE HOSEONG)
The document discusses using convolutional autoencoders for unsupervised anomaly detection. A convolutional autoencoder is trained on normal data only, minimizing the difference between its inputs and their reconstructions, so that it learns the distribution of normal examples. At test time, new data is passed through the trained model, and examples with a high reconstruction error are flagged as anomalies.
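The reconstruction-error idea can be shown without a deep-learning framework. The sketch below swaps the convolutional autoencoder for a linear autoencoder with a one-dimensional bottleneck; under squared error, its optimal encoder/decoder spans the top principal component of the training data, which is found here by power iteration. The threshold rule (the maximum reconstruction error on the normal training data) and all function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vec = List[float]

def fit_linear_autoencoder(data: List[Vec], iters: int = 100) -> Tuple[Vec, Vec]:
    """Return (mean, code direction) of a 1-D linear autoencoder, via PCA power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    centered = [[x[i] - mean[i] for i in range(d)] for x in data]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):  # power iteration converges to the top eigenvector of cov
        u = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return mean, v

def reconstruction_error(x: Vec, mean: Vec, v: Vec) -> float:
    centered = [xi - mi for xi, mi in zip(x, mean)]
    z = sum(ci * vi for ci, vi in zip(centered, v))      # encode to a 1-D code
    recon = [mi + z * vi for mi, vi in zip(mean, v)]      # decode back to input space
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))

def is_anomaly(x: Vec, mean: Vec, v: Vec, threshold: float) -> bool:
    return reconstruction_error(x, mean, v) > threshold
```

Trained on points lying near the line y = x, the model reconstructs `[2.5, 2.5]` almost perfectly, while `[2.0, -2.0]` lies far from the learned direction and exceeds the threshold.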
Do adversarially robust ImageNet models transfer better? (LEE HOSEONG)
The document discusses an experiment comparing the transfer learning performance of standard ImageNet models versus adversarially robust ImageNet models. The experiment finds that robust models consistently match or outperform standard models on a variety of downstream transfer learning tasks, despite having lower accuracy on ImageNet. Further analysis shows robust models improve with increased width and that the optimal level of robustness depends on properties of the downstream task like dataset granularity. Overall, the findings suggest adversarially robust models transfer learned representations better than standard models.
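Adversarially robust models are typically trained on inputs that have been perturbed, within a small budget ε, to increase the loss. A minimal sketch of one such perturbation, the one-step fast gradient sign method (FGSM) under an ℓ∞ budget, applied to a logistic-regression model. This is a simplified stand-in for the multi-step attacks generally used to train robust ImageNet models, and all names are hypothetical.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(w: List[float], b: float, x: List[float], y: int) -> float:
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def _sign(g: float) -> int:
    return (g > 0) - (g < 0)

def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Move each input coordinate by eps in the direction that increases the loss."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad_x)]
```

An adversarial-training step would then update `w` and `b` on the loss at `fgsm(...)` outputs rather than at the clean inputs.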
This document discusses mixed precision training techniques for deep neural networks. It introduces three techniques for training models in half-precision (FP16) floating point without losing accuracy: 1) maintaining an FP32 master copy of the weights, 2) scaling the loss so that small gradient values do not underflow to zero in FP16, and 3) performing certain arithmetic, such as dot-product accumulation, in FP32. Experimental results show these techniques allow a variety of networks to match the accuracy of FP32 training while reducing memory and bandwidth. The document also discusses related work and PyTorch's new Automatic Mixed Precision features.
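The first two techniques can be demonstrated with Python's `struct` module, whose `'e'` and `'f'` formats round a value to IEEE 754 half and single precision. A minimal sketch under simplifying assumptions: one scalar weight, a fixed gradient, and a static loss scale of 1024 (PyTorch AMP typically adjusts the scale dynamically). The function names are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mixed_precision_step(master_w: float, grad: float, lr: float, loss_scale: float):
    # Backward pass on the scaled loss: the gradient is produced in FP16.
    scaled_grad16 = to_fp16(grad * loss_scale)
    # Unscale in FP32 before applying the update.
    grad32 = to_fp32(scaled_grad16 / loss_scale)
    # Update the FP32 master copy, then cast a fresh FP16 copy for the next forward pass.
    master_w = to_fp32(master_w - lr * grad32)
    return master_w, to_fp16(master_w)

def fp16_only_step(w16: float, grad: float, lr: float) -> float:
    # Naive baseline: gradient, update, and weight all stay in FP16.
    return to_fp16(w16 - to_fp16(lr * to_fp16(grad)))
```

Starting from a weight of 1.0 with gradient 0.01 and learning rate 0.01, ten `fp16_only_step` calls leave the weight at exactly 1.0, because each update of about 1e-4 is smaller than half the FP16 spacing near 1.0. The FP32 master copy accumulates the same updates, and its FP16 cast eventually moves. Separately, `to_fp16(1e-8)` underflows to 0.0, while the same gradient multiplied by 1024 survives.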
YOLOv4: optimal speed and accuracy of object detection review (LEE HOSEONG)
YOLOv4 builds upon previous YOLO models and introduces techniques like CSPDarknet53, SPP, PAN, Mosaic data augmentation, and modifications to existing methods to achieve state-of-the-art object detection speed and accuracy while being trainable on a single GPU. Experiments show that combining these techniques through a "bag of freebies" and "bag of specials" approach improves classifier and detector performance over baselines on standard datasets. The paper contributes an efficient object detection model suitable for production use with limited resources.
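YOLOv4's SPP module, as described in the paper, max-pools the same feature map with several kernel sizes at stride 1 and concatenates the results along the channel axis, so the spatial size is unchanged. A minimal single-channel sketch on plain Python lists: kernel sizes 5, 9, and 13 follow the paper's description (its k = 1 branch is the identity), while the function names and the list-of-channels output are assumptions.

```python
from typing import List, Sequence

Grid = List[List[float]]

def max_pool_same(x: Grid, k: int) -> Grid:
    """k x k max pooling with stride 1 and 'same' padding (out-of-bounds cells are ignored)."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1)))
            for j in range(w)
        ]
        for i in range(h)
    ]

def spp_block(feature: Grid, kernels: Sequence[int] = (5, 9, 13)) -> List[Grid]:
    """Return the input plus its max-pooled versions, i.e. the channels to concatenate."""
    return [feature] + [max_pool_same(feature, k) for k in kernels]
```

Ignoring out-of-bounds cells is equivalent to padding with negative infinity, so every output keeps the input's height and width while larger kernels spread strong activations over a wider receptive field.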
FixMatch: simplifying semi-supervised learning with consistency and confidence (LEE HOSEONG)
This document summarizes the FixMatch paper, which proposes a simple semi-supervised learning method that achieves state-of-the-art results. FixMatch combines pseudo-labeling and consistency regularization: it generates a pseudo-label for each unlabeled example from the model's prediction on a weakly augmented version and, when that prediction is confident, trains the model to predict the same label on a strongly augmented version. Experiments show that FixMatch outperforms previous methods on standard benchmarks even with very limited labeled data, and extensive ablation studies identify consistency regularization and pseudo-labeling as the most important factors in its success.
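FixMatch's unlabeled-data loss can be written in a few lines. The sketch assumes the model's logits on the weakly and strongly augmented views have already been computed (the augmentations themselves are out of scope), uses the confidence threshold τ = 0.95 commonly reported for FixMatch, and averages over all unlabeled examples, including masked ones. Function names are hypothetical.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fixmatch_unlabeled_loss(weak_logits: List[List[float]],
                            strong_logits: List[List[float]],
                            threshold: float = 0.95) -> float:
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        q = softmax(weak)  # treated as a fixed target (no gradient flows through it)
        confidence = max(q)
        if confidence >= threshold:  # keep only confident pseudo-labels
            pseudo_label = q.index(confidence)
            # Cross-entropy between the hard pseudo-label and the strong-view prediction.
            total += -math.log(softmax(strong)[pseudo_label])
    return total / len(weak_logits)
```

In a batch where only the first weak view is confident, only that example contributes, but the sum is still divided by the full batch size.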
"Revisiting self-supervised visual representation learning" Paper Review (LEE HOSEONG)
This paper revisits self-supervised visual representation learning techniques. It conducts a large-scale study comparing different CNN architectures (ResNet, RevNet, VGG) and self-supervised techniques (rotation, exemplar, jigsaw, relative patch location). The study finds that using modern CNN architectures like ResNet instead of older AlexNet models significantly improves performance. Increasing the width of networks also boosts performance of self-supervised learning. Evaluation of representations on a new dataset shows the learned features generalize well.
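The rotation pretext task is the simplest of these techniques to sketch. Each unlabeled image yields four training examples, rotated by 0, 90, 180, and 270 degrees, and the network is trained to predict which rotation was applied. A minimal sketch on a 2-D list standing in for an image; the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]

def rot90(img: Image) -> Image:
    """Rotate a 2-D grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def rotation_pretext_examples(img: Image) -> List[Tuple[Image, int]]:
    """Return the four rotated copies of img, each labelled with its rotation index 0-3."""
    examples, current = [], img
    for k in range(4):
        examples.append((current, k))  # label k means a rotation of k * 90 degrees
        current = rot90(current)
    return examples
```

The labels come for free from the transformation itself, which is what lets the task use only unlabeled data.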
Self-supervised learning uses unlabeled data to learn visual representations through pretext tasks like predicting relative patch location, solving jigsaw puzzles, or image rotation. These tasks require semantic understanding to solve but only use unlabeled data. The features learned through pretraining on pretext tasks can then be transferred to downstream tasks like image classification and object detection, often outperforming supervised pretraining. Several papers introduce different pretext tasks and evaluate feature transfer on datasets like ImageNet and PASCAL VOC. Recent work combines multiple pretext tasks and shows improved generalization across tasks and datasets.
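A jigsaw pretext example can be generated in the same spirit: cut the image into a 3 × 3 grid of tiles, shuffle the tiles with one permutation from a fixed set, and use that permutation's index as the classification label. A minimal sketch; the 10-permutation set below is a hypothetical placeholder, and a practical set should contain permutations that differ substantially from one another.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]

def split_tiles(img: Image, n: int = 3) -> List[Image]:
    """Cut a square image into n x n tiles, in row-major order."""
    size = len(img) // n
    return [
        [row[c * size:(c + 1) * size] for row in img[r * size:(r + 1) * size]]
        for r in range(n) for c in range(n)
    ]

def jigsaw_example(img: Image, permutations: Sequence[Tuple[int, ...]],
                   rng: random.Random) -> Tuple[List[Image], int]:
    """Shuffle the tiles with a randomly chosen permutation; its index is the label."""
    tiles = split_tiles(img)
    label = rng.randrange(len(permutations))
    return [tiles[i] for i in permutations[label]], label

# Hypothetical placeholder set: the first 10 permutations of the 9 tile positions.
PERMUTATIONS = list(itertools.islice(itertools.permutations(range(9)), 10))
```

The network sees the shuffled tiles and must predict `label`, which requires reasoning about how object parts fit together.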
Human uncertainty makes classification more robust, ICCV 2019 Review (LEE HOSEONG)
1. The document summarizes a research paper that proposes training deep neural networks on soft labels representing human uncertainty in image classification, which improves generalization and robustness compared to training on hard labels.
2. Experiments show that models trained on soft labels constructed from human responses better fit patterns of human uncertainty and improve accuracy, cross-entropy, and a new second-best accuracy measure on various generalization datasets.
3. Alternative soft label methods are also explored, finding that human uncertainty provides a more important contribution than soft labels alone. While robustness to adversarial attacks is improved, defenses are still needed.
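Training on soft labels replaces the one-hot target in cross-entropy with the distribution of human responses. A minimal sketch, assuming the human annotations for an image are given as per-class vote counts; the function names are hypothetical.

```python
import math
from typing import List

def soft_label_from_votes(votes: List[int]) -> List[float]:
    """Turn per-class human vote counts into a probability distribution."""
    total = sum(votes)
    return [v / total for v in votes]

def soft_cross_entropy(logits: List[float], target: List[float]) -> float:
    """Cross-entropy H(target, softmax(logits)) for a soft (non one-hot) target."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return -sum(t * (z - log_z) for t, z in zip(target, logits))
```

With votes `[8, 2, 0]`, the target is `[0.8, 0.2, 0.0]`, and a model predicting roughly that distribution gets a lower loss than one that is confident in the majority class alone.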
This document provides an overview of single image super resolution using deep learning. It discusses how super resolution can be used to generate a high resolution image from a low resolution input. Deep learning models like SRCNN were early approaches for super resolution but newer models use deeper networks and perceptual losses. Generative adversarial networks have also been applied to improve perceptual quality. Key applications are in satellite imagery, medical imaging, and video enhancement. Metrics like PSNR and SSIM are commonly used but may not correlate with human perception. Overall, deep learning has advanced super resolution techniques but challenges remain in fully evaluating perceptual quality.
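PSNR, one of the metrics mentioned above, is 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch for 8-bit grayscale images stored as 2-D lists; MAX = 255 is an assumption, and color images and SSIM are out of scope.

```python
import math
from typing import List

def psnr(reference: List[List[float]], estimate: List[List[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_errors = [(r - e) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for r, e in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Because PSNR depends only on per-pixel squared error, two reconstructions with the same MSE get the same score even if one looks much sharper, which is why it may not track human perception.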
This document provides a review of the paper "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks", presented at ICLR 2019. The paper proposes that dense neural networks contain sparse subnetworks ("winning tickets") that, when trained in isolation from their original initialization weights, reach accuracy comparable to the full network in at most the same number of iterations. Through experiments on MNIST and CIFAR10, the paper finds evidence that iterative pruning can discover such winning tickets, which perform better than those found by one-shot pruning and than the same sparse subnetworks trained from a random re-initialization. However, further work is needed to test the hypothesis on larger datasets such as ImageNet and to optimize the resulting architectures.
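Iterative magnitude pruning with rewinding, the procedure used to find winning tickets, can be sketched on a flat weight vector: train, prune a fraction of the smallest-magnitude surviving weights, reset the survivors to their original initial values, and repeat. A minimal sketch, assuming a single layer and a hypothetical `train_fn(weights, mask)` supplied by the caller; the real procedure trains a full network between rounds.

```python
from typing import Callable, List, Tuple

Weights = List[float]
Mask = List[int]

def prune_by_magnitude(trained: Weights, mask: Mask, fraction: float) -> Mask:
    """Zero out the given fraction of still-alive weights with the smallest |trained value|."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(alive) * fraction)
    alive.sort(key=lambda i: abs(trained[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask

def find_winning_ticket(init: Weights, train_fn: Callable[[Weights, Mask], Weights],
                        rounds: int, fraction: float) -> Tuple[Weights, Mask]:
    mask = [1] * len(init)
    for _ in range(rounds):
        start = [w * m for w, m in zip(init, mask)]  # rewind survivors to their original init
        trained = train_fn(start, mask)
        mask = prune_by_magnitude(trained, mask, fraction)
    ticket = [w * m for w, m in zip(init, mask)]     # the winning ticket uses the original init
    return ticket, mask
```

With ten weights, two rounds, and a 20% pruning rate, the first round removes two weights and the second removes one more (20% of the eight survivors, rounded down).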
Pelee: a real-time object detection system on mobile devices Paper Review (LEE HOSEONG)
This document summarizes the Pelee object detection system which uses the PeleeNet efficient feature extraction network for real-time object detection on mobile devices. PeleeNet improves on DenseNet with two-way dense layers, a stem block, dynamic bottleneck layers, and transition layers without compression. Pelee uses SSD with PeleeNet, selecting fewer feature maps and adding residual prediction blocks for faster, more accurate detection compared to SSD and YOLO. The document concludes that PeleeNet and Pelee achieve real-time classification and detection on devices, outperforming existing models in speed, cost and accuracy with simple code.
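The effect of "transition layers without compression" on channel counts can be checked with simple bookkeeping: each dense layer concatenates `growth_rate` new channels onto its input, and a transition layer multiplies the channel count by a compression factor (0.5 in DenseNet-BC, 1.0 when there is no compression). A minimal sketch; the stage sizes in the usage are hypothetical, not PeleeNet's actual configuration.

```python
def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> int:
    """Channels after a dense block: every layer concatenates growth_rate new channels."""
    return in_channels + num_layers * growth_rate

def transition_channels(in_channels: int, compression: float = 1.0) -> int:
    """Channels after a transition layer; compression=1.0 means no compression."""
    return int(in_channels * compression)
```

For example, a hypothetical block with 32 input channels, 3 layers, and growth rate 32 outputs 128 channels; an uncompressed transition keeps all 128, whereas DenseNet-BC-style compression would halve them to 64.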
6. Method (1)
Overall architectures of the convolutional nets are manually predetermined.
Normal Cell - convolutional cells that return a feature map of the same dimension.
Reduction Cell - convolutional cells that return a feature map whose height and width are reduced by a factor of two (the initial operation uses a stride of two).
A common heuristic is used: double the number of filters in the output whenever the spatial activation size is reduced.
N and the number of filters in the initial convolution cell are free parameters chosen by the user.
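How N and the initial filter count determine the stacked architecture can be sketched as a cell plan: N Normal Cells per stage, with a Reduction Cell between stages that halves the spatial size and doubles the filter count. The CIFAR-style layout with two reductions is an assumption for illustration, and `nasnet_cell_plan` is a hypothetical helper.

```python
from typing import List, Tuple

Cell = Tuple[str, int, int]  # (cell type, number of filters, spatial size)

def nasnet_cell_plan(n: int, filters: int, spatial: int,
                     num_reductions: int = 2) -> List[Cell]:
    plan: List[Cell] = []
    for stage in range(num_reductions + 1):
        plan.extend([("normal", filters, spatial)] * n)  # Normal Cells keep the dimensions
        if stage < num_reductions:
            filters *= 2   # heuristic: double the filters when the spatial size is reduced
            spatial //= 2  # Reduction Cell halves height and width
            plan.append(("reduction", filters, spatial))
    return plan
```

With N = 2, 32 initial filters, and a 32 × 32 input, the plan has eight cells and ends with two Normal Cells of 128 filters at 8 × 8.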