[PR-358] Paper Review - Training Differentially Private Generative Models with Sinkhorn Divergence (HyunKyu Jeon)
Super Tickets in Pre-Trained Language Models (HyunKyu Jeon)
This document discusses finding "super tickets" in pre-trained language models by pruning attention heads and feed-forward layers. It shows that lightly pruning BERT models can improve generalization without degrading accuracy (a phase-transition phenomenon). The authors propose a pruning approach for multi-task fine-tuning of language models called "ticket sharing," in which pruned weights are shared across tasks. Experiments on the GLUE benchmark show that the super ticket and ticket-sharing methods consistently outperform unpruned baselines, with larger gains on smaller tasks. Analysis indicates that pruning reduces model variance and that some tasks share more task-specific knowledge than others.
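As a rough illustration of the structured-pruning idea behind super tickets, here is a minimal NumPy sketch that keeps a binary mask over attention heads and drops the least-important fraction. The importance scores, prune ratio, and function name are illustrative; they are not the paper's actual importance criterion or pruning schedule.

```python
import numpy as np

def prune_attention_heads(head_importance, prune_ratio=0.2):
    """Build a binary keep/prune mask over attention heads by dropping the
    least-important fraction (illustrative criterion, not the paper's).

    head_importance: array of shape (num_layers, num_heads)
    returns: mask of the same shape, 1 = keep, 0 = prune
    """
    flat = head_importance.flatten()
    k = int(len(flat) * prune_ratio)          # number of heads to drop
    threshold = np.partition(flat, k)[k]      # k-th smallest importance value
    return (head_importance >= threshold).astype(np.float32)

# Toy usage: 12 layers x 12 heads with random importance scores.
rng = np.random.default_rng(0)
importance = rng.random((12, 12))
mask = prune_attention_heads(importance, prune_ratio=0.2)
print(f"kept {int(mask.sum())} of {mask.size} heads")
```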
Domain Invariant Representation Learning with Domain Density Transformations (HyunKyu Jeon)
The document discusses domain-invariant representation learning, which aims to build models that generalize well to unseen domains, and contrasts it with domain adaptation. It proposes a method that enforces invariance of representations under transformations between domains and uses generative adversarial networks to implement those transformations. The approach achieves competitive results against state-of-the-art domain generalization methods on several datasets.
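A hedged PyTorch sketch of the kind of invariance regularizer this implies: representations of an input and of its domain-translated version (produced by a pre-trained GAN-style generator) are pulled together. The generator interface, the MSE form of the penalty, and the stand-in modules in the usage example are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def invariance_loss(encoder, generator, x, src_domain, tgt_domain):
    """Penalize the distance between representations of x and of its
    domain-translated counterpart G(x, src -> tgt), encouraging the encoder
    to be invariant to the transformation between domains (sketch only)."""
    with torch.no_grad():
        x_translated = generator(x, src_domain, tgt_domain)  # GAN-based domain mapping
    z = encoder(x)
    z_translated = encoder(x_translated)
    return F.mse_loss(z, z_translated)

# Toy check with stand-in modules (a real setup would use a CNN encoder and
# a StarGAN-like generator conditioned on domain labels):
enc = torch.nn.Linear(8, 4)
gen = lambda x, s, t: x + 0.01 * torch.randn_like(x)  # stand-in "translation"
x = torch.randn(16, 8)
print(invariance_loss(enc, gen, x, src_domain=0, tgt_domain=1))

# Total objective (illustrative weighting):
# loss = task_loss(classifier(enc(x)), y) + lam * invariance_loss(enc, gen, x, s, t)
```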
Meta Back-Translation (HyunKyu Jeon)
This document summarizes Meta Back-Translation, a method that improves back-translation by training the backward model to directly optimize the performance of the forward model during training. The key points are:
1. Back-translation typically relies on a fixed backward model, which can lead the forward model to overfit to its outputs. Meta back-translation instead continually trains the backward model to generate pseudo-parallel data that improves the forward model.
2. Experiments show that Meta Back-Translation produces fewer pathological outputs, such as translations whose length differs greatly from the references. It also avoids both overfitting and underfitting of the forward model by flexibly controlling the diversity of the pseudo-parallel data.
3. Related work also leverages monolingual data for machine translation, typically with a backward model that is trained separately and then held fixed.
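The core idea in the summary above can be written as a bi-level objective: choose the backward model's parameters so that, after the forward model takes a gradient step on the pseudo-parallel data the backward model generates, the forward model performs well on real parallel data. The notation below (parameters, step size, data symbols) is mine, a paraphrase of the summary rather than the paper's exact formulation.

```latex
% Bi-level view of meta back-translation (notation assumed):
%   \theta_f: forward-model parameters, \theta_b: backward-model parameters,
%   y: target-language monolingual sentence, (x', y'): real parallel pair, \eta: step size.
\min_{\theta_b} \; \mathbb{E}_{(x', y')}\Big[\ell\big(x', y';\, \theta_f'(\theta_b)\big)\Big]
\qquad \text{where} \qquad
\theta_f'(\theta_b) = \theta_f - \eta\, \nabla_{\theta_f}\, \ell\big(\hat{x}, y;\, \theta_f\big),
\qquad \hat{x} \sim p_{\theta_b}(\,\cdot \mid y\,).
```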
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning (HyunKyu Jeon)
This document summarizes the Maxmin Q-learning paper published at ICLR 2020. Maxmin Q-learning addresses the overestimation bias of Q-learning and the underestimation bias of Double Q-learning by maintaining multiple Q-functions and using the minimum value across them when constructing the target of the Q-learning update. Actions are selected by taking the maximum over actions of the per-action minimum Q-value, and at each step a random subset of the Q-functions is updated toward this maxmin target. This approach reduces the biases seen in prior methods.
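A minimal tabular sketch of the update described above; the hyperparameters, array shapes, and function name are illustrative choices, not values from the paper.

```python
import numpy as np

def maxmin_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, n_update=1, rng=None):
    """One Maxmin Q-learning update on tabular Q-functions.

    Q: array of shape (N, num_states, num_actions) holding N Q-tables.
    The target takes the elementwise minimum over the N Q-tables, then the
    max over actions at s_next; a random subset of the tables is updated.
    """
    if rng is None:
        rng = np.random.default_rng()
    q_min = Q.min(axis=0)                        # min over the N Q-functions
    target = r + gamma * q_min[s_next].max()     # max_a of the min-Q at s'
    for i in rng.choice(len(Q), size=n_update, replace=False):
        Q[i, s, a] += alpha * (target - Q[i, s, a])
    return Q

# Toy usage: N=4 Q-tables, 10 states, 2 actions.
Q = np.zeros((4, 10, 2))
Q = maxmin_q_update(Q, s=0, a=1, r=1.0, s_next=3)

# Action selection is greedy w.r.t. the min-Q (plus exploration):
# a = argmax_a  min_i Q[i, s, a]
```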
Ten-Minute Math: Distance (십분수학: 거리) (HyunKyu Jeon)
Distance in data science and mathematics: Euclidean distance, Manhattan distance, Minkowski distance, and Mahalanobis distance.
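A small self-contained NumPy example of the four distances mentioned; the vectors, the choice of p = 3 for Minkowski, and the synthetic sample used to estimate the covariance for Mahalanobis are arbitrary.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))           # L2 norm of the difference
manhattan = np.sum(np.abs(x - y))                   # L1 norm
minkowski = np.sum(np.abs(x - y) ** 3) ** (1 / 3)   # general Lp norm, here p = 3

# Mahalanobis distance needs an inverse covariance matrix estimated from data;
# the sample below is synthetic, just to keep the example self-contained.
data = np.random.default_rng(0).normal(size=(100, 3))
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = x - y
mahalanobis = np.sqrt(diff @ cov_inv @ diff)

print(euclidean, manhattan, minkowski, mahalanobis)
```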