This document discusses automated machine learning (AutoML) and hyperparameter optimization. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization through predictive parallel evaluation: a Gaussian process models the objective function, and candidate points are evaluated predictively in parallel, reducing the number of actual function evaluations the Nelder-Mead method requires. The reported results show a 49-63% reduction in evaluations compared to baseline methods.
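As a loose illustration of the idea (a minimal sketch, not the paper's algorithm, and without the parallelism), the code below lets a Gaussian-process surrogate answer Nelder-Mead's queries whenever its predictive variance is small, falling back to the true objective otherwise; the objective, kernel, and confidence threshold are all illustrative assumptions.

```python
# Sketch: skip true evaluations inside Nelder-Mead when a GP surrogate
# is confident. Illustrative only; not the paper's exact procedure.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_objective(x):              # stand-in for a real validation loss
    return float(np.sum(x ** 2))

X_hist, y_hist = [], []                  # history of true evaluations
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def surrogate_objective(x, var_threshold=1e-3):
    if len(X_hist) >= 5:                 # need some data before trusting the GP
        gp.fit(np.array(X_hist), np.array(y_hist))
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        if sigma[0] ** 2 < var_threshold:
            return float(mu[0])          # confident: answer from the GP
    y = expensive_objective(x)           # otherwise evaluate and record
    X_hist.append(np.asarray(x, dtype=float))
    y_hist.append(y)
    return y

res = minimize(surrogate_objective, x0=np.array([2.0, -1.5]),
               method="Nelder-Mead")
print(res.x, len(X_hist), "true evaluations")
```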
The document discusses deep kernel learning, which combines deep learning and Gaussian processes (GPs). It briefly reviews the predictive equations and marginal likelihood for GPs, noting their computational requirements. GPs assume a dataset of input vectors and target values and model the targets as jointly Gaussian, with the distribution determined by a mean function and covariance kernel; the predictive distribution at test points is also Gaussian. The goal of deep kernel learning is to leverage recent work on efficiently representing kernel functions to produce scalable deep kernels, enabling the combined models to outperform standalone deep learning and GPs on various datasets.
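For reference, the GP predictive equations and log marginal likelihood being reviewed take the standard textbook form below (training inputs X, targets y, kernel matrix K_XX, noise variance sigma^2); the matrix inverse and log-determinant cost O(n^3) for n training points, which is the computational burden noted above.

```latex
% Standard GP predictive mean/covariance and log marginal likelihood.
\begin{align}
  \bar{f}_*          &= K_{*X}\,[K_{XX} + \sigma^2 I]^{-1}\,\mathbf{y} \\
  \mathrm{cov}(f_*)  &= K_{**} - K_{*X}\,[K_{XX} + \sigma^2 I]^{-1}\,K_{X*} \\
  \log p(\mathbf{y}\mid X) &= -\tfrac{1}{2}\,\mathbf{y}^{\top}[K_{XX} + \sigma^2 I]^{-1}\mathbf{y}
    - \tfrac{1}{2}\log\left|K_{XX} + \sigma^2 I\right| - \tfrac{n}{2}\log 2\pi
\end{align}
```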
This document summarizes a research paper on scaling laws for neural language models. Some key findings of the paper include:
- Language model performance depends strongly on model scale and weakly on model shape. With enough compute and data, performance scales as a power law of parameters, compute, and data (see the sketch after this list).
- Overfitting is universal, with penalties depending on the ratio of parameters to data.
- Large models have higher sample efficiency and can reach the same performance with fewer optimization steps and data points.
- The paper motivated subsequent work by OpenAI on applying scaling laws to other domains like computer vision and developing increasingly large language models like GPT-3.
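To make the power-law form concrete, the sketch below encodes the approximate fitted constants reported by Kaplan et al. (2020); the exact values depend on their dataset and training setup and are quoted here only for illustration.

```python
# Approximate power-law fits from Kaplan et al. (2020); the constants
# are the paper's reported values for their setup, shown for illustration.
def loss_vs_params(N, N_c=8.8e13, alpha_N=0.076):
    """Test loss (nats) vs. non-embedding parameter count N."""
    return (N_c / N) ** alpha_N

def loss_vs_data(D, D_c=5.4e13, alpha_D=0.095):
    """Test loss (nats) vs. dataset size D in tokens."""
    return (D_c / D) ** alpha_D

def loss_joint(N, D, N_c=8.8e13, D_c=5.4e13, alpha_N=0.076, alpha_D=0.095):
    """Combined L(N, D): the second term is the overfitting penalty,
    which grows when parameters outpace data."""
    return ((N_c / N) ** (alpha_N / alpha_D) + D_c / D) ** alpha_D

print(loss_vs_params(1e9))        # roughly 2.4 nats for a 1B-parameter model
print(loss_joint(1e9, 1e10))
```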
The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that can affect model performance, and notes that as models become more complex, the number of hyperparameters increases, making manual tuning difficult. It formulates hyperparameter optimization as a black-box optimization problem to minimize validation loss and discusses challenges like high function evaluation costs and lack of gradient information.
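A minimal sketch of that black-box formulation, with random search standing in for the optimizer (the model, search space, and data are illustrative assumptions): each call to the objective requires a full training run, exposes no gradients, and is therefore expensive.

```python
# Hyperparameter optimization as black-box minimization of validation
# loss. Random search is a stand-in; the model and ranges are illustrative.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def validation_loss(hparams):
    """Black-box objective: each call trains a model; no gradients."""
    model = MLPClassifier(hidden_layer_sizes=(int(hparams["width"]),),
                          alpha=hparams["alpha"], max_iter=300,
                          random_state=0)
    model.fit(X_tr, y_tr)
    return 1.0 - model.score(X_val, y_val)   # validation error

rng = np.random.default_rng(0)
best = None
for _ in range(10):                           # each evaluation is costly
    h = {"width": int(rng.integers(16, 129)),
         "alpha": 10 ** rng.uniform(-5, -1)}
    loss = validation_loss(h)
    if best is None or loss < best[0]:
        best = (loss, h)
print(best)
```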
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image (see the loss sketch after this list).
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
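As a concrete example of item 2, here is a minimal sketch of an InfoNCE-style contrastive loss of the kind MoCo builds on; using in-batch negatives and this particular temperature are simplifying assumptions rather than any single paper's recipe.

```python
# InfoNCE-style contrastive loss: pull matching views together, push
# other samples in the batch apart. Shapes and temperature are illustrative.
import torch
import torch.nn.functional as F

def info_nce(query, keys, temperature=0.2):
    """query, keys: (B, D) embeddings of two augmented views of the
    same B images; row i of keys is the positive for row i of query."""
    q = F.normalize(query, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(q.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```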
Miscellaneous object detection talk disguised as an explanation of You Only Look One-level Feature (Yusuke Uchida)
Presentation slides from the 7th All-Japan Computer Vision Study Group "CVPR2021 Reading Session" (Part 1).
https://kantocv.connpass.com/event/216701/
Explains You Only Look One-level Feature, along with general discussion of YOLO-family detectors and a broad look at related object detection methods.
Semi-supervised, weakly-supervised, unsupervised, and active learning (Yusuke Uchida)
An overview of semi-supervised learning, weakly-supervised learning, unsupervised learning, and active learning, focused on recent deep learning-based image recognition approaches.
Slides presented at an internal reading group in DeNA's AI Systems Department, introducing the types of deep fakes and methods for detecting them.
Mainly covers the following papers:
S. Agarwal, et al., "Protecting World Leaders Against Deep Fakes," in Proc. of CVPR Workshop on Media Forensics, 2019.
A. Rossler, et al., "FaceForensics++: Learning to Detect Manipulated Facial Images," in Proc. of ICCV, 2019.