You Only Look One-level Feature Explained, Plus Assorted Object Detection Topics - Yusuke Uchida
Presentation slides from the 7th All-Japan Computer Vision Study Group "CVPR2021 Reading Session" (Part 1).
https://kantocv.connpass.com/event/216701/
The slides explain You Only Look One-level Feature, along with general discussion of the YOLO family and a broad overview of related object detection methods.
This document summarizes the correspondence between single-layer neural networks and Gaussian processes (GPs). It reviews how the outputs of a single-layer neural network converge to a GP in the infinite-width limit, with the network's covariance function determined by its architecture. The document derives the mean and covariance functions for the GP corresponding to a single-layer network, and notes that different network outputs are independent GPs.
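The limiting statement can be made concrete. A sketch of the standard single-hidden-layer derivation, using assumed notation not taken from the document (width $N$, i.i.d. weights with variances $\sigma_w^2/N$ and $\sigma_b^2$ for biases, nonlinearity $\phi$):

```latex
% Output unit i of a single-hidden-layer network on input x:
z_i(x) = b_i + \sum_{j=1}^{N} W_{ij}\,\phi\!\big(a_j(x)\big),
\qquad
a_j(x) = b_j^{0} + \sum_{k} W^{0}_{jk}\, x_k .

% As N -> infinity, the central limit theorem gives z_i(x) ~ GP(0, K)
% with covariance determined by the architecture:
K(x, x') = \sigma_b^2
  + \sigma_w^2\, \mathbb{E}\!\left[\phi\big(a(x)\big)\,\phi\big(a(x')\big)\right].
```

Distinct outputs $z_i$, $z_j$ have independent readout weights, which is why they become independent GPs in the limit, as the document notes.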
This document discusses the connections between generative adversarial networks (GANs) and energy-based models (EBMs). It shows that GAN training can be interpreted as approximating maximum likelihood training of an EBM by replacing the intractable data distribution with a generator distribution. Specifically:
1. GANs train a discriminator to estimate the energy function of an EBM, while the generator minimizes the energy of its own samples.
2. EBM training can be seen as alternately updating the generator and sampling from it, in a manner similar to contrastive divergence for EBMs.
3. This perspective unifies GANs and EBMs, and suggests ways to combine their training procedures to leverage their respective advantages.
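The approximation at the heart of this connection can be written out. A sketch, with notation ($E_\theta$, $Z_\theta$, generator $G$) assumed rather than taken from the slides:

```latex
% EBM with energy E_theta and partition function Z_theta:
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}

% Gradient of the maximum-likelihood objective (negative log-likelihood):
\nabla_\theta\, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[-\log p_\theta(x)\right]
 = \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\nabla_\theta E_\theta(x)\right]
 - \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta E_\theta(x)\right]

% The second expectation is intractable; the GAN-style interpretation
% replaces samples from p_theta with generator samples x = G(z):
 \approx \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\nabla_\theta E_\theta(x)\right]
 - \mathbb{E}_{z \sim p(z)}\!\left[\nabla_\theta E_\theta\big(G(z)\big)\right]
```

The discriminator plays the role of $E_\theta$: it is pushed down on data and up on generator samples, while the generator is trained to produce low-energy samples.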
The document lists three conferences: ECCV2016 from October 8-16, ACMMM2016 from October 15-19, and MADiMa2016 on October 16. It provides some details about papers presented at ECCV2016 and ACMMM2016, including papers on deep learning, ingredient recognition for recipes, and adaptive facial expression feedback. It also mentions receptions for ECCV2016 and ACMMM2016.
- The document discusses generative adversarial networks (GANs) which use two neural networks, a generator (G) and discriminator (D), that compete against each other. The generator tries to generate fake images that look real, while the discriminator tries to tell the difference between real and fake images.
- GANs can be used for tasks like image-to-image translation; models such as CycleGAN add a cycle consistency loss so that mappings between two domains can be learned while preserving image content.
- The document provides examples of neural network architectures that can be used for the generator and discriminator, and discusses training procedures like using auxiliary classifier losses to improve GAN training stability and quality.
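The adversarial objective described above has a well-known closed form at the optimal discriminator, $D^*(x) = p_{\mathrm{data}}(x)/(p_{\mathrm{data}}(x)+p_g(x))$, which can be checked numerically. A minimal sketch with 1-D Gaussians; all distributions and names here are illustrative assumptions, not the document's setup:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def value_function(mu_data, mu_g, sigma=1.0, n=200_000, seed=0):
    """V(D*, G) = E_data[log D*(x)] + E_g[log(1 - D*(x))], by Monte Carlo."""
    rng = np.random.default_rng(seed)
    x_data = rng.normal(mu_data, sigma, n)   # "real" samples
    x_g = rng.normal(mu_g, sigma, n)         # "fake" samples

    def d_star(x):
        # Optimal discriminator for known densities.
        p_d = gaussian_pdf(x, mu_data, sigma)
        p_g = gaussian_pdf(x, mu_g, sigma)
        return p_d / (p_d + p_g)

    return np.mean(np.log(d_star(x_data))) + np.mean(np.log(1 - d_star(x_g)))

# When the generator matches the data, V attains its minimum -log 4;
# any mismatch raises V, since V = -log 4 + 2 * JSD(p_data || p_g).
v_matched = value_function(0.0, 0.0)
v_mismatched = value_function(0.0, 3.0)
print(v_matched, v_mismatched)
```

With identical distributions $D^*$ is exactly 0.5 everywhere, so `v_matched` equals $-\log 4 \approx -1.386$ regardless of sampling noise, while the mismatched case is strictly larger.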
The document summarizes the 2017 UEC Tokyo conference. It provides details about:
- The conference topics which included Keras, TensorFlow, deep learning, computer vision, and generative models.
- The schedule which included keynotes on October 20th and technical sessions from October 21st to 22nd.
- Information on registration, venue, and organizers of the conference.
This document describes research on running deep learning models on mobile devices. The researchers created Caffe2C, which converts Caffe models and parameters into a single C source code. This allows deep learning models trained in Caffe to run efficiently on mobile. Caffe2C is 15x faster than OpenCV DNN. Four mobile apps were created demonstrating Caffe2C: DeepFoodCam for food recognition, DeepStyleCam for neural style transfer, DeepMaterialCam for image translation, and DeepTextInpaintCam for text inpainting.
1. The document describes a mobile image recognition system using a CNN model called Network-in-Network. It was implemented as iOS and Android apps that can recognize food images without needing an online server.
2. The system achieves high accuracy of 78.8% for top-1 and 95.2% for top-5 recognition of food images from the UECFOOD100 dataset, with a processing time of 55.7 ms per image. It uses techniques like batch normalization and multi-threading to optimize performance on mobile.
3. The architecture was modified from the original Network-in-Network by adding batch normalization, reducing layers and kernels, and using multiple image sizes to balance recognition accuracy and speed. Global average pooling is used in place of fully connected layers, as in the original Network-in-Network design.
This document describes the mathematical operations of a convolutional layer in a neural network. It shows that a convolutional layer can be represented as a matrix multiplication between the input feature maps and the convolutional kernels. To perform this matrix multiplication, the input is first transformed using im2col to form a 2D matrix, where each column consists of a patch from the input. This matrix is then multiplied with the kernel matrix to produce the output feature maps.
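The im2col transformation described above can be sketched in a few lines. Shapes and names here are illustrative assumptions (single image, no padding, stride 1):

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) input into a (C*k*k, out_h*out_w) patch matrix:
    each column is one flattened k-by-k receptive field across all channels."""
    c, h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k]      # one receptive field
            cols[:, i * out_w + j] = patch.ravel()
    return cols

def conv2d_im2col(x, kernels):
    """kernels: (F, C, k, k) -> output (F, out_h, out_w) via one matmul."""
    f, c, k, _ = kernels.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    w_mat = kernels.reshape(f, c * k * k)       # one row per filter
    out = w_mat @ im2col(x, k)                  # the whole conv as a GEMM
    return out.reshape(f, out_h, out_w)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 5))                  # C=3 channels, 5x5 input
kernels = rng.normal(size=(4, 3, 3, 3))         # F=4 filters, 3x3 kernels
y = conv2d_im2col(x, kernels)
print(y.shape)  # (4, 3, 3)
```

Turning convolution into a single matrix multiplication is what lets implementations reuse highly optimized GEMM routines, at the cost of the extra memory the patch matrix consumes.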
Caffe2C: A Framework for Easy Implementation of CNN-based Mobile Applications - Ryosuke Tanno
Caffe2C is a framework that converts CNN models and parameters trained in Caffe into a single C source code file that can run efficiently on mobile devices. It achieves faster runtime than OpenCV DNN by directly compiling the network like a compiler rather than interpreting it. The authors implemented 4 image recognition apps for iOS using Caffe2C: DeepFoodCam (food), DeepBirdCam (birds), DeepDogCam (dogs), and DeepFlowerCam (flowers). They fine-tuned pre-trained models on various datasets, achieving top-1 accuracies of 69-74.5% across the different classification tasks. Caffe2C allows easy development of CNN-based mobile apps using models trained in Caffe.
The document discusses conditional generative adversarial networks (GANs) for image-to-image translation tasks. It presents the conditional CycleGAN model which uses cycle consistency loss to learn mappings between domains without paired training examples. The model consists of generators and discriminators trained in an adversarial manner to translate images from one domain to another and back again.
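The cycle consistency loss mentioned above has a standard form (from the CycleGAN formulation, with generators $G: X \to Y$ and $F: Y \to X$); writing it out clarifies why paired examples are unnecessary:

```latex
% Each translated image must map back to its original:
\mathcal{L}_{\mathrm{cyc}}(G, F)
 = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\big[\lVert F(G(x)) - x \rVert_1\big]
 + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\!\big[\lVert G(F(y)) - y \rVert_1\big]

% Full objective: adversarial losses in both directions plus the cycle term,
% weighted by a hyperparameter lambda:
\mathcal{L}(G, F, D_X, D_Y)
 = \mathcal{L}_{\mathrm{GAN}}(G, D_Y)
 + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
 + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The adversarial terms only constrain the output *distributions*, so without the cycle term the mapping could scramble content; the reconstruction constraint is what preserves it.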
1. AR DeepCalorieCam is a mobile application that uses augmented reality and a multi-task convolutional neural network to detect and estimate the calorie content of foods in images.
2. The multi-task CNN was trained on over 83,000 images from the UECFOOD100 dataset to perform both food detection and calorie estimation simultaneously.
3. In user tests, the multi-task CNN achieved calorie estimation accuracy within 50 calories for 76% of images and within 85 calories for 80% of images, representing an improvement over previous single-task methods.
This slide compares the runtime of the OpenCV DNN module and our method; it is part of a slide deck presented at a workshop of the 13th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS) (http://mobiquitous.org/2016/show/home). Details of the image recognition part are omitted.
Assorted hyperparameters
1. Overall network shape
- Number of layers
- Choice of model
- Choice of classifier
2. Per-layer model parameters
- Number of hidden units
- Weight regularization
- Unit sparsification
- Choice of activation function
3. Optimization parameters
- Learning method
- Learning rate (initial value, decay)
- Mini-batch size
- Number of iterations
- Momentum
4. Other
- Weight initialization (scale)
- Random seed
- Preprocessing
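In practice these knobs are usually collected into a single training configuration. A hypothetical sketch; every name and value below is an illustrative placeholder, not a recommendation from the slides:

```python
# The hyperparameter categories above, gathered into one (hypothetical) config.
config = {
    "architecture": {
        "num_layers": 3,
        "model": "mlp",                  # choice of model / classifier
    },
    "layer": {
        "hidden_units": 256,
        "weight_decay": 1e-4,            # weight regularization
        "sparsity_penalty": 0.0,         # unit sparsification
        "activation": "relu",
    },
    "optimization": {
        "method": "sgd",                 # learning method
        "learning_rate": 0.1,            # initial value
        "lr_decay": 0.95,                # decay per epoch
        "batch_size": 128,               # mini-batch size
        "iterations": 10_000,
        "momentum": 0.9,
    },
    "other": {
        "init_scale": 0.01,              # weight initialization scale
        "seed": 42,                      # random seed
        "preprocessing": "standardize",
    },
}
print(sorted(config))
```

Keeping the groups explicit mirrors the taxonomy in the list and makes it easy to sweep one category (say, optimization) while holding the others fixed.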