The document discusses texture analysis in computer vision. It begins by asking what texture is and whether objects themselves can be considered textures. It then outlines several statistical and Fourier approaches to texture analysis, citing specific papers on texture energy measures, texton theory, and using textons to model materials. Deep convolutional neural networks are also discussed as being able to recognize and describe texture through learned filter banks. The concept of texels is introduced: low-level features that make up texture at different scales, from edges to shapes. The document hypothesizes that CNNs are sensitive to texture because texture repeats across images while object shapes do not, and that CNNs act as texture mappers rather than template matchers. It also asks whether the primary visual cortex performs a similar texture-based analysis.
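As a concrete reference point for the statistical approaches mentioned above, here is a minimal sketch of Laws-style texture energy measures in NumPy/SciPy; the window size and the particular kernel pair are illustrative assumptions, not the deck's exact settings.

```python
import numpy as np
from scipy.signal import convolve2d

# Laws' classic 1D kernels: Level, Edge, Spot, Ripple
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)
S5 = np.array([-1, 0, 2, 0, -1], dtype=float)
R5 = np.array([1, -4, 6, -4, 1], dtype=float)

def texture_energy(img, k1, k2, win=15):
    """Convolve a grayscale image with a separable Laws mask
    (outer product of two 1D kernels), then average the absolute
    response over a local window to get a texture energy map."""
    mask = np.outer(k1, k2)                                  # 5x5 mask
    resp = convolve2d(img, mask, mode="same", boundary="symm")
    box = np.ones((win, win)) / (win * win)                  # local mean
    return convolve2d(np.abs(resp), box, mode="same", boundary="symm")

# e.g. the E5L5 energy map, a classic edge-texture measure:
# energy = texture_energy(img, E5, L5)
```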
This document discusses an active learning technique, BADGE (Batch Active learning by Diverse Gradient Embeddings). It represents each unlabeled sample by its gradient embedding, then runs k-means++ initialization to select a batch: the sample with the maximum 2-norm is chosen first, and subsequent samples are chosen for being far from those already selected, added to the set iteratively. This greedy, distance-driven procedure yields a diverse batch. The technique could improve over entropy-based and core-set selection approaches for active learning with convolutional neural networks.
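A minimal sketch of that selection step, assuming the gradient embeddings have already been computed as a NumPy array; like BADGE, it seeds with the largest-norm point and then applies k-means++-style D² sampling:

```python
import numpy as np

def kmeanspp_select(grad_embeddings, k, seed=None):
    """Select k diverse samples from gradient embeddings (n, d).
    Start from the largest-norm point, then repeatedly sample points
    with probability proportional to squared distance to the nearest
    already-selected point (k-means++ / D^2 seeding)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(grad_embeddings, dtype=float)
    selected = [int(np.argmax(np.linalg.norm(X, axis=1)))]
    d2 = np.sum((X - X[selected[0]]) ** 2, axis=1)
    while len(selected) < k:
        idx = int(rng.choice(len(X), p=d2 / d2.sum()))
        selected.append(idx)
        # each point keeps its distance to the closest selected sample
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return selected

# e.g. pick a batch of 10 to send for labeling:
# batch = kmeanspp_select(embeddings, k=10)
```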
Neural Radiance Fields (NeRF) represent a scene as a neural network that maps a 5D input (3D position and 2D viewing direction) to a 4D output (RGB color and volume density). NeRF uses an MLP trained to predict density and color for a scene from many camera views. Key aspects include positional encodings of the inputs, which let the MLP represent high-frequency detail, conditioning on the viewing direction to model view-dependent effects, and training against integrated color and density values along camera rays. NeRF has enabled novel applications beyond novel view synthesis, including pose estimation, dense descriptors, and self-supervised segmentation.
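A minimal PyTorch sketch of the positional encoding step; the frequency count of 10 matches the paper's setting for positions, though some implementations also scale the angles by pi:

```python
import torch

def positional_encoding(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """Map each coordinate p to (sin(2^k p), cos(2^k p)) for
    k = 0..num_freqs-1, so the MLP can fit high-frequency detail.
    x: (..., D) tensor of positions or view directions."""
    freqs = 2.0 ** torch.arange(num_freqs)   # 1, 2, 4, ..., 2^(L-1)
    angles = x[..., None] * freqs            # (..., D, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                   # (..., D * 2 * num_freqs)

# A 3D position becomes a 60-dim feature with num_freqs=10:
# feat = positional_encoding(torch.rand(3))
```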
The document discusses various pooling operations used in image processing and convolutional neural networks (CNNs). It provides an overview of common pooling methods like max pooling, average pooling, and spatial pyramid pooling. It also discusses more advanced and trainable pooling techniques like stochastic pooling, mixed/gated pooling, fractional pooling, local importance pooling, and global feature guided local pooling. The document analyzes the tradeoffs of different pooling methods and how they can balance preserving details versus achieving invariance to changes in position or lighting. It references several influential papers that analyzed properties of pooling operations.
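To make the trainable end of this spectrum concrete, here is a minimal PyTorch sketch of one mixed-pooling variant: a learnable blend of max and average pooling. The single global blend parameter is a simplifying assumption (gated pooling would compute the blend per region from the input).

```python
import torch
import torch.nn as nn

class MixedPool2d(nn.Module):
    """Learnable blend of max and average pooling:
    out = a * maxpool(x) + (1 - a) * avgpool(x),
    with a trained end-to-end and kept in [0, 1] via a sigmoid."""
    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.max_pool = nn.MaxPool2d(kernel_size, stride)
        self.avg_pool = nn.AvgPool2d(kernel_size, stride)
        self.logit_a = nn.Parameter(torch.zeros(1))  # a = 0.5 at init

    def forward(self, x):
        a = torch.sigmoid(self.logit_a)
        return a * self.max_pool(x) + (1 - a) * self.avg_pool(x)

# x = torch.randn(1, 8, 32, 32)
# MixedPool2d()(x).shape  -> torch.Size([1, 8, 16, 16])
```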
This document discusses background elimination techniques which involve three main steps: object detection to select the target, segmentation to isolate the target from the background, and refinement to improve the quality of the segmented mask. It provides an overview of approaches that have been used for each step, including early methods based on SVM and more recent deep learning-based techniques like Mask R-CNN that integrate detection and segmentation. The document also notes that segmentation is challenging without object detection cues and discusses types of segmentation as well as refinement methods that use transformations, dimension reduction, and graph-based modeling.
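As an illustration of the integrated detection-plus-segmentation step, a minimal sketch using torchvision's pretrained Mask R-CNN; the score threshold and keep-best-mask policy are illustrative assumptions, and the refinement step is omitted:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Pretrained Mask R-CNN: detection and segmentation in one model.
model = maskrcnn_resnet50_fpn(pretrained=True).eval()

def remove_background(image, score_thresh=0.7):
    """image: float tensor (3, H, W) in [0, 1]. Zeroes out everything
    outside the highest-scoring detected instance mask."""
    with torch.no_grad():
        out = model([image])[0]           # boxes, labels, scores, masks
    keep = out["scores"] > score_thresh
    if not keep.any():
        return image                      # nothing confident detected
    mask = (out["masks"][keep][0, 0] > 0.5).float()  # (H, W) hard mask
    return image * mask                   # refinement step would follow
```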
1. TL-GAN finds feature axes in the latent space to control generated images without fine-tuning the neural network.
2. It discovers correlations between the latent vector Z and image labels by applying multivariate linear regression and normalizing the coefficients.
3. The vectors are then adjusted to be orthogonal, allowing different properties to be controlled independently while unlabeled data is labeled to add descriptions (a sketch of steps 2 and 3 follows below).
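A minimal NumPy sketch of steps 2 and 3, assuming latent codes Z and attribute labels y are available as arrays; the least-squares fit and Gram-Schmidt pass are straightforward stand-ins for the regression-and-orthogonalize procedure described above:

```python
import numpy as np

def feature_axes(Z, y):
    """Multivariate linear regression from latent codes Z (n, d) to
    attribute labels y (n, k); each unit-normalized column of W is
    one feature axis in latent space."""
    W, *_ = np.linalg.lstsq(Z, y, rcond=None)            # (d, k)
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def orthogonalize(W):
    """Gram-Schmidt: make each axis orthogonal to the earlier ones,
    so moving along one attribute disturbs the others less."""
    Q = W.copy()
    for i in range(W.shape[1]):
        for j in range(i):
            Q[:, i] -= (Q[:, j] @ Q[:, i]) * Q[:, j]
        Q[:, i] /= np.linalg.norm(Q[:, i])
    return Q

# Shift one attribute of a latent code without retraining the GAN:
# z_edit = z + alpha * Q[:, smile_axis]
```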
Image matting is the process of separating the foreground and background of an image by assigning each pixel an alpha value between 0 and 1 indicating its foreground opacity. Traditionally, matting uses a trimap to classify pixels as foreground, background, or uncertain. Early sampling-based methods calculated alpha values from feature distances to the closest foreground and background pixels. More recent approaches use deep learning: the first deep learning matting method, in 2016, took local and non-local information as input, and the 2017 Deep Image Matting method used an RGB image and trimap as input in a fully deep learning framework.
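The underlying model is the standard matting equation I = αF + (1 − α)B; a minimal NumPy sketch:

```python
import numpy as np

def composite(fg, bg, alpha):
    """Matting equation I = alpha * F + (1 - alpha) * B, per pixel.
    fg, bg: (H, W, 3) float images; alpha: (H, W) in [0, 1]."""
    return alpha[..., None] * fg + (1.0 - alpha[..., None]) * bg

# With an estimated alpha matte, the foreground can be re-composited
# onto a new background: new_img = composite(fg, new_bg, alpha)
```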
Multi object Deep reinforcement learning - Dong Heon Cho
This document discusses multi-objective reinforcement learning and introduces Deep OLS Learning, which combines optimistic linear support (OLS) with deep Q-networks. It presents Deep OLS Learning with Partial Reuse and Full Reuse to handle multi-objective Markov decision processes by finding a coverage set of policies that optimize multiple conflicting objectives, such as maximizing server performance while minimizing power consumption. The approach is evaluated on multi-objective versions of the mountain car and deep sea treasure problems.
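OLS hinges on linear scalarization: a weight vector turns the vector-valued return into a scalar one, and the algorithm searches weight space for policies that are optimal somewhere. A minimal sketch of the scalarized action choice, with the example weights as an illustrative assumption:

```python
import numpy as np

def scalarized_greedy_action(q_vec, w):
    """q_vec: (num_actions, num_objectives) vector-valued Q estimates
    for one state; w: (num_objectives,) preference weights, w >= 0.
    OLS repeatedly proposes weight vectors, solves the scalarized MDP,
    and keeps each policy that is optimal for some w."""
    return int(np.argmax(q_vec @ w))

# e.g. trade off server performance vs. power consumption:
# a = scalarized_greedy_action(q_vec, np.array([0.7, 0.3]))
```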
Multi agent reinforcement learning for sequential social dilemmas - Dong Heon Cho
This document summarizes research on multi-agent reinforcement learning in sequential social dilemmas. It discusses how sequential social dilemmas extend traditional matrix games by adding temporal aspects like partial observability. Simulation experiments are described where agents learn cooperative or defective policies for tasks like fruit gathering and wolfpack hunting in a partially observable environment. The agents' learned policies are then used to construct an empirical payoff matrix to analyze whether cooperation or defection is rewarded more, relating the multi-agent reinforcement learning results back to classic social dilemmas.
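One common formalization of when such an empirical payoff matrix constitutes a social dilemma is the set of inequalities below (the usual greed/fear conditions; treating these as the deck's exact criteria is an assumption):

```python
def is_social_dilemma(R, P, S, T):
    """Empirical payoff matrix entries:
    R: both cooperate, P: both defect,
    S: cooperate against a defector, T: defect against a cooperator.
    A dilemma requires mutual cooperation to be preferable (R > P,
    R > S) yet individually tempting to defect (greed: T > R) or
    risky to cooperate (fear: P > S)."""
    return R > P and R > S and (T > R or P > S)

# The classic prisoner's dilemma ordering T > R > P > S qualifies:
# is_social_dilemma(R=3, P=1, S=0, T=5)  -> True
```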
This document discusses multi-agent systems and their applications. It provides examples of multi-agent systems for spacecraft control, manufacturing scheduling, and more. Key points:
- Multi-agent systems consist of interacting intelligent agents that can cooperate, coordinate, and negotiate to achieve goals. They offer benefits like robustness, scalability, and reusability.
- Challenges include defining global goals from local actions and incentivizing cooperation. Games like the prisoner's dilemma model social dilemmas around cooperation versus defection.
- The document outlines architectures like the blackboard model and BDI (belief-desire-intention) model. It also provides a manufacturing example using the JADE platform.
The document discusses Hybrid Reward Architecture (HRA), a reinforcement learning method that decomposes the reward function of an environment into multiple sub-reward functions. In HRA, each sub-reward function is learned by a separate agent using DQN. This allows HRA to learn complex reward functions more quickly and stably compared to using a single reward signal. An experiment is described where HRA learns to eat 5 randomly placed fruits in an environment over 300 steps more effectively than a standard DQN agent.
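A minimal PyTorch sketch of the decomposed value function: one Q-head per sub-reward on a shared trunk, with head values summed for action selection. The shared trunk and layer sizes are illustrative assumptions; as in HRA, each head would be trained with a DQN-style loss on its own reward channel.

```python
import torch
import torch.nn as nn

class HRAQNetwork(nn.Module):
    """Hybrid Reward Architecture sketch: one Q-head per sub-reward,
    sharing a feature trunk. Actions are chosen greedily with respect
    to the sum of the heads' Q-values."""
    def __init__(self, obs_dim, num_actions, num_heads, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_heads)]
        )

    def forward(self, obs):
        h = self.trunk(obs)
        q_per_head = torch.stack([head(h) for head in self.heads])  # (H, B, A)
        return q_per_head, q_per_head.sum(dim=0)                    # aggregate Q

# Acting: action = q_total.argmax(dim=-1); training: each head's TD
# loss uses only its own sub-reward r_h.
```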
Deep Learning AtoC with Image Perspective - Dong Heon Cho
Deep learning models like CNNs, RNNs, and GANs are widely used for image classification and computer vision tasks. CNNs are commonly applied to classification, detection, and segmentation by learning hierarchical image features. Encoder-decoder fully convolutional networks like SegNet perform pixel-level semantic segmentation, while Mask R-CNN performs instance segmentation by combining classification, bounding-box detection, and mask prediction. Deep learning has achieved state-of-the-art performance on many image applications due to its ability to learn powerful visual representations from large datasets.
The document discusses approaches for using deep learning with small datasets, including transfer learning techniques like fine-tuning pre-trained models, multi-task learning, and metric learning approaches for few-shot and zero-shot learning problems. It also covers domain adaptation techniques when labels are not available, as well as anomaly detection for skewed label distributions. Traditional models like SVM are suggested as initial approaches, with deep learning techniques applied if those are not satisfactory.
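A minimal PyTorch sketch of the fine-tuning route: freeze a pretrained backbone and retrain only a new classification head. The ResNet-18 backbone and 10-class head are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False            # keep pretrained features fixed

# Replace the classifier head for the small target dataset.
model.fc = nn.Linear(model.fc.in_features, 10)

# Train as usual; only model.fc receives gradient updates. With more
# data, unfreeze deeper blocks and fine-tune them at a lower LR.
```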
The document discusses domain adaptation and transfer learning techniques in deep learning, such as feature extraction, fine-tuning, and parameter sharing. It specifically describes domain-adversarial neural networks, which aim to make the source and target feature distributions indistinguishable, and domain separation networks, which extract domain-invariant and private features to model each domain separately.
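The adversarial objective in DANN is usually implemented with a gradient reversal layer; a minimal PyTorch sketch (the lambda schedule and the surrounding feature extractor and heads are left out):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity on the forward pass,
    multiplies gradients by -lambda on the backward pass, so the
    feature extractor learns to confuse the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# domain_logits = domain_head(grad_reverse(features, lam))
```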
This document discusses various techniques for compressing and speeding up deep neural networks, including singular value decomposition, pruning, and SqueezeNet. Singular value decomposition can be used to compress fully connected layers by minimizing the difference between the original weight matrix and its low-rank approximation. Pruning techniques remove unimportant weights below a threshold. SqueezeNet is highlighted as designing a small CNN architecture from the start that achieves AlexNet-level accuracy with 50x fewer parameters and less than 0.5MB in size.
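A minimal NumPy sketch of the SVD compression step: truncated SVD gives the best rank-r approximation of a weight matrix in Frobenius norm, so one large fully connected layer becomes two thin ones.

```python
import numpy as np

def low_rank_factor(W, rank):
    """Factor a fully connected layer's weights W (out, in) into
    A (out, rank) and B (rank, in) via truncated SVD, so that
    W @ x is replaced by the cheaper A @ (B @ x)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]     # fold singular values into U
    B = Vt[:rank]
    return A, B                    # W ≈ A @ B

# A 4096x4096 layer (~16.8M params) at rank 256 needs
# 2 * 4096 * 256 ≈ 2.1M params, roughly an 8x reduction.
```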