PR-355: Masked Autoencoders Are Scalable Vision Learners (Jinwon Lee)
- Masked Autoencoders Are Scalable Vision Learners presents a new self-supervised learning method called Masked Autoencoder (MAE) for computer vision.
- MAE works by masking random patches of input images, encoding the visible patches, and decoding to reconstruct the full image. This forces the model to learn visual representations from incomplete views of images.
- Experiments on ImageNet show that MAE achieves superior results compared to supervised pre-training from scratch as well as other self-supervised methods, scaling effectively to larger models. MAE representations also transfer well to downstream tasks like object detection, instance segmentation and semantic segmentation.
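The random masking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `random_patch_mask`, the 75% mask ratio, and the 196-patch grid are assumptions chosen for the example.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling.

    mask_ratio is an assumed hyperparameter for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

# Hypothetical 14 x 14 grid of patches (196 in total).
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Only the `visible` indices would be passed to the encoder; the decoder is then asked to reconstruct the patches listed in `masked`.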
Conditional Image Generation with PixelCNN Decoders (suga93)
The document summarizes research on conditional image generation using PixelCNN decoders. It discusses how PixelCNNs sequentially predict pixel values rather than the whole image at once. Previous work used PixelRNNs, but these were slow to train. The proposed approach uses a Gated PixelCNN that removes blind spots in the receptive field by combining horizontal and vertical feature maps. It also conditions PixelCNN layers on class labels or embeddings to generate conditional images. Experimental results show the Gated PixelCNN outperforms PixelCNN and achieves performance close to PixelRNN on CIFAR-10 and ImageNet, while training faster. It can also generate portraits conditioned on embeddings of people.
Neural Discrete Representation Learning - A paper review (Abhishek Koirala)
1) The paper introduces Vector Quantization Variational Autoencoders (VQ-VAEs), which use discrete rather than continuous latent codes. This allows the prior to be learned from the data distribution rather than assuming a fixed prior.
2) VQ-VAEs train with a loss that enforces the encoder outputs to be close to embeddings in a learned codebook. This allows generating new samples by sampling the prior rather than relying only on reconstruction.
3) Experiments show VQ-VAEs can generate images, video, and speech that retain semantic content, while achieving likelihoods comparable to continuous latent variable models. The discrete latent space captures long-term dependencies without supervision.
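The codebook lookup behind the discrete codes can be sketched as a nearest-neighbour search. This is a toy illustration under assumed names (`quantize`, a three-entry `codebook`), using squared Euclidean distance; it omits training and the loss terms entirely.

```python
def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry closest to vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
index, code = quantize([0.9, 1.2], codebook)
print(index, code)  # 1 [1.0, 1.0]
```

The encoder output is replaced by the selected entry, so the latent representation is the integer `index` rather than a continuous vector.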
The document discusses capsule networks, focusing on their architecture and functionality, including dynamic routing, activations, and handling of perspective transformations. It highlights the advantages of capsule networks, such as high accuracy on MNIST and robustness to affine transformations, while also addressing limitations like slow training and issues with closely clustered objects. Additionally, it provides links to various implementations and resources related to capsule networks.
The document discusses Vector Quantized Variational Autoencoder 2 (VQ-VAE2), a generative model that uses discrete latent representations. VQ-VAE2 builds upon VQ-VAE by introducing hierarchical discrete latent variables to generate high-fidelity images at resolutions up to 1024x1024. VQ-VAE2 uses a neural network architecture with residual and skip connections, sometimes with gating operations, to model discrete latent variables at multiple levels of abstraction for generating diverse, high-quality images.
This document discusses deep generative models including variational autoencoders (VAEs) and generative adversarial networks (GANs). It explains that generative models learn the distribution of input data and can generate new samples from that distribution. VAEs use variational inference to learn a latent space and generate new data by varying the latent variables. The document outlines the key concepts of VAEs including the evidence lower bound objective used for training and how it maximizes the likelihood of the data.
This document summarizes recent advances in deep generative models with explicit density estimation. It discusses variational autoencoders (VAEs), including techniques to improve VAEs such as importance weighting, semi-amortized inference, and mitigating posterior collapse. It also covers energy-based models, autoregressive models, flow-based models, vector-quantized VAEs, hierarchical VAEs, and diffusion probabilistic models. The document provides an overview of these generative models with a focus on density estimation and generation quality.
Architecture Design for Deep Neural Networks III (Wanjin Yu)
Neural architecture search aims to automate neural network design. Recent approaches include:
(1) Reinforcement learning searches over large spaces but requires extensive computation.
(2) One-shot approaches like DARTS jointly optimize weights and architecture, improving efficiency.
(3) New methods like Proxyless NAS directly search on target tasks and hardware, finding mobile architectures.
Neural architecture search represents progress toward fully automatic deep learning and more specialized models.
The document discusses the challenges faced in machine learning due to the need for substantial labeled training data and suggests that weak supervision can leverage abundant unlabeled data by using domain knowledge for more efficient labeling. It outlines the weak supervision process, which includes creating labeling functions, combining them with a label model, and training a downstream model, all while iterating to improve accuracy. The document concludes by emphasizing the shift to data-centric AI and mentions frameworks like Wrench and Snorkel that facilitate this new approach.
The document summarizes improvements made in MobileNetV3 models, including using complementary search techniques to find efficient building blocks, modifying nonlinearities like h-swish to be more efficient, and improving expensive layers through techniques like removing unnecessary projections. It also describes experiments that showed MobileNetV3 models achieving better performance versus V1/V2 models on tasks like image classification, object detection, and semantic segmentation while maintaining high efficiency for mobile applications.
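For reference, h-swish replaces the sigmoid in swish with a piecewise-linear term, which is cheaper on mobile hardware. The sketch below uses the standard definition x * ReLU6(x + 3) / 6; the scalar helper names are illustrative only.

```python
def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0

for value in (-4.0, 0.0, 3.0, 5.0):
    print(value, h_swish(value))
```

For inputs of 3 or more the function is the identity, and for inputs of -3 or less it is zero.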
The document outlines various methodologies for visualizing and interpreting neural networks, notably focusing on learned weights, activations, and gradient-based techniques. Key concepts discussed include class activation maps, occlusion experiments, and activation maximization, with references to notable research papers in the field. Additionally, it introduces practical applications and tools for training and visualizing deep learning models.
The document discusses federated learning, a decentralized approach to machine learning that enhances privacy by allowing nodes to share model updates instead of training data. It outlines the benefits, challenges, and tools associated with federated learning, including performance improvements with more data and considerations regarding privacy and communication efficiency. Use cases for federated learning include predictive maintenance, healthcare, and enterprise IT applications.
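One common way to combine the shared model updates is a weighted average of the clients' parameters, in the style of federated averaging. The summary does not say which aggregation rule the deck uses, so the sketch below is an assumption, with a hypothetical `federated_average` helper and flat lists standing in for model weights.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients holding 1 and 3 local examples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

Only the parameter vectors and example counts leave each node; the raw training data stays local.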
This document presents an overview of hypernetworks, a method where one neural network generates weights for another, inspired by natural genotype and phenotype concepts. Key applications include hyperlstm, which demonstrates improved performance over standard lstm models in various tasks such as language modeling and machine translation, achieving near state-of-the-art results. The document concludes that hypernetworks are scalable and can enhance existing architectures in large-scale production systems.
Enhanced Deep Residual Networks for Single Image Super-Resolution (NAVER Engineering)
This document presents advancements in deep residual networks for single image super-resolution (SISR), emphasizing techniques like global-local skip connections, post-upscaling methods, and eliminating batch normalization for improved performance. It introduces multi-scale super-resolution (MDSR) as a more efficient model, sharing parameters across scales while maintaining stability and reduced complexity compared to traditional models. The study concludes with a state-of-the-art SISR system that effectively addresses multi-scale challenges and incorporates geometric self-ensemble techniques.
The document provides an overview of graph neural networks (GNNs) and discusses their relevance in processing graph-structured data compared to traditional methods such as network embedding and graph kernel techniques. It categorizes GNNs into several types including recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs, each with unique architectures and applications. The paper also outlines key historical developments and the evolution of GNN methodologies in the context of deep learning and relational data analysis.
This document discusses Gaussian processes in machine learning. It begins by introducing Gaussian distributed random variables and the central limit theorem. It then covers maximum likelihood estimation versus maximum a posteriori probability. Next, it explains how Gaussian processes can be used for linear regression and defines a Gaussian process as a collection of random variables with a joint Gaussian distribution. The document proceeds to describe Gaussian process regression, covering properties of the covariance matrix and how predictions are made. It concludes by noting desirable properties of Gaussian process regression and references for further reading.
Image anomaly detection with generative adversarial networks (SakshiSingh480)
This document summarizes a research paper on image anomaly detection using generative adversarial networks (GANs). The proposed algorithm, called ADGAN, uses a pre-trained GAN generator to search for latent space representations of test images. If no close representation is found, the image is flagged as anomalous. In experiments on benchmark datasets, ADGAN outperformed traditional anomaly detection methods. Future work could involve jointly optimizing latent vectors and generator parameters to improve performance.
Ensemble learning uses multiple machine learning models to obtain better predictive performance than could be obtained from any of the constituent models alone. It involves techniques such as bagging and boosting. Bagging generates additional training data sets by sampling the original data with replacement and trains an ensemble of models on these data sets. Boosting trains models sequentially such that subsequent models focus on instances incorrectly predicted by preceding models, reducing errors. Bagging mainly reduces variance, while boosting mainly reduces bias; both improve predictive accuracy by combining models through averaging or voting.
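The two mechanical pieces of bagging, sampling with replacement and combining predictions by vote, can be sketched with the standard library. The helper names and the toy data are assumptions for illustration; model training is left out.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def majority_vote(predictions):
    """Return the most common prediction among the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
data = [(0.1, "neg"), (0.4, "neg"), (0.6, "pos"), (0.9, "pos")]

# One bootstrap training set per ensemble member (five members here).
training_sets = [bootstrap_sample(data, rng) for _ in range(5)]

# Hypothetical predictions from three members for a single test point.
print(majority_vote(["pos", "neg", "pos"]))  # pos
```

Each entry of `training_sets` has the same size as `data` but may repeat some examples and omit others, which is what makes the ensemble members differ.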
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (ivaderivader)
The paper presents three methods for text-driven manipulation of StyleGAN imagery using CLIP:
1. Direct optimization of the latent w vector to match a text prompt
2. Training a mapping function to map text to changes in the latent space
3. Finding global directions in the latent space corresponding to attributes by measuring distances between text embeddings
The methods allow editing StyleGAN images based on natural language instructions and demonstrate CLIP's ability to provide fine-grained controls, but rely on pretrained StyleGAN and CLIP models and may struggle with unseen text or image domains.
MS AI School 5th - 3rd Project - AI-based Unmanned Store Theft Detection System Development Project (KYOYOON JUNG)
1. MS AI School 5th
3rd Project - 7 Team Outputs
AI-based Unmanned Store Theft Detection System Development Project (Service Name: Don't Steal it)
2. Utilized software and collaboration platform
2-1. Project management and collaboration: JIRA, Confluence, MS Teams
2-2. UI/UX design: Figma
2-3. FrontEnd: PowerApps
2-4. BackEnd: Nginx, Flask API
2-5. RPA-based automation tool: PowerAutomate
2-6. Database: MS SharePoint
2-7. Mail and message transmission platform: MS Outlook, MS Teams
2-8. Digital twin: Azure Digital Twin
2-9. 3D Modeling Tool: Blender
2-10. AI Model: Azure OpenAI (GPT-4o, DALL-E 3), YOLO, HRNet, Transformer
2-11. AI Tool: Napkin AI
26. Chatbot components
- Intents: to convey purpose or goal
- Entities: make logical decisions based on user input
- Dialogs: design a conversation
- Slots: collect important information to fulfill an intent
- Digressions and Handlers: handle unexpected conversations.
82. Intent - #book_table
User examples
Can I book a table for 4 people, at 7pm?
can i book a table?
Can I reserve a table for 10 people at 8pm?
can I reserve a table?
I'd like to book a table
I'd like to book a table for 3 people, at 10am
I'd like to reserve a table
I'd like to reserve a table for 7 people, at 6pm
85. Intent - #locations
User examples
can you tell me where it is?
can you tell me where the restaurant is?
how can i get there?
please, let me know the address of the restaurant
please, let me know the location of the restaurant
please, tell me the address of the restaurant
please, tell me the location of the restaurant
plz, let me know the address of it
Where is the store at
86. Intent - #locations (continued)
User examples
plz, let me know the location
plz, let me know the location of it
plz, tell me the address
plz, tell me the address of it
plz, tell me the location of it
what's the address of the restaurant?
where is the restaurant
where is the restaurant located?
where is it
112. Dialog - "Book Table" Node
CHECK FOR | SAVE IT AS | IF NOT PRESENT, ASK | TYPE
@locations | $locations | Which store did you want to go to? First or Main? | Required
@sys-date | $date | What day do you want to come in? | Required
@sys-time | $time | What time did you want to arrive? | Required
@sys-number | $number | How many people in your party? | Required
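The slot behaviour of the "Book Table" node (check for an entity, save it to a context variable, ask if it is missing) can be sketched in plain Python. This is an illustration of the logic only, not the chatbot platform's API; `BOOK_TABLE_SLOTS` and `next_prompt` are hypothetical names.

```python
# Slots of the "Book Table" node: (entity to check for, context variable, prompt).
BOOK_TABLE_SLOTS = [
    ("@locations", "$locations", "Which store did you want to go to? First or Main?"),
    ("@sys-date", "$date", "What day do you want to come in?"),
    ("@sys-time", "$time", "What time did you want to arrive?"),
    ("@sys-number", "$number", "How many people in your party?"),
]

def next_prompt(context):
    """Return the prompt for the first required slot missing from context, or None."""
    for _entity, variable, prompt in BOOK_TABLE_SLOTS:
        if variable not in context:
            return prompt
    return None

# "Can I book a table for 4 people, at 7pm?" fills $number and $time only.
context = {"$number": 4, "$time": "19:00"}
print(next_prompt(context))  # Which store did you want to go to? First or Main?
```

Once every variable is present, `next_prompt` returns `None` and the node can go on to confirm the booking.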
141. Node-RED
Node-RED: an open source logic engine that allows programmers of any level to easily write code that can connect to APIs, hardware, IoT devices or online services
205. Twitter API
Through Twitter API or Twitter data, I'd like to create a chatbot for showing users recent tweets and the relevant information to enable them to utilize those things. Since I'm a beginner as a chatbot developer, I just want to use a function of showing last tweets on my chatbot. That's it.
212. Twitter API
This app is about showing users last tweets posted from twitter members.
213. Twitter API
This app is just to retrieve last tweets posted from twitter member. I'd like to create a chatbot for showing users a couple of recent tweets. That's it.
232. Skill Intent - #twitter
User examples
@blackmirror
any news from twitter
hey give me the last 3 tweets
news on the event from twitter
show me some news from twitter
show me the last 3 tweets
show me what social media is saying about the event
233. Skill Intent - #twitter (continued)
User examples
some tweets
twitter @account
twitter messages
twitts please
what people is saying about the show
what people is saying on twitter
what the social media is telling about the event ?