Vector Optimization (by Jinhwan Seok, M.S. student at KAIST)
The concept of vector optimization and its applications:
- Regularized least squares
- Smoothing approximation
- Reconstruction
Reference:
Convex Optimization, Boyd and Vandenberghe (2004)
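As a quick illustration of the regularized least-squares example, here is the standard bi-objective formulation and its scalarization as given in the Boyd and Vandenberghe text; the notation is assumed, not copied from the slides:

```latex
% Bi-objective (vector) problem: trade off residual size against solution size
\min_{x}\ \bigl(\, \|Ax - b\|_2^2,\ \ \|x\|_2^2 \,\bigr)
% Scalarization with weight \gamma > 0 traces out the optimal trade-off curve
\min_{x}\ \|Ax - b\|_2^2 + \gamma \|x\|_2^2
% Closed-form solution of the scalarized problem
x^\star(\gamma) = (A^\top A + \gamma I)^{-1} A^\top b
```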
A detailed explanation of Gradient Boosted Regression Trees (GBRT).
All of the top-ranked teams in the 2010 Yahoo! Learning to Rank (L2R) challenge used boosting methods or their variants. These slides give a gentle but detailed introduction to GBRT with examples.
References:
"Yahoo! Learning to Rank Challenge Overview"
"Introduction to Boosted Trees": https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
The document discusses metric-based few-shot learning approaches. It introduces Matching Networks, which use an attention mechanism to calculate similarity between support and query embeddings. Prototypical Networks determine class membership for a query based on distance to prototype representations of each class. Relation Networks concatenate support and query embeddings and pass them through a relation module to predict relations as classification scores. The approaches aim to learn from few examples by leveraging metric learning in an embedding space.
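A minimal sketch of the Prototypical Networks classification rule, assuming PyTorch and that support/query embeddings have already been produced by some encoder (names and shapes are illustrative):

```python
import torch

def prototypical_logits(support_emb, support_labels, query_emb, n_classes):
    # support_emb: (n_support, d), support_labels: (n_support,), query_emb: (n_query, d)
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0)   # class prototype = mean embedding
        for c in range(n_classes)
    ])                                                 # (n_classes, d)
    dists = torch.cdist(query_emb, prototypes)         # Euclidean distance to each prototype
    return -dists                                      # closer prototype -> larger logit
```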
The main obstacle in Bayesian statistics and Bayesian machine learning is computing the posterior distribution, which in many contexts is intractable. Today, there are two main approaches that avoid computing the posterior directly: one uses sampling methods (e.g., MCMC), and the other is variational inference. Compared to variational inference, MCMC takes more time and is vulnerable to high-dimensional parameters; however, it has the advantages of simplicity and convergence guarantees. I'll briefly introduce several methods commonly used in applications.
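As one concrete example of a sampling method, here is a minimal random-walk Metropolis-Hastings sketch; the target log-density, proposal scale, and sample count are illustrative assumptions:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)   # symmetric random-walk proposal
        log_alpha = log_target(proposal) - log_target(x)     # log acceptance ratio
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from a standard normal target
draws = metropolis_hastings(lambda z: -0.5 * np.sum(z**2), x0=[0.0])
```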
Texture synthesis aims to produce new texture samples from an example that are similar but not repetitive. It analyzes the example with a CNN, computing Gram matrices that represent the texture at different layers, then synthesizes new textures by passing noise through the CNN and minimizing the differences from the example's Gram matrices. Style transfer extends this to merge the texture of one image with the content of another by matching Gram matrices across layers, transferring style while preserving content. It has been shown that style and content are separable in CNN representations, and style transfer can be viewed as a type of domain adaptation between content and style domains.
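A minimal sketch of the Gram-matrix texture descriptor at the heart of this approach, assuming PyTorch and CNN feature maps of shape (C, H, W); layer selection and per-layer weighting are omitted for brevity:

```python
import torch

def gram_matrix(features):
    c, h, w = features.shape
    f = features.reshape(c, h * w)        # flatten spatial dimensions
    return (f @ f.t()) / (c * h * w)      # channel-by-channel correlations

def style_loss(noise_features, example_features):
    # Mean-squared difference between Gram matrices, summed over the chosen layers
    return sum(((gram_matrix(a) - gram_matrix(b)) ** 2).mean()
               for a, b in zip(noise_features, example_features))
```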
Towards Deep Learning Models Resistant to Adversarial Attacks (SEMINARGROOT)
This document discusses approaches to training deep neural networks to be robust against adversarial examples. It frames adversarial robustness as a minimax game between the network and an attacker. It presents projected gradient descent (PGD) and the Fast Gradient Sign Method (FGSM) as ways to solve the inner maximization problem during training. Experiments show that adversarially trained models can achieve increased robustness compared to standard networks.
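A minimal sketch of an L-infinity PGD attack for the inner maximization, assuming PyTorch; the epsilon, step size, and step count are illustrative. FGSM corresponds to a single step with alpha equal to eps and no random start:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Random start inside the eps-ball, clipped to the valid pixel range
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step on the loss
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep valid pixel range
    return x_adv.detach()
```

In adversarial training, these perturbed examples replace (or augment) the clean batch when computing the outer minimization over the network weights.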
Node embedding techniques learn vector representations of nodes in a graph that can be used for downstream machine learning tasks like classification, clustering, and link prediction. DeepWalk uses random walks to generate sequences of nodes that are treated similarly to sentences, and learns embeddings by predicting nodes using their neighbors, like word2vec. It does not incorporate node features or labels. Node2vec extends DeepWalk by introducing a biased random walk to learn embeddings, addressing some limitations of DeepWalk while maintaining scalability.
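A minimal DeepWalk-style sketch, assuming gensim's word2vec implementation is available; the toy graph, walk lengths, and embedding size are illustrative:

```python
import random
from gensim.models import Word2Vec

def random_walks(adj, walk_length=20, walks_per_node=10, seed=0):
    # adj: dict mapping node -> list of neighbours
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))     # uniform next-node choice
            walks.append([str(v) for v in walk])           # word2vec expects string tokens
    return walks

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
model = Word2Vec(random_walks(adj), vector_size=16, window=5, min_count=1, sg=1)
embedding_of_node_0 = model.wv["0"]
```

node2vec would replace the uniform `rng.choice` with a biased choice controlled by the return and in-out parameters.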
This document discusses graph convolutional networks (GCNs), which are neural network models for graph-structured data. GCNs aim to learn functions on graphs by preserving the graph's spatial structure and enabling weight sharing. The document outlines the basic components of a GCN, including the adjacency matrix, node features, and application of deep neural network layers. It also notes some challenges with applying convolutions to graphs and discusses approaches like using the graph Fourier transform based on the Laplacian matrix.
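A minimal sketch of one GCN propagation layer in the Kipf and Welling style, H' = sigma(D^{-1/2}(A + I)D^{-1/2} H W); shapes and the ReLU choice are assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    # A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out) weights
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0)           # aggregate neighbours, then ReLU
```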
The document discusses different methods for denoising images in the spatial and frequency domains. It introduces spatial domain denoising techniques like mean filtering, median filtering, and adaptive filtering. It then explains how spatial domain images can be transformed into the frequency domain using Fourier and wavelet transforms. This allows denoising based on frequency content, where high frequencies associated with noise can be removed. It concludes by mentioning the CVPR Denoising Workshop as a resource.
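A minimal sketch contrasting the two domains, assuming NumPy and SciPy: a spatial median filter and a naive frequency-domain low-pass; the kernel size and cutoff ratio are illustrative choices:

```python
import numpy as np
from scipy.ndimage import median_filter

def denoise_spatial(img, size=3):
    return median_filter(img, size=size)           # effective against salt-and-pepper noise

def denoise_frequency(img, keep_ratio=0.1):
    F = np.fft.fftshift(np.fft.fft2(img))          # move low frequencies to the centre
    h, w = img.shape
    mask = np.zeros_like(F, dtype=bool)
    kh, kw = int(h * keep_ratio), int(w * keep_ratio)
    mask[h//2 - kh:h//2 + kh, w//2 - kw:w//2 + kw] = True
    return np.fft.ifft2(np.fft.ifftshift(F * mask)).real   # drop high-frequency content
```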
The document contains code snippets and explanations for solving three LeetCode problems: Power of Two, Valid Parentheses, and Find Minimum in Rotated Sorted Array. For Power of Two, it provides an O(log n) solution that uses modulo and division to check if a number is a power of two. For Valid Parentheses, it provides an O(n) solution that uses a string to track opening and closing parentheses. For Find Minimum, it provides both an O(n) solution that finds the minimum by checking if each number is less than the previous, and an O(log n) solution that recursively searches halves of the array to find the minimum.
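Minimal sketches of the three solutions described above (illustrative code, not the original slides; the Valid Parentheses version below uses a stack, a common variant of the string-based bookkeeping mentioned):

```python
def is_power_of_two(n):                 # O(log n): divide by 2 while the remainder is 0
    if n <= 0:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1

def is_valid_parentheses(s):            # O(n): push expected closers, pop on each closer
    stack, pairs = [], {'(': ')', '[': ']', '{': '}'}
    for ch in s:
        if ch in pairs:
            stack.append(pairs[ch])
        elif not stack or stack.pop() != ch:
            return False
    return not stack

def find_min_rotated(nums):             # O(log n): binary search for the rotation point
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:
            lo = mid + 1
        else:
            hi = mid
    return nums[lo]
```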
This document provides an overview of time series models and concepts. It discusses stochastic processes, stationarity, the Wold decomposition, impulse response analysis, and ARMA processes. The key points are:
1) Time series models are used to identify shocks and responses over time from stochastic processes.
2) Stationarity assumptions are needed to estimate expectations and variances from time series data using the concept that these values are time-invariant.
3) The Wold decomposition represents a stationary process as the sum of a deterministic component and stochastic prediction errors/shocks.
4) Impulse response analysis examines how past shocks continue to affect the present and future, with an effect that decays as time passes (a worked illustration follows below).
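As a worked illustration of points 3 and 4 (notation assumed, not taken from the slides), the Wold representation and the impulse response of an AR(1) process can be written as:

```latex
% Wold decomposition of a covariance-stationary process
y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},
\qquad \psi_0 = 1,\quad \sum_{j} \psi_j^2 < \infty
% Impulse response: effect of a shock at time t on y_{t+h}
\frac{\partial y_{t+h}}{\partial \varepsilon_t} = \psi_h
% For an AR(1) process y_t = \phi\, y_{t-1} + \varepsilon_t with |\phi| < 1,
% the response \psi_h = \phi^h decays geometrically as the horizon h grows.
```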
This document summarizes generative models like VAEs and GANs. It begins with an introduction to information theory, defining key concepts like entropy and maximum likelihood estimation. It then characterizes generative models as estimating the joint distribution P(X,Y), in contrast to discriminative models, which estimate P(Y|X). VAEs are discussed as maximizing the evidence lower bound (ELBO) to approximate the latent-variable posterior P(Z|X), allowing generation of new X values. GANs are also covered, with their minimax game between a generator G and a discriminator D, where G learns to generate samples resembling the empirical data distribution P_emp.
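For reference, the two objectives described above can be written out as follows (standard formulations; the notation is assumed rather than copied from the slides):

```latex
% Evidence lower bound (ELBO) maximized by a VAE
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  \;-\; D_{\mathrm{KL}}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr)
% GAN minimax game between generator G and discriminator D
\min_G \max_D \;
  \mathbb{E}_{x \sim P_{\mathrm{emp}}}[\log D(x)]
  \;+\; \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]
```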
Understanding Blackbox Prediction via Influence Functions (SEMINARGROOT)
Pang Wei Koh and Percy Liang
"Understanding Black-Box prediction via influence functions" ICML 2017 Best paper
References:
https://youtu.be/0w9fLX_T6tY
https://arxiv.org/abs/1703.04730
Attention Is All You Need (NIPS 2017)
(Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin)
paper link: https://arxiv.org/pdf/1706.03762.pdf
References:
https://youtu.be/mxGCEWOxfe8 (by Minsuk Heo)
https://youtu.be/5vcj8kSwBCY (Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention)
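A minimal NumPy sketch of the scaled dot-product attention at the core of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; batching and the multi-head projections are omitted:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)       # query-key similarity
    if mask is not None:
        scores = np.where(mask, scores, -1e9)            # block disallowed positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the key dimension
    return weights @ V                                   # weighted sum of values
```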
The document discusses different types of attention mechanisms used in neural machine translation and image captioning models. It describes global attention which considers all encoder hidden states when deriving context vectors, and local attention which selectively focuses on a small window of context. Hard attention selects a single location to focus on, while soft attention takes a weighted average over locations. The document also discusses input feeding which makes the model aware of previous alignment choices.
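A minimal sketch of global soft attention for a single decoder step, assuming a Luong-style dot-product score; a local variant would simply restrict the encoder states to a window around a predicted position:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention_context(decoder_state, encoder_states):
    # encoder_states: (T, d), decoder_state: (d,)
    scores = encoder_states @ decoder_state        # alignment score for every source position
    weights = softmax(scores)                      # soft attention: weights over all positions
    context = weights @ encoder_states             # weighted average of encoder states
    return context, weights
```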
This document is a tutorial on explainable AI from the WWW 2020 conference. It introduces explainable AI and discusses explanations from both a model and regulatory perspective. It then explores different methods for explaining individual predictions, global models, and building interpretable models. The remainder of the tutorial provides case studies on explaining diabetic retinopathy predictions, building an explainable AI engine for talent search, and using model interpretations for sales predictions. References are also included.
This document contains summaries of two LeetCode problems - Single Number and Product of Array Except Self.
For Single Number, it provides two O(n) solutions: one uses a dictionary to track duplicate numbers, and the other uses arithmetic, summing the distinct elements, multiplying that sum by 2, and subtracting the sum of the whole array, which leaves the unpaired number.
For Product of Array Except Self, it again provides two O(n) solutions. The first uses a variable to track the running product and another to count zeros, updating the output array accordingly. The second avoids division by calculating left and right running products in two arrays and multiplying the values together for each output element.
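Minimal sketches of the two solutions (illustrative code, not the original slides):

```python
def single_number(nums):
    # 2 * (sum of distinct values) - (sum of all values) leaves the unpaired value
    return 2 * sum(set(nums)) - sum(nums)

def product_except_self(nums):
    # Division-free O(n): prefix products from the left, then suffix products from the right
    n = len(nums)
    out = [1] * n
    left = 1
    for i in range(n):
        out[i] = left
        left *= nums[i]
    right = 1
    for i in range(n - 1, -1, -1):
        out[i] *= right
        right *= nums[i]
    return out
```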
This document summarizes the key steps in the locality sensitive hashing (LSH) algorithm for finding similar documents:
1. Documents are converted to sets of shingles (sequences of tokens) to represent them as high-dimensional data points.
2. MinHashing is applied to generate signatures (hashes) for each document such that similar documents are likely to have the same signatures. This compresses the data into a signature matrix.
3. LSH uses the signature matrix to hash similar documents into the same buckets with high probability, finding candidate pairs for further similarity evaluation and filtering out dissimilar pairs from consideration. This improves the computation efficiency over directly comparing all pairs.
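A minimal MinHash sketch for step 2, assuming a seeded MD5 hash as the family of hash functions; the shingle size and signature length are illustrative:

```python
import hashlib

def shingles(text, k=5):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(shingle_set, n_hashes=64):
    sig = []
    for seed in range(n_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)  # seeded hash of each shingle
            for s in shingle_set
        ))
    return sig                                    # minimum hash per seed forms the signature

def estimated_jaccard(sig_a, sig_b):
    # Fraction of matching signature entries approximates the Jaccard similarity
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

For step 3, each signature would then be split into bands and hashed band by band into buckets, so that documents sharing any band become candidate pairs.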
This document discusses two algorithms for solving the Two Sum problem from LeetCode: an O(n^2) nested loop solution and an O(n) hash table solution. It also presents a coding interview question to find the maximum prime factor of a given number N and provides a solution using a while loop to iteratively check for divisibility.
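Minimal sketches of the approaches described above (illustrative code, not the original slides):

```python
def two_sum_brute_force(nums, target):           # O(n^2): check every pair
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

def two_sum_hash(nums, target):                  # O(n): hash table of values seen so far
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

def largest_prime_factor(n):                     # divide out factors from smallest upward
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            n //= factor
        else:
            factor += 1
    return n                                     # what remains is the largest prime factor
```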