Energy-based models and Boltzmann machines - v2.0, by Soowan Lee
This is version 2.0 of my previous slides (http://www.slideshare.net/blaswan/energy-based-models-and-boltzmann-machines).
I removed the very simple recommendation example and added a feature extractor example, referenced from Hinton's lecture.
This document describes using Bayesian inference to locate an opponent in a two-dimensional paintball arena based on the locations of paint spatters on the wall. It defines a joint distribution over all possible (x, y) coordinates of the opponent's location. Given observed spatter locations, it computes the posterior distribution, which gives the probability of each possible location. From this joint posterior it extracts marginal and conditional distributions over each dimension, as well as credible intervals that identify likely regions where the opponent may be hiding.
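A minimal sketch of this kind of grid-based Bayesian update, in Python with NumPy. The grid extents, the spatter data, and the beta / (beta^2 + (x - alpha)^2) likelihood (which follows from assuming uniformly distributed firing angles) are illustrative assumptions, not the document's actual data:

```python
import numpy as np

# A grid of hypothetical opponent locations: alpha = position along the wall,
# beta = distance from the wall (arbitrary units; illustrative only).
alphas = np.arange(0, 31)
betas = np.arange(1, 51)
spatters = [15, 16, 18, 21]        # observed spatter positions along the wall (made-up data)

# Start from a uniform prior over the (alpha, beta) grid.
posterior = np.ones((len(alphas), len(betas)))

# Assumed likelihood: with uniformly distributed firing angles, the density of hits
# at wall position x is proportional to beta / (beta^2 + (x - alpha)^2).
for x in spatters:
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            posterior[i, j] *= b / (b**2 + (x - a)**2)

posterior /= posterior.sum()                 # normalize: joint posterior over locations

# Marginal distributions over each coordinate.
marginal_alpha = posterior.sum(axis=1)
marginal_beta = posterior.sum(axis=0)

# Conditional distribution of alpha given a particular distance (beta = 10 here).
cond_alpha_given_beta10 = posterior[:, 9] / posterior[:, 9].sum()

# A 90% credible interval for alpha from the marginal CDF.
cdf = np.cumsum(marginal_alpha)
lo, hi = alphas[np.searchsorted(cdf, 0.05)], alphas[np.searchsorted(cdf, 0.95)]
print("90% credible interval for alpha:", lo, "to", hi)
```

The 2-D posterior array plays the role of the joint distribution: summing over one axis gives the marginal for the other, and slicing a row or column gives a conditional.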
Kernel functions allow measuring the similarity between objects without explicitly representing them as feature vectors. The kernel trick enables applying algorithms designed for explicit feature vectors, like support vector machines (SVMs), to implicit spaces defined by kernels. SVMs find a sparse set of support vectors that define the decision boundary by maximizing margin and minimizing error. They can perform both classification using a hinge loss function and regression using an epsilon-insensitive loss function.
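A hedged illustration of both uses with scikit-learn's RBF-kernel SVMs; the toy datasets and hyperparameter values below are arbitrary choices for demonstration:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC, SVR

# Classification with a hinge loss: an RBF-kernel SVM on a toy non-linearly-separable set.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("classification support vectors:", clf.support_vectors_.shape[0], "of", len(X))

# Regression with an epsilon-insensitive loss: points inside the epsilon tube cost nothing,
# so only points on or outside the tube become support vectors.
X_r = np.linspace(0, 6, 100).reshape(-1, 1)
y_r = np.sin(X_r).ravel()
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_r, y_r)
print("regression support vectors:", len(reg.support_), "of", len(X_r))
```

In both cases the kernel lets the model operate in an implicit feature space, and the learned solution depends only on the (sparse) set of support vectors.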
This document describes a simulation of kidney tumor growth to determine the probability that a tumor formed before a patient's retirement from the military. The simulation models tumor growth rates based on data from a medical study. It generates random growth rates from a fitted distribution and simulates tumor size over time. The results are cached in a joint distribution of size and age. Conditional distributions of age given size are extracted and percentiles plotted to summarize the results. Potential issues with the modeling assumptions are discussed, including the effects of non-spherical tumor shapes and serial correlation in growth rates over time.
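A minimal sketch of that simulation idea in Python. The lognormal growth-rate parameters, the assumed 0.1 cm seed size, and the observed 5 cm tumor are all illustrative stand-ins, not the fitted values from the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed growth-rate model: volume-doubling rates (doublings per year) drawn from a
# lognormal distribution; the actual study fits this to measured tumor data.
n_sims = 10_000
rates = rng.lognormal(mean=0.0, sigma=0.7, size=n_sims)

# A sphere's volume scales with diameter^3, so reaching the observed diameter from the
# assumed seed diameter requires a fixed number of volume doublings.
initial_cm, observed_cm = 0.1, 5.0
doublings_needed = 3 * np.log2(observed_cm / initial_cm)

# Age at which each simulated tumor (with its own constant rate) reaches the observed size.
ages = doublings_needed / rates

# Conditional summary of age given the observed size.
print("median age (years):", round(float(np.median(ages)), 1))
print("5th-95th percentile:", np.percentile(ages, [5, 95]).round(1))
```

This sketch assumes each tumor grows at a constant rate, which is exactly the kind of assumption (no serial correlation in growth rates) the document flags as a potential issue.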
This document describes using a feature-based Markov Decision Process (MDP) and policy iteration to develop an algorithm that learns to play the game Tetris well. It formulates Tetris as an MDP with states defined by wall configurations and piece placement. An approximated value function is defined using features of the game state like column heights. Policy iteration is then used to iteratively update the weight vector of this approximated value function to learn an optimal policy. Simulation results show the learning algorithm achieves much higher scores on Tetris compared to a heuristic algorithm.
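The core of such an approach is a linear value function V(s) ≈ wᵀφ(s) over hand-crafted board features. A minimal sketch in Python; the specific feature set (column heights, adjacent height differences, holes) and the placeholder weights are assumptions for illustration, and policy iteration would be the procedure that tunes w:

```python
import numpy as np

def features(board):
    """Feature vector phi(s) for a Tetris wall: column heights, adjacent height
    differences, number of holes, and a constant term (an assumed feature set)."""
    rows, _ = board.shape
    heights = np.array([rows - np.argmax(col) if col.any() else 0 for col in board.T])
    diffs = np.abs(np.diff(heights))
    holes = sum(int(np.sum(col[rows - h:] == 0)) for col, h in zip(board.T, heights))
    return np.concatenate([heights, diffs, [holes, 1.0]])

def value(board, w):
    """Approximate value V(s) ~= w . phi(s)."""
    return w @ features(board)

# Tiny 6x4 example board (1 = filled). The weights are placeholders that penalize tall
# columns, bumpiness, and holes; policy iteration would learn them instead.
board = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 1, 1],
                  [1, 1, 0, 1]])
w = -np.ones(4 + 3 + 2)
print("phi(s):", features(board))
print("V(s): ", value(board, w))
```

A greedy policy then scores every legal placement of the current piece with this value function and picks the best one; policy iteration alternates between evaluating that policy and refitting w.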
This document discusses sparse linear models and Bayesian variable selection. It introduces the spike and slab model for Bayesian variable selection, which uses a binary vector γ to indicate whether each feature is relevant or not. Computing the posterior p(γ|D) involves calculating the marginal likelihood p(D|γ). Greedy search and stochastic search methods are discussed to approximate the posterior over models. Since computing the posterior over the discrete γ is difficult, L1 regularization, also known as lasso, is introduced as an optimization alternative. Lasso replaces the discrete spike-and-slab prior with a continuous (Laplace) prior that encourages sparsity. Coordinate descent is discussed as an algorithm for optimizing the lasso objective function.
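A short sketch of coordinate descent for the lasso objective, here taken to be 0.5·||y − Xw||² + λ·||w||₁; each coordinate update is a soft-thresholding step. The toy data and λ value are made up for illustration:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    _, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual excluding feature j, then an exact 1-D lasso update for w[j].
            r_j = y - X @ w + X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return w

# Toy demo: a sparse ground truth is (approximately) recovered, with exact zeros elsewhere.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.array([3.0, 0, 0, -2.0, 0, 0, 0, 1.5, 0, 0])
y = X @ w_true + 0.1 * rng.normal(size=100)
print(lasso_cd(X, y, lam=5.0).round(2))
```

The soft-thresholding step is what drives irrelevant coefficients exactly to zero, which is the sparsity-inducing effect the Laplace prior buys over a Gaussian one.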
A Markov model assumes that the current state captures all relevant information for predicting the future. It can be used for language modeling by assigning probabilities to word sequences. Google's PageRank algorithm ranks web pages based on the principle that more authoritative pages, as determined by other pages linking to them, should rank higher. It models the probability of being on a page as the stationary distribution of a Markov chain defined by the link structure of the web.
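A small sketch of that idea: power iteration on a damped random-surfer Markov chain. The four-page link structure and the damping factor of 0.85 are illustrative choices, not data from the document:

```python
import numpy as np

# Toy link structure: adjacency[i, j] = 1 means page i links to page j (made-up example).
adjacency = np.array([[0, 1, 1, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 1],
                      [0, 0, 1, 0]], dtype=float)

# Row-normalize to get the transition matrix of the random surfer's Markov chain.
P = adjacency / adjacency.sum(axis=1, keepdims=True)

# Damped chain: with probability d follow a link, otherwise jump to a uniformly random page.
d, n = 0.85, adjacency.shape[0]
G = d * P + (1 - d) / n

# Power iteration converges to the stationary distribution, i.e. the PageRank scores.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = rank @ G
print(rank.round(3))
```

Pages that receive links from highly ranked pages end up with more stationary probability mass, which is exactly the "authoritative pages rank higher" principle.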
9.1 Introduction
9.2 The exponential family
9.2.1 Definition
9.2.2 Examples
9.2.2.1 Bernoulli
9.2.2.2 Multinoulli
9.2.2.3 Univariate Gaussian
9.2.3 Log partition function
9.2.3.1 Example: the Bernoulli distribution
9.3 Generalized linear models (GLMs)
9.3.1 Basics
9.3.2 ML and MAP estimation
9.3.3 Bayesian inference