Probably, Definitely, 
Maybe 
James McGivern
Does intelligent life 
exist in our galaxy?
N = R* · fp · ne · fl · fi · fc · L
N = the number of civilisations in our galaxy with which radio-communication might be
possible
R* = the average rate of star formation in our galaxy
fp = the fraction of those stars that have planets
ne = the average number of planets that can potentially support life per star that has
planets
fl = the fraction of planets that could support life that actually develop life at some point
fi = the fraction of planets with life that actually go on to develop intelligent life
(civilisations)
fc = the fraction of civilisations that develop a technology that releases detectable signs of
their existence into space
L = the length of time for which such civilisations release detectable signals into space
Maths
Warning
- Chapter I - 
Bayesian Probability 
& 
Bayesian Statistics
The Mean
Given a set of values x1, ..., xn,
we define the mean (average) as x̄ = (x1 + x2 + ... + xn) / n
Example: Amazon
• Users can rate a product by voting 1-5 stars
• product rating is the mean of the user votes
Q: how can we rank products with different numbers of votes?
Simple Bayesian Ranking
• C - the mean vote across all items
• v - number of votes for a given item
• R - the mean rating of the item
• m - number of votes required to be in the top n percentile
The Bayesian rank is then (v·R + m·C) / (v + m), as the table and the sketch
below illustrate
Book   Number of votes (v)   Average Rating (R)   Bayesian Rank
A      100                   5                    4.333333...
B      70                    5                    4.17
C      50                    4                    3.5
D      30                    4                    3.375
E      20                    3.5                  3.14
F      30                    3                    3
G      5                     2                    2.91
(computed with C = 3, m = 50)
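The ranking formula is easy to check in code. A minimal Java sketch (class and method names are my own; the formula and the C = 3, m = 50 values come from the table above):

```java
public class BayesianRankDemo {
    /** Bayesian rank: shrink an item's mean rating R (from v votes)
        towards the global mean C, weighted by the vote threshold m. */
    static double bayesianRank(double v, double R, double m, double C) {
        return (v * R + m * C) / (v + m);
    }

    public static void main(String[] args) {
        double C = 3.0, m = 50.0;
        System.out.println(bayesianRank(100, 5, m, C)); // Book A: 4.3333...
        System.out.println(bayesianRank(5, 2, m, C));   // Book G: 2.9090...
    }
}
```

Note how book G's tiny vote count pulls its rank towards the global mean, so a handful of votes cannot leapfrog better-attested items.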
A Detour Into
Probability Basics
Events
• Consider an experiment whose set of all possible outcomes Ω, called the
sample space, is {x1, ..., xn}
• We define an event E as a subset of Ω and say that E occurs iff the
outcome of the experiment lies in E
• Union: E ∪ F occurs iff E occurs or F occurs
• Intersection: E ∩ F occurs iff both E and F occur
Probability Axioms: 1
• We denote the probability of an event A by P(A)
• For any event A, 0 ≤ P(A) ≤ 1
• The certain event, Ω, always occurs and P(Ω) = 1
• The impossible event, ∅, never occurs and P(∅) = 0
Probability Axioms: 2
• We say that events A and B are disjoint if A ∩ B = ∅
• If A and B are disjoint then P(A ∪ B) = P(A) + P(B)
• For a set of disjoint events A1, A2, ..., the addition law gives us:
P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ...
Probability Lemmas
• For any event E, P(Eᶜ) = P(¬E) = 1 − P(E)
• P(A \ B) = P(A) − P(A ∩ B)
• If A ⊆ B then P(A) ≤ P(B)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Random Variables
• Consider a random variable X; then {X ≤ x} is the event that X has a value less
than or equal to the real number x. Hence the probability that this event
occurs is P(X ≤ x)
• If we allow x to vary we can define the distribution function
F(x) = P(X ≤ x), −∞ < x < ∞
• Note that:
• P(X > x) = 1 − F(x)
• P(a < X ≤ b) = F(b) − F(a)
Probability Mass Function
The probability mass function (PMF) of X, pX(x) = P(X = x),
is a probability measure of the possible values for the
random variable. Of course the probabilities must sum to one: Σx pX(x) = 1
[Bar chart: the PMF of a fair six-sided die, each face with probability 1/6]
Example: PMF of Two Fair Dice
Sum:          2     3     4     5     6     7     8     9    10    11    12
Probability: 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
(each sum s collects the ordered pairs (a, b) with a + b = s, e.g. 7 comes from
(1,6), (2,5), (3,4), (4,3), (5,2) and (6,1))
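A quick way to sanity-check this table is to enumerate all 36 ordered outcomes; a minimal Java sketch (names are my own):

```java
import java.util.TreeMap;

public class TwoDicePmf {
    public static void main(String[] args) {
        // Count the ordered pairs (a, b) producing each sum
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int a = 1; a <= 6; a++)
            for (int b = 1; b <= 6; b++)
                counts.merge(a + b, 1, Integer::sum);
        counts.forEach((sum, c) ->
            System.out.printf("P(X = %d) = %d/36%n", sum, c));
    }
}
```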
Uniform Distribution
[Bar chart: a discrete uniform PMF over the outcomes 0-9, each outcome
equally likely]
Bernoulli Distribution
[Bar chart: a Bernoulli PMF over the outcomes {0, 1}; the chart shows a
success probability of about 0.7]
Binomial Distribution
Probability Axioms: 3
• Given two events A and B we define the conditional probability P(A | B) by:
P(A ∩ B) = P(A | B) P(B)
• Given two events A and B we say that they are independent iff:
P(A ∩ B) = P(A) P(B)
Prior & Posterior Distributions
• P(θ) is the prior probability distribution of θ
• P(θ|X) is the posterior probability of θ given X
• The posterior probability can be written in the memorable form:
posterior probability ∝ likelihood × prior probability
• If the posterior distributions P(θ|X) are in the same family as the prior
probability distribution P(θ), the prior and posterior are then called conjugate
distributions, and the prior is called a conjugate prior
Bayes Theorem
From the definition of conditional probability we know:
P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
Hence
P(A | B) = P(B | A) P(A) / P(B)
Or, if B1, ..., Bn form a partition of the sample space:
P(Bi | A) = P(A | Bi) P(Bi) / Σj P(A | Bj) P(Bj)
Example: Pregnancy Tests
• Pregnancy tests detect the presence of hCG (human chorionic gonadotropin)
in the blood or urine
• A false positive is when the test incorrectly returns a positive result, and a
false negative is when it incorrectly returns a negative one
• False positives in the hCG test include:
• non-pregnant production of the hCG molecule
• use of drugs containing the hCG molecule
• Some medications cause a positive reaction in the tests
• The actual probability of being pregnant depends on many messy biological
factors
Pregnant?
[Tree diagram: the subject is either Pregnant or Not Pregnant; each branch
then splits into Test Positive / Test Negative]
Pregnant?
[The same tree, annotated: the first branches carry the priors P(B) and P(¬B);
the four leaves carry the joint probabilities P(B ∩ A), P(B ∩ ¬A), P(¬B ∩ A),
P(¬B ∩ ¬A)]
Pregnant?
[The same tree, with the test branches labelled by the conditionals P(A|B),
P(¬A|B), P(A|¬B), P(¬A|¬B); multiplying each conditional by its prior gives
the joint probability at the leaf]
Q: Given the test is positive, what is the probability
that the subject is pregnant?
Test Result
[Tree diagram drawn the other way round: the first branch is the test result
(Positive / Negative); Positive splits into Pregnant / False Positive, and
Negative into False Negative / Not Pregnant]
Test Result
[The same reversed tree, annotated with P(A) and P(¬A) on the first branches
and the joint probabilities P(A ∩ B), P(¬A ∩ B), P(A ∩ ¬B), P(¬A ∩ ¬B) at the
leaves]
Test Result
[The same reversed tree, with the second branches labelled by the conditionals
P(B|A), P(¬B|A), P(B|¬A), P(¬B|¬A) - exactly the quantities Bayes' theorem
lets us recover from the forward tree]
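To make the question concrete, here is a worked application of Bayes' theorem with purely illustrative numbers (not clinical figures): suppose the prior P(B) = 0.2, the sensitivity P(A|B) = 0.99, and the false positive rate P(A|¬B) = 0.05. Then

P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|¬B) P(¬B)]
       = (0.99 × 0.2) / (0.99 × 0.2 + 0.05 × 0.8)
       ≈ 0.83

so even a fairly reliable test leaves real uncertainty when the prior is low.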
Example: Disease Diagnosis
Q: Consider the set S = {si : i = 0...N} of all disease symptoms, let Ds+ = {si : si ∈ S}
be the diagnostically inclusive symptoms of Ebola, and Ds− = {si : si ∈ S} the
exclusionary symptoms. Given a patient has some combination of symptoms
Ps = {si : si ∈ S}, what is the probability they have Ebola?
• The presence or absence of some symptoms can completely rule out the
diagnosis
• By updating the model based on real outcomes it is possible to provide more
and more accurate predictions
Expected Values & Moments
• Suppose random variable X can take value x1 with probability p1, value x2 with
probability p2, and so on. Then the expectation of this random variable X is
defined as:
E[X] = p1 x1 + p2 x2 + ... + pk xk
• The variance of a random variable X is its second central moment, the
expected value of the squared deviation from the mean μ = E[X]:
Var(X) = E[(X − μ)²]
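Both moments are one-pass sums over the PMF. A minimal Java sketch using a fair six-sided die (E[X] = 3.5, Var(X) = 35/12 ≈ 2.9167):

```java
public class Moments {
    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6};
        double[] probs  = {1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};
        double mean = 0, var = 0;
        for (int i = 0; i < values.length; i++)
            mean += probs[i] * values[i];                    // E[X]
        for (int i = 0; i < values.length; i++)
            var += probs[i] * Math.pow(values[i] - mean, 2); // E[(X - mean)^2]
        System.out.printf("E[X] = %.4f, Var(X) = %.4f%n", mean, var);
    }
}
```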
Variance & Covariance
• Variance is a measure of how far a set of numbers differs from the mean of
those numbers. The square root of the variance is the standard deviation σ
• CERN uses the 5-sigma rule to rule out statistical anomalies in sensor
readings, i.e. to ask whether a value is NOT just the expected value of the noise
• The covariance between two jointly distributed random variables X and Y
with finite second moments is defined as:
cov(X, Y) = E[(X − E[X]) · (Y − E[Y])]
Gaussian Distribution
[Plot: the bell-shaped PDF P(x) against x, centred on the mean μ, with the
σ, 2σ, ..., 5σ intervals marked]
Beta Distribution
End of Detour
Example: Amazon (Revisited)
• Users can rate a product by voting 1-5 stars
• product rating is the mean of the user votes
Q: how can we rank products with different numbers of votes?
Simple Bayesian Ranking
Assume the vote posterior distribution is a Normal; then the prior is also a Normal*,
with mean
μ_posterior = (τ0 μ0 + n τ x̄) / (τ0 + n τ)
where μ0 is the prior mean, τ0 the prior precision, τ the precision of the vote
distribution, and x̄ the mean of the n votes
(*) http://en.wikipedia.org/wiki/Conjugate_prior
Example: YouTube
• Users can rate a clip by voting +/-1
• clip rating is the mean of the user votes
• clips also record the number of views
Q: how can we compare clip ratings with different numbers of votes and views?
Q: how can we make results more relevant? e.g. take into account how old
ratings are, author provenance, and the cost of promoting an incorrect ranking
Bayesian Rank
• Since this is a +/-1 Bernoulli trial we can model the prior belief distribution by
a Beta distribution
• Let α = upvote bias + number of up votes
• Let β = downvote bias + number of down votes + 1
• Every time we receive a new vote we just recalculate the distribution, as in
the sketch below
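A minimal sketch of this bookkeeping in Java (the class name and bias constructor are my own; the α/β update rule is the one described above):

```java
public class ClipRating {
    double alpha; // upvote bias + number of up votes
    double beta;  // downvote bias + number of down votes + 1

    ClipRating(double upvoteBias, double downvoteBias) {
        this.alpha = upvoteBias;
        this.beta = downvoteBias + 1;
    }

    void vote(boolean up) {
        if (up) alpha++; else beta++; // recalculate on every vote
    }

    /** Mean of the Beta(alpha, beta) belief distribution. */
    double expectedRating() {
        return alpha / (alpha + beta);
    }

    public static void main(String[] args) {
        ClipRating clip = new ClipRating(1, 1);
        clip.vote(true); clip.vote(true); clip.vote(false);
        System.out.println(clip.expectedRating()); // 3/6 = 0.5
    }
}
```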
• To map between a belief and a sorting criterion we make a decision using a
loss function L
• Since the value of L depends on the value of a random variable, we use
instead the expected value of L
• Considering a multilinear loss function, and since we want to minimise the
loss, we rank by the value that minimises the expected loss
Extending The Loss Function
• Suppose in addition to the vote counts we also record the timestamp of the
votes
• Items become less relevant in the rank the longer it is since the last vote,
following a pattern of exponential decay
• Hence the current up or down vote count is now an exponentially decayed
sum (e.g. a vote cast at time ti contributing e^(−λ(t − ti)) at time t)
• Thus we derive an updated rank function from the decayed counts
The House That Skynet Built
• SkyNet SmartHome is a system designed to manage the state of various
household resources, e.g. heating, lighting, media-centre, etc.
• it communicates with users' smartphones to identify their location
• its aims are to maximise the comfort (e.g. room temperature, hot water) of
the users and minimise waste (e.g. power consumption)
• users can provide feedback to correct inappropriate behaviour, e.g. turning
the heating up too high
Predicting Behaviour
Q: Given the user is leaving the office, what actions should SkyNet SmartHome
take?
• the variables could include:
• day of the week
• work day or holiday
• weather
• season
• calendar events
Bayesian Networks
• A BN is a probabilistic directed acyclic graphical model that represents a set of
random variables and their conditional dependencies
• Vertices may be observable quantities, latent variables, unknown parameters
or hypotheses
• Vertices that are not connected represent variables that are conditionally
independent of each other
Example: Cancer
[Network diagram: Pollution and Smoker are the parents of Cancer; Cancer is
the parent of X-Ray and Dyspnoea]
Example: Cancer
P     P(P)          S      P(S)
Low   0.9           TRUE   0.3
High  0.1           FALSE  0.7

P     S   P(C|P,S)
Low   T   0.03
Low   F   0.001
High  T   0.05
High  F   0.02

C   P(XRay+|C)      C   P(D+|C)
T   0.9             T   0.65
F   0.2             F   0.3
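With these tables we can already answer simple queries by brute-force marginalisation. A minimal Java sketch computing the prior probability of cancer, P(C) = Σ P(C|P,S) P(P) P(S), using only the numbers above:

```java
public class CancerNet {
    public static void main(String[] args) {
        double[] pPollution = {0.9, 0.1};       // P = Low, High
        double[] pSmoker    = {0.3, 0.7};       // S = True, False
        double[][] pCancer  = {{0.03, 0.001},   // P(C | Low,  T/F)
                               {0.05, 0.02}};   // P(C | High, T/F)
        double pC = 0;
        for (int p = 0; p < 2; p++)
            for (int s = 0; s < 2; s++)
                pC += pCancer[p][s] * pPollution[p] * pSmoker[s];
        System.out.printf("P(Cancer) = %.5f%n", pC); // 0.01193
    }
}
```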
Predictive Reasoning
[Network diagram: evidence enters at a cause (e.g. Smoker) and the queries are
its downstream effects - Cancer, X-Ray and Dyspnoea. The direction of
reasoning follows the arcs]
Diagnostic Reasoning
[Network diagram: evidence enters at an effect (e.g. X-Ray) and the queries are
upstream - Cancer, Pollution and Smoker. The direction of reasoning runs
against the arcs]
Intercausal Reasoning
[Network diagram: with evidence on Cancer and on one cause (Smoker), the
query is the other cause (Pollution) - the two causes compete to explain the
common effect]
Combined Reasoning
[Network diagram: evidence on both a cause (Smoker) and an effect (Dyspnoea),
with Cancer as the query - predictive and diagnostic reasoning combined]
- Chapter II - 
Markov Models
Stochastic Process
• A stochastic process, or random process, is a collection of random variables
representing the evolution of some system over time
• Examples: stock market value and exchange rate fluctuations, audio and video
signals, EKG & EEG readings
• They can be classified as:
• Discrete time & discrete space
• Discrete time & continuous space
• Continuous time & discrete space
• Continuous time & continuous space
A Detour Into Matrix
Algebra
Vectors & Matrices
[Examples: a 1×3 matrix, i.e. a (row) vector, and a 3×2 matrix]
Matrix Addition
If A and B are two m by n matrices then addition is defined entrywise by:
(A + B)ij = Aij + Bij
Matrix Multiplication
If A is an n×m matrix and B is an m×p matrix then the product C = AB is the
n×p matrix defined by:
Cij = Σk Aik Bkj, where k runs from 1 to m
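A minimal Java sketch of the definition (a naive triple loop, fine for small matrices; for real work use one of the libraries listed at the end of the deck):

```java
public class MatMul {
    // C[i][j] = sum over k of A[i][k] * B[k][j]
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < m; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}};              // 1x3 row vector
        double[][] b = {{1, 2}, {3, 4}, {5, 6}}; // 3x2 matrix
        double[][] c = multiply(a, b);           // 1x2 result
        System.out.println(c[0][0] + " " + c[0][1]); // 22.0 28.0
    }
}
```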
The Identity Matrix
If A is an n×m matrix then the identity matrix I is the m×m matrix such that
A I = A. I is defined by Iij = 1 if i = j and 0 otherwise.
So the 3×3 identity matrix is:
1 0 0
0 1 0
0 0 1
Inverse Matrix
If A is an n×n matrix then the inverse of A, A⁻¹, is the n×n matrix such that
A A⁻¹ = I. Non-square matrices do not have inverses
Matrix Transposition
If A is an n×m matrix then the transpose of A, Aᵀ, is the m×n matrix defined by:
(Aᵀ)ij = Aji
For example, a 2×2 matrix with rows (a b) and (c d) transposes to the matrix
with rows (a c) and (b d)
End of Detour
• A Markov chain is a directed graph whose vertices represent states and whose
edges carry the probabilities of transition between the two states
• Frequently we use an adjacency-matrix representation of the graph, T, called
the transition matrix
• Hence for a chain with N vertices the transition matrix is N×N
• The initial state of the system, S0, is a 1×N row vector of probabilities
• The state evolves according to Sn+1 = Sn T, so Sn = S0 Tⁿ
• The value of Sn+1(i) is the probability of being in state i at step n+1
Example: British Weather
• England is the land of rain. We never have two nice days in a row. In fact, a
nice day is always followed by either rain or snow. If there is a change from
rain or snow, only half the time is this a change to sunny weather
[Empty 3×3 transition table over the states Rain, Nice and Snow, filled in on
the next slide]
[State diagram: Rain, Nice and Snow, with the transition probabilities on the arcs]
The transition matrix T (rows = current state, columns = next state):
     R    N    S
R   1/2  1/4  1/4
N   1/2   0   1/2
S   1/4  1/4  1/2
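A minimal Java sketch evolving the state vector under this matrix (S(n+1) = S(n) T); starting from a nice day, the distribution settles towards the chain's stationary distribution:

```java
public class WeatherChain {
    public static void main(String[] args) {
        double[][] T = {{0.50, 0.25, 0.25},   // from Rain
                        {0.50, 0.00, 0.50},   // from Nice
                        {0.25, 0.25, 0.50}};  // from Snow
        double[] s = {0, 1, 0}; // start on a Nice day
        for (int step = 0; step < 10; step++) {
            double[] next = new double[3];
            for (int j = 0; j < 3; j++)
                for (int i = 0; i < 3; i++)
                    next[j] += s[i] * T[i][j]; // S(n+1) = S(n) T
            s = next;
        }
        // Converges towards (0.4, 0.2, 0.4): mostly rain and snow
        System.out.printf("Rain %.3f, Nice %.3f, Snow %.3f%n", s[0], s[1], s[2]);
    }
}
```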
Absorbing Markov Chains
• A state si of a Markov chain is called absorbing if it is impossible to leave it
(i.e., Ti,i = 1)
• A Markov chain is absorbing if:
• it has at least one absorbing state
• every state is connected to at least one absorbing state
• In an absorbing Markov chain the probability that the process will be absorbed
is 1 (i.e., Qⁿ → 0 as n → ∞, where Q is the transition matrix restricted to the
non-absorbing states)
Example: The Wandering Drunk
• Consider a city divided up into some grid, e.g. square blocks
• The drunk can move 1 block per turn; each direction has equal probability
• If the drunk reaches Home or the Bar they will stay there
• Questions we can answer (see the simulation sketch below):
• What is the expected time until an absorbing state is reached?
• How many times does the drunk visit each intersection?
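A minimal Monte Carlo sketch in Java, using an assumed one-dimensional street with Home at block 0 and the Bar at block 4 (for this layout the known theoretical answer, starting at block k, is k(4 − k) steps):

```java
import java.util.Random;

public class WanderingDrunk {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int trials = 100_000;
        long totalSteps = 0;
        for (int t = 0; t < trials; t++) {
            int pos = 2; // start halfway between Home (0) and the Bar (4)
            while (pos != 0 && pos != 4) {
                pos += rng.nextBoolean() ? 1 : -1; // each direction equally likely
                totalSteps++;
            }
        }
        // Expect about 2 * (4 - 2) = 4 steps on average
        System.out.println("Mean steps to absorption: " + (double) totalSteps / trials);
    }
}
```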
Ergodic Markov Chains
• A Markov chain is called a regular chain if some power of the transition matrix
has only positive elements
• A Markov chain is called an ergodic chain if it is possible to go from every state
to every state
• Therefore every regular chain is ergodic but not every ergodic chain is
regular!
• It can be shown that for a regular chain Tⁿ converges to a constant matrix with
all positive entries as n → ∞
Example: Queuing Theory
• Queuing theory is, naturally, the mathematical formulation of queues, aimed at
making predictions of queue length, waiting times and throughput
• Single queueing nodes are usually described using Kendall's notation, in the
form A/S/c
• The first rigorous model was M/D/1, followed by M/D/k
• M stands for Markov and means arrivals occur according to a Poisson process
• D stands for deterministic and means jobs arriving at the queue require a
fixed amount of service
• k describes the number of servers at the queueing node (k = 1, 2, ...)
Hidden Markov Models
• The name is misleading! Nothing is unknown...
• In a basic Markov model the states of the system are visible, e.g. we can see
if it is snowing
• In a hidden Markov model the (entire) state is not directly visible, but some
outputs dependent on the state are observable
• However we still need to know all the transition probability values!
Markov Decision Processes
• Markov decision processes are an extension of Markov chains; the difference is
the addition of actions (allowing choice) and rewards (giving motivation)
• A Markov decision process is a 5-tuple (S, A, PA, RA, L)
• The core problem is to find a policy π that maximises some cumulative
function of the rewards, e.g. the expected discounted sum over time
• MDPs can be solved by linear programming (e.g. the simplex method) or by
dynamic programming (e.g. value iteration, sketched below)
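A minimal value-iteration sketch in Java on a toy two-state, two-action MDP (all transition probabilities and rewards are invented for illustration; only the algorithm itself is the point):

```java
public class ValueIteration {
    public static void main(String[] args) {
        int S = 2, A = 2;
        double[][][] P = {{{0.9, 0.1}, {0.2, 0.8}},   // P[a][s][s'] for action 0
                          {{0.5, 0.5}, {0.0, 1.0}}};  // ... and action 1
        double[][] R = {{1.0, 0.0}, {0.0, 2.0}};      // R[a][s] immediate reward
        double gamma = 0.9;                           // discount factor
        double[] V = new double[S];
        for (int iter = 0; iter < 1000; iter++) {
            double[] newV = new double[S];
            for (int s = 0; s < S; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < A; a++) {
                    double q = R[a][s];
                    for (int s2 = 0; s2 < S; s2++)
                        q += gamma * P[a][s][s2] * V[s2];
                    best = Math.max(best, q); // greedy over actions
                }
                newV[s] = best;
            }
            V = newV;
        }
        System.out.printf("V = [%.3f, %.3f]%n", V[0], V[1]);
    }
}
```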
- Chapter III - 
Kalman Filters
A Linear Dynamic System
• Continuous time definition: ẋ(t) = A x(t) + B u(t)
• Discrete time definition: xk+1 = A xk + B uk
• The systems are called linear since, given any two solutions x(t) & y(t),
any linear combination of these solutions is also a solution
Example: Climate Control
• SkyNet SmartHome is able to monitor the temperature of rooms in the house
and control the heating/AC to regulate the temperature
• The temperature sensor readings contain noise
• heating and AC are either ON or OFF
• similar systems exist for humidity
Consider...
Kalman Filters
• Developed ~1960 by Rudolf E. Kálmán, Peter Swerling, and Richard S. Bucy
• First implemented at NASA as part of the Apollo navigation computer
• Still used in many aeronautic and military applications, e.g. submarines, cruise
missiles, the NASA Space Shuttle, the ISS
• There are generalisations of the basic Kalman filter for continuous time
systems as well as non-linear systems
• Kalman filters work by making a prediction of the future, getting a
measurement from reality, comparing the two, moderating this difference, and
adjusting the estimate with this moderated value
• Kalman filters are:
• discrete
• recursive
• extremely accurate if you have a good model
Overview
[Diagram of the core update:
current estimate = previous estimate + Kalman gain × (measured value − previous estimate)]
Basic Kalman Filters
• Modelled on a Markov chain built on linear operators perturbed by errors that
may include Gaussian noise
• The state of the system is represented by a vector of real numbers
• The filter is recursive, i.e. only the estimated state from the previous time step
and the current measurement are needed to compute the estimate for the
current state
• typically we describe the algorithm in two phases: predict & update
(below, x̂ is the state estimate, P its covariance, and a superscript ⁻ marks a
predicted, pre-measurement quantity)
1: Predict: State Estimation
x̂⁻ = A x̂ + B u
(predicted state = state transition matrix A × previous estimate of the state,
plus control matrix B × control vector u)
2: Predict: Error Estimation
P⁻ = A P Aᵀ + Q
(covariance prediction = previous covariance estimate propagated through the
state transition matrix A, plus the estimated process error covariance Q)
3: Update: Innovation
y = z − H x̂⁻
(the innovation: the measurement vector z minus the observation matrix H
applied to the predicted state from step 1)
4: Update: Innovation Covariance
S = H P⁻ Hᵀ + R
(innovation covariance = the covariance prediction from step 2 propagated
through the observation matrix H, plus the estimated measurement error
covariance R)
5: Update: Kalman Gain
K = P⁻ Hᵀ S⁻¹
(Kalman gain from the covariance prediction of step 2, the observation
matrix H, and the innovation covariance of step 4)
6: Update: State
x̂ = x̂⁻ + K y
(new state estimate = predicted state from step 1 + Kalman gain from step 5
× the innovation from step 3)
7: Update: Covariance
P = (I − K H) P⁻
(new estimate of the error covariance from the Kalman gain of step 5, the
observation matrix H, and the covariance prediction of step 2)
Example: Voltmeter
• Consider a voltmeter measuring a constant DC voltage via a sensor with noise
• The system can be described by:
Vn = Vn−1 + wn
• Since the voltage is constant, using a Kalman filter allows us to filter out the
noise wn
• Also, since this is a single-state example, all matrices are of size 1×1
• A: State transition - since the previous state should equal the current state,
A = 1
• H: Observation transform - since we're taking direct measurements from the
sensor, H = 1
• B: Control matrix - we have no controls, so B = 0
• Q: Process covariance - since we know the model very accurately, Q = 0.00001
• R: Measurement covariance - we don't trust the sensor too much, so R = 0.1
• X: Initial state estimate = any number
• P: Initial covariance estimate = 1 (because)
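A minimal Java sketch of this filter; the parameters are exactly those above, while the true voltage (1.25 V) and the random seed are assumed purely for the simulation:

```java
import java.util.Random;

public class VoltmeterKalman {
    public static void main(String[] args) {
        Random rng = new Random(7);
        double trueVoltage = 1.25; // assumed, only used to simulate the sensor
        double Q = 0.00001, R = 0.1;
        double x = 0.0;            // initial state estimate (any number)
        double p = 1.0;            // initial covariance estimate
        for (int k = 0; k < 50; k++) {
            double z = trueVoltage + rng.nextGaussian() * Math.sqrt(R);
            // Predict (A = 1, B = 0): the estimate carries over, plus process noise
            double xPred = x;
            double pPred = p + Q;
            // Update: gain, then correct the prediction with the innovation
            double gain = pPred / (pPred + R);
            x = xPred + gain * (z - xPred);
            p = (1 - gain) * pPred;
        }
        System.out.printf("Estimate after 50 readings: %.4f V%n", x);
    }
}
```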
As a programmer your 
challenge is to find the right 
filter model and determine 
the values of the matrices
Example: Robo-copter
[Photos: the XCell Tempest helicopter and the Freezin Eskimo]
RL: Helicopter
• http://library.rl-community.org/wiki/Helicopter_(Java)
• Sensors to determine:
• bearing
• acceleration (velocity)
• position (GPS)
• rotational rates
• inertial measurement unit
• and more...
Summary
Events & Random Variables
Events & Random Variables
Conditional Probability

Events & Random Variables
Conditional Probability
Bayesian Networks

Events & Random Variables
Conditional Probability
Bayesian Networks
Markov Chains

Events & Random Variables
Conditional Probability
Bayesian Networks
Markov Chains
Random Walks

Events & Random Variables
Conditional Probability
Bayesian Networks
Markov Chains
Random Walks
Markov Decision Processes

Events & Random Variables
Conditional Probability
Bayesian Networks
Markov Chains
Random Walks
Markov Decision Processes
Kalman Filters
Books
• Introduction to Probability - Grinstead & Snell
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html
• Bayesian Artificial Intelligence - Kevin B. Korb & Ann E. Nicholson
• An Introduction to Stochastic Modelling - Mark A. Pinsky & Samuel Karlin
• Stochastic Processes and Filtering Theory - Andrew H. Jazwinski
• Artificial Intelligence: A Modern Approach - Stuart Russell and Peter Norvig
Java Libraries
• Apache Commons Math: http://commons.apache.org/proper/commons-math/
• Colt - high performance data structures and algorithms:
http://dst.lbl.gov/ACSSoftware/colt/
• Parallel Colt: https://sites.google.com/site/piotrwendykier/software/parallelcolt
• JBlas - high performance Java API for the native libraries LAPACK, BLAS, & ATLAS:
http://mikiobraun.github.io/jblas/
• The rest... http://code.google.com/p/java-matrix-benchmark/
Other Resources
• http://www.probabilitycourse.com
• http://masanjin.net/blog/bayesian-average - a detailed derivation of Bayesian
averaging via normal distributions
• http://fulmicoton.com/posts/bayesian_rating/ - an alternative derivation of
Bayesian averaging
• http://www.tina-vision.net/docs/memos/1996-002.pdf - a beautifully simple
derivation of Kalman filters
• http://www.intechopen.com/books/kalman-filter - articles on applications of
Kalman filters
Thank You
Probably, Definitely, 
Maybe 
James McGivern