際際滷

際際滷Share a Scribd company logo
An introduction of Markov Chain Monte Carlo
Vasin,Rachel,Jinchao
April 5, 2016
Vasin,Rachel,Jinchao MCMC April 5, 2016 1 / 64
Outline
1 Introduction
Motivation
Statistical Mechanics Example
Bayesian Example
Monte Carlo Methods
2 Markov Chain Monte Carlo
Gibbs Sampling
Metropolis Method
Metropolis-Hastings algorithm
3 MCMC Diagnostic
Vasin,Rachel,Jinchao MCMC April 5, 2016 2 / 64
Statistical Mechanics Example
Now let us recall an example we learned in 鍖rst class
Probability of the con鍖guration 0 at inverse temperature 硫 = 1
kT :
袖,硫( = 0) =
1
Z,硫
exp(硫HN (0))PN ( = 0)
where Hamiltonian HN () = 1
2 x=y J(x, y)(x)(y) + h x (x)
Partition function Z,硫 = 0
exp(硫HN (0))PN ( = 0)
Prior distribution PN ( = 0) = x P((x) = (x0))
Vasin,Rachel,Jinchao MCMC April 5, 2016 3 / 64
Statistical Mechanics Example
We want to compute the mean value of an observable A
E[A] = A()袖,硫(d).
However, usually the lattice size N 1, and the Partition function
Z,硫 is hard to compute.
Thus, we try to sample from the Gibbs distribution regardless the
normal constant and approximate the integral by numerical result.
Vasin,Rachel,Jinchao MCMC April 5, 2016 4 / 64
Bayesian Example
Observed data Y1, ..., Yn follows some distribution p(y | 慮)
慮 is an unknown parameter
Posterior distribution p(慮 | y) provides information for 慮
By Bayes Rule:
p(慮 | y) =
p(y | 慮)p(慮)
 p(y | 慮)p(慮)d慮
If we assume a prior distribution for 慮, say U(0, 1), we can use
MCMC methods to approximate p(慮 | y)
Vasin,Rachel,Jinchao MCMC April 5, 2016 5 / 64
Bayesian Example
Construct Markov Chain with steady state p(慮 | y)
End with a sample (慮1, ..., 慮M ) which can be assumed to be a draw
from p(慮 | y)
Can use histogram or time series plot to analyze
Popular applications include linear regression and biomedical
studies
Vasin,Rachel,Jinchao MCMC April 5, 2016 6 / 64
The problems to be solved
The fundamental problem that we wish to solve is 鍖nding the
expectation of some function f(z) with respect to a probability
distribution p(z), i.e.
E[f] = f(z)p(z)dz
The general idea of Monte Carlo methods is to obtain a set of samples
z(l) (where l = 1, . . . , L) drawn independently from the distribution
p(z). This allows the expectation to be approximated by a 鍖nite sum
f =
1
L
L
l=1
f(z(l)
).
The methods we can use to generate the samples include rejection
sampling, importance sampling, etc.
Vasin,Rachel,Jinchao MCMC April 5, 2016 7 / 64
Markov Chain Monte Carlo
The problem, however, is that the samples {z(l)} might not be
independent.
Suppose we want to draw from our posterior distribution p(慮|y), but
we cannot sample independent draws from it. For example, we often do
not know the normalizing constant.
So we consider sampling slightly dependent draws using a Markov
chain, then we can still 鍖nd quantities of interests from those draws.
MCMC is a class of methods using this idea.
Vasin,Rachel,Jinchao MCMC April 5, 2016 8 / 64
Markov Chain Monte Carlo
Set up an ergodic Markov chain {z(l)}, such that
lim
l
Kl
(z(0)
, z) = p(z) (stationary distribution)
irrespective of the choice of initial value of z(0). K is the transition
kernel here.
Then by Ergodic Theorem,
lim
L
1
L
L
l=1
f(z(l)
) = f(z)p(z)dz
In Bayesian statistics, there are generally two MCMC algorithms that
we use: the Gibbs Sampler and the Metropolis-Hastings algorithm.
Vasin,Rachel,Jinchao MCMC April 5, 2016 9 / 64
Gibbs Sampling
MCMC method that indirectly samples from joint distribution
Requires initial values for each random variable
Uses conditional distributions for sampling
Updates one value per time step
Sample eventually converges to the desired joint distribution
Can also approximate marginal distributions
Vasin,Rachel,Jinchao MCMC April 5, 2016 10 / 64
Gibbs Sampling
Procedure:
1 Identify joint distribution p(z) = p(z1, ... , zk) from which to
sample
2 Select initial values for z1, ... , zk. Call them z
(0)
1 , ... , z
(0)
k .
3 Draw a new value for z1, call it z
(1)
1 , from conditional distribution
p(z1 | z
(0)
2 , ... , z
(0)
k )
4 Draw a new value for z2, call it z
(1)
2 , from conditional distribution
p(z2 | z
(1)
1 , z
(0)
3 , ... z
(0)
k )
5 Repeat for all zi, updating the conditioned values each time
6 Repeat steps 3-5 for a large number of times T
Vasin,Rachel,Jinchao MCMC April 5, 2016 11 / 64
Gibbs Sampling
Example:
f(X, Y ) = 1
2xy

12
exp( 1
2(12)
[(x袖x)2
2
x
+
(y袖y)2
2
y

2(x袖x)(y袖y)
xy
])
Bivariate normal with Corr(X, Y )=
Suppose 袖x = 袖y = 0 and x = y = 1
f(X | Y ) = f(X,Y )
R f(X,Y )dx
= 1
2(12)
exp((xy)2
2(12)
)
f(Y | X) = f(X,Y )
R f(X,Y )dy
= 1
2(12)
exp((yx)2
2(12)
)
Initialize x, y = 0
Let x(1) be a random draw from N(0, 1  2)
Let y(1) be a random draw from N(x(1), 1  2)
Repeat last 2 steps 10000 times
Vasin,Rachel,Jinchao MCMC April 5, 2016 12 / 64
Gibbs Sampling
Vasin,Rachel,Jinchao MCMC April 5, 2016 13 / 64
Gibbs Sampling
Vasin,Rachel,Jinchao MCMC April 5, 2016 14 / 64
Metropolis Method
Now suppose we have a posterior p(慮|y) that we want to sample from,
but
the posterior is not any distribution we know
some (or all) of the full conditionals is not any distributions we
know (no Gibbs sampling for those whose full conditionals we
dont know)
Then, we can consider using the Metropolis Method.
Vasin,Rachel,Jinchao MCMC April 5, 2016 15 / 64
Metropolis Method
The Metropolis Method (Metropolis et al., 1953) follows the following
steps:
Choose a starting value z(1).
At the current state z(), generate a candidate z by the proposal
distribution q(z|z()) depends on this current state.
Accepted the candidate sample with probability
A(z
, z()
) = min(1,
p(z)
p(z())
).
If the candidate sample is accepted, then z(+1) = z, otherwise the
candidate point z is discarded, z(+1) is set to z() and another
candidate sample is drawn from the distribution q(z|z(+1)).
Vasin,Rachel,Jinchao MCMC April 5, 2016 16 / 64
Comments
For Metropolis Method, the proposal distribution is symmetric,
that is q(zA|zB) = q(zB|zA) for all values of zA and zB.
In general, we usually use normal distribution N(z(), ) as
proposal distribution .
Note that if the step from z() to z causes an increase in the value
of p(z), then the candidate point is certain to be kept.
We can write p(z) = p/Zp,and we will assume that p(z) can readily
be evaluated for any given value of z, although the value of Zp
may be unknown. Like for
袖,硫( = 0) = 1
Z,硫
exp(硫HN (0))PN ( = 0),
we have A(,  ) = min(1, 袖( )
袖() ) = min(1, exp(硫H).
Vasin,Rachel,Jinchao MCMC April 5, 2016 17 / 64
Metropolis-Hastings algorithm
A generalization of Metropolis Method is known as the
Metropolis-Hastings algorithm (Hastings, 1970), where the proposal
distribution is no longer a symmetric function of its arguments.
In particular, when the current state is z(), we draw a sample z from
the distribution qk(z|z()) and then accept it with probability
Ak(z, z()) where
Ak(z
, z()
) = min(1,
p(z)qk(z()|z)
p(z())qk(z|z())
).
Vasin,Rachel,Jinchao MCMC April 5, 2016 18 / 64
Terminology
De鍖nition : A transition kernel is a function K de鍖ned on X  B (X)
such that
x  X, K (x, 揃) is a probability measure
A  B (X), K (揃, A) is measurable.
De鍖nition : A Markov chain with transition kernel K satis鍖es the
detailed balance condition if there exists a function f satisfying
f (x) K (x, y) = f (y) K (y, x)
for every (x, y) .
The balance detailed condition express an equilibrium in the 鍖ow of the
Markov chain, namely that the probability of being in x and moving to
y is the same as the probability of being in y and moving back to x.
Vasin,Rachel,Jinchao MCMC April 5, 2016 19 / 64
Terminology
De鍖nition : Suppose that a Markov chain with transition density q
satis鍖es the detailed balance condition with a probability density
function  i.e.
 (x) q (x, y) =  (y) q (y, x) .
Then the chain is reversible.
De鍖nition : A probability density  : R  [0, ) is a stationary
density for this Markov chain, if it satis鍖es
S
 (x) q (x, y) dx =  (y)
for all y  S.
Vasin,Rachel,Jinchao MCMC April 5, 2016 20 / 64
Reversibility
Theorem : If X is a reversible Markov chain, then  is a
stationary distribution of X.
Now we have solved the problem of constructing an MCMC
method. The Markov chain generated by Metropolis-Hastings
algorithm is stationary.
The detailed balanced condition is not necessary for f to be a
stationary measure associated with transition kernel K, it just
provides a su鍖cient condition that is often easy to check and that
can be used for most MCMC algorithms.
Vasin,Rachel,Jinchao MCMC April 5, 2016 21 / 64
Metropolis-Hastings algorithm
We can show that p(z) is an invariant distribution of the Markov chain
de鍖ned by the Metropolis-Hastings algorithm by showing that detailed
balance. We have
p(z)q(z|z )A(z , z) = min(p(z)q(z|z ), p(z )q(z |z))
= min(p(z )q(z |z), p(z)q(z|z ))
= p(z )q(z |z)A(z, z )
as required.
Vasin,Rachel,Jinchao MCMC April 5, 2016 22 / 64
Metropolis-Hastings algorithm
And if z is generated independently, then we have
Theorem(Mengersen and Tweedie.Ann.Statist.(1996))
The Metropolis-Hasting Algorithm produces a uniform ergodic chain if
there exist a constant M such that
p(z)  Mq(z), x  supp p,
In this case,
Kn
(z, 揃)  p TV  (1 
1
M
)n
,
where K is the transition kernel and 揃 TV denotes the total variation
norm.
Vasin,Rachel,Jinchao MCMC April 5, 2016 23 / 64
Metropolis-Hastings algorithm
Proof. If p(z)  Mq(z), the transition kernel satis鍖es
K(z, z )  q(z ) min{
p(z )q(z)
p(z)q(z )
, 1}
= min{p(z )
q(z)
p(z)
, q(z )} 
1
M
p(z ).
To establish the bound on Kn(z, 揃)  p TV , 鍖rst we write
K(z, 揃)  p TV = sup
A
|
A
(K(z, y)  p(y))dy|
=
{y;p(y)K(z,y)}
(p(y)  K(z, y))dy
 (1 
1
M
)
{y;p(y)K(z,y)}
p(y)dy
 (1 
1
M
)
Vasin,Rachel,Jinchao MCMC April 5, 2016 24 / 64
Metropolis-Hastings algorithm
We now continue with a recursion argument. We can write
A
(K2
(z, y)  p(z))dy =
supp p
[
A
(K(u, y)  p(y))dy]
 (K(z, u)  p(u))du,
and by the same procedure we can get
K2
(z, 揃)  p TV  (1 
1
M
)2
.
We next write a general recursion relation
A
(Kn+1
(z, y)  p(z))dy =
supp p
[
A
(Kn
(u, y)  p(y))dy]
 (K(z, u)  p(u))du,
and by induction we have
Kn
(z, 揃)  p TV  (1 
1
M
)n
.
Vasin,Rachel,Jinchao MCMC April 5, 2016 25 / 64
Simple Example
Now, consider using Metropolis Method to sample a bi-modal target
distribution
p(y) =
1
2

2
exp(
(y + 5)2
2
) +
1
6

2
exp(
(y  7)2
18
)
we begin with y(1) = 0 and let the proposal distribution to be
q(y(n1)
|y(n)
)  N(y(n1)
,

0.5)
Vasin,Rachel,Jinchao MCMC April 5, 2016 26 / 64
Simple Example
Histogram of y_n[1:100]
y_n[1:100]
Density
10 0 5 10 15 20
0.000.150.30
Histogram of y_n[1:1000]
y_n[1:1000]
Density
10 0 5 10 15 20
0.000.060.12
Histogram of y_n[1:10000]
y_n[1:10000]
Density
10 0 5 10 15 20
0.000.100.200.30
Histogram of y_n[1:1e+05]
y_n[1:1e+05]
Density
10 0 5 10 15 20
0.000.100.20
Vasin,Rachel,Jinchao MCMC April 5, 2016 27 / 64
Simple Example
Histogram of y_n
q(y(n1)|y(n))~N(y(n1),sqrt(0.5))
y_n
Density
10 5 0 5 10 15 20
0.000.100.20
Histogram of y2_n
q(y(n1)|y(n))~N(5,sqrt(10))
y2_n
Density
10 5 0 5 10 15 20
0.00.20.4
Histogram of y3_n
q(y(n1)|y(n))~N(0,sqrt(10))
y3_n
Density
10 5 0 5 10 15 20
0.000.15
Vasin,Rachel,Jinchao MCMC April 5, 2016 28 / 64
MCMC Diagnostic
The purpose of Markov Chain Monte Carlo approximation is to obtain
a sequence of parameter values (1), (2), ..., (S) such that
1
S
S
s=1
g (s)
 g () p () d.
for any function g of interest. In other word, we want the empirical
average of g (1) , g (2) , ..., g (S) to estimate the expected
value of g () under target distribution p ().
Vasin,Rachel,Jinchao MCMC April 5, 2016 29 / 64
MCMC Diagnostic
Isnt the Gibbs sampler guaranteed to eventually provide a good
approximation to p (慮) ? It is but eventually can be very long time in
some situations.
1 In the case of a generic parameter  and a target distribution
p (), it is helpful to think of the sequence (1), (2), ..., (S) as
the trajectory of a particle  moving around the parameter space.
2 In term of MCMC integral approximation, the critical thing is
that the amount of time the particle spends in given set A is
proportional to the target probability A p () d.
Vasin,Rachel,Jinchao MCMC April 5, 2016 30 / 64
MCMC Diagnostic : Mixing
Speed of mixing describes how quickly particle moves around the
parameter space. An independent MC sampler has perfect mixing : It
has zero auto correlation and can jump around between regions of the
parameter space in one step. On the other hands, MCMC samples are
NOT independent draws from a target distribution because
1 The 鍖rst draw is set by the user and thus not a random draw from
the target distribution.
2 Subsequently, draw s + 1 depends on draw s: We say that the
samples are auto correlated.
Vasin,Rachel,Jinchao MCMC April 5, 2016 31 / 64
MCMC Diagnostic : Traceplot
The initial phase of an MCMC chain is called the burn-in phase,
during which the chain converges towards the target distribution. The
trace plots can be used to detect burn-in. Note that we can never be
sure that a chain has converged but at least we can detect lack of
convergence.
A traceplot is a plot of the iteration number against the value of the
draw of the parameter at each iteration. We can see whether our chain
gets stuck in certain areas of the parameter space, which indicates bad
mixing.
Vasin,Rachel,Jinchao MCMC April 5, 2016 32 / 64
Traceplot
Vasin,Rachel,Jinchao MCMC April 5, 2016 33 / 64
MCMC Diagnostic : Auto Correlation Function
Another way to assess convergence is to assess the auto correlations
between the draws of our Markov chain.
Let f = {f (x)}xS be a real value function de鍖ned on the state space
S. Then for the stationary markov chain, {ft}  {f (Xt)} is a
stationary stochastic process with mean
袖f  ft =
x
xf (x) .
Vasin,Rachel,Jinchao MCMC April 5, 2016 34 / 64
MCMC Diagnostic : Auto Correlation Function
We de鍖ne covariance matrix as
Xi, Xj  (Xi  Xi ) (Xj  Xj )  XiXj  Xi Xj
and de鍖ne un-normalized auto correlation function or auto covariance
function as
Cff (t)  fs, fs+t  ft
2
where normalized auto correlation function is given by
ff (t) 
Cff (t)
Cff (0)
.
Typically, ff (t) decays exponentially  e|t|/ .
Vasin,Rachel,Jinchao MCMC April 5, 2016 35 / 64
MCMC Diagnostic : Auto Correlation Function
For a given observable f we de鍖ne the integrated auto correlation time
int,f =
1
2

t=
ff (t)

1
2
+

t=1
ff .
The integrated auto correlation time controls the statistical error in
Monte Carlo measurements of f .
Vasin,Rachel,Jinchao MCMC April 5, 2016 36 / 64
MCMC Diagnostic : Auto Correlation Function
More precisely the sample mean
俗f =
1
n
n
t=1
ft
has variance
V ar 俗f =
1
n2
n
r,s=1
Cff (r  s)

1
n
(2int,f ) Cff (0)
for n . Thus the variance of 俗f is a factor 2int,f larger than it would
be if the {ft} were statistically independent. Stated di鍖erently the
number of e鍖ectively independent samples in a run of length n is
roughly n
2int,f
.
Vasin,Rachel,Jinchao MCMC April 5, 2016 37 / 64
MCMC Diagnostic : Auto Correlation Function
If autocorrelation is still relatively high for higher values of k, this
indicates high degree of correlation between our draws and slow mixing.
Vasin,Rachel,Jinchao MCMC April 5, 2016 38 / 64
MCMC Diagnostic : E鍖ective Sample Size
One measure to evaluate whether we have enough MCMC samples is
the e鍖ective sample size Seff of an MCMC sample. Seff can be
interpreted as the number of independent MC samples needed to
obtain the same precision for the mean. Suppose we approximate E (慮)
by the sample mean
俗慮 =
1
S
慮(s)
.
Vasin,Rachel,Jinchao MCMC April 5, 2016 39 / 64
MCMC Diagnostic : E鍖ective Sample Size
For an MC sample we have:
V arMC
俗慮 =
V ar (慮)
S
.
For an MCMC sample we have
V arMCMC
俗慮 =
V ar (慮)
S
+
1
S2
s=t
E 慮(s)
 E (慮) 慮(t)
 E (慮)
where the last term depends on the auto-correlation in the MCMC
chain, and is generally positive.
Vasin,Rachel,Jinchao MCMC April 5, 2016 40 / 64
MCMC Diagnostic : E鍖ective Sample Size
The e鍖ective sample size is de鍖ned as follows:
V arMCMC
俗慮 =
V ar (慮)
Seff
.
Potential scale reduction factor or shrink factor R is, approximately,
the square root of the variance of the mixture of the chains, divided by
the average within chain variance. If chains have mixed well, then R is
close to 1.
Vasin,Rachel,Jinchao MCMC April 5, 2016 41 / 64
Golden Rule !
Some rules of thumb : R < 1.1 and Seff 100 !!!
Vasin,Rachel,Jinchao MCMC April 5, 2016 42 / 64
References
Christopher Bishop (2006)
Pattern Recognition and Machine Learning
Springer, p523556.
Christian Robert and George Casella (2004)
Monte Carlo Statistical Methods
Springer, p267318.
Jun S. Liu (2001)
Monte Carlo Strategies In Scienti鍖c Computing
Springer, p105-127
http://www.mas.ncl.ac.uk/ ndjw1/teaching/sim/gibbs/gibbs.r
http://www.stats.ox.ac.uk/ cholmes/Courses/BDA/bda-mcmc.pdf
Peter D. Ho鍖 (2009)
A First Course in Bayesian Statistical Methods
Vasin,Rachel,Jinchao MCMC April 5, 2016 43 / 64
Stochastic Volatility
Vasin,Rachel,Jinchao
April 5, 2016
Vasin,Rachel,Jinchao MCMC April 5, 2016 44 / 64
Introduction
Financial returns indicate the pro鍖t from an investment
From 鍖nance group, rate of return  assumed constant
In reality, volatility can vary with respect to time
To model volatility, we can use deterministic or probabilistic
methods
We will focus on probabilistic methods using MCMC and Bayesian
analysis
Vasin,Rachel,Jinchao MCMC April 5, 2016 45 / 64
SV Model
Let y = (y1, y2, ..., yn) be a collection of observed returns
Each return is normally distributed with mean 0 and unknown
variance
Log-variance is normally distributed with unknown mean and
variance
Log-variance also follows a Markov chain where the distribution of
the current one depends only on the value of the previous one
Vasin,Rachel,Jinchao MCMC April 5, 2016 46 / 64
SV model : Model Speci鍖cation
Mathematically:
yt | ht  N(0, eht
)
h0 | 袖, ,   N(袖,
2
1  2
)
ht | ht1, 袖, ,   N(袖 + (ht1  袖), 2
)
h = (h1, h2, ..., hn) = log-variance or volatility process
袖 = level of log-variance
 = persistence of log-variance
 = volatility of log-variance
Vasin,Rachel,Jinchao MCMC April 5, 2016 47 / 64
Prior distributions
To complete the model setup, a prior distribution for the parameter
vector 慮 needs to be speci鍖ed. Following Kim S, Shephard N, Chib S
(1998), we choose independent components for each parameter
p (慮) = p (袖) p () p (侶) .
The level 袖  R is is equipped with the usual normal prior
袖  N (b袖, B袖).
In practical applications, this prior is usually chosen to be rather
uninformative; namely, b袖 = 0 and B袖  100 for daily log returns.
Vasin,Rachel,Jinchao MCMC April 5, 2016 48 / 64
Prior distributions
For the persistence parameter   (1, 1) , we choose (+1)
2  B (a0, b0)
implying that
p () =
1
2B (a0, b0)
1 + 
2
a01
1  
2
b01
where a0 and b0 are positive parameters and
B (x, y) =
1
0
tx1
(1  t)y1
dt
denotes the beta function.
Vasin,Rachel,Jinchao MCMC April 5, 2016 49 / 64
Prior distributions
Lastly, for the volatility log - variance 侶  R+, we choose
2
侶  Gamma
1
2
,
1
2B侶
.
This choice is motivated by Fruhwirth-Schnatter and Wagner (2010)
who equivalently stipulate the prior for 賊 2
侶 to follow a centered
normal distribution.
Vasin,Rachel,Jinchao MCMC April 5, 2016 50 / 64
Posterior distribution
Then, we can estimate our parameters (h, 袖, , ) using the posterior
distribution given by the Bayesian formula:
p(h, 袖, , |y)  p(y|h)p(h|h0, 袖, , )p(h0|袖, , )p0(袖, , )
where y is our data, p(y|h)p(h|h0, 袖, , )p(h0|袖, , ) is the lkelihood
function which is given by our SV-model, and p0(袖, , ) = p(慮) is the
prior distribution we construct just now.
Now, we can use MCMC method to sampling the parameters from the
posterior distribution.
Vasin,Rachel,Jinchao MCMC April 5, 2016 51 / 64
Example : EUR/USD
Euro to US dollar exchange rates
Data available: http://sdw.ecb.europa.eu/
Also available under exrates within the R package stochvol
Vasin,Rachel,Jinchao MCMC April 5, 2016 52 / 64
Example : Prior distributions
Prior distributions :
袖  N (0, 100)
( + 1)
2
 B (20, 1.5)
2
侶  Gamma
1
2
,
1
0.2
= 2
1  0.1
For some discussion about this issue, see, Kim S, Shephard N, Chib S
(1998) who choose a0 = 20 and b0 = 1.5 implying a prior mean of 0.86
with the prior standard deviation of 0.11.
Vasin,Rachel,Jinchao MCMC April 5, 2016 53 / 64
Example : Output Summary
The summary of 100,000 MCMC draws after a burn-in of 50,000
Mean SD 5% quantile 50% quantile 95% quantile Seff
袖 -10.1364 0.23363 -10.4630 -10.1381 -9.7980 51118
 0.9932 0.00286 0.9881 0.9935 0.9974 2914
 0.0660 0.01020 0.0505 0.0651 0.0838 1347
2
0.0045 0.00141 0.0026 0.0042 0.0070 1347
Vasin,Rachel,Jinchao MCMC April 5, 2016 54 / 64
Example : Estimated volatilities
Then we can get the empirical quantiles of the posterior distribution of
100 exp(ht/2), the latent volatilities of yt over time in percent:
Vasin,Rachel,Jinchao MCMC April 5, 2016 55 / 64
Example : Forecast volatility
Forecast volatility of yt in percentage
t + 1 0.584132065586393
t + 2 0.584587272795878
t + 3 0.584863398891602
t + 4 0.585436225819106
t + 5 0.585646621839762
t + 6 0.585741891671509
t + 7 0.586295462435408
t + 8 0.586822153121447
t + 9 0.587354949275841
t + 10 0.587772869210331
Vasin,Rachel,Jinchao MCMC April 5, 2016 56 / 64
Example : Traceplot
Vasin,Rachel,Jinchao MCMC April 5, 2016 57 / 64
Example : Posterior and Prior densities
Vasin,Rachel,Jinchao MCMC April 5, 2016 58 / 64
Example : Mean standardized residuals
Vasin,Rachel,Jinchao MCMC April 5, 2016 59 / 64
Result from STAN : Traceplot
Vasin,Rachel,Jinchao MCMC April 5, 2016 60 / 64
Result from STAN : Traceplot
Vasin,Rachel,Jinchao MCMC April 5, 2016 61 / 64
Result from STAN : Traceplot
Vasin,Rachel,Jinchao MCMC April 5, 2016 62 / 64
Result from STAN : Posterior Density
Vasin,Rachel,Jinchao MCMC April 5, 2016 63 / 64
References
Gregor Kastner (2016)
Dealing with Stochastic Volatility in Time Series Using the R
package stochvol
Kim S, Shephard N, Chib S (1998)
Stochastic Volatility: Likelihood Inference and Com- parison With
ARCH Models
Fruhwirth-Schnatter S, Wagner H (2010)
Vasin,Rachel,Jinchao MCMC April 5, 2016 64 / 64

More Related Content

mcmc

  • 1. An introduction of Markov Chain Monte Carlo Vasin,Rachel,Jinchao April 5, 2016 Vasin,Rachel,Jinchao MCMC April 5, 2016 1 / 64
  • 2. Outline 1 Introduction Motivation Statistical Mechanics Example Bayesian Example Monte Carlo Methods 2 Markov Chain Monte Carlo Gibbs Sampling Metropolis Method Metropolis-Hastings algorithm 3 MCMC Diagnostic Vasin,Rachel,Jinchao MCMC April 5, 2016 2 / 64
  • 3. Statistical Mechanics Example Now let us recall an example we learned in 鍖rst class Probability of the con鍖guration 0 at inverse temperature 硫 = 1 kT : 袖,硫( = 0) = 1 Z,硫 exp(硫HN (0))PN ( = 0) where Hamiltonian HN () = 1 2 x=y J(x, y)(x)(y) + h x (x) Partition function Z,硫 = 0 exp(硫HN (0))PN ( = 0) Prior distribution PN ( = 0) = x P((x) = (x0)) Vasin,Rachel,Jinchao MCMC April 5, 2016 3 / 64
  • 4. Statistical Mechanics Example We want to compute the mean value of an observable A E[A] = A()袖,硫(d). However, usually the lattice size N 1, and the Partition function Z,硫 is hard to compute. Thus, we try to sample from the Gibbs distribution regardless the normal constant and approximate the integral by numerical result. Vasin,Rachel,Jinchao MCMC April 5, 2016 4 / 64
  • 5. Bayesian Example Observed data Y1, ..., Yn follows some distribution p(y | 慮) 慮 is an unknown parameter Posterior distribution p(慮 | y) provides information for 慮 By Bayes Rule: p(慮 | y) = p(y | 慮)p(慮) p(y | 慮)p(慮)d慮 If we assume a prior distribution for 慮, say U(0, 1), we can use MCMC methods to approximate p(慮 | y) Vasin,Rachel,Jinchao MCMC April 5, 2016 5 / 64
  • 6. Bayesian Example Construct Markov Chain with steady state p(慮 | y) End with a sample (慮1, ..., 慮M ) which can be assumed to be a draw from p(慮 | y) Can use histogram or time series plot to analyze Popular applications include linear regression and biomedical studies Vasin,Rachel,Jinchao MCMC April 5, 2016 6 / 64
  • 7. The problems to be solved The fundamental problem that we wish to solve is 鍖nding the expectation of some function f(z) with respect to a probability distribution p(z), i.e. E[f] = f(z)p(z)dz The general idea of Monte Carlo methods is to obtain a set of samples z(l) (where l = 1, . . . , L) drawn independently from the distribution p(z). This allows the expectation to be approximated by a 鍖nite sum f = 1 L L l=1 f(z(l) ). The methods we can use to generate the samples include rejection sampling, importance sampling, etc. Vasin,Rachel,Jinchao MCMC April 5, 2016 7 / 64
  • 8. Markov Chain Monte Carlo The problem, however, is that the samples {z(l)} might not be independent. Suppose we want to draw from our posterior distribution p(慮|y), but we cannot sample independent draws from it. For example, we often do not know the normalizing constant. So we consider sampling slightly dependent draws using a Markov chain, then we can still 鍖nd quantities of interests from those draws. MCMC is a class of methods using this idea. Vasin,Rachel,Jinchao MCMC April 5, 2016 8 / 64
  • 9. Markov Chain Monte Carlo Set up an ergodic Markov chain {z(l)}, such that lim l Kl (z(0) , z) = p(z) (stationary distribution) irrespective of the choice of initial value of z(0). K is the transition kernel here. Then by Ergodic Theorem, lim L 1 L L l=1 f(z(l) ) = f(z)p(z)dz In Bayesian statistics, there are generally two MCMC algorithms that we use: the Gibbs Sampler and the Metropolis-Hastings algorithm. Vasin,Rachel,Jinchao MCMC April 5, 2016 9 / 64
  • 10. Gibbs Sampling MCMC method that indirectly samples from joint distribution Requires initial values for each random variable Uses conditional distributions for sampling Updates one value per time step Sample eventually converges to the desired joint distribution Can also approximate marginal distributions Vasin,Rachel,Jinchao MCMC April 5, 2016 10 / 64
  • 11. Gibbs Sampling Procedure: 1 Identify joint distribution p(z) = p(z1, ... , zk) from which to sample 2 Select initial values for z1, ... , zk. Call them z (0) 1 , ... , z (0) k . 3 Draw a new value for z1, call it z (1) 1 , from conditional distribution p(z1 | z (0) 2 , ... , z (0) k ) 4 Draw a new value for z2, call it z (1) 2 , from conditional distribution p(z2 | z (1) 1 , z (0) 3 , ... z (0) k ) 5 Repeat for all zi, updating the conditioned values each time 6 Repeat steps 3-5 for a large number of times T Vasin,Rachel,Jinchao MCMC April 5, 2016 11 / 64
  • 12. Gibbs Sampling Example: f(X, Y ) = 1 2xy 12 exp( 1 2(12) [(x袖x)2 2 x + (y袖y)2 2 y 2(x袖x)(y袖y) xy ]) Bivariate normal with Corr(X, Y )= Suppose 袖x = 袖y = 0 and x = y = 1 f(X | Y ) = f(X,Y ) R f(X,Y )dx = 1 2(12) exp((xy)2 2(12) ) f(Y | X) = f(X,Y ) R f(X,Y )dy = 1 2(12) exp((yx)2 2(12) ) Initialize x, y = 0 Let x(1) be a random draw from N(0, 1 2) Let y(1) be a random draw from N(x(1), 1 2) Repeat last 2 steps 10000 times Vasin,Rachel,Jinchao MCMC April 5, 2016 12 / 64
  • 15. Metropolis Method Now suppose we have a posterior p(慮|y) that we want to sample from, but the posterior is not any distribution we know some (or all) of the full conditionals is not any distributions we know (no Gibbs sampling for those whose full conditionals we dont know) Then, we can consider using the Metropolis Method. Vasin,Rachel,Jinchao MCMC April 5, 2016 15 / 64
  • 16. Metropolis Method The Metropolis Method (Metropolis et al., 1953) follows the following steps: Choose a starting value z(1). At the current state z(), generate a candidate z by the proposal distribution q(z|z()) depends on this current state. Accepted the candidate sample with probability A(z , z() ) = min(1, p(z) p(z()) ). If the candidate sample is accepted, then z(+1) = z, otherwise the candidate point z is discarded, z(+1) is set to z() and another candidate sample is drawn from the distribution q(z|z(+1)). Vasin,Rachel,Jinchao MCMC April 5, 2016 16 / 64
  • 17. Comments For Metropolis Method, the proposal distribution is symmetric, that is q(zA|zB) = q(zB|zA) for all values of zA and zB. In general, we usually use normal distribution N(z(), ) as proposal distribution . Note that if the step from z() to z causes an increase in the value of p(z), then the candidate point is certain to be kept. We can write p(z) = p/Zp,and we will assume that p(z) can readily be evaluated for any given value of z, although the value of Zp may be unknown. Like for 袖,硫( = 0) = 1 Z,硫 exp(硫HN (0))PN ( = 0), we have A(, ) = min(1, 袖( ) 袖() ) = min(1, exp(硫H). Vasin,Rachel,Jinchao MCMC April 5, 2016 17 / 64
  • 18. Metropolis-Hastings algorithm A generalization of Metropolis Method is known as the Metropolis-Hastings algorithm (Hastings, 1970), where the proposal distribution is no longer a symmetric function of its arguments. In particular, when the current state is z(), we draw a sample z from the distribution qk(z|z()) and then accept it with probability Ak(z, z()) where Ak(z , z() ) = min(1, p(z)qk(z()|z) p(z())qk(z|z()) ). Vasin,Rachel,Jinchao MCMC April 5, 2016 18 / 64
  • 19. Terminology De鍖nition : A transition kernel is a function K de鍖ned on X B (X) such that x X, K (x, 揃) is a probability measure A B (X), K (揃, A) is measurable. De鍖nition : A Markov chain with transition kernel K satis鍖es the detailed balance condition if there exists a function f satisfying f (x) K (x, y) = f (y) K (y, x) for every (x, y) . The balance detailed condition express an equilibrium in the 鍖ow of the Markov chain, namely that the probability of being in x and moving to y is the same as the probability of being in y and moving back to x. Vasin,Rachel,Jinchao MCMC April 5, 2016 19 / 64
  • 20. Terminology De鍖nition : Suppose that a Markov chain with transition density q satis鍖es the detailed balance condition with a probability density function i.e. (x) q (x, y) = (y) q (y, x) . Then the chain is reversible. De鍖nition : A probability density : R [0, ) is a stationary density for this Markov chain, if it satis鍖es S (x) q (x, y) dx = (y) for all y S. Vasin,Rachel,Jinchao MCMC April 5, 2016 20 / 64
  • 21. Reversibility Theorem : If X is a reversible Markov chain, then is a stationary distribution of X. Now we have solved the problem of constructing an MCMC method. The Markov chain generated by Metropolis-Hastings algorithm is stationary. The detailed balanced condition is not necessary for f to be a stationary measure associated with transition kernel K, it just provides a su鍖cient condition that is often easy to check and that can be used for most MCMC algorithms. Vasin,Rachel,Jinchao MCMC April 5, 2016 21 / 64
  • 22. Metropolis-Hastings algorithm We can show that p(z) is an invariant distribution of the Markov chain de鍖ned by the Metropolis-Hastings algorithm by showing that detailed balance. We have p(z)q(z|z )A(z , z) = min(p(z)q(z|z ), p(z )q(z |z)) = min(p(z )q(z |z), p(z)q(z|z )) = p(z )q(z |z)A(z, z ) as required. Vasin,Rachel,Jinchao MCMC April 5, 2016 22 / 64
  • 23. Metropolis-Hastings algorithm And if z is generated independently, then we have Theorem(Mengersen and Tweedie.Ann.Statist.(1996)) The Metropolis-Hasting Algorithm produces a uniform ergodic chain if there exist a constant M such that p(z) Mq(z), x supp p, In this case, Kn (z, 揃) p TV (1 1 M )n , where K is the transition kernel and 揃 TV denotes the total variation norm. Vasin,Rachel,Jinchao MCMC April 5, 2016 23 / 64
  • 24. Metropolis-Hastings algorithm Proof. If p(z) Mq(z), the transition kernel satis鍖es K(z, z ) q(z ) min{ p(z )q(z) p(z)q(z ) , 1} = min{p(z ) q(z) p(z) , q(z )} 1 M p(z ). To establish the bound on Kn(z, 揃) p TV , 鍖rst we write K(z, 揃) p TV = sup A | A (K(z, y) p(y))dy| = {y;p(y)K(z,y)} (p(y) K(z, y))dy (1 1 M ) {y;p(y)K(z,y)} p(y)dy (1 1 M ) Vasin,Rachel,Jinchao MCMC April 5, 2016 24 / 64
  • 25. Metropolis-Hastings algorithm We now continue with a recursion argument. We can write A (K2 (z, y) p(z))dy = supp p [ A (K(u, y) p(y))dy] (K(z, u) p(u))du, and by the same procedure we can get K2 (z, 揃) p TV (1 1 M )2 . We next write a general recursion relation A (Kn+1 (z, y) p(z))dy = supp p [ A (Kn (u, y) p(y))dy] (K(z, u) p(u))du, and by induction we have Kn (z, 揃) p TV (1 1 M )n . Vasin,Rachel,Jinchao MCMC April 5, 2016 25 / 64
  • 26. Simple Example Now, consider using Metropolis Method to sample a bi-modal target distribution p(y) = 1 2 2 exp( (y + 5)2 2 ) + 1 6 2 exp( (y 7)2 18 ) we begin with y(1) = 0 and let the proposal distribution to be q(y(n1) |y(n) ) N(y(n1) , 0.5) Vasin,Rachel,Jinchao MCMC April 5, 2016 26 / 64
  • 27. Simple Example Histogram of y_n[1:100] y_n[1:100] Density 10 0 5 10 15 20 0.000.150.30 Histogram of y_n[1:1000] y_n[1:1000] Density 10 0 5 10 15 20 0.000.060.12 Histogram of y_n[1:10000] y_n[1:10000] Density 10 0 5 10 15 20 0.000.100.200.30 Histogram of y_n[1:1e+05] y_n[1:1e+05] Density 10 0 5 10 15 20 0.000.100.20 Vasin,Rachel,Jinchao MCMC April 5, 2016 27 / 64
  • 28. Simple Example Histogram of y_n q(y(n1)|y(n))~N(y(n1),sqrt(0.5)) y_n Density 10 5 0 5 10 15 20 0.000.100.20 Histogram of y2_n q(y(n1)|y(n))~N(5,sqrt(10)) y2_n Density 10 5 0 5 10 15 20 0.00.20.4 Histogram of y3_n q(y(n1)|y(n))~N(0,sqrt(10)) y3_n Density 10 5 0 5 10 15 20 0.000.15 Vasin,Rachel,Jinchao MCMC April 5, 2016 28 / 64
  • 29. MCMC Diagnostic The purpose of Markov Chain Monte Carlo approximation is to obtain a sequence of parameter values (1), (2), ..., (S) such that 1 S S s=1 g (s) g () p () d. for any function g of interest. In other word, we want the empirical average of g (1) , g (2) , ..., g (S) to estimate the expected value of g () under target distribution p (). Vasin,Rachel,Jinchao MCMC April 5, 2016 29 / 64
  • 30. MCMC Diagnostic Isnt the Gibbs sampler guaranteed to eventually provide a good approximation to p (慮) ? It is but eventually can be very long time in some situations. 1 In the case of a generic parameter and a target distribution p (), it is helpful to think of the sequence (1), (2), ..., (S) as the trajectory of a particle moving around the parameter space. 2 In term of MCMC integral approximation, the critical thing is that the amount of time the particle spends in given set A is proportional to the target probability A p () d. Vasin,Rachel,Jinchao MCMC April 5, 2016 30 / 64
  • 31. MCMC Diagnostic : Mixing Speed of mixing describes how quickly particle moves around the parameter space. An independent MC sampler has perfect mixing : It has zero auto correlation and can jump around between regions of the parameter space in one step. On the other hands, MCMC samples are NOT independent draws from a target distribution because 1 The 鍖rst draw is set by the user and thus not a random draw from the target distribution. 2 Subsequently, draw s + 1 depends on draw s: We say that the samples are auto correlated. Vasin,Rachel,Jinchao MCMC April 5, 2016 31 / 64
  • 32. MCMC Diagnostic : Traceplot The initial phase of an MCMC chain is called the burn-in phase, during which the chain converges towards the target distribution. The trace plots can be used to detect burn-in. Note that we can never be sure that a chain has converged but at least we can detect lack of convergence. A traceplot is a plot of the iteration number against the value of the draw of the parameter at each iteration. We can see whether our chain gets stuck in certain areas of the parameter space, which indicates bad mixing. Vasin,Rachel,Jinchao MCMC April 5, 2016 32 / 64
  • 34. MCMC Diagnostic : Auto Correlation Function Another way to assess convergence is to assess the auto correlations between the draws of our Markov chain. Let f = {f (x)}xS be a real value function de鍖ned on the state space S. Then for the stationary markov chain, {ft} {f (Xt)} is a stationary stochastic process with mean 袖f ft = x xf (x) . Vasin,Rachel,Jinchao MCMC April 5, 2016 34 / 64
  • 35. MCMC Diagnostic : Auto Correlation Function We de鍖ne covariance matrix as Xi, Xj (Xi Xi ) (Xj Xj ) XiXj Xi Xj and de鍖ne un-normalized auto correlation function or auto covariance function as Cff (t) fs, fs+t ft 2 where normalized auto correlation function is given by ff (t) Cff (t) Cff (0) . Typically, ff (t) decays exponentially e|t|/ . Vasin,Rachel,Jinchao MCMC April 5, 2016 35 / 64
  • 36. MCMC Diagnostic : Auto Correlation Function For a given observable f we de鍖ne the integrated auto correlation time int,f = 1 2 t= ff (t) 1 2 + t=1 ff . The integrated auto correlation time controls the statistical error in Monte Carlo measurements of f . Vasin,Rachel,Jinchao MCMC April 5, 2016 36 / 64
  • 37. MCMC Diagnostic : Auto Correlation Function More precisely the sample mean 俗f = 1 n n t=1 ft has variance V ar 俗f = 1 n2 n r,s=1 Cff (r s) 1 n (2int,f ) Cff (0) for n . Thus the variance of 俗f is a factor 2int,f larger than it would be if the {ft} were statistically independent. Stated di鍖erently the number of e鍖ectively independent samples in a run of length n is roughly n 2int,f . Vasin,Rachel,Jinchao MCMC April 5, 2016 37 / 64
  • 38. MCMC Diagnostic : Auto Correlation Function If autocorrelation is still relatively high for higher values of k, this indicates high degree of correlation between our draws and slow mixing. Vasin,Rachel,Jinchao MCMC April 5, 2016 38 / 64
  • 39. MCMC Diagnostic : E鍖ective Sample Size One measure to evaluate whether we have enough MCMC samples is the e鍖ective sample size Seff of an MCMC sample. Seff can be interpreted as the number of independent MC samples needed to obtain the same precision for the mean. Suppose we approximate E (慮) by the sample mean 俗慮 = 1 S 慮(s) . Vasin,Rachel,Jinchao MCMC April 5, 2016 39 / 64
  • 40. MCMC Diagnostic : E鍖ective Sample Size For an MC sample we have: V arMC 俗慮 = V ar (慮) S . For an MCMC sample we have V arMCMC 俗慮 = V ar (慮) S + 1 S2 s=t E 慮(s) E (慮) 慮(t) E (慮) where the last term depends on the auto-correlation in the MCMC chain, and is generally positive. Vasin,Rachel,Jinchao MCMC April 5, 2016 40 / 64
  • 41. MCMC Diagnostic : E鍖ective Sample Size The e鍖ective sample size is de鍖ned as follows: V arMCMC 俗慮 = V ar (慮) Seff . Potential scale reduction factor or shrink factor R is, approximately, the square root of the variance of the mixture of the chains, divided by the average within chain variance. If chains have mixed well, then R is close to 1. Vasin,Rachel,Jinchao MCMC April 5, 2016 41 / 64
  • 42. Golden Rule ! Some rules of thumb : R < 1.1 and Seff 100 !!! Vasin,Rachel,Jinchao MCMC April 5, 2016 42 / 64
  • 43. References Christopher Bishop (2006) Pattern Recognition and Machine Learning Springer, p523556. Christian Robert and George Casella (2004) Monte Carlo Statistical Methods Springer, p267318. Jun S. Liu (2001) Monte Carlo Strategies In Scienti鍖c Computing Springer, p105-127 http://www.mas.ncl.ac.uk/ ndjw1/teaching/sim/gibbs/gibbs.r http://www.stats.ox.ac.uk/ cholmes/Courses/BDA/bda-mcmc.pdf Peter D. Ho鍖 (2009) A First Course in Bayesian Statistical Methods Vasin,Rachel,Jinchao MCMC April 5, 2016 43 / 64
  • 44. Stochastic Volatility Vasin,Rachel,Jinchao April 5, 2016 Vasin,Rachel,Jinchao MCMC April 5, 2016 44 / 64
  • 45. Introduction Financial returns indicate the pro鍖t from an investment From 鍖nance group, rate of return assumed constant In reality, volatility can vary with respect to time To model volatility, we can use deterministic or probabilistic methods We will focus on probabilistic methods using MCMC and Bayesian analysis Vasin,Rachel,Jinchao MCMC April 5, 2016 45 / 64
  • 46. SV Model Let y = (y1, y2, ..., yn) be a collection of observed returns Each return is normally distributed with mean 0 and unknown variance Log-variance is normally distributed with unknown mean and variance Log-variance also follows a Markov chain where the distribution of the current one depends only on the value of the previous one Vasin,Rachel,Jinchao MCMC April 5, 2016 46 / 64
  • 47. SV model : Model Speci鍖cation Mathematically: yt | ht N(0, eht ) h0 | 袖, , N(袖, 2 1 2 ) ht | ht1, 袖, , N(袖 + (ht1 袖), 2 ) h = (h1, h2, ..., hn) = log-variance or volatility process 袖 = level of log-variance = persistence of log-variance = volatility of log-variance Vasin,Rachel,Jinchao MCMC April 5, 2016 47 / 64
  • 48. Prior distributions To complete the model setup, a prior distribution for the parameter vector 慮 needs to be speci鍖ed. Following Kim S, Shephard N, Chib S (1998), we choose independent components for each parameter p (慮) = p (袖) p () p (侶) . The level 袖 R is is equipped with the usual normal prior 袖 N (b袖, B袖). In practical applications, this prior is usually chosen to be rather uninformative; namely, b袖 = 0 and B袖 100 for daily log returns. Vasin,Rachel,Jinchao MCMC April 5, 2016 48 / 64
  • 49. Prior distributions For the persistence parameter (1, 1) , we choose (+1) 2 B (a0, b0) implying that p () = 1 2B (a0, b0) 1 + 2 a01 1 2 b01 where a0 and b0 are positive parameters and B (x, y) = 1 0 tx1 (1 t)y1 dt denotes the beta function. Vasin,Rachel,Jinchao MCMC April 5, 2016 49 / 64
  • 50. Prior distributions Lastly, for the volatility log - variance 侶 R+, we choose 2 侶 Gamma 1 2 , 1 2B侶 . This choice is motivated by Fruhwirth-Schnatter and Wagner (2010) who equivalently stipulate the prior for 賊 2 侶 to follow a centered normal distribution. Vasin,Rachel,Jinchao MCMC April 5, 2016 50 / 64
  • 51. Posterior distribution Then, we can estimate our parameters (h, 袖, , ) using the posterior distribution given by the Bayesian formula: p(h, 袖, , |y) p(y|h)p(h|h0, 袖, , )p(h0|袖, , )p0(袖, , ) where y is our data, p(y|h)p(h|h0, 袖, , )p(h0|袖, , ) is the lkelihood function which is given by our SV-model, and p0(袖, , ) = p(慮) is the prior distribution we construct just now. Now, we can use MCMC method to sampling the parameters from the posterior distribution. Vasin,Rachel,Jinchao MCMC April 5, 2016 51 / 64
  • 52. Example : EUR/USD Euro to US dollar exchange rates Data available: http://sdw.ecb.europa.eu/ Also available under exrates within the R package stochvol Vasin,Rachel,Jinchao MCMC April 5, 2016 52 / 64
  • 53. Example : Prior distributions Prior distributions : 袖 N (0, 100) ( + 1) 2 B (20, 1.5) 2 侶 Gamma 1 2 , 1 0.2 = 2 1 0.1 For some discussion about this issue, see, Kim S, Shephard N, Chib S (1998) who choose a0 = 20 and b0 = 1.5 implying a prior mean of 0.86 with the prior standard deviation of 0.11. Vasin,Rachel,Jinchao MCMC April 5, 2016 53 / 64
  • 54. Example : Output Summary The summary of 100,000 MCMC draws after a burn-in of 50,000 Mean SD 5% quantile 50% quantile 95% quantile Seff 袖 -10.1364 0.23363 -10.4630 -10.1381 -9.7980 51118 0.9932 0.00286 0.9881 0.9935 0.9974 2914 0.0660 0.01020 0.0505 0.0651 0.0838 1347 2 0.0045 0.00141 0.0026 0.0042 0.0070 1347 Vasin,Rachel,Jinchao MCMC April 5, 2016 54 / 64
  • 55. Example : Estimated volatilities Then we can get the empirical quantiles of the posterior distribution of 100 exp(ht/2), the latent volatilities of yt over time in percent: Vasin,Rachel,Jinchao MCMC April 5, 2016 55 / 64
  • 56. Example : Forecast volatility Forecast volatility of yt in percentage t + 1 0.584132065586393 t + 2 0.584587272795878 t + 3 0.584863398891602 t + 4 0.585436225819106 t + 5 0.585646621839762 t + 6 0.585741891671509 t + 7 0.586295462435408 t + 8 0.586822153121447 t + 9 0.587354949275841 t + 10 0.587772869210331 Vasin,Rachel,Jinchao MCMC April 5, 2016 56 / 64
  • 57. Example : Traceplot Vasin,Rachel,Jinchao MCMC April 5, 2016 57 / 64
  • 58. Example : Posterior and Prior densities Vasin,Rachel,Jinchao MCMC April 5, 2016 58 / 64
  • 59. Example : Mean standardized residuals Vasin,Rachel,Jinchao MCMC April 5, 2016 59 / 64
  • 60. Result from STAN : Traceplot Vasin,Rachel,Jinchao MCMC April 5, 2016 60 / 64
  • 61. Result from STAN : Traceplot Vasin,Rachel,Jinchao MCMC April 5, 2016 61 / 64
  • 62. Result from STAN : Traceplot Vasin,Rachel,Jinchao MCMC April 5, 2016 62 / 64
  • 63. Result from STAN : Posterior Density Vasin,Rachel,Jinchao MCMC April 5, 2016 63 / 64
  • 64. References Gregor Kastner (2016) Dealing with Stochastic Volatility in Time Series Using the R package stochvol Kim S, Shephard N, Chib S (1998) Stochastic Volatility: Likelihood Inference and Com- parison With ARCH Models Fruhwirth-Schnatter S, Wagner H (2010) Vasin,Rachel,Jinchao MCMC April 5, 2016 64 / 64