This document discusses challenges that Markov chain Monte Carlo (MCMC) methods face in Bayesian phylogenetic inference when the posterior distribution has multiple peaks or a rugged topography. It examines Metropolis-coupled MCMC (MC3), which uses additional heated Markov chains to improve mixing between peaks. While MC3 can find all peaks, the estimated probabilities for different peaks may be incorrect if too few chains or runs are used. The document recommends raising the maximum chain temperature rather than adding chains for rugged distributions, adding chains for broad distributions, and using more than two runs to obtain accurate probability estimates.
Climbing Peaks and Crossing Valleys: Metropolis Coupling and Rugged Phylogenetic Distributions
1. Climbing Peaks and Crossing Valleys: Metropolis Coupling and Rugged Phylogenetic Distributions
Jeremy M. Brown Robert C. Thomson
@jembrown www.phyleauxgenetics.org
3. Markov Chain Monte Carlo (MCMC)
[Figure: probability density (y-axis) over tree and parameter space (x-axis)]
1) Start somewhere.
2) Propose a new position.
3) Calculate the posterior density ratio (r) of the new state to the old state.
- If r > 1, accept ("Yes!").
- If r < 1, accept with probability r ("Maybe").
4) Record the state.
5) Repeat many times.
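The five steps on this slide can be sketched in a few lines of Python. This is only an illustration: the target density (a standard normal), the uniform proposal window, and the names `log_posterior` and `metropolis` are assumptions chosen for the example, not anything taken from the slides.

```python
import math
import random

def log_posterior(x):
    """Unnormalized log density of a hypothetical one-peak target (standard normal)."""
    return -0.5 * x * x

def metropolis(n_steps=10000, step=1.0, start=0.0, seed=1):
    """Random-walk Metropolis sampler following the five steps on the slide."""
    rng = random.Random(seed)
    current = start                                      # 1) start somewhere
    samples = []
    for _ in range(n_steps):                             # 5) repeat many times
        proposal = current + rng.uniform(-step, step)    # 2) propose a new position
        # 3) posterior density ratio of new to old state, computed on the log scale
        log_r = log_posterior(proposal) - log_posterior(current)
        # accept outright if r >= 1, otherwise accept with probability r
        if log_r >= 0 or rng.random() < math.exp(log_r):
            current = proposal
        samples.append(current)                          # 4) record state
    return samples
```

Working with log densities avoids numerical underflow when the posterior density is very small, which is typical for phylogenetic likelihoods.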
11. Alternative Insertion Swaps are Difficult
[Figure: two alternative tree topologies for the same ten taxa, each panel labeled "Data". Tip labels in the order extracted:]
Tree 1: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta
Tree 2: zebra_finch, homo_sapiens, crocodylus_porosus, sphenodon_tuatara, pantherophis_guttata, chrysemys_picta, alligator_mississippiensis, gallus_gallus, anolis_carolinensis, pelomedusa_subrufa
12. The Po-Boy Problem
How do you change the seafood on your po-boy while someone’s holding the sandwich?
Shrimp
Oysters
Halves of French roll = Naturally monophyletic taxa
Seafood = Inserted taxon
13. Metropolis Coupling (MC3) Improves Mixing
[Figure: probability density (y-axis) over tree and parameter space (x-axis); annotation: "Swap?"]
Additional heated chains can act as “scouts”.
14. Peaks All Found, But Different Probabilities?
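[Figure: three tree topologies for the same ten taxa, with probabilities estimated in two runs. Tip labels in the order extracted:]
Tree 1: homo_sapiens, chrysemys_picta, sphenodon_tuatara, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, pantherophis_guttata, pelomedusa_subrufa, crocodylus_porosus
Tree 2: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta
Tree 3: zebra_finch, homo_sapiens, crocodylus_porosus, sphenodon_tuatara, pantherophis_guttata, chrysemys_picta, alligator_mississippiensis, gallus_gallus, anolis_carolinensis, pelomedusa_subrufa
Probabilities shown, in the order extracted (the assignment to trees and runs is not recoverable from the text): 0.50, 0.25, 0.24, 0.38, 0.25, 0.24
Panel labels: Run 1, Run 2
[Trace plot: x-axis: Generation; y-axis: LnL]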
15. A Closer Look at the Acceptance Ratio
\[
r = \frac{p_i(\tau_j, \theta_j \mid D)\; p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\; p_j(\tau_j, \theta_j \mid D)}
\]
16. A Closer Look at the Acceptance Ratio
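\[
r = \underbrace{\frac{p_i(\tau_j, \theta_j \mid D)}{p_i(\tau_i, \theta_i \mid D)}}_{\text{Does chain } i \text{ like where chain } j \text{ is?}}
\times
\underbrace{\frac{p_j(\tau_i, \theta_i \mid D)}{p_j(\tau_j, \theta_j \mid D)}}_{\text{Does chain } j \text{ like where chain } i \text{ is?}}
\]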
17. A Closer Look at the Acceptance Ratio
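\[
r = \frac{p_i(\tau_j, \theta_j \mid D)\; p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\; p_j(\tau_j, \theta_j \mid D)}
\]
\[
r = \left[ \frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)} \right]^{\frac{1}{T_i} - \frac{1}{T_j}}
\]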
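The step between the two forms of r on this slide can be written out explicitly. The sketch below assumes that chain k samples the posterior density raised to the power 1/T_k, which is what the exponent on the slide implies; the shorthand x_k for the state (tree and parameters) of chain k is introduced here only for brevity.

```latex
% Assumption: chain k targets the heated density p_k(x \mid D) = p(x \mid D)^{1/T_k}
% Shorthand (not from the slides): x_k = (\tau_k, \theta_k)
\begin{align*}
r &= \frac{p_i(x_j \mid D)\; p_j(x_i \mid D)}{p_i(x_i \mid D)\; p_j(x_j \mid D)}
   = \frac{p(x_j \mid D)^{1/T_i}\; p(x_i \mid D)^{1/T_j}}{p(x_i \mid D)^{1/T_i}\; p(x_j \mid D)^{1/T_j}} \\
  &= \left[\frac{p(x_j \mid D)}{p(x_i \mid D)}\right]^{1/T_i}
     \left[\frac{p(x_j \mid D)}{p(x_i \mid D)}\right]^{-1/T_j}
   = \left[\frac{p(x_j \mid D)}{p(x_i \mid D)}\right]^{\frac{1}{T_i} - \frac{1}{T_j}}
\end{align*}
```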
18. A Closer Look at the Acceptance Ratio
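\[
r = \frac{p_i(\tau_j, \theta_j \mid D)\; p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\; p_j(\tau_j, \theta_j \mid D)}
\]
\[
r = \left[ \frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)} \right]^{\frac{1}{T_i} - \frac{1}{T_j}}
\]
When the temperatures are equal, the exponent is zero and r = 1, so ALL swaps are accepted regardless of posterior density.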
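The swap ratio on these slides is easy to evaluate on the log scale. The function below is a minimal sketch of that calculation; the name `swap_acceptance` and its argument names are hypothetical, and it assumes swaps are accepted with probability min(1, r), in line with the acceptance rule on the MCMC slide.

```python
import math

def swap_acceptance(log_p_i, log_p_j, temp_i, temp_j):
    """Probability of accepting a state swap between chains i and j.

    log_p_i, log_p_j: unheated log posterior densities at the current
    states of chain i and chain j (hypothetical argument names).
    temp_i, temp_j: temperatures of the two chains.
    """
    # log r = (1/T_i - 1/T_j) * [log p(state j) - log p(state i)]
    log_r = (1.0 / temp_i - 1.0 / temp_j) * (log_p_j - log_p_i)
    return math.exp(min(0.0, log_r))
```

With `temp_i == temp_j` the function returns 1.0 for any pair of densities, which is the point made on the slide. When a cold chain sits in a low-density region and a hot chain sits on a peak, the swap is always accepted; the reverse swap is accepted only rarely.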
19. A Simple One-Parameter Example
[Figure: two-peak probability density over a single parameter (x-axis: Parameter Value, 0.0 to 1.0; y-axis: Probability Density, 0 to 5); the peaks are labeled 0.8 and 0.2]
https://github.com/jembrown/toyMC3/
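For readers who want to experiment with a one-parameter example like this one, the following is a rough, self-contained sketch of MC3 on a two-peak target. It is not the code from the toyMC3 repository. The mixture of two narrow normal curves, the linear temperature spacing from 1 to `t_max`, the random choice of chain pair for each swap attempt, and all function names are assumptions made for illustration.

```python
import math
import random

def log_density(x):
    """Unnormalized log density of an assumed two-peak target on [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return -math.inf
    peak_one = 0.8 * math.exp(-0.5 * ((x - 0.25) / 0.03) ** 2)
    peak_two = 0.2 * math.exp(-0.5 * ((x - 0.75) / 0.03) ** 2)
    return math.log(peak_one + peak_two + 1e-300)  # tiny floor avoids log(0)

def accept(log_r, rng):
    """Accept with probability min(1, r), given log r."""
    return rng.random() < math.exp(min(0.0, log_r))

def run_mc3(n_chains=4, t_max=10.0, n_gens=20000, step=0.1, seed=1):
    """Return the fraction of generations the cold chain spends in peak one."""
    rng = random.Random(seed)
    # Assumed heating scheme: temperatures spaced linearly from 1 (cold) to t_max
    temps = [1.0 + k * (t_max - 1.0) / max(n_chains - 1, 1) for k in range(n_chains)]
    states = [rng.random() for _ in range(n_chains)]
    cold_in_peak_one = 0
    for _ in range(n_gens):
        # Within-chain Metropolis update; chain k targets the density ** (1 / T_k)
        for k in range(n_chains):
            proposal = states[k] + rng.uniform(-step, step)
            log_r = (log_density(proposal) - log_density(states[k])) / temps[k]
            if accept(log_r, rng):
                states[k] = proposal
        # One swap attempt per generation between two randomly chosen chains
        if n_chains > 1:
            i, j = rng.sample(range(n_chains), 2)
            log_r = (1.0 / temps[i] - 1.0 / temps[j]) * (
                log_density(states[j]) - log_density(states[i]))
            if accept(log_r, rng):
                states[i], states[j] = states[j], states[i]
        # Record the cold chain (index 0); peak one lies below 0.5 in this target
        if states[0] < 0.5:
            cold_in_peak_one += 1
    return cold_in_peak_one / n_gens
```

Calling `run_mc3` with several seeds and comparing the returned fractions against 0.8 gives a rough feel for how the estimate depends on `t_max` and `n_chains`.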
20. Max Temp > Number of Chains
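[Figure: estimated Peak One Probability (y-axis, 0.0 to 1.0) against Maximum Temperature (x-axis, 2 to 10), with separate series for 5 Chains, 10 Chains, and 20 Chains. Inset: the two-peak density from the one-parameter example (x-axis: Parameter Value; y-axis: Probability Density; peaks labeled 0.8 and 0.2).]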
21. Peaks Have Different “Capture” Probabilities
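[Figure: the two-peak density from the one-parameter example (x-axis: Parameter Value; y-axis: Probability Density; peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2]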
22. Spurious Convergence by Chain Number
[Figure: the two-peak density from the one-parameter example (x-axis: Parameter Value; y-axis: Probability Density; peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2]
When two runs end up with the same distribution of poorly mixing chains across peaks, they will estimate nearly identical (but incorrect!) probabilities.
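How often two runs land on the same allocation of chains can be estimated with a short calculation. The sketch below rests on an idealization that is not stated on the slides: each chain is captured independently by peak one with probability 0.8 and then stays there. The function name `prob_same_allocation` is hypothetical.

```python
from math import comb

def prob_same_allocation(n_chains, p_one=0.8):
    """Probability that two independent runs place the same number of
    chains in peak one, if each chain is captured independently."""
    total = 0.0
    for k in range(n_chains + 1):
        # Binomial probability that exactly k of the chains land in peak one
        p_k = comb(n_chains, k) * p_one**k * (1 - p_one) ** (n_chains - k)
        total += p_k * p_k  # both runs show the same k
    return total
```

Under this idealization, two runs of two chains each share the same allocation a little over half the time, so agreement between two runs is weak evidence of correct estimates.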
23. Lots of Chains Looks Like Convergence
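[Figure: Peak One Probability / Standard Deviation (y-axis, 0.0 to 1.0) against Maximum Temperature (x-axis, 2 to 10), with separate series for 5 Chains, 10 Chains, and 20 Chains. Inset: the two-peak density from the one-parameter example (x-axis: Parameter Value; y-axis: Probability Density; peaks labeled 0.8 and 0.2).]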
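24. Lots of Chains Looks Like Convergence
[Figure: the two-peak density from the one-parameter example (x-axis: Parameter Value; y-axis: Probability Density; peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2]
N (large #) Chains
Peak One: 0.8 * N
Peak Two: 0.2 * N
Law of Large Numbers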
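The law-of-large-numbers argument on this slide can be put in numbers. The sketch below assumes, as an idealization, that each of the N chains is captured independently by peak one with probability 0.8; the function name `chain_split` is hypothetical.

```python
import math

def chain_split(n_chains, p_one=0.8):
    """Expected number of chains in peak one, and the run-to-run standard
    deviation of the fraction of chains in peak one (binomial model)."""
    expected_one = p_one * n_chains
    sd_fraction = math.sqrt(p_one * (1 - p_one) / n_chains)
    return expected_one, sd_fraction
```

Because the standard deviation shrinks with the square root of the number of chains, runs with many chains resemble one another more and more closely, whether or not the chains are mixing between peaks.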
25. Negative Constraint on Bird Monophyly
[Figure: tree topology with tip labels, in the order extracted: zebra_finch, homo_sapiens, crocodylus_porosus, sphenodon_tuatara, pantherophis_guttata, chrysemys_picta, alligator_mississippiensis, gallus_gallus, anolis_carolinensis, pelomedusa_subrufa]
[Figure: Probability (y-axis, 0.0 to 1.0) against Maximum Temperature (x-axis, 0.0 to 3.0), with separate series for 2 Chains, 4 Chains, 8 Chains, 16 Chains, and 32 Chains]
27. Warnings
• Despite improving mixing, MC3 analyses still require careful thought.
• With small numbers of chains and small numbers of runs, estimated probabilities can be incorrect but identical across some runs.
• With large numbers of chains, estimated probabilities become increasingly similar across all runs.
28. Broad v Rugged Distributions
[Figure: probability density (y-axis) over tree and parameter space (x-axis), contrasting broad and rugged distributions]
29. Recommendations
• For rugged distributions, increase maximum chain temperature, not chain number.
• For broad distributions, increase chain number.
• Use more than 2 runs.