Climbing Peaks and Crossing Valleys:
Metropolis Coupling and Rugged Phylogenetic Distributions
Jeremy M. Brown, Robert C. Thomson
@jembrown www.phyleauxgenetics.org
Bayesian Inference Requires Integration
[Figure: probability density over tree and parameter space, with trees τ1 and τ2 marked]
Markov Chain Monte Carlo (MCMC)
[Figure: probability density over tree and parameter space]
1) Start somewhere.
2) Propose a new position.
3) Calculate the posterior density ratio (r) of the new state to the old state.
- If r ≥ 1, accept ("Yes!").
- If r < 1, accept with probability r ("Maybe").
4) Record the state.
5) Repeat many times.
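The five steps above can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional target, not a phylogenetic sampler: the function names, the Gaussian proposal (symmetric, so the ratio r needs no proposal correction), and the standard-normal target are all assumptions made for the example.

```python
import math
import random


def metropolis(log_density, start, n_steps, step_size=0.1, seed=0):
    """Metropolis sampler for a 1-D target given as a log density."""
    rng = random.Random(seed)
    current = start                                     # 1) start somewhere
    current_logp = log_density(current)
    samples = []
    for _ in range(n_steps):                            # 5) repeat many times
        proposal = current + rng.gauss(0.0, step_size)  # 2) propose a new position
        proposal_logp = log_density(proposal)
        log_r = proposal_logp - current_logp            # 3) log of the density ratio r
        # Accept outright if r >= 1, otherwise accept with probability r.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            current, current_logp = proposal, proposal_logp
        samples.append(current)                         # 4) record the state
    return samples


def log_standard_normal(x):
    """Hypothetical target: an unnormalized standard normal."""
    return -0.5 * x * x


samples = metropolis(log_standard_normal, start=0.0, n_steps=20000, step_size=1.0)
```

Working with log densities avoids numerical underflow, which matters once the target is a likelihood over many sites.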
MCMC Has Trouble With Rugged Distributions
[Figure: rugged probability density over tree and parameter space, with trees τ1 and τ2 marked]
Bipartition Bayes Factors
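[Figure: five-taxon tree with tips A, B, C, E, D]
Bayes Factor = (Marginal likelihood with AB | CDE) / (Marginal likelihood without AB | CDE)
Constraint with AB | CDE: positive (+). Constraint without AB | CDE: negative (-).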
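Marginal likelihoods are usually reported on the log scale, where the Bayes factor becomes a difference. A minimal sketch, with hypothetical values chosen only to illustrate the arithmetic:

```python
def log_bayes_factor(log_ml_with, log_ml_without):
    """ln BF = ln ML(with the bipartition) - ln ML(without the bipartition)."""
    return log_ml_with - log_ml_without


# Hypothetical log marginal likelihoods, for illustration only.
ln_bf = log_bayes_factor(log_ml_with=-1000.0, log_ml_without=-1003.0)
print(ln_bf)  # positive values favor the bipartition AB | CDE
```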
Negative Constraints = Rugged Distributions
[Figure: three alternative trees for the same ten taxa: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta]
Alternative Insertion Swaps are Difficult
[Figure: two alternative trees for the same ten taxa, each labeled "Data"]
The Po-Boy Problem
How do you change the seafood on your po-boy while someone's holding the sandwich?
[Figure: po-boys labeled "Shrimp" and "Oysters"]
Halves of French roll = Naturally monophyletic taxa
Seafood = Inserted taxon
Metropolis Coupling (MC3) Improves Mixing
Additional heated chains can act as "scouts".
[Figure: chains on a probability density over tree and parameter space, with a proposed exchange between chains labeled "Swap?"]
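A minimal sketch of Metropolis coupling on a one-dimensional, two-peak density shows how heated chains and swaps fit together. Everything here is an assumption made for illustration and is not the authors' implementation: the target `log_target`, the linear temperature ladder, the Gaussian proposal, and swaps restricted to adjacent chains.

```python
import math
import random


def log_target(x):
    """Hypothetical two-peak density on [0, 1]: weight 0.8 near 0.25, 0.2 near 0.75."""
    if not 0.0 <= x <= 1.0:
        return -math.inf
    sd = 0.03
    peak_one = 0.8 * math.exp(-0.5 * ((x - 0.25) / sd) ** 2)
    peak_two = 0.2 * math.exp(-0.5 * ((x - 0.75) / sd) ** 2)
    return math.log(peak_one + peak_two)


def mc3(n_chains, max_temp, n_steps, step_size=0.05, seed=0):
    """Return the cold chain's samples; requires n_chains >= 2."""
    rng = random.Random(seed)
    # Assumed ladder: linear spacing from T = 1 (cold chain) up to max_temp.
    temps = [1.0 + (max_temp - 1.0) * k / (n_chains - 1) for k in range(n_chains)]
    states = [rng.random() for _ in temps]
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain Metropolis update on the heated density p(x)^(1/T).
        for k, temp in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step_size)
            log_r = (log_target(proposal) - log_target(states[k])) / temp
            if log_r >= 0 or rng.random() < math.exp(log_r):
                states[k] = proposal
        # Propose exchanging the states of one random pair of adjacent chains.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_r = (log_target(states[j]) - log_target(states[i])) * (
            1.0 / temps[i] - 1.0 / temps[j])
        if log_r >= 0 or rng.random() < math.exp(log_r):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is recorded
    return cold_samples


cold = mc3(n_chains=4, max_temp=10.0, n_steps=5000)
peak_one_fraction = sum(x < 0.5 for x in cold) / len(cold)
```

Only the cold chain contributes samples; the heated chains exist to cross valleys and hand good positions down through swaps.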
Peaks All Found, But Different Probabilities?
[Figure: the three alternative trees for the ten taxa, with estimated probabilities from two runs (values as extracted: 0.50, 0.25, 0.24, 0.38, 0.25, 0.24; labels: Run 1, Run 2), and a trace plot of LnL against Generation]
A Closer Look at the Acceptance Ratio
$$r = \frac{p_i(\tau_j, \theta_j \mid D)\, p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\, p_j(\tau_j, \theta_j \mid D)}$$
The terms pair up into two questions:
- $p_i(\tau_j, \theta_j \mid D) / p_i(\tau_i, \theta_i \mid D)$: Does chain i like where chain j is?
- $p_j(\tau_i, \theta_i \mid D) / p_j(\tau_j, \theta_j \mid D)$: Does chain j like where chain i is?
Because chain k samples a heated density $p_k \propto p^{1/T_k}$, the ratio simplifies to
$$r = \left[\frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)}\right]^{\frac{1}{T_i} - \frac{1}{T_j}}$$
When temps are equal, the exponent is zero, so ALL swaps are accepted regardless of posterior density.
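The agreement between the four-term ratio and its simplified power form, and the equal-temperature case, can be checked numerically. This sketch assumes chain k samples the posterior raised to the power 1/T_k; the function names and the density values are hypothetical.

```python
def heated(density, temp):
    """Heated (unnormalized) density: p(x)^(1/T)."""
    return density ** (1.0 / temp)


def swap_ratio_product(p_at_i, p_at_j, temp_i, temp_j):
    """Four-term form: p_i(x_j) p_j(x_i) / (p_i(x_i) p_j(x_j))."""
    numerator = heated(p_at_j, temp_i) * heated(p_at_i, temp_j)
    denominator = heated(p_at_i, temp_i) * heated(p_at_j, temp_j)
    return numerator / denominator


def swap_ratio_power(p_at_i, p_at_j, temp_i, temp_j):
    """Simplified form: (p(x_j) / p(x_i)) ^ (1/T_i - 1/T_j)."""
    return (p_at_j / p_at_i) ** (1.0 / temp_i - 1.0 / temp_j)


# Cold chain i sits on a high point, heated chain j on a low one.
print(swap_ratio_product(0.5, 0.02, 1.0, 4.0))
print(swap_ratio_power(0.5, 0.02, 1.0, 4.0))
# Equal temperatures: the exponent is zero, so r is 1 whatever the densities.
print(swap_ratio_power(0.5, 0.02, 2.0, 2.0))
```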
A Simple One-Parameter Example
[Figure: probability density (0 to 5) against parameter value (0 to 1), with two peaks holding probability 0.8 and 0.2]
https://github.com/jembrown/toyMC3/
Max Temp > Number of Chains
[Figure: peak one probability (0 to 1) against maximum temperature (2 to 10) for 5, 10, and 20 chains; inset: the two-peak density with probabilities 0.8 and 0.2]
Peaks Have Different “Capture” Probabilities
[Figure: the two-peak density (probabilities 0.8 and 0.2), with the peaks labeled P=0.8 and P=0.2]
Spurious Convergence by Chain Number
[Figure: the two-peak density (probabilities 0.8 and 0.2), with the peaks labeled P=0.8 and P=0.2]
When two runs end up with the same distribution of poorly mixing chains across peaks, they will estimate nearly identical (but incorrect!) probabilities.
Lots of Chains Looks Like Convergence
[Figure: peak one probability and its standard deviation (0 to 1) against maximum temperature (2 to 10) for 5, 10, and 20 chains; inset: the two-peak density with probabilities 0.8 and 0.2]
Law of Large Numbers: with N (a large number of) chains,
- Peak One (P=0.8) holds about 0.8 * N chains
- Peak Two (P=0.2) holds about 0.2 * N chains
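The law-of-large-numbers argument can be made concrete with a short calculation. As a simplifying assumption, suppose chains never cross between peaks and each chain is captured by peak one independently with probability 0.8. The fraction of chains on peak one is then a binomial proportion, and its run-to-run standard deviation shrinks as the number of chains grows.

```python
import math


def peak_one_fraction_sd(n_chains, capture_prob=0.8):
    """Run-to-run SD of the fraction of chains captured by peak one."""
    return math.sqrt(capture_prob * (1.0 - capture_prob) / n_chains)


for n_chains in (5, 10, 20, 1000):
    print(n_chains, round(peak_one_fraction_sd(n_chains), 3))
```

Under this assumption the spread across runs falls with the square root of the chain count even though no chain ever mixes, so close agreement between runs is not, by itself, evidence of good mixing.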
Negative Constraint on Bird Monophyly
[Figure: tree for the ten taxa (zebra_finch, homo_sapiens, crocodylus_porosus, sphenodon_tuatara, pantherophis_guttata, chrysemys_picta, alligator_mississippiensis, gallus_gallus, anolis_carolinensis, pelomedusa_subrufa) under a negative constraint on bird monophyly; probability and its standard deviation (0 to 1) against maximum temperature (0 to 3) for 2, 4, 8, 16, and 32 chains]
Warnings
• Despite improving mixing, MC3 analyses still require
careful thought.
• With small numbers of chains and small numbers of
runs, estimated probabilities can be incorrect but
identical across some runs.
• With large numbers of chains, estimated probabilities
become increasingly similar across all runs.
Broad vs. Rugged Distributions
[Figure: broad and rugged probability densities over tree and parameter space]
Recommendations
• For rugged distributions, increase maximum chain temperature, not chain number.
• For broad distributions, increase chain number.
• Use more than 2 runs.
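The two knobs in these recommendations can be compared with a small helper. The linear spacing below is an assumption made for illustration; real software may parameterize heating differently, so treat `temperature_ladder` as a hypothetical name and scheme.

```python
def temperature_ladder(n_chains, max_temp):
    """Linearly spaced temperatures from 1.0 (cold chain) up to max_temp."""
    if n_chains < 2:
        return [1.0]
    step = (max_temp - 1.0) / (n_chains - 1)
    return [1.0 + k * step for k in range(n_chains)]


# More chains at a fixed maximum: finer spacing, same hottest chain.
print(temperature_ladder(3, 3.0))
print(temperature_ladder(5, 3.0))
# Higher maximum at a fixed chain count: the hottest chain flattens the valleys more.
print(temperature_ladder(3, 9.0))
```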
Thank You
DEB-1355071

DEB-1354506
@jembrown
Michael Landis
Karen Cranston
Negative Constraints = Rugged Distributions
TreeScaper
Guifang Zhou (SSB symposium lightning talk) - Monday, 1:45-1:50 - Ballroom A

"A network framework to explore phylogenetic structure in genome data"

Guifang Zhou (iEvoBio talk) - Tuesday, 2:05-2:12 - Meeting Room 9C

"TreeScaper: Software to visualize and extract phylogenetic signals from sets of trees"

https://github.com/whuang08/TreeScaper
Spurious Convergence by Chain Number
[Figure: the two-peak density (probabilities 0.8 and 0.2), with the peaks labeled P=0.8 and P=0.2]
With 2 chains, the probability of each split of chains between peak one and peak two:
- 2 Chains, 0 Chains: 0.64
- 1 Chain, 1 Chain: 0.32
- 0 Chains, 2 Chains: 0.04
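The three values for two chains are binomial probabilities, which a short sketch can reproduce and extend. It assumes each chain is captured independently, by peak one with probability 0.8 and by peak two with probability 0.2; the function names are hypothetical.

```python
from math import comb


def configuration_probs(n_chains, capture_prob=0.8):
    """Map k (chains on peak one) to the probability of that split."""
    return {
        k: comb(n_chains, k)
        * capture_prob ** k
        * (1.0 - capture_prob) ** (n_chains - k)
        for k in range(n_chains, -1, -1)
    }


def same_configuration_prob(n_chains, capture_prob=0.8):
    """Probability that two independent runs land on the same split."""
    return sum(p * p for p in configuration_probs(n_chains, capture_prob).values())


print(configuration_probs(2))      # splits 2/0, 1/1, 0/2
print(same_configuration_prob(2))  # chance that two runs agree on the split
```

Under these assumptions, two runs with 2 chains each share the same split about half the time (0.64² + 0.32² + 0.04² ≈ 0.51), which is why agreement between just two runs is weak evidence.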
