Inference under the Wright-Fisher model using an accurate beta approximation
Paula Tataru, Thomas Bataillon, Asger Hobolth
AARHUS UNIVERSITY, Bioinformatics Research Centre
CSHL, April 15th 2015
An accurate Beta approximation
Paula Tataru paula@birc.au.dk
AARHUS
UNIVERSITY
Bioinformatics
Research Centre
Theoretical population genetics
›Mathematical models formalize the evolution of genetic variation within and between populations
›Provide a framework for inferring evolutionary paths from observed data
Inference problems
›Inference of population history from DNA data
› (Variable) population size
› Migration / admixture
› Divergence times
› Selection coefficients
Inference problems: population size
PSMC
H. Li and R. Durbin. Inference of human population history from individual whole-genome sequences. Nature, 475:493–496, 2011
Inference problems: population divergence
KimTree
M. Gautier and R. Vitalis. Inferring population histories using genome-wide allele frequency data. Molecular Biology and Evolution, 30(3):654–668, 2013
Inference problems: population admixture
TreeMix
J. K. Pickrell and J. K. Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genetics, 8(11):e1002967, 2012
Inference problems: population admixture
G-PhoCS
I. Gronau, M. J. Hubisz, B. Gulko, C. G. Danko and A. Siepel. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43(10):1031–1034, 2011
Inference problems: loci under selection
spectralHMM
M. Steinrücken, A. Bhaskar and Y. S. Song. A novel spectral method for inferring general selection from time series genetic data. The Annals of Applied Statistics, 8(4):2203–2222, 2014
Population genetics: the Wright-Fisher model
› Evolution of a population forward in time
› Follow one locus (region in the DNA)
› Different variants at the locus are called alleles
[Figure: individuals across generations (time)]
Population genetics: the Wright-Fisher model
› Basic model: only two alleles per locus
› Follow the frequency of one of the alleles
[Figure: individuals across generations (time), with allele counts 3, 2, 3, 3, 4, 5, 5]
Allele frequency distribution
Population genetics: the coalescent model
› Trace the genealogy of sampled individuals backward in time
› The coalescent process terminates when it reaches the most recent common ancestor (MRCA)
[Figure: genealogy of sampled individuals across generations (time), traced back to the MRCA]
Two dual models
›The Wright-Fisher model
› Forward in time
› Follow allele frequency
› Selection
› Scalability
›Larger sample size decreases uncertainty
›The coalescent
› Backward in time
› Follow genealogy
› Recombination
› Scalability
›Larger sample size increases complexity
Approximations to the Wright-Fisher model
›Diffusion
› Large population size
› Infinitesimal change
› No closed-form solution
› Cumbersome to evaluate
›Moment-based
› Convenient distributions
› Normal distribution
› Beta distribution
› Closed analytical forms
› Fast to evaluate
› Problematic at the boundaries
Behavior at the boundaries
›Normal distribution
› Support: the real line
› Truncation to [0, 1]
›Incorrect variance
› Intermediate frequencies
›Beta distribution
› Support: [0, 1]
› Intermediate frequencies
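The truncation problem can be made concrete with the standard truncated-normal moment formulas. This is a minimal sketch, assuming the normal approximation is handled by simply truncating it to [0, 1]. The function names and the example mean and variance are illustrative and are not taken from the talk.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_moments(mu: float, sigma2: float, lo: float = 0.0, hi: float = 1.0):
    """Mean and variance of N(mu, sigma2) truncated to the interval [lo, hi]."""
    sigma = math.sqrt(sigma2)
    a = (lo - mu) / sigma  # standardized lower bound
    b = (hi - mu) / sigma  # standardized upper bound
    z = std_normal_cdf(b) - std_normal_cdf(a)  # probability mass kept by truncation
    pa, pb = std_normal_pdf(a), std_normal_pdf(b)
    mean = mu + sigma * (pa - pb) / z
    var = sigma2 * (1.0 + (a * pa - b * pb) / z - ((pa - pb) / z) ** 2)
    return mean, var


# A frequency close to the boundary at 0: the untruncated normal puts mass below 0.
mu, sigma2 = 0.1, 0.02
mean_t, var_t = truncated_normal_moments(mu, sigma2)
print(f"target mean {mu:.3f}, truncated mean {mean_t:.3f}")
print(f"target variance {sigma2:.4f}, truncated variance {var_t:.4f}")
```

With mean 0.1 and variance 0.02, truncation pulls the mean up to roughly 0.16 and shrinks the variance to roughly 0.011, so the truncated normal no longer matches the moments it was built from.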
The Beta with spikes
›Use of the Wright-Fisher model
› Scalable
›Use of moments
› Simple mathematical calculations
›Improved behavior at the boundaries
› Preserves mean and variance
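The Wright-Fisher model
› $Z_t$: allele count at generation $t$
› $X_t = Z_t / 2N$: allele frequency
› $Z_{t+1}$ follows a binomial distribution: $Z_{t+1} \mid Z_t \sim \mathrm{Bin}\big(2N, g(X_t)\big)$
› $g$ encodes the evolutionary pressures
[Figure: individuals across generations (time), with allele counts 3, 2, 3, 3, 4, 5, 5]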
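The model can be simulated directly from the binomial sampling step. This is a minimal sketch, assuming N is the number of diploid individuals (2N gene copies) and that g maps the current frequency to the sampling probability. The function name and its defaults are illustrative. With the default g(x) = x the simulation reduces to neutral drift.

```python
import random
from typing import Callable, List


def simulate_wright_fisher(
    n_diploid: int,
    x0: float,
    generations: int,
    g: Callable[[float], float] = lambda x: x,
    seed: int = 1,
) -> List[float]:
    """Simulate allele frequencies X_t = Z_t / 2N for t = 0..generations.

    Each generation draws Z_{t+1} ~ Binomial(2N, g(X_t)).
    """
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    z = round(x0 * two_n)  # initial allele count
    freqs = [z / two_n]
    for _ in range(generations):
        p = g(z / two_n)
        # Binomial(2N, p) drawn as a sum of 2N Bernoulli(p) trials (standard library only).
        z = sum(1 for _ in range(two_n) if rng.random() < p)
        freqs.append(z / two_n)
    return freqs


trajectory = simulate_wright_fisher(n_diploid=50, x0=0.3, generations=20)
```

Under drift alone, the frequencies 0 and 1 are absorbing. Once the allele is lost or fixed, the binomial step keeps it there. This is the behavior that moment-based approximations find hard to capture.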
The Wright-Fisher model: Drift only
[Figure: individuals across generations (time), with allele counts 3, 2, 3, 3, 4, 5, 5]
The Wright-Fisher model: Mutations
[Figure: individuals across generations (time), with allele counts 3, 2, 4, 5, 4, 3, 2 and mutation rates u and v between the two alleles]
The Wright-Fisher model: Migration
[Figure: individuals across generations (time), with allele counts 3, 2, 3, 5, 4, 2, 3 and migration rates m1 and m2]
The Wright-Fisher model: Linear forces
›Mutations
›Migration
›Mutations & Migration
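The slide's formulas did not survive extraction. One reading of "linear forces" is that each force acts on the sampling probability g(x) through an affine map; treat this as an assumption. The symbols below are also assumptions: u is the per-generation mutation rate away from the focal allele and v the rate towards it. For migration, a fraction m of the population is replaced each generation by migrants carrying the focal allele at frequency y. For two exchanging populations there would be one such map per population, with rates m1 and m2.

```latex
\begin{aligned}
\text{Mutations:} \quad & g(x) = (1-u)\,x + v\,(1-x) = v + (1-u-v)\,x,\\
\text{Migration:} \quad & g(x) = (1-m)\,x + m\,y = m\,y + (1-m)\,x,\\
\text{Mutations \& migration:} \quad & g(x) = a + b\,x \quad \text{for constants } a, b.
\end{aligned}
```

Composing affine maps gives another affine map, so combining both forces keeps g of the form a + bx.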
The Beta approximation: Main idea
›Approximate the density of $X_t$ by a beta distribution
›Use a recursive approach to calculate
› Mean and variance
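For the linear case, the recursion can be sketched as follows. This sketch assumes g(x) = a + bx, as for mutation and migration, and N diploid individuals. The updates follow from E[X_{t+1} | X_t] = g(X_t) and Var[X_{t+1} | X_t] = g(X_t)(1 - g(X_t))/2N by the laws of total expectation and total variance. The beta parameters are then obtained by matching the mean and variance. The function names are illustrative.

```python
from typing import List, Tuple


def linear_wf_moments(
    n_diploid: int, x0: float, generations: int, a: float = 0.0, b: float = 1.0
) -> List[Tuple[float, float]]:
    """Mean and variance of X_t for t = 0..generations when g(x) = a + b x and X_0 = x0.

    E[X_{t+1}]   = g(E[X_t])
    Var[X_{t+1}] = g(m)(1 - g(m)) / 2N + b^2 Var[X_t] (1 - 1/2N),  with m = E[X_t]
    """
    two_n = 2 * n_diploid
    mean, var = x0, 0.0
    moments = [(mean, var)]
    for _ in range(generations):
        gm = a + b * mean  # g evaluated at the current mean
        var = gm * (1.0 - gm) / two_n + b * b * var * (1.0 - 1.0 / two_n)
        mean = gm
        moments.append((mean, var))
    return moments


def beta_from_moments(mean: float, var: float) -> Tuple[float, float]:
    """Beta(alpha, beta) with the given mean and variance; needs 0 < var < mean * (1 - mean)."""
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common


# Drift only (a = 0, b = 1), N = 50 diploids, starting frequency 0.3.
m10, v10 = linear_wf_moments(n_diploid=50, x0=0.3, generations=10)[-1]
alpha10, beta10 = beta_from_moments(m10, v10)
```

Under drift only (a = 0, b = 1), the mean stays at x0 while the variance grows towards x0(1 - x0). As it does, the fitted α and β shrink and the beta density piles up near 0 and 1, which is where the plain beta approximation struggles.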
The Beta approximation: Drift only
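Under drift alone the moments have a well-known closed form, which makes the beta fit explicit. This assumes X_0 = x_0 is known and the population has N diploid individuals. The fitted α and β are obtained by the usual method-of-moments formulas.

```latex
\begin{aligned}
g(x) &= x, \qquad Z_{t+1} \mid Z_t \sim \mathrm{Bin}(2N, X_t),\\
\mathbb{E}[X_t] &= x_0,\\
\operatorname{Var}[X_t] &= x_0(1-x_0)\left[1-\left(1-\tfrac{1}{2N}\right)^{t}\right],\\
\alpha_t &= x_0\left(\frac{x_0(1-x_0)}{\operatorname{Var}[X_t]}-1\right), \qquad
\beta_t = (1-x_0)\left(\frac{x_0(1-x_0)}{\operatorname{Var}[X_t]}-1\right).
\end{aligned}
```

As t grows, Var[X_t] approaches x_0(1 - x_0), so α_t and β_t tend to 0 and the fitted density becomes U-shaped. In the true Wright-Fisher model, probability instead accumulates as point masses at 0 and 1 (loss and fixation).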
The Beta with spikes: Main idea
›The density of $X_t$: point masses (spikes) at 0 and 1, plus a beta distribution on intermediate frequencies
›Use a recursive approach to calculate
› Loss and fixation probabilities
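The following sketches how such a recursion could look under drift only. The loss and fixation probabilities are propagated by integrating P(X_{t+1} = 0 | X_t = x) = (1 - x)^{2N} (and x^{2N} for fixation) against the beta part. This uses the identity ∫(1 - x)^{2N} Beta(x; α, β) dx = B(α, β + 2N)/B(α, β). The beta part is refitted each generation so that the full mixture keeps the exact mean and variance. Treat this update scheme as an assumption: it may differ in detail from the authors' formulation, and the function names are illustrative.

```python
import math
from typing import List, Tuple


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_with_spikes_drift(
    n_diploid: int, x0: float, generations: int
) -> List[Tuple[float, float, float, float]]:
    """(P(X_t = 0), P(X_t = 1), alpha_t, beta_t) for t = 1..generations, drift only.

    X_t is approximated by p0 * delta_0 + p1 * delta_1 + (1 - p0 - p1) * Beta(alpha, beta),
    with the beta part fitted so that the whole mixture keeps the exact mean and variance.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    two_n = 2 * n_diploid
    mean, var = x0, 0.0  # exact drift moments: the mean stays x0, the variance is updated
    p0 = p1 = 0.0
    alpha = beta = 0.0
    out = []
    for t in range(generations):
        if t == 0:
            # X_0 = x0 exactly, so one-generation loss/fixation are binomial probabilities.
            p0, p1 = (1.0 - x0) ** two_n, x0 ** two_n
        else:
            # Spikes are absorbing; new loss/fixation mass comes from the beta part.
            cont = 1.0 - p0 - p1
            lb = log_beta(alpha, beta)
            p0, p1 = (
                p0 + cont * math.exp(log_beta(alpha, beta + two_n) - lb),
                p1 + cont * math.exp(log_beta(alpha + two_n, beta) - lb),
            )
        var = mean * (1.0 - mean) / two_n + var * (1.0 - 1.0 / two_n)
        # Mean and variance of X_t conditional on 0 < X_t < 1.
        cont = 1.0 - p0 - p1
        m_c = (mean - p1) / cont
        v_c = (var + mean * mean - p1) / cont - m_c * m_c
        common = m_c * (1.0 - m_c) / v_c - 1.0
        alpha, beta = m_c * common, (1.0 - m_c) * common
        out.append((p0, p1, alpha, beta))
    return out


results = beta_with_spikes_drift(n_diploid=50, x0=0.3, generations=10)
```

The sketch does not guard against the conditional variance becoming invalid (non-positive, or too large for a beta fit). Over long time spans, that check would need to be added.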
The Beta with spikes: loss probability
The Beta with spikes: fixation probability
The Beta with spikes: Drift only
Numerical accuracy: Drift only
[Figure panels: Beta; Beta with spikes]
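For small populations, the exact Wright-Fisher distribution can be computed by repeated binomial transitions. This gives a reference against which the Beta and the Beta with spikes can be checked. This is a minimal sketch, assuming accuracy is measured against this exact distribution; the slide's comparison figure did not survive extraction. The cost grows as O(t · (2N)²), so it is only practical for small N. The function name is illustrative.

```python
import math
from typing import List


def exact_wf_drift(n_diploid: int, z0: int, generations: int) -> List[float]:
    """Exact distribution of the allele count Z_t (0..2N) under drift, starting from Z_0 = z0."""
    two_n = 2 * n_diploid
    probs = [0.0] * (two_n + 1)
    probs[z0] = 1.0
    for _ in range(generations):
        new = [0.0] * (two_n + 1)
        for i, p_i in enumerate(probs):
            if p_i == 0.0:
                continue  # skip unreachable counts
            x = i / two_n
            # Binomial(2N, x) transition from count i to every count j.
            for j in range(two_n + 1):
                new[j] += p_i * math.comb(two_n, j) * x ** j * (1.0 - x) ** (two_n - j)
        probs = new
    return probs


dist = exact_wf_drift(n_diploid=5, z0=3, generations=4)
loss, fixation = dist[0], dist[-1]
```

Here `dist[0]` is the exact loss probability P(Z_t = 0) and `dist[-1]` the exact fixation probability. These are the quantities the spikes are meant to capture.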
Inference of divergence times: Drift only
›Simulated data
› 5000 independent loci
› 100 samples in each population
› 50 data sets (replicates)
›The allele frequency distribution is used to calculate the likelihood of the data
›The likelihood is numerically optimized
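The likelihood step can be sketched for a single population. This assumes that each locus contributes an independent binomial sample of n gene copies drawn from a frequency distributed as a Beta with spikes. The sample count then has a beta-binomial component plus the two point masses. It is an illustrative building block only: the divergence-time model on the slide involves more than one population, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sample_likelihood(k: int, n: int, p0: float, p1: float, alpha: float, beta: float) -> float:
    """P(k copies in a sample of n) when X ~ p0*delta_0 + p1*delta_1 + (1-p0-p1)*Beta(alpha, beta)."""
    # Beta-binomial probability for the continuous part.
    beta_binom = math.comb(n, k) * math.exp(
        log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta)
    )
    like = (1.0 - p0 - p1) * beta_binom
    if k == 0:
        like += p0  # allele lost: every sampled copy is the other allele
    if k == n:
        like += p1  # allele fixed: every sampled copy is the focal allele
    return like


def log_likelihood(
    counts: Sequence[int], n: int, p0: float, p1: float, alpha: float, beta: float
) -> float:
    """Log-likelihood of independent loci, each with its own sample count out of n."""
    return sum(math.log(sample_likelihood(k, n, p0, p1, alpha, beta)) for k in counts)


ll = log_likelihood([0, 3, 10], n=10, p0=0.1, p1=0.05, alpha=2.0, beta=3.0)
```

In a full analysis, p0, p1, α and β would come from the recursion for a given divergence time. The resulting log-likelihood would then be maximized numerically over that time.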
Conclusions
›Beta with spikes
› An extension built on the beta approximation
› Improves the quality of the approximation
› Simple mathematical formulation
› Works under linear evolutionary forces
› Comparable to state-of-the-art methods for inference of divergence times
› Recursive formulation enables incorporation of variable population size
Future work
›Incorporate selection
› Non-linear evolutionary force
› Positive selection increases the probability of fixation
› Mean and variance are no longer available in closed form
› Extend the approximation for loss/fixation probabilities to mean and variance

More Related Content

PaulaTataruCSHL

  • 1. using an accurate beta approximation PAULA TATARU THOMAS BATAILLON ASGER HOBOLTH AARHUS UNIVERSITY Bioinformatics Research Centre CSHL, April 15th 2015 Inference under the Wright-Fisher model
  • 2. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Theoretical population genetics 2
  • 3. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Theoretical population genetics ›Mathematical models formalize the evolution of genetic variation within and between populations 2
  • 4. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Theoretical population genetics ›Mathematical models formalize the evolution of genetic variation within and between populations ›Provide a framework for inferring evolutionary paths from observed data to 2
  • 5. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems ›Inference of population history from DNA data › (Variable) population size › Migration / admixture › Divergence times › Selection coefficients 3
  • 6. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems: population size 4 H. Li and R. Durbin. Inference of human population history from individual whole-genome sequences. Nature, 475:493–496, 2011 PSMC
  • 7. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems: populations divergence 5 M. Gautier and R. Vitalis. Inferring population histories using genome-wide allele frequency data. Molecular biology and evolution, 30(3):654–668, 2013 Kim Tree
  • 8. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems: populations admixture 6 J. K. Pickrell and J. K. Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genetics, 8(11):e1002967, 2012 TreeMix
  • 9. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems: populations admixture 7 Gronau I., Hubisz M. J., Gulko B., Danko C. G., Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nature genetics 43(10): 1031-1034, 2011 G-PhoCS
  • 10. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Inference problems: loci under selection 8 Steinrücken M., Bhaskar A. and Song Y. S. A novel spectral method for inferring general selection from time series genetic data. The Annals of Applied Statistics 8(4):2203–2222, 2014 spectralHMM
  • 11. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the Wright-Fisher model › Evolution of a population forward in time › Follow one locus (region in the DNA) › Different variants at the locus are called alleles 9 individuals generations(time)
  • 12. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the Wright-Fisher model › Basic model: only two alleles per locus › Follow the frequency of one of the alleles 10 individuals generations(time) 3 2 3 3 4 5 5 allele count
  • 13. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Allele frequency distribution 11
  • 14. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the coalescent model › Trace the genealogy of sampled individuals backward in time 12 individuals generations(time)
  • 15. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the coalescent model › Trace the genealogy of sampled individuals backward in time 12 individuals generations(time)
  • 16. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the coalescent model › Trace the genealogy of sampled individuals backward in time 12 individuals generations(time) MRCA
  • 17. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre Population genetics: the coalescent model › Trace the genealogy of sampled individuals backward in time › Coalescent process terminates when reaching MRCA 12 individuals generations(time) MRCA
  • 18. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›The Wright-Fisher ›The coalescent Two dual models 13
  • 19. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›The Wright-Fisher › Forward in time ›The coalescent › Backward in time Two dual models 13
  • 20. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›The Wright-Fisher › Forward in time › Follow allele frequency ›The coalescent › Backward in time › Follow genealogy Two dual models 13
  • 21. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›The Wright-Fisher › Forward in time › Follow allele frequency › Selection ›The coalescent › Backward in time › Follow genealogy › Recombination Two dual models 13
  • 22. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›The Wright-Fisher › Forward in time › Follow allele frequency › Selection › Scalability ›Sample size decreases uncertainty ›The coalescent › Backward in time › Follow genealogy › Recombination › Scalability ›Sample size increases complexity Two dual models 13
  • 23. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Diffusion ›Moment-based Approximations to the Wright-Fisher 14
  • 24. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Diffusion › Large population size › Infinitesimal change ›Moment-based Approximations to the Wright-Fisher 14
  • 25. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Diffusion › Large population size › Infinitesimal change ›Moment-based › Convenient distributions › Normal distribution › Beta distribution Approximations to the Wright-Fisher 14
  • 26. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Diffusion › Large population size › Infinitesimal change › No closed solution › Cumbersome to evaluate ›Moment-based › Convenient distributions › Normal distribution › Beta distribution › Closed analytical forms › Fast to evaluate Approximations to the Wright-Fisher 14
  • 27. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Diffusion › Large population size › Infinitesimal change › No closed solution › Cumbersome to evaluate ›Moment-based › Convenient distributions › Normal distribution › Beta distribution › Closed analytical forms › Fast to evaluate › Problematic at boundaries Approximations to the Wright-Fisher 14
  • 28. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Normal distribution ›Beta distribution Behavior at the boundaries 15
  • 29. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Normal distribution › Support: real line ›Beta distribution › Support: [0, 1] Behavior at the boundaries 15
  • 30. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Normal distribution › Support: real line › Truncation ›Incorrect variance ›Beta distribution › Support: [0, 1] Behavior at the boundaries 15
  • 31. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre ›Normal distribution › Support: real line › Truncation ›Incorrect variance › Intermediary frequencies ›Beta distribution › Support: [0, 1] › Intermediary frequencies Behavior at the boundaries 15
  • 32. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes ›Use of Wright-Fisher › Scalable ›Use of moments › Simple mathematical calculations ›Improve behavior at boundaries › Preserve mean and variance 16
  • 33. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model › Zt allele count › Xt = Zt /2N › Zt+1 follows a binomial distribution 17 individuals generations(time) 3 2 3 3 4 5 5 allele count
  • 34. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model › Zt allele count › Xt = Zt /2N › Zt+1 follows a binomial distribution 17 individuals generations(time) 3 2 3 3 4 5 5 allele count
  • 35. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model › Zt allele count › Xt = Zt /2N › Zt+1 follows a binomial distribution › g encodes the evolutionary pressures 17 individuals generations(time) 3 2 3 3 4 5 5 allele count
  • 36. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Drift only 18 individuals generations(time) 3 2 3 3 4 5 5 allele count
  • 37. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Mutations 19 individuals generations(time) 3 2 4 5 4 3 2 allele count u v
  • 38. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Mutations 19 individuals generations(time) 3 2 4 5 4 3 2 allele count u v
  • 39. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Migration 20 individuals generations(time) 3 2 3 5 4 2 3 allele count m1 m2
  • 40. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Migration 20 individuals generations(time) 3 2 3 5 4 2 3 allele count m1 m2
  • 41. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Linear forces ›Mutations ›Migration ›Mutations & Migration 21
  • 42. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Wright Fisher model: Linear forces ›Mutations ›Migration ›Mutations & Migration 21
  • 43. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 22 The Beta approximation: Main idea ›The density of Xt
  • 44. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 22 The Beta approximation: Main idea ›The density of Xt ›Use recursive approach to calculate › Mean and variance
  • 45. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 22 The Beta approximation: Main idea ›The density of Xt ›Use recursive approach to calculate › Mean and variance
  • 46. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 23 The Beta approximation: Drift only
  • 47. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 23 The Beta approximation: Drift only
  • 48. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 24 The Beta approximation: Drift only
  • 49. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 25 The Beta approximation: Drift only
  • 50. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: Main idea ›The density of Xt 26
  • 51. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: Main idea ›The density of Xt ›Use recursive approach to calculate › Loss and fixation probabilities 26
  • 52. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: loss probability 27
  • 53. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: loss probability 28
  • 54. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: loss probability 28
  • 55. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: loss probability 28
  • 56. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre The Beta with spikes: fixation probability 29
  • 57. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 30 The Beta with spikes: Drift only
  • 58. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 30 The Beta with spikes: Drift only
  • 59. An accurate Beta approximation Paula Tataru paula@birc.au.dk AARHUS UNIVERSITY Bioinformatics Research Centre 31 The Beta with spikes: Drift only
  • 60. The Beta with spikes: Drift only
  • 61. Numerical accuracy: Drift only (figure panels: Beta; Beta with spikes)
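The slide does not say which reference distribution the accuracy was measured against. For small populations, one natural benchmark is the exact Wright-Fisher distribution, obtained by repeatedly applying the binomial transition matrix. The sketch below assumes neutral drift with a constant number n_genes of gene copies; the function name is hypothetical. Its first and last entries are the exact loss and fixation probabilities, which any approximation of them can be checked against.

```python
import math


def exact_wf(x0_count, n_genes, generations):
    """Exact distribution of the allele count after `generations` generations
    of neutral Wright-Fisher drift, starting from x0_count of n_genes copies."""
    dist = [0.0] * (n_genes + 1)
    dist[x0_count] = 1.0
    # Row i of the transition matrix is the Binomial(n_genes, i / n_genes) pmf.
    trans = [[math.comb(n_genes, j) * (i / n_genes) ** j
              * (1 - i / n_genes) ** (n_genes - j)
              for j in range(n_genes + 1)]
             for i in range(n_genes + 1)]
    for _ in range(generations):
        dist = [sum(dist[i] * trans[i][j] for i in range(n_genes + 1))
                for j in range(n_genes + 1)]
    return dist
```

The cost grows as generations × n_genes², so this is only practical as a benchmark for small populations.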
  • 63. Inference of divergence times: Drift only ›Simulated data › 5000 independent loci › 100 samples in each population › 50 data sets (replicates) ›The allele frequency distribution is used to calculate the likelihood of the data ›The likelihood is optimized numerically
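To make "the allele frequency distribution is used to calculate the likelihood" concrete, here is a minimal single-population sketch. It assumes the population frequency follows a Beta with spikes, that each locus contributes an independent binomial sample of n gene copies, and that loci are independent. The two-population divergence model on the slides is not reproduced, and the function names are hypothetical. Under these assumptions, the sampling probability is a beta-binomial term for the segregating part plus the spikes at k = 0 and k = n. A numerical optimizer would then maximize the summed log-likelihood over the model parameters.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sample_prob(k, n, p0, p1, alpha, beta):
    """P(k copies of the allele in a sample of n) when the population frequency
    follows p0 * delta_0 + p1 * delta_1 + (1 - p0 - p1) * Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # Beta-binomial probability for the segregating (interior) part.
    interior = math.exp(log_choose + log_beta(k + alpha, n - k + beta)
                        - log_beta(alpha, beta))
    prob = (1.0 - p0 - p1) * interior
    if k == 0:
        prob += p0  # a lost allele is never sampled
    if k == n:
        prob += p1  # a fixed allele is always sampled
    return prob


def log_likelihood(counts, n, p0, p1, alpha, beta):
    """Log-likelihood of independent loci, each with n sampled gene copies."""
    return sum(math.log(sample_prob(k, n, p0, p1, alpha, beta)) for k in counts)
```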
  • 64. Inference of divergence times: Drift only
  • 71. Conclusions ›Beta with spikes › An extension built on the beta approximation › Improves the quality of the approximation › Simple mathematical formulation › Works under linear evolutionary forces › Comparable to state-of-the-art methods for inference of divergence times › Recursive formulation enables incorporation of variable population size
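The last conclusion notes that a generation-by-generation formulation can handle a changing population size. The sketch below illustrates this for the drift-only variance, assuming neutral Wright-Fisher sampling with sizes[s] gene copies at step s; the function name is hypothetical. Binomial sampling multiplies E[X(1 - X)] by (1 - 1/n) each generation, so a varying size simply changes that factor per step. With constant size this recovers the closed form x0(1 - x0)(1 - (1 - 1/(2N))^t).

```python
def drift_variance(x0, sizes):
    """Variance of the allele frequency after len(sizes) generations of neutral
    Wright-Fisher drift; sizes[s] is the number of gene copies drawn at step s."""
    h = x0 * (1.0 - x0)  # E[X_0 (1 - X_0)]
    for n_genes in sizes:
        # Binomial sampling gives E[X'(1 - X') | X] = X (1 - X) (1 - 1 / n_genes).
        h *= 1.0 - 1.0 / n_genes
    # E[X_t] stays x0, so Var[X_t] = x0 (1 - x0) - E[X_t (1 - X_t)].
    return x0 * (1.0 - x0) - h


# A bottleneck of 5 generations at 20 gene copies inside a population of 200.
bottleneck = drift_variance(0.3, [200] * 10 + [20] * 5 + [200] * 10)
```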
  • 76. Future work ›Incorporate selection › Non-linear evolutionary force › Positive selection increases the probability of fixation › Mean and variance are no longer available in closed form › Extend the approximation of the loss/fixation probabilities to the mean and variance
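Why selection breaks the closed-form moments can be seen numerically. Under genic (haploid) selection, assumed here purely for illustration, the expected next-generation frequency is g(x) = x(1 + s)/(1 + s·x). This is non-linear in x, so E[X_{t+1}] = E[g(X_t)] depends on the whole distribution of X_t and not just its mean. The sketch draws frequencies from a Beta distribution using Python's `random.betavariate` and compares E[g(X)] with g(E[X]). For s > 0, g is concave, so by Jensen's inequality the first is smaller. The function names and parameter values are hypothetical.

```python
import random


def selection_step(x, s):
    """Expected frequency after one generation of genic selection (assumed model)."""
    return x * (1.0 + s) / (1.0 + s * x)


def compare_means(alpha, beta, s, n_draws=20_000, seed=1):
    """Return (g(E[X]), E[g(X)]) estimated from Beta(alpha, beta) draws."""
    rng = random.Random(seed)
    xs = [rng.betavariate(alpha, beta) for _ in range(n_draws)]
    mean_x = sum(xs) / n_draws
    mean_after = sum(selection_step(x, s) for x in xs) / n_draws
    return selection_step(mean_x, s), mean_after


g_of_mean, mean_of_g = compare_means(0.5, 0.5, 0.1)
```

With s = 0, the two quantities coincide, recovering the drift-only case where the mean is preserved.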