This presentation describes a Beta approximation to the allele frequency distribution under the Wright-Fisher model of genetic drift in population genetics. A moment-based approach gives the mean and variance of the allele frequency over time, and a Beta distribution is fitted to those two moments. "Spikes" (point masses) are then added at the boundaries 0 and 1 to capture the probabilities of allele loss and fixation.
1. Inference under the Wright-Fisher model using an accurate Beta approximation
Paula Tataru, Thomas Bataillon, Asger Hobolth
Bioinformatics Research Centre, Aarhus University
CSHL, April 15th 2015
2. An accurate Beta approximation
Paula Tataru paula@birc.au.dk
AARHUS
UNIVERSITY
Bioinformatics
Research Centre
Theoretical population genetics
›Mathematical models formalize the evolution of genetic variation within and between populations
›Provide a framework for inferring evolutionary paths from observed data to
5. Inference problems
›Inference of population history from DNA data
› (Variable) population size
› Migration / admixture
› Divergence times
› Selection coefficients
6. Inference problems: population size
H. Li and R. Durbin. Inference of human population history from individual whole-genome
sequences. Nature, 475:493–496, 2011
PSMC
7. Inference problems: population divergence
M. Gautier and R. Vitalis. Inferring population histories using genome-wide allele frequency data.
Molecular Biology and Evolution, 30(3):654–668, 2013
Kim Tree
8. Inference problems: population admixture
J. K. Pickrell and J. K. Pritchard. Inference of population splits and mixtures from genome-wide allele
frequency data. PLOS Genetics, 8(11):e1002967, 2012
TreeMix
9. Inference problems: population admixture
I. Gronau, M. J. Hubisz, B. Gulko, C. G. Danko and A. Siepel. Bayesian inference of ancient human
demography from individual genome sequences. Nature Genetics, 43(10):1031–1034, 2011
G-PhoCS
10. Inference problems: loci under selection
M. Steinrücken, A. Bhaskar and Y. S. Song. A novel spectral method for inferring general selection from
time series genetic data. The Annals of Applied Statistics, 8(4):2203–2222, 2014
spectralHMM
11. Population genetics: the Wright-Fisher model
› Evolution of a population forward in time
› Follow one locus (region in the DNA)
› Different variants at the locus are called alleles
[Figure: population diagram; axes: individuals, generations (time)]
12. Population genetics: the Wright-Fisher model
› Basic model: only two alleles per locus
› Follow the frequency of one of the alleles
[Figure: population diagram; axes: individuals, generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]
13. Allele frequency distribution
17. Population genetics: the coalescent model
› Trace the genealogy of sampled individuals backward in time
› Coalescent process terminates when reaching the most recent common ancestor (MRCA)
[Figure: genealogy diagram; axes: individuals, generations (time); MRCA marked]
22. Two dual models
›The Wright-Fisher
› Forward in time
› Follow allele frequency
› Selection
› Scalability: sample size decreases uncertainty
›The coalescent
› Backward in time
› Follow genealogy
› Recombination
› Scalability: sample size increases complexity
27. Approximations to the Wright-Fisher
›Diffusion
› Large population size
› Infinitesimal change
› No closed solution
› Cumbersome to evaluate
›Moment-based
› Convenient distributions
› Normal distribution
› Beta distribution
› Closed analytical forms
› Fast to evaluate
› Problematic at boundaries
31. Behavior at the boundaries
›Normal distribution
› Support: real line
› Truncation: incorrect variance
› Intermediary frequencies
›Beta distribution
› Support: [0, 1]
› Intermediary frequencies
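The "incorrect variance" point for the truncated Normal can be checked numerically. The sketch below is an illustration only, not the authors' code: it computes the mean and variance of a Normal distribution conditioned on [0, 1] using the standard truncated-normal formulas. The function names and the example values (target mean 0.05, standard deviation 0.1) are assumptions chosen to place the distribution near the lower boundary.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_moments(mu, sigma, lo=0.0, hi=1.0):
    """Mean and variance of N(mu, sigma^2) conditioned on lo <= X <= hi."""
    a = (lo - mu) / sigma
    b = (hi - mu) / sigma
    mass = normal_cdf(b) - normal_cdf(a)          # probability kept after truncation
    shift = (normal_pdf(a) - normal_pdf(b)) / mass
    mean = mu + sigma * shift
    var = sigma ** 2 * (1.0 + (a * normal_pdf(a) - b * normal_pdf(b)) / mass - shift ** 2)
    return mean, var

# Hypothetical example: the moments we want are mean 0.05 and variance 0.01,
# but truncating the matching Normal to [0, 1] changes both.
mean_t, var_t = truncated_normal_moments(mu=0.05, sigma=0.1)
print(f"target:    mean=0.0500 var=0.0100")
print(f"truncated: mean={mean_t:.4f} var={var_t:.4f}")
```

Away from the boundaries (for example mean 0.5 with a small variance) the truncation has essentially no effect, which matches the "intermediary frequencies" bullet above.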
32. The Beta with spikes
›Use of Wright-Fisher
› Scalable
›Use of moments
› Simple mathematical calculations
›Improve behavior at boundaries
› Preserve mean and variance
35. The Wright-Fisher model
› Z_t: allele count
› X_t = Z_t / 2N
› Z_{t+1} follows a binomial distribution
› g encodes the evolutionary pressures
[Figure: population diagram; axes: individuals, generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]
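The slide's formula did not survive extraction, but the bullets describe the standard Wright-Fisher sampling step, Z_{t+1} | Z_t ~ Bin(2N, g(X_t)) with X_t = Z_t / 2N. A minimal simulation sketch under that reading follows. The function names, the population size 2N = 10 and the seed are assumptions for illustration; the binomial draw is written as a sum of Bernoulli trials so that only the standard library is needed.

```python
import random

def g_drift(x):
    """Pure drift: no evolutionary pressure changes the expected frequency."""
    return x

def simulate_wright_fisher(z0, two_n, generations, g=g_drift, rng=None):
    """Simulate allele counts Z_0..Z_T with Z_{t+1} | Z_t ~ Bin(2N, g(Z_t / 2N))."""
    rng = rng or random.Random()
    counts = [z0]
    for _ in range(generations):
        p = g(counts[-1] / two_n)
        # Binomial draw as a sum of 2N independent Bernoulli(p) trials.
        counts.append(sum(rng.random() < p for _ in range(two_n)))
    return counts

# Hypothetical run: start from 3 copies among 2N = 10 and follow 6 generations.
trajectory = simulate_wright_fisher(z0=3, two_n=10, generations=6, rng=random.Random(1))
print(trajectory)
```

Under pure drift the counts 0 and 2N are absorbing: once the allele is lost or fixed, the trajectory stays there.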
36. The Wright-Fisher model: Drift only
[Figure: population diagram; axes: individuals, generations (time); allele count per generation: 3, 2, 3, 3, 4, 5, 5]
37. The Wright-Fisher model: Mutations
[Figure: population diagram; axes: individuals, generations (time); allele count per generation: 3, 2, 4, 5, 4, 3, 2; labels u, v]
40. The Wright-Fisher model: Migration
[Figure: population diagram; axes: individuals, generations (time); allele count per generation: 3, 2, 3, 5, 4, 2, 3; labels m1, m2]
42. The Wright-Fisher model: Linear forces
›Mutations
›Migration
›Mutations & Migration
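The formulas for g on these slides were lost. As a hedged illustration of what "linear" means here, the sketch below represents a force as g(x) = slope * x + intercept. The mutation example assumes the common convention that u is the probability of mutating away from the tracked allele and v the probability of mutating towards it; the slides do not confirm this convention, and the helper names are hypothetical. Composing two linear forces (as in "Mutations & Migration") again gives a linear force.

```python
def linear_force(slope, intercept):
    """Return g(x) = slope * x + intercept, keeping the coefficients as attributes."""
    def g(x):
        return slope * x + intercept
    g.slope, g.intercept = slope, intercept
    return g

def mutation(u, v):
    """Assumed convention: u = P(tracked allele -> other), v = P(other -> tracked)."""
    # g(x) = (1 - u) x + v (1 - x) = (1 - u - v) x + v
    return linear_force(1.0 - u - v, v)

def compose(outer, inner):
    """outer(inner(x)) for two linear forces, which is again linear."""
    return linear_force(outer.slope * inner.slope,
                        outer.slope * inner.intercept + outer.intercept)

g = mutation(u=0.01, v=0.02)
print(g(0.0), g(0.5), g(1.0))
```

Linearity is what keeps the mean and variance recursions in closed form, since the expectation of g(X_t) then depends only on the mean of X_t.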
45. The Beta approximation: Main idea
›The density of X_t is approximated by a Beta density
›Use a recursive approach to calculate
› Mean and variance
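A sketch of the recursive idea, assuming a linear force g(x) = slope * x + intercept and binomial sampling of 2N alleles per generation. The update follows from the law of total expectation and total variance; the function names are hypothetical and this is not the authors' implementation. The Beta parameters are then obtained by matching the mean and variance.

```python
def next_moments(mean, var, two_n, slope=1.0, intercept=0.0):
    """One generation of the mean/variance recursion for g(x) = slope * x + intercept."""
    new_mean = slope * mean + intercept
    # Total variance: E[g(X)(1 - g(X))] / 2N + Var(g(X)), simplified for linear g.
    new_var = (new_mean * (1.0 - new_mean) / two_n
               + (1.0 - 1.0 / two_n) * slope ** 2 * var)
    return new_mean, new_var

def beta_parameters(mean, var):
    """Moment-match Beta(alpha, beta) to a given mean and variance."""
    scale = mean * (1.0 - mean) / var - 1.0
    return mean * scale, (1.0 - mean) * scale

def beta_approximation(x0, two_n, generations, slope=1.0, intercept=0.0):
    """Beta parameters for X_t after `generations` >= 1 steps, starting from X_0 = x0."""
    mean, var = x0, 0.0
    for _ in range(generations):
        mean, var = next_moments(mean, var, two_n, slope, intercept)
    return beta_parameters(mean, var)

# Hypothetical example: pure drift (slope 1, intercept 0), 2N = 100, 20 generations.
alpha, beta = beta_approximation(x0=0.3, two_n=100, generations=20)
print(alpha, beta)
```

Because each step only needs the previous mean and variance, the population size can in principle be changed from one generation to the next by passing a different `two_n` to `next_moments`.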
46. The Beta approximation: Drift only
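The equations on the drift-only slides were not preserved. For reference, the standard closed-form moments of the Wright-Fisher model under pure drift, starting from X_0 = x_0, and the Beta parameters obtained by matching them are given below. This is a textbook result stated under the assumption g(x) = x; the notation α_t, β_t is chosen here and may differ from the slides.

```latex
\begin{aligned}
\mathbb{E}[X_t] &= x_0, \\
\operatorname{Var}(X_t) &= x_0 (1 - x_0)\left[1 - \left(1 - \frac{1}{2N}\right)^{t}\right], \\
\alpha_t &= x_0 \left(\frac{x_0 (1 - x_0)}{\operatorname{Var}(X_t)} - 1\right), \qquad
\beta_t = (1 - x_0) \left(\frac{x_0 (1 - x_0)}{\operatorname{Var}(X_t)} - 1\right).
\end{aligned}
```

The mean stays at x_0 while the variance grows towards x_0(1 - x_0), so α_t and β_t both shrink over time and the fitted Beta density eventually becomes U-shaped.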
51. The Beta with spikes: Main idea
›The density of X_t
›Use a recursive approach to calculate
› Loss and fixation probabilities
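The formulas for the spikes were lost in extraction. The sketch below is one reconstruction consistent with the bullets, for pure drift only: X_t is modelled as a point mass at 0 (loss), a point mass at 1 (fixation), and a Beta density on the interior, with the interior Beta chosen so that the overall mean and variance are preserved. The loss probability is updated using P(Z_{t+1} = 0 | X_t = x) = (1 - x)^{2N}, whose expectation under a Beta density is a ratio of Beta functions, and symmetrically for fixation. All function names are hypothetical, and the details may differ from the authors' method.

```python
import math

def log_beta_fn(a, b):
    """log B(a, b) via log-gamma, to avoid overflow for large 2N."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def interior_beta(mean, var, p_loss, p_fix):
    """Beta parameters for X_t conditional on 0 < X_t < 1, by moment matching."""
    interior = 1.0 - p_loss - p_fix
    m = (mean - p_fix) / interior                  # E[X | interior]
    second = (var + mean ** 2 - p_fix) / interior  # E[X^2 | interior]
    v = second - m ** 2
    scale = m * (1.0 - m) / v - 1.0
    return m * scale, (1.0 - m) * scale, interior

def spikes_drift(x0, two_n, generations):
    """Approximate loss and fixation probabilities after `generations` >= 1 of pure drift."""
    mean = x0
    var = x0 * (1.0 - x0) / two_n   # exact after one generation
    p_loss = (1.0 - x0) ** two_n    # exact: none of the 2N draws picks the allele
    p_fix = x0 ** two_n             # exact: all 2N draws pick the allele
    for _ in range(generations - 1):
        alpha, beta, interior = interior_beta(mean, var, p_loss, p_fix)
        base = log_beta_fn(alpha, beta)
        # E[(1 - X)^{2N}] and E[X^{2N}] under the interior Beta density.
        p_loss += interior * math.exp(log_beta_fn(alpha, beta + two_n) - base)
        p_fix += interior * math.exp(log_beta_fn(alpha + two_n, beta) - base)
        var = mean * (1.0 - mean) / two_n + (1.0 - 1.0 / two_n) * var
    return p_loss, p_fix

# Hypothetical example: x0 = 0.2, 2N = 100, 40 generations.
print(spikes_drift(0.2, 100, 40))
```

The sketch assumes 0 < x0 < 1 and a moderate number of generations; once almost all probability mass sits in the spikes, the interior moment matching becomes numerically delicate and would need guarding.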
52. The Beta with spikes: loss probability
56. The Beta with spikes: fixation probability
57. The Beta with spikes: Drift only
61. Numerical accuracy: Drift only
[Figure: two panels, "Beta" and "Beta with spikes"]
63. Inference of divergence times: Drift only
›Simulated data
› 5000 independent loci
› 100 samples in each population
› 50 data sets (replicates)
›Allele frequency distribution is used to calculate likelihood of data
›Likelihood is numerically optimized
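To illustrate how an allele frequency distribution turns into a likelihood, here is a deliberately simplified sketch. It is not the setup used in the talk: it assumes a single population, a known ancestral frequency x0 at each locus, pure drift, and the plain Beta approximation without spikes. Under those assumptions the number of copies k among n sampled alleles follows a beta-binomial distribution. The grid search stands in for the unspecified numerical optimizer, and all names are hypothetical.

```python
import math

def log_beta_fn(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_beta_binomial(k, n, alpha, beta):
    """log P(k copies among n sampled alleles) when the frequency is Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta_fn(k + alpha, n - k + beta) - log_beta_fn(alpha, beta)

def drift_beta(x0, two_n, t):
    """Beta parameters matching the drift-only mean and variance after t >= 1 generations."""
    decay = (1.0 - 1.0 / two_n) ** t
    scale = decay / (1.0 - decay)   # equals x0 (1 - x0) / Var(X_t) - 1
    return x0 * scale, (1.0 - x0) * scale

def log_likelihood(t, loci, two_n):
    """Sum over independent loci; each locus is a tuple (x0, k, n)."""
    total = 0.0
    for x0, k, n in loci:
        alpha, beta = drift_beta(x0, two_n, t)
        total += log_beta_binomial(k, n, alpha, beta)
    return total

def estimate_divergence_time(loci, two_n, candidates):
    """Numerical optimization by exhaustive search over candidate times."""
    return max(candidates, key=lambda t: log_likelihood(t, loci, two_n))

# Hypothetical data: two loci whose sampled counts sit at the extremes.
loci = [(0.5, 0, 20), (0.5, 20, 20)]
print(estimate_divergence_time(loci, two_n=100, candidates=[1, 10, 100]))
```

Sampled counts far from the ancestral frequency favour longer divergence times, since drift needs time to spread the frequency distribution towards the boundaries.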
71. Conclusions
›Beta with spikes
› An extension built on the beta approximation
› Improves the quality of the approximation
› Simple mathematical formulation
› Works under linear evolutionary forces
› Comparable to state-of-the-art methods for inference of divergence times
› Recursive formulation enables incorporation of variable population size
76. Future work
›Incorporate selection
› Non-linear evolutionary force
› Positive selection increases probability of fixation
› Mean and variance are no longer available in closed form
› Extend the approximation for loss/fixation probabilities to mean and variance
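A small numerical illustration of why a non-linear force breaks the closed-form recursion: for a linear g, the mean of g(X_t) equals g applied to the mean of X_t, but for a non-linear g it does not, so the mean at the next generation depends on more than the current mean. The selection form below (genic selection with relative fitness 1 + s) and the mutation convention are assumptions for illustration; the slides do not specify either.

```python
def g_selection(x, s):
    """Assumed genic selection: the tracked allele has relative fitness 1 + s."""
    return (1.0 + s) * x / (1.0 + s * x)

def g_mutation(x, u, v):
    """Linear force for comparison (assumed convention for u and v)."""
    return (1.0 - u) * x + v * (1.0 - x)

def mean_of(values):
    return sum(values) / len(values)

freqs = [0.2, 0.8]   # a toy two-point distribution for X_t
m = mean_of(freqs)

# Gap between E[g(X)] and g(E[X]): zero for the linear force, non-zero for selection.
linear_gap = mean_of([g_mutation(x, 0.01, 0.02) for x in freqs]) - g_mutation(m, 0.01, 0.02)
selection_gap = mean_of([g_selection(x, 0.5) for x in freqs]) - g_selection(m, 0.5)
print(linear_gap, selection_gap)
```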