The document provides an introduction to epistasis detection in genome-wide association studies (GWAS). It frames epistasis detection as identifying the SNPs that cause a disease through their interactions rather than through their individual effects. It defines the problem as searching large genotype datasets for the combination of SNPs that maximizes an association measure with binary disease status (case/control). The measures discussed are the chi-squared (χ²) statistic and mutual information. The document reviews computational methods for epistasis detection, including Multifactor Dimensionality Reduction, SNPHarvester, SNPRuler, and Mutual Information With Clustering. It notes the challenges of reducing the computational burden and of detecting higher-order epistatic interactions.
1. A brief introduction to
epistasis detection in GWAS
2014. 01. 27.
Hyun-hwan Jeong
4. Single Nucleotide Polymorphism
• A single-letter change in the DNA sequence
• DNA sequences of different people are about 99.9% identical
• The most common type of genetic variation
• Present in ≥ 1% of the general population
…ATTCGCCGGCTGCAACGTTAGA…
…ATTCGCCGGCTGCAGCGTTAGA…
…ATTCGCCGGCTGCATCGTTAGA…
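To make the three-sequence example concrete, here is a minimal Python sketch that scans already-aligned sequences and reports the positions where they differ. It assumes equal-length, pre-aligned input and does not apply the ≥ 1% population-frequency criterion; `snp_positions` is a hypothetical helper name.

```python
def snp_positions(sequences):
    """Return {0-based position: sorted alleles} for columns where aligned sequences differ."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be pre-aligned to the same length")
    variants = {}
    for pos in range(length):
        alleles = {seq[pos] for seq in sequences}  # distinct letters in this column
        if len(alleles) > 1:
            variants[pos] = sorted(alleles)
    return variants


seqs = [
    "ATTCGCCGGCTGCAACGTTAGA",
    "ATTCGCCGGCTGCAGCGTTAGA",
    "ATTCGCCGGCTGCATCGTTAGA",
]
print(snp_positions(seqs))  # {14: ['A', 'G', 'T']}
```

Only the 15th letter (0-based position 14) varies, carrying A, G or T.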
5. Genotype, Phenotype and Allele
Figure: illustration of phenotype, genotype and allele (source: http://en.wikipedia.org/wiki/Phenotype)
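At a single SNP, each (diploid) person carries two alleles, and that pair is the person's genotype at the SNP. A common way to turn genotypes into numbers for association tests, assumed here rather than taken from the slides, is additive coding: count the copies of the minor allele, giving 0, 1 or 2. The helper name `additive_code` is hypothetical.

```python
def additive_code(genotype, minor_allele):
    """Count copies of minor_allele in a two-letter diploid genotype such as 'AG'."""
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == minor_allele)


calls = ["AA", "AG", "GG", "GA"]
print([additive_code(g, "G") for g in calls])  # [0, 1, 2, 1]
```

Allele order does not matter: "AG" and "GA" both code to 1.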
6. Genome-Wide Association Study
Testing the association between each single SNP and disease
Figure: Manhattan plot of the GWAS of the discovery cohort comprising 2,346 SSc cases and 5,193 healthy controls. Nature Genetics 42, 426–429 (2010)
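A single-SNP GWAS tests each SNP on its own. The minimal sketch below assumes Pearson's χ² on a 2 × 3 (status × genotype) table as the per-SNP test; real pipelines also use other tests, such as trend tests or logistic regression. With 2 degrees of freedom the χ² p-value has the closed form exp(−χ²/2), and a Manhattan plot shows −log10(p) for every SNP along the genome. Helper names are hypothetical; genotypes are assumed to be coded 0/1/2 and status as 0 = control, 1 = case.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty genotype columns
                stat += (observed - expected) ** 2 / expected
    return stat


def genotype_table(genotypes, status):
    """2 x 3 table: row = status (0 control, 1 case), column = genotype code 0/1/2."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, s in zip(genotypes, status):
        table[s][g] += 1
    return table


# Toy counts: controls have genotypes 0/1/2 = 30/15/5, cases 15/20/15.
table = [[30, 15, 5], [15, 20, 15]]
stat = chi_squared(table)
p_value = math.exp(-stat / 2)  # exact chi-squared survival function for 2 d.f.
print(round(stat, 3), round(-math.log10(p_value), 2))  # 10.714 2.33
```

In a real scan, `genotype_table` would build one such table per SNP from the raw 0/1/2 calls before scoring.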
7. Why is epistasis detection needed in GWAS?
Figure: An illustration of an interaction pattern between two SNPs with no marginal effect. Bioinformatics 26, 30–37 (2010)
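The point of the figure can be reproduced with a toy penetrance model. The numbers below are illustrative assumptions, not values from the cited paper: two SNPs, each with two equally frequent, independent genotypes, where disease risk is 0.3 when the genotypes differ and 0.1 when they match (an XOR-like pattern). Each SNP alone has the same risk whichever genotype it carries, so a single-SNP test sees nothing, yet the pair jointly changes risk threefold.

```python
from fractions import Fraction

# Hypothetical penetrance P(disease | SNP1 genotype, SNP2 genotype).
penetrance = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10),
}
genotype_freq = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # independent, equally frequent


def marginal_risk(locus, value):
    """P(disease | SNP[locus] == value), averaging over the other SNP's genotype."""
    total = Fraction(0)
    for other in (0, 1):
        key = (value, other) if locus == 0 else (other, value)
        total += genotype_freq[other] * penetrance[key]
    return total


print([str(marginal_risk(locus, v)) for locus in (0, 1) for v in (0, 1)])
# ['1/5', '1/5', '1/5', '1/5']  -> no marginal effect at either SNP
print(sorted({str(p) for p in penetrance.values()}))
# ['1/10', '3/10']  -> the joint genotype does change risk
```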
9. Problem definition
Epistasis detection problem
• Objective
  - Detect the causative SNPs for a disease
  - Find the SNP combination that maximizes a defined association measure
• Dataset
  - 0.5M ~ 1M SNPs
  - 4,000 ~ 5,000 subjects
  - Binary disease status (case/control)
  - 100 MB ~ 1 GB genotype data file
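A brute-force baseline makes the size of the problem concrete. This minimal sketch assumes genotypes coded 0/1/2, status coded 0/1, and Pearson's χ² on the 2 × 9 table of joint two-SNP genotypes as the "defined measure"; the function names are hypothetical. It scores every SNP pair, and the last lines count how many pairs exist for 0.5M and 1M SNPs.

```python
from itertools import combinations
from math import comb


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip joint genotypes nobody carries
                stat += (observed - expected) ** 2 / expected
    return stat


def pair_table(snp_a, snp_b, status):
    """2 x 9 table: row = status, column = joint genotype 3 * a + b."""
    table = [[0] * 9 for _ in range(2)]
    for a, b, s in zip(snp_a, snp_b, status):
        table[s][3 * a + b] += 1
    return table


def best_pair(genotypes, status):
    """Score every SNP pair exhaustively; genotypes[k] is SNP k's column over subjects."""
    best = None
    for i, j in combinations(range(len(genotypes)), 2):
        score = chi_squared(pair_table(genotypes[i], genotypes[j], status))
        if best is None or score > best[0]:
            best = (score, i, j)
    return best


# Toy data: status is the XOR of SNP 1 and SNP 2; SNP 0 is unrelated.
genotypes = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
status = [0, 1, 1, 0, 0, 1, 1, 0]
print(best_pair(genotypes, status))  # (8.0, 1, 2)

# Number of SNP pairs to score for 0.5M and 1M SNPs.
print(f"{comb(500_000, 2):,} {comb(1_000_000, 2):,}")
# 124,999,750,000 499,999,500,000
```

Over 10^11 pairs before even considering triples is why the methods that follow avoid exhaustive search.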
17. Methods – Computational Approaches
• Multifactor Dimensionality Reduction (Ritchie et al. 2002)
• SNPHarvester (Yang et al. 2009)
• SNPRuler (Wan et al. 2010)
• Mutual Information With Clustering (Leem et al. 2014)
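Of the approaches listed, Multifactor Dimensionality Reduction is the simplest to sketch. Its core step, as commonly described, pools multi-locus genotype cells into "high-risk" and "low-risk" groups by comparing each cell's case:control ratio with a threshold. This collapses a SNP combination into a single binary attribute whose classification accuracy is then evaluated. The sketch below is a minimal version under stated assumptions: it handles two SNPs, uses the overall case:control ratio as the threshold, and reports plain training accuracy, whereas full MDR uses cross-validation to choose among SNP combinations. Function names are hypothetical.

```python
from collections import Counter


def mdr_labels(snp_a, snp_b, status):
    """Label each observed two-SNP genotype cell 1 (high risk) or 0 (low risk)."""
    cases, controls = Counter(), Counter()
    for a, b, s in zip(snp_a, snp_b, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    # Assumed threshold: overall case:control ratio (equals 1 for balanced data).
    threshold = sum(cases.values()) / max(sum(controls.values()), 1)
    labels = {}
    for cell in set(cases) | set(controls):
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = 1 if ratio > threshold else 0
    return labels


def mdr_accuracy(snp_a, snp_b, status):
    """Training accuracy of predicting status from the collapsed high/low-risk label."""
    labels = mdr_labels(snp_a, snp_b, status)
    hits = sum(labels[(a, b)] == s for a, b, s in zip(snp_a, snp_b, status))
    return hits / len(status)


snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]  # XOR-like interaction
print(mdr_accuracy(snp_a, snp_b, status))  # 1.0
```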
21. Methods
SNPHarvester (2/2)
• Local search
• Local optima problem
• PathSeeker algorithm
• Successive runs
• Score function: χ² statistic
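The local-search idea can be sketched as a hill climb over k-SNP sets, scored by the χ² statistic of their joint genotype table and restarted from several random starting sets because one climb can stop at a local optimum. This is a simplified stand-in, not SNPHarvester's actual PathSeeker procedure or its successive-run scheme, which the slide names but does not detail. All function names and parameters (`k`, `runs`, `seed`) are hypothetical, with genotypes coded 0/1/2 and status 0/1.

```python
import random


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def score(genotypes, snp_set, status):
    """Chi-squared of the 2 x 3^k table over the joint genotype of snp_set."""
    table = [[0] * (3 ** len(snp_set)) for _ in range(2)]
    for subject, s in enumerate(status):
        cell = 0
        for snp in snp_set:  # encode the joint genotype as a base-3 number
            cell = cell * 3 + genotypes[snp][subject]
        table[s][cell] += 1
    return chi_squared(table)


def local_search(genotypes, status, k, rng):
    """Hill-climb from a random k-SNP set, swapping one SNP at a time while it helps."""
    n_snps = len(genotypes)
    current = rng.sample(range(n_snps), k)
    best = score(genotypes, current, status)
    improved = True
    while improved:  # stop when no single swap raises the score
        improved = False
        for pos in range(k):
            for candidate in range(n_snps):
                if candidate in current:
                    continue
                trial = current[:pos] + [candidate] + current[pos + 1:]
                trial_score = score(genotypes, trial, status)
                if trial_score > best:
                    current, best = trial, trial_score
                    improved = True
    return best, sorted(current)


def repeated_search(genotypes, status, k=2, runs=5, seed=0):
    """Best result over several random restarts (a simple guard against local optima)."""
    rng = random.Random(seed)
    return max(local_search(genotypes, status, k, rng) for _ in range(runs))


genotypes = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
status = [0, 1, 1, 0, 0, 1, 1, 0]  # XOR of SNP 1 and SNP 2
print(repeated_search(genotypes, status))  # (8.0, [1, 2])
```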
22. Methods
SNPRuler
• Pattern-based method
• Predictive rules
• Branch-and-bound algorithm
• Upper bound of the χ² statistic with 1 degree of freedom (d.f. = 1)
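A rule such as "SNP1 = 2 and SNP2 = 0 ⇒ case" splits subjects into those matching and not matching it, which gives a 2 × 2 table and therefore a χ² statistic with 1 d.f. Branch-and-bound needs an upper bound on the χ² of every refinement of a rule, because adding conditions can only shrink the matched case count x and control count y. The sketch below uses a convexity-based bound of the kind used in statistical pattern mining, max(χ²(x, 0), χ²(0, y)). It illustrates the pruning idea and is not claimed to be SNPRuler's exact derivation; function names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """1-d.f. Pearson chi-squared for the table [[a, b], [c, d]] (0 if degenerate)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rule_chi2(x, y, n_cases, n_controls):
    """Chi-squared of a rule matching x of n_cases cases and y of n_controls controls."""
    return chi2_2x2(x, y, n_cases - x, n_controls - y)


def refinement_bound(x, y, n_cases, n_controls):
    """Upper bound on rule_chi2 over refinements matching x' <= x cases, y' <= y controls.

    Chi-squared is convex in (x', y') and zero where x'/n_cases == y'/n_controls,
    so its maximum over that rectangle is reached at a "pure" corner (x, 0) or (0, y).
    """
    return max(rule_chi2(x, 0, n_cases, n_controls),
               rule_chi2(0, y, n_cases, n_controls))


n_cases, n_controls = 50, 50
x, y = 30, 10  # the current rule matches 30 cases and 10 controls
print(round(rule_chi2(x, y, n_cases, n_controls), 2))         # 16.67
print(round(refinement_bound(x, y, n_cases, n_controls), 2))  # 42.86
```

During the search, a rule whose bound is already below the best χ² found so far can be discarded together with all of its refinements.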
23. Methods
Mutual Information With Clustering (1/2)
Figure: SNPs, with the causative SNPs marked, are grouped around Centroid 1, Centroid 2 and Centroid 3, which are the 3 SNPs with the highest mutual information value; m candidates are taken from each cluster. Distances d1 and d2 are marked, with Score = d1 + d2.
24. Methods
Mutual Information With Clustering (2/2)
• Mutual information
  - Used as the distance measure for clustering
• K-means clustering algorithm
• Candidate selection
  - Reduces the search space dramatically
• Can detect high-order epistatic interactions
• Also shows better performance (statistical power, execution time) than previous methods
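The measure underlying this method is mutual information. A minimal sketch, assuming plug-in (empirical-frequency) estimates in nats, is below; it is not Leem et al.'s code, and the way the method turns mutual information into a clustering distance is not reproduced here. On XOR-style toy data, each SNP alone carries zero information about disease status while the SNP pair fully determines it, which is why looking beyond single SNPs matters.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y): p(x, y) * log(p(x, y) / (p(x) p(y))), written with counts.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]  # XOR-like interaction

print(mutual_information(snp_a, status))  # 0.0
print(mutual_information(snp_b, status))  # 0.0
print(round(mutual_information(list(zip(snp_a, snp_b)), status), 4))  # 0.6931 (= ln 2)
```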