A flexible computational framework for detecting epistasis
Our paper titled "A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility" has been accepted for publication in the Journal of Theoretical Biology and will appear in 2006. The paper presents a more general data mining approach to implementing the multifactor dimensionality reduction (MDR) method. Here is the abstract:
Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n=500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
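The abstract describes a pipeline: rank SNPs by information gain, collapse the selected SNPs into one attribute with MDR, then hand that attribute to a classifier such as naive Bayes. The sketch below illustrates the first two steps on a toy XOR-style interaction in which neither SNP is informative alone but the pair is. It is not the authors' MDR software: the function names are hypothetical, the high-risk threshold (a genotype cell is labeled high-risk when its case:control ratio exceeds the overall case:control ratio) is an assumed rule, and the cross-validation used in full MDR analyses is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Class entropy removed by conditioning on one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, y in zip(feature, labels):
        groups.setdefault(value, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def mdr_attribute(snps, labels):
    """Hypothetical helper: collapse genotype columns into one binary attribute.

    snps: list of genotype columns (e.g. coded 0/1/2), one value per individual.
    labels: 1 for cases, 0 for controls; both classes must be present.
    """
    cells = list(zip(*snps))  # multilocus genotype of each individual
    cases = Counter(c for c, y in zip(cells, labels) if y == 1)
    controls = Counter(c for c, y in zip(cells, labels) if y == 0)
    # Assumed threshold: the overall case:control ratio in the data.
    threshold = sum(labels) / (len(labels) - sum(labels))

    def high_risk(cell):
        if controls[cell] == 0:  # only cases observed in this cell
            return True
        return cases[cell] / controls[cell] > threshold

    return [1 if high_risk(c) else 0 for c in cells]


# Toy XOR-style interaction: neither SNP is informative alone, the pair is.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]

print(information_gain(snp1, status))  # 0.0
print(information_gain(snp2, status))  # 0.0
combined = mdr_attribute([snp1, snp2], status)
print(combined, information_gain(combined, status))  # [0, 1, 1, 0, 0, 1, 1, 0] 1.0
```

Each SNP alone removes no class entropy, while the single MDR-constructed attribute removes all of it, which is the kind of signal a downstream classifier could not see from the individual SNPs.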
The ideas in the Journal of Theoretical Biology paper by Moore et al. have been applied in a new paper by Andrew et al., "Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking, and bladder cancer susceptibility", which will also appear in 2006, in the journal Carcinogenesis. Here is the abstract:
Study results of single nucleotide polymorphisms (SNPs) and cancer susceptibility are often conflicting, possibly because of the analytic challenges of testing for multiple genetic and environmental risk factors using traditional analytic tools. We investigated the relationship between DNA repair gene SNPs, smoking, and bladder cancer susceptibility in 355 cases and 559 controls enrolled in a population-based study of bladder cancer in the US. Our multifaceted analytical approach included logistic regression, multifactor dimensionality reduction (MDR), and hierarchical interaction graphs for the analysis of gene-gene and gene-environment interactions followed by linkage disequilibrium and haplotype analysis. Overall, we did not find an association between any single DNA repair gene SNP and bladder cancer risk. We did find a marginally significant elevated risk of the XPD codon 751 homozygote variant among never smokers (adjusted OR 2.5 95% CI (1.0-6.2)). In addition, the XRCC1 194 variant allele was associated with a reduced bladder cancer risk among heavy smokers (adjusted OR 0.4 95% CI (0.2-0.9)). The best predictors of bladder cancer included the XPD codon 751 and 312 SNPs along with smoking. Interpretation of this multifactor model revealed that the relationship between the XPD SNPs and bladder cancer is mostly non-additive while the effect of smoking is mostly additive. Since the two XPD SNPs are in significant linkage disequilibrium (D'=0.52, p=0.0001), we estimated XPD haplotypes. Individuals with variant XPD haplotypes were more susceptible to bladder cancer (e.g. adjusted OR 2.5 95% CI (1.7-3.6)) and the effect was magnified when smoking was considered. These results support the hypothesis that common polymorphisms in DNA repair genes modify bladder cancer risk and emphasize the need for a multifaceted statistical approach to identify gene-gene and gene-environment interactions.
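The abstract reports D'=0.52 for the two XPD SNPs. D' is Lewontin's normalized linkage disequilibrium coefficient: the raw disequilibrium D = p(AB) - p(A)p(B) divided by its maximum attainable value given the allele frequencies. The sketch below assumes the haplotype and allele frequencies are already known; in a case-control study they must first be estimated from unphased genotypes, which is not shown. The function name is hypothetical and the frequencies in the example are made up to illustrate the arithmetic; they are not the XPD data.

```python
def d_prime(p_ab, p_a, p_b):
    """Hypothetical helper returning Lewontin's |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2).
    p_a, p_b: frequencies of alleles A and B; both loci assumed polymorphic.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Made-up frequencies, chosen only to illustrate the arithmetic.
print(d_prime(0.375, 0.5, 0.5))  # 0.5
print(d_prime(0.5, 0.5, 0.5))    # 1.0 (complete LD)
print(d_prime(0.25, 0.5, 0.5))   # 0.0 (linkage equilibrium)
```

Normalizing by the maximum possible D makes the value comparable across SNP pairs with different allele frequencies; the p-value reported alongside D' in the abstract comes from a separate significance test not sketched here.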