Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Wednesday, November 30, 2005

A flexible computational framework for detecting epistasis

Our paper titled "A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility" has been accepted for publication in the Journal of Theoretical Biology and will appear in 2006. It presents a more general data mining approach to implementing the multifactor dimensionality reduction (MDR) method. Here is the abstract:

Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n=500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
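To make the first two steps of the framework concrete, here is a minimal Python sketch of information gain (class entropy removed) and an MDR-style collapse of two SNPs into a single high-risk/low-risk attribute. It is an illustration only and not the MDR software itself: the function names, the toy genotypes, and the choice of the overall case/control ratio as the risk threshold are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Class entropy removed by knowing the attribute: H(C) - H(C | A)."""
    groups = defaultdict(list)
    for a, y in zip(attribute, labels):
        groups[a].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def mdr_attribute(snp_a, snp_b, labels):
    """Collapse two SNPs into one binary attribute (1 = high risk, 0 = low risk).

    Assumption: a genotype cell is high risk when its case/control ratio
    exceeds the overall case/control ratio. Assumes at least one control.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, labels):
        (cases if y == 1 else controls)[(a, b)] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] > threshold * controls[cell]}
    return [1 if (a, b) in high_risk else 0 for a, b in zip(snp_a, snp_b)]


# Hypothetical toy data: disease status is the XOR of two binary genotypes,
# so neither SNP carries any information on its own.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [0, 1, 1, 0, 0, 1, 1, 0]  # 1 = case, 0 = control

combined = mdr_attribute(snp_a, snp_b, labels)
print(information_gain(snp_a, labels))     # gain of SNP A alone
print(information_gain(snp_b, labels))     # gain of SNP B alone
print(information_gain(combined, labels))  # gain of the constructed attribute
```

In this toy example each SNP alone removes no class entropy, while the constructed attribute removes all of it. That is the kind of purely interactive signal constructive induction is meant to expose to a downstream classifier such as naïve Bayes.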

The ideas in the Journal of Theoretical Biology paper by Moore et al. have been applied in a new paper by Andrew et al. titled "Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking, and bladder cancer susceptibility" [PubMed]. It will appear in the journal Carcinogenesis, also in 2006. Here is the abstract:

Study results of single nucleotide polymorphisms (SNPs) and cancer susceptibility are often conflicting, possibly because of the analytic challenges of testing for multiple genetic and environmental risk factors using traditional analytic tools. We investigated the relationship between DNA repair gene SNPs, smoking, and bladder cancer susceptibility in 355 cases and 559 controls enrolled in a population-based study of bladder cancer in the US. Our multifaceted analytical approach included logistic regression, multifactor dimensionality reduction (MDR), and hierarchical interaction graphs for the analysis of gene-gene and gene-environment interactions followed by linkage disequilibrium and haplotype analysis. Overall, we did not find an association between any single DNA repair gene SNP and bladder cancer risk. We did find a marginally significant elevated risk of the XPD codon 751 homozygote variant among never smokers (adjusted OR 2.5, 95% CI 1.0-6.2). In addition, the XRCC1 194 variant allele was associated with a reduced bladder cancer risk among heavy smokers (adjusted OR 0.4, 95% CI 0.2-0.9). The best predictors of bladder cancer included the XPD codon 751 and 312 SNPs along with smoking. Interpretation of this multifactor model revealed that the relationship between the XPD SNPs and bladder cancer is mostly non-additive while the effect of smoking is mostly additive. Since the two XPD SNPs are in significant linkage disequilibrium (D'=0.52, p=0.0001), we estimated XPD haplotypes. Individuals with variant XPD haplotypes were more susceptible to bladder cancer (e.g. adjusted OR 2.5, 95% CI 1.7-3.6) and the effect was magnified when smoking was considered. These results support the hypothesis that common polymorphisms in DNA repair genes modify bladder cancer risk and emphasize the need for a multifaceted statistical approach to identify gene-gene and gene-environment interactions. [PubMed]
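For readers unfamiliar with the D' statistic quoted in the abstract, the following Python sketch shows how it is commonly computed from haplotype and allele frequencies. The function name and the example frequencies are hypothetical; they are not taken from the bladder cancer study, which reports only the resulting D' value.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' is undefined when an allele is fixed; return 0.0 in that case.
    return 0.0 if d_max == 0 else d / d_max


# Hypothetical frequencies, for illustration only.
print(d_prime(0.25, 0.5, 0.5))  # independent loci
print(d_prime(0.50, 0.5, 0.5))  # alleles always inherited together
print(d_prime(0.20, 0.4, 0.3))  # partial disequilibrium
```

Dividing D by its largest attainable value given the allele frequencies keeps D' between -1 and 1, which makes values comparable across SNP pairs with different allele frequencies.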

Monday, November 21, 2005

Genomic buffering

A new paper by Maisnier-Patin et al. in Nature Genetics documents epistasis in bacteria.

Maisnier-Patin S, Roth JR, Fredriksson A, Nystrom T, Berg OG, Andersson DI. Genomic buffering mitigates the effects of deleterious mutations in bacteria. Nat Genet. 2005 Nov 6; [Epub ahead of print] [PubMed]


The relationship between the number of randomly accumulated mutations in a genome and fitness is a key parameter in evolutionary biology. Mutations may interact such that their combined effect on fitness is additive (no epistasis), reinforced (synergistic epistasis) or mitigated (antagonistic epistasis). We measured the decrease in fitness caused by increasing mutation number in the bacterium Salmonella typhimurium using a regulated, error-prone DNA polymerase (polymerase IV, DinB). As mutations accumulated, fitness costs increased at a diminishing rate. This suggests that random mutations interact such that their combined effect on fitness is mitigated and that the genome is buffered against the fitness reduction caused by accumulated mutations. Levels of the heat shock chaperones DnaK and GroEL increased in lineages that had accumulated many mutations, and experimental overproduction of GroEL further increased the fitness of lineages containing deleterious mutations. These findings suggest that overexpression of chaperones contributes to antagonistic epistasis.
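The three cases described in the abstract (additive, synergistic, antagonistic) can be illustrated with a small Python sketch. It assumes that fitness is measured relative to the unmutated strain, that both mutations are deleterious, and that "additive" means additive on a log-fitness scale; the function name and the fitness values are hypothetical and not taken from the paper.

```python
import math


def epistasis(w_a, w_b, w_ab, tol=1e-9):
    """Classify the interaction between two deleterious mutations.

    w_a, w_b: relative fitness of each single mutant (wild type = 1.0)
    w_ab: relative fitness of the double mutant
    Assumption: no epistasis means log fitness effects simply add up.
    """
    expected = math.log(w_a) + math.log(w_b)  # log fitness with no epistasis
    deviation = math.log(w_ab) - expected
    if abs(deviation) <= tol:
        return "none", deviation
    # Double mutant fitter than expected: the combined cost is mitigated.
    return ("antagonistic" if deviation > 0 else "synergistic"), deviation


# Hypothetical fitness values, for illustration only.
print(epistasis(0.9, 0.9, 0.81))  # costs combine exactly as expected
print(epistasis(0.9, 0.9, 0.85))  # combined cost smaller than expected
print(epistasis(0.9, 0.9, 0.70))  # combined cost larger than expected
```

Under these assumptions, fitness costs that increase at a diminishing rate as mutations accumulate correspond to the antagonistic case.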

Sunday, November 20, 2005

Combinatorial Pharmacogenetics

Our paper with Dr. Russ Wilke on epistasis and pharmacogenetics has been published in the November issue of Nature Reviews Drug Discovery. We review our MDR method and introduce a flexible four-step framework for data mining and knowledge discovery in human genetics. A more detailed paper on the latter topic has been revised and is under review.

Wilke RA, Reif DM, Moore JH. Combinatorial pharmacogenetics. Nat Rev Drug Discov. 2005 Nov;4(11):911-8. [PubMed]


Combinatorial pharmacogenetics seeks to characterize genetic variations that affect reactions to potentially toxic agents within the complex metabolic networks of the human body. Polymorphic drug-metabolizing enzymes are likely to represent some of the most common inheritable risk factors associated with common 'disease' phenotypes, such as adverse drug reactions. The relatively high concordance between polymorphisms in drug-metabolizing enzymes and clinical phenotypes indicates that research into this class of polymorphisms could benefit patients in the near future. Characterization of other genes affecting drug disposition (absorption, distribution, metabolism and elimination) will further enhance this process. As with most questions concerning biological systems, the complexity arises out of the combinatorial magnitude of all the possible interactions and pathways. The high-dimensionality of the resulting analysis problem will often overwhelm traditional analysis methods. Novel analysis techniques, such as multifactor dimensionality reduction, offer viable options for evaluating such data.

MDR Detects Epistasis in Prostate Cancer

Jianfeng Xu et al. have used our MDR software to identify a four-locus interaction in a large case-control study of prostate cancer.

Xu et al. The interaction of four genes in the inflammation pathway significantly predicts prostate cancer risk. Cancer Epidemiol Biomarkers Prev. 2005 Nov;14(11):2563-8. [PubMed]


It is widely hypothesized that the interactions of multiple genes influence individual risk to prostate cancer. However, current efforts at identifying prostate cancer risk genes primarily rely on single-gene approaches. In an attempt to fill this gap, we carried out a study to explore the joint effect of multiple genes in the inflammation pathway on prostate cancer risk. We studied 20 genes in the Toll-like receptor signaling pathway as well as several cytokines. For each of these genes, we selected and genotyped haplotype-tagging single nucleotide polymorphisms (SNP) among 1,383 cases and 780 controls from the CAPS (CAncer Prostate in Sweden) study population. A total of 57 SNPs were included in the final analysis. A data mining method, multifactor dimensionality reduction, was used to explore the interaction effects of SNPs on prostate cancer risk. Interaction effects were assessed for all possible n SNP combinations, where n = 2, 3, or 4. For each n SNP combination, the model providing lowest prediction error among 100 cross-validations was chosen. The statistical significance levels of the best models in each n SNP combination were determined using permutation tests. A four-SNP interaction (one SNP each from IL-10, IL-1RN, TIRAP, and TLR5) had the lowest prediction error (43.28%, P = 0.019). Our ability to analyze a large number of SNPs in a large sample size is one of the first efforts in exploring the effect of high-order gene-gene interactions on prostate cancer risk, and this is an important contribution to this new and quickly evolving field.
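Two ideas from this abstract can be sketched in a few lines of Python: the size of the search over all n-SNP combinations of 57 SNPs, and a label-permutation test of a model's prediction error. This is a simplified illustration and not the procedure used in the paper: the function names and toy data are hypothetical, and a full MDR permutation test would repeat the entire model search and cross-validation on every permuted dataset rather than re-scoring one fixed set of predictions.

```python
import math
import random


def n_models(n_snps, order):
    """Number of distinct SNP combinations of a given order."""
    return math.comb(n_snps, order)


def classification_error(predictions, labels):
    """Fraction of samples whose predicted status differs from the label."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def permutation_p_value(predictions, labels, n_permutations=1000, seed=0):
    """Share of label permutations with an error at least as low as observed.

    Simplification: the predictions stay fixed while the labels are shuffled.
    """
    rng = random.Random(seed)
    observed = classification_error(predictions, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if classification_error(predictions, shuffled) <= observed:
            at_least_as_good += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Search space for 57 SNPs at interaction orders 2, 3, and 4.
for order in (2, 3, 4):
    print(order, n_models(57, order))

# Hypothetical toy data: 10 controls followed by 10 cases.
labels = [0] * 10 + [1] * 10
informative = list(labels)   # predictions that match every label
uninformative = [0, 1] * 10  # predictions unrelated to the labels
print(permutation_p_value(informative, labels))
print(permutation_p_value(uninformative, labels))
```

The combination counts grow quickly with interaction order, which is why exhaustive searches beyond a few loci become expensive and why permutation testing is used to guard against a best model that arises by chance.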