Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Sunday, February 26, 2006

William Bateson: a biologist ahead of his time

William Bateson is credited with being the first to use the word epistasis to describe the distortion of Mendelian ratios due to gene-gene interactions. A 2002 paper in the Journal of Genetics by P. Bateson (no direct relation) provides a short biography of William Bateson.

Bateson P. William Bateson: a biologist ahead of his time. J Genet. 2002 Aug;81(2):49-58. [PubMed]


William Bateson coined the term genetics and, more than anybody else, championed the principles of heredity discovered by Gregor Mendel. Nevertheless, his reputation is soured by the positions he took about the discontinuities in inheritance that might precede formation of a new species and by his reluctance to accept, in its full-blooded form, the view of chromosomes as the controllers of individual development. Growing evidence suggests that both of these positions have been vindicated. New species are now thought to arise as the result of genetic interactions, chromosomal rearrangements, or both, that render hybrids less viable or sterile. Chromosomes are the sites of genes but genes move between chromosomes much more readily than had been previously believed and chromosomes are not causal in individual development. Development, like speciation, requires an understanding of the interactions between genes and the interplay between the individual and its environment.

Saturday, February 25, 2006

Neural Networks

A new paper by Motsinger et al. in BMC Bioinformatics evaluates and applies a genetic programming neural network (GPNN) approach for detecting epistasis in case-control studies. A strength of this approach is its ability to discover the optimal NN architecture as part of the modeling process.

Motsinger AA, Lee SL, Mellick G, Ritchie MD. GPNN: Power studies and applications of a neural network method for detecting gene-gene interactions in studies of human disease. BMC Bioinformatics. 2006 Jan 25;7(1):39. [PubMed]

ABSTRACT: BACKGROUND: The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease. RESULTS: We show that GPNN has high power to detect even relatively small genetic effects (2-3% heritability) in simulated data models involving two and three locus interactions. The limits of detection were reached under conditions with very small heritability (<1%) or when interactions involved more than three loci. We tested GPNN on a real dataset comprised of Parkinson's disease cases and controls and found a two locus interaction between the DLST gene and sex. CONCLUSION: These results indicate that GPNN may be a useful pattern recognition approach for detecting gene-gene and gene-environment interactions.

Friday, February 24, 2006

An utter refutation of the 'Fundamental Theorem of the HapMap'

An interesting new paper by Terwilliger and Hiekkalinna in the European Journal of Human Genetics challenges the assumptions of the HapMap project:

Terwilliger JD, Hiekkalinna T. An utter refutation of the 'Fundamental Theorem of the HapMap'. Eur J Hum Genet. 2006 Feb 15 [PubMed]


The International HapMap Project was proposed in order to quantify linkage disequilibrium (LD) relationships among human DNA polymorphisms in an assortment of populations, in order to facilitate the process of selecting a minimal set of markers that could capture most of the signal from the untyped markers in a genome-wide association study. The central dogma can be summarized by the argument that if a marker is in tight LD with a polymorphism that directly impacts disease risk, as measured by the metric r^2, then one would be able to detect an association between the marker and disease with sample size that was increased by a factor of 1/r^2 over that needed to detect the effect of the functional variant directly. This 'fundamental theorem' holds, however, only if one assumes that the LD between loci and the etiological effect of the functional variant are independent of each other, that they are statistically independent of all other etiological factors (in exposure and action), that sampling is prospective, and that the estimates of r^2 are accurate. None of these are standard operating assumptions, however. We describe the ramifications of these implicit assumptions, and provide simple examples in which the effects of a functional variant could be unequivocally detected if it were directly genotyped, even as markers in high LD with the functional variant would never show association with disease, even in infinite sample sizes. Both theoretical and empirical refutation of the central dogma of genome-wide association studies is thus presented.
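To make the 'fundamental theorem' concrete, here is a minimal Python sketch of the sample size inflation it predicts: the sample size needed at the causal variant is divided by the squared correlation r^2 between marker and variant. The function name and the example numbers are ours, chosen for illustration only, and the calculation simply takes the assumptions that the paper disputes at face value.

```python
import math

def marker_sample_size(n_causal: int, r2: float) -> int:
    """Sample size needed at a marker under the 1/r^2 rule.

    n_causal: sample size needed if the functional variant were genotyped
              directly (hypothetical input, not taken from the paper).
    r2:       squared correlation (LD) between marker and functional variant.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in the interval (0, 1]")
    # Inflate by a factor of 1/r^2 and round up to whole individuals.
    return math.ceil(n_causal / r2)

# Illustrative values only: weaker LD means a larger required sample.
for r2 in (1.0, 0.5, 0.25):
    print(f"r^2 = {r2}: n = {marker_sample_size(1000, r2)}")
```

The paper's argument is that this scaling can fail badly when its implicit assumptions do not hold, so a calculation like this one should be read as the claim under test rather than as a planning tool.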

Monday, February 20, 2006

Digital Genetics

Dr. Chris Adami has a new review paper on digital genetics. We have previously written that digital experiments may be useful for understanding the relationship between biological and statistical epistasis (see Moore and Williams, Bioessays. 2005 Jun;27(6):637-46).

Adami C. Digital genetics: unravelling the genetic basis of evolution. Nat Rev Genet. 2006 Feb;7(2):109-18.


Digital genetics, or the genetics of digital organisms, is a new field of research that has become possible as a result of the remarkable power of evolution experiments that use computers. Self-replicating strands of computer code that inhabit specially prepared computers can mutate, evolve and adapt to their environment. Digital organisms make it easy to conduct repeatable, controlled experiments, which have a perfect genetic 'fossil record'. This allows researchers to address fundamental questions about the genetic basis of the evolution of complexity, genome organization, robustness and evolvability, and to test the consequences of mutations, including their interaction and recombination, on the fate of populations and lineages.

Sunday, February 19, 2006


We have collaborated with Dr. Eden Martin at Duke University and Dr. Marylyn Ritchie at Vanderbilt University to merge multifactor dimensionality reduction (MDR) with the pedigree disequilibrium test (PDT) to facilitate the detection of epistasis in pedigrees. Our paper on MDR-PDT has been published in Genetic Epidemiology.

Martin ER, Ritchie MD, Hahn L, Kang S, Moore JH. A novel method to identify gene-gene effects in nuclear families: the MDR-PDT. Genet Epidemiol. 2006 Feb;30(2):111-23. [PubMed]


It is now well recognized that gene-gene and gene-environment interactions are important in complex diseases, and statistical methods to detect interactions are becoming widespread. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions that occur in the absence of detectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method [Ritchie et al., 2001: Am J Hum Genet 69:138-147] was developed. The MDR is well-suited for examining high-order interactions and detecting interactions without main effects. The MDR was originally designed to analyze balanced case-control data. The analysis can use family data, but requires a single matched pair be selected from each family. This may be a discordant sib pair, or may be constructed from triad data when parents are available. To take advantage of additional affected and unaffected siblings requires a test statistic that measures the association of genotype with disease in general nuclear families. We have developed a novel test, the MDR-PDT, by merging the MDR method with the genotype-Pedigree Disequilibrium Test (geno-PDT)[Martin et al., 2003: Genet Epidemiol 25:203-213]. MDR-PDT allows identification of single-locus effects or joint effects of multiple loci in families of diverse structure. We present simulations to demonstrate the validity of the test and evaluate its power. To examine its applicability to real data, we applied the MDR-PDT to data from candidate genes for Alzheimer disease (AD) in a large family dataset. These results show the utility of the MDR-PDT for understanding the genetics of complex diseases.
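For readers new to MDR, the core dimensionality reduction step for case-control data can be sketched in a few lines of Python. This is a simplified illustration, not the MDR software and not the MDR-PDT: it omits cross-validation, permutation testing, and the family-based statistic. The function names, the 0/1 genotype coding, and the toy data are all our own assumptions.

```python
from collections import Counter
from itertools import combinations

def high_risk_cells(genotypes, status, loci, threshold=1.0):
    """Pool multilocus genotype cells into a set of 'high-risk' cells.

    A cell is called high-risk when its case count is at least
    threshold * its control count (threshold 1.0 suits balanced data).
    """
    cases, controls = Counter(), Counter()
    for row, affected in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        (cases if affected else controls)[cell] += 1
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] >= threshold * controls[cell]}

def accuracy(genotypes, status, loci, threshold=1.0):
    """Fraction of individuals classified correctly by the pooled cells."""
    high = high_risk_cells(genotypes, status, loci, threshold)
    correct = sum((tuple(row[i] for i in loci) in high) == bool(affected)
                  for row, affected in zip(genotypes, status))
    return correct / len(status)

def best_model(genotypes, status, order):
    """Exhaustively search all locus combinations of the given order."""
    n_loci = len(genotypes[0])
    return max(combinations(range(n_loci), order),
               key=lambda loci: accuracy(genotypes, status, loci))

# Toy data: three loci coded 0/1; disease follows an XOR of loci 0 and 1,
# so neither locus has a main effect, and locus 2 is pure noise.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
status = [int(a != b) for a, b, _ in genotypes]

print(best_model(genotypes, status, 2))      # pair of loci with best accuracy
print(accuracy(genotypes, status, (0,)))     # single-locus model for locus 0
print(accuracy(genotypes, status, (0, 1)))   # two-locus interaction model
```

In this toy example every single-locus model classifies no better than chance, while the two-locus model for loci 0 and 1 separates cases from controls perfectly, which is the kind of interaction without main effects that the abstract describes.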