Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Wednesday, March 22, 2006

Epistasis and the release of genetic variation during long-term selection

A very nice paper by Carlborg et al. in Nature Genetics demonstrates that the genetic architecture of a major locus for growth in chicken can be attributed to a genetic network of four interacting loci. My prediction is that we will see more examples of this in model systems, agricultural species, and in humans over the next few years. Studies like this support Alan Templeton's statement that epistasis will be uncovered when properly investigated (see his chapter in Epistasis and the Evolutionary Process).

Carlborg O, Jacobsson L, Ahgren P, Siegel P, Andersson L. Epistasis and the release of genetic variation during long-term selection. Nature Genetics, March 12 (2006) [PubMed]


It is an enigma how long-term selection in model organisms and agricultural species can lead to marked phenotypic changes without exhausting genetic variation for the selected trait. Here, we show that the genetic architecture of an apparently major locus for growth in chicken dissects into a genetic network of four interacting loci. The interactions in this radial network mediate a considerably larger selection response than predicted by a single-locus model.

Monday, March 20, 2006

MDR Analysis of Myocardial Infarction

A new paper by Mannila et al. in Thrombosis and Haemostasis applied MDR to detecting epistatic effects on myocardial infarction.

Mannila MN, Eriksson P, Ericsson CG, Hamsten A, Silveira A. Epistatic and pleiotropic effects of polymorphisms in the fibrinogen and coagulation factor XIII genes on plasma fibrinogen concentration, fibrin gel structure and risk of myocardial infarction. Thromb Haemost. 2006 Mar;95(3):420-7. [PubMed]


An intricate interplay between the genes encoding fibrinogen gamma (FGG), alpha (FGA) and beta (FGB), coagulation factor XIII (F13A1) and interleukin 6 (IL6) and environmental factors is likely to influence plasma fibrinogen concentration, fibrin clot structure and risk of myocardial infarction (MI). In the present study, the potential contribution of SNPs harboured in the fibrinogen, IL6 and F13A1 genes to these biochemical and clinical phenotypes was examined. A database and biobank based on 387 survivors of a first MI and population-based controls were used. Sixty controls were selected according to FGG 9340T > C [rs1049636] genotype for studies on fibrin clot structure using the liquid permeation method. The multifactor dimensionality reduction method was used for interaction analyses. We here report that the FGA 2224G > A [rs2070011] SNP (9.2%), plasma fibrinogen concentration (13.1%) and age (8.1%) appeared as independent determinants of fibrin gel porosity. The FGA 2224G > A SNP modulated the relation between plasma fibrinogen concentration and fibrin clot porosity. The FGG-FGA*4 haplotype, composed of the minor FGG 9340C and FGA 2224A alleles, had similar effects, supporting its reported protective role in relation to MI. Significant epistasis on plasma fibrinogen concentration was detected between the FGA 2224G > A and F13A1 Val34Leu [rs5985] SNPs (p < …). The FGG 9340T > C and FGB 1038G > A [rs1800791] SNPs appeared to interact on MI risk, explaining the association of FGG-FGB haplotypes with MI in the absence of effects of individual SNPs. Thus, epistatic and pleiotropic effects of polymorphisms contribute to the variation in plasma fibrinogen concentration, fibrin clot structure and risk of MI.

Sunday, March 19, 2006

New Grant on the Genome-Wide Analysis of Epistasis

The Dartmouth Computational Genetics Laboratory (CGL) is pleased to announce our NIH R01 on "Machine Learning Prediction of Cancer Susceptibility" (PI - Moore) will be funded by the National Library of Medicine (NLM) starting July 1st. This is a four-year grant that will focus on developing, evaluating, and applying novel computational methods for detecting, characterizing, and interpreting epistasis on a genome-wide scale in a large epidemiologic study of bladder cancer susceptibility. All methods and algorithms developed as part of this proposal will be released as part of our open-source MDR software package. The previous Epistasis Blog post on Genetic Programming is an example of the type of methodology we will be exploring.

Tuesday, March 14, 2006

Genetic Programming

Genetic programming (GP) is a computational discovery tool that is inspired by Darwinian evolution and natural selection. We have applied GP and related algorithms to a wide variety of genetic problems including modeling epistasis and biochemical pathways. The GP Bibliography maintained by Dr. Bill Langdon is an important resource for GP publications. Many of these papers can't be found on PubMed. Our list of GP papers in the bibliography can be found here.

Our newest paper will be published and presented as part of the Genetic Programming Theory and Practice (GPTP IV) workshop at the Center for the Study of Complex Systems in Ann Arbor in May. Here is the title and abstract. A preprint will be available upon request in a few weeks.

Moore, J.H., White, B.C. Genome-wide genetic analysis using genetic programming: The critical need for expert knowledge. In: Genetic Programming Theory and Practice IV. Springer, in press.


Human genetics is undergoing an information explosion. The availability of chip-based technology facilitates the measurement of thousands of DNA sequence variations from across the human genome. The challenge is to sift through these high-dimensional datasets to identify combinations of interacting DNA sequence variations that are predictive of common diseases. The goal of this study is to develop and evaluate a genetic programming (GP) approach to attribute selection and classification in this domain. We simulated genetic datasets of varying size in which the disease model consists of two interacting DNA sequence variations that exhibit no independent effects on class (i.e. epistasis). We show that GP is no better than a simple random search when classification accuracy is used as the fitness function. We then show that including pre-processed estimates of attribute quality using Tuned ReliefF (TuRF) in a multi-objective fitness function that also includes accuracy significantly improves the performance of GP over that of random search. This study demonstrates that GP may be a useful computational discovery tool in this domain. This study raises important questions about the general utility of GP for these types of problems, the importance of data pre-processing, the ideal functional form of the fitness function, and the importance of expert knowledge. We anticipate this study will provide an important baseline for future studies investigating the usefulness of GP as a general computational discovery tool for large-scale genetic studies.
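
To make "two interacting DNA sequence variations that exhibit no independent effects on class" concrete, here is a minimal Python sketch of one such disease model. The penetrance table (disease if and only if exactly one of the two SNPs is heterozygous) and the genotype frequencies are assumptions chosen for illustration; they are not necessarily the models simulated in the paper. The sketch computes the disease risk seen when looking at one SNP at a time.

```python
# Assumed genotype frequencies: Hardy-Weinberg proportions at allele frequency 0.5.
GENOTYPE_FREQ = {0: 0.25, 1: 0.5, 2: 0.25}


def penetrance(a, b):
    """Hypothetical model: disease occurs iff exactly one SNP is heterozygous."""
    return 1.0 if (a == 1) != (b == 1) else 0.0


def marginal_penetrance(genotype):
    """Disease risk for one SNP's genotype, averaged over the other SNP."""
    return sum(freq * penetrance(genotype, other)
               for other, freq in GENOTYPE_FREQ.items())


# The model is symmetric, so the same marginals hold for either SNP.
for g in GENOTYPE_FREQ:
    print(g, marginal_penetrance(g))
```

Under these assumptions every single-SNP genotype carries the same risk, so a method that scores SNPs one at a time has nothing to find, even though the pair of SNPs determines disease status completely.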

Sunday, March 12, 2006

Analysis of the human protein interactome

A new paper by Gandhi et al. in Nature Genetics analyses more than 70,000 protein-protein interactions in humans. Results like this will play an important role in prioritizing gene-gene interaction analyses.

Gandhi et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genetics 2006 Mar;38(3):285-93. [PubMed]


We present the first analysis of the human proteome with regard to interactions between proteins. We also compare the human interactome with the available interaction datasets from yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans) and fly (Drosophila melanogaster). Of >70,000 binary interactions, only 42 were common to human, worm and fly, and only 16 were common to all four datasets. An additional 36 interactions were common to fly and worm but were not observed in humans, although a coimmunoprecipitation assay showed that 9 of the interactions do occur in humans. A re-examination of the connectivity of essential genes in yeast and humans indicated that the available data do not support the presumption that the number of interaction partners can accurately predict whether a gene is essential. Finally, we found that proteins encoded by genes mutated in inherited genetic disorders are likely to interact with proteins known to cause similar disorders, suggesting the existence of disease subnetworks. The human interaction map constructed from our analysis should facilitate an integrative systems biology approach to elucidating the cellular networks that contribute to health and disease states.

Wednesday, March 08, 2006

MDR 1.0.0rc1 Released

The Dartmouth Computational Genetics Laboratory is pleased to announce the release of version 1.0.0rc1 of our multifactor dimensionality reduction (MDR) software package. This is the first release candidate (rc1) for version 1.0. This new version includes several important new features along with a number of minor tweaks and fixes. We will beta test this version over the next few weeks. Please send us your feedback and suggestions.

Download Open-Source MDR 1.0.0rc1 here.

Major New Features:

1) Attribute construction. At the heart of MDR is a constructive induction algorithm that takes two or more SNPs and creates a new attribute that is inserted into the dataset. The goal here is to change the representation space of the data to make interactions easier to detect using any statistical or computational classifier (e.g. naive Bayes, logistic regression, decision trees, etc.). The new attribute construction tab allows the user to select two or more variables and construct a new single variable that is inserted into the dataset. This new dataset can then be analyzed using MDR or exported to other analysis software packages (e.g. R, SAS, SPSS, Weka). This may also be useful for modeling hierarchical epistasis. For more information about attribute construction please see our new paper in the Journal of Theoretical Biology [PubMed].

2) Interaction dendrograms. This new feature was added to the MDR software to facilitate statistical interpretation of MDR models. This is accomplished using estimates of interaction information (entropy-based measures) to measure the amount of information about the class (e.g. case-control status) that is gained by putting two attributes together using MDR. Here, a distance matrix is estimated using these entropy measures, which in turn is used to build a dendrogram using hierarchical cluster analysis. These dendrograms indicate the degree of synergy or redundancy of pairs of attributes. Red lines in the dendrogram indicate synergy while blue lines indicate redundancy or correlation. Interaction dendrograms have been described previously using Cartesian products by Dr. Aleks Jakulin [PDF] and are described in our new paper in the Journal of Theoretical Biology [PubMed].
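
The ideas behind both features can be sketched in a few lines of Python. This is an illustration only, not code from the MDR package. It assumes the usual MDR rule of labeling a genotype combination high-risk when its case:control ratio exceeds a threshold, and it computes interaction information as the information gained from the SNP pair minus the gains from each SNP alone. The function names, the default threshold of 1.0, the treatment of cells with no controls, and the toy data are all assumptions.

```python
from collections import Counter
from math import log2


def construct_attribute(snp_a, snp_b, status, threshold=1.0):
    """Pool two SNPs into one binary attribute (1 = high risk, 0 = low risk)."""
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    # Assumed rule: a genotype combination is high risk when its
    # case:control ratio exceeds the threshold (or it has no controls).
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold
    }
    return [int((a, b) in high_risk) for a, b in zip(snp_a, snp_b)]


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(attribute, status):
    """Mutual information between an attribute and the class."""
    return entropy(attribute) + entropy(status) - entropy(list(zip(attribute, status)))


def interaction_information(snp_a, snp_b, status):
    """Gain from the pair beyond the two single-SNP gains.

    Positive values suggest synergy; negative values suggest redundancy.
    """
    pair = list(zip(snp_a, snp_b))
    return (information_gain(pair, status)
            - information_gain(snp_a, status)
            - information_gain(snp_b, status))


# Toy XOR-style data: neither SNP predicts status alone, the pair does.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]

combined = construct_attribute(snp_a, snp_b, status)
print(combined)
print(information_gain(snp_a, status), information_gain(combined, status))
print(interaction_information(snp_a, snp_b, status))
```

In this toy example the constructed attribute carries all of the class information that neither SNP carries alone, which is the kind of synergy a red line in the dendrogram is meant to flag. Turning such pairwise values into a distance matrix and a dendrogram is left out, since the source does not specify the exact transformation used.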

Tuesday, March 07, 2006

Logistic Regression or MDR? Both!

Which is better for detecting epistasis: logistic regression or MDR? An excellent new paper by Millstein et al. in the American Journal of Human Genetics suggests that logistic regression may have more power for detecting epistasis when main effects are present. This is not too surprising. Our multifactor dimensionality reduction (MDR) approach was designed specifically to improve the power to detect epistasis in the absence of detectable main effects. Thus, MDR complements a traditional logistic regression analysis. My experience with real data is that MDR confirms anything found by logistic regression and is able to identify interesting interactions that logistic regression misses. Our new paper in the Journal of Theoretical Biology shows how MDR and logistic regression can be used together in a flexible computational framework.


Millstein J, Conti DV, Gilliland FD, Gauderman WJ. A testing framework for identifying susceptibility genes in the presence of epistasis. Am J Hum Genet. 2006 Jan;78(1):15-27. [PubMed]

Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 2006 Jan 31; [PubMed]

Sunday, March 05, 2006

BioGEC Workshop: Call for Papers

This is a reminder that papers are due March 31st for our BioGEC Workshop. More information on the workshop and the call for papers can be found here.

Here is a brief description:

The field of Genetic and Evolutionary Computation (GEC) has greatly benefited by borrowing ideas from the biological sciences. Recently, it has become clear that GEC can help solve biological problems, and thereby “repay its debt”.

The fifth annual workshop on Biological Applications of Genetic and Evolutionary Computation (BioGEC), organized in connection with the 2006 Genetic and Evolutionary Computation Conference (GECCO-2006) in Seattle, USA, is intended to explore and critically evaluate the application of GEC to biological problems. Specifically, the goal is to bring biologists and computer scientists together to foster an exchange of ideas that will yield emergent properties that will move the field forward in unpredictable ways.

The 2006 BioGEC workshop will span two four-hour sessions. The first session will feature a community analysis of a real biological dataset. An important feature of this session is that the biologist who generated the data will be present to provide feedback on the results. The second session will feature poster presentations of new or incomplete work in the BioGEC domain. The goal of this session is to provide a forum for receiving critical feedback on ideas and research results that might not yet be mature.

Workshop Date and Location:

July 9, 2006 (8 hours)

Held at the Renaissance Hotel in Seattle, Washington, USA

Saturday, March 04, 2006

Sexual reproduction selects for robustness and negative epistasis in artificial gene networks

A new paper by Ricardo Azevedo et al. in Nature shows how negative epistasis can evolve as a consequence of sexual reproduction.

Azevedo RB, Lohaus R, Srinivasan S, Dang KK, Burch CL. Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature. 2006 Mar 2;440(7080):87-90. [PubMed]


The mutational deterministic hypothesis for the origin and maintenance of sexual reproduction posits that sex enhances the ability of natural selection to purge deleterious mutations after recombination brings them together into single genomes. This explanation requires negative epistasis, a type of genetic interaction where mutations are more harmful in combination than expected from their separate effects. The conceptual appeal of the mutational deterministic hypothesis has been offset by our inability to identify the mechanistic and evolutionary bases of negative epistasis. Here we show that negative epistasis can evolve as a consequence of sexual reproduction itself. Using an artificial gene network model, we find that recombination between gene networks imposes selection for genetic robustness, and that negative epistasis evolves as a by-product of this selection. Our results suggest that sexual reproduction selects for conditions that favour its own maintenance, a case of evolution forging its own path.

Friday, March 03, 2006

MDR on Sourceforge.net

Our open-source multifactor dimensionality reduction (MDR) software package is ranked #40 out of 642 bioinformatics projects on Sourceforge.net based on download activity. We will have a new version of MDR available for download next week. Stay tuned!