Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Monday, June 29, 2009

Gene-Environment Interaction in Asthma

The complexity of the genotype-phenotype mapping relationship seems to be catching on in asthma genetics. Why would gene-environment interaction not be expected to play a role in asthma? Why would anyone investigate the genetic basis of asthma one SNP at a time? ...or one environment at a time?

von Mutius E. Gene-environment interactions in asthma. J Allergy Clin Immunol. 2009 Jan;123(1):3-11 [PubMed]

Martinez FD. Gene-environment interaction in complex diseases: asthma as an illustrative case. Novartis Found Symp. 2008;293:184-92 [PubMed]

See also my previous post on this topic.

Saturday, June 27, 2009

European Conference on Artificial Life (ECAL'09)

The following two papers were accepted for publication and presentation as part of the 2009 European Conference on Artificial Life (ECAL'09) to be held in Budapest in September. Hope to see you there!

Greene CS, Hill DP, Moore JH. An Open-Ended Computational Evolution Strategy for Evolving Parsimonious Solutions to Human Genetics Problems. Lecture Notes in Computer Science, in press (2009).


In human genetics a primary goal is the discovery of genetic factors that predict individual susceptibility to common human diseases, but this has proven difficult to achieve because these diseases are likely to result from the joint failure of two or more interacting components. Currently geneticists measure genetic variations from across the genomes of individuals with and without the disease. The association of single variants with disease is then assessed. Our goal is to develop methods capable of identifying combinations of genetic variations predictive of discrete measures of health in human population data. “Artificial evolution” approaches loosely based on real biological processes have been developed and applied, but it has recently been suggested that “computational evolution” approaches will be more likely to solve problems of interest to biomedical researchers. Here we introduce a method to evolve parsimonious solutions in an open-ended computational evolution framework that more closely mimics the complexity of biological systems. In ecological systems a highly specialized organism can fail to thrive as the environment changes. By introducing numerous small changes into training data, i.e. the environment, during evolution we drive evolution towards general solutions. We show that this method leads to smaller solutions and does not reduce the power of an open-ended computational evolution system. This method of environmental perturbation fits within the computational evolution framework and is an effective method of evolving parsimonious solutions.
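
To make the idea of environmental perturbation concrete, here is a minimal sketch. The abstract does not say how the training data are changed, so the label-flipping scheme, the function name perturb_environment, and the rate parameter are all assumptions made for illustration; this is not the method from the paper.

```python
import random


def perturb_environment(labels, rate, rng):
    """Return a copy of binary case-control labels with a fraction flipped.

    Hypothetical perturbation scheme: a fraction `rate` of the labels is
    flipped (0 <-> 1), so each generation sees a slightly different
    "environment". The original list is left untouched.
    """
    n_flip = int(round(rate * len(labels)))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
labels = [0, 1] * 50            # 100 toy individuals, half cases
perturbed = perturb_environment(labels, rate=0.05, rng=rng)
n_changed = sum(a != b for a, b in zip(labels, perturbed))
```

In an evolutionary loop, a fresh perturbed copy would be drawn each generation before fitness is evaluated, so a solution that memorizes a handful of individuals is penalized when those individuals change.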

Gilmore JM, Greene CS, Andrews PC, Moore JH. An Analysis of New Expert Knowledge Scaling Methods for Biologically Inspired Computing. Lecture Notes in Computer Science, in press (2009).


High-throughput genotyping has made genome-wide data on human genetic variation commonly available, however, finding associations between specific variations and common diseases has proven difficult. The size of these datasets presents an informatics challenge because exhaustive searching for even only pair-wise interactions is computationally expensive. Instead, search methods must be used which efficiently and effectively mine these datasets. Furthermore, individual susceptibility to common diseases likely depends on gene-gene interactions, i.e. epistasis, and not merely on independent genes. To meet these challenges, we turn to a biologically inspired ant colony optimization strategy. We have previously developed an ant system which allows the incorporation of expert knowledge as heuristic information. One method of scaling expert knowledge to probabilities usable in the algorithm, an exponential distribution function which respects intervals between raw expert knowledge scores, has been previously examined. Here, we develop and evaluate three additional expert knowledge scaling methods and find parameter sets for each which maximize power.
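
As a rough illustration of what scaling expert knowledge to probabilities can look like, here is a minimal sketch of an exponential scaling that preserves the intervals between raw scores. The exact functional form, the parameter theta, and the toy scores are assumptions; the abstract does not give the formula used in the paper.

```python
import math


def exponential_scaling(scores, theta=1.0):
    """Map raw expert-knowledge scores to selection probabilities.

    Scores are min-max normalized, which keeps the relative intervals
    between them, and each is weighted by exp(theta * normalized_score).
    Both the form and theta are assumptions made for illustration.
    """
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    weights = [math.exp(theta * (s - lo) / span) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


raw_scores = [0.02, 0.10, 0.11, 0.45]   # toy expert-knowledge scores, one per SNP
probs = exponential_scaling(raw_scores, theta=3.0)
```

Because the weights depend on the size of the gaps between scores, the two SNPs with nearly equal scores (0.10 and 0.11) end up with nearly equal probabilities, while the top-scoring SNP is strongly favored. A rank-based scaling would discard that interval information.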

Wednesday, June 17, 2009

Neglected Advances in Classical Genetics

I ran across this great paper the other day. We don't do a good enough job teaching classical genetics. Papers on 'omics' approaches have replaced the papers that form the foundation of genetics. This is not a good trend; we need both.

Wilmer J. Miller and Willard F. Hollander. Three neglected advances in classical genetics. BioScience Vol. 45 No 2 Feb. 1995 pp. 98-104. [Web]

"Geneticists now concentrate on the wondrous new molecular techniques. Their promise is being fulfilled in applied as well as theoretical advances. But, while attention is diverted elsewhere, some advances in the classical areas have been neglected."

Monday, June 15, 2009

Recipe for Successful Graduate Students

Dr. John Holland was interviewed recently for SIGEVOlution magazine. He was asked what his recipe was for successful graduate students since he has had so many. Here is what he said. I use the same recipe.

1) Have the student find a broad question that really interests them.

I give my students great freedom to find a general topic they are really interested in. This helps provide the motivation that is sometimes lacking in graduate school.

2) Have the students learn a lot about a lot of different disciplines.

I require my students to take at least one additional year of courses to become fluent in another discipline.

3) Have the student find a mentor that will stand up for them no matter how crazy the idea.

Graduate school is the last time in your career that you will have true freedom and time to explore novel and crazy ideas. I encourage students to go out on a limb and explore the fringes of science. This is where all the really good ideas come from. I am not a believer in incremental me-too science.

Thursday, June 11, 2009

MDR 2.0 beta 4 released

This week we released a new version of our open-source multifactor dimensionality reduction (MDR) software package. It can be downloaded from sourceforge.net.

New features include our novel Spatially Uniform ReliefF (SURF) algorithm that improves the power to filter SNPs involved in interactions from a large list. The paper reporting these results is under review. SURF can also be combined with our previously developed Tuned ReliefF (TuRF) algorithm to give SURF and TuRF. There are also some minor bug fixes in this new version.

Tuesday, June 09, 2009

Epistasis Q & A

I found this recently in the Journal of Biology. A nice but narrow view of epistasis.

Roth FP, Lipshitz HD, Andrews BJ. Q&A: epistasis. J Biol. 2009;8(4):35 [PubMed]

Friday, June 05, 2009

Modifications to the Patient Rule-Induction Method that utilize non-additive combinations of genetic and environmental effects

This is a nice approach that takes advantage of the combinatorial partitioning method (CPM). I posted the original PRIM paper here earlier in 2007.

Dyson G, Frikke-Schmidt R, Nordestgaard BG, Tybjaerg-Hansen A, Sing CF. Modifications to the Patient Rule-Induction Method that utilize non-additive combinations of genetic and environmental effects to define partitions that predict ischemic heart disease. Genet Epidemiol. 2009 May;33(4):317-24. [PubMed]


This article extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (Genet Epidemiol 31:515-527) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2,258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors.
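
The bootstrap step mentioned in the abstract can be sketched in a few lines. This is a minimal percentile bootstrap for the cumulative incidence in a single partition; the choice of the percentile interval, the function name bootstrap_ci, and the toy counts are assumptions, since the abstract does not say which bootstrap interval the authors use.

```python
import random


def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for cumulative incidence.

    `outcomes` is a list of 0/1 disease indicators for the individuals
    that fall in one partition. Each bootstrap replicate resamples the
    individuals with replacement and recomputes the incidence.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy partition: 30 of 200 individuals developed disease.
partition = [1] * 30 + [0] * 170
incidence = sum(partition) / len(partition)
lower, upper = bootstrap_ci(partition)
```

The same resampling loop could be run once per partition, giving each estimated incidence its own interval.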

Wednesday, June 03, 2009

A call for epistasis analysis in asthma

Nice job Scott. Send me your asthma datasets. Happy to help!

Weiss ST, Raby BA, Rogers A. Asthma genetics and genomics 2009. Curr Opin Genet Dev. 2009 May 28. [PubMed]


Asthma Genetic Association studies have been plagued by methodologic problems that are common in all studies of complex traits: small sample size, lack of replication, and lack of control of population stratification. Despite this, the field has identified 43 replicated genes from association studies. The most frequently replicated are: TNF alpha, IL4, FCERB, Adam 33, and GSTP1. Several genes have been identified by linkage and fine mapping (ADAM33, DPP10, GPR154, and PHF11) and one gene has been identified by GWAS (ORMD3). The major issue is that these genes have been looked at one at a time rather than in some more holistic manner where epistasis is considered. For asthma genetics to begin to have an impact on clinical medicine we need to consider epistatic interaction.

Tuesday, June 02, 2009

Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture

Our paper "Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture" by Greene et al. was published today in PLoS One. This paper demonstrates that replication is not everything and thus challenges the one-SNP-at-a-time analysis strategy used in genome-wide association studies (GWAS). We were specifically told by someone from the Wellcome Trust not to publish this study because it might confuse people. I would be happy to field your questions if you find yourself confused after reading this paper.

Greene CS, Penrod NM, Williams SM, Moore JH. Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture. PLoS One 4(6), e5639 (2009). [PDF] [PubMed]


Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 with replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects where a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions.
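
A small worked example shows why the marginal effect of one polymorphism can shrink or even reverse when the allele frequency at an interacting polymorphism shifts. The penetrance table below is made up for illustration and is not one of the models simulated in the paper; genotype frequencies at the second locus (B) are assumed to follow Hardy-Weinberg proportions.

```python
def hwe_frequencies(q):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency q."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


# Hypothetical penetrance table: P(disease | copies at A, copies at B).
# Rows index the genotype at A, columns the genotype at B.
PENETRANCE = [
    [0.1, 0.1, 0.3],
    [0.1, 0.1, 0.1],
    [0.3, 0.1, 0.1],
]


def marginal_penetrance(a_genotype, q_b):
    """P(disease | genotype at A), averaged over the genotypes at B."""
    freqs_b = hwe_frequencies(q_b)
    return sum(f * pen for f, pen in zip(freqs_b, PENETRANCE[a_genotype]))


def effect_of_a(q_b):
    """Risk difference between two copies and zero copies of the A allele."""
    return marginal_penetrance(2, q_b) - marginal_penetrance(0, q_b)


effect_low = effect_of_a(0.45)    # allele frequency at B in one sample
effect_high = effect_of_a(0.55)   # allele frequency at B in another sample
```

With this table the risk difference works out to 0.2(1 - 2q), where q is the allele frequency at B. It is positive when q is below 0.5, zero at 0.5, and negative above 0.5, so the same allele at A looks like a risk factor in one sample and protective in the other even though the underlying biology is identical.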

Here is Figure 4 from the paper: