Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Wednesday, January 26, 2011

Yeast genetics is complex. What about humans?

This is a nice new paper documenting the genetic complexity of yeast. It adds to the growing body of literature highlighting the importance of gene-gene and gene-environment interactions in model organisms. I continue to ask why the field of human genetics downplays such effects in humans. Do we really expect humans to have simpler genetic architectures than yeast and other lower organisms?

Cubillos FA, Billi E, Zörgö E, Parts L, Fargier P, Omholt S, Blomberg A, Warringer J, Louis EJ, Liti G. Assessing the complex architecture of polygenic traits in diverged yeast populations. Mol Ecol. 2011 Jan 25. [Epub ahead of print] PubMed PMID: 21261765. [PubMed]


Phenotypic variation arising from populations adapting to different niches has a complex underlying genetic architecture. A major challenge in modern biology is to identify the causative variants driving phenotypic variation. Recently, the baker's yeast, Saccharomyces cerevisiae has emerged as a powerful model for dissecting complex traits. However, past studies using a laboratory strain were unable to reveal the complete architecture of polygenic traits. Here, we present a linkage study using 576 recombinant strains obtained from crosses of isolates representative of the major lineages. The meiotic recombinational landscape appears largely conserved between populations; however, strain-specific hotspots were also detected. Quantitative measurements of growth in 23 distinct ecologically relevant environments show that our recombinant population recapitulates most of the standing phenotypic variation described in the species. Linkage analysis detected an average of 6.3 distinct QTLs for each condition tested in all crosses, explaining on average 39% of the phenotypic variation. The QTLs detected are not constrained to a small number of loci, and the majority are specific to a single cross-combination and to a specific environment. Moreover, crosses between strains of similar phenotypes generate greater variation in the offspring, suggesting the presence of many antagonistic alleles and epistatic interactions. We found that subtelomeric regions play a key role in defining individual quantitative variation, emphasizing the importance of the adaptive nature of these regions in natural populations. This set of recombinant strains is a powerful tool for investigating the complex architecture of polygenic traits.
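For readers less familiar with linkage mapping, the basic single-marker test behind a QTL scan in haploid recombinant strains fits in a few lines. This is a minimal sketch, not the authors' pipeline: it assumes each strain is coded 0 or 1 according to which parent's allele it carries at a marker (a hypothetical encoding), compares a one-mean model with a separate mean per allele, and reports a LOD score and the fraction of phenotypic variance explained (R²). The function names and toy data are illustrative.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD score and R^2 for one marker in a haploid cross (illustrative sketch).

    genotypes: 0/1 parental-allele calls at the marker (hypothetical encoding)
    phenotypes: quantitative trait values (e.g. growth) for the same strains
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    # Residual sum of squares under the null model: one mean for all strains.
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    # Residual sum of squares when each parental allele group gets its own mean.
    rss1 = 0.0
    for allele in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    if rss0 == 0:
        return 0.0, 0.0  # no phenotypic variation to explain
    if rss1 == 0:
        return math.inf, 1.0  # the marker separates the strains perfectly
    lod = (n / 2) * math.log10(rss0 / rss1)
    return lod, 1 - rss1 / rss0


def scan(markers, phenotypes):
    """Apply the single-marker test to every marker; returns (lod, r2) per marker."""
    return [single_marker_lod(g, phenotypes) for g in markers]


# Toy data: six strains, two markers.
phenos = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
markers = [[0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
results = scan(markers, phenos)
```

Here the first marker tracks the phenotype closely and gets a high LOD, while the second does not. A real genome scan would also set significance thresholds (for example by permutation) and would combine several QTLs per condition, so a single-marker R² like this is only one piece of a variance-explained figure such as the 39% reported above.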

Thursday, January 20, 2011

The Meaning of Interaction

The following paper is a useful discussion of interaction from a model-based parametric statistical point of view. The discussion of biological vs. statistical epistasis is poorly cited, however.

Wang X, Elston RC, Zhu X. The Meaning of Interaction. Hum Hered. 2010 Dec 8;70(4):269-277. [PubMed]


Although recent studies have attempted to dispel the confusion that exists in regard to the definition, analysis and interpretation of interaction in genetics, there still remain aspects that are poorly understood by non-statisticians. After a brief discussion of the definition of gene-gene interaction, the main part of this study addresses the fundamental meaning of statistical interaction and its relationship to measurement scale, disproportionate sample sizes in the cells of a two-way table and gametic phase disequilibrium.
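The dependence of statistical interaction on measurement scale is easy to see with a toy two-locus table. In the sketch below the risks are hypothetical, chosen so that the two risk genotypes act multiplicatively. The interaction contrast p11 - p10 - p01 + p00 is then nonzero on the risk (additive) scale but vanishes on the log scale.

```python
import math


def interaction_contrast(p00, p01, p10, p11, transform=lambda p: p):
    """Two-way interaction contrast f(p11) - f(p10) - f(p01) + f(p00).

    pXY is the disease risk for genotype class X at locus A and Y at locus B
    (0 = no risk genotype, 1 = risk genotype); `transform` sets the scale.
    """
    f = transform
    return f(p11) - f(p10) - f(p01) + f(p00)


# Hypothetical risks: baseline 1%, A alone doubles it, B alone triples it,
# and both together multiply it by 6 (= 2 x 3).
p00, p10, p01, p11 = 0.01, 0.02, 0.03, 0.06

additive = interaction_contrast(p00, p01, p10, p11)             # about 0.02
log_scale = interaction_contrast(p00, p01, p10, p11, math.log)  # about 0
```

The same data therefore show "interaction" under an additive risk model and none under a multiplicative (log-risk) model, which is why a statistical interaction only has meaning relative to the chosen scale.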

Wednesday, January 19, 2011

Model-based multifactor dimensionality reduction for detecting epistasis

A new MDR paper.

Cattaert T, Calle ML, Dudek SM, Mahachie John JM, Van Lishout F, Urrea V, Ritchie MD, Van Steen K. Model-based multifactor dimensionality reduction for detecting epistasis in case-control data in the presence of noise. Ann Hum Genet. 2011 Jan;75(1):78-89. [PubMed]


Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis, among them, the Multifactor Dimensionality Reduction Method (MDR) has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.
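The pooling step that the abstract lists as a concern is the core of classic MDR: every two-locus genotype cell is labeled high or low risk by comparing its case:control ratio with the overall ratio, which collapses the genotype table into two groups. Below is a minimal sketch of that step. It omits the cross-validation and permutation testing a real MDR analysis uses, it does not implement MB-MDR's model-based cell classification, and the function name and toy data are hypothetical.

```python
from collections import Counter


def mdr_pool(snp1, snp2, status):
    """Classic MDR pooling for one SNP pair (illustrative sketch).

    snp1, snp2: genotypes coded 0/1/2; status: 1 = case, 0 = control.
    Returns the set of high-risk genotype cells and the balanced accuracy
    of classifying subjects by that high/low split.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    cases, controls = Counter(), Counter()
    for g1, g2, y in zip(snp1, snp2, status):
        (cases if y == 1 else controls)[(g1, g2)] += 1

    # A cell is high risk if its case:control ratio exceeds the overall ratio,
    # i.e. cases/controls > n_cases/n_controls, written without division.
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls > controls[cell] * n_cases
    }

    tp = sum(cases[c] for c in high_risk)     # cases in high-risk cells
    fp = sum(controls[c] for c in high_risk)  # controls in high-risk cells
    sensitivity = tp / n_cases
    specificity = (n_controls - fp) / n_controls
    return high_risk, 0.5 * (sensitivity + specificity)


# Toy XOR-like data: risk depends on the combination, not on either SNP alone.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
status = [0, 1, 1, 0, 0, 1, 1, 0]
cells, accuracy = mdr_pool(snp1, snp2, status)
```

In this toy example neither SNP has a marginal effect, yet the pooled two-locus model classifies every subject correctly. When many sparse cells are forced into just two groups, as the abstract notes, the specific epistatic pattern becomes harder to see.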

Monday, January 17, 2011

Application of the Explicit Test of Epistasis to Colon Cancer

The paper below by Leroy et al. is a nice example of how the explicit test of epistasis by Greene et al. can be used with MDR to identify and confirm interactions that are independent of marginal effects.
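Here is a minimal sketch of the idea behind the explicit test of epistasis. It assumes the permutation scheme shuffles each SNP's genotypes independently within cases and within controls. That keeps every SNP's marginal association with case status intact while breaking the joint genotype patterns, so the null distribution reflects main effects only. The two-locus statistic used here (balanced accuracy of a cell-based classifier) and all function names are illustrative choices, not the published implementation.

```python
import random
from collections import Counter


def cell_accuracy(snp1, snp2, status):
    """Balanced accuracy when each two-locus cell is labeled by its case:control ratio."""
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    cases, controls = Counter(), Counter()
    for g1, g2, y in zip(snp1, snp2, status):
        (cases if y == 1 else controls)[(g1, g2)] += 1
    high = {c for c in set(cases) | set(controls)
            if cases[c] * n_controls > controls[c] * n_cases}
    tp = sum(cases[c] for c in high)
    tn = sum(count for c, count in controls.items() if c not in high)
    return 0.5 * (tp / n_cases + tn / n_controls)


def permute_within_class(genotypes, status, rng):
    """Shuffle one SNP's genotypes separately among cases and among controls."""
    permuted = list(genotypes)
    for label in (0, 1):
        idx = [i for i, y in enumerate(status) if y == label]
        values = [genotypes[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            permuted[i] = v
    return permuted


def explicit_epistasis_pvalue(snp1, snp2, status, n_perm=1000, seed=0):
    """Permutation p-value for interaction beyond what the marginal effects explain."""
    rng = random.Random(seed)
    observed = cell_accuracy(snp1, snp2, status)
    hits = 0
    for _ in range(n_perm):
        null1 = permute_within_class(snp1, status, rng)
        null2 = permute_within_class(snp2, status, rng)
        if cell_accuracy(null1, null2, status) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy XOR-like data (no marginal effects), repeated to give 40 subjects.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1] * 5
snp2 = [0, 1, 0, 1, 0, 1, 0, 1] * 5
status = [0, 1, 1, 0, 0, 1, 1, 0] * 5
p = explicit_epistasis_pvalue(snp1, snp2, status, n_perm=200)
```

A small p-value here says the pair predicts case status better than its main effects alone would allow. A standard permutation of the case/control labels would instead test main effects and interaction together.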

Greene CS, Himmelstein DS, Nelson HH, Kelsey KT, Williams SM, Andrew AS, Karagas MR, Moore JH. Enabling personal genomics with an explicit test of epistasis. Pac Symp Biocomput. 2010:327-36. [PubMed]

Leroy EC, Moore JH, Hu C, Martínez ME, Lance P, Duggan D, Thompson PA. Genes in the insulin and insulin-like growth factor pathway and odds of metachronous colorectal neoplasia. Hum Genet. 2011 Jan 11. [Epub ahead of print] PubMed PMID: 21221997. [PubMed]


Insulin and insulin-like growth factor (IGF) genes are implicated in colorectal carcinogenesis. Gene-by-gene interactions that influence the insulin/IGF pathways were hypothesized as modifiers of colorectal neoplasia risk. We built a classification tree to detect interactions in 18 IGF and insulin pathway-related genes and metachronous colorectal neoplasia among 1,439 subjects pooled from two chemoprevention trials. The probability of colorectal neoplasia was greatest (71.8%) among carriers of any A allele for rs7166348 (IGF1R) and AA genotype for rs1823023 (PIK3R1). In contrast, carriers of any A at rs7166348 (IGF1R), any G for the PIK3R1 variant, and AA for rs10426094 (INSR) had the lowest probability (14.3%). Logistic regression modeling showed that any A at rs7166348 (IGF1R) with the AA genotype at rs1823023 (PIK3R1) conferred the highest odds of colorectal neoplasia (OR 3.7; 95% CI 2.2-6.5), compared with carriage of GG at rs7166348 (IGF1R). Conversely, any A at rs7166348 (IGF1R), any G allele at rs1823023 (PIK3R1), and the AA genotype at rs10426094 (INSR) conferred the lowest odds (OR 0.22; 95% CI 0.07-0.66). Stratifying the analysis by parent study and intervention arm showed highly consistent trends in direction and magnitude of associations, with preliminary evidence of genotype effects on measured IGF-1 levels in a subgroup of subjects. These results were compared to those from multifactor dimensionality reduction, which identified different single nucleotide polymorphisms in the same genes (INSR and IGF1R) as effect modifiers for colorectal neoplasia. These results support a role for genetic interactions in the insulin/IGF pathway genes in colorectal neoplasia risk.
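The odds ratios and 95% confidence intervals quoted above come from logistic regression models. For a single 2x2 comparison without covariates, the same kind of quantity can be computed by hand with Woolf's log method. The counts below are hypothetical and are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls.
    All counts must be positive.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: 40 cases and 20 controls carry the genotype combination,
# 30 cases and 60 controls do not.
or_, lower, upper = odds_ratio_ci(40, 20, 30, 60)
# or_ is 4.0; the interval runs from roughly 2.0 to 8.0
```

Because the interval is symmetric on the log scale, its endpoints multiply to the squared odds ratio. An interval that excludes 1, like the study's 2.2-6.5, corresponds to an association at roughly the 5% level. Logistic regression produces the same quantities while also accommodating covariates and multiple genotype categories.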

Saturday, January 15, 2011

Real-world comparison of CPU and GPU implementations of SNPrank

A nice paper on network analysis of GWAS data using high-performance computing.

Davis NA, Pandey A, McKinney BA. Real-world comparison of CPU and GPU implementations of SNPrank: a network analysis tool for GWAS. Bioinformatics. 2011 Jan 15;27(2):284-5. [PubMed]


MOTIVATION: Bioinformatics researchers have a variety of programming languages and architectures at their disposal, and recent advances in graphics processing unit (GPU) computing have added a promising new option. However, many performance comparisons inflate the actual advantages of GPU technology. In this study, we carry out a realistic performance evaluation of SNPrank, a network centrality algorithm that ranks single nucleotide polymorphisms (SNPs) based on their importance in the context of a phenotype-specific interaction network. Our goal is to identify the best computational engine for the SNPrank web application and to provide a variety of well-tested implementations of SNPrank for Bioinformaticists to integrate into their research.

RESULTS: Using SNP data from the Wellcome Trust Case Control Consortium genome-wide association study of Bipolar Disorder, we compare multiple SNPrank implementations, including Python, Matlab and Java as well as CPU versus GPU implementations. When compared with naïve, single-threaded CPU implementations, the GPU yields a large improvement in the execution time. However, with comparable effort, multi-threaded CPU implementations negate the apparent advantage of GPU implementations.

AVAILABILITY: The SNPrank code is open source and available at http://insilico.utulsa.edu/snprank.

CONTACT: brett.mckinney@gmail.com.
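The abstract does not spell out the SNPrank formula, so the sketch below only illustrates the general kind of computation behind a network centrality ranking: a PageRank-style damped power iteration over a hypothetical weighted SNP-SNP interaction matrix, in pure Python. The damping value 0.85, the function name, and the toy network are assumptions, and SNPrank's own formulation may differ.

```python
def centrality_scores(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style power iteration on a weighted network (illustrative sketch).

    weights[i][j] >= 0 is the interaction weight between SNP i and SNP j.
    Returns one score per SNP; the scores sum to 1.
    """
    n = len(weights)
    # Column sums normalize the outgoing weight of each node.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            flow = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    flow += weights[i][j] / col_sums[j] * r[j]
                else:
                    flow += r[j] / n  # isolated node: spread its score evenly
            new.append((1 - damping) / n + damping * flow)
        converged = sum(abs(x - y) for x, y in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


# Hypothetical star network: SNP 0 interacts with SNPs 1, 2 and 3.
w = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
scores = centrality_scores(w)
```

The hub SNP ends up with the highest score. Each iteration is essentially a dense matrix-vector product, which is the kind of workload where multi-threaded CPU code and GPU code can be compared, as the paper does for SNPrank.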

Friday, January 07, 2011

NIH/NIGMS Funding by Priority Score

The following is a figure put together by the National Institute of General Medical Sciences showing the number of grants reviewed and the number funded at each priority score. Note that a score of 30 or better was needed to have a good chance of getting funded. I assume this looks similar at other institutes.

Thursday, January 06, 2011

Layers of Epistasis

Our new paper, "Layers of epistasis: genome-wide regulatory networks and network approaches to genome-wide association studies," has been published online.

Cowper-Sal Lari R, Cole MD, Karagas MR, Lupien M, Moore JH. Layers of epistasis: genome-wide regulatory networks and network approaches to genome-wide association studies. Wiley Interdiscip Rev Syst Biol Med. 2010 Dec 31. [Epub ahead of print] PubMed PMID: 21197657. [PubMed]


The conceptual foundation of the genome-wide association study (GWAS) has advanced unchecked since its conception. A revision might seem premature as the potential of GWAS has not been fully realized. Multiple technical and practical limitations need to be overcome before GWAS can be fairly criticized. But with the completion of hundreds of studies and a deeper understanding of the genetic architecture of disease, warnings are being raised. The results compiled to date indicate that risk-associated variants lie predominantly in noncoding regions of the genome. Additionally, alternative methodologies are uncovering large and heterogeneous sets of rare variants underlying disease. The fear is that, even in its fulfillment, the current GWAS paradigm might be incapable of dissecting all kinds of phenotypes. In the following text, we review several initiatives that aim to overcome these limitations. The overarching theme of these studies is the inclusion of biological knowledge to both the analysis and interpretation of genotyping data. GWAS is uninformed of biology by design and although there is some virtue in its simplicity, it is also its most conspicuous deficiency. We propose a framework in which to integrate these novel approaches, both empirical and theoretical, in the form of a genome-wide regulatory network (GWRN). By processing experimental data into networks, emerging data types based on chromatin immunoprecipitation are made computationally tractable. This will give GWAS re-analysis efforts the most current and relevant substrates, and root them firmly on our knowledge of human disease.