Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Wednesday, September 30, 2009

Bioinformatics Strategies for Genome-Wide Association Studies - New NIH R01 Funded

My new NIH R01 (LM010098) on "Bioinformatics Strategies for Genome-Wide Association Studies" has been funded for four years by the National Library of Medicine. The abstract for this new grant is below.


Genome-wide association studies (GWAS) are commonplace despite the lack of a comprehensive bioinformatics approach to the analysis of the data. The common method of analysis is to employ parametric statistics and then adjust for the large number of tests performed to limit false-positives (i.e. type 1 errors). This agnostic approach is preferred by some because no assumptions are made about which genes or genomic regions might be important. This logic suggests that the data should tell us where the important genetic variants are. The goal of our proposed research program is to specifically compare this agnostic approach with a bioinformatics approach that selects associated SNPs based on expert knowledge about biochemical pathways and gene function. We propose to develop a bioinformatics approach for selecting SNPs from a GWAS using knowledge about the biology of the genes being studied and the molecular pathology of disease (AIM 1). We will modify and extend the Exploratory Visual Analysis (EVA) database and software that was originally designed for microarray studies with pilot funding from the NLM BISTI program. We will then use this bioinformatics approach along with an agnostic statistical approach for detecting SNPs associated with plasma levels of tissue plasminogen activator (t-PA) and plasminogen activator inhibitor one (PAI-1) in a large population-based sample of Caucasians (n=2000) from the PREVEND study in Groningen, The Netherlands (AIM 2). Those SNPs identified by both methods in the PREVEND study will be evaluated first for replication in an independent population-based sample of Caucasians (n=2000) from the Rotterdam Study in the Netherlands and then for validation in a population-based sample of Blacks (n=2000) from the HeART Study in Ghana, Africa (AIM 3). Finally, we will specifically compare how many and which SNPs replicate and validate using the statistical approach and the bioinformatics approach (AIM 4). 
Our working hypothesis is that we will obtain more validated and hence more real SNPs using the bioinformatics approach.
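
To make the "adjust for the large number of tests" step concrete, here is a minimal Python sketch of one common adjustment, the Bonferroni correction. The abstract does not say which correction is meant, so the choice of Bonferroni, the function names, and the SNP labels below are all assumptions for illustration only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    type I error rate at or below alpha across n_tests tests."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNP labels whose p-value passes the Bonferroni threshold.

    p_values maps a (hypothetical) SNP label to its association p-value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < threshold]


# Hypothetical p-values, not taken from any study.
example = {"snp_a": 1e-9, "snp_b": 0.02, "snp_c": 0.04}
print(significant_snps(example))
```

With hundreds of thousands of SNPs the per-test threshold becomes very small, which is why the agnostic approach can miss variants with modest effects.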

Tuesday, September 29, 2009

A View of the Parallel Computing Landscape

We rely heavily on parallel computing for our computational studies of epistasis. This article appeared in the October 2009 issue of Communications of the ACM.

Krste Asanovic, Rastislav Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, Katherine Yelick. A View of the Parallel Computing Landscape. Communications of the ACM Vol. 52 No. 10, Pages 56-67 [ACM]


Industry needs help from the research community to succeed in its recent dramatic shift to parallel computing. Failure could jeopardize both the IT industry and the portions of the economy that depend on rapidly improving information technology. Here, we review the issues and, as an example, describe an integrated approach we're developing at the Parallel Computing Laboratory, or Par Lab, to tackle the parallel challenge.

Friday, September 25, 2009

New NIH ARRA Grant Supplements

I have been awarded two NIH ARRA stimulus supplements to my National Library of Medicine R01 grant LM009012. The first will allow me to establish a Bioinformatics Visualization Laboratory at Dartmouth. The second will support six high school and undergraduate students to assist with our bioinformatics research during the summer.

Thursday, September 24, 2009

The Next Frontier: Advancing from Genetic Risk to Functionality and Testing

The Education Committee of the International Genetic Epidemiology Society has organized an educational session on "The Next Frontier: Advancing from Genetic Risk to Functionality and Testing" to be held in conjunction with the 59th Annual Meeting of the American Society of Human Genetics in Honolulu, Hawaii on Thursday, October 22nd from 5-7pm at the Hilton Hawaiian Village. The confirmed speakers are as follows:

Mark McCarthy, M.D.
University of Oxford
"Genome-wide association studies: potential next steps on a genetic journey"

Stephen Chanock, M.D.
National Cancer Institute

Daniel Weeks, Ph.D. and Johanna Jakobdottir
University of Pittsburgh
“Interpretation of genetic association studies:
markers with replicated highly significant odds ratios may be poor classifiers”

Katrina Goddard, Ph.D.
The Center for Health Research
“Public awareness and use of direct-to-consumer genetic tests”

Tuesday, September 22, 2009

Postgraduate Training Program in Quantitative Biomedical Sciences

Our new Postgraduate Training Program in Quantitative Biomedical Sciences has been funded by a five-year R25 grant from the NCI/NIH. This new program is designed to cross-train students at the intersection between bioinformatics, biostatistics and epidemiology. More information and instructions for applying can be found here.

Friday, September 18, 2009

New NIH grant format

The NIH is moving to a new 12-page R01 grant format for grants submitted for the Feb. 2010 deadline or after. Nov. 5th is the last submission date for the old 25-page format. Details can be found here.

Wednesday, September 16, 2009

Dartmouth researchers get personal with genetics

The Dartmouth press release on our papers in the American Journal of Human Genetics and PLoS One is now on the Dartmouth website.

Also see:

Science Daily
e! Science News
Science Blog

Monday, September 14, 2009

A developmental systems perspective on epistasis

This is a very interesting paper on a developmental systems perspective on epistasis.

Gutiérrez J. A developmental systems perspective on epistasis: computational exploration of mutational interactions in model developmental regulatory networks. PLoS One. 2009 Sep 7;4(9):e6823. [PubMed]


The way in which the information contained in genotypes is translated into complex phenotypic traits (i.e. embryonic expression patterns) depends on its decoding by a multilayered hierarchy of biomolecular systems (regulatory networks). Each layer of this hierarchy displays its own regulatory schemes (i.e. operational rules such as +/- feedback) and associated control parameters, resulting in characteristic variational constraints. This process can be conceptualized as a mapping issue, and in the context of highly-dimensional genotype-phenotype mappings (GPMs) epistatic events have been shown to be ubiquitous, manifested in non-linear correspondences between changes in the genotype and their phenotypic effects. In this study I concentrate on epistatic phenomena pervading levels of biological organization above the genetic material, more specifically the realm of molecular networks. At this level, systems approaches to studying GPMs are specially suitable to shed light on the mechanistic basis of epistatic phenomena. To this aim, I constructed and analyzed ensembles of highly-modular (fully interconnected) networks with distinctive topologies, each displaying dynamic behaviors that were categorized as either arbitrary or functional according to early patterning processes in the Drosophila embryo. Spatio-temporal expression trajectories in virtual syncytial embryos were simulated via reaction-diffusion models. My in silico mutational experiments show that: 1) the average fitness decay tendency to successively accumulated mutations in ensembles of functional networks indicates the prevalence of positive epistasis, whereas in ensembles of arbitrary networks negative epistasis is the dominant tendency; and 2) the evaluation of epistatic coefficients of diverse interaction orders indicates that, both positive and negative epistasis are more prevalent in functional networks than in arbitrary ones. 
Overall, I conclude that the phenotypic and fitness effects of multiple perturbations are strongly conditioned by both the regulatory architecture (i.e. pattern of coupled feedback structures) and the dynamic nature of the spatio-temporal expression trajectories displayed by the simulated networks.

Friday, September 11, 2009

Epistasis and Its Implications for Personal Genetics

Our paper on "Epistasis and Its Implications for Personal Genetics" has been published in the American Journal of Human Genetics. This paper presents five recommendations for improving the impact of personal genetics.

Moore JH, Williams SM. Epistasis and Its Implications for Personal Genetics. Am J Hum Genet. 2009 Sep 11;85(3):309-320. [PubMed]


The widespread availability of high-throughput genotyping technology has opened the door to the era of personal genetics, which brings to consumers the promise of using genetic variations to predict individual susceptibility to common diseases. Despite easy access to commercial personal genetics services, our knowledge of the genetic architecture of common diseases is still very limited and has not yet fulfilled the promise of accurately predicting most people at risk. This is partly because of the complexity of the mapping relationship between genotype and phenotype that is a consequence of epistasis (gene-gene interaction) and other phenomena such as gene-environment interaction and locus heterogeneity. Unfortunately, these aspects of genetic architecture have not been addressed in most of the genetic association studies that provide the knowledge base for interpreting large-scale genetic association results. We provide here an introductory review of how epistasis can affect human health and disease and how it can be detected in population-based studies. We provide some thoughts on the implications of epistasis for personal genetics and some recommendations for improving personal genetics in light of this complexity.

Thursday, September 10, 2009

Enabling personal genomics with an explicit test of epistasis

Our paper on "Enabling personal genomics with an explicit test of epistasis" was accepted for publication and presentation as part of the 2010 Pacific Symposium on Biocomputing. This paper describes a simple, yet powerful, permutation test that specifically tests for nonlinear gene-gene interactions. The abstract is below. Hope to see you in Hawaii!

Greene, C.S., Himmelstein, D.S., Nelson, H.H., Kelsey, K.T., Williams, S.M., Andrew, A.S., Karagas, M.R., Moore, J.H. Enabling personal genomics with an explicit test of epistasis. Pacific Symposium on Biocomputing, pp. 327-336 (2010). [PubMed] [PDF]


One goal of personal genomics is to use information about genomic variation to predict who is at risk for various common diseases. Technological advances in genotyping have spawned several personal genetic testing services that market genotyping services directly to the consumer. An important goal of consumer genetic testing is to provide health information along with the genotyping results. This has the potential to integrate detailed personal genetic and genomic information into healthcare decision making. Despite the potential importance of these advances, there are some important limitations. One concern is that much of the literature that is used to formulate personal genetics reports is based on genetic association studies that consider each genetic variant independently of the others. It is our working hypothesis that the true value of personal genomics will only be realized when the complexity of the genotype-to-phenotype mapping relationship is embraced, rather than ignored. We focus here on complexity in genetic architecture due to epistasis or nonlinear gene-gene interaction. We have previously developed a multifactor dimensionality reduction (MDR) algorithm and software package for detecting nonlinear interactions in genetic association studies. In most prior MDR analyses, the permutation testing strategy used to assess statistical significance was unable to differentiate MDR models that captured only interaction effects from those that also detected independent main effects. Statistical interpretation of MDR models required post-hoc analysis using entropy-based measures of interaction information. We introduce here a novel permutation test that allows the effects of nonlinear interactions between multiple genetic variants to be specifically tested in a manner that is not confounded by linear additive effects. 
We show using data simulated across 35 different epistasis models with varying effect sizes (heritabilities = 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and sample sizes (n = 400, 800, 1600) that the power to detect interactions using the explicit test of epistasis is no different than a standard permutation test. We also show that the test has the appropriate size or type I error rate of approximately 0.05. We then apply MDR with the new explicit test of epistasis to a large genetic study of bladder cancer (n=914) and show that a previously reported nonlinear interaction between two XPD gene polymorphisms is indeed significant (P = 0.005), even after considering the strong additive effect of smoking in the model. Finally, we evaluated the power of the explicit test of epistasis to detect the nonlinear interaction between two XPD gene polymorphisms by simulating data from the MDR model of bladder cancer susceptibility. We show that the power to detect the interaction alone was 1.00 while the power to detect the independent effect of smoking alone was 0.06 which is close to the expected type I error rate of 0.05. Importantly, the power to detect the interaction with smoking in the model was 0.94. The results of this study provide for the first time a simple method for explicitly testing epistasis or gene-gene interaction effects in genetic association studies. An important advantage of the method is that it can be combined with any modeling approach. The explicit test of epistasis brings us a step closer to the type of routine gene-gene interaction analysis that is needed if we are to enable personal genomics.
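
For readers who want a feel for how a permutation test can target interactions specifically, here is a minimal Python sketch. It is not the MDR software and not necessarily the exact procedure in the paper. It assumes one plausible way to build the null: shuffle each SNP's genotypes separately within cases and within controls, which keeps each SNP's own association with disease status intact while breaking the joint pattern between SNPs. The scoring function is a simplified MDR-style high-risk/low-risk rule, and all function names are hypothetical.

```python
import random
from collections import Counter


def joint_accuracy(snp_a, snp_b, status):
    """Balanced training accuracy of an MDR-style rule on two SNPs.

    Assumes status holds 1 for cases and 0 for controls, with both present.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    true_pos = true_neg = 0
    for cell in set(cases) | set(controls):
        # A genotype cell is high-risk when its case:control ratio
        # exceeds the overall case:control ratio.
        if cases[cell] * n_controls > controls[cell] * n_cases:
            true_pos += cases[cell]
        else:
            true_neg += controls[cell]
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def shuffle_within_class(values, status, rng):
    """Shuffle genotypes among cases and among controls separately,
    so the genotype counts within each class are unchanged."""
    shuffled = list(values)
    for label in set(status):
        idx = [i for i, y in enumerate(status) if y == label]
        pool = [values[i] for i in idx]
        rng.shuffle(pool)
        for i, v in zip(idx, pool):
            shuffled[i] = v
    return shuffled


def interaction_permutation_test(snp_a, snp_b, status, n_perm=1000, seed=0):
    """Return (observed accuracy, permutation p-value) under a null that
    preserves each SNP's main effect but removes the interaction."""
    rng = random.Random(seed)
    observed = joint_accuracy(snp_a, snp_b, status)
    hits = 0
    for _ in range(n_perm):
        a = shuffle_within_class(snp_a, status, rng)
        b = shuffle_within_class(snp_b, status, rng)
        if joint_accuracy(a, b, status) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A standard permutation test would instead shuffle the case/control labels, which destroys main effects and interactions alike, so a significant result cannot say which of the two the model picked up.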

Friday, September 04, 2009

Genetics of Gene Expression

Spielman certainly had an impact in this area. This will be a lot more fun when epistasis analysis is thrown into the mix.

Cheung VG, Spielman RS. Genetics of human gene expression: mapping DNA variants that influence gene expression. Nat Rev Genet. 2009 Sep;10(9):595-604. [PubMed]


There is extensive natural variation in human gene expression. As quantitative phenotypes, expression levels of genes are heritable. Genetic linkage and association mapping have identified cis- and trans-acting DNA variants that influence expression levels of human genes. New insights into human gene regulation are emerging from genetic analyses of gene expression in cells at rest and following exposure to stimuli. The integration of these genetic mapping results with data from co-expression networks is leading to a better understanding of how expression levels of individual genes are regulated and how genes interact with each other. These findings are important for basic understanding of gene regulation and of diseases that result from disruption of normal gene regulation.

Thursday, September 03, 2009

Livescribe Smartpen - A Review

I received my Livescribe Smartpen this week and thought I would provide some preliminary impressions. For an introduction please watch the videos on http://www.livescribe.com/.

The Negatives

First, the pen didn't start right out of the box, which was frustrating since Step 1 of the instructions was to turn on the pen, Steps 2-4 were for configuring the pen, and Step 5 was an actual demo of the pen. It isn't until Step 6 that they tell you to plug the pen into the USB dock, which charges it.

Second, once I got the pen charged and tried it, I was getting System 3 errors during their canned demo. That was frustrating. It seems others have this problem, according to their blog. The errors seem to have mostly gone away after I got to Step 7, where you sync the pen with the PC and do a firmware update. They should have you charge and update the pen before they have you try the demo.

Third, I don't like writing with ballpoint pens. This could influence how much I actually use the pen. I suppose I will get used to it.

Fourth, you can delete stuff from the pen but I haven't yet figured out how to put stuff back on the pen.

The Positives

First, despite the snags described above, the pen was delivered with very nice and professional packaging. They have clearly done a great job with the marketing and presentation. I got the black leather case and the black journals that look and feel very nice.

Second, despite the negatives above, I was very impressed with the technology. Once the minor kinks got worked out, it worked as promised. I found the calculator app really cool, and it certainly highlights what is possible for future apps. My kids absolutely loved the piano app, which allows you to draw keys on the paper and then play notes from a variety of different instruments including flute, piano, steel drums, fiddle, etc.

Third, Livescribe distributes an SDK that, according to their website, includes an Eclipse-based integrated development environment (IDE) with custom plug-ins, a suite of APIs, sample code and documentation. I already have ideas for science-based apps that would be fun to develop.

Fourth, the PC software automatically uploads any new notes you have and converts them to a format that is searchable. It was able to reliably find words in my terrible ballpoint-limited handwriting. I thought this was very impressive and very useful.


I highly recommend the smart pen if you like to take notes. I think this is a truly useful technology that delivers what it promises for a fair price. I will update this blog with further comments once I have had a chance to try it at a conference I am attending in two weeks.


I have no financial interest in Livescribe and am not being paid to write this review.

Tuesday, September 01, 2009

A quick guide to teaching R programming to computational biology students

Eglen SJ. A quick guide to teaching R programming to computational biology students. PLoS Comput Biol. 2009 Aug;5(8):e1000482. [PubMed]

These three labs are suggested for students.

1) Sequence alignment.

2) The discrete logistic equation.

3) Conway's game of life.
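
To give a flavor of the second lab, here is a minimal sketch of the discrete logistic equation. The paper's labs are written in R; this version is in Python, and it assumes the standard form x[n+1] = r * x[n] * (1 - x[n]). The function name and parameter values are my own, for illustration only.

```python
def logistic_map(r, x0, n_steps):
    """Iterate x[n+1] = r * x[n] * (1 - x[n]) and return the whole trajectory.

    r is the growth rate, x0 the starting population fraction (between 0 and 1).
    """
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


# Different growth rates give qualitatively different behavior.
for r in (2.5, 3.2, 3.9):
    tail = logistic_map(r, 0.1, 200)[-4:]
    print(r, [round(x, 4) for x in tail])
```

Printing the last few values for several growth rates lets students see the trajectory settle to a fixed point, oscillate, or wander irregularly, which is the main point of the exercise.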