Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Monday, July 28, 2008

Systems Genetics of Alcoholism

The journal Alcohol Research and Health from the NIH/NIAAA has published its special issue on systems biology and alcoholism. We contributed a paper on the systems genetics of alcoholism. The paper is freely available online in HTML or PDF.

Chantel D. Sloan; Vicki Sayarath, M.P.H., R.D.; and Jason H. Moore, Ph.D. Systems genetics of alcoholism. Alcohol Research and Health 31, 14-25 (2008).


Alcoholism is a common disease resulting from the complex interaction of genetic, social, and environmental factors. Interest in the high heritability of alcoholism has resulted in many studies of how single genes, as well as an individual’s entire genetic content (i.e., genome) and the proteins expressed by the genome, influence alcoholism risk. The use of large-scale methods to identify and characterize genetic material (i.e., high-throughput technologies) for data gathering and analysis recently has made it possible to investigate the complexity of the genetic architecture of susceptibility to common diseases such as alcoholism on a systems level. Systems genetics is the study of all genetic variations, their interactions with each other (i.e., epistasis), their interactions with the environment (i.e., plastic reaction norms), their relationship with interindividual variation in traits that are influenced by many genes and contribute to disease susceptibility (i.e., intermediate quantitative traits or endophenotypes¹) defined at different levels of hierarchical biochemical and physiological systems, and their relationship with health and disease. (¹An endophenotype is a genetically determined trait [i.e., phenotype] that is not immediately visible but may contribute to the susceptibility to develop a particular behavior or syndrome. See the glossary, p. 84, for descriptions of other technical terms used in this article.) The goal of systems genetics is to provide an understanding of the complex relationship between the genome and disease by investigating intermediate biological processes. After investigating main effects, the first step in a systems genetics approach, as described here, is to search for gene–gene (i.e., epistatic) reactions.

Saturday, July 26, 2008

Ignoring Complexity in Genetics

Here is another of many published papers pushing a genome-wide association study (GWAS) approach to understanding the role of genetics in human health. Note the complete lack of discussion about the complexity of the genotype-to-phenotype mapping relationship. There is not one mention of epistasis, plastic reaction norms, locus heterogeneity, clinical heterogeneity, phenocopy, etc. Seems a bit odd, huh? What is the purpose of an article like this in a leading clinical journal? Haven't we yet learned our lesson about assuming simplicity in genetics and genomics?

Hunter DJ, Altshuler D, Rader DJ. From Darwin's finches to canaries in the coal mine--mining the genome for new biology. N Engl J Med. 2008 Jun 26;358(26):2760-3. [PubMed]

For a short counterpoint see "Problems with genome-wide association studies":

Shriner D, Vaughan LK, Padilla MA, Tiwari HK. Problems with genome-wide association studies. Science. 2007 Jun 29;316(5833):1840-2. [PubMed]

Williams SM, Canter JA, Crawford DC, Moore JH, Ritchie MD, Haines JL. Problems with genome-wide association studies. Science. 2007 Jun 29;316(5833):1840-2. [PubMed]

Thursday, July 24, 2008

Epistasis and Genomic Complexity

This looks like a very interesting new paper. What do you think?

Sanjuán R, Nebot MR. A network model for the correlation between epistasis and genomic complexity. PLoS ONE. 2008 Jul 16;3(7):e2663. [PLoS One]

The study of genetic interactions (epistasis) is central to the understanding of genome organization and evolution. A general correlation between epistasis and genomic complexity has been recently shown, such that in simpler genomes epistasis is antagonistic on average (mutational effects tend to cancel each other out), whereas a transition towards synergistic epistasis occurs in more complex genomes (mutational effects strengthen each other). Here, we use a simple network model to identify basic features explaining this correlation. We show that, in small networks with multifunctional nodes, lack of redundancy, and absence of alternative pathways, epistasis is antagonistic on average. In contrast, lack of multi-functionality, high connectivity, and redundancy favor synergistic epistasis. Moreover, we confirm the previous finding that epistasis is a covariate of mutational robustness: in less robust networks it tends to be antagonistic whereas in more robust networks it tends to be synergistic. We argue that network features associated with antagonistic epistasis are typically found in simple genomes, such as those of viruses and bacteria, whereas the features associated with synergistic epistasis are more extensively exploited by higher eukaryotes.
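To make the antagonistic/synergistic distinction in the abstract concrete, here is a minimal sketch. It assumes one common convention that is not spelled out in the post: fitness is measured relative to the wild type, the two single mutations are deleterious, and the no-epistasis expectation for the double mutant is the product of the single-mutant fitnesses. The function names and numbers are hypothetical and are not taken from the Sanjuán and Nebot network model.

```python
def epistasis(w_a, w_b, w_ab):
    """Deviation of double-mutant fitness from the multiplicative expectation.

    All fitness values are assumed to be relative to a wild type of 1.0.
    """
    return w_ab - w_a * w_b


def classify(w_a, w_b, w_ab, tol=1e-9):
    """Label the interaction between two deleterious mutations (assumed convention)."""
    eps = epistasis(w_a, w_b, w_ab)
    if abs(eps) <= tol:
        return "none"
    # Positive deviation: the double mutant is fitter than expected, so the
    # deleterious effects partly cancel (antagonistic). Negative deviation:
    # the effects reinforce each other (synergistic).
    return "antagonistic" if eps > 0 else "synergistic"


# Hypothetical example: each single mutation reduces fitness to 0.8,
# so the multiplicative expectation for the double mutant is 0.64.
print(classify(0.8, 0.8, 0.70))  # fitter than expected
print(classify(0.8, 0.8, 0.50))  # less fit than expected
```

Other null models (for example, additive on a log scale) are also used, so the sign of the deviation should always be read against the stated expectation.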

Wednesday, July 23, 2008

IGES Educational Session on Machine Learning

The Education Committee of the International Genetic Epidemiology Society (IGES) is offering an Educational Session on "Machine Learning Methods for Genetic Epidemiology" at their 17th annual meeting to be held in St. Louis in September. The educational session will be held on Sunday, September 14th, 2008. I will be giving a general introduction to machine learning followed by several excellent overviews of specific methods including Information Theory (Dr. McKinney), Decision Trees and Random Forests (Dr. Sun) and our own MDR method (Dr. Moore). The goal is to introduce this area of computer science to genetic epidemiologists. Software demos will be given. More information can be found on the IGES Education and Training Wiki. I hope to see you there!

Tuesday, July 22, 2008

BioData Mining has Launched

Our new BioMed Central journal 'BioData Mining' has launched. You can find the table of contents here. There is a review on neural networks and a paper on gene-environment interactions.

Monday, July 21, 2008

IGES'08 in St. Louis

The annual meeting of the International Genetic Epidemiology Society (IGES) will be held in St. Louis in September. I am serving on the Education Committee and have helped set up a new Wiki for the IGES website. The Wiki is moderated by the Education Committee. Please have a look at the Wiki and let me or any other members of the committee know if there are additions or changes you would like to make. The link is below.

IGES Education Wiki

Sunday, July 20, 2008

GECCO'08 in Atlanta

The annual Genetic and Evolutionary Computation Conference (GECCO'08) was held last week in Atlanta. This is one of my favorite conferences of the year. It is one of the few conferences where you can find a nice mix of computer scientists, engineers, economists, biologists, etc. who are all interested in learning what they can from each other in order to solve complex problems such as time series prediction in financial markets and the detection and characterization of gene-gene interactions.

This year I organized and chaired the 2nd annual Workshop on Open-Source Software for Applied Genetic and Evolutionary Computation (SoftGEC'08). The workshop was very well attended (~30 people) and there was an excellent discussion about the advantages of various open-source licenses. Please see the 2008 SoftGEC web page for the list of speakers.

I also gave the GECCO Advanced Tutorial on Bioinformatics. This was also very well attended (~40 people), which is indicative of the growing interest in the intersection between biology and computer science. I will be giving this tutorial next year at GECCO'09 in Montreal.

Dr. Clare Bates Congdon and I chaired the Bioinformatics and Computational Biology track again this year. We had many excellent contributed papers and talks including an invited session with several bioinformatics faculty from the School of Biology at Georgia Tech. This was a huge hit. I have stepped down from chairing this track after several years of service.

I am already looking forward to GECCO'09 in Montreal. The paper submission deadline is January 14th. See the call for papers here. Hope to see you in July of 2009!

Thursday, July 17, 2008

Protein-Protein Interactions

Our paper on using information about protein-protein interactions to guide a genome-wide analysis of epistasis has been published in Human Genetics. Email me if you would like the PDF.

Pattin KA, Moore JH. Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases. Hum Genet. 2008 Aug;124(1):19-29. [PubMed]


One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available.
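The prioritization idea in the abstract can be sketched roughly as follows. This is an illustration only, not the method from the paper: it assumes each SNP has already been mapped to a single gene and that protein-protein interactions are available as a list of gene pairs. The function name, SNP identifiers, and gene names are hypothetical.

```python
from itertools import combinations
from math import comb


def prioritize_snp_pairs(snp_to_gene, interacting_genes):
    """Return SNP pairs whose genes encode proteins reported to interact."""
    # Store interactions as unordered pairs so (A, B) and (B, A) both match.
    ppi = {frozenset(pair) for pair in interacting_genes}
    kept = []
    for snp_a, snp_b in combinations(sorted(snp_to_gene), 2):
        genes = frozenset((snp_to_gene[snp_a], snp_to_gene[snp_b]))
        # len(genes) == 2 skips pairs of SNPs that fall in the same gene.
        if len(genes) == 2 and genes in ppi:
            kept.append((snp_a, snp_b))
    return kept


# Hypothetical toy data: four SNPs in three genes, one known interaction.
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B", "snp4": "GENE_C"}
interacting_genes = [("GENE_A", "GENE_B")]

pairs = prioritize_snp_pairs(snp_to_gene, interacting_genes)
print(pairs)

# Scale of the exhaustive alternative: all pairwise combinations of
# 500,000 SNPs (an assumed, illustrative panel size).
print(comb(500_000, 2))
```

In the toy data, only the two SNP pairs spanning GENE_A and GENE_B survive out of six possible pairs. The trade-off is that interactions between genes with no recorded protein-protein interaction are never tested.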

Wednesday, July 09, 2008

Genetic Association Database

The Genetic Association Database (GAD) contains known genetic associations for various human diseases. In addition to single polymorphism associations, they also maintain a list of known gene-gene and gene-environment interactions. See the "Gene Interaction" link in the list of options on the left side of the main page. The list is quite long!