Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Tuesday, September 21, 2010

Machine Learning Prediction of Cancer Susceptibility

Our grant on "Machine Learning Prediction of Cancer Susceptibility" was renewed by the National Library of Medicine at the NIH for another five years of funding (R01 LM009012). This grant supports our work on the development of powerful machine learning and data mining algorithms for the detection and characterization of gene-gene interactions. Here is the project summary:

Susceptibility to sporadic forms of cancer is determined by numerous genetic factors that interact in a nonlinear manner in the context of an individual’s age and environmental exposure. This complex genetic architecture has important implications for using genome-wide association studies to identify susceptibility genes. The assumption of a simple architecture supports a strategy of testing each single-nucleotide polymorphism (SNP) individually using traditional univariate statistics followed by a correction for multiple tests. However, the complex genetic architecture characteristic of most types of cancer requires analytical methods that specifically model combinations of SNPs and environmental exposures. While novel methods are available for modeling interactions, exhaustive testing of all combinations of SNPs is not feasible on a genome-wide scale because the number of possible combinations grows combinatorially with the number of SNPs and the order of interaction, quickly becoming computationally intractable. Thus, it is critical that we develop intelligent strategies for selecting subsets of SNPs prior to combinatorial modeling. The objective of this renewal application is to continue the development of a research strategy for the detection, characterization, and interpretation of gene-gene and gene-environment interactions in genome-wide association studies of bladder cancer susceptibility. To accomplish this objective, we will continue developing and evaluating modifications and extensions to the ReliefF family of algorithms for selecting or filtering subsets of SNPs for multifactor dimensionality reduction (MDR) analysis of gene-gene and gene-environment interactions (AIM 1). We will continue developing and evaluating a stochastic wrapper or search strategy for MDR analysis of interactions that uses ReliefF values as a heuristic (AIM 2). We will continue to make ReliefF algorithms available as part of our open-source MDR software package (AIM 3).
Finally, we will apply the best ReliefF-MDR analysis strategies to the detection, characterization, and interpretation of gene-gene and gene-environment interactions in large genome-wide association studies of bladder cancer susceptibility (AIM 4). We anticipate the proposed machine learning methods will provide powerful new approaches for identifying genetic variations that are predictive of cancer susceptibility.
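To make the infeasibility of exhaustive testing concrete, the number of k-way SNP combinations is simply the binomial coefficient C(n, k). The sketch below assumes a hypothetical genome-wide panel of 500,000 SNPs; that figure and the helper name `n_combinations` are illustrative choices, not taken from the grant.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of a given interaction order."""
    return comb(n_snps, order)


# Hypothetical panel size, chosen only for illustration.
N_SNPS = 500_000
for order in (2, 3, 4):
    print(f"{order}-way: {n_combinations(N_SNPS, order):.3e} combinations")
```

With 500,000 SNPs there are 124,999,750,000 (about 1.25 × 10^11) pairs alone, and each additional order multiplies the count by roughly n/k. Filtering down to, say, a hypothetical 1,000 SNPs leaves only 499,500 pairs, which is why selecting subsets before combinatorial modeling matters.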
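For readers unfamiliar with ReliefF, here is a minimal textbook-style sketch (after Kononenko's formulation), not the implementation in the MDR software. It assumes genotypes coded 0/1/2 and treated as unordered categories, Hamming distance across all SNPs, and a binary case/control label; the function name `relieff` and its parameters are hypothetical. The key idea is that nearest neighbours are found using the whole genotype profile, so a SNP can earn weight through its interaction with other SNPs even when it has no marginal effect.

```python
import random
from typing import List, Optional, Sequence


def relieff(X: Sequence[Sequence[int]], y: Sequence[int], k: int = 10,
            m: Optional[int] = None, seed: int = 0) -> List[float]:
    """Textbook ReliefF weights for discrete features and a binary label.

    X: genotype matrix, one row per subject, SNPs coded 0/1/2 (assumed coding).
    y: class labels, e.g. 1 = case, 0 = control.
    k: number of nearest hits and nearest misses used per sampled subject.
    m: number of subjects to sample; defaults to using every subject once.
    """
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    sampled = list(range(n)) if m is None else rng.sample(range(n), m)
    weights = [0.0] * p

    def distance(a: Sequence[int], b: Sequence[int]) -> int:
        # Each SNP whose genotype differs adds 1 (Hamming distance).
        return sum(1 for u, v in zip(a, b) if u != v)

    for i in sampled:
        # Rank all other subjects by distance across ALL SNPs.
        ranked = sorted((distance(X[i], X[j]), j) for j in range(n) if j != i)
        hits = [j for _, j in ranked if y[j] == y[i]][:k]
        misses = [j for _, j in ranked if y[j] != y[i]][:k]
        for a in range(p):
            # Penalise a SNP that differs among nearby same-class subjects;
            # reward one that differs from nearby other-class subjects.
            if hits:
                weights[a] -= sum(X[i][a] != X[h][a] for h in hits) / (len(sampled) * len(hits))
            if misses:
                weights[a] += sum(X[i][a] != X[q][a] for q in misses) / (len(sampled) * len(misses))
    return weights
```

SNPs with the largest weights would then be passed on to MDR, which is the filtering role described in AIM 1.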
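The core MDR step can also be sketched briefly. As commonly described, MDR pools the multilocus genotype cells of a chosen SNP combination into high-risk and low-risk groups (a cell is high-risk when its case:control ratio exceeds the overall ratio), turning many cells into a single binary predictor. The sketch assumes a binary case/control label with at least one subject of each class; `mdr_cells` and `best_pair` are hypothetical names, not the API of the open-source MDR package, and the cross-validation used in real MDR analyses is omitted, so the scores are training balanced accuracies.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence, Set, Tuple

Cell = Tuple[int, ...]


def mdr_cells(X: Sequence[Sequence[int]], y: Sequence[int],
              snps: Sequence[int]) -> Tuple[Set[Cell], float]:
    """Label multilocus genotype cells high/low risk; return them and the balanced accuracy."""
    cases: Counter = Counter()
    controls: Counter = Counter()
    for row, label in zip(X, y):
        cell = tuple(row[s] for s in snps)
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    # High-risk if cases/controls in the cell exceeds n_case/n_ctrl;
    # cross-multiplied so cells with zero controls need no division.
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] * n_ctrl > controls[cell] * n_case}
    tp = sum(cases[c] for c in high_risk)                    # cases called high-risk
    tn = sum(controls[c] for c in set(controls) - high_risk)  # controls called low-risk
    return high_risk, (tp / n_case + tn / n_ctrl) / 2


def best_pair(X: Sequence[Sequence[int]], y: Sequence[int]) -> Tuple[Cell, float]:
    """Exhaustively score every SNP pair; only feasible for small SNP panels."""
    pairs = combinations(range(len(X[0])), 2)
    return max(((pair, mdr_cells(X, y, pair)[1]) for pair in pairs),
               key=lambda item: item[1])
```

The exhaustive loop in `best_pair` is exactly what becomes intractable genome-wide, which is why it would only be run on a filtered subset of SNPs.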
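The summary does not say which stochastic wrapper AIM 2 uses, so the following is only an illustration of the general idea: a simple random sampler, swapped in here, that draws SNP combinations with probability proportional to (shifted) ReliefF weights and keeps the best-scoring one. The function name `weighted_stochastic_search`, the shifting constant, and the pluggable `evaluate` callback (standing in for an MDR score) are all assumptions.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple


def weighted_stochastic_search(
    weights: Sequence[float],
    evaluate: Callable[[Tuple[int, ...]], float],
    order: int = 2,
    n_iter: int = 1000,
    seed: int = 0,
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Sample SNP combinations with probability guided by ReliefF weights."""
    if order > len(weights):
        raise ValueError("order cannot exceed the number of SNPs")
    rng = random.Random(seed)
    # ReliefF weights can be negative; shift them so every SNP keeps a
    # small positive chance of being sampled.
    lowest = min(weights)
    probs = [w - lowest + 1e-6 for w in weights]
    snps = range(len(weights))
    best_combo: Optional[Tuple[int, ...]] = None
    best_score = float("-inf")
    seen: Dict[Tuple[int, ...], float] = {}
    for _ in range(n_iter):
        combo = set()
        while len(combo) < order:
            combo.add(rng.choices(snps, weights=probs, k=1)[0])
        key = tuple(sorted(combo))
        if key not in seen:  # evaluate each distinct combination only once
            seen[key] = evaluate(key)
        if seen[key] > best_score:
            best_combo, best_score = key, seen[key]
    return best_combo, best_score
```

Biasing the sampler toward high-ReliefF SNPs concentrates the search budget on promising combinations while still occasionally exploring low-weight SNPs.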

Monday, September 20, 2010

Towards a complete resolution of the genetic architecture of disease

How is it possible to discuss the 'complete resolution of the genetic architecture of disease' while completely ignoring gene-gene and gene-environment interactions? I am not at all convinced, as these authors are, that a majority of the missing heritability can be explained by rare variants. I also completely disagree with the last sentence of their abstract: "Whereas major challenges undoubtedly remain, particularly regarding data handling and the functional classification of variants, we suggest that these will be largely practical and not conceptual". How is it possible that the major challenges are practical rather than conceptual when we do not yet fully understand the complexity of the human genome?

Singleton AB, Hardy J, Traynor BJ, Houlden H. Towards a complete resolution of the genetic architecture of disease. Trends Genet. 2010 Aug 31.

Abstract

After years of linear gains in the genetic dissection of human disease we are now in a period of exponential discovery. This is particularly apparent for complex disease. Genome-wide association studies (GWAS) have provided myriad associations between common variability and disease, and have shown that common genetic variability is unlikely to explain the entire genetic predisposition to disease. Here we detail how one can expand on this success and systematically identify genetic risks that lead or predispose to disease using next-generation sequencing. Geneticists have had for many years a protocol to identify Mendelian disease. A similar set of tools is now available for the identification of rare moderate-risk loci and common low-risk variants. Whereas major challenges undoubtedly remain, particularly regarding data handling and the functional classification of variants, we suggest that these will be largely practical and not conceptual.

Monday, September 13, 2010

Human Microbiome Visualization Using 3D Technology

Our paper on visualization of human microbiome data has been accepted for publication as part of the 2011 Pacific Symposium on Biocomputing (PSB). This paper describes our 3D Heatmap application that harnesses the power of 3D video game engines. The 3dheatmap software is freely available from SourceForge.net. Be sure to buy a 3D mouse!

Moore JH, Cowper Sal·lari R, Hibberd P, Hill D, Madan JC. Human microbiome visualization using 3D technology. Pacific Symposium on Biocomputing, in press (2011).

Abstract

High-throughput sequencing technology has opened the door to the study of the human microbiome and its relationship with health and disease. This is both an opportunity and a significant biocomputing challenge. We present here a 3D visualization methodology and freely-available software package for facilitating the exploration and analysis of high-dimensional human microbiome data. Our visualization approach harnesses the power of commercial video game development engines to provide an interactive medium in the form of a 3D heat map for exploration of microbial species and their relative abundance in different patients. The advantage of this approach is that the third dimension provides additional layers of information that cannot be visualized using a traditional 2D heat map. We demonstrate the usefulness of this visualization approach using microbiome data collected from a sample of premature babies with and without sepsis.