Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Monday, June 12, 2006

Genome-Wide Genetic Analysis with MDR

Our invited review paper on the use of multifactor dimensionality reduction (MDR) to detect epistasis on a genome-wide scale has been accepted for publication in a new book titled "Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data" (Zhu and Davidson, editors), to be published by IGI. The original call for papers can be found here. The paper reviews the latest developments in using filter and wrapper approaches to apply MDR to datasets with thousands of SNPs. It also reviews our application of information theory and graph-based methods to the interpretation of MDR results.
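For readers new to MDR, here is a minimal Python sketch of its core step, written for this post and not taken from the paper or the MDR software. Each multilocus genotype combination is labeled high-risk or low-risk by comparing its case:control ratio to a threshold, which collapses several SNPs into one binary attribute. The function names, the 0/1/2 genotype coding, and the use of balanced accuracy as the score are all assumptions made for illustration. Cross-validation and permutation testing, which a real analysis would need, are left out.

```python
from collections import Counter
from itertools import combinations


def high_risk_cells(genotypes, labels, snps):
    """Return the multilocus genotype cells pooled into the high-risk group.

    genotypes: list of rows, each a sequence of genotype codes (assumed 0/1/2)
    labels:    list of 0 (control) / 1 (case)
    snps:      tuple of column indices defining the multilocus genotype
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    # Threshold: the overall case:control ratio in the dataset.
    threshold = n_cases / n_controls

    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snps)
        (cases if y == 1 else controls)[cell] += 1

    # A cell is high-risk when its case:control ratio exceeds the threshold
    # (written as a product to avoid dividing by zero in empty cells).
    return {c for c in set(cases) | set(controls)
            if cases[c] > threshold * controls[c]}


def balanced_accuracy(genotypes, labels, snps, high_risk):
    """Score the pooled attribute as the mean of sensitivity and specificity."""
    tp = tn = fp = fn = 0
    for row, y in zip(genotypes, labels):
        predicted = tuple(row[i] for i in snps) in high_risk
        if y == 1:
            tp, fn = tp + predicted, fn + (not predicted)
        else:
            fp, tn = fp + predicted, tn + (not predicted)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def best_model(genotypes, labels, order=2):
    """Exhaustively score every SNP combination of the given order.

    Scores are computed on the same data used to define the cells
    (no cross-validation), so they are optimistic.
    """
    scored = []
    for snps in combinations(range(len(genotypes[0])), order):
        cells = high_risk_cells(genotypes, labels, snps)
        scored.append((balanced_accuracy(genotypes, labels, snps, cells), snps))
    return max(scored)
```

The exhaustive loop in `best_model` is what becomes impractical with thousands of SNPs: the number of combinations grows combinatorially with the number of SNPs, which is why filter and wrapper strategies are needed to narrow the search.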

Moore, J.H. Genome-wide analysis of epistasis using multifactor dimensionality reduction: feature selection and construction in the domain of human genetics. In: Zhu, Davidson (eds.) Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data, IGI, in press.

Abstract

Human genetics is an evolving discipline that is being driven by rapid advances in technologies that make it possible to measure enormous quantities of genetic information. An important goal of human genetics is to understand the mapping relationship between interindividual variation in DNA sequences (i.e. the genome) and variability in disease susceptibility (i.e. the phenotype). The focus of the present study is the detection and characterization of nonlinear interactions among DNA sequence variations in human populations using data mining and machine learning methods. We first review the concept difficulty and then review a multifactor dimensionality reduction (MDR) approach that was developed specifically for this domain. We then present some ideas about how to scale the MDR approach to datasets with thousands of attributes (i.e. genome-wide analysis). Finally, we end with some ideas about how nonlinear genetic models might be statistically interpreted to facilitate making biological inferences.

This work was supported by NIH R01s AI59694 and LM009012 (PI: Moore).