Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Saturday, April 11, 2009

Identification of gene-gene interactions in the presence of missing data using the multifactor dimensionality reduction method.

A new paper in Genetic Epidemiology on data imputation for MDR analysis. It is not too surprising that multivariate imputation is more powerful than the simpler approaches.

Namkung J, Elston RC, Yang JM, Park T. Identification of gene-gene interactions in the presence of missing data using the multifactor dimensionality reduction method. Genet Epidemiol. 2009 Feb 24. [Epub ahead of print] [PubMed]

Abstract

Gene-gene interaction is believed to play an important role in understanding complex traits. Multifactor dimensionality reduction (MDR) was proposed by Ritchie et al. [2001. Am J Hum Genet 69:138-147] to identify multiple loci that simultaneously affect disease susceptibility. Although the MDR method has been widely used to detect gene-gene interactions, few studies have been reported on MDR analysis when there are missing data. Currently, there are four approaches available in MDR analysis to handle missing data. The first approach uses only complete observations that have no missing data, which can cause a severe loss of data. The second approach is to treat missing values as an additional genotype category, but interpretation of the results may then be not clear and the conclusions may be misleading. Furthermore, it performs poorly when the missing rates are unbalanced between the case and control groups. The third approach is a simple imputation method that imputes missing genotypes as the most frequent genotype, which may also produce biased results. The fourth approach, Available, uses all data available for the given loci to increase power. In any real data analysis, it is not clear which MDR approach one should use when there are missing data. In this article, we consider a new EM Impute approach to handle missing data more appropriately. Through simulation studies, we compared the performance of the proposed EM Impute approach with the current approaches. Our results showed that Available and EM Impute approaches perform better than the three other current approaches in terms of power and precision.
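
To make the four existing approaches from the abstract concrete, here is a minimal Python sketch on toy data. Everything in it is an assumption for illustration: the function names, the 0/1/2 genotype coding, the use of None for a missing genotype, and the code 3 for the extra category are hypothetical and do not come from the paper or the MDR software. Case/control status is left out, and the EM Impute approach is not shown because the abstract does not describe how it works.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing genotype

# Toy data: each row is one individual's genotypes (coded 0, 1, 2) at three loci.
genotypes = [
    [0, 1, 2],
    [1, MISSING, 0],
    [0, 1, MISSING],
    [2, 1, 1],
    [MISSING, 0, 1],
]

def complete_case(rows):
    """Approach 1: keep only individuals with no missing genotype at any locus."""
    return [r for r in rows if MISSING not in r]

def missing_as_category(rows, code=3):
    """Approach 2: recode missing values as an additional genotype category."""
    return [[code if g is MISSING else g for g in r] for r in rows]

def mode_impute(rows):
    """Approach 3: replace each missing value with the most frequent
    observed genotype at that locus."""
    n_loci = len(rows[0])
    modes = []
    for j in range(n_loci):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if g is MISSING else g for j, g in enumerate(r)]
            for r in rows]

def available(rows, loci):
    """Approach 4: for the loci being tested, keep every individual
    that has observed genotypes at all of those loci."""
    return [[r[j] for j in loci]
            for r in rows
            if all(r[j] is not MISSING for j in loci)]

if __name__ == "__main__":
    print(len(complete_case(genotypes)))      # individuals left after dropping incomplete rows
    print(len(available(genotypes, [0, 1])))  # individuals usable for the locus pair (0, 1)
```

On this toy data the complete-case approach keeps two of the five individuals, while the available-data approach keeps three for the locus pair (0, 1), which illustrates why the first approach can lose so much data.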

1 Comment:

At 7:56 AM, Blogger Stephen Turner said...

Saw this a few weeks ago. If I remember the article correctly, their results show that their fancy EM method doesn't perform any better than simply using the available data, as presented in Bush et al. 2006 in Bioinformatics.

 
