Maximal conditional chi-square importance in random forests
Interesting new paper. Nice to see the conditioning on other SNPs.
Wang M, Chen X, Zhang H. Maximal conditional chi-square importance in random forests. Bioinformatics. 2010. [PubMed]
MOTIVATION: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features such as single nucleotide polymorphisms (SNPs) in GWAS that are associated with a disease. Random forests represent a very useful approach for this purpose, using a variable importance score. This importance score has several shortcomings. We propose an alternative importance measure to overcome those shortcomings. RESULTS: We characterized the effect of multiple SNPs under various models using our proposed importance measure in random forests, which uses maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical p-values of SNPs. Our method was compared to a univariate test and the permutation test using the Gini and permutation importance. In simulation, the proposed method performed consistently superior to the other methods in identifying of risk SNPs. In a genome-wide association study of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genomewide adjusted level of 0.05). Further analysis showed that these two SNPs conformed with a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs by utilizing conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and facilitates the understanding of the etiology between genetic variants and complex diseases. CONTACT: email@example.com.
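To make the "conditional chi-square plus permutation test" idea a bit more concrete, here is a rough Python sketch. It is not the authors' method: the paper computes MCC inside a random forest, whereas this toy version simply stratifies on one other SNP at a time, sums the within-stratum chi-square statistics, and takes the maximum over conditioning SNPs. The function names (`chi_square`, `conditional_chi_square`, `max_conditional_chi_square`, `permutation_p_value`), the 0/1/2 genotype coding, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def chi_square(x, y):
    """Pearson chi-square statistic for two categorical arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    if len(xs) == 0 or len(ys) == 0:
        return 0.0
    # Build the contingency table from the observed categories only,
    # so every row and column total is positive.
    table = np.zeros((len(xs), len(ys)))
    np.add.at(table, (x_idx, y_idx), 1)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def conditional_chi_square(snp, trait, other):
    """Chi-square of snp vs trait, summed over genotype strata of `other`."""
    return sum(
        chi_square(snp[other == g], trait[other == g]) for g in np.unique(other)
    )


def max_conditional_chi_square(genotypes, trait, j):
    """Simplified stand-in for an MCC-style score (assumption, not the paper's)."""
    others = [k for k in range(genotypes.shape[1]) if k != j]
    return max(
        conditional_chi_square(genotypes[:, j], trait, genotypes[:, k])
        for k in others
    )


def permutation_p_value(genotypes, trait, j, n_perm=99, seed=0):
    """Empirical p-value for SNP j obtained by permuting the trait labels."""
    rng = np.random.default_rng(seed)
    observed = max_conditional_chi_square(genotypes, trait, j)
    exceed = sum(
        max_conditional_chi_square(genotypes, rng.permutation(trait), j) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical toy data: 300 subjects, 4 SNPs coded 0/1/2.
    genotypes = rng.integers(0, 3, size=(300, 4))
    # Trait driven entirely by SNP 0 (carrier vs non-carrier).
    trait = (genotypes[:, 0] > 0).astype(int)
    for j in range(genotypes.shape[1]):
        print(j, permutation_p_value(genotypes, trait, j))
```

Permuting the trait breaks any SNP-trait association while keeping the correlation structure among SNPs intact, which is why the permuted scores can serve as a null distribution for the observed score.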