Tuned ReliefF (TuRF)
Our paper, "Tuning ReliefF for Genome-Wide Genetic Analysis," has been accepted for publication in the Lecture Notes in Computer Science (LNCS) series from Springer. It will be presented at the Evolutionary Computing, Machine Learning, and Data Mining in Bioinformatics (EvoBIO'07) Conference in Valencia, Spain, in April. Email me for a preprint.
Moore JH, White BC. Tuning ReliefF for Genome-Wide Genetic Analysis. Lecture Notes in Computer Science, in press (2007).
An important goal of human genetics is the identification of DNA sequence variations that are predictive of who is at risk for various common diseases. The focus of the present study is on the challenge of detecting and characterizing nonlinear attribute interactions or dependencies in the context of a genome-wide genetic study. The first question we address is whether the ReliefF algorithm is suitable for attribute selection in this domain. The second question we address is whether we can improve ReliefF for selecting important genetic attributes. Using simulated genetic datasets, we show that ReliefF is significantly better than a naive chi-square test of independence for selecting two interacting attributes out of 10^3 candidates. In addition, we show that ReliefF can be improved in this domain by systematically removing the worst attributes and re-estimating ReliefF weights. Our simulation studies demonstrate that this new Tuned ReliefF (TuRF) algorithm is significantly better than ReliefF. The ability to filter or select DNA sequence variations that are associated with disease class through complex nonlinear interactions will play an important role in the development of genetic models of disease risk.
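The idea of removing the worst attributes and re-estimating the weights can be illustrated with a rough sketch. This is not the code from the paper. It assumes a simplified ReliefF-style estimator for discrete attributes and a two-class outcome, and the names relief_weights, turf, simulate, frac_remove, and k are hypothetical choices made for this illustration, as are the default values and the XOR-style toy data.

```python
import random


def relief_weights(X, y, k=5):
    """Simplified ReliefF-style weights (assumes discrete attributes, two classes)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for i in range(n):
        # Hamming distance from sample i to every other sample.
        dist = sorted(
            (sum(a != b for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        hits = [j for _, j in dist if y[j] == y[i]][:k]    # nearest, same class
        misses = [j for _, j in dist if y[j] != y[i]][:k]  # nearest, other class
        for a in range(p):
            # Reward attributes that differ on misses, penalize those that differ on hits.
            if hits:
                w[a] -= sum(X[j][a] != X[i][a] for j in hits) / (len(hits) * n)
            if misses:
                w[a] += sum(X[j][a] != X[i][a] for j in misses) / (len(misses) * n)
    return w


def turf(X, y, n_keep, frac_remove=0.1, k=5):
    """Iteratively drop the lowest-weighted attributes and re-estimate the weights.

    frac_remove (fraction dropped per pass) is an assumed tuning parameter.
    Returns the surviving column indices and their final weights.
    """
    active = list(range(len(X[0])))
    while True:
        sub = [[row[a] for a in active] for row in X]
        w = relief_weights(sub, y, k)
        if len(active) <= n_keep:
            return active, w
        n_drop = min(max(1, int(frac_remove * len(active))), len(active) - n_keep)
        worst_first = sorted(range(len(active)), key=lambda c: w[c])
        dropped = set(worst_first[:n_drop])
        active = [a for c, a in enumerate(active) if c not in dropped]


def simulate(n=120, p=12, seed=0):
    """Toy data: genotypes coded 0/1/2; class is an XOR of attributes 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 2) for _ in range(p)] for _ in range(n)]
    y = [(row[0] % 2) ^ (row[1] % 2) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    selected, weights = turf(X, y, n_keep=2)
    print("selected attributes:", selected)
    print("final weights:", weights)
```

The intuition behind the re-estimation step is that noisy attributes distort the nearest-neighbor distances, so each round of removal should give the remaining attributes a cleaner weight estimate. How many attributes to drop per pass is a design choice; the 10% used here is only a placeholder.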