Epistasis Blog

From the Computational Genetics Laboratory at the University of Pennsylvania (www.epistasis.org)

Wednesday, January 06, 2010

New Papers for EvoBIO'10 and EvoCOMPLEX'10

We have four new papers that have been accepted for publication and presentation as part of the EvoBIO'10 and EvoCOMPLEX'10 conferences in Istanbul, Turkey. I hope to see you there!

Payne, J.L., Moore, J.H. Sexual Recombination in Self-Organizing Interaction Networks. Lecture Notes in Computer Science, in press (2010). EvoCOMPLEX'10


We build on recent advances in the design of self-organizing interaction networks by introducing a sexual variant of an existing asexual, mutation-limited algorithm. Both the asexual and sexual variants are tested on benchmark optimization problems with varying levels of problem difficulty, deception, and epistasis. Specifically, we investigate algorithm performance on Massively Multimodal Deceptive Problems and NK Landscapes. In the former case, we find that sexual recombination improves solution quality for all problem instances considered; in the latter case, sexual recombination only improves solution quality for problem instances with intermediate levels of epistasis. We conclude that sexual recombination in self-organizing interaction networks may improve solution quality in problem domains with deception or a moderate degree of epistatic interactions.
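For readers who have not met NK landscapes before: N is the genome length and K is the number of other loci that each locus's fitness contribution depends on, so raising K raises the level of epistasis. Here is a minimal sketch of such a landscape. It is not the benchmark code from the paper; the function name `make_nk_landscape`, the adjacent-neighbor wiring, and the seed are assumptions made for illustration.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Build a random NK landscape (illustrative sketch only).

    Assumption: locus i interacts with the next k loci, wrapping around
    the end of the genome. Returns a function that maps a sequence of
    n bits to a fitness value in [0, 1].
    """
    rng = random.Random(seed)
    # One lookup table per locus, with a random contribution for each
    # of the 2**(k+1) states of the locus and its k neighbors.
    tables = [
        {state: rng.random()
         for state in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genome):
        total = 0.0
        for i in range(n):
            local = tuple(genome[(i + j) % n] for j in range(k + 1))
            total += tables[i][local]
        return total / n  # mean contribution across loci

    return fitness
```

With k = 0 every locus contributes independently, so the landscape is purely additive. As k grows, flipping one bit changes the contribution of k other loci as well, which is what makes the landscape rugged.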

Greene, C.S., Himmelstein, D.S., Moore, J.H. A Model Free Method to Generate Human Genetics Datasets with Complex Gene-Disease Relationships. Lecture Notes in Computer Science, in press (2010). EvoBIO'10


A goal of human genetics is to discover genetic factors that influence individuals’ susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variations and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. These methods have so far been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate six hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variations and of pairs of genetic variations is minimized, while the predictiveness of third, fourth, or fifth order combinations is maximized. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This could improve our ability to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 56,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
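The idea of a Pareto front over two competing objectives can be sketched in a few lines. This is only an illustration of non-dominated filtering, not the evolution strategy from the paper. The function name `pareto_front` and the representation of each candidate dataset as a `(low_order, high_order)` score pair are assumptions: the first score is to be minimized and the second maximized.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates (illustrative sketch only).

    Each candidate is a hypothetical (low_order, high_order) pair:
    low_order  = predictiveness of low-order effects, to be minimized
    high_order = predictiveness of the high-order combination, to be maximized
    """
    def dominates(a, b):
        # a dominates b if it is no worse on both objectives
        # and strictly better on at least one.
        no_worse = a[0] <= b[0] and a[1] >= b[1]
        better = a[0] < b[0] or a[1] > b[1]
        return no_worse and better

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

For example, given the candidates `(0.5, 0.9)`, `(0.6, 0.8)`, `(0.4, 0.7)`, and `(0.5, 0.7)`, only `(0.5, 0.9)` and `(0.4, 0.7)` remain on the front, because each of the other two is beaten on both objectives by some candidate.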

Greene, C.S., Himmelstein, D.S., Kiralis, J., Moore, J.H. The Informative Extremes: Using Both Nearest and Farthest Individuals Can Improve Relief Algorithms in the Domain of Human Genetics. Lecture Notes in Computer Science, in press (2010). EvoBIO'10


A primary goal of human genetics is the discovery of genetic factors that influence individual susceptibility to common human diseases. This problem is difficult because common diseases are likely the result of joint failure of two or more interacting components instead of single component failures. Efficient algorithms that can detect interacting attributes are needed. The Relief family of machine learning algorithms, which use nearest neighbors to weight attributes, is a promising approach. Recently an improved Relief algorithm called Spatially Uniform ReliefF (SURF) has been developed that significantly increases the ability of these algorithms to detect interacting attributes. Here we introduce an algorithm called SURF* which uses distant instances along with the usual nearby ones to weight attributes. The weighting depends on whether the instances are nearby or distant. We show this new algorithm significantly outperforms both ReliefF and SURF for genetic analysis in the presence of attribute interactions. We make SURF* freely available in the open-source MDR software package. MDR is a cross-platform Java application which features a user-friendly graphical interface.
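To make the near/far idea concrete, here is a rough sketch of a Relief-style weight update that uses both nearby and distant instances. It is not the MDR implementation. The function name `surf_star_weights`, the use of the mean pairwise distance as the near/far threshold, and the sign flip applied to distant pairs are my assumptions about how such a scheme could work; the abstract states only that the weighting depends on whether instances are nearby or distant.

```python
def surf_star_weights(genotypes, labels):
    """Relief-style attribute weights from near and far pairs (sketch only).

    genotypes: list of equal-length tuples of discrete genotype codes
    labels:    list of class labels, one per instance
    Assumptions: pairs closer than the mean pairwise distance are "near",
    pairs farther than it are "far", and far pairs are scored with the
    opposite sign to near pairs.
    """
    n, m = len(genotypes), len(genotypes[0])

    def distance(a, b):
        # Number of attributes on which two instances differ.
        return sum(x != y for x, y in zip(a, b))

    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dists = {p: distance(genotypes[p[0]], genotypes[p[1]]) for p in pairs}
    threshold = sum(dists.values()) / len(pairs)

    weights = [0.0] * m
    for (i, j), d in dists.items():
        if d == threshold:
            continue  # neither near nor far
        near_sign = 1.0 if d < threshold else -1.0
        # Near misses reward differing attributes, near hits penalize them.
        miss_sign = 1.0 if labels[i] != labels[j] else -1.0
        for a in range(m):
            if genotypes[i][a] != genotypes[j][a]:
                weights[a] += near_sign * miss_sign
    return weights
```

An attribute that never varies across instances keeps a weight of zero under this scheme, since it never differs within any pair.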

Penrod, N.M., Greene, C.S., Granizo-MacKenzie, D., Moore, J.H. Artificial Immune Systems for Epistasis Analysis in Human Genetics. Lecture Notes in Computer Science, in press (2010). EvoBIO'10


Modern genotyping techniques have allowed the field of human genetics to generate vast amounts of data, but analysis methodologies have not been able to keep pace with this increase. In order to allow personal genomics to play a vital role in modern health care, analysis methods capable of discovering high-order interactions that contribute to an individual’s risk of disease must be developed. An artificial immune system (AIS) is a method which maps well to this problem and has a number of appealing properties. By considering many attributes simultaneously, it may be able to effectively and efficiently detect epistasis, that is, non-additive gene-gene interactions. This situation of interacting genes is currently very difficult to detect without biological insight or statistical heuristics. Even these approaches have trouble distinguishing genetic signal from noise at low heritability. The AIS also has a compact solution representation which can be rapidly evaluated. Finally, the AIS approach, by iteratively developing an antibody which ignores irrelevant genotypes, may be better able to differentiate signal from noise than machine learning approaches like ReliefF, which struggle at low heritabilities. Here we develop a basic AIS and evaluate it on very low heritability datasets. We find that the basic AIS is not robust to parameter settings but that, at some parameter settings, it performs very effectively. We use the settings where the strategy succeeds to suggest a path towards a robust AIS for human genetics. Developing an AIS which succeeds across many parameter settings will be critical to prepare this method for widespread use.

