Exploiting Expert Knowledge For Epistasis Analysis Using Evolutionary Computing - A Brief Summary of Work in this Area from the Computational Genetics Laboratory

We have put a lot of effort over the last several years into adapting evolutionary computing strategies to the detection and modeling of epistasis on a genome-wide scale. Our focus has been on the use of expert knowledge to help guide these stochastic search algorithms. The key to these studies is to know what your building blocks are and then exploit the hell out of them algorithmically. A good building block could be a SNP that is in a biologically important gene or for which there is good statistical evidence of association, for example. We have used expert knowledge to guide population initialization, mutation, recombination, selection, function definition, model tuning and in a multiobjective fitness function, for example. We have also developed a system that learn which source of expert knoweldge to use from multiple different options. For background reading on exploiting building blocks see David Goldberg's book on The Design of Innovation. I chronologically summarize this work and its importance here for those of you interested in this area. Please let me know if you have any questions or would like one or more reprints.

1) White, B.C., Gilbert, J.C., Reif, D.M., Moore, J.H. A statistical comparison of grammatical evolution strategies in the domain of human genetics. Proceedings of the IEEE Congress on Evolutionary Computing, 676-682 (2005).

This paper was important because it provided the baseline experiments showing that various grammatical evolution strategies work no better than random search for attribute selection when the underlying model is epistatic. This motivated our work summarized below showing how expert knowledge can be used to significantly improve evolutionary computing algorithms in this domain. This was peer-reviewed, published and presented as part of the 2005 IEEE Congress on Evolutionary Computing.

2) Moore, J.H., White, B.C. Genome-wide genetic analysis using genetic programming: The critical need for expert knowledge. In: Riolo, R., Soule, T., Worzel, B. (Eds.) Genetic Programming Theory and Practice IV, Springer, pp. 11-28 (2007).

This was our first paper showing that expert knowledge is critical for success in this domain. Here, we developed a multiobjective fitness function that combines the accuracy of a classifier and a source of statistical expert knowledge from ReliefF scores. We show that accuracy alone is not enough to beat random search. This was peer reviewed, published and presented as part of the 2006 Genetic Programming Theory and Practice (GPTP) workshop at the University of Michigan.

3) Moore, J.H., White, B.C. Exploiting expert knowledge in genetic programming for genome-wide genetic analysis. In: Runarsson et al. (eds), Lecture Notes in Computer Science 4193, 969-977 (2006).

This was our second paper showing that expert knowledge is critical for success in this domain. Here, we developed an expert knowledge-guided recombination operator that ensures models with good building blocks (e.g. SNPs) are recombined and reproduced. We showed that using expert knowledge to select and recombine models performs as well as the multiobjective fitness function described above (Moore and White 2007) but requires only a tenth of the population size. This was peer-reviewed, published and presented as part of the 2006 Parallel Programming from Nature (PPSN) conference.

4) Moore, J.H. Genome-wide analysis of epistasis using multifactor dimensionality reduction: feature selection and construction in the domain of human genetics. In: Zhu, X., Davidson, I. (Eds.), Knowledge Discovery and Data Mining: Challenges and Realities, IGI Global, pp 17-30 (2007).

This invited book chapter provides a nice review of our expert knowledge-driven evolutionary computing methods from the first three papers from above as applied to our multifactor dimensionality reduction (MDR) method.

5) Moore, J.H., Barney, N., Tsai, C.T., Chiang, F.T., Gui, J., White, B.C. Symbolic modeling of epistasis. Human Heredity 63, 120-133 (2007).

In this paper we demonstrate how knowledge gained from multiple evolutionary computing runs can be used to build a probability distribution function of model pieces that can in turn be used for fine-scale model generation. This is related to cascading and estimation of distribution algorithms. This was an important foundational paper for our computational evolution system that is described later below and is the inspiration for the archive that is built into that system and used as an important source of knowledge.

6) Greene, C.S., White, B.C., Moore, J.H. An expert knowledge-guided mutation operator for genome-wide genetic analysis using genetic programming. Lecture Notes in Bioinformatics 4774, 30-40 (2007).

Here we show how expert knowledge can be used to guide mutation in evolutionary computing. We show that using expert knowledge guided mutation performs as well as expert knowledge guided selection or multiobjective fitness functions.

7) Moore, J.H., Barney, N., White, B.C. Solving complex problems in human genetics using genetic programming: The importance of theorist-practitioner-computer interaction. In: Riolo, R., Soule, T., Worzel, B. (Eds.) Genetic Programming Theory and Practice V, pp 69-85, Springer (2008).

This paper provides an overview of our Symbolic Modeling (SyMod) software that uses genetic programming do build classification and regression models. Provided in SyMod is the ability to load and use expert knowledge for sensible initialization (presented below), multiobjective optimization and selection and recombination. This was peer reviewed, published and presented as part of the 2007 Genetic Programming Theory and Practice (GPTP) workshop at the University of Michigan.

8) Moore, J.H. The critical need for computational intelligence in human genetics. Computing Reviews, Hot Topics Essay 7, March 3 (2008).

This is a short essay that highlights the need for computational intelligence approaches such as evolutionary computing that can utilize expert knowledge.

9) Moore, J.H., Andrews, P.C., Barney, N., White, B.C. Development and evaluation of an open-ended computational evolution system for the genetic analysis of susceptibility to common human diseases. Lecture Notes in Computer Science 4973, 129-140 (2008).

The goal of this study was to develop a computational evolution system (CES) that is capable of discovering its own operators (e.g. recombination, mutation, etc.) of any size and complexity. We showed that this system is capable of disovering complex operators that can exploit expert knowledge for solving epistasis problems. This system was expanded in the next paper below.

10) Moore, J.H., Greene, C.S., Andrews, P., White, B.C. Does complexity matter? Artificial evolution, computational evolution and the genetic analysis of common human diseases. In: Riolo, R., Soule, T., Worzel, B. (Eds.) Genetic Programming Theory and Practice VI, pp. 125-143, Springer (2008).

The goal of this study was to expand our computational evolution system (CES) introduced above that is capable of discovering its own operators of any size of complexity. We again showed that this system is capable of disovering complex operators that can exploit expert knowledge for solving epistasis problems. We also introduced the concept of an archive that provides a memory for what the system has already learned. The archive provides an additional source of expert knowledge.

11) Urbanowicz, R., White, B.C., Barney, N., Moore, J.H. Mask functions for the symbolic modeling of epistasis using genetic programming. Proceedings of the Genetic and Evolutionary Computing Conference. ACM Press, pp. 339-346 (2008).

This study showed how the results of multifactor dimensionality reduction (MDR) modeling could be used for improving the ability of genetic programming to discover epistasis models. This is accomplished by providing new functions in the function set based on prior modeling results. This is an important type of statistical expert knowledge. This was peer reviewed, published and presented as part of the 2008 Genetic and Evolutionary Computing Conference (GECCO).

12) Greene, C.S., White, B.C., Moore, J.H. Ant colony optimization for genome-wide genetic analysis. Lecture Notes in Computer Science 5217, 37-47 (2008).

Ant colony optimization (ACO) is an attractive type of evolutionary computing because it inherently allows for expert knowledge (i.e. the heuristic) in the pheromone updating rule. We show that ACO works quite well in this domain and is able to exploit expert knowledge effectively. This was peer reviewed, published and presented as part of the 2008 International Conference on Ant Colony Optimization and Swarm Intelligence (ANTS).

13) Greene, C.S., Moore, J.H. Solving complex problems in human genetics using GP: Challenges and opportunities. SIGEVOlution 3, 1-7 (2008).

This is a review paper for the evolutionary computing community highlighting the important role of expert knowledge in this domain.

14) Greene, C.S., Moore, J.H. Solving complex problems in human genetics using nature-inspired algorithms requires strategies which exploit domain-specific knowledge. Nature Inspired Informatics, IGI Global, in press (2009).

This is an invited book chapter highlighting the important role of expert knowledge in this domain.

15) Moore, J.H. Mining patterns of epistasis in human genetics. Biological Data Mining, in press (2009).

This is an invited book chapter highlighting the important role of expert knowledge in this domain.

16) Greene, C.S., Gilmore, J., Kiralis, J., Andrews, P.C., Moore, J.H. Optimal use of expert knowledge in ant colony optimization for the analysis of epistasis in human disease. Lecture Notes in Computer Science, in press (2009).

As described above, ant colony optimization (ACO) is an attractive type of evolutionary computing because it inherently allows for expert knowledge (i.e. the heuristic) in the pheromone updating rule. We previously showed that ACO works quite well in this domain and is able to exploit expert knowledge effectively. The goal of this study was to optimize the use of expert knowledge. This was peer reviewed, published and presented as part of the 2009 Evolutionary Computing, Machine Learning and Data Mining in Bioinformatics (EvoBIO) conference.

17) Greene, C.S., Kiralis, J., Moore, J.H. Nature-inspired algorithms for the genetic analysis of epistasis in common human diseases: A theoretical assessment of wrapper vs. filter approaches. Proceedings of the IEEE Congress on Evolutionary Computation, in press (2009).

This is a theoretical assessment of wrapper and filter approaches. Under certain assumption filtering based on expert knowledge might be more powerful. If those assumptions are violated a wrapper approach such genetic programming might be better. This was peer reviewed, published and presented as part of the 2009 IEEE Congress on Evolutionary Computing.

18) Greene, C.S., White, B.C., Moore, J.H. Sensible initialization using expert knowledge for genome-wide analysis of epistasis using genetic programming. Proceedings of the IEEE Congress on Evolutionary Computation, in press (2009).

This paper shows how expert knowledge can be used to initialize a genetic programming population prior to the run. We show that this is an effective way to introduce expert knowledge into an evolutionary computing system. This was peer reviewed, published and presented as part of the 2009 IEEE Congress on Evolutionary Computing.

19) Greene, C.S., Hill, D., Moore, J.H., Environmental sensing of expert knoweldge in a computational evolution system for complex problem solving in human genetics. In: Riolo, R., Soule, T., Worzel, B. (Eds.) Genetic Programming Theory and Practice VII, in press, Springer (2009).

The goal of this study was to expand our computational evolution system (CES) introduced above that is capable of discovering its own operators of any size of complexity. We show here that the CES is also capable of learning to exploit one of several sources of expert knowledge to solve the epistasis problem. This is not only important for the discovery of highly fit genetic models but it also important because the particular source of expert knowledge used by evolved operators may tell us something important about the problem itself. This was peer reviewed, published and presented as part of the 2009 Genetic Programming Theory and Practice (GPTP) workshop at the University of Michigan.

Epistasis Blog

Tuesday, April 21, 2009

Exploiting Expert Knowledge in Evolutionary Computing

1 Comments:

About Me

Twitter Updates

Previous Posts