We have spent the last few months extending our Symbolic Discriminant Analysis (SDA) approach (see Moore et al., Genetic Epidemiology 23, 57-69, 2003 [PubMed]) for detecting, characterizing, and interpreting epistasis. I presented this new method this morning at the Bio-Inspired Computing in Computational Biology workshop, held in conjunction with the Parallel Problem Solving from Nature (PPSN) IX conference in Iceland. This work has been accepted for publication (pending revisions) in a special issue of Human Heredity that will focus on gene-gene and gene-environment interactions. An open-source software package for Symbolic Modeling (SyMod) is in development and will be available later this fall. Here are the title and abstract of the paper:
Symbolic Modeling of Epistasis
Jason H. Moore, Nate Barney, Chia-Ti Tsai, Fu-Tien Chiang, Bill C. White
The workhorse of modern genetic analysis is the parametric linear model. The advantages of the linear modeling framework are many and include a mathematical understanding of the model fitting process and ease of interpretation. However, an important limitation is that linear models make assumptions about the nature of the data being modeled. These assumptions may not be realistic for complex biological systems such as disease susceptibility, where nonlinearities in the genotype-to-phenotype mapping that result from epistasis, plastic reaction norms, locus heterogeneity, and phenocopy, for example, are the norm rather than the exception. We have previously developed a flexible modeling approach called symbolic discriminant analysis (SDA) that makes no assumptions about the patterns in the data. Rather, SDA lets the data dictate the size, shape, and complexity of a symbolic discriminant function that could include any set of mathematical functions from a list of candidates supplied by the user. Here, we outline a new five-step process for symbolic model discovery that uses genetic programming (GP) for coarse-grained stochastic searching, experimental design for parameter optimization, graphical modeling for generating expert knowledge, and estimation of distribution algorithms for fine-grained stochastic searching. Finally, we introduce function mapping as a new method for interpreting symbolic discriminant functions. We show that function mapping, when combined with measures of interaction information, facilitates statistical interpretation by providing a graphical approach to decomposing complex models to highlight synergistic, redundant, and independent effects of polymorphisms and their composite functions. We illustrate this five-step SDA modeling process with a real case-control dataset.
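To make the phrase "synergistic, redundant, and independent effects" a little more concrete, here is a minimal sketch of one common formulation of interaction information (McGill's measure, as popularized by Jakulin and Bratko): IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), where A and B are two attributes (polymorphisms or composite functions) and C is case-control status. We assume, but this post does not state, that the paper uses this formulation. Positive values suggest synergy, negative values suggest redundancy, and values near zero suggest the attributes contribute independently. The function names and the toy binary data are hypothetical illustrations, not part of SyMod, and the plug-in entropy estimates here ignore the small-sample bias a real analysis would have to handle.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with the joint formed by pairing observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(a, b, c):
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C).

    > 0: A and B are synergistic with respect to C
    < 0: A and B carry redundant information about C
    ~ 0: A and B contribute independently
    """
    ab = list(zip(a, b))  # treat the genotype pair as a single combined attribute
    return mutual_information(ab, c) - mutual_information(a, c) - mutual_information(b, c)


if __name__ == "__main__":
    # Hypothetical toy data: two binary attributes and a binary case-control label.
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]

    # XOR-like (purely epistatic) pattern: neither attribute alone is informative.
    print(interaction_information(a, b, [0, 1, 1, 0]))  # 1.0 (synergy)
    # Both attributes duplicate the same signal.
    print(interaction_information(a, a, a))  # -1.0 (redundancy)
    # Only attribute a matters; b adds nothing.
    print(interaction_information(a, b, a))  # 0.0 (independence)
```

The XOR case is the textbook example of why a main-effects linear model can miss epistasis: each attribute alone has zero mutual information with the label, yet together they determine it completely, which shows up as a full bit of positive interaction information.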