Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Wednesday, December 20, 2006

Software Update

We have been busy preparing two new open-source software packages for alpha testing. Both are being internally tested now and should be ready for external alpha testing in January. I will post to this blog when they are ready. You can also check www.epistasis.org, www.symbolicmodeler.org and www.exploratoryvisualanalysis.org for updates.

Symbolic Modeler (SyMod)

SyMod is an open-source software package for symbolic modeling that can be used for both discrete and continuous endpoints. Its main advantage is that it makes no assumptions about the functional form of the model. The user supplies a list of attributes and a list of mathematical functions (e.g. +, -, *, /, <, >, =, AND, OR, NOT, LOG, Min, Max, etc.) that the software uses as building blocks to construct symbolic discriminant functions (for discrete endpoints) or symbolic regression functions (for continuous endpoints). SyMod uses a stochastic search algorithm called genetic programming to identify the optimal symbolic model(s). Another advantage of SyMod is the ability to load and use expert knowledge to help guide the search. This feature will be included in the first alpha release and is being tested now. Keep an eye out for our upcoming paper, "Symbolic modeling of epistasis" by Moore et al., which will appear in an early 2007 issue of Human Heredity. This paper outlines an important five-step process for symbolic modeling.
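To give a feel for what "building blocks" and genetic programming mean here, the following is a minimal Python sketch of symbolic regression for a continuous endpoint. It is not SyMod's code or interface: the function set, the tuple-based tree representation, the names (random_tree, evaluate, evolve, etc.), the selection scheme, and the toy data are all assumptions made for illustration, and expert knowledge and discrete endpoints are left out.

```python
import operator
import random

# Building blocks: name -> (callable, arity). An illustrative subset only.
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "min": (min, 2),
    "max": (max, 2),
}

def random_tree(attributes, rng, depth=3):
    """Grow a random tree: a leaf is an attribute name, a node is (fn, child, child)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(attributes)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][1]
    return (name,) + tuple(random_tree(attributes, rng, depth - 1) for _ in range(arity))

def evaluate(tree, row):
    """Evaluate a tree on one data row (a dict of attribute -> value)."""
    if isinstance(tree, str):
        return row[tree]
    fn = FUNCTIONS[tree[0]][0]
    return fn(*(evaluate(child, row) for child in tree[1:]))

def error(tree, rows, target):
    """Sum of squared errors between the model and the endpoint."""
    return sum((evaluate(tree, r) - r[target]) ** 2 for r in rows)

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, tree
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(tree, attributes, rng):
    """Replace a random subtree with a freshly grown one."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(attributes, rng, depth=2))

def evolve(rows, attributes, target, generations=30, pop_size=60, max_nodes=31, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(attributes, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: error(t, rows, target))
        parents = pop[: pop_size // 4]  # truncation selection; survivors carry over unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.2:
                child = mutate(child, attributes, rng)
            if sum(1 for _ in subtrees(child)) <= max_nodes:  # crude size cap against bloat
                children.append(child)
        pop = parents + children
    return min(pop, key=lambda t: error(t, rows, target))

# Toy data (made up): the endpoint z is generated as x + y*y.
rows = [{"x": x, "y": y, "z": x + y * y} for x in range(4) for y in range(4)]
best = evolve(rows, ["x", "y"], "z")
print(best, error(best, rows, "z"))
```

The search only ever sees the attribute names and the function set; nothing about the shape of the final expression is fixed in advance, which is the point of the approach. Because the search is stochastic, a different seed may return a different model.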

Exploratory Visual Analysis (EVA)

EVA is a database and GUI for storing, managing, and visualizing statistical analysis results. The database stores p-values and other statistics along with annotation from NCBI and Ensembl about Gene Ontology, biochemical pathway, chromosomal location, etc. The GUI allows the user to visually explore the statistical results in real time in the context of the biological knowledge about each gene. The goal of EVA is to facilitate the identification of biologically meaningful patterns in statistical results, patterns that are nearly impossible to spot simply by browsing an Excel spreadsheet with 40,000 or more p-values. We have had a prototype of EVA for several years and will soon be releasing an open-source Java version. If you are interested, you might read several recent papers on EVA, including our paper at the Pacific Symposium on Biocomputing in 2005 and our 2006 paper in Oncology Reports.
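As a rough illustration of the idea of storing p-values next to annotation and then asking questions of both together, here is a small Python sketch using an in-memory SQLite database. It is not EVA's schema or code: the table names, column names, the significant_by_term helper, the gene and pathway labels, and the p-values are all hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table layout: one row per gene result, many annotations per gene.
conn.executescript("""
CREATE TABLE result (gene TEXT PRIMARY KEY, p_value REAL NOT NULL);
CREATE TABLE annotation (gene TEXT NOT NULL, source TEXT NOT NULL, term TEXT NOT NULL);
""")

# Made-up values for illustration only.
conn.executemany("INSERT INTO result VALUES (?, ?)", [
    ("GENE_A", 0.001),
    ("GENE_B", 0.03),
    ("GENE_C", 0.40),
    ("GENE_D", 0.02),
])
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
    ("GENE_A", "pathway", "pathway_1"),
    ("GENE_B", "pathway", "pathway_1"),
    ("GENE_C", "pathway", "pathway_2"),
    ("GENE_D", "pathway", "pathway_2"),
    ("GENE_A", "chromosome", "chr1"),
    ("GENE_B", "chromosome", "chr2"),
])

def significant_by_term(conn, source, alpha=0.05):
    """Count genes with p < alpha within each annotation term from one source."""
    query = """
        SELECT a.term, COUNT(*) AS n_significant, MIN(r.p_value) AS best_p
        FROM result r JOIN annotation a ON a.gene = r.gene
        WHERE a.source = ? AND r.p_value < ?
        GROUP BY a.term
        ORDER BY n_significant DESC, best_p ASC
    """
    return conn.execute(query, (source, alpha)).fetchall()

for term, n, best_p in significant_by_term(conn, "pathway"):
    print(term, n, best_p)
```

Grouping significant results by pathway, Gene Ontology term, or chromosomal region in this way is the kind of pattern that is hard to see when scrolling through a flat spreadsheet of p-values.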

