Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Sunday, December 23, 2007

An Open-Ended Computational Evolution System for the Genetic Analysis of Epistasis

Our paper on the development and evaluation of a prototype computational evolution system for epistasis analysis has been accepted for oral presentation at the EvoBIO'08 conference in Naples, Italy, in March 2008. The peer-reviewed paper will be published by Springer in Lecture Notes in Computer Science. Email me after Jan. 1st for a preprint. A complete list of accepted papers for the conference can be found here. This paper was inspired by the Banzhaf et al. review that distinguishes 'artificial evolution' from 'computational evolution', and it fits with our theme of using expert knowledge to guide stochastic search algorithms for genetic analysis. I have included Figure 1 below the abstract.

Moore, J.H., Andrews, P.C., Barney, N., White, B.C. Development and Evaluation of an Open-Ended Computational Evolution System for the Genetic Analysis of Susceptibility to Common Human Diseases. Lecture Notes in Computer Science, in press (2008).

Abstract. An important goal of human genetics is to identify DNA sequence variations that are predictive of susceptibility to common human diseases. This is a classification problem with data consisting of discrete attributes and a binary outcome. A variety of different machine learning methods based on artificial evolution have been developed and applied to modeling the relationship between genotype and phenotype. While artificial evolution approaches show promise, they are far from perfect and are only loosely based on real biological and evolutionary processes. It has recently been suggested that a new paradigm is needed where ‘artificial evolution’ is transformed to ‘computational evolution’ by incorporating more biological and evolutionary complexity into existing algorithms. It has been proposed that computational evolution systems will be more likely to solve problems of interest to biologists and biomedical researchers. The goal of the present study was to develop and evaluate a prototype computational evolution system for the analysis of human genetics data. We describe here this new open-ended computational evolution system and provide initial results from a simulation study that suggest more complex operators result in better solutions. This study represents a first step towards the use of computational evolution for bioinformatics problem-solving in the domain of human genetics.

Figure 1. Visual overview of our prototype computational evolution system for discovering symbolic discriminant functions that differentiate disease subjects from healthy subjects using information about single nucleotide polymorphisms (SNPs). The hierarchical structure is shown on the left while some specific examples at each level are shown on the right. The top two levels of the hierarchy (A and B) exist to generate variability in the operators that modify the solutions. Shown in C is an example set of operators that will perform recombination on the two solutions shown in D. As illustrated in B, there is a 0.50 probability that a mutation to the recombination operator in C will add an operator, thus making this particular operator more complex. This system allows operators of any arbitrary complexity to modify solutions. Note that we used a 24x24 grid of solutions in the present study. A 12x12 grid is shown as an illustrative example.
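To make the idea of operators that grow in complexity a little more concrete, here is a minimal Python sketch. It is not the code from the paper. It assumes an operator can be represented as a list of primitive steps, and the primitive names, the function name mutate_operator and the "swap one step" branch are all hypothetical choices made for illustration. The only detail taken from the caption is the 0.50 probability that a mutation adds a step.

```python
import random

# Hypothetical primitive building blocks; the names are illustrative only.
PRIMITIVES = ["ADD", "DELETE", "COPY"]


def mutate_operator(operator, rng, p_add=0.50):
    """Return a mutated copy of an operator (a list of primitive steps).

    With probability p_add a primitive is appended, which makes the
    operator more complex. Otherwise one existing primitive is swapped
    for a randomly chosen one (an assumption of this sketch).
    """
    new_op = list(operator)
    # An empty operator can only grow, so force the "add" branch for it.
    if not new_op or rng.random() < p_add:
        new_op.append(rng.choice(PRIMITIVES))
    else:
        i = rng.randrange(len(new_op))
        new_op[i] = rng.choice(PRIMITIVES)
    return new_op


# Usage: repeatedly mutate a one-step operator and watch it grow.
rng = random.Random(0)
operator = ["COPY"]
for _ in range(10):
    operator = mutate_operator(operator, rng)
print(len(operator), operator)
```

Because nothing in this sketch ever shortens an operator, its length can only stay the same or increase, which is the open-ended aspect the caption describes.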

Saturday, December 22, 2007

Epistasis, Genetic Heterogeneity and Alzheimer Disease

Our paper on detecting epistasis in the presence of genetic heterogeneity is about to appear in Genetic Epidemiology. This paper is a nice example of how cluster analysis can be used to sort subjects into more genetically homogeneous groups prior to association analysis. Dr. Tricia Thornton-Wells is the lead author and completed this work as part of her dissertation with Dr. Jonathan Haines at Vanderbilt. She is now a postdoc at Vanderbilt.

Thornton-Wells TA, Moore JH, Martin ER, Pericak-Vance MA, Haines JL. Confronting complexity in late-onset Alzheimer disease: application of two-stage analysis approach addressing heterogeneity and epistasis. Genet Epidemiol. 2007 Dec 12; [Epub ahead of print] [PubMed]

Common diseases with a genetic basis are likely to have a very complex etiology, in which the mapping between genotype and phenotype is far from straightforward. A new comprehensive statistical and computational strategy for identifying the missing link between genotype and phenotype has been proposed, which emphasizes the need to address heterogeneity in the first stage of any analysis and gene-gene interactions in the second stage. We applied this two-stage analysis strategy to late-onset Alzheimer disease (LOAD) data, which included functional and positional candidate genes and markers in a region of interest on chromosome 10. Bayesian classification found statistically significant clusterings for independent family-based and case-control datasets, which used the same five markers in leucine-rich repeat transmembrane neuronal 3 (LRRTM3) as the most influential in determining cluster assignment. In subsequent analyses to detect main effects and gene-gene interactions, markers in three genes, urokinase-type plasminogen activator (PLAU), angiotensin 1 converting enzyme (ACE) and cell division cycle 2 (CDC2), were found to be associated with LOAD in particular subsets of the data based on their LRRTM3 multilocus genotype. All of these genes are viable candidates for LOAD based on their known biological function, even though PLAU, CDC2 and LRRTM3 were initially identified as positional candidates. Further studies are needed to replicate these statistical findings and to elucidate possible biological interaction mechanisms between LRRTM3 and these genes.
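The two-stage idea in the abstract (first sort subjects into more homogeneous groups, then test for association within each group) can be sketched in a few lines of Python. This is only an illustration under strong assumptions. A plain group label stands in for the Bayesian classification used in the paper, a Pearson chi-square on a 2x2 carrier-by-status table stands in for the actual association and interaction tests, and the data and the names two_stage and chi_square_2x2 are made up.

```python
from collections import defaultdict


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def two_stage(subjects):
    """subjects: iterable of (group_label, is_carrier, is_case) tuples."""
    # Stage 1: split subjects into (hopefully) more homogeneous groups.
    groups = defaultdict(list)
    for label, is_carrier, is_case in subjects:
        groups[label].append((is_carrier, is_case))

    # Stage 2: test marker/disease association separately within each group.
    results = {}
    for label, members in groups.items():
        a = sum(1 for carrier, case in members if carrier and case)
        b = sum(1 for carrier, case in members if carrier and not case)
        c = sum(1 for carrier, case in members if not carrier and case)
        d = sum(1 for carrier, case in members if not carrier and not case)
        results[label] = chi_square_2x2(a, b, c, d)
    return results


# Made-up data: the marker matters in group G1 but not in group G2.
toy = (
    [("G1", True, True)] * 30 + [("G1", True, False)] * 10
    + [("G1", False, True)] * 10 + [("G1", False, False)] * 30
    + [("G2", True, True)] * 20 + [("G2", True, False)] * 20
    + [("G2", False, True)] * 20 + [("G2", False, False)] * 20
)
stratified = two_stage(toy)
pooled = two_stage([("all", carrier, case) for _, carrier, case in toy])
print(stratified, pooled)
```

In this toy data set the pooled statistic is smaller than the G1 statistic, because the G2 subjects dilute the signal. That is the motivation for dealing with heterogeneity before testing for association.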

Friday, December 07, 2007

Epistasis in Schizophrenia

This looks like an interesting new paper. I haven't read it yet but it looks like they carry out an extensive epistasis analysis. Should they have conditioned on main effects? Did they consider epistasis in absence of significant main effects? How much of the genetic architecture for this pathway are they revealing? The follow-up with functional studies looks nice.

Talkowski ME, Kirov G, Bamne M, Georgieva L, Torres G, Mansour H, Chowdari KV, Milanova V, Wood J, McClain L, Prasad K, Shirts B, Zhang J, O'Donovan MC, Owen MJ, Devlin B, Nimgaonkar VL.

Department of Psychiatry, University of Pittsburgh, School of Medicine, Pittsburgh, PA, USA.

Hum Mol Genet. 2007 Nov 27 [Epub ahead of print]

A network of dopaminergic gene variations implicated as risk factors for schizophrenia.

We evaluated the hypothesis that dopaminergic polymorphisms are risk factors for schizophrenia. Stage I (screening): Eighteen dopamine-related genes were analyzed in two independent US Caucasian samples: 150 trios and 328 cases / 501 controls. The most promising associations were detected with SLC6A3 (alias DAT), DRD3, COMT, and SLC18A2 (alias VMAT2). Stage II (SNP coverage and epistasis): To comprehensively evaluate these four genes, 68 SNPs were genotyped in all 478 cases and 501 controls from stage I. Fifteen (23.1%) significant associations were found (p < 0.05). We tested for epistasis between pairs of SNPs providing main effects and observed 17 significant interactions (169 tests); 41.2% of significant interactions involved rs3756450 (5' near promoter) or rs464049 (intron 4) at SLC6A3. Stage III (confirmation): Sixty-five SNPs were genotyped in 659 Bulgarian trios. Both SLC6A3 variants implicated in the US interactions were over-transmitted in this cohort (rs3756450, p = 0.035; rs464049, p = 0.011). Joint analyses from stages II and III identified associations at all four genes (p(joint) < 0.05). We tested 29 putative interactions from stage II and detected replication between 7 locus pairs (p < 0.05). Simulations suggested our stage II and stage III interaction results were unlikely to have occurred by chance (p = 0.008 and 0.001, respectively). Stage IV (function): We tested rs464049 and rs3756450 for functional effects and found significant allele specific differences at rs3756450 using EMSA and dual-luciferase promoter assays. Conclusions: Our data suggest a network of dopaminergic polymorphisms increase risk for schizophrenia.
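As a rough sanity check on the stage II interaction counts, one can ask how many of 169 tests would be expected to reach p < 0.05 by chance alone. The Python sketch below does that arithmetic. It assumes the 169 tests are independent, which is unlikely to hold here because SNPs within the same gene are often correlated, so it is no substitute for the simulations the abstract reports. The function name binomial_tail is my own, and reading 41.2% of 17 as 7 interactions is an inference from the numbers in the abstract.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests = 169        # pairwise interaction tests reported for stage II
n_significant = 17   # interactions reported as significant at p < 0.05
alpha = 0.05

# Expected number of false positives if every null hypothesis were true
# and the tests were independent (an assumption of this sketch): about 8.45.
expected_by_chance = n_tests * alpha

# Probability of seeing 17 or more significant tests under the same assumption.
tail = binomial_tail(n_tests, n_significant, alpha)

# 41.2% of the 17 significant interactions corresponds to 7 of them.
share_slc6a3 = 7 / n_significant

print(expected_by_chance, tail, round(100 * share_slc6a3, 1))
```

Seventeen significant interactions is about twice the number expected by chance under these simplifying assumptions, which is consistent in direction with the authors' simulation result.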