Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Tuesday, October 31, 2006

SyMod: Open-Source Software for Symbolic Modeling

We are in the process of testing our new SyMod software package. We plan to release a first beta version on Sourceforge.net for public testing within the next 2-4 weeks. Check back here for updates. A short description of SyMod can be found at http://www.epistasis.org/software.html.

A new paper describing a 5-step process for implementing the symbolic discriminant analysis (SDA) component of SyMod for epistasis modeling will appear in Human Heredity in early 2007. Email me if you want to see a preprint.

Development and distribution of SyMod is supported primarily by NIH R01 AI59694.

Monday, October 30, 2006

GECCO 2007

I will be chairing (with Clare Congdon) the Biological Applications track of the 2007 Genetic and Evolutionary Computation Conference (GECCO) in London. Papers for the conference are due on January 17th. I will post a more formal call for papers later. I look forward to this particular conference every year. It is a nice blend of machine learning, data mining and artificial life with a growing focus on real-world problems such as those in genetics and bioinformatics. I will also be giving a tutorial on bioinformatics at the conference.

Note that there will be a public debate on Complexity and Evolution for one of the keynote events at GECCO. Click here for more information.

Thursday, October 26, 2006

EvoBIO 2007 *** Deadline Extended ***

The paper submission deadline for the EvoBIO 2007 conference that I mentioned in my last post has been extended to November 10th. For more information about EvoBIO and the other conferences held at the same time please click here.

Sunday, October 22, 2006

EvoBIO 2007

I am co-chairing the 5th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO). EvoBIO 2007 will be held in Valencia, Spain on April 11-13, 2007. Please note that the paper deadline for EvoBIO 2007 is Nov. 1st, 2006. Please see the conference web site (here) for more details. All papers developing or applying computational methods for genetic analysis are welcome.

Saturday, October 21, 2006

Required Reading for Graduate Students

I am assembling a list of the top 100 papers that a graduate student in statistical and computational genetics should read before they go out into the real world. I would be very interested to hear your opinions about which papers are critical to provide both a historical perspective and requisite knowledge. I will post this list here when it is completed. Please post or email me your suggestions. Thanks! Jason

Friday, October 20, 2006

MDR Applications

I have updated the list of papers that apply our multifactor dimensionality reduction (MDR) method and software to detecting gene-gene and gene-environment interactions in studies of human health and disease. The updated list can be found here in my May 29th (2006) blog post. If you know of any papers that have been left off, please let me know.

Click here to carry out a PubMed search for MDR publications.

Click here to Google MDR.

Click here to Google MDR in Google Scholar.

Sunday, October 15, 2006

MDR on Sourceforge.net

Our multifactor dimensionality reduction (MDR) software project on Sourceforge.net is ranked #26 out of 882 bioinformatics applications. The MDR software has been downloaded from the site approximately 6,700 times since Feb. of 2005.

You can download MDR 1.0 from here.

Saturday, October 14, 2006

ASHG 2006

I just returned from the 2006 meeting of the American Society of Human Genetics (ASHG) in New Orleans. I was pleased to see an increase in the number of abstracts that report studies of epistasis. However, only 22 of the 2390 abstracts (about 0.9%) contain the word epistasis, which is very small considering the importance of epistasis in the genetic architecture of common diseases. You can find these by searching the PDF file with all 2390 abstracts. You can find our abstracts by searching for "j.h. moore".

The most interesting abstract and poster that I saw while I was there is copied below. This work was inspired by the winner's curse that is observed in auctions. I am a big fan of stealing good ideas from other disciplines.

2270/A
Correcting for the “winner’s curse” in genetic association studies. R. Xiao, M. Boehnke. Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI.
Studies of gene-disease association are now commonly used to localize genetic loci that impact disease susceptibility. It is also of interest to estimate the genetic effect of each identified locus. It is known that the initial positive findings of the genetic effect estimate tend to be upwardly biased, a phenomenon known as the “winner’s curse”. In our study, we model the winner’s curse in the context of case-control genetic association studies. We quantify its impact on the naïve estimators of the allele frequency difference between cases and controls as a function of several factors including sample size, minor allele frequency in controls and cases, and the chosen statistical significance level. We also propose a maximum likelihood method to improve the estimate of the allele frequency difference corrected for the ascertainment. Initial analytical and simulation results indicate that our method substantially reduces the observed overestimation, allowing better estimation of locus-specific effect, and more appropriate design for follow up studies.
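
For readers who want to see the winner's curse for themselves, here is a minimal simulation sketch in Python. It is not the maximum likelihood correction proposed by Xiao and Boehnke; it only illustrates the upward bias in the naive allele frequency difference when we look only at studies that reach significance. The allele frequencies, sample size, one-sided z test and function name are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_winners_curse(p_case=0.33, p_ctrl=0.30, n_alleles=400,
                           z_crit=1.96, n_studies=1000, seed=1):
    """Simulate many case-control studies of a single locus.

    Returns (true_diff, mean_naive_diff_among_significant, fraction_significant).
    All default values are illustrative assumptions, not taken from the abstract.
    """
    rng = random.Random(seed)
    true_diff = p_case - p_ctrl
    significant_diffs = []
    for _ in range(n_studies):
        # Observed allele frequencies from n_alleles sampled alleles per group.
        f_case = sum(rng.random() < p_case for _ in range(n_alleles)) / n_alleles
        f_ctrl = sum(rng.random() < p_ctrl for _ in range(n_alleles)) / n_alleles
        pooled = (f_case + f_ctrl) / 2
        se = math.sqrt(2 * pooled * (1 - pooled) / n_alleles)
        if se == 0:
            continue
        # Keep the naive estimate only if the study "wins" (one-sided z test).
        if (f_case - f_ctrl) / se > z_crit:
            significant_diffs.append(f_case - f_ctrl)
    if not significant_diffs:
        raise ValueError("no simulated study reached significance")
    return (true_diff,
            statistics.mean(significant_diffs),
            len(significant_diffs) / n_studies)


if __name__ == "__main__":
    true_diff, mean_sig, frac = simulate_winners_curse()
    print(f"true difference:                 {true_diff:.3f}")
    print(f"mean estimate among significant: {mean_sig:.3f}")
    print(f"fraction of studies significant: {frac:.3f}")
```

Because a study can only pass the threshold when its observed difference is large relative to its standard error, the average estimate among the "winners" sits above the true difference whenever power is low.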

Saturday, October 07, 2006

MDR 1.0

The Computational Genetics Laboratory at Dartmouth Medical School is pleased to announce the availability of version 1.0 of our Multifactor Dimensionality Reduction (MDR) software. This version is the result of more than 1.5 years of development and testing. We would like to thank Nate Barney, Todd Holden, Bill White and many others for their hard work and dedication to produce and support the open-source version of MDR. We would also like to acknowledge the support of NIH grants R01 AI59694, R01 HD047447, and LM009012. The latter grant is new and supports the development of genome-wide analysis approaches for MDR.

Download the latest version here.

I will be passing out MDR 1.0 software CDs at the American Society of Human Genetics (ASHG) annual meeting in New Orleans next week. Ask me for one!

Friday, October 06, 2006

Two-Stage Two-Locus Models in Genome-Wide Association

A new paper by Evans et al. in PLoS Genetics looks very interesting:

Two-Stage Two-Locus Models in Genome-Wide Association.
Evans DM, Marchini J, Morris AP, Cardon LR.
PLoS Genetics 2006 Sep 22;2(9) [PubMed]

Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
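
To get a feel for the computational burden and the multiple-testing penalty mentioned in the abstract, here is a rough sketch in Python that counts tests under each strategy. The marker counts, the function names, and the choice of a two-stage variant in which only pairs among the stage-one survivors are tested are my own assumptions for illustration, not details taken from the paper.

```python
from math import comb


def exhaustive_pair_tests(n_markers):
    """Number of two-locus tests in an exhaustive pairwise search."""
    return comb(n_markers, 2)


def two_stage_tests(n_markers, n_selected):
    """Total tests for a hypothetical two-stage strategy.

    Stage 1: one single-locus test per marker.
    Stage 2: all pairs among the n_selected markers that passed stage 1.
    """
    return n_markers + comb(n_selected, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a simple Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    n_markers = 300_000   # assumed size of a genome-wide marker panel
    n_selected = 3_000    # assumed number of markers carried into stage 2
    full = exhaustive_pair_tests(n_markers)
    staged = two_stage_tests(n_markers, n_selected)
    print(f"exhaustive pairwise tests: {full:,}")
    print(f"two-stage tests:           {staged:,}")
    print(f"Bonferroni level (exhaustive, alpha=0.05): "
          f"{bonferroni_threshold(0.05, full):.2e}")
```

With these assumed numbers the exhaustive search needs 44,999,850,000 pairwise tests versus 4,798,500 for the two-stage variant. The catch, as the abstract notes, is that loci with little marginal effect never make it into stage two.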