Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Thursday, May 26, 2005

Comparison of population- and family-based methods for genetic association analysis in the presence of interacting loci

A new paper by Howson et al. explores the power to detect two-locus interactions with different study designs.

Howson JM, Barratt BJ, Todd JA, Cordell HJ. Comparison of population- and family-based methods for genetic association analysis in the presence of interacting loci. Genet Epidemiol. 2005 May 12. [PubMed]

Abstract:

We compared different ascertainment schemes for genetic association analysis: affected sib-pairs (ASPs), case-parent trios, and unrelated cases and controls. We found, with empirical type 1 diabetes data at four known disease loci, that studies based on case-parent trios and on unmatched cases and controls often gave higher odds ratio estimates and stronger significance test values than ASP designs. We used simulations and a simplified disease model involving two interacting loci, one of large effect and one smaller, to examine interaction models that could cause such an effect. The different ascertainment schemes were compared for power to detect an effect when only the locus of smaller effect was genotyped. ASPs showed the greatest power for association testing under most models of interaction except under additive and certain epistatic crossover models, for which case/controls and case-parent trios did better. All ascertainment schemes gave an unbiased estimation of log genotype relative risks (GRRs) under a multiplicative model. Under nonmultiplicative interactions, GRRs at the minor locus as estimated from ASPs could be biased upwards or downwards, resulting in either an increase or decrease in power compared to the case/control or trio design. For the four known type 1 diabetes loci, we observed decreased risks with ASPs, which could be due to additive interactions with the remaining susceptibility loci. Thus, the optimal ascertainment strategy in genetic association studies depends on the unknown underlying multilocus genetic model, and on whether the goal of the study is to detect an effect or to accurately estimate the resulting disease risks. Genet. Epidemiol. (c) 2005 Wiley-Liss, Inc.

Friday, May 20, 2005

MDR 0.3 released

The Dartmouth Computational Genetics Laboratory (CGL) is pleased to announce the release of version 0.3 of our multifactor dimensionality reduction (MDR) software for detecting and characterizing gene-gene and gene-environment interactions in genetic and epidemiologic studies of common human diseases.

The new version of MDR can be downloaded from here.

New features include:

1) Publication-quality figures of the multilocus MDR models. Models can be viewed in any dimension and saved to an encapsulated postscript (*.eps) file.

2) The ability to save a snapshot of an MDR analysis so you can load it later. This eliminates the need to rerun an MDR analysis if you forget to output a set of results. All aspects of the analysis are saved including the data, the configuration parameters used, and all the results.

3) A sign test of the null hypothesis that the number of testing accuracies with values > 0.5 (+) is equal to the number <= 0.5 (-), computed from, for example, 10 or 20 cross-validation intervals. We think the sign test might be useful for deciding whether or not to proceed with a computationally expensive permutation test using the MDR Permutation Testing module. We don't see it being used for formal hypothesis testing unless the size and power of the test are determined to be acceptable.

4) The ability to save the raw results from the MDR analysis. Raw results now include all fitness values (average training accuracies) for every model considered.

5) The ability to consider matched case-control or family-based data. Here, matched pairs are kept together during cross-validation.

6) IF-THEN rules for each genotype combination and their "high-risk" or "low-risk" assignments. For example: IF SNP1 = AA and SNP2 = GG then 1.

7) A progress bar to determine how long of a lunch break to take.

8) A new data format in which the class variable is the last column and each column has a header describing that variable or attribute (e.g. SNP1, CYP2D6). Old MDR data formats can be converted to new MDR formats using the MDR Data Tool available here.
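
To make the sign test in item 3 concrete, here is a minimal sketch in Python. It is not taken from the MDR source code. It assumes a one-sided exact binomial version of the test (is the number of + intervals larger than chance would predict?), and the function name and the accuracy values are made up for illustration.

```python
from math import comb

def sign_test_p_value(testing_accuracies, threshold=0.5):
    """One-sided exact sign test on cross-validation testing accuracies.

    Assumption (not stated in the post): the alternative of interest is
    "more + than - intervals". Returns (number of + intervals, p-value).
    """
    n = len(testing_accuracies)
    # An interval counts as + only if its accuracy is strictly above 0.5.
    n_plus = sum(1 for a in testing_accuracies if a > threshold)
    # Under the null, each interval is + or - with probability 1/2, so the
    # p-value is P(X >= n_plus) for X ~ Binomial(n, 0.5).
    p_value = sum(comb(n, k) for k in range(n_plus, n + 1)) / 2 ** n
    return n_plus, p_value

# Illustrative testing accuracies from 10 cross-validation intervals.
accuracies = [0.61, 0.58, 0.49, 0.63, 0.55, 0.57, 0.52, 0.60, 0.47, 0.59]
n_plus, p_value = sign_test_p_value(accuracies)
print(n_plus, round(p_value, 4))  # 8 0.0547
```

With only 10 intervals the smallest attainable p-value is 1/1024, which is one reason the test is better suited to screening than to formal hypothesis testing.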

Here are a few features that are planned for future releases:

1) Threading to take advantage of multi-processor computers.

2) Batch/command line mode to allow MDR to be run from scripts. This will facilitate running MDR multiple times in simulations on a parallel computer.

3) Visualization of the fitness landscape.

4) Wrapper-algorithms for variable/attribute selection. This will be important when the number of attributes is too large for exhaustive searching.

5) A context-sensitive help system.

Is there something you would like to see added to MDR? Request it here.

Note that MDR will be in beta testing for another 2-3 months. Please send us your feedback so we can roll out a polished MDR 1.0 later this summer.

Wednesday, May 18, 2005

Biological vs. Statistical Epistasis

The Dartmouth CGL is pleased to note that our paper on biological and statistical epistasis has been published in the June issue of BioEssays.

Moore JH, Williams SM. Traversing the conceptual divide between biological and statistical epistasis: systems biology and a more modern synthesis. BioEssays. 2005 May 12;27(6):637-646. [PubMed]

Abstract:

Epistasis plays an important role in the genetic architecture of common human diseases and can be viewed from two perspectives, biological and statistical, each derived from and leading to different assumptions and research strategies. Biological epistasis is the result of physical interactions among biomolecules within gene regulatory networks and biochemical pathways in an individual such that the effect of a gene on a phenotype is dependent on one or more other genes. In contrast, statistical epistasis is defined as deviation from additivity in a mathematical model summarizing the relationship between multilocus genotypes and phenotypic variation in a population. The goal of this essay is to review definitions and examples of biological and statistical epistasis and to explore the relationship between the two. Specifically, we present and discuss the following two questions in the context of human health and disease. First, when does statistical evidence of epistasis in human populations imply underlying biomolecular interactions in the etiology of disease? Second, when do biomolecular interactions produce patterns of statistical epistasis in human populations? Answers to these two reciprocal questions will provide an important framework for using genetic information to improve our ability to diagnose, prevent and treat common human diseases. We propose that systems biology will provide the necessary information for addressing these questions and that model systems such as bacteria, yeast and digital organisms will be a useful place to start. BioEssays 27:637-646, 2005. (c) 2005 Wiley Periodicals, Inc.

Saturday, May 14, 2005

Epistasis and drug discovery

The Dartmouth CGL is pleased to note that our paper on "Combinatorial pharmacogenetics" has been accepted for publication in Nature Reviews Drug Discovery. It will appear this summer. The application of our multifactor dimensionality reduction (MDR) approach to pharmacogenetics is reviewed.

Wilke R, Reif D, Moore JH. Combinatorial pharmacogenetics. Nature Reviews Drug Discovery, in press (2005).

Abstract:

Combinatorial pharmacogenetics seeks to characterize genetic variations affecting reactions to potentially toxic agents within the complex metabolic networks of the human body. Polymorphic drug metabolizing enzymes (DME) are likely to represent some of the most common inheritable risk factors associated with common “disease” phenotypes, such as adverse drug reactions. The relatively high concordance between DME polymorphisms and clinical phenotypes suggests that research into this class of polymorphisms may benefit patients in the near-future. Characterization of other genes impacting drug disposition (absorption, distribution, metabolism, and elimination) will further enhance this process. As with most questions concerning biological systems, the complexity arises out of the combinatorial magnitude of all the possible interactions and pathways. The high-dimensionality of the resulting analysis problem will often overwhelm traditional analysis methods. Novel analysis techniques, such as multifactor dimensionality reduction (MDR), offer viable options for evaluating such data.

Friday, May 13, 2005

Detecting epistasis using random walks

A new paper by Hanlon and Lorenz in the Journal of Theoretical Biology presents a strategy based on random walks for detecting epistatic effects contributing to a quantitative trait:

Hanlon P, Lorenz A. A computational method to detect epistatic effects contributing to a quantitative trait. J Theor Biol. 2005 Aug 7;235(3):350-64. [PubMed]

Abstract:

We develop a new computational method to detect epistatic effects that contribute to a complex quantitative trait. Rather than looking for epistatic effects that show statistical significance when considered in isolation, we search for a close approximation to the quantitative trait by a sum of epistatic effects. Our search algorithm consists of a sequence of random walks around the space of sums of epistatic effects. An important feature of our approach is that there is learning between random walks, i.e. the control mechanism that chooses steps in our random walks adapts to the experiences of earlier random walks. We test the effectiveness of our algorithms by applying them to synthetic datasets where the phenotype is a sum of epistatic effects plus normally distributed noise. Our test statistic is the rate of success that our methods achieve in identifying the underlying epistatic effects. We report on the effectiveness of our methods as we vary parameters that are intrinsic to the computation (length of random walks and degree of learning) as well as parameters that are extrinsic to the computation (number of markers, number of individuals, noise level, architecture of the epistatic effects).

Monday, May 09, 2005

Biodefense Bioinformatics Blog

The Dartmouth CGL is happy to announce the availability of a new blog to communicate the latest developments in bioinformatics strategies for biodefense research. Our work on open-source software packages for MDR, RPM, and SDA is supported by a five-year grant from the National Institute of Allergy and Infectious Diseases (AI59694). The goal of this grant is to develop bioinformatics research strategies for identifying genetic and proteomic predictors of adverse events (e.g. fever, rash) following vaccination for smallpox.

You can access our Biodefense Bioinformatics Blog here.

Saturday, May 07, 2005

MDR detects epistasis in schizophrenia

A new paper by Qin et al. in the European Journal of Human Genetics reports an epistatic association between the GRIN1 and GRIN2B genes and susceptibility to schizophrenia [OMIM#181500] that was detected using our multifactor dimensionality reduction (MDR) software.

Qin S, Zhao X, Pan Y, Liu J, Feng G, Fu J, Bao J, Zhang Z, He L. An association study of the N-methyl-D-aspartate receptor NR1 subunit gene (GRIN1) and NR2B subunit gene (GRIN2B) in schizophrenia with universal DNA microarray. Eur J Hum Genet. 2005 Apr 20; [PubMed]

Abstract:

Dysfunction of the N-methyl-D-aspartate (NMDA) receptors has been implicated in the etiology of schizophrenia based on psychotomimetic properties of several antagonists and on observation of genetic animal models. To conduct association analysis of the NMDA receptors in the Chinese population, we examined 16 reported SNPs across the NMDA receptor NR1 subunit gene (GRIN1) and NR2B subunit gene (GRIN2B), five of which were identified in the Chinese population. In this study, we combined universal DNA microarray and ligase detection reaction (LDR) for the purposes of association analysis, an approach we considered to be highly specific as well as offering a potentially high throughput of SNP genotyping. The association study was performed using 253 Chinese patients with schizophrenia and 140 Chinese control subjects. No significant frequency differences were found in the analysis of the alleles but some were found in the haplotypes of the GRIN2B gene. The interactions between the GRIN1 and GRIN2B genes were evaluated using the multifactor-dimensionality reduction (MDR) method, which showed a significant genetic interaction between the G1001C in the GRIN1 gene and the T4197C and T5988C polymorphisms in the GRIN2B gene. These findings suggest that the combined effects of the polymorphisms in the GRIN1 and GRIN2B genes might be involved in the etiology of schizophrenia.

Epistasis and balanced polymorphism influencing complex trait variation

A new paper by Kroymann and Mitchell-Olds in Nature demonstrates genetic complexity in Arabidopsis thaliana:

Kroymann J, Mitchell-Olds T. Epistasis and balanced polymorphism influencing complex trait variation. Nature. 2005 May 5;435(7038):95-8. [PubMed]

Abstract:

Complex traits such as human disease, growth rate, or crop yield are polygenic, or determined by the contributions from numerous genes in a quantitative manner. Although progress has been made in identifying major quantitative trait loci (QTL), experimental constraints have limited our knowledge of small-effect QTL, which may be responsible for a large proportion of trait variation. Here, we identified and dissected a one-centimorgan chromosome interval in Arabidopsis thaliana without regard to its effect on growth rate, and examined the signature of historical sequence polymorphism among Arabidopsis accessions. We found that the interval contained two growth rate QTL within 210 kilobases. Both QTL showed epistasis; that is, their phenotypic effects depended on the genetic background. This amount of complexity in such a small area suggests a highly polygenic architecture of quantitative variation, much more than previously documented. One QTL was limited to a single gene. The gene in question displayed a nucleotide signature indicative of balancing selection, and its phenotypic effects are reversed depending on genetic background. If this region typifies many complex trait loci, then non-neutral epistatic polymorphism may be an important contributor to genetic variation in complex traits.

Hardy-Weinberg Equilibrium

Tests of departure from Hardy-Weinberg equilibrium are often misunderstood and misused. Everyone doing genetic studies in human populations should read the following paper:

Wittke-Thompson JK, Pluzhnikov A, Cox NJ. Rational Inferences about Departures from Hardy-Weinberg Equilibrium. Am J Hum Genet. 2005 Jun;76(6):967-86. [PubMed]

MDR Analysis Module Update

With the Data Tool now finished (see May 3rd posting), we are turning our attention to the MDR Analysis module (v0.2.1). Here is a preview of the new features we are adding to the next release (v0.3):

1) Publication-quality figures of the multilocus MDR models.
2) The ability to save a snapshot of an MDR analysis so you can load it later.
3) A sign test of the null hypothesis that the number of testing accuracies with values > 0.5 (+) is equal to the number <= 0.5 (-), computed from, for example, 10 or 20 cross-validation intervals. We think the sign test might be useful for deciding whether or not to proceed with a computationally expensive permutation test using the MDR Permutation Testing module. We don't see it being used for formal hypothesis testing unless the size and power of the test are determined to be acceptable.
4) The ability to save the raw results from the MDR analysis.
5) The ability to consider matched case-control or family-based data.
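
For matched case-control or family-based data (item 5), the key point is that cross-validation intervals are formed from matched pairs rather than from individual subjects. The Python sketch below shows one way this could be done. It is an illustration only, not the MDR implementation, and the function name, the pair identifiers, and the round-robin assignment are all assumptions.

```python
import random

def paired_folds(pair_ids, n_folds=10, seed=0):
    """Return a fold index for every subject so that subjects sharing a
    pair ID always land in the same cross-validation interval."""
    unique_pairs = sorted(set(pair_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_pairs)
    # Deal shuffled pairs out to folds round-robin so fold sizes stay even.
    fold_of_pair = {pid: i % n_folds for i, pid in enumerate(unique_pairs)}
    return [fold_of_pair[pid] for pid in pair_ids]

# 20 hypothetical matched pairs, two subjects (case and control) per pair.
pair_ids = [pair for pair in range(20) for _ in range(2)]
folds = paired_folds(pair_ids, n_folds=5)
```

Because whole pairs are assigned together, no case is ever separated from its matched control between the training and testing portions of an interval.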

Here are a few features that are planned for future releases:

6) A progress bar.
7) The multilocus MDR model in the form of IF-THEN rules.
8) A command-line interface that can be used with scripting.
9) Threading to take advantage of multi-processor computers.
10) Wrapper-algorithms for variable/attribute selection.
11) Visualization of the fitness landscape.
12) Context-sensitive help.
13) Save results in XML format.

Is there something you would like to see added to MDR? Request it here.

More information about the open-source MDR software package including access to the JAVA source code and executables can be found here.

General information about the MDR method can be found here.

Tuesday, May 03, 2005

MDR Data Tool

The Dartmouth CGL is happy to announce the availability of a Data Tool module for our multifactor dimensionality reduction (MDR) software package. The Data Tool makes it possible to convert between several different data formats, impute missing genotypes using a simple model, and resample the data to create balanced case/control ratios. The Data Tool module joins the Analysis module and the Permutation Testing module to complete the open-source MDR software package. All three modules will be continuously updated with new features. Please check back often for updates.
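
As an illustration of what resampling to a balanced case/control ratio can look like, here is a short Python sketch that randomly undersamples the larger class. This is an assumption made for the example: this post does not say whether the Data Tool undersamples, oversamples, or does something else, and the function name and toy rows are hypothetical.

```python
import random

def undersample_to_balance(rows, class_index=-1, seed=0):
    """Randomly drop rows from the larger class(es) until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[class_index], []).append(row)
    n_keep = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement within each class.
        balanced.extend(rng.sample(group, n_keep))
    rng.shuffle(balanced)
    return balanced

# Toy data: genotypes in the first two columns, class (1 = case) last.
rows = [[0, 1, 1]] * 6 + [[2, 0, 0]] * 3
balanced = undersample_to_balance(rows)
```

Undersampling discards data, so with small studies an oversampling scheme may be preferable.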

The MDR Data Tool can be downloaded from here.

Information about the MDR method and algorithm can be found online here.

** NOTE ** We have changed the basic MDR data format for all three modules. The new format requires the class variable (i.e. case/control status) to be in the last column. Also, the new format accepts variable names/headers in the first row of each column. This new data format is more consistent with how most research datasets are formatted. The Data Tool will read the old MDR format and save files in the new format. The Data Tool will also read and save files in Weka ARFF format.
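
For readers who want to work with the new layout in their own scripts, the Python sketch below reads a table with a header row and the class variable in the last column. The tab delimiter, the 0/1/2 genotype coding, the 0/1 class coding, and the function name are assumptions made for the example and are not specified in this post.

```python
import csv
import io

def read_mdr_table(handle, delimiter="\t"):
    """Read a table whose first row holds column names and whose last
    column holds the class variable (e.g. case/control status)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    attribute_names, class_name = header[:-1], header[-1]
    attributes, classes = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        attributes.append(row[:-1])
        classes.append(row[-1])
    return attribute_names, class_name, attributes, classes

# A tiny in-memory file in the assumed layout.
example = "SNP1\tSNP2\tClass\n0\t1\t1\n2\t0\t0\n"
names, class_name, genotypes, status = read_mdr_table(io.StringIO(example))
print(names, class_name, genotypes, status)
```

Values are returned as strings so that any genotype coding (numeric or letters such as AA) passes through unchanged.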

Please let us know if there are additional features you would like to see in any of the three MDR modules.