Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Tuesday, May 31, 2011

Detecting genetic interactions for quantitative traits with U-statistics

This is an interesting paper that addresses an important topic. We need more methods that focus on epistasis contributing to interindividual variation in quantitative traits. U-statistics seem promising.

Li M, Ye C, Fu W, Elston RC, Lu Q. Detecting genetic interactions for quantitative traits with U-statistics. Genet Epidemiol. 2011 May 26. doi: 10.1002/gepi.20594. [Epub ahead of print] [PubMed] PMID: 21618602.


The genetic etiology of complex human diseases has commonly been viewed as a process that involves multiple genetic variants, environmental factors, and their interactions. Statistical approaches such as multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this study, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits while accounting for gene-gene/gene-environment interactions. In this new approach, a U-statistic-based forward algorithm first selects potential disease-susceptibility loci, and a weighted U-statistic is then used to test the joint association of the selected loci with the disease. In a simulation study, the Forward U-Test had greater power than GMDR. In addition, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to nicotine dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (P-value = 5.31e-7). The association, which involves essential interaction, was replicated in two independent datasets with P-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
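For readers new to U-statistics: a U-statistic averages a kernel function over all pairs (or larger subsets) of observations. The sketch below computes the simplest case, a Mann-Whitney-style two-sample statistic comparing the trait values of individuals who carry one chosen two-locus genotype with everyone else. It assumes 0/1/2 genotype coding, and the helper names `two_sample_u` and `split_by_genotype` are hypothetical. It only illustrates the general idea; it is not the authors' Forward U-Test, whose forward selection and weighted statistic are defined in the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple


def two_sample_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney-style U-statistic scaled to [0, 1].

    Averages the kernel h(a, b) = 1 if a > b, 0.5 if a == b, 0 otherwise
    over every pair (a, b) with a from x and b from y. A value near 0.5
    means neither group tends to have larger trait values.
    """
    if not x or not y:
        raise ValueError("both groups need at least one observation")
    total = 0.0
    for a, b in product(x, y):
        if a > b:
            total += 1.0
        elif a == b:
            total += 0.5
    return total / (len(x) * len(y))


def split_by_genotype(
    trait: Sequence[float],
    snp1: Sequence[int],
    snp2: Sequence[int],
    target: Tuple[int, int],
) -> Tuple[List[float], List[float]]:
    """Split trait values by whether (snp1, snp2) equals a target genotype pair.

    Genotypes are assumed to be coded 0/1/2 (copies of the minor allele);
    `target` is a hypothetical two-locus genotype chosen for illustration.
    """
    inside = [t for t, g1, g2 in zip(trait, snp1, snp2) if (g1, g2) == target]
    outside = [t for t, g1, g2 in zip(trait, snp1, snp2) if (g1, g2) != target]
    return inside, outside
```

With the toy values trait = [2.1, 3.5, 1.0, 4.2, 2.8, 0.9], snp1 = [0, 1, 0, 2, 1, 0], snp2 = [1, 2, 0, 2, 2, 0] and target (1, 2), the two carriers' trait values exceed the non-carriers' in 6 of 8 pairs, so the statistic is 0.75.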

Monday, May 30, 2011

Transcriptional robustness and protein interactions are associated in yeast

This result is entirely consistent with epistasis due to canalization.

Bekaert M, Conant GC. Transcriptional robustness and protein interactions are associated in yeast. BMC Syst Biol. 2011 May 5;5(1):62. [PubMed]


BACKGROUND: Robustness to insults, both external and internal, is a characteristic feature of life. One level of biological organization for which noise and robustness have been extensively studied is gene expression. Cells have a variety of mechanisms for buffering noise in gene expression, but it is not completely clear what rules govern whether or not a given gene uses such tools to maintain appropriate expression.

RESULTS: Here, we show a general association between the degree to which yeast cells have evolved mechanisms to buffer changes in gene expression and whether they possess protein-protein interactions. We argue that this effect bears a resemblance to epistasis, because yeast appears to have evolved regulatory mechanisms such that distant changes in gene copy number for a protein-protein interaction partner gene can alter a gene's expression. This association is not unexpected given recent work linking epistasis and the deleterious effects of changes in gene dosage (i.e., the dosage balance hypothesis). Using gene expression data from artificial aneuploid strains of baker's yeast, we found that genes coding for proteins that physically interact with other proteins show less expression variation in response to aneuploidy than do other genes. This effect is even more pronounced for genes whose products interact with proteins encoded on aneuploid chromosomes. We further found that genes targeted by transcription factors encoded on aneuploid chromosomes were more likely to change in expression after aneuploidy.

CONCLUSIONS: We suggest that these observations can be best understood as resulting from the higher fitness cost of misexpression in epistatic genes and a commensurate greater regulatory control of them.

Saturday, May 21, 2011

The effects of linkage disequilibrium in large scale SNP datasets for MDR

This is a nice new open-access paper from Dr. Marylyn Ritchie's lab on the effects of LD on MDR models.

Grady BJ, Torstenson ES, Ritchie MD. The effects of linkage disequilibrium in large scale SNP datasets for MDR. BioData Min. 2011 May 5;4(1):11. [PubMed] [BioData Mining]


BACKGROUND: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.

RESULTS: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios for which the position of the functional SNP(s) within LD blocks varied. Simulated data was analyzed with MDR to determine the sensitivity of the method in different contexts, where the sensitivity of the method was gauged as the number of times out of 100 that the method identifies the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops but the sensitivity to detect the disease signal and find an indirect association increases.

CONCLUSIONS: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity for identifying a direct association, but they do not affect the ability to detect an indirect association. Careful examination of the solution models generated by MDR shows that MDR can identify loci in the correct LD block, even when it does not select the functional SNP itself. As such, the results of MDR analysis in datasets with LD should be examined carefully in light of the dataset's underlying LD structure.
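The abstract measures sensitivity as the number of times out of 100 simulated datasets that MDR's best overall model is the correct one- or two-locus model, and it contrasts direct detection (the functional SNP itself) with indirect detection (a SNP in the same LD block). A minimal sketch of that bookkeeping follows. The SNP IDs, the `ld_block_of` mapping and the rule that a model counts as an indirect hit when its SNPs fall in the same LD blocks as the functional SNPs are all assumptions for illustration, not the authors' code.

```python
from typing import Dict, FrozenSet, Iterable

# A model is the set of SNP IDs that MDR reports as its best overall model.
Model = FrozenSet[str]


def direct_hits(best_models: Iterable[Model], true_model: Model) -> int:
    """Count replicates whose best model is exactly the functional SNP(s)."""
    return sum(1 for model in best_models if model == true_model)


def indirect_hits(
    best_models: Iterable[Model],
    true_model: Model,
    ld_block_of: Dict[str, str],
) -> int:
    """Count replicates whose best model lands in the functional SNPs' LD blocks.

    Models are compared by the sorted list of LD blocks their SNPs belong to,
    so a proxy SNP in LD with the functional SNP still counts. Direct hits
    are included as a special case.
    """
    target_blocks = sorted(ld_block_of[snp] for snp in true_model)
    return sum(
        1
        for model in best_models
        if sorted(ld_block_of[snp] for snp in model) == target_blocks
    )
```

With exactly 100 replicates, as in the study design described above, each count can be read directly as a percentage.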

Monday, May 16, 2011

Computational Intelligence Using Genetic Programming

I just returned from the ninth Genetic Programming Theory and Practice (GPTP) Workshop, held by the Center for the Study of Complex Systems at the University of Michigan. This is an invitation-only workshop that brings together theorists and practitioners interested in developing and applying computer systems that can solve complex problems by writing their own programs (i.e., automatic programming). The group focuses on genetic programming (GP), which discovers useful computer programs using the principles of evolution by natural selection. The proceedings from this workshop are published each year as a book available on Amazon; this year's proceedings will be published in late 2011 or early 2012.

The real value of this workshop is the large amount of time dedicated to open-ended discussion about how to solve complex problems in medicine, industry, finance and other fields. My own motivation for working with GP is to teach the computer to solve a complex human genetics problem the way I would. I do not believe that naive computer programs or analysis strategies, such as those used in the agnostic genome-wide association study (GWAS) paradigm, will succeed in addressing the complexity of the genotype-phenotype relationship. We, as human analysis engines, don't ignore the pathobiology of disease when we look at data. Why should we instruct the computer to ignore it? Given infinite time, each of us would tinker and try new and different things with the data until we found a good answer that made biological sense. We would use our knowledge of biochemistry, genomics, molecular biology, pathology and physiology both to frame the analysis and to interpret the results. Our series of GPTP papers since 2006 has focused on adaptive computer programs that harness this kind of biological and biomedical knowledge to explore the space of computer programs that can build models of genetic architecture.

One of the more interesting and extended discussions at GPTP this year was about novelty seeking. Ken Stanley gave a great talk about rewarding computer programs that explore new and different solutions to a problem. His Picbreeder program is a nice example of novelty search in the sense that you can discover and develop interesting pictures without a clear initial objective in mind (e.g., evolving a picture of a car). An analogy in human genetics would be to reward computer programs that generate genetic models of disease by exploring new biochemical pathways. I am working on approaches to try this within our own genetic analysis system. I like Ken's quote: "To achieve your highest goals, you must be willing to abandon them."
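Novelty search is commonly formulated as scoring each candidate by how different its behavior is from what has already been seen: the average distance to its k nearest neighbors among the current population plus an archive of previously novel individuals. The sketch below assumes each candidate's behavior has already been summarized as a numeric vector (in the genetics analogy, perhaps hypothetical indicators of which pathways a model draws on), uses Euclidean distance, and takes an illustrative archive threshold. These choices and the function names are assumptions, not Picbreeder's or our system's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical behavior descriptor: a fixed-length numeric summary of what a
# candidate program does (not of how well it scores on an objective).
Behavior = Tuple[float, ...]


def novelty_score(behavior: Behavior, others: Sequence[Behavior], k: int) -> float:
    """Mean Euclidean distance from `behavior` to its k nearest neighbors."""
    if not others:
        return float("inf")  # nothing to compare against: treat as maximally novel
    nearest = sorted(math.dist(behavior, other) for other in others)[:k]
    return sum(nearest) / len(nearest)


def score_population(
    population: Sequence[Behavior], archive: Sequence[Behavior], k: int = 3
) -> List[float]:
    """Score each individual against the rest of the population plus the archive."""
    scores = []
    for i, behavior in enumerate(population):
        others = list(population[:i]) + list(population[i + 1:]) + list(archive)
        scores.append(novelty_score(behavior, others, k))
    return scores


def update_archive(
    population: Sequence[Behavior],
    scores: Sequence[float],
    archive: List[Behavior],
    threshold: float,
) -> List[Behavior]:
    """Append sufficiently novel individuals to `archive` (in place) and return it."""
    for behavior, score in zip(population, scores):
        if score > threshold:
            archive.append(behavior)
    return archive
```

In a GP run, these scores could replace, or be blended with, the usual objective fitness during selection, so that programs are rewarded for doing something new rather than only for scoring well.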

It is clear that GP has been used to solve problems that humans and other computer programs have not been able to solve. For example, Moshe Sipper has developed computer game players that rival human players. Some of the participants (e.g., Michael Korns) even invest and make money using GP. GP is a powerful approach to automatic programming and should be part of the broader toolbox of any complex problem-solver. I would be happy to send you a preprint of our current GPTP paper.

Wednesday, May 04, 2011

Microbiome Studies at the 2012 Pacific Symposium on Biocomputing

I will again be co-chairing the Microbiome Studies session at PSB next year. Here is the call for papers: http://psb.stanford.edu/cfp-ms. Papers are due July 11, 2011. The conference will be held January 3-7, 2012, on the Big Island of Hawaii. Let me know if you have any questions.