Epistasis Blog

From the Artificial Intelligence Innovation Lab at Cedars-Sinai Medical Center (www.epistasis.org)

Sunday, March 31, 2019

How to increase our belief in discovered statistical interactions via large-scale association studies?

Our new paper with Dr. Kristel van Steen on approaches for improving evidence for statistical interactions.

Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet. 2019 [PubMed] [Human Genetics]


The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.

Friday, March 01, 2019

Testing the assumptions of parametric linear models: the need for biological data mining in disciplines such as human genetics

This editorial is in response to some claims that an observed linear relationship between relative pair trait correlation and IBD genetic sharing is indicative of a simple additive genetic architecture dominated by independent genetic effects. As we show here, you could observe this pattern under a genetic architecture dominated by epistasis.

Moore JH, Mackay TFC, Williams SM. Testing the assumptions of parametric linear models: the need for biological data mining in disciplines such as human genetics. BioData Min. 2019 Feb 11;12:6. [PubMed] [BioData Mining]


All data science methods have specific assumptions that are made in order for their inferences to be valid. Some assumptions impact statistical significance testing and some influence the models themselves. For example, a fundamental assumption of linear regression is that the relationship between the independent and dependent variables is additive such that a unit increase in one leads to a unit increase in the other with some error that can be modeled using a normal distribution. The presence of a nonlinear relationship between the variables violates this assumption and can lead to inaccurate inferences. We demonstrate this here using a simple example from human genetics and then end with some thoughts about the role of biological data mining in revealing nonlinear relationships between variables.