Published May 30, 2015 | Supplemental Material + Published
Journal Article (Open Access)

Controlling for conservation in genome-wide DNA methylation studies

Abstract

BACKGROUND: A common analysis in high-throughput DNA methylation studies is the comparison of methylation extent between different functional regions. Methylation states are averaged within each region type, and the averages are then compared between region types. For example, it has been reported that methylation is more prevalent in coding regions than in their neighboring introns or UTRs, leading to hypotheses about novel forms of epigenetic regulation.

RESULTS: We have identified and characterized a bias present in these seemingly straightforward comparisons that results in the false detection of differences in methylation intensities across region types. This bias arises from differences in conservation rates, rather than methylation rates, and is broadly present in the published literature. When controlling for conservation at coding start sites, the differences in DNA methylation rates disappear. Moreover, a re-evaluation of methylation rates at intron-exon junctions reveals that the magnitude of previously reported differences is greatly exaggerated. We introduce two correction methods to address this bias, an inference-based matrix completion algorithm and an averaging approach, each tailored to a different underlying biological question. We evaluate how analysis using these corrections affects the detection of differences in DNA methylation across functional boundaries.

CONCLUSIONS: We report here on a bias in comparative DNA methylation studies that originates in conservation rate differences and manifests itself in the false discovery of differences in DNA methylation intensities and their extents. We have characterized this bias and its broad implications, and show how to control for it so as to enable the study of a variety of biological questions.
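A small toy example shows one way that pooling sites by region type can report a methylation difference that no individual gene has. The sketch below is a hypothetical illustration, not the authors' COMPARE software and not either of the paper's two correction methods: the gene names, the methylation calls, and the "gene-balanced" comparison are assumptions made for exposition, with the number of CpG sites present in a region standing in for how conserved that region is. Each toy gene has the same per-site methylation rate in its coding region and its UTR, but the highly methylated gene contributes more coding sites, so the pooled averages diverge.

```python
from statistics import mean

# Hypothetical toy data: methylation calls (1 = methylated, 0 = unmethylated) at the
# CpG sites present in each gene's coding region and neighboring UTR. Within each
# gene the per-site rate is identical in both regions; only the number of sites
# (a stand-in for conservation) differs between genes and regions.
GENES = {
    "gene_high": {"coding": [1, 1, 1, 0, 1, 1, 1, 0], "utr": [1, 1, 0, 1]},              # 75% in both
    "gene_low": {"coding": [0, 1, 0, 0], "utr": [0, 1, 0, 0, 0, 1, 0, 0]},                # 25% in both
}


def pooled_average(genes, region):
    """Naive estimate: pool every site of one region type across genes, then average."""
    calls = [c for regions in genes.values() for c in regions[region]]
    return sum(calls) / len(calls)


def gene_balanced_average(genes, region, other):
    """Illustrative control (hypothetical): average per-gene rates over only the genes
    that have sites in both regions, so both regions are summarized over the same
    genes with equal weight."""
    shared = [r for r in genes.values() if r[region] and r[other]]
    return mean(sum(r[region]) / len(r[region]) for r in shared)


naive_coding = pooled_average(GENES, "coding")
naive_utr = pooled_average(GENES, "utr")
bal_coding = gene_balanced_average(GENES, "coding", "utr")
bal_utr = gene_balanced_average(GENES, "utr", "coding")

print(f"pooled:        coding={naive_coding:.3f}  utr={naive_utr:.3f}")
print(f"gene-balanced: coding={bal_coding:.3f}  utr={bal_utr:.3f}")
# pooled:        coding=0.583  utr=0.417
# gene-balanced: coding=0.500  utr=0.500
```

Holding the contributing genes fixed removes the spurious gap in this toy case, at the cost of discarding any gene that lacks sites in one of the two regions; estimating such missing entries rather than discarding them is the kind of problem an inference-based matrix completion approach is suited to.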

Additional Information

© Singer and Pachter; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Received: 15 April 2015. Accepted: 1 May 2015. Published: 30 May 2015.

We thank Yael Mandel-Gutfreund, Idit Kosti and Asaf Zemach for helpful feedback, as well as Nicolas Bray and other members of the Pachter lab for many insightful discussions. L.P. and M.S. were partially funded by NIH R01 HG006129.

Authors' contributions: MS and LP conceived the study and conducted the mathematical characterization, statistical analysis and design of correction methods. MS implemented the COMPARE software and conducted the data analysis. MS and LP wrote the manuscript. Both authors read and approved the final manuscript.

The authors declare that they have no competing interests.

Attached Files

Published - art_3A10.1186_2Fs12864-015-1604-3.pdf

Supplemental Material - 12864_2015_1604_MOESM1_ESM.pdf

Supplemental Material - 12864_2015_1604_MOESM2_ESM.pdf

Files (2.7 MB total)

md5:b8aa4fffdd3ff84cb8aaebea4e7d79e4 (1.4 MB)
md5:33093bacdd65ca90f0940514faddef29 (1.0 MB)
md5:d70133e48862912cffa83f060ca57bfb (260.1 kB)

Additional details

Created: August 22, 2023
Modified: October 24, 2023