Published February 1, 2014 | Supplemental Material + Published
Journal Article Open

Large-scale quality analysis of published ChIP-seq data

Abstract

ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as highly successful, but a substantial minority (20%) were of apparently poor quality, and another ~25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no-immunoprecipitation and mock-immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high quality comprise a large group that users can draw on for large-scale integrated analysis. In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.

Additional Information

© 2014 Marinov et al. Manuscript received September 29, 2013; accepted for publication November 21, 2013; published Early Online December 17, 2013. This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We thank members of the ENCODE consortium and members of the Wold laboratory for helpful discussions, and Henry Amrhein, Diane Trout, and Sean Upchurch for computational assistance. G.K.M. and B.J.W. are supported by the Beckman Foundation, the Donald Bren Endowment, and National Institutes of Health grants U54 HG004576 and U54 HG006998. Communicating editor: T. R. Hughes

Attached Files

Published - 209.full.pdf

Supplemental Material - 008680SI.pdf

Supplemental Material - FigureS1.pdf

Supplemental Material - FigureS10.pdf

Supplemental Material - FigureS11.pdf

Supplemental Material - FigureS2.pdf

Supplemental Material - FigureS3.pdf

Supplemental Material - FigureS4.pdf

Supplemental Material - FigureS5.pdf

Supplemental Material - FigureS6.pdf

Supplemental Material - FigureS7.pdf

Supplemental Material - FigureS8.pdf

Supplemental Material - FigureS9.pdf

Supplemental Material - TableS1.pdf

Supplemental Material - TableS2.pdf

Files

Total size: 8.5 MB

md5:97e516cef70d4343b2ebb8d50d9fb5b7 190.1 kB
md5:49c700e6d1374e955513ff2e74851a47 310.2 kB
md5:92de1217c0cb6ad26023fef23753fbaf 399.7 kB
md5:5a9b5688ca068026b9b47369851c380e 1.2 MB
md5:0bfdd6378347490ef294dd43dbcf7955 100.8 kB
md5:ba2620c7b03dbea3a13a67d87e12a86a 200.6 kB
md5:07e0c183da7559790d96cb185ffb74af 90.9 kB
md5:75cfd44ebb7fcbf50ceb801913583bcb 266.9 kB
md5:6cba12fc6018cf9d79909655329a1bc3 1.7 MB
md5:7695cf8382c0d25b4018061441299d39 145.9 kB
md5:ab0251161f472f472b13a0ad8f741b02 180.4 kB
md5:3068dff5b21f085adda16ed8fb2d0b86 240.0 kB
md5:883aaf823813779f1fc0a2654e204a0a 188.0 kB
md5:bbe0388312709085c152cf9541eec6b7 71.3 kB
md5:3ad7b6aa0620b141a521f5bb31e19778 3.2 MB

Additional details

Created: August 22, 2023
Modified: October 25, 2023