Published September 2020 | Supplemental Material + Submitted + Published
Journal Article Open

Term Matrix: A novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns

Abstract

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally, and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes likely reflects errors in literature curation, ontology structure, or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g., amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 2.5 million automatically propagated annotations across all taxa.
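The rule-based check described in the abstract can be illustrated with a minimal sketch. This is not the Term Matrix implementation: the rule set, gene names, annotations and the `find_violations` helper below are all hypothetical, and the sketch matches term labels directly rather than reasoning over the ontology structure as a real workflow would need to. It only shows the core idea of flagging a gene product annotated to both terms of a pair that is not expected to co-occur.

```python
from itertools import combinations

# Hypothetical rule set: unordered pairs of process terms that are not
# expected to be annotated to the same gene product. The single pair here
# is the example named in the abstract.
NO_OVERLAP_RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=NO_OVERLAP_RULES):
    """Return sorted (gene, term_a, term_b) tuples that break a rule.

    `annotations` maps a gene identifier to the set of process terms
    annotated to it. Terms are compared by exact label only; ontology
    ancestors and descendants are deliberately ignored in this sketch.
    """
    violations = []
    for gene, terms in annotations.items():
        # Check every unordered pair of terms annotated to this gene.
        for term_a, term_b in combinations(sorted(terms), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return sorted(violations)


# Made-up annotations, for illustration only.
example_annotations = {
    "geneA": {"amino acid metabolism", "cytokinesis"},
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolism", "translation"},
}

for gene, term_a, term_b in find_violations(example_annotations):
    print(f"{gene}: review co-annotation to '{term_a}' and '{term_b}'")
```

A flagged pair is a prompt for review rather than proof of an error: as the abstract notes, each suspicious co-annotation still has to be traced back to its source before it is corrected.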

Additional Information

© 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Manuscript received 22/06/2020; Manuscript accepted 06/08/2020; Published online 02/09/2020.

We thank Peter D'Eustachio for Reactome updates and the InterPro group for InterPro2GO mapping updates. We thank Nomi Harris for constructive comments on the manuscript. We also thank the many biocurators, editors and other members of the GO Consortium who have contributed to GO annotations and to the development of the Gene Ontology, and PomBase principal investigator Stephen G. Oliver for ongoing guidance and support of all PomBase activities.

Data accessibility: The GO ontology and annotation datasets are freely available from the Gene Ontology website (see the main downloads page [41]). All other data supporting this article have been uploaded as part of the electronic supplementary material.

Authors' contributions: V.W. conceived the project, generated annotation rules and wrote the initial draft; S.C. and C.J.M. developed Term Matrix; K.M.R. provided bioinformatic support for the fission yeast case study; V.W., A.L., S.R.E., D.P.H., K.V.A., H.A. and R.C.L. corrected annotation errors identified in the study; M.A.H. made extensive text revisions, and prepared the manuscript for submission; D.P.H., K.V.A. and P.G. corrected ontology errors; S.P. and M.F. provided SPKW mapping updates; M.F. and P.G. provided PAINT propagation updates. All authors contributed to the discussion of ideas and manuscript revisions, and read and approved the final manuscript.

The authors declare no competing interests.

V.W., A.L., M.A.H. and K.M.R. are supported by the Wellcome Trust via the PomBase project (grant no. 104967/Z/14/Z). S.C., S.R.E., D.P.H., K.V.A., P.G. and C.J.M. are funded via the GO resource, which is supported by the National Human Genome Research Institute (NHGRI) (grant no. U41 HG002273). S.R.E. is also funded by the NHGRI via the Saccharomyces Genome Database (grant no. U41 HG001315) and the Alliance of Genome Resources (grant no. U24 HG010859). K.V.A. is also funded via WormBase, which is supported by the NHGRI (grant no. U24 HG002223), the UK Medical Research Council (grant no. MR/S000453/1) and the UK Biotechnology and Biological Sciences Research Council (grant no. BB/P024602/1). H.A. is funded by the UK Medical Research Council (grant no. MR/N030117/1). R.C.L. is supported by Alzheimer's Research UK (grant no. ARUK-NAS2017A-1) and by the National Institute for Health Research UCL Hospitals Biomedical Research Centre. The GO Consortium, FlyBase (HA), Mouse Genome Informatics (DPH), the Saccharomyces Genome Database (SRE), and WormBase (KVA) are members of the Alliance of Genome Resources.

Attached Files

Published - rsob.200149.pdf

Submitted - 2020.04.21.045195v1.full.pdf

Supplemental Material - RSOB200149_si_001.xlsx

Supplemental Material - RSOB200149_si_002.xlsx

Supplemental Material - RSOB200149_si_003.xlsx

Supplemental Material - RSOB200149_si_004.xlsx

Supplemental Material - RSOB200149_si_005.xlsx

Supplemental Material - RSOB200149_si_006.xlsx

Files

Files (3.0 MB)

md5:b509d1319beef1dae3ecc88afd53a801 (12.2 kB)
md5:e262afbcc49de790f9f678d89430d2a8 (18.6 kB)
md5:8001e8da6831f10f0262987d35439bfc (17.5 kB)
md5:fcbcd0664f2b681a8ebb5a669b6f5eb1 (91.1 kB)
md5:ac9df41c71d153f2942d4979ae435482 (30.1 kB)
md5:efa8330f5dddd2cfacf3088cbe3fb62a (959.2 kB)
md5:7cd9ccf5028e257bd98bd9494b9dfaf1 (4.9 kB)
md5:f501e7b565775d6b47ca0852cb162a1a (1.8 MB)

Additional details

Created: August 19, 2023
Modified: December 8, 2023