Published February 1, 2022 | Supplemental Material + Published
Journal Article Open

MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study

Abstract

Motivation: Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome.

Results: To address this problem, we developed a novel clustering approach called 'metagenomic clustering by reference library' (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed 'signatures', are iteratively clustered in a greedy fashion, retaining at each step the reference genes yielding the lowest E values, and terminating when signatures of remaining reference genes have a minimal overlap. The outcome of this computation is a non-redundant list of reference genes homologous to minimally overlapping sets of contigs, representing potential candidates for gene families present in the metagenome. Unlike metagenomic clustering methods, there is no need for contigs to overlap to be associated with a cluster, enabling MCRL to draw on more information encoded in the metagenome when computing tentative gene families. We demonstrate how MCRL can be used to extract candidate viral gene families from an oral metagenome and an oral virome that otherwise could not be determined using standard approaches. We evaluate the sensitivity, accuracy and robustness of our proposed method for the viral case study and compare it with existing analysis approaches.
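The greedy reduction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `greedy_reduce`, the use of Jaccard similarity to measure signature overlap, the `max_overlap` threshold and the choice to rank each reference gene by its single best E value are all assumptions made for illustration. The input is assumed to be a precomputed table of homology hits (reference gene to contig, with an E value per hit).

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two signatures (sets of contig IDs); assumed measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_reduce(hits, max_overlap=0.5):
    """Reduce a reference library to genes with minimally overlapping signatures.

    hits: {reference_gene: {contig_id: e_value}} (hypothetical input layout).
    max_overlap: hypothetical stopping threshold on signature overlap.
    Returns the sorted list of retained reference genes.
    """
    # A gene's signature is the set of contigs it is homologous to.
    signatures = {gene: set(h) for gene, h in hits.items() if h}
    # Assumption: a gene is ranked by its lowest (best) E value.
    best_e = {gene: min(hits[gene].values()) for gene in signatures}

    retained = set(signatures)
    while len(retained) > 1:
        # Find the pair of retained genes whose signatures overlap most.
        a, b = max(
            combinations(sorted(retained), 2),
            key=lambda pair: jaccard(signatures[pair[0]], signatures[pair[1]]),
        )
        # Stop once the remaining signatures overlap only minimally.
        if jaccard(signatures[a], signatures[b]) <= max_overlap:
            break
        # Keep the gene with the lower E value, drop the other.
        loser = a if best_e[a] > best_e[b] else b
        retained.remove(loser)
    return sorted(retained)


# Toy data: refA and refB hit the same contigs; refC hits a different one.
example_hits = {
    "refA": {"c1": 1e-50, "c2": 1e-40, "c3": 1e-30},
    "refB": {"c1": 1e-20, "c2": 1e-10, "c3": 1e-5},
    "refC": {"c7": 1e-15},
}
print(greedy_reduce(example_hits))  # ['refA', 'refC']
```

In this toy example refA and refB share an identical signature, so only the one with the lower E value (refA) is kept, while refC is retained because its signature does not overlap with the others. Note that refC's contig need not overlap any other contig to be represented, which mirrors the point made in the abstract.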

Additional Information

© The Author(s) 2021. Published by Oxford University Press. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).

Received: 29 October 2019; Revision received: 03 October 2021; Editorial decision: 05 October 2021; Accepted: 07 October 2021; Published: 12 October 2021; Corrected and typeset: 08 December 2021.

We wish to thank our reviewers for their thoughtful comments.

This work was supported by the National Health Institute Director's Pioneer Award and the National Health Institute's Eureka [R01-GM098465].

Conflict of Interest: none declared.

Attached Files

Published - btab703.pdf

Supplemental Material - btab703_supplementary_data.zip

Files (26.9 MB)

md5:7eeaec901887829a3c4d2a487bd07efb (1.2 MB)
md5:c10e894de3329d10228cc2fd0d290f12 (25.7 MB)

Additional details

Created: August 22, 2023
Modified: December 22, 2023