Published June 15, 2009
Journal Article Open

Identifying novel constrained elements by exploiting biased substitution patterns

Abstract

Motivation: Comparing the genomes of closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods model conservation only as a decrease in the rate of mutation and ignore selection acting on the pattern of mutations.

Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions is under selection.
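The idea of using a hidden Markov model to segment an alignment into neutral and constrained stretches can be illustrated with a toy sketch. This is not the SiPhy model or its parameters: the two states, the three column classes ("conserved", "biased", "unbiased") and every probability below are hypothetical values chosen only to show how Viterbi decoding might label columns whose substitutions are biased as constrained, even when they are not fully conserved.

```python
import math

# Toy two-state HMM. All names and numbers are illustrative assumptions,
# not values from the article or from the SiPhy package.
STATES = ("neutral", "constrained")

# Hypothetical probability of starting in each state.
START = {"neutral": 0.9, "constrained": 0.1}

# Hypothetical state-transition probabilities between adjacent columns.
TRANS = {
    "neutral": {"neutral": 0.95, "constrained": 0.05},
    "constrained": {"neutral": 0.10, "constrained": 0.90},
}

# Hypothetical emission probabilities for three alignment-column classes:
# "conserved" (no substitutions), "biased" (substitutions restricted to a
# few nucleotides), "unbiased" (substitutions spread across nucleotides).
EMIT = {
    "neutral": {"conserved": 0.40, "biased": 0.20, "unbiased": 0.40},
    "constrained": {"conserved": 0.55, "biased": 0.40, "unbiased": 0.05},
}


def viterbi(columns):
    """Return the most probable state path for a list of column classes."""
    if not columns:
        return []
    # Log-probability of the best path ending in each state at column 0.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][columns[0]]) for s in STATES
    }
    back = []  # back[i][s] = best predecessor of state s at column i + 1
    for sym in columns[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][sym])
            )
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi(["unbiased", "biased", "biased", "conserved"])` returns one state label per column. In this toy setup a long run of "biased" columns is decoded as constrained although none of the columns is perfectly conserved, which is the kind of degenerate signal that a purely rate-based model would be expected to miss.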

Additional Information

© 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Published: 27 May 2009.

We thank M. Kamal, T. Mikkelsen and Or Zuk for insightful comments on the article; M. Kellis, P. Kheradpour, E. Lander, M. Lin, K. Lindblad-Toh, M. Rasmussen and A. Stark for helpful discussions.

Funding: UCI (to XX); NHGRI (to MG, MC and MZ); Israel Science Foundation (to NF).

Availability: The algorithms are implemented in a Java software package, called SiPhy, freely available at http://www.broadinstitute.org/science/software/.

Conflict of interest: none declared.

Attached Files

Published - btp190.pdf

Files

btp190.pdf (396.4 kB)
md5:1683ee1c0f0cef6cf6e5558ae7ef8531

Additional details

Created: August 20, 2023
Modified: October 23, 2023