Published April 2004 | Submitted + Published
Journal Article Open

MAVID: Constrained ancestral alignment of multiple sequences

Abstract

We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which can accurately align multiple genomic regions up to megabases long. MAVID effectively aligns divergent sequences, as well as incomplete, unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.
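One idea listed above is maximum-likelihood inference of ancestral sequences: in a progressive scheme, once two sibling sequences are aligned, an inferred ancestor can stand in for them at the next level of the guide tree. This record does not state MAVID's substitution model or gap handling, so the following is only a minimal sketch under stated assumptions: a Jukes-Cantor model, a uniform root distribution, branch lengths in expected substitutions per site, uppercase A/C/G/T input, and gaps treated as missing data. The functions `jc_prob`, `ml_ancestor_column`, and `ml_ancestor` are hypothetical illustrations, not MAVID's code.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_prob(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability that base a is observed as base b after a
    branch of length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    if a == b:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def ml_ancestor_column(x: str, y: str, t1: float, t2: float) -> str:
    """Most likely ancestral base for one aligned column of two children.

    Assumption (hypothetical): a gap '-' in a child is treated as missing
    data, i.e. it contributes a factor of 1 to the likelihood, and a column
    gapped in both children stays gapped in the ancestor.
    """
    for ch in (x, y):
        if ch not in NUCLEOTIDES + "-":
            raise ValueError(f"unsupported character: {ch!r}")
    if x == "-" and y == "-":
        return "-"
    best_base, best_lik = "", -1.0
    for a in NUCLEOTIDES:
        lik = 0.25  # uniform root (stationary) frequency under Jukes-Cantor
        if x != "-":
            lik *= jc_prob(a, x, t1)
        if y != "-":
            lik *= jc_prob(a, y, t2)
        # Strict '>' keeps the first base in ACGT order when likelihoods tie.
        if lik > best_lik:
            best_base, best_lik = a, lik
    return best_base


def ml_ancestor(aln_x: str, aln_y: str, t1: float, t2: float) -> str:
    """Column-by-column ML ancestor of two equal-length aligned sequences."""
    if len(aln_x) != len(aln_y):
        raise ValueError("aligned sequences must have equal length")
    return "".join(ml_ancestor_column(x, y, t1, t2) for x, y in zip(aln_x, aln_y))


if __name__ == "__main__":
    # Column 3 (G vs C) resolves toward the child on the shorter branch.
    print(ml_ancestor("ACGTA-", "ACCTAG", 0.1, 0.3))
```

When the two children disagree, this model favors the base of the child on the shorter branch, and a base seen in only one child (the other has a gap) is copied through. A full implementation would also have to decide whether a gap column should be an insertion or a deletion in the ancestor, which this sketch deliberately ignores.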

Additional Information

© 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). Accepted November 17, 2003. Received September 10, 2003.

We thank Von Bing Yap for helping with the evolutionary models used in MAVID. Thanks to Ingileif Bryndís Hallgrímsdóttir for her help throughout the project and for her comments on the final manuscript. The data used in the multiple alignment of the CFTR region were generated by the NIH Intramural Sequencing Center (www.nisc.nih.gov) and were used subject to their 6-month hold policy. The HIV sequences were downloaded from the HIV database (hiv-web.lanl.gov). Thanks also to the Rat Sequencing Consortium, both for providing the rat sequence to align and for facilitating helpful collaborations and discussions. Finally, we thank the anonymous reviewers for their insightful comments and suggestions. This work was partially supported by funding from the NIH (grant R01-HG02362-01) and the Berkeley PGA grant from the NHLBI. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Attached Files

Published - 693.full.pdf

Submitted - 0311018.pdf


Additional details

Created: August 19, 2023
Modified: October 24, 2023