MAVID: Constrained ancestral alignment of multiple sequences
- Creators
- Bray, Nicolas
- Pachter, Lior
Abstract
We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which can accurately align multiple genomic regions up to megabases long. MAVID effectively aligns divergent sequences, as well as incomplete, unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.
Additional Information
© 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). Accepted November 17, 2003. Received September 10, 2003.
We thank Von Bing Yap for helping with the evolutionary models used in MAVID. Thanks to Ingileif Bryndís Hallgrímsdóttir for her help throughout the project and for her comments on the final manuscript. The data used in the multiple alignment of the CFTR region was generated by the NIH Intramural Sequencing Center (www.nisc.nih.gov), and was used subject to their 6-mo hold policy. The HIV sequences were downloaded from the HIV database (hiv-web.lanl.gov). Thanks also to the Rat Sequencing Consortium, both for providing the rat sequence to align, and for facilitating helpful collaborations and discussions. Finally, we thank the anonymous reviewers for their insightful comments and suggestions. This work was partially supported by funding from the NIH (grant R01-HG02362-01) and the Berkeley PGA grant from the NHLBI. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
Attached Files
Published - 693.full.pdf
Submitted - 0311018.pdf
Files
Checksum (MD5) | Size |
---|---|
31b1299f135d6f3863c542a9d16a3ddb | 414.4 kB |
0b2c015c263f3380e98da5840c654504 | 240.0 kB |
Additional details
- PMCID
- PMC383315
- Eprint ID
- 74826
- Resolver ID
- CaltechAUTHORS:20170307-074220313
- NIH
- R01-HG02362-01
- National Heart, Lung, and Blood Institute
- Created
- 2017-03-07 (from EPrint's datestamp field)
- Updated
- 2021-11-11 (from EPrint's last_modified field)