Published March 1, 2003
Journal Article Open

SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model

Abstract

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.
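
To make the pair hidden Markov model component mentioned in the abstract more concrete, the following is a minimal sketch of Viterbi decoding for an ordinary three-state pair HMM (match, gap in the second sequence, gap in the first sequence). It is not SLAM's model: a generalized pair HMM would additionally let states such as exons emit variable-length pairs of segments, which this sketch omits. The function name `pair_hmm_viterbi` and all parameter values (`delta`, `epsilon`, `p_same`, `q`) are illustrative assumptions and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def pair_hmm_viterbi(x, y, delta=0.2, epsilon=0.4, p_same=0.2, q=0.25):
    """Most probable alignment of x and y under a toy three-state pair HMM.

    States: M emits an aligned pair, X emits a base of x against a gap,
    Y emits a base of y against a gap. All parameters are hypothetical.
    Returns (log_probability, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)
    # Mismatch probability chosen so the 16 pair emissions sum to 1.
    p_diff = (1.0 - 4.0 * p_same) / 12.0
    log_mm = math.log(1.0 - 2.0 * delta)  # M -> M
    log_mg = math.log(delta)              # M -> X or M -> Y (gap open)
    log_gg = math.log(epsilon)            # X -> X or Y -> Y (gap extend)
    log_gm = math.log(1.0 - epsilon)      # X -> M or Y -> M (gap close)
    log_q = math.log(q)                   # single-base emission in a gap state

    def log_emit(a, b):
        return math.log(p_same if a == b else p_diff)

    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # the begin state is treated like M (an assumption)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                best, prev = max(
                    (V["M"][i - 1][j - 1] + log_mm, "M"),
                    (V["X"][i - 1][j - 1] + log_gm, "X"),
                    (V["Y"][i - 1][j - 1] + log_gm, "Y"),
                )
                V["M"][i][j] = best + log_emit(x[i - 1], y[j - 1])
                back["M"][i][j] = prev
            if i > 0:
                best, prev = max(
                    (V["M"][i - 1][j] + log_mg, "M"),
                    (V["X"][i - 1][j] + log_gg, "X"),
                )
                V["X"][i][j] = best + log_q
                back["X"][i][j] = prev
            if j > 0:
                best, prev = max(
                    (V["M"][i][j - 1] + log_mg, "M"),
                    (V["Y"][i][j - 1] + log_gg, "Y"),
                )
                V["Y"][i][j] = best + log_q
                back["Y"][i][j] = prev

    # Trace back from the best-scoring final state.
    state = max("MXY", key=lambda s: V[s][n][m])
    log_prob = V[state][n][m]
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return log_prob, "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `pair_hmm_viterbi("ACGT", "ACT")` returns the two aligned strings along with the log probability of the best state path. In a gene-finding setting, the single match state would be replaced by states for exons, introns, and conserved noncoding sequence, each with its own emission model, which is the part this sketch leaves out.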

Additional Information

© 2003 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). Received May 13, 2002. Accepted December 3, 2002. We thank Terry Speed and David Kulp for helpful suggestions and support, and James Harley Gorrell for technical computing advice. Marina Alexandersson was supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education. This work was partially supported by NIH grant R01 HG02362-01. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Attached Files

Published - 496.full.pdf

Additional details

Created: August 19, 2023
Modified: October 24, 2023