Scaffolding a Caenorhabditis nematode genome with RNA-seq
Abstract
Efficient sequencing of animal and plant genomes by next-generation technology should allow many neglected organisms of biological and medical importance to be better understood. As a test case, we have assembled a draft genome of Caenorhabditis sp. 3 PS1010 through a combination of direct sequencing and scaffolding with RNA-seq. We first sequenced genomic DNA and mixed-stage cDNA using paired 75-nt reads from an Illumina GAII. A set of 230 million genomic reads yielded an 80-Mb assembly, with a supercontig N50 of 5.0 kb, covering 90% of 429 kb from previously published genomic contigs. Mixed-stage poly(A)+ cDNA gave 47.3 million mappable 75-mers (including 5.1 million spliced reads), which separately assembled into 17.8 Mb of cDNA, with an N50 of 1.06 kb. By further scaffolding our genomic supercontigs with cDNA, we increased their N50 to 9.4 kb, nearly double the average gene size in C. elegans. We predicted 22,851 protein-coding genes, and detected expression in 78% of them. Multigenome alignment and data filtering identified 2672 DNA elements conserved between PS1010 and C. elegans that are likely to encode regulatory sequences or previously unknown ncRNAs. Genomic and cDNA sequencing followed by joint assembly is a rapid and useful strategy for biological analysis.
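The N50 figures quoted above (5.0 kb for genomic supercontigs, 9.4 kb after cDNA scaffolding) follow the usual assembly convention: N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (not data from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, joining supercontigs into longer scaffolds raises it even when the total assembled sequence is unchanged, which is the effect the cDNA scaffolding step relies on.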
Additional Information
© 2010 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). Received May 26, 2010; accepted in revised form August 24, 2010. Published in Advance October 27, 2010. We thank Robin Giblin-Davis for providing PS1010 in 1991, Oren Schaedel for use of his C. elegans L3 RNA-seq data, Todd Ciche and Karin Kiontke for advice on worm culture and RNA extractions, Henry Amrhein and Diane Trout for computational support, and Adler Dillman, Karin Kiontke, Adrienne Roeder, Hillel Schwartz, and Allyson Whittaker for comments on the manuscript. Sequencing was performed in the Millard and Muriel Jacobs Genetics and Genomics Laboratory at Caltech (I.A., L.S.). This work was supported by the Howard Hughes Medical Institute, with which P.W.S. is an Investigator, the Beckman Institute Functional Genomics Center, the Caltech Moore Cell Center, grants HG02223 and HG003162 from the National Human Genome Research Institute, and grant GM084389 from the National Institute of General Medical Sciences.
Attached Files
Published - Mortazavi2010p12233Genome_Res.pdf
Supplemental Material - Mortazavi_FigS1.pdf
Supplemental Material - Mortazavi_FigS2.pdf
Supplemental Material - Mortazavi_Table_S6.xls
Supplemental Material - Supp_Material.doc
Supplemental Material - Supplemental_Files.zip
Files
MD5 checksum | Size |
---|---|
md5:dbbcb0517282c67685e5cdf9fb8957e4 | 588.7 kB |
md5:5ae7ad7b49b665122a4143e4de463e3d | 456.2 kB |
md5:85b74d584aa505299434203272aa2ef9 | 55.8 kB |
md5:2e79209674641b0274c268b84387d906 | 21.9 kB |
md5:b6a6fc1be245547ff09514178d0d29c2 | 23.0 kB |
md5:b3d3125087e41421226102dd7735c674 | 14.6 MB |
Additional details
- PMCID: PMC2990000
- Eprint ID: 21459
- Resolver ID: CaltechAUTHORS:20101220-153301188
- Howard Hughes Medical Institute (HHMI)
- Caltech Beckman Institute
- Caltech Moore Cell Center
- NIH: HG02223
- NIH: HG003162
- NIH: GM084389
- Created: 2011-01-05 (created from EPrint's datestamp field)
- Updated: 2021-11-09 (created from EPrint's last_modified field)