Comparative validation of the D. melanogaster modENCODE transcriptome annotation
Abstract
Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome, an approach that should prove useful in the annotation of other high-priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community.
Additional Information
© 2014 Chen et al. Published by Cold Spring Harbor Laboratory Press. Freely available online through the Genome Research Open Access option. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. Published in Advance July 1, 2014. Received April 29, 2013; accepted in revised form December 2, 2013. We thank modENCODE and laboratory members for discussion. This research was supported by the Intramural Research Programs of the National Institutes of Health, NIDDK (DK015600-18 to B.O.) and by the extramural National Institutes of Health program (1ROIGM082843 to A.K.; U01HB004271 to S.E.C.). This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Maryland (http://biowulf.nih.gov).
Attached Files
Published - Genome_Res.-2014-Chen-1209-23.pdf
Supplemental Material - Supp_File_S1_CAGE_Dmel_FM_carcass.bed
Supplemental Material - Supp_File_S2_CAGE_Dmel_ovary.bed
Supplemental Material - Supp_File_S3_CAGE_Dmel_testis_rep1.bed
Supplemental Material - Supp_File_S4_CAGE_Dmel_testis_rep2.bed
Supplemental Material - Supp_File_S5_CAGE_Dpse_F_carcass.bed
Supplemental Material - Supp_File_S6_CAGE_Dpse_M_carcass.bed
Supplemental Material - Supp_File_S7_CAGE_Dpse_ovary.bed
Supplemental Material - Supp_File_S8_CAGE_Dpse_testes.bed
Supplemental Material - Supplemental_Material.docx
Supplemental Material - Table_S10_intergenic_validation.xls
Supplemental Material - Table_S12_promoter_summary.xls
Supplemental Material - Table_S13_splice_junction_validation.xls
Supplemental Material - Table_S15_splicing_events.xls
Supplemental Material - Table_S16_editing_validation.xls
Supplemental Material - Table_S4_sample_identifiers.xls
Supplemental Material - Table_S5_first_CDS_RPKM.xls
Supplemental Material - Table_S6_CDS_exon_validation.xls
Supplemental Material - Table_S7_UTR_validation.xls
Supplemental Material - Table_S8_ncRNA_validation.xls
Supplemental Material - Table_S9_intron_validation.xls
Files
Checksum | Size |
---|---|
md5:a94486a5e78f02f09b560e86845eca67 | 2.4 MB |
md5:7ef21382a4a7fd3cbdca655b9cb92a4a | 288.3 kB |
md5:badac0721759f28842362f3fc7ad19f4 | 24.6 MB |
md5:a03ccd1b584617f4711006a309f3e85b | 11.7 MB |
md5:427b06767cccd10b8677f3d52ef9e89c | 308.2 kB |
md5:abadf0123ef0a39426d7c21278a5d95f | 362.1 kB |
md5:9f9b0a0b8ac4f1f1f2a2530cc9ef05a9 | 264.7 kB |
md5:a1b02e7d4688e0c56458b479a2a5c211 | 280.6 kB |
md5:c60c21537d1a26775686fce23abc2103 | 87.2 MB |
md5:47e4da98e2950a15235fc5e1fbee7f0c | 5.3 MB |
md5:139c1d53cc2a9e049dec037091a21d8f | 336.8 kB |
md5:4e9926031dc815fc9e7319d1125eef39 | 283.6 kB |
md5:2a9d0c0a61025fe544d626626ff1441d | 14.0 MB |
md5:cabac86292ecd7cfdb516443ca04a88c | 270.1 kB |
md5:299b007f8bcb529044460725bed83841 | 601.9 kB |
md5:2def25784a39d23f4313d34255c2c4b3 | 2.5 MB |
md5:be6fcd46847c0737b5a12c5d9cf05123 | 280.8 kB |
md5:d6666296c18056cee3c41d1cc577fe27 | 91.2 MB |
md5:be295e1a5d5c3f4cd61e8c2e516ad2ed | 572.4 kB |
md5:70748810322240302d27f9ecb66e3185 | 53.2 kB |
md5:4344d15c4af01362cb651cad12dcc1c8 | 55.9 MB |
Additional details
- PMCID: PMC4079975
- Eprint ID: 47692
- Resolver ID: CaltechAUTHORS:20140731-083814156
- NIH: DK015600-18
- NIH: 1ROIGM082843
- NIH: U01HB004271
- Created: 2014-07-31 (created from EPrint's datestamp field)
- Updated: 2021-11-10 (created from EPrint's last_modified field)