Published September 2010 | Supplemental Material + Published
Journal Article Open

A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Abstract

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

Additional Information

© The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received June 2, 2010; Revised July 11, 2010; Accepted July 15, 2010.

Staff at WormBase are gratefully acknowledged. The Austrian Ministry for Science and Research approved the animal experimentation (BMWF-68.205/0103-II/10b/2008) and is also acknowledged. C.C. is in receipt of an International Postgraduate Research Scholarship from the Australian Government and a fee-remission scholarship from The University of Melbourne as well as the Clunies Ross (2008) and Sue Newton (2009) awards from the School of Veterinary Science of the same university.

Funding: The Australian Research Council; Australian Academy of Science; the Australian-American Fulbright Commission (to R.B.G.); National Human Genome Research Institute and National Institutes of Health (to M.M.).

Attached Files

Published - Cantacessi2010p11548Nucleic_Acids_Res.pdf

Supplemental Material - NAR-01194-Met-N-2010_R1_Legends_to_supplementary_material.doc

Supplemental Material - Supplementary_Figure_1.ppt

Supplemental Material - Supplementary_Figure_2_R1.ppt

Supplemental Material - Supplementary_Figure_3_R1.ppt

Supplemental Material - Supplementary_Figure_4_R1.ppt

Supplemental Material - Supplementary_data_file_1_R1.doc

Supplemental Material - Supplementary_data_file_2.xls

Supplemental Material - Supplementary_data_file_3.xls

Additional details

Created: August 19, 2023
Modified: October 20, 2023