Published May 15, 2020 | Supplemental Material + Published
Journal Article Open

Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning

Abstract

Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict such genes with confidence from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; are enriched in reproductive tissues or are targets for small RNAs bound to the Argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including Drosophila melanogaster) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.

Additional Information

© 2020 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Received 23 March 2020, Revised 1 May 2020, Accepted 6 May 2020, Available online 15 May 2020.

This research was funded by grants from the National Health and Medical Research Council (NHMRC) of Australia and the Australian Research Council (ARC) to RBG, PKK and/or NDY. Other support to RBG was from Melbourne Water. NDY was supported by a Career Development Fellowship, and PKK by an Early Career Research Fellowship from NHMRC. TLC was a recipient of a Research Training Program Scholarship from the Australian Government and is also supported by the Oswaldo Cruz Foundation (Fiocruz/Brazil). PWS was supported by U.S. National Institutes of Health grant U24-HG002223.

CRediT authorship contribution statement: Tulio L. Campos: Conceptualization, Methodology, Software, Validation, Data curation, Writing - original draft, Visualization, Investigation, Writing - review & editing. Pasi K. Korhonen: Conceptualization, Supervision, Software, Validation, Visualization, Investigation, Writing - review & editing. Paul W. Sternberg: Visualization, Investigation, Writing - review & editing. Robin B. Gasser: Conceptualization, Supervision, Visualization, Investigation, Writing - review & editing. Neil D. Young: Conceptualization, Supervision, Visualization, Investigation, Writing - review & editing.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data and code availability: The data used herein, the code developed to perform the systematic ML approaches, and information regarding software versions and attached libraries are available at: https://bitbucket.org/tuliocampos/essential_elegans. A static version linked to this publication is available at: https://doi.org/10.6084/m9.figshare.11533101.

Attached Files

Published - 1-s2.0-S2001037020302713-main.pdf

Supplemental Material - 1-s2.0-S2001037020302713-mmc1.zip

Files

Files (10.0 MB)
md5:7bad374252a88cacce196ca79871e731 (2.0 MB)
md5:f49eb44f75c9fdd2a000b5cdc1f92527 (8.0 MB)

Additional details

Created: August 22, 2023
Modified: December 22, 2023