Published March 28, 2023 | Submitted + Supplemental Material
Report | Open access

Predicting phenotype transition probabilities via conditional algorithmic probability approximations

Abstract

Unravelling the structure of genotype-phenotype (GP) maps is an important problem in biology. Recently, arguments inspired by algorithmic information theory (AIT) and Kolmogorov complexity have been invoked to uncover simplicity bias in GP maps: an exponentially decaying upper bound on phenotype probability with increasing phenotype descriptional complexity. This means that phenotypes to which very many genotypes are assigned via the GP map must be simple, while complex phenotypes must have few genotypes assigned. Here we use similar arguments to bound the probability $P(x \to y)$ that phenotype $x$, upon random genetic mutation, transitions to phenotype $y$. The bound is $P(x \to y) \lesssim 2^{-a\tilde{K}(y|x) - b}$, where $\tilde{K}(y|x)$ is the estimated conditional complexity of $y$ given $x$, quantifying how much extra information is required to make $y$ given access to $x$. This upper bound is related to the conditional form of algorithmic probability from AIT. We demonstrate the practical applicability of our derived bound by predicting phenotype transition probabilities (and other related quantities) in simulations of RNA and protein secondary structures. Our work contributes to a general mathematical understanding of GP maps, and may also facilitate the prediction of transition probabilities directly from examining the phenotypes themselves, without utilising detailed knowledge of the GP map.
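As a rough illustration of how the bound might be evaluated in practice, the sketch below estimates $\tilde{K}(y|x)$ with a general-purpose compressor using the chain-rule approximation $\tilde{K}(y|x) \approx C(xy) - C(x)$, where $C$ is compressed length in bits. Both that approximation and the choice of Python's `zlib` as the compressor are assumptions made for illustration; they are not necessarily the complexity estimator used in the paper. The helper names (`compressed_bits`, `conditional_complexity`, `transition_upper_bound`), the toy dot-bracket string and the default values $a = 1$, $b = 0$ are hypothetical; in practice $a$ and $b$ would have to be fitted to data.

```python
import random
import zlib


def compressed_bits(s: str) -> int:
    """Length in bits of the zlib-compressed string (a crude complexity proxy)."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


def conditional_complexity(y: str, x: str) -> int:
    """Estimate K~(y|x) as C(xy) - C(x), clamped at zero.

    Assumption: compressor-based chain-rule approximation, with zlib standing
    in for whatever complexity estimator one actually prefers.
    """
    return max(0, compressed_bits(x + y) - compressed_bits(x))


def transition_upper_bound(y: str, x: str, a: float = 1.0, b: float = 0.0) -> float:
    """Upper bound 2^(-a*K~(y|x) - b) on P(x -> y).

    a and b are hypothetical placeholder constants; they would need fitting.
    """
    k = conditional_complexity(y, x)
    return 2.0 ** (-a * k - b)


if __name__ == "__main__":
    # Hypothetical example: a periodic dot-bracket structure and an unrelated
    # random string over the same alphabet (generally not a valid structure).
    x = "((((....))))" * 20
    rng = random.Random(0)
    y_random = "".join(rng.choice("().") for _ in range(len(x)))

    print("K~(x|x)      =", conditional_complexity(x, x), "bits")
    print("K~(random|x) =", conditional_complexity(y_random, x), "bits")
    print("bound P(x -> x)      <=", transition_upper_bound(x, x))
    print("bound P(x -> random) <=", transition_upper_bound(y_random, x))
```

Run as a script, it reports a much smaller conditional complexity for an exact copy of `x` than for an unrelated random string, so the bound permits the transition to `x` itself to be far more probable than the transition to the random string. For strings this short, zlib's fixed overhead makes the estimates coarse; a complexity measure tailored to short strings would likely be preferable.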

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

This project has been partially supported by Gulf University for Science and Technology under project code ISG-Case (grant number 263301) and a Summer Faculty Fellowship (both awarded to KD). This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/T022159/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk).

Data availability: The data sets generated and analysed during the current study are available from the corresponding author(s) on request.

Author contributions: Conceived the study: KD, JN, SA, AL. Analytic calculations: KD. Simulations and data analysis: KD, JN. Wrote the paper: KD, JN, SA, AL.

The authors have declared no competing interest.

Attached Files

Submitted - 2022.09.21.508902v2.full.pdf

Supplemental Material - media-1.pdf

Files (1.4 MB)

md5:ae267c1a9a911afb43dacba80a62c87a, 766.4 kB
md5:1535e5b69524d0986bce7ab90cb0e8e0, 673.4 kB

Additional details

Created: August 20, 2023
Modified: October 18, 2023