Published July 29, 2010
Journal Article Open

Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study

Abstract

Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy: a "synthetic association study" in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. 226 of these polymorphic gene models (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms.
However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes.
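
The design described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: no gene predictor is run, the SNP count is reduced to 20, and the two "gene model" phenotypes are hypothetical stand-ins chosen to show one single-SNP effect and one epistatic effect. The function names `sample_genotypes` and `perfectly_associated_snps` are likewise assumptions made for illustration.

```python
import random


def sample_genotypes(n_genomes, n_snps, rng):
    """Build synthetic genomes: at every SNP, pick allele 0 or 1 at random.

    0 and 1 stand for the allele carried by each of the two parental strains.
    """
    return [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_genomes)]


def perfectly_associated_snps(genotypes, phenotype):
    """Return indices of SNPs whose allele perfectly predicts a binary phenotype.

    A SNP counts as a hit if its allele equals the phenotype in every genome,
    or differs from it in every genome (the same association, allele labels swapped).
    """
    hits = []
    for j in range(len(genotypes[0])):
        column = [genome[j] for genome in genotypes]
        same = all(a == p for a, p in zip(column, phenotype))
        flipped = all(a != p for a, p in zip(column, phenotype))
        if same or flipped:
            hits.append(j)
    return hits


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genotypes = sample_genotypes(n_genomes=500, n_snps=20, rng=rng)

# Hypothetical phenotypes standing in for "did the predicted gene model change?"
simple_gene = [g[3] for g in genotypes]              # driven by SNP 3 alone
epistatic_gene = [g[5] & g[11] for g in genotypes]   # changes only if SNPs 5 and 11 both carry allele 1

print(perfectly_associated_snps(genotypes, simple_gene))
print(perfectly_associated_snps(genotypes, epistatic_gene))
```

Because alleles are sampled independently at each position, a single-SNP effect shows up as a perfect correlation with exactly one SNP, while the epistatic phenotype matches no individual SNP and would need a search over SNP combinations to explain.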

Additional Information

© 2010 Levin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Received: February 25, 2010; Accepted: June 8, 2010; Published: July 29, 2010. Author Contributions: Conceived and designed the experiments: TL AMG MBE. Performed the experiments: TL AMG. Analyzed the data: TL AMG. Contributed reagents/materials/analysis tools: TL AMG. Wrote the paper: TL AMG RB MBE. Supervised the research: MBE LP RB. The authors have no support or funding to report. Competing interests: MBE is a member of the PLoS Board of Directors. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.

Attached Files

Published - journal.pone.0011645.PDF

Additional details

Created: August 19, 2023
Modified: October 24, 2023