Multi-context genetic modeling of transcriptional regulation resolves novel disease loci

Creators: Thompson, Mike; Gordon, Mary Grace; Lu, Andrew; Tandon, Anchit; Halperin, Eran; Gusev, Alexander; Ye, Chun Jimmie; Balliu, Brunilda; Zaitlen, Noah

Abstract

A majority of the variants identified in genome-wide association studies fall in non-coding regions of the genome, suggesting that their effects are mediated through gene expression. Leveraging this hypothesis, transcriptome-wide association studies (TWAS) have assisted in both the interpretation and discovery of additional genes associated with complex traits. However, existing methods for conducting TWAS do not take full advantage of the intra-individual correlation inherently present in multi-context expression studies and do not properly adjust for multiple testing across contexts. We developed CONTENT, a computationally efficient method with proper cross-context false discovery correction that leverages correlation structure across contexts to improve power and generate context-specific and context-shared components of expression. We applied CONTENT to bulk multi-tissue and single-cell RNA-seq data sets and show that CONTENT leads to a 42% (bulk) and 110% (single cell) increase in the number of genetically predicted genes relative to previous approaches. Interestingly, we find that the context-specific component of expression comprises 30% of heritability in tissue-level bulk data and 75% in single-cell data, consistent with cell type heterogeneity in bulk tissue. In the context of TWAS, CONTENT increased the number of gene-phenotype associations discovered by over 47% relative to previous methods across 22 complex traits.
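
The abstract does not spell out how the context-shared and context-specific components are formed. As an illustration only, the sketch below assumes one simple decomposition: the shared component is each individual's mean expression across the contexts in which the gene was measured, and the specific component is the remainder. This is an assumption made for the example, not a description of the CONTENT software; the function name `decompose_expression` and the toy values are hypothetical.

```python
import numpy as np

def decompose_expression(expr):
    """Split an (individuals x contexts) matrix for one gene into a
    context-shared vector and a context-specific matrix.

    Assumption (not taken from the source): shared = per-individual mean
    over measured contexts (NaN marks a missing context); specific = the
    remainder after subtracting that mean.
    """
    expr = np.asarray(expr, dtype=float)
    shared = np.nanmean(expr, axis=1)        # one value per individual
    specific = expr - shared[:, None]        # same shape as expr
    return shared, specific

# Toy data: 3 individuals, 3 contexts, one missing measurement.
expr = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [0.5, 0.5, 0.5]])
shared, specific = decompose_expression(expr)
```

Under this assumed decomposition, each individual's context-specific values sum to zero over the measured contexts, so the two parts can be modeled separately without double counting.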

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted September 23, 2021.

Code and data availability: Trained weights for the GTEx V7 dataset and our in-house single-cell RNA-seq data are available at TWAShub (http://twas-hub.org/). The CONTENT software is freely available at https://github.com/cozygene/CONTENT. We provide TWAS summary statistics for all three methods on both datasets (as well as an indicator of whether the association was significant after hierarchical FDR adjustment) at doi.org/10.5281/zenodo.5209239.

Author contributions: NZ and BB conceived of the project and developed the statistical methods with MT. MT implemented the comparisons with simulated data with contributions from AT. MT, AL, and MGG performed the analyses of the GTEx and CLUES data and additional analyses. MT implemented the software. MT, NZ, and BB wrote the manuscript, with significant input from EH, CJY, AG, and MGG. AG prepared the online data resources.

Conflicts of interest: CJY is a Scientific Advisory Board member for and holds equity in Related Sciences and ImmunAI. CJY is a consultant for and holds equity in Maze Therapeutics. CJY is a consultant for TReX Bio. CJY has received research support from Chan Zuckerberg Initiative, Chan Zuckerberg Biohub, and Genentech.

Attached Files

Submitted - 2021.09.23.461579v1.full.pdf

Files

2021.09.23.461579v1.full.pdf (3.9 MB), md5:cef83db79ecbfc969403859ddfde2c00
