Expression reflects population structure
Abstract
Population structure in genotype data has been extensively studied and is revealed by the principal components of the genotype matrix. However, no comparable analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster samples by population. We identify a linear projection that reveals population structure in gene expression data. Our approach couples the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method also determines the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had previously been reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs, and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis: using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but these genes replicate in matched GTEx tissue at a slightly higher rate.
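The coupling of genotype principal components to expression principal components can be illustrated with a minimal sketch. It is not the authors' PCCA software. It assumes a simple pipeline: compute the top principal component scores of the genotype and expression matrices, then run canonical correlation analysis (CCA) between the two score matrices. It uses simulated toy data. The numbers of PCs (5 and 10), the simulation parameters, and the helper names `pc_scores` and `cca` are hypothetical choices made for illustration. The per-gene significance test mentioned above is not reproduced.

```python
import numpy as np


def pc_scores(M, k):
    """Return the top-k principal component scores of a samples-by-features matrix."""
    Mc = M - M.mean(axis=0)  # center each feature (column)
    U, S, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k] * S[:k]  # n x k matrix of PC scores


def cca(X, Y):
    """Canonical correlation analysis via QR + SVD.

    Assumes X and Y have full column rank. Returns the canonical correlations
    (in decreasing order) and the paired canonical variates, each with unit norm.
    """
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))  # orthonormal basis of centered X
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))  # orthonormal basis of centered Y
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return rho, Qx @ U, Qy @ Vt.T


# Toy data (hypothetical): two populations, genotypes with population-specific
# allele frequencies, and expression dominated by a large non-population factor.
rng = np.random.default_rng(0)
n_per_pop, n_snps, n_genes = 100, 500, 300
n = 2 * n_per_pop
pop = np.repeat([0, 1], n_per_pop)                   # population label per sample
freqs = rng.uniform(0.1, 0.9, size=(2, n_snps))      # allele frequency per population
G = rng.binomial(2, freqs[pop]).astype(float)        # genotypes coded 0/1/2
shift = 0.3 * rng.normal(size=n_genes)               # small population effect per gene
factor = 3 * np.outer(rng.normal(size=n), rng.normal(size=n_genes))  # unrelated factor
E = pop[:, None] * shift + factor + rng.normal(size=(n, n_genes))

# Couple genotype PCs to expression PCs; v[:, 0] is the expression-side projection.
rho, u, v = cca(pc_scores(G, 5), pc_scores(E, 10))
print("canonical correlations:", np.round(rho, 3))
```

The QR-then-SVD route is one standard way to compute CCA: the singular values of the product of the two orthonormal bases are the canonical correlations. In this toy setup the first expression PC tracks the unrelated factor rather than population. The first expression-side canonical variate `v[:, 0]` does separate the two simulated populations.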
Additional Information
© 2018 Brown et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Received: July 30, 2018; Accepted: November 20, 2018; Published: December 19, 2018.
The authors would like to thank Shannon McCurdy for invaluable feedback on this manuscript.
LP and NB were funded by National Institutes of Health grant R01HG008164. LP was also funded by National Institutes of Health grant DK094699. BB was funded by the National Science Foundation Graduate Research Fellowship Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: GEUVADIS project RNA-seq reads are available at the European Nucleotide Archive (accession number ENA: ERP001942). 1000 Genomes genotypes are available from cog-genomics (https://www.cog-genomics.org/plink/1.9/resources#1kg). Analysis software is available on GitHub (https://github.com/pachterlab/PCCA/). Gencode v27 transcripts are available at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.pc_transcripts.fa.gz. The Gencode v27 GTF is available at ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz.
The authors have declared that no competing interests exist.
Attached Files
Published - journal.pgen.1007841.pdf
Submitted - 364448.full.pdf
Supplemental Material - journal.pgen.1007841.s001.pdf
Supplemental Material - journal.pgen.1007841.s002.png
Supplemental Material - journal.pgen.1007841.s003.png
Supplemental Material - journal.pgen.1007841.s004.png
Supplemental Material - journal.pgen.1007841.s005.png
Supplemental Material - journal.pgen.1007841.s006.png
Supplemental Material - journal.pgen.1007841.s007.png
Supplemental Material - journal.pgen.1007841.s008.png
Supplemental Material - journal.pgen.1007841.s009.png
Files
MD5 checksum | Size |
---|---|
md5:3b6b32881ede4aa3ba523597ff7824eb | 200.9 kB |
md5:ca3b5a6cfd5b4b64eaf33b321d520f52 | 165.5 kB |
md5:f213869b161bce9d35ddb03711963336 | 428.7 kB |
md5:7b4a27c1029f1cf979313f5d30592437 | 1.6 MB |
md5:8a8a72936e0c66eeeb375097b9c7deff | 329.3 kB |
md5:98e6634008683d982174960c1ec2a413 | 193.4 kB |
md5:70ebb75f6cda9d97f2070f3ff660adc6 | 459.5 kB |
md5:2483c0994d3574b4408684b2eb016f18 | 1.6 MB |
md5:2e02836de1cd3beeb7e469fd02b3da7d | 1.0 MB |
md5:c9d3faf1ec1803592862bb3a13d4bf43 | 1.1 MB |
md5:db5f4ce12836d142da129b8915c01b3c | 502.5 kB |
Additional details
- PMCID: PMC6317812
- Eprint ID: 90174
- Resolver ID: CaltechAUTHORS:20181008-162020262
- Funding: R01 HG008164 (NIH); DK094699 (NIH); NSF Graduate Research Fellowship
- Created: 2018-10-09 (from EPrint's datestamp field)
- Updated: 2023-06-01 (from EPrint's last_modified field)