Factorized linear discriminant analysis and its application in computational biology
- Creators
- Qiao, Mu
- Meister, Markus
Abstract
A fundamental problem in computational biology is to find a suitable representation of high-dimensional gene expression data that is consistent with the structural and functional properties of cell types, collectively called their phenotypes. This representation is often sought from a linear transformation of the original data, for reasons of model interpretability and computational simplicity. Here we propose a novel method of linear dimensionality reduction to address this problem. This method, which we call factorized linear discriminant analysis (FLDA), seeks a linear transformation of gene expression that varies highly with only one phenotypic feature and minimally with the others. We further combine our approach with a sparsity-based regularization algorithm, which selects a few genes important to a specific phenotypic feature or feature combination. We illustrate this approach by applying it to a single-cell transcriptome dataset of Drosophila T4/T5 neurons. A representation from FLDA captured structures in the data aligned with phenotypic features and revealed critical genes for each phenotype.
Attached Files
Submitted - 2010.02171.pdf
Files
Checksum | Size |
---|---|
md5:32aee2c903a9c6860ab9b1e9da0614b9 | 3.7 MB |
Additional details
- Alternative title
- Factorized linear discriminant analysis for phenotype-guided representation learning of neuronal gene expression data
- Eprint ID
- 107327
- Resolver ID
- CaltechAUTHORS:20210105-133427535
- Created
- 2021-01-06 (created from EPrint's datestamp field)
- Updated
- 2023-06-02 (created from EPrint's last_modified field)
- Caltech groups
- Division of Biology and Biological Engineering (BBE)