Published January 6, 2021 | Submitted
Report Open

Factorized linear discriminant analysis and its application in computational biology

Abstract

A fundamental problem in computational biology is to find a suitable representation of high-dimensional gene expression data that is consistent with the structural and functional properties of cell types, collectively called their phenotypes. This representation is often sought as a linear transformation of the original data, for reasons of model interpretability and computational simplicity. Here we propose a novel method of linear dimensionality reduction to address this problem. This method, which we call factorized linear discriminant analysis (FLDA), seeks a linear transformation of gene expression that varies strongly with one phenotypic feature and minimally with the others. We further combine this approach with a sparsity-based regularization algorithm, which selects a few genes important to a specific phenotypic feature or feature combination. We illustrated the approach by applying it to a single-cell transcriptome dataset of Drosophila T4/T5 neurons. The representation obtained from FLDA captured structures in the data aligned with phenotypic features and revealed critical genes for each phenotype.
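
The core idea in the abstract, a linear axis that varies strongly with one phenotypic feature and little with the others, can be illustrated with a small sketch. This is not the objective or algorithm from the report: it is a simplified stand-in that assumes two categorical features, maximizes the between-level scatter of the target feature against the other feature's scatter plus within-type scatter, and ignores interaction terms and the sparsity regularization. The function names (`between_scatter`, `within_scatter`, `feature_axis`), the `weight` and `ridge` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def between_scatter(X, labels):
    """Scatter of the per-level means of `labels` around the grand mean."""
    grand = X.mean(axis=0)
    S = np.zeros((X.shape[1], X.shape[1]))
    for level in np.unique(labels):
        rows = X[labels == level]
        diff = rows.mean(axis=0) - grand
        S += len(rows) * np.outer(diff, diff)
    return S


def within_scatter(X, a, b):
    """Scatter of cells around the mean of their own (a, b) cell type."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for la in np.unique(a):
        for lb in np.unique(b):
            rows = X[(a == la) & (b == lb)]
            if len(rows) == 0:
                continue  # this feature combination is not observed
            centered = rows - rows.mean(axis=0)
            S += centered.T @ centered
    return S


def feature_axis(X, target, other, weight=1.0, ridge=1e-6):
    """Unit vector u maximizing (u' S_target u) / (u' D u), where
    D = weight * S_other + S_within + ridge * I (an assumed simplification)."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target)
    other = np.asarray(other)
    numer = between_scatter(X, target)
    denom = weight * between_scatter(X, other) + within_scatter(X, target, other)
    denom += ridge * np.eye(X.shape[1])  # keeps the denominator positive definite
    # Solve the generalized symmetric eigenproblem by Cholesky whitening.
    L = np.linalg.cholesky(denom)
    L_inv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(L_inv @ numer @ L_inv.T)
    u = L_inv.T @ vecs[:, -1]  # eigh sorts eigenvalues in ascending order
    return u / np.linalg.norm(u)


# Synthetic example: 5 "genes", two binary features, 50 cells per cell type.
rng = np.random.default_rng(0)
a = np.repeat([0, 0, 1, 1], 50)
b = np.repeat([0, 1, 0, 1], 50)
X = rng.normal(size=(200, 5))
X[:, 0] += 3.0 * a  # gene 0 tracks feature a
X[:, 1] += 3.0 * b  # gene 1 tracks feature b

u_a = feature_axis(X, a, b)  # axis aligned with feature a
u_b = feature_axis(X, b, a)  # axis aligned with feature b
```

In this toy setting `u_a` should load mainly on gene 0 and `u_b` mainly on gene 1, so projecting the data onto each axis separates the levels of one feature while largely ignoring the other.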

Attached Files

Submitted - 2010.02171.pdf

Additional details

Created:
August 19, 2023
Modified:
December 22, 2023