64th ISI World Statistics Congress - Ottawa, Canada

Integrating multiple single-cell RNA-seq datasets for differential inference

Author

Fangda SONG

Co-author

  • Kelvin Y. Yip
  • Yingying Wei

Conference

64th ISI World Statistics Congress - Ottawa, Canada

Format: IPS Abstract

Keywords: Bayesian hierarchical model, data integration, genomics

Session: IPS 400 - Recent advances of large-scale data integration and meta-analysis

Monday 17 July 2 p.m. - 3:40 p.m. (Canada/Eastern)

Abstract

Single-cell RNA-seq (scRNA-seq) data are well known for severe batch effects; consequently, integrating multiple scRNA-seq datasets for joint cell type clustering has been a very active research area over the past several years. However, joint clustering across scRNA-seq datasets typically ignores the treatment or biological conditions of the cells, that is, whether the cells came from a healthy individual or a case sample. As a result, rigorous statistical methods to integrate and compare scRNA-seq data collected under different conditions are still lacking. Here, we propose a Bayesian hierarchical model that rigorously quantifies treatment effects on both cellular compositions and cell-type-specific gene expression levels for scRNA-seq data. We implement a highly scalable algorithm to handle the large number of cells. Applying the proposed method to four pancreatic scRNA-seq datasets demonstrates that accounting for the biological conditions of samples further improves clustering accuracy compared with traditional scRNA-seq analysis pipelines and identifies cell-type-specific and condition-specific differentially expressed genes.