64th ISI World Statistics Congress - Ottawa, Canada


IPS 400 - Recent advances of large-scale data integration and meta-analysis

Category: IPS
Monday 17 July 2 p.m. - 3:40 p.m. (Canada/Eastern) Room 208


With advances in technology and increases in computational speed, the need to analyze large-scale data has emerged. Multi-cohort, multi-source and multi-modal datasets often need to be combined for integrated clustering, increased statistical power, and reduced bias in estimating treatment or causal effects, among many other analytical purposes. We therefore propose an invited session that brings leading researchers together to share their recent research advances and discuss ideas and important issues in these fields. Speakers will present their latest work in data integrative analysis and meta-analysis, providing an opportunity for interaction and brainstorming among researchers in these two fast-evolving and growing fields.

Presenters are from the United States, Hong Kong and Taiwan.

1. Presenter: Dr. Fangda Song  
Title: Integrating multiple single-cell RNA-seq datasets for differential inference
 
When multiple single-cell RNA-seq (scRNA-seq) datasets are integrated for joint cell type clustering, the treatment or biological conditions of the cells are typically ignored. Here, we propose a Bayesian hierarchical model that rigorously quantifies treatment effects on both cellular compositions and cell-type-specific gene expression levels for scRNA-seq data, together with an algorithm that handles the large number of cells. Applying the proposed method to pancreatic scRNA-seq datasets demonstrates that accounting for biological conditions further boosts clustering accuracy and identifies cell-type-specific and condition-specific differentially expressed genes.
 
2. Presenter: George Tseng (Prof., University of Pittsburgh)
Title:  On p-value combination of independent and frequent signals: asymptotic efficiency and Fisher ensemble
 
Here we revisit a classical scenario of p-value combination: combining a small number of p-values while the sample size generating each p-value goes to infinity. We evaluate many traditional and recently developed modified Fisher's methods to investigate their asymptotic efficiencies and finite-sample performance, and conclude that Fisher's method and the adaptively weighted Fisher method have top performance and complementary advantages across different proportions of signals. We then propose a so-called Fisher ensemble method that combines these two Fisher-related methods using the harmonic mean ensemble approach, and show that it achieves asymptotic Bahadur optimality and integrates the strengths of both methods in simulations. We subsequently extend the Fisher ensemble to concordant effect size directions. A transcriptomics meta-analysis application confirms the theoretical and simulation conclusions.
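
For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of the classical Fisher's method for independent p-values. It is not the Fisher ensemble or the adaptively weighted Fisher method proposed by the speaker; the function name `fisher_combine` is hypothetical and chosen only for illustration.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with the classical Fisher's method.

    Illustrative sketch only; assumes the p-values are independent
    and each lies in (0, 1].
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Half of the Fisher statistic T = -2 * sum(log p_i).
    half_t = -sum(math.log(p) for p in p_values)

    # Under the null, T follows a chi-square distribution with 2k degrees
    # of freedom. For even degrees of freedom the survival function has a
    # closed form: P(T > t) = exp(-t/2) * sum_{i<k} (t/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_t / i
        total += term
    return math.exp(-half_t) * total
```

With a single p-value the function returns that p-value unchanged, and with two p-values it reduces to p1 * p2 * (1 - log(p1 * p2)), which gives a quick sanity check.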
 
 
3. Presenter: Ming-Chieh Shih (Prof., National Dong-Hua University)
Title: Validation of observational data evidence for treatment effects with randomized clinical trials.
 
Randomized clinical trials provide unbiased treatment effect estimates by design; however, the inclusion criteria of randomized clinical trials are often limited. Therefore, for certain target populations, one must turn to observational studies to infer treatment effects, at the risk of bias within these observations. Here we propose a test that validates the conditional average treatment effect estimates from observational studies using randomized clinical trial data, based on a maximum moment restrictions approach. We show that this test has asymptotic power of one and demonstrate its properties using real-world data from the Women's Health Initiative.
 
4. Presenter: Dr. Chung Chang
Title: Heavy-tailed distribution for combining dependent p-values with asymptotic robustness

Combining individual p-values to aggregate multiple small effects is a longstanding statistical topic. Many classical methods are designed for combining independent and frequent signals using the sum of p-values transformed by light-tailed distributions, of which Fisher’s method and Stouffer’s method are the best known. In recent years, advances in big data have motivated methods to aggregate correlated, sparse and weak signals; among them, the Cauchy and harmonic mean combination tests were proposed to robustly combine p-values under an unspecified dependency structure. Both tests use transformations by heavy-tailed distributions, which improve power for sparse signals. Motivated by this observation, we investigate transformations by regularly varying distributions, a rich family of heavy-tailed distributions, to explore the conditions under which a method possesses robustness to dependency and optimality of power for sparse signals. We show that only an equivalent class of Cauchy and harmonic mean tests has sufficient robustness to dependency in a practical sense. Moreover, we provide a practical guideline for adjusting the significance level under dependency, based on our theorem and simulation. We also show an issue caused by a large negative penalty in the Cauchy method and propose a simple yet practical modification with fast computation. Finally, we present simulations and an application to a neuroticism GWAS to verify the discovered theoretical insights.
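
As background for the two heavy-tailed tests named in this abstract, the sketch below shows the standard Cauchy combination statistic and the raw (uncalibrated) harmonic mean of p-values. It does not implement the speaker's proposed modification or the significance-level adjustment; the function names and the default of equal weights are assumptions made for illustration.

```python
import math


def _equal_weights(k):
    # Hypothetical helper: equal weights summing to one.
    return [1.0 / k] * k


def cauchy_combine(p_values, weights=None):
    """Cauchy combination test: map each p-value to a Cauchy variate,
    average with weights summing to one, and map back to a p-value."""
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = _equal_weights(len(p_values))
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of p-values (no calibration applied)."""
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = _equal_weights(len(p_values))
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))
```

In the Cauchy statistic, a p-value very close to 1 maps to a very large negative term, so a single such value can mask a small p-value elsewhere. For example, `cauchy_combine([1e-4, 1 - 1e-6])` returns a value close to 1. This appears to be the kind of large negative penalty the abstract refers to.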
 

Organiser: Dr Chung Chang 

Chair: Dr Chung Chang 

Speaker: Dr. Fangda Song  

Speaker: Prof. George C. Tseng 

Speaker: Dr Chung Chang 

Speaker: Prof. Ming-Chieh Shih 
