64th ISI World Statistics Congress - Ottawa, Canada

An empirically adjusted weighted ordered p-values method for meta-analysis in large-scale simultaneous hypothesis testing

Author

Sinjini Sikdar

Co-author

  • Wimarsha Jayanetti

Conference

64th ISI World Statistics Congress - Ottawa, Canada

Format: CPS Abstract

Keywords: meta-analysis

Session: CPS 57 - Statistical testing

Tuesday 18 July 4 p.m. - 5:25 p.m. (Canada/Eastern)

Abstract

Recent developments in high-throughput genomic assays have made it possible to test hundreds or thousands of genes simultaneously. With the vast number of public databases now available, one can easily access results from multiple genomic studies, each comprising significance tests for thousands of genes. Researchers now often combine the information from these studies in a meta-analysis. Most traditional meta-analysis methods combine summary results to find signals in at least one of the studies. Often, however, the goal is to identify genes that are differentially expressed in a consistent pattern across multiple studies. Recently, a meta-analysis method based on summaries of weighted ordered p-values (WOP) has been proposed that aims at detecting significance in a majority of studies. In the presentation, I will discuss how adherence to the standard null distributional assumptions of the WOP method can lead to incorrect significance testing results. To overcome this, I will propose a robust meta-analysis method that empirically modifies the individual p-values before combining them through the WOP approach. Through various simulation studies, I will show that the proposed method outperforms the WOP method in accurately identifying the truly significant set of genes by reducing false discoveries, especially in the presence of unobserved confounding variables. I will illustrate the application of my method on real genomic datasets.
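
To make the two ingredients named in the abstract more concrete, the following Python sketch shows a simplified stand-in for each. It is not the WOP statistic (which additionally applies weights) and not the adjustment proposed in the talk, since the abstract specifies neither. The function names `empirical_null_adjust` and `ordered_pvalue_test`, the median/MAD null estimate, and the use of one-sided p-values are all assumptions made for illustration. The only distributional fact relied on is that the r-th smallest of K independent Uniform(0,1) p-values has a Beta(r, K-r+1) distribution, whose CDF equals a binomial tail probability.

```python
from math import comb
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal, used for p <-> z conversion


def empirical_null_adjust(pvals, eps=1e-12):
    """Re-calibrate one study's one-sided p-values against an empirical null.

    Assumption (illustrative only): most genes are null, so the centre and
    spread of the z-scores can be estimated robustly by the median and the
    scaled median absolute deviation (MAD).
    """
    # Map p-values to z-scores; clip to keep inv_cdf inside (0, 1).
    z = [STD_NORMAL.inv_cdf(min(max(p, eps), 1 - eps)) for p in pvals]
    mu = median(z)
    sigma = 1.4826 * median([abs(v - mu) for v in z])  # MAD scaled for normal data
    if sigma == 0:
        return list(pvals)  # degenerate spread: leave p-values unchanged
    # Recompute p-values relative to the estimated null N(mu, sigma^2).
    return [STD_NORMAL.cdf((v - mu) / sigma) for v in z]


def ordered_pvalue_test(pvals_across_studies, r):
    """Unweighted test based on the r-th smallest of K study p-values.

    Under the global null with independent Uniform(0,1) p-values,
    P(U_(r) <= x) = P(Binomial(K, x) >= r).
    """
    k = len(pvals_across_studies)
    x = sorted(pvals_across_studies)[r - 1]
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))
```

In this sketch, each study's p-values would first pass through `empirical_null_adjust`, and the adjusted values for a given gene would then be combined across the K studies. Choosing `r = K // 2 + 1` corresponds to asking for evidence in a majority of studies.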