IPS 202 - Advances in Bayesian Hierarchical Modeling and Variable Selection for Complex Data

Category: IPS

Tuesday 18 July, 2 p.m. - 3:40 p.m. (Canada/Eastern), Room 101

A large body of literature has shown that Bayesian hierarchical modeling and variable selection techniques enable efficient and interpretable data analysis. However, these tasks become highly challenging for complex data (ultrahigh-dimensional data, big data, or data with small sample sizes).

This session presents recent advances in this important area, with four talks covering theory, methodology, and application. It includes four confirmed speakers and one chair, covering different career stages, gender, and countries (2 assistant, 1 associate and 2 full professors, 2 countries):

(i) Dr. Sameer Deshpande, Assistant Professor in Statistics, University of Wisconsin–Madison, USA; Talk title: “The multivariate spike-and-slab LASSO: Algorithms, asymptotics, and inference”

(ii) Dr. Gang Han, Associate Professor in Biostatistics, School of Public Health, Texas A&M University, USA; Talk title: “Bayesian-frequentist Hybrid Inference in Applications with Small Sample Sizes”

(iii) Dr. Johan Lim, Professor in Statistics, Seoul National University, Seoul, South Korea; Talk title: “Variable Selection in Bayesian Multiple Instance Regression using Shotgun Stochastic Search”

(iv) Dr. Sherry Wang, Professor in Statistics and Data Science, Southern Methodist University, USA; Talk title: “Bayesian Empirical Likelihood with Dual Penalties for Ultra-high Dimensional Data”

The chair will be Dr. Yusen Xia, Business and Data Science, Institute for Insights, Georgia State University, USA. We thank the SPC for taking the time to review our proposal and wish you a very successful and productive ISI2023!

1. The multivariate spike-and-slab LASSO: Algorithms, asymptotics, and inference

We consider multivariate linear regression models to predict q correlated responses (of possibly mixed type) using a common set of p predictors. Our interest lies not only in determining whether a particular predictor has a direct or marginal effect on each response but also in understanding the residual dependence between the outcomes. We propose a Bayesian procedure for making these determinations using continuous spike-and-slab priors. Rather than relying on a stochastic search through the high-dimensional parameter space, we develop an Expectation Conditional Maximization algorithm targeting modal estimates of the matrix of regression coefficients and the residual precision matrix. A key feature of our method is that it explicitly models our uncertainty about which parameters are negligible. We further derive posterior contraction rates and discuss several strategies for quantifying posterior uncertainty.
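As a rough illustration of how a continuous spike-and-slab prior can model uncertainty about which coefficients are negligible, the sketch below assumes the common two-component Laplace form: a sharp "spike" and a diffuse "slab" mixed with weight theta. The abstract does not give the prior's exact form, so the function names, the rate parameters, and the adaptive-penalty formula here are illustrative assumptions rather than the speaker's implementation.

```python
import math

def laplace_density(beta, rate):
    """Density of a zero-mean Laplace distribution with the given rate."""
    return 0.5 * rate * math.exp(-rate * abs(beta))

def slab_probability(beta, theta, spike_rate, slab_rate):
    """Conditional probability that beta came from the slab rather than the spike.

    theta is the prior mixing weight on the slab (an assumed hyperparameter).
    """
    slab = theta * laplace_density(beta, slab_rate)
    spike = (1.0 - theta) * laplace_density(beta, spike_rate)
    return slab / (slab + spike)

def adaptive_penalty(beta, theta, spike_rate, slab_rate):
    """Coefficient-specific L1 penalty: a probability-weighted mix of the two rates."""
    p_slab = slab_probability(beta, theta, spike_rate, slab_rate)
    return p_slab * slab_rate + (1.0 - p_slab) * spike_rate

if __name__ == "__main__":
    for beta in (0.0, 0.1, 0.5, 2.0):
        p = slab_probability(beta, theta=0.5, spike_rate=20.0, slab_rate=1.0)
        lam = adaptive_penalty(beta, theta=0.5, spike_rate=20.0, slab_rate=1.0)
        print(f"beta={beta:4.1f}  P(slab)={p:.4f}  penalty={lam:.3f}")
```

Under these assumptions, coefficients near zero receive a penalty close to the spike rate and are shrunk hard, while larger coefficients receive a penalty close to the slab rate and are left nearly untouched.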

2. Bayesian-frequentist Hybrid Inference in Applications with Small Sample Sizes

The Bayesian-frequentist hybrid model and its associated inference can combine the advantages of both Bayesian and frequentist methods while avoiding their limitations. However, except for a few special cases in the existing literature, computation under the hybrid model is generally non-trivial or even intractable. We develop a computational algorithm for hybrid inference under general loss functions. Simulations and data examples demonstrate that hybrid inference can improve upon frequentist inference by incorporating valuable prior information, and can also improve upon Bayesian inference based on non-informative priors when the latter leads to biased estimates at small sample sizes.

3. Variable Selection in Bayesian Multiple Instance Regression using Shotgun Stochastic Search

In multiple instance learning (MIL), each sample has a set of individually observed covariate vectors (instances) but only one response variable, shared by all of its instances. We propose a Bayesian model to address two selection problems. One is instance selection, which identifies the instances capable of explaining the response. The other is variable selection, which searches for the covariates related to the response. For this, we adopt stochastic search variable selection (George and McCulloch (1993)) to identify the best subset of explanatory variables, a problem that has not previously drawn attention in the MIL literature. Our novel model solves the two selection tasks simultaneously by modifying the shotgun stochastic search algorithm (Hans et al. (2007)), which enables Markov chain Monte Carlo to explore a large discrete space more efficiently.
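To make the search idea concrete, here is a minimal sketch of a shotgun-style stochastic search over variable subsets in ordinary linear regression. It is a simplified stand-in and not the modified algorithm of the talk: it ignores instance selection, scores models with a BIC-based approximation (an assumption), and all function names are hypothetical.

```python
import numpy as np

def bic_score(X, y, subset):
    """Negative half BIC of the least-squares fit on `subset` (higher is better)."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in sorted(subset)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return -0.5 * (n * np.log(rss / n) + design.shape[1] * np.log(n))

def neighbours(subset, p):
    """Non-empty neighbourhoods of `subset`: add one, delete one, swap one variable."""
    inside = sorted(subset)
    outside = [j for j in range(p) if j not in subset]
    add = [subset | {j} for j in outside]
    delete = [subset - {i} for i in inside]
    swap = [(subset - {i}) | {j} for i in inside for j in outside]
    return [group for group in (add, delete, swap) if group]

def sample_by_score(models, scores, rng):
    """Draw one model with probability proportional to exp(score)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    idx = rng.choice(len(models), p=weights / weights.sum())
    return models[idx], scores[idx]

def shotgun_search(X, y, n_iter=30, seed=0):
    """Return the best-scoring subset seen while moving through neighbourhoods."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = frozenset()
    best, best_score = current, bic_score(X, y, current)
    for _ in range(n_iter):
        candidates = []
        for group in neighbours(current, p):
            scores = [bic_score(X, y, m) for m in group]
            top = int(np.argmax(scores))
            if scores[top] > best_score:  # record every evaluated model, not just moves
                best, best_score = group[top], scores[top]
            candidates.append(sample_by_score(group, scores, rng))
        models, scores = zip(*candidates)
        current, _ = sample_by_score(list(models), list(scores), rng)
    return best, best_score

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))
    y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=200)
    print(shotgun_search(X, y))
```

The point of the design is that every neighbour is scored at each step, so good models are recorded even when the chain itself moves elsewhere.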

4. Bayesian Empirical Likelihood with Dual Penalties for Variable Selection in Ultra-high Dimensional Data

In the semi-parametric domain, under the ultra-high dimensional setting, we propose a Bayesian empirical likelihood method for variable selection, which requires no distributional assumptions but only estimating equations. Motivated by Chang et al. (2018) on doubly penalized empirical likelihood (EL), we introduce priors to regularize both the regression parameters and the Lagrange multipliers associated with the estimating equations, to promote sparse learning. We show theoretically that posterior consistency and variable selection consistency hold under some mild conditions. We further develop an efficient Markov chain Monte Carlo (MCMC) sampling algorithm based on the active set idea, which has proven useful in reducing the computational burden.
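For readers unfamiliar with where the Lagrange multipliers come from, the sketch below computes an ordinary (unpenalized, non-Bayesian) empirical likelihood ratio for a single scalar estimating equation. It only illustrates the basic EL construction; the priors, dual penalties, and active-set sampler described in the abstract are not reproduced, and the function name and Newton solver are assumptions made for this example.

```python
import numpy as np

def el_log_ratio(g, n_steps=50, tol=1e-10):
    """Log empirical likelihood ratio for scalar estimating-function values g_i.

    Maximizes sum(log(n * w_i)) subject to sum(w_i) = 1 and sum(w_i * g_i) = 0.
    The solution is w_i = 1 / (n * (1 + lam * g_i)), where the Lagrange
    multiplier lam solves sum(g_i / (1 + lam * g_i)) = 0.
    Assumes g takes both positive and negative values.
    """
    g = np.asarray(g, dtype=float)
    lam = 0.0
    for _ in range(n_steps):
        denom = 1.0 + lam * g
        grad = np.sum(g / denom)            # equation to be driven to zero
        hess = -np.sum((g / denom) ** 2)    # its derivative in lam (always negative)
        step = grad / hess
        new_lam = lam - step
        # Halve the step until every implied weight stays positive.
        while np.any(1.0 + new_lam * g <= 0.0):
            step *= 0.5
            new_lam = lam - step
        lam = new_lam
        if abs(step) < tol:
            break
    weights = 1.0 / (len(g) * (1.0 + lam * g))
    return -np.sum(np.log1p(lam * g)), lam, weights

if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    for theta in (3.0, 2.5):
        # Estimating equation for a mean: g(x, theta) = x - theta.
        log_ratio, lam, w = el_log_ratio(x - theta)
        print(theta, log_ratio, lam, w.sum())
```

With many estimating equations the multiplier becomes a vector, which is the quantity the abstract proposes to regularize alongside the regression parameters.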

Organiser: Prof. Xinlei Wang
Chair: Prof. Yusen Xia
Speaker: Prof. Xinlei Wang
Speaker: Gang Han
Speaker: Johan Lim
Speaker: Sameer Deshpande