64th ISI World Statistics Congress - Ottawa, Canada

IPS 107 - New statistical methods for longitudinal microbiome data

Category: IPS
Tuesday 18 July, 2 p.m. - 3:40 p.m. (Canada/Eastern), Room 213

The human microbiome plays a critical role in human health and disease. Modern technology enables the cost-effective acquisition of high-throughput microbiome data. More recently, microbiome studies have often collected longitudinal measurements, allowing us to gain mechanistic insights into microbial dynamics and systems. However, microbiome data have unique features such as compositionality, zero inflation, high skewness, and hierarchical structure, which pose significant challenges to statistical analysis. In addition, the longitudinal design (repeated measurements with a temporal order) makes the analysis even more challenging. Despite the burgeoning resources on longitudinal microbiome studies (e.g., iHMP), statistical methods for such longitudinal, high-dimensional data are still in their infancy. There is a pressing need for more rigorous methodologies that can better harness the power of these rich data. The proposed session consists of a group of top-notch international researchers who are actively working on longitudinal microbiome data analysis. The talks will discuss important challenges in this area and propose novel methods for cluster analysis, association analysis, and identifiability assessment for longitudinal microbiome data. The session will have a broad impact on both methodology and application areas. With the new methods, we may better understand microbial dynamics and their health implications, which promises to revolutionize diagnosis and prognosis and lead to therapeutic breakthroughs. The session will raise awareness among the audience and stimulate research interest in this new area.

Talk1: Statistical challenges in longitudinal microbiome data analysis.
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem and interact with their host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited by the complex, inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors, or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time; and (3) network modelling to identify temporal relationships between microorganisms. We have explored the strengths and limitations of current methods for fulfilling these objectives, and compared different methods in simulation and case studies for objectives (1) and (2). We will also present our current methodological developments for objectives (2) and (3).
Talk2: Addressing model identifiability in the analysis of longitudinal sequence count data
A common statistical problem is inference from positive-valued multivariate measurements where the scale (e.g., sum) of the measurements is not representative of the scale (e.g., total size) of the system being studied. This situation is common in the analysis of modern sequencing data. The field of Compositional Data Analysis (CoDA) axiomatically states that analyses must be invariant to scale. Yet, many scientific questions posed in the analysis of longitudinal studies rely on the unmeasured system scale for identifiability. Instead, many existing tools make a wide variety of assumptions to identify models, often imputing the unmeasured scale. Here, we analyze the theoretical limits on inference given these data and formalize the assumptions required to provide principled scale-reliant inference. Using statistical concepts such as consistency and calibration, we show that we can provide guidance on how to make scale-reliant inference from these data. We prove that the frequentist ideal is often unachievable and that existing methods can demonstrate bias and a breakdown of Type-I error control. We introduce scale simulation estimators and scale sensitivity analysis as a rigorous, flexible, and computationally efficient means of performing scale-reliant inference.
Talk3: Dynamic clustering and heterogeneity pursuit with longitudinal microbiome data
Detecting sample clusters and identifying sources of heterogeneity (i.e., the distinctive microbial components or phenotypic features that differentiate the clusters) play a critical role in unraveling the relationship between microbial profiles and heterogeneous health states. We develop Dirichlet-Multinomial (DM) mixture models with heterogeneity pursuit to cluster microbiome profiles and pinpoint key taxa with distinctive abundances across clusters. We further adapt the heterogeneity pursuit method to the longitudinal setting through a hidden Markov setup to simultaneously identify latent microbial states and characterize the dynamics of state transitions. An application to iHMP data demonstrates the promise of the proposed framework for deciphering the heterogeneity and dynamics of the microbiome.
Talk4: Strain genetic association studies within the human microbiome

 

Organiser: Dr Gen Li 

Chair: Justin Silverman 

Speaker: Dr Saritha Kodikara  

Speaker: Justin Silverman 

Speaker: Curtis Huttenhower  
