Planning and Prediction: Modelling Self-Response to the Canadian Census of Population via Survival Analysis with Competing Risks
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Abstract
Keywords: census, competing risks, predictive modelling, self-response, survival analysis, time-dependent model validation
Session: CPS 22 - Survival statistics
Monday 17 July 4 p.m. - 5:25 p.m. (Canada/Eastern)
For the past two censuses of the Canadian population (2016 and 2021), Statistics Canada used a microsimulation model during the data collection period to dynamically forecast end-of-collection response rates and non-response follow-up costs. These weekly forecasts were used to evaluate proposed collection strategies and to ensure the judicious use of resources. A critical component of this microsimulation model is the self-response (SR) process. SR is defined as a dwelling submitting a census questionnaire, or requesting the means to complete one, without intervention from a Statistics Canada agent. Specifically, the microsimulation model takes as input, at the dwelling level, the estimated daily probabilities of SR and of each SR collection mode for all remaining days of collection.
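To make the link between these daily inputs and a competing risks model concrete, the sketch below assumes the model supplies, for each remaining day t and each SR mode k, a discrete-time cause-specific hazard h_k(t): the probability of SR via mode k on day t given no SR before day t. The function name, the array layout and the example hazard values are hypothetical, and whether the production inputs are conditional hazards or unconditional probabilities is an assumption here. Under this assumption, the probability that the first SR occurs on day t via mode k is S(t-1) h_k(t), where S(t) is the product over days s ≤ t of (1 - Σ_k h_k(s)).

```python
import numpy as np


def daily_sr_probabilities(hazards) -> tuple[np.ndarray, np.ndarray]:
    """Convert discrete-time cause-specific hazards into unconditional daily probabilities.

    hazards: array-like of shape (n_days, n_modes); hazards[t, k] is the (hypothetical)
        probability of self-response via mode k on day t, given no SR before day t.
    Returns:
        probs: shape (n_days, n_modes), P(first SR occurs on day t via mode k)
        survival: shape (n_days,), P(no SR by any mode by the end of day t)
    """
    hazards = np.asarray(hazards, dtype=float)
    total_hazard = hazards.sum(axis=1)  # all-mode hazard for each day
    if np.any(total_hazard > 1.0):
        raise ValueError("total daily hazard exceeds 1")
    # Probability of still being a non-respondent at the end of each day.
    survival = np.cumprod(1.0 - total_hazard)
    # Probability of entering each day without having responded (day 0 starts at 1).
    at_risk = np.concatenate(([1.0], survival[:-1]))
    # Each mode's hazard is scaled by the probability of not having responded by ANY mode.
    probs = at_risk[:, None] * hazards
    return probs, survival


# Illustrative hazards for two hypothetical modes (columns: EQ, paper) over three days.
probs, survival = daily_sr_probabilities([[0.10, 0.02], [0.05, 0.01], [0.20, 0.05]])
cumulative_incidence = probs.cumsum(axis=0)  # P(SR via mode k by day t)
```

Because every mode's probability is scaled by the shared probability of no SR by any mode, the per-mode cumulative incidences plus the survival probability always sum to one; separate, independently fitted per-mode models offer no such guarantee.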
Previous work proposed a parametric discrete-time survival model to estimate the probability of SR by electronic questionnaire (EQ). In practice, however, there are multiple SR outcomes and collection modes, giving rise to several competing SR events; modelling these events independently of one another can produce misleading conclusions. In this presentation, we broaden the application of survival analysis concepts to census data by adopting a competing risks framework. Specifically, we consider several continuous-time and discrete-time survival models, all within a competing risks framework, that incorporate fixed and time-dependent predictors of SR (such as age, marital status, and the impact of reminder letters). Furthermore, we borrow from time-series analysis a forward validation procedure, which preserves the temporal order of observations and is therefore free from look-ahead bias. This validation is critical, as the final model will be put into production in a nowcasting context during the 2026 Census.
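Forward validation of this kind can be sketched as an expanding-window (rolling-origin) split over collection days: each fold trains only on days before a cutoff and evaluates on the days that follow, so no future information reaches the model. Whether the authors use an expanding or a sliding training window is not specified, and the parameters `initial_train_days`, `horizon` and `step` are hypothetical; in practice the day indices would select person-period rows rather than whole dwellings.

```python
from typing import Iterator


def forward_validation_splits(
    n_days: int, initial_train_days: int, horizon: int, step: int = 1
) -> Iterator[tuple[range, range]]:
    """Yield (train_days, test_days) pairs for expanding-window forward validation.

    Training always covers days [0, cutoff) and testing covers the next `horizon`
    days, so every training day precedes every test day (no look-ahead bias).
    """
    if initial_train_days < 1 or horizon < 1 or step < 1:
        raise ValueError("initial_train_days, horizon and step must be positive")
    cutoff = initial_train_days
    while cutoff < n_days:
        train = range(0, cutoff)
        test = range(cutoff, min(cutoff + horizon, n_days))
        yield train, test
        cutoff += step  # move the forecast origin forward in time


# Hypothetical 10-day collection period, 4 initial training days, 3-day horizon.
splits = list(forward_validation_splits(10, 4, 3, step=2))
```

Setting `step=7` would re-fit once per collection week, loosely mirroring the weekly forecasting cadence; this is an illustrative choice, not a documented setting.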
This presentation will start with a brief introduction to the Canadian Census and its microsimulation model. We will then present how survival analysis can be used to model SR events and justify the use of a competing risks framework. Next, we will give an overview of the models considered and of how the potential explanatory variables were determined, with a particular focus on the impact of the reminder letters sent under the wave methodology, treated as a time-dependent covariate. Finally, we will describe the model validation procedure and draw on its results and the lessons learned to discuss the value and challenges of using survival analysis with competing risks to support the production of high-quality official statistics.
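In discrete-time competing risks models, a time-dependent covariate such as reminder letters is commonly handled by expanding the data into person-period form: one row per dwelling per day at risk, with an event code that is 0 until SR and then identifies the SR mode. The sketch below assumes hypothetical field names, illustrative mode codes, per-dwelling reminder delivery days, and a one-day lag on the reminder count; it is not the authors' actual data layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dwelling:
    """Hypothetical dwelling record; field names are illustrative only."""
    dwelling_id: int
    sr_day: Optional[int]   # collection day of first SR, or None if no SR observed
    sr_mode: Optional[int]  # mode code of that SR (e.g. 1 = EQ, 2 = paper), or None
    reminder_days: list[int] = field(default_factory=list)  # days reminder letters arrived


def to_person_period(dwellings: list[Dwelling], last_day: int) -> list[dict]:
    """Expand dwellings into one row per dwelling per collection day at risk.

    event is 0 while the dwelling has not responded and equals the SR mode code on
    the day of self-response, after which the dwelling leaves the risk set.
    Dwellings without SR by last_day are censored after last_day.
    """
    rows = []
    for d in dwellings:
        responded = d.sr_day is not None and d.sr_day <= last_day
        if responded and d.sr_mode is None:
            raise ValueError(f"dwelling {d.dwelling_id}: SR day given without a mode")
        end = d.sr_day if responded else last_day
        for day in range(end + 1):
            rows.append({
                "dwelling_id": d.dwelling_id,
                "day": day,
                # Time-dependent covariate, lagged: letters that arrived before this day.
                "reminders_so_far": sum(1 for r in d.reminder_days if r < day),
                "event": d.sr_mode if (responded and day == end) else 0,
            })
    return rows


rows = to_person_period(
    [Dwelling(1, sr_day=2, sr_mode=1, reminder_days=[1]),
     Dwelling(2, sr_day=None, sr_mode=None, reminder_days=[0, 2])],
    last_day=3,
)
```

A multinomial logistic regression of `event` on day effects and the covariates, fitted to such rows, yields one cause-specific hazard per mode, which is the standard discrete-time competing risks formulation; the continuous-time models considered would instead use a counting-process or interval layout.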