64th ISI World Statistics Congress - Ottawa, Canada

Classifying Respondent Comments from the 2021 Canadian Census of Population using Machine Learning Methods

Conference: 64th ISI World Statistics Congress - Ottawa, Canada

Format: IPS Abstract

Keywords: classification, nlp

Session: IPS 200 - Challenges of Natural Language Processing techniques in official statistics

Tuesday 18 July 10 a.m. - noon (Canada/Eastern)

Abstract

To improve the analysis of respondent comments from the Canadian Census of Population, data scientists at Statistics Canada compared and evaluated traditional machine learning, deep learning and transformer-based techniques. Models were trained to classify comments by subject matter area (such as education, labour or demography) and to identify those raising technical issues or privacy concerns. Following the evaluation, the fine-tuned model was successfully implemented to categorize comments from the 2021 Census of Population objectively and with high accuracy. As a result, respondent feedback from around 2 million comments was directed to the appropriate subject matter analysts for post-collection analysis in near real time.
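
As a rough illustration of the kind of traditional machine learning baseline such a comparison might include, the sketch below trains a small multinomial Naive Bayes classifier on bag-of-words counts. It is not the model used by Statistics Canada: the class name `NaiveBayesCommentClassifier`, the training comments and the single-label setup are all assumptions made for illustration, and the example comments are invented rather than taken from census data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCommentClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments, labels):
        self.class_counts = Counter(labels)        # number of comments per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocabulary = set()
        for comment, label in zip(comments, labels):
            tokens = tokenize(comment)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, comment):
        # Tokens never seen in training carry no information and are dropped.
        tokens = [t for t in tokenize(comment) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples, NOT real census comments.
TRAIN_COMMENTS = [
    "The question about my degree and field of study was confusing",
    "No option for the diploma I earned at college",
    "My job and hours worked last week did not fit the question",
    "I am self employed and the work questions were unclear",
    "The online form froze and the page would not load",
    "I got an error when I tried to submit the questionnaire online",
    "Why do you need my personal information, this is private",
    "I am worried about the privacy and confidentiality of my data",
]
TRAIN_LABELS = [
    "education", "education",
    "labour", "labour",
    "technical issue", "technical issue",
    "privacy concern", "privacy concern",
]

classifier = NaiveBayesCommentClassifier().fit(TRAIN_COMMENTS, TRAIN_LABELS)
predicted = classifier.predict("The page froze with an error when I pressed submit")
print(predicted)
```

A baseline like this treats each comment as belonging to exactly one category. Comments that touch several subject areas at once would need a multi-label formulation, which this sketch does not attempt.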