64th ISI World Statistics Congress - Ottawa, Canada


National Sentiment Statistics through Social Media: Obstacles and Opportunities

Conference: 64th ISI World Statistics Congress - Ottawa, Canada

Format: IPS Abstract

Keywords: bias, social networks, survey

Session: IPS 200 - Challenges of Natural Language Processing techniques in official statistics

Tuesday 18 July 10 a.m. - noon (Canada/Eastern)

Abstract

Understanding the sentiment of a nation's population can provide valuable insights into societal well-being, political stability, economic trends and educational needs. Social media platforms offer a wealth of real-time data that can be analyzed to construct a national sentiment statistic, which opens up many opportunities. However, using data science to automate national statistics based on social media also raises a number of obstacles.

In this presentation, we address these obstacles through our work on using Twitter data to develop a national sentiment statistic for Flanders, Belgium. We highlight the challenges associated with selection bias, annotator bias and model performance, and we propose solutions to them, including techniques for predicting annotation difficulty and correcting for selection bias.
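Selection bias arises because Twitter users are not a random sample of the population. One standard correction, shown here only as an illustration and not necessarily the method used in this work, is post-stratification: compute the mean sentiment within each stratum (for example an age group or a province) and reweight those means by known population shares taken from census or survey data. The function name `post_stratified_mean`, the stratum labels and the shares below are hypothetical, and the sketch assumes every tweet can already be assigned to a stratum, which in practice requires inferring user characteristics.

```python
from collections import defaultdict


def post_stratified_mean(scores, strata, population_shares):
    """Hypothetical helper: reweight per-stratum mean sentiment by population shares.

    scores: sentiment score per tweet (e.g. in [-1, 1]).
    strata: stratum label per tweet (assumed known or inferred).
    population_shares: stratum label -> share of the target population.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, stratum in zip(scores, strata):
        totals[stratum] += score
        counts[stratum] += 1

    # Post-stratification needs at least one observation in every stratum.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no observations for strata: {sorted(missing)}")

    # Normalise the shares so they sum to 1, then weight each stratum mean.
    total_share = sum(population_shares.values())
    return sum(
        (share / total_share) * (totals[s] / counts[s])
        for s, share in population_shares.items()
    )


# Illustrative usage with made-up numbers: young users are over-represented.
scores = [0.6, 0.4, 0.5, -0.2]
strata = ["18-34", "18-34", "18-34", "55+"]
shares = {"18-34": 0.3, "55+": 0.7}
naive = sum(scores) / len(scores)
corrected = post_stratified_mean(scores, strata, shares)
```

In this toy example the naive mean is 0.325, while the post-stratified estimate is about 0.01, because the under-represented older stratum, which is less positive here, receives its full population weight. The correction is only as good as the stratum assignment and the auxiliary population shares.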

We also explore the potential of Twitter sentiment analysis to complement traditional survey-based research on happiness, quality of living and political confidence. We compare Twitter data with official survey data to assess the extent to which Twitter users represent the population as a whole, and we investigate differences in sentiment across the provinces of Flanders.

Overall, our research highlights both the obstacles and the opportunities of using social media data for national sentiment statistics. Addressing these obstacles yields insights, experience and best practices that will help us develop and deploy more official statistics based on social media in the future. These lessons concern not only the robustness, validity and representativeness of the resulting statistics, but also their interplay and complementarity with existing statistical procedures such as surveys.