IPS 272 - Leveraging all data in the production of official statistics: A progress report

Category: IPS

Wednesday 19 July 10 a.m. - noon (Canada/Eastern) Room 203

National Statistics Institutes (NSIs) are facing numerous challenges. Although surveys have served as a solid foundation for producing official statistics for more than 50 years, increasing costs, decreasing response rates, and decreasing list frame coverage are raising concerns about the quality of the resulting estimates. At the same time, data users want more official statistics reported at finer temporal and spatial scales. In the era of big data, administrative and other data are becoming increasingly available. Consequently, many NSIs are actively moving either to integrate survey and non-survey data or to use only non-survey data to produce official statistics. In this session, speakers from four different countries will discuss the progress being made and the problems yet to be solved if NSIs are to truly leverage all available data in the production of official statistics. Anders Holmberg, Australian Bureau of Statistics, will focus on ways to find a balance between cost-efficient statistics production and the thirst for timely data analytics, while acknowledging the paradoxes and trade-offs. Ron Jarmin, U.S. Census Bureau, will discuss bridging the gap between theory and practice when using all available data, including how building statistics for the retail trade sector from transaction-level data can yield fundamental improvements in both price and quantity measurement, the investments in statistical infrastructure needed to leverage the value of private sector data, and issues that require solutions to scale the use of these data for official statistics. Ton de Waal, Statistics Netherlands, will explore some of the theoretical challenges when using diverse data, such as measuring the quality of estimates (partly) based on non-probability data, with a focus on assessing selection bias in practical situations and on assessing the effect of measurement error on the quality of estimates.
Andrea Diniz da Silva, Brazil's Escola Nacional de Ciências Estatísticas - ENCE, will highlight recent achievements of the Regional Hub for Big Data in Brazil, one of four regional hubs created by the United Nations to support NSIs, in training young statisticians and fostering their interest in the use of big data, and will review the practices and challenges regarding the use of big data for statistics in Latin America and the Caribbean Region. Linda J. Young, USDA National Agricultural Statistics Service, will discuss the common issues identified by the four speakers and potential opportunities for collaboration as NSIs move rapidly to improve estimates using all available information.


Speaker 1: Anders Holmberg, Australian Bureau of Statistics

Title: Longer, wider, more granular, frenetic and recycled, can NSIs pivot to exploit more and different data and meet new demands?

Abstract:

National statistical institutes (NSIs) stepped up during the pandemic and delivered new, timely statistics by accessing new data sources. Trading some accuracy for timeliness provided governments with valuable facts to make better policy decisions. In my view, this was helped by the preparation gained through a long and (according to some) slow move away from direct data collection toward more comfortable use of secondary data sources. In parallel, NSIs are transitioning to roles as not only collectors and producers of data, but also curators and providers of data for advanced policy analysis purposes. NSIs are increasingly providing platforms for safe use and analysis of microdata. My talk will focus on ways to find a balance between cost-efficient statistics production and quenching the thirst for timely data analytics. There are paradoxes and trade-offs that NSIs must acknowledge; I shall highlight some of them and outline how traditional sample surveys still have an important role.

Speaker 2: Ron Jarmin, U.S. Census Bureau

Title: Modernizing Official Statistics in Theory and Practice

Abstract:

Statistical agencies are driven to modernize both by challenges to existing methods (e.g., declining survey response rates) and by emerging opportunities (e.g., the proliferation of new digital source data). I discuss recent progress at the U.S. Census Bureau. I provide an overview of successful applications of blended data to improve economic indicators and mitigate the impact of data collection challenges stemming from the COVID-19 pandemic. I highlight research showing that building statistics for the retail trade sector from transaction-level data can yield fundamental improvements in both price and quantity measurement. I outline foundational investments in statistical infrastructure needed to leverage the value of private sector data and discuss access and engineering issues that need solutions to scale the use of these data for official statistics.

Speaker 3: Ton de Waal, Statistics Netherlands

Title: Measuring Quality of Official Statistics (partly) Based on Non-Probability Data

Abstract:

In recent years, non-probability data, i.e., data not collected by means of a known and well-designed sampling mechanism, such as administrative data and big data, have been increasingly used for producing official statistics. In some cases official statistics are based solely on non-probability data; in other cases they are based on a combination of non-probability data and traditional sample survey data. When official statistics are (partly) based on non-probability data, assessing the quality of estimates for parameters of interest is usually much more complicated than when such estimates are based on sample survey data only. For instance, administrative data and big data sometimes cover only a selective part of the population, which may lead to selection bias in estimates for parameters of interest. Also, when estimates are based on several datasets (either non-probability datasets or survey samples) in which the same target variable is measured, measurement error in this variable may become apparent and needs to be assessed. In this talk we will discuss measuring the quality of estimates (partly) based on non-probability data. We will focus on assessing selection bias in practical situations and on assessing the effect of measurement error on the quality of estimates.

Speaker 4: Andrea Diniz da Silva, Brazil’s Escola Nacional de Ciências Estatísticas – ENCE

Title: Use of Big Data in Official Statistics: State of the Art in Latin America and the Caribbean

Abstract:

Big data is increasingly and successfully used by many national statistical offices as an alternative to traditional survey data. The United Nations Global Platform has registered at least 111 national experiences in using web scraping, mobile phone data, social media and satellite imagery, among other sources, to produce a broad range of statistics, including on prices, migration, agriculture, mobility and indicators for at least SDGs 1, 2, 5, 6, 11, 14 and 15. To support national statistical offices, the United Nations created four regional hubs: Brazil, China, United Arab Emirates and Rwanda. The Regional Hub for Big Data in Brazil, which serves Latin America and the Caribbean, has been developing several actions to promote training and foster the interest of young statisticians in the use of big data, to support the sharing of experiences and knowledge, and to strengthen ties and promote cooperation in the Region. The presentation will provide an overview of the recent achievements resulting from the Hub’s activities, as well as of the practices and challenges regarding the use of big data for statistics in the Region, as reported by the national statistical offices that answered a consultation conducted at the beginning of 2023.

Discussant: Linda J. Young, USDA National Agricultural Statistics Service


Organiser: Dr Linda J. Young

Chair: Barbara Rater

Speaker: Ron Jarmin

Speaker: Anders Holmberg

Speaker: Prof. Dr. Ton de Waal

Speaker: Andrea Diniz da Silva

Discussant: Linda Young