Basic Food Basket Price From Big Data
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Abstract
Keywords: food, price, webscraping
Session: CPS 36 - Big data I
Tuesday 18 July 8:30 a.m. - 9:40 a.m. (Canada/Eastern)
The Basic Food Basket plays a prominent role in social dynamics, because the less well-off classes are strongly affected by changes in food prices. In Brazil, the inequalities that plague the country reduce the population's purchasing power, so the Basic Food Basket consumes a large share of Brazilian families' income. The Inter-Union Department of Statistics and Socioeconomic Studies (DIEESE) publishes the price of the basic basket at the national and regional levels, based on prices obtained by in-person collection at physical establishments. Because data collection is expensive, it limits both geographic coverage and the number of establishments visited. In addition, the scarcity of resources may lead to data quality issues. One option to overcome these limitations is the use of alternative data sources. This research aims to provide insights into the use of alternative open data for calculating the price of the Basic Food Basket. In the proposed strategy, the value of the Basic Food Basket is calculated from prices collected on e-commerce pages in Brazil through web scraping. Free and open-source software is used, namely R, including the RSelenium package. Two aggregator sites are scraped, iFood and Google Shopping, as both cover most of the country's retail stores. The adherence of the Basic Food Basket price obtained with the proposed approach to the price obtained in the traditional way will be analyzed through hypothesis tests. Two alternative prices will be considered: one using the prices observed on the reference date used by DIEESE, and another averaging prices over a one-month period. The second approach has the advantage of capturing variations throughout the month, which makes it possible to identify corrections and improvements and to better understand divergences.
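The basket value described above is essentially a quantity-weighted sum of item prices, and the two alternatives differ only in which scraped prices feed that sum. The following is a minimal sketch in Python (rather than the authors' R). It assumes the scraping step (done with RSelenium in the original work) has already produced `(date, item, price)` records. The item names and the `QUANTITIES` table are hypothetical illustrative values, not DIEESE's official regional quantities. Summarizing several stores' offers by their median is also an assumption. The one-sample t statistic at the end is only one possible adherence check against a published value and not necessarily the test the authors use.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, median, stdev

# HYPOTHETICAL monthly quantities per item (illustrative only;
# not DIEESE's official regional basket quantities).
QUANTITIES = {"rice_kg": 3.0, "beans_kg": 4.5, "milk_l": 6.0}


def item_prices(offers):
    """Median price per item across scraped offers.

    `offers` is an iterable of (item, price) pairs; using the median
    to summarize multiple stores is an assumption of this sketch.
    """
    by_item = defaultdict(list)
    for item, price in offers:
        by_item[item].append(price)
    return {item: median(prices) for item, prices in by_item.items()}


def basket_cost(prices, quantities=QUANTITIES):
    """Quantity-weighted sum of item prices; fails if an item has no price."""
    missing = set(quantities) - set(prices)
    if missing:
        raise ValueError(f"no price for: {sorted(missing)}")
    return sum(q * prices[item] for item, q in quantities.items())


def cost_on(records, day):
    """Reference-date variant: basket cost using only offers scraped on `day`."""
    return basket_cost(item_prices((i, p) for d, i, p in records if d == day))


def monthly_costs(records, year, month):
    """Daily basket costs for every scraped day in the given month."""
    days = sorted({d for d, _, _ in records if (d.year, d.month) == (year, month)})
    return [cost_on(records, d) for d in days]


def t_statistic(samples, reference):
    """One-sample t statistic of daily costs against a published value.

    Needs at least two samples with nonzero spread.
    """
    return (mean(samples) - reference) / (stdev(samples) / sqrt(len(samples)))


# Toy (date, item, price) records standing in for scraped e-commerce offers.
records = [
    (date(2023, 7, 3), "rice_kg", 5.0), (date(2023, 7, 3), "rice_kg", 6.0),
    (date(2023, 7, 3), "rice_kg", 7.0), (date(2023, 7, 3), "beans_kg", 8.0),
    (date(2023, 7, 3), "milk_l", 4.0), (date(2023, 7, 3), "milk_l", 5.0),
    (date(2023, 7, 4), "rice_kg", 6.0), (date(2023, 7, 4), "beans_kg", 8.0),
    (date(2023, 7, 4), "beans_kg", 10.0), (date(2023, 7, 4), "milk_l", 5.0),
]

reference_cost = cost_on(records, date(2023, 7, 3))  # 81.0
daily = monthly_costs(records, 2023, 7)              # [81.0, 88.5]
monthly_average = mean(daily)                        # 84.75
# 80.0 is a made-up "published" value used only for illustration.
t = t_statistic(daily, reference=80.0)
```

Because the basket cost is linear in prices, averaging daily costs gives the same result as costing the month's average daily item prices whenever every item is observed on every day. To obtain a p-value rather than just the statistic, `scipy.stats.ttest_1samp` could be applied to the same daily values.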