64th ISI World Statistics Congress - Ottawa, Canada

Prediction of Internet Users in Indonesia Using Google Trends Data


Atika Nashirah Hasyyati




Format: CPS Abstract

Keywords: internet, machine learning

Session: CPS 70 - Statistics and internet

Tuesday 18 July 5:30 p.m. - 6:30 p.m. (Canada/Eastern)


Statistics Indonesia produces annual estimates of the percentage of internet users in Indonesia based on the Indonesian National Socioeconomic Survey (Susenas). One impact of the COVID-19 pandemic has been limited access to respondents, which led the Partnership on Measuring Information and Communication Technology for Development to report the need to promote data innovations that complement traditional ICT data. Internet use is one of the most important ICT variables for describing gaps in connectivity and technological advancement. However, the number of internet users is currently published only annually, so more timely estimates are needed. Google Trends data are free to access and available in near real time, which makes them suitable for nowcasting and prediction. This paper aims to use Google Trends as an alternative data source to produce monthly estimates of the percentage of individuals using the internet in Indonesia by province. We also use the official Susenas data on the percentage of internet users by province from 2014 to 2021. Predictors are derived from Google Trends Web Search data (possible search keywords include Facebook, WhatsApp, etc., as well as top searches), and machine learning methods are then applied to predict the percentage of internet users. Several challenges had to be overcome when gathering data from the Google Trends API. Over 1.6 million data points were collected using the gtrendsR package. After data cleaning, several machine learning methods were compared for producing monthly estimates of internet users. The comparison shows that XGBoost has the highest accuracy.
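
As a rough illustration of the workflow described above, the sketch below fits a model on annual data, compares it with a baseline on held-out years, and then applies it to monthly search-interest values. It is a minimal sketch under stated assumptions and not the method used in the paper: a simple least-squares line stands in for the machine learning methods (the abstract names XGBoost as the best performer), RMSE is assumed as the accuracy measure, and every number is invented toy data rather than Susenas or Google Trends output. All function and variable names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


# Invented toy data: annual mean search-interest index (0-100) for one
# hypothetical province, paired with an invented annual survey percentage.
train_index = [20, 30, 40, 50, 60]
train_pct = [26.0, 35.0, 41.0, 51.0, 58.0]
test_index = [70, 80]
test_pct = [66.0, 74.0]

# Candidate 1: a baseline that always predicts the training mean.
baseline_pred = [mean(train_pct)] * len(test_pct)
baseline_rmse = rmse(test_pct, baseline_pred)

# Candidate 2: a least-squares line (stand-in for a real learner).
intercept, slope = fit_line(train_index, train_pct)
linear_pred = [intercept + slope * x for x in test_index]
linear_rmse = rmse(test_pct, linear_pred)

# Keep whichever candidate has the lower held-out error.
best_model = "linear" if linear_rmse < baseline_rmse else "baseline"

# Apply the fitted line to twelve invented monthly index values and clip
# the result to the valid percentage range.
monthly_index = [62, 63, 65, 66, 68, 70, 71, 73, 74, 76, 78, 80]
monthly_estimates = [
    min(100.0, max(0.0, intercept + slope * x)) for x in monthly_index
]

print(best_model, round(baseline_rmse, 2), round(linear_rmse, 2))
```

Training on annual values and predicting from monthly ones assumes that the relationship between search interest and internet use holds at both frequencies, which would need to be checked against real data.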