Small Area Population Estimate Model
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Paper
Keywords: bigdata, data science, density, estimate, geodatabases, innovation, machine learning, model
Session: CPS 19 - Statistical estimation III
Monday 17 July 4 p.m. - 5:25 p.m. (Canada/Eastern)
Abu Dhabi emirate is characterized by a majority non-Emirati population. Therefore, in- and out-migration plays a critical role in determining the patterns captured in population indicators.
Abu Dhabi emirate is changing at a rapid pace: many urbanized areas appear in a short time, which changes the structure of population allocation in the emirate. This poses a challenge when relying on census data from 10 years ago as a baseline for current population estimates.
Traditional method for population counting:
The most precise method to obtain population size and characteristics is the census, which serves as a baseline; however, this method is costly and time-consuming.
The absence or irregular receipt of important administrative data, such as identity data, limits the ability of SCAD to use classical methods for population estimation. This leads to difficulty in estimating the population precisely and consistently. Population distribution was another challenge.
Innovative methodology for population estimate:
SCAD created a new estimation approach to provide policymakers and planners with up-to-date population indicators. This approach depends on modernizing the statistical production model by utilizing alternative available administrative data and unlocking big data capabilities to support the classical model usually used in NSOs, both to estimate the total population of Abu Dhabi and to allocate it properly at the geographic level.
The new approach is the “Small Area Estimation (SAE) model”, which aims to calculate total population estimates by district in Abu Dhabi emirate using utility consumption data.
This has considerable added value for SCAD and ADGEs, as it greatly improves population estimates at the district level and beyond. Using water consumption plus additional inputs from ADGEs and SCAD population data, it will enable better decision-making on demographics and offers the potential to build more accurate forecasting models.
In this method, SCAD developed an algorithm that combines data science and population statistics. The model estimates the population and its characteristics (nationality, gender) using the following steps:
1. Determine the utility consumption of each household over a month.
2. Infer the average consumption per person using the conversion rate factors.
3. Determine a correction factor (people per household) depending on the household's location, to account for features such as level of income, cadastral data, type of household, and citizenship, where available or applicable.
4. Correct the number of people per household using the correction factors.
5. Sum the population obtained by district.
6. Validate against population reference data by district.
7. Apply district-level correction factors to adjust the total population.
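The seven steps above can be illustrated with a minimal Python sketch. It is not SCAD's implementation: the conversion rate, the household correction factors, the record layout, and all function names are hypothetical and chosen only for illustration. The sketch also assumes that the district correction factors of steps 6 and 7 are derived as the ratio of reference population to model estimate in a period for which reference data exists, and are then applied to a later period.

```python
from collections import defaultdict

# Hypothetical values for illustration only; none come from SCAD.
LITRES_PER_PERSON_PER_MONTH = 5000.0          # assumed conversion rate factor (step 2)
HOUSEHOLD_CORRECTION = {"villa": 1.25,        # assumed correction factors by
                        "apartment": 1.0}     # household type (step 3)


def estimate_by_district(households):
    """Steps 1-5: households is an iterable of
    (district, household_type, monthly_litres) records."""
    totals = defaultdict(float)
    for district, household_type, monthly_litres in households:
        # Steps 1-2: convert monthly consumption into a raw person count.
        persons = monthly_litres / LITRES_PER_PERSON_PER_MONTH
        # Steps 3-4: correct the person count for household features.
        persons *= HOUSEHOLD_CORRECTION.get(household_type, 1.0)
        # Step 5: accumulate the corrected count by district.
        totals[district] += persons
    return dict(totals)


def district_factors(estimates, reference):
    """Step 6: compare estimates with reference data and return, per district,
    the ratio reference / estimate (assumed form of the correction factor)."""
    return {
        district: reference[district] / estimate
        for district, estimate in estimates.items()
        if district in reference and estimate > 0
    }


def apply_district_factors(estimates, factors):
    """Step 7: adjust district totals; districts without a factor are unchanged."""
    return {d: e * factors.get(d, 1.0) for d, e in estimates.items()}


# Calibrate on a period with reference data, then adjust a later period.
baseline = [("A", "apartment", 20000.0), ("A", "villa", 20000.0),
            ("B", "apartment", 10000.0)]
current = [("A", "apartment", 30000.0), ("B", "apartment", 15000.0)]
reference_population = {"A": 18.0, "B": 2.0}

factors = district_factors(estimate_by_district(baseline), reference_population)
adjusted = apply_district_factors(estimate_by_district(current), factors)
print(adjusted)
```

Separating calibration from application keeps the validation step meaningful: scaling a period's estimates to that same period's reference data would simply reproduce the reference figures.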