Finding households and dwellings in a register-based census using graphs


Keywords: graphs, households, place_of_residence, population_register, register-based_census

The 2021 census in Estonia was based mostly on administrative data. Estonian registers are well suited to a register-based census: they use unique identifiers for both addresses and persons, cover a wide variety of topics, and are generally of high quality. One exception, however, is the place of permanent residence in the Population Register (PR), which is incorrect or missing for about 20% of the population.

The inaccuracy of place of residence data causes strong biases in household and family statistics. In a register-based census, a private household consists of the people who live in the same dwelling. If the place of residence data from the PR is used to determine households, the number of lone-parent families is overestimated and the number of families of cohabiting partners is underestimated, because many families are registered across multiple addresses. For example, if a family consisting of a mother, a father and their two children registers one child and one parent at an address different from that of the other child and parent, this family will appear as two lone-parent families in the PR.
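The split described above can be reproduced with a small sketch. The person labels and address identifiers below are invented for illustration and do not come from any register; the grouping rule is simply "same registered address, same household".

```python
from collections import defaultdict

# Hypothetical register extract: person -> registered address (invented values).
REGISTERED_ADDRESS = {
    "mother": "addr:A",
    "child1": "addr:A",
    "father": "addr:B",
    "child2": "addr:B",
}


def households_by_address(registered):
    """Form households by grouping people who share a registered address."""
    groups = defaultdict(set)
    for person, address in registered.items():
        groups[address].add(person)
    return dict(groups)


households = households_by_address(REGISTERED_ADDRESS)
for address, members in sorted(households.items()):
    print(address, sorted(members))
```

Under this rule the four-person family is reported as two households, each with one parent and one child, which is how a single family turns into two lone-parent families in the statistics.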

To reunite families that appear broken in the register, we turn to other administrative sources and look for signs that people live together. For example, if two persons are married, it is very likely that they also live in the same dwelling. If a new household is formed from people registered at different addresses, it is not clear which dwelling this household inhabits. To find candidates for the place of residence, we look for people's links with various addresses (e.g., a person owns real estate, has an electricity contract, or has a contact address in some register).

We treat people and addresses as the nodes of a graph, and the connections between them as its edges. A household and its dwelling can then be viewed as a densely connected subgraph, which in network analysis is called a community. Constructing households and finding their dwellings thus reduces to community detection, a common task in network analysis.
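As a rough illustration of the graph view, the sketch below builds a toy graph of persons and addresses and groups the nodes. It does not reproduce the community detection method used for the census, which is not specified here. It uses a much cruder stand-in: drop edges whose weight falls below a threshold, then take connected components. All node names, link types, weights and the threshold are invented.

```python
from collections import defaultdict

# Hypothetical toy edges: (node_a, node_b, weight). All values are invented.
EDGES = [
    ("person:mother", "person:father", 0.9),   # e.g. married
    ("person:mother", "person:child1", 0.8),   # e.g. parent and child
    ("person:father", "person:child2", 0.8),
    ("person:father", "addr:A", 0.7),          # e.g. owns the dwelling
    ("person:mother", "addr:B", 0.2),          # weak link, e.g. an old contact address
    ("person:neighbour", "addr:C", 0.9),
]


def find_communities(edges, threshold=0.5):
    """Connected components of the graph restricted to edges with weight >= threshold.

    This is a simple stand-in for community detection, not the census method.
    """
    adjacency = defaultdict(set)
    for a, b, weight in edges:
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    communities = []
    for start in sorted(adjacency):  # sorted for a reproducible order
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        communities.append(component)
    return communities


communities = find_communities(EDGES)
for community in communities:
    print(sorted(community))
```

In this toy example the four family members end up in one group together with `addr:A`, while the weak link to `addr:B` is ignored. A real community detection algorithm would use the weights more carefully than a single cut-off does.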

Altogether, 17 registers were used to find the edges, that is, connections between people or between people and addresses. Each edge was assigned a weight describing the strength of the connection. The weight was modelled as the probability of living together (for an edge between two people) or of living at the address in question (for an edge between a person and an address), using the Estonian Labor Force Survey and the Estonian Social Survey as training data.
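The type of model used for the weights is not stated here, so the following is only a minimal sketch of the idea under a strong simplifying assumption: the weight of a link type is estimated as the observed share of surveyed pairs with that link who actually live together. The link types and training records are invented, and a real model would likely combine several signals per pair.

```python
from collections import Counter

# Hypothetical training records: (link_type, lives_together_according_to_survey).
# All values are invented for illustration.
TRAINING = [
    ("married", True), ("married", True), ("married", True), ("married", False),
    ("shared_child", True), ("shared_child", False),
    ("same_contact_address", True), ("same_contact_address", False),
    ("same_contact_address", False), ("same_contact_address", False),
]


def estimate_weights(records):
    """Estimate P(living together | link type) as a simple observed share."""
    totals = Counter()
    positives = Counter()
    for link_type, together in records:
        totals[link_type] += 1
        if together:
            positives[link_type] += 1
    return {link_type: positives[link_type] / totals[link_type] for link_type in totals}


weights = estimate_weights(TRAINING)
for link_type, weight in sorted(weights.items()):
    print(f"{link_type}: {weight:.2f}")
```

Weights estimated this way lie between 0 and 1 and can be attached directly to the edges of the graph, so that stronger evidence of cohabitation produces a stronger connection.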

The graph-based approach was used to compute the place of residence and the household and family statistics in the 2021 census. Surveys conducted shortly after the census moment provided external validation of the new method.

My presentation will highlight different aspects of the work carried out at Statistics Estonia to produce accurate household and dwelling statistics for its 2021 population census.