64th ISI World Statistics Congress - Ottawa, Canada

Quantifying the contribution of individual records to the reidentification risk of (pseudo)anonymized datasets


Fotios Stavropoulos


  • Vasiliki Daskalaki
  • Kimon Spiliopoulos
  • Konstantinos Spinakis
  • Gilbert Saporta
  • Michel Béra


Format: CPS Paper

Keywords: anonymization, extreme-value theory, privacy, pseudonymization, reidentification, risk

Session: CPS 07 - Statistical estimation II

Monday 17 July 8:30 a.m. - 9:40 a.m. (Canada/Eastern)


The reidentification of individuals or business establishments in (pseudo)anonymized microdata may expose sensitive data and can lead to fines and reputational damage for the data’s custodians. The QaR method (AFNOR, 2020) proposes a measure of the reidentification risk of a dataset, together with a statistical technique, based on extreme-value theory, to estimate it. An estimate of this risk is valuable in several ways. It gauges the effectiveness of whatever disclosure control the custodians apply to the data. It could be reported to regulatory authorities to demonstrate the custodians’ level of care for the data subjects’ privacy. It can also be used to calculate an insurance premium against unauthorized disclosure, or the amount of money custodians need on their balance sheet to cover potential financial damages from such disclosure.
The present paper deals with one particular aspect of the methodology: quantifying the contribution of each record to the dataset’s risk. It discusses why this quantity matters and why computing it is demanding for very large datasets. It then proposes metrics that are faster to compute and could serve as proxies for record contribution. The results for some of these proxies are promising, but more investigation is needed.
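The following minimal Python sketch illustrates why per-record contribution is costly to compute, under clearly stated assumptions. The dataset-level risk used here is the share of records whose quasi-identifier combination is unique in the sample. It is a hypothetical stand-in, not the QaR measure. The function names are also illustrative only. A record's contribution is taken to be the leave-one-out change in risk. The naive version recomputes the risk once per record, so its cost grows roughly as n². The fast version is exact for this toy measure only. It relies on the fact that removing one record changes the count of its own key and nothing else. It is not one of the proxies proposed in the paper.

```python
from collections import Counter
from typing import Hashable, List


def uniqueness_risk(keys: List[Hashable]) -> float:
    """Hypothetical dataset-level risk: share of records whose
    quasi-identifier combination occurs exactly once (sample uniques)."""
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def loo_contributions_naive(keys: List[Hashable]) -> List[float]:
    """Contribution of record i = risk(D) - risk(D without record i).
    Recomputes the risk once per record: roughly O(n**2) work."""
    base = uniqueness_risk(keys)
    return [base - uniqueness_risk(keys[:i] + keys[i + 1:])
            for i in range(len(keys))]


def loo_contributions_fast(keys: List[Hashable]) -> List[float]:
    """Same quantity in O(n): removing one record only changes the
    frequency of its own key, so the new unique count follows directly."""
    n = len(keys)
    counts = Counter(keys)
    uniques = sum(1 for c in counts.values() if c == 1)
    base = uniques / n if n else 0.0
    out = []
    for k in keys:
        c = counts[k]
        if c == 1:        # the removed record was itself a sample unique
            u = uniques - 1
        elif c == 2:      # its single remaining twin becomes unique
            u = uniques + 1
        else:             # the key stays non-unique
            u = uniques
        reduced = u / (n - 1) if n > 1 else 0.0
        out.append(base - reduced)
    return out


if __name__ == "__main__":
    # Toy records described by two hypothetical quasi-identifiers (sex, age).
    records = [("F", 30), ("M", 40), ("M", 40), ("F", 50)]
    print(loo_contributions_naive(records))
    print(loo_contributions_fast(records))
```

In this toy example the two ("M", 40) records get a negative contribution. Removing either one leaves its twin unique and raises the risk. This shows that a leave-one-out definition of contribution need not be positive.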