Differentially Private Sampling with Replacement
Conference
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Abstract
Keywords: confidentiality
Abstract
Publishing the data of only a subset of survey respondents has long been a major practice in statistical disclosure control. However, this practice, called subsampling, is challenged by computer scientists who regard the presence of a survey respondent in a published data set as a violation of their principle. Although this principle is groundless, their call to employ random perturbation is reasonable for publishing high-dimensional data, since random perturbation can avoid the curse of dimensionality that undermines deterministic disclosure control methods such as global recoding or micro-aggregation. Random subsampling can also avoid the curse of dimensionality, while making the identification of an individual uncertain. We show that random subsampling with replacement can be justified by the notion of differential privacy, which these computer scientists advocate. Synthetic data generated by bootstrapping are also justified.
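
As a rough illustration of why sampling with replacement makes an individual's presence uncertain, the Python sketch below draws m records uniformly with replacement from n respondents and computes the probability that a given respondent is absent from the release, (1 - 1/n)^m. The function names and the toy data are assumptions made for this example and do not come from the abstract. The sketch does not derive any differential privacy parameter.

```python
import random


def sample_with_replacement(records, m, seed=None):
    """Draw m records uniformly at random, with replacement (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.choice(records) for _ in range(m)]


def absence_probability(n, m):
    """Probability that one fixed respondent out of n never appears in m draws."""
    return (1.0 - 1.0 / n) ** m


# Toy data: respondent identifiers stand in for full survey records.
respondents = list(range(1000))

# Subsample of 100 draws; the same respondent may appear more than once.
release = sample_with_replacement(respondents, 100, seed=0)

# Bootstrap-style synthetic data set: as many draws as respondents (m = n).
bootstrap = sample_with_replacement(respondents, len(respondents), seed=0)

p_absent_subsample = absence_probability(len(respondents), len(release))
p_absent_bootstrap = absence_probability(len(respondents), len(bootstrap))
```

With m = n, as in bootstrapping, the absence probability approaches 1/e as n grows, so even a full-size resample leaves each respondent's inclusion uncertain.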