Differentially Private Sampling with Replacement

Author

Conference

64th ISI World Statistics Congress - Ottawa, Canada

Format: CPS Abstract

Keywords: confidentiality

Abstract

Publishing data on only a subset of survey respondents has been a major practice in statistical disclosure control. However, this practice, called subsampling, has been challenged by computer scientists who regard the mere presence of a respondent in a published data set as a violation of their privacy principle. Although this principle is groundless, their call to use random perturbation is reasonable for publishing high-dimensional data, since random perturbation avoids the curse of dimensionality that spoils deterministic disclosure control methods such as global recoding and micro-aggregation. Random subsampling also avoids the curse of dimensionality, while making the identification of any individual uncertain. We show that random subsampling with replacement can be justified by the notion of differential privacy, which these computer scientists favor. Synthetic data generated by bootstrapping are also justified.
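The following is a minimal sketch of the sampling step only, assuming respondent records are held in a Python list. The helper names (`subsample_with_replacement`, `inclusion_probability`, `multiplicity_pmf`) are hypothetical. The sketch is not the author's mechanism, and it does not establish any differential-privacy guarantee. It shows why identification is uncertain: in m draws with replacement from n records, the number of copies of a given respondent follows Binomial(m, 1/n), so that respondent is absent with probability (1 - 1/n)^m. Bootstrap synthetic data correspond to the case m = n.

```python
import math
import random


def subsample_with_replacement(records, m, seed=None):
    """Draw m records uniformly at random, with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(records, k=m)


def inclusion_probability(n, m):
    """Probability that a given respondent appears at least once
    among m draws with replacement from n records."""
    return 1.0 - (1.0 - 1.0 / n) ** m


def multiplicity_pmf(n, m, k):
    """Probability that a given respondent appears exactly k times,
    i.e. the Binomial(m, 1/n) probability mass at k."""
    p = 1.0 / n
    return math.comb(m, k) * p**k * (1.0 - p) ** (m - k)


if __name__ == "__main__":
    respondents = list(range(1000))  # stand-in record identifiers
    # Bootstrap-sized sample (m = n): each respondent is included with
    # probability close to 1 - 1/e.
    sample = subsample_with_replacement(respondents, m=1000, seed=1)
    print(len(sample), inclusion_probability(1000, 1000))
```

With n = m = 1000, the inclusion probability is about 0.632. Roughly a third of respondents are therefore absent from any single bootstrap sample, and a published record cannot be taken as proof that a particular person was sampled.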