Statistical Data Privacy

Instructors: Anne-Sophie Charest, Jingchen (Monika) Hu

July 14, 2023 - July 15, 2023

About the Short Course: 2-Day Course

This course is an introduction to the ideas of statistical data privacy, with an emphasis on synthetic datasets for privacy purposes and on differential privacy as a privacy measure. The course can be divided into four blocks of roughly equal size. The first block will provide an overview of the field of statistical data privacy. We will explain what we mean by privacy and why it is important, using examples. We will describe different contexts in which privacy is of interest and different sub-fields of work on privacy, and we will review methods classically used by agencies for statistical data privacy.

The second block will focus on synthetic datasets. We will explain what they are, how to assess their utility and their privacy protection, and a few methods to generate them. The second day will focus on differential privacy. We will explain the origin of this formal privacy measure and look in detail at its mathematical definition, its meaning, and its limitations. We will also present concrete applications of differential privacy by statistical agencies and private companies. We will then delve into more technical details, covering different mechanisms to achieve differential privacy for statistical tasks.

In-Person Event. Location Of Short Courses: University of Ottawa

Who is this course for?

This course is appropriate for a broad audience, including senior undergraduates, graduate students and faculty in statistics or related fields who are interested in statistical data privacy, as well as practitioners in government or industry who work with personal data and need to address privacy issues. The course could also be of interest to people whose work is less technical but who need to manage data within an organization, since some of the content is less technical and covers the terminology and big ideas of data privacy. In fact, the course could be structured so that the morning sessions (or the sessions on the first day) are less technical and open to a larger audience.

Level Of Instruction: Beginner

Learning Outcomes

At the end of the course, the participants should be able to:

Explain the need for privacy protection of statistical data
Describe the utility-risk trade-off for data privacy
Measure the risk and utility of a statistical dataset or output
Generate (simple) synthetic datasets with R
Define and interpret differential privacy
Describe and implement basic mechanisms for differential privacy (in simple contexts)
Look for and read other (more technical, advanced, recent) resources on the topic

Course Materials

A large part of the course will be based on the textbook An Introduction to Statistical Data Privacy: Synthetic Data and Differential Privacy, authored by Jingchen (Monika) Hu, Aleksandra (Sesa) Slavkovic and Anne-Sophie Charest (under contract with CRC Press; writing in progress). The book already includes R code for all of the methods described. Additional slides, handouts and R labs will be provided for topics not covered in the book.

Delivery Structure

The course will be a mixture of lectures and hands-on activities with the R software. The lectures will also be designed to foster interaction and active learning by including quizzes and short in-class exercises, done individually or in teams, to test and deepen understanding of the material presented.

Knowledge Assumed

A course in mathematical statistics is preferred (at least for the more technical sessions where we’ll use random variables, distributions, expectations, models, etc.). Experience with R is also recommended for the hands-on parts of the course.

About the instructor: Anne-Sophie Charest

Anne-Sophie Charest has been an Associate Professor of Statistics in the Department of Mathematics and Statistics at Université Laval since 2012. Before that, she completed a BA Honors in Probability and Statistics at McGill University, Montreal, and received a Master’s and Ph.D. in Statistics from Carnegie Mellon University, Pittsburgh. Her research focuses on statistical data privacy. She is particularly interested in methods to generate synthetic datasets for privacy protection, as well as methodology to obtain valid statistical inference from such synthetic datasets.

She also works on the measurement of disclosure risks, including with the formal criterion of differential privacy. Anne-Sophie is a member of the Statistics research laboratory of the Centre de recherches mathématiques (CRM) in Quebec, and a member of the Big Data Research Center and the Institute Intelligence and Data at Université Laval. Her work is funded through various grants from the Natural Sciences and Engineering Research Council of Canada, the Canadian Statistical Sciences Institute and CIFAR. Besides her research, she is also very active in the teaching of statistics at the undergraduate and graduate levels. She is currently the program chair for the undergraduate degree in Statistics at Université Laval.

Affiliations: Department of Mathematics and Statistics, Université Laval

About the instructor: Jingchen (Monika) Hu

Jingchen (Monika) Hu is an Associate Professor of Mathematics and Statistics at Vassar College. She completed a B.S. in Computing Mathematics at City University of Hong Kong and an M.S. and a Ph.D. in Statistical Science at Duke University in North Carolina, United States. Monika’s main research interest is statistical data privacy. She focuses on developing Bayesian methodology for creating synthetic datasets that maintain high utility while preserving privacy protection. More recently, Monika has worked on creating synthetic datasets that satisfy differential privacy. She teaches an undergraduate senior seminar on statistical data privacy at Vassar College and has given short courses on statistical data privacy and Bayesian methods at several federal statistical agencies in the United States.

Monika is currently a consultant on several privacy projects at the New York City Department of Health and Mental Hygiene. She is an Associate Editor for the Journal of Survey Statistics and Methodology and the INFORMS Journal on Computing, and is on the editorial board of Transactions on Data Privacy. Her work is funded through several U.S. National Science Foundation grants. In addition to scholarly research, Monika publishes articles on statistics education and is a co-author of an undergraduate Bayesian textbook, Probability and Bayesian Modeling.

Affiliations: Department of Mathematics and Statistics, Vassar College