IPS 316 - Design and Analysis of Experiments for Data Science

Category: IPS

Wednesday 19 July 2 p.m. - 3:40 p.m. (Canada/Eastern) Room 106

Data science faces great challenges in data size, heterogeneity, and quality. The amount and quality of information extracted from data is often driven by how the data were collected and analyzed, according to the type of experiment. Thus, design and analysis of experiments for data science is key to achieving both analytical and computational efficiency. The purpose of this invited session is to bring together five statistics experts working in this area to showcase how experimental design and analysis play an important role in data science. We have a special issue on “Design and Analysis of Experiments for Data Science” in the New England Journal of Statistics in Data Science (NEJSDS). Detailed information can be found at https://journal.nestat.org/news/Design_and_Analysis. The four speakers were selected from those who have agreed to submit their work to this special issue. The discussant is one of the co-editors of this special issue.

The five participants are Drs. John Stufken from George Mason University (USA), Luc Pronzato from Laboratoire I3S - Sophia Antipolis (France), C. Devon Lin from Queen’s University (Canada), Lulu Kang from Illinois Institute of Technology (USA), and Simon Mak from Duke University (USA). The five participants also represent gender diversity: Drs. Lulu Kang and C. Devon Lin are female, while the other three participants are male.

Experimental design and analysis apply broadly and offer significant advantages in achieving attractive inferential and computational properties. For example, as extraordinary amounts of data are being produced in many branches of science, proven statistical methods are no longer applicable to extraordinarily large datasets because of computational limitations. A critical step in big data analysis is data reduction, which is an experimental design problem. Many newly developed methodologies in this field have important applications in data science. This session aims to cover some representative work on relevant practical problems, such as multi-stage multi-fidelity Gaussian process models for computer experiments, subdata selection for data reduction, variational inference for computational efficiency, and optimal designs for nonlinear models in data science.

We hope this session will help facilitate cross-fertilization between experimental design and analysis and data science. Beyond the design of experiments and statistical learning communities, this session is expected to attract significant attention from audiences across the broad fields of statistics and computer science. The tentative titles of the talks are as follows:

1. Dr. John Stufken, Professor, Department of Statistics, School of Computing of the College of Engineering and Computing, George Mason University, USA

Title: Subdata Selection from Big Data with a Large Number of Variables

2. Dr. Simon Mak, Assistant Professor, Department of Statistical Science, Duke University, USA

Title: Design and Analysis of Multi-stage Multi-fidelity Computer Experiments

3. Dr. Lulu Kang, Associate Professor, Department of Applied Mathematics, Illinois Institute of Technology, USA

Title: Energetic Variational Inference with Non-Local Interaction

4. Dr. Luc Pronzato, DR CNRS, Laboratoire I3S - Sophia Antipolis, France

Title: Optimal designs for nonlinear model in data science

Session Format: Chair, 4 speakers, and 1 discussant

Organiser: Prof. Chunfang Devon Lin

Chair: Prof. Chunfang Devon Lin

Speaker: Dr Simon Mak

Speaker: Dr John Stufken