Sample surveys in the era of Big Data and Machine Learning
Category: International Association of Survey Statisticians (IASS)
Estimation in nonprobability samples with Propensity Score Adjustment and Kernel Weighting
Maria del Mar Rueda, University of Granada, Spain
Nonprobability samples usually entail selection biases, which may arise from substantial differences between the potentially covered population and the target population. In this work, we compare two design-based methods: Propensity Score Adjustment, which estimates the propensities using predictive models such as logistic regression or machine learning classifiers, and Kernel Weighting, which combines the estimation of propensities with sample matching. We also consider weight smoothing for estimation in multipurpose surveys, where the covariates used in the adjustments may be more suitable for some target variables than for others.
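As a rough illustration of the Propensity Score Adjustment idea, the sketch below pools a nonprobability sample with a reference sample, fits a logistic model for membership in the nonprobability sample, and reweights the volunteers by (1 - p) / p. Everything in it is an assumption made for illustration: the toy data, the single binary covariate, the equal design weights in the reference sample, the choice of weight formula, and the function names `fit_logistic` and `psa_mean`. It is not the estimator studied in the talk, and it does not cover Kernel Weighting or weight smoothing.

```python
import math


def fit_logistic(xs, zs, iters=25):
    """Fit P(z=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = z - p              # score contribution
            w = p * (1.0 - p)      # information contribution
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def psa_mean(x_np, y_np, x_ref):
    """Weighted mean of y in the nonprobability sample using weights (1-p)/p.

    Assumes the reference sample has equal design weights (hypothetical setup).
    """
    xs = list(x_np) + list(x_ref)
    zs = [1] * len(x_np) + [0] * len(x_ref)   # 1 = volunteer, 0 = reference
    b0, b1 = fit_logistic(xs, zs)
    num = den = 0.0
    for x, y in zip(x_np, y_np):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = (1.0 - p) / p
        num += w * y
        den += w
    return num / den


# Toy data: volunteers over-represent x = 1; the reference sample is balanced.
x_np = [1] * 80 + [0] * 20
y_np = [1.0] * 80 + [0.0] * 20    # target variable driven entirely by x
x_ref = [1] * 50 + [0] * 50

naive = sum(y_np) / len(y_np)
adjusted = psa_mean(x_np, y_np, x_ref)
print(naive, adjusted)
```

In this toy setting the target variable depends only on the covariate used in the propensity model, so the adjustment moves the estimate from the volunteer proportion towards the proportion in the reference sample. When the covariates explain the target variable poorly, the adjustment removes correspondingly less bias.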
New Data sources for improving Official Statistics
Paolo Righi, Istat
The talk focuses on the Data Integration approach, which combines multiple sources (surveys with probabilistic samples, administrative data and Big Data), and considers two classes of estimators. The first class consists of design-based estimators that use Big Data as auxiliary information. The second class uses the probabilistic sample as a source of auxiliary information, and its estimators make model-based inference using Big Data. In the latter case, the probabilistic sample is useful for dealing with the selection bias of the non-probabilistic sample and for correcting the measurement error when the Big Data source does not record the target variable accurately. Both classes of estimators are applied to real survey data and Big Data.
Propensity score weighting for handling selection bias in voluntary samples
Jae Kwang Kim, Iowa State University
Propensity score weighting is widely used to improve the representativeness of a sample and to correct its selection bias. In this talk, we consider an alternative approach that estimates the inverse of the propensity scores using the density ratio function. The smoothed density ratio function is obtained as the solution to the information projection onto the space satisfying the moment conditions on the balancing scores. The proposed approach is applicable to nonignorable selection models under some identifiability conditions.
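To make the phrase "information projection onto the space satisfying the moment conditions" more concrete, here is a minimal sketch of a much simpler, related calculation. Among all normalized weights whose weighted mean of a single score matches a known target, the weights closest to uniform in Kullback-Leibler divergence have the exponential-tilting form w_i proportional to exp(lambda * x_i). The single scalar score, the toy values, the target mean and the function name `tilt_weights` are assumptions for illustration only; this is not the smoothed density ratio estimator proposed in the talk.

```python
import math


def tilt_weights(xs, target_mean, iters=50):
    """Exponential tilting: w_i proportional to exp(lam * x_i), with lam chosen
    so that the weighted mean of xs equals target_mean (Newton's method).

    Assumes target_mean lies strictly between min(xs) and max(xs).
    """
    lam = 0.0
    for _ in range(iters):
        e = [math.exp(lam * x) for x in xs]
        total = sum(e)
        mean = sum(ei * x for ei, x in zip(e, xs)) / total
        var = sum(ei * x * x for ei, x in zip(e, xs)) / total - mean ** 2
        # The derivative of the tilted mean with respect to lam is the tilted variance.
        lam += (target_mean - mean) / var
    e = [math.exp(lam * x) for x in xs]
    total = sum(e)
    return [ei / total for ei in e]


# Hypothetical balancing scores observed in a voluntary sample,
# and a population mean of the score assumed known from another source.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = tilt_weights(scores, target_mean=3.5)
weighted_mean = sum(w * x for w, x in zip(weights, scores))
print(weights, weighted_mean)
```

The resulting weights sum to one and reproduce the target mean of the score, which is the moment condition in this one-dimensional case. Units with larger scores receive larger weights here because the sample mean of the score (3.0) is below the target.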
Sampling for network function learning
To define what we call network functions, let us first envisage a valued graph, where the nodes represent the units and the edges the connections among them, and where both the nodes and the edges may carry associated values. Any network function for a given unit must then be defined in terms of both the corresponding node and the nodes connected to it, as well as the associated values. A basic difficulty in learning such network functions arises when the edges of the graph are unknown to start with, even when the entire set of nodes is known, so that the edges can only be partly observed by sampling from the collection of nodes and edges, i.e. the graph. In this talk, we consider the feasibility of a graph sampling approach to network function learning, as well as the corresponding learning methods based on sample graphs.
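The setting described above can be pictured with a small sketch: the node set is known, the edges are not, and an edge becomes observable only once at least one of its end nodes is sampled. The code below draws an initial sample of nodes, records the edges incident to them, and evaluates one simple network function (the mean value of a node's neighbours) from the sample graph alone. The toy graph, the observation rule, the assumption that neighbour values are observed together with the edges, and the names `sample_graph` and `neighbour_mean` are all hypothetical choices for illustration, not the sampling designs or learning methods discussed in the talk.

```python
import random


def sample_graph(nodes, edges, n, seed=0):
    """Draw n initial nodes and observe every edge incident to a sampled node."""
    rng = random.Random(seed)
    initial = set(rng.sample(sorted(nodes), n))
    observed = {(u, v) for (u, v) in edges if u in initial or v in initial}
    return initial, observed


def neighbour_mean(node, edge_set, values):
    """Network function: mean value over the nodes adjacent to `node`."""
    nbrs = {v for (u, v) in edge_set if u == node}
    nbrs |= {u for (u, v) in edge_set if v == node}
    if not nbrs:
        return None
    return sum(values[j] for j in nbrs) / len(nbrs)


# Toy valued graph (undirected edges stored once as ordered pairs).
nodes = {1, 2, 3, 4, 5, 6}
edges = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)}
values = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0, 6: 60.0}

initial, observed = sample_graph(nodes, edges, n=3)
for i in sorted(initial):
    print(i, neighbour_mean(i, observed, values))
```

Under this observation rule the function is recovered exactly for every initially sampled node, because all of its incident edges are seen. For nodes reached only as neighbours, the sample graph may miss some of their edges, which is one reason the choice of sampling design matters for learning.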
- An empirical likelihood approach to reduce selection bias in voluntary sample
- Estimation in nonprobability samples with Propensity Score Adjustment and Kernel Weighting
- Graph sampling for node embedding
- New Data sources for improving the Official Statistics
- Sample surveys in the era of Big Data and Machine Learning