A Two-Stage Optimal Subsampling Estimation for Missing Data Problems with Large-Scale Data
Host: Liang Hanying (梁汉营)
Published: 2022-11-02

Title: A Two-Stage Optimal Subsampling Estimation for Missing Data Problems with Large-Scale Data

Speaker: Wang Qihua (王启华), Research Professor (Chinese Academy of Sciences)

Venue: Tencent Meeting (online)

Time: Sunday, November 6, 2022, 20:00-21:30

Abstract: Subsampling reduces data volume and speeds up computation for large-scale data, and it has been well studied for completely observed data. With missing data, computation is more challenging, and subsampling becomes both more important and more complex. Nevertheless, subsampling for missing data problems has received little study. This paper fills the gap by studying a subsampling method for a widely used missing data estimator, the augmented inverse probability weighting (AIPW) estimator. Estimation of the response mean with missing responses is used for illustration. A two-stage subsampling method is proposed within a Poisson sampling framework. A small subsample of expected size $n_{1}$ is used in the first stage to estimate the parameters of the propensity score and outcome regression models, while a larger subsample of expected size $n_{2}$ is used in the computationally simple second stage to calculate the final estimator. An attractive property of the resulting estimator is that its convergence rate is $n_{2}^{-1/2}$ rather than $n_{1}^{-1/2}$ when both the propensity score and the outcome regression functions are correctly specified. The rate $n_{2}^{-1/2}$ is still attainable in some important cases when only one of the two functions is correctly specified. This indicates that using a small subsample in the computationally complex first stage can reduce the computational burden with little impact on statistical accuracy. Asymptotic normality of the resulting estimator is established, and the optimal subsampling probabilities are derived by minimizing its asymptotic variance. Simulations and a real data analysis are conducted to demonstrate the empirical performance of the resulting estimator.
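
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's algorithm. It assumes a logistic propensity score model and a linear outcome regression model, and it uses uniform Poisson sampling probabilities in both stages rather than the optimal probabilities derived in the paper. The function names (`fit_logistic`, `two_stage_aipw_mean`), the simulated data, and the sizes `N`, `n1`, `n2` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(Z, d, iters=25):
    """Newton-Raphson logistic regression; Z already contains an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p)
        grad = Z.T @ (d - p)
        # Tiny ridge term keeps the Hessian invertible (an implementation convenience).
        hess = (Z * w[:, None]).T @ Z + 1e-8 * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_stage_aipw_mean(X, y, delta, n1, n2, rng):
    """Estimate E[Y] when y is observed only where delta == 1."""
    N = X.shape[0]
    Z = np.column_stack([np.ones(N), X])

    # Stage 1: Poisson subsample of expected size n1 (each unit kept
    # independently with probability n1 / N); fit both working models on it.
    s1 = rng.random(N) < n1 / N
    alpha = fit_logistic(Z[s1], delta[s1])            # propensity score model
    obs = s1 & (delta == 1)
    gamma, *_ = np.linalg.lstsq(Z[obs], y[obs], rcond=None)  # outcome regression

    # Stage 2: Poisson subsample of expected size n2.
    # ASSUMPTION: uniform probabilities; the talk derives optimal ones instead.
    pi = np.full(N, n2 / N)
    s2 = rng.random(N) < pi
    p_hat = 1.0 / (1.0 + np.exp(-Z[s2] @ alpha))
    m_hat = Z[s2] @ gamma
    d2 = delta[s2]
    y2 = np.where(d2 == 1, y[s2], 0.0)                # missing responses never enter
    aipw = d2 * y2 / p_hat - (d2 - p_hat) / p_hat * m_hat

    # Inverse-probability-weighted sum over the stage-2 subsample.
    return np.sum(aipw / pi[s2]) / N

# Simulated example: the true response mean is 1.0.
rng = np.random.default_rng(0)
N = 200_000
X = rng.normal(size=(N, 2))
y_full = 1.0 + X @ np.array([1.0, -1.0]) + rng.normal(size=N)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * X[:, 0])))
delta = (rng.random(N) < p_obs).astype(float)
y = np.where(delta == 1, y_full, np.nan)              # responses missing at random

est = two_stage_aipw_mean(X, y, delta, n1=2_000, n2=20_000, rng=rng)
print(f"two-stage AIPW estimate of E[Y]: {est:.3f}")
```

In this sketch the expensive model fitting touches only about `n1` rows, while the second stage needs only one pass of cheap arithmetic over about `n2` rows, which mirrors the computational argument made in the abstract.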

Tencent Meeting ID: 959-714-368  Password: 385709

All faculty and students are welcome to attend.