March 3 @ 4:00 pm - 5:00 pm
Title: Statistical Exploitation of Unlabeled Data under High Dimensionality
Presenter: Jiwei Zhao
Abstract: In this talk, we consider the benefits of unlabeled data in the semi-supervised setting under high dimensionality, for both parameter estimation and statistical inference. In particular, we address two important questions. First, can we use the labeled and unlabeled data together to construct a semi-supervised estimator whose convergence rate is faster than that of the supervised estimator? Second, can we construct confidence intervals or hypothesis tests that are guaranteed to be more efficient or more powerful than those based on the supervised estimator? We show that a semi-supervised estimator with a faster convergence rate exists under some conditions, and that implementing this optimal estimator requires a reasonably good estimate of the conditional mean function. For statistical inference, we mainly propose a safe approach that is guaranteed to be no worse than the supervised estimator in terms of statistical efficiency. Not surprisingly, if the conditional mean function is well estimated, our safe approach becomes semiparametrically efficient. After developing the theory, I will also present some simulation results as well as a real data analysis.