Title: Navigating Challenges in Nonparametric Classification and Outlier Detection: A Remedy Based on Semi-parametric Density Ratio Models
Speaker: Professor Liu Yukun (刘玉坤), East China Normal University
Venue: Room 108, Zhiyuan Building (致远楼)
Time: May 13, 2024, 15:30-17:00
Abstract:
The goal of classification is to assign categorical labels to unlabeled test data based on patterns and relationships learned from a labeled training dataset. This task becomes challenging when the training data and the test data exhibit distributional mismatches. The unlabeled test data follow a finite mixture model, which is not identifiable without further model assumptions. In this paper, we propose to model the test data by a finite semi-parametric mixture model under a density ratio model, and we construct a semi-parametric likelihood prediction set (SPLPS) for the labels in the test data. Our approach aims to optimize out-of-sample performance by including the correct class and detecting outliers as often as possible. It has the potential to enhance the robustness and effectiveness of classification models when the training and test distributions differ. Our method avoids the stringent separation assumption between training data and outliers that is required by Guan and Tibshirani (2022) but is often violated by commonly used distributions. We prove the consistency and asymptotic normality of our parameter estimators and the asymptotic optimality of the proposed SPLPS. We illustrate our methods by analyzing four real-world datasets.
All are welcome!