Published November 10, 2023 | Version v3
Conference paper Open

Robust covariance estimation with missing values and cell-wise contamination

  • 1. École Polytechnique

Description

Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding every sample that contains an outlier may result in a dataset that is too small to accurately estimate the covariance matrix. Moreover, the robust procedures designed to address this problem require the invertibility of the covariance operator and thus are not effective on high-dimensional data. In this paper, we propose an unbiased estimator for the covariance in the presence of missing values that does not require any imputation step and still achieves near-minimax statistical accuracy in the operator norm. We also advocate for its use in combination with cell-wise outlier detection methods to tackle cell-wise contamination in a high-dimensional and low-rank setting, where state-of-the-art methods may suffer from numerical instability and long computation times. To complement our theoretical findings, we conducted an experimental study which demonstrates the superiority of our approach over the state of the art in both low- and high-dimensional settings.
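The idea of correcting for missing values without imputation can be illustrated with a minimal sketch. This is not the paper's estimator; it is a standard inverse-probability-weighting correction shown under assumptions that are ours: the data are zero-mean, each cell is observed independently of the data and of other cells, and column j is observed with probability delta_j (estimated here by the observed fraction). The function name `ipw_covariance` and its parameters are hypothetical.

```python
import numpy as np

def ipw_covariance(x, delta=None):
    """Covariance estimate from data with NaN-coded missing cells.

    Assumptions (illustrative, not taken from the paper): zero-mean data,
    cells observed independently with per-column probability delta[j].
    """
    x = np.asarray(x, dtype=float)
    mask = ~np.isnan(x)                  # True where a cell is observed
    if delta is None:
        delta = mask.mean(axis=0)        # observed fraction per column
    delta = np.asarray(delta, dtype=float)

    y = np.where(mask, x, 0.0)           # zero-fill, no imputation
    s = y.T @ y / y.shape[0]             # empirical second moment of y

    # E[s_jk] = delta_j * delta_k * Sigma_jk for j != k,
    # E[s_jj] = delta_j * Sigma_jj, so divide entrywise to remove the bias.
    weights = np.outer(delta, delta)
    np.fill_diagonal(weights, delta)
    return s / weights

# Small worked example: the second column is observed half of the time.
x_demo = np.array([[2.0, np.nan],
                   [2.0, 2.0]])
sigma_demo = ipw_covariance(x_demo)
```

A column that is never observed gives delta_j = 0 and a division by zero, so such columns would need to be dropped beforehand. Under the same assumptions, cells flagged by an outlier detector could be set to NaN before calling the function.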

Files

6362_robust_covariance_estimation_w.pdf (1.6 MB)
md5:42f2ab065df9acfd1ef0fcf153129a7b

Additional details

Funding

ELIAS – European Lighthouse of AI for Sustainability 101120237
European Commission

Dates

Accepted
2023-11-10