Published June 10, 2020 | Version v1
Conference paper Open

Oblivious Dimension Reduction for k-Means -- Beyond Subspaces and the Johnson-Lindenstrauss Lemma

  • 1. Sapienza University of Rome
  • 2. Google Zürich
  • 3. IDSIA, Lugano
  • 4. Aarhus University

Description

We show that for n points in d-dimensional Euclidean space, a data-oblivious random projection of the columns onto \(m \in O\left(\frac{(\log k+\log \log n)\log(1/\varepsilon)}{\varepsilon^6}\right)\) dimensions is sufficient to approximate the cost of all k-means clusterings up to a multiplicative \((1\pm \varepsilon)\) factor. The previous-best upper bounds on m are \(O(\varepsilon^{-2}\log n)\), given by a direct application of the Johnson-Lindenstrauss Lemma, and \(O(k\varepsilon^{-2})\), given by [Cohen et al., STOC'15].
We also prove the existence of a non-oblivious cost-preserving sketch with target dimension \(O\left(\frac{(\log k+\log \log n)\log(1/\varepsilon)}{\varepsilon^4}\right)\), improving on the \(\lceil k/\varepsilon\rceil\) bound of [Cohen et al., STOC'15]. Furthermore, we show how to construct strong coresets for the k-means problem of size \(O(k\cdot\text{poly}(\log k,\varepsilon^{-1}))\). Previous constructions of strong coresets have size of order \(k\cdot \min(d,k/\varepsilon)\).
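The abstract's central claim is that a projection matrix drawn independently of the data ("oblivious") approximately preserves the cost of every k-means clustering. A minimal numpy sketch of the idea follows; the Gaussian sketch, the dimensions n, d, m, and the use of an arbitrary fixed partition are illustrative assumptions, not the paper's actual construction or its target-dimension formula.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances of each point to its cluster's centroid."""
    cost = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

rng = np.random.default_rng(0)
n, d, k, m = 500, 200, 5, 40  # m chosen ad hoc for illustration

X = rng.normal(size=(n, d))           # synthetic data
labels = rng.integers(0, k, size=n)   # an arbitrary fixed clustering

# Data-oblivious projection: a scaled Gaussian matrix drawn
# without looking at X (the classic Johnson-Lindenstrauss sketch).
Pi = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ Pi

orig = kmeans_cost(X, labels, k)
proj = kmeans_cost(Y, labels, k)
print(proj / orig)  # concentrates near 1 for this clustering
```

Because the projection is linear, the centroid of a projected cluster is the projection of the original centroid, so the projected cost is an unbiased estimate of the original cost; the paper's contribution is showing that far fewer than \(O(\varepsilon^{-2}\log n)\) dimensions suffice for this to hold simultaneously over all clusterings.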

Files

randomprojections.pdf (527.9 kB)
md5:34922978a6ee724e7231528625845fc9

Additional details

Funding

European Commission
AMDROMA - Algorithmic and Mechanism Design Research in Online MArkets (grant 788893)