Oblivious Dimension Reduction for k-Means -- Beyond Subspaces and the Johnson-Lindenstrauss Lemma
- 1. Sapienza University of Rome
- 2. Google Zürich
- 3. IDSIA, Lugano
- 4. Aarhus University
Description
We show that for n points in d-dimensional Euclidean space, a data-oblivious random projection of the columns onto \(m = O\left(\frac{(\log k+\log \log n)\log(1/\varepsilon)}{\varepsilon^6}\right)\) dimensions is sufficient to approximate the cost of all k-means clusterings up to a multiplicative \((1\pm \varepsilon)\) factor. The previous best upper bounds on m are \(O(\varepsilon^{-2}\log n)\), given by a direct application of the Johnson-Lindenstrauss Lemma, and \(O(k\varepsilon^{-2})\), given by [Cohen et al., STOC'15].
We also prove the existence of a non-oblivious cost-preserving sketch with target dimension \(O\left(\frac{(\log k+\log \log n)\log(1/\varepsilon)}{\varepsilon^4}\right)\), improving on the \(\lceil k/\varepsilon\rceil\) bound of [Cohen et al., STOC'15]. Furthermore, we show how to construct strong coresets for the k-means problem of size \(O(k\cdot\text{poly}(\log k,\varepsilon^{-1}))\). Previous constructions of strong coresets have size of order \(k\cdot \min(d,k/\varepsilon)\).
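To make the statement concrete, here is a minimal, hedged sketch of the classical data-oblivious baseline the paper improves upon: a random Gaussian (Johnson-Lindenstrauss-style) projection applied to the data matrix, after which the k-means cost of a fixed clustering is approximately preserved. The cluster sizes, dimensions, and the `kmeans_cost` helper below are illustrative choices, not constructions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_cost(X, labels, k):
    # k-means cost of a fixed clustering: sum of squared distances
    # of each point to the centroid of its assigned cluster.
    cost = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

# Synthetic data: n points in d dimensions with k planted clusters
# (purely illustrative parameters).
n, d, k = 600, 200, 3
centers = 5.0 * rng.normal(size=(k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.normal(size=(n, d))

# Data-oblivious projection: a random Gaussian map scaled by 1/sqrt(m),
# drawn independently of the data. This is the JL baseline with
# m = O(eps^-2 log n); the paper's result is that far fewer dimensions
# already suffice to preserve all k-means clustering costs.
m = 20
G = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ G

# Cost ratio after vs. before projection; concentrates around 1.
ratio = kmeans_cost(Y, labels, k) / kmeans_cost(X, labels, k)
print(ratio)
```

Because centroids are linear functions of the points, projecting the points also projects the centroids, so the projected cost is a sum of squared lengths of projected difference vectors and its expectation equals the original cost under this scaling.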
Files
- randomprojections.pdf (527.9 kB), md5:34922978a6ee724e7231528625845fc9