ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI)
Description
These data sets were originally created for the following publications:
M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
H.-P. Kriegel, E. Schubert, A. Zimek
Evaluation of Multiple Clustering Solutions
In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.
The outlier data set versions were introduced in:
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.
They are derived from the original image data available at https://aloi.science.uva.nl/
The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005
Additional information is available at: https://elki-project.github.io/datasets/multi_view
The following views are currently available:
Feature type | Description | Files |
---|---|---|
Object number | Sparse 1000 dimensional vectors that give the true object assignment | objs.arff.gz |
RGB color histograms | Standard RGB color histograms (uniform binning) | aloi-8d.csv.gz aloi-27d.csv.gz aloi-64d.csv.gz aloi-125d.csv.gz aloi-216d.csv.gz aloi-343d.csv.gz aloi-512d.csv.gz aloi-729d.csv.gz aloi-1000d.csv.gz |
HSV color histograms | Standard HSV/HSB color histograms in various binnings | aloi-hsb-2x2x2.csv.gz aloi-hsb-3x3x3.csv.gz aloi-hsb-4x4x4.csv.gz aloi-hsb-5x5x5.csv.gz aloi-hsb-6x6x6.csv.gz aloi-hsb-7x7x7.csv.gz aloi-hsb-7x2x2.csv.gz aloi-hsb-7x3x3.csv.gz aloi-hsb-14x3x3.csv.gz aloi-hsb-8x4x4.csv.gz aloi-hsb-9x5x5.csv.gz aloi-hsb-13x4x4.csv.gz aloi-hsb-14x5x5.csv.gz aloi-hsb-10x6x6.csv.gz aloi-hsb-14x6x6.csv.gz |
Color similarity | Average similarity to 77 reference colors (not histograms): 18 colors x 2 saturation levels x 2 brightness levels + 5 grey values (incl. white, black) | aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other) |
Haralick features | First 13 Haralick features (radius 1 pixel) | aloi-haralick-1.csv.gz |
Front to back | Vectors representing front face vs. back faces of individual objects | front.arff.gz |
Basic light | Vectors indicating basic light situations | light.arff.gz |
Manual annotations | Manually annotated object groups of semantically related objects such as cups | manual1.arff.gz |
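The file names in the table appear to encode the dimensionality of each view. The following minimal sketch works through that arithmetic. It assumes (this is not stated explicitly above) that an RGB file named `aloi-<n>d` uses the same number of bins on each of the three channels, and that an HSB file named `aloi-hsb-<h>x<s>x<b>` has one dimension per combination of bins. The helper names are hypothetical and only serve the illustration.

```python
import re
from math import prod


def rgb_bins_per_channel(dim: int) -> int:
    """Return k with k**3 == dim, assuming uniform binning of the 3 RGB channels."""
    k = round(dim ** (1 / 3))
    if k ** 3 != dim:
        raise ValueError(f"{dim} is not a perfect cube")
    return k


def hsb_dimensionality(filename: str) -> int:
    """Multiply the bin counts in a name such as 'aloi-hsb-7x2x2.csv.gz'.

    Assumption: the three numbers are bin counts per HSB channel and the
    histogram has one dimension per bin combination.
    """
    match = re.search(r"hsb-(\d+)x(\d+)x(\d+)", filename)
    if match is None:
        raise ValueError(f"no HSB binning found in {filename!r}")
    return prod(int(group) for group in match.groups())


# Color similarity view: 18 colors x 2 saturation x 2 brightness + 5 grey values.
COLORSIM_DIM = 18 * 2 * 2 + 5

if __name__ == "__main__":
    for dim in (8, 27, 64, 125, 216, 343, 512, 729, 1000):
        print(f"aloi-{dim}d: {rgb_bins_per_channel(dim)} bins per channel")
    print("aloi-hsb-7x2x2:", hsb_dimensionality("aloi-hsb-7x2x2.csv.gz"), "dimensions")
    print("colorsim:", COLORSIM_DIM, "reference colors")
```

Under these assumptions the nine RGB files correspond to 2 through 10 bins per channel, and the reference color count matches the 77 in the file name.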
Outlier Detection Versions
Additionally, we generated a number of subsets for outlier detection:
Feature type | Description | Files |
---|---|---|
RGB histograms | Downsampled to 100000 objects (553 outliers) | aloi-27d-100000-max10-tot553.csv.gz aloi-64d-100000-max10-tot553.csv.gz |
RGB histograms | Downsampled to 75000 objects (717 outliers) | aloi-27d-75000-max4-tot717.csv.gz aloi-64d-75000-max4-tot717.csv.gz |
RGB histograms | Downsampled to 50000 objects (1508 outliers) | aloi-27d-50000-max5-tot1508.csv.gz aloi-64d-50000-max5-tot1508.csv.gz |
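The outlier subset file names carry the dimensionality, the number of objects, and the total number of outliers, so the outlier rate of each subset can be read off the name. The sketch below parses these names. The class and function names are hypothetical, and the `max` token is kept as a plain number because its meaning is not explained here.

```python
import re
from typing import NamedTuple


class OutlierSubset(NamedTuple):
    dims: int       # histogram dimensionality (the "27d" / "64d" part)
    objects: int    # number of objects after downsampling
    max_token: int  # value of the "max" part; its meaning is not documented here
    outliers: int   # total number of outliers (the "tot" part)


_NAME = re.compile(r"aloi-(\d+)d-(\d+)-max(\d+)-tot(\d+)\.csv\.gz$")


def parse_outlier_subset(filename: str) -> OutlierSubset:
    """Split a name such as 'aloi-27d-100000-max10-tot553.csv.gz' into its parts."""
    match = _NAME.search(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return OutlierSubset(*(int(group) for group in match.groups()))


def outlier_fraction(subset: OutlierSubset) -> float:
    """Fraction of objects in the subset that are outliers."""
    return subset.outliers / subset.objects


if __name__ == "__main__":
    for name in (
        "aloi-27d-100000-max10-tot553.csv.gz",
        "aloi-27d-75000-max4-tot717.csv.gz",
        "aloi-27d-50000-max5-tot1508.csv.gz",
    ):
        subset = parse_outlier_subset(name)
        print(f"{name}: {outlier_fraction(subset):.3%} outliers")
```

The three subsets therefore differ in outlier rate as well as in size, which is worth keeping in mind when comparing results across them.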
Files
(1.2 GB)
MD5 checksum | Size |
---|---|
md5:aaa1bdf2c7cc818b62adcf36fd0943ff | 80.0 MB |
md5:77ca9349561390fee03fd45fe5cbc214 | 22.7 MB |
md5:c64fc7f000e477f244691df646f9dba3 | 31.7 MB |
md5:f349dd6c6074915647072070b8876f35 | 8.0 MB |
md5:36d34bb2798b601febefc192f9a1b2c2 | 4.1 MB |
md5:bf7a39adeeef9f70bd25af2a98c366e2 | 6.0 MB |
md5:d97491ea27e9cc07004041e82f52904c | 8.9 MB |
md5:6ad989b4ecf229bcb4d3b027a7413579 | 41.9 MB |
md5:d7fd403a60e65f838267cf4510c509f1 | 53.4 MB |
md5:63eaa7edb5872a0dc84a369a2648bd68 | 13.4 MB |
md5:ee9fb069fca420616efe564bbc186549 | 6.8 MB |
md5:b1054e813f037130b7dae70f6474c8d1 | 10.2 MB |
md5:4f6bf86e4f9e7297f706c58cf593098d | 15.1 MB |
md5:2794dfc8ebfa665562e94312c90276ad | 66.0 MB |
md5:97d4d8cd86be4d395759216f4818c833 | 4.3 MB |
md5:797f6c8ccf90b1cd2eb4f7b6d4356286 | 73.8 MB |
md5:d180bd39a8ba70266091c311a0bce213 | 74.2 MB |
md5:bdd257a25be29bc56d85b1085f529050 | 13.3 MB |
md5:6c80bb2c249a063c89b67b28fe306485 | 74.9 MB |
md5:d5a9d3445eea25ac2b399996a0ebffb0 | 52.5 MB |
md5:1692e46af2b9842ca138c41d2b548964 | 37.9 MB |
md5:813ddcf99ef4756282a8ddbca840da2a | 74.1 MB |
md5:5f0cb23aed035a9292706e54e3ab31d2 | 94.0 MB |
md5:50e8f23d971d87d565daf2d1021c54d9 | 5.4 MB |
md5:6dd0d7d4f1b87966c984e999bcd38ec1 | 12.5 MB |
md5:65ef5a591b594998da8343baf4e4ffa1 | 23.0 MB |
md5:98953f57eb731c213f948ce74294deae | 37.1 MB |
md5:30980e50dd470d2b5d26169799034108 | 53.5 MB |
md5:16db3064dad4bd7ee1dda1c32f8c22de | 12.8 MB |
md5:931bb25b4a6b7b7784d01c28302f861b | 22.6 MB |
md5:e7a2f8697862f289216b8f28ecd09e14 | 72.5 MB |
md5:01eeecf68377008d87f284074a18ccf5 | 37.1 MB |
md5:5b4c3a16f6136569821182267aa9616d | 54.5 MB |
md5:ea5ba42e46e4b39eadec556055d0b7b3 | 486.4 kB |
md5:87b5e5176e5a056b79d39ca6adbd3d22 | 303.8 kB |
md5:904a2bd4a1269708468cae060054dd39 | 331.9 kB |
md5:ae2fe5aa7df7aaa11d16b1fd5d83fc5d | 300.2 kB |
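The listing above gives an MD5 checksum for each file, which can be used to check a download for corruption. The following is a minimal sketch using only Python's standard `hashlib`. The function names and the example path are hypothetical, and the checksum string is passed in the `md5:...` form used in the listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python check_md5.py some-download.csv.gz md5:<hex>
    if len(sys.argv) == 3:
        ok = matches_listed_checksum(sys.argv[1], sys.argv[2])
        print("checksum OK" if ok else "checksum MISMATCH")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not meant as a defense against deliberate tampering.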
Additional details
Dates
- Other: 2022, uploaded to Zenodo
References
- M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
- H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.
- E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.
- J. M. Geusebroek, G. J. Burghouts, A. W. M. Smeulders. The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.