Published June 30, 2010 | Version 1.0
Dataset Open

ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI)

  • 1. Ludwig-Maximilians-Universität München
  • 2. TU Dortmund University
  • 3. University of Southern Denmark

Description

These data sets were originally created for the following publications:

M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

H.-P. Kriegel, E. Schubert, A. Zimek
Evaluation of Multiple Clustering Solutions
In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

The outlier data set versions were introduced in:

E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

All data sets are derived from the original ALOI image data available at https://aloi.science.uva.nl/

The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders. The Amsterdam library of object images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.

Additional information is available at: https://elki-project.github.io/datasets/multi_view

The following views are currently available:

Feature type | Description | Files
Object number | Sparse 1000-dimensional vectors that give the true object assignment | objs.arff.gz
RGB color histograms | Standard RGB color histograms (uniform binning) | aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
HSV color histograms | Standard HSV/HSB color histograms in various binnings | aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
Color similarity | Average similarity to 77 reference colors (not histograms): 18 colors x 2 saturation levels x 2 brightness levels + 5 grey values (incl. white, black) | aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
Haralick features | First 13 Haralick features (radius 1 pixel) | aloi-haralick-1.csv.gz
Front to back | Vectors representing front face vs. back faces of individual objects | front.arff.gz
Basic light | Vectors indicating basic light situations | light.arff.gz
Manual annotations | Manually annotated object groups of semantically related objects such as cups | manual1.arff.gz
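
The dimensionalities in the RGB file names follow from uniform binning: with k bins per channel a histogram has k^3 dimensions, which gives 8, 27, ..., 1000 for k = 2 to 10. The Python sketch below illustrates this kind of binning. It is not the code used to produce the files; the function name `rgb_histogram`, the 8-bit channel range, the bin ordering, and the normalization are assumptions made for illustration only.

```python
def rgb_histogram(pixels, bins_per_channel):
    """Uniform-binning RGB histogram (illustrative sketch, not the original extraction code).

    pixels: iterable of (r, g, b) tuples with 8-bit values 0..255 (an assumption).
    Returns a list of bins_per_channel**3 relative frequencies.
    """
    k = bins_per_channel
    counts = [0] * (k ** 3)
    n = 0
    for r, g, b in pixels:
        # Map each channel value to one of k equal-width bins.
        rb, gb, bb = r * k // 256, g * k // 256, b * k // 256
        # Flatten the 3-D bin index; this ordering is an arbitrary choice.
        counts[(rb * k + gb) * k + bb] += 1
        n += 1
    # Normalizing to relative frequencies is an assumption; the published
    # files may use a different scaling.
    return [c / n for c in counts] if n else [0.0] * len(counts)


# Dimensionalities implied by 2..10 bins per channel match the RGB file names.
rgb_dims = [k ** 3 for k in range(2, 11)]

# The 77 reference colors of the color similarity view, as described in the table.
colorsim_dims = 18 * 2 * 2 + 5

hist = rgb_histogram([(0, 0, 0), (255, 255, 255), (250, 250, 250)], 2)
```

With two bins per channel, the black pixel falls into the first of the 8 bins and the two near-white pixels fall into the last one.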

Outlier Detection Versions

Additionally, we generated a number of subsets for outlier detection:

Feature type | Description | Files
RGB histograms | Downsampled to 100000 objects (553 outliers) | aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
RGB histograms | Downsampled to 75000 objects (717 outliers) | aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
RGB histograms | Downsampled to 50000 objects (1508 outliers) | aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
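
The outlier file names encode the parameters of each subset. As a minimal sketch, the naming pattern can be parsed as shown below. The pattern is inferred only from the six file names listed above, and the helper `parse_outlier_filename` is hypothetical. The meaning of the `max` component is not stated in this record, so the sketch passes it through without interpreting it.

```python
import re

# Pattern inferred from the six outlier file names listed in this record.
_OUTLIER_NAME = re.compile(r"aloi-(\d+)d-(\d+)-max(\d+)-tot(\d+)\.csv\.gz")


def parse_outlier_filename(name):
    """Split an outlier-subset file name into its numeric components."""
    match = _OUTLIER_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    dims, objects, max_value, total = map(int, match.groups())
    return {
        "dimensions": dims,       # histogram dimensionality (27 or 64)
        "objects": objects,       # number of objects after downsampling
        "max": max_value,         # meaning not documented in this record
        "total_outliers": total,  # matches the outlier count in the table
    }


info = parse_outlier_filename("aloi-27d-100000-max10-tot553.csv.gz")
```

For this file name, the `objects` and `total_outliers` fields agree with the table row "Downsampled to 100000 objects (553 outliers)".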

Files

Files (1.2 GB)

MD5 checksum | Size
md5:aaa1bdf2c7cc818b62adcf36fd0943ff | 80.0 MB
md5:77ca9349561390fee03fd45fe5cbc214 | 22.7 MB
md5:c64fc7f000e477f244691df646f9dba3 | 31.7 MB
md5:f349dd6c6074915647072070b8876f35 | 8.0 MB
md5:36d34bb2798b601febefc192f9a1b2c2 | 4.1 MB
md5:bf7a39adeeef9f70bd25af2a98c366e2 | 6.0 MB
md5:d97491ea27e9cc07004041e82f52904c | 8.9 MB
md5:6ad989b4ecf229bcb4d3b027a7413579 | 41.9 MB
md5:d7fd403a60e65f838267cf4510c509f1 | 53.4 MB
md5:63eaa7edb5872a0dc84a369a2648bd68 | 13.4 MB
md5:ee9fb069fca420616efe564bbc186549 | 6.8 MB
md5:b1054e813f037130b7dae70f6474c8d1 | 10.2 MB
md5:4f6bf86e4f9e7297f706c58cf593098d | 15.1 MB
md5:2794dfc8ebfa665562e94312c90276ad | 66.0 MB
md5:97d4d8cd86be4d395759216f4818c833 | 4.3 MB
md5:797f6c8ccf90b1cd2eb4f7b6d4356286 | 73.8 MB
md5:d180bd39a8ba70266091c311a0bce213 | 74.2 MB
md5:bdd257a25be29bc56d85b1085f529050 | 13.3 MB
md5:6c80bb2c249a063c89b67b28fe306485 | 74.9 MB
md5:d5a9d3445eea25ac2b399996a0ebffb0 | 52.5 MB
md5:1692e46af2b9842ca138c41d2b548964 | 37.9 MB
md5:813ddcf99ef4756282a8ddbca840da2a | 74.1 MB
md5:5f0cb23aed035a9292706e54e3ab31d2 | 94.0 MB
md5:50e8f23d971d87d565daf2d1021c54d9 | 5.4 MB
md5:6dd0d7d4f1b87966c984e999bcd38ec1 | 12.5 MB
md5:65ef5a591b594998da8343baf4e4ffa1 | 23.0 MB
md5:98953f57eb731c213f948ce74294deae | 37.1 MB
md5:30980e50dd470d2b5d26169799034108 | 53.5 MB
md5:16db3064dad4bd7ee1dda1c32f8c22de | 12.8 MB
md5:931bb25b4a6b7b7784d01c28302f861b | 22.6 MB
md5:e7a2f8697862f289216b8f28ecd09e14 | 72.5 MB
md5:01eeecf68377008d87f284074a18ccf5 | 37.1 MB
md5:5b4c3a16f6136569821182267aa9616d | 54.5 MB
md5:ea5ba42e46e4b39eadec556055d0b7b3 | 486.4 kB
md5:87b5e5176e5a056b79d39ca6adbd3d22 | 303.8 kB
md5:904a2bd4a1269708468cae060054dd39 | 331.9 kB
md5:ae2fe5aa7df7aaa11d16b1fd5d83fc5d | 300.2 kB
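
The MD5 values listed above can be used to check that a download is intact. The following Python sketch shows one way to do this with the standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because this listing does not show which checksum belongs to which file, the pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(path_to_download, "md5:<hex>")` returns `True` only if the file contents hash to the given value.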

Additional details

Dates

Other: 2022 (uploaded to Zenodo)

References

  • M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
  • H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.
  • E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.
  • J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders. The Amsterdam library of object images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.