GLAMI-1M: A Multilingual Image-Text Fashion Dataset - 800px

Vaclav Kosar; Antonín Hoskvec; Milan Šulc; Radek Bartyzal

doi:10.5281/zenodo.7338792

Published November 20, 2022 | Version 2022-11-23

Dataset Open

Affiliations:
1. GLAMI
2. Rossum

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. The categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification, which show that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for image generation conditioned on text. The dataset, source code, and model checkpoints are published at: https://github.com/glami/glami-1m

Files

Files (64.0 GB)

Name	MD5	Size
GLAMI-1M-dataset-800px--0.zip	9d7aa43dc315a5d567389e68e4e50ccb	6.8 GB
GLAMI-1M-dataset-800px--1.zip	36d1ffbef560b707f815faad98b039f1	5.2 GB
GLAMI-1M-dataset-800px--10.zip	cec58af85ff2cc9eefd9ba20ebdd334e	4.4 GB
GLAMI-1M-dataset-800px--2.zip	47598ccfddcfb3ed9a00e851aa1165b9	6.4 GB
GLAMI-1M-dataset-800px--3.zip	d92c0a41e093bd07fd59a6a80c65cf25	5.0 GB
GLAMI-1M-dataset-800px--4.zip	0134506e9d47ddf2e4c7b286b68e3c09	5.2 GB
GLAMI-1M-dataset-800px--5.zip	19dcb70eb67c4ecb560d1ba6580d0b3c	6.0 GB
GLAMI-1M-dataset-800px--6.zip	33cddce2f4c3df3be3ae8c0e7fb841ed	6.1 GB
GLAMI-1M-dataset-800px--7.zip	2ed26dcc539e1e57af4305e9b1f22686	6.0 GB
GLAMI-1M-dataset-800px--8.zip	6895f21aa72a790fb8ce9556637f42d7	6.3 GB
GLAMI-1M-dataset-800px--9.zip	235f5003edaf7b45a7d99b38d814311e	6.5 GB
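
Because the archives are large, it may be worth checking each download against the MD5 checksums listed above before unpacking. The following is a minimal sketch, not part of the official GLAMI-1M tooling: the helper names `md5_of_file` and `verify_archives` are hypothetical, and the code assumes the archives were saved under their original file names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing on this record page.
EXPECTED_MD5 = {
    "GLAMI-1M-dataset-800px--0.zip": "9d7aa43dc315a5d567389e68e4e50ccb",
    "GLAMI-1M-dataset-800px--1.zip": "36d1ffbef560b707f815faad98b039f1",
    "GLAMI-1M-dataset-800px--2.zip": "47598ccfddcfb3ed9a00e851aa1165b9",
    "GLAMI-1M-dataset-800px--3.zip": "d92c0a41e093bd07fd59a6a80c65cf25",
    "GLAMI-1M-dataset-800px--4.zip": "0134506e9d47ddf2e4c7b286b68e3c09",
    "GLAMI-1M-dataset-800px--5.zip": "19dcb70eb67c4ecb560d1ba6580d0b3c",
    "GLAMI-1M-dataset-800px--6.zip": "33cddce2f4c3df3be3ae8c0e7fb841ed",
    "GLAMI-1M-dataset-800px--7.zip": "2ed26dcc539e1e57af4305e9b1f22686",
    "GLAMI-1M-dataset-800px--8.zip": "6895f21aa72a790fb8ce9556637f42d7",
    "GLAMI-1M-dataset-800px--9.zip": "235f5003edaf7b45a7d99b38d814311e",
    "GLAMI-1M-dataset-800px--10.zip": "cec58af85ff2cc9eefd9ba20ebdd334e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archives(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists in `directory`
    and its MD5 matches, and to False otherwise (missing or corrupted)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == want
    return results


if __name__ == "__main__":
    # Assumption: the archives were downloaded into the current directory.
    for name, ok in verify_archives(".").items():
        print(f"{name}: {'OK' if ok else 'MISSING or MISMATCH'}")
```

An archive reported as a mismatch was most likely truncated or corrupted in transit and should be downloaded again before extraction.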

	All versions	This version
Views	690	685
Downloads	977	973
Data volume	11.9 TB	11.9 TB
