Published November 16, 2022 | Version 2022-11-16
Dataset Open

GLAMI-1M: A Multilingual Image-Text Fashion Dataset

  • 1. GLAMI
  • 2. GLAMI, FNSPE CTU in Prague
  • 3. Rossum

Description

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. The categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification, which show that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for image generation conditioned on text. The dataset, source code, and model checkpoints are published at https://github.com/glami/glami-1m.

Files

Files (11.2 GB)

Name: GLAMI-1M-dataset.zip
Size: 11.2 GB
md5: 500348bbf54595db81cba353acd50d78
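
The md5 checksum listed above can be used to confirm that the 11.2 GB archive downloaded intact. The following is a minimal sketch, not part of the dataset's published tooling: the helper names `md5_of_file` and `verify_archive` are hypothetical, and the local path assumes the archive was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "500348bbf54595db81cba353acd50d78"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat, which matters for an
    archive of this size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("GLAMI-1M-dataset.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest fix.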