CroQS Benchmark
Description
CroQS (Cross-modal Query Suggestion) v1.0.0 is a benchmark dataset designed to evaluate methods that generate improved textual queries guided by visual results, in the context of text-to-image retrieval. The dataset supports the task of generating query suggestions grounded in visual content, specifically helping users refine or reformulate queries based on result set clusters.
This version includes:
- 50 initial textual queries used to retrieve image sets from the MS COCO 2017 dataset.
- 295 manually defined semantic clusters, based on visual similarity or common properties in the image results.
- 8127 unique COCO images (referenced via URL, not redistributed directly).
- Each query result set is manually grouped into 2 to 10 clusters (average ~5.9 clusters per query).
- Each cluster is associated with a human-written query suggestion that describes the cluster's shared visual properties.
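The figures above are related by simple arithmetic: 295 clusters over 50 queries gives the stated average of 5.9 clusters per query. A minimal sketch of how such statistics could be recomputed is shown below. The JSON layout it assumes (a top-level `queries` list whose items hold `clusters`, each with a `suggestion` and a list of image URLs under `images`) is hypothetical and chosen for illustration only; the key names in the released file may differ, so adapt them after inspecting it.

```python
def summarize(benchmark: dict) -> dict:
    """Count queries, clusters, and unique images in a benchmark-like dict.

    ASSUMED layout (hypothetical, not taken from the released file):
    {"queries": [{"query": str,
                  "clusters": [{"suggestion": str, "images": [url, ...]}]}]}
    """
    queries = benchmark["queries"]
    n_clusters = sum(len(q["clusters"]) for q in queries)
    # An image may appear in several clusters, so deduplicate by URL.
    unique_images = {
        url for q in queries for c in q["clusters"] for url in c["images"]
    }
    return {
        "queries": len(queries),
        "clusters": n_clusters,
        "unique_images": len(unique_images),
        "avg_clusters_per_query": n_clusters / len(queries),
    }


# Toy data in the assumed layout, only to show the function in use.
toy = {
    "queries": [
        {
            "query": "dog",
            "clusters": [
                {"suggestion": "dog on a beach", "images": ["a", "b"]},
                {"suggestion": "dog in the snow", "images": ["b", "c"]},
            ],
        },
        {
            "query": "car",
            "clusters": [
                {"suggestion": "vintage red car", "images": ["c", "d"]},
            ],
        },
    ]
}
stats = summarize(toy)
```

On the real file, the same function could be applied to the result of `json.load`, provided the keys are renamed to match the actual schema.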
CroQS can be used to train or evaluate models for:
- Query refinement and expansion
- Multimodal and cross-modal retrieval
- Cluster-based query suggestion
- Vision-language understanding
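As a rough illustration of the cluster-based query suggestion use case, a generated suggestion can be compared against the human-written one for the same cluster. The sketch below uses a plain token-overlap F1 score as a stand-in. This is an assumption made for illustration and is not the set of metrics defined in the paper; the function name `token_f1` is likewise hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a generated and a reference suggestion.

    Illustrative stand-in only: lowercases, splits on whitespace, and
    counts shared tokens with multiplicity.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("dog on beach", "dog on the beach")
```

Lexical overlap ignores paraphrases and says nothing about how well a suggestion matches the images in a cluster, so it is at best a quick sanity check.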
For more details on the benchmark task and evaluation metrics, refer to our paper:
Maybe You Are Looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval.
In Proceedings of the 47th European Conference on Information Retrieval (ECIR 2025).
Springer Link | arXiv:2412.13834
Files
| Name | Size | MD5 |
|---|---|---|
| CroQS_Benchmark_v1.0.0.json | 90.1 kB | d00d852725e4d1e222d0a03e07c34b43 |
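After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify_md5` are illustrative, and the file is assumed to be saved locally under its original name.

```python
import hashlib

# Checksum as listed for CroQS_Benchmark_v1.0.0.json in this record.
EXPECTED_MD5 = "d00d852725e4d1e222d0a03e07c34b43"


def md5_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected hex string."""
    return md5_of(path) == expected.lower()
```

Calling `verify_md5("CroQS_Benchmark_v1.0.0.json")` should return `True` for an intact download. MD5 is suitable here as an integrity check against corruption, not as protection against deliberate tampering.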
Additional details
Additional titles
- Alternative title (En): Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval
- Subtitle: A Benchmark for Cross-modal Query Suggestion for Text-to-Image Retrieval
Related works
- Is part of: Conference paper, DOI 10.1007/978-3-031-88711-6_9
Funding
- European Union: FAIR – Future Artificial Intelligence Research - Spoke 1 PNRR M4C2 Inv. 1.3 PE00000013
- European Union: MUCES: a MUltimedia platform for Content Enrichment and Search in audiovisual archives P2022BW7CW
- Fondazione ICSC Centro Nazionale di Ricerca in High Performance Computing, Big Data e Quantum Computing: FutureHPC & BigData
- Ministero dell'università e della ricerca: NEREO PRIN project 2022AEFHAZ
- Ministero dell'università e della ricerca: FoReLab and CrossLab projects
Software
- Repository URL: https://github.com/Ruggero1912/CroQS-benchmark