Learning to quantify: LeQua 2022 datasets

Esuli, Andrea; Moreo, Alejandro; Sebastiani, Fabrizio

doi:10.5281/zenodo.6546188

Published December 1, 2021 | Version v3

Dataset Open


Affiliation: ISTI-CNR

# Learning to Quantify

The aim of LeQua 2022 (the 1st edition of the CLEF “Learning to Quantify” lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets, i.e., methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from the ones of the training set.
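To make the notion of a quantifier concrete, the following minimal sketch computes the relative class frequencies (prevalences) of a set of labels, and estimates them for an unlabelled set with the simple "classify and count" baseline, i.e., by counting the labels issued by a classifier. This baseline is swapped in purely for illustration and is not a method prescribed by the lab; the function names and the toy keyword classifier are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Sequence


def prevalences(labels: Iterable[Hashable], classes: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the relative frequency of each class in `labels` (values sum to 1)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in classes)
    if total == 0:
        raise ValueError("no labels belonging to the given classes")
    return {c: counts[c] / total for c in classes}


def classify_and_count(
    documents: Iterable[str],
    classifier: Callable[[str], Hashable],
    classes: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Estimate class prevalences by classifying each document and counting the predictions."""
    predicted = [classifier(doc) for doc in documents]
    return prevalences(predicted, classes)


def toy_classifier(text: str) -> int:
    """Hypothetical stand-in for a trained classifier: 1 if the text contains 'good', else 0."""
    return 1 if "good" in text.lower() else 0


# An unlabelled sample of four made-up documents (binary case: class 1 and its complement 0).
sample = ["Good value", "Broke after a week", "good battery", "Too loud"]
estimate = classify_and_count(sample, toy_classifier, classes=[0, 1])
print(estimate)  # {0: 0.5, 1: 0.5}
```

The same `prevalences` function covers the single-label multi-class case: passing a list of n>2 classes yields one relative frequency per class.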

## Links

https://lequa2022.github.io/

https://github.com/HLT-ISTI/LeQua2022_scripts

## Tasks

T1A: This task is concerned with evaluating binary quantifiers, i.e., quantifiers that must only predict the relative frequencies of a class and its complement. Participants in this task will be provided with documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating representations for the textual documents, but want instead to concentrate on optimizing the methods for learning to quantify.

T1B: This task is concerned with evaluating single-label multi-class quantifiers, i.e., quantifiers that operate on documents that each belong to exactly one among a set of n>2 classes. As in Task T1A, participants will be provided with documents already converted into vector form.

T2A: Like Task T1A, this task is concerned with evaluating binary quantifiers. Unlike in Task T1A, participants will be provided with the raw text of the documents; the task is thus suitable for participants who also wish to engage in generating suitable representations for the textual documents, or to train end-to-end systems.

T2B: Like Task T1B, this task is concerned with evaluating single-label multi-class quantifiers; as in Task T2A, participants will be provided with the raw text of the documents.

Files (9.0 GB)

Name	MD5	Size
ReadMe.txt	033aaaa0df2fad4ff61bf21ae40d629a	1.9 kB
T1A.test.zip	e32f575310b48275bc0d4391f43762d9	1.1 GB
T1A.test_prevalences.zip	2ca5507b2d8ef3eda1d26ae3fb4bb966	19.9 kB
T1A.train_dev.zip	c2fbf10756baf9b6627e570d220e0845	230.2 MB
T1B.test.zip	d0c5a7a85e89649eadc52842b73a2063	4.5 GB
T1B.test_prevalences.zip	df61e6b2c8ed5d9b4a5b38a7676974de	179.2 kB
T1B.train_dev.zip	0dca2e82adc97b219022d2a6cf11386f	908.4 MB
T2A.test.zip	f7bf75f3841e0157757e317500f56546	413.5 MB
T2A.test_prevalences.zip	3034924f30df7ad4eb628f621ba36ac6	20.2 kB
T2A.train_dev.zip	0bd2aba01c723e7aec534e2f15a9895d	83.7 MB
T2B.test.zip	c372234d1b6fb6a1ccc4dd564ecb7bb6	1.5 GB
T2B.test_prevalences.zip	2640dc56bd8437026e426a3d1ab7e796	171.9 kB
T2B.train_dev.zip	401114c778d992551b27a1f7d25805f4	299.3 MB
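
The md5 values listed with each file can be used to check that a download is intact. The sketch below is one way to do so with Python's standard `hashlib` module. It assumes the files have been saved under their listed names in a local directory; the function names are hypothetical, and only two of the listed checksums are included for brevity.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file listing (subset shown; extend with the remaining files).
EXPECTED_MD5 = {
    "ReadMe.txt": "033aaaa0df2fad4ff61bf21ae40d629a",
    "T1A.train_dev.zip": "c2fbf10756baf9b6627e570d220e0845",
}


def verify(directory: Union[str, Path], expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, bool]:
    """Map each expected file name to True if it exists in `directory` and its md5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify("downloads")`, where `downloads` is a hypothetical local directory, returns a dictionary such as `{"ReadMe.txt": True, "T1A.train_dev.zip": False}`; a `False` entry indicates a missing or corrupted file.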

Additional details

Funding

European Commission: SoBigData-PlusPlus - SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics (grant 871042)
European Commission: AI4Media - A European Excellence Centre for Media, Society and Democracy (grant 951911)

