Semantic Textual Similarity in Catalan

Rodriguez-Penagos, Carlos Gerardo; Armentano-Oller, Carme; Gonzalez-Agirre, Aitor; Gibert Bonet, Ona

doi:10.5281/zenodo.4621370

Published February 10, 2021 | Version 1.0.1

Dataset Open

1. BSC

The STS corpus is a benchmark for evaluating Semantic Textual Similarity in Catalan.
It consists of 3079 sentence pairs, each annotated with the semantic similarity between the two sentences on a scale from 0 (no similarity at all) to 5 (semantic equivalence). The annotation was done manually by 4 different annotators following our guidelines, which are based on previous work from the SemEval challenges (https://www.aclweb.org/anthology/S13-1004.pdf).

This dataset was developed by BSC TeMU as part of the AINA project.

Corpus for evaluating STS in Catalan.

It consists of 3079 sentence pairs, annotated according to their degree of semantic similarity on a scale from 0 (not similar at all) to 5 (equivalent). The annotation was done manually by 4 people following our guidelines, which are based on the SemEval challenges (https://www.aclweb.org/anthology/S13-1004.pdf).

This dataset was developed by the Text Mining unit of BSC as part of the AINA project.
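As a rough illustration of how a system could be scored against the 0 to 5 annotations described above, the sketch below computes the Pearson correlation between gold and predicted similarity scores. Pearson correlation is the measure used in the SemEval STS tasks the guidelines build on; this record does not itself name an official metric, so treat that choice as an assumption. The score lists are made-up values, not taken from the corpus, and the function name `pearson` is purely illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold annotations and system predictions on the 0-5 scale.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
pred = [0.2, 1.0, 3.5, 4.0, 4.8]

score = pearson(gold, pred)
print(f"Pearson r = {score:.3f}")
```

A value close to 1 means the predicted scores rank and space the sentence pairs much like the human annotators did.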

Files

Name	Size	MD5
STS-ca-v1.0.1.zip	1.3 MB	fa1c3246d7499dda975bd0424daa1be9
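The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed for STS-ca-v1.0.1.zip on this record.
EXPECTED_MD5 = "fa1c3246d7499dda975bd0424daa1be9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("STS-ca-v1.0.1.zip")` should return `True` for an uncorrupted copy saved under that name in the current directory.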

	All versions	This version
Views	1,273	254
Downloads	153	19
Data volume	154.8 MB	24.4 MB

DOI: 10.5281/zenodo.4621370
Resource type: Dataset
Publisher: Zenodo
Languages: Catalan

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 19, 2021
Modified: August 2, 2021
