It's the same but not the same: Do LLMs distinguish Spanish varieties?
Authors/Creators
Description
Spanish, spoken by over 600 million people, exhibits significant lexical, morphological, and syntactic diversity. Traditional benchmarks often overlook these dialectal nuances, leading to biased assessments. This benchmark addresses the gap by measuring how well LLMs handle different Spanish dialects.
The Spanish Dialect Benchmark dataset evaluates the ability of LLMs to distinguish and accurately use various Spanish dialects. It consists of 31 multiple-choice questions that reflect regional linguistic variation.
Examples:
- ¿Cuál suena más natural? ("Which sounds more natural?")
  - a. «Llegas tarde, vístete y corre». (Peninsular, Chilean Spanish)
  - b. «Llegas tarde, vístete y córrele». (Antillean, Mexican Spanish)
- ¿Qué verbo usas para describir la acción de ponerse de pie? ("Which verb do you use to describe the action of standing up?")
  - a. levantarse (Rioplatense, Peninsular Spanish)
  - b. pararse (Antillean, Mexican Spanish)
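As an illustration of how items like the two examples above could be represented and scored, here is a minimal Python sketch. The record does not describe the file format inside Results.zip or the scoring procedure, so the `Option` and `Item` classes, the `tally_varieties` helper, and the sample answers are all hypothetical; only the question texts, options, and variety labels come from the examples.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Option:
    text: str
    varieties: Tuple[str, ...]  # varieties the record associates with this option


@dataclass(frozen=True)
class Item:
    question: str
    options: Dict[str, Option]  # keyed by option letter ("a", "b", ...)


# The two example items quoted in the description.
# The structure and field names are assumptions, not the dataset's schema.
ITEMS = [
    Item(
        question="¿Cuál suena más natural?",
        options={
            "a": Option("«Llegas tarde, vístete y corre».", ("Peninsular", "Chilean")),
            "b": Option("«Llegas tarde, vístete y córrele».", ("Antillean", "Mexican")),
        },
    ),
    Item(
        question="¿Qué verbo usas para describir la acción de ponerse de pie?",
        options={
            "a": Option("levantarse", ("Rioplatense", "Peninsular")),
            "b": Option("pararse", ("Antillean", "Mexican")),
        },
    ),
]


def tally_varieties(items, answers):
    """Count how often the chosen options are associated with each variety."""
    counts = Counter()
    for item, letter in zip(items, answers):
        counts.update(item.options[letter].varieties)
    return counts


if __name__ == "__main__":
    # Hypothetical model answers, one letter per item (not real results).
    example_answers = ["b", "a"]
    print(tally_varieties(ITEMS, example_answers))
```

A tally of this kind would show which varieties a model's preferred options lean toward. It is one possible reading of "dialectal bias", not necessarily the metric used by the authors.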
Files
Results.zip
Total size: 680.7 kB
| MD5 checksum | Size |
|---|---|
| e383cfb9aee948920ac22db86d5998d3 | 361.5 kB |
| 484464661bb196db2e0e42839488407e | 319.2 kB |
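The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names and the local path `Results.zip` are assumptions, and because the listing does not show which checksum belongs to which file, the check accepts a match against either value.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing; the file-to-checksum mapping is not shown there.
LISTED_MD5 = (
    "md5:e383cfb9aee948920ac22db86d5998d3",
    "md5:484464661bb196db2e0e42839488407e",
)


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_any_listed(path, listed=LISTED_MD5):
    """True if the file's MD5 equals any of the listed checksums."""
    expected = {value.removeprefix("md5:").lower() for value in listed}
    return md5_of(path) in expected


if __name__ == "__main__":
    candidate = Path("Results.zip")  # assumed local download location
    if candidate.exists():
        print(matches_any_listed(candidate))
```

MD5 is adequate for detecting accidental corruption during download, but it does not protect against deliberate tampering.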
Additional details
Additional titles
- Translated title (Spanish): Es igual pero no es lo mismo: ¿Distinguen los LLMs las variedades del español?
Funding
- Agencia Estatal de Investigación: Fun4Date PID2022-136684OB-C21/C22
- European Commission: SMARTY 101140087