AGREE: a New Benchmark for the Evaluation of Semantic Models of Ancient Greek

Stopponi, Silvia; Peels-Matthey, Saskia; Nissim, Malvina

doi:10.5281/zenodo.8027490

Published February 27, 2023 | Version v2

Dataset Open


1. University of Groningen

AGREE (Ancient Greek Relatedness Embeddings Evaluation) is a benchmark for the evaluation of semantic models of Ancient Greek created at the University of Groningen (The Netherlands). More information about it can be found in the following publication:

Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087

1. Overview of the repository

This benchmark was created from a mix of expert judgements about relatedness between Ancient Greek words and model outputs validated by human experts. The evaluation items are pairs of Ancient Greek lemmas with high semantic relatedness.

The human judgements were collected via two questionnaires, each proposing a different task to the experts. The evaluation items included in the AGREE benchmark are a selection of the most strictly related pairs of lemmas obtained from the two tasks. Here is an overview of the contents of the repository:

1_agree_task1.json includes all the data collected with the first task. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'frequency': the number of times that the pair was suggested as related by an expert;
- 'POS1': part-of-speech of the first lemma;
- 'POS2': part-of-speech of the second lemma;
- 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').
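As an illustration of how the labels above might be used, the following minimal sketch filters first-task records by the 'benchmark' label. The sample records are invented for illustration, and both the top-level layout (a list of objects) and the encoding of 'pair' (a two-item list) are assumptions, since the description does not specify them.

```python
import json

# Invented sample mimicking the labels described above. The real file's
# top-level layout, the encoding of 'pair' and the POS tag values are
# assumptions, not taken from the repository.
SAMPLE = """
[
  {"pair": ["θεός", "ἱερός"], "frequency": 3,
   "POS1": "noun", "POS2": "adjective", "benchmark": "yes"},
  {"pair": ["ναῦς", "θάλασσα"], "frequency": 1,
   "POS1": "noun", "POS2": "noun", "benchmark": "no"}
]
"""


def benchmark_pairs(records):
    """Return the records flagged for inclusion in the benchmark."""
    return [r for r in records if r["benchmark"] == "yes"]


records = json.loads(SAMPLE)
selected = benchmark_pairs(records)
print(len(selected), "of", len(records), "sample pairs are flagged for the benchmark")
```

To apply the same filter to the actual file, the sample string would be replaced by the parsed contents of 1_agree_task1.json, after checking that its layout matches these assumptions.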
2_agree_task2.json includes all the data collected with the second task. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'origin':
  - 'common_pair' = one of the two pairs proposed to all participants in the second task;
  - 'task1' = pairs proposed by experts in the first task;
  - 'models_easy_rel' = output of word2vec models, pair considered as strictly related;
  - 'models_task1' = pairs proposed by experts in the first task and also output by word2vec models;
  - 'models' = output of word2vec language models;
  - 'unrelated' = made-up pairs of unrelated lemmas (control pairs);
- 'respondents': number of experts evaluating a pair;
- 'score': average relatedness score given by the experts on a 0-100 scale;
- 'agreement': inter-annotator agreement between all experts who evaluated the block of pairs to which the current pair belongs (available only when the block of pairs was presented to more than one participant);
- 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').
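One simple use of the second-task labels is to compare average scores across values of 'origin', for example to check that the 'unrelated' control pairs score lower than the others. The sketch below does this on invented records; the list-of-objects layout and the use of a null value for an unavailable 'agreement' are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Invented records using the labels described above; the values are
# illustrative only and are not taken from 2_agree_task2.json.
records = [
    {"pair": ["θεός", "ἱερός"], "origin": "task1", "respondents": 3,
     "score": 85.0, "agreement": 0.6, "benchmark": "yes"},
    {"pair": ["ναῦς", "θάλασσα"], "origin": "models", "respondents": 2,
     "score": 75.0, "agreement": 0.5, "benchmark": "yes"},
    {"pair": ["λίθος", "γράφω"], "origin": "unrelated", "respondents": 2,
     "score": 5.0, "agreement": 0.5, "benchmark": "no"},
    {"pair": ["ἵππος", "μέλι"], "origin": "unrelated", "respondents": 1,
     "score": 15.0, "agreement": None, "benchmark": "no"},
]


def mean_score_by_origin(records):
    """Average the 0-100 relatedness score over each 'origin' value."""
    scores = defaultdict(list)
    for r in records:
        scores[r["origin"]].append(r["score"])
    return {origin: mean(values) for origin, values in scores.items()}


by_origin = mean_score_by_origin(records)
for origin, score in sorted(by_origin.items()):
    print(origin, score)
```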
3_agree_final_benchmark.json includes the final selection of items that constitutes AGREE. The following labels are used:
- 'pair': two Ancient Greek lemmas;
- 'origin':
  - 'task1': pair either proposed more than once in the first task or proposed only once, but scored >= 70 in the second task;
  - 'task2': pair scored by more than one respondent in the second task and with average score >= 70.
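The two selection criteria above can be sketched as a small function. This is only a reading of the description, not the authors' actual procedure: the order-insensitive matching of pairs across tasks, the precedence of the 'task1' label over 'task2', and the sample records are all assumptions made for illustration.

```python
def pair_key(pair):
    # Order-insensitive key; assumes a pair is a two-item sequence of lemmas.
    return frozenset(pair)


def select_final(task1, task2, threshold=70):
    """Label pairs 'task1' or 'task2' following the criteria described above."""
    task2_by_pair = {pair_key(r["pair"]): r for r in task2}
    final = {}
    for r in task1:
        key = pair_key(r["pair"])
        scored = task2_by_pair.get(key)
        # Proposed more than once, or proposed once but scored >= threshold in task 2.
        if r["frequency"] > 1 or (scored is not None and scored["score"] >= threshold):
            final[key] = "task1"
    for r in task2:
        key = pair_key(r["pair"])
        # Scored by more than one respondent with an average score >= threshold.
        if key not in final and r["respondents"] > 1 and r["score"] >= threshold:
            final[key] = "task2"
    return final


# Invented records, for illustration only.
task1 = [
    {"pair": ["θεός", "ἱερός"], "frequency": 2},
    {"pair": ["ναῦς", "θάλασσα"], "frequency": 1},
    {"pair": ["λίθος", "γράφω"], "frequency": 1},
]
task2 = [
    {"pair": ["ναῦς", "θάλασσα"], "respondents": 1, "score": 80.0},
    {"pair": ["λίθος", "γράφω"], "respondents": 3, "score": 40.0},
    {"pair": ["πόλεμος", "μάχη"], "respondents": 2, "score": 90.0},
    {"pair": ["οἶνος", "ἄμπελος"], "respondents": 1, "score": 90.0},
]

final = select_final(task1, task2)
for key, origin in final.items():
    print(sorted(key), origin)
```

In this sample, the last task-2 pair is left out because it was scored by a single respondent, and the third task-1 pair is left out because it was proposed once and scored below the threshold.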

This updated version of the repository includes the individual answers to the two questionnaires (see files 'answers_Task1_postprocessed.xlsx' and 'raw_answers_Task2.xlsx').

2. Acknowledgements

This work was partially supported by the Young Academy Groningen through the PhD scholarship of Silvia Stopponi.

We acknowledge the financial support of Anchoring Innovation. Anchoring Innovation is the Gravitation Grant research agenda of the Dutch National Research School in Classical Studies, OIKOS. It is financially supported by the Dutch Ministry of Education, Culture and Science (NWO project number 024.003.012). For more information about the research programme and its results, see the website www.anchoringinnovation.nl.

We want to thank the experts in Ancient Greek around the world who shared their knowledge of Ancient Greek semantics and donated some of their precious time. Without them, the creation of this benchmark would not have been possible.

We also want to thank the many colleagues from the University of Groningen, the National Research School OIKOS, and other universities abroad who contributed to this work with discussion and advice.

3. Citation
Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087

Files (850.6 kB)

Name	MD5 checksum	Size
1_agree_task1.json	13bbb3d989338e5869775a66c2d3db5c	101.2 kB
2_agree_task2.json	3f037ae1ad530f9098eda1e64e19243a	277.2 kB
3_agree_final_benchmark.json	da5ae1f2a67836cccb7b25058df97e13	57.8 kB
answers_Task1_postprocessed.xlsx	a828e4982c9caa0ff71ae410f36ee1a0	52.9 kB
raw_answers_Task2.xlsx	b3da17e10c5e1817f4344faee7052cad	361.4 kB
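
The MD5 checksums listed above can be used to check that downloaded copies are intact. The following is a minimal sketch using Python's standard hashlib module; the helper names and the assumption that the files sit in a single local directory are illustrative.

```python
import hashlib
import os

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "1_agree_task1.json": "13bbb3d989338e5869775a66c2d3db5c",
    "2_agree_task2.json": "3f037ae1ad530f9098eda1e64e19243a",
    "3_agree_final_benchmark.json": "da5ae1f2a67836cccb7b25058df97e13",
    "answers_Task1_postprocessed.xlsx": "a828e4982c9caa0ff71ae410f36ee1a0",
    "raw_answers_Task2.xlsx": "b3da17e10c5e1817f4344faee7052cad",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Report 'ok', 'mismatch' or 'missing' for each listed file."""
    report = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            report[name] = "missing"
        elif md5_of_file(path) == expected:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report


if __name__ == "__main__":
    for name, status in verify(".").items():
        print(name, status)
```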

	All versions	This version
Views	225	176
Downloads	156	125
Data volume	28.4 MB	23.4 MB
