AGREE: a New Benchmark for the Evaluation of Semantic Models of Ancient Greek
Description
AGREE (Ancient Greek Relatedness Embeddings Evaluation) is a benchmark for the evaluation of semantic models of Ancient Greek created at the University of Groningen (The Netherlands). More information about it can be found in the following publication:
Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087
1. Overview of the repository
This benchmark was created from a mix of expert judgements about relatedness between Ancient Greek words and model outputs validated by human experts. The evaluation items are pairs of Ancient Greek lemmas with high semantic relatedness.
The human judgements were collected via two questionnaires, each proposing a different task to the experts. The evaluation items included in the AGREE benchmark are a selection of the most strictly related pairs of lemmas obtained from the two tasks. Here is an overview of the contents of the repository:
- 1_agree_task1.json includes all the data collected with the first task. The following labels are used:
  - 'pair': two Ancient Greek lemmas;
  - 'frequency': the number of times that the pair was suggested as related by an expert;
  - 'POS1': part-of-speech of the first lemma;
  - 'POS2': part-of-speech of the second lemma;
  - 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').
- 2_agree_task2.json includes all the data collected with the second task. The following labels are used:
  - 'pair': two Ancient Greek lemmas;
  - 'origin':
    - 'common_pair' = one of the two pairs proposed to all participants in the second task;
    - 'task1' = pairs proposed by experts in the first task;
    - 'models_easy_rel' = output of word2vec models, pair considered as strictly related;
    - 'models_task1' = pairs proposed by experts in the first task and also output by word2vec models;
    - 'models' = output of word2vec language models;
    - 'unrelated' = made-up pairs of unrelated lemmas (control pairs);
  - 'respondents': number of experts evaluating a pair;
  - 'score': average relatedness score given by the experts on a 0-100 scale;
  - 'agreement': inter-annotator agreement between all experts who evaluated the block of pairs to which the current pair belongs (when available, i.e. when the block of pairs was presented to more than one participant);
  - 'benchmark': inclusion of the pair in the AGREE benchmark ('yes'/'no').
- 3_agree_final_benchmark.json includes the final selection of items that constitutes AGREE. The following labels are used:
  - 'pair': two Ancient Greek lemmas;
  - 'origin':
    - 'task1': pair either proposed more than once in the first task or proposed only once, but scored >= 70 in the second task;
    - 'task2': pair scored by more than one respondent in the second task and with average score >= 70.
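The two inclusion criteria listed under 'origin' for the final benchmark can be restated as a pair of small predicates. This is only an illustrative sketch of the rule as worded above, not the authors' actual selection script: the function names and the `THRESHOLD` constant are hypothetical, and the example values are made up rather than taken from the dataset.

```python
from typing import Optional

# Minimum average Task 2 score named in the description above.
THRESHOLD = 70


def selected_from_task1(frequency: int, task2_score: Optional[float]) -> bool:
    """'task1' origin: proposed more than once in Task 1, or proposed
    only once but scored >= 70 in Task 2 (hypothetical helper)."""
    if frequency > 1:
        return True
    return (
        frequency == 1
        and task2_score is not None
        and task2_score >= THRESHOLD
    )


def selected_from_task2(respondents: int, score: float) -> bool:
    """'task2' origin: scored by more than one respondent in Task 2,
    with an average score >= 70 (hypothetical helper)."""
    return respondents > 1 and score >= THRESHOLD


# Made-up values for illustration only.
print(selected_from_task1(frequency=3, task2_score=None))  # True
print(selected_from_task1(frequency=1, task2_score=65.0))  # False
print(selected_from_task2(respondents=2, score=82.5))      # True
print(selected_from_task2(respondents=1, score=95.0))      # False
```

A pair proposed only once in the first task and never scored in the second task is treated here as not selected, which is one reading of the first criterion.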
This updated version of the repository includes the individual answers to the two questionnaires (see files 'answers_Task1_postprocessed.xlsx' and 'raw_answers_Task2.xlsx').
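The JSON files described in this section could be read along the following lines. The sketch assumes that each file holds a JSON array of objects keyed by the labels listed above; the exact layout of the files is not stated here, so this should be checked against the data before use. The function name `load_benchmark_pairs`, the placeholder lemmas, and the part-of-speech values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_benchmark_pairs(path):
    """Return the records whose 'benchmark' label is 'yes'.

    Assumption: the file is a JSON array of objects, one per pair.
    The real files may be organised differently.
    """
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [record for record in records if record.get("benchmark") == "yes"]


# Made-up records with placeholder lemmas, written to a temporary file
# so that the sketch runs without the real data.
sample = [
    {"pair": ["lemma_a", "lemma_b"], "frequency": 2,
     "POS1": "noun", "POS2": "noun", "benchmark": "yes"},
    {"pair": ["lemma_c", "lemma_d"], "frequency": 1,
     "POS1": "verb", "POS2": "noun", "benchmark": "no"},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_task1.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    kept = load_benchmark_pairs(sample_path)

print(len(kept))  # 1
```

Opening the file with `encoding="utf-8"` matters for polytonic Greek, since the platform default encoding is not always UTF-8.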
2. Acknowledgements
We acknowledge the financial support of Anchoring Innovation. Anchoring Innovation is the Gravitation Grant research agenda of the Dutch National Research School in Classical Studies, OIKOS. It is financially supported by the Dutch Ministry of Education, Culture and Science (NWO project number 024.003.012). For more information about the research programme and its results, see the website www.anchoringinnovation.nl.
We want to thank the experts in Ancient Greek around the world who shared their knowledge of Ancient Greek semantics and donated some of their precious time. Without them the creation of this benchmark would not have been possible.
We also want to thank the many colleagues from the University of Groningen, the National Research School OIKOS, and other universities abroad who contributed to this work with discussion and advice.
3. Citation
Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim, AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek, Digital Scholarship in the Humanities, Volume 39, Issue 1, April 2024, Pages 373–392, https://doi.org/10.1093/llc/fqad087
Files
(850.6 kB)
MD5 checksum | Size |
---|---|
md5:13bbb3d989338e5869775a66c2d3db5c | 101.2 kB |
md5:3f037ae1ad530f9098eda1e64e19243a | 277.2 kB |
md5:da5ae1f2a67836cccb7b25058df97e13 | 57.8 kB |
md5:a828e4982c9caa0ff71ae410f36ee1a0 | 52.9 kB |
md5:b3da17e10c5e1817f4344faee7052cad | 361.4 kB |
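The MD5 checksums listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the demonstration runs on a throwaway file rather than on the repository files.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(b"hello")
    print(matches_checksum(demo_path, "md5:5d41402abc4b2a76b9719d911017c592"))  # True
```

To check a real download, pass the local path of the file together with the checksum shown for that file on the record page.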