Published December 13, 2024 | Version v1
Dataset · Open Access

Vocabulary Tests LLMs

  • 1. Universidad Carlos III de Madrid
  • 2. Universidad Politécnica de Madrid
  • 3. Universidad de Valladolid
  • 4. University of Salamanca
  • 5. Ghent University

Description

Vocabulary evaluation of LLMs


This repository contains the results of the different vocabulary tests run on the LLM tools/models presented in the paper "Establishing vocabulary tests as a benchmark for evaluating large language models", published in PLOS ONE: https://doi.org/10.1371/journal.pone.0308259


The names of the files correspond to the vocabulary tests whose results are presented in Tables 3-6 of the paper (note that the questions for the TOEFL test are not public).

In each file, the first column contains the question posed to the LLM tool/model and the second column the correct answer. The remaining columns contain the answers given by the different LLM tools/models evaluated. Result counts and percentages are summarized at the bottom of each file, after the last test item.
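The column layout above can be scored programmatically. The following is a minimal sketch, assuming the files are CSVs with the described layout (question, correct answer, one column per model); the sample rows, column names, and answers here are invented for illustration only, and the summary rows at the bottom of the real files would need to be skipped before scoring.

```python
import csv
import io

# Hypothetical sample mirroring the described layout; the real files,
# header names, and answers may differ.
sample = """question,correct_answer,Model A,Model B
"Which word is a synonym of 'rapid'?",quick,quick,quick
"Which word is a synonym of 'arid'?",dry,wet,dry
"""

def score_models(csv_text):
    """Return the fraction of correct answers for each model column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    models = header[2:]  # columns after the question and the correct answer
    correct = {m: 0 for m in models}
    total = 0
    for row in reader:
        if not row:
            continue
        total += 1
        answer = row[1].strip().lower()
        for model, given in zip(models, row[2:]):
            if given.strip().lower() == answer:
                correct[model] += 1
    return {m: correct[m] / total for m in models}

print(score_models(sample))  # {'Model A': 0.5, 'Model B': 1.0}
```

Such per-model percentages should match the summary rows reported at the bottom of each file.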

The models evaluated are:

  • Llama 2 7b: https://huggingface.co/meta-llama/Llama-2-7b-chat
  • Llama 2 13b: https://huggingface.co/meta-llama/Llama-2-13b-chat
  • Llama 2 70b: https://huggingface.co/meta-llama/Llama-2-70b-chat
  • Mistral 7b v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1
  • GPT 3.5 turbo 0613: https://platform.openai.com/docs/models/gpt-3-5-turbo
  • GPT 4 0613: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
  • Bard

To cite our work:

@article{10.1371/journal.pone.0308259,
    doi = {10.1371/journal.pone.0308259},
    author = {Martínez, Gonzalo and Conde, Javier and Merino-Gómez, Elena and Bermúdez-Margaretto, Beatriz and Hernández, José Alberto and Reviriego, Pedro and Brysbaert, Marc},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {Establishing vocabulary tests as a benchmark for evaluating large language models},
    year = {2024},
    month = {12},
    volume = {19},
    number = {12},
    pages = {1-17},
    url = {https://doi.org/10.1371/journal.pone.0308259}
}

Files

LLM_Vocabulary_Evaluation.zip

237.4 kB · md5:9419049130f5325ad4a2f587acfa3c8d
