Published December 13, 2024 | Version v1
Dataset · Open Access

Vocabulary Tests LLMs

  • 1. Universidad Carlos III de Madrid
  • 2. Universidad Politécnica de Madrid
  • 3. Universidad de Valladolid
  • 4. University of Salamanca
  • 5. Ghent University

Description

Vocabulary evaluation of LLMs


This repository contains the results of the different vocabulary tests run on the LLM tools/models presented in the paper "Establishing vocabulary tests as a benchmark for evaluating large language models", published in PLOS ONE: https://doi.org/10.1371/journal.pone.0308259


The names of the files correspond to the vocabulary tests whose results are presented in Tables 3-6 of the paper (note that the questions for the TOEFL test are not public).

In each file, the first column contains the question posed to the LLM tool/model and the second column the correct answer. The remaining columns contain the answers given by the different LLM tools/models evaluated. Result counts and percentages are summarized at the bottom of each file, after the last test item.
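The column layout above can be scored programmatically. The following is a minimal sketch, assuming the files are CSVs with the described layout (question, correct answer, one column per model); the sample rows, column names, and answers here are invented for illustration only, and the summary rows at the bottom of the real files would need to be skipped before scoring.

```python
import csv
import io

# Hypothetical sample mirroring the described layout; the real files,
# header names, and answers may differ.
sample = """question,correct_answer,Model A,Model B
"Which word is a synonym of 'rapid'?",quick,quick,quick
"Which word is a synonym of 'arid'?",dry,wet,dry
"""

def score_models(csv_text):
    """Return the fraction of correct answers for each model column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    models = header[2:]  # columns after the question and the correct answer
    correct = {m: 0 for m in models}
    total = 0
    for row in reader:
        if not row:
            continue
        total += 1
        answer = row[1].strip().lower()
        for model, given in zip(models, row[2:]):
            if given.strip().lower() == answer:
                correct[model] += 1
    return {m: correct[m] / total for m in models}

print(score_models(sample))  # {'Model A': 0.5, 'Model B': 1.0}
```

Such per-model percentages should match the summary rows reported at the bottom of each file.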

The models evaluated are:

  • Llama 2 7b: https://huggingface.co/meta-llama/Llama-2-7b-chat
  • Llama 2 13b: https://huggingface.co/meta-llama/Llama-2-13b-chat
  • Llama 2 70b: https://huggingface.co/meta-llama/Llama-2-70b-chat
  • Mistral 7b v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1
  • GPT 3.5 turbo 0613: https://platform.openai.com/docs/models/gpt-3-5-turbo
  • GPT 4 0613: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
  • Bard

To cite our work:

@article{10.1371/journal.pone.0308259,
    doi = {10.1371/journal.pone.0308259},
    author = {Martínez, Gonzalo and Conde, Javier and Merino-Gómez, Elena and Bermúdez-Margaretto, Beatriz and Hernández, José Alberto and Reviriego, Pedro and Brysbaert, Marc},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {Establishing vocabulary tests as a benchmark for evaluating large language models},
    year = {2024},
    month = {12},
    volume = {19},
    number = {12},
    pages = {1-17},
    url = {https://doi.org/10.1371/journal.pone.0308259}
}

Files

LLM_Vocabulary_Evaluation.zip

237.4 kB · md5:9419049130f5325ad4a2f587acfa3c8d
