Published May 4, 2024 | Version v2
Dataset · Open Access

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations

  • 1. Universidad Carlos III de Madrid
  • 2. Universidad Politécnica de Madrid
  • 3. Universidad de Valladolid

Description

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations. 

The dataset is useful for studying lexical aspects of LLMs under different parameter/role configurations.
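As a starting point for such lexical analyses, a simple measure is the type-token ratio (distinct word forms over total words). The sketch below is illustrative only and not part of the dataset's tooling; the tokenization regex is a simplifying assumption.

```python
import re

def type_token_ratio(text: str) -> float:
    """Type-token ratio: distinct word forms over total words.
    A rough lexical-diversity measure (note: sensitive to text length)."""
    # Simplistic word tokenizer (an assumption, not the paper's method).
    tokens = re.findall(r"[a-záéíóúñü']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# A varied answer scores higher than a repetitive one.
varied = "The museum exhibits sculptures, paintings, and tapestries."
repetitive = "The cat sat and the cat sat and the cat sat."
print(type_token_ratio(varied) > type_token_ratio(repetitive))  # True
```

The paper cited below uses more robust diversity metrics; this ratio merely shows the kind of per-answer statistic the files support.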

  • The 0_Base_Topics.xlsx file lists the topics used for the dataset generation
  • The rest of the files collect the models' answers to these topics under different configurations of parameters/context:
    • Temperature (parameter): Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
    • Frequency penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
    • Top probability (parameter): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
    • Presence penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
    • Roles (context)
      • Default: No role is assigned; the LLM answers with its default behavior.
      • Child: The LLM is requested to answer as a five-year-old child. 
      • Young adult male: The LLM is requested to answer as a young male adult.    
      • Young adult female: The LLM is requested to answer as a young female adult.    
      • Elderly adult male: The LLM is requested to answer as an elderly male adult.    
      • Elderly adult female: The LLM is requested to answer as an elderly female adult.
      • Affluent adult male: The LLM is requested to answer as an affluent male adult.    
      • Affluent adult female: The LLM is requested to answer as an affluent female adult. 
      • Lower-class adult male: The LLM is requested to answer as a lower-class male adult. 
      • Lower-class adult female: The LLM is requested to answer as a lower-class female adult.  
      • Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.

Paper

@article{10.1145/3696459,
author = {Mart\'{\i}nez, Gonzalo and Hern\'{a}ndez, Jos\'{e} Alberto and Conde, Javier and Reviriego, Pedro and Merino-G\'{o}mez, Elena},
title = {Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3696459},
doi = {10.1145/3696459},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = sep,
keywords = {LLM, Lexical diversity, ChatGPT, Evaluation}
}

Files (46.5 MB)

  • 25.6 kB — md5:73c20af0a18681f3da00f36af36fa41d
  • 2.5 MB — md5:a18f6ca007b32e892faf969ee0deb4ff
  • 2.8 MB — md5:86af04906f8f4ddefef51d36e5df380c
  • 2.1 MB — md5:813512e1041f74a1818e71673d19ee73
  • 3.0 MB — md5:9dc67df880147e7480628e762440aae6
  • 5.0 MB — md5:6cdc561af95e8bca5441618592fd021c
  • 9.9 MB — md5:9286bac6ac6d8912e5b3332489f93591
  • 4.0 MB — md5:6a21d6912efb6922e5bad887e9a3f22f
  • 2.7 MB — md5:150c6d693bd655bf9a21b91b2d307029
  • 4.4 MB — md5:851097698072a74ee2410156b483619c
  • 5.1 MB — md5:7b37975ad42c11b2a7251f4f2d410da3
  • 2.0 MB — md5:d7bb6668678e32caccc56298c5f688a6
  • 2.9 MB — md5:c3e7224d4f4cde9f4f6d6071caf2bef0

Additional details

Related works

Is published in
Publication: 10.48550/arXiv.2402.15518 (DOI)