Published May 4, 2024 | Version v2
Dataset · Open Access

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations

  • 1. Universidad Carlos III de Madrid
  • 2. Universidad Politécnica de Madrid
  • 3. Universidad de Valladolid

Description

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations. 

The dataset is useful for studying lexical aspects of LLMs under different parameter/role configurations.
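As a starting point for such lexical analyses, a simple measure is the type-token ratio (distinct word forms over total words). The sketch below is illustrative only and not part of the dataset's tooling; the tokenization regex is a simplifying assumption.

```python
import re

def type_token_ratio(text: str) -> float:
    """Type-token ratio: distinct word forms over total words.
    A rough lexical-diversity measure (note: sensitive to text length)."""
    # Simplistic word tokenizer (an assumption, not the paper's method).
    tokens = re.findall(r"[a-záéíóúñü']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# A varied answer scores higher than a repetitive one.
varied = "The museum exhibits sculptures, paintings, and tapestries."
repetitive = "The cat sat and the cat sat and the cat sat."
print(type_token_ratio(varied) > type_token_ratio(repetitive))  # True
```

The paper cited below uses more robust diversity metrics; this ratio merely shows the kind of per-answer statistic the files support.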

  • The 0_Base_Topics.xlsx file lists the topics used for the dataset generation
  • The rest of the files collect the models' answers to these topics under different configurations of parameters/context:
    • Temperature (parameter): Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
    • Frequency penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
    • Top probability (parameter): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
    • Presence penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
    • Roles (context)
      • Default: No role is assigned; the LLM answers with its default behavior.
      • Child: The LLM is requested to answer as a five-year-old child. 
      • Young adult male: The LLM is requested to answer as a young male adult.    
      • Young adult female: The LLM is requested to answer as a young female adult.    
      • Elderly adult male: The LLM is requested to answer as an elderly male adult.    
      • Elderly adult female: The LLM is requested to answer as an elderly female adult.
      • Affluent adult male: The LLM is requested to answer as an affluent male adult.    
      • Affluent adult female: The LLM is requested to answer as an affluent female adult. 
      • Lower-class adult male: The LLM is requested to answer as a lower-class male adult. 
      • Lower-class adult female: The LLM is requested to answer as a lower-class female adult.  
      • Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.

Paper

@article{10.1145/3696459,
author = {Mart\'{\i}nez, Gonzalo and Hern\'{a}ndez, Jos\'{e} Alberto and Conde, Javier and Reviriego, Pedro and Merino-G\'{o}mez, Elena},
title = {Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3696459},
doi = {10.1145/3696459},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = sep,
keywords = {LLM, Lexical diversity, ChatGPT, Evaluation}
}

Files (46.5 MB)

  • 25.6 kB — md5:73c20af0a18681f3da00f36af36fa41d
  • 2.5 MB — md5:a18f6ca007b32e892faf969ee0deb4ff
  • 2.8 MB — md5:86af04906f8f4ddefef51d36e5df380c
  • 2.1 MB — md5:813512e1041f74a1818e71673d19ee73
  • 3.0 MB — md5:9dc67df880147e7480628e762440aae6
  • 5.0 MB — md5:6cdc561af95e8bca5441618592fd021c
  • 9.9 MB — md5:9286bac6ac6d8912e5b3332489f93591
  • 4.0 MB — md5:6a21d6912efb6922e5bad887e9a3f22f
  • 2.7 MB — md5:150c6d693bd655bf9a21b91b2d307029
  • 4.4 MB — md5:851097698072a74ee2410156b483619c
  • 5.1 MB — md5:7b37975ad42c11b2a7251f4f2d410da3
  • 2.0 MB — md5:d7bb6668678e32caccc56298c5f688a6
  • 2.9 MB — md5:c3e7224d4f4cde9f4f6d6071caf2bef0

Additional details

Related works

Is published in
Publication: 10.48550/arXiv.2402.15518 (DOI)