
Published August 16, 2023 | Version 1.0.0
Dataset | Open Access

Model Output of GPT-3.5 and GPT-4 for ECHR-AM

  • 1. University of Passau
  • 2. University of Passau | Institute for Artificial Intelligence Research and Development of Serbia, Novi Sad, Serbia.

Description

 "gpt3.5-gpt4-input-output-echram.zip" :

Input to and output from GPT-3.5 and GPT-4 for the argument component classification task, based on the ECHR argument mining dataset published in JSON format by Poudyal et al. (2020; see References below), i.e. only the argumentative clauses (conclusion/premise) extracted from that JSON file.

Note: The model output is subject to the OpenAI Terms & Policies.
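
For readers who want a quick look at the archive, a minimal Python sketch along the lines below lists the JSON files it contains and prints the first record of the first file. The internal schema is not documented on this page, so any key names (e.g. "input", "output", "label") are assumptions to be replaced after inspecting the actual records.

import json
import zipfile

# Open the downloaded archive and list the JSON files it contains.
with zipfile.ZipFile("gpt3.5-gpt4-input-output-echram.zip") as archive:
    json_members = [name for name in archive.namelist() if name.endswith(".json")]
    print(f"{len(json_members)} JSON file(s) found")

    # Peek at the first record of the first JSON file. Field names such as
    # "input", "output", or "label" are hypothetical placeholders; check the
    # printed record to learn the real keys.
    if json_members:
        with archive.open(json_members[0]) as handle:
            data = json.load(handle)
        first = data[0] if isinstance(data, list) else data
        print(first)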

If you use this dataset, please also cite our paper: Performance analysis of large language models in the domain of legal argument mining

The BibTeX entry for the paper is given below.

@ARTICLE{10.3389/frai.2023.1278796,
  AUTHOR={Al Zubaer, Abdullah and Granitzer, Michael and Mitrović, Jelena},
  TITLE={Performance analysis of large language models in the domain of legal argument mining},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1278796},
  DOI={10.3389/frai.2023.1278796},
  ISSN={2624-8212},
  ABSTRACT={Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.}
}

Files (29.2 MB)

gpt3.5-gpt4-input-output-echram.zip (29.2 MB)
md5:4de30b402b253cde3caa0437b1bfcbbc
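
To confirm the download is intact, the md5 checksum above can be verified with a short Python snippet; this is only a sketch and assumes the archive sits in the current working directory.

import hashlib

# Compare the local archive against the md5 checksum listed above.
EXPECTED_MD5 = "4de30b402b253cde3caa0437b1bfcbbc"

digest = hashlib.md5()
with open("gpt3.5-gpt4-input-output-echram.zip", "rb") as handle:
    # Read the file in 1 MiB chunks to keep memory use low.
    for chunk in iter(lambda: handle.read(1 << 20), b""):
        digest.update(chunk)

print("OK" if digest.hexdigest() == EXPECTED_MD5 else "Checksum mismatch: re-download the file")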

Additional details

Related works

Is supplemented by
Journal article: 10.3389/frai.2023.1278796 (DOI)

References

  • Prakash Poudyal, Jaromir Savelka, Aagje Ieven, Marie Francine Moens, Teresa Goncalves, and Paulo Quaresma. 2020. ECHR: Legal Corpus for Argument Mining. In Proceedings of the 7th Workshop on Argument Mining, pages 67–75, Online. Association for Computational Linguistics.