Published October 9, 2023 | Version 1
Dataset
Open
Synthetic datasets for end-to-end Relation Extraction of relationships between Organisms and Natural-Products
- 1. Idiap Research Institute
- 2. dECMT, Cancer Research UK Manchester Institute; The University of Manchester
- 3. The University of Manchester; Idiap Research Institute
Description
Synthetic datasets (training/validation) for end-to-end Relation Extraction of relationships between Organisms and Natural-Products. The datasets are provided for reproducibility purposes, but can also be used to train new models.
As in the corresponding article, three subtypes of synthetic datasets are provided:
- Diversity-synt: The seed literature references used in the generation process are the top 500 items extracted per biological kingdom with the GME-sampler.
- Random-synt: 5 datasets, each of equivalent size to Diversity-synt, but generated from randomly sampled seed literature references.
- Extended-synt: A merge of Diversity-synt and the 5 Random-synt datasets.
All datasets were produced with Vicuna-13b-v1.3. Like the model itself, the synthetic data are subject to the license of the model used for generation; see the original LLaMA model card.
LLaMA is licensed under the LLaMA License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
Files
(16.2 MB)
Name | Size |
---|---|
md5:2d71cda754eddad2949ba90511dd2610 | 16.2 MB |
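Because the record lists an MD5 checksum for the file, a downloaded copy can be checked against it before use. The following is a minimal sketch using only the Python standard library. The file name `synthetic_datasets.zip` is a hypothetical placeholder (the actual name is not shown in this record), and the helper names `md5_of_file` and `matches_record` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files table of this record.
EXPECTED_MD5 = "2d71cda754eddad2949ba90511dd2610"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local file name; replace with the actual download.
    archive = Path("synthetic_datasets.zip")
    if archive.exists():
        print("checksum OK" if matches_record(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size. MD5 is used here only because it is the digest the record publishes; it detects corrupted downloads but is not a defense against deliberate tampering.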