Published October 9, 2023 | Version 1
Dataset | Open access

Evaluation dataset for Relation Extraction of relationships between Organisms and Natural-Products

  • 1. dECMT, Cancer Research UK Manchester Institute; The University of Manchester
  • 2. Idiap Research Institute
  • 3. The University of Manchester; Idiap Research Institute

Description

A curated evaluation dataset for end-to-end Relation Extraction of relationships between organisms and the natural products they produce.

Details about the manual annotation:

  • For Chemicals:
    • The chemical labels are annotated as they appear in the abstract.
    • In each abstract, individual chemicals and classes of chemicals produced by a specific organism are distinguished.
    • The "type" attribute, with the values "chemical" or "class", indicates whether a mentioned name refers to a single chemical or to a class of chemicals.
    • When the abstract provides class information for a chemical, it is recorded in an additional "class" attribute.
    • Wikidata and PubChem identifiers were assigned to chemicals and classes when available.
  • For Organisms:
    • The organism labels are annotated as they appear in the abstract.
    • If an abstract first mentions only the genus (e.g. "Plakinastrella sp.") and later specifies the species (e.g. "Plakinastrella clathrata"), only the species name is used.
    • A Wikidata identifier was assigned to all organisms.
    • In some abstracts, only the genus name is mentioned.
  • For Relations:
    • Only the relations explicitly mentioned in the abstract are reported in the output labels.
    • Relations are reported in their order of appearance in the abstract.
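The annotation rules above can be sketched as a small Python data model. This is an illustration only: the class names, the field names (`label`, `type`, `chem_class`, `wikidata_id`, `pubchem_id`) and the helper `keep_species_over_genus` are hypothetical and are not the keys used in `curated_test_set.json`, whose schema is not described on this page. The helper also assumes genus-only mentions are written in the "Genus sp." form, as in the Plakinastrella example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the annotation rules; field names are illustrative
# and are NOT the JSON keys of curated_test_set.json.


@dataclass
class Chemical:
    label: str                        # name exactly as it appears in the abstract
    type: str                         # "chemical" (single compound) or "class"
    chem_class: Optional[str] = None  # class information, if stated in the abstract
    wikidata_id: Optional[str] = None  # assigned when available
    pubchem_id: Optional[str] = None   # assigned when available

    def __post_init__(self) -> None:
        # Only the two documented values of the "type" attribute are accepted.
        if self.type not in {"chemical", "class"}:
            raise ValueError(f"unexpected type: {self.type!r}")


@dataclass
class Organism:
    label: str        # name exactly as it appears in the abstract
    wikidata_id: str  # every organism carries a Wikidata identifier


@dataclass
class Relation:
    organism: Organism
    chemical: Chemical


def keep_species_over_genus(mentions: List[str]) -> List[str]:
    """Drop a genus-only mention ("Genus sp.") when the same genus also
    appears with a species name ("Genus species"); a genus that is never
    specified further is kept as-is. Assumes the "sp." naming convention."""
    specified = {
        m.split()[0]
        for m in mentions
        if not m.endswith(" sp.") and len(m.split()) >= 2
    }
    return [
        m for m in mentions
        if not (m.endswith(" sp.") and m.split()[0] in specified)
    ]


if __name__ == "__main__":
    print(keep_species_over_genus(["Plakinastrella sp.", "Plakinastrella clathrata"]))
```

Keeping relations in a plain list preserves their order of appearance in the abstract, matching the last rule above.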

Files

curated_test_set.json (671.0 kB, md5:04469028908079cbabc4dcc68456511f)
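The listed MD5 checksum can be used to confirm that a download of `curated_test_set.json` is complete. Below is a minimal sketch using Python's standard `hashlib`, assuming the file sits in the working directory; the helper names `md5_of` and `verify_download` are hypothetical, and the script only checks that the file parses as JSON without assuming anything about its internal structure.

```python
import hashlib
import json
from pathlib import Path

# Checksum listed on this record for curated_test_set.json.
EXPECTED_MD5 = "04469028908079cbabc4dcc68456511f"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's MD5 matches the checksum published on this page."""
    return md5_of(path) == EXPECTED_MD5


if __name__ == "__main__":
    p = Path("curated_test_set.json")  # assumed location of the download
    if not p.exists():
        print(f"{p} not found")
    elif verify_download(p):
        data = json.loads(p.read_text(encoding="utf-8"))
        print(f"checksum OK; top-level JSON value is a {type(data).__name__}")
    else:
        print("checksum mismatch; re-download the file")
```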