Published March 3, 2025 | Version v1
Dataset Open

Manually curated evaluation dataset and HBDB snapshot without full text

  • 1. National Taiwan University

Description

The HBDB snapshot can be imported directly. Note, however, that the sentences table has been removed to comply with the terms of Elsevier's Text and Data Mining (TDM) service.

The 'eval_dataset' is a manually curated dataset that includes various terms associated with four chemicals. The folder structure is organized into four layers as follows:

  1. Term A: The target volatile organic compound (VOC), such as acetone.
  2. Category of Term B: This could be a classification like chemical or molecular function.
  3. Reference ID in HBDB: Look up the reference in the HBDB snapshot to find its link (DOI or URL) and PubMed ID (PMID). For example, Reference ID 15878 corresponds to PubMed ID 21871718 or this link.
  4. JSON File Attributes:
    • term_A: The target VOC.
    • term_B: Related term.
    • context: The extracted sentences, truncated to 200 characters to comply with the terms of Elsevier's TDM service. Please refer to the original paper for the complete text.
    • category: Category of term B, matching the second layer.
    • score: Relationship score.
    • verified: Indicates that the record was manually curated; curation was performed twice.
    • table: The corresponding table in the HBDB database snapshot.
    • compound_id: Compound ID in HBDB (e.g., 28 for acetone).
    • reference_id: Reference ID, corresponding to the third layer.
    • paragraph: Section containing the extracted sentences.
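
The four-layer layout described above can be traversed with a short script. The following is a minimal sketch, not part of the dataset: it assumes that the first three layers are nested directories (Term A, category of Term B, Reference ID) and that each JSON file holds either a single record or a list of records with the attributes listed above. The function name `iter_records` and the root folder name `eval_dataset` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_records(root: str) -> Iterator[Tuple[str, str, str, dict]]:
    """Yield (term_a, category, reference_id, record) for every JSON record.

    Assumed layout (an assumption, not confirmed by the dataset description):
        <root>/<Term A>/<category of Term B>/<Reference ID>/<file>.json
    """
    for path in sorted(Path(root).glob("*/*/*/*.json")):
        # The first three path components below the root are the three folder layers.
        term_a, category, reference_id = path.relative_to(root).parts[:3]
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: a file contains one record (object) or a list of records.
        records = data if isinstance(data, list) else [data]
        for record in records:
            yield term_a, category, reference_id, record


if __name__ == "__main__":
    # "eval_dataset" is a hypothetical path to the unzipped archive.
    for term_a, category, reference_id, record in iter_records("eval_dataset"):
        if record.get("verified"):
            print(term_a, category, reference_id,
                  record.get("term_B"), record.get("score"))
```

If the actual archive nests the files differently, only the glob pattern and the unpacking of the path components need to change.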

Files

eval_dataset.zip

Files (281.2 MB)

Checksum Size
md5:32bc615914a77eaf6441e771e4ef3e1a 161.8 kB
md5:9d007914ce7253aad15277fadd640803 281.0 MB
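
After downloading, the MD5 checksums listed above can be used to confirm that a file arrived intact. The sketch below is illustrative only: the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing does not show which checksum belongs to which file name, the check only tests membership in the set of listed values.

```python
import hashlib

# MD5 values copied from the file listing on this record.
LISTED_MD5 = {
    "32bc615914a77eaf6441e771e4ef3e1a",
    "9d007914ce7253aad15277fadd640803",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """True if the file's MD5 equals one of the checksums listed on this record."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listed_checksum("eval_dataset.zip")` should return `True` for an uncorrupted download, assuming the file was saved under that name.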