Published March 3, 2025 | Version v1
Dataset
Open
Manually curated evaluation dataset and HBDB snapshot without full text
Authors/Creators
Description
The HBDB snapshot can be imported directly. However, please note that the sentences table has been removed due to the terms of Elsevier's Text and Data Mining (TDM) service.
The 'eval_dataset' is a manually curated dataset that includes various terms associated with four chemicals. The folder structure is organized into four layers as follows:
- Term A: The target volatile organic compound (VOC), such as acetone.
- Category of Term B: The class that the related term belongs to, such as chemical or molecular function.
- Reference ID in HBDB: Look up the reference in the HBDB snapshot to find its DOI or URL and its PubMed ID (PMID). For example, Reference ID 15878 corresponds to PubMed ID 21871718.
- JSON File Attributes:
  - term_A: The target VOC.
  - term_B: Related term.
  - context: Truncated sentences limited to 200 characters due to the terms of Elsevier's TDM service. Please refer to the original paper for the complete text.
  - category: Category of term B, matching the second layer.
  - score: Relationship score.
  - verified: Indicates manual curation, done twice.
  - table: The corresponding table in the HBDB database snapshot.
  - compound_id: Compound ID in HBDB (e.g., 28 for acetone).
  - reference_id: Reference ID, corresponding to the third layer.
  - paragraph: Section containing the extracted sentences.
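The layered layout described above could be traversed roughly as follows. This is only a minimal sketch under stated assumptions: the archive is taken to be unpacked into a local folder (the `eval_dataset` path in the usage note is hypothetical), the on-disk layout is assumed to be `<Term A>/<Category of Term B>/<Reference ID>/<file>.json`, and each JSON file is assumed to hold either one object or a list of objects. The function name `iter_records` and the `path_*` keys are illustrative and are not part of the dataset.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(root: str) -> Iterator[dict]:
    """Yield one dict per JSON record, tagged with the folder layers it came from.

    Assumed layout (not confirmed by the dataset description):
    root/<term_A>/<category>/<reference_id>/<file>.json
    """
    root_path = Path(root)
    for json_path in sorted(root_path.rglob("*.json")):
        parts = json_path.relative_to(root_path).parts
        if len(parts) != 4:
            # Skip files that do not sit exactly three folders deep.
            continue
        term_a, category, reference_id, _filename = parts
        with json_path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        # Accept either a single object or a list of objects per file.
        records = payload if isinstance(payload, list) else [payload]
        for record in records:
            if not isinstance(record, dict):
                continue
            yield {
                "path_term_A": term_a,
                "path_category": category,
                "path_reference_id": reference_id,
                **record,
            }
```

For example, `for rec in iter_records("eval_dataset"): print(rec["path_term_A"], rec.get("term_B"), rec.get("score"))` would list each related term with its score. Keeping the folder-derived values under separate `path_*` keys makes it possible to check them against the `category` and `reference_id` attributes stored inside each file.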
Files
eval_dataset.zip
Total size: 281.2 MB

| MD5 checksum | Size |
|---|---|
| 32bc615914a77eaf6441e771e4ef3e1a | 161.8 kB |
| 9d007914ce7253aad15277fadd640803 | 281.0 MB |
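After downloading, a file can be checked against the MD5 checksums listed for this record. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path passed in is whatever location the download was saved to.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files never need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:32bc...'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For instance, `matches_checksum("eval_dataset.zip", "md5:<listed value>")` should return `True` when the listed value belongs to that file and the download is intact.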