Published April 24, 2023 | Version 1.0.0
Dataset | Open access

PretoxTM Corpus: a gold standard corpus of preclinical treatment-related findings annotated from toxicology reports

  • 1. Barcelona Supercomputing Center


The PretoxTM Corpus is a gold standard corpus of preclinical treatment-related findings annotated from toxicology reports.

A gold standard corpus, that is, a collection of example documents annotated by domain experts, is needed to develop, train, and validate text mining tools. To this end, we designed and carried out an annotation effort to build a corpus of treatment-related findings: the PretoxTM corpus.
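Validating a text mining tool against a gold standard usually means comparing the entity spans it predicts with the expert annotations. Below is a minimal sketch of strict (exact-match) precision, recall, and F1, assuming each annotation can be reduced to a (start, end, label) tuple. The `Span` type, the `span_prf` function, and the sample offsets are hypothetical illustrations; they are not part of the PretoxTM distribution or its official evaluation procedure.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A labelled character span: [start, end) offsets plus an entity type."""
    start: int
    end: int
    label: str


def span_prf(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Strict precision, recall and F1 of predicted spans against gold spans.

    A prediction is a true positive only if its start, end and label
    all match a gold annotation exactly.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations; the offsets are arbitrary placeholders.
gold = {Span(3, 15, "Dose"), Span(70, 74, "Sex")}
pred = {Span(3, 15, "Dose"), Span(70, 79, "Sex")}  # second span over-extends

print(span_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

Strict matching is the simplest choice; lenient variants that also credit partially overlapping spans are common, but they require an explicit overlap rule.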

A treatment-related finding expression contains several named entities. The most relevant is the abnormal effect detected. Depending on the study domain of the finding, this effect is expressed in one of two ways: as a measurement, test, or examination (Study Test) together with the abnormal result obtained for that test (Manifestation), or, in study domains with no associated test or measurement (e.g., clinical, macroscopic, and microscopic observations), as an abnormal Finding. Other named entities that may complete the treatment-related finding are the Specimen in which the abnormal observation was made, the Sex of the subjects, the Group of subjects in which the observation was detected, and the Dose level of the administered compound.

Examples of sentences with treatment-related findings are: “The decrease in food consumption and body weight of the animals from the mid dose onwards is regarded as evidence of general toxicity.” and “At dose level 3, absolute and relative liver weights were increased in male rats.”
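The entity scheme described above can be pictured as a small data structure. This is a hypothetical in-memory sketch only: the class names, the label strings, and the way the second example sentence is segmented are illustrative assumptions, not the corpus's gold annotations or its distribution file format. It records character offsets for each entity and checks the rule that an abnormal effect is either a Study Test plus Manifestation pair or a Finding.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    """Entity types named in the PretoxTM description (label strings assumed)."""
    STUDY_TEST = "Study Test"
    MANIFESTATION = "Manifestation"
    FINDING = "Finding"
    SPECIMEN = "Specimen"
    SEX = "Sex"
    GROUP = "Group"
    DOSE = "Dose"


@dataclass(frozen=True)
class Entity:
    type: EntityType
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass
class TreatmentRelatedFinding:
    """Hypothetical container: one sentence plus the entities annotated in it."""
    sentence: str
    entities: list[Entity] = field(default_factory=list)

    def add(self, etype: EntityType, surface: str) -> Entity:
        """Record the first occurrence of `surface` in the sentence as an entity."""
        start = self.sentence.find(surface)
        if start < 0:
            raise ValueError(f"{surface!r} not found in sentence")
        entity = Entity(etype, start, start + len(surface), surface)
        self.entities.append(entity)
        return entity

    def has_abnormal_effect(self) -> bool:
        """True if there is a Study Test + Manifestation pair, or a Finding."""
        types = {e.type for e in self.entities}
        paired = {EntityType.STUDY_TEST, EntityType.MANIFESTATION} <= types
        return paired or EntityType.FINDING in types


# Illustrative segmentation of the second example sentence (an assumption).
trf = TreatmentRelatedFinding(
    "At dose level 3, absolute and relative liver weights were increased in male rats."
)
trf.add(EntityType.DOSE, "dose level 3")
trf.add(EntityType.SPECIMEN, "liver")
trf.add(EntityType.STUDY_TEST, "weights")
trf.add(EntityType.MANIFESTATION, "increased")
trf.add(EntityType.SEX, "male")

for e in trf.entities:
    print(e.type.value, e.start, e.end, e.text)
print(trf.has_abnormal_effect())  # → True
```

Storing offsets alongside the surface text keeps each entity verifiable against the sentence, which is what span-level evaluation of text mining tools relies on.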

Contributions: The PretoxTM corpus was developed by BSC, with contributions from IMIM and a team of experts from the eTRANSAFE EFPIA partners.

License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

The PretoxTM resources have been developed as part of the eTRANSAFE project.

For more information about PretoxTM, please visit:

PretoxTM Corpus GitLab:

PretoxTM central documentation:



Files (11.0 MB)


Additional details


eTRANSAFE – Enhancing TRANslational SAFEty Assessment through Integrative Knowledge Management (grant 777365)
European Commission