Published July 21, 2021
| Version 1.0
Journal article
Open
DrugProt Large-Scale Text Mining corpus: Biocreative VII Track 1 - Text mining drug and chemical-protein interactions
Creators
- 1. Barcelona Supercomputing Center
- 2. University of Turku
Description
This Zenodo record contains the abstracts and entity annotations for the BioCreative VII DrugProt Large Scale Additional Subtrack.
Abstracts
- large_scale_abstracts.tsv: This file contains plain-text, UTF-8 encoded, NFC-normalized DrugProt PubMed records in a tab-separated format. In total, 2366081 records are provided; each line in the file contains a single PMID, title, and abstract, separated by tabs. Due to PubMed inconsistencies, a small percentage of records are duplicated: we have identified 222 records with different PMIDs but the same title and abstract body.
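As an illustration of the layout described above, the following minimal sketch streams PMID/title/abstract lines and groups PMIDs that share the same title and abstract body. It assumes exactly three tab-separated fields per line, no header row, and no tabs embedded inside fields; none of these details are confirmed beyond the description above. The function names `iter_records` and `find_duplicates` are hypothetical, and the demo runs on a tiny in-memory sample rather than the real file.

```python
import hashlib
import io
from collections import defaultdict


def iter_records(handle):
    """Yield (pmid, title, abstract) tuples from a tab-separated text stream.

    Assumption: three fields per line and no header row. Lines that do not
    split into exactly three fields are skipped.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        yield parts[0], parts[1], parts[2]


def find_duplicates(records):
    """Return lists of PMIDs whose title and abstract are identical.

    A digest of the text is used as the grouping key so the full abstracts
    do not have to be kept in memory for millions of records.
    """
    groups = defaultdict(list)
    for pmid, title, abstract in records:
        key = hashlib.sha1((title + "\t" + abstract).encode("utf-8")).hexdigest()
        groups[key].append(pmid)
    return [pmids for pmids in groups.values() if len(pmids) > 1]


# Tiny in-memory sample standing in for large_scale_abstracts.tsv.
sample = io.StringIO(
    "1\tTitle A\tBody A\n"
    "2\tTitle B\tBody B\n"
    "3\tTitle A\tBody A\n"
)
duplicates = find_duplicates(iter_records(sample))
print(duplicates)
```

To run this against the real file, the sample stream could be replaced with `open("large_scale_abstracts.tsv", encoding="utf-8", newline="")`, assuming the file has been extracted from the archive into the working directory.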
Entity mention annotations
- large_scale_entities.tsv: This file contains the automatically labeled mention annotations of chemical compounds and genes/proteins (so-called gene and protein-related objects as defined during BioCreative V) generated for the Large Scale records. There are 53993602 entity annotations.
Related resources:
- Web
- DrugProt corpus
- Evaluation library
- Online evaluation (CodaLab)
- Relation annotation guidelines
- Gene and protein annotation guidelines
- Chemicals and drugs annotation guidelines
- DrugProt Silver Standard Knowledge Graph
- FAQ
- DrugProt Large Scale Additional SubTrack
- DrugProt Large Scale document collection protocol
- DrugProt Complete PubMed Knowledge Graph
Files
Name | Size |
---|---|
large-scale-drugprot.zip (md5:ffafa66950258ef831817b26a286e79c) | 1.9 GB |
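Since the record publishes an MD5 checksum for the archive, a download can be checked before extraction. The sketch below is one possible way to do that with Python's standard `hashlib`; the function names `md5_of_stream` and `verify_archive` are hypothetical, and the local path passed to `verify_archive` is whatever location the archive was saved to.

```python
import hashlib
import io

# Checksum listed for large-scale-drugprot.zip in the Files table.
EXPECTED_MD5 = "ffafa66950258ef831817b26a286e79c"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks.

    Chunked reading avoids loading a multi-gigabyte file into memory.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    with open(path, "rb") as stream:
        return md5_of_stream(stream) == expected.lower()


# Small in-memory demonstration of the digest helper.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
print(demo_digest)
```

A call such as `verify_archive("large-scale-drugprot.zip")` would then return `True` only if the downloaded file matches the published checksum.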