BIOMAT-AnatNER Corpus: Train and Validation Sets
Description
BIOMAT-AnatNER Train and Validation Sets
BIOMAT-AnatNER stands for BIOMATerials Anatomical Structure Named Entity Recognition. It is a corpus developed as part of the Horizon Europe BIOMATDB project to support the extraction and classification of anatomical structure mentions in scientific literature related to biomaterials. The corpus focuses specifically on the annotation of entities encompassing tissues, organs and body parts that appear in the context of biomaterials applications, such as mentions of tissues involved in implantation studies, or organs relevant to biocompatibility testing.
The corpus was created through a collaborative effort involving domain experts, who were tasked with establishing comprehensive and accurate annotation guidelines for the manual annotation of the final gold standard corpus. PubMed abstracts were then selected using MeSH (Medical Subject Headings) terms associated with relevant disciplines, such as regenerative medicine, orthopedics, dentistry and cardiology, to reflect the terminology commonly used in biomaterials research. The selected abstracts were manually annotated according to the rules predefined in the annotation guidelines.
This repository contains the train (750 documents) and validation (100 documents) sets of the BIOMAT-AnatNER Corpus, which are made available under open access for public use. These sets have been released to support the development of Named Entity Recognition (NER) models for biomaterials-related concept extraction from scientific literature, particularly those mentions referring to anatomical structures.
The test set is not included in this repository, as it is reserved for a future shared task planned within the scope of the project. For this reason, access to the full corpus remains restricted for now; the full corpus will be made publicly available upon completion of the shared task.
Resources
- Project Website
- Biomaterials Marketplace
- Biomaterials Database
- BIOMAT-AnatNER full corpus
- Annotation Guidelines
Files
Name | Size | MD5 |
---|---|---|
BIOMAT-AnatNER_Train_Set.zip | 5.5 MB | 5ab6f580c87cdad847c83a8abf1a7a7b |
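After downloading the archive, it can be useful to confirm that the file matches the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library. It assumes the archive was saved under its original name, BIOMAT-AnatNER_Train_Set.zip, in the current working directory; the helper names `md5_of_file` and `verify_archive` are illustrative and not part of any project tooling.

```python
import hashlib
from pathlib import Path

# Checksum as listed for BIOMAT-AnatNER_Train_Set.zip on this page.
EXPECTED_MD5 = "5ab6f580c87cdad847c83a8abf1a7a7b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local download location (hypothetical; adjust as needed).
    archive = Path("BIOMAT-AnatNER_Train_Set.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.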