Published April 30, 2025 | Version v1
Dataset Open

BIOMAT-AnatNER Corpus: Train and Validation Sets

  • 1. Barcelona Supercomputing Center
  • 2. Universitat Politècnica de Catalunya

Description

BIOMAT-AnatNER Train and Validation Sets

BIOMAT-AnatNER stands for BIOMATerials Anatomical Structure Named Entity Recognition. It is a corpus developed as part of the Horizon Europe BIOMATDB project to support the extraction and classification of anatomical structure mentions in scientific literature related to biomaterials. The corpus focuses specifically on the annotation of entities encompassing tissues, organs and body parts that appear in the context of biomaterials applications, such as mentions of tissues involved in implantation studies, or organs relevant to biocompatibility testing.

The corpus was created through a collaborative effort involving domain experts, who established comprehensive and accurate annotation guidelines for the manual annotation of the final gold standard corpus. PubMed abstracts were then selected based on MeSH (Medical Subject Headings) terms associated with relevant disciplines, such as regenerative medicine, orthopedics, dentistry and cardiology, to reflect the terminology commonly used in biomaterials research. The selected abstracts were manually annotated according to the rules predefined in the annotation guidelines.

This repository contains the train (750 documents) and validation (100 documents) sets of the BIOMAT-AnatNER Corpus, which are available under open access for public use. These sets have been released to support the development of Named Entity Recognition (NER) models that extract biomaterials-related concepts from scientific literature, in particular mentions of anatomical structures.

The test set is not included in this repository, as it is reserved for a future shared task planned within the scope of the project. For this reason, the full corpus remains restricted; it will be made publicly available upon completion of the shared task.

Resources

Files

BIOMAT-AnatNER_Train_Set.zip

Files (5.5 MB)

Size: 5.5 MB
md5:5ab6f580c87cdad847c83a8abf1a7a7b
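
After downloading, the archive can be checked against the MD5 checksum listed above before use. The following is a minimal sketch, assuming the checksum refers to BIOMAT-AnatNER_Train_Set.zip and that the file has been saved in the current working directory. The helper names (md5_of_file, verify_archive, list_archive) are illustrative and not part of the corpus distribution, and the sketch makes no assumption about the annotation format inside the archive.

```python
import hashlib
import zipfile
from pathlib import Path

# Checksum as listed on this record page.
EXPECTED_MD5 = "5ab6f580c87cdad847c83a8abf1a7a7b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()


def list_archive(path):
    """Return the member names of a zip archive without extracting it."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Assumed local download location; adjust as needed.
    archive_path = Path("BIOMAT-AnatNER_Train_Set.zip")
    if not archive_path.exists():
        print(f"{archive_path} not found; download it first.")
    elif not verify_archive(archive_path):
        print("Checksum mismatch: the download may be corrupted.")
    else:
        print("Checksum OK. First archive members:")
        for name in list_archive(archive_path)[:10]:
            print(" ", name)
```

Listing the members before extraction is a simple way to see how the train and validation documents are organized inside the archive.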