BIOMAT-AnatNER: A Biomaterials Domain-Specific Corpus for Named Entity Recognition of Anatomical Structures

Rosell, Judith; Mateu Sanz, Miguel; Foltz, Clemence; Rodríguez Ortega, Miguel; Rodríguez Miret, Jan; Krallinger, Martin

doi:10.5281/zenodo.15278901

Published April 25, 2025 | Version v1

Dataset Restricted

1. Barcelona Supercomputing Center
2. Universitat Politècnica de Catalunya

BIOMAT-AnatNER Corpus

BIOMAT-AnatNER stands for BIOMATerials Anatomical Structure Named Entity Recognition. It is a corpus developed as part of the Horizon Europe BIOMATDB project to support the extraction and classification of anatomical structure mentions in scientific literature related to biomaterials. The corpus focuses specifically on the annotation of entities encompassing tissues, organs and body parts that appear in the context of biomaterials applications, such as mentions of tissues involved in implantation studies, or organs relevant to biocompatibility testing.

The corpus was created in collaboration with domain experts, who established comprehensive and accurate annotation guidelines for the manual annotation of the final gold standard corpus. PubMed abstracts were selected using MeSH (Medical Subject Headings) terms associated with relevant disciplines, such as regenerative medicine, orthopedics, dentistry and cardiology, to reflect the terminology commonly used in biomaterials research. The selected abstracts were then manually annotated according to the rules predefined in the annotation guidelines.

The BIOMAT-AnatNER corpus is one of four corpora developed within the project. It is divided into three subsets: a training set (750 documents), a test set (150 documents), and a validation set (100 documents), for a total of 1,000 documents. The corpus is available in multiple formats, including brat, CSV and CoNLL.
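As an illustration of working with the brat distribution, the following is a minimal sketch of reading entity annotations from brat standoff text. It assumes the standard brat text-bound annotation layout (ID, tab, type and character offsets, tab, covered text). The label name ANATOMICAL_STRUCTURE, the sample sentence and the function name parse_brat_entities are hypothetical and are not taken from the corpus or its guidelines.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    ann_id: str                   # e.g. "T1"
    label: str                    # entity type as written in the .ann file
    spans: List[Tuple[int, int]]  # one (start, end) pair per fragment
    text: str                     # covered text as stored in the .ann file


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse text-bound ("T") annotations from brat standoff content."""
    entities = []
    for line in ann_text.splitlines():
        # Only text-bound annotations are handled; notes, relations
        # and other annotation kinds are skipped.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        ann_id, type_and_offsets, covered = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate fragments with ";".
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, covered))
    return entities


# Hypothetical document text and annotation, for illustration only.
doc = "Scaffolds were implanted in the femoral condyle of rabbits."
mention = "femoral condyle"
start = doc.index(mention)
end = start + len(mention)
ann = f"T1\tANATOMICAL_STRUCTURE {start} {end}\t{mention}\n"

entities = parse_brat_entities(ann)
for entity in entities:
    first_start, last_end = entity.spans[0][0], entity.spans[-1][1]
    print(entity.ann_id, entity.label, repr(doc[first_start:last_end]))
```

Checking that each span, when sliced from the document text, reproduces the stored covered text is a simple sanity test worth running on any standoff file before training.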

This corpus is part of a broader initiative to support the development of an advanced, searchable biomaterials database with integrated analytical tools and digital advisors. It is also intended for use in training Named Entity Recognition (NER) models, enabling the automatic identification and extraction of anatomical structure mentions relevant to biomaterials research and development.
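NER models are commonly trained on token-level tags rather than character offsets. The sketch below shows one way character spans could be projected onto whitespace tokens using the BIO scheme. It is an assumption-laden illustration: the tokenization (splitting on whitespace), the label name ANAT, the sample sentence and the function name bio_tags are all hypothetical and do not describe how the corpus's CoNLL files were produced.

```python
import re
from typing import List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int]], label: str) -> List[Tuple[str, str]]:
    """Assign BIO tags to whitespace tokens that overlap the given character spans."""
    # Each token is stored with its character offsets in the original text.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if the two ranges overlap.
            if tok_start < end and tok_end > start:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical sentence and span, for illustration only.
sentence = "Scaffolds were implanted in the femoral condyle of rabbits."
mention = "femoral condyle"
span_start = sentence.index(mention)
tagged = bio_tags(sentence, [(span_start, span_start + len(mention))], "ANAT")

for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Whitespace tokenization leaves punctuation attached to words, so a real pipeline would normally use the tokenizer of the target model and align spans to its tokens instead.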

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

	All versions	This version
Views	62	62
Downloads	19	19
Data volume	123.7 MB	123.7 MB
