Published April 30, 2025 | Version v1
Dataset Open

BIOMAT-MONER Corpus: Train and Validation Sets

  • 1. Barcelona Supercomputing Center
  • 2. Universitat Politècnica de Catalunya, Barcelona, ES
  • 3. West Pomeranian University of Technology

Description

BIOMAT-MONER Train and Validation Sets

BIOMAT-MONER stands for BIOMATerials Manufactured Object Named Entity Recognition. It is a corpus developed within the Horizon Europe BIOMATDB project to support the extraction and classification of mentions of manufactured objects relevant to biomaterials research from scientific literature in the biomaterials domain. The annotation covers manufactured objects, including devices, tools, experimental apparatus, and implantable medical products, together with their chemical, physical, and mechanical properties, other relevant features, and trade names, as described in the context of their use in biomaterials-related experiments or applications.

The corpus was created in collaboration with domain experts, who established comprehensive and accurate annotation guidelines for the manual annotation of the final gold standard corpus. To ensure domain relevance and terminological coverage, PubMed abstracts were selected based on MeSH (Medical Subject Headings) terms associated with biomaterials, medical devices, and related fields. The abstracts were then manually annotated according to the rules defined in the annotation guidelines.

This repository contains the train (750 documents) and validation (100 documents) sets of the BIOMAT-MONER Corpus, which are released under open access for public use. These sets are intended to support the development of Named Entity Recognition (NER) models for biomaterials-related concept extraction from scientific literature, particularly for mentions of manufactured objects relevant to biomaterials research.

The test set is not included in this repository, as it is reserved for a future shared task planned within the scope of the project. For this reason, the full corpus remains restricted for now; it will be made publicly available upon completion of the shared task.

Resources

Files

BIOMAT-MONER_Train_Set.zip (6.0 MB)
md5:6c5ef40ae8d5bb1c745ad2e2734ceed5