Dataset Open Access

Biotea-2-Bioschemas test data

Garcia, Leyla; Giraldo, Olga; Garcia, Alexander; Rebholz-Schuhmann, Dietrich

Biotea-2-Bioschemas maps the Biotea model to markup following the approach proposed by Bioschemas. Here we present the test data used in the Biotea GitHub pages, corresponding to 2596 publications from the PubMed Central Open Access (PMC-OA) subset, together with the software used to render the markup.

The deposited data include (i) publications retrieved from the PMC-OA API, i.e., full text in JATS/XML, (ii) ontology terms recognized in the abstracts by the NCBO Annotator, i.e., semantic annotations, and (iii) the same annotations in the PubAnnotation format.
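To make item (iii) more concrete, the following is a minimal sketch of reading one annotation document in the PubAnnotation JSON format and pairing each annotated text span with its ontology term. It assumes the usual PubAnnotation fields (`sourcedb`, `sourceid`, `text`, and `denotations`, where each denotation has a character `span` with end-exclusive `begin`/`end` offsets and an `obj` holding the term). The sample document, its identifier, and the example.org term IRIs are hypothetical and are not taken from the deposited files; `covered_terms` is an illustrative helper, not part of the deposited software.

```python
import json

# Hypothetical PubAnnotation-style document. Field names follow the
# PubAnnotation JSON format; all values are invented for illustration.
SAMPLE = """
{
  "sourcedb": "PMC",
  "sourceid": "0000000",
  "text": "Insulin regulates glucose uptake.",
  "denotations": [
    {"id": "T1", "span": {"begin": 0, "end": 7}, "obj": "http://example.org/term/insulin"},
    {"id": "T2", "span": {"begin": 18, "end": 25}, "obj": "http://example.org/term/glucose"}
  ]
}
"""


def covered_terms(doc):
    """Return (covered text, ontology term) pairs for every denotation."""
    text = doc["text"]
    pairs = []
    for denotation in doc.get("denotations", []):
        span = denotation["span"]
        # Offsets index into the annotated text; end is exclusive, like a slice.
        pairs.append((text[span["begin"]:span["end"]], denotation["obj"]))
    return pairs


if __name__ == "__main__":
    for surface, term in covered_terms(json.loads(SAMPLE)):
        print(f"{surface!r} -> {term}")
```

Running it prints one line per denotation, e.g. `'Insulin' -> http://example.org/term/insulin`.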

The deposited software includes (i) biotea-bioschemas-metadata, which parses JATS/XML files and creates Bioschemas markup including metadata, abstract and references, (ii) biotea-bioschemas-annotations, which parses PubAnnotation annotations and creates Bioschemas markup, and (iii) biotea-bioschemas-showcase, which uses the other two to display the markup in a basic graphical way and to embed it in the HTML page as a script element in JSON-LD format. The corresponding GitHub repositories are: (i), (ii), and (iii).
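The metadata step of item (i) and the rendering step of item (iii) can be pictured with the minimal sketch below. It is not the code in those repositories: it assumes a heavily trimmed, hypothetical JATS fragment, extracts only the title, the PMC identifier, the abstract and the titles of the references, maps them to schema.org `ScholarlyArticle` properties (`name`, `identifier`, `abstract`, `citation`), and serializes the result as a JSON-LD script element. That property mapping and the helper names `text_of`, `jats_to_jsonld` and `to_script_element` are illustrative assumptions, not the Biotea-2-Bioschemas mapping.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS/XML article used only for illustration.
JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmc">0000000</article-id>
      <title-group><article-title>An example article</article-title></title-group>
      <abstract><p>First sentence.</p><p>Second sentence.</p></abstract>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="R1"><element-citation><article-title>A cited paper</article-title></element-citation></ref>
    </ref-list>
  </back>
</article>"""


def text_of(elem):
    """Concatenate all text inside an element, including nested tags."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def jats_to_jsonld(xml_string):
    """Map a few JATS fields to schema.org ScholarlyArticle properties (assumed mapping)."""
    root = ET.fromstring(xml_string)
    meta = root.find("./front/article-meta")
    abstract = meta.find("./abstract")
    paragraphs = [text_of(p) for p in abstract.findall("./p")] if abstract is not None else []
    # Each reference becomes a minimal cited work identified only by its title.
    citations = [
        {"@type": "ScholarlyArticle", "name": text_of(ref.find(".//article-title"))}
        for ref in root.findall("./back/ref-list/ref")
    ]
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "identifier": text_of(meta.find("./article-id[@pub-id-type='pmc']")),
        "name": text_of(meta.find("./title-group/article-title")),
        "abstract": " ".join(paragraphs),
        "citation": citations,
    }


def to_script_element(markup):
    """Serialize markup as a JSON-LD script element for embedding in HTML."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = json.dumps(markup, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(to_script_element(jats_to_jsonld(JATS)))
```

Keeping extraction (`jats_to_jsonld`) separate from rendering (`to_script_element`) mirrors the split described above, where one component produces the markup and another embeds it in the page.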

Biotea-2-Bioschemas can be seen in action at

Files (114.9 MB)
Size
302.9 kB
374.3 kB
779.7 kB
41.1 MB
62.8 MB
9.5 MB

References
  • Garcia A, Lopez F, Garcia L, Giraldo O, Bucheli V, Dumontier M. Biotea: semantics for Pubmed Central. PeerJ. PeerJ Inc.; 2018;6: e4201.

  • Garcia Castro LJ, McLaughlin C, Garcia A. Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data. J Biomed Semantics. BioMed Central; 2013;4: S5.

  • Gray AJG, Goble CA, Jiménez R. Bioschemas: From Potato Salad to Protein Annotation. International Semantic Web Conference. 2017; Available:

  • Jonquet C, Shah NH, Youn CH, Musen MA, Storey M-A. NCBO Annotator: Semantic Annotation of Biomedical Data. Proceedings of the 2009 International Semantic Web Conference. 2009. Available:

  • Kim J-D, Wang Y. PubAnnotation: a persistent and shareable corpus and annotation repository. Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics; 2012. pp. 202–205.
