Dataset Open Access

BioPortal Snapshot 30.03.2017

Matentzoglu, Nicolas; Parsia, Bijan

The snapshot was created using the [BioPortal REST API](http://data.bioontology.org/documentation) and was produced for research purposes only. The download process was performed as follows:

  1. For all available and indexed ontologies (see meta1490889630984.csv for a view of the whole repository), the latest version was determined using the submission ID. 512 such entries were available via the web service.
  2. An attempt was made to download the file referenced by each entry's download URL (498 download attempts in total). The file was not always available.
  3. If the file turned out to be a ZIP archive, it was unpacked before processing. Only archives containing a single ontology were considered by the downloader.
  4. If a file was downloaded (and, where applicable, unzipped), it was copied to a new directory and given the extension orig. The files in this directory are byte-identical to the downloaded files, which may be of interest to some researchers. (Total: 438 ontologies.)
  5. The OWL API was then used to attempt to parse every file in the orig directory. If parsing succeeded, the whole imports closure was merged and serialised to OWL/XML with the OWL API (version 4.2.8). These files are the ones recommended for most researchers to study, and can be found in the owlxml directory/archive.
  6. No repair of OWL 2 DL profile violations was attempted for this snapshot, other than the default behaviour of the OWL API when serialising to OWL/XML. However, for this report, we gathered the metrics after applying a single fix: when an ontology had a non-absolute version IRI ([OntologyVersionIRINotAbsolute](https://github.com/owlcs/argo/issues/3)), we created an absolute one. This was done so that the profile counts are more meaningful. Note that a number of ontologies suffer only from undeclared entities, which is easily remedied.
  7. The dataset contains 422 ontologies, of which 73 are OWL Full (many of them with only minor violations), 168 are pure DL, i.e. they fall under OWL 2 DL but not under any of the profiles, and 181 fall under at least one of the profiles (EL only: 47, EL+QL: 45, EL+QL+RL: 67, EL+RL: 4, QL only: 6, RL only: 8, RL+QL: 4). Note that the "orig" archive (directory) contains more ontologies than the OWL/XML archive, because some files were downloadable but not parsable. This may be of interest to researchers exploring the reasons for parsing failures. A more in-depth characterisation, with plots of the size distributions, detailed breakdowns of the profile violations and a list of all ontologies, can be found here: http://rpubs.com/matentzn/bioportal2017_03_30.
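The archive rule in step 3 can be illustrated with a short Python sketch. It is not the downloader actually used for the snapshot; it assumes that "a single ontology" means a ZIP archive with exactly one non-directory entry, and the helper names `single_ontology_member` and `maybe_unzip` are hypothetical.

```python
import io
import zipfile
from typing import Optional


def single_ontology_member(data: bytes) -> Optional[str]:
    """Return the name of the only file in a ZIP archive, or None.

    Assumption: an archive "contains a single ontology" if it has exactly
    one non-directory entry. Directory entries are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        files = [info.filename for info in zf.infolist() if not info.is_dir()]
    return files[0] if len(files) == 1 else None


def maybe_unzip(data: bytes) -> Optional[bytes]:
    """Return the bytes to process further, or None if the download is skipped.

    Non-ZIP downloads are passed through unchanged; single-file archives are
    unpacked; empty or multi-file archives are skipped.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return data  # not an archive: keep the downloaded bytes as they are
    name = single_ontology_member(data)
    if name is None:
        return None  # zero or several files: not considered
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)
```

Passing the raw download through unchanged when it is not an archive mirrors the "if the file turned out to be a ZIP archive" condition; everything else is left to the parsing step.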
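The counts reported in steps 1, 2, 4 and 7 can be cross-checked with a few lines of Python. The sketch encodes the numbers exactly as given in the list. It reads the drop from 438 to 422 as "downloadable but not parsable", following the note in step 7. The function name `check_counts` is hypothetical.

```python
# Profile combinations reported in step 7.
PROFILE_COMBINATIONS = {
    "EL only": 47,
    "EL+QL": 45,
    "EL+QL+RL": 67,
    "EL+RL": 4,
    "QL only": 6,
    "RL only": 8,
    "RL+QL": 4,
}
OWL_FULL = 73
PURE_DL = 168
IN_SOME_PROFILE = 181
TOTAL_PARSED = 422

# Funnel from steps 1, 2, 4 and 7: listed -> attempted -> downloaded -> parsed.
FUNNEL = [("listed", 512), ("attempted", 498), ("downloaded", 438), ("parsed", 422)]


def check_counts() -> dict:
    """Return derived totals; raise AssertionError if the reported numbers disagree."""
    profile_sum = sum(PROFILE_COMBINATIONS.values())
    assert profile_sum == IN_SOME_PROFILE, profile_sum
    assert OWL_FULL + PURE_DL + IN_SOME_PROFILE == TOTAL_PARSED
    # Size of each drop between consecutive funnel stages.
    drops = {
        f"{a}->{b}": n_a - n_b
        for (a, n_a), (b, n_b) in zip(FUNNEL, FUNNEL[1:])
    }
    assert all(d >= 0 for d in drops.values())  # each stage can only shrink
    in_owl2_dl = PURE_DL + IN_SOME_PROFILE  # pure DL plus profile members
    return {"profile_sum": profile_sum, "in_owl2_dl": in_owl2_dl, "drops": drops}


if __name__ == "__main__":
    print(check_counts())
```

The profile breakdown sums to 181, the three classes sum to 422, and 349 of the 422 ontologies are within OWL 2 DL.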

Files in this dataset:

  • bioportal2017.03.30.csv: Metadata about the ontologies in the set, such as axiom counts and profile membership. Non-absolute version IRI exceptions were fixed prior to measurement.
  • bioportal2017.03.30_norepair.csv: Metadata about the ontologies in the set, such as axiom counts and profile membership. No automated repairs.
  • 03_30_2017_05_54_25experiment.log: Log with details on the errors and exceptions encountered during snapshot creation.
  • meta1490889630984.csv: Dump of the entire BioPortal ontology list, including versions and other metadata.
  • original.zip: The archive contains all files that were downloadable, in their original state.
  • owlxml.zip: The archive contains all downloadable files that could be parsed, with their imports closures merged and serialised to OWL/XML.


For citation, we recommend

  • This dataset directly
  • Noy et al. (2009) for BioPortal (https://www.ncbi.nlm.nih.gov/pubmed/19483092)
  • Matentzoglu et al. (2013) for reference to the Manchester OWL repository and our continuous efforts to survey the state of OWL and ontologies (https://link.springer.com/chapter/10.1007/978-3-642-41335-3_21)

File sizes and MD5 checksums (total 491.7 MB):

  • 03_30_2017_05_54_25experiment.log: 1.8 MB, md5:7afe385a0b55248c23f77fce8ddb4d07
  • bioportal2017.03.30.csv: 512.3 kB, md5:cd0bacb30692b988d78f803933fe8470
  • bioportal2017.03.30_norepair.csv: 522.7 kB, md5:387a83524bf1039e6cf1caf487da0639
  • meta1490889630984.csv: 3.5 MB, md5:f48870cd6df95c737a7c8b80fd0eb20b
  • original.zip: 365.9 MB, md5:f907914e56e42bb4d283ba97b8c15c94
  • owlxml.zip: 119.4 MB, md5:02dcf4488feaccdaacdec14c9b280deb
  • README.txt: 3.6 kB, md5:46af68c671d1c7d0b051f86964e06d93
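The MD5 checksums listed above make it possible to verify a downloaded copy. The following Python sketch assumes that all files were saved, without renaming, into a single directory. `md5_of` and `verify` are hypothetical helper names, and only the standard library (`hashlib`, `pathlib`) is used.

```python
import hashlib
from pathlib import Path

# Checksums as listed on the dataset page, with the "md5:" prefix removed.
EXPECTED_MD5 = {
    "03_30_2017_05_54_25experiment.log": "7afe385a0b55248c23f77fce8ddb4d07",
    "bioportal2017.03.30.csv": "cd0bacb30692b988d78f803933fe8470",
    "bioportal2017.03.30_norepair.csv": "387a83524bf1039e6cf1caf487da0639",
    "meta1490889630984.csv": "f48870cd6df95c737a7c8b80fd0eb20b",
    "original.zip": "f907914e56e42bb4d283ba97b8c15c94",
    "owlxml.zip": "02dcf4488feaccdaacdec14c9b280deb",
    "README.txt": "46af68c671d1c7d0b051f86964e06d93",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: Path) -> dict:
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify(target).items():
        status = "missing" if ok is None else ("ok" if ok else "MISMATCH")
        print(f"{status:8} {name}")
```

Reading in 1 MiB chunks keeps memory use flat even for original.zip (about 366 MB).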