The snapshot was created using the [BioPortal REST API](http://data.bioontology.org/documentation). It was produced for research purposes only. The download process was as follows:

1. For all available and indexed ontologies (see meta1490889630984.csv for a view of the whole repository), the latest version was determined using the submission id. There were 512 such entries available via the webservice.
2. An attempt was made to download the referenced file using the download URL. This file was not always available (a download was attempted for 498 entries).
3. If the file turned out to be a ZIP archive, it was unpacked before processing. Only archives with single ontologies were considered by the downloader.
4. If a file was downloaded or unzipped, it was copied to a new directory and given the extension orig. The files in this directory are all byte-equivalent to the files downloaded, which may be of interest to some (total: 438 ontologies).
5. An attempt was then made to parse every file in the orig directory with the OWL API. If this was successful, the whole imports closure was merged and serialised into OWL/XML with the OWL API (4.2.8). These files are the recommended ones for most researchers to study, and can be found in the owlxml directory/archive.
6. No repair of OWL 2 DL profile violations was attempted for this snapshot, other than the default behaviour of the OWL API when serialising to OWL/XML. However, for this report, we gathered the metrics after applying a single fix: when the ontology had a non-absolute version IRI ([OntologyVersionIRINotAbsolute](https://github.com/owlcs/argo/issues/3)), we created an absolute one. This was done so that the profile counts are more meaningful. Note that a number of ontologies merely suffer from undeclared entities, which is easily remedied.
7. The dataset contains 422 ontologies, of which 73 are OWL Full (many of them with only minor violations), 168 are Pure DL, i.e. falling under OWL 2 DL but not under any of the profiles, and 181 fall under at least one of the profiles (EL only: 47, EL+QL: 45, EL+QL+RL: 67, EL+RL: 4, QL only: 6, RL only: 8, RL+QL: 4). Note that the "orig" archive (directory) contains more ontologies than the owlxml archive, because some of them were downloadable but not parsable. This might be useful for researchers interested in exploring the reasons for parsing failures. A more in-depth characterisation can be found here: http://rpubs.com/matentzn/bioportal2017_03_30.
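
Steps 3 and 4 above can be illustrated with a minimal Python sketch. This is not the code that produced the snapshot: the function name `extract_single_ontology` is hypothetical, and the rule that a "single ontology" archive is one containing exactly one regular file is an assumption, since the downloader's exact criterion is not documented here.

```python
import io
import zipfile
from typing import Optional


def extract_single_ontology(payload: bytes) -> Optional[bytes]:
    """Return the ontology bytes from a download, or None if it is skipped.

    Assumption: a ZIP archive counts as a "single ontology" archive when it
    contains exactly one regular file. The original downloader may have
    used a different criterion.
    """
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # Not an archive: keep the downloaded bytes unchanged.
        return payload
    with zipfile.ZipFile(buffer) as archive:
        members = [info for info in archive.infolist() if not info.is_dir()]
        if len(members) != 1:
            # Empty or multi-file archive: not handled by this sketch.
            return None
        return archive.read(members[0])
```

The returned bytes would then be written unchanged to the orig directory, which is what keeps those files byte-equivalent to what was downloaded or unpacked.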

Files in this dataset:

* bioportal2017.03.30.csv: Metadata about the ontologies in the set, such as axiom counts and profile membership. Non-absolute version IRI exceptions were fixed prior to measurement.
* bioportal2017.03.30_norepair.csv: Metadata about the ontologies in the set, such as axiom counts and profile membership. No automated repairs.
* 03_30_2017_05_54_25experiment.log: Print log with details on the errors and exceptions during the snapshot creation.
* meta1490889630984.csv: Dump of the entire BioPortal ontology list, including versions and so on.
* original.zip: The archive contains all files that were downloadable, in their original state.
* owlxml.zip: The archive contains all files that were downloadable and parsable, with imports merged and serialised to OWL/XML.
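
As a starting point for working with the CSV files, the following Python sketch reports the column names and the number of data rows of a file without assuming any particular column. The function name `summarise_csv` is hypothetical, and the sketch assumes the files are comma-separated, UTF-8 encoded, and start with a header row; none of this is stated above.

```python
import csv
from typing import List, Tuple


def summarise_csv(path: str) -> Tuple[List[str], int]:
    """Return the column names and the number of data rows of a CSV file.

    Assumptions: comma-separated, UTF-8 encoded, first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        # fieldnames is populated from the header once reading has started.
        columns = list(reader.fieldnames or [])
    return columns, row_count
```

For example, `summarise_csv("bioportal2017.03.30.csv")` and `summarise_csv("bioportal2017.03.30_norepair.csv")` could be compared to check that both metadata files describe the same number of ontologies.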


For citation, we recommend:

* This dataset directly
* Noy et al. (2009) for BioPortal (https://www.ncbi.nlm.nih.gov/pubmed/19483092)
* Matentzoglu et al. (2013) for reference to the Manchester OWL repository and our continuous efforts to survey the state of OWL and ontologies (https://link.springer.com/chapter/10.1007/978-3-642-41335-3_21)