D3.2.2 Report on the state and quality of biosystematics documents and survey reports
Creators
- 1. Plazi, Bern, Switzerland
- 2. Department of Entomology, Ohio State University
- 3. BGBM
- 4. NBGB
- 5. Museum für Naturkunde, Berlin, Germany
- 6. RBGK
- 7. AAFC
- 8. Naturalis, Leiden, Netherlands
- 9. University of Massachusetts
- 10. Pensoft, Sofia, Bulgaria
Description
The present document is a deliverable of the pro-iBiosphere project, funded by the European Commission’s Directorate-General Information Society and Media (DG INFSO), under its 7th EU Framework Program for Research and Technological Development (FP7).
Biosystematics has a 250-year-old tradition of documenting the world’s living species and higher taxa in highly standardized taxonomic treatments. By convention, a taxonomic treatment consists of a scientific Latin name for the taxon, a list of citations of previous references to the described taxon (including any synonyms), a list of exemplar specimens, illustrations, a diagnosis, a summary of its distribution, behaviour and ecology, and other relevant information. These treatments have been published in articles and monographs, creating a corpus of biosystematic literature of tens of millions of pages. The target audience for this literature has been the human reader. We can now extend this model with metadata and attached digital objects, with the potential to transform the biodiversity literature into a gateway to the content of collections of specimens, sounds, images, descriptions, interactive maps of occurrences, and DNA information. Here we address the challenge of how to enhance the publication process to make these rich data accessible, computable and re-usable.
We do not specify a comprehensive semantic schema, but target the process that will build a system that can scale to all challenges, can evolve to increased sophistication, and can call upon and link to existing and emerging external semantic data management systems. This approach can evolve to ensure that content users get the information they need.
We address the need to convert legacy literature into semantically enhanced documents or database records. By relying on pre-existing vocabularies, we will avoid duplication of effort. Existing schemas such as the TaxonX schema provide a starting point for semantic enhancement. The use of the TaxPub extension of the Journal Article Tag Suite (JATS) Document Type Definition (DTD) will guarantee integration of the corpus of future and enhanced publications. The discipline will need new enhancement tools that enable finer-grained structuring of taxonomic descriptions and conversion among schemas. Finally, criteria should be established for the next generation of biosystematics literature to facilitate machine readability, text and data mining, and integration into emerging Semantic Web environments and genomics, and for the infrastructure to manage this information.
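As an illustration of what a first pass of semantic enhancement over legacy text might look like, the following minimal Python sketch wraps known scientific names in an XML element. It is not the project's tooling: the function `mark_up_names`, the `KNOWN_NAMES` lookup, the `urn:example:` identifiers and the `<taxon-name>` element with its `ref` attribute are all hypothetical and are not taken from TaxPub or TaxonX.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lookup of known names -> identifiers. A real system would
# query a name registry instead of using a hard-coded dict.
KNOWN_NAMES = {
    "Apis mellifera": "urn:example:name:1",
    "Bombus terrestris": "urn:example:name:2",
}


def mark_up_names(text: str, names: dict[str, str] = KNOWN_NAMES) -> str:
    """Wrap each known scientific name in an illustrative <taxon-name> element."""
    if not names:
        # Nothing to tag: return the escaped text as a plain paragraph.
        return "<p>" + escape(text) + "</p>"
    # Try longer names first so a full name wins over a shorter overlap.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the output stays well-formed.
        parts.append(escape(text[pos:match.start()]))
        name = match.group(0)
        parts.append(
            f"<taxon-name ref={quoteattr(names[name])}>{escape(name)}</taxon-name>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    print(mark_up_names("Workers of Apis mellifera & Bombus terrestris were seen."))
```

A later pass could add further elements (for example materials citations or bibliographic references) to the same output, which is one way to read the idea of markup as an iterative process.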
We make the following recommendations:
- All biosystematic (= taxonomic) literature needs to be openly accessible to the maximum extent possible. At a minimum, publicly funded institutions should refrain from claiming intellectual property rights in biosystematic information and, in respect of material that is protected by copyright or database rights, they should commit it to the public domain by publishing it under a CC0 or similar license.
- Biosystematics documents should be encoded in an open, platform-independent XML or an equivalent language.
- The semantic elements of XML encoded documents should be cross-mapped to corresponding terms and concepts in external vocabularies.
- Markup conventions should complement existing standards. The following elements should be marked up to the finest degree of granularity possible:
- Scientific taxon names
- Author names
- Georeferenced observations
- Type and voucher materials
- Bibliographic references
- Species traits
- Treatments
- Visual and audio material
- Identification keys
- DNA references
- Markup should be as explicit as possible and openly documented, to improve access to legacy literature and ease its future use.
- Nomenclatural acts and synonymies should be semantically enhanced to improve usability.
- Semantic enhancement should allow progressive markup as an iterative process.
- Funding agencies should support the development of tools for markup of biosystematic documents, especially of names, materials cited, bibliographic references, traits and treatments.
- The community must develop and maintain registries of sources and repositories for semantically enhanced biosystematic publications, treatments and data to ensure visibility of and open persistent access to this corpus of material.
- Stable globally unique identifiers should be used for semantic elements.
- Reference databases must be developed, maintained, and made easily accessible.
- iBiosphere should, at a minimum, export metadata relating to biodiversity data objects to the Linked Open Data Cloud.
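To make the recommendations on stable identifiers and Linked Open Data export more concrete, here is a minimal Python sketch that serializes metadata about one treatment as N-Triples, using Dublin Core terms as predicates. It is only an illustration under stated assumptions: the functions `nt_literal` and `to_ntriples`, the `https://example.org/...` identifier and the sample metadata values are hypothetical, and the report itself does not prescribe this vocabulary or serialization.

```python
# Dublin Core terms namespace, used here as an example external vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def nt_literal(value: str) -> str:
    """Return value as an N-Triples string literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject: str, metadata: dict[str, str]) -> str:
    """Serialize {dcterms term: value} pairs about one subject IRI.

    Values that start with http:// or https:// are written as IRIs,
    everything else as plain string literals (a simplifying assumption).
    """
    lines = []
    for term, value in metadata.items():
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = nt_literal(value)
        lines.append(f"<{subject}> <{DCTERMS}{term}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical treatment identifier and sample metadata.
    print(
        to_ntriples(
            "https://example.org/treatment/1",
            {
                "title": "Treatment of an example species",
                "license": "https://creativecommons.org/publicdomain/zero/1.0/",
            },
        )
    )
```

Keeping the subject a single stable IRI means every exported statement about the treatment can be linked to from other datasets without further coordination.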
Files
proiBiosphere_WP3_Plazi_D3.2.2_VFF_31082013.pdf
Size: 696.5 kB
md5:aa398ab271709dc65d0dadb48d1af947