Published January 11, 2024 | Version v8
Dataset Open

Normalized subject indexing data of K10plus library union catalog

  • 1. Verbundzentrale des GBV (VZG)

Description

This dataset contains normalized subject indexing data from the K10plus library union catalog. It links bibliographic records in K10plus to concepts (subjects or classes) from controlled vocabularies. The dataset consists of the following files:

  • kxp-subjects.tsv.gz: TSV format
  • kxp-subjects.nt.gz: RDF format (serialized as N-Triples)
  • vocabularies.json: information about vocabularies
  • stats.json: statistics (number of records, subjects per vocabulary etc.)

The dataset is based on a K10plus database dump from 2023-12-09.

K10plus

K10plus is a union catalog of German libraries, run by the library service centers BSZ and VZG since 2019. The catalog contains bibliographic data from the majority of academic libraries in Germany. Each bibliographic record in K10plus is uniquely identified by a PPN identifier.

Several APIs can retrieve more data for a record via its PPN. For example, link to the record in the K10plus OPAC:

https://opac.k10plus.de/PPNSET?PPN={PPN}

Retrieve the full record in MARC/XML format:

https://unapi.k10plus.de/?format=marcxml&id=opac-de-627:ppn:{PPN}

Get a formatted citation for display:

https://ws.gbv.de/suggest/csl2?citationstyle=ieee&language=en&database=opac-de-627&query=pica.ppn={PPN}
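As a minimal sketch of how these three URL patterns could be filled in for a given record, the following Python snippet builds the links for the example PPN 010000011. It assumes that the PPN placeholder in each pattern is simply replaced by the identifier; the helper name `record_urls` is hypothetical, and no request is sent.

```python
from urllib.parse import quote

# URL patterns taken from the description; {ppn} is replaced per record.
OPAC_URL = "https://opac.k10plus.de/PPNSET?PPN={ppn}"
MARCXML_URL = "https://unapi.k10plus.de/?format=marcxml&id=opac-de-627:ppn:{ppn}"
CITATION_URL = (
    "https://ws.gbv.de/suggest/csl2?citationstyle=ieee&language=en"
    "&database=opac-de-627&query=pica.ppn={ppn}"
)


def record_urls(ppn: str) -> dict[str, str]:
    """Return the lookup URLs for one PPN (hypothetical helper)."""
    safe_ppn = quote(ppn, safe="")  # defensive; PPNs are plain alphanumerics
    return {
        "opac": OPAC_URL.format(ppn=safe_ppn),
        "marcxml": MARCXML_URL.format(ppn=safe_ppn),
        "citation": CITATION_URL.format(ppn=safe_ppn),
    }


urls = record_urls("010000011")
for name, url in urls.items():
    print(name, url)
```

The resulting strings can be passed to any HTTP client; response formats are not covered by this sketch.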

APIs to look up more data for a notation or identifier of a vocabulary are listed at https://bartoc.org/. For instance, BK class 58.55 can be retrieved via the DANTE API:

https://api.dante.gbv.de/data?uri=http%3A%2F%2Furi.gbv.de%2Fterminology%2Fbk%2F58.55

See vocabularies.json for the mapping of each vocabulary symbol to its BARTOC URI and for additional information.
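The DANTE lookup URL for BK class 58.55 is the concept URI passed as a percent-encoded `uri` parameter. A small Python sketch of that construction follows; the BK namespace is inferred from the example URL, and the helper name `dante_lookup_url` is hypothetical.

```python
from urllib.parse import quote

# Endpoint and BK namespace as they appear in the example URL.
DANTE_ENDPOINT = "https://api.dante.gbv.de/data"
BK_NAMESPACE = "http://uri.gbv.de/terminology/bk/"


def dante_lookup_url(concept_uri: str) -> str:
    """Build a DANTE lookup URL for a concept URI (hypothetical helper)."""
    # safe="" also encodes ':' and '/', as in the example URL.
    return f"{DANTE_ENDPOINT}?uri={quote(concept_uri, safe='')}"


url = dante_lookup_url(BK_NAMESPACE + "58.55")
print(url)
```

Whether other vocabularies are served by the same endpoint is not stated here; BARTOC lists the API for each vocabulary.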

Statistics

See stats.json for the number of records, links, triples and subjects per vocabulary.

TSV

The .tsv file contains three tab-separated columns:

  1. Bibliographic record identifier (PPN)
  2. Vocabulary symbol
  3. Notation or identifier in the vocabulary

An example:

010000011  bk  58.55
010000011  gnd 4036582-7

Record 010000011 is indexed with class 58.55 from the Basic Classification (BK) and with authority record 4036582-7 from the Integrated Authority File (GND).
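A minimal Python sketch for reading this three-column layout is shown below. It runs on the two example lines; the function names are hypothetical, and the commented-out part assumes the compressed file is UTF-8 encoded and stored locally as kxp-subjects.tsv.gz.

```python
import csv
import io
from collections import defaultdict


def read_subjects(lines):
    """Yield (ppn, vocabulary, notation) tuples from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 3:  # skip blank or malformed lines
            yield tuple(row)


def group_by_record(rows):
    """Collect all (vocabulary, notation) pairs per PPN."""
    grouped = defaultdict(list)
    for ppn, vocabulary, notation in rows:
        grouped[ppn].append((vocabulary, notation))
    return dict(grouped)


# The two example lines from above, with real tab characters.
sample = io.StringIO("010000011\tbk\t58.55\n010000011\tgnd\t4036582-7\n")
subjects = group_by_record(read_subjects(sample))
print(subjects)

# For the full file, stream rows instead of grouping everything in memory:
# import gzip
# with gzip.open("kxp-subjects.tsv.gz", "rt", encoding="utf-8", newline="") as f:
#     for ppn, vocabulary, notation in read_subjects(f):
#         ...
```

Grouping is only practical for small subsets; the full file is several hundred megabytes compressed, so row-by-row processing is the safer default.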

RDF

The N-Triples file contains the same information as the TSV file, but identifiers are mapped to URIs. An example:

<http://uri.gbv.de/document/opac-de-627:ppn:010000011> <http://purl.org/dc/terms/subject> <http://d-nb.info/gnd/4036582-7> .
<http://uri.gbv.de/document/opac-de-627:ppn:010000011> <http://purl.org/dc/terms/subject> <http://uri.gbv.de/terminology/bk/58.55> .
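The mapping from a TSV row to a triple can be sketched in Python as follows. Only the two URI namespaces visible in the example (bk and gnd) are included; namespaces for other vocabularies are not given here, so the sketch returns `None` for them. This is an illustration, not the script used to build the dataset, and the names are hypothetical.

```python
# Namespaces inferred from the example triples; other vocabularies are omitted.
NAMESPACES = {
    "bk": "http://uri.gbv.de/terminology/bk/",
    "gnd": "http://d-nb.info/gnd/",
}
RECORD_PREFIX = "http://uri.gbv.de/document/opac-de-627:ppn:"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def to_ntriple(ppn, vocabulary, notation):
    """Return one N-Triples line, or None if the vocabulary is unknown."""
    namespace = NAMESPACES.get(vocabulary)
    if namespace is None:
        return None
    # Assumes the notation needs no escaping, as in the bk and gnd examples.
    return f"<{RECORD_PREFIX}{ppn}> <{DCT_SUBJECT}> <{namespace}{notation}> ."


triples = [
    to_ntriple("010000011", "gnd", "4036582-7"),
    to_ntriple("010000011", "bk", "58.55"),
]
print("\n".join(triples))
```

Notations containing spaces or other characters that are not allowed in URIs would need escaping, which this sketch does not attempt.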

Changelog

  • 2024-01-11: New dump from end of 2023. Added fivs and fivr classifications.
  • 2023-11-01: New dump from end of September.
  • 2023-05-07: New dump. Number of records slightly reduced because K10plus cleaned up duplicate records.
  • 2023-04-13: New dump, added stats.json
  • 2023-01-20: New dump
  • 2022-09-11: New dump, fixed PPN URIs and broken UTF-8 encoding
  • 2022-08-24: Fixed GND URIs, added LCC and KAB (https://doi.org/10.5281/zenodo.7018350)
  • 2022-08-24: First version (https://doi.org/10.5281/zenodo.7016626)

License and provenance

All data is in the public domain, but references are welcome. See https://coli-conc.gbv.de/ for related projects and documentation.

This dataset was created with public scripts from the git repository https://github.com/gbv/k10plus-subjects.

Files (1.4 GB)

Size       Checksum
896.5 MB   md5:2ae76b66d4e00be5dd145d8cec9aa7b7
516.2 MB   md5:068ae55fa09f1c6b4cf3d213b278434b
370 Bytes  md5:88c1357249d6933d5747263fc6354a39
5.6 kB     md5:44eef39866c1c3b0e1f966e0cbc983ba

Additional details

Related works

Is derived from
Software: https://github.com/gbv/k10plus-subjects (URL)