Published June 30, 2022 | Version v3
Dataset Open

Subject indexing data of K10plus library union catalog

  • 1. Verbundzentrale des GBV (VZG)

Description

This dataset contains an extract of the K10plus library union catalog with its subject indexing data:

  • kxp-subjects_2022-09-30_??of10.dat : the full data (68.051.434 records), split into files of up to 5.000.000 records each

K10plus is a union catalog of German libraries, run by the library service centers BSZ and VZG since 2019. The catalog contains bibliographic data from the majority of academic libraries in Germany. The core data of K10plus is made available as open data via APIs and in the form of database dumps. More information can be found here:

Data format

The data is provided in its raw internal format, called PICA+, so that no information is lost during conversion. Specifically, the data is given in PICA Normalized Format with one record per line. Each record consists of a list of fields, and each field consists of a list of subfields.
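Because each record occupies exactly one line, the number of records in a set of dump files can be checked with standard tools alone. The following is a minimal sketch; the function name count_records is made up for illustration and is not part of pica-rs or picadata.

```shell
# count_records: print the total number of records in the given dump files.
# Relies only on the one-record-per-line property of PICA Normalized Format.
# (Hypothetical helper name, not part of any PICA tool.)
count_records() {
  cat -- "$@" | wc -l | tr -d ' '
}
```

For example, count_records kxp-subjects_2022-09-30_??of10.dat should report the record count stated above if all files were downloaded completely.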

The data is best processed with the command-line tools pica-rs or picadata.

A detailed description of the PICA format and its processing is given in the German textbook Einführung in die Verarbeitung von PICA-Daten.

For visual inspection, PICA Normalized Format is best converted into PICA Plain Format (pica-rs command pica print). The following example record contains nine fields:

003@ $0010003231
013D $9104450460$VTsvz$3209786884$7gnd/4151278-9$aEinführung
044K $9106080474$VTsv1$7gnd/4077343-7$3209204761$aSekte
044N $aReligionsgemeinschaft
045E $a12
045F $a291
045Q/01 $9181570408$VTkv$a11.97$jNeue religiöse Bewegungen$jSekten
045R $91270641751$VTkv$7rvk/11410:$3200641751$aBG 9600$jAllgemeines$NB$JTheologie und Religionswissenschaften$NBG$JFundamentaltheologie$NBG 9020-BG 9790$JKirche und Kirchen$NBG 9600-BG 9720$JFreikirchen und Sekten
045V $a1
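To illustrate how the two formats relate, the following sketch converts PICA Normalized Format into the plain layout shown above using awk. It assumes that each field ends with the record separator character (octal 036) and that each subfield starts with the unit separator character (octal 037) followed by a one-character subfield code. The function name pica_plain is hypothetical, and the sketch does not escape literal $ characters inside values; for real work, pica print or picadata remain the better choice.

```shell
# pica_plain: read PICA Normalized Format on stdin, write a plain-text view.
# Assumptions: fields end with octal 036, subfields start with octal 037
# followed by a one-character code. Literal "$" in values is NOT escaped.
pica_plain() {
  awk -F '\036' '{
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue            # empty piece after the last field
      n = split($i, part, "\037")       # part[1] is the tag plus a space
      line = part[1]
      for (j = 2; j <= n; j++) {
        # first character is the subfield code, the rest is the value
        line = line "$" substr(part[j], 1, 1) substr(part[j], 2)
      }
      print line
    }
    print ""                            # blank line between records
  }'
}
```

A quick look at the first record of a file would then be head -n 1 FILE | pica_plain, where FILE is one of the downloaded .dat files.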

Each K10plus record is uniquely identified by its record identifier PPN, given in field 003@ subfield $0. The PPN can be used:

Scope of the data

The data is limited to records having at least one holding by a library participating in K10plus. Records are provided with “offline expansion” (some subfields have been added automatically to facilitate re-use of the data) and limited to the following fields:

  • 003@ with internal record identifier “PPN” in subfield $0
  • 010@ language 
  • 013D type of content
  • 013E musical type of document
  • 013F target audience
  • 013H additional type of document
  • 041A keywords
  • 044. all subject indexing fields starting with 044
  • 045. all subject indexing fields starting with 045
  • 144Z local library keywords
  • 145S local library classification
  • 145Z local library classification

The following fields may also be of interest but are not included:

  • 017G and 017H URL for catalog enrichment (e.g. table of contents)
  • 047I abstract

Documentation of the fields can be found at https://format.k10plus.de/k10plushelp.pl?cmd=pplist&katalog=Standard#titel
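
To see which of the listed fields actually occur in a file, and how often, field tags can be tallied with awk. This is a sketch under the same assumptions about PICA Normalized Format as stated in the data format section (fields end with octal 036, the tag is followed by a space); the function name field_counts is made up for illustration.

```shell
# field_counts: read PICA Normalized Format on stdin and print "count tag"
# for every field tag (including an occurrence such as /01),
# most frequent first. Hypothetical helper, not part of pica-rs.
field_counts() {
  awk -F '\036' '{
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue                      # empty piece after last field
      tag = substr($i, 1, index($i, " ") - 1)     # text before the first space
      count[tag]++
    }
  }
  END { for (tag in count) print count[tag], tag }' | sort -k1,1nr -k2,2
}
```

Since the tally counts field occurrences rather than records, a repeatable field such as 044K can have a higher count than 003@.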

Processing examples

Extract a CSV file of PPN and RVK notation:

pica filter '045R?' kxp-subjects_2022-06-30.dat | pica select '003@$0,045Ra'

Get a list of PPNs of records that have RVK but not BK:

pica filter '045R? & !045Q/01' kxp-subjects_2022-06-30.dat | pica select '003@$0'
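
Where pica-rs is not installed, the second query can be approximated with awk. This is only a rough sketch, not a replacement for pica filter: it assumes the field and subfield separators of PICA Normalized Format (octal 036 and 037), checks only for the presence of the tags 045R and 045Q/01, and uses the made-up function name rvk_without_bk.

```shell
# rvk_without_bk: read PICA Normalized Format on stdin and print the PPN
# (field 003@, subfield 0) of records with a 045R field but no 045Q/01 field.
# Hypothetical helper that approximates the pica filter call above.
rvk_without_bk() {
  awk -F '\036' '{
    ppn = ""; rvk = 0; bk = 0
    for (i = 1; i <= NF; i++) {
      tag = substr($i, 1, index($i, " ") - 1)
      if (tag == "045R") rvk = 1
      else if (tag == "045Q/01") bk = 1
      else if (tag == "003@") {
        n = split($i, part, "\037")
        for (j = 2; j <= n; j++)
          if (substr(part[j], 1, 1) == "0") ppn = substr(part[j], 2)
      }
    }
    if (rvk && !bk && ppn != "") print ppn
  }'
}
```

It reads from standard input, so all parts of the dump can be piped through it at once, for example with cat kxp-subjects_2022-09-30_??of10.dat | rvk_without_bk.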

See https://github.com/gbv/k10plus-subjects#readme for additional examples of data analysis.

Automatic download

Given the Zenodo record ID in the shell variable ID (e.g. ID=6810556), a list of all files with their download URLs can be generated with curl and jq:

curl -sL https://zenodo.org/api/records/$ID | jq -r '.files|map([.key,.links.self]|@tsv)[]'
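
The tab-separated list produced by this command can be fed into a small loop that fetches every file. The following is a minimal sketch, assuming each input line has the form "filename, tab, URL" as emitted by the jq expression above; the function name download_all is hypothetical, and no checksum verification or retry logic is included.

```shell
# download_all: read "filename<TAB>url" lines on stdin and download each
# file into the current directory. Hypothetical helper; it does not verify
# checksums or resume interrupted downloads.
download_all() {
  while IFS="$(printf '\t')" read -r name url; do
    [ -n "$name" ] || continue                  # skip empty lines
    curl -sL -o "$name" "$url" </dev/null       # keep curl away from the list on stdin
  done
}
```

To use it, append | download_all to the curl and jq pipeline above. The md5 checksums listed with the files can afterwards be compared against the output of a tool such as md5sum.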

Changes

  • 2022-09-30: update with additional fields 010@, 013E, 013H, 014A (68.051.434 records)
  • 2022-06-30: update with additional fields 013D and 013F (47.686.064 records)
  • 2021-06-30: first published dump (41.786.820 records)

License

https://creativecommons.org/publicdomain/zero/1.0/

Files (17.9 GB)

  • md5:333997605fc233925b7985f820802b96 (1.5 GB)
  • md5:3e321552b6503a3111fc365ea03d0a3c (2.0 GB)
  • md5:9207f8826cb8641eaa17024e70076167 (1.9 GB)
  • md5:2554472d2d36ad73a6804b379d6ff9c4 (465.9 MB)
  • md5:55f1c529c9f6dc749c3e79da2f340bfc (1.7 GB)
  • md5:91299508e41cdbcb4960be6ac7ea4b1f (1.3 GB)
  • md5:7541ed6dbec497bbd376eda770890ed2 (1.5 GB)
  • md5:dedc512ab59a86240e7171bc3eb451b5 (1.1 GB)
  • md5:969eb39c1fa8a011297fdded8a9c2b54 (1.3 GB)
  • md5:fd4eb1932fa4aca56e010deb318addb1 (1.3 GB)
  • md5:4a91a64d4d83375c9bb0d444aa4d1b0f (1.2 GB)
  • md5:da988620601700b5e1eeb4c91d2e92bf (946.5 MB)
  • md5:9a2e999e5880d5ed9141251edae2b508 (1.1 GB)
  • md5:3efe94855725338dbc6f47cfddafa7f8 (691.2 MB)
  • md5:35ab4a2603ae2a5d6698922bbab3a2b9 (2.7 kB)