There is a newer version of this record available.

Dataset Open Access

Subject indexing data of K10plus library union catalog

Jakob Voß

This dataset contains an extract of the K10plus library union catalog with its subject indexing data:

  • kxp-subjects-sample_2021-06-30.dat: a random sample of 10.000 records
  • kxp-subjects_2021-06-30_0{1,2,3,4,5,6,7,8,9}of09.dat: the full data (41.786.820 records), split into files of up to 5.000.000 records each

K10plus is a union catalog of German libraries, run by the library service centers BSZ and VZG since 2019. The catalog contains bibliographic data of the majority of academic libraries in Germany. The core data of K10plus is made available as OpenData via APIs and in the form of database dumps. More information can be found here:

The data is provided in its raw internal format, called PICA+, so that no information is lost during conversion. In particular, the data is given in PICA Normalized Format with one record per line. Each record consists of a list of fields and each field consists of a list of subfields.
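The record structure described above (a list of fields, each with a list of subfields) can be sketched in Python as follows. This is a minimal sketch, not part of the dataset tooling. The separator characters are an assumption that is not stated in this description: U+001F before each subfield, U+001E after each field, and a newline after each record. The function name `parse_record` and the two-field example record are made up for illustration.

```python
from typing import List, Tuple

# Assumed separators of PICA Normalized Format (not stated in the text above):
# each subfield starts with U+001F, each field ends with U+001E,
# and each record ends with a newline.
SUBFIELD_START = "\x1f"
FIELD_END = "\x1e"

Subfield = Tuple[str, str]          # (code, value)
Field = Tuple[str, List[Subfield]]  # (tag with optional occurrence, subfields)


def parse_record(line: str) -> List[Field]:
    """Parse one line of PICA Normalized Format into a list of fields."""
    fields: List[Field] = []
    for raw in line.rstrip("\n").split(FIELD_END):
        if not raw:
            continue  # empty piece after the last field terminator
        # The tag (and optional occurrence) is followed by a single space.
        tag, _, rest = raw.partition(" ")
        subfields = [(sf[0], sf[1:]) for sf in rest.split(SUBFIELD_START) if sf]
        fields.append((tag, subfields))
    return fields


# Hypothetical two-field record, built by hand for illustration.
example = "003@ \x1f0010003231\x1e045E \x1fa12\x1e\n"
for tag, subfields in parse_record(example):
    print(tag, subfields)
```

When reading real files, iterate over the file object line by line rather than calling `str.splitlines()`, because `splitlines()` also breaks lines at U+001E and would cut records apart.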

The data can best be processed with the command line tools pica-rs or picadata. A detailed description of the PICA format and its processing is given in the German textbook Einführung in die Verarbeitung von PICA-Daten.

For visual inspection PICA Normalized Format is best converted into PICA Plain Format (pica-rs command pica print). The following example record contains seven fields:

003@ $0010003231
044K $9106080474$VTsv1$7gnd/4077343-7$3209204761$aSekte
044N $aReligionsgemeinschaft
045E $a12
045F $a291
045Q/01 $9181570408$VTkv$a11.97$jNeue religiöse Bewegungen$jSekten
045R $91270641751$VTkv$7rvk/11410:$3200641751$aBG 9600$jAllgemeines$kTheologie und Religionswissenschaften$kFundamentaltheologie$kKirche und Kirchen$kFreikirchen und Sekten
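To illustrate how one line of the example record breaks down into tag, occurrence and subfields, here is a minimal Python sketch. It assumes that in PICA Plain Format every subfield starts with `$` followed by a one-character code, and that a literal dollar sign inside a value is written as `$$`. The function name `parse_plain_field` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: "$" plus a one-character code starts a subfield,
# and "$$" inside a value stands for a literal dollar sign.
SUBFIELD = re.compile(r"\$(.)((?:[^$]|\$\$)*)")


def parse_plain_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split a PICA Plain line into (tag, occurrence, [(code, value), ...])."""
    head, _, rest = line.partition(" ")
    tag, _, occurrence = head.partition("/")  # occurrence is "" if absent
    subfields = [
        (code, value.replace("$$", "$"))
        for code, value in SUBFIELD.findall(rest)
    ]
    return tag, occurrence, subfields


# Field taken from the example record above.
line = "045Q/01 $9181570408$VTkv$a11.97$jNeue religiöse Bewegungen$jSekten"
tag, occurrence, subfields = parse_plain_field(line)
print(tag, occurrence)
for code, value in subfields:
    print(code, value)
```

Subfield codes can repeat within a field (here `$j` occurs twice), which is why the sketch keeps subfields as an ordered list of pairs instead of a dictionary.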

Each K10plus record is uniquely identified by its record identifier PPN, given in field 003@ subfield $0. The PPN can be used:

The data is limited to records having at least one holding by a library participating in K10plus. Records are provided with “offline expansion” (some subfields have been added automatically to facilitate re-use of the data) and limited to the following fields:

  • 003@ with internal record identifier “PPN” in subfield $0
  • 041A keywords
  • 044. all subject indexing fields starting with 044
  • 045. all subject indexing fields starting with 045
  • 144Z local library keywords
  • 145S local library classification
  • 145Z local library classification

Documentation of the fields can be found at https://format.k10plus.de/k10plushelp.pl?cmd=pplist&katalog=Standard#titel

The current dump contains 41.786.820 records with subject indexing out of 71.429.482 K10plus records in total.

For reference, the dump has been created from a full dump of K10plus with this chain of commands:

pica filter 003@? --reduce 003@.0,044.,045.,144Z,145S,145Z kxp-catalog-full_2021-06-30_*.dat | grep -Pv '^003@..0[0-9X]+.$' > kxp-subjects_2021-06-30.dat
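The `grep -Pv` step appears to drop every record that, after reduction, consists of nothing but its 003@ field, that is, a record without any subject indexing. The following Python sketch reproduces that step with the same regular expression. The function name `has_subject_fields` and the sample lines are made up for illustration, and the separator characters U+001F and U+001E are an assumption about PICA Normalized Format.

```python
import re

# Same pattern as in the grep call: a line that holds only field 003@
# with subfield 0 (the PPN) and nothing else.
ONLY_PPN = re.compile(r"^003@..0[0-9X]+.$")


def has_subject_fields(line: str) -> bool:
    """Return True unless the record consists of the PPN field alone."""
    return ONLY_PPN.match(line.rstrip("\n")) is None


# Hypothetical records built by hand (U+001F starts a subfield,
# U+001E ends a field; both are assumptions).
only_ppn = "003@ \x1f0010003231\x1e\n"
with_subject = "003@ \x1f0010003231\x1e045E \x1fa12\x1e\n"

kept = [line for line in (only_ppn, with_subject) if has_subject_fields(line)]
print(len(kept))
```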

The file was then split into chunks of 5.000.000 records each:

split -l 5000000 --numeric-suffixes=01 kxp-subjects_2021-06-30.dat kxp-subjects_2021-06-30_
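As a quick sanity check of the figures given in this description, the number of chunks and the size of the last chunk follow directly from the record count and the chunk size. The sketch below only uses numbers stated above; the variable names are arbitrary.

```python
import math

TOTAL_RECORDS = 41_786_820  # records with subject indexing (from the text)
ALL_RECORDS = 71_429_482    # K10plus records in total (from the text)
CHUNK_SIZE = 5_000_000      # value passed to split -l

chunks = math.ceil(TOTAL_RECORDS / CHUNK_SIZE)
last_chunk = TOTAL_RECORDS - (chunks - 1) * CHUNK_SIZE
share = TOTAL_RECORDS / ALL_RECORDS

print(chunks)           # 9
print(last_chunk)       # 1786820
print(f"{share:.1%}")   # 58.5%
```

This matches the nine files of the full data, with the ninth file being much smaller than the others.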

Processing examples

Extract CSV file of PPN and RVK-Notation:

pica filter '045R?' kxp-subjects_2021-06-30.dat | pica select '003@$0,045Ra'

Get a list of PPN of records having RVK but not BK:

pica filter '045R? & !045Q/01' kxp-subjects_2021-06-30.dat | pica select '003@$0'
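For readers without pica-rs at hand, the two examples above can be approximated in plain Python. This is a minimal sketch under stated assumptions, not a replacement for the tools: it assumes U+001F before each subfield and U+001E after each field, matches tags literally, and emits one row per 045R $a value, which may differ from the exact output layout of `pica select`. All function names and the sample records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Field = Tuple[str, List[Tuple[str, str]]]


def parse_record(line: str) -> List[Field]:
    # Assumed separators: U+001E ends a field, U+001F starts a subfield.
    fields: List[Field] = []
    for raw in line.rstrip("\n").split("\x1e"):
        if raw:
            tag, _, rest = raw.partition(" ")
            fields.append((tag, [(s[0], s[1:]) for s in rest.split("\x1f") if s]))
    return fields


def values(record: List[Field], tag: str, code: str) -> List[str]:
    """Collect all values of subfield `code` in fields named `tag`."""
    return [v for t, sfs in record if t == tag for c, v in sfs if c == code]


def ppn_rvk_rows(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (PPN, RVK notation) pairs for records having field 045R."""
    for line in lines:
        record = parse_record(line)
        for ppn in values(record, "003@", "0"):
            for notation in values(record, "045R", "a"):
                yield ppn, notation


def rvk_without_bk(lines: Iterable[str]) -> Iterator[str]:
    """Yield PPNs of records having 045R (RVK) but not 045Q/01 (BK)."""
    for line in lines:
        record = parse_record(line)
        tags = {t for t, _ in record}
        if "045R" in tags and "045Q/01" not in tags:
            yield from values(record, "003@", "0")


# Hypothetical sample records built by hand.
SAMPLE = [
    "003@ \x1f0010003231\x1e045Q/01 \x1fa11.97\x1e045R \x1faBG 9600\x1e\n",
    "003@ \x1f0123456789\x1e045R \x1faAB 1000\x1e\n",
    "003@ \x1f0987654321\x1e045E \x1fa12\x1e\n",
]

for row in ppn_rvk_rows(SAMPLE):
    print(",".join(row))
print(list(rvk_without_bk(SAMPLE)))
```

To run this on a real file, pass an open file object instead of `SAMPLE`, for example `open(path, encoding="utf-8", newline="\n")`, so that records are split at newlines only.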

License

https://creativecommons.org/publicdomain/zero/1.0/

Files (12.7 GB)

Name                                Size      MD5
kxp-subjects-sample_2021-06-30.dat  3.0 MB    37d5649b1d79f66a4bfbedc3b18119a4
kxp-subjects_2021-06-30_01of09.dat  1.4 GB    bbbde5307475ba32097b80cf53f2cbd0
kxp-subjects_2021-06-30_02of09.dat  1.5 GB    8ba540ad31c6706a0f4836af36dc3686
kxp-subjects_2021-06-30_03of09.dat  1.4 GB    26836fcb7c48b6372654fa724deaf69e
kxp-subjects_2021-06-30_04of09.dat  1.5 GB    02139b2266033b9ff5420df199e8be2e
kxp-subjects_2021-06-30_05of09.dat  1.5 GB    1c15cfa4b1480dab3bb9355b9ed9ddcd
kxp-subjects_2021-06-30_06of09.dat  1.1 GB    8b0ac6039dcc7660fc2e81cac373765b
kxp-subjects_2021-06-30_07of09.dat  1.6 GB    bcf1bb04d69dc9976d43ad5560aa0b53
kxp-subjects_2021-06-30_08of09.dat  2.3 GB    5c54871135b1716e0d702d66f4108eb2
kxp-subjects_2021-06-30_09of09.dat  356.9 MB  4c07fb066a15b90cea0473296e503913
README.md                           4.2 kB    f3c193159fc2a7fbded3b7b81c52c897
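
Since the files are large, it can be worth checking a download against the MD5 checksums listed above. The following Python sketch computes the checksum in blocks so that a multi-gigabyte file does not have to fit into memory. The function names `md5_of` and `verify` are made up for illustration; the expected value is the one listed for the sample file.

```python
import hashlib


def md5_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare the file's MD5 digest with the expected checksum."""
    return md5_of(path) == expected_md5.lower()


# Checksum of the sample file as listed in the file table.
EXPECTED = {
    "kxp-subjects-sample_2021-06-30.dat": "37d5649b1d79f66a4bfbedc3b18119a4",
}

# Usage, once the file has been downloaded to the current directory:
# for name, checksum in EXPECTED.items():
#     print(name, "OK" if verify(name, checksum) else "MISMATCH")
```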