Extract from the Berlin State Library's Main Catalog
Description
The data set is based on the main catalog of the library. Currently, the following fields are extracted:
- title
- author (+ optional GND ID)
- publisher
- place of publication
- country of publication
- year of publication
The extract was created with the processPicaPlus script. Note that some special characters may not have been extracted correctly in versions earlier than 1.0.0.
Change Log:
- 0.2.0: fixed various encoding issues for non-ASCII characters
- 0.3.0: added year of publication; added separate files for the languages rus, pol, rum, cze, slo, and gre (with minor encoding issues)
Dataset Characteristics
The following languages are available in separate data files:
- eng
- ger
- lat
- fre
- ita
- spa
- por
- dut
- swe
- dan
- nor
- ice
- fry
- rus*
- pol*
- rum*
- cze*
- slo*
- gre*
*: Language file might be subject to character encoding issues.
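
Because the starred files may contain encoding damage, a quick scan can help estimate how many lines are affected. The following is a minimal sketch and not part of the processPicaPlus tooling. It treats the Unicode replacement character (U+FFFD) and the C1 control characters (U+0080 to U+009F) as signs of damage, which is a heuristic assumption. The function names are hypothetical, and the files are assumed to be UTF-8 text.

```python
def is_suspect(line):
    """Heuristic: flag lines containing U+FFFD or a C1 control character."""
    return any(ch == "\ufffd" or "\x80" <= ch <= "\x9f" for ch in line)


def count_suspect_lines(lines):
    """Return (total_lines, suspect_lines) for an iterable of strings."""
    total = 0
    suspect = 0
    for line in lines:
        total += 1
        if is_suspect(line):
            suspect += 1
    return total, suspect


def scan_file(path, encoding="utf-8"):
    """Scan a text file line by line.

    Bytes that cannot be decoded are replaced with U+FFFD by the
    errors="replace" handler, so they are counted as suspect too.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return count_suspect_lines(handle)
```

Calling `scan_file` on a local copy of one of the starred language files returns the total number of lines and the number of lines that look damaged. A low ratio suggests the "minor encoding issues" are limited to a few records, but the heuristic cannot detect every kind of damage.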
The other languages are present in the data set but have not been separated, i.e., they are combined in one data file:
'fre', 'rus', 'pol', 'ger', 'eng', 'lit', 'dan', 'dut', 'spa', 'swe', 'ita', 'lat', 'nor', 'ind', 'bul', 'grc', 'fry', 'rum', 'cze', 'slo', 'bel', 'ice', 'fin', 'gre', 'hun', 'tur', 'enm', 'hrv', 'est', 'srp', 'roh', 'syr', 'wen', 'mal', 'afr', 'slv', 'mac', 'smi', 'nds', 'qmw', 'pra', 'oci', 'bre', 'san', 'alb', 'baq', 'non', 'ara', 'chm', 'per', 'cat', 'gmh', 'sla', 'arm', 'ukr', 'por', 'chu', 'heb', 'arc', 'gle', 'tib', 'lav', 'geo', 'crp', 'hin', 'mul', 'chi', 'epo', 'kor', 'kan', 'vot', 'csb', 'glg', 'kaz', 'frm', 'jpn', 'bur', 'srd', 'sal', 'ira', 'bos', 'mol', 'rom', 'tat', 'aze', 'yid', 'mar', 'mak', 'pli', 'rys', 'tgk', 'map', 'vie', 'tuk', 'oss', 'ota', 'tut', 'ben', 'sun', 'tir', 'bak', 'chv', 'ber', 'khm', 'may', 'pan', 'uzb', 'swa', 'kir', 'egy', 'dum', 'nep', 'cop', 'mon', 'tam', 'urd', 'zxx', 'wel', 'mis', 'ng', 'goh', 'dt', 'en', 'fao', 'fro', 'pus', 'kur', 'cus', 'hau', 'uig', 'sit', 'dt.', 'cpf', 'tgl', 'qoj', 'tag', 'raj', 'fiu', 'xal', 'kbd', 'udm', 'scr', 'gag', 'kas', 'scc', 'pro', 'tha', 'dar', 'dr', 'sna', 'ewe', 'de', 'dra', 'ang', 'ine', 'zza', 'und', 'ave', 'amh', 'crh', 'jav', 'cpe', 'akk', 'dsb', 'qce', 'guj', 'ltz', 'got', 'bua', 'peo', 'mdr', 'nob', 'ava', 'che', 'sux', 'kok', 'zap', 'nl', 'inc', 'sah', 'gem', 'law', 'bem', 'sin', 'qdo', 'hsb', 'som', 'lao', 'kam', 'kom', 'abk', 'roa', 'cau', 'ady', 'bat', 'mlt', 'sai', 'xho', 'paa', 'sot', 'bnt', 'lug', 'myn', 'kar', 'qhe', 'kin', 'zul', 'tsn', 'apa', 'nso', 'yao', 'yor', 'bih', 'nog', 'nap', 'loz', 'nbl', 'kon', 'nya', 'snh', 'chn', 'run', 'suk', 'fur', 'osa', 'bra', 'den', 'kpe', 'kal', 'tig', 'wol', 'gla', 'lad', 'mos', 'cre', 'krc', 'ge', 'fr', 'dak', 'fij', 'mad', 'srr', 'kum', 'her', 'nai', 'cel', 'inh', 'kro', 'hit', 'pal', 'tmh', 'tsw', 'bam', 'kab', 'kik', 'kua', 'lub', 'luo', 'nub', 'tem', 'znd', 'mai', 'tai', 'qkr', 'ful', 'man', 'lol', 'sag', 'tog', 'hai', 'arg', 'fat', 'nav', 'niu', 'ibo', 'ido', 'men', 'qju', 'gaa', 'vol', 'nah', 'mlg', 'nic', 'ijo', 'sus', 'orm', 
'smo', 'mag', 'tyv', 'mnc', 'cos', 'mdf', 'kaa', 'dua', 'gez', 'ton', 'ven', 'snd', 'syc', 'nym', 'nia', 'sem', 'chg', 'fan', 'twi', 'mas', 'ina', 'ile', 'art', 'ori', 'qai', 'arw', 'mao', 'bas', 'kmb', 'tiv', 'bal', 'tar', 'tpi', 'abs', 'asm', 'qqa', 'iku', 'min', 'rup', 'tel', 'or', 'tah', 'aka', 'day', 'qqg', 'lah', 'lus', 'sio', 'oto', 'alg', 'shn', 'ndo', 'haw', 'tso', 'mus', 'cai', 'qev', 'new', 'zha', 'grn', 'khi', 'ssw', 'nde', 'bla', 'grb', 'mun', 'din', 'sam', 'mwr', 'cor', 'sat', 'cho', 'ger,', 'que', 'btk', 'glv', 'rar', 'jk', 'nno', 'cmc', 'mga', 'jw', 'iro', 'sog', 'hat', 'dzo', 'mkh', 'bik', 'ban', 'ilo', 'pam', 'ts', 'sme', 'myv', 'qnn', 'jpr', 'qte', 'yap', 'bis', 'sga', 'qkj', 'pap', 'ath', 'ipk', 'phi', 'sco', 'del', 'moh', 'iri', 'gae', 'ryl', 'our', 't--', 'grk', 'ssa', 'awa', 'efi', 'jrb', 'enk', 'kru', 'oji', 'arn', 'car', 'gsw', 'lez', 'war', 'ace', 'qrn', 'wln', 'ceb', 'aar', 'bug', 'kaw', 'chr', 'cpp', 'tet', 'aym', 'ces', 'hmo'
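
The list above contains entries that are not clean three-letter codes, for example 'ger,', 'dt.', 't--', 'de', and 'ng'. A minimal sketch of how such values could be normalized before grouping records by language is shown below. The function names are hypothetical. The alias table only covers the two-letter ISO 639-1 codes whose three-letter counterparts already appear in the list; how to treat the remaining short codes (such as 'dt' or 'ng') is left open, because their intended meaning is not stated in the description.

```python
import re

# Two-letter ISO 639-1 codes mapped to the three-letter codes used in the list.
# This table is an illustrative assumption; extend it only after checking the records.
ALIASES = {"de": "ger", "en": "eng", "fr": "fre", "nl": "dut"}


def normalize_code(raw):
    """Lowercase the value, strip surrounding whitespace and punctuation, apply ALIASES."""
    code = raw.strip().lower().strip(".,;- ")
    return ALIASES.get(code, code)


def is_well_formed(code):
    """Return True if the code consists of exactly three ASCII lowercase letters."""
    return re.fullmatch(r"[a-z]{3}", code) is not None


def split_codes(raw_codes):
    """Partition raw values into (normalized well-formed codes, values needing review)."""
    clean, review = set(), set()
    for raw in raw_codes:
        code = normalize_code(raw)
        if is_well_formed(code):
            clean.add(code)
        else:
            review.add(raw)
    return clean, review
```

With this sketch, `split_codes(["ger,", "de", "dt.", "t--"])` puts 'ger' into the clean set and leaves 'dt.' and 't--' for manual review. A three-letter shape does not guarantee a valid language code, so the clean set may still need checking against an official code list.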
Files (1.9 GB)
Checksum | Size |
---|---|
md5:9732d372f8ba9516dc64d988b7311152 | 7.0 MB |
md5:e582e959f5f85795ee2e968226491e2d | 3.6 MB |
md5:fc6785edcb4c3e4d8977a7d21c87001a | 14.2 MB |
md5:77574c7d25fe0f202fc104528485d5a2 | 328.2 MB |
md5:5c54395a842deff86d2cecf8238ef259 | 75.5 MB |
md5:999f2ab3f52c5d249568c2f20d88471e | 67.5 kB |
md5:fcdee9d86f2702a06cc912557aa09a9d | 490.4 MB |
md5:7b85ff65d905369b76b01812d6f588de | 1.2 MB |
md5:4dae71c7fabbbdc8e7d17b9f8c9e9f36 | 189.9 kB |
md5:8b57f0be40288cb0008a1871ef23b97c | 30.1 MB |
md5:bf4f5a62fc94384a8162badda0301b39 | 65.9 MB |
md5:02fea1cb9c8877e66c8295604a86f5c8 | 2.0 MB |
md5:34622f9d23f1550c5546c390c80f51b8 | 813.4 MB |
md5:7da7f8dbb963d5429d1da677714d252a | 13.5 MB |
md5:b980f399670bac0d7fea24e5995d4eb3 | 1.6 MB |
md5:44a4cc16658cd631a164591167c63603 | 1.7 MB |
md5:ee805132b692b9d11226614ed85c3c18 | 39.0 MB |
md5:56443d3eea99bd2d6db36f02ded6efb8 | 1.8 MB |
md5:e30af9a14d1e04ed4899c2039f628674 | 7.9 MB |
md5:9e55ba7ddf93fe050d138ddefe210426 | 6.5 kB |
md5:b3f99c3795809b0e34e6ae4b6545c7d5 | 5.8 MB |
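
The MD5 checksums listed for the files can be used to confirm that a download is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and which checksum belongs to which local file has to be taken from the download page, since the file names are not listed here.

```python
import hashlib


def md5_of_stream(handle, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary file-like object, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Compute the MD5 hex digest of a file on disk without loading it fully into memory."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches(actual_hex, expected):
    """Compare a digest with an expected value given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.strip().lower().split(":")[-1]
    return actual_hex.lower() == expected_hex
```

For a downloaded file, `matches(md5_of_file(path), "md5:...")` with the checksum copied from the list should return `True`. Reading in chunks keeps memory use small even for the largest file (813.4 MB).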