Dataset Open Access
The data set is based on the main catalog of the library. Currently, the following fields are extracted:
The extract was created with the processPicaPlus script available here. Note: some special characters may not have been extracted correctly in versions < 1.0.0.
Change Log:
0.2.0
fixed various encoding issues for non-ASCII characters
0.3.0
added year of publication; added separate files for the languages rus, pol, rum, cze, slo, and gre (these files have minor encoding issues)
Dataset Characteristics
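The following languages are available in separate data files:
cze*, dan, dut, eng, fre, fry, ger, gre*, ice, ita, lat, nor, pol*, por, rum*, rus*, slo*, spa, swe
*: Language file might be subject to character encoding issues.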
The other languages are present in the data set but have not been separated, i.e., they are combined in one data file:
'fre', 'rus', 'pol', 'ger', 'eng', 'lit', 'dan', 'dut', 'spa', 'swe', 'ita', 'lat', 'nor', 'ind', 'bul', 'grc', 'fry', 'rum', 'cze', 'slo', 'bel', 'ice', 'fin', 'gre', 'hun', 'tur', 'enm', 'hrv', 'est', 'srp', 'roh', 'syr', 'wen', 'mal', 'afr', 'slv', 'mac', 'smi', 'nds', 'qmw', 'pra', 'oci', 'bre', 'san', 'alb', 'baq', 'non', 'ara', 'chm', 'per', 'cat', 'gmh', 'sla', 'arm', 'ukr', 'por', 'chu', 'heb', 'arc', 'gle', 'tib', 'lav', 'geo', 'crp', 'hin', 'mul', 'chi', 'epo', 'kor', 'kan', 'vot', 'csb', 'glg', 'kaz', 'frm', 'jpn', 'bur', 'srd', 'sal', 'ira', 'bos', 'mol', 'rom', 'tat', 'aze', 'yid', 'mar', 'mak', 'pli', 'rys', 'tgk', 'map', 'vie', 'tuk', 'oss', 'ota', 'tut', 'ben', 'sun', 'tir', 'bak', 'chv', 'ber', 'khm', 'may', 'pan', 'uzb', 'swa', 'kir', 'egy', 'dum', 'nep', 'cop', 'mon', 'tam', 'urd', 'zxx', 'wel', 'mis', 'ng', 'goh', 'dt', 'en', 'fao', 'fro', 'pus', 'kur', 'cus', 'hau', 'uig', 'sit', 'dt.', 'cpf', 'tgl', 'qoj', 'tag', 'raj', 'fiu', 'xal', 'kbd', 'udm', 'scr', 'gag', 'kas', 'scc', 'pro', 'tha', 'dar', 'dr', 'sna', 'ewe', 'de', 'dra', 'ang', 'ine', 'zza', 'und', 'ave', 'amh', 'crh', 'jav', 'cpe', 'akk', 'dsb', 'qce', 'guj', 'ltz', 'got', 'bua', 'peo', 'mdr', 'nob', 'ava', 'che', 'sux', 'kok', 'zap', 'nl', 'inc', 'sah', 'gem', 'law', 'bem', 'sin', 'qdo', 'hsb', 'som', 'lao', 'kam', 'kom', 'abk', 'roa', 'cau', 'ady', 'bat', 'mlt', 'sai', 'xho', 'paa', 'sot', 'bnt', 'lug', 'myn', 'kar', 'qhe', 'kin', 'zul', 'tsn', 'apa', 'nso', 'yao', 'yor', 'bih', 'nog', 'nap', 'loz', 'nbl', 'kon', 'nya', 'snh', 'chn', 'run', 'suk', 'fur', 'osa', 'bra', 'den', 'kpe', 'kal', 'tig', 'wol', 'gla', 'lad', 'mos', 'cre', 'krc', 'ge', 'fr', 'dak', 'fij', 'mad', 'srr', 'kum', 'her', 'nai', 'cel', 'inh', 'kro', 'hit', 'pal', 'tmh', 'tsw', 'bam', 'kab', 'kik', 'kua', 'lub', 'luo', 'nub', 'tem', 'znd', 'mai', 'tai', 'qkr', 'ful', 'man', 'lol', 'sag', 'tog', 'hai', 'arg', 'fat', 'nav', 'niu', 'ibo', 'ido', 'men', 'qju', 'gaa', 'vol', 'nah', 'mlg', 'nic', 'ijo', 'sus', 'orm', 
'smo', 'mag', 'tyv', 'mnc', 'cos', 'mdf', 'kaa', 'dua', 'gez', 'ton', 'ven', 'snd', 'syc', 'nym', 'nia', 'sem', 'chg', 'fan', 'twi', 'mas', 'ina', 'ile', 'art', 'ori', 'qai', 'arw', 'mao', 'bas', 'kmb', 'tiv', 'bal', 'tar', 'tpi', 'abs', 'asm', 'qqa', 'iku', 'min', 'rup', 'tel', 'or', 'tah', 'aka', 'day', 'qqg', 'lah', 'lus', 'sio', 'oto', 'alg', 'shn', 'ndo', 'haw', 'tso', 'mus', 'cai', 'qev', 'new', 'zha', 'grn', 'khi', 'ssw', 'nde', 'bla', 'grb', 'mun', 'din', 'sam', 'mwr', 'cor', 'sat', 'cho', 'ger,', 'que', 'btk', 'glv', 'rar', 'jk', 'nno', 'cmc', 'mga', 'jw', 'iro', 'sog', 'hat', 'dzo', 'mkh', 'bik', 'ban', 'ilo', 'pam', 'ts', 'sme', 'myv', 'qnn', 'jpr', 'qte', 'yap', 'bis', 'sga', 'qkj', 'pap', 'ath', 'ipk', 'phi', 'sco', 'del', 'moh', 'iri', 'gae', 'ryl', 'our', 't--', 'grk', 'ssa', 'awa', 'efi', 'jrb', 'enk', 'kru', 'oji', 'arn', 'car', 'gsw', 'lez', 'war', 'ace', 'qrn', 'wln', 'ceb', 'aar', 'bug', 'kaw', 'chr', 'cpp', 'tet', 'aym', 'ces', 'hmo'
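The list above mixes regular three-letter codes with irregular entries such as 'dt.', 'ger,', 'de', and 't--'. The following is a minimal sketch of how the two groups could be separated before further processing. The function name `split_codes` and the rule "exactly three lowercase ASCII letters" are assumptions made for illustration; they are not part of the processPicaPlus script, and matching the pattern does not guarantee that a code is an officially assigned language code.

```python
import re

# Assumption: a "regular" code is exactly three lowercase ASCII letters.
# This is a shape check only, not a lookup against a language code registry.
REGULAR_CODE = re.compile(r"[a-z]{3}")


def split_codes(codes):
    """Partition codes into (regular, irregular) lists, preserving order."""
    regular = []
    irregular = []
    for code in codes:
        if REGULAR_CODE.fullmatch(code):
            regular.append(code)
        else:
            irregular.append(code)
    return regular, irregular


# A few entries copied from the list above.
sample = ["ger", "dt.", "de", "ger,", "t--", "eng", "en"]
regular, irregular = split_codes(sample)
print(regular)    # ['ger', 'eng']
print(irregular)  # ['dt.', 'de', 'ger,', 't--', 'en']
```

Whether an irregular entry such as 'dt.' or 'de' should be mapped onto 'ger' is a curation decision that the list itself does not settle, so the sketch only flags such entries.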
Name | MD5 | Size |
---|---|---|
cze_out.txt | 9732d372f8ba9516dc64d988b7311152 | 7.0 MB |
dan_out.txt | e582e959f5f85795ee2e968226491e2d | 3.6 MB |
dut_out.txt | fc6785edcb4c3e4d8977a7d21c87001a | 14.2 MB |
eng_out.txt | 77574c7d25fe0f202fc104528485d5a2 | 328.2 MB |
fre_out.txt | 5c54395a842deff86d2cecf8238ef259 | 75.5 MB |
fry_out.txt | 999f2ab3f52c5d249568c2f20d88471e | 67.5 kB |
ger_out.txt | fcdee9d86f2702a06cc912557aa09a9d | 490.4 MB |
gre_out.txt | 7b85ff65d905369b76b01812d6f588de | 1.2 MB |
ice_out.txt | 4dae71c7fabbbdc8e7d17b9f8c9e9f36 | 189.9 kB |
ita_out.txt | 8b57f0be40288cb0008a1871ef23b97c | 30.1 MB |
lat_out.txt | bf4f5a62fc94384a8162badda0301b39 | 65.9 MB |
nor_out.txt | 02fea1cb9c8877e66c8295604a86f5c8 | 2.0 MB |
out.txt | 34622f9d23f1550c5546c390c80f51b8 | 813.4 MB |
pol_out.txt | 7da7f8dbb963d5429d1da677714d252a | 13.5 MB |
por_out.txt | b980f399670bac0d7fea24e5995d4eb3 | 1.6 MB |
rum_out.txt | 44a4cc16658cd631a164591167c63603 | 1.7 MB |
rus_out.txt | ee805132b692b9d11226614ed85c3c18 | 39.0 MB |
slo_out.txt | 56443d3eea99bd2d6db36f02ded6efb8 | 1.8 MB |
spa_out.txt | e30af9a14d1e04ed4899c2039f628674 | 7.9 MB |
statistics.txt | 9e55ba7ddf93fe050d138ddefe210426 | 6.5 kB |
swe_out.txt | b3f99c3795809b0e34e6ae4b6545c7d5 | 5.8 MB |
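Each file is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the sketch assumes the file has already been saved locally. Files are read in chunks so that the larger ones (several hundred MB) do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("statistics.txt", "9e55ba7ddf93fe050d138ddefe210426")` should return `True` for an intact copy of that file, assuming it was saved under its original name in the working directory.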
Metric | All versions | This version |
---|---|---|
Views | 602 | 451 |
Downloads | 31,823 | 12,535 |
Data volume | 5.5 TB | 1.6 TB |
Unique views | 512 | 391 |
Unique downloads | 28,002 | 11,432 |