Other Open Access

German Word Embeddings for ShiCo based on historic newspapers

Martin Riedl

We provide word embedding models computed on historic German newspapers. Each model is computed over a time span of 10 years and can be used with ShiCo, a visualization tool for word embeddings. Models are available for three corpora, each with a link to a ShiCo demo:

  • SBB (Berlin State Library): Newspaper collection from Germany from 1872 to 1912 (demo).
  • Chronicling America: German-language newspaper pages published in the United States from 1840 to 1908 (demo).
  • Europeana: German-language newspapers published in Europe from 1840 to 1912 (demo).

Each model requires a configuration for both the frontend and the backend (see config.shico.tar.gz). To set up a ShiCo instance, either follow the description on the ShiCo GitHub page or follow the instructions for running ShiCo with Docker in the README.docker.txt file.

Files (42.7 GB)
Name                                          Size       MD5
config.shico.tar.gz                           636 Bytes  c3954dab3d0dc85261dba9001d9ee1bd
README.docker.txt                             2.7 kB     2fc14eedc99952fafe6421ffcb2dcfd4
shico_embeddings_chronicling_america.tar.gz   5.5 GB     2ceb2f6a9611d70aeaf0f44f70ca530f
shico_embeddings_europeana.tar.gz             24.2 GB    905a13a06f87e5c58e008d9860b057d8
shico_embeddings_sbb.tar.gz                   13.0 GB    0677b7cb214fbc5a90e58d8c5f7f59fe
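
Because the archives are large, it can be worth checking each download against the MD5 checksums listed above before unpacking. The following is a minimal sketch, not part of the published instructions: the function names md5_of_file and verify_downloads are hypothetical, and it assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "config.shico.tar.gz": "c3954dab3d0dc85261dba9001d9ee1bd",
    "README.docker.txt": "2fc14eedc99952fafe6421ffcb2dcfd4",
    "shico_embeddings_chronicling_america.tar.gz": "2ceb2f6a9611d70aeaf0f44f70ca530f",
    "shico_embeddings_europeana.tar.gz": "905a13a06f87e5c58e008d9860b057d8",
    "shico_embeddings_sbb.tar.gz": "0677b7cb214fbc5a90e58d8c5f7f59fe",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (checksum matches),
    False (checksum differs), or None (file not found in directory)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Run from the download directory, the script prints one status line per file; a MISMATCH usually indicates an incomplete or corrupted download.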