Conference paper Open Access

Word Clustering for Historical Newspapers Analysis

Lidia Pivovarova; Jani Marjanen; Elaine Zosa


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>This paper is part of a collaboration between computer scientists and historians aimed at developing novel methods for the analysis of historical newspapers. We present a case study of ideological terms ending with the suffix -ism in nineteenth-century Finnish newspapers. We propose a two-step procedure to trace differences in word usage over time: training diachronic embeddings on several time slices and then clustering the embeddings of selected words together with their neighbours to obtain historical context. The obtained clusters turn out to be useful for historical studies. The paper also discusses specific difficulties related to the development of historian-oriented tools.</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Helsinki", 
      "@type": "Person", 
      "name": "Lidia Pivovarova"
    }, 
    {
      "affiliation": "University of Helsinki", 
      "@type": "Person", 
      "name": "Jani Marjanen"
    }, 
    {
      "affiliation": "University of Helsinki", 
      "@type": "Person", 
      "name": "Elaine Zosa"
    }
  ], 
  "headline": "Word Clustering for Historical Newspapers Analysis", 
  "image": "https://zenodo.org/static/img/logos/zenodo-gradient-round.svg", 
  "datePublished": "2019-09-12", 
  "url": "https://zenodo.org/record/3402940", 
  "@type": "ScholarlyArticle", 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.3402940", 
  "@id": "https://doi.org/10.5281/zenodo.3402940", 
  "workFeatured": {
    "url": "https://www.inf.uni-hamburg.de/inst/dmp/hercore/publications/ltdha.html", 
    "alternateName": "LT-DHA 2019", 
    "location": "Varna, Bulgaria", 
    "@type": "Event", 
    "name": "Language Technology for Digital Historical Archives (Workshop collocated with RANLP 2019)"
  }, 
  "name": "Word Clustering for Historical Newspapers Analysis"
}