Published March 24, 2020 | Version 2020.1
Software | Open access

OpenITI NgramReader+

  • University of Vienna

Description

Like Google Ngram Viewer (https://books.google.com/ngrams), OpenITI NgramReader+ charts the diachronic frequencies of words and phrases. It uses data from the OpenITI corpus (Arabic texts only). Unlike Google Ngram Viewer, however, OpenITI NgramReader+ lets you combine ngrams. This makes it possible to merge different morphological forms of a word and to explore classes of objects. Why combine forms? Arabic morphology is complex, and the same word can appear in many different forms. For example, kitāb, al-kitāb, wa-kitāb and wa-l-kitāb are all instances of the same lemma, and one might want to merge all or only some of these forms into a single entity. OpenITI NgramReader+ does this with regular expressions. The same approach also lets you build thematic clusters of words. For example, one can combine Baġdād and Madīnaŧ al-salām to capture all mentions of the ʿAbbāsid capital. Or one can combine all the cities of Ḫurāsān to gauge how often Ḫurāsān as a region is referred to.
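The idea of merging ngrams with a regular expression can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's own code. The counts are invented toy numbers, and the words are transliterated for readability (the corpus itself is in Arabic script). The `(period, ngram, frequency)` layout and the `combine` helper are hypothetical. Raw counts are summed here rather than normalized frequencies.

```python
import re
from collections import defaultdict

# Invented toy data (hypothetical layout): (period, ngram, raw frequency).
# Transliterated for readability; the OpenITI corpus itself is in Arabic script.
COUNTS = [
    (300, "kitāb", 12),
    (300, "al-kitāb", 30),
    (300, "wa-kitāb", 4),
    (300, "wa-l-kitāb", 7),
    (300, "kutub", 9),
    (300, "Baġdād", 50),
    (300, "Madīnaŧ al-salām", 6),   # a bigram
    (400, "al-kitāb", 25),
    (400, "wa-l-kitāb", 5),
    (400, "Baġdād", 41),
    (400, "Madīnaŧ al-salām", 3),
]


def combine(counts, pattern):
    """Sum the frequencies of all ngrams that fully match `pattern`, per period."""
    regex = re.compile(pattern)
    totals = defaultdict(int)
    for period, ngram, freq in counts:
        # fullmatch: the whole ngram must match, not just a substring of it.
        if regex.fullmatch(ngram):
            totals[period] += freq
    return dict(sorted(totals.items()))


# Morphological forms of one lemma: kitāb, al-kitāb, wa-kitāb, wa-l-kitāb
# (the plural kutub is deliberately left out).
kitab = combine(COUNTS, r"(?:wa-l-|wa-|al-)?kitāb")

# Thematic cluster: both names of the ʿAbbāsid capital, one unigram and one bigram.
baghdad = combine(COUNTS, r"Baġdād|Madīnaŧ al-salām")

print(kitab)    # {300: 53, 400: 30}
print(baghdad)  # {300: 56, 400: 44}
```

Using `fullmatch` rather than `search` keeps a pattern from catching longer words that merely contain the target string. Narrowing or widening the optional prefix group controls which forms are merged.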

There are three versions of the OpenITI NgramReader+:

  • Lite (al-Ṣuġrá) includes only unigrams with a frequency of 3 or higher;
  • Medium (al-Wusṭá) includes unigrams (frequency of 5 or higher) and bigrams (10 or higher);
  • Full (al-Kubrá) includes unigrams (3 or higher) and bigrams (5 or higher).

The Lite version of the application is available online at: https://maximromanov.shinyapps.io/OpenITI_NgramReaderPlus_Lite/.

Files

OpenITI_NgramReader_v2020.1_MGR.zip (416.7 MB)
md5:b8fe4aefb1b27c398f271fc8d05ce9da
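The `md5:` value is a checksum of the archive. Comparing it with a digest computed locally confirms that the download is complete and uncorrupted. Below is a minimal sketch that uses Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, not part of the release.

```python
import hashlib

# Checksum published for OpenITI_NgramReader_v2020.1_MGR.zip.
EXPECTED_MD5 = "b8fe4aefb1b27c398f271fc8d05ce9da"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF),
        # so the large archive is never loaded into memory all at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For an intact download, `verify("OpenITI_NgramReader_v2020.1_MGR.zip")` should return `True`. On most Linux systems, `md5sum OpenITI_NgramReader_v2020.1_MGR.zip` performs the same check from the shell.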

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.3082464 (DOI)