Software Open Access

OpenITI NgramReader+

Romanov, Maxim

Like Google Ngram Viewer (https://books.google.com/ngrams), OpenITI NgramReader+ charts diachronic frequencies of words and phrases, using data from the OpenITI corpus (Arabic data only). Unlike Google Ngram Viewer, however, OpenITI NgramReader+ allows one to combine ngrams, which makes it possible to group different morphological forms of a word and to explore classes of objects. Why combine forms? Arabic morphology is complex, and the same word can appear in a large variety of forms: for example, kitāb, al-kitāb, wa-kitāb, and wa-l-kitāb are instances of the same lemma, and one might want to combine all or only some of these forms into a single entity. OpenITI NgramReader+ allows one to do that with regular expressions. This approach also allows one to create thematic clusters of words. For example, one can combine Baġdād and Madīnaŧ al-salām in order to get all mentions of the ʿAbbāsid capital, or combine all cities of Ḫurāsān in order to gauge the frequency of references to Ḫurāsān in general.
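The idea of combining forms with a regular expression can be illustrated with a minimal Python sketch. It is not the application's own code: the function name `combine`, the period labels, and all counts below are hypothetical toy values invented for illustration, and the forms are written in transliteration for readability, whereas the actual data is in Arabic script.

```python
import re
from collections import defaultdict

# Hypothetical toy data, NOT taken from the OpenITI corpus:
# (period, ngram) -> number of occurrences in that period.
TOY_COUNTS = {
    (300, "kitāb"): 4,
    (300, "al-kitāb"): 6,
    (300, "wa-kitāb"): 1,
    (300, "wa-l-kitāb"): 2,
    (300, "kātib"): 5,
    (350, "kitāb"): 3,
    (350, "al-kitāb"): 7,
    (350, "Baġdād"): 9,
    (350, "Madīnaŧ al-salām"): 2,
}


def combine(pattern, counts):
    """Sum, per period, the counts of every ngram that fully matches `pattern`."""
    regex = re.compile(pattern)
    totals = defaultdict(int)
    for (period, ngram), n in counts.items():
        # fullmatch keeps unrelated words that merely contain the pattern out of the sum
        if regex.fullmatch(ngram):
            totals[period] += n
    return dict(totals)


# Optional conjunction "wa-" and optional article "al-" / "l-" before the stem.
KITAB_FORMS = r"(wa-)?(a?l-)?kitāb"

# A thematic cluster: two names of the same city.
ABBASID_CAPITAL = r"Baġdād|Madīnaŧ al-salām"

print(combine(KITAB_FORMS, TOY_COUNTS))
print(combine(ABBASID_CAPITAL, TOY_COUNTS))
```

In this sketch the four forms of kitāb collapse into one series while kātib is left out, and the two names of the ʿAbbāsid capital are counted together. Narrowing the pattern (for example, dropping the optional `(wa-)?` group) restricts the combination to only some of the forms.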

There are three versions of the OpenITI NgramReader+:

  • Lite (al-Ṣuġrá) includes only unigrams with frequencies of 3 and higher;
  • Medium (al-Wusṭá) includes unigrams (5 and higher) and bigrams (10 and higher);
  • Full (al-Kubrá) includes unigrams (3 and higher) and bigrams (5 and higher).

The Lite version of the application is available online at: https://maximromanov.shinyapps.io/OpenITI_NgramReaderPlus_Lite/.

Files (416.7 MB)

OpenITI_NgramReader_v2020.1_MGR.zip (416.7 MB)
md5:b8fe4aefb1b27c398f271fc8d05ce9da
