OpenITI NgramReader+
Description
Like Google Ngram Viewer (https://books.google.com/ngrams), OpenITI NgramReader+ charts diachronic frequencies of words and phrases, using the data of the OpenITI corpus (Arabic data only). Unlike Google Ngram Viewer, however, OpenITI NgramReader+ allows one to combine ngrams, which makes it possible to merge different morphological forms and to explore classes of objects. Why combine forms? Arabic morphology is complex, and the same word can appear in a large variety of forms: for example, kitāb, al-kitāb, wa-kitāb, and wa-l-kitāb are instances of the same lemma, and one might want to combine all or only some of these forms into a single entity. OpenITI NgramReader+ allows one to do that with regular expressions. This approach also allows one to create thematic clusters of words. For example, one can combine Baġdād and Madīnaŧ al-salām to get all mentions of the ʿAbbāsid capital, or combine all cities of Ḫurāsān to gauge the frequency of references to Ḫurāsān in general.
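The idea of combining forms with a regular expression can be illustrated with a minimal Python sketch. It is not the application's own code: the row layout `(ngram, period, count)`, the counts, the function name `combined_frequencies`, and the use of transliterated forms (the corpus itself is in Arabic script) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical (ngram, period, count) rows with invented counts;
# the actual data files may be organized differently.
ROWS = [
    ("kitāb", 300, 120),
    ("al-kitāb", 300, 80),
    ("wa-kitāb", 300, 15),
    ("wa-l-kitāb", 300, 5),
    ("kātib", 300, 40),
    ("kitāb", 350, 200),
    ("al-kitāb", 350, 100),
    ("Baġdād", 300, 60),
    ("Madīnaŧ al-salām", 300, 25),
]


def combined_frequencies(rows, pattern):
    """Sum the counts of every ngram that fully matches `pattern`, per period."""
    regex = re.compile(pattern)
    totals = defaultdict(int)
    for ngram, period, count in rows:
        # fullmatch avoids accidental hits on longer, unrelated forms
        if regex.fullmatch(ngram):
            totals[period] += count
    return dict(totals)


# One lemma, several surface forms: optional conjunction and article.
KITAB = r"(wa-)?(al-|l-)?kitāb"

# A thematic cluster: two names for the same city.
CAPITAL = r"Baġdād|Madīnaŧ al-salām"

print(combined_frequencies(ROWS, KITAB))
print(combined_frequencies(ROWS, CAPITAL))
```

Using a full match rather than a substring search keeps unrelated forms such as kātib out of the combined series; a looser pattern would be a deliberate choice to widen the cluster.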
There are three versions of the OpenITI NgramReader+:
- Lite (al-Ṣuġrá) includes only unigrams with a frequency of 3 or higher;
- Medium (al-Wusṭá) includes unigrams (frequency 5 or higher) and bigrams (10 or higher);
- Full (al-Kubrá) includes unigrams (3 or higher) and bigrams (5 or higher).
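The thresholds listed above can be written down as a small lookup table. The following Python sketch only restates those numbers; the names `MIN_FREQ` and `included`, and the assumption that the words of an ngram are separated by spaces, are hypothetical and not taken from the application.

```python
# Minimum frequency per ngram length (1 = unigram, 2 = bigram),
# copied from the version list above.
MIN_FREQ = {
    "Lite": {1: 3},
    "Medium": {1: 5, 2: 10},
    "Full": {1: 3, 2: 5},
}


def included(ngram, count, version):
    """Return True if an ngram with this count would meet the version's threshold."""
    n = len(ngram.split())  # assumption: words are space-separated
    threshold = MIN_FREQ[version].get(n)
    return threshold is not None and count >= threshold


print(included("kitāb", 4, "Lite"))             # unigram, count 4
print(included("kitāb", 4, "Medium"))           # same unigram, stricter cutoff
print(included("Madīnaŧ al-salām", 7, "Full"))  # bigram, count 7
```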
The Lite version of the application is available online at: https://maximromanov.shinyapps.io/OpenITI_NgramReaderPlus_Lite/.
Files
Name | Size | Checksum |
---|---|---|
OpenITI_NgramReader_v2020.1_MGR.zip | 416.7 MB | md5:b8fe4aefb1b27c398f271fc8d05ce9da |
Additional details
Related works
- Is derived from
- Dataset: 10.5281/zenodo.3082464 (DOI)