StergiosChatzikyriakidis/Greek_dialect_corpus: v1.0
Description
A collection of raw text from various Greek dialects. It contains data from the following dialects and sources:
- Cypriot Greek
- Cretan Greek
- Pontic Greek
- Northern Greek
- A portion of the Modern Greek Wikipedia
The repository contains data collected from the web and other textual resources, including blogs, websites and theatrical plays. The folder SMG_CG contains Twitter data in Standard Modern Greek and Cypriot Greek, originally collected by Hanna Sababa for her project "A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek". We thank Mr Sfakianakis for providing us with his Cretan translations of a number of Ancient Greek tragedies and comedies. The folder all_dialects contains a zip file with the full collection, minimally pre-processed and annotated with the respective dialect. Stergios Chatzikyriakidis gratefully acknowledges funding from the European Commission under TALOS AI for SSH (Grant agreement ID: 101087269).
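The record names the folders SMG_CG and all_dialects but does not describe the rest of the archive layout or the annotation format. The following is a minimal sketch, assuming only that the download is a standard ZIP file, for listing the files that sit under a given folder name so the layout can be inspected before any parsing. The helper name `members_in_folder` is hypothetical and not part of the dataset.

```python
import zipfile
from pathlib import PurePosixPath


def members_in_folder(zip_source, folder):
    """List archive files that have `folder` as one of their parent directories.

    Hypothetical helper: `zip_source` may be a path or a binary file-like object
    (both are accepted by zipfile.ZipFile). No annotation format is assumed;
    the function only matches folder names such as "all_dialects" or "SMG_CG".
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [
            name
            for name in zf.namelist()
            # Skip explicit directory entries (they end with "/") and
            # match the folder name against parent path components only.
            if not name.endswith("/") and folder in PurePosixPath(name).parts[:-1]
        ]
```

For example, `members_in_folder("Greek_dialect_corpus-v1.0.zip", "all_dialects")` (local filename hypothetical) would return the paths of the files inside that folder, whatever top-level directory the snapshot uses.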
Files
Name | Size | Checksum |
---|---|---|
StergiosChatzikyriakidis/Greek_dialect_corpus-v1.0.zip | 66.9 MB | md5:06ec924a67d35fbb6e8a0018b1bb3f93 |
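The record lists an MD5 checksum for the archive. The following is a minimal sketch, using only the Python standard library, for checking a downloaded copy against it. The function names and the local file path are illustrative, not part of the record.

```python
import hashlib

# Checksum listed on the record for Greek_dialect_corpus-v1.0.zip.
EXPECTED_MD5 = "06ec924a67d35fbb6e8a0018b1bb3f93"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("Greek_dialect_corpus-v1.0.zip")`, where the path is wherever the archive was saved. Reading in chunks keeps memory use constant regardless of the archive's 66.9 MB size. MD5 is suitable for detecting a corrupted or incomplete download, not for guarding against deliberate tampering.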
Additional details
Related works
- Is supplement to:
  - Software: https://github.com/StergiosChatzikyriakidis/Greek_dialect_corpus/tree/v1.0