Dataset Open Access

Past Written Texts Dataset

John Ellul; Marina Polycarpou

Data curator(s)
Evangelia I. Zacharaki

The dataset consists of features extracted from texts written by older adults.

The texts were written by the older person either in electronic form (e.g. an older e-mail) or on paper, in which case they were transcribed by the project's clinical nurses.

The texts were then translated to English using the MyMemory service (https://mymemory.translated.net/), and a series of features were generated that can be used for sentiment analysis.
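The statistical measures among these features (text length, sentence and word counts, words per sentence, entropy) could be computed along the following lines. This is only a minimal sketch: the record does not state how the texts were tokenized or how entropy was defined, so the regular expressions, the character-level Shannon entropy and the function name `text_statistics` are all assumptions, not the project's actual method.

```python
import math
import re
from collections import Counter


def text_statistics(text: str) -> dict:
    """Hypothetical helper: basic statistics of a translated English text."""
    # Assumption: sentences end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)

    # Assumption: entropy is the Shannon entropy (in bits) of the
    # character distribution of the text.
    total = len(text)
    counts = Counter(text)
    entropy = 0.0
    if total:
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

    return {
        "Text_length": total,
        "Number_of_sentences": len(sentences),
        "Number_of_words": len(words),
        "Number_of_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "Text_entropy": entropy,
    }


stats = text_statistics("I am fine. It is sunny!")
print(stats["Number_of_sentences"], stats["Number_of_words"])
```

The dictionary keys mirror the field names used in the dataset, so the output can be compared column by column with the published values, bearing in mind that differences are expected if the original pipeline used other definitions.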

The list of fields of this dataset is presented below:

- Part_id: The user ID, which should be a 4-digit number

- Date: The recording date, which follows the “DD-MM-YY” format (e.g. 14 September 2017 is formatted as 14-09-17)

- Clinical_visit: As several clinical evaluations were performed for each older adult, this number shows which clinical evaluation the measurements refer to

- Transcript: Whether the text was written by the older adult (0) or was transcribed by a nurse (1)

- Language: The original language of the text (0 = Greek)

- Text_length, Number_of_sentences, Number_of_words, Number_of_words_per_sentence, Text_entropy: Statistical Measures

- Desc_image_ENG_sentiment, Desc_event_sentiment, Prev_text_ENG_sentiment: Sentiment Analysis

- Tf-XX: Term frequency – Inverse document frequency

- Tf-pos-XX: Part of Speech analysis, using tf-idf methodology
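The field conventions above can be checked when reading a record. The following is a minimal sketch, assuming each row has already been read as a dictionary of strings (for instance with `csv.DictReader`) and that the column names match the field names listed above exactly; the function name `parse_row` is hypothetical.

```python
from datetime import date, datetime


def parse_row(row: dict) -> dict:
    """Hypothetical helper: validate and convert the metadata fields of one row."""
    # Part_id should be a 4-digit number.
    part_id = row["Part_id"].strip()
    if not (len(part_id) == 4 and part_id.isascii() and part_id.isdigit()):
        raise ValueError(f"Part_id is not a 4-digit number: {part_id!r}")

    # Date follows DD-MM-YY, e.g. 14-09-17 for 14 September 2017.
    recorded = datetime.strptime(row["Date"].strip(), "%d-%m-%y").date()

    # Transcript: 0 = written by the older adult, 1 = transcribed by a nurse.
    transcript = int(row["Transcript"])
    if transcript not in (0, 1):
        raise ValueError(f"Transcript must be 0 or 1, got {transcript}")

    return {
        "Part_id": part_id,
        "Date": recorded,
        "Clinical_visit": int(row["Clinical_visit"]),
        "Transcript": transcript,
        "Language": int(row["Language"]),
    }


example = {
    "Part_id": "1234",  # made-up values, not taken from the dataset
    "Date": "14-09-17",
    "Clinical_visit": "2",
    "Transcript": "1",
    "Language": "0",
}
print(parse_row(example)["Date"])
```

Keeping `Part_id` as a string preserves any leading zeros; whether the file actually stores them that way is not stated in the description.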

Files (2.9 kB)
Name Size
Social Media Sensing Texts.csv 2.9 kB