Dataset Open Access

Past Written Texts Dataset

John Ellul; Marina Polycarpou

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
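<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">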
  <dc:contributor>Evangelia I. Zacharaki</dc:contributor>
  <dc:creator>John Ellul</dc:creator>
  <dc:creator>Marina Polycarpou</dc:creator>
  <dc:description>The dataset consists of features extracted from texts written by older adults.

The older adults wrote the texts either electronically (e.g., an older e-mail) or on paper. Paper texts were transcribed by the project's clinical nurses.

The texts were then translated into English using the MyMemory service, and a set of features suitable for sentiment analysis was generated from them.

The dataset contains the following fields:

- Part_id: The user ID, a 4-digit number

- Date: The recording date, in “DD-MM-YY” format (e.g., 14 September 2017 is formatted as 14-09-17)

- Clinical_visit: Each older adult underwent several clinical evaluations; this number indicates the clinical evaluation to which these measurements refer

- Transcript: Whether the text was written by the older adult (0) or transcribed by a nurse (1)

- Language: The original language of the text (0 = Greek)

- Text_length, Number_of_sentences, Number_of_words, Number_of_words_per_sentence, Text_entropy: Statistical measures of the text

- Desc_image_ENG_sentiment, Desc_event_sentiment, Prev_text_ENG_sentiment: Sentiment analysis features

- Tf-XX: Term frequency-inverse document frequency (tf-idf) features

- Tf-pos-XX: Part-of-speech analysis, weighted using the tf-idf methodology</dc:description>
  <dc:subject>social media sensing</dc:subject>
  <dc:subject>sentiment analysis</dc:subject>
  <dc:subject>text-based sentiment analysis</dc:subject>
  <dc:title>Past Written Texts Dataset</dc:title>
</oai_dc:dc>
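The record specifies the `Date` field as `DD-MM-YY` and the `Transcript` and `Language` fields as integer codes. The following is a minimal sketch of decoding a single row under those conventions. The row dictionary, the example values, and the helper name `decode_row` are hypothetical. Only the codes stated in the description are mapped: 0/1 for `Transcript` and 0 = Greek for `Language`.

```python
from datetime import date, datetime

# Codes taken from the field descriptions; any other value is reported as "unknown".
TRANSCRIPT_CODES = {0: "written by older adult", 1: "transcribed by nurse"}
LANGUAGE_CODES = {0: "Greek"}


def decode_row(row: dict) -> dict:
    """Decode one record (a hypothetical dict keyed by the documented field names)."""
    part_id = str(row["Part_id"])
    # The description says Part_id is a 4-digit number. Note: if a loader stored it
    # as an int, leading zeros would be lost and this check would reject it.
    if not (len(part_id) == 4 and part_id.isdigit()):
        raise ValueError(f"Part_id should be a 4-digit number, got {part_id!r}")
    # "DD-MM-YY": Python's %y maps two-digit years 00-68 to 2000-2068.
    recorded: date = datetime.strptime(row["Date"], "%d-%m-%y").date()
    return {
        "Part_id": part_id,
        "Date": recorded,
        "Clinical_visit": int(row["Clinical_visit"]),
        "Transcript": TRANSCRIPT_CODES.get(int(row["Transcript"]), "unknown"),
        "Language": LANGUAGE_CODES.get(int(row["Language"]), "unknown"),
    }


if __name__ == "__main__":
    # Hypothetical row using the date example from the field description.
    example = {"Part_id": "1024", "Date": "14-09-17", "Clinical_visit": 1,
               "Transcript": 1, "Language": 0}
    print(decode_row(example))
```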


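The description names the `Text_entropy` and `Tf-XX` features but does not give their formulas. The following sketch uses the common textbook definitions. It weights term t in document d as tf(t, d) · ln(N / df(t)), where tf is the relative frequency within d, and it computes text entropy as the Shannon entropy, in bits, of a text's word distribution. The lower-case whitespace tokenisation, the log bases, and the helper names are all assumptions, not the dataset's documented procedure, so the results will not necessarily match the released columns.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lower-case and split on whitespace; the dataset's tokenizer is not documented.
    return text.lower().split()


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Relative term frequency times natural-log inverse document frequency."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def word_entropy(text: str) -> float:
    """Shannon entropy (bits) of the word-frequency distribution of one text."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    docs = ["the garden is lovely", "the weather is cold"]
    # "the" and "is" occur in both documents, so their idf is ln(2/2) = 0.
    print(tf_idf(docs)[0])
    print(word_entropy("the garden is lovely"))
```

Passing `tf_idf` a space-separated sequence of part-of-speech tags from any tagger, instead of raw words, would produce features in the spirit of `Tf-pos-XX`. This also assumes that is how those columns were built.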