Dataset Open Access
John Ellul; Marina Polycarpou
The dataset consists of features extracted from texts written by older adults.
The texts were written by the older person either by electronic means (e.g. an older e-mail) or on paper, in which case they were transcribed by the project's clinical nurses.
The texts were then translated to English using the MyMemory service (https://mymemory.translated.net/), and a series of features were generated that can be used for sentiment analysis.
The list of fields of this dataset is presented below:
- Part_id: The user ID, which should be a 4-digit number
- Date: The recording date, which follows the “DD-MM-YY” format (e.g. 14 September 2017 is formatted as 14-09-17)
- Clinical_visit: Several clinical evaluations were performed on each older adult; this number indicates which clinical evaluation the measurements refer to
- Transcript: Whether the text was written by the older adult (0) or transcribed by a nurse (1)
- Language: The original language of the text (0 = Greek)
- Text_length, Number_of_sentences, Number_of_words, Number_of_words_per_sentence, Text_entropy: Statistical Measures
- Desc_image_ENG_sentiment, Desc_event_sentiment, Prev_text_ENG_sentiment: Sentiment Analysis
- Tf-XX: Term frequency – Inverse document frequency
- Tf-pos-XX: Part of Speech analysis, using tf-idf methodology
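
The field conventions above can be checked when loading the data. The following is a minimal sketch in Python, not part of the dataset itself: it assumes each record is available as a dictionary of strings keyed by the field names listed above (for example, as produced by csv.DictReader), and the function name parse_record and the output keys are hypothetical.

```python
import re
from datetime import date, datetime


def parse_record(row: dict) -> dict:
    """Validate and convert a few documented fields of one record.

    Assumes `row` maps the field names from the list above to string values.
    """
    # Part_id is described as a 4-digit number; keep it as a string so that
    # any leading zeros are preserved.
    part_id = str(row["Part_id"]).strip()
    if re.fullmatch(r"[0-9]{4}", part_id) is None:
        raise ValueError(f"Part_id is not a 4-digit number: {part_id!r}")

    # Date follows DD-MM-YY, e.g. 14-09-17 for 14 September 2017.
    recorded = datetime.strptime(str(row["Date"]).strip(), "%d-%m-%y").date()

    # Transcript: 0 = written by the older adult, 1 = transcribed by a nurse.
    transcript = int(row["Transcript"])
    if transcript not in (0, 1):
        raise ValueError(f"Transcript must be 0 or 1, got {transcript}")

    return {
        "part_id": part_id,
        "date": recorded,
        "transcribed_by_nurse": transcript == 1,
    }


# Example call with made-up values that follow the documented formats.
example = parse_record({"Part_id": "1234", "Date": "14-09-17", "Transcript": "1"})
```

Keeping Part_id as a string is a design choice of this sketch: reading it as an integer would silently drop leading zeros and break the 4-digit check.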