Published November 30, 2020 | Version v1
Journal article | Open Access

Ethical Consideration in Short Message Service Dataset Corpus Creation

Description

Ever since the Short Message Service (SMS) was first introduced in 1993, its popularity has continued to soar, and SMS communication now constitutes a major segment of the telecommunication spectrum. This popularity and extensive usage have drawn researchers' interest to the potential of harvesting data and metadata from SMS corpora for linguistic, diachronic, normalization, and sociolinguistic studies, as well as for validating and comparing the classifiers used in SMS spam filters. However, freely available datasets containing this type of information are difficult to obtain for research purposes, mostly because of the confidential nature of SMS: users want to reveal as little of the contents of their phones as possible. This work examines the techniques adopted in creating SMS corpora and the ethical considerations involved in protecting users' interests and privacy. A critical review of existing work in the field was carried out to determine which ethical practices have been adopted. It found that the main consideration for successful SMS corpus creation is protecting the rights and interests of the message donors and of any other person mentioned in the text messages, without altering the original text, so that sufficient metadata can still be gathered. Participant consent, data anonymization, and secure storage of participants' information are the basic ethical considerations adopted in this work to ensure successful SMS corpus creation.
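The abstract names participant consent and data anonymization as core safeguards, alongside the goal of leaving the original message wording intact. The following is a minimal sketch of how those two safeguards might combine in a collection pipeline; it is not the authors' procedure. The record layout (`donor_phone`, `text`, `timestamp`, `language`, `consent`), the secret key, the regular expressions, and all function names are hypothetical assumptions for illustration. Donor phone numbers are replaced with a keyed hash (HMAC-SHA256) so the same donor always maps to the same ID, and only phone numbers and e-mail addresses inside the message body are masked, so spelling, abbreviations, and other linguistic features survive.

```python
import hashlib
import hmac
import re

# Hypothetical secret key held only by the corpus maintainers; it is never
# published with the corpus, so donor IDs cannot be recomputed by outsiders.
SECRET_KEY = b"replace-with-a-private-random-key"

# Rough illustrative patterns; a real corpus needs locale-specific rules
# (and will still need manual review for names and other identifiers).
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymize_donor(phone_number: str) -> str:
    """Map a donor's phone number to a stable, non-reversible ID."""
    digest = hmac.new(SECRET_KEY, phone_number.encode("utf-8"), hashlib.sha256)
    return "donor_" + digest.hexdigest()[:12]


def mask_message(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders,
    leaving the rest of the message wording untouched."""
    text = EMAIL_RE.sub("<EMAIL>", text)  # e-mails first, so their digits are not read as phones
    return PHONE_RE.sub("<PHONE>", text)


def anonymize_record(record: dict) -> dict:
    """Build the publishable form of one message record."""
    return {
        "donor_id": pseudonymize_donor(record["donor_phone"]),
        "text": mask_message(record["text"]),
        # Non-identifying metadata kept for linguistic and sociolinguistic study.
        "timestamp": record["timestamp"],
        "language": record["language"],
    }


def build_corpus(records: list[dict]) -> list[dict]:
    """Keep only messages whose donor explicitly consented, then anonymize them."""
    return [anonymize_record(r) for r in records if r.get("consent") is True]
```

For example, a consenting record whose text is `"call me on 08031234567 or mail ade@example.com"` would be published as `"call me on <PHONE> or mail <EMAIL>"` under an opaque `donor_…` ID, while a record without a `consent` flag is dropped entirely. A keyed hash is used rather than a plain hash because phone numbers have a small search space, and an unkeyed hash could be reversed by enumerating candidate numbers.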

Files

Volume 9 Issue 11 Paper 5.pdf (497.1 kB, md5:6b4162b184cfdc14940afcc8c7599a63)