Persuasion Sentences in Spam Email (PerSentSE)

Jáñez-Martino, Francisco; Barrón-Cedeño, Alberto; Alaiz-Rodríguez, Rocío; González-Castro, Víctor

doi:10.5281/zenodo.14585764

Published January 1, 2025 | Version v1

Dataset Open

1. University of Bologna
2. Universidad de León

How to Access:

To access this dataset, contact Francisco Jáñez-Martino by email at francisco.janez@unileon.es. Access is granted in response to individual requests.

Purpose:
The PerSentSE corpus was developed to study persuasive techniques in spam emails. It includes 130 emails randomly selected from the SpamArchive2122 dataset, which contains over 20,000 spam emails in English.

Methodology:

Segmentation: Emails were divided into sentences using the NLTK library.
Annotation: Sentences were labeled with eight persuasive techniques plus a "non-persuasion" class. Two expert annotators first labeled an initial subset of emails to measure inter-annotator agreement, which reached an acceptable final level (γ = 0.63).

Corpus Statistics:

Total sentences: 1,075
Persuasive sentences: 216 (20.1%)
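
As a quick arithmetic check on the two figures above, the following minimal sketch recomputes the persuasive share from the reported counts. The variable names are illustrative only and are not taken from the dataset.

```python
# Counts as reported in the corpus statistics above.
total_sentences = 1075
persuasive_sentences = 216

# Share of persuasive sentences, as a percentage of all sentences.
share = persuasive_sentences / total_sentences * 100

# Everything that is not persuasive falls in the "non-persuasion" class.
non_persuasive = total_sentences - persuasive_sentences

print(f"{share:.1f}% persuasive, {non_persuasive} non-persuasive")
```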

Persuasion Distribution by Email Sections (Table 7):

Subject lines: 35.59% persuasive, with an average of 1.62 techniques.
Greeting section: 54.17% persuasive, averaging 1.46 techniques.
Email body: 82.46% persuasive, with 5.51 techniques on average.
Farewell section: 31.43% persuasive, averaging 1.45 techniques.

Co-occurrence of Techniques (Figure 2):
Some persuasive techniques frequently appeared together:

Appeal to Fear/Prejudice with Loaded Language: 25 instances.
Exaggeration/Minimization with Loaded Language: 24 instances.
Appeal to Fear/Prejudice with Exaggeration/Minimization: 20 instances.
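
Pairwise counts of this kind can be derived from multi-label annotations. The sketch below is one plausible way to do it, assuming each sentence carries a set of technique labels and that co-occurrence is counted per sentence; the record does not state the unit actually used for Figure 2. The sample annotations and the function name `cooccurrence_counts` are invented for illustration and are not rows or code from PerSentSE.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: one set of technique labels per sentence.
# These are made-up examples, not data from the corpus.
sentences = [
    {"Appeal to Fear/Prejudice", "Loaded Language"},
    {"Exaggeration/Minimization", "Loaded Language"},
    {"Appeal to Fear/Prejudice", "Exaggeration/Minimization", "Loaded Language"},
    {"Loaded Language"},
    set(),  # a sentence in the "non-persuasion" class
]


def cooccurrence_counts(labelled):
    """Count how often each unordered pair of labels shares a sentence."""
    counts = Counter()
    for labels in labelled:
        # Sorting gives every pair a canonical order, so (A, B) and (B, A)
        # are counted under the same key.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(sentences)
for (first, second), n in pair_counts.most_common():
    print(f"{first} with {second}: {n}")
```

Sentences with zero or one label contribute no pairs, so the non-persuasion class never appears in the result.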

Findings:
The email body contains the highest concentration of persuasive elements, whereas earlier studies focused on subject lines alone. This suggests that spam emails rely heavily on persuasive content in their main text.

Files (167.6 kB):

Name	MD5	Size
PerSentSE.tsv	9ccd95a1d5e21a3e3f2f5eb8c40a2294	167.6 kB
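
Once a copy of the file has been obtained, its integrity can be checked against the MD5 value listed above. The following is a minimal sketch using Python's standard `hashlib`; the local path `PerSentSE.tsv` and the helper names `md5_of` and `verify` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 digest as published in the file listing of this record.
EXPECTED_MD5 = "9ccd95a1d5e21a3e3f2f5eb8c40a2294"


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the whole file never has to sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected`."""
    return md5_of(path) == expected


if __name__ == "__main__":
    local_copy = Path("PerSentSE.tsv")  # assumed download location
    if local_copy.exists():
        print("checksum OK" if verify(local_copy) else "checksum mismatch")
```

A mismatch usually points to an incomplete or altered download. MD5 is adequate for detecting accidental corruption but is not a security guarantee.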

Additional details

Is published in: Publication, doi:10.1016/j.eswa.2024.125767
