There is a newer version of the record available.

Published October 7, 2025 | Version v1

Sentiment and Linguistic Analysis of Epidemic Outbreak Data from Official and Alternative Sources

Description

Information on epidemic outbreaks is a key input for health surveillance because it supports assessment of both the spread of outbreaks and the associated social perception. This study examines emotional and linguistic patterns in narratives disseminated by international organizations (WHO, UN, CDC) and digital platforms (Google News and Reddit) over a three-month period. The Knowledge Discovery in Databases (KDD) process was applied in RStudio (selection, preprocessing, transformation, modeling, and evaluation), using the Bing and NRC lexicons and a supervised Naive Bayes model to enhance the detection of emotional nuances. A total of 12,340 texts (3,100 from official sources, 4,240 from Google News, and 5,000 from Reddit) were retrieved with standardized English-language queries (pandemic, confinement, epidemic, and HMPV) and analyzed. Official sources showed a greater presence of positive emotions linked to cooperation and security; Google News concentrated negative narratives with terms such as risk and dangerous; Reddit combined fear and sadness with expressions of hope. The analysis included t-tests and ANOVA with 95% confidence intervals. The work is exploratory and preliminary, and it suggests that surveillance systems should integrate the monitoring of social networks and digital media, along with public policy measures to improve communication during health crises.
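To make the lexicon-based step concrete, the following is a minimal sketch of polarity scoring aggregated by source. It is not the study's code: the study reports using R with the Bing and NRC lexicons, whereas this sketch uses Python and a toy lexicon built from terms mentioned in the description. The lexicon entries, their polarity values, the function names, and the sample texts are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Toy polarity lexicon (hypothetical; NOT the Bing or NRC lexicon).
# The +1/-1 values are assumptions chosen for illustration only.
TOY_LEXICON = {
    "cooperation": 1, "security": 1, "hope": 1,
    "risk": -1, "dangerous": -1, "fear": -1, "sadness": -1,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every token that appears in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))

def mean_score_by_source(records, lexicon=TOY_LEXICON):
    """Average the per-text polarity score within each source."""
    totals = defaultdict(lambda: [0, 0])  # source -> [score sum, text count]
    for source, text in records:
        totals[source][0] += polarity_score(text, lexicon)
        totals[source][1] += 1
    return {src: total / count for src, (total, count) in totals.items()}

# Invented example texts, not drawn from the study's corpus.
SAMPLE = [
    ("official", "International cooperation improves health security."),
    ("news", "Experts warn of a dangerous new risk."),
    ("reddit", "There is fear and sadness, but also hope."),
]

if __name__ == "__main__":
    for source, mean in mean_score_by_source(SAMPLE).items():
        print(source, mean)
```

Per-source means of this kind are the sort of quantity that group comparisons such as t-tests or ANOVA could then be run on; the sketch stops before that step.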

Files

Paper443.pdf (824.4 kB)
md5:40031710c8d916f2c4ccf3a1a17845fd

Additional details

Dates

Available
2025-10-07
Early Access