Published June 1, 2022 | Version v1
Journal article Open

Online news popularity prediction before publication: effect of readability, emotion, psycholinguistics features

Description

The development of world wide web with easy access to massive information sources anywhere and anytime paves way for more people to rely on online news media rather than print media. The scenario expedites rapid growth of online news industries and leads to substantial competitive pressure. In this work, we propose a set of hybrid features for online news popularity prediction before publication. Two categories of features extracted from news articles, the first being conventional features comprising metadata, temporal, contextual, and embedding vector features, and the second being enhanced features comprising readability, emotion, and psycholinguistics features are extracted from the articles. Apart from analyzing the effectiveness of conventional and enhanced features, we combine these features to come up with a set of hybrid features. We curate an Indian news dataset consisting of news articles from the most rated Indian news websites for the study and also contribute the dataset for future research. Evaluations are performed over the Indian news dataset (IND) and compared with the performance over the benchmark mashable dataset using various supervised machine learning models. Our results indicate that the proposed hybrid of enhanced features with conventional features are highly effective for online news popularity prediction before publication.

Files

15 21095 1570720425.pdf

Files (298.7 kB)

Name Size Download all
md5:b4f4ac0015d8f658691a35d0b6531745
298.7 kB Preview Download