Published March 3, 2026 | Version v1

Daily Mail User Comments Dataset (2021 Random Sample, N = 150,000)

  • 1. FAS Research

Description

This dataset contains a random sample of 150,000 user comments posted on the Daily Mail website in 2021. The sample was drawn from a larger corpus of more than 40 million comments collected from 224,981 Daily Mail articles using a custom Python-based web scraping workflow.

To ensure suitability for computational text analysis, only comments containing at least 20 words were included in the sample. The dataset contains the full comment text, article-level metadata, timestamps, and community feedback indicators such as vote counts and rating scores.
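The selection described above (a length filter followed by random sampling) could be sketched roughly as follows. This is a minimal illustration, not the authors' actual workflow: the record does not say how words were counted or how the sample was drawn, so whitespace tokenization, the function names, and the use of `random.sample` are all assumptions.

```python
import random

MIN_WORDS = 20  # threshold stated in the dataset description


def has_min_words(text, min_words=MIN_WORDS):
    """Return True if the comment has at least `min_words` tokens.

    Assumption: a "word" is any whitespace-separated token.
    """
    return len(text.split()) >= min_words


def draw_sample(comments, n, seed=0):
    """Keep comments that pass the length filter, then draw up to `n`
    of them uniformly at random without replacement (hypothetical helper)."""
    eligible = [c for c in comments if has_min_words(c)]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(eligible, min(n, len(eligible)))
```

Under these assumptions, calling `draw_sample(all_comments, 150_000)` on the full 2021 corpus would give a sample of the same size as the published one, though not the same comments.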

The dataset is intended for research in computational social science, digital media studies, discourse analysis, and sentiment analysis.

Column description

  • RowID: Sequential row identifier within the exported dataset.

  • AssetId: Identifier of the Daily Mail article to which the comment belongs.

  • category: Content category/section of the article (e.g. news, sport, femail, tvshowbiz).

  • custom_id: Unique identifier of the comment.

  • AssetHeadline: Headline/title of the article.

  • DateCreated: Date and time when the comment was created; stored in the file as a numeric date value.

  • AssetCommentCount: Total number of comments associated with the article.

  • AssetUrl: URL path of the corresponding Daily Mail article.

  • message: Full text of the user comment.

  • year: Year of publication/collection of the comment (2021).

  • VoteCount: Total number of votes received by the comment.

  • VoteRating: Net rating of the comment, calculated as positive votes minus negative votes.

  • pos_votes: Number of positive votes received by the comment.

  • neg_votes: Number of negative votes received by the comment.
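The vote columns above are related to one another, so they can be cross-checked after loading. The sketch below works on one record given as a dictionary keyed by column name (for example, a row from `csv.DictReader`). The relation VoteRating = pos_votes - neg_votes comes from the column description. The relation VoteCount = pos_votes + neg_votes is an assumption based on the phrase "total number of votes", and the function name `check_votes` is hypothetical.

```python
def check_votes(row):
    """Return a list of inconsistencies found in one comment record."""

    def to_int(value):
        # Accept values stored as "7", "7.0", 7 or 7.0.
        return int(float(value))

    pos = to_int(row["pos_votes"])
    neg = to_int(row["neg_votes"])
    problems = []

    # Stated in the column description: net rating = positive - negative.
    if to_int(row["VoteRating"]) != pos - neg:
        problems.append("VoteRating != pos_votes - neg_votes")

    # Assumption: VoteCount is the sum of positive and negative votes.
    if to_int(row["VoteCount"]) != pos + neg:
        problems.append("VoteCount != pos_votes + neg_votes")

    return problems
```

An empty list means the record is consistent under both relations. Records that fail only the second check may simply show that the assumed definition of VoteCount is wrong.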

Files

Files (40.0 MB)

md5:edcdd07cfd8f82e05fb1118462c26a7f (40.0 MB)
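
The MD5 checksum listed above can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module. The function name `md5_of_file` is hypothetical, and the local file path is whatever name the downloaded file was saved under (the record does not show it here).

```python
import hashlib

# Checksum listed for the file in this record.
EXPECTED_MD5 = "edcdd07cfd8f82e05fb1118462c26a7f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `md5_of_file(path_to_download) == EXPECTED_MD5` should evaluate to `True`. A mismatch suggests the file is incomplete or corrupted.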

Additional details

Funding

European Commission
SMIDGE - Social Media narratives: addressing extremism in middle age (grant no. 101095290)