There is a newer version of the record available.

Published June 5, 2025 | Version 1.0
Dataset Open

SportsOri: A Novel Dataset for Analyzing Public Sentiment on Controversial Sports Events in YouTube Comments

  • 1. International Institute of Information Technology
  • 2. Savitribai Phule Pune University
  • 3. Indian Institute of Science Education and Research Kolkata

Description

 

Sports engage billions of followers worldwide and affect the
economy. Sports controversies often ignite passionate discussions
among fans, analysts, and players. With the rise of social
media, platforms like YouTube have become central to these discussions.
This study performs stance detection (opinion mining), classifying
comments on famous public sports controversies, collected from
popular social media platforms like YouTube, as for, against, or neutral.

 

To our knowledge, this is the first study and hand-curated dataset of civic
engagement in controversial sports events, spanning around 40 years.
LLMs (Llama and the DeepSeek reasoning family) were used for the initial
stance annotation of comments and were later fine-tuned for a comparative performance analysis (30% boost in accuracy).
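As a rough illustration of how such a comparison could be scored, the sketch below computes stance accuracy against reference labels and reports the gain both ways. It is a minimal sketch, not the authors' evaluation code: the function names and the toy label lists are hypothetical, and the record does not say whether the reported 30% is an absolute or a relative improvement.

```python
# Minimal sketch (assumption: accuracy is the share of comments whose
# predicted stance equals the reference stance). All names are hypothetical.
STANCES = ("for", "against", "neutral")


def accuracy(predicted, reference):
    """Fraction of positions where the predicted stance matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot score an empty label list")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def accuracy_gain(baseline_acc, tuned_acc):
    """Report the improvement as absolute points and as a relative percentage."""
    return {
        "absolute_points": (tuned_acc - baseline_acc) * 100,
        "relative_percent": (tuned_acc - baseline_acc) / baseline_acc * 100,
    }


# Toy labels for illustration only; they are NOT taken from the dataset.
reference = ["for", "against", "neutral", "for", "against"]
baseline = ["for", "neutral", "neutral", "against", "against"]
fine_tuned = ["for", "against", "neutral", "against", "against"]

gain = accuracy_gain(accuracy(baseline, reference),
                     accuracy(fine_tuned, reference))
```

Reporting both figures avoids ambiguity, since the same pair of accuracies gives different numbers depending on whether the gain is stated in absolute points or relative to the baseline.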

 

This dataset presents a collection of around 43k YouTube comments on famous
and controversial public sports events.

 

We explore public sentiment through stance detection on six famous controversial
sports incidents by extracting and processing YouTube comments.
Stance detection is performed by fine-tuning models such as
Llama-3.1-8b and DeepSeek reasoning models (Llama-8b distilled)
on comments from events including the Underarm Incident,
Jonny Bairstow’s Run-Out Incident, Ashwin’s Mankading Event,
and the Luis Suarez Handball Event.

Files

Annotation Pipeline and Fine Tuning Details.pdf

Files (5.5 MB)

Checksum (MD5)                         Size
md5:e038abd2c81066bec0d5d1e76fe60528   161.6 kB
md5:29af6b21551a99bc50dfc48a49f69bd2   212.3 kB
md5:e26b65fed3b3dc860fea2ebf26298fd2   117.8 kB
md5:9aeaf47c4d1c2b0c82c711b60c822031   138.0 kB
md5:8080165c13a14735e4fcb5b439343e04   1.0 MB
md5:639c551a18523560d2755aa02cc2698a   319.5 kB
md5:5392cd596138837b48d2189c1cab57a3   543.1 kB
md5:ed894304d3b967d67e2fdaa5935dcd09   977.1 kB
md5:231ef8ab1b543a54048464c5220a9fd3   284.9 kB
md5:d7608839a13aa2ce48e4fbba4c544e0c   413.0 kB
md5:178fc60c850be670d589161391419425   245.7 kB
md5:5d63925f24aef594d8fdd07509fde9f6   19.8 kB
md5:747e0874692c5dfb22b6f0e4869cd3d1   4.2 kB
md5:16aa64aefa589fa339914802e3b5ea03   755.8 kB
md5:548779a46deaeccfb899857c3f0f33b8   251.7 kB
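
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and because the file names are not shown in this listing, any path passed in is the reader's own local path.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return 'md5:<hex digest>' for a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "md5:" + digest.hexdigest()


def verify_download(path, expected):
    """True if the file's checksum equals the value listed on the record."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_download(local_path, "md5:e038abd2c81066bec0d5d1e76fe60528")` would return `True` only if `local_path` (a placeholder for wherever the first listed file was saved) matches the first checksum above. MD5 is adequate here for detecting corrupted or truncated downloads, though it is not a safeguard against deliberate tampering.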

Additional details

Dates

Collected
2024-05
Data Collected