MAPPING THE DIGITAL SCIENTIFIC DEBATE ON AI: DISCIPLINARY NARRATIVES, PLATFORM DYNAMICS, AND THE ROLE OF MEDIA AND COMMUNICATION

Larrondo Ureta, Ainara; Peña-Fernández, Simón; Morales-i-Gras, Jordi

doi:10.5281/zenodo.19466478

Published April 8, 2026 | Version v1

Dataset Open

1. Tenured Associate Professor (Profesora Titular de Universidad)
2. University of the Basque Country

This dataset accompanies the article "Mapping the Digital Scientific Debate on AI: Disciplinary Narratives, Platform Dynamics, and the Role of Media and Communication". It contains the final analytical sample of 6,215 social media posts that discuss artificial intelligence in relation to scientific research. The sample was drawn from an initial corpus of 9,844 posts published during the first half of 2025 on Instagram, X, TikTok, LinkedIn, and Bluesky.

The dataset was designed to support transparent and reusable research on how AI is publicly debated as a scientific tool across disciplines and platforms. To maximize privacy protection, it does not include raw post text, usernames, profile data, URLs, or other direct identifiers. Instead, it contains only LLM-inferred and derived analytical fields, following a strict principle of irreversible anonymization and data minimization. This makes the dataset suitable for secondary analysis of discursive patterns while substantially reducing re-identification risks.

The annotation workflow combined computational content analysis with Large Language Models. First, posts were classified into OECD/FORD research fields, with posts not clearly related to academic research excluded from the final released dataset. Second, the valid research-related posts were coded through a closed codebook for dimensions such as disciplinary focus, content type, general topic, AI sentiment, AI stance, risks, opportunities, audience, framing, and synthetic discourse indicators. The final dataset therefore captures structured interpretive variables rather than original social media content.

This resource is intended for researchers interested in science communication, platform studies, AI discourse, computational social science, and the public understanding of science. Because the released file contains only inferred variables, it is especially useful for reproducible quantitative analyses of narrative patterns, disciplinary differences, framing strategies, and platform-specific dynamics without redistributing identifiable platform content.
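As an illustration of the kind of quantitative analysis described above, the following minimal sketch counts posts per pair of categorical values, for example platform by research field. The column names "platform" and "ford_field" are assumptions made for this example only; the actual headers of filtered_dataset_inferred.csv are not listed in this description and should be checked before use.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def crosstab_counts(rows, row_key, col_key):
    """Count how many rows share each (row_key, col_key) value pair."""
    return Counter((row[row_key], row[col_key]) for row in rows)


# Hypothetical usage. "platform" and "ford_field" are assumed column names,
# not confirmed by the dataset description:
# rows = load_rows("filtered_dataset_inferred.csv")
# counts = crosstab_counts(rows, "platform", "ford_field")
# for (platform, field), n in sorted(counts.items()):
#     print(platform, field, n)
```

Only the standard library is used here so that the sketch runs without extra dependencies; a dataframe library would serve equally well for larger analyses.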

FORD field legend

0 = None / Not clearly research
1 = Natural sciences
2 = Engineering and technology
3 = Medical and health sciences
4 = Agricultural and veterinary sciences
5 = Social sciences
6 = Humanities and the arts
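
For convenience when decoding the field codes, the legend above can be written as a lookup table. This is a minimal sketch: the codes and labels are copied from the legend, while the names FORD_FIELDS and ford_label are hypothetical helpers introduced for this example.

```python
# FORD field codes and labels, copied from the legend above.
FORD_FIELDS = {
    0: "None / Not clearly research",
    1: "Natural sciences",
    2: "Engineering and technology",
    3: "Medical and health sciences",
    4: "Agricultural and veterinary sciences",
    5: "Social sciences",
    6: "Humanities and the arts",
}


def ford_label(code):
    """Return the legend label for a code given as an int or numeric string.

    Raises KeyError for codes that are not in the legend.
    """
    return FORD_FIELDS[int(float(code))]
```

Since posts not clearly related to academic research were excluded from the released file, code 0 is not expected to appear in it, but it is kept here so that the table mirrors the legend.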

Files

Name	Size	MD5
filtered_dataset_inferred.csv	4.3 MB	c238aa7ea92cd3a1e3b16db14642f77b
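
The published MD5 checksum can be used to confirm that a downloaded copy of the file is intact. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# Checksum published alongside filtered_dataset_inferred.csv.
EXPECTED_MD5 = "c238aa7ea92cd3a1e3b16db14642f77b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5


# Hypothetical usage, assuming the file sits in the working directory:
# print(matches_published_checksum("filtered_dataset_inferred.csv"))
```

MD5 is used here only because it is the checksum the record publishes; it detects accidental corruption, not deliberate tampering.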
