Published April 8, 2026 | Version v1
Resource type: Dataset | Access: Open

Mapping the Digital Scientific Debate on AI: Disciplinary Narratives, Platform Dynamics, and the Role of Media and Communication

  • 1. Tenured Associate Professor (Profesora Titular de Universidad)
  • 2. University of the Basque Country

Description

This dataset accompanies the article Mapping the Digital Scientific Debate on AI: Disciplinary Narratives, Platform Dynamics, and the Role of Media and Communication. It contains the final analytical sample of 6,215 social media posts discussing artificial intelligence in relation to scientific research, drawn from an initial corpus of 9,844 posts published during the first half of 2025 across Instagram, X, TikTok, LinkedIn, and Bluesky.

The dataset was designed to support transparent and reusable research on how AI is publicly debated as a scientific tool across disciplines and platforms. To maximize privacy protection, it does not include raw post text, usernames, profile data, URLs, or other direct identifiers. Instead, it contains only LLM-inferred and derived analytical fields, following a strict principle of irreversible anonymization and data minimization. This makes the dataset suitable for secondary analysis of discursive patterns while substantially reducing the risk of re-identification.

The annotation workflow combined computational content analysis with large language models (LLMs). First, posts were classified into OECD Fields of Research and Development (FORD), and posts not clearly related to academic research were excluded from the released dataset. Second, the remaining research-related posts were coded against a closed codebook covering dimensions such as disciplinary focus, content type, general topic, AI sentiment, AI stance, risks, opportunities, audience, framing, and synthetic discourse indicators. The final dataset therefore captures structured interpretive variables rather than original social media content.
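The two-stage logic described above (drop non-research posts, then accept only values from a closed codebook) can be sketched as a post-hoc validation step over already-coded rows. This is a minimal illustration, not the authors' pipeline: it does not perform the LLM classification itself, and the column names (`ford`, `ai_stance`) and codebook values in the example are hypothetical placeholders. Because the released file already excludes non-research posts, stage 1 should drop nothing when run on it.

```python
from typing import AbstractSet, Iterable, Mapping

# FORD code 0 = "None / Not clearly research" (from the legend).
NOT_RESEARCH = 0


def filter_and_validate(
    rows: Iterable[Mapping[str, str]],
    ford_key: str,
    codebook: Mapping[str, AbstractSet[str]],
):
    """Stage 1: skip rows whose FORD code is 0.
    Stage 2: check each coded field of the remaining rows against its
    closed set of allowed values.

    Returns (kept_rows, problems), where problems is a list of
    (row_index, field, value) tuples for values outside the codebook.
    """
    kept, problems = [], []
    for i, row in enumerate(rows):
        if int(row[ford_key]) == NOT_RESEARCH:
            continue  # not clearly research: excluded from the sample
        for field, allowed in codebook.items():
            value = row.get(field)
            if value not in allowed:
                problems.append((i, field, value))
        kept.append(row)
    return kept, problems


if __name__ == "__main__":
    # Hypothetical rows and codebook, for illustration only.
    rows = [
        {"ford": "0", "ai_stance": "supportive"},
        {"ford": "3", "ai_stance": "critical"},
        {"ford": "5", "ai_stance": "unsure"},
    ]
    codebook = {"ai_stance": {"supportive", "critical", "neutral"}}
    kept, problems = filter_and_validate(rows, "ford", codebook)
    print(len(kept), problems)
```

Keeping out-of-codebook values as reported problems, rather than silently dropping them, makes it easier to audit how often a model strays from the closed category set.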

This resource is intended for researchers interested in science communication, platform studies, AI discourse, computational social science, and the public understanding of science. Because the released file contains only inferred variables, it is especially useful for reproducible quantitative analyses of narrative patterns, disciplinary differences, framing strategies, and platform-specific dynamics without redistributing identifiable platform content.
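Most of the platform- and discipline-level comparisons mentioned above reduce to cross-tabulating two categorical columns. The sketch below uses only the Python standard library and assumes one row per post; the column names `platform` and `ai_sentiment` are hypothetical, so check the actual header of `filtered_dataset_inferred.csv` before use.

```python
import csv
import io
from collections import Counter, defaultdict


def crosstab(rows, row_key, col_key):
    """Count posts for every (row_key value, col_key value) pair."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return {r: dict(counts) for r, counts in table.items()}


def row_shares(table):
    """Convert counts to proportions within each row (e.g. per platform)."""
    shares = {}
    for r, counts in table.items():
        total = sum(counts.values())
        shares[r] = {c: n / total for c, n in counts.items()}
    return shares


if __name__ == "__main__":
    # Tiny in-memory stand-in for the CSV; column names are hypothetical.
    sample = io.StringIO(
        "platform,ai_sentiment\n"
        "X,negative\n"
        "X,positive\n"
        "LinkedIn,positive\n"
        "LinkedIn,positive\n"
    )
    table = crosstab(csv.DictReader(sample), "platform", "ai_sentiment")
    print(table)
    print(row_shares(table))
```

On the real file, the reader would be built with `open("filtered_dataset_inferred.csv", newline="", encoding="utf-8")` (UTF-8 is an assumption), and row shares rather than raw counts are the natural unit for comparing platforms with very different post volumes.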

FORD field legend

  • 0 = None / Not clearly research
  • 1 = Natural sciences
  • 2 = Engineering and technology
  • 3 = Medical and health sciences
  • 4 = Agricultural and veterinary sciences
  • 5 = Social sciences
  • 6 = Humanities and the arts
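The legend above can be applied as a simple lookup table when decoding the numeric FORD codes. This sketch assumes the codes are stored as integers or integer-like strings (for example `"3"`); the name of the column holding them is not specified here and would need to be taken from the CSV header.

```python
# FORD codes and labels, copied from the legend.
FORD_FIELDS = {
    0: "None / Not clearly research",
    1: "Natural sciences",
    2: "Engineering and technology",
    3: "Medical and health sciences",
    4: "Agricultural and veterinary sciences",
    5: "Social sciences",
    6: "Humanities and the arts",
}


def ford_label(code):
    """Map a FORD code (int or integer string such as "3") to its label.

    Raises ValueError for anything outside the 0-6 legend.
    """
    try:
        return FORD_FIELDS[int(code)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"Unknown FORD code: {code!r}") from None


if __name__ == "__main__":
    print(ford_label(3))
    print(ford_label("5"))
```

Raising on unknown codes, instead of returning a default label, surfaces unexpected values early when decoding the column.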

Files

filtered_dataset_inferred.csv (4.3 MB; md5:c238aa7ea92cd3a1e3b16db14642f77b)
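The MD5 checksum listed for `filtered_dataset_inferred.csv` can be used to confirm that a downloaded copy is complete and unaltered. A minimal standard-library sketch, reading the file in chunks so memory use stays small; the function names are illustrative, not part of any published tooling for this dataset.

```python
import hashlib
from pathlib import Path

# Checksum listed for filtered_dataset_inferred.csv.
EXPECTED_MD5 = "c238aa7ea92cd3a1e3b16db14642f77b"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file exists and its MD5 matches the expected value."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("filtered_dataset_inferred.csv")` should return `True` for an intact copy saved in the working directory. MD5 is adequate for detecting accidental corruption, though not for resisting deliberate tampering.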