Dataset for paper "The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok"

Solarova, Sara; Mosnar, Matej; Tibensky, Matus; Jakubčík, Ján; Bindas, Adrián; Liska, Simon; Hossner, Filip; Mesarčík, Matúš; Srba, Ivan

doi:10.5281/zenodo.18879043

Published March 5, 2026 | Version 1.0

Dataset Restricted


1. Kempelen Institute of Intelligent Technologies
2. Comenius University Bratislava

This dataset accompanies the paper “The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok”, presented at the FAccT 2026 conference. It is designed to support the analysis of video interactions, ad classifications, and user engagement patterns. It contains records of video interactions, including metadata about the videos, user demographics, and ad classifications, allowing full replication of the results presented in the paper.

The video excerpts included in this dataset are used solely as units of content for analytical purposes. They do not represent, reflect, or imply the personal views, intentions, or stance of the individuals who created them. Content should be interpreted as data artifacts, not as statements attributable to any person.

To minimize the risk of third-party misuse, the dataset is available only to researchers for non-commercial research purposes, upon verification of an email address associated with an academic organisation.

Paper: https://dl.acm.org/doi/10.1145/3805689.3812355

Preprint: https://arxiv.org/abs/2603.05653

GitHub repository: https://github.com/kinit-sk/ai-auditology-advertising-and-minor-profiling-tiktok

Acknowledgement: This work was partially funded by the EU NextGenerationEU through the Recovery and Resilience Plan for Slovakia under the project AI-Auditology, No. 09I03-03-V03-00020.

References

If you use this dataset in any publication, project, tool, or in any other form, please cite the following paper:

@inproceedings{10.1145/3805689.3812355,
    author = {Solarova, Sara and Mosnar, Matej and Tibensky, Matus and Jakubcik, Jan and Bindas, Adrian and Liska, Simon and Hossner, Filip and Mesar\v{c}\'{\i}k, Mat\'{u}\v{s} and Srba, Ivan},
    title = {The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok},
    year = {2026},
    isbn = {9798400725968},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3805689.3812355},
    doi = {10.1145/3805689.3812355},
    abstract = {Adolescents spend an increasing amount of their time in digital environments where their still-developing cognitive capacities leave them unable to recognize or resist commercial persuasion. Article 28(2) of the Digital Service Act (DSA) responds to this vulnerability by prohibiting profiling-based advertising to minors. However, the regulation's narrow definition of “advertisement” excludes current advertising practices including influencer paid partnerships and brand promotional content that serve functionally equivalent commercial purposes. We provide the first empirical evidence of how this definitional gap operates in practice through an algorithmic audit of TikTok. Our approach deploys sock-puppet accounts simulating a pair of minor and adult users with matching interest profiles. The content recommended to these users is automatically annotated, enabling systematic statistical analysis across four video categories: containing formal, disclosed, undisclosed advertisement and non-advertisement; as well as advertisement topical relevance to user's interest. Our findings reveal a stark regulatory paradox. TikTok demonstrates formal compliance with Article 28(2) by shielding minors from profiled formal advertisements, yet both disclosed and undisclosed ads exhibit significant profiling aligned with user interests (5-8 times stronger than for adult formal advertising). The strongest profiling emerges within undisclosed commercial content, where creators/brands fail to label paid partnership/promotional content and the platform neither corrects this omission nor prevents its personalized delivery to minors. These results demonstrate that minors remain exposed to algorithmically targeted commercial content through the same recommendation mechanisms the DSA seeks to constrain. 
We argue that protecting minors requires expanding the definition of advertisement in EU law to encompass influencer and brand promotional content, and ensuring that any such expansion is accompanied by a corresponding prohibition on profiling-based targeting of minors, so that commercial content cannot circumvent protections merely by operating outside formal advertising channels.},
    booktitle = {Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency},
    pages = {4811–4835},
    numpages = {25},
    keywords = {Digital Services Act, advertisement, algorithmic auditing, minor profiling, TikTok},
    location = {},
    series = {FAccT '26}
}

Dataset Description

The logs of videos presented to the individual simulated users are provided in the ai-auditology-advertising-and-minor-profiling-tiktok_video_data.csv file. It is structured into 31 columns, capturing details such as session and video identifiers, timestamps, ad classifications, visual indicators, user demographics, and video metadata.

Column Name	Data Type	Description	Example Value
session_id	string	Session identifier captured during browsing	1765302414.743265
video_id	string	Platform video identifier	[anonymized]
timestamp	datetime	Timestamp when the record was captured	2025-12-09T17:47:56.296448
is_ad	boolean	Whether the video was classified as an ad	false
ad_type	string (nullable)	Ad classification type when is_ad is true	other
ad_topic	string (nullable)	Detected topic for ad content	beauty
visual_indicators	array[string]	List of visual indicators used to classify ads	["hashtag #clearskin"]
reasoning	string	Model reasoning for the ad classification	No disclosure label visible.
interaction_number	integer	Sequential interaction count within the session	1
search_term	string	Search term used to find the content	clear skin
video_action_skip	boolean	Whether the user skipped the video	False
video_action_watch	boolean	Whether the user watched the video	True
video_action_like	boolean	Whether the user liked the video	True
video_action_bookmark	boolean	Whether the user bookmarked the video	True
video_time_watch_loop_start	float (nullable)	Timestamp when watch loop started	1765302470.8245792
video_time_watch_loop_end	float (nullable)	Timestamp when watch loop ended	1765302477.842666
video_time_skip	float (nullable)	Timestamp when the video was skipped	nan
video_time_like	float (nullable)	Timestamp when the video was liked	1765302471.8269806
video_time_bookmark	float (nullable)	Timestamp when the video was bookmarked	1765302477.3054323
video_time_predict_interaction	float (nullable)	Timestamp for predicted interaction (if any)	nan
topic	string	User interest topic used for personalization	beauty
gender	string	User gender	female
country_code	string	User country code	DE
date_of_birth	date	User date of birth	2009-11-29
agent	string	Agent identifier added during processing	Beauty_minor
video_url	string	Full URL to the video	https://www.tiktok.com/[anonymized]
video_author	string	Account handle of the video author	[anonymized]
video_description	string	Video description text	little bonus - your waist? nonexistent #chiaseeds #guthealth
video_time_duration	float	Video duration in seconds	25.866667
video_transcript	string (nullable)	Auto-transcribed video text if available	nan
video_transcript_language	string (nullable)	Language of the transcript	nan
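
The column types above can be applied when reading the CSV. The following is a minimal sketch using only the Python standard library, not the authors' loading code. It assumes that booleans are serialized as "true"/"false" in any letter case, that missing numeric values appear as "nan" or an empty cell, that visual_indicators is stored as a JSON array, and that the video_time_* values are Unix epoch seconds (the example values are consistent with this). The helper names (parse_row, load_rows, age_at_capture, watch_seconds) are hypothetical, and the sample row is built from a subset of the example values in the table rather than from the restricted file.

```python
import csv
import io
import json
from datetime import date, datetime

BOOL_COLUMNS = ("is_ad", "video_action_skip", "video_action_watch",
                "video_action_like", "video_action_bookmark")
FLOAT_COLUMNS = ("video_time_watch_loop_start", "video_time_watch_loop_end",
                 "video_time_skip", "video_time_like", "video_time_bookmark",
                 "video_time_predict_interaction", "video_time_duration")


def parse_bool(text):
    # The example values mix "false" and "True", so compare case-insensitively.
    return text.strip().lower() == "true"


def parse_nullable_float(text):
    # Assumption: missing values are serialized as "nan" or an empty cell.
    text = text.strip()
    return None if text.lower() in ("", "nan") else float(text)


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values."""
    parsed = dict(row)
    for name in BOOL_COLUMNS:
        parsed[name] = parse_bool(row[name])
    for name in FLOAT_COLUMNS:
        parsed[name] = parse_nullable_float(row[name])
    parsed["interaction_number"] = int(row["interaction_number"])
    parsed["timestamp"] = datetime.fromisoformat(row["timestamp"])
    parsed["date_of_birth"] = date.fromisoformat(row["date_of_birth"])
    # Assumption: the list of indicators is stored as a JSON array.
    parsed["visual_indicators"] = json.loads(row["visual_indicators"])
    return parsed


def load_rows(handle):
    """Read and parse every row from an open CSV file-like object."""
    return [parse_row(row) for row in csv.DictReader(handle)]


def age_at_capture(parsed):
    """Age in full years of the simulated user when the record was captured."""
    born, seen = parsed["date_of_birth"], parsed["timestamp"].date()
    return seen.year - born.year - ((seen.month, seen.day) < (born.month, born.day))


def watch_seconds(parsed):
    """Length of the watch loop in seconds, or None if an endpoint is missing."""
    start = parsed["video_time_watch_loop_start"]
    end = parsed["video_time_watch_loop_end"]
    return None if start is None or end is None else end - start


# Sample row assembled from the example values in the table (subset of columns).
sample = {
    "timestamp": "2025-12-09T17:47:56.296448",
    "is_ad": "false",
    "visual_indicators": '["hashtag #clearskin"]',
    "interaction_number": "1",
    "video_action_skip": "False",
    "video_action_watch": "True",
    "video_action_like": "True",
    "video_action_bookmark": "True",
    "video_time_watch_loop_start": "1765302470.8245792",
    "video_time_watch_loop_end": "1765302477.842666",
    "video_time_skip": "nan",
    "video_time_like": "1765302471.8269806",
    "video_time_bookmark": "1765302477.3054323",
    "video_time_predict_interaction": "nan",
    "date_of_birth": "2009-11-29",
    "agent": "Beauty_minor",
    "video_time_duration": "25.866667",
}

# Round-trip the sample through an in-memory CSV to exercise load_rows.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(sample))
writer.writeheader()
writer.writerow(sample)
buffer.seek(0)
parsed = load_rows(buffer)[0]
print(age_at_capture(parsed), watch_seconds(parsed))
```

For the sample row, the simulated user is 16 years old at capture time, which matches the Beauty_minor agent label. With access to the real file, load_rows could be called on the opened CSV instead of the in-memory buffer, provided the serialization assumptions hold.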

Manual annotations of selected videos (used to assess the accuracy of the ad type and topic classification model) are provided in ai-auditology-advertising-and-minor-profiling-tiktok_annotator_1.csv and ai-auditology-advertising-and-minor-profiling-tiktok_annotator_2.csv, for the first and second human annotator respectively.
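
As an illustration of how such an accuracy check might look, the sketch below compares automatically assigned labels with one annotator's labels. The column layout of the annotator files is not documented in this description, so the example works on plain dictionaries keyed by video identifier. The function name label_accuracy, the identifiers, and the label strings are all hypothetical, and this is not necessarily the evaluation procedure used in the paper.

```python
def label_accuracy(predicted, annotated):
    """Share of videos present in both mappings whose labels are identical."""
    shared = sorted(set(predicted) & set(annotated))
    if not shared:
        raise ValueError("the two mappings have no video ids in common")
    hits = sum(predicted[video] == annotated[video] for video in shared)
    return hits / len(shared)


# Hypothetical data: made-up video ids and label strings, for illustration only.
model_labels = {"v1": "other", "v2": "formal", "v3": "disclosed", "v4": "undisclosed"}
annotator_1 = {"v1": "other", "v2": "formal", "v3": "undisclosed", "v4": "undisclosed"}

print(label_accuracy(model_labels, annotator_1))
```

Restricting the comparison to identifiers present in both mappings reflects that only selected videos were annotated manually.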

Ethical considerations

Most of the ethical, legal and societal issues tied to this dataset were already described in the Ethical Considerations section of the associated paper. The most severe risks were tied to a Terms of Service (ToS) violation, various types of privacy intrusions, the possibility of third-party misuse, or the erosion of some privacy rights such as the right to erasure.

The research from which this dataset resulted was conducted as part of a research project that obtained approval from the organisational Ethics Committee (decision of December 17, 2024). To minimise any potential legal and ethical issues, we directly involved legal and ethics experts in this project. Researchers and research engineers conducting this auditing study also participated in four ethics assessment workshops together with ethics and legal experts, where relevant ethical and legal challenges were identified and appropriate mitigations proposed.

First, sock-puppet audits require creating automated bots and using them for data collection, which potentially violates the terms of service of social media platforms. However, such a breach of ToS is permitted by Article 40(12) of the EU Digital Services Act (DSA) if the research concerns systemic risks. This work directly addresses such a systemic risk by assessing social media platforms' compliance with obligations imposed by legislation, specifically the prohibition of profiling-based advertising to minors laid down in Article 28(2) of the DSA, as foreseen by Recital 83 of the DSA. Second, the interaction of the bots with the content on the platform may impact the platform and society (e.g., by increasing view or like counts). To limit this impact, we minimise the number of bots that we run. As for data, we collect only publicly available metadata.

To mitigate potential biases and inaccuracies inherent in the Large Vision Model (LVM) used for advertisement classification, we implemented a multi-layered validation process. This included both ad-hoc and systematic manual audits of dataset subsets. Data failing to meet accuracy benchmarks were excluded, and we have reported the estimated error rates accordingly. To prioritize ethical standards and researcher well-being, all manual annotations were conducted solely by the study’s authors, following expert ethical guidelines.

Finally, to support users' rights to rectification and erasure in case of the publication of incorrect or sensitive information, we provide a procedure for them to request the removal of their posts from the dataset or to flag the inaccuracies in the data. To do this, users can contact the authors using the contact form provided for accessing the dataset.


Files

Restricted

The record is publicly accessible, but files are restricted.

Request access

If you would like to request access to these files, please fill out the form below.

In order to share the dataset with you, please agree to the following terms:

You will use the dataset strictly for research purposes. The request for access to the dataset must be sent from an official, existing e-mail address of the relevant university, faculty, or other scientific or research institution (for verification purposes).
You will not attempt to identify, deanonymize or contact the authors of the social media posts included in this dataset.
You will not re-share the dataset (or any of its parts) with anyone else not included in this request.
You will appropriately cite the papers mentioned in the dataset description in any publication, project, or tool that uses this dataset.
You understand how the dataset was created and that the manual or automatically predicted annotations may not be 100% correct.
You acknowledge that you are fully responsible for the use of the dataset (data) and for any infringement of rights of third parties (in particular copyright) that may arise from its use beyond the intended purposes. Neither the authors nor Kempelen Institute of Intelligent Technologies (KInIT) are responsible for your actions.


