PAN19 Author Profiling: Bots and Gender Profiling

Rangel, Francisco; Rosso, Paolo

doi:10.5281/zenodo.3692340

Published February 18, 2019 | Version v2

Dataset Open

1. Universitat Politècnica de València

Social media bots pose as humans to influence users for commercial, political or ideological purposes. For example, bots could artificially inflate the popularity of a product by promoting it and/or writing positive ratings, or undermine the reputation of competing products through negative ratings. The threat is even greater when the purpose is political or ideological (see the Brexit referendum or the US presidential elections). Fearing the effect of this influence, the German political parties have rejected the use of bots in their electoral campaigns for the general elections. Furthermore, bots are commonly associated with the spread of fake news. Approaching the identification of bots from an author profiling perspective is therefore highly important for marketing, forensics and security.

After having addressed several aspects of author profiling in social media from 2013 to 2018 (age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim to investigate whether the author of a Twitter feed is a bot or a human and, in the case of a human, to profile the author's gender.

The uncompressed dataset consists of one folder per language (en, es). Each folder contains:

An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
A truth.txt file with the list of authors and the ground truth.
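
As a rough illustration of how the two kinds of files might be read, the sketch below parses one author's XML and the contents of a truth.txt file. The description above does not specify either file format, so the layout used here is an assumption: each tweet is taken to sit in a `document` element, and each line of truth.txt is taken to hold an author id, a bot/human label and a gender label separated by `:::`. The function names are hypothetical. Check the actual files and adjust accordingly.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumption: truth.txt fields are separated by ":::" in the order
# author id, type (bot/human), gender. Verify against the real file.
TRUTH_SEPARATOR = ":::"


def parse_truth(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each author id to its (type, gender) ground-truth labels."""
    labels: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        author_id, kind, gender = line.split(TRUTH_SEPARATOR)
        labels[author_id] = (kind, gender)
    return labels


def parse_author_xml(xml_text: str) -> List[str]:
    """Return the tweets of one author.

    Assumption: every tweet is the text of a <document> element
    somewhere under the root of the author's XML file.
    """
    root = ET.fromstring(xml_text)
    return [(doc.text or "").strip() for doc in root.iter("document")]


if __name__ == "__main__":
    # Tiny hand-written samples, not taken from the dataset.
    sample_truth = "abc123:::bot:::bot\ndef456:::human:::female\n"
    sample_xml = (
        '<author lang="en"><documents>'
        "<document><![CDATA[first tweet]]></document>"
        "<document><![CDATA[second tweet]]></document>"
        "</documents></author>"
    )
    print(parse_truth(sample_truth))
    print(parse_author_xml(sample_xml))
```

In practice the XML text would be read from the file whose name matches the author id, so the tweets can be joined to the labels through that id.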

Files

Files (129.9 MB)

Name	Size
pan19-author-profiling-20200229.zip md5:5755cde539af9f26a0b03392325481cb	65.7 MB
pan19-author-profiling-earlybirds-20190320.zip md5:08708924bc72f645b25fd44ab5952215	2.5 MB
pan19-author-profiling-test-2019-04-29.zip md5:ab9be517110dc9732b6fb0f2046fee5c	24.1 MB
pan19-author-profiling-training-dataset-2019-02-18.zip md5:ee1e0e5c30af15751c543c645bee5a25	37.6 MB
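
The MD5 values listed with each archive can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_md5` are hypothetical, and the local path in the usage example assumes the archive was saved to the current directory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the checksum is the one listed for this archive.
    archive = "pan19-author-profiling-training-dataset-2019-02-18.zip"
    expected = "ee1e0e5c30af15751c543c645bee5a25"
    try:
        print("checksum ok" if verify_md5(archive, expected) else "checksum mismatch")
    except FileNotFoundError:
        print("archive not found; download it first")
```

MD5 is adequate here for detecting a damaged download; it is not a defence against deliberate tampering.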

Additional details

Francisco Rangel and Paolo Rosso. Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling. In Linda Cappellato, Nicola Ferro, David E. Losada, and Henning Müller, editors, CLEF 2019 Labs and Workshops, Notebook Papers, September 2019. CEUR-WS.org.

	All versions	This version
Views	3,996	3,375
Downloads	674	656
Data volume	46.0 GB	45.2 GB
