{ "access": { "embargo": { "active": false, "reason": null }, "files": "restricted", "record": "public", "status": "restricted" }, "created": "2021-03-14T10:25:59.762065+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "files": { "enabled": true }, "id": "4603578", "is_draft": false, "is_published": true, "links": { "access": "https://zenodo.org/api/records/4603578/access", "access_links": "https://zenodo.org/api/records/4603578/access/links", "access_request": "https://zenodo.org/api/records/4603578/access/request", "access_users": "https://zenodo.org/api/records/4603578/access/users", "archive": "https://zenodo.org/api/records/4603578/files-archive", "archive_media": "https://zenodo.org/api/records/4603578/media-files-archive", "communities": "https://zenodo.org/api/records/4603578/communities", "communities-suggestions": "https://zenodo.org/api/records/4603578/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.4603578", "draft": "https://zenodo.org/api/records/4603578/draft", "files": "https://zenodo.org/api/records/4603578/files", "latest": "https://zenodo.org/api/records/4603578/versions/latest", "latest_html": "https://zenodo.org/records/4603578/latest", "media_files": "https://zenodo.org/api/records/4603578/media-files", "parent": "https://zenodo.org/api/records/4603577", "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.4603577", "parent_html": "https://zenodo.org/records/4603577", "requests": "https://zenodo.org/api/records/4603578/requests", "reserve_doi": "https://zenodo.org/api/records/4603578/draft/pids/doi", "self": "https://zenodo.org/api/records/4603578", "self_doi": "https://zenodo.org/doi/10.5281/zenodo.4603578", "self_html": "https://zenodo.org/records/4603578", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:4603578/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:4603578/sequence/default", "versions": "https://zenodo.org/api/records/4603578/versions" }, 
"media_files": { "enabled": false }, "metadata": { "creators": [ { "affiliations": [ { "name": "SYMANTO RESEARCH" } ], "person_or_org": { "family_name": "FRANCISCO RANGEL", "name": "FRANCISCO RANGEL", "type": "personal" } }, { "affiliations": [ { "name": "UNIVERSITAT POLIT\u00c8CNICA DE VAL\u00c8NCIA" } ], "person_or_org": { "family_name": "BERTa CHULVI", "name": "BERTa CHULVI", "type": "personal" } }, { "affiliations": [ { "name": "UNIVERSITAT POLIT\u00c8CNICA DE VAL\u00c8NCIA" } ], "person_or_org": { "family_name": "GRETEL LIZ DE LA PE\u00d1A", "name": "GRETEL LIZ DE LA PE\u00d1A", "type": "personal" } }, { "affiliations": [ { "name": "UNIVERSIT\u00c0 DEGLI ESTUDI DI MILANO - BICOCCA" } ], "person_or_org": { "family_name": "ELISABETTA FERSINI", "name": "ELISABETTA FERSINI", "type": "personal" } }, { "affiliations": [ { "name": "UNIVERSITAT POLIT\u00c8CNICA DE VAL\u00c8NCIA" } ], "person_or_org": { "family_name": "PAOLO ROSSO", "name": "PAOLO ROSSO", "type": "personal" } } ], "description": "
Task
\n\nHate speech (HS) is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics. Given the huge amount of user-generated content on Twitter, the problem of detecting, and thereby possibly countering, the diffusion of HS is becoming fundamental, for instance for fighting against misogyny and xenophobia. To this end, in this task, we aim at identifying possible hate speech spreaders on Twitter as a first step towards preventing hate speech from being propagated among online users.
\n\nAfter having addressed several aspects of author profiling in social media from 2013 to 2020 (fake news spreaders, bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim at investigating whether it is possible to distinguish authors who have shared some hate speech in the past from those who, to the best of our knowledge, have never done so.
\n\nAs in previous years, we propose the task from a multilingual perspective:
\n\nNOTE: Although we recommend participating in both languages (English and Spanish), it is possible to address the problem just for one language.
\n\nAward
\n\nWe are happy to announce that the best performing team at the 9th International Competition on Author Profiling will be awarded 300 euros, sponsored by Symanto.
\n\nData
\n\nInput
\n\nThe uncompressed dataset consists of a folder per language (en, es). Each folder contains:
\n\nThe format of the XML files is:
\n\n<author lang="en">\n <documents>\n <document>Tweet 1 textual contents</document>\n <document>Tweet 2 textual contents</document>\n ...\n </documents>\n </author>\n\n\n
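As an illustration of the XML format above, the following minimal Python sketch reads one author file and collects its tweets. The function name read_author_tweets and the inline sample are assumptions made for this example and are not part of the task software.

```python
import xml.etree.ElementTree as ET


def read_author_tweets(xml_text):
    # Hypothetical helper: parse one author file given as a string.
    root = ET.fromstring(xml_text)
    lang = root.get('lang')
    # Each <document> element holds the text of one tweet.
    tweets = [doc.text or '' for doc in root.iter('document')]
    return lang, tweets


# Inline sample mirroring the format described above.
sample = '''<author lang='en'>
  <documents>
    <document>Tweet 1 textual contents</document>
    <document>Tweet 2 textual contents</document>
  </documents>
</author>'''

lang, tweets = read_author_tweets(sample)
```

For files on disk, ET.parse(path).getroot() could replace ET.fromstring in the same way.
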
The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.
\n\nb2d5748083d6fdffec6c2d68d4d4442d:::0\n 2bed15d46872169dc7deaf8d2b43a56:::0\n 8234ac5cca1aed3f9029277b2cb851b:::1\n 5ccd228e21485568016b4ee82deb0d28:::0\n 60d068f9cafb656431e62a6542de2dc0:::1\n ...\n\n\n
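A truth.txt file in this layout can be loaded with a few lines of Python. This is only a sketch: the function name parse_truth is hypothetical, and it assumes that every non-empty line has exactly one ::: separator and an integer label.

```python
def parse_truth(lines):
    # Hypothetical helper: map each author id to its integer truth label.
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        author_id, label = line.split(':::')
        labels[author_id] = int(label)
    return labels


# Two of the sample lines shown above.
truth = parse_truth([
    'b2d5748083d6fdffec6c2d68d4d4442d:::0',
    '60d068f9cafb656431e62a6542de2dc0:::1',
])
```
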
Output
\n\nYour software must take as input the absolute path to an unpacked dataset and must output, for each document of the dataset, a corresponding XML file that looks like this:
\n\n<author id="author-id"\n lang="en|es"\n type="0|1"\n />\n\n\n
The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.
\n\nIMPORTANT! Languages should not be mixed. Create one folder per language and place in it only the prediction files for that language.
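One possible way to produce the output described above is sketched below in Python. The function name write_prediction, the temporary output directory, and the author id abc123 are assumptions for illustration. The file name follows the recommendation to use the author id with an xml extension.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_prediction(out_dir, author_id, lang, label):
    # One folder per language, so predictions are never mixed.
    lang_dir = os.path.join(out_dir, lang)
    os.makedirs(lang_dir, exist_ok=True)
    elem = ET.Element('author', {'id': author_id, 'lang': lang, 'type': str(label)})
    path = os.path.join(lang_dir, author_id + '.xml')
    ET.ElementTree(elem).write(path, encoding='utf-8')
    return path


# Demo with a made-up author id, written to a temporary directory.
out_dir = tempfile.mkdtemp()
path = write_prediction(out_dir, 'abc123', 'en', 1)
written = ET.parse(path).getroot()
```
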
\n\nEvaluation
\n\nThe performance of your system will be ranked by accuracy. For each language, we will calculate individual accuracies in discriminating between the two classes. Finally, we will average the accuracy values per language to obtain the final ranking.
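The ranking measure described above can be sketched as follows. This is not the official evaluation script. The function names and the toy labels are assumptions, and an author without a prediction is counted as an error here.

```python
def accuracy(truth, predicted):
    # Fraction of authors whose predicted label matches the truth label.
    correct = sum(1 for author, label in truth.items()
                  if predicted.get(author) == label)
    return correct / len(truth)


def final_score(per_language):
    # per_language maps a language code to a (truth, predicted) pair.
    accs = [accuracy(t, p) for t, p in per_language.values()]
    return sum(accs) / len(accs)


# Toy example with made-up author ids.
runs = {
    'en': ({'a': 0, 'b': 1}, {'a': 0, 'b': 0}),
    'es': ({'c': 1, 'd': 1}, {'c': 1, 'd': 1}),
}
score = final_score(runs)
```
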
\n\nRelated Work
\n\nYou may request access to the files in this upload, provided that you fulfil the conditions below. The decision of whether to grant/deny access is solely under the responsibility of the record owner.
", "allow_guest_requests": true, "allow_user_requests": true, "secret_link_expiration": 30 } }, "communities": { "default": "15e3c329-b60b-4e66-86e8-8572ddccbf2c", "entries": [ { "access": { "member_policy": "open", "members_visibility": "public", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2019-10-08T12:14:16.763411+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "15e3c329-b60b-4e66-86e8-8572ddccbf2c", "links": {}, "metadata": { "curation_policy": "", "page": "PAN (pan.webis.de) is a series of scientific events and shared tasks on digital text forensics and stylometry.
", "title": "PAN" }, "revision_id": 0, "slug": "pan", "updated": "2019-11-06T11:14:54.540579+00:00" } ], "ids": [ "15e3c329-b60b-4e66-86e8-8572ddccbf2c" ] }, "id": "4603577", "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.4603577", "provider": "datacite" } } }, "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.4603578", "provider": "datacite" }, "oai": { "identifier": "oai:zenodo.org:4603578", "provider": "oai" } }, "revision_id": 7, "stats": { "all_versions": { "data_volume": 874346564.0, "downloads": 353, "unique_downloads": 282, "unique_views": 3063, "views": 4121 }, "this_version": { "data_volume": 612086856.0, "downloads": 222, "unique_downloads": 207, "unique_views": 2158, "views": 2957 } }, "status": "published", "updated": "2022-06-11T07:53:43.224967+00:00", "versions": { "index": 1, "is_latest": false } }