Published February 7, 2023 | Version 1.1
Dataset Restricted

PAN23 Profiling Cryptocurrency Influencers with Few-shot Learning

  • 1. Symanto Research
  • 2. Universitat Politècnica de València


This is the dataset for the shared task on Profiling Cryptocurrency Influencers with Few-shot Learning. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.


Task: This shared task aims to profile cryptocurrency influencers on social media from a low-resource perspective. It also proposes categorizing two related aspects of the influencers, their interests and their intents, again in a low-resource setting. Specifically, we focus on English Twitter posts in three sub-tasks:

  1. Low-resource influencer profiling (subtask1):
    • Input:
      32 users per label with a maximum of 10 English tweets each.
      Classes: (1) null, (2) nano, (3) micro, (4) macro, (5) mega
    • Official evaluation metric: Macro F1
    • Submission: TIRA.
    • Baselines: User-character Logistic Regression; t5-large (bi-encoders), zero-shot [7]; t5-large (label tuning), few-shot [7]
  2. Low-resource influencer interest identification (subtask2):
    • Input:
      64 users per label with 1 English tweet each.
      Classes: (1) technical information, (2) price update, (3) trading matters, (4) gaming, (5) other
    • Official evaluation metric: Macro F1
    • Submission: TIRA.
    • Baselines: User-character Logistic Regression; t5-large (bi-encoders), zero-shot [7]; t5-large (label tuning), few-shot [7]
  3. Low-resource influencer intent identification (subtask3):
    • Input:
      64 users per label with 1 English tweet each.
      Classes: (1) subjective opinion, (2) financial information, (3) advertising, (4) announcement
    • Official evaluation metric: Macro F1
    • Submission: TIRA.
    • Baselines: User-character Logistic Regression; t5-large (bi-encoders), zero-shot [7]; t5-large (label tuning), few-shot [7]
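
All three sub-tasks are scored with Macro F1, the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many users it has. The following is a minimal sketch of that computation, not the official evaluation script: the function name `macro_f1`, the toy gold and predicted labels, and the choice to score a class with no gold and no predicted instances as 0 are all assumptions made for illustration. Only the subtask1 class names come from the description above.

```python
from collections import Counter


def macro_f1(gold, pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a class that never appears in gold or pred scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Class names for subtask1, as listed in the task description.
SUBTASK1_LABELS = ["null", "nano", "micro", "macro", "mega"]

# Hypothetical toy data: six users, one "mega" user mislabeled as "macro".
gold = ["null", "nano", "micro", "macro", "mega", "mega"]
pred = ["null", "nano", "micro", "macro", "mega", "macro"]

score = macro_f1(gold, pred, SUBTASK1_LABELS)
print(f"Macro F1: {score:.4f}")
```

In this toy example a single error lowers the F1 of two classes at once, "macro" (one false positive) and "mega" (one false negative), which is why Macro F1 is sensitive to mistakes on small classes.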


  • 1.0: initial upload
  • 1.1: fixed a minor bug where the data for some users contained non-English text. Since English is the target language of the competition, all non-English texts have been replaced or removed.



The record is publicly accessible, but files are restricted to users with access.

Request access

You need to satisfy these conditions in order for this request to be accepted:

Please let us know what you want to use the dataset for. Please also include your institution and supervisor, if any. 
