Published July 24, 2021 | Version v1
Dataset Open

SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing

  • 1. University of Arizona
  • 2. George Mason University
  • 3. Boston Children's Hospital and Harvard Medical School

Description

Data sharing restrictions are common for NLP datasets. For example, Twitter policies do not allow sharing of tweet text, though tweet IDs may be shared. Such restrictions are even more common in clinical NLP, where patient health information must be protected, and annotations over health text, when released at all, often require signing complex data use agreements. The SemEval-2021 Task 10 framework asks participants to develop semantic annotation systems in the face of these data sharing constraints. A participant's goal is to develop an accurate system for a target domain when annotations exist for a related (source) domain but cannot be distributed. Instead of annotated training data, participants are given a model trained on the source annotations. Then, given unlabeled target domain data, they are asked to make predictions.
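The setting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_source_model` is a hypothetical stand-in for the distributed source-domain model, the labels and example texts are invented, and confidence-thresholded pseudo-labeling is just one common way to use unlabeled target data. It is not presented as the task's baseline or as any participant's method.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface for the distributed source-domain model:
# it maps a text to a (label, confidence) pair. No source annotations are available.
SourceModel = Callable[[str], Tuple[str, float]]


def toy_source_model(text: str) -> Tuple[str, float]:
    """Toy stand-in for a trained model; the rules and scores are invented."""
    tokens = text.lower().split()
    if "no" in tokens or "denies" in tokens:
        return "negated", 0.9
    if "without" in tokens:
        return "negated", 0.6
    return "affirmed", 0.8


def pseudo_label(
    model: SourceModel,
    unlabeled_texts: Sequence[str],
    threshold: float = 0.75,
) -> List[Tuple[str, str]]:
    """Keep only target-domain predictions whose confidence reaches the threshold.

    The selected (text, label) pairs could then be used to adapt the model
    to the target domain without ever touching the source annotations.
    """
    selected = []
    for text in unlabeled_texts:
        label, confidence = model(text)
        if confidence >= threshold:
            selected.append((text, label))
    return selected


# Invented unlabeled target-domain examples.
target_texts = [
    "patient denies chest pain",
    "seen without fever",
    "reports mild cough",
]
pseudo_labeled = pseudo_label(toy_source_model, target_texts)
```

In this sketch the second text is dropped because the toy model's confidence (0.6) falls below the threshold; a real system would choose the threshold and adaptation strategy empirically.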

Website: https://machine-learning-for-medical-language.github.io/source-free-domain-adaptation/

CodaLab site: https://competitions.codalab.org/competitions/26152

GitHub repository: https://github.com/Machine-Learning-for-Medical-Language/source-free-domain-adaptation

Files

baselines.zip

Files (205.6 kB)

Checksum                               Size
md5:8dc156d135d41cf0905f9568d7791858   12.2 kB
md5:b4bd608b9a18a2bc607bc03c0b7ef926   109.0 kB
md5:56aab514f9450d8dcf6e5f1e845df789   2.3 kB
md5:171f3ab4b1545d1333753c3a3b71b551   41.3 kB
md5:b0c9194e0fba78134e29b9ba5d14de1a   40.8 kB
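The MD5 checksums listed above can be used to confirm that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and because this listing does not show which checksum belongs to which file, the pairing of a local file with an expected value is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may carry the "md5:" prefix used in the listing above.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:8dc156d135d41cf0905f9568d7791858")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) has that digest. MD5 is adequate for detecting accidental corruption but should not be relied on as protection against deliberate tampering.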

Additional details

Funding

  • National Institutes of Health: Temporal relation discovery for clinical text (5R01LM010090-07)
  • National Institutes of Health: Automated domain adaptation for clinical natural language processing (1R01LM012918-01)