Dataset Open Access

MESINESP: Post-workshop datasets. Silver Standard and annotator records

Martin Krallinger; Carlos Rodríguez-Penagos; Aitor Gonzalez-Agirre; Alejandro Asensio


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Martin Krallinger</dc:creator>
  <dc:creator>Carlos Rodríguez-Penagos</dc:creator>
  <dc:creator>Aitor Gonzalez-Agirre</dc:creator>
  <dc:creator>Alejandro Asensio</dc:creator>
  <dc:date>2020-07-15</dc:date>
  <dc:description>The MESINESP Challenge (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) was held in May-June 2020. Following strong participation and the manual annotation of an evaluation dataset, two additional datasets are now released:

1) "all_annotations_withIDsv3.tsv" contains a tab-separated file with all manual annotations (both validated and non-validated) of the evaluation dataset prepared for the competition. It contains the following fields:


	annotatorName: Human annotator ID
	documentId: Document ID in the source database
	decsCode: The DeCS code that was added or validated
	timestamp: When the annotation was added
	validated: Whether the annotation had been validated by another annotator at that point
	SpanishTerm: The Spanish descriptor corresponding to the DeCS code
	mesinespId: The internal document ID in the distributed evaluation file
	dataset: Whether the document is part of the evaluation (dev) or the test set
	source: The database the document was taken from


Example:

annotatorName    documentId    decsCode    timestamp    validated    SpanishTerm    mesinespId    dataset    source
A7    biblio-1001069    6893    2020-01-17T11:27:07.000Z    false    caballos    mesinesp-dev-671    dev    LILACS
A7    biblio-1001069    4345    2020-01-17T11:27:12.000Z    false    perros    mesinesp-dev-671    dev    LILACS


2) A "Silver Standard" created from the 24 system runs submitted by 6 participating teams. It contains each of the submitted DeCS code for each document in the test set, as well as other information that can help ascertain reliability and source for anyone that wants to use this dataset to enrich their training data. It contains more that 5.8 million datapoints, and is structured as follows


	SubmissionName: Alias of the team that submitted the run
	REALdocumentId: The real ID of the document
	mesinespId: The MESINESP-assigned ID in the evaluation dataset
	docSource: The source database
	decsCode: The DeCS code assigned to the document by the team's system
	SpanishTerm: The Spanish descriptor of the DeCS code
	MiF: The micro-F1 scored by that system's run
	MiR: The micro-recall scored by that system's run
	MiP: The micro-precision scored by that system's run
	Acc: The accuracy scored by that system's run
	consensus: The number of runs in which that DeCS code was assigned to this document by the participating teams (max. 24)


Example:

SubmissionName    REALdocumentId    mesinespId    docSource    decsCode    SpanishTerm    MiF    MiR    MiP    Acc    consensus
AN    ibc-177565    mesinesp-evaluation-00001    IBECS    28567    riesgo    0.2054    0.1930    0.2196    0.1198    4
AN    ibc-177565    mesinesp-evaluation-00001    IBECS    15335    trabajo    0.2054    0.1930    0.2196    0.1198    4
AN    ibc-177565    mesinesp-evaluation-00001    IBECS    33182    conocimiento    0.2054    0.1930    0.2196    0.1198    7

For a detailed description of the Challenge, please cite:
Nentidis, Anastasios; Krithara, Anastasia; Bougiatiotis, Konstantinos; Krallinger, Martin; Rodriguez-Penagos, Carlos; Villegas, Marta; Paliouras, Georgios. Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering (2020). Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece, September 22--25.

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).</dc:description>
  <dc:description>@inproceedings{nentidis2020overview,
  title={Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering},
  author={Nentidis, Anastasios and Krithara, Anastasia and Bougiatiotis, Konstantinos and Krallinger, Martin and Rodriguez-Penagos, Carlos and Villegas, Marta and Paliouras, Georgios},
  booktitle={Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece, September 22--25, 2020},
  volume={12260},
  year={2020},
  organization={Springer}
}</dc:description>
  <dc:identifier>https://zenodo.org/record/3946558</dc:identifier>
  <dc:identifier>10.5281/zenodo.3946558</dc:identifier>
  <dc:identifier>oai:zenodo.org:3946558</dc:identifier>
  <dc:language>spa</dc:language>
  <dc:relation>doi:10.5281/zenodo.3946557</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/medicalnlp</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>semantic indexing</dc:subject>
  <dc:subject>NLP</dc:subject>
  <dc:subject>DeCS</dc:subject>
  <dc:title>MESINESP: Post-workshop datasets. Silver Standard and annotator records</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
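
Loading the annotator records: a minimal sketch in Python, assuming "all_annotations_withIDsv3.tsv" has been downloaded from this record and pandas is installed; file and column names are taken from the description above.

import pandas as pd

# Read the tab-separated annotation records.
ann = pd.read_csv("all_annotations_withIDsv3.tsv", sep="\t")

# Keep only annotations that had been validated by a second annotator
# (the "validated" column holds true/false values).
validated = ann[ann["validated"].astype(str).str.lower() == "true"]

# Collect the validated DeCS codes per source document.
codes_per_doc = validated.groupby("documentId")["decsCode"].apply(set)
print(len(codes_per_doc), "documents with at least one validated code")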
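
Filtering the Silver Standard by consensus, a sketch under the same assumptions: "silver_standard.tsv" is a hypothetical file name (substitute the actual file from this record), and the threshold of 8 runs is an arbitrary choice for illustration.

import pandas as pd

silver = pd.read_csv("silver_standard.tsv", sep="\t")  # hypothetical file name

# The consensus column already counts in how many of the 24 runs a
# DeCS code was assigned to a document, so one row per
# (document, code) pair is enough.
pairs = silver.drop_duplicates(subset=["REALdocumentId", "decsCode"])

# Keep only codes that at least a third of the runs agreed on.
CONSENSUS_MIN = 8  # arbitrary threshold chosen for illustration
reliable = pairs[pairs["consensus"] >= CONSENSUS_MIN]

# DeCS labels per document, usable as weakly supervised training data.
labels = reliable.groupby("REALdocumentId")["decsCode"].apply(list)
print(labels.head())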