Dataset Open Access

MESINESP: Post-workshop datasets. Silver Standard and annotator records

Martin Krallinger; Carlos Rodríguez-Penagos; Aitor Gonzalez-Agirre; Alejandro Asensio


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">spa</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">semantic indexing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">NLP</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">DeCS</subfield>
  </datafield>
  <controlfield tag="005">20200716005920.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">@inproceedings{nentidis2020overview,
  title={Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering},
  author={Nentidis, Anastasios and Krithara, Anastasia and Bougiatiotis, Konstantinos and Krallinger, Martin and Rodriguez-Penagos, Carlos and Villegas, Marta and Paliouras, Georgios},
  booktitle={Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece, September 22--25, 2020},
  volume={12260},
  year={2020},
  organization={Springer}
}</subfield>
  </datafield>
  <controlfield tag="001">3946558</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">September 22--25</subfield>
    <subfield code="g">BioASQ 2020</subfield>
    <subfield code="a">The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering</subfield>
    <subfield code="c">Thessaloniki, Greece</subfield>
    <subfield code="n">Task Mesinesp</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Barcelona Supercomputing Center</subfield>
    <subfield code="a">Carlos Rodríguez-Penagos</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Barcelona Supercomputing Center</subfield>
    <subfield code="a">Aitor Gonzalez-Agirre</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Barcelona Supercomputing Center</subfield>
    <subfield code="a">Alejandro Asensio</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">6763519</subfield>
    <subfield code="z">md5:415b0dd7a193160da73ea938e67c4fee</subfield>
    <subfield code="u">https://zenodo.org/record/3946558/files/all_annotations_withIDsv3.tsv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">64141446</subfield>
    <subfield code="z">md5:891a0483d212199f339751dc51300a37</subfield>
    <subfield code="u">https://zenodo.org/record/3946558/files/mesinesp_silver_standard.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u">http://bioasq.org/workshop</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-07-15</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-medicalnlp</subfield>
    <subfield code="o">oai:zenodo.org:3946558</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Barcelona Supercomputing Center</subfield>
    <subfield code="a">Martin Krallinger</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">MESINESP: Post-workshop datasets. Silver Standard and annotator records</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-medicalnlp</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The MESINESP&amp;nbsp;(Spanish BioASQ track, see https://temu.bsc.es/mesinesp) Challenge was held in May-June 2020, and as a result of a strong participation and the manual annotation of an evaluation dataset, two additional datasets are released now:&lt;/p&gt;

&lt;p&gt;1) &amp;quot;all_annotations_withIDsv3.tsv&amp;quot; is a tab-separated file with all manual annotations (both validated and non-validated) of the evaluation dataset prepared for the competition. It contains the following fields:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;annotatorName: Human annotator id&lt;/li&gt;
	&lt;li&gt;documentId: Document id in the source database&lt;/li&gt;
	&lt;li&gt;decsCode: The DeCS code added to the document or validated&lt;/li&gt;
	&lt;li&gt;timestamp: When the annotation was added&lt;/li&gt;
	&lt;li&gt;validated: Whether the annotation had been validated by another annotator at that point&lt;/li&gt;
	&lt;li&gt;SpanishTerm: The Spanish descriptor corresponding to the DeCS code&lt;/li&gt;
	&lt;li&gt;mesinespId: The internal document id in the distributed evaluation file&lt;/li&gt;
	&lt;li&gt;dataset: Whether the document is part of the evaluation or the test set&lt;/li&gt;
	&lt;li&gt;source: The database the document was taken from&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;annotatorName  documentId      decsCode  timestamp                 validated  SpanishTerm  mesinespId        dataset  source
A7             biblio-1001069  6893      2020-01-17T11:27:07.000Z  false      caballos     mesinesp-dev-671  dev      LILACS
A7             biblio-1001069  4345      2020-01-17T11:27:12.000Z  false      perros       mesinesp-dev-671  dev      LILACS&lt;/pre&gt;
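As a minimal sketch of how this file can be consumed (assuming a header row carrying exactly the field names listed above; the two sample rows below are the ones from the example), the annotations can be grouped per document with Python's csv module:

```python
import csv
import io
from collections import defaultdict

# Two sample rows from the description, with an assumed header row.
SAMPLE = """annotatorName\tdocumentId\tdecsCode\ttimestamp\tvalidated\tSpanishTerm\tmesinespId\tdataset\tsource
A7\tbiblio-1001069\t6893\t2020-01-17T11:27:07.000Z\tfalse\tcaballos\tmesinesp-dev-671\tdev\tLILACS
A7\tbiblio-1001069\t4345\t2020-01-17T11:27:12.000Z\tfalse\tperros\tmesinesp-dev-671\tdev\tLILACS
"""

def codes_per_document(tsv_text, validated_only=False):
    """Group DeCS codes by mesinespId, optionally keeping only validated ones."""
    codes = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if validated_only and row["validated"] != "true":
            continue
        codes[row["mesinespId"]].add(row["decsCode"])
    return dict(codes)

print(codes_per_document(SAMPLE))
```

Passing validated_only=True keeps only annotations a second annotator has confirmed, which may be preferable when using the file as training signal.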


&lt;p&gt;2) A &amp;quot;Silver Standard&amp;quot; created from the 24 system runs submitted by 6 participating teams. It contains each DeCS code submitted for each document in the test set, together with information that helps ascertain the reliability and source of each label for anyone who wants to use this dataset to enrich their training data. It contains more than 5.8 million datapoints and is structured as follows:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;SubmissionName: Alias of the team that submitted the run&lt;/li&gt;
	&lt;li&gt;REALdocumentId: The real id of the document&lt;/li&gt;
	&lt;li&gt;mesinespId: The MESINESP-assigned id in the evaluation dataset&lt;/li&gt;
	&lt;li&gt;docSource: The source database&lt;/li&gt;
	&lt;li&gt;decsCode: The DeCS code assigned to the document by the team&amp;#39;s system&lt;/li&gt;
	&lt;li&gt;SpanishTerm: The Spanish descriptor of the DeCS code&lt;/li&gt;
	&lt;li&gt;MiF: The micro-F1 score of that system&amp;#39;s run&lt;/li&gt;
	&lt;li&gt;MiR: The micro-recall score of that system&amp;#39;s run&lt;/li&gt;
	&lt;li&gt;MiP: The micro-precision score of that system&amp;#39;s run&lt;/li&gt;
	&lt;li&gt;Acc: The accuracy score of that system&amp;#39;s run&lt;/li&gt;
	&lt;li&gt;consensus: The number of runs (out of a maximum of 24) in which that DeCS code was assigned to this document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Example:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;SubmissionName  REALdocumentId  mesinespId                 docSource  decsCode  SpanishTerm   MiF     MiR     MiP     Acc     consensus
AN              ibc-177565      mesinesp-evaluation-00001  IBECS      28567     riesgo        0.2054  0.1930  0.2196  0.1198  4
AN              ibc-177565      mesinesp-evaluation-00001  IBECS      15335     trabajo       0.2054  0.1930  0.2196  0.1198  4
AN              ibc-177565      mesinesp-evaluation-00001  IBECS      33182     conocimiento  0.2054  0.1930  0.2196  0.1198  7&lt;/pre&gt;
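The consensus column makes it straightforward to distill the Silver Standard into per-document label sets. A hedged sketch (field names as listed above; the three rows are the sample lines, and min_consensus is an illustrative threshold, not one prescribed by the dataset):

```python
import csv
import io
from collections import defaultdict

# Three sample rows from the description, with an assumed header row.
SAMPLE = """SubmissionName\tREALdocumentId\tmesinespId\tdocSource\tdecsCode\tSpanishTerm\tMiF\tMiR\tMiP\tAcc\tconsensus
AN\tibc-177565\tmesinesp-evaluation-00001\tIBECS\t28567\triesgo\t0.2054\t0.1930\t0.2196\t0.1198\t4
AN\tibc-177565\tmesinesp-evaluation-00001\tIBECS\t15335\ttrabajo\t0.2054\t0.1930\t0.2196\t0.1198\t4
AN\tibc-177565\tmesinesp-evaluation-00001\tIBECS\t33182\tconocimiento\t0.2054\t0.1930\t0.2196\t0.1198\t7
"""

def silver_labels(tsv_text, min_consensus=5):
    """Keep only DeCS codes that at least min_consensus of the 24 runs agreed on."""
    labels = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if int(row["consensus"]) >= min_consensus:
            labels[row["mesinespId"]].add(row["decsCode"])
    return dict(labels)

print(silver_labels(SAMPLE))
```

Raising min_consensus trades recall for precision in the derived training labels; the per-run MiF/MiR/MiP/Acc columns could also be used to weight runs instead of counting them equally.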

&lt;p&gt;&lt;strong&gt;For a detailed description of the Challenge, please cite:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Anastasios Nentidis, Anastasia Krithara, Konstantinos Bougiatiotis, Martin Krallinger, Carlos Rodriguez-Penagos, Marta Villegas and Georgios Paliouras.&lt;/em&gt; &lt;strong&gt;Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering&lt;/strong&gt; (2020). Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020). Thessaloniki, Greece, September 22--25, 2020&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Funded by the Plan de Impulso de las Tecnolog&amp;iacute;as del Lenguaje (Plan TL).&lt;/strong&gt;&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3946557</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3946558</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>