Dataset Open Access
Martin Krallinger;
Aitor Gonzalez-Agirre;
Alejandro Asensio
Introduction
The Mesinesp (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) development set contains 750 records manually indexed by seven experienced medical literature indexers. Records are indexed with DeCS codes, roughly the Spanish equivalent of MeSH terms. Records were distributed so that each article was annotated by at least two different human indexers.
The data annotation process consisted of two steps:
Agreement between the indexers on these annotations was then measured using the Jaccard index.
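As an illustration of the agreement measure, the sketch below computes the Jaccard index (size of the intersection divided by size of the union) between the DeCS code sets that two indexers assigned to the same record. The function name `jaccard_index`, the placeholder codes, and the treatment of two empty annotations as full agreement are assumptions made for this example; the page does not state how per-record scores were aggregated into an overall agreement figure.

```python
def jaccard_index(codes_a, codes_b):
    """Jaccard index between two collections of DeCS codes."""
    set_a, set_b = set(codes_a), set(codes_b)
    union = set_a | set_b
    if not union:
        # Assumption for this sketch: two empty annotations agree fully.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical annotations of one record by two indexers.
annotator_1 = ["code1", "code2", "code3"]
annotator_2 = ["code2", "code3", "code4"]

# Two shared codes out of four distinct codes.
score = jaccard_index(annotator_1, annotator_2)
print(score)
```

A score of 1.0 means both indexers chose exactly the same codes, and 0.0 means they share none.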
Records consist mainly of titles and abstracts of medical literature from the IBECS and LILACS databases.
Zip structure
The zip file contains two different development sets:
Corpus format
Each dataset is a JSON object with a single key, "articles", whose value is a list of documents. The raw file therefore has one line per document plus two additional lines (the first and the last) that enclose the list. The expected data types are as follows:
{"articles":[
{"abstractText":str,"db":str,"decsCodes":list,"id":str,"journal":str,"title":str,"year":int},
...
]}
To clarify, the fields appear in each document in the following order (this example is pretty-printed for readability):
{
  "articles": [
    {
      "abstractText": "Content of the abstract",
      "db": "Name of the source database",
      "decsCodes": [
        "code1",
        "code2",
        "code3"
      ],
      "id": "Id of the document",
      "journal": "Name of the journal",
      "title": "Title of the document",
      "year": 2019
    }
  ]
}
Note: The fields "db", "journal" and "year" might be null.
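The format described above can be read with a standard JSON parser. The following is a minimal sketch, assuming the files are UTF-8 encoded; the helper names `parse_articles` and `load_articles`, the inline sample, and its ids are hypothetical and only mirror the documented structure. It also shows one way to account for the nullable fields, which arrive as `None` in Python.

```python
import json


def parse_articles(text):
    """Parse the JSON text of one dataset and return its list of documents."""
    return json.loads(text)["articles"]


def load_articles(path):
    """Read a dataset file from disk (UTF-8 is an assumption of this sketch)."""
    with open(path, encoding="utf-8") as handle:
        return parse_articles(handle.read())


# Hypothetical sample in the raw layout: one line per document,
# enclosed by a first and a last line.
sample = """{"articles":[
{"abstractText":"Content of the abstract","db":null,"decsCodes":["code1","code2"],"id":"doc-1","journal":null,"title":"Title of the document","year":2019},
{"abstractText":"Another abstract","db":"IBECS","decsCodes":["code3"],"id":"doc-2","journal":"Name of the journal","title":"Another title","year":null}
]}"""

articles = parse_articles(sample)

# "db", "journal" and "year" may be null, so test for None before using them.
missing_year = [doc["id"] for doc in articles if doc["year"] is None]
codes_per_doc = {doc["id"]: len(doc["decsCodes"]) for doc in articles}
print(missing_year, codes_per_doc)
```

For a file extracted from the zip, `load_articles` would be called with that file's path instead of parsing the inline sample.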
Name | Size | MD5 |
---|---|---|
mesinesp-development-set.zip | 1.0 MB | 58da931670a51b078cc5e193aa9d91e1 |
Krallinger M, Krithara A, Nentidis A, Paliouras G, Villegas M. BioASQ at CLEF2020: Large-Scale Biomedical Semantic Indexing and Question Answering. In European Conference on Information Retrieval 2020 Apr 14 (pp. 550-556). Springer, Cham.