Abstracts from Lilacs and Ibecs with ICD10 codes
Description
JSON file with abstracts from Lilacs and Ibecs, each associated with ICD10 codes (ICD10-CM and ICD10-PCS; CIE10 in Spanish).
These databases annotate some of their documents with MeSH terms. Those MeSH terms were translated into ICD10 codes (ICD10-CM and ICD10-PCS) using the UMLS Metathesaurus. Every abstract has at least one ICD10 code.
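One common way to translate MeSH terms into ICD10 codes with the UMLS Metathesaurus is to link codes that share a concept identifier (CUI) in the `MRCONSO.RRF` file. The sketch below assumes that "same CUI" bridge, which is not necessarily the exact procedure used to build this corpus. It also assumes the MeSH codes are MeSH descriptor identifiers as stored in the `CODE` column for source `MSH`. The function name `mesh_to_icd10` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Column positions in MRCONSO.RRF (pipe-delimited, one atom per line).
CUI, SAB, CODE = 0, 11, 13
ICD10_SOURCES = {"ICD10CM", "ICD10PCS"}


def mesh_to_icd10(mrconso_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map MeSH codes to ICD10-CM/PCS codes that share a UMLS CUI.

    Assumption: the bridge is "same CUI"; the corpus authors may have
    used additional filtering not shown here.
    """
    mesh_by_cui: Dict[str, Set[str]] = defaultdict(set)
    icd_by_cui: Dict[str, Set[str]] = defaultdict(set)
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE:
            continue  # skip malformed or truncated lines
        cui, sab, code = fields[CUI], fields[SAB], fields[CODE]
        if sab == "MSH":
            mesh_by_cui[cui].add(code)
        elif sab in ICD10_SOURCES:
            icd_by_cui[cui].add(code)

    mapping: Dict[str, Set[str]] = defaultdict(set)
    for cui, mesh_codes in mesh_by_cui.items():
        for mesh_code in mesh_codes:
            mapping[mesh_code] |= icd_by_cui.get(cui, set())
    # Keep only MeSH codes that reached at least one ICD10 code.
    return {k: v for k, v in mapping.items() if v}
```

Usage might look like `with open("MRCONSO.RRF", encoding="utf-8") as fh: table = mesh_to_icd10(fh)`, where the file path is a placeholder for a local UMLS installation.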
In addition, each MeSH code provided by the databases (Lilacs and Ibecs) comes with a "word" describing it. These words were used to add further ICD10 codes: we applied strict string matching to check whether each word is the descriptor of any ICD10 code in the Spanish version (CIE10).
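Strict string matching can be sketched as a lookup in an inverted index from CIE10 descriptor to code. This assumes "strict" means exact equality with no case folding or accent stripping. The helper names and the two-entry toy table are hypothetical; a real run would load the full Spanish CIE10 descriptor list.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def build_descriptor_index(cie10: Mapping[str, str]) -> Dict[str, List[str]]:
    """Invert {code: Spanish descriptor} into {descriptor: [codes]}."""
    index: Dict[str, List[str]] = defaultdict(list)
    for code, descriptor in cie10.items():
        index[descriptor].append(code)
    return dict(index)


def codes_for_word(word: str, index: Mapping[str, List[str]]) -> List[str]:
    """Exact match only: case and accents must be identical."""
    return sorted(index.get(word, []))


# Toy table for illustration only.
toy_index = build_descriptor_index({"A00": "Cólera", "J45": "Asma"})
```

With this choice, `codes_for_word("Asma", toy_index)` finds `J45`, while `"asma"` or `"Colera"` find nothing. That is the price of strict matching: high precision, but misses on trivial spelling variants.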
The format of the JSON file is the following:
{'articles': [{'title': 'title', 'pmid': 'pmid', 'abstractText': 'abstract (in Spanish)', 'Mesh': [{'Code': 'MeSHCode', 'Word': 'reference', 'CIE': [CIE10_1, CIE10_2, ...]}, ...] }, ...] }
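A minimal loader for this structure is sketched below. It assumes the archive contains a single file that is valid JSON (the schema above is written with single quotes, but real JSON requires double quotes) and that the `CIE` entries are strings. The name of the JSON file inside the archive is not documented, so the sketch takes the first `.json` member. All function names are hypothetical.

```python
import json
import zipfile
from typing import Any, Dict, List, Set


def parse_articles(raw: bytes) -> List[Dict[str, Any]]:
    """Parse the JSON payload and return the list under the 'articles' key."""
    return json.loads(raw)["articles"]


def load_articles(zip_path: str) -> List[Dict[str, Any]]:
    """Read the first .json member of the archive (member name is assumed, not documented)."""
    with zipfile.ZipFile(zip_path) as zf:
        members = [n for n in zf.namelist() if n.endswith(".json")]
        if not members:
            raise ValueError(f"no .json file found in {zip_path}")
        return parse_articles(zf.read(members[0]))


def icd10_codes(article: Dict[str, Any]) -> Set[str]:
    """Collect the distinct CIE10 codes attached to one article through its MeSH entries."""
    codes: Set[str] = set()
    for mesh in article.get("Mesh", []):
        codes.update(mesh.get("CIE", []))
    return codes
```

For example, `articles = load_articles("abstractsWithCIE10.zip")` followed by `sorted(icd10_codes(articles[0]))` lists the codes of the first abstract. Deduplicating with a set matters because two MeSH entries of the same article can map to the same ICD10 code.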
Corpus statistics:
- There are 176,294 abstracts.
- On average, each abstract has 2.5 associated ICD10 codes.
- There are 3,103 unique ICD10 codes (ICD10-CM and ICD10-PCS).
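Statistics like these can be recomputed from the article list. The sketch below assumes "associated ICD10 codes" means distinct codes per abstract, collected across all of its MeSH entries. It is not the authors' script, so its figures may differ slightly from those reported. The function name is hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def corpus_stats(articles: List[Dict[str, Any]]) -> Tuple[int, float, int]:
    """Return (number of abstracts, mean distinct codes per abstract, unique codes overall)."""
    per_article: List[Set[str]] = []
    for article in articles:
        codes: Set[str] = set()
        for mesh in article.get("Mesh", []):
            codes.update(mesh.get("CIE", []))
        per_article.append(codes)
    n = len(per_article)
    mean_codes = sum(len(c) for c in per_article) / n if n else 0.0
    unique_codes = set().union(*per_article)
    return n, mean_codes, len(unique_codes)
```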
Files
abstractsWithCIE10.zip (83.9 MB)
md5:e96ecc06b88b582d7201acc092e9e9a9
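To check that a downloaded copy of the archive is intact, one can compare its MD5 digest with the checksum given for the file. The sketch below reads the file in chunks so the whole archive never has to sit in memory. The function names and the 1 MiB chunk size are illustrative choices.

```python
import hashlib
from typing import Iterable

EXPECTED_MD5 = "e96ecc06b88b582d7201acc092e9e9a9"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a stream of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks to keep memory use low."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))
```

A check could then be `md5_of_file("abstractsWithCIE10.zip") == EXPECTED_MD5`, assuming the archive was saved under its original name.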