Published November 19, 2020 | Version v2
Dataset Open

Database of Parenthetic Biomedical Abbreviations

  • 1. Faculty of Medicine of Sfax, University of Sfax, Sfax, Tunisia
  • 2. Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia

Description

This dataset contains the biomedical abbreviations that appear in parentheses in the titles of scholarly publications indexed by PubMed between 1947 and 2019. Each abbreviation is extracted with the parenthetic level count algorithm and linked to the title, PMID, and publication year of the corresponding research paper. Each abbreviation is then annotated with its length and its counts of upper-case and lower-case letters. Finally, entities that contain at most one upper-case letter, fewer than three characters, eight or more characters, or a high proportion of non-alphanumeric characters are eliminated semi-automatically to keep the research database consistent.
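The extraction and filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`extract_parenthetic`, `describe`, `keep`) are hypothetical, the depth-counting loop is only one plausible reading of "parenthetic level count", and the 0.3 cutoff for the share of non-alphanumeric characters is an assumed value, since the description does not state a threshold. The semi-automatic (manual) part of the cleaning is not modelled.

```python
def extract_parenthetic(title):
    """Return the text of every top-level parenthesised span in a title.

    Assumption: "parenthetic level count" is read here as tracking the
    nesting depth and capturing a span when the depth returns to zero.
    """
    spans, depth, start = [], 0, None
    for i, ch in enumerate(title):
        if ch == "(":
            if depth == 0:
                start = i + 1  # first character after the opening parenthesis
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(title[start:i])
    return spans


def describe(entity):
    """Attach the length and the upper/lower-case letter counts to an entity."""
    return {
        "abbreviation": entity,
        "length": len(entity),
        "upper": sum(c.isupper() for c in entity),
        "lower": sum(c.islower() for c in entity),
    }


def keep(entity, max_non_alnum_rate=0.3):
    """Apply the automatic filters from the description.

    max_non_alnum_rate is an assumed threshold, not a value from the dataset.
    """
    if not 3 <= len(entity) <= 7:  # drop < 3 characters and >= 8 characters
        return False
    if sum(c.isupper() for c in entity) < 2:  # drop one or no upper-case letter
        return False
    non_alnum = sum(not c.isalnum() for c in entity)
    return non_alnum / len(entity) <= max_non_alnum_rate


title = ("Chronic obstructive pulmonary disease (COPD) and magnetic "
         "resonance imaging (MRI) findings (a review)")
candidates = extract_parenthetic(title)        # ['COPD', 'MRI', 'a review']
kept = [describe(e) for e in candidates if keep(e)]
```

In this example, `kept` holds the records for `COPD` and `MRI`, while `a review` is discarded because it has eight characters and no upper-case letter.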

Notes

To cite the work: Turki, H., Hadj Taieb, M. A., & Ben Aouicha, M. (2020). Enhancing filter-based parenthetic abbreviation extraction methods. Journal of the American Medical Informatics Association. doi:10.1093/jamia/ocaa314.

Files

Files (91.8 MB)

Name: abbrev6.csv
Size: 91.8 MB
md5:c4cb92b97066840ee9a3e227504b134c