TopicTracker: a Python pipeline to search, download and explore PubMed entries
Creators
- 1. University of Zurich - Institute of Biomedical Ethics and History of Medicine
Description
TopicTracker is a Python pipeline intended to streamline and simplify the retrieval and exploration of large numbers of PubMed entries. The software is divided into three Jupyter notebooks: 1. Search and download; 2. Content analyser; 3. Interactive data exploration.
The first notebook lets you build PubMed queries, download entries, parse them and save them to a .csv file. It takes a PubMed query as input and outputs a dataset (i.e. a folder containing a PubMed export, its metadata saved in the log file, and the Medline file, which you can use to import the references you are analysing into Zotero or similar software). The functions for searching, downloading and parsing are kept in a separate module so that they can be adapted for other projects if need be. The output of the first notebook can be explored with the second and third notebooks of this collection.
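The search-and-parse step described above can be sketched roughly as follows. This is an illustrative sketch, not the code shipped in the TopicTracker module: the function names `build_esearch_url`, `parse_medline` and `records_to_csv`, the chosen CSV columns and the toy records are all assumptions. It builds a query URL for the public NCBI E-utilities ESearch endpoint without sending a request, then parses MEDLINE-formatted text into rows.

```python
import csv
import io
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=100):
    """Return an ESearch URL for a PubMed query (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_medline(text):
    """Parse MEDLINE-format text into a list of {tag: [values]} dicts."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates two records.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation lines are indented by six spaces.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4:6] == "- ":
            # Tags are padded to four characters and followed by "- ".
            last_tag = line[:4].strip()
            current.setdefault(last_tag, []).append(line[6:].strip())
    if current:
        records.append(current)
    return records


def records_to_csv(records, fields=("PMID", "DP", "TI", "MH")):
    """Serialise the selected tags to CSV text, joining repeated tags with '; '."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for record in records:
        writer.writerow(["; ".join(record.get(f, [])) for f in fields])
    return buffer.getvalue()


# Two made-up records, used only to exercise the parser.
SAMPLE = """PMID- 1
DP  - 2019 Mar
TI  - A first toy title that continues
      on a second line.
MH  - Ethics
MH  - Humans

PMID- 2
DP  - 2020
TI  - A second toy title.
MH  - Humans
"""

print(build_esearch_url("ethics[MeSH Terms]", retmax=5))
print(records_to_csv(parse_medline(SAMPLE)))
```

Keeping URL construction separate from parsing mirrors the idea of a reusable module: either half can be swapped out or tested without network access.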
The second notebook analyses the trends of entities over time. It takes a dataset as input (i.e. a folder containing a PubMed export generated with the first notebook of this collection, its metadata, and the Medline file) and outputs a set of .csv files and .svg plots with the trends of keywords, MeSH terms, authors, journals, lemmas in Title/Abstract, the number of conflict-of-interest (COI) statements, and lemma trends in COI statements. The .csv files can then be explored further with the third notebook of this collection.
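As a rough illustration of what a trend over time means here, the sketch below counts, for each year, how many entries mention a given term. The function name `yearly_trends`, the input shape and the optional normalisation by the number of entries per year are assumptions made for this example; the notebook itself may compute its trends differently.

```python
from collections import Counter, defaultdict


def yearly_trends(rows, normalise=True):
    """Compute per-year trends for every term.

    rows: iterable of (year, terms) pairs, e.g. (2019, ["Ethics", "Humans"]).
    Returns {term: {year: value}}, where value is the number of entries
    mentioning the term, or its share of that year's entries if normalise
    is True.
    """
    entries_per_year = Counter()
    counts = defaultdict(Counter)
    for year, terms in rows:
        entries_per_year[year] += 1
        for term in set(terms):  # count each term at most once per entry
            counts[term][year] += 1

    trends = {}
    for term, per_year in counts.items():
        trends[term] = {
            year: (per_year[year] / entries_per_year[year]
                   if normalise else per_year[year])
            for year in sorted(entries_per_year)
        }
    return trends


# Made-up entries: (publication year, MeSH terms).
ROWS = [
    (2019, ["Ethics", "Humans"]),
    (2019, ["Humans"]),
    (2020, ["Humans", "Humans"]),
]

for term, series in sorted(yearly_trends(ROWS).items()):
    print(term, series)
```

Normalising by the yearly number of entries is one way to avoid mistaking the overall growth of the literature for the growth of a single term; raw counts are available with `normalise=False`.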
The third notebook allows fully interactive exploration of the datasets preprocessed with the second notebook. You can select a dataset to work with, a set of entities to explore, and plot any entity or combination of entities.
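The selection logic behind such an exploration can be sketched without any widget code. The example below assumes a hypothetical long-format trend table with the columns `entity`, `year` and `count`; the actual .csv layout produced by the second notebook may differ, and `load_trends` and `select_series` are illustrative names only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format trend table; the real column layout may differ.
TRENDS_CSV = """entity,year,count
Ethics,2019,1
Ethics,2020,3
Humans,2019,2
Humans,2020,4
"""


def load_trends(text):
    """Read a long-format CSV into {entity: {year: count}}."""
    trends = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        trends[row["entity"]][int(row["year"])] = int(row["count"])
    return dict(trends)


def select_series(trends, entities, combine=False):
    """Return one series per selected entity, or their year-wise sum."""
    # Unknown entity names are silently skipped.
    chosen = {name: trends[name] for name in entities if name in trends}
    if not combine:
        return chosen
    total = defaultdict(int)
    for series in chosen.values():
        for year, count in series.items():
            total[year] += count
    return {" + ".join(chosen): dict(sorted(total.items()))}


trends = load_trends(TRENDS_CSV)
print(select_series(trends, ["Ethics", "Humans"]))
print(select_series(trends, ["Ethics", "Humans"], combine=True))
```

The returned dictionaries map a label to a year-indexed series, so each one can be handed directly to a plotting call or bound to a dropdown selection.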
Dependencies (and versions) are listed in every notebook. A couple of toy datasets are provided.
New in v1.3:
- Handled additional exceptions
- Updated some libraries
- Optimized the creation of the Medline file in notebook 1
To do in v1.4:
- Investigate why the PubMed API behaves unexpectedly with the PDAT tag
- Handle exceptions in the tabs of notebook 3 (empty files leading to empty DataFrames)
Files
Name | Size | Checksum |
---|---|---|
TopicTracker v1.3.zip | 29.0 MB | md5:14b84764c4f4a32ff3d64ff06cc7261e |
Additional details
References
- 10.1016/j.heliyon.2020.e04426