There is a newer version of the record available.

Published February 8, 2021 | Version 1.3.0
Software Open

TopicTracker: a Python pipeline to search, download and explore PubMed entries

  • 1. University of Zurich - Institute of Biomedical Ethics and History of Medicine

Description

TopicTracker is a Python pipeline intended to streamline and simplify the retrieval and exploration of large numbers of PubMed entries. The software is divided into three Jupyter notebooks: 1. Search and download; 2. Content analyser; 3. Interactive data exploration.

The first notebook lets you build PubMed queries, download the matching entries, parse them and save them to a .csv file. It takes a PubMed query as input and outputs a dataset, i.e. a folder containing the PubMed export, its metadata saved in a log file, and the Medline file, which can be used to import the references under analysis into Zotero or similar software. The functions for searching, downloading and parsing live in a separate module so that they can be adapted more easily for other projects. The output of the first notebook can be explored with the second and third notebooks of this collection.
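As a rough illustration of the parse-and-save step, the sketch below reads Medline-format text and writes selected fields to CSV using only the standard library. It is not TopicTracker's own code: the function names `parse_medline` and `records_to_csv`, the choice of columns and the sample records are all assumptions made for this example.

```python
import csv
import io


def parse_medline(text):
    """Parse Medline-format text into a list of {tag: [values]} dicts.

    Hypothetical helper, not part of TopicTracker. Assumes the usual layout:
    a tag padded to four characters, a hyphen, a space, then the value;
    continuation lines indented by six spaces; records separated by blank lines.
    """
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation of the previous field's value.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) >= 5 and line[4] == "-":
            tag = line[:4].strip()
            current.setdefault(tag, []).append(line[6:].strip())
            last_tag = tag
    if current:
        records.append(current)
    return records


def records_to_csv(records, tags, handle):
    """Write one row per record; repeated fields are joined with '; '."""
    writer = csv.writer(handle)
    writer.writerow(tags)
    for rec in records:
        writer.writerow(["; ".join(rec.get(tag, [])) for tag in tags])


# Invented sample records, for illustration only.
SAMPLE = """PMID- 1
TI  - A first title that
      wraps onto a second line
DP  - 2019 Mar
MH  - Ethics
MH  - Humans

PMID- 2
TI  - Another title
DP  - 2020
"""

records = parse_medline(SAMPLE)
buffer = io.StringIO()
records_to_csv(records, ["PMID", "TI", "DP", "MH"], buffer)
print(buffer.getvalue())
```

Keeping every field as a list of values means repeated tags such as MH (MeSH terms) survive parsing, and the decision on how to flatten them is deferred to the CSV writer.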

The second notebook analyses the trends of entities over time. It takes as input a dataset (i.e. a folder containing a PubMed export generated with the first notebook of this collection, its metadata, and the Medline file) and outputs a set of .csv files and .svg plots with the trends of keywords, MeSH terms, authors, journals, lemmas in Title/Abstract, the number of conflict-of-interest (COI) statements, and lemma trends in COI statements. The .csv files can then be explored further with the third notebook of this collection.
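The idea of an entity trend can be sketched as a per-year count of the records that mention a term. The snippet below is a minimal, standard-library illustration and not the notebook's actual implementation: the record layout (`year` and `keywords` keys), the function names and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def yearly_term_counts(records, term_key="keywords", year_key="year"):
    """Count, for each year, how many records mention each term.

    Hypothetical helper: terms are lower-cased and counted at most once
    per record, so a repeated keyword does not inflate the trend.
    """
    counts = defaultdict(Counter)
    for rec in records:
        for term in {t.lower() for t in rec.get(term_key, [])}:
            counts[rec[year_key]][term] += 1
    return counts


def trend(counts, term):
    """Return {year: count} for one term, with years in ascending order."""
    return {year: counts[year][term.lower()] for year in sorted(counts)}


# Toy records invented for this example.
toy_records = [
    {"year": 2019, "keywords": ["Ethics", "Consent"]},
    {"year": 2019, "keywords": ["ethics"]},
    {"year": 2020, "keywords": ["Consent", "Privacy"]},
]

counts = yearly_term_counts(toy_records)
print(trend(counts, "ethics"))   # {2019: 2, 2020: 0}
print(trend(counts, "consent"))  # {2019: 1, 2020: 1}
```

The same counting scheme would apply to other entity types (MeSH terms, authors, journals) by changing `term_key`. Years in which a term never occurs are reported as 0, which keeps plotted series aligned.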

The third notebook allows fully interactive exploration of the datasets preprocessed with the second notebook. You can select a dataset to work with, a set of entities to explore, and plot any entity or combination of entities.

Dependencies (and versions) are listed in every notebook. A couple of toy datasets are provided.

New in v1.3:

- Handled additional exceptions
- Updated some libraries
- Optimized the creation of the Medline file in notebook 1

To do in v1.4:

- Investigate the unexpected behaviour of the PubMed API with the PDAT (publication date) tag
- Handle exceptions (empty files producing empty DataFrames) in the tabs of notebook 3

Files

TopicTracker v1.3.zip (29.0 MB)
md5:14b84764c4f4a32ff3d64ff06cc7261e

Additional details

References

  • 10.1016/j.heliyon.2020.e04426