Automating data-cleaning and documentation of extracted data using interactive R-markdown notebooks

Steph Zimsen

doi:10.5281/zenodo.6093178

Published February 22, 2022 | Version v1

Presentation (Open Access)


Affiliation: University of Washington

At the Institute for Health Metrics and Evaluation, we conduct ~40 systematic reviews each year. Our general process is search > screen > extract > analyze, but we found we needed an intervening step: cleaning extracted data before analysis. The problem arises from a feature of our workflow: one person extracts the data, while another analyzes it. Clean-up falls through the gap at the hand-off, so analysts must spend time cleaning, even though the extractor is far more familiar with the dataset. To work faster with fewer errors, we developed a stepwise cleaning checklist, then wrote code modules to fix common problems. But juggling Excel, R, and a checklist still takes time and attention.

To streamline further, we are developing a systematic solution: an interactive R-markdown notebook that takes in the parameters of a specific extraction dataset, cleans and validates the data, and returns a new, cleaned dataset. We are testing it with a recent systematic review dataset of ~2800 observations from >150 sources.

Beyond producing valid, upload-ready analysis data, this semi-automated interactive code has other benefits. First, a flexible, parameterized template makes the work faster and easy to repeat. Second, the code can reproducibly generate documentation of the cleaning done, the extraction history, or other reports on data, parameters, and results. Finally, and critically, an interactive notebook makes sophisticated coding accessible to data extractors, who tend to have less coding experience than research analysts.
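The clean, validate, and document loop described in the abstract can be sketched in base R. This is a minimal illustration under stated assumptions, not the author's notebook: the function name `clean_extraction()`, the parameter names (`required_cols`, `na_codes`, `numeric_cols`, `ranges`), and the column names in the example are all hypothetical. It applies a few checklist-style fixes, flags (rather than deletes) out-of-range rows so the extractor can review them, and returns a step-by-step log that can double as documentation of the cleaning done.

```r
# Hypothetical sketch of a parameterized clean-and-validate step.
# Function, parameter, and column names are illustrative assumptions.
clean_extraction <- function(df, params) {
  log <- data.frame(step = character(0), n_affected = integer(0),
                    stringsAsFactors = FALSE)
  add_log <- function(log, step, n) {
    rbind(log, data.frame(step = step, n_affected = as.integer(n),
                          stringsAsFactors = FALSE))
  }

  # 1. Stop early if any required column is missing.
  missing_cols <- setdiff(params$required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # 2. Trim stray whitespace in character columns; count changed cells.
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  n_trimmed <- 0L
  for (col in chr_cols) {
    trimmed <- trimws(df[[col]])
    n_trimmed <- n_trimmed + sum(trimmed != df[[col]], na.rm = TRUE)
    df[[col]] <- trimmed
  }
  log <- add_log(log, "trim whitespace", n_trimmed)

  # 3. Recode extractor placeholders (e.g. "", "n/a") to NA.
  n_na <- 0L
  for (col in chr_cols) {
    hit <- df[[col]] %in% params$na_codes
    n_na <- n_na + sum(hit)
    df[[col]][hit] <- NA
  }
  log <- add_log(log, "recode missing-value codes", n_na)

  # 4. Coerce numeric columns; count non-missing entries that fail to parse.
  n_bad_num <- 0L
  for (col in params$numeric_cols) {
    raw <- df[[col]]
    num <- suppressWarnings(as.numeric(raw))
    n_bad_num <- n_bad_num + sum(is.na(num) & !is.na(raw))
    df[[col]] <- num
  }
  log <- add_log(log, "non-numeric values set to NA", n_bad_num)

  # 5. Drop exact duplicate rows.
  dup <- duplicated(df)
  df <- df[!dup, , drop = FALSE]
  log <- add_log(log, "duplicate rows dropped", sum(dup))

  # 6. Flag rows outside allowed ranges for review instead of dropping them.
  df$flag_range <- FALSE
  for (col in names(params$ranges)) {
    lims <- params$ranges[[col]]
    out <- !is.na(df[[col]]) & (df[[col]] < lims[1] | df[[col]] > lims[2])
    df$flag_range <- df$flag_range | out
  }
  log <- add_log(log, "rows flagged out of range", sum(df$flag_range))

  rownames(df) <- NULL
  list(data = df, log = log)
}

# Example with a tiny, made-up extraction sheet.
raw <- data.frame(
  source_id   = c("S1", "S1", "S2 ", " S3"),
  sample_size = c("120", "120", "n/a", "-5"),
  mean_age    = c("34.2", "34.2", "41", "abc"),
  stringsAsFactors = FALSE
)
params <- list(
  required_cols = c("source_id", "sample_size"),
  na_codes      = c("", "NA", "n/a", "-"),
  numeric_cols  = c("sample_size", "mean_age"),
  ranges        = list(sample_size = c(1, Inf))
)
res <- clean_extraction(raw, params)
print(res$log)  # n_affected column should read 2, 1, 1, 1, 1
```

In an R Markdown notebook, the `params` list could instead be declared in the YAML header's `params:` field and supplied per dataset through the `params` argument of `rmarkdown::render()`; printing `res$log` (for example with `knitr::kable()`) in the rendered report would then record what was changed and how often.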


Files (1.3 GB)

Name	Size	Checksum
Steph Zimsen.mp4	1.3 GB	md5:62a3068972517f4ae124536937bc3eb8

Additional details

Is derived from: Presentation, https://youtu.be/H64Bw6FvnMw


Metric	All versions	This version
Views	70	70
Downloads	15	15
Data volume	26.7 GB	26.7 GB
