Automating Document Discovery in the Systematic Review Process: How to Use Chaff to Extract Wheat

doi:10.5281/zenodo.2574752

Published May 10, 2018 | Version v1

Conference paper (Open Access)

1. LIMSI, CNRS
2. AMC, University of Amsterdam

Systematic reviews in, e.g., empirical medicine address research questions by comprehensively examining the entire published literature. Conventionally, manual literature surveys decide inclusion in two steps: first based on titles and abstracts, then based on the full text. Yet current methods to automate the process make no distinction between gold data from these two stages. In this work, we compare how different schemes for choosing positive and negative examples from the two screening stages affect the training of automated systems. We train a ranker using logistic regression and evaluate it on a new gold standard dataset for clinical NLP and on an existing gold standard dataset for drug class efficacy. Classification and ranking achieve an average AUC of 0.803 and 0.768 when relying on gold standard decisions based on the titles and abstracts of articles, and an AUC of 0.625 and 0.839 when relying on gold standard decisions based on the full text. Our results suggest that it makes little difference which screening stage the gold standard decisions are drawn from, and that the decisions need not be based on the full text. The results further suggest that common off-the-shelf algorithms can reduce the amount of work required to retrieve relevant literature.
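
The abstract describes the setup only at a high level. The following is a minimal sketch, not the authors' implementation, of how the two labeling schemes can feed the same logistic-regression ranker and how AUC is computed. Several assumptions are labeled here: the bag-of-words features, the plain stochastic-gradient fitting (the paper's toolkit, features and regularization are not stated in this record), the toy citations in `TRAIN` and `TEST`, and the choice to evaluate against final full-text inclusion decisions. All names (`label`, `train_logreg`, `held_out_auc`, and so on) are hypothetical.

```python
import math
import re

# HYPOTHETICAL toy data: (title + abstract text,
#   kept at title/abstract screening?, included after full-text screening?)
TRAIN = [
    ("randomized trial of statin efficacy in adults", True, True),
    ("statin efficacy randomized controlled trial elderly", True, True),
    ("statin trial protocol without outcomes", True, False),
    ("cohort study of statin adverse events", True, False),
    ("survey of hospital parking policy", False, False),
    ("history of pharmacy education", False, False),
    ("parking and traffic in urban hospitals", False, False),
    ("randomized trial of statin dose efficacy", True, True),
]
TEST = [
    ("randomized trial of statin efficacy in women", True, True),
    ("statin protocol description", True, False),
    ("hospital parking survey results", False, False),
]


def label(abstract_ok, fulltext_ok, scheme):
    """Map two-stage screening decisions to a binary training label.

    'abstract': positives = everything kept at title/abstract screening.
    'fulltext': positives = only articles kept after full-text screening.
    """
    if scheme == "abstract":
        return int(abstract_ok)
    if scheme == "fulltext":
        return int(fulltext_ok)
    raise ValueError(f"unknown scheme: {scheme!r}")


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, text):
    # Linear score; monotone in the predicted probability, so it can rank directly.
    return bias + sum(weights.get(t, 0.0) for t in tokenize(text))


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Logistic regression on bag-of-words counts, fitted by plain SGD on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            g = sigmoid(score(weights, bias, text)) - y  # d(log loss)/d(score)
            bias -= lr * g
            for t in tokenize(text):  # one update per token occurrence (count features)
                weights[t] = weights.get(t, 0.0) - lr * g
    return weights, bias


def auc(scores, labels):
    """ROC AUC: chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def held_out_auc(scheme):
    # Train on labels from one screening stage; evaluate against final inclusions (assumption).
    texts = [t for t, _, _ in TRAIN]
    labels = [label(a, f, scheme) for _, a, f in TRAIN]
    w, b = train_logreg(texts, labels)
    gold = [int(f) for _, _, f in TEST]
    return auc([score(w, b, t) for t, _, _ in TEST], gold)


if __name__ == "__main__":
    for scheme in ("abstract", "fulltext"):
        print(f"{scheme:>8}: held-out AUC = {held_out_auc(scheme):.3f}")
```

Because AUC depends only on the ordering of scores, the sketch ranks by the raw linear score rather than the probability. On real screening data one would replace the toy lists with actual screened citations and use cross-validation instead of a single tiny held-out split.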

Files

777.pdf (206.0 kB, md5:4f4fcc8c90a96c61860c867080a55e30)

Additional details

Funding: European Commission, MIROR – Methods in Research on Research (grant 676207)
