What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents

doi:10.1007/978-3-319-99133-7_19

Published August 7, 2018 | Version v1

Conference paper Open

1. Department of Computer Science, Kiel University, Kiel, Germany
2. Computing Science and Mathematics, University of Stirling, Stirling, Scotland, UK

The vast amount of scientific literature poses a challenge when one is trying to understand a previously unknown topic. Selecting a small, representative subset of documents that covers most of the desired content can address this challenge. We build on existing research on representative subset extraction and apply it in an information retrieval setting. Our document selection process consists of three steps: computation of the document representations, clustering, and selection of documents. We implement and compare two different document representations, two different clustering algorithms, and three different selection methods using a coverage and a redundancy metric. We execute our 36 experiments on two datasets from different domains, with 10 sample queries each. The results show that there is no clear favorite and raise the question of whether coverage and redundancy are sufficient for evaluating representative subsets.
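The three-step process described in the abstract (representation, clustering, selection) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method evaluated in the paper: the term-frequency representation, the small cosine-based k-means seeded with the first k documents, the closest-to-centroid selection rule, and both metric definitions (`coverage`, `redundancy`) are hypothetical stand-ins, since the abstract does not say which representations, algorithms, or metric formulas were used. The toy documents are invented for the example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Step 1 (assumed): represent a document as a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def cluster(vectors, k, iterations=10):
    """Step 2 (assumed): k-means with cosine similarity, seeded with the first k documents."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid from its members; keep the old one if the cluster is empty.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids


def select_representatives(vectors, assignment, centroids):
    """Step 3 (assumed): per cluster, pick the document most similar to the centroid."""
    chosen = []
    for c, cen in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], cen)))
    return chosen


def coverage(vectors, chosen):
    """Hypothetical metric: share of the corpus vocabulary found in the selected documents."""
    vocabulary = set().union(*vectors)
    covered = set().union(*(vectors[i] for i in chosen))
    return len(covered) / len(vocabulary)


def redundancy(vectors, chosen):
    """Hypothetical metric: mean pairwise cosine similarity among the selected documents."""
    pairs = [(i, j) for n, i in enumerate(chosen) for j in chosen[n + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Invented toy corpus standing in for the results of one query.
docs = [
    "neural networks deep learning",
    "protein folding biology cells",
    "deep learning neural models",
    "cells biology protein structure",
]
vectors = [tf_vector(d) for d in docs]
assignment, centroids = cluster(vectors, k=2)
chosen = select_representatives(vectors, assignment, centroids)
print("selected:", [docs[i] for i in chosen])
print("coverage:", coverage(vectors, chosen), "redundancy:", redundancy(vectors, chosen))
```

Swapping any one of the three functions (for example, a different representation in `tf_vector` or a different rule in `select_representatives`) while holding the others fixed mirrors the kind of configuration comparison the abstract describes.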

Files

Name	Size	Checksum
2018_TIR_Beck_et_al.pdf	761.9 kB	md5:51cd0d42cbd0468d43b2c5de5d6dcd2c

Funding

European Commission, grant 693092: MOVING – Training towards a society of data-savvy information professionals to enable open leadership innovation

