Conference paper Open Access

Effective Unsupervised Author Disambiguation with Relative Frequencies

Backes, Tobias

This work addresses the problem of author name homonymy in the
Web of Science. Aiming for an efficient, simple and straightforward
solution, we introduce a novel probabilistic similarity measure for
author name disambiguation based on feature overlap. Using the
researcher-ID available for a subset of the Web of Science, we evaluate
the application of this measure in the context of agglomeratively
clustering author mentions. We focus on a concise evaluation that
shows clearly for which problem setups and at which time during
the clustering process our approach works best. In contrast to most
other works in this field, we are skeptical of the performance
of author name disambiguation methods in general and compare
our approach to the trivial single-cluster baseline. Our results are
presented separately for each correct clustering size because, as we
explain, when all cases are treated together, the trivial baseline
and more sophisticated approaches are hardly distinguishable in
terms of evaluation results. Our model shows state-of-the-art
performance for all correct clustering sizes without any discriminative
training and with tuning only one convergence parameter.
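
The role of the trivial single-cluster baseline can be illustrated with a small sketch. The abstract does not name the evaluation metric, so pairwise precision, recall and F1 are an assumption here, and the function names and toy data are hypothetical rather than taken from the paper. The sketch only shows why a baseline that puts all mentions of a name into one cluster scores perfectly when the name belongs to a single author and loses precision as the correct number of clusters grows.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise precision, recall and F1 (assumed metric, not from the paper)."""
    true_pairs = same_cluster_pairs(true_labels)
    predicted_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & predicted_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def single_cluster_baseline(n_mentions):
    """Trivial baseline: every mention of a name goes into one cluster."""
    return [0] * n_mentions


# Toy name blocks (hypothetical): the true author of each mention.
one_author = ["a", "a", "a", "a"]
two_authors = ["a", "a", "b", "b"]

for truth in (one_author, two_authors):
    baseline = single_cluster_baseline(len(truth))
    # Print the correct clustering size next to the baseline's scores.
    print(len(set(truth)), pairwise_f1(truth, baseline))
```

If most names in a collection map to a single author, averaging over all names lets such a baseline look strong, which is one reading of why the results are reported per correct clustering size.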



Cite as