Published June 26, 2023 | Version v1
Journal article (Open Access)

Pre-trained Embeddings for Entity Resolution: An Experimental Analysis

Affiliations:
  1. National and Kapodistrian University of Athens & Athena RC
  2. National and Kapodistrian University of Athens
  3. Athena RC

Description

Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving language models to improve effectiveness. This applies to both main steps of ER, i.e., blocking and matching. Several pre-trained embeddings have been tested, with the most popular ones being fastText and variants of the BERT model. However, there is no detailed analysis of their pros and cons. To cover this gap, we perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets. First, we assess their vectorization overhead for converting all input entities into dense embedding vectors. Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method. Third, we conclude with their relative performance for both supervised and unsupervised matching. Our experimental results provide novel insights into the strengths and weaknesses of the main language models, helping researchers and practitioners select the most suitable ones in practice.
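The embedding-based blocking step evaluated in the paper can be sketched as follows: each entity description is converted into a dense vector, and candidate pairs are formed from each entity's k nearest neighbours in embedding space. In this minimal sketch, the hash-based `embed` function is a toy stand-in (an assumption for illustration only) for a pre-trained model such as fastText or BERT:

```python
import hashlib
import numpy as np

DIM = 64  # embedding dimensionality of the toy model

def embed(text: str) -> np.ndarray:
    """Toy dense embedding: average of seeded random vectors, one per
    character trigram. A real pipeline would call a pre-trained model
    (e.g. fastText or a BERT variant) here instead."""
    t = f"  {text.lower()}  "
    vecs = []
    for i in range(len(t) - 2):
        # Derive a deterministic seed from each trigram.
        seed = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16) % (2**32)
        vecs.append(np.random.default_rng(seed).standard_normal(DIM))
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)  # unit-normalize for cosine similarity

def block(entities: list[str], k: int = 1) -> set[tuple[int, int]]:
    """Return candidate pairs: each entity paired with its k nearest
    neighbours by cosine similarity of their embeddings."""
    E = np.stack([embed(e) for e in entities])  # shape (n, DIM)
    sims = E @ E.T                              # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)             # exclude self-pairs
    pairs = set()
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs

entities = ["Apple iPhone 13 128GB", "iphone 13 apple 128 gb",
            "Samsung Galaxy S21", "galaxy s21 samsung"]
print(block(entities, k=1))
```

The same structure carries over to the matching step: instead of keeping all top-k neighbours as candidates, a supervised matcher or a similarity threshold decides which candidate pairs are actual duplicates.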

Files

p2225-skoutas.pdf (851.0 kB)
md5:546842bf27ed9d1160327d5c2cce9eb7

Additional details

Funding

European Commission
STELAR - Spatio-TEmporal Linked data tools for the AgRi-food data space (Grant No. 101070122)