Published November 1, 2021 | Version v1
Conference paper | Open access

IR like a SIR: Sense-enhanced Information Retrieval for Multiple Languages

  • 1. Sapienza University of Rome
  • 2. University of Copenhagen
  • 3. University of Milano-Bicocca

Description

With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval has increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of only a few keywords, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to the shortage of labeled datasets. In this paper, we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across French, German, Italian and Spanish on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.
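The general idea of sense-based query expansion described above can be illustrated with a minimal sketch. This is not the SIR implementation: the `expand_query` and `toy_disambiguate` functions, the two-entry `TOY_INVENTORY`, the keyword-cue rule, and the `[SEP]` separator are all hypothetical stand-ins for a trained Word Sense Disambiguation model and a real sense inventory. The sketch only shows the shape of the step: disambiguate each query word, look up the definition of the chosen sense, and append those definitions to the short keyword query before it reaches a ranker.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical disambiguator: given the query tokens and a position,
# return the definition (gloss) of the chosen sense, or None if the word has no sense entry.
Disambiguator = Callable[[List[str], int], Optional[str]]

# Hypothetical toy sense inventory; a real system would use a full, multilingual one.
TOY_INVENTORY: Dict[str, Dict[str, str]] = {
    "bank": {
        "finance": "a financial institution that accepts deposits",
        "river": "sloping land beside a body of water",
    },
}
FINANCE_CUES = {"loan", "interest", "deposit", "money"}


def toy_disambiguate(tokens: List[str], i: int) -> Optional[str]:
    """Stand-in for a WSD model: pick a sense using simple context keyword cues."""
    senses = TOY_INVENTORY.get(tokens[i].lower())
    if senses is None:
        return None
    context = {t.lower() for j, t in enumerate(tokens) if j != i}
    return senses["finance"] if context & FINANCE_CUES else senses["river"]


def expand_query(query: str, disambiguate: Disambiguator, sep: str = " [SEP] ") -> str:
    """Append the definition of each disambiguated query word to the query."""
    tokens = query.split()
    glosses: List[str] = []
    for i in range(len(tokens)):
        gloss = disambiguate(tokens, i)
        # Skip words without a sense entry and avoid repeating the same gloss.
        if gloss and gloss not in glosses:
            glosses.append(gloss)
    if not glosses:
        return query
    return query + sep + sep.join(glosses)


if __name__ == "__main__":
    print(expand_query("bank loan rates", toy_disambiguate))
    print(expand_query("river bank erosion", toy_disambiguate))
```

With this toy setup, "bank loan rates" is expanded with the financial definition of "bank", while "river bank erosion" receives the riverside one, so the ranker sees disambiguating context that the bare keywords lack. Because definitions come from a sense inventory rather than from the query text itself, the same mechanism can in principle attach definitions to queries in other languages, which is the intuition behind using senses as a cross-lingual bridge.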

Files

EMNLP2021_Blloshmietal.pdf (693.6 kB, md5:bac65053806b8da7f670404156f08ef8)

Additional details

Funding

European Commission
ELEXIS - European Lexicographic Infrastructure 731015