Published February 10, 2021 | Version v1
Conference paper Open

Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation

  • 1. Universitat Pompeu Fabra

Description

Concept extraction is crucial for a number of downstream applications. Nevertheless, straightforward techniques still prevail, such as aligning single tokens or nominal chunks with concepts, or dictionary lookup as in DBpedia Spotlight. We propose a generic, open-domain extractive model based on distant supervision of a pointer–generator network that leverages bidirectional LSTMs and a copy mechanism and that is able to cope with the out-of-vocabulary phenomenon. The model was trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages and tested on regular pages, where the pointers to other pages are taken as ground-truth concepts. The experiments show that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. They also show that the model can be readily ported to other datasets, on which it likewise achieves state-of-the-art performance.
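
To illustrate how a copy mechanism can cope with the out-of-vocabulary phenomenon mentioned above, the following is a minimal sketch of the output mixture commonly used in pointer–generator networks: the final probability of a token is a weighted sum of a generation distribution over a fixed vocabulary and the attention mass on the source positions where that token occurs. It is not taken from the paper. The function name `final_distribution`, the toy vocabulary, the attention weights, and the value of `p_gen` are all assumptions made for illustration, and the exact formulation used by the authors may differ.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    Hypothetical helper for illustration only (not the paper's code).

    p_gen:         probability of generating from the fixed vocabulary, in [0, 1]
    vocab_dist:    dict mapping vocabulary tokens to probabilities (sums to 1)
    attention:     attention weights over source positions (sums to 1)
    source_tokens: source tokens, one per attention weight
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    mixed = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for token, prob in vocab_dist.items():
        mixed[token] += p_gen * prob
    # Copy part: add the attention mass of every source position to the
    # token found there, whether or not it is in the fixed vocabulary.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy example with assumed numbers: "DBpedia" and "Spotlight" are not in
# the fixed vocabulary, yet they receive probability through copying.
vocab_dist = {"the": 0.5, "network": 0.3, "<unk>": 0.2}
source_tokens = ["the", "DBpedia", "Spotlight", "network"]
attention = [0.1, 0.4, 0.4, 0.1]

dist = final_distribution(0.25, vocab_dist, attention, source_tokens)
for token, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{token}: {prob:.3f}")
```

In this toy setting the out-of-vocabulary source tokens end up with the highest probability, because most of the mass comes from the copy part when `p_gen` is low. In a trained model, `p_gen` and the attention weights would be predicted at each decoding step rather than fixed by hand.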

Files

2008.11295.pdf (942.1 kB)
md5:a34fbd75d7cbcb2490980631f0a120db

Additional details

Funding

European Commission: CONNEXIONs – InterCONnected NEXt-Generation Immersive IoT Platform of Crime and Terrorism DetectiON, PredictiON, InvestigatiON, and PreventiON Services (grant 786731)