Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation

doi:10.5281/zenodo.4529274

Published February 10, 2021 | Version v1

Conference paper | Open access

1. Universitat Pompeu Fabra

Concept extraction is crucial for a number of downstream applications. Surprisingly, however, straightforward single token/nominal chunk–concept alignment or dictionary lookup techniques such as DBpedia Spotlight still prevail. We propose a generic, open-domain extractive model based on distant supervision of a pointer–generator network that leverages bidirectional LSTMs and a copy mechanism and is able to cope with the out-of-vocabulary phenomenon. The model was trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages and tested on regular pages, where the pointers to other pages are considered ground truth concepts. The experiments show that our model significantly outperforms standard techniques and, when used on top of DBpedia Spotlight, further improves its performance. They also show that the model can be readily ported to other datasets, on which it likewise achieves state-of-the-art performance.
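To illustrate how a copy mechanism can cope with out-of-vocabulary tokens, the following is a minimal sketch of the output mixture used in the standard pointer–generator formulation. It is not the authors' implementation. The function name `final_distribution`, the toy vocabulary, the attention weights, and the value of `p_gen` are all assumptions made for illustration. In a trained model these quantities would be produced by the decoder at each step.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen         -- probability of generating from the fixed vocabulary
    vocab_dist    -- dict mapping vocabulary words to generation probabilities
    attention     -- attention weights over the source tokens (sum to 1)
    source_tokens -- the input tokens the attention weights refer to

    Hypothetical helper for illustration; names are not from the paper.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("one attention weight is needed per source token")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    final = defaultdict(float)
    # Generation part: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy part: each source token receives (1 - p_gen) times its attention,
    # summed over repeated occurrences of the same token.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Toy example (assumed values): "DBpedia" and "Spotlight" are not in the
# vocabulary, yet they can still be produced by copying from the input.
vocab_dist = {"the": 0.5, "model": 0.3, "<unk>": 0.2}
source_tokens = ["the", "DBpedia", "Spotlight", "model"]
attention = [0.1, 0.4, 0.4, 0.1]

dist = final_distribution(0.25, vocab_dist, attention, source_tokens)
best = max(dist, key=dist.get)
```

In this toy setting the out-of-vocabulary source tokens end up with non-zero probability in `dist`, which is the property the abstract refers to when it says the model can cope with the out-of-vocabulary phenomenon.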

Files

2008.11295.pdf (942.1 kB, md5:a34fbd75d7cbcb2490980631f0a120db)

Additional details

Funding: European Commission, grant 786731, CONNEXIONs – InterCONnected NEXt-Generation Immersive IoT Platform of Crime and Terrorism DetectiON, PredictiON, InvestigatiON, and PreventiON Services
