Hypernyms extracted from a large text corpus using Hearst lexical-syntactic patterns

Alexander Panchenko

doi:10.5281/zenodo.3234817

Published May 29, 2019 | Version v1

Dataset Open

Alexander Panchenko¹

1. University of Hamburg

The list of hyponym-hypernym pairs was obtained by applying the lexical-syntactic patterns described in Hearst (1992) to the corpus prepared by Panchenko et al. (2016). This corpus is a concatenation of the English Wikipedia (2016 dump), Gigaword, ukWaC, and English news corpora from the Leipzig Corpora Collection. The patterns were proposed by Marti Hearst (1992) and later extended and implemented as finite-state transducers (FSTs) by Panchenko et al. (2012) to extract (noisy) hyponym-hypernym pairs. They are: (i) such NP as NP, NP [,] and/or NP; (ii) NP such as NP, NP [,] and/or NP; (iii) NP, NP [,] or other NP; (iv) NP, NP [,] and other NP; (v) NP, including NP, NP [,] and/or NP; (vi) NP, especially NP, NP [,] and/or NP. Applying the patterns to the corpus yields 27.6 million hyponym-hypernym pairs, each with its frequency of occurrence in the corpus.
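To make the extraction step concrete, here is a minimal sketch of pattern (ii), "NP such as NP, NP [,] and/or NP", written as a regular expression. It is not the FST implementation used to build the dataset. It assumes, for illustration only, that a noun phrase is a single alphabetic token, and the names `NP`, `SUCH_AS` and `extract_pairs` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a noun phrase (NP) is approximated by one alphabetic token.
# The lookahead keeps the conjunctions "and"/"or" from being read as an NP.
NP = r"(?!(?:and|or)\b)[A-Za-z][A-Za-z-]*"

# Pattern (ii): NP such as NP, NP [,] and/or NP
SUCH_AS = re.compile(
    rf"\b(?P<hyper>{NP}) such as "
    rf"(?P<hypos>{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?)"
)


def extract_pairs(text):
    """Count (hyponym, hypernym) pairs matched by pattern (ii) in text."""
    pairs = Counter()
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper").lower()
        # Split the enumeration on ", ", " and ", " or ", ", and ", ", or ".
        for hyponym in re.split(r",? (?:and|or) |, ", match.group("hypos")):
            pairs[(hyponym.lower(), hypernym)] += 1
    return pairs


sample = (
    "He likes fruits such as apples, pears, and plums. "
    "Fruits such as apples are sweet."
)
for (hyponym, hypernym), freq in extract_pairs(sample).items():
    print(hyponym, hypernym, freq)
```

Counting each pair across all matches mirrors the frequency reported alongside each pair in the dataset. A realistic extractor would need a proper noun-phrase chunker and the remaining five patterns.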

Files (213.2 MB)

Name	MD5	Size
en_ps59g.csv.gz	76aaa66b92deaeab117747b2d89cf102	213.2 MB
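Since the listing publishes an MD5 digest for the archive, a downloaded copy can be checked against it before use. The sketch below assumes the file has already been saved locally. The helper names `md5_of_file` and `is_intact` are illustrative and not part of any tooling shipped with the dataset.

```python
import hashlib

# MD5 digest published in the file listing of this record.
EXPECTED_MD5 = "76aaa66b92deaeab117747b2d89cf102"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """True if the file at path has the expected MD5 digest."""
    return md5_of_file(path) == expected
```

For example, `is_intact("en_ps59g.csv.gz")` should return `True` for an uncorrupted download, assuming the archive sits in the current working directory under that name.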


	All versions	This version
Views	648	648
Downloads	120	120
Data volume	28.1 GB	28.1 GB


DOI

10.5281/zenodo.3234817

Resource type

Dataset

Publisher

Zenodo

Languages

English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 30, 2019
Modified: January 24, 2020
