Published February 18, 2020 | Version v1
Journal article Open

WordNet2Vec: Corpora Agnostic Word Vectorization Method

  • 1. Wroclaw University of Science and Technology

Description

The complex nature of big data resources requires new structuring methods, especially for textual content. WordNet is a good knowledge source for the comprehensive abstraction of natural language, as implementations are available for many languages. Since WordNet embeds natural language in the form of a complex network, this paper proposes a transformation mechanism, WordNet2Vec, which creates a vector for each word in WordNet. Each vector encapsulates the general position, or role, of a given word in relation to all other words in the given natural language. Any list or set of such vectors contains knowledge about the context of its components within the whole language. This type of word representation can be easily applied to many analytic tasks, such as classification or clustering. The usefulness of the WordNet2Vec method is demonstrated in sentiment analysis, including the classification of an Amazon opinion text dataset with transfer learning.
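The idea of turning a word network into one vector per word can be illustrated with a minimal sketch. The description above does not state which network measure fills the vectors, so this example assumes shortest-path hop counts from each word to every other word. The toy graph, the function names, and the `-1` marker for unreachable words are all hypothetical choices made for illustration, not the authors' implementation or real WordNet data.

```python
from collections import deque

# Toy, hand-made word network (hypothetical; not real WordNet data).
# Each key is a word; each value lists the words it is directly linked to.
TOY_GRAPH = {
    "good": ["great", "nice"],
    "great": ["good", "excellent"],
    "nice": ["good"],
    "excellent": ["great"],
    "bad": ["awful"],
    "awful": ["bad"],
}


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop count from `source` to every reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def word_vectors(graph, unreachable=-1):
    """Build one vector per word; entry i is the distance to the i-th word.

    Assumption: distance is the shortest-path hop count, and words with no
    connecting path get the `unreachable` marker.
    """
    vocab = sorted(graph)
    vectors = {}
    for word in vocab:
        dist = shortest_path_lengths(graph, word)
        vectors[word] = [dist.get(other, unreachable) for other in vocab]
    return vocab, vectors


vocab, vectors = word_vectors(TOY_GRAPH)
print(vocab)
print(vectors["good"])
```

Every vector has one entry per word in the vocabulary, so its length does not depend on any training corpus. Vectors built this way could be passed to a standard classifier or clustering routine; how a multi-word text is represented from them is left open here.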

Files

NeuroComputing___Graph_based_word_representation_in_vector_space_revision_1_final.pdf

Additional details

Funding

European Commission
ENGINE - European research centre of Network intelliGence for INnovation Enhancement (grant 316097)
European Commission
RENOIR - Reverse EngiNeering of sOcial Information pRocessing (grant 691152)