Published June 30, 2019 | Version v1
Conference paper Open

Towards a Multi-view Language Representation: A Shared Space of Discrete and Continuous Language Features

Description

Linguistic typology databases contain valuable knowledge of the distinguishing properties of different languages. Their features are typically sparse and discrete, which makes them difficult to integrate into computational methods, and dense task-learned language vectors have emerged in response. To combine the two, we compute a shared space between discrete (binary) and continuous features using canonical correlation analysis. We evaluate the new language representation against a concatenation baseline in typological feature prediction and in phylogenetic inference, obtaining promising results that merit further exploration.
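
As a rough illustration of the idea in the description, the sketch below projects a binary feature matrix and a dense vector matrix into a common space with a regularized canonical correlation analysis written in plain NumPy. It is not the paper's implementation: the toy data, the function name `cca_fit`, the ridge term `reg`, the dimensions, and the choice to average the two projected views are all assumptions made for this example.

```python
import numpy as np

def cca_fit(X, Y, k, reg=1e-3):
    """Regularized CCA (hypothetical helper): returns the view means,
    the two projection matrices, and the top-k canonical correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariance blocks; the ridge term keeps them invertible (an assumption).
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition of a symmetric matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return mx, my, A, B, s[:k]

# Toy data standing in for real resources: 60 "languages" with
# 20 binary typological features and 10-dimensional learned vectors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X = (latent @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(60, 20)) > 0).astype(float)
Y = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(60, 10))

mx, my, A, B, corrs = cca_fit(X, Y, k=3)
Zx = (X - mx) @ A          # discrete view in the shared space
Zy = (Y - my) @ B          # continuous view in the shared space
shared = (Zx + Zy) / 2     # one illustrative way to merge the views

# Concatenation baseline: the two views placed side by side.
baseline = np.hstack([X, Y])
```

Here `shared` has one row per language and `k` columns, while `baseline` keeps all 30 original dimensions. Either matrix could then be fed to a downstream classifier or clustering routine for comparison.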

Files

_TyP_NLP_2019__Towards_a_Multi_view_Language_Representation.pdf

Files (190.3 kB)

Additional details

Funding

GoURMET – Global Under-Resourced MEdia Translation 825299
European Commission