Published June 30, 2019 | Version v1
Conference paper
Open
Towards a Multi-view Language Representation: A Shared Space of Discrete and Continuous Language Features
Description
Linguistic typology databases contain valuable knowledge of the distinguishing properties of different languages. Typically they contain sparse discrete features that are difficult to integrate into computational methods, and dense task-learned language vectors have emerged in response. To join both worlds, we compute a shared space between discrete (binary) and continuous features using canonical correlation analysis. We evaluate the new language representation against a concatenation baseline in typological feature prediction and in phylogenetic inference, obtaining promising results that merit further exploration.
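As a rough illustration of the idea in the description, the sketch below computes a shared space between a binary feature matrix and a dense vector matrix with a plain regularized canonical correlation analysis in NumPy. It is not the authors' pipeline: the toy data, the function name `fit_cca`, the regularization value `reg`, and the choice to average the two projected views are all assumptions made for illustration.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA sketch: returns projections A, B and the top-k correlations.

    X: (n, dx) view one, Y: (n, dy) view two. `reg` is a hypothetical ridge
    term that keeps the covariance matrices invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return A, B, s[:k]


# Toy data (assumed, not from the paper): 40 "languages" driven by a 2-d latent signal.
rng = np.random.default_rng(0)
n_langs = 40
latent = rng.normal(size=(n_langs, 2))
# View 1: binary typological-style features.
X = (latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n_langs, 6)) > 0).astype(float)
# View 2: dense task-learned-style language vectors.
Y = latent @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(n_langs, 5))

A, B, corrs = fit_cca(X, Y, k=2)
Zx = (X - X.mean(axis=0)) @ A   # discrete view projected into the shared space
Zy = (Y - Y.mean(axis=0)) @ B   # continuous view projected into the shared space
shared = (Zx + Zy) / 2          # one possible multi-view representation (assumption)
concat = np.hstack([X, Y])      # concatenation baseline
```

Here `shared` has one row per language and `k` columns, while `concat` simply stacks the original features side by side. Either matrix could then be fed to a downstream classifier or clustering routine for comparison.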
Files
Name | Size |
---|---|
_TyP_NLP_2019__Towards_a_Multi_view_Language_Representation.pdf (md5:0d407619b9768f55c8b39852eb95e8f1) | 190.3 kB |