Dataset Open Access

A Computational Theory for the Emergence of Grammatical Categories in Cortical Dynamics

Dario Dematties; Silvio Rizzi; George K. Thiruvathukal; Alejandro Javier Wainselboim; Bonifacio Silvano Zanutto; Mauricio D. Perez

The file Corpora.txt contains the corpus used to train the model and the different instances of the classifier. It is a plain-text file with one sentence per line, derived from the original corpus file test.tsv, available at https://github.com/google-research-datasets/wiki-split.git. We removed punctuation marks and special characters from the original file and placed one sentence on each line.

Enju_Output.txt holds the output generated by Enju in -so mode (output in stand-off format) using Corpora.txt as input. It contains a per-sentence parse of the English text, produced with a wide-coverage probabilistic HPSG grammar.

The file Supervision.txt contains the grammatical tags of the corpus. It holds one tag per word, with each tag on its own line. Tags for words in the same sentence occupy adjacent lines, and sentences are separated by one empty line.
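The layout described above (one tag per line, sentences separated by an empty line) can be read with a short loop. The following is a minimal sketch, not the authors' code; the function name `read_tag_sentences` and the sample tags are hypothetical placeholders, and the UTF-8 encoding mentioned in the comment is an assumption.

```python
from typing import Iterable, List


def read_tag_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group one-tag-per-line input into sentences split on empty lines."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        tag = raw.strip()
        if tag:
            current.append(tag)
        elif current:
            # An empty line closes the sentence collected so far.
            sentences.append(current)
            current = []
    if current:
        # The last sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences


# Placeholder tags for illustration only; the real tag set is in the file.
sample = ["TAG_A\n", "TAG_B\n", "\n", "TAG_C\n"]
print(read_tag_sentences(sample))

# With the real file (encoding assumed to be UTF-8):
# with open("Supervision.txt", encoding="utf-8") as fh:
#     sentences = read_tag_sentences(fh)
```

Because the function accepts any iterable of lines, it can be applied to an open file handle without loading the whole file into memory first.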

The file Word_Category.txt contains the coarse-grained word-category information that the model requires and that enters it through apical dendrites. Each word in the corpus has a word-category tag, which provides constraints in addition to those provided by lateral dendrites. The file holds one tag per word, with each tag on its own line. Tags for words in the same sentence occupy adjacent lines, and sentences are separated by one empty line.
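Since the tag files hold one tag per word, the tags can be paired with the words of Corpora.txt. The sketch below shows one way to do this, assuming that words in Corpora.txt are separated by whitespace and that sentences appear in the same order in both files; neither assumption is stated in the description, and the names `tag_groups` and `align` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def tag_groups(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of tags per sentence, splitting on empty lines."""
    stripped = (line.strip() for line in lines)
    for is_blank, group in groupby(stripped, key=lambda s: s == ""):
        if not is_blank:
            yield list(group)


def align(corpus_lines: Iterable[str],
          tag_lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Pair each word with its tag, sentence by sentence.

    Assumes whitespace tokenization and identical sentence order.
    Stops at the end of the shorter input.
    """
    aligned: List[List[Tuple[str, str]]] = []
    pairs = zip(corpus_lines, tag_groups(tag_lines))
    for index, (sentence, tags) in enumerate(pairs):
        words = sentence.split()
        if len(words) != len(tags):
            raise ValueError(
                f"sentence {index}: {len(words)} words but {len(tags)} tags"
            )
        aligned.append(list(zip(words, tags)))
    return aligned


# Placeholder sentence and tags for illustration only.
print(align(["the dog barks\n"], ["TAG_A\n", "TAG_B\n", "TAG_C\n"]))
```

The length check raises an error as soon as a sentence and its tag group disagree, which makes a wrong tokenization assumption visible instead of silently shifting tags.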

The file SynSemTests.xlsx contains all the grammar classification results together with the statistical analysis of the classification tests.

Files (112.8 MB)

Name    Size    Checksum
Corpora.txt    1.8 MB    md5:fdcb80d7affb09ff9529e7333269bb21
Data_Availability_Statement    2.7 kB    md5:05cf39d1e6a64e0d0f6eedff24e46e4f
Enju_Output.txt    107.2 MB    md5:b4f1d7c433811c8671e2203593f8b5e9
Frontiers_Supplementary_Material.pdf    204.8 kB    md5:679ba37769254bfba93a09a42859b313
IndividualTaggingPerformance.xlsx    38.4 kB    md5:c43c01dd892649b7e1d6123d24bb5845
ModelsComparison.xlsx    130.6 kB    md5:f23b401cf2d9de17eca46940190a6e94
README    1.5 kB    md5:f100551ad364d19e657ef77ceccfa17f
Supervision.txt    1.8 MB    md5:eef32667038a07a32cde6c1c4434b166
SynSemTests.xlsx    24.6 kB    md5:f933f1623c3f4b1131821a7d0a426d2c
Word_Category.txt    1.6 MB    md5:8a395f94246ecd0a2203d7335f17f070
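
The MD5 checksums listed with the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names `md5_of_file` and `verify` are hypothetical, the directory argument is an assumption about where the files were saved, and only the four text files are included in the example table.

```python
import hashlib
import os
from typing import Dict

# Checksums copied from the file listing (subset shown for brevity).
EXPECTED_MD5: Dict[str, str] = {
    "Corpora.txt": "fdcb80d7affb09ff9529e7333269bb21",
    "Enju_Output.txt": "b4f1d7c433811c8671e2203593f8b5e9",
    "Supervision.txt": "eef32667038a07a32cde6c1c4434b166",
    "Word_Category.txt": "8a395f94246ecd0a2203d7335f17f070",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str, expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each file name to True if its digest matches the expected one."""
    return {
        name: md5_of_file(os.path.join(directory, name)) == checksum
        for name, checksum in expected.items()
    }


# Example call, assuming the files were saved to the current directory:
# print(verify(".", EXPECTED_MD5))
```

Reading in chunks keeps memory use low for the 107.2 MB Enju_Output.txt file.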