Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment

Davis, Forrest; van Schijndel, Marten

doi:10.5281/zenodo.3778994

Published April 30, 2020 | Version v1

Dataset Open

1. Cornell University

This repository contains the raw results (by-word information-theoretic measures for the experimental stimuli) and the LSTM models analyzed in "Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment". The models from the synthetic experiments, along with the training data generation script, are in the synthetic archive. A README is included with more details on recreating and evaluating the results of those experiments.

The naming convention for each model in the models directory is:
[Language]_hidden[Hidden Units]_batch[Batch Size]_dropout[Dropout Rate]_lr[Learning Rate]_[Model Number].pt

Language: en for English and es for Spanish
Hidden Units: All models had two layers with 650 hidden units per layer
Batch Size: The size of the batch (128 for English, 64 for Spanish)
Dropout Rate: All models used a dropout rate of 0.2
Learning Rate: All models had a learning rate of 20
Model Number: Identifier of the model (English model 0 is the best model from Gulordava et al. (2018))
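
The naming convention above can be unpacked programmatically when iterating over the models directory. The following is a minimal sketch, not part of the released code: the function name `parse_model_name`, the `ModelName` tuple, and the example filename are hypothetical, and the exact number formatting in the real filenames (for example `lr20` versus `lr20.0`) is an assumption, so the pattern accepts both.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    """Fields encoded in a model filename (hypothetical helper type)."""
    language: str         # "en" or "es"
    hidden_units: int     # hidden units per layer
    batch_size: int
    dropout: float
    learning_rate: float
    model_number: str     # kept as a string; the identifier format is assumed


# Mirrors the stated convention:
# [Language]_hidden[...]_batch[...]_dropout[...]_lr[...]_[Model Number].pt
_PATTERN = re.compile(
    r"^(?P<language>[a-z]+)"
    r"_hidden(?P<hidden>\d+)"
    r"_batch(?P<batch>\d+)"
    r"_dropout(?P<dropout>\d+(?:\.\d+)?)"
    r"_lr(?P<lr>\d+(?:\.\d+)?)"
    r"_(?P<number>[^_]+)\.pt$"
)


def parse_model_name(filename: str) -> ModelName:
    """Split a model filename into its hyperparameter fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    return ModelName(
        language=match["language"],
        hidden_units=int(match["hidden"]),
        batch_size=int(match["batch"]),
        dropout=float(match["dropout"]),
        learning_rate=float(match["lr"]),
        model_number=match["number"],
    )


# Example filename constructed from the convention (not confirmed to exist).
info = parse_model_name("en_hidden650_batch128_dropout0.2_lr20_0.pt")
print(info.language, info.batch_size, info.model_number)
```

Keeping the model number as a string avoids assuming that every identifier is a plain integer.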

Files (5.4 GB)

Name	MD5	Size
models.tar.gz	01ef23e1a95175f01bc32c55be3513fd	1.6 GB
raw_results.tar.gz	c62dc12445cbed42a4aaad3d02254632	380.3 MB
synthetic.tar.gz	29d2366dab3c2b2c1194cadfdf022b50	3.4 GB
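
Because the archives are large, it may be worth checking each download against the MD5 checksums listed above before extracting it. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `verify_download` are hypothetical, and it assumes the files are saved locally under their original names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "models.tar.gz": "01ef23e1a95175f01bc32c55be3513fd",
    "raw_results.tar.gz": "c62dc12445cbed42a4aaad3d02254632",
    "synthetic.tar.gz": "29d2366dab3c2b2c1194cadfdf022b50",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the listed checksum for its name."""
    path = Path(path)
    expected = EXPECTED_MD5.get(path.name)
    if expected is None:
        raise KeyError(f"no checksum listed for {path.name!r}")
    return md5_of_file(path) == expected
```

Calling `verify_download("models.tar.gz")` in the download directory would return `True` only for an intact copy. Reading in chunks keeps memory use flat even for the multi-gigabyte archives.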

