Santa Cruz Ellipsis Consortium Sluicing Dataset

Pranav Anand; Jim McCloskey; Dan Hardt

doi:10.5281/zenodo.1739702

Published November 30, 2018 | Version 1.0

Dataset Restricted


1. UCSC

This is release 1.0 of the Santa Cruz Ellipsis Consortium Sluicing Dataset, made possible by funding from the UC Santa Cruz Institute for Humanities Research and Committee on Research, as well as NSF funding for "The Implicit Content of Sluicing".

The data comprises roughly 5000 instances of sluicing (and some related constructions) extracted from the New York Times subset of the Gigaword dataset, covering the years 1994 to 2000. The sluices were located by a combination of parse-tree patterns and regular expressions, and we believe the set to be comprehensive for those years.

The sluices were then annotated over two years by teams of 5-6 annotators, and a final adjudication pass during 2018 produced the dataset released here.

The sluices are annotated for antecedent, paraphrased elided content, and potential mismatches between the sluice paraphrase and antecedent.

In this release, there are three directories:
   Doc: documentation of how the data was extracted, the annotation process, and the tagset used
   Data: the annotated data, presented as a series of JSON files
   Explorer: a lightweight script for traversing the JSON data
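
For readers who want a first look at the Data directory before using the bundled Explorer script, a minimal sketch along the following lines may help. It is not the Explorer script and assumes nothing about the annotation schema: it only assumes that the directory contains files ending in .json, and the function names iter_json_files and summarize are hypothetical, introduced here for illustration.

```python
import json
from pathlib import Path


def iter_json_files(data_dir):
    """Yield (path, parsed_content) for every .json file under data_dir.

    Assumption: the annotated data is stored as files with a .json suffix.
    """
    for path in sorted(Path(data_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)


def summarize(data_dir):
    """Map each file (path relative to data_dir) to (top-level type, size).

    The size is the number of top-level items for lists and dicts, else 1.
    No field names are assumed, since the schema is documented in Doc.
    """
    summary = {}
    for path, content in iter_json_files(data_dir):
        size = len(content) if isinstance(content, (list, dict)) else 1
        key = path.relative_to(data_dir).as_posix()
        summary[key] = (type(content).__name__, size)
    return summary
```

Calling summarize("Data") from the root of the unpacked release (the directory name is taken from the list above) would show how each file is organised at the top level, which is a reasonable starting point before consulting the tagset description in Doc.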

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.


