Webis Crowd Paraphrase Corpus 2011 (Webis-CPC-11)

Burrows, Steven; Potthast, Martin; Stein, Benno; Eiselt, Andreas

doi:10.5281/zenodo.3251771

Published June 1, 2013 | Version v1

Dataset Open


1. Bauhaus-Universität Weimar

The Webis Crowd Paraphrase Corpus 2011 (Webis-CPC-11) contains 7,859 candidate paraphrases obtained through crowdsourcing on Mechanical Turk. The corpus is made up of 4,067 accepted paraphrases, 3,792 rejected non-paraphrases, and the original texts. These samples formed part of the PAN 2010 international plagiarism detection competition, but were not previously available separately from the rest of the competition data.

We provide the dataset as a single folder in a Zip archive. Each paraphrase is represented by three files: the original text (e.g., "1-original.txt"), the paraphrase text (e.g., "1-paraphrase.txt"), and a metadata file (e.g., "1-metadata.txt") that records the task identifier, the task author identifier, the time taken, and whether the paraphrase was accepted or rejected.
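The three-files-per-sample layout described above can be read with a short loop. The following is a minimal sketch, not an official loader: the function name `iter_samples`, the UTF-8 encoding, and the folder path in the usage note are assumptions, and because the internal layout of the metadata file is not described here, the metadata is returned as raw text rather than parsed.

```python
from pathlib import Path


def iter_samples(corpus_dir):
    """Yield one dict per sample found in corpus_dir.

    Relies only on the naming scheme "<id>-original.txt",
    "<id>-paraphrase.txt", "<id>-metadata.txt".
    """
    corpus_dir = Path(corpus_dir)
    suffix = "-original.txt"
    # Derive sample ids from the original-text file names.
    ids = sorted(p.name[: -len(suffix)] for p in corpus_dir.glob("*" + suffix))

    def read(name):
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        return (corpus_dir / name).read_text(encoding="utf-8", errors="replace")

    for sample_id in ids:
        yield {
            "id": sample_id,
            "original": read(f"{sample_id}-original.txt"),
            "paraphrase": read(f"{sample_id}-paraphrase.txt"),
            # Metadata format is not documented here, so keep it unparsed.
            "metadata": read(f"{sample_id}-metadata.txt"),
        }
```

A call such as `samples = list(iter_samples("path/to/extracted/folder"))` (hypothetical path) would collect every sample; note that ids are sorted as strings, so "10" sorts before "2".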

Files

Name	Size	MD5
Webis-CPC-11.zip	19.5 MB	c772ea22769389d78aebe525b85c2359
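The MD5 checksum listed for the archive can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` follows; the helper names `md5_of_file` and `matches_expected` and the local file path are assumptions for illustration.

```python
import hashlib

# Checksum as listed for Webis-CPC-11.zip on this page.
EXPECTED_MD5 = "c772ea22769389d78aebe525b85c2359"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file at path has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("Webis-CPC-11.zip")` would return `True` for an intact download saved under that name in the current directory.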

Additional details

Steven Burrows, Martin Potthast, and Benno Stein. Paraphrase Acquisition via Crowdsourcing and Machine Learning. Transactions on Intelligent Systems and Technology (ACM TIST), 4(3):43:1-43:21, June 2013.

	All versions	This version
Views	1,163	1,161
Downloads	549	548
Data volume	11.7 GB	11.7 GB


Resource type

Dataset

Publisher

Zenodo

Conference

Transactions on Intelligent Systems and Technology (ACM TIST)

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: June 21, 2019
Modified: June 11, 2022
