The University of Pittsburgh English Language Institute Corpus (PELIC)

Alan Juffs; Na-Rae Han; Ben Naismith

doi:10.5281/zenodo.3991977

Published August 19, 2020 | Version 1.0

Resource type: Dataset | Access: Open

Affiliation: University of Pittsburgh

This is the first public release of the dataset from the University of Pittsburgh English Language Institute Corpus (PELIC). PELIC is a publicly available 4.2-million-word learner corpus of written texts. These texts were collected in an English for Academic Purposes (EAP) context over seven years in the University of Pittsburgh’s Intensive English Program and were produced by more than 1,100 students with a wide range of linguistic backgrounds and proficiency levels. PELIC is longitudinal, offering greater opportunities for tracking learners' language development in a natural classroom setting. In addition to the data, the PELIC repository contains corpus statistics and tutorials on how to access and analyze the data.

Notes

Corpus homepage: https://eli-data-mining-group.github.io/Pitt-ELI-Corpus/

Files

Name	Size	MD5 checksum
ELI-Data-Mining-Group/PELIC-dataset-v1.0.zip	230.7 kB	6fe0a2f5c551b13cab17f827d6590cb8
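
The archive is published with an MD5 checksum, which can be used to confirm that a downloaded copy is complete and unaltered. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and verify_download and the local filename are illustrative assumptions, not part of the dataset or its tutorials.

```python
import hashlib

# MD5 checksum listed on this record for PELIC-dataset-v1.0.zip
EXPECTED_MD5 = "6fe0a2f5c551b13cab17f827d6590cb8"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until fh.read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    # hexdigest() is lowercase, so normalize the expected value before comparing
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("PELIC-dataset-v1.0.zip")` returns `True` only if the local file (assumed here to keep its original name) matches the published checksum. Reading in chunks keeps memory use constant regardless of file size.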

Additional details

Is supplement to: https://github.com/ELI-Data-Mining-Group/PELIC-dataset/tree/v1.0

Funding: U.S. National Science Foundation, award 0836012, "Toward a Decade of PSLC Research: Investigating Instructional, Social, and Learner Factors in Robust Learning through Data-Driven Analysis and Modeling"
