Published October 13, 2019 | Version v1
Working paper Open

Doing phonology in the age of big data

  • 1. Cornell University
  • 2. University of Georgia


The central question we address in this paper is how to do phonology in an emerging era of big data. More specifically, we explore how to make better use of naturalistic corpus data to study phonology. We support the growing trends that are expanding the range of phenomena phonologists investigate and enhancing the richness of detail with which investigations are conducted.

Presenting case studies from English, Indonesian, and Romanian, we argue that the use of corpus data necessarily follows from the goals of the generative enterprise. At the same time, experimental and laboratory investigations are crucial to fully and systematically explore both phonological patterns and individual speaker differences, as we show with case studies of English, Italian, and Catalan.

We advocate for an iterative model of phonological analysis integrating careful data elicitation with both corpus analysis and experimental methods.


This working paper is copyrighted and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).


