Software Open Access

types2: Type and Hapax Accumulation Curves

Suomela, Jukka

types2 is a tool for analysing textual diversity, richness, and productivity in text corpora and other data sets.

With this tool, we can analyse data sets from the perspective of the following statistics:

  • number of words: the total number of running words in the text corpus
  • number of tokens: the words of interest in our study
  • number of types: how many distinct tokens we have seen
  • number of hapaxes: how many types have occurred exactly once
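As a minimal illustration of these four definitions, the Python sketch below computes them for a single corpus. It is not code from types2. The function name `corpus_statistics` and the `is_token` predicate, which decides which running words count as tokens of interest, are hypothetical.

```python
from collections import Counter

def corpus_statistics(words, is_token):
    """Count words, tokens, types, and hapaxes in one corpus.

    `is_token` is a hypothetical predicate that selects the running
    words we are interested in (e.g. words ending in a given suffix).
    """
    tokens = [w for w in words if is_token(w)]
    freq = Counter(tokens)  # frequency of each distinct token (type)
    return {
        "words": len(words),    # all running words
        "tokens": len(tokens),  # running words of interest
        "types": len(freq),     # distinct tokens
        "hapaxes": sum(1 for c in freq.values() if c == 1),  # types seen once
    }

text = "the purity and the necessity of purity and clarity".split()
print(corpus_statistics(text, lambda w: w.endswith("ity")))
# {'words': 9, 'tokens': 4, 'types': 3, 'hapaxes': 2}
```

Here "purity" occurs twice, so it is a type but not a hapax; "necessity" and "clarity" occur once each.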

We are usually interested in comparing the number of types or hapaxes against the number of words or tokens, and types2 makes it possible to analyse these relationships.
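Comparisons of this kind are naturally drawn as accumulation curves: as samples are added one at a time, we track how the numbers of types and hapaxes grow with the numbers of words and tokens. The Python sketch below computes such a curve for one ordering of the samples. It assumes, hypothetically, that each sample is given as a pair (word count, list of tokens); it is an illustration of the idea, not the types2 implementation.

```python
from collections import Counter

def accumulation_curve(samples: list[tuple[int, list[str]]]):
    """Return cumulative (words, tokens, types, hapaxes) after each sample.

    Each sample is a hypothetical (word_count, tokens) pair: the number of
    running words in the sample and the tokens of interest it contains.
    """
    counts = Counter()
    words = tokens = hapaxes = 0
    curve = []
    for word_count, toks in samples:
        words += word_count
        for t in toks:
            tokens += 1
            counts[t] += 1
            if counts[t] == 1:
                hapaxes += 1  # a new type, seen once so far
            elif counts[t] == 2:
                hapaxes -= 1  # this type is no longer a hapax
        curve.append((words, tokens, len(counts), hapaxes))
    return curve

samples = [(10, ["a", "b"]), (5, ["a", "c"]), (8, [])]
print(accumulation_curve(samples))
# [(10, 2, 2, 2), (15, 4, 3, 2), (23, 4, 3, 2)]
```

A different ordering of the same samples generally gives a different curve; that variation is what the statistical analysis exploits.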

The tool can be used for visualisation, statistical hypothesis testing, and exploratory data analysis. In the statistical analysis, we use nonparametric methods (more specifically, Monte Carlo permutation tests). The only modelling assumption is that, under the null hypothesis, individual “samples” are exchangeable.
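To make the idea of a permutation test concrete, here is a simplified sketch. If samples are exchangeable under the null hypothesis, the order in which they are accumulated carries no information, so we can shuffle them many times and count how often a random ordering reaches as few types as an observed subcorpus does at the same number of words. The helper names, the (word count, tokens) input format, and the "shortest prefix with at least as many words" rule are our own assumptions; the test statistic and p-value computation in types2 may differ.

```python
import random

def types_at_words(samples, target_words):
    """Types seen in the shortest prefix of `samples` with >= target_words words."""
    seen = set()
    words = 0
    for word_count, toks in samples:
        words += word_count
        seen.update(toks)
        if words >= target_words:
            break
    return len(seen)

def permutation_p_value(all_samples, subset, iterations=10_000, seed=0):
    """Monte Carlo estimate of a one-sided (lower-tail) p-value: how often
    a random ordering of the whole corpus reaches at most as many types
    as `subset` has, at the subset's word count."""
    rng = random.Random(seed)
    observed_words = sum(w for w, _ in subset)
    observed_types = len({t for _, toks in subset for t in toks})
    pool = list(all_samples)
    extreme = 0
    for _ in range(iterations):
        rng.shuffle(pool)  # one random ordering under the null hypothesis
        if types_at_words(pool, observed_words) <= observed_types:
            extreme += 1
    # Add-one correction keeps the Monte Carlo estimate away from exactly zero.
    return (extreme + 1) / (iterations + 1)

corpus = [(10, ["a"]), (10, ["a"]), (10, ["b"]), (10, ["c"])]
subset = [(10, ["a"]), (10, ["a"])]  # 20 words, only 1 type
print(permutation_p_value(corpus, subset))  # roughly 1/6
```

In this toy corpus, a random 20-word prefix has only one type exactly when both "a" samples come first, which happens with probability 1/6, so the estimate should be close to that value.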

The software was written by Jukka Suomela; the system was designed and developed in collaboration with Tanja Säily.

Files (762.5 kB)

  • types-v2-release3.zip: 762.5 kB, md5:537aa469c20e3b952eee753c6c477da7

Cite as