Published December 24, 2021 | Version 1.05
Dataset Open

Inputlog Copy Task Corpus: Exploring and defining typing skills

Description

Context

The Copy Task is one of the components of the keystroke logging program Inputlog (https://www.inputlog.net). It consists of a multi-layered set of tasks that measure a person's typing skill:

  • Tapping task: press the ‘d’ and ‘k’ keys alternately for 15 s
  • Sentence: copy a sentence for 30 s
  • Word combination 1: copy a combination of three words seven times
  • Word combination 2: copy a combination of three words seven times
  • Word combination 3: copy a combination of three words seven times
  • Word combination 4: copy a combination of three words seven times
  • Consonant groups: copy four blocks of six consonants once

The task is currently made available in twelve languages. 

For more information: https://doi.org/10.5334/jors.234 

 

Interactive Dashboard
Visit the webpage with an interactive dashboard to explore, filter, and download the +5K copy task corpus.

website: https://www.inputlog.net/copy-task/
dashboard: https://inputlog-analysis.uantwerpen.be/expert

 

Corpus

We are happy to make a multilingual corpus available (open access) that currently consists of more than 5000 copy tasks. 

  • The +5K corpus is carefully cleaned and fully anonymized.
  • The Shiny interface allows users to filter the corpus based on about 10 variables.
  • The selection can be downloaded in different formats and levels of aggregation (from raw idfx to synthesized analysis).
  • The selection can be explored using different interactive graph visualizations.
  • Researchers can upload their own corpus (or single copy task file) and compare it to the (selected) corpus.
  • An extra webpage is designed for laypersons who want to take a copy task to test their typing skills. They receive user-friendly dashboard feedback and can compare their performance with that of participants of a similar age in the corpus. This page is specifically designed to further expand the corpus.

Facts and Figures
Some facts and figures about the corpus' composition:

Languages

  • Dutch: 3130 files
  • English: 1163 files
  • German: 281 files
  • French: 201 files
  • Other: 378 files
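
The language counts above can be turned into shares of the corpus with a few lines of Python. This is only an illustrative sketch: the counts are copied from the list, while the names `language_counts`, `share`, and `shares` are made up for this example and are not part of Inputlog or the dashboard.

```python
# File counts per language, copied from the list above.
language_counts = {
    "Dutch": 3130,
    "English": 1163,
    "German": 281,
    "French": 201,
    "Other": 378,
}

total = sum(language_counts.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Percentage of the corpus contributed by each language.
shares = {lang: share(n, total) for lang, n in language_counts.items()}

for lang, pct in shares.items():
    print(f"{lang:<8} {language_counts[lang]:>5} files  {pct:>5.1f} %")
print(f"{'Total':<8} {total:>5} files")
```

The language counts add up to 5153 files, which is consistent with the "more than 5000 copy tasks" stated for the corpus.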

Gender

  • Female: 3495 files
  • Male: 1276 files
  • X or missing: 382 files

Age

  • 15 and under: 439 files
  • 16-20: 1591 files
  • 21-25: 2427 files
  • 26-35: 478 files
  • 36-45: 126 files
  • 46+: 230 files

A subset of the total corpus has been uploaded here. It contains about 500 tests (English, 21-25-year-olds).
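
Once downloaded, the subset archive can be inspected without unpacking it. The sketch below rests on assumptions: the internal layout of the zip is not described in this record, so the code only counts member files by extension, and the helper name `count_extensions` and the local path are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import Path, PurePosixPath


def count_extensions(zip_path):
    """Count the files inside a zip archive by (lowercased) extension."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [
            info.filename for info in archive.infolist() if not info.is_dir()
        ]
    return Counter(
        PurePosixPath(name).suffix.lower() or "(none)" for name in names
    )


if __name__ == "__main__":
    # Assumed local path: the archive saved under its original name.
    archive_path = Path("sub-dataset_EN_21-25year.zip")
    if archive_path.exists():
        for extension, count in count_extensions(archive_path).most_common():
            print(f"{extension:<10} {count}")
    else:
        print(f"{archive_path} not found; download it first")
```

Counting by extension gives a quick check of which of the download formats mentioned above (for example raw idfx) the subset actually contains.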

 

Notes

The corpus interface is available on: https://inputlog-analysis.uantwerpen.be/expert

Files

sub-dataset_EN_21-25year.zip
336.6 kB
md5:096cdbd402d4f9ba784a3f335f9fbc1a
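
The md5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch, assuming the file has been saved locally under its original name; the helper name `md5_of_file` and the local path are hypothetical, and only Python's standard `hashlib` module is used.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record.
EXPECTED_MD5 = "096cdbd402d4f9ba784a3f335f9fbc1a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed local path: the archive saved under its original name.
    archive = Path("sub-dataset_EN_21-25year.zip")
    if archive.exists():
        matches = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if matches else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.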

Additional details

References