Published October 17, 2022 | Version 1
Dataset | Open access

Multi-LEX: a database of multi-word frequencies (English files)

  • CNRS & Aix-Marseille University

Description

Written word frequency is a key variable in many psycholinguistic studies and is central to explaining visual word recognition. Methodological advances in single-word frequency estimates have helped to uncover novel language-related cognitive processes, fostering new ideas and studies. To support and promote research on a related emerging topic, visual multi-word recognition, we extracted millions of multi-word sequences from the exhaustive Google Ngram datasets and computed a frequency estimate for each sequence. Each sequence comes with part-of-speech information for every word it contains. We also carried out an online behavioral study that used the French 4-gram lexicon in a grammatical decision task. The results show an item-level frequency effect for word sequences. Moreover, the datasets proved useful during stimulus selection, allowing more precise control of multi-word characteristics.
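The description does not detail how the frequency estimates or part-of-speech fields were derived. As an illustration only, the sketch below assumes the tab-separated layout of the 2020 Google Books Ngram release (an n-gram, then one `year,match_count,volume_count` field per year) and part-of-speech tags attached to each word as `word_TAG`. It sums `match_count` over a chosen year window and converts the total to occurrences per million tokens. The year window, the `total_tokens` denominator, and all function names are hypothetical; the Multi-LEX files may use different columns, tags, and normalization.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NgramEntry:
    tokens: list[str]    # surface words, e.g. ["in", "the", "end"]
    pos_tags: list[str]  # one tag per word, "" when a token carries no tag
    match_count: int     # occurrences summed over the selected years


def split_token(token: str) -> tuple[str, str]:
    """Split a 'word_TAG' token into (word, tag).

    Assumption: tags are uppercase and attached after the final underscore.
    Tokens that do not match this pattern are returned unchanged with tag "".
    """
    word, sep, tag = token.rpartition("_")
    if sep and word and tag.isupper():
        return word, tag
    return token, ""


def parse_v3_line(line: str, first_year: int, last_year: int) -> NgramEntry:
    """Parse 'ngram<TAB>year,match,volumes<TAB>...' (assumed layout)."""
    ngram, *yearly = line.rstrip("\n").split("\t")
    total = 0
    for record in yearly:
        year, match_count, _volume_count = (int(x) for x in record.split(","))
        if first_year <= year <= last_year:  # keep only the chosen window
            total += match_count
    words, tags = zip(*(split_token(t) for t in ngram.split(" ")))
    return NgramEntry(list(words), list(tags), total)


def per_million(match_count: int, total_tokens: int) -> float:
    """Normalize a raw count to occurrences per million corpus tokens."""
    return match_count * 1_000_000 / total_tokens


# Hypothetical example line; the counts are made up for illustration.
line = "in_ADP the_DET end_NOUN\t1985,5,5\t1999,40,30\t2005,60,45"
entry = parse_v3_line(line, first_year=1990, last_year=2019)
print(entry.tokens)       # ['in', 'the', 'end']
print(entry.pos_tags)     # ['ADP', 'DET', 'NOUN']
print(entry.match_count)  # 100
print(per_million(entry.match_count, total_tokens=50_000_000))  # 2.0
```

Normalizing to a per-million rate is a common way of making counts comparable across corpora of different sizes; a log-transformed measure could be substituted without changing the parsing logic.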

Files

Files (1.3 GB)

Checksum                               Size
md5:3630ceca9bf961b75da0d9dada397230   1.6 MB
md5:184245a20028bfb5a2a594c361afc8b9   188.7 MB
md5:c92fb1b72dddbff6528e1532380274d1   29.2 MB
md5:0059c58ba2ff833ced73e60da2995faa   1.6 MB
md5:e4511f0329787c63395d87a426ef9792   365.6 MB
md5:ff980ba76479c20c14423c00af329936   32.6 MB
md5:6b52395f4d59b068a2ca38753143a9b7   1.5 MB
md5:05c759f9338f1a7bc29f93db12ba3c81   353.0 MB
md5:3cfc5ef3bbb6f7cea8913c0040483b34   34.0 MB
md5:429d5e4a5a7341da3d42283aca6bb8e6   1.4 MB
md5:33ec44d9b5e6fc0b44f6288087de7dcf   252.4 MB
md5:f0fd696a9548f94e79baca780da8d7e3   35.8 MB
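Each file is published with an MD5 checksum. The following minimal sketch uses only Python's standard `hashlib` to check a downloaded file against its listed value. The function names `md5_of` and `verify` are our own; the file path has to be supplied by the user.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file,
        # so large files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's digest with a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.removeprefix("md5:").lower()
    return md5_of(path) == expected_hex
```

For example, `verify(Path("downloaded_file"), "md5:3630ceca9bf961b75da0d9dada397230")` returns `True` only if the file matches the first checksum in the list. Chunked reading matters here because several files exceed 250 MB.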

Additional details

Funding

European Commission
POP-R: Parallel Orthographic Processing and Reading (742141)
Agence Nationale de la Recherche
O-codeReader: Parallel orthographic processing and multi-word reading (ANR-15-CE33-0002)
Agence Nationale de la Recherche
ILCB: Institute of Language, Communication and the Brain (ANR-16-CONV-0002)