Multi-LEX: a database of multi-word frequencies (English files)
Description
Written word frequency is a key variable in many psycholinguistic studies and is central to explaining visual word recognition. Indeed, methodological advances in single-word frequency estimates have helped to uncover novel language-related cognitive processes, fostering new ideas and studies. To support and promote research on a related emerging topic, visual multi-word recognition, we extracted a selection of millions of multi-word sequences from the exhaustive Google Ngram datasets and computed a frequency estimate for each. Each sequence is presented with part-of-speech information for every individual word. An online behavioral investigation using the French 4-gram lexicon in a grammatical decision task was carried out. The results show an item-level frequency effect for word sequences. Moreover, the proposed datasets proved useful during the stimulus selection phase, allowing more precise control of multi-word characteristics.
Files
(1.3 GB)
| MD5 checksum | Size |
|---|---|
| 3630ceca9bf961b75da0d9dada397230 | 1.6 MB |
| 184245a20028bfb5a2a594c361afc8b9 | 188.7 MB |
| c92fb1b72dddbff6528e1532380274d1 | 29.2 MB |
| 0059c58ba2ff833ced73e60da2995faa | 1.6 MB |
| e4511f0329787c63395d87a426ef9792 | 365.6 MB |
| ff980ba76479c20c14423c00af329936 | 32.6 MB |
| 6b52395f4d59b068a2ca38753143a9b7 | 1.5 MB |
| 05c759f9338f1a7bc29f93db12ba3c81 | 353.0 MB |
| 3cfc5ef3bbb6f7cea8913c0040483b34 | 34.0 MB |
| 429d5e4a5a7341da3d42283aca6bb8e6 | 1.4 MB |
| 33ec44d9b5e6fc0b44f6288087de7dcf | 252.4 MB |
| f0fd696a9548f94e79baca780da8d7e3 | 35.8 MB |
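Since each file is listed with an MD5 checksum, a download can be checked against it before use. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the dataset, and any local file path passed to them is the reader's own choice.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that files of several hundred MB are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the listed checksum.

    Accepts the checksum with or without the 'md5:' prefix used in the
    file listing, and ignores letter case and surrounding whitespace.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

For instance, `verify_download(local_path, "md5:3630ceca9bf961b75da0d9dada397230")` would return `True` only if `local_path` points to an intact copy of the file that carries this checksum in the listing. MD5 is adequate here for detecting corrupted or truncated downloads, not as a security guarantee.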
Additional details
Funding
- European Commission: POP-R - Parallel Orthographic Processing and Reading (742141)
- Agence Nationale de la Recherche: O-codeReader - Parallel orthographic processing and multi-word reading (ANR-15-CE33-0002)
- Agence Nationale de la Recherche: ILCB - Institute of Language Communication and the Brain (ANR-16-CONV-0002)