Published May 28, 2026 | Version v1
Dataset Open

Data and Analyses from Accessibility Evaluation of entry microstructure from Merriam-Webster (MW), Diccionario de la Lengua Española (DLE) and Digitales Wörterbuch der deutschen Sprache (DWDS)

  • 1. Universidad de Salamanca

Description

File Descriptions

1. SourceWebData.zip

Raw multimodal captures of the three dictionary entries evaluated in the study: Merriam-Webster (MW) say, DLE decir, and DWDS sagen. Each entry is represented in three complementary formats: live DOM (HTML after JavaScript execution), accessibility tree (a11y tree), and high-resolution screenshot (PNG). All captures were generated using Playwright. This is the same source material that was supplied to all web accessibility evaluation tool (WAET) and LLM evaluations across Phases 1–3.

2. IssueClassification.xlsx

Master classification spreadsheet of all 78 valid issues across all evaluation phases. Seven sheets, including Summary (complete issue inventory with detection source and tool-specific findings), WCAG Analysis (criterion-level and principle-level detection rates), Design Patterns (detection rates by lexicographical design category), and three dictionary-specific sheets (MW, DLE, DWDS). Provides the unified analytical framework linking all evaluation conditions.

A Google Sheets version of the Issue Classification spreadsheet is also available.
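The criterion-level and principle-level detection rates mentioned above can be understood as the share of issues under a given WCAG criterion (or principle) that a given tool or model reported. The following is only a minimal sketch of that arithmetic: the record keys `criterion` and `detected_by`, the source name `"axe"`, and the toy records are all hypothetical and do not reflect the actual column layout or contents of the spreadsheet.

```python
from collections import defaultdict

# WCAG success criteria are numbered principle.guideline.criterion,
# so the first digit identifies the principle.
PRINCIPLES = {
    "1": "Perceivable",
    "2": "Operable",
    "3": "Understandable",
    "4": "Robust",
}


def detection_rates(issues, source):
    """Return (per-criterion, per-principle) detection rates for one source.

    `issues` is a list of dicts with hypothetical keys:
      "criterion"   - WCAG success criterion number, e.g. "1.4.3"
      "detected_by" - set of tool or model names that reported the issue
    """
    by_criterion = defaultdict(lambda: [0, 0])  # key -> [detected, total]
    by_principle = defaultdict(lambda: [0, 0])
    for issue in issues:
        criterion = issue["criterion"]
        principle = PRINCIPLES[criterion.split(".")[0]]
        hit = int(source in issue["detected_by"])
        for bucket, key in ((by_criterion, criterion), (by_principle, principle)):
            bucket[key][0] += hit
            bucket[key][1] += 1

    def to_rate(counts):
        return {key: detected / total for key, (detected, total) in counts.items()}

    return to_rate(by_criterion), to_rate(by_principle)


# Toy records for illustration only; they are not taken from the dataset.
toy_issues = [
    {"criterion": "1.4.3", "detected_by": {"axe", "wave"}},
    {"criterion": "1.4.3", "detected_by": {"wave"}},
    {"criterion": "2.4.6", "detected_by": set()},
    {"criterion": "1.3.1", "detected_by": {"axe"}},
]
crit_rates, principle_rates = detection_rates(toy_issues, "axe")
```

Grouping by the first digit of the criterion number keeps the principle-level rate consistent with the criterion-level counts, since both are derived from the same records.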

3. Phase1_AXEreports.zip

Structured tool output from Axe DevTools and Axe-Guided (Pro) automated evaluations. Includes JSON reports of all detections (standard rules, guided tests), processed CSV summaries, and XML violations mapped to WCAG 2.2 criteria. Covers all three dictionaries. Raw data for Phase 1 syntactic baseline assessment.
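For readers who want to tally the JSON reports by WCAG criterion, a rough sketch follows. It assumes the reports follow the standard axe-core results layout (a top-level `violations` array whose items carry `tags` and `nodes`) and that success criteria appear as tags such as `wcag143`; whether the files in this archive use exactly that layout is an assumption, and the sample report below is invented for illustration.

```python
import json
import re
from collections import Counter

# axe-core encodes success criteria as tags like "wcag143" (SC 1.4.3).
# Conformance-level tags such as "wcag2aa" contain letters and do not match.
SC_TAG = re.compile(r"^wcag(\d)(\d)(\d+)$")


def violations_by_criterion(report):
    """Count affected nodes per WCAG success criterion in one report."""
    counts = Counter()
    for violation in report.get("violations", []):
        n_nodes = len(violation.get("nodes", []))
        for tag in violation.get("tags", []):
            match = SC_TAG.match(tag)
            if match:
                counts[".".join(match.groups())] += n_nodes
    return counts


# Invented sample in the assumed axe-core layout; not taken from the dataset.
sample_report = json.loads("""
{
  "violations": [
    {"id": "color-contrast",
     "tags": ["cat.color", "wcag2aa", "wcag143"],
     "nodes": [{"target": ["#a"]}, {"target": ["#b"]}]},
    {"id": "image-alt",
     "tags": ["cat.text-alternatives", "wcag2a", "wcag111"],
     "nodes": [{"target": ["img"]}]}
  ]
}
""")
sample_counts = violations_by_criterion(sample_report)
```

To apply this to a real file, one would load it with `json.load` and pass the resulting dictionary to `violations_by_criterion`, after checking that the file actually has a `violations` key.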

4. Phase1_WAETAnalyses.zip

Human-curated Word documents (.docx) documenting WAVE and Axe DevTools output for all three dictionaries. Each document contains interface screenshots, extracted issues, WCAG criterion mappings, and expert annotations contextualising findings within lexicographical design. Complements the automated JSON/CSV Axe reports with expert interpretation.

5. Phase1_0shotLLM_prompt+reply.zip

Complete zero-shot LLM evaluation logs from Phase 1. Contains the standardised blind prompt and the full responses from Gemini 3.1, Claude Sonnet 4.6, and ChatGPT 5.3 Instant for each of the three dictionary entries. Captures raw generative output prior to expert coding or filtering. Raw data for LLM zero-shot baseline assessment.

6. Phase3a_LLM_LitPrompt+reply.zip

Complete literature-augmented LLM evaluation logs from Phase 3. Contains the knowledge-augmented prompts, which embed five key scholarly references, and the full responses from Gemini and Claude for each dictionary entry. Includes both the DLE-Full session (with the DLE-specific user study) and the DLE-Without session (without it).

7. Phase3b_LLM_DOPrompt+reply.zip

Complete design-oriented LLM evaluation logs from Phase 3. Contains the two-pronged hybrid prompt (a structured design-category checklist plus persona-based scenario simulation for comprehension, production, and translation use types) and the full responses from Gemini and Claude for each dictionary entry.

Format Summary

  • SourceWebData.zip: HTML, accessibility tree text, PNG screenshots
  • IssueClassification.xlsx: Excel spreadsheet with seven sheets
  • Phase1_AXEreports.zip: JSON, CSV, XML
  • Phase1_WAETAnalyses.zip: Word documents (.docx) with embedded screenshots
  • Phase1_0shotLLM_prompt+reply.zip: Word documents, text/Markdown prompts
  • Phase3a_LLM_LitPrompt+reply.zip: Word documents, text/Markdown prompts
  • Phase3b_LLM_DOPrompt+reply.zip: Word documents, text/Markdown prompts

Files

Files (12.3 MB)

  • 78.1 kB, md5:3f350871b8aff88790db55fc82e1e6a4
  • 212.8 kB, md5:9d9357e9d7bd2a915e36d03002728912
  • 420.9 kB, md5:a9a9b86dc04a0d9b5d1e8afff5d8cb61
  • 3.4 MB, md5:b0cc442f941b2e7b13562de828aa29c6
  • 4.3 MB, md5:565a2e818846dd6234625383f7033d10
  • 183.3 kB, md5:f4e46a49b0d00b0e9318aebb06f949fb
  • 3.7 MB, md5:69dce8b1d4365ac5160f068554cc0e92
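The md5 values listed above can be used to confirm that a downloaded file is intact. A minimal sketch using only the Python standard library is given here; the helper names `md5_of` and `matches_listed_checksum` are illustrative, and the demo file is a throwaway temporary file rather than one of the archives.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:3f35...'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of(path) == expected


# Demo on a temporary file (not one of the dataset archives).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    demo_ok = matches_listed_checksum(
        demo, "md5:" + hashlib.md5(b"hello").hexdigest()
    )
    # A checksum belonging to a different file must not match.
    demo_bad = matches_listed_checksum(
        demo, "md5:3f350871b8aff88790db55fc82e1e6a4"
    )
```

For a real download, one would call `matches_listed_checksum` with the local file path and the md5 string shown next to that file on the record page. Reading in chunks keeps memory use flat for the multi-megabyte archives.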

Additional details

Funding

Ministerio de Ciencia, Innovación y Universidades
PReLemma (Parámetros para recursos léxicos multilingües más accesibles), Proyectos de Generación de Conocimiento PID2022-137210OB-I00