Published September 26, 2025 | Version v2

Artifact for "EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention"

Description

Companion artifact for the ACL 2026 paper "EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention" by Yifan Zhang, Chen Huang, Yueke Zhang, Jiahao Zhang, Toby Li, Collin McMillan, Kevin Leach, and Yu Huang.

EyeMulator aligns code language models with human visual attention. It distills eye-tracking data into a small set of reusable priors (Beta distributions over semantic token classes plus n-gram transition counts), generates pseudo-scan paths from those priors over arbitrary code, and trains the model with a weighted cross-entropy loss combined with a token-level preference loss.
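As a rough sketch of the weighted objective (the function and the toy numbers below are illustrative, not the artifact's API; the actual implementation is in example/weighted_sft_template.py), each token's negative log-likelihood is scaled by an attention-derived weight before averaging:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits_per_token, targets, weights):
    """Cross-entropy where each token's NLL is scaled by a
    human-attention weight, then normalized by the weight mass.
    Illustrative only; the paper's exact weighting may differ."""
    loss, norm = 0.0, 0.0
    for logits, tgt, w in zip(logits_per_token, targets, weights):
        probs = softmax(logits)
        loss += -w * math.log(probs[tgt])
        norm += w
    return loss / norm

# Two-token toy sequence over a 3-word vocabulary: the first token
# gets a higher weight, mimicking stronger human fixation.
loss = weighted_cross_entropy(
    [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]],  # logits per position
    [0, 1],                               # gold token ids
    [1.5, 0.5],                           # attention-derived weights
)
```

Up-weighting tokens that humans fixate on makes mispredictions there cost more, which is the core idea the release's PyTorch template implements at scale.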

This deposit contains the distilled human-attention priors, a small demonstration dataset with the same schema as a full-scale training set, a reference PyTorch implementation of the method components (Algorithm 1), the human-side figures from the paper, and a short analysis script that reproduces the descriptive statistics reported under "RQ1: Artifact Distillation".

Contents:

  • priors/{combined,reading,writing}/ — distilled Beta parameters and unigram / bigram / trigram transition counts.
  • dataset_sample/ — 30 examples per split per task (completion, summarization, translation) in the full-scale JSONL schema.
  • figures/ — the six human/method-side PDF figures from the paper (study design, method overview, pseudo-path, Beta parameters, Beta densities, semantic category distribution).
  • docs/data_schema.md, docs/method_integration.md, docs/human_attention_analysis.md — field-by-field format, integration guide, and distribution-analysis walkthrough.
  • example/analyze_human_attention.py, example/compute_token_weights.py, example/weighted_sft_template.py — standard-library and PyTorch reference implementations.

Origin of the eye-tracking data. All priors in this release are derived from the EyeTrans corpus collected by Zhang et al., 2024 (EyeTrans: Merging Human and Machine Attention for Neural Code Summarization, FSE'24), in studies conducted at the University of Notre Dame under the appropriate IRB protocols.

Primary source. The actively maintained version of this artifact lives on GitHub: https://github.com/CoderDoge1108/EyeMulator. This Zenodo deposit is an archival snapshot intended for long-term citability.

License. Code (example/) is released under the MIT License; data and documentation (priors/, dataset_sample/, figures/, docs/) are released under CC-BY-4.0. The underlying eye-tracking data originates from Zhang et al., EyeTrans (FSE'24) — please credit that source as well.

Citing. Please cite both the EyeMulator paper and the EyeTrans dataset. BibTeX is provided in CITATION.bib.

Files (2.4 MB)

README.md
2.4 MB (md5:1012887dfc2e13e863088809b2f8c9ab)
6.3 kB (md5:d7b782db29d5f5a5ed76df1844e8bced)