Published September 19, 2019 | Version 4
Dataset Open

Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset

  • 1. University of Texas at Austin
  • 2. Carnegie Mellon University

Description

Version 4 of the dataset is available (Sep 19 2019)!

Note that this version contains significantly more data than Version 2.

Dataset description paper (full version) is available!

https://arxiv.org/pdf/1903.06754.pdf (updated Sep 7 2019)

Tools for visualizing the data are available!

https://github.com/corgiTrax/Gaze-Data-Processor

 

=========================== Dataset Description ===========================

We provide a large-scale, high-quality dataset of human actions with simultaneously recorded eye movements while humans play Atari video games. The dataset consists of 117 hours of gameplay data from a diverse set of 20 games, with 8 million action demonstrations and 328 million gaze samples. We introduce a novel form of gameplay, in which the human plays in a semi-frame-by-frame manner. This leads to near-optimal game decisions and game scores that are comparable to or better than known human records. For every game frame, we recorded the corresponding image frame, the human keystroke action, the reaction time to make that action, the gaze positions, and the immediate reward returned by the environment.

 

Q & A: Why frame-by-frame game mode?

Resolving state-action mismatch: Closed-loop human visuomotor reaction time is around 250-300 milliseconds. Therefore, during normal gameplay, the state (image) and the action that are simultaneously recorded at time step t can be mismatched: the action at time t may actually be intended for a state seen 250-300 ms earlier. This is a serious issue for supervised learning algorithms, since the label a_t and the input s_t are no longer matched. Frame-by-frame gameplay ensures that states and actions are matched at every timestep.
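To make the mismatch concrete, here is a back-of-the-envelope sketch in Python (a hypothetical illustration, not part of the dataset or its tools; the 60 Hz frame rate and 275 ms delay are assumed values): at that rate, a 250-300 ms delay spans roughly 15-18 frames, so conventionally recorded data would need actions shifted back by about that many frames to re-pair each action with the state it was responding to, whereas frame-by-frame play makes no such correction necessary.

    # Hypothetical illustration of the state-action mismatch; the frame rate
    # and delay values are assumptions, not measurements from this dataset.
    FRAME_RATE_HZ = 60          # standard Atari frame rate
    REACTION_TIME_MS = 275      # midpoint of the 250-300 ms range above

    # Number of frames by which a recorded action lags the state it responds to.
    delay_frames = round(REACTION_TIME_MS / 1000 * FRAME_RATE_HZ)  # ~16 frames

    def realign(states, actions, delay=delay_frames):
        """Pair state s_t with action a_{t+delay}, discarding the unmatched ends."""
        return list(zip(states, actions[delay:]))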

Maximizing human performance: Frame-by-frame mode makes gameplay more relaxed and reduces fatigue, which would otherwise lead to blinking and corrupt the eye-tracking data. More importantly, this design reduces sub-optimal decisions caused by inattentional blindness.

Highlighting critical states that require multiple eye movements: Human decision time and all eye movements were recorded at every frame. States that could lead to a large reward or penalty, or that require sophisticated planning, take longer and require multiple eye movements before the player commits to a decision. Pausing gameplay means the player can use as many eye movements as needed to resolve complex situations. This is important because an algorithm that learns from eye movements needs the data to contain all relevant eye movements.

 

============================ Readme ============================

1. meta_data.csv: metadata for the dataset (a loading sketch follows this list), including:

  • GameName: String. Game name, e.g., “alien” indicates a trial collected for the game Alien (15-minute time limit), while “alien_highscore” is the trajectory from the best player’s highest-scoring run (2-hour limit). See the dataset description paper for details.

  • trial_id: Integer. One can use this number to locate the associated .tar.bz2 file and label file.

  • subject_id: Char. Human subject identifier.

  • load_trial: Integer. 0 indicates that the game starts from scratch. If this field is non-zero, it means that the current trial continues from a saved trial. The number indicates the trial number to look for.

  • highest_score: Integer. The highest game score obtained from this trial.

  • total_frame: Integer. Number of image frames in the .tar.bz2 archive.

  • total_game_play_time: Integer. Game time in ms.

  • total_episode: Integer. Number of episodes in the current trial. An episode terminates when all lives are consumed.

  • avg_error: Float. Average eye-tracking validation error at the end of each trial in visual degree (1 visual degree = 1.44 cm in our experiment). See our paper for the calibration/validation process.

  • max_error: Float. Max eye-tracking validation error. 

  • low_sample_rate: Percentage. Percentage of frames with fewer than 10 gaze samples. The most common reason for this is blinking.

  • frame_averaging: Boolean. The game engine allows this to be turned on or off. When turned on (TRUE), two consecutive frames are averaged, which alleviates screen flickering in some games.

  • fps: Integer. Frames per second when an action key is held down.
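A minimal sketch of loading the metadata in Python (assuming pandas is installed; the column names follow the list above, and exact value formats, e.g. how percentages are stored, should be checked against the file itself):

    import pandas as pd

    # Load the dataset-level metadata: one row per trial.
    meta = pd.read_csv("meta_data.csv")
    print(meta.columns.tolist())   # GameName, trial_id, subject_id, ...

    # Example: all trials recorded for the game Alien (15-minute sessions).
    alien_trials = meta[meta["GameName"] == "alien"]
    print(alien_trials[["trial_id", "subject_id", "highest_score", "avg_error"]])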

 

2. [game_name].zip files: one archive per game, containing:

*.tar.bz2 files: contain the game image frames. The filename indicates the trial number.

*.txt files: one label file per trial (a parsing sketch follows this list), containing the following fields:

  • frame_id: String. The ID of a frame; it can be used to locate the corresponding image frame in the .tar.bz2 file.

  • episode_id: Integer (not available for some trials). Episode number, starting from 0 for each trial. A trial could contain a single episode or multiple episodes.

  • score: Integer (not available for some trials). Current game score for that frame.

  • duration(ms): Integer. Time the human player took to make the decision for this frame, in milliseconds.

  • unclipped_reward: Integer. Immediate reward returned by the game engine.

  • action: Integer. See action_enums.txt for the mapping. This is consistent with the Arcade Learning Environment setup.

  • gaze_positions: Null, or a list of integers x0,y0,x1,y1,...,xn,yn giving the gaze positions for the current frame (null if no gaze was recorded). (0,0) is the top-left corner; x is the horizontal axis and y is the vertical axis.
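A minimal parsing sketch for one trial's label file, in Python. It assumes comma-separated fields in the order listed above, with the variable-length gaze list in the trailing columns and the token null marking missing values; verify these layout details against the files, or use the Gaze-Data-Processor tool linked above.

    import csv

    def parse_label_file(path):
        """Parse a trial's label .txt file into per-frame records.

        Layout assumptions (verify against the actual files): comma-separated
        fields in the documented order, gaze coordinates in the trailing
        columns, and 'null' marking missing values.
        """
        def parse_int(token):
            return None if token == "null" else int(token)

        records = []
        with open(path) as f:
            reader = csv.reader(f)
            next(reader)  # assumed header line; drop this call if there is none
            for row in reader:
                frame_id, episode_id, score, duration, reward, action = row[:6]
                gaze = row[6:]
                records.append({
                    "frame_id": frame_id,
                    "episode_id": parse_int(episode_id),
                    "score": parse_int(score),
                    "duration_ms": parse_int(duration),
                    "unclipped_reward": parse_int(reward),
                    "action": parse_int(action),
                    # x0,y0,x1,y1,... pairs; parsed as floats to be safe.
                    "gaze_positions": None if not gaze or gaze == ["null"]
                    else [(float(gaze[i]), float(gaze[i + 1]))
                          for i in range(0, len(gaze) - 1, 2)],
                })
        return records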

 

3. action_enums.txt: contains the integer-to-action mapping defined by the Arcade Learning Environment.
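Finally, a small sketch of overlaying the gaze positions on an extracted image frame (independent of the Gaze-Data-Processor tool; it assumes matplotlib, an extracted .tar.bz2 archive, and that each image is named <frame_id>.png, a naming convention that should be checked against the archives; the record argument is the output of the hypothetical parse_label_file sketch above):

    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    def show_frame_with_gaze(frames_dir, record):
        """Plot one frame with its gaze positions overlaid."""
        img = mpimg.imread(f"{frames_dir}/{record['frame_id']}.png")  # assumed naming
        plt.imshow(img)
        if record["gaze_positions"]:
            xs, ys = zip(*record["gaze_positions"])
            # (0,0) is the top-left corner; x horizontal, y vertical,
            # matching matplotlib's default image coordinates.
            plt.scatter(xs, ys, s=15, c="red", marker="+")
        plt.axis("off")
        plt.show()

    # Hypothetical usage with made-up paths:
    # records = parse_label_file("some_trial.txt")
    # show_frame_with_gaze("some_trial_frames", records[0])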

 

============================ Citation ============================

If you use Atari-HEAD in your research, please cite the following:

@misc{zhang2019atarihead,
    title={Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset},
    author={Ruohan Zhang and Calen Walshe and Zhuode Liu and Lin Guan and Karl S. Muller and Jake A. Whritner and Luxin Zhang and Mary M. Hayhoe and Dana H. Ballard},
    year={2019},
    eprint={1903.06754},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Zhang, Ruohan, Zhuode Liu, Luxin Zhang, Jake A. Whritner, Karl S. Muller, Mary M. Hayhoe, and Dana H. Ballard. "AGIL: Learning attention from human for visuomotor tasks." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 663-679. 2018.

@inproceedings{zhang2018agil,
  title={AGIL: Learning attention from human for visuomotor tasks},
  author={Zhang, Ruohan and Liu, Zhuode and Zhang, Luxin and Whritner, Jake A and Muller, Karl S and Hayhoe, Mary M and Ballard, Dana H},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={663--679},
  year={2018}
}



 

Files (8.7 GB)

MD5 checksum and size of each file:

  • md5:f234b8bb2eeadf3dbee9fdbfa47f29a5 (536 Bytes)
  • md5:f84a0fd37c5b1707184b3d15eeb82ffc (470.6 MB)
  • md5:74081aa529eb5bb47f176a1235bbb16a (347.2 MB)
  • md5:e53241d6142a0daa2854c2141c11af95 (486.3 MB)
  • md5:58672b984d38d8aac1961862135f5329 (295.4 MB)
  • md5:e675deaef229ce5c3dda73a5af7b7868 (132.5 MB)
  • md5:1790f842907cf42f8e7585ec95d909f5 (317.9 MB)
  • md5:e6dc52377e4f254066cac7d607ccda54 (309.5 MB)
  • md5:970ffce235119e9127fc54f7a5d28ab0 (780.8 MB)
  • md5:212644857bb65df66f9dddcdf7910384 (285.5 MB)
  • md5:2ffec6446b4f26b3459ea7708a7b129a (657.2 MB)
  • md5:31222d8f1bcc166c779a46bd07973e75 (591.9 MB)
  • md5:c2821a803b6761b9a7bcb186a7988c8f (31.5 kB)
  • md5:3ca8dc5729ba4d396d0da4eda4825bef (431.3 MB)
  • md5:1e78addb65d01c5945991289a05183aa (534.0 MB)
  • md5:01bb6934dac27cb4ea12931670659f33 (455.7 MB)
  • md5:46b8e7cd45c99fb73701eec5ccfa4ba4 (361.3 MB)
  • md5:9e8f695f1dd5e23a506c27ac507aa7c5 (618.3 MB)
  • md5:3e5ec6ac673bb0a7510d20cf3de77244 (494.5 MB)
  • md5:98b7e25e0bfa675056dbe36bcd818c76 (521.3 MB)
  • md5:f6cce997e6b27b5d82ad92eb119cfae4 (355.9 MB)
  • md5:b80e7d153fed37c37cd5791e881a0515 (279.6 MB)