[LS2N_IPI_Salient360] A dataset of head and eye movements for 360° videos
Creators
Description
Datasets & Toolbox
The following datasets and tools have been made available to those interested in developing and benchmarking their models:
- Training dataset: A dataset containing 360° images and videos with their corresponding ground-truth saliency maps and scanpaths (corresponding to the different types of models), so that you can train and tune your algorithms as necessary and compute the benchmark scores as a reference for yourself.
- Toolbox: Scripts to parse the provided data and to compute metrics comparing saliency maps and scanpaths, so that the performance of the models can be assessed. (An illustrative metric sketch follows this list.)
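The specific metrics implemented by the toolbox are not enumerated here. As an illustration of the kind of comparison involved, below is a minimal sketch of one widely used saliency metric, Pearson's linear correlation coefficient (CC) between a predicted and a ground-truth map; the function name is hypothetical, and this is not necessarily how the toolbox computes it:

```python
import numpy as np

def correlation_coefficient(pred, gt):
    """Pearson's CC between two saliency maps of identical shape.

    Both maps are standardized to zero mean and unit variance first;
    the result lies in [-1, 1], where 1 means perfect linear agreement.
    """
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())
```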
The dataset is structured as follows:
- Stimuli: 19 omnidirectional videos of 20 seconds each and 85 omnidirectional images, all in equirectangular format.
- H: Folder containing the saliency maps and scanpaths from head-only movements.
- HE: Folder containing the saliency maps and scanpaths from head and eye movements.
- Tools: Python scripts to parse the saliency-map binary files and to compute saliency and scanpath measures.
The saliency-map and scanpath files are organized as follows:
- Saliency maps from head-only movements: Binary files representing the saliency-map sequences are provided, containing one saliency map per frame at a resolution of 2048x1024. Within a binary file, the saliency values (float32) are stored row-wise, one frame after the other. For each sampled head position, the center of the viewport is taken as the gaze position, and an isotropic 3.34-degree Gaussian foveation filter centered on the viewport is applied. (A parsing sketch follows this list.)
- Scanpaths from head-only movements: Text files with the head-movement scanpaths, 100 samples per observer, are provided. Each line contains a vector giving the fixation index, longitude, latitude and fixation timestamp, in that order. The fixation index increments serially within an observer and resets to 0 once all of that observer's fixations have been reported and the next observer begins. The fixation starting time is given in seconds, and longitude and latitude are normalized between 0 and 1 (so they should be scaled by the dimensions of the desired equirectangular output image).
- Saliency maps from head and eye movements: Binary files representing the saliency-map sequences, containing one saliency map per frame at a resolution of 2048x1024. Within a binary file, the saliency values (float32) are stored row-wise, one frame after the other. For each eye fixation, an isotropic 2-degree Gaussian foveation filter centered at the fixation position is applied. This is done for the fixations of both the left and the right eye, and the results are combined into the final saliency map.
- Scanpaths from head and eye movements: Text files with the scanpaths of both the left and the right eye. Each line contains a vector giving the fixation index, longitude, latitude, fixation timestamp, duration, start frame and end frame, in that order. The fixation index increments serially within an observer and resets to 0 once all of that observer's fixations have been reported and the next observer begins. The fixation starting time is given in seconds, and longitude and latitude are normalized between 0 and 1 (so they should be scaled by the dimensions of the desired equirectangular output image).
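As a minimal sketch of how these files might be read: the file paths and frame index are hypothetical, and the scanpath delimiter (comma or whitespace) is an assumption since it is not specified above, while the 2048x1024 float32 row-wise layout and the normalized coordinates follow the descriptions above.

```python
import numpy as np

WIDTH, HEIGHT = 2048, 1024            # equirectangular saliency-map resolution
BYTES_PER_FRAME = WIDTH * HEIGHT * 4  # float32 values, stored row-wise

def load_saliency_frame(path, frame_index=0):
    """Read one 1024x2048 saliency map from a binary sequence file."""
    with open(path, "rb") as f:
        f.seek(frame_index * BYTES_PER_FRAME)
        frame = np.fromfile(f, dtype=np.float32, count=WIDTH * HEIGHT)
    return frame.reshape(HEIGHT, WIDTH)

def load_scanpaths(path, width=WIDTH, height=HEIGHT):
    """Split a scanpath text file into one fixation list per observer.

    Each line starts with: fixation index, longitude, latitude, timestamp.
    Longitude and latitude are normalized to [0, 1] and scaled to pixels here.
    """
    observers, current = [], []
    with open(path) as f:
        for line in f:
            fields = [float(v) for v in line.replace(",", " ").split()]
            if not fields:
                continue
            idx, lon, lat, t = fields[:4]
            if idx == 0 and current:      # index reset marks the next observer
                observers.append(current)
                current = []
            current.append((lon * width, lat * height, t))
    if current:
        observers.append(current)
    return observers
```

The foveation step described above can be sketched in the same spirit. Below is an illustrative isotropic Gaussian computed with great-circle distances, which keeps the kernel isotropic on the sphere despite the equirectangular distortion; this is an assumption-laden illustration, not necessarily the exact procedure used to build the provided maps:

```python
def fixation_gaussian(lon_deg, lat_deg, sigma_deg=2.0, width=WIDTH, height=HEIGHT):
    """Isotropic Gaussian around one fixation, evaluated on the sphere."""
    # pixel-centre grid in spherical coordinates (radians)
    lons = np.deg2rad((np.arange(width) + 0.5) / width * 360.0 - 180.0)
    lats = np.deg2rad(90.0 - (np.arange(height) + 0.5) / height * 180.0)
    lon0, lat0 = np.deg2rad(lon_deg), np.deg2rad(lat_deg)
    # great-circle distance between every pixel and the fixation direction
    cos_d = (np.sin(lats)[:, None] * np.sin(lat0)
             + np.cos(lats)[:, None] * np.cos(lat0) * np.cos(lons[None, :] - lon0))
    ang = np.arccos(np.clip(cos_d, -1.0, 1.0))  # angular distance, radians
    return np.exp(-ang**2 / (2.0 * np.deg2rad(sigma_deg)**2))
```

In this sketch, a per-frame saliency map would then be the normalized sum of such kernels over all fixations assigned to that frame.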
Experiment
Image
The head-mounted display (HMD) used for this test was the Oculus DK2. It has a refresh rate of 75 Hz, a resolution of 960x1080 per eye and a total viewing angle of 100x100 degrees. The gyroscopic sensors in the device transmit orientation data at the display refresh rate. A small eye-tracking camera from SensoMotoric Instruments (SMI) was integrated into the device, transmitting binocular eye-tracking data at 60 Hz.
The software setup consisted of custom-built Unity software together with version 2.0 of the Oculus DK2 driver. The software checked calibration accuracy every two minutes and re-calibrated whenever necessary.
A total of 63 observers, aged 19 to 52, participated in the test. Their visual acuity was verified with the Snellen test, and their dominant eye was determined using the cardboard technique.
To maintain a natural, free-viewing gaze pattern, observers were given no explicit task requiring quantitative responses: they were instructed simply to watch the scenes as naturally as possible, with a combination of head and eye movements. They were free to stop the test at any time if they felt fatigued or experienced vertigo. Five training images were shown before the actual test began.
A total of 60 stimuli were shown to each observer in sequence. Each stimulus lasted 25 seconds, with a 5-second gray screen between consecutive stimuli. Every two minutes, a calibration was performed to check the accuracy of the eye-tracker. The test itself lasted about 35 minutes, with a 5-minute pause at the halfway point. Observers were seated comfortably in a swivel chair and were free to rotate the full 360 degrees and to move the chair within the room if necessary. At the start of each viewing, the content was reset so that the observer faced the center of the equirectangular image, irrespective of their current orientation; this ensured that all observers started from the same position in the panorama.
Video
360-degree videos were displayed in a VR headset (HTC VIVE) equipped with an SMI eye-tracker. The HTC VIVE offers a field of view of approximately 110 degrees horizontally by 110 degrees vertically (1080x1200 pixels per eye) at 90 frames per second. The eye-tracker samples gaze data at 250 Hz with a precision of 0.2 degrees. A custom Unity3D scene was created to display the videos.
57 participants were recruited (25 women; ages 19 to 44, mean 25.7 years). Normal or corrected-to-normal vision was verified, and the dominant eye of each observer was determined. All 19 videos were watched by all observers for their entire 20-second duration.
Observers were told to freely explore the 360-degree videos as naturally as possible while wearing the VR headset. Videos were played without audio. To let participants safely explore the full 360-degree field of view, they were seated in a rolling chair.
Participants started exploring the omnidirectional content either from the implicit longitudinal center (0 degrees, the center of the equirectangular projection) or from the opposite longitude (180 degrees). Each video was observed in both starting modalities by at least 28 participants. Observers' starting longitude was controlled by offsetting the content's longitudinal position at stimulus onset, ensuring that participants began exploring each 360-degree scene at exactly 0 or 180 degrees of longitude, according to the modality. Video order and starting-position modalities were cross-randomized across participants.
Observers began the experiment with an eye-tracker calibration, which was repeated every 5 videos to ensure that the eye-tracker's accuracy did not degrade. The total duration of the test was less than 20 minutes.
Files (20.8 GB)
acm-mmsys-2018.pdf

MD5 | Size
---|---
md5:bac1bf1b378ca9229263dba9e4dade0c | 798.9 kB
md5:b54043907c749d226e320c7bbbb6f3f4 | 531.1 MB
md5:b52bda65ee8e2fb0222c5430bda3aa9c | 591.7 MB
md5:ac1a471e058177705ea96120af19ff2c | 841.6 MB
md5:c7f0c11d578ca476f345edbae0462ad8 | 9.6 kB
md5:75f4600fb82d7db9f01e27b832147335 | 3.5 MB
md5:191f149243a4ed5dc9f7b2e88241b5c4 | 5.7 kB
md5:7daa9a42dbe27b9fc8887029f8f7dc15 | 5.1 kB
md5:bc26c2f5f120c5cd2595433c303887df | 11.2 GB
md5:1f19c79041b1cd4ca3bc3860945840d3 | 7.2 GB
md5:40e71d061c52a882add7362042c9d400 | 502.4 MB
md5:a75f0b43efcb80fcc3ade14a0e49b06c | 8.5 kB