Toulouse Campus Surveillance Dataset: scenarios, soundtracks, synchronized videos with overlapping and disjoint views

Malon, Thierry; Roman-Jimenez, Geoffrey; Guyot, Patrice; Chambon, Sylvie; Charvillat, Vincent; Crouzil, Alain; Péninou, André; Pinquier, Julien; Sèdes, Florence; Sénac, Christine

The Toulouse Campus Surveillance Dataset, named ToCaDa, contains two sets of 25 temporally synchronized videos, one set for each of two scripted scenarios.

With the help of about 50 people (actors and camera holders), these videos were shot on July 17th, 2017, at 9:50 a.m. and 11:04 a.m. respectively.

Among the cameras:
• 9 were located inside the main building and shot from windows on different floors. All of these cameras focus on the car park and the path leading to the main entrance of the building, with large overlapping fields of view.
• 8 were located in front of the building and filmed it with large overlapping fields of view.
• 8 were placed farther away, scattered around the university campus. Each of their views is disjoint from all the others.

About 20 actors were asked to follow two realistic scenarios by performing scripted actions, like driving a car, walking, entering or leaving a building, or holding an item in hand while being filmed.

In addition to ordinary actions, some suspicious behaviors are present.


Due to the wide variety of devices used during the shooting of the two scenarios, issues were encountered on some cameras, leading to videos in which a few seconds are missing. To preserve temporal synchronization between videos, black frames were inserted in the missing time intervals. We list the affected videos and their missing intervals below:

F1C3: the first 66 seconds are missing.
F1C5: the first 2 seconds are missing.
F1C8: the first 3 seconds are missing.
F1C13: the first 10 seconds are missing.
F1C15: the first second is missing.
F1C19: the first second is missing.
F2C1: the video is accelerated and only lasts a few seconds. We thus did not provide it.
F2C6: footage is missing from 4:01 to 4:12 and from 4:25 to 4:28.
F2C16: footage is missing from 5:15 to 5:26.
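
For readers who process the videos programmatically, the list above can be encoded as a lookup of black-frame intervals, so that padded frames are skipped during analysis. The following is only a minimal sketch and not part of the dataset's tooling: the names `PADDED_INTERVALS`, `NOT_PROVIDED`, `to_seconds` and `is_padded` are hypothetical, timestamps are assumed to be minutes:seconds counted from the start of the video, and interval ends are assumed to be exclusive.

```python
# Hypothetical helper, not shipped with ToCaDa.
# Intervals are transcribed from the list above, in seconds from video start.
# Assumption: timestamps are m:ss and interval ends are exclusive.

def to_seconds(stamp: str) -> int:
    """Convert an 'm:ss' timestamp to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


PADDED_INTERVALS = {
    "F1C3": [(0, 66)],
    "F1C5": [(0, 2)],
    "F1C8": [(0, 3)],
    "F1C13": [(0, 10)],
    "F1C15": [(0, 1)],
    "F1C19": [(0, 1)],
    "F2C6": [
        (to_seconds("4:01"), to_seconds("4:12")),
        (to_seconds("4:25"), to_seconds("4:28")),
    ],
    "F2C16": [(to_seconds("5:15"), to_seconds("5:26"))],
}

# F2C1 is not distributed, so it has no frames at all.
NOT_PROVIDED = {"F2C1"}


def is_padded(video_id: str, t: float) -> bool:
    """Return True if time t (seconds) falls inside a black-frame interval."""
    if video_id in NOT_PROVIDED:
        raise KeyError(f"{video_id} is not provided in the dataset")
    intervals = PADDED_INTERVALS.get(video_id, [])
    return any(start <= t < end for start, end in intervals)
```

A call such as `is_padded("F2C6", 245)` can then be used to discard detections that fall on inserted black frames; videos not listed above are treated as having no padding.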

Some videos were recorded with mobile devices whose pixel resolution was lower than 1920 x 1080:

F1C3 and F2C3: pixel resolution is 1280 x 720.
F1C4 and F2C4: pixel resolution is 640 x 480.
F1C15 and F2C15: pixel resolution is 1280 x 720.
F1C20 and F2C20: pixel resolution is 1440 x 1080.
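
The resolutions above also differ in aspect ratio, which matters when comparing image coordinates across cameras. The sketch below illustrates this under stated assumptions: the names `NATIVE_RESOLUTION`, `native_resolution` and `aspect_ratio` are hypothetical, video identifiers are assumed to follow the `F<scenario>C<camera>` pattern used in the lists above, and every camera not listed is assumed to have recorded at 1920 x 1080.

```python
from fractions import Fraction

# Hypothetical lookup, transcribed from the list above.
# Keys are camera identifiers; both scenarios share the same cameras.
NATIVE_RESOLUTION = {
    "C3": (1280, 720),
    "C4": (640, 480),
    "C15": (1280, 720),
    "C20": (1440, 1080),
}

# Assumption: all other cameras recorded at full resolution.
DEFAULT_RESOLUTION = (1920, 1080)


def native_resolution(video_id: str) -> tuple:
    """Return (width, height) for an identifier such as 'F1C15'."""
    camera = video_id[video_id.index("C"):]
    return NATIVE_RESOLUTION.get(camera, DEFAULT_RESOLUTION)


def aspect_ratio(video_id: str) -> Fraction:
    """Return the width/height ratio as an exact fraction."""
    width, height = native_resolution(video_id)
    return Fraction(width, height)
```

Under these assumptions, cameras 3 and 15 are 16:9 like the full-resolution cameras, whereas cameras 4 and 20 are 4:3. The description does not state how the 4:3 recordings were handled when other resolutions were produced, so this should be checked on the files themselves.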

More detailed information about the position of the cameras can be found in the following paper:

T. Malon, G. Roman-Jimenez, P. Guyot, S. Chambon, V. Charvillat, A. Crouzil, A. Péninou, J. Pinquier, F. Sèdes and C. Sénac, Toulouse campus surveillance dataset: scenarios, soundtracks, synchronized videos with overlapping and disjoint views, ACM Multimedia Systems Conference, 2018.

Version 2.0.0 contains:
• 25 videos for scenario 1 and 24 videos for scenario 2. Each video is available in three resolutions: full (1920x1080), medium (960x540) and low (640x360). Post-processing was applied to the videos to blur people's faces and car license plates.
• Audio annotation files for both scenarios.
• Visual annotation files for both scenarios.
  • Malon, Thierry et al. (2018). Toulouse campus surveillance dataset: scenarios, soundtracks, synchronized videos with overlapping and disjoint views. In Proceedings of the 9th ACM Multimedia Systems Conference (pp. 393-398).
