Published August 25, 2024 | Version v2
Dataset Open

CarDA - Car door Assembly Activities Dataset

  • 1. FORTH Institute of Computer Science
  • 2. National Technical University of Athens
  • 3. Institute of Communication and Computer Systems
  • 1. National Technical University of Athens

Description

The CarDA dataset [1] (Car Door Assembly dataset) has been designed and captured to provide a comprehensive, multi-modal resource for analyzing car door assembly activities performed by trained line workers in realistic assembly lines. 

It comprises a set of time-synchronized multi-camera RGB-D videos and human motion capture data acquired during car door assembly activities performed by real line workers in a real manufacturing environment.

 

Deployment environment:

The use-case scenario concerns a real-world assembly line workplace in the automotive manufacturing industry, which serves as the deployment environment. In this context, line workers simulate the real car door assembly workflow using the prompts, sequences, and tools, under ergonomic and environmental conditions very similar to those on existing factory shop floors.

The assembly line involves a conveyor belt that is virtually divided into three work areas, one for each of the three assembly workstations (WS10, WS20, and WS30). The belt moves at a low, constant speed and carries cart-mounted car doors and material storage. A line worker is assigned to each workstation, and all workers assemble car doors as the belt moves. At each workstation, the worker completes a workstation-specific set of assembly actions, referred to as a task cycle. Upon successful completion of the task cycle, the cart travels to the virtually defined area of the subsequent workstation, where another line worker continues the assembly process in a new task cycle. Each task cycle lasts approximately 4 minutes and is repeated continuously during the worker's shift.

Data acquisition:

Data acquisition involves low-cost, passive RGB-D camera sensors that are installed at stationary locations alongside the car door assembly line and a motion
capture system for capturing time-synchronized sequences of images and motion capture data during car door assembly activities performed by real line workers.

Two stationary StereoLabs ZED2 stereo cameras were installed in each of the three workstations of the car door assembly line. The two stationary, workstation-specific cameras are located at bilateral positions on the two sides of the conveyor belt at the center of the area concerning that specific workstation. 

Each pair of RGB-D sensors was used to acquire stereo color and depth image sequences during car door assembly task cycle executions. Each recording comprises time-synchronized RGB (color) and depth image sequences captured throughout a task cycle execution at 30 frames per second (fps).

At the same time, the line worker wore an XSens MVN Link suit during work activities to acquire time-synchronized 3D motion capture data at 60 fps.

Note: Time synchronization between the two RGB-D (.svo) recordings of a pair (captured simultaneously during an assembly task cycle by the inXX and outXX cameras installed at workstation wsXX) is guaranteed and relies on the StereoLabs ZED SDK acquisition software. Time synchronization between the RGB-D and mp4 videos (30 fps) and the acquired motion capture data (60 fps) was performed manually, using the starting frame/time of the video as the reference time. We have observed time discrepancies between data samples of the two modalities, which may occur after the first 40-50 seconds in some recordings.
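
The nominal relationship between the two frame rates can be sketched as follows. This is a minimal illustration, not part of the dataset tooling: it assumes that both streams start at the same instant and run at exactly their nominal rates (30 fps video, 60 fps motion capture). The function names and the `offset_s` parameter are hypothetical. Given the discrepancies noted above, the mapping should be treated as approximate beyond the first 40-50 seconds of a recording.

```python
import math

VIDEO_FPS = 30.0  # nominal rate of the .svo / mp4 recordings
MOCAP_FPS = 60.0  # nominal rate of the .bvh motion capture data


def video_to_mocap_frame(video_frame, offset_s=0.0):
    """Map a video frame index to the nearest mocap frame index.

    offset_s is a hypothetical manual correction in seconds that is added
    to the video timestamp; 0.0 assumes both streams start together.
    """
    t = video_frame / VIDEO_FPS + offset_s
    return max(0, round(t * MOCAP_FPS))


def mocap_to_video_frame(mocap_frame, offset_s=0.0):
    """Map a mocap frame index to the video frame displayed at that time."""
    t = mocap_frame / MOCAP_FPS - offset_s
    # A small epsilon guards against floating-point values just below an integer.
    return max(0, math.floor(t * VIDEO_FPS + 1e-9))
```

Under these assumptions every video frame corresponds to every second motion capture frame, so a task cycle of about 4 minutes spans roughly 7200 video frames and 14400 motion capture frames.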

 

CarDA Dataset:

The dataset has been split into two subsets, A and B.

Each comprises data acquired at different periods using the same multicamera system in the same manufacturing environment. 

Subset A contains recordings of RGB-D videos, mp4 videos, and 3D human motion capture data (using the XSens MVN Link suit) acquired during car door assembly activities in all three workstations.

Subset B contains recordings of RGB-D videos and mp4 videos acquired during car door assembly activities in all three workstations.

 

CarDA subset A

It contains:

    • RGB-D videos acquired using StereoLabs ZED 2 sensors, in .svo format
    • mp4 videos (30fps) extracted from the .svo files (using the left camera of the stereo pair of each camera).
    • 3D human pose data (ground truth) captured using the Movella Xsens MVN Link motion capture system (60 fps) in .bvh format
    • Annotation data (xls file format):
      • Ground truth related to temporal segmentation and classification of car door assembly actions (subgoals) during task cycle executions, performed by personnel working directly on the assembly line for the CarDA dataset.
      • Ground truth data on the duration of basic ergonomic postures based on the EAWS ergonomic screening tool: Two experts in manufacturing and ergonomics performed manual annotations related to the EAWS screening tool. 
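
As an illustration of how the motion capture files could be inspected, the sketch below reads the frame count and frame time from the MOTION section of a BVH file, using the standard `Frames:` and `Frame Time:` header lines of the BVH format. The function name and the tiny sample text are hypothetical and are not taken from the dataset; the sample only mimics a 60 fps recording.

```python
def read_bvh_motion_header(lines):
    """Return (frame_count, frame_time_s) from the MOTION section of a BVH file.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    frames = None
    frame_time = None
    in_motion = False
    for line in lines:
        s = line.strip()
        if s == "MOTION":
            in_motion = True
        elif in_motion and s.startswith("Frames:"):
            frames = int(s.split(":", 1)[1])
        elif in_motion and s.startswith("Frame Time:"):
            frame_time = float(s.split(":", 1)[1])
            break  # the per-frame channel values follow; no need to read them
    if frames is None or frame_time is None:
        raise ValueError("no complete MOTION header found")
    return frames, frame_time


# Hypothetical, minimal BVH text (one joint, three frames at about 60 fps).
SAMPLE = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 3 Xposition Yposition Zposition
  End Site
  {
    OFFSET 0.0 1.0 0.0
  }
}
MOTION
Frames: 3
Frame Time: 0.016667
0.0 0.0 0.0
0.1 0.0 0.0
0.2 0.0 0.0
"""

frames, frame_time = read_bvh_motion_header(SAMPLE.splitlines())
duration_s = frames * frame_time  # recording length in seconds
```

For a file on disk, the same function can be called on an open text file handle; comparing `1.0 / frame_time` against the nominal 60 fps is a quick sanity check before aligning the data with the videos.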

     CarDA subset A files:

    • ws10 - svo - mp4 - bvh.rar
      Five assembly task cycle executions recorded in WS10, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens MVN Link system. Annotation data are also available.
    • ws20 - svo - mp4 - bvh.rar
      Four assembly task cycle executions recorded in WS20, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens MVN Link system. Annotation data are also available.

    • ws30 - svo - mp4 - bvh.rar
      Four assembly task cycle executions recorded in WS30, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens MVN Link system. Annotation data are also available.

 

CarDA subset B 

It contains:

    • RGB-D videos acquired using StereoLabs ZED 2 sensors, in .svo format
    • mp4 videos (30fps) extracted from the .svo files (using the left camera of the stereo pair of each camera).
    • Annotation data (xls file format):

      • Ground truth related to temporal segmentation and classification of car door assembly actions (subgoals) during task cycle executions, performed by personnel working directly on the assembly line for the CarDA dataset.
      • Ground truth data on the duration of basic ergonomic postures based on the EAWS ergonomic screening tool: Two experts in manufacturing and ergonomics performed manual annotations related to the EAWS screening tool. 

     CarDA subset B files:

    • ws10 - svo - mp4.rar
      Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.

    • ws20 - svo - mp4.rar
      Six pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.

    • ws30 - svo - mp4.rar
      Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.
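
The recording counts listed for the two subsets can be summarized in a few lines of code. This is a sketch for orientation only: the helper names are hypothetical, and the .svo file count assumes that each task cycle execution in subset A corresponds to exactly one pair of .svo recordings, which the listing above suggests but does not state explicitly. The internal layout of the archives is not described here and is not assumed.

```python
# Number of recorded task cycles (subset A) or .svo pairs (subset B) per workstation,
# as stated in the file descriptions.
RECORDINGS = {
    "A": {"ws10": 5, "ws20": 4, "ws30": 4},
    "B": {"ws10": 3, "ws20": 6, "ws30": 3},
}


def parse_archive_name(name):
    """Split an archive name such as 'ws10 - svo - mp4 - bvh.rar'
    into (workstation, list of modalities)."""
    stem = name.rsplit(".", 1)[0]
    parts = [p.strip() for p in stem.split("-")]
    return parts[0], parts[1:]


def svo_file_count(subset):
    """Expected number of .svo files, assuming two cameras per recording."""
    return 2 * sum(RECORDINGS[subset].values())


workstation, modalities = parse_archive_name("ws10 - svo - mp4 - bvh.rar")
total_a = sum(RECORDINGS["A"].values())  # recordings in subset A
total_b = sum(RECORDINGS["B"].values())  # recordings in subset B
```

Under these assumptions subset A holds 13 recordings and subset B holds 12, for 26 and 24 .svo files respectively.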

 

Contact:

Konstantinos Papoutsakis, PhD: papoutsa@ics.forth.gr

Maria Pateraki: mpateraki@mail.ntua.gr
Assistant Professor | National Technical University of Athens
Affiliated Researcher | Institute of Computer Science | FORTH

References:

[1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024 Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.

Files

ECCV_2024_TCAP_HBU.pdf

Files (19.6 GB)

Checksum Size
md5:43cb9a6c7f95e2b4fedb936c5a54f577 658.6 kB
md5:f07c809495b325b42de2002c7eb9b2fb 3.4 MB
md5:39b17872b17bb2b7e7585f96887dc470 2.1 GB
md5:0ca1ffc001232527e454862c89f747f9 3.1 GB
md5:b0925175c4584f5b43e454c4a1298d81 2.1 GB
md5:fde6afa0852e64d71bea606cd79968ed 7.4 GB
md5:0fd6d73b2b89eea5c27490fe06763352 921.8 MB
md5:d6b158d50df4614426b3c7359d98f4a9 4.0 GB
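
A downloaded file can be checked against the MD5 values listed above with Python's standard `hashlib` module. The sketch below is a minimal example; the function names are hypothetical, and since the listing does not pair checksums with file names, the matching checksum has to be chosen by the reader (for example by file size).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:43cb9a...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before extraction.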

Additional details

Funding

European Commission
FELICE - FlExible assembLy manufacturIng with human-robot Collaboration and digital twin modEls 101017151
European Commission
SOPRANO - Socially-Acceptable and Trustworthy Human-Robot Teaming for Agile Industries 101120990

Software

Programming language
Python, C++
