Published September 19, 2023 | Version v1
Dataset | Open Access

ActiveHuman Part 1

Creators

  • 1. Aristotle University of Thessaloniki (AUTh)

Contributors

Researcher:

  • 1. Aristotle University of Thessaloniki (AUTh)

Description

This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.

Dataset Description

ActiveHuman was generated using Unity's Perception package.

It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1 m-4 m) and 36 camera angles (0°-360° at 10-degree intervals).

The dataset does not include images for every combination of camera distance and angle, since for some values the camera would collide with another object or leave the confines of the environment; those combinations are therefore absent from the dataset.
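For context, a complete grid over all parameters (8 environments × 33 humans × 4 lighting conditions × 7 distances × 36 angles) would amount to 266,112 images per modality; the 175,428 RGB images actually present reflect these excluded combinations.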

Alongside each image, 2D bounding box, 3D bounding box and keypoint ground truth annotations are generated via Labelers and stored as a JSON-based dataset. Labelers are scripts responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format, as defined by the COCO keypoint annotation template offered in the Perception package.

 

Folder configuration

The dataset consists of 3 folders:

  • JSON Data: Contains all the generated JSON files.
  • RGB Images: Contains the generated RGB images.
  • Semantic Segmentation Images: Contains the generated semantic segmentation images.

 

Essential Terminology

  • Annotation: Recorded data describing a single capture.
  • Capture: One completed rendering process of a Unity sensor which stores the rendered result to data files (e.g., PNG, JPG).
  • Ego: Object or person to which a collection of sensors is attached (e.g., if a drone has a camera attached to it, the drone is the ego and the camera is the sensor).
  • Ego coordinate system: Coordinates with respect to the ego.
  • Global coordinate system: Coordinates with respect to the global origin in Unity.
  • Sensor: Device that captures the dataset (in this instance the sensor is a camera).
  • Sensor coordinate system: Coordinates with respect to the sensor.
  • Sequence: Time-ordered series of captures. This is very useful for video capture where the time-order relationship of two captures is vital.
  • UUID: Universally Unique Identifier. A unique hexadecimal identifier that can represent an individual instance of a capture, ego, sensor, annotation, labeled object or keypoint, or keypoint template.

 

Dataset Data

The dataset includes 4 types of JSON annotation files:

  • annotation_definitions.json: Contains the annotation definitions for all active Labelers of the simulation, stored in an array. Each entry is a collection of key-value pairs that describes a particular annotation type and how its data maps back to labels or objects in the scene (a loading sketch follows the Labeler-specific fields below). Each entry contains the following key-value pairs:
    • id: Integer identifier of the annotation's definition.
    • name: Annotation name (e.g., keypoints, bounding box, bounding box 3D, semantic segmentation).
    • description: Description of the annotation's specifications.
    • format: Format of the file containing the annotation specifications (e.g., json, PNG).
    • spec: Format-specific specifications for the annotation values generated by each Labeler.

 

Most Labelers generate different annotation specifications in the spec key-value pair:

  • BoundingBox2DLabeler/BoundingBox3DLabeler:
    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.
  • KeypointLabeler:
    • template_id: Keypoint template UUID.
    • template_name: Name of the keypoint template.
    • key_points: Array containing all the joints defined by the keypoint template. This array includes the key-value pairs:
      • label: Joint label.
      • index: Joint index.
      • color: RGBA values of the keypoint.
      • color_code: Hex color code of the keypoint.
    • skeleton: Array containing all the skeleton connections defined by the keypoint template. Each skeleton connection defines a connection between two different joints. This array includes the key-value pairs:
      • label1: Label of the first joint.
      • label2: Label of the second joint.
      • joint1: Index of the first joint.
      • joint2: Index of the second joint.
      • color: RGBA values of the connection.
      • color_code: Hex color code of the connection.
  • SemanticSegmentationLabeler:
    • label_name: String identifier of a label.
    • pixel_value: RGBA values of the label.
    • color_code: Hex color code of the label.
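As a minimal sketch of how these definitions might be inspected in Python (the file path and the top-level "annotation_definitions" key are assumptions about this particular export, not confirmed by the dataset description):

    import json

    # Path and top-level key are assumptions; adjust to your local copy.
    with open("JSON Data/annotation_definitions.json", encoding="utf-8") as f:
        definitions = json.load(f)["annotation_definitions"]

    for definition in definitions:
        # id, name, description, format and spec, as described above.
        print(definition["id"], definition["name"], definition["format"])
        for spec in definition.get("spec", []):
            # Labeler-specific fields, e.g. label_id/label_name for the
            # bounding box Labelers.
            print("   ", spec)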

 

  • captures_xyz.json: Each of these files contains an array of ground truth annotations generated by each active Labeler for each capture, along with metadata describing the state of each active sensor in the scene (a traversal sketch follows the per-Labeler field listings below). Each array entry contains the following key-value pairs:
    • id: UUID of the capture.
    • sequence_id: UUID of the sequence.
    • step: Index of the capture within a sequence.
    • timestamp: Timestamp (in ms) since the beginning of a sequence.
    • sensor: Properties of the sensor. This entry contains a collection with the following key-value pairs:
      • sensor_id: Sensor UUID.
      • ego_id: Ego UUID.
      • modality: Modality of the sensor (e.g., camera, radar).
      • translation: 3D vector that describes the sensor's position (in meters) with respect to the global coordinate system.
      • rotation: Quaternion variable that describes the sensor's orientation with respect to the ego coordinate system.
      • camera_intrinsic: 3×3 matrix containing the camera's intrinsic calibration, if it exists.
      • projection: Projection type used by the camera (e.g., orthographic, perspective).
    • ego: Attributes of the ego. This entry contains a collection with the following key-value pairs:
      • ego_id: Ego UUID.
      • translation: 3D vector that describes the ego's position (in meters) with respect to the global coordinate system.
      • rotation: Quaternion variable containing the ego's orientation.
      • velocity: 3D vector containing the ego's velocity (in meters per second).
      • acceleration: 3D vector containing the ego's acceleration (in meters per second squared).
    • format: Format of the file captured by the sensor (e.g., PNG, JPG).
    • annotations: Key-value pair collections, one for each active Labeler. These key-value pairs are as follows:
      • id: Annotation UUID.
      • annotation_definition: Integer identifier of the annotation's definition.
      • filename: Name of the file generated by the Labeler. This entry is only present for Labelers that generate an image.
      • values: List of key-value pairs containing annotation data for the current Labeler.

 

Each Labeler generates different annotation specifications in the values key-value pair:

  • BoundingBox2DLabeler:
    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.
    • instance_id: UUID of one instance of an object. Objects with the same label that are visible in the same capture are distinguished by different instance_id values.
    • x: Position of the 2D bounding box on the X axis.
    • y: Position of the 2D bounding box on the Y axis.
    • width: Width of the 2D bounding box.
    • height: Height of the 2D bounding box.
  • BoundingBox3DLabeler:
    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.
    • instance_id: UUID of one instance of an object. Objects with the same label that are visible in the same capture are distinguished by different instance_id values.
    • translation: 3D vector containing the location of the center of the 3D bounding box with respect to the sensor coordinate system (in meters).
    • size: 3D vector containing the size of the 3D bounding box (in meters).
    • rotation: Quaternion variable containing the orientation of the 3D bounding box.
    • velocity: 3D vector containing the velocity of the 3D bounding box (in meters per second).
    • acceleration: 3D vector containing the acceleration of the 3D bounding box (in meters per second squared).
  • KeypointLabeler:
    • label_id: Integer identifier of a label.
    • instance_id: UUID of one instance of a joint. Keypoints with the same joint label that are visible in the same capture are distinguished by different instance_id values.
    • template_id: UUID of the keypoint template.
    • pose: Pose label for that particular capture.
    • keypoints: Array containing the properties of each keypoint. Each keypoint that exists in the keypoint template file is one element of the array. Each entry contains the following key-value pairs:
      • index: Index of the keypoint in the keypoint template file.
      • x: Pixel coordinates of the keypoint on the X axis.
      • y: Pixel coordinates of the keypoint on the Y axis.
      • state: State of the keypoint.

 

The SemanticSegmentationLabeler does not contain a values list.
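A minimal traversal sketch in Python, assuming a captures_*.json glob pattern and a top-level "captures" key (neither is confirmed by the description above), and recognising 2D bounding boxes by their x/y/width/height fields:

    import glob
    import json

    # Pattern and top-level key are assumptions; adjust to your local copy.
    for path in sorted(glob.glob("JSON Data/captures_*.json")):
        with open(path, encoding="utf-8") as f:
            captures = json.load(f)["captures"]

        for capture in captures:
            # Sensor position in meters, global coordinate system.
            sensor_position = capture["sensor"]["translation"]

            for annotation in capture["annotations"]:
                # SemanticSegmentationLabeler entries have no values list,
                # hence the .get() fallback below.
                for value in annotation.get("values", []):
                    if {"x", "y", "width", "height"} <= value.keys():
                        print(capture["id"], value["label_name"], value["x"],
                              value["y"], value["width"], value["height"])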

  • egos.json: Contains collections of key-value pairs for each ego. These include:
    • id: UUID of the ego.
    • description: Description of the ego.
  • sensors.json: Contains collections of key-value pairs for all sensors of the simulation. These include:
    • id: UUID of the sensor.
    • ego_id: UUID of the ego on which the sensor is attached.
    • modality: Modality of the sensor (e.g., camera, radar, sonar).
    • description: Description of the sensor (e.g., camera, radar).

 

Image names

The RGB and semantic segmentation images share the same naming convention, except that the filenames of the semantic segmentation images are prefixed with the string Semantic_.

Each RGB image is named "e_h_l_d_r.jpg", where:

  • e denotes the id of the environment.
  • h denotes the id of the person.
  • l denotes the id of the lighting condition.
  • d denotes the camera distance at which the image was captured.
  • r denotes the camera angle at which the image was captured.
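For illustration, the naming convention could be parsed as follows; the sample filename is hypothetical, and treating d as a float and the remaining fields as integers is an assumption:

    from pathlib import Path

    def parse_image_name(path):
        """Split an ActiveHuman image name of the form e_h_l_d_r.jpg."""
        stem = Path(path).stem
        # Semantic segmentation images carry a leading "Semantic_" prefix.
        if stem.startswith("Semantic_"):
            stem = stem[len("Semantic_"):]
        e, h, l, d, r = stem.split("_")
        return {
            "environment": int(e),
            "human": int(h),
            "lighting": int(l),
            "distance": float(d),  # camera distance (1 m to 4 m)
            "angle": int(r),       # camera angle (0-360, 10-degree steps)
        }

    # Hypothetical example filename:
    print(parse_image_name("2_11_3_2_140.jpg"))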

Notes

This is Part 1/2 of the ActiveHuman dataset.

Files (23.8 GB)

  • JSON Data.zip: 341.4 MB (md5:4e1fd5ddca1b3426e20a41b5eef0a789)
  • 19.8 GB (md5:d0751e2b738dcd7cc667c2291b3d0176)
  • 3.6 GB (md5:09a038df1eb36a96ae204c2452450867)

Additional details

Related works

Is described by
Thesis: 10.13140/RG.2.2.21002.34248 (DOI)