Published July 3, 2024 | Version 1.0.0
Dataset Open

UpStory: the Uppsala Storytelling dataset

  • 1. Uppsala University

Description

The UpStory dataset is an anonymized child-child interaction dataset, with an experimental manipulation for the level of rapport. It contains data pertaining to pairs of classmates (ages 8-10) playing a storytelling game in a naturalistic setting; pairs are selected to either promote close and friendly interactions (high-rapport condition), or promote distant interactions between acquaintances (low-rapport condition). Due to the experimental design, most children participated in two pairs: one high-rapport and one low-rapport.

A copy of this text is included in the ZIP file.

Dataset Contents

The dataset contains data for 35 pairs. Each pair is given an ID starting with P (high-rapport condition) or N (low-rapport condition), followed by the academic year (2 or 3), and 2 additional digits. E.g.: N251 is a low-rapport pair from year 2; P318 is a high-rapport pair from year 3. Similarly, each child is given a 2-digit ID. E.g.: child 17 participated in pairs P245 and N255.
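
The pair ID scheme above can be decoded mechanically. The following is a minimal sketch, not part of the dataset: the names PairId and parse_pair_id are hypothetical, and the condition labels are borrowed from the condition column of pair-info.csv.

```python
import re
from typing import NamedTuple


class PairId(NamedTuple):
    condition: str  # "high_rapport" (P) or "low_rapport" (N)
    year: int       # academic year: 2 or 3
    suffix: str     # the 2 additional digits, kept as text


# One letter (P or N), one year digit (2 or 3), then exactly two digits.
_PAIR_ID = re.compile(r"([PN])([23])(\d{2})")


def parse_pair_id(pair_id: str) -> PairId:
    """Split a pair ID such as 'N251' into condition, year and suffix."""
    match = _PAIR_ID.fullmatch(pair_id)
    if match is None:
        raise ValueError(f"not a valid pair ID: {pair_id!r}")
    letter, year, suffix = match.groups()
    condition = "high_rapport" if letter == "P" else "low_rapport"
    return PairId(condition, int(year), suffix)
```

For example, parse_pair_id("N251") yields the low-rapport condition and year 2, matching the description above.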

Each pair played between 1 and 5 rounds of the game. Each round is provided as an individual sample, with its own associated time series as CSV files. In total, 106 rounds are provided.

Pair Information

The top-level CSV file pair-info.csv offers pair-level information, including the following items:

  • pair_id: the pair ID, as described above.
  • condition: the experimental condition this pair belonged to (low_rapport or high_rapport).
  • distance: the distance between the two participants in their year's friendship network (integer in range 2 <= n <= 56 for Year 2 pairs, and 2 <= n <= 20 for Year 3 pairs).
  • year: academic year the children belonged to (2 or 3).
  • rounds: number of game rounds the pair played (1 <= n <= 5).
  • child_1: first child in the pair (lower ID; 2 digits).
  • child_2: second child in the pair (higher ID; 2 digits).
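
As an illustration of how these columns might be consumed, here is a minimal sketch using only the Python standard library. It assumes that pair-info.csv has a header row using exactly the column names listed above; the function names read_pair_info and pairs_by_child are hypothetical.

```python
import csv


def read_pair_info(handle):
    """Read pair-level rows from an open CSV file, converting numeric columns to int."""
    pairs = []
    for row in csv.DictReader(handle):
        pairs.append({
            "pair_id": row["pair_id"],
            "condition": row["condition"],
            "distance": int(row["distance"]),
            "year": int(row["year"]),
            "rounds": int(row["rounds"]),
            # Child IDs stay as text so that 2-digit IDs are preserved as written.
            "child_1": row["child_1"],
            "child_2": row["child_2"],
        })
    return pairs


def pairs_by_child(pairs):
    """Map each child ID to the IDs of the pairs that child took part in."""
    index = {}
    for pair in pairs:
        for child in (pair["child_1"], pair["child_2"]):
            index.setdefault(child, []).append(pair["pair_id"])
    return index
```

Passing the opened pair-info.csv (opened with newline="") to read_pair_info and then calling pairs_by_child gives, for each child, the high-rapport and low-rapport pairs they participated in.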

Sources

The dataset contains time-series data extracted from two different video sources, each one overviewing the play area from one side: the left-camera and right-camera. Each video source has its own top-level folder, with data extracted from that source inside it.

In each source folder, you will find CSV files named <source>-<pair_id>-round-<round_number>-<face|pose>.csv (e.g., left-camera-N249-round-1-face.csv). There is a separate file for each round of the game; each pair typically played ~3 rounds (min: 1, max: 5). As the names suggest, face files contain information related to head pose and facial expression, while pose files contain information related to full body pose.
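
The naming pattern above can be built and parsed programmatically. The sketch below is an illustration only: the helper names are hypothetical, and it assumes that each top-level folder is named exactly after its source (left-camera or right-camera).

```python
import re
from pathlib import PurePosixPath

SOURCES = ("left-camera", "right-camera")
KINDS = ("face", "pose")

# <source>-<pair_id>-round-<round_number>-<face|pose>.csv
_NAME = re.compile(
    r"(left-camera|right-camera)-([PN][23]\d{2})-round-(\d+)-(face|pose)\.csv"
)


def round_csv_path(source, pair_id, round_number, kind):
    """Build the relative path of one round's CSV file inside the archive."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    name = f"{source}-{pair_id}-round-{round_number}-{kind}.csv"
    # Assumption: the folder name is the same as the source name.
    return PurePosixPath(source) / name


def parse_round_csv_name(name):
    """Split a file name into (source, pair_id, round_number, kind)."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    source, pair_id, round_number, kind = match.groups()
    return source, pair_id, int(round_number), kind
```

Parsing left-camera-N249-round-1-face.csv with parse_round_csv_name returns the source, pair ID, round number and data kind as separate values.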

Face Data

Face data was extracted with OpenFace, and contains most of the information the tool produces. See the OpenFace documentation for more details. Time series are given at 25 Hz; entries are indexed by frame (0-indexed) and child_id. Included data:

  • confidence and success indicators.
  • Per-eye gaze 3D vectors.
  • Joint gaze angle.
  • Eye landmark information in 2D (frame position in pixels) and 3D (estimated distances).
  • Head position and rotation information in 3D.
  • Face keypoint locations in 2D (frame position in pixels) and 3D (estimated distances).
  • Action Unit (AU) presence estimates for 18 AUs (binary variables: 0 or 1).
  • AU intensity estimates for 17 AUs (continuous variables: 0 to 5).
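
Because the time series are indexed by frame at a fixed rate of 25 Hz, frame indices can be converted to elapsed time with simple arithmetic. The sketch below assumes that frame 0 marks the start of the round's time series and that no frames are skipped; the function names are hypothetical.

```python
FRAME_RATE_HZ = 25  # sampling rate stated in the dataset description


def frame_to_seconds(frame: int) -> float:
    """Elapsed time in seconds at the given 0-indexed frame."""
    return frame / FRAME_RATE_HZ


def seconds_to_frame(seconds: float) -> int:
    """Index of the frame nearest to the given elapsed time."""
    return round(seconds * FRAME_RATE_HZ)
```

Under these assumptions, consecutive frames are 0.04 seconds apart, and the same conversion applies to any time series sampled at this rate.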

Pose Data

Pose data was extracted with OpenPose. Time series are given at 25 Hz; entries are indexed by frame (0-indexed), child_id, and joint (named body part that the row refers to). Data provided per row:

  • x: horizontal position in the frame, in pixels, left-to-right (float; range 0-width).
  • y: vertical position in the frame, in pixels, top-to-bottom (float; range 0-height).
  • confidence: OpenPose's reported prediction confidence (float; range 0-1).
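
Since each pose row describes a single joint, it is often convenient to regroup rows by frame and child. The following is a minimal sketch under stated assumptions: the CSV header uses exactly the names frame, child_id, joint, x, y and confidence, and every cell holds a value. The function name load_pose and the min_confidence parameter are hypothetical, and any threshold chosen is a modelling decision rather than a value prescribed by the dataset.

```python
import csv
from collections import defaultdict


def load_pose(handle, min_confidence=0.0):
    """Group pose rows as result[frame][child_id][joint] = (x, y).

    Rows whose confidence is below min_confidence are skipped.
    """
    frames = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(handle):
        if float(row["confidence"]) < min_confidence:
            continue
        frame = int(row["frame"])
        position = (float(row["x"]), float(row["y"]))
        frames[frame][row["child_id"]][row["joint"]] = position
    # Convert to plain dicts so that looking up a missing key raises KeyError.
    return {
        frame: {child: dict(joints) for child, joints in children.items()}
        for frame, children in frames.items()
    }
```

Applied to an opened pose file, this returns a nested mapping from which the joints of one child in one frame can be read directly.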

Files

upstory.zip

Files (1.6 GB)

md5:5400a3c0d2b5b79e64e565cfeb75ef6e

Additional details

Funding

Swedish Research Council
ELECTRA 020-03167