UpStory: the Uppsala Storytelling dataset
Description
The UpStory dataset is an anonymized child-child interaction dataset with an experimental manipulation of the level of rapport. It contains data from pairs of classmates (ages 8-10) playing a storytelling game in a naturalistic setting; pairs were selected either to promote close, friendly interactions (high-rapport condition) or to promote distant interactions between acquaintances (low-rapport condition). Due to the experimental design, most children participated in two pairs: one high-rapport and one low-rapport.
A copy of this text is included in the ZIP file.
Dataset Contents
The dataset contains data for 35 pairs. Each pair is given an ID starting with P (high-rapport condition) or N (low-rapport condition), followed by the academic year (2 or 3) and 2 additional digits. E.g.: N251 is a low-rapport pair from year 2; P318 is a high-rapport pair from year 3. Similarly, each child is given a 2-digit ID. E.g.: child 17 participated in pairs P245 and N255.
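As a minimal sketch (Python, standard library only), the ID scheme above can be decoded with a regular expression. The pattern and the helper names (PAIR_ID_RE, PairId, parse_pair_id) are illustrative assumptions inferred from the description, not part of the dataset's own tooling.

```python
import re
from typing import NamedTuple

# Pattern inferred from the description: "P" (high rapport) or "N" (low rapport),
# one digit for the academic year (2 or 3), then two additional digits.
PAIR_ID_RE = re.compile(r"^(?P<cond>[PN])(?P<year>[23])\d{2}$")

CONDITIONS = {"P": "high_rapport", "N": "low_rapport"}


class PairId(NamedTuple):
    condition: str  # "high_rapport" or "low_rapport"
    year: int       # academic year, 2 or 3


def parse_pair_id(pair_id: str) -> PairId:
    """Split a pair ID such as "N251" into its condition and academic year."""
    match = PAIR_ID_RE.match(pair_id)
    if match is None:
        raise ValueError(f"not a valid pair ID: {pair_id!r}")
    return PairId(CONDITIONS[match["cond"]], int(match["year"]))


print(parse_pair_id("N251"))  # PairId(condition='low_rapport', year=2)
print(parse_pair_id("P318"))  # PairId(condition='high_rapport', year=3)
```

The condition strings are chosen to match the values of the condition column in pair-info.csv.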
Each pair played between 1 and 5 rounds of the game. Each round is provided as an individual sample, with its own associated time series as CSV files. In total, 106 rounds are provided.
Pair Information
The top-level CSV file pair-info.csv offers pair-level information, including the following items:
- pair_id: the pair ID, as described above.
- condition: the experimental condition this pair belonged to (low_rapport or high_rapport).
- distance: the distance between the two participants in their year's friendship network (integer in range 2 <= n <= 56 for Year 2 pairs, and 2 <= n <= 20 for Year 3 pairs).
- year: academic year the children belonged to (2 or 3).
- rounds: number of game rounds the pair played (1 <= n <= 5).
- child_1: first child in the pair (lower ID; 2 digits).
- child_2: second child in the pair (higher ID; 2 digits).
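The following sketch loads pair-info.csv and checks each row against the ranges listed above, then builds an index from each child to the pairs they took part in. It assumes the header uses exactly the column names given in the list; load_pair_info, check_pair, and pairs_by_child are hypothetical helper names.

```python
import csv
from collections import defaultdict

INT_COLUMNS = ("distance", "year", "rounds")


def load_pair_info(path):
    """Read pair-info.csv into a list of dicts, converting numeric columns.

    Child IDs are kept as strings so that 2-digit IDs keep any leading zero.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for key in INT_COLUMNS:
            row[key] = int(row[key])
    return rows


def check_pair(row):
    """Assert the value ranges documented for one row of pair-info.csv."""
    assert row["condition"] in ("low_rapport", "high_rapport")
    assert row["year"] in (2, 3)
    max_distance = 56 if row["year"] == 2 else 20
    assert 2 <= row["distance"] <= max_distance
    assert 1 <= row["rounds"] <= 5
    # The pair ID encodes the condition (P/N) and the year (second character).
    expected_prefix = "P" if row["condition"] == "high_rapport" else "N"
    assert row["pair_id"][0] == expected_prefix
    assert row["pair_id"][1] == str(row["year"])
    # Both IDs are 2-digit strings, so string order matches numeric order.
    assert row["child_1"] < row["child_2"]


def pairs_by_child(rows):
    """Map each child ID to the sorted list of pair IDs they took part in."""
    index = defaultdict(list)
    for row in rows:
        index[row["child_1"]].append(row["pair_id"])
        index[row["child_2"]].append(row["pair_id"])
    return {child: sorted(pairs) for child, pairs in index.items()}
```

Per the counts stated above, the full file should give 35 rows whose rounds column sums to 106, and pairs_by_child(rows)["17"] should contain P245 and N255.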
Sources
The dataset contains time-series data extracted from two different video sources, each viewing the play area from one side: left-camera and right-camera. Each video source has its own top-level folder containing the data extracted from that source.
In each source folder, you will find CSV files named <source>-<pair_id>-round-<round_number>-<face|pose>.csv (e.g., left-camera-N249-round-1-face.csv). There is a separate file for each round of the game; each pair typically played ~3 rounds (min: 1, max: 5). As the names suggest, face files contain information related to head pose and facial expression, while pose files contain information related to full-body pose.
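A small sketch for building and parsing these file names. It assumes each source folder is named after its source (left-camera, right-camera) and sits directly under the dataset root; round_file, parse_round_file, and rounds_for_pair are hypothetical helpers.

```python
import re
from pathlib import Path

SOURCES = ("left-camera", "right-camera")
KINDS = ("face", "pose")

# Mirrors <source>-<pair_id>-round-<round_number>-<face|pose>.csv
FILE_RE = re.compile(
    r"^(?P<source>left-camera|right-camera)-(?P<pair_id>[PN]\d{3})"
    r"-round-(?P<round>\d+)-(?P<kind>face|pose)\.csv$"
)


def round_file(root, source, pair_id, round_number, kind):
    """Build the path of one round's CSV (assumes a <root>/<source>/ layout)."""
    if source not in SOURCES or kind not in KINDS:
        raise ValueError(f"unknown source or kind: {source!r}, {kind!r}")
    return Path(root) / source / f"{source}-{pair_id}-round-{round_number}-{kind}.csv"


def parse_round_file(name):
    """Recover (source, pair_id, round_number, kind) from a file name or path."""
    match = FILE_RE.match(Path(name).name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match["source"], match["pair_id"], int(match["round"]), match["kind"]


def rounds_for_pair(root, source, pair_id, kind):
    """List a pair's files for one source and kind, ordered by round number."""
    pattern = f"{source}-{pair_id}-round-*-{kind}.csv"
    files = (Path(root) / source).glob(pattern)
    return sorted(files, key=lambda p: parse_round_file(p)[2])


print(parse_round_file("left-camera-N249-round-1-face.csv"))
# ('left-camera', 'N249', 1, 'face')
```

Sorting by the parsed round number, rather than by file name, keeps the order numeric regardless of how many digits the round number has.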
Face Data
Face data was extracted with OpenFace and contains most of the information produced by the tool; see the OpenFace documentation for more details. Time series are given at 25 Hz; entries are indexed by frame (0-indexed) and child_id. Included data:
- confidence and success indicators.
- Per-eye gaze 3D vectors.
- Joint gaze angle.
- Eye landmark information in 2D (frame position in pixels) and 3D (estimated distances).
- Head position and rotation information in 3D.
- Face keypoint locations in 2D (frame position in pixels) and 3D (estimated distances).
- AU presence estimates for 18 AUs (binary variables: 0 or 1).
- AU intensity estimates for 17 AUs (continuous variables: 0 to 5).
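The sketch below reads one face file and extracts the AU columns per child, converting frame indices to seconds using the 25 Hz rate. It assumes the index columns are named frame and child_id and that AU columns keep OpenFace's naming (AUxx_c for presence, AUxx_r for intensity); check the actual header before relying on it. frame_to_seconds and load_face_aus are hypothetical names.

```python
import csv
import re
from collections import defaultdict

FPS = 25  # face time series are sampled at 25 Hz

# Assumed column names, following OpenFace's convention:
# "AUxx_c" for presence (0 or 1) and "AUxx_r" for intensity (0 to 5).
AU_PRESENCE_RE = re.compile(r"^AU\d{2}_c$")
AU_INTENSITY_RE = re.compile(r"^AU\d{2}_r$")


def frame_to_seconds(frame):
    """Frames are 0-indexed, so frame 0 is at t = 0 s and frame 25 at t = 1 s."""
    return frame / FPS


def load_face_aus(path):
    """Read one face CSV into {child_id: [(time_s, presence, intensity), ...]}.

    `presence` and `intensity` map AU column names to their values.
    The "frame" and "child_id" column names are assumptions.
    """
    series = defaultdict(list)
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # OpenFace headers can carry leading spaces; normalize them.
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
        presence_cols = [c for c in reader.fieldnames if AU_PRESENCE_RE.match(c)]
        intensity_cols = [c for c in reader.fieldnames if AU_INTENSITY_RE.match(c)]
        for row in reader:
            t = frame_to_seconds(int(float(row["frame"])))
            presence = {c: int(float(row[c])) for c in presence_cols}
            intensity = {c: float(row[c]) for c in intensity_cols}
            series[row["child_id"].strip()].append((t, presence, intensity))
    return dict(series)
```

Selecting columns by pattern, rather than hard-coding the 18 presence and 17 intensity AU names, keeps the sketch working if the header order differs.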
Pose Data
Pose data was extracted with OpenPose. Time series are given at 25 Hz; entries are indexed by frame (0-indexed), child_id, and joint (the named body part the row refers to). Data provided per row:
- x: horizontal position in the frame, in pixels, left-to-right (float; range 0 to frame width).
- y: vertical position in the frame, in pixels, top-to-bottom (float; range 0 to frame height).
- confidence: OpenPose's reported prediction confidence (float; range 0-1).
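Because pose files store one row per joint, it is often convenient to regroup them per child and frame. This is a minimal sketch, assuming the column names listed above (frame, child_id, joint, x, y, confidence); the confidence threshold is an arbitrary illustrative value, and load_pose is a hypothetical helper.

```python
import csv
from collections import defaultdict


def load_pose(path, min_confidence=0.1):
    """Regroup long-format pose rows into {child_id: {frame: {joint: (x, y)}}}.

    Keypoints with confidence below `min_confidence` are skipped. The 0.1
    default is an arbitrary illustrative threshold, not a dataset value.
    """
    poses = defaultdict(lambda: defaultdict(dict))
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["confidence"]) < min_confidence:
                continue
            frame = int(row["frame"])
            # x grows left-to-right and y grows top-to-bottom, in pixels.
            poses[row["child_id"]][frame][row["joint"]] = (float(row["x"]), float(row["y"]))
    # Turn the outer defaultdicts into plain dicts for a predictable result.
    return {child: dict(frames) for child, frames in poses.items()}
```

Joint names are whatever strings appear in the joint column; passing min_confidence=0.0 keeps every row.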
Files
upstory.zip (1.6 GB)
md5:5400a3c0d2b5b79e64e565cfeb75ef6e
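To check that a download is complete, the archive's MD5 can be compared with the checksum listed for upstory.zip. A minimal standard-library sketch; md5_of_file and archive_is_intact are hypothetical names, and the archive is assumed to be saved locally.

```python
import hashlib

EXPECTED_MD5 = "5400a3c0d2b5b79e64e565cfeb75ef6e"  # checksum listed for upstory.zip


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks so the
    1.6 GB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path):
    """True if the downloaded archive matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, archive_is_intact("upstory.zip") should return True for an intact download in the working directory. MD5 is suitable for catching accidental corruption, not deliberate tampering.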
Additional details
Funding
- Swedish Research Council
- ELECTRA 020-03167
Software
- Repository URL: https://github.com/MarcFraile/dyadic-storytelling