# EGO-MAGIC / PTG-MAGIC DATASET OVERVIEW

Current Release as of April 7th, 2026.


## OVERVIEW

We introduce a new computer vision data set produced in conjunction with the DARPA Perceptually-enabled Task Guidance (PTG) program. 
It comprises over 3000 labeled egocentric videos for activity recognition in point-of-injury trauma care, specifically for combat medicine applications.

The vast majority of the videos were recorded by trained combat medicine instructors who were performing interventions on medical simulators in the same fashion as students who are being trained in these combat medicine skills.

This dataset is being presented in a similar fashion to other challenge datasets, such as EPIC Kitchens and the Trauma THOMPSON challenge.

What distinguishes this dataset from others is the speed at which the steps in the tasks are performed, and the potential for overlap between steps.  

These two confounding properties mean that there can be very few distinctive frames with which to make a decision about the present activity.
Additionally, the rapid motion of the camera and the realistic, cluttered scene often present additional computer vision challenges.

Our dataset has been utilized in the DARPA PTG program to create real-time augmented reality assistants [ref] that help novice users perform these medical skills.
To introduce this EGO-MAGIC dataset to the public domain for future research, we have extracted the activity detection portion from PTG into a stand-alone challenge, which is the focus of this paper.
While results are presented in terms of activity detection (determining start and stop times for a specific activity), this dataset is also amenable to other challenges like activity recognition (given a clip, what activity is being performed), and activity anticipation.

Overall, this dataset presents a number of unique qualities compared to similar egocentric datasets that make this dataset especially challenging: 
1) the skill steps in many skills are very short, often only a second or so (\avgSkillStepDur seconds per step on average)
2) the skills themselves are often quite short, often only a few tens of seconds (\avgSkillDur seconds on average),
3) frequently skill steps can be performed concurrently,
4) many skills have steps that are considered optional, so certain skill steps can be skipped.

Currently, the EGO-MAGIC dataset consists of 3355 videos representing 50 skills.  Across the 50 skills we have labeled over 1.95 million objects for 124 object classes, over 17,000 skill step delineations, and over 39,000 hand-object interactions. We offer 40 pre-trained YOLOv8 models to help developers get started.


## **DATASET STRUCTURE AND ANNOTATIONS**
 
This section is meant to explain the file structure and purpose of the components of the dataset. There are 50 skills grouped into two sections: **Tier1** and **Tier2**.
The 8 Tier1 skills are more complete and have videos recorded and labeled in-lab as well as the professional videos that exist for all skills. The remaining 42 (Tier2) have varying degrees of materials included in this release; however, they were not used by the PTG community and received less attention.
 
Below is the abstraction of the file system for one skill, M2.
 
```
M2_Tourniquet/
    M2_skill_steps.txt
    raw_videos/
        in-lab/
            README
            20230417/
                20230403_094350_HoloLens.mp4
                20230403_094350_HoloLens.skill_labels_by_frame.txt
                20230403_094735_HoloLens.mp4
                20230403_094735_HoloLens.skill_labels_by_frame.txt
                ...
            ...
        professional/
            README
            M2-1/
                M2-1.action_labels_by_frame.txt
                M2-1.action_labels_by_second.txt
                M2-1.skill_labels_by_frame.txt
                M2-1.skill_labels_by_second.txt
                M2-1.mp4
                M2-1.svo
                <other video info>
            ...
    yolo_model/
        object_names.txt
        LabeledData/
            train/
                train_image_1.jpg
                train_image_1.txt
                train_image_2.jpg
                train_image_2.txt
                ...
            validation/
                validation_image_1.jpg
                validation_image_1.txt
                ...
            test/
                test_image_1.jpg
                test_image_1.txt
                ...
        output/
            <various charts and images of model results>
            weights/
                best.pt
```
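
As a minimal sketch of how the layout above might be traversed, the following Python pairs each `.mp4` under `raw_videos/` with the `skill_labels_by_frame.txt` file that shares its stem. The function name `find_labeled_videos` is hypothetical, and the sketch assumes the file naming shown in the example tree; videos without a matching label file are skipped.

```python
from pathlib import Path


def find_labeled_videos(skill_dir):
    """Return sorted (video_path, label_path) pairs found under raw_videos/.

    Assumes each video <stem>.mp4 sits next to <stem>.skill_labels_by_frame.txt,
    as in the example tree. Videos with no such label file are skipped.
    """
    pairs = []
    for video in sorted(Path(skill_dir, "raw_videos").rglob("*.mp4")):
        labels = video.with_name(video.stem + ".skill_labels_by_frame.txt")
        if labels.is_file():
            pairs.append((video, labels))
    return pairs
```

For example, `find_labeled_videos("M2_Tourniquet")` would return pairs from both the `in-lab/` and `professional/` subtrees, assuming the archive has been extracted into the current directory.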

### FILE STRUCTURE
We now explain the structure of the data above.
 
`M2_Tourniquet` is the name of the skill.
 
`raw_videos/` is the directory in which we store our data.
 
`professional/` contains 50+ numbered data sets structured as X_N, where X is the skill ID and N is the recording session number. **Note that not all recordings were suitable for inclusion in the dataset, so you may encounter session numbers that do not exist in the release.** Items within each session folder include:
 
* The svo file, which is a stereo recording from the ZED camera worn on a helmet during data collection. The svo file can be read with the ZED SDK (version 3.8), available at `https://www.stereolabs.com/developers/release/`.
* The mp4 file, which is a movie with video based on the left lens of the svo file, combined with a separate audio recording. The frame rate for the mp4 movies is 30 fps.
* Other video and audio formats, including .avi, .mp3, and .wav.

We then label the video in two ways, by action and by skill step, and we provide each labeling by frame number and by timestamp. Video activities follow a nominal "recipe" of steps for how the skill is performed, which can be found in the X\_skill_steps.txt file. In the skill-step version, we label the start and end time or frame number of each completed step in the recipe. Some steps may be skipped or repeated multiple times. In the action labeling, we label what the user is doing with their hands (or limbs) during the skill. With this background, the remaining four files in the directory for session N are (assuming skill M2):

* M2_N.skill_labels_by_frame
* M2_N.skill_labels_by_second
* M2_N.action_labels_by_frame
* M2_N.action_labels_by_second

An example of the file format for a “skill\_labels\_by\_frame” appears below:
  
```
278     519     Place tourniquet over affected extremity 2-3 inches above wound site.
555     664     Pull tourniquet tight.
675     774     Apply strap to strap body.
808     1002    Turn windless clock wise or counter clockwise until hemorrhage is controlled.
982     1001    Lock windless into the windless keeper.
1042    1151    Secure strap and windless keeper with keeper securing device.
1226    1293    Mark time on securing device strap with permanent marker.
```

The columns are `start_frame end_frame label` in this data (and would be `start_time end_time label` for the "seconds" version). Note that consecutive labels may have overlapping frame/time windows if multiple actions are occurring simultaneously.
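
A label file in this format can be read with a few lines of Python. The sketch below is illustrative only: the names `parse_labels`, `frame_to_seconds`, and `overlapping_pairs` are hypothetical, the columns are assumed to be whitespace-separated, and the frame-to-time conversion assumes the 30 fps rate stated for the mp4 files. The provided "by_second" files remain the authoritative timestamps.

```python
FPS = 30  # frame rate stated for the mp4 files


def parse_labels(text):
    """Parse 'start_frame end_frame label' lines into (start, end, label) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # split on whitespace, keep the label intact
        if len(parts) == 3:
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def frame_to_seconds(frame, fps=FPS):
    """Convert a frame number to seconds, assuming a constant frame rate."""
    return frame / fps


def overlapping_pairs(rows):
    """Return index pairs (i, j), i < j, whose frame windows intersect."""
    return [
        (i, j)
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
        if rows[i][0] <= rows[j][1] and rows[j][0] <= rows[i][1]
    ]


# Three lines copied from the example file above.
sample = """\
808     1002    Turn windless clock wise or counter clockwise until hemorrhage is controlled.
982     1001    Lock windless into the windless keeper.
1042    1151    Secure strap and windless keeper with keeper securing device.
"""
rows = parse_labels(sample)
print(overlapping_pairs(rows))       # [(0, 1)]
print(frame_to_seconds(rows[0][0]))  # roughly 26.93 seconds
```

Here the first two steps are reported as overlapping because the window 982-1001 lies inside 808-1002.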

The corresponding "action_labels_by_frame" file appears below:

```
278     318     hands open_up tourniquet
367     519     hands put_tourniquet_around casualty_leg
555     664     left_hand pulls_tight tourniquet
675     774     left_hand secures velcro_strap
808     1002    hands twist windlass
982     1001    both_hands lock_into_windlass_keeper windlass
1042    1151    left_hand secures windlass
1226    1293    left_hand writes_on tourniquet_label
```

Note that the first two action labels (frames 278-318 and 367-519) together correspond to a single skill step, the first step in the skill-step file above (frames 278-519).
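
One way to relate the two label files is to assign each action to every skill step whose frame window contains it. The following Python sketch does this for part of the example above. The function name `actions_per_step` and the containment rule are assumptions made for illustration, not the procedure used to produce the labels.

```python
def actions_per_step(steps, actions):
    """Map each step index to the actions whose window lies inside the step window.

    Both arguments are lists of (start_frame, end_frame, label) tuples.
    """
    grouped = {i: [] for i in range(len(steps))}
    for a_start, a_end, a_label in actions:
        for i, (s_start, s_end, _) in enumerate(steps):
            if s_start <= a_start and a_end <= s_end:
                grouped[i].append(a_label)
    return grouped


# Rows copied from the example skill-step and action files above.
steps = [
    (278, 519, "Place tourniquet over affected extremity 2-3 inches above wound site."),
    (555, 664, "Pull tourniquet tight."),
]
actions = [
    (278, 318, "hands open_up tourniquet"),
    (367, 519, "hands put_tourniquet_around casualty_leg"),
    (555, 664, "left_hand pulls_tight tourniquet"),
]
for index, labels in actions_per_step(steps, actions).items():
    print(steps[index][2], "<-", labels)
```

Because steps can overlap, an action may be assigned to more than one step under this rule.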


The `in-lab/` folder contains videos and labels recorded and annotated within our lab. The labels are all skill-level labels in terms of frame number. There is a third column between the end frame and the skill step text indicating the skill ID, and some of the labels within `*_negative` directories indicate the error type and which step should have happened instead. The error descriptions are not always present, but every negative training video contains either a skipped step, an added step, or an incorrect step. The user may or may not correct the error during the video.
 
 
`yolo_model/` is where we store labeled data and the resulting object detection models for the key objects necessary to perform this skill. This folder contains two subdirectories and one file:

* `output/` contains various output images and files that are produced automatically after the YOLO model finishes training to assess performance. Importantly, it also has a directory called weights/ where we have put the best weights found by the training (the file best.pt).
 
* `LabeledData/` contains the labeled training data for the YOLO model, split into the `train/` (80%), `validation/` (10%), and `test/` (10%) subsets we used. Each subset contains its images as well as corresponding .txt files in YOLO format. Each line of a text file has the format `<object_id> <x> <y> <width> <height>`. The `object_id` corresponds to an index in the file object_names.txt, which is an ordered list of the objects labeled in the model. The remaining four columns give the X and Y center coordinates, width, and height of the bounding box. The coordinates treat the upper-left corner as (0,0) and all numbers are normalized to be between 0 and 1.
 
* `object_names.txt` is the aforementioned ordered list of the objects labeled in the model.
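
To make the annotation format concrete, this Python sketch converts one label line from normalized center coordinates to pixel corner coordinates. The function name `yolo_to_pixel_box`, the sample line, and the 1280x720 image size are hypothetical; the sketch assumes whitespace-separated fields, as in the standard YOLO text format.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert 'object_id x_center y_center width height' (normalized 0-1)
    into (object_id, left, top, right, bottom) in pixels, origin at upper-left."""
    fields = line.split()
    object_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:5])
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return object_id, left, top, right, bottom


# Hypothetical label line and image size, for illustration only.
print(yolo_to_pixel_box("3 0.5 0.5 0.25 0.5", 1280, 720))
# (3, 480.0, 180.0, 800.0, 540.0)
```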

*NOTE: The YOLO model provided for M1_Trauma_Assessment is trained to recognize body parts instead of skill steps, as the skill is performed by running hands over the body to search for injuries.*

 
### YOLO MODEL INFO
For more information about the YOLO training process, please refer to the following links:
 
* Documentation: [https://docs.ultralytics.com/models/yolov8/](https://docs.ultralytics.com/models/yolov8/)
* How-to article (written for v7, but the information is still relevant): "YOLOv7 Training on Custom Data" by Muhammad Rizwan Munawar (Augmented Startups, Medium), [https://medium.com/augmented-startups/yolov7-training-on-custom-data-b86d23e6623](https://medium.com/augmented-startups/yolov7-training-on-custom-data-b86d23e6623)

### DOWNLOADS

The files you will want to download (grouped by the "MARCH" mnemonic) include:

```
M1_Trauma_Assessment.tar.gz
M2_Tourniquet.tar.gz
M3_Pressure_Dressing.tar.gz
M4_Wound_Packing.tar.gz
M5_X-Stat.tar.gz
M6_Junctional_Tourniquet.tar.gz
M7_Pelvic_Binding.tar.gz
A8_NPA.tar.gz
A9_OPA.tar.gz
A10_Supraglottic_i-Gel.tar.gz
A11_Supraglottic_King.tar.gz
A12_Oral_Intubation.tar.gz
A13_Surgical_Airway.tar.gz
A14_Oral-Nasal_Suctioning.tar.gz
A15_Endotracheal_Suctioning.tar.gz
R16_Ventilate_BVM.tar.gz
R17_Oxygen_Tank.tar.gz
R18_Chest_Seal.tar.gz
R19_Needle_Chest_Decomp.tar.gz
R20_Measure_Pulse_Oxygen.tar.gz
R21_Chest_Tube.tar.gz
R22_Rescue_Breathing.tar.gz
C23_Saline_Lock.tar.gz
C24_Medicine_IV_Piggyback.tar.gz
C25_Injection.tar.gz
C26_Blood.tar.gz
C27_Infusion.tar.gz
C28_FAST_1.tar.gz
C29_EZ-IO.tar.gz
C30_Blood_Pressure.tar.gz
C31_Fluid_Warmer.tar.gz
H32_Open_Head_Injury.tar.gz
H33_Irrigate_Eyes_Syringe.tar.gz
H34_Cervical_Collar.tar.gz
H35_Spine_Board.tar.gz
H36_Eye_Shield.tar.gz
H37_Hypothermia_HPMK.tar.gz
E38_Nine_Line_Medivac.tar.gz
E39_Kendrick_Traction.tar.gz
E40_Glucose_Vitals.tar.gz
E41_Open_Abdominal_Wound.tar.gz
E42_Casualty_With_Impalement.tar.gz
E43_SAM_Ankle.tar.gz
E44_SAM_Tib-Fib.tar.gz
E45_SAM_Wrist.tar.gz
E46_SAM_Forearm.tar.gz
E47_SAM_Humerus.tar.gz
E48_Stretcher.tar.gz
E49_Foxtrot_Sled.tar.gz
E50_CPR.tar.gz
```
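
Once downloaded, each archive can be unpacked with standard tools. As a hedged sketch, the Python below extracts one skill archive with the standard-library `tarfile` module. The helper name `extract_skill` and the destination directory are hypothetical, and the archive is assumed to be a regular gzip-compressed tar file, as its extension suggests.

```python
import tarfile
from pathlib import Path


def extract_skill(archive_path, destination):
    """Extract a .tar.gz skill archive into destination and return that path."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject
            # unsafe members (absolute paths, links outside the destination).
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
    return destination
```

For example, `extract_skill("M2_Tourniquet.tar.gz", "egomagic")` would unpack the tourniquet skill into a local `egomagic/` directory.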



## **ABOUT DATA COLLECTION**

Data was collected in accordance with IRB ____.  Efforts were made to anonymize the participants, and releases were obtained from the participants acknowledging that this data would be released into the public domain with their permission.

**Professional data**
This data was collected by trained first responders, former combat medics, and those who teach combat medicine.  This portion of the data was collected with a helmet-mounted ZED 2i stereo camera from Stereolabs with a microphone to collect audio data.  Metadata exists for this data collection in terms of notes, individual (anonymized to an ID), and other attributes. Data was collected on site at Valkyries Austere Medical Solutions facilities in AL in 20xx.

**Lab data**
Lab data was collected to augment the professional dataset on a per-request basis by performers in the DARPA PTG program.  This data was collected with the Microsoft HoloLens 2 and is limited to main-camera video and audio.  This data was collected from 20xx-2025 at BBN offices in St. Louis Park, MN.

## **NOTES ABOUT DATA PREPROCESSING**

Three items of note:

1. We make no guarantees about the correctness or completeness of the data, although many quality-control measures have been taken.
Our data quality process included (a) self-reporting by the data collector that the data should be discarded, (b) manual examination of the videos (performed for most videos of most skills), and (c) examination based on unusual annotations.

2. Professional labelers labeled the actions occurring in the video.  These action labels were used to derive the steps taken in the video through a combination of automated and manual processes.  The step labels were double-checked manually, with extra attention paid to anomalous labels to make sure that they were indeed correct.

3. Labeling was outsourced, and the tools and procedures used by the company are unknown to BBN.

## **NOTES ABOUT USAGE**


**Accessing the Data**: See DOI 10.5281/zenodo.19239155

**About Data Usage**:  We are providing this data to the community without a specific defined application.  We believe it will serve many purposes in the computer vision and first responder communities.  With that said, we offer a specific benchmark application that is in the spirit of the DARPA PTG program.

**PTG MAGIC Stand-Alone Activity Detection Challenge on a subset of 8 Medical skills**

See: [https://github.com/BBN-VISUAL/egomagic_eval](https://github.com/BBN-VISUAL/egomagic_eval)



## DISCLAIMERS
Our data is provided for research use “as is” and may not be fully suitable for all purposes, especially those requiring certified accuracy or consistency.  The following is a limited but not complete list of things users should be aware of:

* Due to the nature of our data collections, they contain limited variability of environment and equipment.
* The actions performed do not necessarily reflect the ideal performance or methodology.  In fact, mistakes and variations were encouraged in some data collections.
* The participant pool was of limited diversity, and as such the data may contain biases that make it unsuitable for a specific purpose.
* Audio is included for contextual richness and has not been reviewed.


## **LICENSE INFORMATION**

**License Information**: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)

**Usage Restrictions**: This dataset is provided for research, educational, and evaluation purposes only. Users agree not to attempt to identify, contact, or otherwise re-identify any individual represented in the dataset and to comply with all applicable human subjects protection and privacy regulations. The dataset is not appropriate for clinical decision-making, patient care, operational deployment, or safety-critical applications without independent validation and appropriate regulatory approval. Redistribution of the dataset or derived data must retain this usage statement and associated citation requirements. Use of the dataset constitutes acceptance of these conditions.

## **CITATIONS AND REFERENCES**


### **How to Cite:** 
 DOI 10.5281/zenodo.19239154

## **CONTRIBUTIONS**
All contributions should be handled by emailing bbn-magic@bbn.com

**Results:** While we are not operating a challenge at this time, we would like to collect your results to share alongside others with our data set.

**Contributing data or data labeling augmentations:** We welcome additions to our data set and corrections to our data labels.

**Supporting scripts and software:** We welcome tools and scripts that work with our data.


## ACKNOWLEDGEMENTS

The BBN PTG MAGIC team comprised several key contributors, including:

* Nick Walczak
* Chris Gilleo
* Charlie Mesiner
* Lee Tarlin
* Alexander Christner
* Adam Diller
* Reese Kneeland
* David Diller
* Elias Noyes

The project was led by Brian VanVoorst and Katelyn Carino.

External to BBN our team would like to thank:

* DARPA. Our program was started by Dr. Bruce Draper and completed by Dr. Matthew Marge.  https://www.darpa.mil/research/programs/perceptually-enabled-task-guidance
* PTG Team led by Ehsan Elhamifar at Northeastern University, https://www.khoury.northeastern.edu
* PTG Team led by Claudio Silva at NYU, https://engineering.nyu.edu
* PTG Team led by Brian Clipp at Kitware, https://www.kitware.com
* PTG Team led by Jason Corso at the University of Michigan, https://robotics.umich.edu
* PTG Team led by Aaron Jaminson at Valkyries Austere Medical Solutions, https://valkyriesaustere.com
* Guidance provided by Dr. Jack Norfleet and Dr. Mathew Hackett at DEVCOM STTC


## **FAQ**


**Q**: Why are some folders numbered but empty?  
**A**: These data sets were contaminated or recorded in error.  To preserve the numbering scheme, their empty directories are maintained in the list.

## **CONTACT INFORMATION**

You can email the current maintainers at ptg-magic@rtx.com



