Published May 8, 2026 | Version v2
Dataset Open

Ego-MAGIC

Description

We introduce a new computer vision dataset produced in conjunction with the DARPA Perceptually-enabled Task Guidance (PTG) program. It comprises over 3,000 egocentric labeled videos for activity recognition in point-of-injury trauma care, specifically for combat medicine applications.

The vast majority of the videos were recorded by trained combat medicine instructors performing interventions on medical simulators in the same manner as students being trained in these combat medicine skills.

This dataset is presented in a similar fashion to other challenge datasets, such as EPIC-Kitchens and the Trauma THOMPSON challenge.

What distinguishes this dataset from others is the speed at which the steps in the tasks are performed, and the potential for overlap between steps.

These two confounding properties mean that there can be very few distinctive frames with which to make a decision about the current activity. Additionally, the rapid motion of the camera and the realistic, cluttered scenes often present further computer vision challenges.

Our dataset has been utilized in the DARPA PTG program to create real-time augmented reality assistants [ref] that help novice users perform these medical skills. To introduce this PTG-MAGIC dataset to the public domain for future research, we have extracted the activity detection portion from PTG into a stand-alone challenge, which is the focus of this paper. While results are presented in terms of activity detection (determining start and stop times for a specific activity), the dataset is also amenable to other challenges such as activity recognition (given a clip, identify the activity being performed) and activity anticipation.
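Activity detection predictions of this kind are typically scored by temporal overlap between predicted and ground-truth intervals. A minimal sketch of temporal IoU, a metric commonly used for this task (the function name and interval format are illustrative, not part of any released evaluation code):

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) intervals, in seconds.

    pred, gt: (start, end) tuples with start <= end.
    Returns a float in [0, 1]; 0.0 for disjoint intervals.
    """
    # Length of the overlapping region (clamped at zero when disjoint).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Example: a 2 s prediction overlapping a 2 s ground-truth interval by 1 s.
print(temporal_iou((0.0, 2.0), (1.0, 3.0)))  # 1 / 3
```

A prediction is then usually counted as correct when its IoU with a ground-truth segment of the same class exceeds a chosen threshold (e.g. 0.5).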

Overall, this dataset has a number of unique qualities compared to similar egocentric datasets that make it especially challenging:

1. The skill steps in many skills are very short, often only a second or so (\avgSkillStepDur seconds per step on average).
2. The skills themselves are often quite short, often only a few tens of seconds (\avgSkillDur seconds on average).
3. Skill steps can frequently be performed concurrently.
4. Many skills have steps that are considered optional, so certain skill steps can be skipped.
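The concurrency property means that step intervals cannot be assumed to be disjoint. A small sketch that flags overlapping step annotations; the `(label, start, end)` tuple format and the step names are hypothetical, not the dataset's actual annotation schema:

```python
def concurrent_steps(steps):
    """Return pairs of step labels whose time intervals overlap.

    steps: list of (label, start, end) tuples, times in seconds.
    """
    pairs = []
    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            a, b = steps[i], steps[j]
            # Two intervals overlap iff each starts before the other ends.
            if a[1] < b[2] and b[1] < a[2]:
                pairs.append((a[0], b[0]))
    return pairs


# Hypothetical annotations: the first two steps overlap by 0.5 s.
steps = [
    ("apply_tourniquet", 0.0, 4.0),
    ("expose_wound", 3.5, 6.0),
    ("pack_wound", 6.5, 9.0),
]
print(concurrent_steps(steps))  # [('apply_tourniquet', 'expose_wound')]
```

Any per-frame labeling scheme for this dataset therefore needs to admit multiple simultaneous step labels rather than a single mutually exclusive class.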

Currently the PTG-MAGIC dataset consists of 3,355 videos representing 50 skills. Across the 50 skills we have labeled over 1.95 million objects spanning 124 object classes, over 17,000 skill step delineations, and over 39,000 hand-object interactions. We also offer 40 pre-trained YOLOv8 models to help developers get started.


Files

dataset_access_request_form.pdf

Files (202.9 kB)

md5:b2f8e097d663dadb86c5bb13ec1f5808 (28.8 kB)
md5:432ee44b8087b9118071afa284be98eb (51.1 kB)
md5:a58cd00d420faeba0f7908e4c3bf71db (17.7 kB)
md5:f164ab572e3c8fb87c122074d5ecafbe (105.4 kB)

Additional details

Additional titles

Alternative title
PTG-MAGIC

Dates

Updated
2026-05-08
Fixed email address in submission form.