Ego-MAGIC
Description
We introduce a new computer vision dataset produced in conjunction with the DARPA Perceptually-enabled Task Guidance (PTG) program. It comprises over 3000 egocentric labeled videos for activity recognition in point-of-injury trauma care, specifically for combat medicine applications.
The vast majority of the videos were recorded by trained combat medicine instructors performing interventions on medical simulators in the same fashion as students being trained in these combat medicine skills.
This dataset is presented in a similar fashion to other challenge datasets, such as EPIC-KITCHENS and the Trauma THOMPSON challenge.
What distinguishes this dataset from others is the speed at which the steps in the tasks are performed, and the potential for overlap between steps.
These two confounding properties mean there can be very few distinctive frames with which to make a decision about the present activity. Additionally, the rapid motion of the camera and the realistic cluttered scene often present further computer vision challenges.
Our dataset has been utilized in the DARPA PTG program to create real-time augmented reality assistants [ref] that help novice users perform these medical skills. To introduce this PTG-MAGIC dataset to the public domain for future research, we have extracted the activity detection portion from PTG into a stand-alone challenge, which is the focus of this paper. While results are presented in terms of activity detection (determining start and stop times for a specific activity), this dataset is also amenable to other challenges such as activity recognition (given a clip, what activity is being performed) and activity anticipation.
Overall, this dataset presents a number of unique qualities compared to similar egocentric datasets that make it especially challenging: 1) the skill steps in many skills are very short, often only a second or so (\avgSkillStepDur seconds per step on average); 2) the skills themselves are often quite short, often only a few tens of seconds (\avgSkillDur seconds on average); 3) skill steps can frequently be performed concurrently; 4) many skills have steps that are considered optional, so certain skill steps can be skipped.
Currently the PTG-MAGIC dataset consists of 3355 videos representing 50 skills. Across the 50 skills we have labeled over 1.95 million objects spanning 124 object classes, over 17,000 skill step delineations, and over 39,000 hand-object interactions. We also offer 40 pre-trained YOLOv8 models to help developers get started.
Files
dataset_access_request_form.pdf
Additional details
Additional titles
- Alternative title
- PTG-MAGIC
Dates
- Updated
- 2026-05-08: Fixed email address in submission form.