Published June 14, 2022 | Version 1.0.0
Dataset Open

MineDojo Internet Knowledge Base (YouTube)

  • 1. NVIDIA
  • 2. Caltech
  • 3. Stanford
  • 4. Columbia
  • 5. SJTU
  • 6. NVIDIA, UT Austin
  • 7. NVIDIA, Caltech


Project website:



Minecraft is among the most streamed games on YouTube. Human players have demonstrated a stunning range of creative activities and sophisticated missions that take hours to complete. We collect 730K+ narrated Minecraft videos, totaling 33 years of footage and 2.2B words of English transcripts. The time-aligned transcripts enable an agent to ground free-form natural language in video pixels and learn the semantics of diverse activities without laborious human labeling.

There are two files in our YouTube knowledge base.

  • youtube_tutorial.json (tutorial videos): 

    Minecraft tutorial videos include step-by-step demonstrations and sometimes detailed verbal explanations. They also serve as a rich source of creative missions that humans find interesting. We harvest thousands of tasks from these videos for our benchmarking suite.

  • youtube_full.json (general gameplay videos):

    Unlike tutorials, general gameplay videos do not necessarily provide guidance on particular tasks. Instead, they capture “in-the-wild” human experiences that are far greater in quantity, more diverse in content, and rich in learning signals.

Data Structure

        "id": str,          # video id
        "title": str,       # video title
        "link": str,        # video link
        "view_count": int,  # number of times the video has been viewed
        "like_count": int,  # number of users who have indicated that they liked the video
        "duration": float,  # video duration in seconds
        "fps": float,       # video FPS
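
The record layout above could be consumed along the following lines. This is only a minimal sketch: it assumes each JSON file holds a top-level list of records with the fields listed above, which is not stated explicitly here. The helper names `load_records` and `summarize` and the `min_views` parameter are hypothetical and not part of the dataset or any MineDojo API.

```python
import json
from typing import Any, Dict, Iterable, List


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load video metadata records from one of the knowledge-base files.

    Assumption: the file is a JSON list of objects with the fields
    "id", "title", "link", "view_count", "like_count", "duration", "fps".
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of video records")
    return data


def summarize(records: Iterable[Dict[str, Any]], min_views: int = 0) -> Dict[str, float]:
    """Count videos with at least `min_views` views and total their duration in hours."""
    count = 0
    total_seconds = 0.0
    for rec in records:
        # Assumption: a missing or null view_count / duration is treated as 0.
        if (rec.get("view_count") or 0) >= min_views:
            count += 1
            total_seconds += float(rec.get("duration") or 0.0)
    return {"videos": count, "hours": total_seconds / 3600.0}
```

For instance, `summarize(load_records("youtube_tutorial.json"), min_views=1000)` would report how many tutorial videos have at least 1000 views and how many hours they cover, provided the file matches the assumed layout.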

Check out our paper!

  title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
  author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
  year = {2022},
  journal = {arXiv preprint arXiv: Arxiv-2206.08853}




Files (183.2 MB)

  • 990.2 kB
  • 174.0 MB
  • 8.1 MB