Minecraft is among the most streamed games on YouTube. Human players have demonstrated a stunning range of creative activities and sophisticated missions that take hours to complete. We collect 730K+ narrated Minecraft videos, which add up to ~300K hours and 2.2B words in English transcripts. The time-aligned transcripts enable the agent to ground free-form natural language in video pixels and learn the semantics of diverse activities without laborious human labeling. Please refer to the doc page for how to load our YouTube database.
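As a rough illustration of what time alignment makes possible, the sketch below maps a transcript segment to the video frames it overlaps. The `TranscriptSegment` record, its field names, and the fixed frame rate are assumptions made for this example; they are not the format of the YouTube database, which is described on the doc page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TranscriptSegment:
    # Hypothetical record: one narrated span with start/end times in seconds.
    start_sec: float
    end_sec: float
    text: str


def segment_to_frames(seg: TranscriptSegment, fps: float) -> Tuple[int, int]:
    """Return the (first, last) frame indices covered by a segment, inclusive."""
    first = int(seg.start_sec * fps)
    last = max(first, int(seg.end_sec * fps) - 1)
    return first, last


def text_for_frame(segments: List[TranscriptSegment], frame_idx: int, fps: float) -> List[str]:
    """Return the text of every segment that is being spoken at the given frame."""
    t = frame_idx / fps
    return [s.text for s in segments if s.start_sec <= t < s.end_sec]


segments = [
    TranscriptSegment(0.0, 2.0, "first we need some wood"),
    TranscriptSegment(2.0, 4.0, "now craft a wooden pickaxe"),
]
frame_range = segment_to_frames(segments[1], fps=30)
spoken = text_for_frame(segments, frame_idx=90, fps=30)
```

Pairs of (frame range, text) built this way are the kind of weak supervision the paragraph above refers to: no human labels the frames, the narration does.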
Minecraft tutorial videos include step-by-step demonstrations and sometimes detailed verbal explanations. They also serve as a rich source of creative missions that humans find interesting. We harvest thousands of tasks from these videos for our benchmarking suite. The samples below pair each tutorial video with a language task prompt.
Build crop farms
Find jungle temples and desert temples
Light up caves
Build and decorate a Nether Portal
Make a lava door
Make an Xbox One controller statue
General Gameplay Videos
Unlike tutorials, general gameplay videos do not necessarily provide guidance on particular tasks. Instead, they capture “in-the-wild” human experiences that are far larger in quantity, more diverse in content, and rich in learning signals.
The Wiki pages cover almost every aspect of the game mechanics and supply a rich source of unstructured knowledge in multimodal tables, recipes, illustrations, and step-by-step tutorials. We scrape ~7K pages that interleave text, images, tables, and diagrams. To preserve the layout information, we also save screenshots of entire pages and extract bounding boxes of the visual elements. Please refer to the doc page for how to load our Wiki database.
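To make the role of the bounding boxes concrete, here is a minimal sketch of how layout could be recovered from them. The `PageElement` record, its field names, and the pixel-coordinate convention are hypothetical and chosen only for illustration; the actual Wiki database format is documented on the doc page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageElement:
    # Hypothetical record: one visual element located on a full-page screenshot.
    kind: str  # e.g. "text", "image", "table"
    x0: int    # left edge, in screenshot pixels
    y0: int    # top edge
    x1: int    # right edge
    y1: int    # bottom edge


def reading_order(elements: List[PageElement]) -> List[PageElement]:
    """Sort elements top to bottom, then left to right, using their boxes."""
    return sorted(elements, key=lambda e: (e.y0, e.x0))


def of_kind(elements: List[PageElement], kind: str) -> List[PageElement]:
    """Keep only the elements of one kind."""
    return [e for e in elements if e.kind == kind]


page = [
    PageElement("table", 40, 400, 600, 700),
    PageElement("image", 620, 100, 900, 380),
    PageElement("text", 40, 100, 600, 380),
]
ordered_kinds = [e.kind for e in reading_order(page)]
tables = of_kind(page, "table")
```

Sorting by the top-left corner is a deliberately simple heuristic; multi-column pages would need a more careful ordering.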
We collect 340K+ Reddit posts along with 6.6M comments from the “r/Minecraft” subreddit. These posts ask questions on how to solve certain tasks, showcase cool architectures and achievements in image/video snippets, and discuss general tips and tricks for players of all expertise levels. Large language models can be finetuned on our Reddit corpus to internalize Minecraft-specific concepts and develop sophisticated strategies. Please refer to the doc page for how to load our Reddit database.
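One way such a corpus might be prepared for finetuning is to flatten each post and its highest-scored comments into a single text sample. The sketch below assumes hypothetical `Post` and `Comment` records with `title`, `body`, and `score` fields and an arbitrary `Title:/Post:/Reply:` template; none of these names come from the Reddit database itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    # Hypothetical record: one comment and its vote score.
    body: str
    score: int


@dataclass
class Post:
    # Hypothetical record: one post with its comments attached.
    title: str
    body: str
    comments: List[Comment] = field(default_factory=list)


def to_training_text(post: Post, top_k: int = 2) -> str:
    """Flatten a post and its top_k highest-scored comments into one sample."""
    best = sorted(post.comments, key=lambda c: c.score, reverse=True)[:top_k]
    lines = [f"Title: {post.title}", f"Post: {post.body}"]
    lines += [f"Reply: {c.body}" for c in best]
    return "\n".join(lines)


post = Post(
    title="How do I light up a cave?",
    body="Mobs keep spawning while I mine.",
    comments=[
        Comment("Place torches every few blocks.", 5),
        Comment("Torches on the floor, spaced evenly.", 12),
        Comment("Just dig faster.", 1),
    ],
)
sample = to_training_text(post, top_k=2)
```

Ranking by score is only one possible filter; the choice of template and of how many replies to keep is left to whoever builds the finetuning set.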
MineDojo team ©2022