Voyager: Exploring, Learning, and Adapting Autonomously in Minecraft

Voyager is designed to enable lifelong learning in the video game Minecraft. Minecraft is a 3D open-world sandbox featuring diverse biomes, dynamic day–night cycles, various creatures, and a rich variety of blocks and items that can be gathered, crafted, and used for building or exploration. Voyager's objective is to explore the game world, acquire diverse skills, and make novel discoveries without manually designed reward functions or human intervention. It operates by generating its own curriculum of tasks, completing them through code-based actions, and refining its behavior via iterative feedback and self-verification. Through this process, the agent builds a growing library of reusable and compositional skills that support progressively more sophisticated behaviors.

Voyager exploring Minecraft via Mineflayer code
Voyager consists of three key components: an automatic curriculum for open-ended exploration, a skill library for increasingly complex behaviors, and an iterative prompting mechanism that uses code as the action space.

We will now use examples and prompt templates to illustrate how Voyager applies planning, tool use, memory, and reflection to explore Minecraft and grow its skill library autonomously.

Planning

Voyager uses the automatic curriculum module for planning within the game, including automatically setting tasks and decomposing high-level tasks into sub-tasks. The automatic curriculum module autonomously generates new tasks based on the agent's current abilities and the state of the environment. It then decomposes high-level tasks into actionable sub-tasks using the LLM's internal knowledge. This planning ability enables Voyager to progress gradually from completing simple tasks to tackling more difficult ones. In addition, when generating the skill code, Voyager also produces the corresponding plan. We show an example of an automatically generated task below.

Example of an automatically generated task by the automatic curriculum module

INPUT:

Instruction: You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world. You should act as a mentor and guide me to the next task based on my current learning progress.
Game State:
Biome: ...
Time: ...
Nearby blocks: ...
Other blocks that are recently seen: ...
Nearby entities (nearest to farthest): ...
Health: ...
Hunger: ...
Position: ...
Equipment: ...
Inventory (xx/36): ...
Completed and Failed Tasks:
Completed tasks so far: ...
Failed tasks that are too hard: ...


OUTPUT:

Reasoning: The inventory is empty now, chop down a tree to get some wood.
Task: Obtain a wood log.
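
As a minimal sketch of how the prompt and reply above could be handled in code, the helpers below render a game-state object into the template's layout and parse the "Reasoning / Task" reply. The function names (buildCurriculumPrompt, parseCurriculumReply) and the field names of the state object are hypothetical choices for illustration and are not taken from Voyager's implementation; only a subset of the template's fields is rendered.

```javascript
// Hypothetical helper: renders a game-state object into the prompt layout shown above.
// Field names mirror the template's labels; they are assumptions, not Voyager's own API.
function buildCurriculumPrompt(state, completedTasks, failedTasks) {
  // Number of distinct item types, used here as a stand-in for occupied slots.
  const inventoryCount = Object.keys(state.inventory).length;
  const lines = [
    `Biome: ${state.biome}`,
    `Time: ${state.time}`,
    `Nearby blocks: ${state.nearbyBlocks.join(", ") || "None"}`,
    `Health: ${state.health}`,
    `Hunger: ${state.hunger}`,
    `Inventory (${inventoryCount}/36): ${JSON.stringify(state.inventory)}`,
    `Completed tasks so far: ${completedTasks.join(", ") || "None"}`,
    `Failed tasks that are too hard: ${failedTasks.join(", ") || "None"}`,
  ];
  return lines.join("\n");
}

// Hypothetical helper: parses a "Reasoning: ... / Task: ..." reply into an object.
function parseCurriculumReply(reply) {
  const reasoning = /Reasoning:\s*(.*)/.exec(reply);
  const task = /Task:\s*(.*)/.exec(reply);
  if (!task) throw new Error("Reply did not contain a Task line");
  return {
    reasoning: reasoning ? reasoning[1].trim() : "",
    task: task[1].trim(),
  };
}
```

Keeping the reply format this rigid makes the proposed task easy to extract and pass on to the code-generation step.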

Tool use

Voyager controls the game character to complete the tasks proposed by the automatic curriculum module by having the LLM generate action code for the character. The action code is composed of Mineflayer API functions, which enable the game character to perform specific actions in Minecraft, such as moving, mining, crafting, and fighting. Using Mineflayer gives the LLM direct access to structured game state and high-level action interfaces, which removes the need to interpret complex visual inputs or make frequent low-level decisions and thereby simplifies environment interaction and improves efficiency. More importantly, each generated function is stored as an independent skill, which can be retrieved for new tasks and combined to form more complex behaviors.

Example of the generated code for obtaining a wood log

INPUT:

Instruction: You are a helpful assistant that writes Mineflayer javascript code to complete any Minecraft task specified by me. Here are some useful programs written with Mineflayer APIs. You should write an async function taking the bot as the only argument. Reuse the above useful programs as much as possible. Your function will be reused for building more complex functions. Therefore, you should make it generic and reusable.
Code Example: {programs}

Game State:
Biome: ...
Time: ...
Nearby blocks: ...
Other blocks that are recently seen: ...
Nearby entities (nearest to farthest): ...
Health: ...
Hunger: ...
Position: ...
Equipment: ...
Inventory (xx/36): ...
Task: Obtain a wood log
Context: ...

Critique: ...


OUTPUT:

Explain:
1. First check if a wooden pickaxe is in the inventory.
2. If not, craft one. If the wooden pickaxe is available, equip the wooden pickaxe in the hand.
3. ...
4. ...

Code:

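// Note: mcData is assumed to be in scope, and craftWoodenPickaxe is assumed to be
// a previously learned skill supplied through {programs}.
async function mineCobblestone(bot) {
  // Look for a wooden pickaxe in the inventory.
  let woodenPickaxe = bot.inventory.findInventoryItem(mcData.itemsByName["wooden_pickaxe"].id);
  if (!woodenPickaxe) {
    // None found: reuse the existing crafting skill, then look again.
    bot.chat("Crafting a wooden pickaxe.");
    await craftWoodenPickaxe(bot);
    woodenPickaxe = bot.inventory.findInventoryItem(mcData.itemsByName["wooden_pickaxe"].id);
  }
  ...
}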

Memory

Voyager exhibits memory through its skill library, which is extensible and stores all verified executable programs. The skill library is built on a vector database, enabling Voyager to retrieve similar skills when facing new tasks and to compose complex skills by combining existing ones. In detail, when a new task is proposed by the automatic curriculum, Voyager queries the skill library using the embedding of the task description and environment feedback to retrieve relevant skills. These retrieved skills are provided to an LLM, which can reuse or compose them to generate new executable programs. Verified programs are then added back to the skill library as new skills, enabling Voyager to continually expand its repertoire of modular and compositional behaviors for solving increasingly complex and novel tasks.
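
The retrieval step described above can be sketched as follows. This is an illustration under stated assumptions, not Voyager's implementation: the SkillLibrary class and its method names are hypothetical, an in-memory array stands in for the vector database, and a toy bag-of-words vector stands in for a learned embedding model.

```javascript
// Toy embedding: bag-of-words term counts. An assumption for illustration only;
// a real system would call an embedding model instead.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse term-count vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) || 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical in-memory stand-in for the vector database.
class SkillLibrary {
  constructor() {
    this.skills = [];
  }

  // Store a verified program together with the embedding of its description.
  add(name, description, code) {
    this.skills.push({ name, description, code, vector: embed(description) });
  }

  // Return the names of the k skills most similar to the query text.
  retrieve(query, k = 2) {
    const q = embed(query);
    return this.skills
      .map((skill) => ({ skill, score: cosineSimilarity(q, skill.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((x) => x.skill.name);
  }
}

// Usage: skill bodies are elided; only descriptions matter for retrieval here.
const library = new SkillLibrary();
library.add("craftWoodenPickaxe", "Craft a wooden pickaxe from planks and sticks", "...");
library.add("mineWoodLog", "Mine a wood log from a nearby tree", "...");
library.add("killZombie", "Fight and kill a zombie with a sword", "...");
const relevant = library.retrieve("Craft a stone pickaxe", 1);
```

The retrieved programs would then be placed into the code-generation prompt so that the LLM can reuse or compose them.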

Reflection

The iterative prompting mechanism endows Voyager with the ability to reflect. After executing each generated program, Voyager collects three kinds of feedback: environmental feedback, execution errors, and self-verification that checks task success. By incorporating these signals, Voyager repeatedly revises its action code until the task succeeds. This trial-and-error cycle mirrors reflective reasoning, allowing the agent to learn from its mistakes and autonomously improve its actions. Below is an example of Voyager's self-verification.

Example of self-verification for checking task success

INPUT:

Instruction: You are an assistant that assesses my progress of playing Minecraft and provides useful guidance. You are required to evaluate if I have met the task requirements. Exceeding the task requirements is also considered a success while failing to meet them requires you to provide critique to help me improve.
Game State:
Biome: ...
Time: ...
Nearby blocks: ...
Other blocks that are recently seen: ...
Nearby entities (nearest to farthest): ...
Health: ...
Hunger: ...
Position: ...
Equipment: If I have better armor in my inventory, you should ask me to equip it.
Inventory (xx/36): {'crafting table': 1, 'spruce planks': 6, 'stick': 4}
Task: Craft a wooden pickaxe


OUTPUT:

Reasoning: You have enough materials to craft a wooden pickaxe, but you didn't craft it.
Success: False. Critique: Craft a wooden pickaxe with a crafting table using 3 spruce planks and 2 sticks.
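
The overall generate, execute, and verify cycle can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the function iterativePrompting and its three callbacks (generateCode, execute, verify) are hypothetical stand-ins for the LLM code generator, the Minecraft environment, and the self-verification prompt, and the default round limit is an arbitrary choice for this sketch.

```javascript
// Sketch of the generate -> execute -> verify loop. All three callbacks are
// hypothetical stand-ins supplied by the caller, not Voyager's own functions.
async function iterativePrompting(task, { generateCode, execute, verify }, maxRounds = 4) {
  // Feedback from the previous round, passed to the next code-generation call.
  let feedback = { environment: "", error: "", critique: "" };

  for (let round = 1; round <= maxRounds; round++) {
    const code = await generateCode(task, feedback);
    const result = await execute(code);

    // Execution error: report it back and try again.
    if (result.error) {
      feedback = { environment: result.log, error: result.error, critique: "" };
      continue;
    }

    // The program ran: ask the verifier whether the task was actually completed.
    const verdict = await verify(task, result.state);
    if (verdict.success) {
      return { success: true, code, rounds: round };
    }
    feedback = { environment: result.log, error: "", critique: verdict.critique };
  }

  // Round limit reached without success.
  return { success: false, code: null, rounds: maxRounds };
}
```

In the self-verification example above, the critique "Craft a wooden pickaxe with a crafting table using 3 spruce planks and 2 sticks" would be the value carried into the next round's code-generation prompt. Only a program that passes verification would be added to the skill library.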

If you find this work helpful, please consider citing our paper:

@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}