[Paper Review] Voyager | An Open-Ended Embodied Agent with Large Language Models

ryunosuke

June 25, 2023, 14:59

A bit late to the party, but I finally read this paper properly, so here is my summary.

Paper information

Voyager: An Open-Ended Embodied Agent with Large Language Models
https://voyager.minedojo.org/
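Published on arXiv on 2023/05/25

In three lines

An LLM agent that plays Minecraft by writing code and autonomously inventing skills
Proposes a framework that makes complex coding possible through a feedback loop
Explores tasks on its own initiative and acquires a broad range of skills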

Note

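The actual in-game actions go through Mineflayer, an API that abstracts basic operations and gives access to a wide range of in-game metadata.
For example, if you want to mine diamonds, the API apparently takes care of things like searching for diamond blocks within N blocks of the agent and finding a path to the block it found.

This paper focuses on building complex skills by composing basic operations, and on an exploration policy for acquiring diverse skills. It is not an end-to-end AI that takes raw screen frames as input and outputs controller button presses. I had misunderstood this myself and read too much into the paper, so consider this a heads-up.
(I had no idea such a handy library existed...)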

Method

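Overall structure

The system consists of three main components.

Automatic Curriculum: decides the next task to carry out from the current state
Skill Library: manages existing skills and makes them available when creating new ones
Iterative Prompting Mechanism: creates new skills as code

The Automatic Curriculum decides which task to perform next (e.g., chop a tree, craft a crafting table, kill a zombie, mine diamonds). It plays a role similar to a policy agent, deciding what to do next based on the current situation, the history of past actions, the items on hand, and so on.

Once the next task is chosen, the Iterative Prompting Mechanism generates a skill for solving it as JavaScript code. This is the heart of the paper.
Because new skills can be built by combining existing skills retrieved from the Skill Library (built-in ones or ones created earlier), the agent becomes capable of increasingly complex and sophisticated operations as the iterations accumulate.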
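
The interplay of the three components can be sketched as a simple loop. This is only a minimal illustration, not Voyager's actual implementation: `proposeNextTask`, `retrieveSkills`, `generateAndVerify`, and `runVoyagerLoop` are hypothetical names, and every LLM-backed component is replaced with a trivial stub.

```javascript
// Hypothetical stand-in for the Automatic Curriculum: returns the first task from a
// fixed list that has not been completed or given up on (Voyager asks GPT-4 instead).
function proposeNextTask(state) {
  const candidates = ["Mine 1 wood log", "Craft 1 crafting table", "Craft 1 wooden pickaxe"];
  const done = new Set([...state.completed, ...state.failed]);
  return candidates.find((task) => !done.has(task)) ?? null;
}

// Hypothetical stand-in for Skill Library retrieval: a keyword match on the verb
// (the paper describes embedding-based search instead).
function retrieveSkills(library, task) {
  const verb = task.split(" ")[0].toLowerCase();
  return [...library.keys()].filter((name) => name.startsWith(verb));
}

// Hypothetical stand-in for the Iterative Prompting Mechanism: always "succeeds"
// and returns a placeholder skill named after the task.
function generateAndVerify(task, retrievedSkills) {
  const name = task
    .split(" ")
    .map((w, i) => (i === 0 ? w.toLowerCase() : w[0].toUpperCase() + w.slice(1)))
    .join("");
  return { success: true, name, code: `async function ${name}(bot) { /* ... */ }` };
}

function runVoyagerLoop(maxTasks) {
  const state = { completed: [], failed: [] };
  const library = new Map(); // skill name -> source code
  for (let i = 0; i < maxTasks; i++) {
    const task = proposeNextTask(state);
    if (task === null) break; // nothing left to propose
    const retrieved = retrieveSkills(library, task);
    const result = generateAndVerify(task, retrieved);
    if (result.success) {
      library.set(result.name, result.code); // the new skill becomes reusable
      state.completed.push(task);
    } else {
      state.failed.push(task);
    }
  }
  return { state, library };
}

const loopResult = runVoyagerLoop(10);
console.log(loopResult.state.completed, [...loopResult.library.keys()]);
```

The point of the sketch is the data flow: the curriculum reads the agent's history, and every successful skill is written back into the library so that later tasks can retrieve it.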

Automatic Curriculum

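This component decides the next task to perform. The following information is embedded in a prompt, and GPT-4 answers it. The key point is that the stated goal is to encounter diverse tasks, which encourages exploration of the game world.

Goal
- Your ultimate goal is to complete as many diverse tasks as possible and discover as many diverse things as possible
Environment information
- Inventory state, equipment, nearby blocks and entities, chest contents, biome, time, health and hunger gauges, position
Criteria and constraints
- Use the exact format
- The proposed task must not be too hard given the current state
- Propose novel and interesting tasks (finding rare resources, upgrading equipment, using better materials)
- Repeat already completed tasks only when necessary
- Do not build a shelter (so that the agent does not stay in one place)
- Avoid tasks that require visual information
Additional Context: extra information from the wiki (described later)

In practice, however, only part of this information is given in the early stages of play, which is a trick to keep the model from proposing overly difficult tasks (the warm-up schedule).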
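
A warm-up schedule of this kind could look roughly like the sketch below. The field names, the thresholds in `WARM_UP`, and the function `buildObservation` are all hypothetical placeholders chosen for illustration; they are not taken from the paper or from Voyager's code.

```javascript
// Hypothetical thresholds: a field is shown to the curriculum only after this many
// tasks have been completed. The numbers are illustrative placeholders.
const WARM_UP = {
  inventory: 0,
  position: 0,
  nearbyBlocks: 0,
  biome: 10,
  health: 15,
  hunger: 15,
  additionalContext: 15,
};

// Build the observation part of the prompt, hiding fields that are still "locked".
function buildObservation(env, completedCount, warmUp = WARM_UP) {
  const lines = [];
  for (const [field, threshold] of Object.entries(warmUp)) {
    if (completedCount >= threshold && field in env) {
      lines.push(`${field}: ${env[field]}`);
    }
  }
  return lines.join("\n");
}

const sampleEnv = { inventory: "{}", position: "x=0, y=64, z=0", biome: "plains", health: 20 };
console.log(buildObservation(sampleEnv, 0));  // only inventory and position
console.log(buildObservation(sampleEnv, 15)); // biome and health are now included
```

Early on the model sees only a few fields, so it has less material from which to invent tasks that are far beyond the agent's current abilities.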

[Automatic Curriculum prompt]

You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world.
I will give you the following information:
Question 1: ...
Answer: ...
Question 2: ...
Answer: ...
Question 3: ...
Answer: ...
...
Biome: ...
Time: ...
Nearby blocks: ...
Other blocks that are recently seen: ...
Nearby entities (nearest to farthest): ...
Health: Higher than 15 means I'm healthy.
Hunger: Higher than 15 means I'm not hungry.
Position: ...
Equipment: If I have better armor in my inventory, you should ask me to equip it.
Inventory (xx/36): ...
Chests: You can ask me to deposit or take items from these chests. There also might be some unknown chest, you should ask me to open and check items inside the unknown chest.
Completed tasks so far: ...
Failed tasks that are too hard: ...
You must follow the following criteria:
1) You should act as a mentor and guide me to the next task based on my current learning progress.
2) Please be very specific about what resources I need to collect, what I need to craft, or what mobs I need to kill.
3) The next task should follow a concise format, such as "Mine [quantity] [block]", "Craft [quantity] [item]", "Smelt [quantity] [item]", "Kill [quantity] [mob]", "Cook [quantity] [food]", "Equip [item]" etc. It should be a single phrase. Do not propose multiple tasks at the same time. Do not mention anything else.
4) The next task should not be too hard since I may not have the necessary resources or have learned enough skills to complete it yet.
5) The next task should be novel and interesting. I should look for rare resources, upgrade my equipment and tools using better materials, and discover new things. I should not be doing the same thing over and over again.
6) I may sometimes need to repeat some tasks if I need to collect more resources to complete more difficult tasks. Only repeat tasks if necessary.
7) Do not ask me to build or dig shelter even if it's at night. I want to explore the world and discover new things. I don't want to stay in one place.
8) Tasks that require information beyond the player's status to verify should be avoided. For instance, "Placing 4 torches" and "Dig a 2x1x2 hole" are not ideal since they require visual confirmation from the screen. All the placing, building, planting, and trading tasks should be avoided. Do not propose task starting with these keywords.
You should only respond in the format as described below:
RESPONSE FORMAT:
Reasoning: Based on the information I listed above, do reasoning about what the next task should be.
Task: The next task.
Here's an example response:
Reasoning: The inventory is empty now, chop down a tree to get some wood.
Task: Obtain a wood log.

It is hilarious that we now live in an era where the text of a paper says, with a straight face, "My ultimate goal is to ... become the best Minecraft player in the world"...

Additional Context

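To make use of richer context, information extracted from the wiki is embedded in the form of self-generated Q&A produced by GPT-3.5. Relevant documents are embedded in a prompt that asks the model to output questions useful for deciding the next task, and a separate prompt then generates the answers.
For Minecraft, GPT-3.5 itself reportedly had enough knowledge that embedding documents was unnecessary, but this procedure makes it possible to apply the method to domains where GPT's knowledge is insufficient.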
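
The two-step Self QA flow can be sketched as follows. This is a rough illustration under several assumptions: `askQuestions` and `answerQuestion` are hypothetical callbacks wrapping the two GPT-3.5 prompts (kept synchronous here for brevity), the response follows the "Question N / Concept N" format shown in the prompts, and dropping "Answer: Unknown" entries is my own choice rather than something the paper states.

```javascript
// Parse "Question N: ..." / "Concept N: ..." pairs out of the question-generation response.
function parseQuestions(response) {
  const pattern = /^Question \d+:\s*(.+)\nConcept \d+:\s*(.+)$/gm;
  return [...response.matchAll(pattern)].map((m) => ({
    question: m[1].trim(),
    concept: m[2].trim(),
  }));
}

// Build the "Question i: ... / Answer: ..." block that is embedded in the curriculum prompt.
function buildAdditionalContext(observation, askQuestions, answerQuestion) {
  const lines = [];
  let n = 0;
  for (const { question, concept } of parseQuestions(askQuestions(observation))) {
    const answer = answerQuestion(question, concept);
    if (/^Answer:\s*Unknown/i.test(answer)) continue; // assumption: skip unanswered questions
    n += 1;
    lines.push(`Question ${n}: ${question}`, answer);
  }
  return lines.join("\n");
}

// Canned responses standing in for the two LLM calls.
const fakeAsk = () =>
  "Reasoning: ...\n" +
  "Question 1: How to make a wooden pickaxe?\nConcept 1: wooden pickaxe\n" +
  "Question 2: What is the meaning of life?\nConcept 2: unknown";
const fakeAnswer = (question) =>
  question.includes("pickaxe")
    ? "Answer: Use 3 planks and 2 sticks on a crafting table."
    : "Answer: Unknown";

console.log(buildAdditionalContext("Biome: plains", fakeAsk, fakeAnswer));
```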

[Additional Context question-generation prompt]

You are a helpful assistant that asks questions to help me decide the next immediate task to do in Minecraft. My ultimate goal is to discover as many things as possible, accomplish as many tasks as possible and become the best Minecraft player in the world.
I will give you the following information:
Biome: ...
Time: ...
Nearby blocks: ...
Other blocks that are recently seen: ...
Nearby entities (nearest to farthest): ...
Health: ...
Hunger: ...
Position: ...
Equipment: ...
Inventory (xx/36): ...
Chests: ...
Completed tasks so far: ...
Failed tasks that are too hard: ...
You must follow the following criteria:
1) You should ask at least 5 questions (but no more than 10 questions) to help me decide the next immediate task to do. Each question should be followed by the concept that the question is about.
2) Your question should be specific to a concept in Minecraft.
Bad example (the question is too general):
Question: What is the best way to play Minecraft?
Concept: unknown
Bad example (axe is still general, you should specify the type of axe such as wooden axe):
What are the benefits of using an axe to gather resources?
Concept: axe
Good example:
Question: How to make a wooden pickaxe?
Concept: wooden pickaxe
3) Your questions should be self-contained and not require any context.
Bad example (the question requires the context of my current biome):
Question: What are the blocks that I can find in my current biome?
Concept: unknown
Bad example (the question requires the context of my current inventory):
Question: What are the resources you need the most currently?
Concept: unknown
Bad example (the question requires the context of my current inventory):
Question: Do you have any gold or emerald resources?
Concept: gold
Bad example (the question requires the context of my nearby entities):
Question: Can you see any animals nearby that you can kill for food?
Concept: food
Bad example (the question requires the context of my nearby blocks):
Question: Is there any water source nearby?
Concept: water
Good example:
Question: What are the blocks that I can find in the sparse jungle?
Concept: sparse jungle
4) Do not ask questions about building tasks (such as building a shelter) since they are too hard for me to do.
Let's say your current biome is sparse jungle. You can ask questions like:
Question: What are the items that I can find in the sparse jungle?
Concept: sparse jungle
Question: What are the mobs that I can find in the sparse jungle?
Concept: sparse jungle
Let's say you see a creeper nearby, and you have not defeated a creeper before. You can ask a question like:
Question: How to defeat the creeper?
Concept: creeper
Let's say you last completed task is "Craft a wooden pickaxe". You can ask a question like:
Question: What are the suggested tasks that I can do after crafting a wooden pickaxe?
Concept: wooden pickaxe
Here are some more question and concept examples:
Question: What are the ores that I can find in the sparse jungle?
Concept: sparse jungle
(the above concept should not be "ore" because I need to look up the page of "sparse jungle" to find out what ores I can find in the sparse jungle)
Question: How can you obtain food in the sparse jungle?
Concept: sparse jungle
(the above concept should not be "food" because I need to look up the page of "sparse jungle" to find out what food I can obtain in the sparse jungle)
Question: How can you use the furnace to upgrade your equipment and make useful items?
Concept: furnace
Question: How to obtain a diamond ore?
Concept: diamond ore
Question: What are the benefits of using a stone pickaxe over a wooden pickaxe?
Concept: stone pickaxe
Question: What are the tools that you can craft using wood planks and sticks?
Concept: wood planks
You should only respond in the format as described below:
RESPONSE FORMAT:
Reasoning: ...
Question 1: ...
Concept 1: ...
Question 2: ...
Concept 2: ...
Question 3: ...
Concept 3: ...
Question 4: ...
Concept 4: ...
Question 5: ...
Concept 5: ...
...

[Additional Context answer-generation prompt]

You are a helpful assistant that answer my question about Minecraft.
I will give you the following information:
Question: ...
You will answer the question based on the context (only if available and helpful) and your own knowledge of Minecraft.
1) Start your answer with "Answer: ".
2) Answer "Answer: Unknown" if you don't know the answer.

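The paper does not explicitly say why Q&A pairs are embedded instead of putting wiki content directly into the Automatic Curriculum prompt. My guess is that the goals are to extract explicit Minecraft knowledge in a task where embedding documents is unnecessary, and to reduce the number of tokens. In fact, the paper does cite "budgetary considerations" as the reason for using GPT-3.5 rather than GPT-4 to generate the Additional Context.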

Skill Library

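This is the set of skills available for solving a task. The following three kinds can be combined.

Basic operation APIs provided by Mineflayer
Helper functions implemented by the paper's authors
Skills newly invented by the agent itself

Techniques related to skill retrieval include: (1) having GPT-3.5 generate a description of each skill after it is created, and (2) at retrieval time, running a vector search whose query combines a task solution that GPT-3.5 produces from relevant documents (roughly the Additional Context) with the Environment Feedback described later.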
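
The retrieval idea (index each skill by its description, then search with a query built from the task plan and environment feedback) can be sketched with a toy vector search. Note the assumptions: `embed` here is a simple bag-of-words count standing in for a real embedding model, and `SkillLibrary`, `add`, and `retrieve` are hypothetical names, not Voyager's API.

```javascript
// Toy "embedding": word counts. A real system would call an embedding model here.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse count vectors.
function cosine(a, b) {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class SkillLibrary {
  constructor() {
    this.skills = [];
  }
  // The description (generated by an LLM in the paper) is what gets indexed, not the code.
  add(name, code, description) {
    this.skills.push({ name, code, vector: embed(description) });
  }
  // Return the names of the topK skills most similar to the query text.
  retrieve(query, topK = 5) {
    const q = embed(query);
    return this.skills
      .map((skill) => ({ name: skill.name, score: cosine(q, skill.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((skill) => skill.name);
  }
}

const lib = new SkillLibrary();
lib.add("craftWoodenPickaxe", "/* ... */", "Craft a wooden pickaxe from planks and sticks");
lib.add("mineIronOre", "/* ... */", "Mine iron ore with a stone pickaxe");
lib.add("killZombie", "/* ... */", "Kill a zombie with a sword");
console.log(lib.retrieve("Smelt iron: first mine iron ore, then use a furnace", 1));
```

Indexing the natural-language description rather than the code means that the query (also natural language) and the index live in the same space.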

Iterative Prompting Mechanism

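Using skills retrieved from the Skill Library, this component generates a skill, as code, for solving the next task.
Since it is hard to produce correct code in a single shot, the agent works toward a correct solution step by step, using code execution errors and feedback from the environment.
As the paper's overall pseudocode shows, this step is just a simple loop, but what makes it distinctive is that the result of each execution is carried over to the next iteration, forming a PDCA (plan-do-check-act) cycle.
Each iteration consists of:

Retrieving skills from the Skill Library
Generating code
Executing the generated code
Evaluating the execution result

The loop finishes as soon as the evaluation judges the result a success. If the task still has not succeeded after several rounds, the task itself is considered unsuitable and the agent moves on to the next task loop.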
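
The four steps above can be sketched as a loop that carries feedback from one round to the next. This is a hedged illustration only: `solveTask` and the injected `retrieve`, `generate`, `execute`, and `verify` callbacks are hypothetical, and the default `maxRounds = 4` is a placeholder for the retry limit.

```javascript
// One task, several rounds. Each round sees the previous round's code, logs, error, and critique.
function solveTask(task, { retrieve, generate, execute, verify }, maxRounds = 4) {
  let feedback = { code: null, chatLog: [], error: null, critique: "" };
  for (let round = 1; round <= maxRounds; round++) {
    const skills = retrieve(task, feedback);        // 1. retrieve skills
    const code = generate(task, skills, feedback);  // 2. generate code
    const result = execute(code);                   // 3. execute: { chatLog, error, inventory }
    const verdict = verify(task, result);           // 4. evaluate: { success, critique }
    if (verdict.success && !result.error) {
      return { success: true, code, rounds: round };
    }
    // Carry everything over so the next round can refine the code.
    feedback = { code, chatLog: result.chatLog, error: result.error, critique: verdict.critique };
  }
  return { success: false, code: null, rounds: maxRounds }; // give up; ask for another task
}

// Stubbed components: the first attempt fails, the second uses the critique and succeeds.
const stubs = {
  retrieve: () => [],
  generate: (task, skills, fb) => (fb.critique ? "fixed" : "buggy"),
  execute: (code) =>
    code === "fixed"
      ? { chatLog: ["Mined 1 oak_log"], error: null, inventory: { oak_log: 1 } }
      : { chatLog: ["No tree nearby"], error: null, inventory: {} },
  verify: (task, result) =>
    result.inventory.oak_log >= 1
      ? { success: true, critique: "" }
      : { success: false, critique: "Explore to find a tree first." },
};

console.log(solveTask("Mine 1 wood log", stubs));
```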

Environment Feedback

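This is the execution log history produced when a skill runs. It is implemented as the in-game Minecraft chat log, and the key point is that the code-generation prompt also asks for appropriate log messages to be emitted, so that each execution yields feedback.

[Figure: An example of a created skill. On failure it emits an informative error message, which is used to improve the code when the skill is called from another skill.]

In addition, if the code fails to execute in the first place, that is fed back as well, as an Execution error.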
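
How the two kinds of feedback might be collected can be sketched as below. The mock `bot` only implements `chat`, and `runSkill` and `craftStonePickaxe` are hypothetical names written for this illustration; a real run would pass an actual Mineflayer bot and read the chat log from the game.

```javascript
// Run a skill against a mock bot, collecting chat messages (Environment Feedback)
// and any thrown exception (Execution error).
async function runSkill(skill) {
  const chatLog = [];
  const bot = { chat: (message) => chatLog.push(message) }; // mock: records instead of sending
  let error = null;
  try {
    await skill(bot);
  } catch (e) {
    error = `${e.name}: ${e.message}`;
  }
  return { chatLog, error };
}

// A skill that reports why it could not finish instead of failing silently.
async function craftStonePickaxe(bot) {
  const cobblestone = 1; // pretend inventory count for the example
  if (cobblestone < 3) {
    bot.chat(`I cannot make stone_pickaxe because I need: ${3 - cobblestone} more cobblestone`);
    return;
  }
  bot.chat("Crafted stone_pickaxe");
}

runSkill(craftStonePickaxe).then((result) => console.log(result));
runSkill(async (bot) => bot.undefinedMethod()).then((result) => console.log(result));
```

The first call yields a chat message explaining what is missing and no error; the second yields an empty chat log and a `TypeError` string, which corresponds to the Execution error case.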

Self-Verification

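Whether the task succeeded is also judged by GPT-4. When it judges the attempt a failure, it acts like a teacher and adds a critique (what went wrong and what to try next), which is used as feedback in the next round.

[Self-Verification prompt]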

You are an assistant that assesses my progress of playing Minecraft and provides useful guidance.
You are required to evaluate if I have met the task requirements. Exceeding the task requirements is also considered a success while failing to meet them requires you to provide critique to help me improve.
I will give you the following information:
Biome: The biome after the task execution.
Time: The current time.
Nearby blocks: The surrounding blocks. These blocks are not collected yet. However, this is useful for some placing or planting tasks.
Health: My current health.
Hunger: My current hunger level. For eating task, if my hunger level is 20.0, then I successfully ate the food.
Position: My current position.
Equipment: My final equipment. For crafting tasks, I sometimes equip the crafted item.
Inventory (xx/36): My final inventory. For mining and smelting tasks, you only need to check inventory.
Chests: If the task requires me to place items in a chest, you can find chest information here.
Task: The objective I need to accomplish.
Context: The context of the task.
You should only respond in JSON format as described below:
{
    "reasoning": "reasoning",
    "success": boolean,
    "critique": "critique",
}
Ensure the response can be parsed by Python `json.loads`, e.g.: no trailing commas, no single quotes, etc.
Here are some examples:
INPUT:
Inventory (2/36): {'oak_log': 2, 'spruce_log': 2}
Task: Mine 3 wood logs
RESPONSE:
{
    "reasoning": "You need to mine 3 wood logs. You have 2 oak logs and 2 spruce logs, which add up to 4 wood logs.",
    "success": true,
    "critique": ""
}
INPUT:
Inventory (3/36): {'crafting_table': 1, 'spruce_planks': 6, 'stick': 4}
Task: Craft a wooden pickaxe
RESPONSE:
{
    "reasoning": "You have enough materials to craft a wooden pickaxe, but you didn't craft it.",
    "success": false,
    "critique": "Craft a wooden pickaxe with a crafting table using 3 spruce planks and 2 sticks."
}
...
(The remaining examples are similar and omitted.)
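
Since the critic is asked to answer in strict JSON, the caller has to parse and validate that answer before using it. The sketch below shows one defensive way to do so; `parseVerification` and its return shape are hypothetical, and treating a malformed response as "not usable" is my assumption rather than the paper's stated behaviour.

```javascript
// Parse the critic's JSON reply and check that it has the fields the loop relies on.
function parseVerification(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw); // rejects trailing commas and single quotes, like json.loads
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${err.message}` };
  }
  if (parsed === null || typeof parsed !== "object" || typeof parsed.success !== "boolean") {
    return { ok: false, reason: "missing boolean 'success' field" };
  }
  return {
    ok: true,
    success: parsed.success,
    critique: typeof parsed.critique === "string" ? parsed.critique : "",
  };
}

const reply =
  '{"reasoning": "You have enough materials but did not craft it.", ' +
  '"success": false, "critique": "Craft a wooden pickaxe with a crafting table."}';
console.log(parseVerification(reply));
console.log(parseVerification('{"success": true,}')); // trailing comma: reported as invalid JSON
```

A response that fails validation can simply be retried or counted as a failed round, so a formatting slip by the model does not crash the loop.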

The following prompt makes full use of all these techniques.

[Code generation prompt]

You are a helpful assistant that writes Mineflayer javascript code to complete any Minecraft task specified by me.
Here are some useful programs written with Mineflayer APIs.
/*
Explore until find an iron_ore, use Vec3(0, -1, 0) because iron ores are usually underground
await exploreUntil(bot, new Vec3(0, -1, 0), 60, () => {
    const iron_ore = bot.findBlock({
        matching: mcData.blocksByName["iron_ore"].id,
        maxDistance: 32,
    });
    return iron_ore;
});
Explore until find a pig, use Vec3(1, 0, 1) because pigs are usually on the surface
let pig = await exploreUntil(bot, new Vec3(1, 0, 1), 60, () => {
    const pig = bot.nearestEntity((entity) => {
        return (
            entity.name === "pig" &&
            entity.position.distanceTo(bot.entity.position) < 32
        );
    });
    return pig;
});
*/
async function exploreUntil(bot, direction, maxTime = 60, callback) {
    /*
    Implementation of this function is omitted.
    direction: Vec3, can only contain value of -1, 0 or 1
    maxTime: number, the max time for exploration
    callback: function, early stop condition, will be called each second, exploration will stop if return value is not null
    Return: null if explore timeout, otherwise return the return value of callback
    */
}
// Mine 3 cobblestone: mineBlock(bot, "stone", 3);
async function mineBlock(bot, name, count = 1) {
    const blocks = bot.findBlocks({
        matching: (block) => {
            return block.name === name;
        },
        maxDistance: 32,
        count: count,
    });
    const targets = [];
    for (let i = 0; i < Math.min(blocks.length, count); i++) {
        targets.push(bot.blockAt(blocks[i]));
    }
    await bot.collectBlock.collect(targets, { ignoreNoPath: true });
}
...
(Similar code examples continue for a while)
...
// These are other Mineflayer async functions you can use:
await bot.equip(item, destination); // Equip the item in the specified destination. `item` is `Item`, `destination` can only be "hand", "head", "torso", "legs", "feet", "off-hand"
...
(Descriptions of the available Mineflayer functions)
...
{retrieved_skills}
At each round of conversation, I will give you
Code from the last round: ...
Execution error: ...
Chat log: ...
Biome: ...
Time: ...
Nearby blocks: ...
Nearby entities (nearest to farthest):
Health: ...
Hunger: ...
Position: ...
Equipment: ...
Inventory (xx/36): ...
Chests: ...
Task: ...
Context: ...
Critique: ...
You should then respond to me with
Explain (if applicable): Are there any steps missing in your plan? Why does the code not complete the task? What does the chat log and execution error imply?
Plan: How to complete the task step by step. You should pay attention to Inventory since it tells what you have. The task completeness check is also based on your final inventory.
Code:
1) Write an async function taking the bot as the only argument.
2) Reuse the above useful programs as much as possible.
- Use `mineBlock(bot, name, count)` to collect blocks. Do not use `bot.dig` directly.
- Use `craftItem(bot, name, count)` to craft items. Do not use `bot.craft` directly.
- Use `smeltItem(bot, name count)` to smelt items. Do not use `bot.openFurnace` directly.
- Use `placeItem(bot, name, position)` to place blocks. Do not use `bot.placeBlock` directly.
- Use `killMob(bot, name, timeout)` to kill mobs. Do not use `bot.attack` directly.
3) Your function will be reused for building more complex functions. Therefore, you should make it generic and reusable. You should not make strong assumption about the inventory (as it may be changed at a later time), and therefore you should always check whether you have the required items before using them. If not, you should first collect the required items and reuse the above useful programs.
4) Functions in the "Code from the last round" section will not be saved or executed. Do not reuse functions listed there.
5) Anything defined outside a function will be ignored, define all your variables inside your functions.
6) Call `bot.chat` to show the intermediate progress.
7) Use `exploreUntil(bot, direction, maxDistance, callback)` when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1).
8) `maxDistance` should always be 32 for `bot.findBlocks` and `bot.findBlock`. Do not cheat.
9) Do not write infinite loops or recursive functions.
10) Do not use `bot.on` or `bot.once` to register event listeners. You definitely do not need them.
11) Name your function in a meaningful way (can infer the task from the name).
You should only respond in the format as described below:
RESPONSE FORMAT:
Explain: ...
Plan:
1) ...
2) ...
3) ...
...
Code:
```javascript
// helper functions (only if needed, try to avoid them)
...
// main function after the helper functions
async function yourMainFunctionName(bot) {
    // ...
}
```

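The prompt is made up of the following parts.

A large number of code examples
A list of the available Mineflayer functions
Environment information
Task
Context (Additional Context)
Code from the previous round
Feedback from the previous round
- Environment Feedback
- Execution Error
- Critique
Rules
- Reuse the provided programs as much as possible, and avoid calling the low-level Mineflayer APIs directly where a helper exists
- Generated functions will be reused later, so make them generic and useful. Do not make strong assumptions about the inventory: always check for the required items before using them, and collect whatever is missing first
- Code from the previous round is not saved, so do not reuse functions defined there
- Define all variables inside functions
- Use bot.chat() to show intermediate progress
- Do not write infinite loops or recursive functions
- Give each function a meaningful name (the task should be inferable from the name)
- Other detailed rules about how to use specific functions

With that, the prompt for code generation is complete. The generated code is added to the Skill Library, and the next task loop starts again based on the state after execution.

To recap the overall picture: the Automatic Curriculum decides the task, and the Iterative Prompting Mechanism repeatedly refines the code for that task using the feedback from the previous round.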

Evaluation

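The paper compares performance against existing LLM-agent methods such as ReAct, Reflexion, and AutoGPT. It also runs ablations to examine what happens when some of the proposed components are removed.

[Figure: Examples of generated code]

Voyager collects diverse items more efficiently than existing methods

Voyager explores a wider area than existing methods

The Automatic Curriculum plays an important role in the agent's consistent progress
Removing the Skill Library causes no large degradation at first, but the agent plateaus once tasks become more complex = confirms the usefulness of the Skill Library for complex tasks
Removing any of the feedback types noticeably degrades performance, and Self-Verification has the largest impact = it plays the most important role in the code generation step
Using GPT-3.5 for code generation noticeably degrades performance = GPT-4 shows a "quantum leap" in coding ability over GPT-3.5

As a further application, the paper suggests that giving the agent human feedback may make it possible to carry out tasks that rely on visual information.

[Figure: Providing human feedback on visual information at code-generation time lets the agent solve tasks that involve visual information]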

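Impressions

As with LATM, there are a lot of clever choices about when to use GPT-4 versus GPT-3.5. Using GPT-4 only for the complex tasks and getting by with GPT-3.5 for everything else looks likely to become standard practice
It is nice that the paper presents a general framework for the feedback loop of reusing and creating functions in code generation. Judging from the prompts, with their many caveats and per-function explanations, it does not look like something you can apply to other tasks without effort, but it seems well worth trying
In the generated code, the functions take no arguments other than bot, so, for example, a function that smelts N pieces of iron has to be implemented as a separate function for each N, such as smeltFiveRawIronV2. For real-world applications, how to generate more general functions that take arguments looks like a key point
It is fun to see general coding best practices (the DRY principle, meaningful function names, proper logging, no dependence on implicit state) show up in the prompt. The era of handing coding over entirely to AI may be close
Also, Mineflayer is amazing

Voyager's code is also public, so if you are interested, please give it a try! I am going to dig into it in more detail myself.

That's all. Happy LLM life!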