How to Run browser-use, the Much-Talked-About Library for Controlling a Browser with an LLM

January 19, 2025, 19:01

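browser-use, which lets an LLM operate a web browser, was getting attention on Twitter a little while ago and also showed up on GitHub Trending. In this post I will walk through how to run it.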

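"Browser Use" is a Python library for automating a web browser. With it you can efficiently carry out a variety of tasks, such as collecting information from websites, filling in forms, and automating complex workflows.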

The code in this post is based on the code from the page above.

First, install the browser-use library.

pip install browser-use

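Next, set your OpenAI API key. Replace sk-xxxxxxx with your own OpenAI API key. The set command below is for the Windows Command Prompt; on macOS or Linux, use export instead.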

set OPENAI_API_KEY=sk-xxxxxxx
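If the key is not set, the script only fails once the LLM client is created, so it can help to check for it up front. The following is a minimal sketch of such a check, not something from the browser-use project; the helper name require_api_key is made up for this post.

```python
from typing import Mapping


def require_api_key(env: Mapping[str, str], name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing or blank.

    `require_api_key` is a hypothetical helper for this post, not part of
    browser-use or the OpenAI SDK.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before running the script")
    return key


# Typical use at the top of a script:
#     import os
#     require_api_key(os.environ)
```

Passing the environment in as an argument, rather than reading os.environ inside the function, keeps the check easy to try out with a plain dictionary.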

Next, create the following code as test.py.

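import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # Describe the task in natural language; the agent decides which browser actions to take.
    agent = Agent(
        task="Go to Google, search for 'browser-use' in the search bar, summarize the contents and return the results",
        llm=ChatOpenAI(model="gpt-4o"),  # reads OPENAI_API_KEY from the environment
    )
    # Run the agent until it finishes the task, then print what it returns.
    result = await agent.run()
    print(result)


asyncio.run(main())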

Run test.py.

python test.py

The result looks like this.

The search term 'browser-use' yields information about a tool designed to automate browser operations using AI technologies. Notable articles from sources like DevelopersIO and GitHub highlight its capability to interface with AI agents, allowing for seamless web interactions. The tool is noted for its integration with Python and its utility in enhancing AI-driven tasks with ease of use. For more interactive experiences and comprehensive tutorials, sites like `browser-use.com` provide extensive details and user guides.

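My impression is that it feels like Microsoft Power Automate made usable from Python code. That said, being able to operate the browser in natural language is a big plus. The project is still in its early days, so I am hoping to see it develop further. Using gpt-4o cost about 0.1 USD per run, which seems like an acceptable level of cost.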
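To get a feel for how that per-run cost adds up, here is a small back-of-the-envelope sketch. It assumes a flat 0.1 USD per run, which is only my rough figure from the single run above; the real cost depends on the task, the number of steps the agent takes, and current model pricing. The function name estimated_cost_usd is made up for this post.

```python
def estimated_cost_usd(runs: int, cost_per_run_usd: float = 0.1) -> float:
    """Rough total cost, assuming every run costs the same flat amount.

    The 0.1 USD default is the approximate figure observed in this post,
    not an official price.
    """
    if runs < 0:
        raise ValueError("runs must not be negative")
    return runs * cost_per_run_usd


# Print a rough estimate for a few run counts.
for runs in (1, 10, 100, 1000):
    print(f"{runs:>5} runs: about ${estimated_cost_usd(runs):.2f}")
```

At this rate, occasional manual runs are cheap, but something that runs on a schedule many times a day is worth estimating before you leave it on.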

I have not looked into it, but a local LLM can probably be used as well, so it may be a good fit if you plan to run it with a local LLM.

