Operating a PC with an LLM – Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku

2024-10-23: AI has entered yet another new stage.
This is the stage where an LLM operates the OS.

https://www.anthropic.com/news/3-5-models-and-computer-use

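When it comes to automating OS operations, RPA is the first candidate that comes to mind. But RPA itself has no AI built in,
so building the automation is tedious and the barrier to adoption is high.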

With Claude's OS operation, by contrast,
Claude takes screenshots of the screen, analyzes them (for example, by detecting icons), and decides what to do at each step.
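
To make the "look at the screen, decide, act" cycle concrete, here is a toy sketch in shell. It is not how Claude or the official demo is implemented. `take_screenshot`, `decide_action`, and `run_agent` are hypothetical names, and both helpers are stubs: no screen is captured and no model is called. The only point is the shape of the loop, including a step limit so it always stops.

```shell
# Stub: a real agent would capture the screen here (hypothetical function).
take_screenshot() {
    echo "screenshot-$1"
}

# Stub standing in for the model: maps the step number to the next action.
# A real system would send the screenshot to the LLM and parse its reply.
decide_action() {
    case "$1" in
        1) echo "click firefox-icon" ;;
        2) echo "type ban.com japan" ;;
        *) echo "done" ;;
    esac
}

# Observe -> decide -> act, with a step limit so the loop always terminates.
run_agent() {
    max_steps="$1"
    step=1
    while [ "$step" -le "$max_steps" ]; do
        shot=$(take_screenshot "$step")
        action=$(decide_action "$step")
        echo "step $step: saw $shot, action: $action"
        # Stop as soon as the (stubbed) model reports that it is finished.
        [ "$action" = "done" ] && return 0
        step=$((step + 1))
    done
    return 1   # step limit reached without finishing
}

run_agent 5
```

The step limit matters in practice: an agent that keeps deciding "one more action" would otherwise keep running and keep consuming API calls.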

It is labeled a "public beta," so I tried it out to see how capable it actually is.

Note that the official documentation tells you to take into account the possibility that a malfunction could damage your PC or leak important information.

Computer use is a beta feature. Please be aware that computer use poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using computer use to interact with the internet. To minimize risks, consider taking precautions such as:

Use a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.

Avoid giving the model access to sensitive data, such as account login information, to prevent information theft.

Limit internet access to an allowlist of domains to reduce exposure to malicious content.

Ask a human to confirm decisions that may result in meaningful real-world consequences as well as any tasks requiring affirmative consent, such as accepting cookies, executing financial transactions, or agreeing to terms of service.

In some circumstances, Claude will follow commands found in content even if it conflicts with the user’s instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection.

Finally, please inform end users of relevant risks and obtain their consent prior to enabling computer use in your own products.

That is certainly true.

I have little doubt that, in the near future, PCs will be hacked through hijacked or tampered-with AI.
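
Of the precautions quoted above, the domain allowlist is the easiest to picture in code. Below is a minimal sketch, assuming a plain POSIX shell. `ALLOWED_DOMAINS` and `is_allowed_url` are hypothetical names, and the two domains are just the ones that appear in this post. It only checks a URL string; actually enforcing the rule would need a proxy or firewall in front of the container.

```shell
# Hypothetical allowlist; the domains are placeholders for illustration.
ALLOWED_DOMAINS="google.com ban.com"

# Return 0 if the URL's host is an allowed domain or a subdomain of one.
is_allowed_url() {
    local url="$1" host domain
    host="${url#*://}"          # strip the scheme (https://)
    host="${host%%[/?#]*}"      # strip path, query, and fragment
    host="${host##*@}"          # strip userinfo (user:pass@)
    host="${host%%:*}"          # strip the port
    host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
    for domain in $ALLOWED_DOMAINS; do
        case "$host" in
            "$domain"|*."$domain") return 0 ;;
        esac
    done
    return 1
}

if is_allowed_url "https://www.google.com/search?q=ban.com+japan"; then
    echo "allowed"
fi
```

Matching on the exact domain or a `.domain` suffix is deliberate: a plain substring match would also let through look-alike hosts such as `evil-ban.com`.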

This time I ran Docker on a Raspberry Pi 5 and checked how it behaves on a virtual OS (a container).

Anthropic Computer Use Demo

https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo

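I follow the official instructions, but since the RPi 5 itself is being operated over VNC, I also changed the VNC port mapping (host port 5901 instead of 5900).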

Replace API_KEY with your own key
Change the VNC port

export ANTHROPIC_API_KEY=%your_api_key%
docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5901:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
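
In each `-p` flag, the number on the left is the port on the host and the number on the right is the port inside the container, so `-p 5901:5900` exposes the container's VNC server on host port 5901. As a small sketch of that mapping, the helper below prints the flags with the host-side VNC port as a parameter. `build_port_args` is a hypothetical name and not part of the official quickstart.

```shell
# Hypothetical helper (not part of the official quickstart): prints the
# port-mapping flags, with the host-side VNC port passed as $1.
build_port_args() {
    local vnc_host_port="${1:-5900}"   # 5900 is the container-side VNC port
    printf -- '-p %s:5900 -p 8501:8501 -p 6080:6080 -p 8080:8080\n' "$vnc_host_port"
}

# Show the flags for the setup used here (host 5901 -> container 5900).
build_port_args 5901
```

Only the host side changes. The container still listens on 5900, so nothing inside the image needs to be modified.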

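The LLM instruction pane is on the left of the screen, and the OS desktop is on the right.

Note that moving the mouse over the OS area and clicking does nothing.
Perhaps it is set up to accept only operations from the LLM?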

Having it run a search

I gave the LLM the instruction: search for "ban.com" in japan

Firefox then launched, went to Google, and ran a search for "ban.com japan".

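The first search result was Ray-Ban, the eyewear brand.
I did not take a screenshot, but Claude clicked that link.

Apparently deciding "this is not it," it then changed the URL to "ban.com" and went there directly.

At that point it seems to have judged that the goal was achieved.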

Look up Ban.Com Inc. (株式会社バンコム) in Chuo-ku, Tokyo

I found ban.com! This is actually a Japanese IT consulting and system development company called Ban.Com (株式会社バンコム). Their website is in Japanese and shows they offer:

System Development (システム開発)
IT Consulting (ITコンサル)
AI/IOT Development (AI/IOT開発)
Blog (ブログ)
Contact/Inquiries (お問い合わせ)
The company appears to be a technology company in Japan that provides IT services, system development, and AI/IOT solutions. Would you like me to explore any particular section of their website or get more specific information about any of their services?

I have not looked into how far Microsoft's Copilot can go, but I think Claude can effectively be called the first AI to operate a PC.

I expect the progress from here to be remarkable.

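I suspect that what will matter more and more from here on is:

A virtual OS, in the sense of a sandbox environment
A local LLM for handling confidential information
A high-spec PC at hand to run both of them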

Which raises the question: will this lead to a "PC divide," a society where high-performance PC = high-performance AI = more things you can do?