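Llama 2-based LLaVA can now come up with image prompts
I used to think that making up a story from an image was insane, but I happen to love exactly that kind of insane AI, so I had been scheming to somehow make that crazy AI journey my own.
There is something close to it called LLaVA, and I heard it recently got a major upgrade to a Llama 2-based version, so I installed it on my household PC from Dospara (A6000 x2).
The commands are hardly worth writing out, but here they are: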
$ git clone https://github.com/haotian-liu/LLaVA.git
$ cd LLaVA
$ pip install -e .
$ python -m llava.serve.cli \
--model-path liuhaotian/llava-llama-2-13b-chat-lightning-preview \
--image-file "https://llava-vl.github.io/static/images/view.jpg" \
--load-4bit
That is all it takes. Use Python 3.10 or later.
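If you would rather launch this from a script, here is a minimal sketch that checks the Python version and assembles the same command line. The helper names `python_is_new_enough` and `build_llava_cli_command` are my own and are not part of LLaVA. The only flags used are the three shown above.

```python
import sys

# Minimum version mentioned in the text above.
MIN_PYTHON = (3, 10)


def python_is_new_enough(version_info=sys.version_info):
    """Return True if the given interpreter version is 3.10 or later."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def build_llava_cli_command(model_path, image_file, load_4bit=True):
    """Assemble the argument list for `python -m llava.serve.cli`.

    Hypothetical helper: it only mirrors the flags shown in the shell
    command above and knows nothing else about the CLI.
    """
    cmd = [
        sys.executable, "-m", "llava.serve.cli",
        "--model-path", model_path,
        "--image-file", image_file,
    ]
    if load_4bit:
        cmd.append("--load-4bit")
    return cmd


if __name__ == "__main__":
    if not python_is_new_enough():
        raise SystemExit("Python 3.10 or later is required")
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) if you actually want to start the session.
    print(" ".join(build_llava_cli_command(
        "liuhaotian/llava-llama-2-13b-chat-lightning-preview",
        "https://llava-vl.github.io/static/images/view.jpg",
    )))
```

Building the arguments as a list rather than one string avoids shell quoting problems with the image URL.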
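Running the command downloads everything automatically, and a prompt appears on its own.
By the way, the image I show it first is the default one.
Then I told it to "come up with a story based on this picture," and this is what I got.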
USER: Make fictional story from this image for adult audiences.
ASSISTANT: In the image, there is a wooden dock extending out into a calm lake, surrounded by a serene forest. The scene is peaceful and inviting. One day, a man named Jack decided to take a break from his busy life and spend some time in nature. He packed his belongings and set out on a journey to find a secluded spot to relax and rejuvenate. After a long hike through the forest, he finally arrived at the dock, which offered a perfect vantage point to enjoy the tranquility of the lake.
As he sat on the dock, Jack noticed a boat floating in the distance. He felt a sense of adventure and decided to explore the lake by boat. He rented a small motorboat and set off into the water, feeling the cool breeze on his face and the sun shining on his skin. As he navigated the lake, he discovered hidden coves, secret waterfalls, and even a small island where he could spend the night under the stars.
The experience was transformative for Jack, as he found solace in the natural beauty of the lake and the forest. He realized that sometimes, taking a step back from the hustle and bustle of everyday life can lead to unexpected discoveries and a deeper connection with oneself and the environment. Jack returned home with a renewed sense of purpose and a deeper appreciation for the wonders of the natural world.
USER: create the prompt of next scene for stable diffusion
ASSISTANT: In the next scene, Jack is back at the dock, reflecting on his adventures and the lessons he learned during his time on the lake. He realizes that the serenity of nature has given him a new perspective on life and encourages him to continue exploring and appreciating the beauty of the world around him. As he sits on the dock, he contemplates his future plans and decides to share his experiences with others, inspiring them to find their own connection with nature and discover the transformative power of solitude and adventure.
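Amazing. I don't really get it, but it is somehow amazing.
Incidentally, the recent Llama 2 has apparently been trained on fairly recent topics too, so I wondered whether it could also do something like the Stable Diffusion prompt hacking that has been hugely popular since last year, and I asked.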
USER: show prompt for stablediffusion about this image
ASSISTANT: In the image, a wooden dock extends out into a serene lake, surrounded by a peaceful forest. The scene evokes a sense of tranquility and adventure, as it invites viewers to imagine themselves exploring the lake and discovering its hidden wonders.
The prompt for stable diffusion could be:
"Imagine yourself on this dock, surrounded by the calm waters and the soothing forest. What would you do next? Would you rent a boat and explore the lake, or would you simply sit and enjoy the peaceful atmosphere? How would this experience inspire you to appreciate the beauty of nature and the transformative power of solitude and adventure?"
That looked plausible enough, so I pasted the double-quoted part into Stable Diffusion (using Memeplex).
This alone might make me die laughing.
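I pulled out the double-quoted part by hand, but it could also be scripted. The following is a minimal sketch that assumes the reply uses straight double quotes, as in the transcript above. The function names are hypothetical, and it does not talk to Stable Diffusion or Memeplex at all.

```python
import re


def extract_quoted(text):
    """Return every segment enclosed in straight double quotes, in order."""
    return [segment.strip() for segment in re.findall(r'"([^"]+)"', text)]


def longest_quoted(text):
    """Return the longest quoted segment, or None if nothing is quoted.

    Assumption: the longest quoted span in the reply is the prompt.
    """
    segments = extract_quoted(text)
    return max(segments, key=len) if segments else None


if __name__ == "__main__":
    reply = (
        "The prompt for stable diffusion could be:\n"
        '"Imagine yourself on this dock, surrounded by the calm waters '
        'and the soothing forest."'
    )
    print(longest_quoted(reply))
```

Taking the longest quoted span is only a heuristic. If the model quotes something else at greater length, the result will need a manual check.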