
Hi there! Let me explain "RAG", an amazing technique that makes AI smarter! (From the Cohere Cookbook sample code)


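I found a sample notebook for RAG (Retrieval-Augmented Generation) in the Cohere Cookbook (notebooks) and asked Gemini 1.5 Pro (Google AI Studio) to rework its explanation for high school students.
I did not use Gemini's answers exactly as they came out; I changed and added to them in places.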

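How AI gets smarter: what is RAG?

Hi everyone! Today I'm going to explain "RAG", an amazing technique that makes AI smarter! RAG is short for "Retrieval-Augmented Generation", which means roughly "generation strengthened by information retrieval".

With RAG, the AI doesn't rely only on the knowledge it already has. It searches through lots of documents for the information it needs and uses that to generate answers that are more accurate and better suited to the situation.

For example, say you ask the AI, "Who directed the movie Dune: Part Two?" With RAG, the AI can look through the Wikipedia article on Dune: Part Two, find the part where the director's name is written, and answer, "It's Denis Villeneuve."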

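What makes RAG so great!

The RAG offered by a company called Cohere has an even more impressive feature: it shows you precise citations!

It tells you which part of the source the answer it found on Wikipedia comes from. That means you can check for yourself whether the AI's answer is true.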

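How does RAG work?

RAG works in three broad steps.

  1. Indexing and retrieval: First, the AI splits lots of documents into small pieces and organizes them by giving each piece a number called an "index". These small pieces of text are called chunks. Then, when you ask a question, the AI uses the index to quickly find the pieces that look related to your question.

  2. Ranking: The AI takes a closer look at the many pieces it found and reorders them so that the ones that best match the question come first. This is called reranking.

  3. Answer generation: The AI generates the final answer from the ranked information. Cohere's RAG can also tell you, as citations, which part of which document the answer is based on.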

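Example: the crew behind Dune: Part Two

OK, let's see how RAG actually works, using the Wikipedia article on the movie Dune: Part Two as an example!

Step 0: Getting ready

First, we set up the tools we need using a programming language called Python. To use the AI from a company called Cohere, you need an API key, which is like a special password.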

# We'll use the AI from a company called Cohere
# TODO: requires "cohere>5"
!pip install cohere
import cohere
API_KEY = "..."  # Enter your Cohere API key
co = cohere.Client(API_KEY)
# Tool for fetching Wikipedia articles
!pip install wikipedia --quiet
import wikipedia
# Get the Wikipedia article on Dune: Part Two
article = wikipedia.page('Dune Part Two')
text = article.content
print(f"The Wikipedia article is about {len(text.split())} words long.")
The Wikipedia article is about 5801 words long.

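Step 1: Indexing and retrieval

We split the Wikipedia article into chunks of about 512 characters each (the code measures length with len, so it counts characters, not words) and convert each chunk into a special list of numbers (a vector) that the AI can understand. This is called an "embedding". Then we store these embeddings in a special kind of dictionary called a "vector database".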

# We use a tool called langchain to split the text into chunks
!pip install -qU langchain-text-splitters --quiet
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Configure how the text is split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,       # maximum chunk length, in characters
    chunk_overlap=50,     # characters shared between neighboring chunks
    length_function=len,  # measure length by counting characters
    is_separator_regex=False,
)

# Split into chunks, allowing some overlap between them
chunks_ = text_splitter.create_documents([text])
chunks = [c.page_content for c in chunks_]
print(f"The Wikipedia article was split into {len(chunks)} chunks.")
The Wikipedia article was split into 110 chunks.
# Convert each chunk into an embedding
model = "embed-english-v3.0"
response = co.embed(
    texts=chunks,
    model=model,
    input_type="search_document",
    embedding_types=['float']
)
embeddings = response.embeddings.float
print(f"Computed {len(embeddings)} embeddings.")
Computed 110 embeddings.
# Store the embeddings in the vector database
!pip install numpy --quiet
import numpy as np
vector_database = {i: np.array(embedding) for i, embedding in enumerate(embeddings)}
# { 0: array([...]), 1: array([...]), 2: array([...]), ..., 109: array([...]) }
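
If "splitting with overlap" is hard to picture, here is a minimal sketch in plain Python. It is a simplification and not how langchain does it internally: RecursiveCharacterTextSplitter also tries to cut at paragraph and sentence boundaries, which this version ignores. The function name split_with_overlap and the sample text are made up for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into chunks of at most chunk_size characters.

    Neighboring chunks share chunk_overlap characters, so a sentence
    that falls on a boundary is not lost completely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical sample text
sample_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Notice that "hij" appears at the end of the first chunk and again at the start of the second. That shared part is the overlap.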

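Step 2: Embedding the question and searching

When you ask, "Tell me the names of everyone involved in writing, directing, and producing Dune: Part Two", the AI first converts this question into an embedding. Then it finds the 10 chunks in the vector database whose embeddings are closest to the question's embedding.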

# Japanese version of the query (not used):
#query = "「デューン パート2」の脚本、監督、製作に関わった全員の名前を挙げてください。"
# The vector database holds data that was originally English text, so the query is kept in English here
query = "Name everyone involved in writing the script, directing, and producing 'Dune: Part Two'?"

# Convert the question into an embedding
response = co.embed(
    texts=[query],
    model=model,
    input_type="search_query",
    embedding_types=['float']
)
query_embedding = response.embeddings.float[0]
print("query_embedding: ", query_embedding)

# Find the 10 chunks closest to the question
def cosine_similarity(a, b):
    # 1.0 means the two vectors point the same way; values near 0 mean they are unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

similarities = [cosine_similarity(query_embedding, chunk) for chunk in embeddings]
print("similarity scores: ", similarities)

# argsort sorts in ascending order, so reverse it to put the highest scores first
sorted_indices = np.argsort(similarities)[::-1]
top_indices = sorted_indices[:10]
print("Here are the indices of the top 10 chunks after retrieval:", top_indices)

top_chunks_after_retrieval = [chunks[i] for i in top_indices]
print("Here are the top 10 chunks after retrieval:")
for t in top_chunks_after_retrieval:
    print("== " + t)
query_embedding:  [-0.068603516, -0.02947998, -0.06274414, -0.015449524, -0.033294678, 0.0056877136, -0.047210693, 0.04714966, -0.024871826, 0.008148193, 0.0770874, 0.023880005, -0.058685303, -0.052520752, 0.012832642, 0.024398804, 0.0053215027, 0.035491943, 0.02961731, -0.0069847107, 0.01083374, -0.0011358261, -0.002199173, 0.018417358, 0.027389526, -0.002691269, -0.026535034, 0.015197754, 0.024368286, 0.03729248, 0.0057754517, -0.02229309, -0.014694214, 0.019989014, -0.0036315918, -0.013793945, 0.02835083, 0.006011963, 0.011428833, 0.008682251, 0.046142578, -0.040039062, -0.032196045, -0.002653122, -0.012580872, -0.0041618347, 0.03111267, -0.016799927, 0.014801025, -0.00030636787, -0.033050537, 0.033966064, -0.016021729, -0.025009155, -0.007534027, -0.017074585, 0.008415222, -0.10620117, 0.019195557, -0.015686035, -0.0043182373, -0.045440674, 0.05404663, 0.030776978, -0.014129639, -0.01499939, -0.007286072, 0.009933472, 0.06390381, 0.02444458, -0.010345459, 0.041931152, 0.032989502, -0.04522705, 0.056610107, 0.0068893433, -0.008911133, 0.012489319, 0.01675415, 0.020065308, 0.018753052, 0.022659302, -0.051849365, -0.04925537, 0.046325684, -0.005268097, 0.0026874542, -0.036712646, 0.009437561, -0.0037841797, -0.01473999, -0.034179688, -0.0011606216, 0.05026245, 0.0020771027, -0.016021729, -0.0044898987, 0.04168701, -0.015205383, 0.019210815, -0.012374878, -0.031311035, 0.03111267, -0.040100098, -0.016479492, 0.020446777, 0.010192871, 0.0037841797, -0.0023765564, 0.015220642, -0.016571045, -0.006454468, 0.037384033, -0.044555664, -0.008262634, 0.019546509, 0.009460449, 0.014701843, 0.02658081, -0.02078247, 0.015571594, 0.013153076, -0.010375977, 0.047912598, 0.005393982, -0.007911682, -0.019378662, 0.023529053, -0.0033550262, -0.04598999, -0.0052871704, 0.040252686, 0.011375427, 0.01550293, -0.004508972, 0.006515503, 0.003370285, -0.022766113, 0.00062561035, -0.0007596016, -0.0015277863, 0.0149002075, 0.061401367, 8.261204e-05, 0.06359863, -0.01537323, 0.007446289, 
0.018814087, 0.02507019, 0.024215698, 0.006122589, 0.005886078, -0.03829956, 0.029037476, 0.07720947, 0.016921997, 0.022109985, 0.005958557, 0.028793335, 0.019485474, 0.015174866, 0.026153564, 0.032318115, 0.034210205, 0.027145386, -0.019515991, -0.018661499, 0.020477295, 0.008598328, -0.06573486, -0.037109375, 0.04043579, 0.030471802, -0.0010843277, 0.009757996, 0.026947021, 0.037017822, -0.018234253, -0.0115356445, 0.099365234, 0.027816772, -0.019927979, 0.0020961761, 0.013198853, -0.019073486, 2.7656555e-05, 0.041259766, 0.029510498, -0.016204834, 0.028137207, 0.039489746, 0.034698486, -0.03918457, -0.029418945, 0.02041626, 0.0073432922, -0.018569946, -0.009849548, 0.002861023, 0.030319214, -0.012886047, 0.014671326, -0.035827637, 0.007247925, -0.027709961, -0.022079468, 0.0012960434, 0.015426636, -0.01725769, 0.01525116, 0.025360107, -0.0077400208, -0.039916992, 0.029037476, -0.011154175, 0.007736206, -0.041748047, 0.05343628, 0.007286072, 0.0435791, 0.034301758, -0.047210693, 0.03552246, -0.015327454, 0.029922485, -0.018859863, 0.013053894, -0.028060913, 0.07757568, -0.020462036, 0.070739746, -0.010223389, 0.03604126, 0.02758789, -0.023284912, 0.012184143, 0.029144287, 0.023880005, -0.019378662, -0.0051116943, 0.0048675537, 0.01864624, -0.04397583, -0.007598877, 0.0713501, 0.0115737915, 0.002922058, 0.011619568, 0.017364502, 0.031921387, -0.0019664764, -0.008575439, 0.003484726, -0.09466553, 0.03475952, 0.026611328, -0.039520264, -0.0104522705, -0.005443573, -0.008392334, 0.012908936, 0.0043792725, -0.002456665, -0.028396606, -0.02027893, -0.0005569458, 0.027786255, 0.03427124, -0.0062332153, -0.018203735, 0.019241333, 0.07244873, -0.0028057098, 0.01234436, -0.0018787384, -0.027496338, 0.0015287399, -0.004032135, -0.013748169, -0.01878357, 0.0018053055, -0.01159668, 0.028213501, 0.004776001, 0.042388916, 0.0024280548, 0.017471313, -0.038085938, 0.026321411, 0.02973938, 0.06213379, 0.006401062, 0.036102295, -0.028121948, -0.00869751, -0.016693115, 0.029190063, 
0.016784668, -0.008628845, 0.0039634705, -0.0035381317, 0.019500732, 0.025009155, -0.04547119, -0.003572464, 0.05215454, 0.067871094, -0.04257202, -0.02293396, -0.027175903, 0.05340576, 0.019226074, 0.039978027, 0.056121826, -0.028320312, -0.020217896, -0.035003662, 0.03225708, 0.028656006, 0.062347412, 0.12915039, -0.0137786865, 0.0022201538, -0.057434082, -0.04397583, -0.049865723, -0.013160706, -0.03353882, 0.006427765, -0.014823914, -0.008201599, -0.036346436, -0.037353516, -0.010528564, -0.015930176, -0.027572632, 0.0074272156, 0.004547119, -0.024414062, -0.018859863, -0.020095825, 0.029632568, -0.00067043304, -0.044036865, -0.0043411255, -0.005256653, -0.019195557, 0.022262573, -0.00020956993, -0.013877869, -0.011108398, -0.020324707, -0.015808105, -0.025039673, -0.009498596, 0.05090332, 0.0046195984, -0.017150879, 0.04309082, -0.029067993, 0.002670288, -0.00026249886, -0.032409668, -0.053100586, 0.012481689, -0.014633179, 0.0013475418, -0.034332275, 0.038330078, 0.014892578, -0.046936035, 0.021591187, -0.020385742, -0.0052604675, 0.02796936, 0.0014333725, 0.012077332, -0.0118255615, -0.005569458, 0.008491516, 0.009841919, 0.0031318665, -0.003408432, -0.007144928, 0.040374756, -0.0038928986, 0.005279541, -0.008415222, 0.031707764, 0.0140686035, -0.015029907, -0.02810669, -0.0078125, -0.030853271, -0.03201294, 0.021316528, -0.036193848, -0.0423584, 0.0072784424, 0.014801025, 0.0019607544, -0.012367249, -0.009056091, -0.021438599, -0.02645874, 0.038726807, -0.007549286, 0.0049591064, 0.019012451, 0.017791748, -0.009185791, 0.04006958, 0.003107071, -0.0075302124, -0.010375977, -0.009246826, -0.02130127, -0.0056762695, -0.0076789856, 0.010009766, -0.010536194, 0.041107178, 0.0021133423, 0.029891968, 0.01626587, 0.042236328, -0.02784729, -0.032836914, 0.0317688, 0.045715332, 0.000116825104, 0.028030396, 0.007205963, 0.012512207, -0.035583496, -0.048034668, -0.023529053, -0.04953003, 0.0345459, -0.048339844, -0.060272217, -0.004512787, 0.04425049, 0.0076141357, 
0.029510498, 0.007396698, 0.003353119, -0.038726807, 0.07183838, -0.026901245, -0.023529053, -0.038085938, 0.068725586, 0.018096924, -0.013534546, 0.05883789, -0.016113281, 0.017944336, 0.041046143, 0.022918701, 0.036499023, 0.015296936, -0.04916382, 0.0075683594, -0.011390686, 0.009735107, -0.0070152283, 0.003129959, -0.032562256, 0.0003478527, -0.0036640167, -0.006893158, -0.016098022, -0.034332275, 0.037750244, -0.010269165, 0.016494751, -0.02394104, 0.03753662, -0.022644043, -0.0008234978, 0.001001358, -0.048217773, 0.04989624, 0.0078125, 0.0044937134, 0.027038574, 0.04736328, -0.02973938, -0.011726379, 0.01348114, 0.021408081, 0.00844574, -0.03741455, -0.015686035, -0.040893555, 0.001452446, -0.025405884, 0.07348633, 0.038238525, -0.019958496, 0.023071289, -0.016403198, -0.08105469, 0.0071029663, -0.019088745, 5.8174133e-05, -0.005569458, 0.01399231, 0.02255249, 0.011222839, 0.00028824806, 0.0066184998, 0.0017499924, -0.009864807, -0.0115737915, 0.053100586, 0.0065231323, 0.001865387, -0.026428223, 0.03692627, 0.025390625, 0.022613525, 0.018722534, 0.007675171, -0.03439331, 0.041625977, -0.01789856, -0.041046143, 0.0051460266, 0.04144287, 0.048553467, 0.054595947, -0.01108551, -0.033935547, -0.026275635, -0.0118255615, -0.021362305, -0.009841919, -0.00724411, 0.028900146, 0.009887695, -0.023803711, 0.016311646, 0.018798828, -0.03668213, 0.046844482, 0.010696411, -0.014717102, -0.008110046, -0.004589081, -0.0028076172, -0.050811768, -0.017196655, -0.03491211, 0.0074005127, -0.038909912, 0.032440186, -0.034362793, -0.008682251, 0.032928467, -0.04626465, -0.009666443, 0.018951416, 0.031951904, -0.003791809, 0.02015686, -0.05532837, -0.005683899, -0.00054216385, -0.0034332275, 0.008659363, 0.02130127, -0.038879395, -0.0033397675, -0.03866577, -0.0049934387, 0.017944336, 0.001496315, 0.019485474, -0.004348755, 0.00046491623, 0.0007157326, 0.035614014, -0.027694702, 0.03692627, -0.008491516, 0.0524292, -0.016662598, -0.0017795563, -0.021575928, -0.018753052, 
-0.049346924, -0.06652832, 0.04272461, 0.03186035, 0.0011978149, 0.03463745, 0.024002075, 0.02607727, 0.020446777, 0.0256958, 0.026855469, 0.0074005127, -0.067993164, 0.017944336, -0.0039482117, 0.05496216, -0.041412354, 0.014175415, 0.02444458, -0.026412964, 0.057403564, -0.026779175, 0.023254395, 0.03945923, 0.033569336, -0.030258179, -0.039093018, -0.036468506, 0.017105103, 0.009635925, 0.025497437, 0.04156494, -0.02571106, -0.0010414124, -0.005630493, -0.016448975, -0.026733398, 0.001326561, -0.042022705, 0.0012521744, -0.041259766, -0.12182617, -0.03857422, 0.12548828, -0.005947113, -0.020736694, -0.0033855438, 0.03778076, -0.033813477, 0.038970947, 0.003921509, 0.011810303, 0.031982422, -0.032562256, -0.002653122, -0.025009155, -0.03805542, -0.016998291, 0.018173218, 0.0158844, 0.0011739731, 0.048217773, -0.020401001, 0.044708252, -0.017318726, 0.014457703, -0.041809082, 0.010543823, 0.041931152, 0.076293945, -0.054779053, 0.060272217, -0.046936035, 0.02949524, 0.00554657, 0.041534424, -0.013046265, -0.056152344, 0.010406494, 0.02973938, -0.023727417, -0.022476196, -0.024734497, -0.013168335, 0.060424805, 0.011787415, 0.018997192, -0.043426514, -0.00077724457, -0.010154724, 0.017150879, -0.01171875, -0.022476196, 0.0034255981, -0.0026454926, 0.004837036, -0.0043296814, 0.02619934, -0.021560669, -0.039733887, -0.022415161, -0.06817627, -0.023223877, -0.018585205, -0.015319824, 0.012588501, 0.0064353943, -0.013748169, 0.043304443, 0.002626419, -0.029373169, -0.016784668, -0.026184082, 0.05847168, 0.034179688, 0.03842163, -0.05493164, -0.017486572, 0.016540527, 0.03164673, 0.089904785, 0.013534546, -0.07684326, -0.024108887, 0.07434082, 0.030395508, 0.007091522, 0.07373047, 0.012527466, -0.010856628, -0.01828003, -0.045196533, 0.00065279007, -0.0637207, 0.010726929, 0.023880005, -0.0030708313, -0.012298584, 0.027236938, -0.04928589, 0.023071289, 0.008674622, -0.023529053, -0.015838623, -0.010543823, 0.012168884, 0.014854431, -0.05834961, -0.06088257, 
-0.012313843, 0.035461426, 0.02027893, 0.019348145, -0.014602661, -0.02104187, -0.0309906, 0.001405716, -0.019973755, -0.00157547, -0.003944397, 0.0009326935, -0.02078247, -0.015731812, -0.044433594, 0.03390503, 0.057159424, 0.018585205, -0.023895264, -0.0057029724, 0.0049552917, 0.013412476, 0.022399902, 0.010154724, 0.0519104, 0.06591797, 0.018341064, 0.012161255, -0.05810547, -0.043304443, -0.031173706, 0.0023860931, -0.003944397, 0.11425781, -0.031036377, 0.019989014, -0.038635254, -0.025939941, 0.035064697, 0.041168213, 0.03161621, -0.069885254, -0.04537964, 0.028945923, -0.023162842, 0.019226074, -0.028442383, 0.015594482, -0.019256592, -0.0046463013, 0.034240723, 0.009124756, 0.05718994, 0.031219482, 0.02154541, 0.009590149, 0.00076818466, 0.04849243, -0.029129028, -0.03375244, -0.023391724, -0.028381348, -0.029708862, -0.0132369995, 0.010353088, 0.020263672, -0.030807495, 0.01007843, -0.03704834, 0.023376465, -0.03665161, 0.03741455, 0.015144348, 0.057281494, 0.03137207, 0.048431396, 0.021194458, 0.008110046, -0.03540039, -0.015312195, 0.022384644, 0.0065956116, 0.008056641, 0.0018348694, -0.009246826, 0.030380249, 0.0003862381, 0.0051841736, 0.04486084, 0.017807007, 0.0026130676, 0.07977295, 0.05419922, 0.062194824, 0.02633667, 0.024841309, -0.041625977, -0.005897522, 0.04031372, -0.055908203, 0.0026226044, -0.05340576, -0.05496216, 0.011474609, -0.006954193, -0.013122559, 0.019714355, -0.07159424, 0.031173706, 0.0034255981, -0.0034103394, 0.0440979, 0.011779785, -0.007827759, -0.03173828, -0.020950317, -0.030166626, -0.035308838, 0.030792236, 0.04525757, -0.028701782, -0.011100769, -0.02331543, -0.0357666, -0.025680542, 0.0011911392, 0.01940918, 0.05706787, 0.028381348, 0.007133484, -0.07733154, -0.007686615, 0.03869629, 0.0066833496, 0.008842468, 0.03439331, -0.014282227, 0.0357666, -0.004737854, -0.039794922, -0.0070381165, 0.02670288, 0.0107421875, 0.016189575, -0.06555176, -0.0138549805, 0.0008363724, -0.016693115, 0.006904602, -0.020263672, 
-0.030426025, 0.008453369, -0.046173096, -0.01802063, -0.013595581, -0.0044288635, -0.0039978027, -0.0044898987, 0.0007619858, 0.003921509, 0.0053977966, 0.020385742, -0.012329102, -0.023803711, -0.0057525635, 0.038330078, -0.014549255, -0.06298828, -0.047607422, 0.039245605, -0.06781006, -0.035217285, -0.009056091, 0.019927979, -0.003932953, -0.020309448, -0.017044067, 0.018127441, -8.624792e-05, -0.043182373, 0.009590149, 0.035308838, 0.031951904, 0.0011615753, -0.042022705, 0.079956055, 0.026687622, 0.013542175, -0.0074157715, -0.00983429, -0.0022563934, 0.07373047, 0.059387207, 0.03488159, 0.0071372986, -0.06427002, -0.0546875, -0.02482605, 0.11071777, -0.021072388, 0.01626587, -0.049713135, 0.061553955, -0.016860962, 0.051971436, -0.012962341, -0.0011711121, -0.014198303, -0.0061149597, -0.005836487, 0.00022387505, -0.027618408, 0.019836426, 0.009933472, 0.02368164, -0.020309448, -0.0049591064, -0.008628845, -0.03253174, -0.017684937, 0.02468872, -0.0023498535, 0.01448822, 0.061920166, 0.031707764, -0.0026416779, -0.040985107, -0.06335449, -0.036071777, 0.05404663, -0.0044136047, -0.0146102905, -0.0033416748, 0.028671265, -0.012771606, -0.0016565323, -0.0038909912, -0.02407837, -0.009857178, 0.0014467239, -0.008720398, -0.006011963, 0.032073975, -0.033325195, 0.014862061, -0.017227173, -0.018753052, -0.0060424805, 0.022567749, -0.017654419, -0.017562866, -0.07244873, -0.0881958, 0.050476074, 0.02609253, -0.032409668, 0.07458496, 0.009399414, 0.009117126, -0.031051636, -0.03451538, -0.004219055, -0.05718994, 0.020080566, -0.025421143, -0.010948181, 0.06341553, -0.009231567, -0.021697998, -0.009719849, 0.012802124, -0.020370483, 0.0034389496, 0.018859863, -0.025680542, 0.0013141632, 0.068603516, -0.021026611, 0.021881104, -0.0395813, -0.0019073486, 0.0056037903, -0.032348633]
similarity scores:  [0.6888614728034467, 0.38807630598882886, 0.6864932971056473, 0.3145181964299242, 0.442078683062769, 0.24433414846393067, 0.3938488673501403, 0.26937667153512507, 0.324877362611786, 0.29945251635232917, 0.4268562640029571, 0.1650776631876737, 0.3955135926305946, 0.3608076873975517, 0.40399861875637233, 0.32108531352844805, 0.3078635418092434, 0.3115117311383686, 0.2318405340110581, 0.497200723722872, 0.3429814421538967, 0.2880879146891748, 0.5778721109218494, 0.5488335331337804, 0.779218253147028, 0.5225242639743739, 0.548605934137845, 0.7138791220914995, 0.5218311098897219, 0.5883875559111212, 0.2652907259202192, 0.6407981509288144, 0.5369446831119411, 0.6821327220127418, 0.3900514608549773, 0.4824043505431881, 0.4509175921745556, 0.24611453192320207, 0.4437520861018922, 0.3915032292959836, 0.20537145812198718, 0.43674179465662816, 0.3752260732974342, 0.46209230119742634, 0.30051869114968277, 0.36398118915824396, 0.3599629755871494, 0.39313887429683014, 0.23078160813132376, 0.46469915300898595, 0.1192104345902549, 0.4219441113173397, 0.3657339461780988, 0.26723960455954543, 0.36590373400047865, 0.3793129380955623, 0.4597805153285417, 0.4386963427163813, 0.20099276101769917, 0.42205595858589023, 0.4389270497287293, 0.44997600356054845, 0.12989595058301706, 0.382485967835739, 0.14284631817675886, 0.5381588054138154, 0.2993911919256643, 0.41184704224028057, 0.15718926471912928, 0.5036874368267206, 0.34130706533750865, 0.45205644243417215, 0.35910133477313283, 0.39084148293414417, 0.4884912011906411, 0.29571146420161115, 0.48806818576454597, 0.26011552582390285, 0.48287266346370283, 0.18746259491912476, 0.5302934235121479, 0.5187959402453634, 0.548833586546152, 0.4982511774377252, 0.19678164222410616, 0.42090953935525727, 0.5109476510730934, 0.09576542033221576, 0.28658537396864425, 0.3161090940252121, 0.21741811696156443, 0.13498921419105983, 0.4601385378950616, 0.14075267420356233, 0.2057731158935257, 0.5826005548700579, 0.37149780980661185, 
0.5832486106225133, 0.39360996081741395, 0.38038712529345053, 0.2876443697171133, 0.31229632610373737, 0.3484125556600515, 0.5088608726238131, 0.141571860393973, 0.46373321556972713, 0.4444572303615444, 0.29827035623029086, 0.5114788183351054, 0.4100429283012361]
Here are the indices of the top 10 chunks after retrieval: [24 27  0  2 33 31 29 97 95 22]
Here are the top 10 chunks after retrieval:
== Dune: Part Two was produced by Villeneuve, Mary Parent, and Cale Boyter, with Tanya Lapointe, Brian Herbert, Byron Merritt, Kim Herbert, Thomas Tull, Jon Spaihts, Richard P. Rubinstein, John Harrison, and Herbert W. Gain serving as executive producers and Kevin J. Anderson as creative consultant. Legendary CEO Joshua Grode confirmed in April 2019 that they plan to make a sequel, adding that "there's a logical place to stop the [first] movie before the book is over".
== On October 26, 2021, Legendary officially greenlit Dune: Part Two, with a spokesperson for the company stating, "We would not have gotten to this point without the extraordinary vision of Denis and the amazing work of his talented crew, the writers, our stellar cast, our partners at Warner Bros., and of course the fans! Here's to more Dune." Production work had occurred back-to-back with the first film, as Villeneuve and his wife Lapointe immediately took a flight to Budapest in order to begin
== Dune: Part Two is a 2024 American epic science fiction film directed and produced by Denis Villeneuve, who co-wrote the screenplay with Jon Spaihts. The sequel to Dune (2021), it is the second of a two-part adaptation of the 1965 novel Dune by Frank Herbert. It follows Paul Atreides as he unites with the Fremen people of the desert planet Arrakis to wage war against House Harkonnen. Timothée Chalamet, Zendaya, Rebecca Ferguson, Josh Brolin, Dave Bautista, Stellan Skarsgård, Charlotte Rampling, and Javier
== Development began after Legendary Entertainment acquired film and television rights for the Dune franchise in 2016. Villeneuve signed on as director in 2017, intending to make a two-part adaptation of the novel due to its complexity. Production contracts were only secured for the first film, with the second film having to be greenlit based on the first's success. After the critical and commercial success of the first film, Legendary green-lit Dune: Part Two in October 2021. Principal photography took place
== Between the release of Dune and the confirmation of Dune: Part Two, Villeneuve started working the script in a way that production could begin immediately once the film was greenlit. By February 2021, Roth created a full treatment for the sequel, with writing beginning that August. He confirmed that Feyd-Rautha would appear in the film, and stated he will be a "very important character". In March 2022, Villeneuve had mostly finished writing the screenplay. Craig Mazin and Roth wrote additional literary
== Eric Roth was hired to co-write the screenplay in April 2017 for the Dune films, and Jon Spaihts was later confirmed to be co-writing the script alongside Roth and Villeneuve. Game of Thrones language creator David Peterson was confirmed to be developing languages for the film in April 2019. Villeneuve and Peterson had created the Chakobsa language, which was used by actors on set. In November 2019, Spaihts stepped down as show-runner for Dune: Prophecy to focus on Dune: Part Two. In June 2020, Greig
== theatrical experience is at the very heart of the cinematic language for me". With Dune: Part Two being greenlit, Villeneuve said that his primary concern was to complete the filming as soon as possible, with the earliest he expected to start in the last quarter of 2022. He noted that production would be expedited by the work already done for the first film.
== Richard Roeper, writing for the Chicago Sun-Times, gave the film three stars out of four, praising the technical and narrative aspects, saying, "Even as we marvel at the stunning and immersive and Oscar-level cinematography, editing, score, visual effects, production design and sound in Denis Villeneuve's Dune: Part Two, we're reminded at every turn that this is an absolutely bat-bleep [sic] crazy story."
== The film "largely received rave reviews from critics", and was praised for its visual effects and cast performances. Some reviews considered it one of the best science fiction films of all time. On the review aggregator website Rotten Tomatoes, 92% of 428 critics' reviews are positive, with an average rating of 8.3/10. The website's consensus reads: "Visually thrilling and narratively epic, Dune: Part Two continues Denis Villeneuve's adaptation of the beloved sci-fi series in spectacular form." Metacritic,
== In November 2016, Legendary Pictures obtained the film and TV rights for the Dune franchise, based on the eponymous 1965 novel by Frank Herbert. Vice chair of worldwide production for Legendary Mary Parent began discussing with Denis Villeneuve about directing a film adaptation, quickly hiring him after realizing his passion for Dune. In February 2018, Villeneuve was confirmed to be hired as director, and intended to adapt the novel as a two-part film series. Villeneuve ultimately secured a two-film deal
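
Real embeddings have far too many numbers to follow by eye, so here is a small sketch of the same "closest chunks" idea using 2-dimensional vectors and only Python's standard library. The vectors and chunk names are made up for illustration; they are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


toy_query = [1.0, 0.0]
toy_chunks = {
    "chunk A": [2.0, 0.0],   # same direction as the query
    "chunk B": [1.0, 1.0],   # 45 degrees away
    "chunk C": [0.0, 3.0],   # at a right angle (unrelated)
    "chunk D": [-1.0, 0.0],  # opposite direction
}

scores = {name: cosine_similarity(toy_query, vec) for name, vec in toy_chunks.items()}

# Sort chunk names from highest to lowest score and keep the top 2
ranked = sorted(scores, key=scores.get, reverse=True)
top_2 = ranked[:2]
print(top_2)  # ['chunk A', 'chunk B']
```

Chunk A is longer than the query vector but still scores 1.0, because cosine similarity only looks at direction, not length.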

Step 3: Ranking

The AI can take a closer look at the 10 chunks it found and narrow them down to the 3 chunks that best match the question.

response = co.rerank(
    query=query,
    documents=top_chunks_after_retrieval,
    top_n=3,
    model="rerank-english-v2.0",
)

# The original code, top_chunks_after_rerank = [result.document['text'] for result in response],
# raises an error, so I changed it to the following, which ChatGPT suggested
results = response.results
# Each result carries the position (index) of a chunk in the list we passed in
top_chunks_after_rerank = [top_chunks_after_retrieval[result.index] for result in results]

print("Here are the top 3 chunks after reranking:")
for t in top_chunks_after_rerank:
    print("== " + t)
Here are the top 3 chunks after reranking:
== Dune: Part Two is a 2024 American epic science fiction film directed and produced by Denis Villeneuve, who co-wrote the screenplay with Jon Spaihts. The sequel to Dune (2021), it is the second of a two-part adaptation of the 1965 novel Dune by Frank Herbert. It follows Paul Atreides as he unites with the Fremen people of the desert planet Arrakis to wage war against House Harkonnen. Timothée Chalamet, Zendaya, Rebecca Ferguson, Josh Brolin, Dave Bautista, Stellan Skarsgård, Charlotte Rampling, and Javier
== Dune: Part Two was produced by Villeneuve, Mary Parent, and Cale Boyter, with Tanya Lapointe, Brian Herbert, Byron Merritt, Kim Herbert, Thomas Tull, Jon Spaihts, Richard P. Rubinstein, John Harrison, and Herbert W. Gain serving as executive producers and Kevin J. Anderson as creative consultant. Legendary CEO Joshua Grode confirmed in April 2019 that they plan to make a sequel, adding that "there's a logical place to stop the [first] movie before the book is over".
== Between the release of Dune and the confirmation of Dune: Part Two, Villeneuve started working the script in a way that production could begin immediately once the film was greenlit. By February 2021, Roth created a full treatment for the sequel, with writing beginning that August. He confirmed that Feyd-Rautha would appear in the film, and stated he will be a "very important character". In March 2022, Villeneuve had mostly finished writing the screenplay. Craig Mazin and Roth wrote additional literary
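
The line that uses result.index can look mysterious, so here is a small sketch of what it does, without calling the Cohere API. FakeResult is a stand-in I made up for the objects in response.results; the scores and chunk texts are invented, and the relevance_score field name is an assumption about the real objects.

```python
from collections import namedtuple

# Hypothetical stand-in for one item of response.results
FakeResult = namedtuple("FakeResult", ["index", "relevance_score"])

# Pretend these came back from retrieval, in retrieval order
retrieved = [
    "chunk about reviews",
    "chunk about the director",
    "chunk about producers",
]

# Pretend the reranker kept the best 2, best first.
# index points back into the `retrieved` list.
fake_results = [
    FakeResult(index=1, relevance_score=0.98),
    FakeResult(index=2, relevance_score=0.91),
]

reranked = [retrieved[r.index] for r in fake_results]
print(reranked)  # ['chunk about the director', 'chunk about producers']
```

So the reranker does not need to hand the text back. It only says "position 1 is the best, then position 2", and we look the text up ourselves.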

Step 4: Answer generation

The AI reads the information it needs from the 3 chunks and generates the final answer.

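# Settings for answer generation
preamble = """
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide range of requests on all kinds of topics. You are equipped with a wide range of search engines or similar tools that you use to research your answers. You should focus on serving the user's wide-ranging needs as best you can.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.
"""
# Pass the 3 reranked chunks to the model as documents
documents = [
    {"title": "chunk 0", "snippet": top_chunks_after_rerank[0]},
    {"title": "chunk 1", "snippet": top_chunks_after_rerank[1]},
    {"title": "chunk 2", "snippet": top_chunks_after_rerank[2]},
]

# Generate the answer
response = co.chat(
    message=query,
    documents=documents,
    preamble=preamble,
    model="command-r",
    temperature=0.3
)

print("Final answer:")
print(response.text)
Final answer: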
Here's a list of everyone involved in writing the script, directing, and producing 'Dune: Part Two'.
- Denis Villeneuve (director and producer)
- Jon Spaihts (screenwriter)
- Denis Villeneuve (co-writer)
- Mary Parent (producer)
- Cale Boyter (producer)
- Tanya Lapointe (executive producer)
- Brian Herbert (executive producer)
- Byron Merritt (executive producer)
- Kim Herbert (executive producer)
- Thomas Tull (executive producer)
- Richard P. Rubinstein (executive producer)
- John Harrison (executive producer)
- Herbert W. Gain (executive producer)
- Joshua Grode (Legendary CEO)
- Kevin J. Anderson (creative consultant)

The involvement of some individuals in the aforementioned roles was also confirmed, such as Eric Roth. He worked on the script with Denis Villeneuve, but it is unclear whether he is a screenwriter or additional writer. Craig Mazin also contributed to the writing, but it is unclear what type of writing it was.

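In the world of AI, there are "smart" models that have already learned from huge amounts of images and text. Picture a superhero who has already studied an enormous pile of textbooks and photo books!
Writing model="command-r" is how we choose one of those models to use.

For example, models like these can tell what is in a photo, translate English into Japanese, and do all sorts of other things.

These models don't start studying from scratch. They start out already smart, which is why they are called "pre-trained". Because they studied in advance, they already know the tricks for recognizing photos or translating right from the start. Pretty cool, right?

To use a different model, all you have to do is change it to, for example, model="command-r-plus". The answer will change.

Final answer: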
## Scriptwriters
- Denis Villeneuve
- Jon Spaihts
- Roth
- Craig Mazin 

## Director
- Denis Villeneuve

## Producers
- Denis Villeneuve
- Mary Parent
- Cale Boyter
- Tanya Lapointe
- Brian Herbert
- Byron Merritt
- Kim Herbert
- Thomas Tull
- Jon Spaihts
- Richard P. Rubinstein
- John Harrison
- Herbert W. Gain

Step 5: Showing the citations

Cohere's RAG tells you, as citations, which part of Wikipedia the generated answer is based on.

print("Citations that support the final answer:")
for cite in response.citations:
    print(cite)
Citations that support the final answer:
start=103 end=119 text='Denis Villeneuve' document_ids=['doc_0']
start=120 end=129 text='(director' document_ids=['doc_0']
start=134 end=142 text='producer' document_ids=['doc_0']
start=146 end=157 text='Jon Spaihts' document_ids=['doc_0']
start=158 end=172 text='(screenwriter)' document_ids=['doc_0']
start=175 end=191 text='Denis Villeneuve' document_ids=['doc_0']
start=192 end=203 text='(co-writer)' document_ids=['doc_0']
start=206 end=217 text='Mary Parent' document_ids=['doc_1']
start=218 end=228 text='(producer)' document_ids=['doc_1']
start=231 end=242 text='Cale Boyter' document_ids=['doc_1']
start=243 end=253 text='(producer)' document_ids=['doc_1']
start=256 end=270 text='Tanya Lapointe' document_ids=['doc_1']
start=271 end=291 text='(executive producer)' document_ids=['doc_1']
start=294 end=307 text='Brian Herbert' document_ids=['doc_1']
start=308 end=328 text='(executive producer)' document_ids=['doc_1']
start=331 end=344 text='Byron Merritt' document_ids=['doc_1']
start=345 end=365 text='(executive producer)' document_ids=['doc_1']
start=368 end=379 text='Kim Herbert' document_ids=['doc_1']
start=380 end=400 text='(executive producer)' document_ids=['doc_1']
start=403 end=414 text='Thomas Tull' document_ids=['doc_1']
start=415 end=435 text='(executive producer)' document_ids=['doc_1']
start=438 end=459 text='Richard P. Rubinstein' document_ids=['doc_1']
start=460 end=480 text='(executive producer)' document_ids=['doc_1']
start=483 end=496 text='John Harrison' document_ids=['doc_1']
start=497 end=517 text='(executive producer)' document_ids=['doc_1']
start=520 end=535 text='Herbert W. Gain' document_ids=['doc_1']
start=536 end=556 text='(executive producer)' document_ids=['doc_1']
start=559 end=571 text='Joshua Grode' document_ids=['doc_1']
start=572 end=587 text='(Legendary CEO)' document_ids=['doc_1']
start=590 end=607 text='Kevin J. Anderson' document_ids=['doc_1']
start=608 end=629 text='(creative consultant)' document_ids=['doc_1']
start=723 end=733 text='Eric Roth.' document_ids=['doc_2']
start=737 end=767 text='wrote additional literary work' document_ids=['doc_2']
start=778 end=789 text='Craig Mazin' document_ids=['doc_2']
start=794 end=810 text='Denis Villeneuve' document_ids=['doc_2']

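This shows the sources (citations) the AI used when it generated the answer!
You know how you list your references when you write a report? In the same way, the AI clearly shows, "I got this information from here!"

If the AI answers, "The director of Dune: Part Two is Denis Villeneuve," it also tells you where that came from. Each citation gives the position of the cited words in the answer (start and end, counted in characters), the cited words themselves (text), and which document they came from (document_ids, such as doc_0).

Because you can see where the AI's answer came from, you can trust it more!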
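
To make start and end concrete, here is a tiny sketch. ToyCitation is a stand-in I made up with the same field names as the citations printed above, and the answer text is invented; nothing here calls the Cohere API.

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as the printed citations
ToyCitation = namedtuple("ToyCitation", ["start", "end", "text", "document_ids"])

answer = "Director: Denis Villeneuve"

# "Director: " is 10 characters long, so the name starts at position 10.
# The name is 16 characters long, so it ends at position 26 (end is not included).
citation = ToyCitation(start=10, end=26, text="Denis Villeneuve", document_ids=["doc_0"])

cited_part = answer[citation.start:citation.end]
print(cited_part)                   # Denis Villeneuve
print(cited_part == citation.text)  # True
```

Slicing the answer with start and end gives back exactly the cited words, and document_ids says which of the documents we passed in backs them up.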

Next is a program that embeds the sources (citations) directly in the AI's answer so that they are easy to read!

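def insert_citations_in_order(text, citations):
    """
    A helper function that displays the sources in an easy-to-read way.
    """
    offset = 0  # Tracks how much longer the text has become because of the inserted markers
    document_id_to_number = {}  # Table mapping document IDs to reference numbers (not used in this simplified version)
    citation_number = 0  # Reference-number counter (never incremented here, so every citation is labeled [1])
    modified_citations = []  # List of modified citations (not used this time)

    # Process the citations in order
    for citation in citations:
        # Check whether the citation is a ChatCitation-like object with start and end
        if hasattr(citation, "start") and hasattr(citation, "end"):
            start, end = citation.start + offset, citation.end + offset  # Shift the start and end positions by the offset
        else:
            # Anything that is not a ChatCitation object is skipped this time
            continue

        placeholder = f'[{citation_number + 1}]'  # Create the reference-number placeholder
        # Make the cited text bold and append the placeholder
        modification = f'**{text[start:end]}**{placeholder}'
        # Replace the cited text with the bold version + placeholder
        text = text[:start] + modification + text[end:]
        # Update the offset for the replacements that follow
        offset += len(modification) - (end - start)

    return text  # Return the new text with the sources embedded

# Usage example
response_text = response.text  # The text of the AI's answer
response_citations = response.citations  # The sources for the AI's answer
print(insert_citations_in_order(response_text, response_citations))  # Show the text with the sources embedded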

This shows the people who directed, wrote, and produced Dune: Part Two, with reference numbers pointing to the source.

Here's a list of everyone involved in writing the script, directing, and producing 'Dune: Part Two'.
- **Denis Villeneuve**[1] **(director**[1] and **producer**[1])
- **Jon Spaihts**[1] **(screenwriter)**[1]
- **Denis Villeneuve**[1] **(co-writer)**[1]
- **Mary Parent**[1] **(producer)**[1]
- **Cale Boyter**[1] **(producer)**[1]
- **Tanya Lapointe**[1] **(executive producer)**[1]
- **Brian Herbert**[1] **(executive producer)**[1]
- **Byron Merritt**[1] **(executive producer)**[1]
- **Kim Herbert**[1] **(executive producer)**[1]
- **Thomas Tull**[1] **(executive producer)**[1]
- **Richard P. Rubinstein**[1] **(executive producer)**[1]
- **John Harrison**[1] **(executive producer)**[1]
- **Herbert W. Gain**[1] **(executive producer)**[1]
- **Joshua Grode**[1] **(Legendary CEO)**[1]
- **Kevin J. Anderson**[1] **(creative consultant)**[1]

The involvement of some individuals in the aforementioned roles was also confirmed, such as **Eric Roth.**[1] He **wrote additional literary work**[1] alongside **Craig Mazin**[1] and **Denis Villeneuve**[1], although it is unclear whether he is a screenwriter or additional writer alongside them.
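  • Denis Villeneuve[1] (director[1] and producer[1]) : This shows that Denis Villeneuve is the director and a producer, and that all of this information comes from source [1].

  • Jon Spaihts[1] (screenwriter)[1] : This shows that Jon Spaihts is a screenwriter, and that this information comes from source [1].

So for each piece of information, such as a person's name or role, you can see its source.
The helper above labels every citation [1], so in this case source [1] simply means the Wikipedia article on Dune: Part Two.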
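
If you want a different number for each document (doc_0 becomes [1], doc_1 becomes [2], and so on), one possible variant looks like the sketch below. This is my own illustration, not the code that produced the output above. ToyCitation and the sample sentence are made up, and the function only assumes that each citation has start, end, and document_ids.

```python
from collections import namedtuple

# Hypothetical stand-in for a citation object
ToyCitation = namedtuple("ToyCitation", ["start", "end", "document_ids"])


def insert_numbered_citations(text, citations):
    """Bold each cited span and append one [n] per cited document."""
    offset = 0                  # how much longer the text has become so far
    document_id_to_number = {}  # e.g. {"doc_0": 1, "doc_1": 2}

    for citation in citations:
        numbers = []
        for document_id in sorted(citation.document_ids):
            # Give each document a number the first time it shows up
            if document_id not in document_id_to_number:
                document_id_to_number[document_id] = len(document_id_to_number) + 1
            numbers.append(document_id_to_number[document_id])

        start, end = citation.start + offset, citation.end + offset
        placeholder = "".join(f"[{n}]" for n in numbers)
        modification = f"**{text[start:end]}**{placeholder}"
        text = text[:start] + modification + text[end:]
        offset += len(modification) - (end - start)

    return text


toy_text = "Denis Villeneuve directed; Mary Parent produced."


def span_of(name):
    """Find where a name sits in toy_text (helper for building the example)."""
    start = toy_text.index(name)
    return start, start + len(name)


toy_citations = [
    ToyCitation(*span_of("Denis Villeneuve"), document_ids=["doc_0"]),
    ToyCitation(*span_of("Mary Parent"), document_ids=["doc_1"]),
]

numbered = insert_numbered_citations(toy_text, toy_citations)
print(numbered)
# **Denis Villeneuve**[1] directed; **Mary Parent**[2] produced.
```

Like the helper above, this expects the citations to be sorted by position in the text, because the offset trick only works when the markers are inserted from left to right.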

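Having the sources spelled out doesn't just make the AI's answer more trustworthy. It also means that when you want to know more, you can see right away which material to look at.

Now the AI can search through lots of documents for the information you want and generate an accurate answer!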

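Summary

RAG is an amazing technique that helps AI generate smarter, more accurate answers! Cohere's RAG also shows its sources, so you can check for yourself whether the AI's answer can be trusted!

How was that? I hope this helped you understand RAG at least a little!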
