💻 Building Blocks of LLM Report Generation: Beyond Basic RAG
This article explains how to push RAG (retrieval-augmented generation) past conventional question answering and toward an automated report generation system ✨
Most existing RAG implementations are limited to simple question answering. With a typical RAG chatbot, a human still has to read the responses, synthesize the information, and produce the final document or analysis. Could AI take over more of that work?
This article explains how to make that possible.
The concept of report generation
Report generation is the next evolution of RAG systems. Instead of only answering questions, the system automatically produces a complete document. The output can take many forms, from research reports and presentations to finished analyses.
These reports follow templates and style guidelines and integrate formatted tables and charts. They can also combine information from multiple sources into a seamless narrative.
The impact of this capability is significant across industries. Investment firms can generate company analysis reports from earnings announcements and SEC filings. Management consulting teams can compile industry research into client-ready presentations, and technical teams can automate product documentation and API guides. This reduces the time spent on routine document creation and lets specialists concentrate on high-value analysis and decision making ✨
Core building blocks
The main building blocks of report generation are listed below. They form the foundation for building an advanced report generation system.
1. Structured output definition
The foundation of any report generation system is a clear definition of what the output should look like. A Pydantic schema defines the structure of the report. This lets you declare several types of content block and describe how they relate to one another.
Here is an example:
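from typing import Any, Dict, List, Union

from pydantic import BaseModel

class TextBlock(BaseModel):
    """A block of narrative text."""
    text: str

class ImageBlock(BaseModel):
    """An image referenced by file path, with a caption."""
    file_path: str
    caption: str

class ReportOutput(BaseModel):
    """A full report: an ordered list of blocks plus a title and metadata."""
    blocks: List[Union[TextBlock, ImageBlock]]
    title: str
    metadata: Dict[str, Any]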
With this structured output definition, the system has a clear and consistent output format, which helps keep the information organized and the report quality predictable.
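To make the benefit concrete, the sketch below renders a report with the same shape as `ReportOutput` into Markdown, dispatching on the block type. It is only an illustration: plain dataclasses stand in for the Pydantic models so the snippet runs without dependencies, `render_markdown` is a hypothetical helper rather than part of any library, and the sample content is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    file_path: str
    caption: str


@dataclass
class ReportOutput:
    blocks: List[Union[TextBlock, ImageBlock]]
    title: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def render_markdown(report: ReportOutput) -> str:
    """Render each block according to its type (hypothetical helper)."""
    lines = [f"# {report.title}", ""]
    for block in report.blocks:
        if isinstance(block, TextBlock):
            lines.append(block.text)
        elif isinstance(block, ImageBlock):
            lines.append(f"![{block.caption}]({block.file_path})")
        else:
            raise TypeError(f"unsupported block type: {type(block).__name__}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Made-up sample content, for illustration only.
report = ReportOutput(
    title="Q3 Summary",
    blocks=[
        TextBlock(text="Revenue grew quarter over quarter."),
        ImageBlock(file_path="charts/revenue.png", caption="Revenue by quarter"),
    ],
    metadata={"source_count": 2},
)
markdown = render_markdown(report)
```

Because every block has a known type, the renderer can reject anything unexpected instead of silently emitting malformed output.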
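2. Advanced document processing
Report generation tasks depend on heterogeneous, unstructured input documents. These include PDF, PPTX, XLSX, DOCX, and other formats, and they often contain complex elements such as tables and charts. GenAI parsers such as LlamaParse are good at extracting information from this kind of content and converting it into a form an LLM can work with ✨
For example, LlamaParse interprets the structure of a chart or table and converts it into text data, which makes it easier for the LLM to reason about. Information inside complex documents becomes easier to process, which in turn supports higher-quality reports.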
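As a rough illustration of what "converting a table into text an LLM can read" means, the sketch below serializes rows that a parser has already extracted into a Markdown table. It does not call LlamaParse; the function `table_to_markdown` and the sample rows are assumptions made up for this example.

```python
from typing import List, Sequence


def table_to_markdown(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Serialize an already-extracted table as Markdown text."""
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every row must have one cell per header column")

    def fmt(cells: Sequence[object]) -> str:
        # Escape pipes so cell content cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in rows)])


# Made-up sample data standing in for a table extracted from a PDF.
table_text = table_to_markdown(
    ["Quarter", "Revenue"],
    [["Q1", 100], ["Q2", 120]],
)
```

Keeping the row and column structure explicit in the text is the point: the model sees which value belongs to which header instead of a flat run of numbers.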
3. Knowledge base integration
The knowledge base is the engine of report generation. It has to do more than store and return text: it needs to handle multimodal content and support adjustable retrieval strategies. The system must understand metadata such as document type, date, and source, and provide different retrieval interfaces depending on what the report needs.
For example, the knowledge base can prioritize the latest version of a given document or efficiently pull up related historical information, which leads to more complete reports. Integrating multimodal data (text, images, tables, and so on) makes the resulting reports richer and more detailed.
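The idea of metadata-aware retrieval can be sketched as follows. This is a toy under stated assumptions: the `Document` fields, `latest_versions`, and `retrieve` are hypothetical names, and naive keyword overlap stands in for whatever ranking a real knowledge base would use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Document:
    doc_id: str      # logical document, shared by all of its versions
    doc_type: str    # e.g. "filing" or "slides"
    published: date
    text: str


def latest_versions(docs: List[Document]) -> List[Document]:
    """Keep only the most recent version of each logical document."""
    newest: Dict[str, Document] = {}
    for doc in docs:
        current = newest.get(doc.doc_id)
        if current is None or doc.published > current.published:
            newest[doc.doc_id] = doc
    return list(newest.values())


def retrieve(
    docs: List[Document],
    query: str,
    doc_type: Optional[str] = None,
    top_k: int = 3,
) -> List[Document]:
    """Filter by metadata first, then rank by naive keyword overlap."""
    terms = set(query.lower().split())
    candidates = [
        d for d in latest_versions(docs)
        if doc_type is None or d.doc_type == doc_type
    ]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in candidates]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


# Made-up documents for illustration.
docs = [
    Document("annual-report", "filing", date(2023, 3, 1), "revenue outlook for 2023"),
    Document("annual-report", "filing", date(2024, 3, 1), "revenue outlook for 2024"),
    Document("pitch", "slides", date(2024, 1, 15), "product roadmap"),
]
hits = retrieve(docs, "revenue outlook", doc_type="filing")
```

Here the older version of the annual report is dropped before ranking, so only the 2024 filing is returned.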
4. Multi-agent workflow architecture
Rather than having a single LLM generate the entire report, it works better to split the work among specialized agents. Typical roles are a "researcher agent" that retrieves and verifies information, a "writer agent" that drafts the report from that data, and an "editor agent" that proofreads and corrects the final content. These roles are modeled on how people do the same work, and the separation raises the quality of the report.
With this division of roles, each agent can concentrate on one task, and the overall result is a higher-quality report. The approach resembles the division of labor in a human team: specialization improves both accuracy and efficiency ✨
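The division of roles can be sketched with three plain functions, one per agent. This is a deliberately simplified illustration: no LLM is called, the in-memory `KNOWLEDGE_BASE` and its contents are made up, and the function names are hypothetical. In a real system each function body would be replaced by a model call or a workflow step.

```python
from typing import Dict, List

# Made-up in-memory "knowledge base" for the example.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "pricing": ["Plan A costs 10 per seat.", "plan b costs 25 per seat"],
}


def researcher(topic: str) -> List[str]:
    """Look up facts for a topic; dropping blanks is a trivial 'verification'."""
    return [fact for fact in KNOWLEDGE_BASE.get(topic, []) if fact.strip()]


def writer(topic: str, facts: List[str]) -> str:
    """Turn the facts into a draft section with a heading line."""
    body = " ".join(facts) if facts else "No information found."
    return f"{topic.title()}\n{body}"


def editor(draft: str) -> str:
    """Proofread: capitalize sentence starts and ensure a final period.

    Toy rule only: splitting on "." would mangle decimal numbers.
    """
    heading, _, body = draft.partition("\n")
    sentences = [s.strip() for s in body.split(".") if s.strip()]
    fixed = ". ".join(s[0].upper() + s[1:] for s in sentences) + "."
    return f"{heading}\n{fixed}"


def run_pipeline(topic: str) -> str:
    """Researcher -> writer -> editor, each with a single responsibility."""
    return editor(writer(topic, researcher(topic)))


section = run_pipeline("pricing")
```

Each stage only depends on the output of the previous one, so any of them can be swapped or tested in isolation.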
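5. Template processing system
Many reports follow an existing template or format. The system needs to parse the template, map each part to the type of information it requires, and build an executable plan. For example, when producing an accounting report in a prescribed format or an RFP (request for proposal), the information has to be placed accurately according to the existing template.
A template processing system makes it easier to generate reports that match an organization's existing standards and practices, which helps ensure consistency and reliability.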
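A minimal sketch of template processing, assuming a made-up template syntax in which `## ` starts a section and `{{name}}` marks a piece of required information. The functions `build_plan` and `fill` are hypothetical; real templates (Word, Excel, PDF forms) need far more parsing than this.

```python
import re
from typing import Dict, List, Optional

PLACEHOLDER = re.compile(r"\{\{\s*([a-z_]+)\s*\}\}")

# Made-up template used only for this example.
TEMPLATE = """\
## Company overview
{{company_name}} operates in {{industry}}.
## Financials
Revenue: {{revenue}}
"""


def build_plan(template: str) -> Dict[str, List[str]]:
    """Map each section heading to the fields that must be retrieved for it."""
    plan: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in template.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            plan[current] = []
        elif current is not None:
            plan[current].extend(PLACEHOLDER.findall(line))
    return plan


def fill(template: str, values: Dict[str, str]) -> str:
    """Substitute values; a missing field raises KeyError instead of being guessed."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


plan = build_plan(TEMPLATE)
filled = fill(
    TEMPLATE,
    {"company_name": "Acme", "industry": "logistics", "revenue": "n/a"},
)
```

The plan tells the retrieval stage exactly which fields each section needs, and failing loudly on a missing field keeps the system from inventing content to fill a slot.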
Tools
LlamaCloud: Processes and indexes multimodal content and provides data in a form optimized for building report generation. It makes complex data structures easier to manage and supports consistent output.
LlamaParse: Handles the structure of complex documents and makes it possible to obtain high-quality data. For example, it parses charts and tables inside a PDF and extracts them as text so the LLM can understand them better.
LlamaIndex Workflows: Orchestrates workflows made up of multiple specialized agents and supports the report generation process. By efficiently coordinating the tasks each agent is responsible for, it raises the quality of the final output.
Use cases
Financial report generation: An example that generates reports modeled on real financial data. Automatically summarizing stock performance and market analysis helps with investment decisions.
Excel template filling: Pulls in data according to the layout and calculations of an existing Excel template. This removes manual data entry and saves time.
RFP creation: An approach that gathers market information and then drafts a new proposal. Automating the creation of RFPs (requests for proposal) makes bidding and proposal processes more efficient.
The report generation process
Template analysis: When a request arrives, the template processor first analyzes the required format. This makes the basic structure of the report to be generated explicit.
Information gathering and caching: Next, the researcher agent gathers information from the knowledge base and caches it temporarily. By the end of this stage, all the required information is in place.
Content generation: The writer agent generates content from the gathered information, following the structured output definition. The required information is assembled in a consistent format.
Editing and review: Finally, the editor agent reviews the generated content and corrects it as needed. This step safeguards the quality of the report and improves its polish.
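The caching step in particular can be sketched in a few lines. The snippet below walks the four stages in order and counts how often the knowledge base is actually queried. Everything here is an assumption for illustration: `ResearchCache`, `generate_report`, and `fake_search` are hypothetical names, and the generation and editing stages are reduced to string handling.

```python
from typing import Callable, Dict, List


class ResearchCache:
    """Cache research results so a repeated query hits the knowledge base once."""

    def __init__(self, search: Callable[[str], List[str]]) -> None:
        self._search = search
        self._store: Dict[str, List[str]] = {}
        self.misses = 0

    def get(self, query: str) -> List[str]:
        if query not in self._store:
            self.misses += 1
            self._store[query] = self._search(query)
        return self._store[query]


def generate_report(sections: Dict[str, str], cache: ResearchCache) -> str:
    """`sections` maps a heading to the query that gathers its information."""
    parts = []
    for heading, query in sections.items():        # 1. plan from template analysis
        facts = cache.get(query)                   # 2. research, with caching
        draft = f"{heading}: " + "; ".join(facts)  # 3. content generation
        parts.append(draft.strip())                # 4. editing (trivial cleanup here)
    return "\n".join(parts)


def fake_search(query: str) -> List[str]:
    # Stand-in for a knowledge base lookup.
    return [f"result for {query}"]


cache = ResearchCache(fake_search)
report = generate_report(
    {"Summary": "market size", "Outlook": "market size"}, cache
)
```

Two sections share the same query here, so the knowledge base is consulted only once; in a longer report with overlapping information needs, that saving adds up.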
Future possibilities for LlamaIndex
Report generation with LlamaIndex is still evolving. Organizations can use LlamaIndex to generate a wide range of reports efficiently while keeping the information fresh and the quality high. Knowledge management and report generation with AI assistants are likely to keep expanding.
The LlamaIndex team is improving its agents and developing new workflows to enable more advanced report generation, as part of a broader vision in which AI supports knowledge work ✨
Summary
The evolution from simple RAG to report generation is a major step in AI assistance for knowledge work. By building report generation into their automation, organizations can process more information, with higher accuracy, than before.
Understanding the overall architecture is what makes it practical to turn report requirements into reports that are actually generated.