⏱️ Timestamps
00:00 Introduction
01:30 App architecture
03:50 Local installation
07:05 Google Colab walkthrough
Hey, what is up, guys?
Welcome back to another video on World of AI.
In today's video, we're going to show you how to use LangChain and GPT-4 to create a chatbot that reads through your PDFs and other files and answers your questions about them.
Because it draws its answers from the PDF and the information in it, you get better-grounded answers, and it can summarize the PDF for you as well.
So, before we get into the video, it would really mean the world to me if you could subscribe, turn on the notification bell, and like the video, as it definitely helps with the algorithm.
I have a lot of other content that will be very beneficial for you, so definitely check it out, as it covers a lot of different areas of the AI world.
So with that, let's get right into the video.
Before we get into the details of how to do it, I want to explain the architecture and how it functions.
I'm not going to install it locally on a desktop, as that's a little more complex; this is an easier approach that anyone can follow, and it doesn't require a powerful machine to run.
If you want me to do a video on installing it locally, I can definitely do that in a future video.
For today's video, we're going to use Google Colab, where a ready-made notebook makes the application easy to set up, and you can use it in the browser to summarize your PDF files.
Now, in terms of the PDF chatbot architecture, you can see over here that you start with the PDF document you want to work with, and this PDF is converted to text.
The application then splits that text into chunks.
For example, a PDF like this one has several paragraphs spread across several pages.
The application splits it into separate chunks, and how many chunks you get depends on how much text the PDF contains.
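The chunking step can be sketched in plain Python as below. This is a simplified stand-in, not the app's actual code: LangChain ships text splitters (such as `RecursiveCharacterTextSplitter`, which also prefers to break at paragraph and sentence boundaries) that do this job in practice, and the `chunk_size` and `chunk_overlap` values here are illustrative assumptions.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    The overlap means text near a chunk boundary appears in both neighbouring
    chunks, so it is less likely to be cut in half.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk already reaches the end
            break
    return chunks


# Usage: join the text extracted from each PDF page, then split it.
pages = ["Page one of the paper. " * 40, "Page two of the paper. " * 40]
chunks = split_text("\n".join(pages), chunk_size=500, chunk_overlap=100)
print(len(chunks), "chunks")
```

Larger chunks give the model more context per match; smaller ones make retrieval more precise. The overlap is a cheap guard against splitting a key sentence across two chunks.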
Next, it creates an embedding for each chunk, which is a list of numbers (a vector) that captures what that chunk of text is about.
These embeddings are stored, chunk by chunk, in a vector store, so that when you use the chatbot, the sections relevant to your question can be found.
Each stored embedding stays linked to the text of its chunk, so that text can be pulled back out later.
Whenever you ask something, through the chatbot or in the Google Colab notebook, it retrieves the relevant information from those chunks.
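To show what "the most relevant chunk" means, here is a toy retrieval sketch that assumes nothing about the real app's internals. In practice each chunk would be sent to an embedding model (presumably OpenAI's, given this stack) and the vectors kept in Pinecone; here a word-count vector with common words removed stands in for the embedding, and cosine similarity ranks the chunks. The sample chunks and names are made up.

```python
import math
import re
from collections import Counter

# Toy stand-in for an embedding model: word counts with common words removed.
# Real embedding models return dense float vectors that also capture meaning,
# not just shared words.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "was", "what", "who"}


def embed(text: str) -> Counter:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means no words in common."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 1) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the question and return the best k."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "The authors of the article are Alice Smith and Bob Jones.",
    "Training the model cost about one hundred dollars.",
    "The dataset was collected from public prompts.",
]
print(top_k("Who are the authors of the article?", chunks))
```

The key idea carries over to the real system: the question is embedded the same way as the chunks, and the chunks whose vectors point in the most similar direction are the ones handed to the language model.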
So, the embeddings are stored, and the relevant documents are set aside when they are retrieved.
When you ask a question, the language model first turns it (together with the chat history) into a standalone question, and that question is matched against the embedded chunks to find the relevant documents.
Only the chunks that are relevant to your prompt are pulled out.
Once it has found those chunks, it passes their text to the language model along with your question, and the model answers based on them, which is the answer we see over here.
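The question-answering flow above can be sketched as two prompts, assuming the common LangChain-style "condense, then answer" pattern: step 1 asks the LLM to rewrite a follow-up into a standalone question, step 2 (not shown) retrieves chunks for that question, and step 3 asks the LLM to answer from those chunks only. The template wording and function names are hypothetical, and the sketch only builds the prompt strings without calling any model.

```python
# Hypothetical prompt wording; the real app's templates may differ.
CONDENSE_TEMPLATE = (
    "Given the chat history and a follow-up question, rephrase the follow-up "
    "as a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)

QA_TEMPLATE = (
    "Use only the context below to answer the question. "
    "If the answer is not in the context, say that you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def format_history(history: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def condense_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Step 1: prompt asking the LLM to turn a follow-up into a standalone question."""
    return CONDENSE_TEMPLATE.format(history=format_history(history), question=question)


def answer_prompt(standalone_question: str, relevant_chunks: list[str]) -> str:
    """Step 3: prompt asking the LLM to answer from the retrieved chunks only."""
    context = "\n---\n".join(relevant_chunks)
    return QA_TEMPLATE.format(context=context, question=standalone_question)


history = [("What is this paper about?", "It describes a language model.")]
print(condense_prompt(history, "Who wrote it?"))
```

Condensing first matters because a follow-up like "Who wrote it?" has no useful words to search with on its own; the standalone version ("Who are the authors of the paper?") retrieves much better chunks.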
This is quite unique, and it's very beneficial for a lot of people, as this new technology can be used for many different use cases.
So I highly recommend that you support the people who contributed to creating this; it's fairly easy to install, and it's a great piece of tech.
Now, for people who want to install it locally, I can give you a quick rundown of how to do it.
First, you're going to need Git, which you use to clone the GitHub repository onto your desktop.
Then you will need Python to install the required packages and tools on your desktop, plus Node.js, since the project's scripts are run with npm. Lastly, you will need a code editor.
Personally, what I use, and what I recommend in my other videos, is Visual Studio Code, which is a really good code editor.
It's very easy to use, and I find it much better than working only in the command prompt, as it's more robust and more versatile.
So I highly recommend Visual Studio Code as your code editor.
Now, the process is fairly easy.
You clone the repository from the command prompt, and once that's done, you install the packages.
Once the packages are installed, you open the project in your code editor, and at that stage you enter your keys.
You will need an OpenAI API key as well as a Pinecone API key.
Both services offer a free trial, and you can get your OpenAI key from the API keys section of your OpenAI account.
For Pinecone, go to the Pinecone website, and you can get the API key there.
Paste both keys into the .env file, and also paste in the Pinecone environment.
That's basically the region your Pinecone project is located in, and it goes in the environment field.
You can also give the index a name.
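The .env file is just KEY=value lines. The sketch below parses one and reports which keys are still missing. The four variable names are assumptions based on the keys mentioned above (OpenAI key, Pinecone key, Pinecone environment, index name); check the repository's example .env file for the exact names. Real projects usually load this file with a dotenv library rather than a hand-written parser.

```python
# Assumed variable names; check the repository's example .env file for the exact ones.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """List required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


example = """
# Placeholder values only
OPENAI_API_KEY=sk-your-key-here
PINECONE_API_KEY="your-pinecone-key"
PINECONE_ENVIRONMENT=your-pinecone-environment
PINECONE_INDEX_NAME=
"""
print(missing_keys(parse_env(example)))
```

Checking for empty values as well as absent ones catches the common mistake of copying the example file and forgetting to fill in one field.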
Once that's set up, the chatbot runs locally: the app runs on your own machine and uses Pinecone to store and search the embeddings of your PDF.
That's what it uses to answer your questions about the PDF file and to summarize it.
This part is obviously quite easy; next comes converting the PDF files.
You just put the PDF into the docs folder and run the ingest script with npm ("npm run ingest"), which ingests your documents and creates their embeddings.
This is what connects your documents to the application, and it's as easy as that.
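The ingest step boils down to: read each document, split it into chunks, and store every chunk under an id. This Python sketch is a hypothetical illustration, not the repository's script: it reads .txt files (text already extracted from the PDFs) so it runs without a PDF library, and a plain dict stands in for the Pinecone index, so the embedding and upsert calls a real ingest would make are left out.

```python
from pathlib import Path


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character chunks with overlap (a simplified splitter)."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


def ingest(docs_dir: str, index: dict, chunk_size: int = 500, chunk_overlap: int = 50) -> int:
    """Chunk every .txt file in docs_dir into `index`; return how many chunks were stored.

    `index` is a plain dict standing in for a vector index such as Pinecone.
    A real ingest step would also embed each chunk and upsert the vector.
    """
    stored = 0
    for path in sorted(Path(docs_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for i, piece in enumerate(split_text(text, chunk_size, chunk_overlap)):
            # A stable id (file name + chunk number) means re-running ingest
            # overwrites old entries instead of adding duplicates.
            index[f"{path.stem}-{i}"] = {"text": piece, "source": path.name, "chunk": i}
            stored += 1
    return stored


# Usage (hypothetical folder name):
# index = {}
# print(ingest("docs", index), "chunks stored")
```

Keeping the source file name and chunk number next to each chunk is what lets a chatbot later say which document an answer came from.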
Once that's done, you just have to make sure your Pinecone API settings are configured in the application, and then you run it.
Then you can open it on a local server, where you can use the chatbot, and that's basically how you run it locally.
Obviously, this is just a rough breakdown of how to do it locally.
If you want me to go into more detail step by step, I can definitely make another video on it.
Now let's get right into how you can run it on Google Colab, which lets you run it without using your own desktop at all.
I have two options for how you can do this, and I'll leave both in the description below, as both are useful for prompting and summarizing your PDF files.
For people who don't know, LangChain is an open-source framework for building applications powered by large language models.
It provides the building blocks this chatbot needs, such as document loaders, text splitters, integrations with vector stores like Pinecone, and chains that connect a retriever to a language model.
And for people who don't know GPT-4: it's a GPT (generative pre-trained transformer) language model developed by OpenAI.
This application uses LangChain together with GPT-4 so the chatbot can summarize your PDF files and answer questions about them.
Now, to create a ChatGPT-style chatbot for your PDF file in this Google Colab notebook, you don't need to train a model on your domain; the only data you need is the PDF itself.
So, first things first: whenever you open a shared Google Colab notebook, the first thing you want to do is save a copy to your Drive.
I'll leave the link in the description; that's basically the first step.
Then connect to a runtime, which provides the RAM and compute that the notebook uses to run.
First, you want to install all the packages.
You're going to install the packages and tools from LangChain and OpenAI, as well as the other basic libraries needed to run the app.
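A quick way to confirm the install cell worked is to check that each module can be imported. The module list below is an assumption about this kind of notebook (LangChain, OpenAI, Pinecone, a PDF reader, and a tokenizer); the notebook's own install cell is the source of truth.

```python
import importlib.util

# Assumed import names; match them to what the notebook's install cell installs.
# Note that pip package names can differ from import names
# (for example, the pinecone-client package is imported as "pinecone").
REQUIRED_MODULES = ["langchain", "openai", "pinecone", "pypdf", "tiktoken"]


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the modules that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    # In a Colab cell you would then run, for example: !pip install <package names>
    print("Still missing:", ", ".join(missing))
else:
    print("All packages are installed.")
```

`find_spec` only locates a module without importing it, so this check is fast and has no side effects.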
Once that's done, run the next cell, which imports the packages.
In this cell, you will need your OpenAI key as well as your Pinecone key; paste your OpenAI key over here, and we'll move on to the next step.
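Instead of typing the key directly into a code cell, where it would be saved with the notebook, you can prompt for it. This is a sketch: `ensure_key` is a hypothetical helper, the openai package does read `OPENAI_API_KEY` from the environment by default, and the Pinecone variable name is an assumption.

```python
import os
from getpass import getpass


def ensure_key(name: str, prompt=getpass) -> str:
    """Set an API key in the environment, prompting only if it is not set yet.

    getpass hides what you type, so the key does not end up in a code cell.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Paste your {name}: ").strip()
    return os.environ[name]


# ensure_key("OPENAI_API_KEY")    # the openai package reads this variable by default
# ensure_key("PINECONE_API_KEY")  # assumed name; use whatever the notebook expects
```

Because the helper skips the prompt when the variable is already set, re-running the cell after a runtime reconnect does not ask for the key again within the same session.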
Once you have pasted your OpenAI API key, the next step is connecting your Google Drive.
Click on this button over here, and choose to run it anyway when Colab shows its warning.
This connects the notebook to your Google Drive.
Obviously, I'm not going to do that here, because I don't want to run the application right now, but once you've done it, you set the root folder and the file location, which you can get by going to your Google Drive and copying the location of the PDF you uploaded.
So, you need to upload your PDF to Google Drive, copy the folder path, and paste it over here.
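Mounting Drive and pointing the notebook at your PDF can look like the sketch below. `google.colab.drive.mount` is Colab's own helper, and `/content/drive/MyDrive` is where your Drive files appear after mounting; the `resolve_pdf` helper is a hypothetical addition that checks the pasted path before the reader runs.

```python
from pathlib import Path

DRIVE_ROOT = "/content/drive"  # where Colab conventionally mounts Google Drive


def mount_drive(mount_point: str = DRIVE_ROOT) -> bool:
    """Mount Google Drive inside Colab; return False when not running in Colab."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # opens the authorization prompt
    return True


def resolve_pdf(path_from_drive: str, root: str = DRIVE_ROOT + "/MyDrive") -> Path:
    """Build the full path to the uploaded PDF and check that it exists.

    Accepts either a path relative to My Drive ("papers/article.pdf") or an
    absolute path copied from Colab's file browser; pathlib's `/` keeps an
    absolute right-hand side as-is.
    """
    path = Path(root) / path_from_drive
    if not path.is_file():
        raise FileNotFoundError(f"No file at {path}; re-copy the path from Drive")
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"{path.name} does not look like a PDF")
    return path
```

Failing early with a clear message is friendlier than letting the PDF reader crash several cells later on a mistyped folder name.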
In this case, we pasted it over here, since we're running it from Google Colab.
Once that's done, you run the reader, which goes through each page and, as we talked about at the start, splits the text into chunks.
When it finishes, you can see all the text divided into chunks.
After that, you run each of the remaining cells that set up the application.
So, click every play button on the side here, one after another, and the notebook prepares each chunk so information can be retrieved from it.
At that point, the embeddings for your PDF's chunks are set up along with the large language model, and everything is ready to start running your chatbot in Google Colab.
Keep running each of these cells, and once you're done, you'll reach the end, where you can start chatting with the PDF to summarize it and get value out of it.
In this example, we asked the question "Who are the authors of the article?", and the chatbot answered with the authors' names (I can't pronounce them, so I'm not even going to try), and we can see that those are indeed the authors.
That is quite remarkable, because you can see right away that the chatbot works and is retrieving information from the PDF.
Obviously, this is Google Colab, so you're not going to get a good user interface; if you want one, you would need to run it on a local server, using Pinecone as described earlier.
If you really want that, I can make a video with a more appealing interface to show you how to run it on a local server.
But in this case, we're using Google Colab because it's much easier and accessible to anyone: you just click three buttons to start running it, and you don't need a coding background.
Another example question is "What is the cost of training the GPT4All model?", and we got the answer of a hundred dollars. As we discussed, the app works by splitting the document into chunks and retrieving the ones relevant to each question.
The chunk that mentions the training cost is the one that best matches this question, so it gets retrieved, and the answer comes back as a hundred dollars.
You can see several other questions and answers over here, and as you can see, the application is quite easy to run.
I hope you found this video beneficial.
If you have any questions, or if you want me to cover anything else, definitely let me know; I'm happy to answer and to make more tutorials on how to run this.
So, that's basically it for today's video.
I highly recommend that you check out the links in the description below.
Thank you so much for supporting the channel.
I've been getting a lot of likes and support, and it really means a lot to me.
If you haven't seen my previous videos, definitely check them out, share this video, like it, and comment with anything you want to see, and I'll see you next time.
Have a nice day, fellas.