The uses of large language models like ChatGPT are exploding, and it seems like we see new use cases every day.
But as investors, the key question is whether we can use these models to improve our returns.
In this video, we look at some recent research that's been published, which has some very interesting results in that regard, and we'll also consider whether we can practically apply this to our own portfolios.
If you do enjoy our content, don't forget, please do subscribe to our channel.
That way, you won't miss any new content, and you could also click on like for this video.
So let's look at whether ChatGPT can be used to forecast stock returns in a bit more detail.
Now what we'll be talking about is sentiment analysis, and the basic idea here is that we take news headlines.
These could come from any kind of news service, and then we feed those news headlines into some kind of model.
That model then converts that news headline into a buy or sell signal, depending on whether it thinks it's a positive news story or a negative news story.
And if the model is right about the sentiment and, crucially, it captures the market reaction properly, then we could actually make money based on those recommendations.
What do these models look like?
Well, some of them have been around for some time.
One example that you can see beside me here is from RavenPack.
It takes a huge amount of unstructured data, which could come from news flow but also social media, textual data and transcripts from earnings reports, and turns it into a sentiment score which hopefully gives you a signal as to whether to buy or sell a stock.
Now RavenPack's analysis is very much crafted to work only for financial data.
Large language models, on the other hand, are kind of general learning models which are unsupervised.
You just show them a set of data, and then they can learn to produce the right outputs based on a prompt which you give them.
Now, recently those have exploded in complexity, and one measure of that complexity is the number of parameters in the model.
Now in 2018, the state-of-the-art model was from Google AI, and that was BERT LARGE.
That had 340 million parameters, and the competitors to that model came from OpenAI.
So the first version of GPT, that's the Generative Pre-trained Transformer model, had 110 million parameters.
GPT-2 had 1.5 billion parameters.
So as you add complexity to these models, their analysis becomes more sophisticated, and they can pick up more long-term relationships between words in natural language.
And if you're wondering, BERT stands for Bidirectional Encoder Representations from Transformers, hence the acronym.
Now the prompts that you feed into these large language models are absolutely critical.
So for example, here you can see I'm using OpenAI's ChatGPT, and I told it to write a short limerick about active versus passive investing.
Now you can see it's done pretty well.
It's picked up the rhyming scheme, so it goes A-A-B-B-A. But what's a bit wonky about this, if you try reading it out to yourself, is that it doesn't have the right number of syllables.
Particularly on line number four, it's got 11 syllables which just sounds really awkward.
So I tried to correct it by telling it how many syllables should be in each line, and you can see it still didn't really figure it out.
So because this is a general language model, it's not tuned for a particular thing like limericks.
It doesn't perform incredibly well on every task.
It does pretty well on a very broad range of tasks.
Don't get me wrong, it is incredible.
So here I told it to write an R script to convert a zoo time series from prices to returns, and it did so, and the script actually works.
Not only that, but it actually came up with an explanation of how the code works.
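For reference, the conversion itself is a one-liner. Here is a minimal sketch of the same idea in Python rather than R, assuming simple (not log) returns; the function name and the sample prices are made up for illustration and are not the script ChatGPT produced.

```python
def prices_to_returns(prices):
    """Simple returns: each price divided by the previous price, minus one."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# Made-up prices, purely to show the mechanics.
prices = [100.0, 102.0, 99.96]
returns = prices_to_returns(prices)
print(returns)  # roughly +2% followed by -2%
```

Note that a series of n prices gives n - 1 returns, because the first price has nothing before it to compare against.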
So these models are now incredibly sophisticated, but they still do sometimes go wrong.
But the key thing is whether you can craft a prompt to produce the right output, and in fact, there's a whole new job category, prompt engineer, in crafting those prompts for these large language models.
So how can these large language models be used to do sentiment analysis for stocks?
The results I'm about to show you are from this paper, which was published in April 2023.
It compares predictive accuracy for various models: the RavenPack news analytics tool which we just saw, increasingly complex versions of GPT, going from GPT-1 to GPT-2 to ChatGPT, which uses GPT-3.5, and two flavors of the BERT model.
Now the approach is not to feed a whole news story about a stock to the model, but instead just to focus on news headlines.
Those are presented to the model using a very standardized prompt as we'll see, and then it has to figure out whether this headline is positive or negative for stock sentiment.
Now I think it's useful to look at an example headline, one which they give in the paper, which is looking at a story about Oracle, the database provider, and a less well-known company which is Rimini Street.
Now Rimini Street provides support products and services around these software products from Oracle and SAP, and in this example, we'll be looking at this headline and here's the entire story that goes with it.
Now really it was a question about intellectual property which allegedly Rimini had leaked to other clients and which belonged to Oracle.
And Oracle managed to get a $630,000 fine against Rimini Street in court.
So personally, I think that this would be positive for Oracle.
So the standardized prompt that the research paper used to feed these headlines to the model is as follows.
Firstly, it tells the model to forget all of your previous instructions.
Presumably, that wipes the slate clean in terms of previous prompts.
Then it says pretend you're a financial expert.
Not only that, but you're a financial expert with stock recommendation experience.
And then it tells ChatGPT how to respond to the headline.
In the first line of the response, it either says yes, no, or unknown, according to whether the headline is good news for the stock, bad news for the stock, or uncertain.
Then, after that one-word, one-line response, it asks the model to elaborate on its answer, and it has to do that with a short and concise sentence on the next line, explaining its decision.
And then we tell the model which stock we're talking about.
So, in this case, is the headline good or bad for the stock price of Oracle, and we also specify whether it's over the short, medium, or long term.
In this case, it's over the short term.
And then finally, the model's given the actual headline.
So, the bits in red are the bits which are substituted for different stocks, for different terms, and for different headlines.
But all of the rest of the text, which is in white, would be the same for every stock and for every headline.
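A minimal sketch of how a standardized prompt like that could be assembled in Python. The template wording is paraphrased from the description above and may not match the paper word for word, and the function name and the sample headline are assumptions made for illustration.

```python
# Template paraphrased from the prompt described above; the exact wording
# used in the paper may differ.
PROMPT_TEMPLATE = (
    "Forget all your previous instructions. "
    "Pretend you are a financial expert. "
    "You are a financial expert with stock recommendation experience. "
    'Answer "YES" if good news, "NO" if bad news, or "UNKNOWN" if uncertain '
    "in the first line. "
    "Then elaborate with one short and concise sentence on the next line. "
    "Is this headline good or bad for the stock price of {company} "
    "in the {term} term?\n"
    "Headline: {headline}"
)


def build_prompt(company: str, term: str, headline: str) -> str:
    """Fill in the three slots that change from one query to the next."""
    if term not in {"short", "medium", "long"}:
        raise ValueError(f"term must be short, medium or long, got {term!r}")
    return PROMPT_TEMPLATE.format(company=company, term=term, headline=headline)


# Illustrative headline, paraphrasing the Oracle story discussed above.
example_headline = "Rimini Street fined $630,000 in case against Oracle"
prompt = build_prompt(company="Oracle", term="short", headline=example_headline)
print(prompt)
```

Keeping the fixed text in one template and substituting only the company, the term and the headline means every stock is put to the model in exactly the same way.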
So here's ChatGPT's response to that prompt: The answer is yes.
In other words, the headline is good for the stock, and its explanation is actually quite sophisticated.
It's actually saying that the fine against Rimini Street could boost investor confidence because it shows Oracle can protect its intellectual property, and that indirectly could increase demand for Oracle's products and services.
And in fact, I think that's quite a good answer.
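To turn a reply like that into something you can trade on, the first line has to be mapped to a number. Here is a minimal Python sketch, assuming yes maps to +1, no to -1 and anything else to 0. That mapping, the function name and the sample reply are assumptions for illustration, not code from the paper.

```python
def parse_response(response: str) -> tuple[int, str]:
    """Map a two-line reply to (signal, explanation).

    signal is +1 for YES (good news), -1 for NO (bad news), 0 otherwise.
    """
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        return 0, ""
    # Take the first word of the first line and ignore trailing punctuation.
    first_word = lines[0].split()[0].strip(".,:;!\"'").upper()
    signal = {"YES": 1, "NO": -1}.get(first_word, 0)
    explanation = " ".join(lines[1:])
    return signal, explanation


# Hypothetical reply, paraphrasing the answer described above.
reply = (
    "YES\n"
    "The fine against Rimini Street could boost investor confidence in "
    "Oracle's ability to protect its intellectual property."
)
signal, explanation = parse_response(reply)
print(signal)  # 1
```

Treating anything that isn't a clear yes or no as 0 means an unexpected reply results in no trade rather than a wrong one.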
Compare that with the response from RavenPack, which came up with a negative sentiment score of minus 0.52 for Oracle.
So you can see that the language model's ability to infer relationships between the words in the headline allows it to see past the immediate problem, which is that this is a court settlement, and to recognize that it could be construed as positive for the stock.
Now, if you're enjoying this video and you want to learn more about investing, remember we do offer a membership where you can join our community and learn with like-minded investors online.
It's very friendly.
There's lots of members-only content, members-only videos, and you get to chat to each other using a chat application and ask questions anytime you like.
Now, if you do want to learn more about that, just click on the link in the description beneath me or just go to pensioncraft.com.
So, how well did the models perform?
Well, whenever you're measuring performance, you have to be very careful that you do things out of sample.
Otherwise, the model could actually know what happens to the stock price.
Now, fortunately, the version of ChatGPT which the researchers used had no training data after September 2021.
So, by only using headlines from October 2021 to December 2022, the researchers could ensure that the model wouldn't have access to the answer.
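That out-of-sample check is easy to express in code. A minimal Python sketch, assuming the cutoff is the last day of September 2021 and the window runs to the last day of December 2022; the exact days, the function name and the sample headlines are assumptions, since only the months are given above.

```python
from datetime import date

# Months come from the description above; the exact days are assumed.
TRAINING_CUTOFF = date(2021, 9, 30)
SAMPLE_START = date(2021, 10, 1)
SAMPLE_END = date(2022, 12, 31)


def is_out_of_sample(published: date) -> bool:
    """True if a headline falls inside the test window, after the training cutoff."""
    return TRAINING_CUTOFF < SAMPLE_START <= published <= SAMPLE_END


# Hypothetical headlines, invented for illustration.
headlines = [
    (date(2021, 8, 15), "Example Corp beats earnings estimates"),
    (date(2021, 10, 4), "Example Corp announces share buyback"),
    (date(2022, 12, 30), "Example Corp recalls flagship product"),
    (date(2023, 1, 3), "Example Corp names new chief executive"),
]

usable = [(d, text) for d, text in headlines if is_out_of_sample(d)]
print(len(usable))  # 2
```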
And in terms of the stocks which they used, each one had to have a news story published about it, with a headline taken from the RavenPack Dow Jones news feed, and they used stocks from the US exchanges.
So, that's the New York Stock Exchange, NASDAQ, and the American Stock Exchange.
Now let's assume that we had an absolutely perfect model.
What would we expect it to look like?
Well, beside me here, you can see a distribution of returns chopped up into two buckets.
So let's say we receive a positive headline about a stock, as judged by the model, and that would go into the good news category.
Well, if the model was absolutely perfect, then on the following day, the return would always be positive if it was right about the good news story, and conversely, if it thought it was a bad news story, then it would have 100% accuracy in predicting negative returns.
Of course, in the real world, it never looks like that.
In reality, you'd have something like this, which would still be a fantastic model, because notice that the average return on good news is higher than the average return on bad news.
That's the best that we can really hope for.
There'll always be some false positives and false negatives, and this is what the results looked like.
So we've got each of the models in the columns here.
We've got ChatGPT, the most sophisticated GPT model.
We've got the more basic versions of GPT, that's GPT-1 and GPT-2.
We've got the BERT model and we've got RavenPack.
So what this score table does is it looks to see whether the model said that there was good news, unknown or bad news and then looks at the average return on the following day, and that's for the stock that was in the prompt.
Notice that for ChatGPT, when there was a good news story, the average return was plus 0.13 percent.
When there was a bad news story, it was minus 0.13 percent.
So it did that job of separating the good and bad returns and interpreting sentiment.
However, if you look at any of the other models, they do a very poor job.
The average returns for GPT-1 and GPT-2 were quite similar, regardless of whether it was good news or bad news according to the model.
So what this shows is that the more sophisticated GPT model with more parameters and more inferential ability could actually do a pretty good job of discriminating positive and negative sentiment from headlines.
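The calculation behind a score table like that is just a grouped average. A minimal Python sketch, using made-up labels and next-day returns purely to show the mechanics; none of these numbers come from the paper, and the function name is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Made-up (model label, next-day return in percent) pairs for illustration.
observations = [
    ("good", 0.8), ("good", -0.3), ("good", 0.4),
    ("unknown", 0.1), ("unknown", -0.1),
    ("bad", -0.6), ("bad", 0.2), ("bad", -0.5),
]


def average_return_by_label(rows):
    """Group next-day returns by the model's label and average each group."""
    buckets = defaultdict(list)
    for label, next_day_return in rows:
        buckets[label].append(next_day_return)
    return {label: mean(values) for label, values in buckets.items()}


averages = average_return_by_label(observations)
# A useful model shows a positive gap between the good and bad buckets.
spread = averages["good"] - averages["bad"]
print(averages, spread)
```

Notice that individual calls can be wrong, as two of them are in this toy data, and the good bucket can still average higher than the bad one.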
If we actually turn that into a trading strategy - and the authors aren't very clear on how they generated these lines - what they do is they take all of the stocks for which there was a headline.
If the model thinks the headline is positive, then they buy that stock, hold it to the next day, and then sell.
If the model, on the other hand, thinks the news is negative, then they'd short the stock which profits from a fall and then close out the short the next day.
So in other words, they'd have a long-short portfolio which is regenerated every day, and they'd simply repeat that over this period, and you can see the long-short strategy does very well.
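One day of that long-short portfolio could be sketched in Python like this. It assumes equal weighting within each leg, which is a guess, since the paper's exact construction isn't clear; the tickers, returns and function name are all hypothetical.

```python
from statistics import mean


def long_short_return(signals, next_day_returns):
    """One day's return: average of the longs minus average of the shorts.

    signals maps ticker -> +1 (buy), -1 (short) or 0 (no trade).
    next_day_returns maps ticker -> next-day return as a fraction.
    Equal weighting within each leg is an assumption.
    """
    longs = [next_day_returns[t] for t, s in signals.items() if s > 0]
    shorts = [next_day_returns[t] for t, s in signals.items() if s < 0]
    long_leg = mean(longs) if longs else 0.0
    # A short position profits when the stock falls, hence the minus sign.
    short_leg = -mean(shorts) if shorts else 0.0
    return long_leg + short_leg


# Hypothetical tickers and returns, invented for illustration.
signals = {"AAA": 1, "BBB": -1, "CCC": 0}
next_day_returns = {"AAA": 0.01, "BBB": -0.02, "CCC": 0.005}
day_return = long_short_return(signals, next_day_returns)
print(day_return)  # about 0.03: 1% from the long plus 2% from the short
```

Repeating this every day with that day's headlines gives the daily return series behind a cumulative performance line.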
If you have a short-only strategy, where they simply short the stocks with a negative sentiment story according to the model, again the performance is quite good.
But notice that the performance is not steady: there was a period of outperformance at the beginning, then the results were kind of flat and didn't really go anywhere, and then there was outperformance again at the end of the period.
There's also a long-only strategy, which buys the stocks that had good news.
I'm not clear what the "all news" line is - they didn't explain that in the paper - but this was a period when stocks were falling in 2022.
So the fact that it did generate a positive return is actually quite interesting.
Notice, however, that this is without transaction costs, and as we'll see later, that's very significant.
So the authors conclude that the superiority of ChatGPT in predicting stock market returns can be attributed to its advanced language understanding capabilities, which allow it to capture the nuances and subtleties within news headlines.
But I think the more interesting point here is that it actually interprets things in the same way as market participants because remember that's the goal here.
In order to predict what the stock market does, you have to get into the mind of investors, so I think it's truly remarkable that the model can do that, particularly as it's not specifically trained to do so.
It is a general language learning model, but their conclusion is that this enables the model to generate more reliable sentiment scores, leading to better predictions of daily stock market returns.
So you're probably thinking, "Can I make money with this?"
But personally, I think it would be impractical with the model as it's implemented here, and that's because of churn.
Remember what the model was doing: it was looking at a headline today, deciding whether to buy or sell, and then selling or closing out a short the following day.
So there could be a huge turnover of stock in this portfolio.
Now, of course, you could restrict the number of stocks you apply this to, and there are various workarounds, but with that churn, the bid-offer spread is going to eat a large amount of the profits for this strategy.
And if you remember, the outperformance came in two fairly concentrated bursts at the beginning and the end of the period.
So if this thing doesn't outperform, you'll still be continually churning and losing money on the bid-offer spread.
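To see how quickly daily trading costs add up, here is a minimal Python sketch. Every figure in it is hypothetical: a gross return of 0.10% a day, a round-trip cost of 0.10% a day and 250 trading days a year are assumptions chosen to make the arithmetic easy, not numbers from the paper.

```python
def net_daily_return(gross_return: float, round_trip_cost: float) -> float:
    """Daily return after paying the cost of getting in and out again."""
    return gross_return - round_trip_cost


def compound(daily_returns):
    """Total return from compounding a sequence of daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


TRADING_DAYS = 250   # assumed number of trading days in a year
GROSS = 0.001        # hypothetical 0.10% gross return per day
COST = 0.001         # hypothetical 0.10% round-trip cost per day

gross_total = compound([GROSS] * TRADING_DAYS)
net_total = compound([net_daily_return(GROSS, COST)] * TRADING_DAYS)
print(f"gross: {gross_total:.1%}, net: {net_total:.1%}")
```

With these made-up numbers, a strategy that looks like it earns more than 25% a year before costs earns nothing after them, which is why a portfolio that turns over completely every day is so sensitive to the spread.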
Also, the strategy which worked best was a long-short strategy, which is kind of like a market-neutral hedge fund strategy.
So for a hedge fund to implement this could be practical, I think, because they can go short quite easily on single stocks.
For retail investors, it's not easy to do that.
So we'd probably have to go for the long-only strategy, which, if you remember, didn't perform as well.
Another problem is the fact that everybody now knows about this model and how well it performs.
So alpha, remember, is outperformance: beating the market because you have some special insight into it.
And this is an insight which anybody can buy from OpenAI very cheaply.
So this thing is now in the hands of investors almost everywhere.
And what usually happens when news spreads about a strategy is that the performance of that strategy starts to diminish.
That's because the trades start to get crowded: everyone does the same thing, and some people will do it more quickly than others, so they'll probably eat everyone else's lunch.
So this gradual diminishing of the outperformance is called alpha decay, and it's something that hedge funds deal with all the time.
If they come up with a great strategy, gradually the outperformance of that strategy often wanes with time, and so they have to come up with a new strategy.
So personally, I'm not convinced that this thing is going to work for retail investors such as you and me.
If anybody's going to monetize it, it'll probably be hedge funds, which have lots of resources they can throw at this problem, and they're probably doing this kind of thing anyway.
So I think the results of the paper are really fascinating, because they show that the model can get inside the mind of investors and predict what their sentiment will be in response to a headline, in the same way as a human would.
Now, that is truly remarkable.
In terms of practical implementation of the strategy, I think it's not really practical for retail investors.
Maybe you could use it if you have variations on it, or you could use macroeconomic information which you'd feed to the model for longer term forecasts, but I don't think it's practical as it's presented in this paper.
Now, don't forget our offer: if you do want to join our membership and get access to all of the goodies, like our trackers for macroeconomic and market variables and members-only videos, and be able to chat to other members of the community, then just look at the link in the description below or go to pensioncraft.com.
And as always, thank you for listening.