【English Composition Practice Book】What is the “strawberry problem” of large language models such as GPT-4 and Claude?
While AI based on large language models (LLMs) can deliver impressive performance, it is often pointed out that such systems are also fragile: research has shown that they are easily misled by false premises, and that their ability to reason about arithmetic problems can fall short of an elementary school student's.
Machine learning engineer Chinmay Jog explains the “strawberry problem,” which illustrates a limit of AI's capabilities.
The ‘strawberrry’ problem: How to overcome AI’s limitations
https://venturebeat.com/ai/the-strawberrry-problem-how-to-overcome-ais-limitations/
Generative AI such as ChatGPT and Stable Diffusion can be used by anyone to impressive effect, writing sophisticated text and code and producing illustrations and photorealistic images.
However, image-generating AI, for example, suffers from a vulnerability known as the “one banana problem.”
Daniel Hook, CEO of the research technology company Digital Science, generated an image with the prompt “A single banana,” but the output showed two bananas in a bunch.
According to Hook, the model has likely absorbed a bias from its training data, in which bananas typically appear in bunches; because the AI does not actually understand what the prompt specifies, it is pulled toward the biases learned from its dataset.
Jog points to a similar problem facing LLMs that work with text: the “strawberry problem.”
Jog asked ChatGPT, “How many 'r's are contained in the English word strawberry?”
As anyone can see from spelling out “strawberry,” there are three “r”s in total: the third, eighth, and ninth letters.
However, ChatGPT responded that “the English word strawberry is composed of two 'r's.”
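The claim is easy to verify mechanically. A quick Python check (not part of the original article) lists the letter positions:

```python
# Enumerate the 1-based positions of every "r" in "strawberry".
word = "strawberry"
positions = [i + 1 for i, ch in enumerate(word) if ch == "r"]
print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```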
Jog then put the same question, “What is the number of r's in strawberry?”, to Claude, the LLM-based conversational AI developed by Anthropic.
Claude returned the same answer: “there are two r's in strawberry.”
In other instances, the LLM-based AI also makes mistakes when counting the “m” in “mammal” and the “p” in “hippopotamus.”
According to Jog, the strawberry problem stems from a feature of LLMs.
Almost all high-performance LLMs are built on the Transformer, a deep learning architecture published by researchers at Google and elsewhere.
A Transformer does not take input text directly; instead, it “tokenizes” the text into a numerical representation.
A token can be a complete word or a partial word.
For example, if the vocabulary contains a token for the whole word “strawberry,” it will be read as a single unit, but some words are split into a combination of tokens, such as “straw” and “berry.”
By breaking down the input into tokens, the model can more accurately predict which token comes next in the sentence.
Therefore, LLM-based AIs that operate on tokens are good at predicting the content of a sentence from context, but breaking words down into individual letters is difficult for them.
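The effect of tokenization can be illustrated with a toy greedy longest-match subword tokenizer. This is a simplified sketch with a made-up vocabulary, not a real LLM tokenizer (real vocabularies are built with algorithms such as BPE and contain tens of thousands of pieces), but it shows how a word dissolves into opaque pieces before the model ever sees it:

```python
# Toy greedy longest-match subword tokenizer with a tiny, invented vocabulary.
VOCAB = ["straw", "berry", "hippo", "pot", "amus", "mam", "mal"]

def tokenize(word):
    """Split `word` into the longest matching vocabulary pieces, left to right,
    falling back to single characters when nothing matches."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("strawberry"))    # ['straw', 'berry']
print(tokenize("hippopotamus"))  # ['hippo', 'pot', 'amus']
```

The model then works with the numeric IDs of pieces like “straw” and “berry”, so the fact that “berry” happens to contain two “r”s is never directly visible to it.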
Jog said, “This problem may not occur in model architectures where individual characters can be directly verified without tokenization. However, with the current Transformer architecture, this is not feasible.”
In light of the problem LLMs face, Jog also explains a workaround for the strawberry problem.
The trick is to instruct the AI to answer using programming-language code rather than interacting in ordinary prose.
Jog then prompted ChatGPT with “answer how many r s in strawberry using python. show the code, and the explanation.”
Using Python's count method, ChatGPT produced “Output = 3,” the correct number of “r”s.
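The core of that workaround is trivial to reproduce in plain Python. The snippet below is a sketch of what ChatGPT's generated code presumably did (the article does not show the exact code), using the built-in `str.count` method:

```python
# Count non-overlapping occurrences of "r" with str.count,
# the same built-in the article says ChatGPT's answer used.
word = "strawberry"
print(f"Output = {word.count('r')}")  # Output = 3
```

Because the counting is delegated to deterministic code rather than token-level prediction, the answer no longer depends on how the word was tokenized.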
Jog sums up: “This simple character-counting experiment exposes a fundamental limitation of LLMs. It shows that an LLM is a pattern-matching prediction algorithm over tokens, not an 'intelligence' capable of understanding or reasoning. Still, knowing in advance which prompts work well can mitigate the problem to some extent. As AI becomes integrated into our lives, recognizing its limitations is essential to using it responsibly and holding realistic expectations.”