
OpenAI’s Advanced AI Model Strawberry Capable of Methodical Problem-Solving

OpenAI achieved the last major leap in artificial intelligence, introducing GPT-4 last year, by expanding the scale of its models to dizzying proportions. The company has now disclosed a new development that signals a change in strategy: a model that can reason logically through many hard problems and is significantly smarter than existing AI without requiring a major scale-up.

The new model, OpenAI o1, can solve problems that stump existing AI models, including OpenAI’s most capable model, GPT-4o. It does not generate an answer in a single pass, as a large language model typically does. Instead, it reasons through the problem, effectively thinking aloud as a person might, before arriving at the correct result. The company says the new model, code-named Strawberry within OpenAI, is not a successor to GPT-4o but a complement to it.
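
To make the distinction concrete, here is a minimal sketch of how a developer might compare the two behaviors through the OpenAI Python SDK. It assumes the `openai` package is installed, an API key is set in the environment, and that the model identifiers `gpt-4o` and `o1-preview` are available on the account; the sample question is purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A train departs at 3:15 pm and arrives at 7:42 pm the same day. "
    "How many minutes does the trip take?"
)

# GPT-4o produces its answer in a single pass over the prompt.
fast = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# o1 spends extra "reasoning" tokens working through the problem before
# emitting a final answer, so it is slower but more reliable on
# multi-step problems.
reasoned = client.chat.completions.create(
    model="o1-preview",  # assumed model identifier; availability varies
    messages=[{"role": "user", "content": question}],
)

print("gpt-4o:", fast.choices[0].message.content)
print("o1:", reasoned.choices[0].message.content)
```

Note that o1’s intermediate chain of thought is not returned through the API; only the final answer is visible to the caller.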

LLMs typically generate their answers by feeding enormous quantities of training data into giant neural networks. They can exhibit remarkable linguistic and logical abilities, but they have historically struggled with seemingly simple tasks, such as basic math questions that require reasoning.

According to OpenAI, the new model performs significantly better on many problem sets, including those covering mathematics, chemistry, biology, physics, and coding. The company reports that GPT-4o solved an average of 12 percent of the problems on the American Invitational Mathematics Examination (AIME), a test for math students, while o1 answered 83 percent of the questions correctly. The new model is slower than GPT-4o, and OpenAI says it does not always perform better. It is also more limited in other respects: unlike GPT-4o, it is not multimodal, meaning it cannot parse images or audio, and it cannot search the web.

Improving the reasoning abilities of LLMs has been a prominent topic in the research community for some time, and competitors are pursuing similar work. In July, Google unveiled AlphaProof, a project that combines language models with reinforcement learning to tackle difficult math problems.

Industry Comments

Mira Murati, OpenAI’s chief technology officer, says OpenAI o1 uses reinforcement learning, which gives the model positive feedback when it gets answers right and negative feedback when it does not, to improve its reasoning process. “The model sharpens its thinking and fine-tunes the strategies that it uses to get to the answer,” she says. Reinforcement learning has enabled computers to play games with superhuman skill and to do useful tasks like designing computer chips. The technique is also a key ingredient in turning an LLM into a useful and well-behaved chatbot.
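
The feedback loop Murati describes can be illustrated with a deliberately tiny, self-contained sketch, not OpenAI’s actual training setup: a toy policy chooses between two answering strategies on simple addition problems and receives a reward of +1 for a correct answer and -1 otherwise. The strategy names, reward values, and epsilon-greedy update rule are illustrative assumptions.

```python
import random

# Toy reinforcement-learning loop: the "policy" chooses among candidate
# answering strategies and is rewarded when its answer is correct.
strategies = {
    "guess":        lambda a, b: random.randint(0, 20),  # answers without reasoning
    "step_by_step": lambda a, b: a + b,                  # works the problem out
}
values = {name: 0.0 for name in strategies}  # estimated value of each strategy
counts = {name: 0 for name in strategies}

for episode in range(2000):
    a, b = random.randint(0, 10), random.randint(0, 10)
    # Epsilon-greedy choice: mostly exploit the best strategy, sometimes explore.
    if random.random() < 0.1:
        name = random.choice(list(strategies))
    else:
        name = max(values, key=values.get)
    answer = strategies[name](a, b)
    reward = 1.0 if answer == a + b else -1.0  # positive feedback when correct
    counts[name] += 1
    # Incremental average: nudge the strategy's estimated value toward the reward.
    values[name] += (reward - values[name]) / counts[name]

print(values)
```

After a few hundred episodes the estimated value of "step_by_step" approaches +1 while "guess" sinks toward -1, so the policy overwhelmingly selects the strategy that reasons the problem out, which is the essence of rewarding correct answers.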

According to Murati, OpenAI is also developing its next flagship model, GPT-5, which will be significantly larger than its predecessor. The company still believes that scale will continue to improve AI’s capabilities, so GPT-5 is likely to incorporate the reasoning technology introduced today. “There are two paradigms,” Murati says: “The scaling paradigm and this new paradigm.” She expects the company will bring them together.

AlphaProof learned to reason about math problems by studying correct answers. A key challenge in broadening this kind of learning is that correct answers do not exist for everything a model might encounter. Mark Chen, OpenAI’s vice president of research, says the company has succeeded in building a reasoning system that is much more general. “I do think we have made some breakthroughs there; I think it is part of our edge,” Chen says. “It’s actually fairly good at reasoning across all domains.”

Noah Goodman, a professor at Stanford who has published work on improving the reasoning abilities of LLMs, says the key to more generalized training may involve using a “carefully prompted language model and handcrafted data” for training. He adds that being able to consistently trade the speed of results for greater accuracy would be a “nice advance.”
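
One well-known way to trade speed for accuracy, though not necessarily the approach Goodman or OpenAI have in mind, is self-consistency: sample several independent reasoning chains and keep the majority answer. The sketch below simulates that trade-off with a stand-in “solver” that is right 60 percent of the time; the solver, its accuracy, and the answer values are all assumptions for illustration.

```python
import random
from collections import Counter

def noisy_solver(correct_answer: int, accuracy: float = 0.6) -> int:
    """Stand-in for a model: returns the right answer with probability
    `accuracy`, otherwise a nearby wrong one."""
    if random.random() < accuracy:
        return correct_answer
    return correct_answer + random.choice([-2, -1, 1, 2])

def majority_vote(correct_answer: int, samples: int) -> int:
    # Sample several independent "reasoning chains" and keep the most
    # common final answer.
    votes = Counter(noisy_solver(correct_answer) for _ in range(samples))
    return votes.most_common(1)[0][0]

# More samples cost more time but raise the odds the majority answer is right.
for k in (1, 5, 25):
    trials = 5000
    hits = sum(majority_vote(42, k) == 42 for _ in range(trials))
    print(f"{k:>2} samples per question -> {hits / trials:.1%} accuracy")
```

Each extra sample multiplies latency, which is why a consistent, tunable version of this trade is the kind of “nice advance” Goodman describes.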
