ChatGPT Is Losing Intelligence, a Recent Study Suggests


A lively debate has arisen over the proficiency of ChatGPT, specifically the GPT-3.5 and GPT-4 models behind it. These two models have dominated the market as highly sought-after language model services.

However, a new study comparing the March 2023 and June 2023 versions of these models has raised concerns that ChatGPT's capabilities are declining, prompting the question: “Is ChatGPT losing its edge?”

ChatGPT Updates Lag Behind Older Versions

Researchers from Stanford University and the University of California, Berkeley, examined how the March and June 2023 versions of ChatGPT performed across a range of tasks. Their evaluation revealed significant inconsistencies in performance over that three-month period.

This variability not only raises eyebrows but also highlights the need to continuously monitor and improve the quality of this technology.

“Our findings demonstrate that the behavior of the ‘same’ large language model (LLM) service can vary significantly in a relatively short amount of time,” states the report.

Comparison of Performance between GPT-4 and GPT-3.5. Source: arXiv

Looking at the specifics, GPT-4’s performance on one type of math problem, identifying prime numbers, declined drastically.

In March, GPT-4 identified prime numbers with an impressive 97.6% accuracy, but by June its accuracy had plummeted to 2.4%. GPT-3.5, on the other hand, improved markedly over the same period, rising from 7.4% to 86.8%.
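To make accuracy figures like these concrete, here is a minimal sketch of how a yes/no primality task might be scored. It assumes the model's final answers have already been extracted as "yes"/"no" strings. The helper names and the sample numbers are hypothetical and are not taken from the study's own evaluation code.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality by trial division (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def primality_accuracy(numbers: list[int], answers: list[str]) -> float:
    """Hypothetical helper: fraction of "yes"/"no" answers matching ground truth."""
    if len(numbers) != len(answers):
        raise ValueError("numbers and answers must have the same length")
    if not numbers:
        raise ValueError("need at least one question")
    correct = 0
    for n, answer in zip(numbers, answers):
        predicted_prime = answer.strip().lower() == "yes"
        if predicted_prime == is_prime(n):
            correct += 1
    return correct / len(numbers)


# Hypothetical example: a model that answers "yes" to every question
# gets 15 wrong (it is composite) and the other three right.
numbers = [17077, 7, 15, 101]
answers = ["yes", "yes", "yes", "yes"]
print(f"accuracy: {primality_accuracy(numbers, answers):.1%}")
```

The example also shows a caveat: on a question set made up mostly of primes, a model that always answers "yes" can score well without doing any real reasoning. The balance of the question set therefore matters when you read a single accuracy number.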

The stark contrasts have left observers perplexed, since newer versions are generally expected to outperform their predecessors. The discrepancy raises questions about the true impact of “updates” and “improvements” on the models’ capabilities.

Lack of Detailed Explanation and Code Generation

The study also uncovered another intriguing pattern when sensitive questions were posed. From March to June, GPT-4 answered such queries directly far less often, which suggests a strengthened safety mechanism.

However, when it declined to answer, it also offered noticeably less explanation. This has sparked speculation about whether the model is erring on the side of caution at the expense of user engagement and clarity.

Comparison of Verbosity between GPT-4 and GPT-3.5. Source: arXiv

However, it’s not all doom and gloom. The study identified an area where GPT-4, and to some extent GPT-3.5, showcased marginal improvement: visual reasoning. Although the overall success rates are still relatively low, there are signs of progress in their performance.

What truly stands out is the unpredictability of this technology. GPT-4’s ability to generate directly executable code also declined. This is a concern for industries that rely heavily on these models, because such inconsistencies can seriously disrupt large software ecosystems.
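The article does not say why GPT-4's executable-code rate fell. As a hedged illustration, the sketch below shows a lightweight guard a developer might place in front of generated code. It assumes one possible failure mode: the model wraps otherwise valid code in a markdown code fence. The sketch strips such a fence and then checks that the remaining text at least parses as Python. The helper names are hypothetical, and only the standard library is used.

```python
import ast
import re

# Hypothetical pattern: a whole response wrapped in one markdown fence,
# e.g. ```python ... ``` (the language tag is optional).
_FENCE = re.compile(r"^\s*```[\w+-]*\s*\n(.*?)\n\s*```\s*$", re.DOTALL)


def extract_code(response: str) -> str:
    """Hypothetical helper: return the code inside a surrounding fence, if any."""
    match = _FENCE.match(response)
    return match.group(1) if match else response


def is_valid_python(source: str) -> bool:
    """Check that the source parses; this does not execute it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


raw = "```python\ndef add(a, b):\n    return a + b\n```"
print(is_valid_python(raw))                # the fence itself is not Python
print(is_valid_python(extract_code(raw)))  # the unwrapped code parses
```

A successful parse is only a cheap first filter. It does not show that the code behaves correctly, so a real pipeline would still need tests that run it.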

Complacency Is Not an Option

The critical lesson from this in-depth analysis is not the specific fluctuations in GPT-4 and GPT-3.5’s performance but the broader point that AI performance is not guaranteed to improve over time.

As technology continues to advance rapidly, there is a common assumption that newer models will inevitably surpass their predecessors. However, this study challenges that notion.

The message for businesses and developers who heavily depend on ChatGPT is to regularly monitor and evaluate these models. The study serves as a stark reminder that advancements in AI are not always linear.
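For teams following this advice, here is a minimal sketch of a regression check between two model snapshots. It assumes you already run a fixed benchmark against each snapshot and store one accuracy score per task. The helper names, the 5-percentage-point tolerance, and the second task are hypothetical choices for illustration. Only the prime-number figures come from the study as reported above.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    task: str
    before: float
    after: float

    @property
    def drop(self) -> float:
        return self.before - self.after


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,  # hypothetical threshold: 5 percentage points
) -> list[Regression]:
    """Hypothetical helper: tasks whose accuracy fell by more than `tolerance`."""
    regressions = []
    for task, before in baseline.items():
        after = candidate.get(task)
        if after is None:
            continue  # task was not evaluated on the new snapshot
        if before - after > tolerance:
            regressions.append(Regression(task, before, after))
    # Report the largest drops first.
    return sorted(regressions, key=lambda r: r.drop, reverse=True)


# Per-task accuracy for two snapshots; "hypothetical_task" is made up.
march = {"prime_numbers": 0.976, "hypothetical_task": 0.80}
june = {"prime_numbers": 0.024, "hypothetical_task": 0.78}

for r in find_regressions(march, june):
    print(f"{r.task}: {r.before:.1%} -> {r.after:.1%}")
```

The check uses an absolute tolerance rather than a relative one so that small fluctuations on already-weak tasks are not flagged as alarms. Running it on every model update, or on a schedule, is what turns "regularly monitor" into a concrete practice.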

Global Companies Utilizing ChatGPT. Source: Statista

The assumption that newer is inherently better may be an oversimplification that the tech community needs to address. The erratic behavior of GPT-4 and GPT-3.5 over just a few months underscores the need to stay vigilant and to keep evaluating and recalibrating these models, so that the technology reliably serves its intended purpose.


