Artificial intelligence has become the centrepiece of global business narratives, investor decks, and public‑sector strategy documents. Market forecasters predict multi‑trillion‑dollar impacts,[1] venture capital continues to pour billions into AI infrastructure,[2] and major vendors routinely describe breakthroughs that appear to signal dramatic progress. The momentum feels undeniable.
Yet when we examine the evidence behind these claims, a different picture emerges. The gap between what AI appears to do and what it can reliably achieve has widened sharply over the past three years. Organizations deploying AI discover inconsistent behaviour, hidden operational costs, and reliability limitations that contradict the confident messaging surrounding the field.[3] The result is an overhype cycle, built less on validated performance and more on selective disclosure and optimistic interpretation.
Understanding this cycle is essential. AI is moving into healthcare, finance, public services, and critical infrastructure. If expectations are inflated while limitations remain poorly understood, organizations risk financial loss, compliance exposure, and operational breakdowns. The real danger is not that AI is weak, but that it is misunderstood.
Marketing as acceleration
Much of the hype originates from the vendors who create the systems. Announcements often rely on proprietary evaluations that cannot be independently reproduced. When OpenAI introduced GPT‑4, only selected benchmark results were released.[4] Google’s Gemini Ultra launch was accompanied by a promotional video later revealed to have been edited to enhance the appearance of real‑time reasoning.[5] These promotional materials shape investor perception long before rigorous assessment is possible.
Benchmark illusions
Benchmarks dominate AI marketing, yet they rarely capture real‑world performance. Research from Stanford’s Center for Research on Foundation Models (CRFM) documented year‑over‑year degradation in benchmark scores for multiple leading AI models.[6] This occurred not because the models had worsened, but because the benchmarks failed to represent real‑world complexity. Most measure pattern reproduction, not reasoning, consistency, or truth verification.
The expectations gap
A persistent belief is that AI is close to general reasoning. Evidence does not support this. Harvard researchers demonstrated that LLMs fail to maintain coherent reasoning chains across slightly varied prompts.[7] What appears as reasoning is often prompt‑dependent pattern generation.
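This prompt sensitivity is something organizations can probe for themselves before relying on a model. The sketch below is a minimal, illustrative Python example, not a definitive test: it sends several paraphrases of the same question to a model and reports how often the normalised answers agree. The query_model function is a placeholder for whatever model API is actually in use, and the paraphrase set is assumed purely for illustration.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under test.

    Replace this stub with a real client call (an OpenAI-compatible SDK,
    an in-house endpoint, etc.). The canned answer below exists only so
    the sketch runs end to end.
    """
    return "5 pm"

# Semantically equivalent phrasings of one question; a model with stable
# reasoning should answer all of them the same way.
paraphrases = [
    "A train leaves at 3 pm and travels 120 km at 60 km/h. When does it arrive?",
    "Departing at 3 pm, a train covers 120 km at a constant 60 km/h. What is its arrival time?",
    "If a train sets off at 3 pm and travels 120 km at 60 km/h, at what time does it arrive?",
]

def consistency_rate(prompts: list[str]) -> float:
    """Fraction of responses that match the most common normalised answer."""
    answers = [query_model(p).strip().lower() for p in prompts]
    most_common = Counter(answers).most_common(1)[0][1]
    return most_common / len(answers)

print(f"Agreement across paraphrases: {consistency_rate(paraphrases):.0%}")
```

A low agreement rate does not prove the answers are wrong, but it is a cheap signal that apparent reasoning is prompt‑dependent rather than robust.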
Many organizations expect AI to reduce workload. Instead, more than half of surveyed organizations reported increased verification demands.[8] Output must be checked for accuracy, bias, and compliance, shifting labour rather than eliminating it.
In high‑stakes environments, accuracy cannot be assumed. Studies published in JAMA Internal Medicine found that LLMs generated clinically unsafe recommendations in more than 25% of cases.[9] In law, a New York case confirmed that ChatGPT had fabricated citations that initially passed casual review.[10]
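Part of that verification burden can be automated, though not eliminated. The following Python sketch illustrates one narrow slice of it under assumed inputs: a drafted document is scanned for case citations, and anything not found in a locally maintained list of independently verified authorities is flagged for review. Both the citation pattern and the verified set are illustrative assumptions, not a real legal database or API.

```python
import re

# Illustrative only: in practice this would query an authoritative legal
# database rather than a hard-coded set.
VERIFIED_CITATIONS = {
    "smith v. jones, 123 f.3d 456 (2d cir. 1999)",
}

# Rough pattern for "Name v. Name, volume reporter page (court year)" citations.
CITATION_PATTERN = re.compile(
    r"[A-Z][\w.'-]+ v\. [A-Z][\w.'-]+, \d+ [\w.]+ \d+ \(.*?\d{4}\)"
)

def unverified_citations(draft: str) -> list[str]:
    """Return citations in the draft that are absent from the verified list."""
    found = CITATION_PATTERN.findall(draft)
    return [c for c in found if c.lower() not in VERIFIED_CITATIONS]

draft = (
    "As held in Smith v. Jones, 123 F.3d 456 (2d Cir. 1999), and again in "
    "Doe v. Acme, 999 F.2d 111 (9th Cir. 1993), the duty applies."
)

for citation in unverified_citations(draft):
    print(f"FLAG for human review: {citation}")
```

Anything flagged still needs a human check; the point is that the labour shifts toward verification rather than disappearing.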
Structural causes of overhype
The widening gap between perception and reality is driven by several forces. Vendors rarely disclose failure modes. Scaling laws that once predicted steady improvement now show diminishing returns.[11] Companies, researchers, and investors all have an incentive to present AI as advancing more rapidly than it is. Media outlets amplify these claims, often without verification; more than 300 news stories repeated “sparks of artificial general intelligence” claims from a single non‑peer‑reviewed paper.[12]
Impact on business and government
The consequences are significant. A 2024 McKinsey analysis concluded that fewer than 15% of generative‑AI pilots created measurable operational improvement.[13] Compliance requirements continue to tighten, especially under the European Union AI Act, which imposes fines for using systems without adequate verification.[14] Organizations expecting automation instead face escalating oversight burdens.
Separating signal from noise
Responsible adoption requires independent validation. Organizations should demand transparent methodology, reproducible tests, and explicit disclosure of reliability boundaries. The correct question is not “What can the model do?” but “What can the model do consistently under real‑world variation?”
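What that question looks like in practice can be sketched in a few lines. The Python example below is offered as an illustration under assumed inputs, not a definitive method: it scores a model on a small set of ground‑truth items twice, once on the original prompts and once on lightly perturbed versions, then reports the gap. Here query_model, perturb, and the test items are all placeholders to be replaced with a real endpoint and a real evaluation suite.

```python
import random

def query_model(prompt: str) -> str:
    """Placeholder for the model under evaluation; replace with a real API call."""
    return "42"  # canned response so the sketch runs end to end

def perturb(prompt: str, rng: random.Random) -> str:
    """Cheap surface-level variation: prepend filler phrasing.

    Real evaluations would use paraphrase sets, typos, or domain-shifted inputs.
    """
    fillers = ["Please answer precisely: ", "Quick question: ", "For our records, "]
    return rng.choice(fillers) + prompt

def accuracy(items: list[tuple[str, str]], transform=None) -> float:
    """Exact-match accuracy over (prompt, expected answer) pairs."""
    rng = random.Random(0)
    correct = 0
    for prompt, expected in items:
        p = transform(prompt, rng) if transform else prompt
        if query_model(p).strip().lower() == expected.lower():
            correct += 1
    return correct / len(items)

# Each item is (prompt, expected answer); a real suite would be far larger.
test_items = [
    ("What is 6 times 7?", "42"),
    ("How many days are in a leap-year February?", "29"),
]

clean = accuracy(test_items)
perturbed = accuracy(test_items, transform=perturb)
print(f"Clean accuracy:     {clean:.0%}")
print(f"Perturbed accuracy: {perturbed:.0%}")
print(f"Robustness gap:     {clean - perturbed:+.0%}")
```

Publishing both numbers, rather than only the clean score, is one concrete way to disclose a reliability boundary.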
AI’s value grows when its limitations are clearly understood. The technology is powerful within its constraints, but hype obscures those constraints, leading to misuse.
Conclusion
AI is not failing, but the public narrative surrounding it often is. The technology has substantial strengths, yet those strengths do not resemble the sweeping claims commonly found in marketing materials and media coverage. By grounding adoption in evidence rather than expectation, organizations can unlock real value while avoiding costly missteps. Progress depends not on speculation, but on accuracy.
References
[2] PitchBook Emerging Technology Report
[4] OpenAI GPT‑4 Technical Report
[5] The Verge – Google Gemini demo clarification
[6] Stanford CRFM Regression Study
[7] Harvard NLP Reasoning Evaluation
[8] Salesforce Data & AI Report
[9] JAMA Internal Medicine – study on LLM clinical recommendations
[10] New York Times – ChatGPT legal citation case
[11] Epoch AI Scaling Laws Report
[12] Nature – “The myth of sparks of AGI”
[13] McKinsey Global Survey on AI
[14] European Union Artificial Intelligence Act
(Mark Jennings-Bates, BIG Media Ltd., 2025)