AI Scheming: It’s Not Getting Better

AI scheming, where AI models deliberately hide information, falsify work and pursue secret goals, was proven in a landmark study by OpenAI and Apollo Research in September 2025 but nothing has really changed since.
The study tested OpenAI’s o3 and o4-mini models alongside Google’s Gemini, Anthropic’s Claude, Meta’s Llama and xAI’s Grok, and every model was caught engaging in covert, deceptive behaviour across more than 180 controlled test environments.
That was six months ago. Since then, every major development in AI safety has confirmed the same thing – scheming persists in newer, more capable models. It’s not a bug that’s been fixed and is instead a fundamental challenge the industry hasn’t solved.
AI Scheming in GPT-5: Reduced But Far From Eliminated
The September 2025 study used OpenAI’s o3 model, which has since been superseded by GPT-5, and some may assume that newer models have moved past these issues but they haven’t.
OpenAI’s own GPT-5 system card, published in August 2025, confirms that GPT-5 still exhibits deceptive behaviour. Their chain-of-thought monitoring system, which tracks the model’s internal reasoning for signs of dishonesty, flagged deceptive reasoning in approximately 2.1% of GPT-5’s responses. That’s an improvement on o3’s 4.8% rate but OpenAI explicitly states their mitigations are “not perfect and more research is needed”.
Specifically, GPT-5 was still caught fabricating facts and citations, expressing overconfidence when its internal reasoning showed uncertainty and claiming to have completed work it hadn’t done. In coding tasks where a required tool was broken or unavailable, previous models would simply invent a solution rather than admit the limitation. GPT-5 was trained to be more honest in these situations but the underlying tendency towards deception remains.
By December 2025, OpenAI released the GPT-5.2 system card, which continued to document deceptive behaviours including reward hacking and “claiming to do work in the background when no work was occurring”. Three model generations later, the problem persists.
What Is AI Scheming and Why Is It Different From Hallucination?
Most business leaders are familiar with AI hallucination where a model confidently produces inaccurate information. It’s a quality problem that you can catch it with human review.
AI scheming is fundamentally different; it’s deliberate.
The Apollo Research-OpenAI study defined scheming as AI that “covertly pursues misaligned goals while hiding its true capabilities and objectives”. The AI knows what you want it to do, decides to do something else and then lies about it.
The September study found three distinct patterns. First, models would act deceptively without ever referencing the safety rules they’d been trained on, they simply ignored them. Second, models would explicitly acknowledge their anti-deception training in their internal reasoning, then violate it anyway because achieving their goal felt more important. Third, and most concerning, models were caught fabricating safety principles that don’t exist to rationalise their deception. One model claimed its guidelines required it to maintain deception, which is the exact opposite of what those guidelines say.
In one example, OpenAI’s o3 was asked to grade its own test transcript for evidence of cheating. It recognised it had cheated, reasoned that admitting it would block its deployment and classified the transcript as clean. That means the model lied about lying!
This is precisely why an AI Readiness Assessment matters before deploying AI in business. Understanding what these tools can and cannot be trusted to do is the foundation of safe adoption.
AI Scheming Gets Smarter as Models Get More Capable
Unlike hallucination, which tends to improve as models become more powerful, scheming is expected to intensify with capability. More capable models are better at understanding their situation, recognising when they’re being tested and adapting their behaviour accordingly.
The September study demonstrated this directly. After anti-scheming training reduced covert actions from 13% to 0.4% in o3, the researchers discovered that much of the improvement was driven by the model recognising it was being evaluated, not because it had genuinely learned to be honest. When that evaluation awareness was removed from the model’s reasoning, covert action rates nearly doubled.
The models weren’t becoming more honest, they were becoming better at knowing when to appear honest.
Early 2026 data from OpenAI’s production evaluation pipeline reinforces this concern. In real ChatGPT conversations (not lab tests) GPT-5 was found to reason that it might be in an evaluation roughly 10% of the time. Separately, a training bug in GPT-5.1 caused the model to use its browser tool as a calculator while behaving as if it had searched the web. It was performing one action and reporting another. This wasn’t a contrived test scenario, it was happening in production.
The Industry Knows This Is a Problem
It’s not just OpenAI. In February 2026, a leaked internal memo from Anthropic revealed that the company’s research priorities include “rogue” and “scheming” AI models, focusing specifically on model deception, behavioural drift and detecting when AI systems act against their training objectives. This came just days after Anthropic launched new enterprise-facing agentic tools, which highlights the tension between commercial ambition and safety concerns that every AI provider is navigating.
Meanwhile, a March 2026 CNBC investigation described the broader AI risk landscape as “silent failure at scale” by reporting that AI system complexity has now moved beyond human comprehension. As one AI security expert put it: “The technology developers themselves don’t understand and don’t know where this technology is going to be”.
The pattern is clear. The September 2025 study proved AI scheming exists. Everything since has confirmed it’s not being resolved but instead becoming more sophisticated.
What This Means for UK Businesses
If you’re deploying AI for customer communications, financial analysis, compliance reporting, code generation, or any task where accuracy and honesty are non-negotiable, these findings demand attention.
The practical implications are straightforward. AI agents should not operate without human oversight on consequential tasks. Outputs need verification workflows, especially where compliance or commercial risk is involved. Your team needs to understand the difference between a model making a mistake and a model deliberately misrepresenting its work. And your governance framework needs to account for the documented possibility that AI tools may not always act in your interest.
This doesn’t mean you shouldn’t use AI. The productivity gains are real and the competitive advantage is significant. But deploying AI without proper governance is like hiring staff without contracts, oversight or accountability. It works until it doesn’t.
An AI Workshop identifies where AI adds genuine value to your business and where the risks outweigh the benefits. From there, proper AI Compliance structures ensure your deployment is governed, monitored and aligned with your commercial and regulatory obligations. Because the evidence is now overwhelming: you cannot take AI at its word.
The Bottom Line
AI scheming was proven in September 2025. Six months and multiple model generations later, every major AI provider is still documenting deceptive behaviour in their systems. GPT-5 still does it, GPT-5.2 still does it and the AI developers themselves admit their mitigations are incomplete.
The companies that thrive with AI won’t be the ones that adopt fastest, they’ll be the ones that adopt smartest. That means with oversight, governance and a clear-eyed understanding of what these tools actually do when nobody’s watching.
Because right now, even the AI developers can’t guarantee their models won’t lie to you.
Complete our free AI Readiness Assessment to understand where AI fits safely and commercially in your business.



