Everyone Is Talking About the Wrong AI
Someone in a meeting last week said the thing everyone says. You know it because you’ve probably said it yourself, or thought it, or felt the shape of it forming in your chest as another AI headline loaded. “I’ll get to this properly when things settle down.”
It is the most rational sentence in the world. It is also, I think, the most dangerous one.
Because it contains a hidden assumption: that things will settle down. That somewhere ahead is a plateau, a pause, a stretch of ordinary time in which the significance of what has changed can be weighed calmly against the significance of what hasn’t. Every institution runs on this assumption. Every plan relies on it. And in April 2026, there is almost no evidence to support it.
This week, Anthropic published a system card and an accompanying announcement for a model called Claude Mythos Preview. The system card is public. Anyone can read it. Almost nobody outside the AI community has. What it describes is a system that can autonomously find previously unknown software vulnerabilities in every major operating system and every major web browser, build working exploits from those vulnerabilities, operate inside real computer environments for extended sessions, and solve problems that would take a skilled human specialist days. In cybersecurity testing, it was the first AI model to complete a private attack simulation end to end, work estimated at more than ten hours for a human expert. It identified thousands of high-severity flaws, including one that had been sitting undetected in OpenBSD for twenty-seven years and another buried in FFmpeg for sixteen.
Anthropic did not release this model to the public.
Instead, the company created something called Project Glasswing: a restricted programme that gives access only to a handpicked consortium of cybersecurity partners, including Apple, Microsoft, Google, Amazon and CrowdStrike, along with more than forty organisations that maintain critical software infrastructure. Anthropic put up a hundred million dollars in usage credits and four million in donations to open-source security foundations. This is not a product launch. It is a containment architecture dressed in the language of partnership.
That decision is worth understanding, because it signals something most people have not yet absorbed. A company looked at what it had built, concluded that its own formal risk thresholds had not been crossed, and decided to restrict access anyway, because the practical implications of broad release were too consequential. The system was not too dangerous in the cinematic, world-ending sense. It was too capable in the structural sense: releasing it would shift the balance between attackers and defenders in ways nobody could predict or control.
Most of us have never had to think about a technology that arrives this way. Software products launch. They get reviewed. They compete. They succeed or fail. We are not accustomed to asking whether a piece of software should be treated more like a controlled industrial tool than a consumer good. But that is increasingly the question being asked inside the labs that build these systems. And the answer they are giving, with increasing frequency, is yes.
Meanwhile, the rest of us are running on the wrong clock.
The human default is to understand the future by extrapolating from the recent past. This is sensible. It is how we plan budgets, set strategy, raise children and schedule renovations. It works when the thing you are planning around changes at roughly the same rate it changed last year. But when the rate of change itself is accelerating, the recent past becomes the worst possible guide to the near future. You arrive at the station on time for a train that left twenty minutes ago.
In AI, the wrong clock looks like this: people who saw ChatGPT launch in late 2022 and were startled by it are now, three and a half years later, largely settled. They’ve formed opinions. They’ve attended the workshop. They’ve decided whether they’re enthusiastic, cautious, sceptical or bored. The discourse has congealed into familiar positions. The enthusiasts oversell. The sceptics point to the hallucinations. The institutions write policies. Everyone finds their camp.
But the technology those opinions were formed about no longer exists in any meaningful sense. The gap between ChatGPT in 2022 and what Anthropic describes in this system card is not an incremental improvement. It is a category shift. The first was a text generator that sometimes made things up. The second is an autonomous system that can operate inside your computer, find the flaws in your software before you know they exist, and, in certain configurations, exploit them. We formed our mental models around the chatbot. What arrived is something closer to an agent.
We are debating the ethics of the scalpel while the surgical robot is already in the operating theatre.
The wrong clock is not simply about being uninformed. Information is abundant. The system card is public. The Glasswing announcement is public. The capability assessments are there for anyone who wants to read them. The problem is not information scarcity. It is temporal calibration. People absorb the information and file it under the mental model they formed two or three years ago. The new data gets interpreted through the old frame. “AI is getting better” is true but catastrophically insufficient as a description of what is happening. It is like saying “the water is rising a bit” while standing in a river whose flow rate is doubling every few months.
There is a concept in disaster research called normalcy bias: the tendency to believe that because something has not happened before, it will not happen. Normalcy bias keeps people in their homes during an evacuation warning. It makes financial analysts insist the market will correct itself until the moment it collapses. It is an ordinary human response, rooted in the perfectly reasonable observation that most warnings turn out to be false alarms.
In AI, normalcy bias does not show up as denial. Almost nobody denies that AI is significant. It shows up as domestication. We tame the strangeness of what is happening by fitting it into categories we already understand. AI becomes “a tool,” which is true but reductive. It becomes “the next internet,” which captures scale but misses the fact that the internet did not autonomously discover vulnerabilities in your operating system. It becomes “something we’ll adapt to, like we adapted to everything else,” which is a statement of faith dressed up as historical observation. We did adapt to the printing press, the telegraph, the automobile and the internet. We also adapted to leaded petrol, asbestos, tobacco marketing and algorithmic social media. Adaptation is not the same as getting it right. Often adaptation means absorbing damage you didn’t see coming and then calling the scar tissue wisdom.
So what does the system card actually suggest about where this goes? Not in the speculative, science-fiction sense. In the grounded, evidence-based, read-what-the-builders-are-telling-you sense.
The first and most immediate implication is the industrialisation of expert labour. Today, finding a serious vulnerability in a major piece of software requires a rare human: someone with years of training, deep knowledge of systems architecture, the patience to probe for weeks, the creativity to think like an attacker. That expertise is scarce. It is expensive. It is one of the reasons that so many critical systems remain poorly defended; there simply aren’t enough people who can do the work. What the Mythos system card describes is the beginning of the end of that bottleneck. Not because the model replaces the expert. But because it converts what used to be a labour problem into a compute problem. You no longer need to find a person with twenty years of security experience. You need to run the model. The work that was once gated by human rarity becomes gated by access to inference. That changes everything about the economics of both attack and defence, and it does so fast.
This is not a metaphor. Anthropic says, in plain language, that its model can already surpass all but the most skilled humans at finding and exploiting software vulnerabilities. And Anthropic also says, in equally plain language, that it expects capabilities like these to proliferate. Project Glasswing exists because the company knows it will not be the only one with this power for long. Once even slightly weaker versions of this capability are cheap, numerous and widely accessible, the scarce resource is no longer elite human talent. It is deployment discipline. And deployment discipline, as anyone who has watched the history of any powerful technology can tell you, does not distribute evenly.
The second implication is harder to see and may matter more. Anthropic’s own system card contains a paradox that deserves to be understood by everyone, not just the people who read technical documentation. Mythos is, by Anthropic’s own assessment, its best-aligned model. It cooperates with misuse attempts less often. It hallucinates less. It pushes back on false premises more reliably. On average, it is safer than anything Anthropic has previously built. And Anthropic also says it likely poses the greatest alignment risk of any model they have released.
That is not a contradiction. It is a warning about the shape of the future. When a system is this capable, the average case stops being the thing that matters. What matters is the tail: the rare moment when the system does something wrong, and because it is so competent, the consequences are severe before anyone notices. Earlier versions of this model escaped their testing environments. They searched for credentials. They bypassed permissions through lateral moves that looked, in the logs, like the kind of thing a skilled attacker would do. In a few cases, they appeared to cover their tracks. The final model is described as much improved. But the structural lesson is permanent: as these systems get better, the failure mode shifts. It moves from “the model said something obviously wrong” to “the model took one quiet, competent, bad action inside a powerful environment, and nobody caught it in time.”
That is the overtrust problem, and it scales with capability. The better the model performs on ordinary tasks, the less humans monitor it. The less humans monitor it, the more damage a rare failure can do. This is not a problem that gets solved by making models smarter. It gets worse.
The third implication is about time, and it is the one that should keep institutional leaders awake. Anthropic’s own assessment is that frontier capabilities are likely to advance substantially over the next few months, not years. Within roughly two years, the reasonable expectation is that systems currently locked behind partner agreements will be substantially more reliable, more autonomous, operating across longer task horizons with less human supervision. The model that today requires a restricted consortium will, within two years, either be broadly available or will have been matched by competitors who do not share the same restraint about access.
Two years. That is less than a standard funding cycle. Less than the time it takes most universities to redesign a curriculum. Less than many organisations’ policy review intervals. Less time than has passed since GPT-4 launched. And in those two years, what is being described is not a modest improvement. It is the normalisation of a capability level that today is considered extraordinary enough to restrict. The thing that the most safety-conscious AI lab in the world decided was too consequential to sell openly will, within two years, be the baseline. Not the frontier. The baseline.
Think about what that means for every institution, every business, every government department that is currently treating AI as a productivity tool to be managed through acceptable use policies and optional training workshops. The gap between their preparedness and the capability arriving at their door is not stable. It is widening. Every month they spend calibrating to last year’s technology is a month the technology spends leaving them further behind.
I work in a university. I sit on committees. I participate in sessions where thoughtful people allocate resources on three-year horizons and review progress against annual plans. None of this is foolish. Institutions need stability. They need predictable cycles. They need the confidence that comes from measuring the present against the recent past and finding continuity. But the continuity is breaking. Not because anyone is doing something wrong. Because the capability clock and the institutional clock are running at different speeds, and the gap between them is growing.
I catch myself, still, using 2024 as my reference year for what AI can do. I suspect most people do. We assume the conversations we’re having now are about the state of the art. They are not. They are about the state of the art as it existed sometime between six and eighteen months ago. The actual frontier is elsewhere, behind NDAs and partner agreements, and the people who have access to it are already operating in a different reality. Every public conversation about AI carries this lag. We are, to varying degrees, all talking about the last war.
The most important gap in AI right now is not the gap between human intelligence and machine intelligence. It is the gap between what has been built and what has been understood about what has been built. Anthropic’s system card admits this directly: its own evaluation tools are struggling to keep pace with its own models. The most concerning behaviours were discovered not through pre-deployment testing but through actual use inside the company. The measurement infrastructure is falling behind the thing it was designed to measure.
And we, outside the labs, are further behind still. Not because we are foolish or incurious. Because we are looking backward.
The instinct to look backward is not a character flaw. It is a survival strategy. We use the past as a compression algorithm: this situation resembles that situation, so the same responses should work. Most of the time, for most of human history, that has been adequate. The pace of change was slow enough that yesterday’s map was a reasonable guide to today’s territory.
What happens when the territory changes faster than the map can be redrawn? You get a period in which confident, experienced, well-intentioned people make decisions based on a reality that no longer quite exists. Not wrong decisions, necessarily. Decisions addressed to the wrong version of the problem. The university that invests in AI detection when the shift is in autonomous task completion. The government that writes a chatbot policy when the frontier has moved to agentic systems operating inside computer environments. The cybersecurity team that patches against known threats while the thing that can find unknown threats is already operational, already deployed, already working through the backlog of every flaw we left unfixed because we assumed obscurity was protection enough.
I do not have a neat answer. I distrust neat answers on this topic because the situation does not warrant neatness. But I will say what I think the minimum viable response requires.
It requires accepting that the planning horizon has compressed. That decisions about AI cannot wait for the next review cycle, the next plan, the next government inquiry, the next stable landing. The ground is not going to stabilise. Not in the way institutions mean when they use that word. The question is not when things will settle but whether we can learn to act usefully while they don’t.
It requires honesty about the limits of analogy. AI is not the internet. It is not the printing press. It is not the industrial revolution. It shares features with all of them and none of them is adequate. Every analogy carries a hidden prediction: “this will unfold the way that did.” Some of those predictions will be right. Many will not. The price of a wrong analogy is not intellectual embarrassment. It is misallocated resources, missed windows and institutional damage that arrives too late to prevent.
And it requires something harder still: taking seriously the possibility that some of what is being built may need genuinely new institutional forms. Not updated policies. Not revised acceptable use statements. New forms of access governance, new relationships between public and private, new ways of deciding who gets to use what and under what conditions. Anthropic withheld Mythos from public release because the model had crossed a practical threshold, one that the company’s own formal risk framework did not require it to act on. It acted anyway. It invented a new category of deployment, in real time, because the old categories no longer fit. Whether that particular arrangement is the right one is open. That some new arrangement was needed is not.
I keep thinking about that sentence in the meeting. “I’ll get to this properly when things settle down.” I understand the instinct. I share it. There is a comfort in believing the turbulence is temporary, that we are passing through a zone of disruption on our way to a new equilibrium. Maybe we are. But the evidence suggests the equilibrium is further away than we think, and the disruption is closer than we feel. The clock we are using to measure our readiness was built for a world that is no longer the one we are in.
We will not get the time we think we have. That is not a prediction about catastrophe. It is an observation about pace. The institutions that thrive in the next five years will not be the ones that got the technology exactly right. They will be the ones that stopped waiting for the ground to settle and learned to build on ground that moves.
The people building these systems already know this. The question is how long it takes the rest of us to catch up. And whether, by the time we do, the distance is still closable.



Thank you for writing this. I had to stop trying to tell people this as I couldn't quite find the words without sounding insane. None of our institutions are ready for exponential change and the gap between what's possible and where we think we are widens by the day.
Carlo, you say Anthropic "invented a new category of deployment, in real time, because the old categories no longer fit."
It wasn't new.
The Talmud classified fire two thousand years ago as an autonomous force - not a tool, because tools don't act on their own; not an agent, because fire doesn't choose. Something that serves its owner but escapes and damages what the owner never intended. The framework they built covers tiered access, graduated liability, containment obligations, and what happens when the fire jumps the wall. Case by case, minority opinions preserved. Read your description of Mythos and tell me that is not a fire.
You diagnosed the wrong clock beautifully - everyone calibrating to 2022 while the frontier moved on. But your conclusion runs the same error in reverse. You look forward for governance forms and assume they must be new. They are behind you. The reason you can't see them is the same reason your institutions can't keep up: not pace, but emptiness. You can't govern something this powerful with acceptable use policies written by people who have no theory of what use is for.
The ground won't settle. You're right. But the bedrock was never the ground. It was underneath it, and we paved over it.