The Most Dangerous AI That Got Leaked Before Launch
They built the most dangerous AI in the world. Then accidentally leaked it. Now they won’t release it.
Anthropic built an AI that warns about unprecedented cybersecurity risks. Then leaked it through a security mistake.
Something unusual happened in late March. Anthropic, one of the most careful and safety-focused AI labs in the world, accidentally leaked details about their most powerful model yet, through a misconfigured content management system that left a draft blog post publicly searchable.
The model is called Claude Mythos. And the irony runs deep: a company building AI that warns about cybersecurity risks, exposed that warning to the world through a basic security lapse.
But what came after the leak is the real story.
What is Claude Mythos?
Anthropic described Mythos as a “step change” in AI performance and “the most capable we’ve built to date.” The company confirmed it is currently being trialed by early access customers.
The leaked document revealed that Mythos is part of a new tier of models called Capybara, larger and more intelligent than the existing Opus models, which were Anthropic’s most powerful until now. Compared to Claude Opus 4.6, Capybara gets dramatically higher scores in software coding, academic reasoning, and cybersecurity.
This isn’t just a performance upgrade.
The leaked blog post stated that Mythos is “currently far ahead of any other AI model in cyber capabilities” and that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”
Think about what that means. A single AI agent scanning for vulnerabilities, faster and more persistently than hundreds of human hackers combined. No coffee breaks. No fatigue. No learning curve.
A Dark Reading poll found that 48% of cybersecurity professionals now rank agentic AI as the number one attack vector for 2026, above deepfakes, above everything else.
So what is Anthropic doing about it?
Rather than sitting on the technology or rushing a general release, Anthropic made a deliberate choice: share it with defenders first.
The initiative is called Project Glasswing. It gives companies including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, and Nvidia access to a preview of Claude Mythos exclusively for defensive security work, and requires them to share what they learn with the wider industry.
The name itself is symbolic. Employees chose it as a reference to the glasswing butterfly, whose transparent wings evoke software vulnerabilities, which are “relatively invisible.”
The results so far are staggering. Over the past few weeks, Mythos Preview has already identified thousands of zero-day vulnerabilities. Over 99% of the bugs found have not yet been patched. The model has improved to the point where it mostly saturates existing cybersecurity benchmarks.
In one case, the model identified a 27-year-old bug in OpenBSD, an operating system specifically known for its strong security.
The bigger picture for business leaders
This is not just a story about one AI model. It is a signal about where the industry is heading.
According to experts, the speed and scale of AI agents scanning for vulnerabilities, far beyond normal human capabilities, represents a sea change in cybersecurity. As one security researcher put it: “Unlike attackers, defenders don’t yet have AI capabilities accelerating them to the same degree. However, the attack capabilities are available to both sides, and defenders must use them if they’re to keep up.”
At the same time, enterprise risk is growing from within. Employees firing up AI agents at home and connecting them to internal work systems are unknowingly opening new doors for attackers, a phenomenon the industry is calling “shadow AI.”
Every leader reading this needs to be asking: what is our AI governance policy? Do we know what agents our teams are running? Are we using AI defensively as aggressively as attackers will use it offensively?
The bottom line
Anthropic’s Dario Amodei said it best: “The dangers of getting this wrong are obvious, but if we get it right, there is a real opportunity to create a fundamentally more secure internet and world than we had before the advent of AI-powered cyber capabilities.”
Claude Mythos is not the end of cybersecurity as we know it. It could actually be the beginning of something better, if the right people move fast enough.
The question is whether defenders will move as fast as the attackers who are already paying attention.
What’s your take: is this the right approach from Anthropic, or should this level of capability never exist at all? Drop your thoughts below.
#ArtificialIntelligence #Cybersecurity #AIStrategy #GenerativeAI #TechLeadership
