jimdevsec

Discovery Is No Longer the Bottleneck

Where AI vulnerability scanners actually belong in the SSDLC, and why finding the bug was never the hard part


In its first month, Project Glasswing pointed an unreleased model, Claude Mythos Preview, at more than a thousand of the open-source projects that hold up the modern internet. The scan flagged 23,019 potential vulnerabilities, of which Anthropic estimated 6,202 as high or critical-severity. Among the confirmed findings were a flaw in OpenBSD that had survived 27 years and one in FFmpeg untouched for 16, bugs that had gotten past every prior round of human review and automated testing. These are not the issues your linter nags you about on save. They are the kind that outlive the careers of the people who wrote the code, and the model found them in weeks.

That much is real, and it settles one debate: AI-native discovery genuinely finds a class of bug that humans and pattern matchers have missed for decades. The interesting part is what happened next.

Of the high and critical-rated findings put through additional review, Anthropic reports 1,587 validated as true positives, a 90.6% true-positive rate, and 1,094 confirmed as high or critical. It has disclosed 530 of those to maintainers so far. At the time of the update, 75 had been patched.

Sit with that last ratio, because it is the entire argument. The model can surface critical vulnerabilities faster than the ecosystem can fix them, so much faster that some open-source maintainers have asked Anthropic to slow down its rate of disclosure, because they need more time to design patches. Anthropic's own framing is blunt: the constraint on software security used to be how quickly we could find vulnerabilities; now it is how quickly we can verify, disclose, and patch the ones AI finds.

For many years, application security has operated on an implicit premise: that we are discovery-constrained. If we could just see more of the bugs, sooner, we would be safer. Glasswing is the clearest evidence yet that this premise has flipped. Discovery is becoming cheap, fast relative to humans, and effectively unbounded. What that exposes is uncomfortable, finding the bug was rarely the expensive part. Reproducing it, proving it is reachable, assigning it an owner, and shipping a fix was always the constraint. We mostly hid that behind backlogs.

So the question worth an engineer's time is not whether a Mythos-class model is impressive. It is. The question is where a capability like this belongs in a pipeline that has to ship software on Tuesday, and what a security program looks like when the thing it was organized around, finding the next vulnerability, stops being the hard part.


What Glasswing actually is, and what it isn't

Two facts about the technology change the shape of the whole question, and most coverage skips both.

Project Glasswing is a coalition, launched in April 2026, of roughly fifty organizations, among them AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, NVIDIA, and the Linux Foundation. Claude Mythos Preview is the model underneath it: an unreleased frontier model Anthropic describes as purpose-built to find and remediate flaws in critical software, backed by up to $100M in usage credits for participants.

The first fact: it is not a product. Anthropic has said it does not plan to release Mythos Preview. You cannot install it, drop it into a CI job, or buy a seat, access is gated to the coalition. So any discussion of "adopting" it today is a thought experiment, not a procurement decision.

The second fact is the reason the first one matters: the model is gated because it is dangerous in the open. Anthropic states plainly that no one, including Anthropic, has yet built safeguards strong enough to stop a model this capable from being misused offensively. The capability is real enough that the controlling concern is proliferation.

That is precisely why this is worth thinking through even though you can't use the tool. The capability class, deep, agentic, reasoning-based vulnerability discovery, will diffuse. Cheaper, weaker, purchasable versions will arrive from every vendor with a model and a security story to tell. The useful question is not "should I adopt Mythos." It is: when Mythos-class discovery becomes something I can rent, where in my SSDLC does it belong, what does it cost, and what does it break downstream?


The SSDLC is a latency budget, and that's where this gets decided

The cleanest way to place any scanner is to stop arguing about capability and start reasoning about budget per stage. Every stage of the secure development lifecycle has an implicit ceiling on how long a check can take and how much it can cost per run. Tools live or die by whether they fit that ceiling, not by how clever they are.

Inner loop, IDE and pre-commit. Budget: seconds, at near-zero marginal cost, because it runs on every save and every commit for every developer. This is the home of linters, lightweight pattern-based SAST, and secret scanning. The only credible role for AI here is small, fast models offering inline autofix hints. A deep agentic audit at this tier is a non-starter, the equivalent of running a full penetration test on every keystroke.

CI gate, pull request. Budget: minutes, and deterministic enough to gate a merge without becoming a coin flip. This is where Semgrep, Trivy, and conventional SAST/SCA belong. AI's realistic role at this tier is not discovery, it is triage: filtering the deterministic scanner's output to suppress false positives before a human sees them.

Pre-release and periodic deep audit. Budget: hours to days, with high cost tolerance because it runs rarely and only on what matters. This is the tier where Mythos-class agentic discovery makes sense, a release-gated or quarterly deep pass over crown-jewel components: the auth service, the cryptography, the internet-facing parser.

Post-release and continuous. This is Glasswing itself, run by someone else against the open source you depend on. Their discovery becomes your SCA feed. You consume the results, CVEs, advisories, through existing dependency management, not by running the model.

The takeaway is uncomfortable for the hype cycle: a Mythos-class model is structurally incompatible with the part of the pipeline where developers actually experience security, the inner loop. It is too slow and too expensive to live there, regardless of how good it is. It belongs in the deep, infrequent tiers, and the value it produces there is real only if the stages downstream can absorb what it finds.


The economics, done properly

Cost is where most commentary waves its hands, and where a practitioner can be useful. I'll reason carefully because Anthropic published credits, not a price sheet.

Let's start with what is public. By one published benchmark, shallow AI code review, the kind that summarizes a pull request, runs roughly $15–25 per PR on token usage. That looks harmless until it scales: a 400-developer organization producing on the order of 5,000 PRs a month lands in the five-to-six-figure range monthly, before anyone runs a full-repository scan. And that is the cheap, shallow case.

Now layer on what research says about agentic workloads, which is what Mythos-class discovery is. A Stanford Digital Economy Lab / Microsoft Research study found that agentic tasks consume on the order of 1000x more tokens than ordinary code reasoning or chat; that runs on the same task can vary by up to 30x in total tokens; and, critically, that higher token spend does not reliably translate into higher accuracy. Spend is dominated by input tokens, and the models cannot reliably predict their own consumption.

Three engineering consequences fall out, and none of them are finance's problem to solve. They are yours:

That last point is the bridge to the real problem.


Replace SAST, or enhance it? Neither, and that's the wrong axis

This is the question every reader scans for, so here is the direct answer: AI discovery does not replace SAST, SAST does not make it redundant, and the durable architecture is hybrid, split by tier, not by capability.

Deterministic SAST keeps the fast lane. It is reproducible, cheap, and gateable, properties an agentic model does not have and a merge gate cannot do without. You can block a PR on a deterministic finding and defend that decision; you cannot reliably block a merge on a non-deterministic one.

AI reasoning earns two distinct roles, and conflating them is most of the confusion in the market:

The nuance the hype skips runs both ways: AI-native discovery genuinely finds a class of bug deterministic tooling cannot, and it belongs in a narrower place than the vendors selling it would like. Both are true. More capable is not the same as more deployable.


The real work is verification, routing, and remediation

Here is the part only someone who has run a vulnerability management program will tell you, and it is the most important section.

When discovery becomes cheap and abundant, triage stops being a queue you work down and becomes the whole game. This was already true before AI: industry data puts a large share of application-security time into triage, and false-positive rates on traditional SAST against unfamiliar code have been measured as high as 91%. We were already strained. A Mythos-class firehose does not relieve that pressure; it multiplies it. The Glasswing numbers are the proof, discovery up by an order of magnitude, patches in the dozens.

Which leads to the principle this whole piece circles:

A finding you cannot verify quickly is not an asset. It is a liability with good PR.

A vulnerability report without a reproducible proof of concept, a reachability verdict, and a named owner does not make you safer. It consumes attention, inflates the backlog, and erodes trust in the toolchain, and once developers stop trusting the scanner, every finding it produces, true or false, gets ignored equally. Volume without verification is how you train an organization to disregard security.

So the design rules that matter in a Mythos-class world are not about discovery at all:


What a defensible 2026–2027 architecture looks like

None of this is an argument against AI in the pipeline. It is an argument for putting it where the budget and the physics allow, and for building the thing downstream that makes it worth anything:

If you take one line from this, take this one: budget for the patch pipeline before you budget for the scanner.

AI vulnerability discovery is real, powerful, and, in the wrong hands, dangerous. But its defensive value does not come from the finding. It comes from whether your organization can verify, route, and fix faster than the model can generate. Discovery is no longer the bottleneck. The organizations that internalize that will be the ones that are genuinely more secure, rather than just better informed about how exposed they were all along.