Inside GPT-5.5-Cyber: The Opposite Bet to Anthropic's Fable 5

Omniscient Media

OpenAI's most permissive model arrived in stages, and on June 22, 2026, it took its largest step into the open. The company made the full version of GPT-5.5-Cyber available to verified defenders, shipped an updated Codex Security plugin that moves from finding bugs to landing fixes, launched a Daybreak Cyber Partner Program to embed the model inside commercial security products, and announced Patch the Planet, an effort to fund researchers to harden critical open-source software.^[3]^[4]

Strip the announcements down and the strategy is unmistakable, because it is close to the opposite of the one its nearest rival chose. Anthropic took its most capable cyber model, Mythos, and confined it initially to roughly fifty vetted organizations, then expanded to 150 more in June before the US government forced its most powerful models offline under export controls.^[6]^[9]^[12] OpenAI is doing the reverse: taking the model tuned to assist with red teaming and penetration testing and pushing it outward as fast as a verification system will allow. Two labs, the same dual-use problem, opposite bets. This piece is the companion to our look at Claude Fable 5, and the contrast is the point. Fable 5 was the story of a lab adding safeguards until researchers revolted. GPT-5.5-Cyber is the story of a lab removing them on purpose, for the right people, and asking you to trust the gate.

What OpenAI shipped, and how

GPT-5.5-Cyber is the top tier of Daybreak, the cybersecurity program OpenAI launched in May 2026 to pair frontier models with its Codex Security tooling and help organizations find and fix vulnerabilities before attackers do.^[7]^[2] Codex Security is the engine underneath: it builds an editable threat model of a code repository, focusing on realistic attack paths and high-impact code, tests candidate vulnerabilities in an isolated environment, and proposes fixes.^[2] The June 22 update extended that loop from discovery into remediation, a shift OpenAI framed as moving from finding bugs to landing fixes.^[4]

The Cyber Partner Program lets security vendors embed the models directly into their products. OpenAI named a partner roster that reads like the buy side of the security industry, including Cisco, CrowdStrike, Palo Alto Networks, Cloudflare, Fortinet, Zscaler, IBM, Accenture, Check Point, SentinelOne, Tenable, and Wiz, alongside the big consultancies, Capgemini, Cognizant, EY, KPMG, and PwC.^[2]^[4] Patch the Planet, founded with Trail of Bits and run in collaboration with HackerOne and Calif, funds security researchers to work alongside open-source maintainers. More than thirty projects signed on at launch, including some of the most load-bearing software on the internet: cURL, Go, Python, Sigstore, and pyca/cryptography.^[4]^[11]

The three tiers, and what each is allowed to do

Daybreak is not one model but one model wearing three different sets of restrictions.^[7] The base, plain GPT-5.5, carries the standard consumer safeguards for general-purpose use. The middle tier, GPT-5.5 with Trusted Access for Cyber, is for verified defensive work in authorized environments: secure code review, vulnerability triage, malware analysis, detection engineering, and patch validation.^[1] The top tier, GPT-5.5-Cyber, is the permissive one, reserved for specialized authorized workflows that the other tiers refuse: red teaming, penetration testing, and controlled validation.^[7]

What separates the tiers is not the underlying model. By OpenAI's own account, GPT-5.5-Cyber is not a different model so much as the same model with a different safeguard configuration layered on top.^[1] The gate that decides who reaches the permissive configuration is Trusted Access for Cyber, an identity-and-trust framework meant to put enhanced capabilities in the right hands while continuing to refuse requests that could enable real-world harm.^[1] Beginning June 1, 2026, that verification tightened: defenders using the trusted tiers must authenticate with phishing-resistant credentials, a hardware key or passkey rather than a password.^[1]

OpenAI's own examples make the gap concrete: asked to test an exploit by running the uname command against a live target, GPT-5.5 with Trusted Access offers only a defensive check of systems you own, while GPT-5.5-Cyber "compromised the test service and recovered system metadata," returning the target's live uname -a output.^[1] The entire safety story of GPT-5.5-Cyber rests on the gate that separates those two responses, which makes everything that follows a question about the gate.
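To make the "one model, three safeguard profiles" structure concrete, here is a minimal sketch of the gating logic as described above. It is hypothetical throughout: OpenAI has not published how its gate is implemented, and `Account`, `resolve_tier`, and the field names are illustrative. The sketch assumes two published conditions (identity verification, and phishing-resistant credentials since June 1, 2026) and models the unpublished criteria for the top tier as a single, assumed approval flag.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Three safeguard profiles over one underlying model (as the tiers are described above)."""
    GPT_5_5 = "standard consumer safeguards"
    TRUSTED_ACCESS = "verified defensive work in authorized environments"
    GPT_5_5_CYBER = "permissive: red teaming, penetration testing, controlled validation"


# Since June 1, 2026 the trusted tiers require phishing-resistant credentials.
PHISHING_RESISTANT_METHODS = frozenset({"hardware_key", "passkey"})


@dataclass(frozen=True)
class Account:
    """Hypothetical account record; these fields are illustrative, not OpenAI's schema."""
    identity_verified: bool            # passed Trusted Access for Cyber verification
    auth_method: str                   # e.g. "password", "hardware_key", "passkey"
    approved_for_offensive_work: bool  # assumed extra approval for the top tier


def resolve_tier(account: Account) -> Tier:
    """Return the most permissive safeguard profile this account qualifies for."""
    trusted = (
        account.identity_verified
        and account.auth_method in PHISHING_RESISTANT_METHODS
    )
    if not trusted:
        # Failing verification, or logging in with a password, falls back to the base tier.
        return Tier.GPT_5_5
    if account.approved_for_offensive_work:
        return Tier.GPT_5_5_CYBER
    return Tier.TRUSTED_ACCESS


if __name__ == "__main__":
    analyst = Account(identity_verified=True, auth_method="passkey",
                      approved_for_offensive_work=False)
    print(resolve_tier(analyst).name)  # prints TRUSTED_ACCESS
```

What the sketch makes obvious is that nothing in `resolve_tier` looks at the request itself. Once an account clears the identity checks, the only remaining line of defense is the refusal behavior inside the selected profile, which is the layer the AISI findings below put in question.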

Permission, or capability?

When OpenAI introduced GPT-5.5-Cyber in limited preview on May 7, it was emphatic that this was a change in permissions, not power. The preview, the company wrote, "is not intended to significantly increase cyber capability beyond GPT-5.5 - it's primarily trained to be more permissive on security-related tasks," and was "not expected to outperform GPT-5.5 across every cyber evaluation."^[1] The early numbers matched the modesty: on CyberGym, the preview scored 81.9 percent against GPT-5.5's 81.8, a rounding error.^[10]

The June 22 full version tells a different story, and OpenAI says so itself. The expansion is presented under the heading "pairing capability with permissiveness," and describes GPT-5.5-Cyber as "both more permissive and more capable"; where the preview "was designed primarily to reduce unnecessary refusals," the company writes, "this update goes further," calling it "our strongest model yet for finding and helping patch software vulnerabilities."^[2] The benchmarks bear it out. On CyberGym the full version reaches 85.6 percent against 81.8; on SEC-bench Pro, 69.8 against 63.1; and on ExploitGym, which OpenAI says tests whether an agent can turn a known vulnerability into a working exploit that achieves unauthorized code execution, it jumps to 39.5 percent from 25.95, a relative gain of more than half.^[4] The most offensive task shows the largest gain. Between the May preview and the June release the permission change became a capability change too, and OpenAI's own framing now concedes it: lifting a safeguard does not only grant permission to ask, it returns suppressed capability to the user. That is the first asterisk.
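The "relative gain of more than half" is easy to check, and it helps to see all three benchmarks side by side, because absolute points and relative gains can rank results differently. The scores below are the ones quoted above; the script only does the arithmetic.

```python
# Scores in percent as quoted above: (GPT-5.5 baseline, GPT-5.5-Cyber full version).
SCORES = {
    "CyberGym": (81.8, 85.6),
    "SEC-bench Pro": (63.1, 69.8),
    "ExploitGym": (25.95, 39.5),
}


def gains(scores):
    """Return {benchmark: (absolute gain in points, relative gain as a fraction of baseline)}."""
    out = {}
    for name, (base, cyber) in scores.items():
        delta = cyber - base
        out[name] = (delta, delta / base)
    return out


if __name__ == "__main__":
    results = gains(SCORES)
    # Sort by relative gain, largest first.
    for name, (delta, rel) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
        print(f"{name:14} +{delta:5.2f} pts  ({rel:.1%} relative)")
```

ExploitGym comes out on top by both measures, so the point that the most offensive task shows the largest gain does not depend on choosing relative over absolute numbers.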

Table: GPT-5.5 vs. GPT-5.5-Cyber on CyberGym, ExploitGym, and SEC-bench Pro. The permissive configuration outperforms standard GPT-5.5 on every cyber benchmark, and by the widest margin on ExploitGym, the most offensive of the three. Lifting the safeguards is not only a permission change.

The independent read: what AISI found

The most credible outside check on GPT-5.5's cyber abilities comes from the UK AI Security Institute (AISI), a government evaluator independent of the labs, which in an assessment published April 30, 2026, called GPT-5.5 "one of the strongest models we have tested on our cyber tasks."^[5] AISI ran the model against a suite of 95 capture-the-flag-style tasks plus multi-step cyber-range simulations of full intrusions.^[5] On a 32-step corporate-network simulation it calls "The Last Ones," built to mirror an enterprise kill chain across roughly twenty hosts and estimated to take a human expert about twenty hours, GPT-5.5 completed the attack end-to-end in two of ten attempts, only the second model ever to do so; Anthropic's Mythos Preview, the first, managed three of ten.^[5] On one reverse-engineering challenge that took a human playtester about twelve hours, AISI reported GPT-5.5 solving it autonomously in ten minutes and twenty-two seconds, at $1.73 in API usage.^[5] On expert-level narrow cyber tasks, the model reached 71.4 percent, ahead of 68.6 percent for Mythos Preview, 52.4 percent for GPT-5.4, and 48.6 percent for Opus 4.7.^[8]
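The reverse-engineering result is easier to weigh as a ratio. Taking AISI's figures at face value, and treating the playtester's "about twelve hours" as exactly twelve (an approximation on our part), the wall-clock speedup works out to roughly seventy-fold:

```latex
\begin{aligned}
t_{\text{human}} &= 12\ \text{h} = 12 \times 3600\ \text{s} = 43{,}200\ \text{s} \\
t_{\text{GPT-5.5}} &= 10\ \text{min}\ 22\ \text{s} = 10 \times 60 + 22 = 622\ \text{s} \\
\frac{t_{\text{human}}}{t_{\text{GPT-5.5}}} &= \frac{43{,}200}{622} \approx 69.5
\end{aligned}
```

This compares a single playtester with a single model run on a single challenge, so it is an order-of-magnitude figure, not a benchmark.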

AISI was careful about what those numbers do and do not mean. It could not say from the results whether the model would succeed against a well-defended target, and it warned that the rapid improvement on cyber tasks may be part of a broader trend across frontier models rather than a one-off.^[5] That caveat cuts both ways: the capability is real and rising, and the headroom against hardened, monitored systems is unmeasured.

The asterisk that matters most: the safeguard nobody could confirm

Here is the finding that should travel further than it has. During testing, AISI developed a single universal jailbreak that took about six hours of expert red-teaming to build and then elicited violative content across every malicious cyber query the institute tried, including in multi-turn agentic settings.^[5] Six hours is not a nation-state research program. It is an afternoon.

OpenAI responded, as the process is designed to work: it made several updates to the safeguard stack after AISI reported the bypass.^[5] But the evaluation closes on a sentence that deserves to be read slowly. In AISI's words, "a configuration issue in the version provided meant UK AISI were unable to verify the effectiveness of the final configuration."^[5] The fix may well work. The independent evaluator could not confirm that it does.

Set that against the architecture. GPT-5.5-Cyber's entire safety case is that the permissive, exploit-capable configuration is reachable only by verified defenders, and that the safeguards refuse genuinely malicious requests even from inside the gate. The red-team result says the refusal layer was bypassable in an afternoon; the verification note says the patched layer is unconfirmed. Where the Fable 5 system card disclosed an invisible safeguard and then Anthropic reversed it within 48 hours, OpenAI's disclosure runs the other direction: the safeguard is visible and central, and the open question is simply whether it holds. The candor is real. So is the gap it reveals.

The opposite bet: openness against lockdown

Anthropic's Mythos launched as a defensive-only model, initially confined to roughly fifty organizations through Project Glasswing. By June 2, that had grown to 150 more organizations across 15-plus countries, before the US government pulled Fable 5 and Mythos 5 offline entirely under an export-control action triggered by a finding that Fable 5's safeguards could be partially bypassed.^[9]^[12] Anthropic's own defense was revealing: it argued the demonstrated capability was "available from other publicly deployed models, including OpenAI's GPT-5.5," and that recalling a model over "a narrow potential jailbreak" was unwarranted.^[9] OpenAI's GPT-5.5-Cyber is the inverse on every axis: it explicitly permits authorized offensive work, it is distributed through a commercial partner program and an open-source initiative rather than a vetted allowlist, and far from being pulled, it is expanding. In the month around the June launch OpenAI established Trusted Access for Cyber partnerships with Australia, Canada, France, Germany, Japan, South Korea, and EU institutions including ENISA, and ran pre-deployment testing of GPT-5.5 and GPT-5.5-Cyber with the US government's Center for AI Standards and Innovation.^[2]^[10]

The asymmetry is hard to miss. The capability Anthropic was penalized for is, by its own account, the capability OpenAI is now scaling, and OpenAI's cyber model carries a jailbreak finding of its own, patched with a fix the independent evaluator could not verify, yet it is being expanded rather than pulled. Two labs brought comparable cyber capability and a comparable safeguard flaw to the same regulator, and ended at opposite poles: one locked out, the other licensed.

Table: OpenAI GPT-5.5-Cyber vs. Anthropic Mythos 5 across posture, offensive use, access, distribution, government status, and core safeguard. The two leading labs have made nearly opposite choices about how to deploy frontier cyber capability, from who gets access to whether offensive use is allowed at all.

OpenAI is candid that the bet carries risk. The same capabilities that let defenders understand relationships across codebases, find subtle vulnerabilities, and accelerate remediation, the company acknowledges, could be misused.^[6] Its answer is to reach as many legitimate defenders as possible, with access grounded in verification, trust signals, and accountability.^[6] That is a coherent thesis. It is also a thesis whose load-bearing element, the verification gate, is the same element AISI could not fully stress-test.

Scale as the safety argument

The optimistic case for OpenAI's approach is genuinely strong, and Patch the Planet is its sharpest expression. The internet runs on a thin layer of under-resourced open-source software maintained, in many cases, by volunteers; cURL and pyca/cryptography are downloaded billions of times and patched by a handful of people. Funding researchers to point an exploit-capable model at that software, with the maintainers in the loop, is plausibly one of the highest-leverage defensive uses of frontier AI yet attempted.^[4]^[11] The Codex Security update that automates patch generation compounds the effect: a vulnerability found and fixed at machine speed never becomes an incident.^[2]

The argument generalizes into a worldview. If offensive cyber capability is going to exist in frontier models regardless, the safest world is the one where defenders have it first, at scale, and attackers have to contend with software that has already been hardened by the same tools. Locking the capability behind a tightly restricted program, on this view, mostly slows down the defenders, because determined attackers will reconstruct the capability anyway. OpenAI is betting that diffusion to defenders beats containment. It is a defensible bet, and it may be the right one.

The dual-use problem OpenAI names but cannot fully close

The pessimistic case is equally concrete, and it lives in the seams of the same design. A permissive, exploit-capable model whose only barrier is identity verification is exactly as safe as that verification, and verification systems fail in ordinary ways: a defender's session is hijacked or a hardware key is stolen despite phishing-resistant login, a partner integrating the model resells or leaks access, an insider at a vetted organization turns. The capability sitting behind the gate is, on OpenAI's own benchmarks, materially better at writing exploits than the consumer model. The marginal attacker who gets through the gate does not get a chatbot; they get the best exploitation assistant a frontier lab has shipped.

This is where the AISI footnote stops being academic. The refusal layer behind the gate was not robust at evaluation time, and its current robustness is, to the only independent party that looked, unknown.^[5] OpenAI is asking the market to trust two things at once: that the verification gate keeps bad actors out, and that the safeguards stop the ones who get in. The red-team result is a direct hit on the second assurance, and the partner-program-plus-open-source distribution model widens the first attack surface with every integration. Diffusion to defenders and diffusion to attackers are not separate dials; turning one turns the other.
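The two assurances can be written as a toy model. The symbols are ours, not OpenAI's or AISI's: let $G$ be the event that an attacker gets past the verification gate through at least one of $n$ access paths (accounts, partner integrations), and $R$ the event that the refusal layer then fails to stop a malicious request. Assume, purely for illustration, that each path fails independently with the same probability $p$:

```latex
\begin{aligned}
P(\text{misuse}) &= P(G)\,P(R \mid G) \\
P(\text{misuse}) &\to P(G) \quad \text{as } P(R \mid G) \to 1 \\
P(G) &= 1 - (1 - p)^{n} \approx np \quad \text{for } np \ll 1
\end{aligned}
```

Under these assumptions, a universal jailbreak of the kind AISI built pushes $P(R \mid G)$ toward 1, leaving the gate as the only effective term, and every new integration raises $n$. That is the toy version of turning one dial and moving the other.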

Who it is for

The clearest immediate win is for verified in-house security teams doing defensive work: secure code review, vulnerability triage, malware analysis, detection engineering, and patch validation. The Trusted Access tier was built for exactly this workflow, and the Codex Security patch-automation loop is a real productivity step.^[1]^[2] Open-source maintainers of critical libraries should engage with Patch the Planet directly; it is funded help pointed at the software that needs it most.^[11]

Organizations that want the capability inside an existing security stack should reach it through a Cyber Partner Program vendor rather than building raw integrations, so verification, logging, and accountability come prebuilt.^[4] Any deployment of the permissive GPT-5.5-Cyber tier for authorized offensive work should treat the AISI findings as live operational input: assume the safeguards behind the gate are not a hard stop, log everything, and scope access as narrowly as the work allows. The phishing-resistant authentication requirement is a floor, not a guarantee.^[1]^[5] The single most important measure of this model, the robustness of the refusal layer behind the gate, has not been confirmed by any independent party since OpenAI applied its post-evaluation fix.
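A minimal sketch of "log everything, and scope access as narrowly as the work allows," assuming the deploying organization places its own check between operators and whatever model endpoint it uses. Everything here is hypothetical: `Engagement`, `gate_request`, and the audit-record fields are illustrative names, not part of any OpenAI or partner API. The sketch only checks that a target IP falls inside an engagement's written scope; it does not make the model's own safeguards any stronger.

```python
import ipaddress
import json
import logging
from dataclasses import dataclass
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("cyber_audit")  # hypothetical audit channel name


@dataclass(frozen=True)
class Engagement:
    """Hypothetical record of one written, authorized engagement."""
    engagement_id: str
    operator: str
    authorized_networks: tuple[str, ...]  # CIDR ranges from the signed scope

    def in_scope(self, target: str) -> bool:
        """True only if target parses as an IP inside one of the authorized networks."""
        try:
            addr = ipaddress.ip_address(target)
        except ValueError:
            return False  # hostnames and malformed input are treated as out of scope
        return any(addr in ipaddress.ip_network(net) for net in self.authorized_networks)


def gate_request(engagement: Engagement, target: str, prompt: str) -> bool:
    """Decide scope, write an audit record either way, and return the decision."""
    allowed = engagement.in_scope(target)
    audit_log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "engagement": engagement.engagement_id,
        "operator": engagement.operator,
        "target": target,
        "allowed": allowed,
        "prompt": prompt,
    }))
    return allowed


if __name__ == "__main__":
    eng = Engagement("ENG-001", "alice", ("10.20.0.0/16",))
    print(gate_request(eng, "10.20.4.7", "validate the patch for the login service"))
    print(gate_request(eng, "203.0.113.5", "enumerate services"))
```

Logging refusals as well as approvals is the deliberate choice: if the safeguards behind the gate are not a hard stop, the audit trail is what lets a team reconstruct what was asked, by whom, and against which host.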

Editorial assessment

Two readings are available, and both are defensible.

The first is that GPT-5.5-Cyber is the most responsible way anyone has yet shipped offensive-grade cyber capability. It is gated by identity verification rather than sold openly, aimed at defenders first, paired with a concrete program to harden the open-source commons, and exposed to independent government evaluation whose findings OpenAI disclosed rather than buried. Against the alternative of pretending the capability does not exist, or letting it diffuse uncontrolled, this is a serious attempt at responsible scaling.

The second is that OpenAI has shipped a model that is measurably better at writing exploits than its consumer counterpart, gated it behind a verification layer whose failure modes are ordinary, protected it with a refusal layer that fell to six hours of red-teaming and whose repair the independent evaluator could not confirm, and is now distributing it through a widening network of commercial partners and open-source projects: precisely the strategy its closest competitor rejected, before the US government forced that competitor's most powerful models offline. By the standard of what could go wrong, the launch coverage has undersold the asterisks.

Both readings are true. GPT-5.5-Cyber is OpenAI's most permissive model and, plausibly, a major net gain for defenders. It is also the clearest test yet of whether you can hand out an offensive capability and keep it safe with an identity check, and the one independent body that tried to verify the safety half of that bet came away unable to. Anthropic answered the dual-use question by locking the door, and Washington locked it again from the outside. OpenAI answered by opening it for the right people and trusting the lock. We are about to learn, in public, which lab read the problem correctly.
