
OpenAI this week released GPT-5.4 mini and GPT-5.4 nano, completing the GPT-5.4 family with two models designed for high-volume, latency-sensitive workloads.[1] The launch is not simply a cost reduction play. It reflects a structural shift in how AI is deployed: less as a monolithic reasoning engine, more as a hierarchy of specialized agents, each sized to its task.
GPT-5.4 mini runs more than twice as fast as GPT-5 mini and closes much of the gap with the flagship model on key benchmarks.[2] On SWE-Bench Pro, a test measuring a model's ability to resolve real GitHub issues, mini scores 54.4%, compared to 45.7% for GPT-5 mini and 57.7% for GPT-5.4 itself.[2] On OSWorld-Verified, which assesses desktop computer use by reading screenshots, mini reaches 72.1%, just below the human baseline of 72.4% and just short of GPT-5.4's 75.0%.[2]
GPT-5.4 nano occupies the lowest tier: 52.4% on SWE-Bench Pro and 39.0% on OSWorld, meaningfully below mini but a substantial leap over previous nano-class models.[2] It is API-only at launch, which signals OpenAI's intent clearly: nano is a developer primitive, not a consumer interface.
Pricing reflects the tiering. GPT-5.4 mini costs $0.75 per million input tokens and $4.50 per million output tokens. Nano is $0.20 input and $1.25 output, roughly four times cheaper on inputs than mini and more than twelve times cheaper than the full GPT-5.4 at $2.50/$15.00.[2]
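The tiering is easiest to see with a worked cost comparison. The sketch below uses the per-million-token prices quoted above; the model-name keys and the example workload are illustrative, not an API reference.

```python
# Cost comparison across the GPT-5.4 tiers at the quoted rates.
# Prices are (input $/1M tokens, output $/1M tokens).
PRICES = {
    "gpt-5.4":      (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a high-volume job of 10,000 requests, each with ~2,000
# input tokens and ~500 output tokens.
requests = 10_000
for model in PRICES:
    total = requests * cost(model, 2_000, 500)
    print(f"{model}: ${total:,.2f}")
# gpt-5.4:      $125.00
# gpt-5.4-mini: $37.50
# gpt-5.4-nano: $10.25
```

At this volume the full model costs roughly 12x what nano does, which is the arithmetic behind routing bulk work down-tier.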
What makes this launch consequential is less the individual model specs than the architectural pattern they enable. OpenAI explicitly positions mini and nano as subagent models: systems where a large reasoning model (GPT-5.4 Thinking, for instance) plans and coordinates while smaller models execute discrete tasks in parallel.[1] Searching a codebase, reading a file, processing a form, interpreting a screenshot: these are jobs where latency matters and where burning GPT-5.4 quota is economically irrational.
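The planner/subagent pattern can be sketched in a few lines. Everything here is hypothetical scaffolding: the `call_model` stub stands in for a real provider API call, and the model names and task decomposition are illustrative, not OpenAI's implementation.

```python
# Minimal sketch of the planner/subagent pattern: a large model plans,
# smaller models execute discrete subtasks in parallel.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; tags the output with the model used.
    return f"[{model}] {prompt}"

def plan(goal: str) -> list[str]:
    # In a real system, the flagship model would decompose the goal.
    return [
        f"search codebase for '{goal}'",
        f"read files related to '{goal}'",
        f"summarize findings on '{goal}'",
    ]

def run(goal: str) -> list[str]:
    subtasks = plan(goal)  # one planning pass on the large model
    # Cheap, latency-sensitive subtasks fan out to the small model.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: call_model("gpt-5.4-mini", t), subtasks))

results = run("rate limiter bug")
```

The design point is the asymmetry: one expensive planning call, many cheap parallel execution calls, which is exactly where per-token pricing on the small tiers dominates total cost.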
Within Codex, this is already operational. GPT-5.4 mini consumes only 30% of the GPT-5.4 quota, and Codex can delegate less reasoning-intensive work to it.[2] Aabhas Sharma, CTO of AI research and analysis platform Hebbia, reported that mini "matched or exceeded competitive models on several output tasks and citation recall at a much lower cost" and achieved higher end-to-end pass rates and stronger source attribution than the full GPT-5.4 model on their evaluations.[2]
GPT-5.4 mini is available now in the API, in Codex, and in ChatGPT for Free and Go tier users via the "Thinking" option. For paid subscribers, it serves as the automatic rate-limit fallback for GPT-5.4 Thinking. Nano is API-only.[2]
The cadence of OpenAI's model releases this quarter (GPT-5.3 Instant, GPT-5.4 Thinking, GPT-5.4, GPT-5.4 mini, GPT-5.4 nano) reflects a deliberate effort to tile the cost-performance spectrum at every level. The strategy mirrors how cloud computing matured: dominant players won not just by having the best flagship instance type, but by offering the right size at the right price for every conceivable workload. The frontier model is the attention-getter. The nano is where the margin lives.