
Jensen Huang has a habit of making chip announcements feel like civilizational events. At GTC 2026 in San Jose, before an audience of 30,000 attendees, he had the numbers to justify the drama. NVIDIA now sees at least $1 trillion in cumulative revenue from Blackwell and Vera Rubin shipments through 2027, double the $500 billion figure the company cited just twelve months ago.[1] The gap between those two projections is the clearest single measure of how fast inference demand has grown as the industry's workloads shifted from training toward agents.
The throughline of this year's keynote was a new claim about what AI actually needs now. NVIDIA describes a "fourth scaling law," agentic scaling, in which AI systems increasingly communicate with other AI agents rather than with human users, collapsing the acceptable latency threshold for token generation by an order of magnitude.[2] Where 100 tokens per second was once a reasonable benchmark for a conversational interface, agent-to-agent pipelines will demand 1,500 TPS or more. Everything announced at GTC flows from that single premise.
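The arithmetic behind that premise is worth making explicit. A sketch of why token rate compounds across an agent pipeline, where latency is paid once per sequential hop; the pipeline depth and tokens-per-turn below are illustrative assumptions, not figures from the keynote:

```python
# Back-of-envelope: why agent-to-agent pipelines need ~15x the token rate.
# The pipeline depth and token counts are illustrative assumptions, not
# NVIDIA's numbers; only the 100 vs 1,500 TPS figures come from the keynote.

def pipeline_latency_s(tokens_per_step: int, steps: int, tps: float) -> float:
    """Wall-clock seconds for `steps` sequential agents, each emitting
    `tokens_per_step` tokens at `tps` tokens/second."""
    return steps * tokens_per_step / tps

TOKENS_PER_TURN = 500  # assumed tokens per agent turn
STEPS = 6              # assumed sequential agent hops per task

chat_rate = pipeline_latency_s(TOKENS_PER_TURN, STEPS, 100)    # 30.0 s
agent_rate = pipeline_latency_s(TOKENS_PER_TURN, STEPS, 1500)  # 2.0 s
```

At conversational rates, a six-hop agent chain stalls for half a minute; at 1,500 TPS it finishes in two seconds, which is the difference between an interactive system and an unusable one.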
Vera Rubin is now in full production. The platform comprises seven co-designed chips: the Rubin GPU, the Vera CPU, NVLink 6 switches, ConnectX-9 SuperNICs, BlueField-4 DPUs, and Spectrum-6 Ethernet switches, joined at GTC by the newly integrated Groq 3 LPU, assembled into five rack-scale systems.[3] Huang described the whole stack as a single vertically integrated system optimized end to end, not a collection of components.
The headline performance claim is stark: Vera Rubin delivers 10 times more inference throughput per watt than Blackwell, at one-tenth the cost per token.[3] At the rack level, the VR NVL72 (72 Rubin GPUs paired with 36 Vera CPUs) runs at 3.6 ExaFLOPS of FP4 inference, against 1.44 ExaFLOPS for the equivalent Blackwell Ultra rack. HBM4 memory bandwidth per GPU reaches 22 TB/s, nearly triple the 8 TB/s of Blackwell Ultra's HBM3e, and NVLink 6 delivers double the per-GPU interconnect bandwidth of the previous generation.[3]
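The generational ratios implied by those figures can be checked directly; this uses only the numbers quoted above:

```python
# Generational ratios implied by the cited spec figures.
rubin_rack_fp4_exaflops, blackwell_rack_fp4_exaflops = 3.6, 1.44
rubin_hbm4_tbs, blackwell_hbm3e_tbs = 22, 8

compute_gain = rubin_rack_fp4_exaflops / blackwell_rack_fp4_exaflops  # 2.5x per rack
bandwidth_gain = rubin_hbm4_tbs / blackwell_hbm3e_tbs                 # 2.75x per GPU
```

Note that raw FP4 throughput rises only 2.5x per rack; the headline 10x inference-per-watt claim therefore rests on the rest of the co-designed stack, not compute alone.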
The Vera CPU itself deserves more attention than it typically receives in chip launch coverage. Built on 88 custom NVIDIA Olympus cores with Armv9.2 compatibility, it is explicitly designed for reinforcement learning environments: the CPU-heavy sandboxes where AI agents execute tool calls, compile code, and generate reward signals during post-training. NVIDIA's Vera CPU Rack packs 256 liquid-cooled processors into a single rack, sustaining more than 22,500 concurrent CPU environments with 400 TB of total memory and 300 TB/s of aggregate memory bandwidth.[3] The subtext is that NVIDIA no longer wants customers sourcing x86 processors from AMD or Intel to run their RL training loops. The CPU rack plugs that gap in the vertically integrated stack.
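The rack figures line up almost exactly with one RL sandbox per core, which suggests how NVIDIA arrived at the environment count; the arithmetic below uses only numbers quoted above:

```python
# The Vera CPU Rack figures imply roughly one RL sandbox per core.
cpus_per_rack = 256
cores_per_cpu = 88

total_cores = cpus_per_rack * cores_per_cpu  # 22,528, matching the
                                             # ">22,500 environments" claim
mem_per_env_gb = 400_000 / total_cores       # 400 TB shared out: ~17.8 GB each
```

Roughly 18 GB of memory per concurrent environment is generous for a code-execution sandbox, consistent with the positioning of the rack as a drop-in replacement for x86 RL fleets.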
The most technically significant announcement at GTC was not a GPU at all. NVIDIA revealed the Groq 3 LPU and LPX Rack, the first commercial product from the $20 billion Groq IP licensing agreement, and its role in the Vera Rubin platform.[4] The design logic is precise: where Rubin GPUs provide 288 GB of HBM4 at 22 TB/s bandwidth per chip, the Groq 3 LPU trades memory capacity for bandwidth, delivering 150 TB/s from 500 MB of on-chip SRAM per chip.[3]
A single LPX Rack houses 256 Groq 3 LPUs, providing 128 GB of aggregate SRAM with 640 TB/s of scale-up bandwidth, more than double the per-GPU interconnect of a Rubin NVL72 rack.[3] When co-deployed with the VR NVL72, the two systems divide decode labor: attention computations remain on the Rubin GPUs while feed-forward layers are offloaded to the LPUs. NVIDIA's VP and GM of Hyperscale and HPC, Ian Buck, described the goal as boosting decode performance at "every layer of the AI model on every token."[5] The company projects the combination can push throughput from 100 tokens per second to 1,500 TPS for agent intercommunication workloads.
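A rough roofline-style estimate shows why moving feed-forward weights into SRAM pays off: decode at small batch sizes is bandwidth-bound, so per-token time is approximately weight bytes read divided by memory bandwidth. The model size and attention/FFN split below are assumptions for illustration, not disclosed figures; only the 22 TB/s and 150 TB/s bandwidths come from the article above.

```python
# Roofline-style sketch: bandwidth-bound decode time per token is roughly
# (weight bytes streamed) / (memory bandwidth). Model size and the
# attention/FFN split are assumptions, not NVIDIA figures.

def token_time_us(weight_gb: float, bandwidth_tbs: float) -> float:
    """Microseconds to stream `weight_gb` of weights at `bandwidth_tbs` TB/s."""
    return weight_gb / (bandwidth_tbs * 1000) * 1e6

FFN_GB = 60   # assumed FFN weight bytes touched per token
ATTN_GB = 30  # assumed attention weight bytes touched per token

hbm_only = token_time_us(FFN_GB + ATTN_GB, 22)                   # all on HBM4
split = token_time_us(ATTN_GB, 22) + token_time_us(FFN_GB, 150)  # FFN on SRAM
```

Under these assumed weights, offloading the FFN roughly halves per-token decode time even before any overlap between the two systems, which is the direction (if not the magnitude) of NVIDIA's 100-to-1,500 TPS claim.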
The LPX Rack effectively replaces the CPX inference accelerator concept NVIDIA previewed last year. At the GTC Q&A, NVIDIA confirmed it is now focused on integrating Groq 3 LPX with Rubin rather than developing CPX further, a significant architectural pivot that industry observers had speculated about since the Groq licensing deal closed.[3] Crucially, no changes to CUDA are required; the LPU operates as a transparent accelerator to the existing software stack.
Hardware announcements consumed most of the keynote's oxygen, but a quieter release shipped the same day that may matter as much over time. NVIDIA Dynamo 1.0, an open source distributed inference framework, entered general availability at GTC, and its production footprint is already substantial. AstraZeneca, ByteDance, CoreWeave, Meituan, Pinterest, SoftBank Corp., Tencent Cloud, Together AI, and more than a dozen other organizations have deployed Dynamo in live inference workflows. All four major cloud hyperscalers (AWS, Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure) have built integrations for managed Kubernetes environments.[8]
Dynamo functions as what NVIDIA calls the "operating system" of an AI factory: it orchestrates GPU and memory resources across a cluster, routes requests to workers that already hold the most relevant KV cache state, and disaggregates prefill and decode phases to maximize GPU utilization. In the SemiAnalysis InferenceX benchmark, Dynamo boosted the inference performance of NVIDIA Blackwell GPUs by up to 7x in disaggregated serving configurations on GB200 NVL72.[8] For agentic workloads specifically, Dynamo's new priority-based routing and cache-pinning features, combined with the NeMo Agent Toolkit, demonstrated up to 4x lower time-to-first-token on Hopper hardware.[9]
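The KV-cache-aware routing idea is simple to sketch: send each request to the worker whose cached prefix overlaps it most, breaking ties toward the least-loaded worker. The toy below mimics that logic only; it is not Dynamo's actual API or scheduling algorithm.

```python
# Toy sketch of KV-cache-aware routing, as described above. Worker names,
# the dict layout, and the tie-break rule are illustrative, not Dynamo's API.

def shared_prefix_len(a: list, b: list) -> int:
    """Length of the common token prefix between two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens: list, workers: dict) -> str:
    """Pick the worker with the longest cached prefix; ties go to lower load."""
    return max(
        workers,
        key=lambda w: (shared_prefix_len(request_tokens, workers[w]["cache"]),
                       -workers[w]["load"]),
    )

workers = {
    "gpu-0": {"cache": ["sys", "hello", "world"], "load": 3},
    "gpu-1": {"cache": ["sys", "hello", "agent"], "load": 1},
    "gpu-2": {"cache": [], "load": 0},
}
target = route(["sys", "hello", "agent", "go"], workers)  # -> "gpu-1"
```

Routing to "gpu-1" lets three of the four prompt tokens skip prefill entirely, which is where the time-to-first-token savings for cache-heavy agentic workloads come from.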
That combination of results (7x on throughput, 4x on agentic latency) is what makes the platform argument credible. Neither figure is achievable through hardware upgrades alone. The implication is that a customer fully committed to the NVIDIA stack, running Dynamo on Blackwell today, unlocks gains that a heterogeneous cluster cannot replicate by swapping in a competing accelerator. Vera Rubin deepens that dynamic further: Dynamo's Grove API is already purpose-built for topology-aware scheduling on the GB300 NVL72 rack, with the same scheduling architecture designed to extend across future Rubin deployments.[9]
The sharpest real-world validation of NVIDIA's vertical integration thesis arrived not at GTC but three weeks earlier, when Meta and NVIDIA announced a multi-year, multi-generational supply agreement. Meta committed to deploying "millions" of Blackwell and Vera Rubin GPUs alongside NVIDIA's Spectrum-X Ethernet networking platform, a deal analysts estimated could be worth tens of billions of dollars.[10]
The CPU dimension is what makes the deal architecturally significant. Meta agreed to a large-scale deployment of standalone Grace CPUs (the first such deployment at that scale), with Vera CPUs slated to follow in 2027.[10] In the context of NVIDIA's platform argument, this is the critical data point: the world's largest social media infrastructure operator, with $115 to $135 billion in planned 2026 capital expenditure, chose to source its CPU compute from NVIDIA rather than x86 incumbents. Ben Bajarin, CEO and Principal Analyst at Creative Strategies, described the decision as "affirmation of the soup-to-nuts strategy that NVIDIA's putting across both sets of infrastructure: CPU and GPU."[11] The deal also reads as a vote of no confidence in Meta's own MTIA in-house chip program, which has reportedly encountered technical challenges that pushed its training-optimized variant off its 2026 schedule.
Two hours into the keynote, Huang turned to the architecture that comes after Vera Rubin. Feynman (named for physicist Richard Feynman) will introduce a new CPU called Rosa, honoring Rosalind Franklin, the crystallographer whose X-ray diffraction work revealed the double-helix structure of DNA.[6] The naming is deliberate: just as Franklin exposed the hidden architecture of life, NVIDIA frames Rosa as the processor designed to move data, tools, and tokens efficiently across the full agentic AI stack.
The Feynman platform pairs Rosa with LP40, NVIDIA's next-generation LPU (successor to the Groq 3), BlueField-5, ConnectX-10, and NVIDIA Kyber for both copper and co-packaged optics scale-up networking.[6] The use of co-packaged optics at the scale-up layer is the technical headline: silicon photonics replaces copper interconnects between chips, addressing the power wall that increasingly limits how densely NVIDIA can pack compute into a rack. Feynman is targeted for high-volume production in 2028.[1]
Co-packaged optics are not a convenience feature. Data center operators are confronting a structural power procurement crisis: not enough grid capacity, not enough cooling infrastructure, not enough time. By moving optical connectivity inside the package, NVIDIA reduces the energy consumed in chip-to-chip signaling, which in turn allows more compute to fit within a given power envelope. No competing x86 or custom-silicon platform can replicate this at the rack scale on a similar timeline; the co-design depth required took NVIDIA years to accumulate. The Kyber rack architecture, previewed with a working prototype at GTC, integrates 144 GPUs in vertically oriented compute trays, a departure from horizontal rack design intended to boost density and cut latency. Kyber will first appear in Vera Rubin Ultra, the next rack-scale system after the NVL72, expected to ship in 2027.[1]
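The power argument can be made concrete with a hedged calculation: under a fixed rack power budget, every picojoule-per-bit saved on signaling is a watt returned to compute. All numbers below (budget, traffic, energy per bit) are assumptions for illustration, not NVIDIA figures.

```python
# Hedged illustration of the co-packaged-optics power argument. The rack
# budget, aggregate traffic, and pJ/bit figures are assumptions, not
# disclosed numbers.

RACK_BUDGET_KW = 150   # assumed rack power budget
TRAFFIC_TBPS = 5_000   # assumed aggregate scale-up traffic, terabits/s

def interconnect_kw(pj_per_bit: float) -> float:
    # Tb/s * 1e12 bit/s * pJ/bit * 1e-12 J/pJ = W; /1000 with Tb/s in
    # thousands cancels to: Tb/s * pJ/bit / 1000 = kW
    return TRAFFIC_TBPS * pj_per_bit / 1000

copper_kw = interconnect_kw(10.0)  # ~10 pJ/bit electrical SerDes (assumed)
cpo_kw = interconnect_kw(3.0)      # ~3 pJ/bit co-packaged optics (assumed)
freed_kw = copper_kw - cpo_kw      # kilowatts returned to the compute budget
```

Under these assumptions, signaling drops from a third of the rack budget to a tenth, and the freed headroom is exactly what lets a denser design like Kyber exist at all.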
NVIDIA used GTC to announce its most literal expression of the "AI everywhere" thesis: Space-1, a computing platform for orbital data centers. The Space-1 Vera Rubin Module is engineered for size-, weight-, and power-constrained environments and delivers up to 25 times the AI compute of the H100 for space-based inferencing.[7] Partners including Aetherflux, Axiom Space, Kepler Communications, Planet Labs, Sophia Space, and Starcloud are already building on NVIDIA accelerated platforms for orbital and ground applications.
The strategic case for space computing follows the same logic as sovereign AI on the ground: data should be processed where it is generated. For satellite constellations generating terabytes of imagery per pass, the alternative (downlinking raw data to ground stations) consumes bandwidth, introduces latency, and creates security exposure. Huang, quoting Star Trek with practiced timing, described Space-1 as "boldly taking intelligence where it's never gone before."[7] The Space-1 Vera Rubin Module will be available at a later date; IGX Thor and Jetson Orin platforms for orbital edge applications are available now.
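The downlink tradeoff is easy to quantify in a toy form: shipping detections instead of raw imagery shrinks the payload by several orders of magnitude. The pass volume, detection count, and bytes-per-detection below are assumptions for illustration only.

```python
# Toy calc of the on-orbit processing tradeoff: downlink detections, not
# pixels. All figures are assumptions for illustration.

RAW_TB_PER_PASS = 2.0         # assumed raw imagery per satellite pass
DETECTIONS_PER_PASS = 50_000  # assumed objects detected per pass
BYTES_PER_DETECTION = 512     # assumed bounding box + metadata record

processed_gb = DETECTIONS_PER_PASS * BYTES_PER_DETECTION / 1e9  # 0.0256 GB
reduction = RAW_TB_PER_PASS * 1000 / processed_gb               # ~78,000x less
```

Even if the real numbers differ by an order of magnitude, the qualitative conclusion survives: inference in orbit turns a bandwidth problem into a compute problem, which is precisely the trade NVIDIA sells.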
What GTC 2026 made clear is that NVIDIA's competitive position is no longer primarily about GPU performance per se. The Vera CPU Rack, the Groq 3 LPX, the BlueField-4 STX context memory storage platform, the Dynamo inference operating system, and NemoClaw for OpenClaw are all components of a single argument: that the optimal way to run agentic AI at scale is to buy the entire stack from one vendor and let extreme co-design do the work.[3]
The counter-argument (that hyperscalers will continue building custom silicon to reduce dependency on NVIDIA) remains valid. Google's TPUs and Amazon's Trainium are real alternatives with real deployments. AMD's Instinct accelerator lineup has made inroads among operators seeking portfolio diversification. But none of those alternatives can yet match the breadth of NVIDIA's co-designed stack from CPU sandbox to orbital data center. The $1 trillion order figure is the market's current verdict on whose argument is winning.
[1] NVIDIA GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through 2027 - CNBC
[2] NVIDIA GTC 2026: Live Updates on What's Next in AI - NVIDIA Blog
[3] NVIDIA GTC 2026: Rubin GPUs, Groq LPUs, Vera CPUs, and What NVIDIA Is Building for Trillion-Parameter Inference - StorageReview
[4] Nvidia Groq 3 LPU and Groq LPX racks join Rubin platform at GTC - Tom's Hardware
[5] Nvidia Groq 3 LPU and Groq LPX racks join Rubin platform at GTC - Tom's Hardware (Ian Buck quote)
[6] NVIDIA GTC 2026: Live Updates - Feynman architecture and Rosa CPU announcement - NVIDIA Blog
[7] NVIDIA Launches Space Computing, Rocketing AI Into Orbit - NVIDIA Newsroom
[8] NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories - NVIDIA Newsroom
[9] How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale - NVIDIA Technical Blog