Issue 23 — June 1 – 7, 2026

This Week in AI

Hosted by Rachel & Marcus · AI hosts

NVIDIA GTC Taipei dominated the week with a sweeping architectural argument: the agentic era demands a ground-up redesign of every layer of the compute stack, from CPUs to pod-scale supercomputers to AI factory power management. Meanwhile, enterprise practitioners are confronting the downstream consequences — inference spend overtaking headcount, SaaS moats evaporating as models clone entire apps, and AI agent swarms emerging as the primary new attack vector. The throughline is that AI has crossed from infrastructure investment into operating reality, and the companies that haven't internalized that are already behind.

Vera Rubin redefines the unit of AI infrastructure — from rack to pod

NVIDIA GTC Taipei 2026 Keynote | Live · NVIDIA Vera Rubin Platform Ramping into Full Production

NVIDIA's Vera Rubin is not an incremental GPU upgrade — it's a new category of system designed from the ground up for agentic AI workloads. The shift from rack-scale (Grace Blackwell) to a five-rack pod-scale supercomputer marks a new procurement unit for AI buyers and operators.

6 trillion transistors and 18,000+ components on a single compute board; TSMC 3nm process
Supply chain is 2× the size of Grace Blackwell's, now in full production
150 supply chain partners across Taiwan, millions of square feet of factory floor
First customers with engineering racks live: Microsoft, Dell, and CoreWeave
Liquid-cooled bus bars carry 5,000+ amps — equivalent to 20 electric cars at full acceleration
World's first Ethernet switch with 200 Gb co-packaged optics debuts inside the system

"Agents observe, reason, plan, use tools. They manage massive context, juggling working memory and long-term memory. They spin up sub-agents, specialists on demand. NVIDIA Vera Rubin is a multi-rack pod-scale system built to process Agentic AI and is now in full production."

The CPU is now the bottleneck — Vera's Olympus core is the first built for agents, not humans

NVIDIA Vera — The CPU for Agents · NVIDIA GTC Taipei 2026 Keynote | Live

NVIDIA's central architectural thesis: every CPU until now was designed for human users; Vera is the first designed for AI agents. Agentic loops — Python runtimes, tool calls, sandboxed code execution — have fundamentally different latency and branch-prediction profiles than traditional server workloads.

10 instructions fetched, decoded, and executed per clock — claimed world-leading IPC
40% lower peak memory latency than x86; first CPU to use LPDDR5X with simultaneous multi-bit error correction at full bandwidth
1.8× agentic sandbox performance over x86; SQL 3× faster, real-time stream processing 6× faster
NVLink chip-to-chip gives GPUs memory-coherent direct access to the CPU fabric, eliminating costly data copies in tight agentic loops
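
The agentic loop described above can be sketched in a few lines to show where CPU-side work sits between model calls. This is a minimal illustration, not NVIDIA's design: `fake_model`, `TOOLS`, and `run_agent` are hypothetical names, the model is a stub standing in for a GPU-served call, and a plain Python function stands in for sandboxed tool execution.

```python
import time

# Hypothetical stand-in for a GPU-served model call.
# It asks for one tool call, then returns a final answer.
def fake_model(context):
    if not any(message.startswith("tool:") for message in context):
        return {"tool": "add", "args": [2, 3]}
    return {"final": context[-1].split(":", 1)[1]}

# Hypothetical tool registry; in a real agent these would run in a sandbox.
TOOLS = {"add": lambda a, b: a + b}

def run_agent(task):
    """Alternate between model calls and CPU-side tool execution."""
    context = [f"user:{task}"]
    cpu_seconds = 0.0
    while True:
        step = fake_model(context)  # GPU time in a real system
        if "final" in step:
            return step["final"], cpu_seconds
        start = time.perf_counter()
        result = TOOLS[step["tool"]](*step["args"])  # CPU time: tool call
        context.append(f"tool:{result}")
        cpu_seconds += time.perf_counter() - start

answer, cpu_seconds = run_agent("what is 2 + 3?")
print(answer, f"{cpu_seconds:.6f}s spent in tool execution")
```

Every pass through the loop waits on the tool step before the next model call can start, which is the sense in which slow CPU-side execution would leave the GPU idle.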

"In the age of agents, the CPU is now a bottleneck to GPU utilization, directly affecting token throughput, latency, and user experience."

AI factory economics: tokens are revenue units, and power is the binding constraint

NVIDIA GTC Taipei 2026 Keynote | Live

Jensen Huang reframed the entire AI infrastructure investment thesis: tokens are now profitable revenue units, so every watt of compute capacity is a revenue opportunity — and wasting it is leaving money on the table.

100 GW of AI factories will come online before the end of the decade
AI factory cost is approaching $100 billion per gigawatt
Today's AI factories overprovision power by up to 40%; NVIDIA claims its DSX LPS software recovers that waste, adding GPU capacity within the same power budget
Choosing cheaper chips with worse performance-per-watt is irrational: "revenues per watt — the more you buy, the more you make"
GitHub commits nearly tripled in early 2026; Taiwan's GDP expected to grow ~10% this year driven by AI compute demand
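
The capacity and cost figures above multiply out as follows. This is a rough sketch under two assumptions that the keynote summary does not state: that the $100 billion per gigawatt figure applies uniformly to all 100 GW, and that "overprovision by up to 40%" means provisioned power is 1.4 times actual peak draw. The function name `headroom_gw` is illustrative.

```python
# Figures quoted in the bullets above.
PLANNED_CAPACITY_GW = 100
COST_PER_GW_USD = 100e9
OVERPROVISION = 0.40  # "up to 40%", so this is an upper bound

# Assumption: the per-gigawatt cost applies to all planned capacity.
implied_capex_usd = PLANNED_CAPACITY_GW * COST_PER_GW_USD

def headroom_gw(provisioned_gw, overprovision):
    """Idle power if provisioned = (1 + overprovision) * peak draw (assumed reading)."""
    peak_draw_gw = provisioned_gw / (1 + overprovision)
    return provisioned_gw - peak_draw_gw

print(f"Implied build-out cost: ${implied_capex_usd / 1e12:.0f} trillion")
print(f"Idle headroom in a 1 GW site: {headroom_gw(1.0, OVERPROVISION):.2f} GW")
```

Under that reading, a 1 GW site at the 40% upper bound would have a bit under 0.3 GW of power it pays for but never draws, which is the capacity the software is claimed to recover.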

"Tokens are now profitable units of revenues. Because it is now profitable, the AI companies want to build a lot more tokens, generate a lot more tokens, build more AI factories."

Cadence + NVIDIA cut chip verification from weeks to hours — 40× faster

Cadence Cuts Chip Verification From Weeks to Hours With AI Engineers and NVIDIA OpenShell

Chip verification, one of the most expensive and delay-prone steps in semiconductor development, has been automated with a multi-agent pipeline, delivering a claimed 40× speedup. A single RTL bug can delay a chip by months.

Codex orchestrates the loop; Cadence ChipStack runs it; NVIDIA OpenShell provides the secure sandbox
Specialized sub-agents handle RTL generation, testbench creation, regression testing, and debug
NVIDIA spends billions of compute hours per year and runs millions of tests annually just for verification
This is a live internal deployment at NVIDIA, not a demo — the speedup is claimed in production

"What once took weeks, now takes hours. Verification cycles over 40 times faster."

Application-layer SaaS moats are collapsing — network effects are the only defense

Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries

The Mercor CEO's central thesis: foundation model labs can replicate any software abstraction, making pure application-layer moats fragile to the point of worthlessness within 12–24 months.

2026 is the year models clone entire SaaS apps end-to-end — "how do you get the model to clone Slack end to end"
Companies without network effects (Salesforce, Slack, Carta have them; most SaaS doesn't) face existential risk
Mercor's proprietary APEX benchmark: frontier models went from 1% → 40% on real job tasks in 12 months
Mercor's AI project manager completed its first full project end-to-end, replacing 150 human coordinators
Frontier AI researchers now command tens of millions in stock per year — a direct recruiting obstacle for startups

"Building defensibility in the software layer on top of the models is going to be incredibly difficult."

Token spend is overtaking headcount — Jevons Paradox is driving enterprise compute budgets up, not down

Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries · This startup spends $400k/month on Anthropic

Two independent data points this week point to the same structural shift: total inference spend is rising even as cost per unit of performance falls, and compute is on track to exceed salaries as the largest enterprise cost line.

Mercor: token spend on internal agents already exceeds total employee headcount cost
Bold prediction: "in 5 years the average enterprise spends more on compute than headcount"
Jevons Paradox at work: 10× model improvement per year drives total consumption up, not down
One unnamed startup: $400k/month on Anthropic, $0 on OpenAI — "Anthropic has been out-executing OpenAI in every category for enterprise and workflow automation"
Mercor pays out $3M+ per day to AI training workers — "the fastest job category ever created in history"
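
The Jevons Paradox point reduces to one line of arithmetic: total spend is unit cost times usage, so spend rises whenever usage grows faster than unit cost falls. The sketch below uses the 10× yearly improvement from the bullets above, read here as a 10× drop in cost per unit of performance; the 15× usage growth is a hypothetical number chosen only for illustration. It also annualizes the $400k/month figure.

```python
def spend_multiplier(cost_drop_factor, usage_growth_factor):
    """Change in total spend when unit cost falls and usage grows."""
    return usage_growth_factor / cost_drop_factor

COST_DROP = 10.0     # assumed reading of "10x model improvement per year"
USAGE_GROWTH = 15.0  # hypothetical, not from the source

multiplier = spend_multiplier(COST_DROP, USAGE_GROWTH)
annual_spend_usd = 400_000 * 12  # the unnamed startup's monthly bill, annualized

print(f"Spend multiplier: {multiplier:.1f}x")
print(f"Annualized spend: ${annual_spend_usd:,}")
```

Any usage growth above 10× in this setup pushes the multiplier above 1, which is the situation the Mercor CEO describes.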

"When we make models improve by 10x year-over-year, that has just been causing the total consumption of the models to go up and up and up as the cost per performance go down."

AI agents are the new attack surface — and defenders are losing the code review race

"We Don't Trust Agents" - What This CTO Knows That You Don't · Mercor CEO on Why Application Layer Companies Have No Defensibility

The security consensus from two separate episodes: AI agents connected to tools are the primary new attack vector, and the volume of AI-generated code has already outpaced human review capacity.

Merge's explicit policy: "We don't trust agents" — hard blocks on sensitive data, not soft rules
Mercor's own breach: attackers used a swarm of coding agents to exhaustively review the codebase and gain access
AI-generated pull requests are "massively soaring" on GitHub — "you don't have enough humans to read all that code and so things are slipping by"
Adversaries now have "perfect English, perfect code, and unlimited AI-powered manpower" — traditional skill-based filters are gone
Enterprise demand is shifting toward governance, auditability, and identity-provider-linked access revocation for agent activity

"The second you connect it to tools, which is what everyone is trying to do right now, that's where everything goes wrong."

SWEbench is broken — real-world coding benchmarks tell a different story

SWEbench is done.

Developer vibe checks and a competing benchmark are directly contradicting SWEbench rankings, raising serious questions about whether the dominant coding-agent benchmark has been gamed or is simply unrepresentative.

On Deep Suite: GPT-5.5 scores 70%, Claude Opus 4.7 scores 54% — a 16-point gap
On SWEbench: the ranking is reversed — Opus 4.7 leads by ~7–8 points
Practitioners report the SWEbench ranking "doesn't match what they're feeling" when using the models
The divergence suggests SWEbench may be optimizable without generalizing to real agentic coding tasks

Inference software creates lock-in; raw GPU rental is a commodity

Baseten: "We've Never Lost Our Top Customers"

Baseten's core claim: GPU-as-a-service is undifferentiated and customers treat it as a commodity, but wrapping inference in a software layer creates genuine switching costs and extraordinary retention.

Zero churn among top 30 customers; 400% annual net dollar retention (NDR)
"GPUs as a service is not sticky… Inference with the software layer included is incredibly sticky"
Strategic implication: controlling inference compute is becoming a moat, not just a cost center — "if we have all the compute, good luck running inference"

Uber becomes the platform layer for the entire AV ecosystem

Uber CEO Dara Khosrowshahi on integrating autonomous vehicles into Uber's ecosystem

Uber's AV strategy is explicitly not to build a robot driver — it's to aggregate every safe robot driver onto its platform, capturing distribution value without bearing R&D risk.

Already live: Waymo vehicles dispatched through the standard Uber app in Austin and Atlanta
Expanding to Zoox, WeRide, Wayve, and others, a rapidly growing multi-partner roster
The framing: "just like we want every safe human driver on the platform, we want every safe robot driver on the platform"

Impulse Space raises $500M to own the GEO mobility market — and the Space Force's fighter-jet problem

Why Impulse Raised $500M to Move Things in Space | President & COO Eric Romo

Impulse Space is building three distinct products targeting a single structural gap: there is no fast, affordable way to move things in geosynchronous orbit, and that gap has both commercial and urgent national security dimensions.

Helios: delivers satellites to GEO the same day as launch (8 hours), cheaper than Falcon Heavy; the status quo costs operators 6–10 months and millions in fuel and radiation shielding
Caravan: "SpaceX Transporter but for GEO" — rideshare for small GEO satellites, already oversubscribed for 2028
Mera: a high-delta-V "fighter jet" for Space Force — rapid characterization of foreign spacecraft in GEO; "from the ground, a Chinese satellite in GEO is literally one pixel"
Space Force budget jumped from $17B to a requested $71B; the macro tailwind is real
$500M Series D led by 137 Ventures (major SpaceX shareholder), crossing $1B total funding

"MEO and GEO are a million times bigger than LEO. So if you're going to have that same strategy of proliferation, you'd need a million times more assets — which is just impractical."

Key Takeaways

Agentic AI is forcing a ground-up redesign of every layer of the stack — from CPUs (Vera's Olympus core) to multi-rack pod-scale systems (Vera Rubin) to inference software (Baseten), the entire infrastructure hierarchy is being rebuilt around agent workloads, not human users.
Tokens are now revenue units, and compute spend is on track to exceed headcount — Mercor already crossed that threshold internally; Jevons Paradox means cheaper-per-token models drive total spend up, not down, making inference infrastructure a strategic asset.
Application-layer SaaS moats are collapsing fast: models are beginning to clone entire apps end-to-end, and pure software moats could be worthless within 12–24 months; only companies with genuine network effects have a defensible position in the AI era.
AI agents are the primary new attack surface — agent swarms are being used offensively (Mercor breach), AI-generated code is outpacing human review on GitHub, and the only reliable defense is hard architectural guardrails, not soft rules.
SWEbench can no longer be trusted as a coding-agent benchmark: Deep Suite shows a 16-point gap between GPT-5.5 and Claude Opus 4.7 that directly contradicts SWEbench rankings, suggesting the benchmark has either been gamed or is unrepresentative of real agentic coding tasks.
The AI infrastructure buildout is a civilization-scale capital commitment — 100 GW of AI factories before 2030, costs approaching $100B/GW, and TSMC going "full steam ahead" on capacity because demand is "too exuberant"; power efficiency (tokens per watt) is now the defining procurement metric.

Sources

Sourced from 94 episodes across 9 podcasts this week