Issue 23 — June 1 – 7, 2026
This Week in AI
Hosted by Rachel & Marcus · AI hosts
NVIDIA GTC Taipei dominated the week with a sweeping architectural argument: the agentic era demands a ground-up redesign of every layer of the compute stack, from CPUs to pod-scale supercomputers to AI factory power management. Meanwhile, enterprise practitioners are confronting the downstream consequences — inference spend overtaking headcount, SaaS moats evaporating as models clone entire apps, and AI agent swarms emerging as the primary new attack vector. The throughline is that AI has crossed from infrastructure investment into operating reality, and the companies that haven't internalized that are already behind.
Vera Rubin redefines the unit of AI infrastructure — from rack to pod
NVIDIA GTC Taipei 2026 Keynote | Live · NVIDIA Vera Rubin Platform Ramping into Full Production
NVIDIA's Vera Rubin is not an incremental GPU upgrade — it's a new category of system designed from the ground up for agentic AI workloads. The shift from rack-scale (Grace Blackwell) to a five-rack pod-scale supercomputer marks a new procurement unit for AI buyers and operators.
- 6 trillion transistors and 18,000+ components on a single compute board; TSMC 3nm process
- Supply chain is 2× the size of Grace Blackwell's, now in full production
- 150 supply chain partners across Taiwan, millions of square feet of factory floor
- First customers with engineering racks live: Microsoft, Dell, and CoreWeave
- Liquid-cooled bus bars carry 5,000+ amps — equivalent to 20 electric cars at full acceleration
- World's first Ethernet switch with 200 Gb co-packaged optics debuts inside the system
"Agents observe, reason, plan, use tools. They manage massive context, juggling working memory and long-term memory. They spin up sub-agents, specialists on demand. NVIDIA Vera Rubin is a multi-rack pod-scale system built to process Agentic AI and is now in full production."
The CPU is now the bottleneck — Vera's Olympus core is the first built for agents, not humans
NVIDIA Vera — The CPU for Agents · NVIDIA GTC Taipei 2026 Keynote | Live
NVIDIA's central architectural thesis: every CPU until now was designed for human users; Vera is the first designed for AI agents. Agentic loops — Python runtimes, tool calls, sandboxed code execution — have fundamentally different latency and branch-prediction profiles than traditional server workloads.
- 10 instructions fetched, decoded, and executed per clock — claimed world-leading IPC
- 40% lower peak memory latency than x86; first CPU to use LPDDR5X with simultaneous multi-bit error correction at full bandwidth
- 1.8× agentic sandbox performance over x86; SQL 3× faster, real-time stream processing 6× faster
- NVLink chip-to-chip gives GPUs memory-coherent direct access to the CPU fabric, eliminating costly data copies in tight agentic loops
"In the age of agents, the CPU is now a bottleneck to GPU utilization, directly affecting token throughput, latency, and user experience."
AI factory economics: tokens are revenue units, and power is the binding constraint
NVIDIA GTC Taipei 2026 Keynote | Live
Jensen Huang reframed the entire AI infrastructure investment thesis: tokens are now profitable revenue units, so every watt of compute capacity is a revenue opportunity — and wasting it is leaving money on the table.
- 100 GW of AI factories will come online before the end of the decade
- AI factory cost is approaching $100 billion per gigawatt
- Today's AI factories overprovision power by up to 40%; NVIDIA's DSX LPS software claims to recover that waste, adding GPU capacity within the same power budget
- Choosing cheaper chips with worse performance-per-watt is irrational: "revenues per watt — the more you buy, the more you make"
- GitHub commits nearly tripled in early 2026; Taiwan's GDP expected to grow ~10% this year driven by AI compute demand
"Tokens are now profitable units of revenues. Because it is now profitable, the AI companies want to build a lot more tokens, generate a lot more tokens, build more AI factories."
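As a back-of-the-envelope check on the figures above, the sketch below multiplies the quoted 100 GW by the quoted cost of roughly $100 billion per gigawatt, then estimates how much provisioned power a 40% overprovisioning margin leaves idle. The function names are illustrative. Reading "overprovision by up to 40%" as "provisioned power is 1.4 times actual peak draw" is an assumption; the keynote summary does not define the term.

```python
# Back-of-the-envelope arithmetic on the keynote's quoted figures.
# Function names are hypothetical; the inputs are claims, not measurements.

def total_buildout_cost_usd(gigawatts: float, cost_per_gw_usd: float) -> float:
    """Total capital cost if every gigawatt costs the same."""
    return gigawatts * cost_per_gw_usd


def idle_share(overprovision: float) -> float:
    """Share of provisioned power left unused.

    ASSUMPTION: provisioned = (1 + overprovision) * actual peak draw.
    """
    return overprovision / (1.0 + overprovision)


cost = total_buildout_cost_usd(100, 100e9)
idle = idle_share(0.40)
print(f"Implied buildout cost: ${cost / 1e12:.0f} trillion")
print(f"Idle share of provisioned power: {idle:.1%}")
```

Under these assumptions the two headline numbers imply about $10 trillion of total buildout, and a 40% margin corresponds to a little under 29% of provisioned power sitting idle at the upper bound.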
Cadence + NVIDIA cut chip verification from weeks to hours — 40× faster
Cadence Cuts Chip Verification From Weeks to Hours With AI Engineers and NVIDIA OpenShell
Chip verification, one of the most expensive and delay-prone steps in semiconductor development, has been automated with a multi-agent pipeline that delivers a claimed 40× speedup. The stakes are high: a single RTL bug can delay a chip by months.
- Codex orchestrates the loop; Cadence ChipStack runs it; NVIDIA OpenShell provides the secure sandbox
- Specialized sub-agents handle RTL generation, testbench creation, regression testing, and debug
- NVIDIA spends billions of compute hours per year and runs millions of tests annually just for verification
- This is a live internal deployment at NVIDIA, not a demo — the speedup is claimed in production
"What once took weeks, now takes hours. Verification cycles over 40 times faster."
Application-layer SaaS moats are collapsing — network effects are the only defense
Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries
The Mercor CEO's central thesis: foundation model labs can replicate any software abstraction, making pure application-layer moats fragile to the point of worthlessness within 12–24 months.
- 2026 is the year models clone entire SaaS apps end-to-end — "how do you get the model to clone Slack end to end"
- Companies without network effects (Salesforce, Slack, Carta have them; most SaaS doesn't) face existential risk
- Mercor's proprietary APEX benchmark: frontier models went from 1% → 40% on real job tasks in 12 months
- Mercor's AI project manager completed its first full project end-to-end, replacing 150 human coordinators
- Frontier AI researchers now command tens of millions in stock per year — a direct recruiting obstacle for startups
"Building defensibility in the software layer on top of the models is going to be incredibly difficult."
Token spend is overtaking headcount — Jevons Paradox is driving enterprise compute budgets up, not down
Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries · This startup spends $400k/month on Anthropic
Two independent data points this week confirm the same structural shift: inference costs are growing faster than efficiency gains, and compute is on track to exceed salaries as the largest enterprise cost line.
- Mercor: token spend on internal agents already exceeds total employee headcount cost
- Bold prediction: "in 5 years the average enterprise spends more on compute than headcount"
- Jevons Paradox at work: 10× model improvement per year drives total consumption up, not down
- One unnamed startup: $400k/month on Anthropic, $0 on OpenAI — "Anthropic has been out-executing OpenAI in every category for enterprise and workflow automation"
- Mercor pays out $3M+ per day to AI training workers — "the fastest job category ever created in history"
"When we make models improve by 10x year-over-year, that has just been causing the total consumption of the models to go up and up and up as the cost per performance go down."
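The Jevons Paradox claim depends on how strongly demand responds to falling prices. A minimal sketch, assuming a constant-elasticity demand curve (a textbook simplification, not something the episode specifies): total spend rises after a price cut only when elasticity is above 1. The elasticity values below are made up for illustration.

```python
def spend_multiplier(price_drop: float, elasticity: float) -> float:
    """New total spend divided by old total spend.

    price_drop: factor by which unit price falls (10.0 means 10x cheaper).
    elasticity: price elasticity of demand (ASSUMED; not from the source).
    """
    price_ratio = 1.0 / price_drop                  # new price / old price
    quantity_ratio = price_ratio ** (-elasticity)   # new demand / old demand
    return price_ratio * quantity_ratio


for e in (0.5, 1.0, 1.5):
    print(f"elasticity={e}: spend x{spend_multiplier(10.0, e):.2f}")
```

With a 10× price drop, an elasticity of 1.5 roughly triples total spend, an elasticity of 1.0 leaves it flat, and an elasticity of 0.5 cuts it to about a third. The pattern described in the episodes corresponds to the first case.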
AI agents are the new attack surface — and defenders are losing the code review race
"We Don't Trust Agents" - What This CTO Knows That You Don't · Mercor CEO on Why Application Layer Companies Have No Defensibility
The security consensus from two separate episodes: AI agents connected to tools are the primary new attack vector, and the volume of AI-generated code has already outpaced human review capacity.
- Merge's explicit policy: "We don't trust agents" — hard blocks on sensitive data, not soft rules
- Mercor's own breach: attackers used a swarm of coding agents to exhaustively review the codebase and gain access
- AI-generated pull requests are "massively soaring" on GitHub — "you don't have enough humans to read all that code and so things are slipping by"
- Adversaries now have "perfect English, perfect code, and unlimited AI-powered manpower" — traditional skill-based filters are gone
- Enterprise demand is shifting toward governance, auditability, and identity-provider-linked access revocation for agent activity
"The second you connect it to tools, which is what everyone is trying to do right now, that's where everything goes wrong."
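The difference between a hard block and a soft rule can be sketched as a check enforced in code before any tool call runs, rather than an instruction in the agent's prompt. Everything here is hypothetical (the blocked field names, the `ToolCall` type, the `guard` function); it illustrates the idea and is not Merge's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical fields an agent may never pass to a tool, whatever the prompt says.
BLOCKED_FIELDS = frozenset({"ssn", "bank_account", "password"})


class BlockedFieldError(Exception):
    """Raised when a tool call touches a blocked field."""


@dataclass
class ToolCall:
    tool: str
    arguments: dict = field(default_factory=dict)


def guard(call: ToolCall) -> ToolCall:
    """Reject the call outright if any argument key is on the block list."""
    hits = BLOCKED_FIELDS.intersection(key.lower() for key in call.arguments)
    if hits:
        raise BlockedFieldError(f"{call.tool}: blocked fields {sorted(hits)}")
    return call
```

Because the check raises an exception instead of asking the model to comply, a prompt injection cannot talk its way past it. A real deployment would also need to inspect argument values and tool outputs, which this sketch omits.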
SWEbench is broken — real-world coding benchmarks tell a different story
SWEbench is done.
Developer vibe checks and a competing benchmark are directly contradicting SWEbench rankings, raising serious questions about whether the dominant coding-agent benchmark has been gamed or is simply unrepresentative.
- On Deep Suite: GPT-5.5 scores 70%, Claude Opus 4.7 scores 54% — a 16-point gap
- On SWEbench: the ranking is reversed — Opus 4.7 leads by ~7–8 points
- Practitioners report the SWEbench ranking "doesn't match what they're feeling" when using the models
- The divergence suggests SWEbench may be optimizable without generalizing to real agentic coding tasks
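To make the disagreement concrete, the sketch below expresses each benchmark as a signed gap (GPT-5.5 minus Claude Opus 4.7) and checks whether the signs differ. The Deep Suite scores are the ones quoted above. The SWEbench value uses the midpoint of the quoted "~7–8 point" lead, which is an assumption because the exact scores are not given.

```python
# Signed score gaps in percentage points: GPT-5.5 minus Claude Opus 4.7.
deep_suite_gap = 70 - 54   # from the quoted Deep Suite scores
swebench_gap = -7.5        # ASSUMED midpoint of the quoted ~7-8 point Opus lead


def rankings_disagree(gap_a: float, gap_b: float) -> bool:
    """True when two benchmarks order the same pair of models differently."""
    return gap_a * gap_b < 0


swing = deep_suite_gap - swebench_gap
print(rankings_disagree(deep_suite_gap, swebench_gap))
print(f"Total swing between benchmarks: {swing} points")
```

On these numbers the two benchmarks disagree on which model leads, and the relative standing shifts by about 23.5 points from one benchmark to the other.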
Inference software creates lock-in; raw GPU rental is a commodity
Baseten: "We've Never Lost Our Top Customers"
Baseten's core claim: GPU-as-a-service is undifferentiated and customers treat it as a commodity, but wrapping inference in a software layer creates genuine switching costs and extraordinary retention.
- Zero churn among top 30 customers; 400% annual net dollar retention (NDR)
- "GPUs as a service is not sticky… Inference with the software layer included is incredibly sticky"
- Strategic implication: controlling inference compute is becoming a moat, not just a cost center — "if we have all the compute, good luck running inference"
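For readers unfamiliar with the metric, net dollar retention (NDR) compares what a cohort of existing customers pays now with what the same cohort paid a year earlier. A minimal sketch using the standard definition; the dollar figures are invented to show what 400% looks like and are not Baseten's numbers.

```python
def net_dollar_retention(start_arr: float, expansion: float,
                         contraction: float, churn: float) -> float:
    """Recurring revenue from an existing cohort now, divided by a year ago."""
    return (start_arr + expansion - contraction - churn) / start_arr


# Hypothetical cohort: $1M a year ago, $3M of expansion, no downgrades or churn.
ndr = net_dollar_retention(1_000_000, 3_000_000, 0, 0)
print(f"NDR: {ndr:.0%}")
```

In this example the cohort's spend quadruples, which is what a 400% NDR means: growth comes from existing customers spending more, before counting any new logos.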
Uber becomes the platform layer for the entire AV ecosystem
Uber CEO Dara Khosrowshahi on integrating autonomous vehicles into Uber's ecosystem
Uber's AV strategy is explicitly not to build a robot driver — it's to aggregate every safe robot driver onto its platform, capturing distribution value without bearing R&D risk.
- Already live: Waymo vehicles dispatched through the standard Uber app in Austin and Atlanta
- Expanding to Zoox, WeRide, Wayve, and others, a rapidly growing multi-partner roster
- The framing: "just like we want every safe human driver on the platform, we want every safe robot driver on the platform"
Impulse Space raises $500M to own the GEO mobility market — and the Space Force's fighter-jet problem
Why Impulse Raised $500M to Move Things in Space | President & COO Eric Romo
Impulse Space is building three distinct products targeting a single structural gap: there is no fast, affordable way to move things in geosynchronous orbit, and that gap has both commercial and urgent national security dimensions.
- Helios: delivers satellites to GEO the same day as launch (8 hours), cheaper than Falcon Heavy; the status quo costs operators 6–10 months and millions in fuel and radiation shielding
- Caravan: "SpaceX Transporter but for GEO" — rideshare for small GEO satellites, already oversubscribed for 2028
- Mira: a high-delta-V "fighter jet" for Space Force, built for rapid characterization of foreign spacecraft in GEO; "from the ground, a Chinese satellite in GEO is literally one pixel"
- Space Force's budget request jumped from $17B to $71B; the macro tailwind is real
- $500M Series D led by 137 Ventures (major SpaceX shareholder), crossing $1B total funding
"MEO and GEO are a million times bigger than LEO. So if you're going to have that same strategy of proliferation, you'd need a million times more assets — which is just impractical."
Key Takeaways
- Agentic AI is forcing a ground-up redesign of every layer of the stack — from CPUs (Vera's Olympus core) to multi-rack pod-scale systems (Vera Rubin) to inference software (Baseten), the entire infrastructure hierarchy is being rebuilt around agent workloads, not human users.
- Tokens are now revenue units, and compute spend is on track to exceed headcount — Mercor already crossed that threshold internally; Jevons Paradox means cheaper-per-token models drive total spend up, not down, making inference infrastructure a strategic asset.
- Application-layer SaaS moats are collapsing fast — models can clone entire apps end-to-end within 12 months; only companies with genuine network effects have a defensible position in the AI era.
- AI agents are the primary new attack surface — agent swarms are being used offensively (Mercor breach), AI-generated code is outpacing human review on GitHub, and the only reliable defense is hard architectural guardrails, not soft rules.
- SWEbench can no longer be trusted as a coding-agent benchmark — Deep Suite shows a 16-point real-world gap between GPT-5.5 and Claude Opus 4.7 that directly contradicts SWEbench rankings, signaling the benchmark has been gamed or is unrepresentative.
- The AI infrastructure buildout is a civilization-scale capital commitment — 100 GW of AI factories before 2030, costs approaching $100B/GW, and TSMC going "full steam ahead" on capacity because demand is "too exuberant"; power efficiency (tokens per watt) is now the defining procurement metric.
Sources
- NVIDIA GTC Taipei 2026 Keynote | Live
- NVIDIA Vera Rubin Platform Ramping into Full Production | Built for the Era of Agents
- NVIDIA Vera — The CPU for Agents
- Cadence Cuts Chip Verification From Weeks to Hours With AI Engineers and NVIDIA OpenShell
- Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries
- This startup spends $400k/month on Anthropic
- SWEbench is done.
- "We Don't Trust Agents" - What This CTO Knows That You Don't
- Baseten: "We've Never Lost Our Top Customers"
- Uber CEO Dara Khosrowshahi on integrating autonomous vehicles into Uber's ecosystem
- Why Impulse Raised $500M to Move Things in Space | President & COO Eric Romo
- NVIDIA GTC Live Keynote Pregame | Live in Taipei
- Building the Future of Voice-First Sovereign AI: Sarvam & NVIDIA
- Inside PsiQuantum's Silicon Photonic Chipset Never-Before-Seen
- Roblox CEO David Baszucki on the next wave of digital experiences
- Official Keynote Closing Video | GTC Taipei 2026
- Architectural Design With Agents on NVIDIA RTX Spark
- Why AI Won't Take Your Job
Source episodes
Sourced from 94 episodes across 9 podcasts this week
- Merge Co-Founders Gil Feig and Shensi Ding on the AI Valuation Bubble
- Jensen Huang and Satya Nadella's Conversation at Microsoft Build
- Gil Feig on why 2026 is the year AI agents finally hit production
- Merge Co-Founders Gil Feig and Shensi Ding on why the AI bill is crushing CFOs
- SWEbench is done.
- NVIDIA GTC Live Keynote Pregame | Live in Taipei
- Official Keynote Closing Video | GTC Taipei 2026
- OpenAI Codex will merge into ChatGPT: Denise Dresser, Alex Embiricos, Romain Huet, Sam Altman
- Er-Xuan Ping on safely scaling reactive barium for automated wafer production
- Anthropic Files to Go Public | Cognition Raises $1BN at $26BN Valuation | The 996 Work Ethic
- Inside Impulse Space's $500M Series D | President & COO Eric Romo
- Peritas AI Showcase
- Baseten: "We've Never Lost Our Top Customers"
- "We spend more on tokens than salaries"
- "SpaceX is the greatest company of all time" — Shaun Maguire breaks down the numbers
- Dara Khosrowshahi on the biggest AI misconception
- The Taipei 101 Tribute to NVIDIA GTC
- The one sentence that opened up the universe for a stranger - Adam Brown
- How Merge Won OpenAI and Netflix One Logo at a Time
- Figure CEO Brett Adcock on building robots that create real value
- Anthropic did a thing...
- Why Impulse Raised $500M to Move Things in Space | President & COO Eric Romo
- Building the Future of Voice-First Sovereign AI: Sarvam & NVIDIA
- Architectural Design With Agents on NVIDIA RTX Spark
- The Invisible Layer in OpenAI, Mistral, JPMorgan, & Netflix
- NVIDIA Foundational Technology Montage | GTC Taipei and COMPUTEX 2026 Edition
- You get to keep your job
- It's like AI insurance
- Commure CEO Tanay Tandon on how AI is saving millions of physician hours
- Early Preview of NVIDIA RTX Spark at Computex
- NVIDIA GTC Taipei 2026 Keynote | Live
- We just launched Paxel!
- It’s starting…
- NVIDIA Vera Rubin Platform Ramping into Full Production | Built for the Era of Agents
- Mercor CEO on Why Application Layer Companies Have No Defensibility & Token Spend Exceeds Salaries
- Uber CEO Dara Khosrowshahi on integrating autonomous vehicles into Uber’s ecosystem
- Inside PsiQuantum's Silicon Photonic Chipset *Never-Before-Seen*
- Uber CEO on AI, Autonomous Vehicles, and the Future of Transportation
- The Rise of the Full-Stack Builder and Hyper-Leveraged Generalist with Microsoft CEO Satya Nadella
- Emergent: How Six Months of Tinkering Led To A $100M ARR Company
- Tokens Over Humans?
- Humans split into separate groups for a million years, then merged - David Reich
- Merge Co-founder Shensi Ding on why AI agents will choose your vendors for you
- Merge Co-Founders on why AI infrastructure deals close faster than SaaS
- "No one in the 800 pound gorilla is excited to be there"
- NVIDIA RTX Spark Reinvents Windows PCs for the Age of Personal AI
- The $10B Startup Running on AI Agents
- A 10-mile ride turned into a 1,000-mile spiritual quest - Adam Brown
- AI Is Creating Jobs Faster Than Ever
- Why ElevenLabs broke out: "A great consumer + great enterprise business"
- How to Build an AI-Native Services Company
- "We Don't Trust Agents" - What This CTO Knows That You Don't
- Announcing NVIDIA RTX Spark | GTC Taipei 2026 Keynote by CEO Jensen Huang
- Coatue CIO of Public Investments Jaimin Rangwalla on the new scale of AI-driven revenue growth
- The Neanderthal DNA Puzzle No One Can Explain - David Reich
- Why Token Maxing is Failing Enterprise Startups | Legora CTO
- Ryan Serhant on how social media sparked a $38,000,000 phone call
- Legora CTO on the tools they vibe code
- Merge CTO Gil Feig's Warning: AI Will Supercharge Cyberattacks
- Tokens Turn Data Into Knowledge | Official Keynote Intro | GTC Taipei at COMPUTEX 2026
- NVIDIA DSX Powers Gigawatt‑Scale AI Factories at Maximum Efficiency
- This Theory Explains the Neanderthal DNA Mystery - David Reich
- "The Model is the Product"
- Merge CTO Gil Feig on the control layer every enterprise AI stack needs
- The better AI gets, the smaller its share of the economy might get – Alex Imas and Phil Trammell
- Why AI Won't Take Your Job
- Cadence Cuts Chip Verification From Weeks to Hours With AI Engineers and NVIDIA OpenShell
- How Legora Went From YC to $100M ARR in 18 Months
- NVIDIA Vera—The CPU for Agents
- Why we should increase capital gains tax
- "Losing money is like s*x"
- How Merge built a new product with one engineer and two founders using AI
- Roblox CEO David Baszucki on the next wave of digital experiences
- This is NVIDIA Alpamayo Thinking Out Loud
- This startup spends $400k/month on Anthropic
- $1,300,000 in TOKENS
- General Catalyst Institute Founding CEO Teresa Carlson on building the first cloud for the CIA
- Tom Mueller: SpaceX's #1 Employee Built the Merlin Engine. Now He's Building Impulse Space
- Morgan Housel on the real price of success in VC
- Introducing NVIDIA Nemotron 3 Ultra
- Inside Impulse Space's Factory with Founder Tom Mueller (Full Tour)
- Conductor CEO Charlie Holtz Walks Us Through His AI Coding Setup
- Perplexity Just Built an AI That Does Everything
- BioHub ESM Paper Authors: A World Model of Protein Biology [Paper Club 20260603]
- Introducing NVIDIA Cosmos 3: The Open Model That Thinks, Generates, and Acts
- Everyone's holding their laptops open
- Dell Technologies World 2026 Keynote | May 18–21 | Las Vegas
- How General Catalyst is turning a struggling hospital into an AI native hospital
- Trae Stephens on Arsenal 1: "The factory is the weapon"
- Henry Ward on turning spreadsheet industries into software (and why PE is the prize)
- This is NVIDIA Alpamayo Thinking Out Loud
- Eric Romo on why every hire at Impulse Space must own something
- Accelerating Humanoid Robot Development With NVIDIA Isaac GR00T
- Coatue CIO of Public Investments Jaimin Rangwalla on the unprecedented scale of private AI companies