🧪 Why AI infrastructure startups are suddenly worth billions.
📰 Google engineer's $1.2M bet, Steam Deck's price hike, dating apps' burnout problem.
🛠️ Weekend To-Do: Try serverless GPUs, test inference, and audit your compute bill.
🗳️ Poll: Where does the durable value of inference end up?
Let’s dive in. No floaties needed…

Framer helps teams design, build, and launch their marketing sites lightning fast. With the ability to publish hundreds of CMS pages in a single click, operate at a global scale with seamless localization, and even host unified content across multiple domains, teams have never been able to ship faster.
Trusted by companies like Miro, Bilt, and Perplexity.
*This is sponsored content

Modal’s numbers are the tell: From $60M to $300M annualized revenue in eight months, with valuation jumping from $1.1B to $4.65B, Modal’s growth signals where AI money is actually concentrating.
Inference is the recurring cost: Training a model happens once, or occasionally when it is updated. Inference happens billions of times daily, with every query and every automated task, and its share of AI compute is projected to hit two-thirds of the total in 2026.
The whole sector is moving: Baseten, Fireworks AI, and Inferact raised $300M, $250M, and $150M, respectively, in early 2026, at valuations ranging from $800M to $5B. This isn’t one hot startup; it’s a pattern.
The AWS analogy explains the stakes: Whoever owns the reliable, scalable infrastructure layer beneath AI models may quietly capture the most durable recurring revenue, while the consumer-facing products get the headlines.
When an average user interacts with AI chatbots like ChatGPT or Claude, the experience feels deceptively simple. A prompt is typed into a clean text box, and within seconds an answer appears: an email draft, a summary, a coding solution, an image. The interaction is designed to feel seamless, concealing the industrial-scale machinery operating underneath. Behind every response sits a network of specialized computer chips known as GPUs, distributed across massive data centers that constantly allocate computing power in fractions of a second.
For most consumers, that infrastructure remains invisible. But for companies trying to turn AI into a long-term business, it has become the industry’s defining challenge. Increasingly, the real value in artificial intelligence is shifting away from the chatbot interface users see and toward the systems that power those models reliably, cheaply, and at scale. One company’s recent numbers illustrate that transition more clearly than most, revealing how investor attention is moving from AI as a consumer product to AI as an industrial infrastructure business.
Modal, a New York-based infrastructure startup, reported annualized revenue of roughly $60M in September 2025. By May 2026, that figure had reached approximately $300M, a roughly fivefold increase in eight months. Over the same period, its valuation rose from $1.1B to $4.65B after it closed a $355M Series C round led by Redpoint Ventures and General Catalyst. Much of that growth has been fueled by the surge in AI-assisted coding and the broader demand for systems capable of running AI workloads efficiently at scale, according to CEO Erik Bernhardsson.
That demand comes from customers that include biotech companies, hedge funds, and weather-forecasting firms, a revealing mix of industries where computational performance is not a convenience but a core operational requirement. In each of these sectors, AI systems are increasingly being integrated into workflows where delays, instability, or inefficient computing translate directly into financial, scientific, or strategic costs.
Seen in that context, Modal’s rapid rise in valuation, more than quadrupling in under a year, says less about the momentum of a single startup than about what investors increasingly believe the AI economy is actually built on. The real value in AI is beginning to consolidate not only around the models users interact with, but around the infrastructure layer that makes those systems usable, scalable, and economically viable.
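For readers who want to check the multiples themselves, here is a minimal sketch of the arithmetic, using only the figures reported above. The even month-over-month compounding is an assumption made for illustration; real revenue rarely grows that smoothly.

```python
# Figures as reported in this article.
revenue_start_musd = 60.0     # annualized revenue, September 2025 (USD millions)
revenue_end_musd = 300.0      # annualized revenue, May 2026 (USD millions)
months_elapsed = 8

valuation_start_busd = 1.1    # USD billions
valuation_end_busd = 4.65     # USD billions

revenue_multiple = revenue_end_musd / revenue_start_musd
valuation_multiple = valuation_end_busd / valuation_start_busd

# Assumption: growth compounded evenly each month.
implied_monthly_growth = revenue_multiple ** (1 / months_elapsed) - 1

print(f"Revenue multiple:       {revenue_multiple:.1f}x")
print(f"Valuation multiple:     {valuation_multiple:.2f}x")
print(f"Implied monthly growth: {implied_monthly_growth:.1%}")
```

Under that smoothing assumption, a fivefold rise in eight months works out to a little over 22% growth per month.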
Modal sits in a part of the AI stack that most consumers never interact with but that every AI product depends on. The company does not build AI models; it handles inference: the process of running a trained model to generate a response when someone sends it a request. Training a model, the phase where it learns from enormous datasets, happens once or occasionally when the model is updated. Inference happens billions of times a day, every time a user gets a reply, every time an enterprise system processes a document, every time an AI agent completes a task. It is the ongoing operational cost of AI at scale, and managing it efficiently is genuinely hard.
Part of that difficulty stems from the mismatch between how quickly users expect AI systems to respond and how slowly computing infrastructure has traditionally adapted to sudden demand. According to Modal’s own engineering documentation, spinning up a new GPU instance to handle a surge in requests can take tens of minutes using conventional approaches, far too slow in an environment where traffic spikes can emerge in seconds.
That constraint helps explain why infrastructure has become such a critical layer of the AI economy. Modal built its own infrastructure stack from scratch, including a custom container runtime, the software environment each job runs inside, and a proprietary file system optimized specifically for faster boot times. The result is sub-second cold starts, allowing new compute instances to become operational in under a second. The company also offers a sandbox product that lets developers safely test AI-generated code before deployment, an increasingly important capability as tools like Claude Code and GitHub Copilot generate more of the software enterprises actually run.
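To get a feel for why startup time matters, consider a rough sketch of how many requests pile up while a new instance boots. The spike size is hypothetical, "tens of minutes" is taken as 20 minutes, and "sub-second" is rounded up to one second; none of these numbers come from Modal.

```python
def queued_requests(requests_per_second: float, cold_start_seconds: float) -> float:
    """Requests that arrive while a new instance is still starting up."""
    return requests_per_second * cold_start_seconds

# Hypothetical spike: 100 requests per second beyond current capacity.
spike_rps = 100.0

# Illustrative stand-ins: 20 minutes for a conventional start,
# 1 second as an upper bound for a sub-second start.
conventional_backlog = queued_requests(spike_rps, 20 * 60)
fast_backlog = queued_requests(spike_rps, 1.0)

print(f"Conventional start: {conventional_backlog:,.0f} requests waiting")
print(f"Sub-second start:   at most {fast_backlog:,.0f} requests waiting")
```

The model ignores retries, timeouts, and autoscaling policies, but the gap of three orders of magnitude is the point.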
The broader industry economics are also beginning to shift in ways that favor companies operating in this layer of the stack. For most of AI’s recent history, the headline investment went into training: building larger foundation models with ever-greater computational requirements. That phase produced the current generation of frontier AI systems and the companies behind them. Increasingly, however, the center of gravity is moving toward inference, the ongoing process of serving those models to users in real time.
According to Deloitte’s TMT Predictions report, cited by Computerworld, inference already accounted for half of all AI compute in 2025, with that share projected to rise to two-thirds in 2026.
The arithmetic is straightforward: training is a one-time capital expenditure. Inference is the recurring operational cost that scales with every user, every query, every automated workflow. As AI moves from research curiosity to enterprise utility, inference becomes the cost center.
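A back-of-the-envelope sketch makes the one-time versus recurring distinction concrete. Every number here is hypothetical and chosen only to keep the math readable; actual training and serving costs vary widely.

```python
import math

def breakeven_day(training_cost_usd: float,
                  cost_per_thousand_queries_usd: float,
                  queries_per_day: int) -> int:
    """First day on which cumulative inference spend reaches the training cost."""
    daily_inference_cost = cost_per_thousand_queries_usd * queries_per_day / 1000
    return math.ceil(training_cost_usd / daily_inference_cost)

# Hypothetical inputs, not figures from any real provider.
training_cost_usd = 50_000_000          # paid once
cost_per_thousand_queries_usd = 2       # paid on every batch of 1,000 queries
queries_per_day = 100_000_000

day = breakeven_day(training_cost_usd, cost_per_thousand_queries_usd, queries_per_day)
print(f"Inference spend matches the training bill after {day} days")
```

With these inputs, inference catches up with the training bill in 250 days, and unlike training, it keeps growing for as long as the product has users.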
That shift is sharpened by what has happened to the models themselves. Open-source releases and falling compute costs have compressed the performance gap between frontier and commodity models faster than most forecasters expected. As one TechCrunch analysis noted, improving inference efficiency reduces compute costs and shortens the lag time between a user’s prompt and a response. When the intelligence layer becomes interchangeable, the durable advantage shifts to whoever can serve it most reliably and cheaply.
Modal’s trajectory would be easier to dismiss as an isolated surge of investor enthusiasm if similar companies were not attracting comparable levels of capital. But across the AI infrastructure market, the same pattern is beginning to repeat itself. According to TechCrunch, Baseten raised $300M at a $5B valuation in February 2026, more than doubling its valuation within months. Fireworks AI, whose customers include Cursor, Notion, Uber, and Shopify, raised $250M at a $4B valuation. Meanwhile, Inferact, a commercial company built on the open-source inference framework vLLM, secured $150M in seed funding at an $800M valuation, led by Andreessen Horowitz.
Taken together, these are not cautious investments spread across uncertain possibilities. They reflect a growing consensus among investors that the enduring value in artificial intelligence may ultimately lie beneath the models themselves, in the infrastructure layer responsible for deploying, scaling, optimizing, and reliably serving AI systems in production environments.
The comparison many investors increasingly make is to Amazon Web Services. When AWS launched in 2006, it appeared to many observers to be an unremarkable compute-rental business operating behind the scenes of the internet economy. The consumer-facing applications built on top of it became globally recognizable brands; AWS quietly became the financial engine of Amazon itself. The underlying thesis shaping AI infrastructure investment today is structurally similar: companies that control the scalable, dependable infrastructure layer beneath artificial intelligence may ultimately capture the most durable and recurring revenue streams as AI adoption expands across industries.
For the average user interacting with AI from a laptop in a café, none of this infrastructure is visible, and that invisibility is largely intentional. The consumer experience of artificial intelligence is designed to feel effortless: type a request, receive an answer, move on. But the direction the technology ultimately takes, which capabilities are prioritized, which industries receive attention first, and what AI eventually costs to operate at scale, is increasingly being shaped not by the hundreds of millions of people casually using chatbots, but by the enterprises signing infrastructure contracts and the investors tracking where recurring revenue is beginning to concentrate.
From that perspective, Modal’s eight-month rise, alongside a cluster of similarly valued infrastructure companies emerging over the same period, looks less like startup exuberance and more like a broader market signal.
The chatbot is what users see. The inference layer is what companies keep paying for. And as AI moves deeper into everyday business operations, that distinction may determine where the industry’s real power and profit ultimately accumulate.


🗳️ The real AI money is moving to the inference layer. Where does the durable value end up?

Business is hard. And sometimes you don’t have the tools you need to be great at your job. Well, Open Source CEO is here to change that.
Tools & resources, ranging from playbooks, databases, courses, and more.
Deep dives on famous visionary leaders.
Interviews with entrepreneurs and playbook breakdowns.
Are you ready to see what it’s all about?
*This is sponsored content

Google engineer's $1.2M bet: A Google engineer has been charged with insider trading after making $1.2M on Polymarket, exposing fresh questions about prediction-market regulation and tech-employee oversight.
Steam Deck's price hike: Valve is raising Steam Deck prices, a move some see as the end of the affordable handheld-gaming era, as components and tariffs reshape the market.
Dating apps' burnout problem: A new BBC report unpacks the deepening user fatigue across dating apps, with engagement and trust falling faster than the industry's algorithms can patch them.

Try a serverless GPU platform: Spin up a quick job on Modal or Replicate to feel why sub-second cold starts matter for real AI workloads.
Test your own inference setup: Run a small model locally with vLLM or Ollama and see how latency, throughput, and cost actually behave outside a polished chatbot UI.
Audit your AI compute bill: Pull last month's invoices from OpenAI, Anthropic, or your cloud provider and map spend by workload. You'll quickly see where the real money is going.
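For the audit, a few lines of Python are enough to get started. This is a minimal sketch that assumes you have already exported your invoices into (provider, workload, amount) rows; the rows and workload names below are invented placeholders, so swap in your own.

```python
from collections import defaultdict

# Hypothetical invoice line items: (provider, workload, usd).
line_items = [
    ("OpenAI", "support-chatbot", 1200.0),
    ("Anthropic", "code-assistant", 800.0),
    ("Cloud GPU", "doc-summaries", 450.0),
    ("Anthropic", "support-chatbot", 300.0),
    ("OpenAI", "code-assistant", 250.0),
]

def spend_by_workload(items):
    """Sum spend per workload and return it sorted from largest to smallest."""
    totals = defaultdict(float)
    for _provider, workload, usd in items:
        totals[workload] += usd
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

totals = spend_by_workload(line_items)
grand_total = sum(totals.values())

for workload, usd in totals.items():
    print(f"{workload:<18} ${usd:>9,.2f}  {usd / grand_total:.0%}")
```

Grouping by workload rather than by provider is a deliberate choice: it shows which product features are driving the bill, regardless of whose invoice they land on.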

Regie.ai: AI sales agent that researches prospects, writes personalized outbound at scale, and handles follow-ups so reps can focus on closing.
Replit AI: A browser-based coding environment with an AI agent that builds, debugs, and deploys apps from prompts, with no local setup required.
Sourcegraph: Code intelligence platform with Cody, an AI assistant that understands your entire codebase so it can answer questions and write code that actually fits.

What did you think of today's email?
