Overview of Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip
This Odd Lots episode features Bloomberg’s Tracy Alloway and Joe Weisenthal interviewing Andrew Feldman, CEO and cofounder of Cerebras, the company behind an unusually large “wafer-scale” AI chip. The conversation focuses on why Cerebras built a dinner-plate-sized processor, how its architecture is designed to speed up AI inference, and what that means for the economics of AI, supply chains, competition with Nvidia, and the broader semiconductor and data center landscape.
Why Cerebras Built a Giant Chip
The core technical idea
- Cerebras’ chip uses a wafer-scale design: the processor spans most of a silicon wafer instead of being cut into many conventional-sized chips.
- Feldman argues that larger chips can process more information in less time, which is especially valuable for AI workloads.
- The goal is to reduce latency by keeping compute and memory physically close together.
Why size matters
- Traditional AI systems often rely on connecting many smaller chips together.
- Cerebras instead places much more of the system on a single wafer-sized chip to minimize data movement, which is a major bottleneck.
- Feldman says the chip is about 58 times larger than any previous chip and helps the company achieve major speed advantages.
The memory advantage
- Cerebras’ design allows it to use fast memory more effectively.
- The key tradeoff is that fast memory typically stores less per square millimeter; Cerebras offsets that by spreading it across a much larger chip.
- Feldman argues this is a major reason Cerebras can be 15x faster than the fastest GPU, and on some workloads 50x, 100x, or even 1,000x faster.
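The memory tradeoff above is simple arithmetic: total capacity is storage density times the silicon area devoted to it. The sketch below is only an illustration of that relationship. The density and baseline area are hypothetical placeholders, not Cerebras or GPU specifications, and the 58x area ratio is the figure quoted in this summary, applied here under the assumption that memory area scales with chip area.

```python
import math

def on_chip_capacity_mb(density_mb_per_mm2: float, area_mm2: float) -> float:
    """Total on-chip memory = storage density x silicon area devoted to it."""
    return density_mb_per_mm2 * area_mm2

# Hypothetical values, chosen only to show the shape of the tradeoff.
FAST_MEMORY_DENSITY = 0.5   # MB per mm^2 (assumed, not a real spec)
BASELINE_AREA = 100.0       # mm^2 of memory on a conventional chip (assumed)
AREA_RATIO = 58             # "about 58 times larger", per the summary

conventional = on_chip_capacity_mb(FAST_MEMORY_DENSITY, BASELINE_AREA)
wafer_scale = on_chip_capacity_mb(FAST_MEMORY_DENSITY, BASELINE_AREA * AREA_RATIO)

# Same low density, but capacity grows in proportion to area.
capacity_gain = wafer_scale / conventional
print(f"capacity gain from area alone: {capacity_gain:.0f}x")
```

The point is that low density is not fixed by a better memory cell; it is compensated by area, which only works if the chip itself can be that large.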
AI Inference vs. Training
What Cerebras focuses on
- The discussion makes a sharp distinction between:
  - Training: building AI models
  - Inference: using AI models
- Feldman says Cerebras can do both, but current demand is especially strong in inference.
Why inference matters now
- He argues that as of 2025, AI models have become useful enough that demand for actual usage has exploded.
- That means the market is increasingly about serving fast responses, not just building bigger models.
- He also pushes back on the idea that speed matters less for agentic AI, arguing that speed is important in all productive work.
The Economics of Speed and Tokens
Speed has real value
- Feldman says customers are willing to pay a premium for faster tokens because speed increases productivity.
- He cites Anthropic’s premium fast-token service as evidence that the market values latency reduction.
Cost per token
- GPUs are efficient at producing slow tokens cheaply, but become much more expensive and power-hungry when trying to serve fast tokens.
- Cerebras claims its architecture makes fast tokens much cheaper than GPUs, while using far less power.
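One way to see why fast tokens can cost more on batch-oriented hardware is to write cost per token as hourly hardware cost divided by aggregate throughput. The sketch below assumes, as a simplification not stated in the interview, that raising per-user speed on such hardware means serving fewer users at once. The function name and every number are hypothetical and do not describe any vendor's pricing or performance.

```python
import math

def cost_per_million_tokens(hourly_cost_usd: float,
                            tokens_per_sec_per_user: float,
                            concurrent_users: int) -> float:
    """Hardware cost per one million generated tokens."""
    aggregate_tokens_per_sec = tokens_per_sec_per_user * concurrent_users
    tokens_per_hour = aggregate_tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

HOURLY_COST = 10.0  # USD per hour for one accelerator (assumed)

# Slow tokens: many users share the accelerator, each at a modest speed.
slow = cost_per_million_tokens(HOURLY_COST, tokens_per_sec_per_user=50,
                               concurrent_users=100)

# Fast tokens: each user gets 10x the speed, but only 2 users fit at once
# (assumed tradeoff), so aggregate throughput falls.
fast = cost_per_million_tokens(HOURLY_COST, tokens_per_sec_per_user=500,
                               concurrent_users=2)

print(f"slow: ${slow:.2f}/M tokens, fast: ${fast:.2f}/M tokens")
```

Under these assumed numbers the fast configuration produces one fifth of the aggregate tokens for the same hourly cost, so each token costs five times as much. Cerebras' claim, as summarized above, is that its architecture avoids this penalty.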
Why this matters
- The interview frames AI economics as a question of:
  - how much speed is worth,
  - what users are willing to pay for it,
  - and how much of today’s AI spending is wasteful versus necessary.
Building the Chip: Engineering and Manufacturing Challenges
Why no one had done it before
- Feldman says every prior wafer-scale effort failed over the 75-year history of computing.
- Cerebras had to solve problems across:
  - lithography,
  - materials,
  - packaging,
  - power delivery,
  - cooling,
  - and software.
TSMC’s role
- Cerebras worked closely with TSMC, which Feldman calls the greatest manufacturing company in the world.
- Despite the complexity, he says Cerebras has so far received the wafers it needed.
- He notes that the hardest part of the supply chain is not just chips, but data center buildings and powered real estate.
Supply constraints
- Cerebras avoids several of the most constrained AI supply chain bottlenecks:
  - HBM memory shortages
  - CoWoS packaging constraints
  - TSMC’s most pressured 3-nanometer capacity
- It uses 5-nanometer manufacturing instead.
Competition: Open Source, Closed Source, and Nvidia
Open source vs. closed source models
- Feldman says open source models are generally cheaper per unit of intelligence because users are not paying the training cost.
- Closed source models are slightly better, but only by a few percentage points in his view.
- He does not think the market will settle on a single dominant model type; he expects a diverse market with many players.
Nvidia and CUDA
- Feldman argues that CUDA, Nvidia’s software ecosystem, was hugely important in the early AI era.
- But he says CUDA is far less important for inference and is losing relevance in training too.
- His view is that switching workloads from GPUs to Cerebras is relatively straightforward.
Customers, Cloud, and the Business Model
Cerebras’ cloud offering
- Cerebras also operates its own cloud and provides access to open source models.
- Feldman says open source deployment on Cerebras is attractive because it is fast, and cheaper because users are not paying model-training costs.
Major customers
- He discusses:
  - OpenAI
  - AWS
  - G42, the UAE’s AI champion
  - healthcare and pharma customers such as Mayo Clinic and GlaxoSmithKline
- G42 is a major customer and investor, using Cerebras for both training and inference in various data centers.
AWS integration
- Feldman says Cerebras chips will be accessible through AWS Bedrock, where customers can choose faster inference options.
- He expects that service to carry a premium price.
Geopolitics, Export Controls, and China
Export controls matter more now
- Feldman says export restrictions were less important a few years ago, but are now highly relevant to the company.
- He supports limiting China’s access to advanced AI capabilities, arguing that strategically sensitive technology should not be broadly diffused.
His view on policy
- He acknowledges the debate over whether the U.S. should block China’s access or compete by selling into that market.
- But he comes down on the side of restriction, even if it forecloses some markets for Cerebras.
IPO, Wealth, and Long-Term Hardware Building
The IPO
- Cerebras recently completed a major IPO, and Feldman describes the process as the result of years of work rather than a sudden windfall.
- He says becoming a billionaire was not the big emotional moment; what mattered more was that the IPO created hundreds of millionaires across the company.
Hardware takes time
- Feldman emphasizes that hardware innovation is slow, expensive, and iterative.
- He says he likes the discipline of “measure twice, cut once” and that building hardware is fundamentally different from software’s move-fast-and-iterate culture.
Key Takeaways
- Cerebras’ thesis: Bigger chips can reduce latency and improve AI performance, especially for inference.
- Inference is the immediate prize: As AI usage grows, fast response time becomes a major economic advantage.
- Supply chain bottlenecks are shifting: The main constraints are increasingly data centers, power, and real estate, not just chip fabrication.
- CUDA is not the whole moat: Feldman believes Nvidia’s software advantage is shrinking, especially in inference.
- Open source is real competition: He sees open source models as cheaper and increasingly viable, though still slightly behind closed source models.
- Geopolitics matters: Export controls, China, and foreign customers like G42 are now central to AI chip strategy.
Notable Insight
“Speed matters in all aspects of productive work.”
That idea runs through the interview: Cerebras is betting that AI customers will increasingly pay for lower latency, and that a radically different chip architecture can turn speed into a durable economic advantage.
