Overview of “AI giveth and AI taketh CPU”
In this Stack Overflow Podcast interview from HumanX, host Ryan Donovan talks with Mark Papermaster, CTO of AMD, about how AI is reshaping hardware demand across CPUs, GPUs, memory, networking, and data center design. The conversation centers on AMD’s strategy of combining high-performance CPUs and GPUs, using chiplets and an open software stack to stay flexible as workloads shift from training to inference, from cloud to edge, and from general-purpose models to small language models and agentic workflows.
AMD’s AI Strategy: Performance Through Heterogeneous Computing
Papermaster frames AMD’s success as the result of a long-term focus on customer needs, product quality, and simplification.
Core strategy
- Build leadership products across:
  - supercomputing
  - cloud
  - edge devices
  - PCs and embedded systems
- Focus on what delivers real customer value
- Keep innovation tied to listening to customers
Why AMD fits AI
- AMD has deep experience in both CPUs and GPUs
- AI workloads benefit from heterogeneous computing rather than relying on just one type of processor
- The company has expanded from CPUs and GPUs into embedded and adaptive computing
CPU + GPU Integration and Chiplets
A major theme of the discussion is how AMD combines compute types efficiently.
What AMD has been doing for years
- AMD has combined CPU and GPU technologies since 2011
- Early implementations focused on PCs, gaming, and workstation graphics
- The key idea: shared memory and coherent architecture reduce data movement and power use
What chiplets add
- Instead of one large monolithic chip, AMD uses chiplets
- Benefits include:
  - easier manufacturing
  - better yield
  - lower cost
  - more flexibility in product design
- Chiplets allow AMD to mix and match components for:
  - servers
  - desktops
  - workstations
  - GPUs
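The yield and cost benefits listed above can be illustrated with a rough back-of-the-envelope sketch. It assumes the simple Poisson yield model, Y = exp(-A * D), which is a textbook approximation and not something described in the interview. The defect density and die areas are made-up illustrative values, not AMD figures, and packaging and interconnect overheads are ignored.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(total_area_cm2: float, n_dies: int,
                             defects_per_cm2: float) -> float:
    """Silicon area (cm^2) fabricated per working product.

    The product is split into n_dies equal dies. Each die is tested
    separately, so a defect scraps only that die, not the whole product.
    """
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


# Hypothetical numbers chosen only for illustration.
DEFECT_DENSITY = 0.2   # defects per cm^2 (assumed)
TOTAL_AREA = 6.0       # cm^2 of logic per product (assumed)

monolithic = silicon_per_good_product(TOTAL_AREA, 1, DEFECT_DENSITY)
chiplets = silicon_per_good_product(TOTAL_AREA, 8, DEFECT_DENSITY)

print(f"monolithic die yield: {poisson_yield(TOTAL_AREA, DEFECT_DENSITY):.1%}")
print(f"chiplet die yield:    {poisson_yield(TOTAL_AREA / 8, DEFECT_DENSITY):.1%}")
print(f"silicon per good product: {monolithic:.1f} vs {chiplets:.1f} cm^2")
```

Under these assumptions, splitting the same logic area into smaller dies means less silicon is discarded per working product. Real designs also pay for die-to-die interconnect and advanced packaging, which this sketch leaves out.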
Data center design
- AMD uses chiplets in large data center products to combine:
  - CPU compute
  - GPU compute
  - memory
  - I/O
- This modular design helps AMD tailor systems to different workloads without redesigning everything from scratch
Open Ecosystem and ROCm Software Stack
Papermaster repeatedly emphasizes AMD’s preference for openness over lock-in.
ROCm and software control
- AMD’s software stack is ROCm (the transcript occasionally misrenders it)
- ROCm manages:
  - workload partitioning between CPU and GPU
  - compiler optimization
  - communication between devices
- The stack is open, so developers can:
  - contribute code
  - fork it internally
  - avoid vendor lock-in
Why openness matters
- Helps enterprise and hyperscale customers retain control
- Makes AMD more attractive in mixed-vendor environments
- Reduces the “moat” competitors may have had in AI software ecosystems
Workload Shifts: Training, Inference, and Agentic AI
A big part of the conversation is how AI workloads are changing and how AMD is adapting.
From training to inference
- Earlier AI demand was dominated by training
- Now, inference is growing rapidly and becoming more varied
- AMD has adapted with different GPU configurations for:
  - high-performance computing
  - inference-heavy workloads
New inference patterns
Papermaster notes that inference is not one thing anymore. Different applications need different optimization goals:
- low latency for “vibe coding” and interactive use
- high throughput for larger batch workloads
- large context handling for agentic workflows and long prompts
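To see why these three goals pull in different directions, here is a toy sketch. It is not from the interview: the per-step cost model (a fixed overhead plus a per-sequence cost) and all of its numbers are hypothetical, and the model shape used for the KV-cache estimate is an assumption. The KV-cache formula itself is the standard one for transformer decoders: keys and values stored for every layer, KV head, and token.

```python
def decode_step_ms(batch_size: int, fixed_ms: float = 20.0,
                   per_seq_ms: float = 0.5) -> float:
    """Time for one decode step that emits one token per sequence (toy model)."""
    return fixed_ms + per_seq_ms * batch_size


def tokens_per_second(batch_size: int) -> float:
    """Aggregate throughput across the whole batch."""
    return batch_size / (decode_step_ms(batch_size) / 1000.0)


def kv_cache_gib(context_tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """KV-cache size for one sequence: keys and values for every layer."""
    total_bytes = (2 * layers * kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 2**30


# Larger batches raise throughput but also raise per-token latency.
for batch in (1, 8, 64):
    print(f"batch={batch:3d}  per-token latency={decode_step_ms(batch):5.1f} ms  "
          f"throughput={tokens_per_second(batch):7.1f} tok/s")

# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 16-bit values.
for context in (4_096, 131_072):
    size = kv_cache_gib(context, layers=32, kv_heads=8, head_dim=128)
    print(f"context={context:7d} tokens  KV cache={size:.2f} GiB per sequence")
```

In this toy model, an interactive workload would favor small batches, a batch workload would favor large ones, and a long-context agentic workload would be limited mainly by memory capacity rather than by compute.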
Small language models at the edge
- Papermaster expects more workloads to move to:
  - small language models
  - edge devices
  - PCs and embedded systems
- The cloud and large clusters will still matter for training and large-scale fine-tuning
Rack-Scale Systems and Data Center Scaling
The interview also covers AMD’s move beyond chips into rack-level architecture.
Rack-level optimization
- AMD now designs around full systems, not just individual processors
- Example: a rack-scale AI reference architecture with:
  - CPUs
  - GPUs
  - memory
  - networking
  - scale-up and scale-out connectivity
Why it matters
- Large AI clusters need more than fast chips
- They require carefully designed:
  - power delivery
  - cooling
  - networking
  - memory placement
  - interconnect strategy
Scaling up and out
- One rack can serve as a building block
- Multiple racks can connect into very large clusters
- This supports everything from enterprise deployments to frontier-model training
Manufacturing, Supply Chain, and Bottlenecks
Papermaster makes clear that chip strategy is as much about supply chain planning as design.
The real constraints
- Semiconductor manufacturing is slow compared with software
- Demand must be forecast years in advance
- AMD works closely with partners like TSMC and memory suppliers
Chiplets help here too
- Easier to manufacture
- Better yield
- More flexibility in production planning
Industry-wide pressure
- AI has increased demand for:
  - GPUs
  - CPUs
  - memory
  - data center power
- AMD expects this to also create pressure in consumer products like PCs and phones
Power Efficiency: “Tokens per Watt per Dollar”
A major recurring theme is energy efficiency.
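The summary does not define how “tokens per watt per dollar” is calculated, so the following is only one plausible reading, sketched under stated assumptions: sustained throughput divided by average power gives tokens per joule, and dividing again by a cost basis gives a cost-normalized efficiency figure. The `Deployment` class, the choice of cost basis, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    tokens_per_second: float   # sustained inference throughput
    power_watts: float         # average power draw under that load
    cost_dollars: float        # cost basis, e.g. purchase price (assumed)

    def tokens_per_joule(self) -> float:
        # tokens/s divided by joules/s (watts) gives tokens per joule
        return self.tokens_per_second / self.power_watts

    def tokens_per_watt_per_dollar(self) -> float:
        # One possible reading of the metric: energy efficiency per dollar.
        return self.tokens_per_joule() / self.cost_dollars


# Hypothetical systems; none of these figures come from the interview.
systems = [
    Deployment("system_a", tokens_per_second=12_000,
               power_watts=1_000, cost_dollars=30_000),
    Deployment("system_b", tokens_per_second=9_000,
               power_watts=600, cost_dollars=25_000),
]

ranked = sorted(systems, key=Deployment.tokens_per_watt_per_dollar, reverse=True)
for s in ranked:
    print(f"{s.name}: {s.tokens_per_joule():.1f} tok/J, "
          f"{s.tokens_per_watt_per_dollar():.2e} tok/J per dollar")
```

With these made-up inputs, the system with lower raw throughput ranks first once power and cost are taken into account, which is the kind of trade-off the metric is meant to surface.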
AMD’s approach
Papermaster says efficiency is improved across the full stack:
- transistor design
- chip architecture
- chiplet interconnects
- packaging
- power delivery
- memory hierarchy
- software optimization
- data center controls
Key efficiency ideas
- reduce data movement
- use coherent CPU/GPU memory access
- improve compiler and kernel efficiency
- optimize agentic workflows
- manage power spikes in data centers
Future hardware directions
- AMD is investing in photonic interconnects for future systems
- It is also using 3D-stacked SRAM/cache techniques to improve performance and energy efficiency
AI Helping AMD Build Better Chips
One of the most interesting points in the interview is that AMD uses AI internally to improve chip design.
How AMD uses AI
- Fine-tuned proprietary models trained on AMD’s design history
- AI-assisted:
  - chip design
  - validation
  - compilation
  - kernel development
  - workflow optimization
What has changed recently
- Earlier AI gains were mostly point improvements
- More recently, agentic workflows have produced larger productivity gains
- These systems can explore many more options than humans alone, sometimes finding unexpected performance wins
Future Outlook
Papermaster sees the next phase of AI hardware as one of increasing specialization and collaboration.
What’s next
- More tailored inference optimization
- More collaboration between hardware, software, and data center operators
- More diverse AI workloads across:
  - finance
  - oil and gas
  - science and research
  - enterprise applications
AMD’s position
- Continue offering both:
  - high-precision computing like FP32 and FP64
  - AI-friendly formats like FP4 and FP8
- Keep customer choice central
- Support everything from supercomputers to embedded edge systems
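A quick arithmetic sketch shows why the narrower formats matter for AI workloads: the memory needed just to hold a model's weights scales directly with bits per value. The 7-billion-parameter model size is a hypothetical example, and the calculation ignores any per-block scale factors that some low-precision formats store alongside the values.

```python
BITS_PER_VALUE = {"FP64": 64, "FP32": 32, "FP8": 8, "FP4": 4}


def weight_footprint_gb(num_params: int, fmt: str) -> float:
    """Gigabytes (10^9 bytes) needed to store the weights alone."""
    return num_params * BITS_PER_VALUE[fmt] / 8 / 1e9


NUM_PARAMS = 7_000_000_000  # hypothetical 7-billion-parameter model

for fmt in BITS_PER_VALUE:
    print(f"{fmt}: {weight_footprint_gb(NUM_PARAMS, fmt):6.1f} GB")
```

Moving from FP32 to FP4 cuts the weight footprint by a factor of eight, which reduces both memory capacity requirements and data movement. Scientific and HPC codes often still need FP32 or FP64 for numerical accuracy.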
Key Takeaways
- AMD’s advantage comes from combining CPU + GPU + chiplets + open software
- AI is shifting demand from pure training toward inference, agentic workflows, and small language models
- Efficiency is now a core competitive metric: tokens per watt per dollar, not just raw speed
- The data center is becoming a system-design problem, not just a chip-design problem
- AMD sees openness and flexibility as a major differentiator versus more closed ecosystems
Notable Insight
“More than ever, the industry has to band together and collaborate to drive energy efficiency.”
Papermaster’s broader message is that AI progress depends on the whole stack working together: silicon, software, systems, and supply chain.
