The history of servers, the cloud, and what’s next – with Oxide

by Gergely Orosz

1h 39m · December 17, 2025

Overview

This episode features Bryan Cantrill (ex-Sun, Joyent; co-founder of Oxide) in conversation with Gergely Orosz. They trace server and cloud history from the late 1990s to today, explain why hyperscalers build custom hardware, describe what it takes to design a modern server and its software stack, and cover how Oxide built a full-stack, open-source on-prem cloud rack. The conversation also touches on how AI tools are used (and where they fall short), Oxide's unusual hiring and compensation choices, remote hardware work, and practical advice for engineers.

Key takeaways

  • Boom vs. bust: the dot-com boom produced lots of activity, but the deeper technical innovation (ZFS, DTrace, open-sourcing Solaris) happened in the post-bust period, when teams were forced to focus.
  • Major architectural shifts: Linux on x86 displaced proprietary RISC systems; the cloud rose with S3 and EC2; Kubernetes enabled cloud portability and multi-cloud.
  • Hyperscalers (Google, Meta, Microsoft, Amazon) design their own hardware and software because off‑the‑shelf servers are built for small racks, not warehouse‑scale infrastructure.
  • Oxide intentionally built hardware + network + software from a clean sheet to deliver turn‑key on‑prem cloud racks; everything is open source.
  • AI/LLMs are useful for writerly tasks, polishing code, test-case generation and documentation comprehension — but largely unhelpful for low‑level hardware debugging and initial bring‑ups.
  • Building hardware is a fractally complex, analog‑driven engineering challenge requiring fearless EEs, tooling, and coordinated teams — intelligence alone isn’t enough.

Timeline & evolution (late 1990s → today)

  • Late 1990s / Sun era
    • Solaris + SPARC systems dominated early web/database deployments.
    • Java and the early web created enormous demand; Sun/Cisco were common platform choices.
  • Dot‑com boom → bust (2000–2001)
    • Boom produced frothy expectations; the bust forced focus and produced deep technical work (ZFS, DTrace, SMF).
    • Lesson: innovation often needs constraints/desperation.
  • 2000s shifts
    • Linux matured and gained corporate backing; x86 overtook RISC as microarchitectural advances (speculative execution, techniques to mitigate the memory wall) closed the performance gap.
    • Google and other hyperscalers moved to their own hardware early.
  • Cloud era (S3, EC2)
    • AWS's execution (steady price cuts, a stream of new managed services) made the public cloud dominant and attractive.
    • Kubernetes (post‑2014) provided an abstraction enabling cloud neutrality and easier multi‑cloud adoption.
  • Today
    • Hyperscalers largely design custom servers and data center approaches (DC busbar, power shelves), and build massive internal software/tooling stacks for safe deployments, observability and experimentation.

Why hyperscalers build custom hardware

  • Off-the-shelf servers (Dell/HP/Supermicro) are designed for small rack deployments with AC power per chassis and lots of cabling, not for fleets of thousands of units.
  • At hyperscale you need different environmental design: DC busbars, centralized rectification (power shelves), blind‑mate power/networking, custom switches and optimized thermal/power characteristics.
  • Custom hardware + custom software yields better economics, scale, reliability, and operability.

Oxide: product and major engineering decisions

Clean‑sheet approach

  • Oxide designed racks from first principles rather than reusing commodity server chassis.
  • Goals: turn‑key delivery (wheel rack into data center, blind‑mate power & network, minimal operator cabling), strong economics for at‑scale on‑prem cloud.

Hardware highlights

  • Rack contains ~32 compute sleds; sleds blind‑mate into power and network (no external cabling, reduced miswiring).
  • DC busbar + power shelf architecture (AC -> centralized rectifier -> DC distribution).
  • Custom switch development was essential:
    • Needed programmability and control; chose Intel's Tofino for its P4 programmability over proprietary Broadcom silicon.
    • Building the switch was effectively a second full computer to design (networking, firmware, silicon, integration).
  • Electronics challenges: DDR5/PCIe high-speed signal integrity, power sequencing, RF/analog issues; boards are effectively analog systems where timing and signal integrity matter (a minimal power-sequencing sketch follows this list).
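
The power-sequencing point above is easier to see in code. Below is a minimal, hypothetical sketch in plain Rust (not Hubris code) of the pattern a service-processor task might follow: enable voltage rails in a fixed order and confirm each rail's power-good signal before enabling the next, so a missing acknowledgement is detected and reported rather than leaving the board in an undefined state. The rail names and functions (Rail, enable_rail, power_good) are invented for illustration.

```rust
// Hypothetical power-sequencing sketch (not Oxide's Hubris code).
// Pattern: bring rails up in a fixed order and confirm each rail's
// "power good" signal before enabling the next one.

use std::{thread, time::Duration};

#[derive(Debug, Clone, Copy)]
enum Rail {
    V1p8Standby, // 1.8 V standby rail
    V0p9Core,    // 0.9 V core rail
    V1p1Ddr,     // 1.1 V DDR rail
    V12Fans,     // 12 V fan rail
}

#[derive(Debug)]
enum SequenceError {
    PowerGoodTimeout(Rail),
}

// Stand-in for reading a regulator's power-good pin or status register.
fn power_good(_rail: Rail) -> bool {
    true // a real implementation would poll hardware here
}

// Stand-in for asserting a regulator's enable line.
fn enable_rail(rail: Rail) {
    println!("enabling {rail:?}");
}

/// Enable rails in order, waiting (with a timeout) for each power-good
/// signal; abort the sequence on the first failure.
fn sequence_power_on(rails: &[Rail]) -> Result<(), SequenceError> {
    for &rail in rails {
        enable_rail(rail);
        let timeout = Duration::from_millis(50);
        let mut waited = Duration::ZERO;
        while !power_good(rail) {
            if waited >= timeout {
                return Err(SequenceError::PowerGoodTimeout(rail));
            }
            thread::sleep(Duration::from_millis(1));
            waited += Duration::from_millis(1);
        }
        println!("{rail:?} reports power good");
    }
    Ok(())
}

fn main() {
    let order = [Rail::V1p8Standby, Rail::V0p9Core, Rail::V1p1Ddr, Rail::V12Fans];
    match sequence_power_on(&order) {
        Ok(()) => println!("power-on sequence complete"),
        Err(e) => eprintln!("sequence aborted: {e:?}"),
    }
}
```

This is the same class of hardware/firmware interaction behind the voltage-regulator bug described in the AI section below.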

Software stack

  • Service processor OS written from scratch in Rust (project name: Hubris); debugger named Humility.
  • The control plane, a distributed system named Omicron, handles provisioning, the API/CLI, VM lifecycle, storage attachment, updates, and more (an illustrative sketch follows this list).
  • Oxide ships the full software stack (service processor, hypervisor/control plane, orchestration) and open‑sources it; revenue model is hardware + services/support.
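
To make the control plane's role concrete, here is a minimal, hypothetical provisioning flow seen from a client's perspective. It deliberately does not use the real Omicron API or the Oxide SDK; the types (InstanceCreate, ControlPlane) and the in-memory stub are invented to show the shape of the workflow: submit an instance-create request, get an identifier back, then poll the instance's state.

```rust
// Hypothetical control-plane provisioning sketch (not the Omicron API).
// Flow: submit an instance-create request, receive an ID immediately,
// then poll until the instance reports "running".

use std::collections::HashMap;

#[derive(Debug, Clone)]
struct InstanceCreate {
    name: String,
    vcpus: u8,
    memory_gib: u32,
    boot_disk_gib: u32,
}

#[derive(Debug, Clone, PartialEq)]
enum InstanceState {
    Provisioning,
    Running,
}

// Stand-in for the rack's control plane: in reality a distributed service
// that schedules the VM onto a sled, attaches storage, configures
// networking, and tracks state durably.
struct ControlPlane {
    instances: HashMap<u64, (InstanceCreate, InstanceState)>,
    next_id: u64,
}

impl ControlPlane {
    fn new() -> Self {
        Self { instances: HashMap::new(), next_id: 1 }
    }

    /// Accept the request and return an ID right away; provisioning
    /// continues asynchronously in a real system.
    fn create_instance(&mut self, req: InstanceCreate) -> u64 {
        println!(
            "scheduling {} ({} vCPUs, {} GiB memory, {} GiB boot disk)",
            req.name, req.vcpus, req.memory_gib, req.boot_disk_gib
        );
        let id = self.next_id;
        self.next_id += 1;
        self.instances.insert(id, (req, InstanceState::Provisioning));
        id
    }

    /// Poll the instance's state (the stub simply flips it to Running).
    fn instance_state(&mut self, id: u64) -> Option<InstanceState> {
        let entry = self.instances.get_mut(&id)?;
        entry.1 = InstanceState::Running;
        Some(entry.1.clone())
    }
}

fn main() {
    let mut cp = ControlPlane::new();
    let id = cp.create_instance(InstanceCreate {
        name: "web-01".into(),
        vcpus: 4,
        memory_gib: 16,
        boot_disk_gib: 100,
    });
    println!("created instance {id}");
    println!("state: {:?}", cp.instance_state(id).unwrap());
}
```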

Update/upgradability — a core challenge

  • Updating a rack is updating a distributed system with many interdependent components: service processor, drive firmware, host OS, control plane, database schemas, etc.
  • Oxide implemented a minimum viable update (“Mupdate”) that parks the control plane, takes the rack offline while components are updated, then brings it back online: a first step that allows incremental progress toward seamless, live rolling updates (see the sketch after this list).
  • Reasoning through hybrid/versioned states during updates is fractally complex and required deliberate scope control and quality prioritization.
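
A minimal sketch of the “Mupdate” idea, assuming a simplified component list and ignoring the versioned intermediate states that make the real problem hard: park the control plane, apply updates in dependency order, and resume service only if every step succeeds. All names below are invented for illustration.

```rust
// Hypothetical sketch of an offline "minimum viable update" flow (not
// Oxide's actual updater). The point is the ordering: park the control
// plane, update components in dependency order, then resume service.

#[derive(Debug)]
enum UpdateError {
    ComponentFailed(String),
}

// Components in the order they must be updated; a real rack spans far
// more state: service processors, device firmware, host OS, control-plane
// services, and database schemas, each with version constraints.
const UPDATE_ORDER: &[&str] = &[
    "service-processor",
    "drive-firmware",
    "host-os",
    "control-plane-services",
    "database-schema",
];

fn park_control_plane() {
    println!("parking control plane: no new provisioning accepted");
}

fn update_component(name: &str) -> Result<(), UpdateError> {
    // A real implementation would stage the artifact, verify its
    // signature, apply it, and confirm the component's new version.
    let applied = true; // stand-in for a post-update version check
    if applied {
        println!("updated {name}");
        Ok(())
    } else {
        Err(UpdateError::ComponentFailed(name.to_string()))
    }
}

fn resume_control_plane() {
    println!("control plane resumed; rack back in service");
}

fn mupdate() -> Result<(), UpdateError> {
    park_control_plane();
    for &component in UPDATE_ORDER {
        update_component(component)?;
    }
    resume_control_plane();
    Ok(())
}

fn main() {
    if let Err(e) = mupdate() {
        eprintln!("update halted: {e:?}; rack stays parked for operator action");
    }
}
```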

AI (LLMs & agents) — how Oxide uses them, and limits

  • Useful areas:
    • Document comprehension, summarization, and glossary generation.
    • Editing, polish and phrasing for technical writing.
    • Generating test cases, small code idioms, and suggestions for idiomatic Rust snippets.
    • Developer productivity tools (e.g., Claude Code, autocomplete) for routine/boilerplate tasks.
  • Limited / ineffective areas:
    • Hardware design and low‑level debugging (board bring‑ups, RF/SI issues, firmware interactions). Real problem example: CPU repeatedly resetting due to missing acknowledgement from voltage regulator (firmware bug in regulator controller); diagnosing required cross‑domain hardware + firmware reasoning and was not LLM‑solvable.
    • LLMs lack goals, accountability, and robust reasoning across messy physical systems.
  • Philosophy: LLMs are powerful tools and tutors; use them, but don’t anthropomorphize or over-rely on them. They help more with polish and small tasks than as the epicenter of complex technical creation.

Company culture, hiring & compensation

  • Oxide is fully open source and transparent about tech and processes.
  • Unusual compensation policy: transparent, uniform base pay across roles (the same baseline for EEs, SWEs, QA, and support). This attracted nontraditional talent (e.g., world-class QA and support engineers).
  • The remote/hybrid model works because much hardware work is driven by software tools (EDA, board layout, SolidWorks); hands-on manufacturing and bring-up trips still happen (e.g., to Benchmark Electronics in Minnesota).
  • Growth challenge: preserve hiring discipline and cultural values as headcount grows (team size ~85 at the time of the episode).

Practical advice & perspectives for engineers

  • Mindset: focus on getting better every day; mastery and continual improvement trump short‑term optimization for “getting hired.”
  • Use LLMs to accelerate learning and do iterative improvement, but validate outputs and retain accountability.
  • Teamwork, discipline, and diverse problem‑solving styles are essential for tough, messy engineering problems — especially hardware.
  • Opportunities exist for new company formation and building novel products — embrace risk and imagination.

Notable quotes / insights

  • “We did much more technically interesting work in the bust than we did in the boom.” — innovation is often driven by constraint.
  • “Intelligence is not enough.” — building complex hardware systems needs experience, persistence, testing, and team coordination beyond raw cleverness.
  • “LLMs are powerful tools, but they don’t have goals or accountability.” — use responsibly.

Recommended reading (Bryan’s picks)

  • The Soul of a New Machine — Tracy Kidder (classic engineering narrative about building a new computer)
  • Skunk Works — Ben Rich (Lockheed’s secret projects and how focused teams solve impossible problems)
  • Steve Jobs and the NeXT Big Thing — Randall Stross (a deep look at NeXT and lessons from Jobs’ failures and comeback)

Actionable items / recommendations

  • For engineers: learn fundamentals, focus on sustained improvement, use LLMs as tutors/editors, and seek diverse teams to learn from.
  • For engineering leaders: preserve hiring discipline and culture during growth; treat LLMs as tools, not replacements; consider full‑stack integration (HW+SW) and open source as a strategic approach when appropriate.
  • For organizations thinking about on‑prem cloud: evaluate economics at scale, plan for operability (blind‑mate, centralized power, integrated switching), and budget for the significant software investment (control plane, update/system management).

If you want the core technical examples or links (Oxide GitHub, Hubris, Omicron, video talks referenced), they are public and were discussed in the episode — search Oxide’s GitHub and the show notes for direct links.