The history of servers, the cloud, and what’s next – with Oxide

by Gergely Orosz

1h 39m · December 17, 2025

Overview

This episode features Bryan Cantrill (ex-Sun, Joyent; co-founder of Oxide) in conversation with Gergely Orosz. They trace server and cloud history from the late 1990s to today, explain why hyperscalers build custom hardware, describe what it takes to design a modern server and its software stack, and cover how Oxide built a full-stack, open-source on-prem cloud rack. The conversation also touches on how AI tools are used (and where they fall short), Oxide's unusual hiring and compensation choices, remote hardware work, and practical advice for engineers.

Key takeaways

  • Boom vs. bust: the dot-com boom produced lots of activity, but the deeper technical innovation (ZFS, DTrace, open-sourcing Solaris) happened in the post-bust period, when teams were forced to focus.
  • Major architectural shifts: Linux on x86 displaced proprietary RISC systems; the cloud rose with S3 and EC2; Kubernetes enabled cloud portability and multi-cloud.
  • Hyperscalers (Google, Meta, Microsoft, Amazon) design their own hardware and software because off‑the‑shelf servers are built for small racks, not warehouse‑scale infrastructure.
  • Oxide intentionally built hardware + network + software from a clean sheet to deliver turn‑key on‑prem cloud racks; everything is open source.
  • AI/LLMs are useful for writerly tasks, polishing code, test-case generation and documentation comprehension — but largely unhelpful for low‑level hardware debugging and initial bring‑ups.
  • Building hardware is a fractally complex, analog‑driven engineering challenge requiring fearless EEs, tooling, and coordinated teams — intelligence alone isn’t enough.

Timeline & evolution (late 1990s → today)

  • Late 1990s / Sun era
    • Solaris + SPARC systems dominated early web/database deployments.
    • Java and the early web created enormous demand; Sun/Cisco were common platform choices.
  • Dot‑com boom → bust (2000–2001)
    • Boom produced frothy expectations; the bust forced focus and produced deep technical work (ZFS, DTrace, SMF).
    • Lesson: innovation often needs constraints/desperation.
  • 2000s shifts
    • Linux matured and gained corporate backing; x86 overtook RISC as microarchitectural advances (speculative execution, techniques to mitigate the memory wall) closed the performance gap.
    • Google and other hyperscalers moved to their own hardware early.
  • Cloud era (S3, EC2)
    • AWS's execution (steady price cuts, a stream of new managed services) made the public cloud dominant and attractive.
    • Kubernetes (post‑2014) provided an abstraction enabling cloud neutrality and easier multi‑cloud adoption.
  • Today
    • Hyperscalers largely design custom servers and data center approaches (DC busbar, power shelves), and build massive internal software/tooling stacks for safe deployments, observability and experimentation.

Why hyperscalers build custom hardware

  • Off-the-shelf servers (Dell/HP/Supermicro) are designed for small rack deployments with AC power per chassis and lots of cabling, not for fleets of thousands of units.
  • At hyperscale you need different environmental design: DC busbars, centralized rectification (power shelves), blind‑mate power/networking, custom switches and optimized thermal/power characteristics.
  • Custom hardware + custom software yields better economics, scale, reliability, and operability.

Oxide: product and major engineering decisions

Clean‑sheet approach

  • Oxide designed racks from first principles rather than reusing commodity server chassis.
  • Goals: turn‑key delivery (wheel rack into data center, blind‑mate power & network, minimal operator cabling), strong economics for at‑scale on‑prem cloud.

Hardware highlights

  • Rack contains ~32 compute sleds; sleds blind‑mate into power and network (no external cabling, reduced miswiring).
  • DC busbar + power shelf architecture (AC -> centralized rectifier -> DC distribution).
  • Custom switch development was essential:
    • Needed programmability and control; chose Intel's Tofino for its P4 programmability over proprietary Broadcom silicon.
    • Building the switch was effectively a second full computer to design (networking, firmware, silicon, integration).
  • Electronics challenges: DDR5/PCIe high-speed signal integrity, power sequencing, RF/analog issues; boards are effectively analog systems where timing and signal integrity matter (a minimal power-sequencing sketch follows this list).
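
The power-sequencing point above is easier to see in code. Below is a minimal, hypothetical sketch in plain Rust (not Hubris code) of the pattern a service-processor task might follow: enable voltage rails in a fixed order and confirm each rail's power-good signal before enabling the next, so a missing acknowledgement is detected and reported rather than leaving the board in an undefined state. The rail names and functions (Rail, enable_rail, power_good) are invented for illustration.

```rust
// Hypothetical power-sequencing sketch (not Oxide's Hubris code).
// Pattern: bring rails up in a fixed order and confirm each rail's
// "power good" signal before enabling the next one.

use std::{thread, time::Duration};

#[derive(Debug, Clone, Copy)]
enum Rail {
    V1p8Standby, // 1.8 V standby rail
    V0p9Core,    // 0.9 V core rail
    V1p1Ddr,     // 1.1 V DDR rail
    V12Fans,     // 12 V fan rail
}

#[derive(Debug)]
enum SequenceError {
    PowerGoodTimeout(Rail),
}

// Stand-in for reading a regulator's power-good pin or status register.
fn power_good(_rail: Rail) -> bool {
    true // a real implementation would poll hardware here
}

// Stand-in for asserting a regulator's enable line.
fn enable_rail(rail: Rail) {
    println!("enabling {rail:?}");
}

/// Enable rails in order, waiting (with a timeout) for each power-good
/// signal; abort the sequence on the first failure.
fn sequence_power_on(rails: &[Rail]) -> Result<(), SequenceError> {
    for &rail in rails {
        enable_rail(rail);
        let timeout = Duration::from_millis(50);
        let mut waited = Duration::ZERO;
        while !power_good(rail) {
            if waited >= timeout {
                return Err(SequenceError::PowerGoodTimeout(rail));
            }
            thread::sleep(Duration::from_millis(1));
            waited += Duration::from_millis(1);
        }
        println!("{rail:?} reports power good");
    }
    Ok(())
}

fn main() {
    let order = [Rail::V1p8Standby, Rail::V0p9Core, Rail::V1p1Ddr, Rail::V12Fans];
    match sequence_power_on(&order) {
        Ok(()) => println!("power-on sequence complete"),
        Err(e) => eprintln!("sequence aborted: {e:?}"),
    }
}
```

This is the same class of hardware/firmware interaction behind the voltage-regulator bug described in the AI section below.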

Software stack

  • Service processor OS written from scratch in Rust (project name: Hubris); debugger named Humility.
  • The control plane, a distributed system named Omicron, handles provisioning, the API/CLI, VM lifecycle, storage attachment, updates, and more (an illustrative sketch follows this list).
  • Oxide ships the full software stack (service processor, hypervisor/control plane, orchestration) and open‑sources it; revenue model is hardware + services/support.
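
To make the control plane's role concrete, here is a minimal, hypothetical provisioning flow seen from a client's perspective. It deliberately does not use the real Omicron API or the Oxide SDK; the types (InstanceCreate, ControlPlane) and the in-memory stub are invented to show the shape of the workflow: submit an instance-create request, get an identifier back, then poll the instance's state.

```rust
// Hypothetical control-plane provisioning sketch (not the Omicron API).
// Flow: submit an instance-create request, receive an ID immediately,
// then poll until the instance reports "running".

use std::collections::HashMap;

#[derive(Debug, Clone)]
struct InstanceCreate {
    name: String,
    vcpus: u8,
    memory_gib: u32,
    boot_disk_gib: u32,
}

#[derive(Debug, Clone, PartialEq)]
enum InstanceState {
    Provisioning,
    Running,
}

// Stand-in for the rack's control plane: in reality a distributed service
// that schedules the VM onto a sled, attaches storage, configures
// networking, and tracks state durably.
struct ControlPlane {
    instances: HashMap<u64, (InstanceCreate, InstanceState)>,
    next_id: u64,
}

impl ControlPlane {
    fn new() -> Self {
        Self { instances: HashMap::new(), next_id: 1 }
    }

    /// Accept the request and return an ID right away; provisioning
    /// continues asynchronously in a real system.
    fn create_instance(&mut self, req: InstanceCreate) -> u64 {
        println!(
            "scheduling {} ({} vCPUs, {} GiB memory, {} GiB boot disk)",
            req.name, req.vcpus, req.memory_gib, req.boot_disk_gib
        );
        let id = self.next_id;
        self.next_id += 1;
        self.instances.insert(id, (req, InstanceState::Provisioning));
        id
    }

    /// Poll the instance's state (the stub simply flips it to Running).
    fn instance_state(&mut self, id: u64) -> Option<InstanceState> {
        let entry = self.instances.get_mut(&id)?;
        entry.1 = InstanceState::Running;
        Some(entry.1.clone())
    }
}

fn main() {
    let mut cp = ControlPlane::new();
    let id = cp.create_instance(InstanceCreate {
        name: "web-01".into(),
        vcpus: 4,
        memory_gib: 16,
        boot_disk_gib: 100,
    });
    println!("created instance {id}");
    println!("state: {:?}", cp.instance_state(id).unwrap());
}
```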

Update/upgradability — a core challenge

  • Updating a rack is updating a distributed system with many interdependent components: service processor, drive firmware, host OS, control plane, database schemas, etc.
  • Oxide implemented a minimum viable update (“Mupdate”) that parks the control plane, takes the rack offline while components are updated, then brings it back online: a first step that allows incremental progress toward seamless, live rolling updates (see the sketch after this list).
  • Reasoning through hybrid/versioned states during updates is fractally complex and required deliberate scope control and quality prioritization.
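
A minimal sketch of the “Mupdate” idea, assuming a simplified component list and ignoring the versioned intermediate states that make the real problem hard: park the control plane, apply updates in dependency order, and resume service only if every step succeeds. All names below are invented for illustration.

```rust
// Hypothetical sketch of an offline "minimum viable update" flow (not
// Oxide's actual updater). The point is the ordering: park the control
// plane, update components in dependency order, then resume service.

#[derive(Debug)]
enum UpdateError {
    ComponentFailed(String),
}

// Components in the order they must be updated; a real rack spans far
// more state: service processors, device firmware, host OS, control-plane
// services, and database schemas, each with version constraints.
const UPDATE_ORDER: &[&str] = &[
    "service-processor",
    "drive-firmware",
    "host-os",
    "control-plane-services",
    "database-schema",
];

fn park_control_plane() {
    println!("parking control plane: no new provisioning accepted");
}

fn update_component(name: &str) -> Result<(), UpdateError> {
    // A real implementation would stage the artifact, verify its
    // signature, apply it, and confirm the component's new version.
    let applied = true; // stand-in for a post-update version check
    if applied {
        println!("updated {name}");
        Ok(())
    } else {
        Err(UpdateError::ComponentFailed(name.to_string()))
    }
}

fn resume_control_plane() {
    println!("control plane resumed; rack back in service");
}

fn mupdate() -> Result<(), UpdateError> {
    park_control_plane();
    for &component in UPDATE_ORDER {
        update_component(component)?;
    }
    resume_control_plane();
    Ok(())
}

fn main() {
    if let Err(e) = mupdate() {
        eprintln!("update halted: {e:?}; rack stays parked for operator action");
    }
}
```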

AI (LLMs & agents) — how Oxide uses them, and limits

  • Useful areas:
    • Document comprehension, summarization, and glossary generation.
    • Editing, polish and phrasing for technical writing.
    • Generating test cases, small code idioms, and suggestions for idiomatic Rust snippets.
    • Developer productivity tools (e.g., Claude Code, autocomplete) for routine/boilerplate tasks.
  • Limited / ineffective areas:
    • Hardware design and low‑level debugging (board bring‑ups, RF/SI issues, firmware interactions). Real problem example: CPU repeatedly resetting due to missing acknowledgement from voltage regulator (firmware bug in regulator controller); diagnosing required cross‑domain hardware + firmware reasoning and was not LLM‑solvable.
    • LLMs lack goals, accountability, and robust reasoning across messy physical systems.
  • Philosophy: LLMs are powerful tools and tutors; use them, but don’t anthropomorphize or over-rely on them. They help more with polish and small tasks than as the epicenter of complex technical creation.

Company culture, hiring & compensation

  • Oxide is fully open source and transparent about tech and processes.
  • Unusual compensation policy: transparent, uniform base pay across roles (the same baseline for EEs, SWEs, QA, and support). This attracted nontraditional talent (e.g., world-class QA and support engineers).
  • The remote/hybrid model works because much hardware work is driven by software tools (EDA, board layout, SolidWorks); hands-on manufacturing and bring-up trips still happen (e.g., to Benchmark Electronics in Minnesota).
  • Growth challenge: preserve hiring discipline and cultural values as headcount grows (team size ~85 at the time of the episode).

Practical advice & perspectives for engineers

  • Mindset: focus on getting better every day; mastery and continual improvement trump short‑term optimization for “getting hired.”
  • Use LLMs to accelerate learning and do iterative improvement, but validate outputs and retain accountability.
  • Teamwork, discipline, and diverse problem‑solving styles are essential for tough, messy engineering problems — especially hardware.
  • Opportunities exist for new company formation and building novel products — embrace risk and imagination.

Notable quotes / insights

  • “We did much more technically interesting work in the bust than we did in the boom.” — innovation is often driven by constraint.
  • “Intelligence is not enough.” — building complex hardware systems needs experience, persistence, testing, and team coordination beyond raw cleverness.
  • “LLMs are powerful tools, but they don’t have goals or accountability.” — use responsibly.

Recommended reading (Bryan’s picks)

  • The Soul of a New Machine — Tracy Kidder (classic engineering narrative about building a new computer)
  • Skunk Works — Ben Rich (Lockheed’s secret projects and how focused teams solve impossible problems)
  • Steve Jobs and the NeXT Big Thing — Randall Stross (a deep look at NeXT and lessons from Jobs’ failures and comeback)

Actionable items / recommendations

  • For engineers: learn fundamentals, focus on sustained improvement, use LLMs as tutors/editors, and seek diverse teams to learn from.
  • For engineering leaders: preserve hiring discipline and culture during growth; treat LLMs as tools, not replacements; consider full‑stack integration (HW+SW) and open source as a strategic approach when appropriate.
  • For organizations thinking about on‑prem cloud: evaluate economics at scale, plan for operability (blind‑mate, centralized power, integrated switching), and budget for the significant software investment (control plane, update/system management).

If you want the core technical examples or links (Oxide GitHub, Hubris, Omicron, video talks referenced), they are public and were discussed in the episode — search Oxide’s GitHub and the show notes for direct links.