‘A.I.-Washing’ Layoffs? + Why L.L.M.s Can’t Write Well + Tokenmaxxing

by The New York Times

1h 0m · March 20, 2026

Overview of Hard Fork — "‘A.I.-Washing’ Layoffs? + Why L.L.M.s Can’t Write Well + Tokenmaxxing"

This episode of The New York Times’ Hard Fork (hosts Kevin Roose and Casey Newton) covers three linked AI topics: recent tech layoffs framed as driven by AI (and whether that’s sincere or "AI‑washing"), why large language models (LLMs) still struggle at high‑quality creative writing (conversation with journalist Jasmine Sun), and a new workplace phenomenon—“tokenmaxxing”—where employees and companies track and gamify AI token consumption.

Big picture takeaways

  • Many public tech layoffs now invoke AI as a reason, but motives vary: cost-cutting, reshaping skill mixes, signaling to investors, or convenient cover for longer-term mismanagement.
  • LLMs are highly capable text generators for narrow, verifiable tasks (emails, summaries, code), but struggle with literary, grounded, voice-driven writing because of alignment/post‑training choices and lack of lived experience.
  • Inside tech firms a new metric—token consumption—is becoming a proxy for AI adoption and (supposed) productivity. Leaderboards incentivize token use but create perverse incentives and serious budget risks.

Layoffs and “AI‑Washing”

  • Context: Recent layoffs at Atlassian, Block (Square), and reported large cuts at Meta have been publicly linked to AI. CEOs often justify cuts as adapting to a new AI-enabled operating model.
  • Key nuances:
    • Atlassian: CEO framed AI as changing required skills and role mixes; likely a mix of business pressures (SaaS pricing headwinds) and wanting an AI-forward narrative.
    • Block: Rapid headcount growth in prior years, expensive corporate spending (example: large in‑person event), and stock pressure make management decisions complex—AI may be part of the explanation or a convenient framing.
    • Meta: Publicly pushing massive capex on AI infrastructure while saying fewer people are needed to do certain projects; but Meta also faces technical setbacks and internal reorganizations in AI teams.
  • Dynamics and incentives:
    • Investor signaling: Claiming AI focus can restore confidence and boost stock price.
    • Cost-shift: Companies are shifting spend from human labor to data centers and AI infrastructure (the bet: AI systems will eventually replace or augment many roles).
    • Worker impact: Heightened anxiety; ambiguous guidance on using AI (using it signals adaptability, but may also demonstrate that your job is automatable); potential for unionization pressure to rise.
  • Assessment:
    • “AI‑washing” can be real when AI is used primarily as a narrative to justify layoffs. But at many firms AI is both a real factor and a convenient cover—each case needs granular analysis (which teams, roles, and tasks are cut).

Why LLMs struggle with “good” creative writing (Jasmine Sun interview)

  • Core claim: LLMs are often excellent at utility writing (emails, summaries, code) but fall short of literary, voice‑driven, emotionally grounded writing.
  • Main reasons:
    • Post‑training alignment (RLHF): Modern models are tuned to be helpful, safe, and consistent; that alignment tends to produce a bland, assistant‑like persona and reduces the irregular, surprising, or risky choices that make great creative writing.
    • Rubrics and human raters: Graders often use simplistic or misaligned evaluation criteria (e.g., counting punctuation, over‑weighting factuality), training models toward safe, generic outputs.
    • Lack of lived experience and stakes: Great writers ground metaphors, voice, and specificity in life experience; models generate patterns from web data and lack authentic, embodied perspective.
    • Verifiability bias: Domains like coding have objective tests (code runs), which make automated evaluation and improvement straightforward; creative quality is subjective and hard to formalize.
  • Historical observation: Early models (GPT‑2/GPT‑3 era) produced weirder, sometimes more surprising stylistic outputs; later iterations traded some of that unpredictability for reliability.
  • Practical use cases and human+AI workflows:
    • LLMs can excel as editors or research assistants if calibrated to an individual writer’s taste. Example: Jasmine uses Claude with her archive and retro notes to co-develop personalized rubrics for ideation, structure, prose feedback, and fact‑checking—treating the model as an editor, not a writer.
    • “Centaur” model (human+AI) approach is currently the most productive for creative or high‑stakes writing.
  • Outlook:
    • Not a categorical “never.” Models can be improved if labs invest in writing-specific fine‑tuning and rethink alignment metrics—but economic incentives (prioritizing coding/agentic features) may make that a lower priority.
    • Cultural bias: readers often devalue writing once they know it’s AI-generated; blind tests show mixed results.

Tokenmaxxing: leaderboards, tokens, and corporate incentives

  • What is a token? The atomic unit LLM providers use to meter input/output usage — roughly a word fragment (common words are one token; longer or rarer words split into several). Agentic tools and long sessions are highly token‑hungry.
  • The trend: Some firms have internal leaderboards showing employees’ token consumption as a proxy for AI adoption/productivity.
  • Extreme examples:
    • Reported OpenAI internal high: ~210 billion tokens used by a single employee in a 7‑day stretch (not all first‑time generation; includes cached tokens).
    • Top Claude Code user reportedly spent >$150,000 on tokens in one month (for perspective: employees often get these tokens free; costs are borne by employers).
  • Problems and perverse incentives:
    • Goodhart’s Law: when token use becomes a target, it ceases to be a reliable measure of productivity—employees may waste tokens to climb leaderboards or run side projects funded by the company.
    • Budget risk: heavy token users can dramatically exceed expected AI costs; some non‑tech teams are already pressured about AI use in performance reviews.
    • Morale and gaming: public leaderboards can harm collaboration and incentivize showy token usage over meaningful outcomes.
  • Why some companies do it: They want to accelerate adoption, identify “early adopters,” and demonstrate AI‑forward cultures. Some view token use as an observable proxy for experimentation.
  • Practical scale effects: Individual users consuming company‑scale quantities of tokens can create distortions in how companies allocate resources and assess staff.
  • Guidance emerging from the episode:
    • Leaderboards are generally a bad productivity metric; better to measure outcomes/ROI.
    • Companies should set clear token budgets, require justification/outputs for large consumption, and monitor alignment between token spend and business value.
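To make the token‑economics concrete, here is a minimal sketch of a per‑employee cost estimator of the kind a finance team might use when setting token budgets. The per‑million‑token prices are illustrative assumptions, not any provider's actual rates:

```python
def monthly_cost(input_tokens, output_tokens,
                 input_price_per_m=3.00, output_price_per_m=15.00):
    """Estimate dollar cost from token counts.

    Prices are hypothetical placeholders expressed per million tokens;
    real providers price input and output tokens differently, as modeled here.
    """
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m


# A heavy agentic-coding user: 2B input + 200M output tokens in a month.
print(monthly_cost(2_000_000_000, 200_000_000))  # → 9000.0
```

Even at these modest assumed rates, one heavy user runs thousands of dollars per month — which is how a single leaderboard‑topping employee can plausibly reach the >$150,000 figure discussed in the episode.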

Notable quotes & lines to remember

  • “Projects that used to require big teams now can be accomplished by a single very talented person.” — Mark Zuckerberg (as discussed in the episode)
  • “AI‑washing”: using AI as a narrative justification for layoffs or restructuring.
  • “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” — historical analogy used to criticize crude productivity metrics.
  • Jasmine Sun: post‑training alignment turned models into “helpful assistants,” reducing surprising creative voice; the solution may be centaur workflows and different evaluation incentives.

Recommendations / Action items

  • For employees:
    • Learn to use AI as an augment (editor, researcher, ideation partner) and document how AI contributions lead to measurable outcomes.
    • Track token usage if your employer does; be able to justify spending in terms of productivity/impact.
    • Consider organizing (or collective bargaining) to negotiate retraining, redeployment, and protections as job roles shift.
  • For managers / execs:
    • Avoid public leaderboards as a productivity metric; prefer outcome-based measures and ROI for token spend.
    • Set token budgets, require clear project charters for high‑consumption experiments, and audit token ROI.
    • Be transparent about rationales for layoffs that cite AI: explain what is changing in workflows, skill mixes, and plans for retraining.
  • For AI labs and product teams:
    • If better creative writing is a priority, invest in dedicated fine‑tuning, improved writing rubrics, and more nuanced human feedback that values voice and lived-grounded specificity.
    • Reexamine alignment objectives where "helpfulness" crowds out creative risk/voice when the use case demands it.

Why it matters

  • The three topics intersect: corporate narratives about AI change labor markets; the technical choices (alignment, metrics) determine what LLMs become good at; and new operational metrics (tokens) will shape behavior, incentives, and budgets across industries. The episode argues we should scrutinize incentives at every level—technical, managerial, and economic—rather than accepting headline claims about AI capability or inevitability.