Latent Space: The AI Engineer Podcast | Podcast Summaries


Technology

Science

by swyx + Alessio

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny...

12 episodes summarized

Episodes

The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI


Tickets for <a target="_blank" href="https://ai.engineer/miami">AIE Miami</a> and <a target="_blank" href="https://www.ai.engineer/europe">AIE Europe</a> are on sale now!

From Palantir and Two Sigma to building Goodfire into the poster-child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent <a target="_blank" href="https://www.goodfire.ai/blog/our-series-b">$150M Series B funding round at a $1.25B valuation</a>.

In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire’s core bet: that the AI lifecycle is still fundamentally broken because the only reliable control we have is data, and we post-train, RLHF, and fine-tune by “slurping supervision through a straw,” hoping the model picks up the right behaviors while quietly absorbing the wrong ones. <a target="_blank" href="https://www.goodfire.ai/blog/on-optimism-for-interpretability">Goodfire’s answer</a> is to build a bi-directional interface between humans and models: read what’s happening inside, edit it surgically, and eventually use interpretability during training so customization isn’t just brute-force guesswork.

Mark and Myra walk through what that looks like when you stop treating interpretability like a lab demo and start treating it like infrastructure: lightweight probes that add near-zero latency, token-level safety filters that can run at inference time, and interpretability workflows that survive messy constraints (multilingual inputs, synthetic→real transfer, regulated domains, no access to sensitive data). We also get a live window into what “frontier-scale interp” means operationally (i.e. steering a trillion-parameter model in real time by targeting internal features) plus why the same tooling generalizes cleanly from language models to genomics, medical imaging, and “pixel-space” world models.

We discuss:

* Myra + Mark’s path: Palantir (health systems, forward-deployed engineering) → Goodfire early team; Two Sigma → Head of Product, translating frontier interpretability research into a platform and real-world deployments
* What “interpretability” actually means in practice: not just post-hoc poking, but a broader “science of deep learning” approach across the full AI lifecycle (data curation → post-training → internal representations → model design)
* Why post-training is the first big wedge: “surgical edits” for unintended behaviors like reward hacking, sycophancy, and noise learned during customization, plus the dream of targeted unlearning and bias removal without wrecking capabilities
* SAEs vs probes in the real world: why SAE feature spaces sometimes underperform classifiers trained on raw activations for downstream detection tasks (hallucination, harmful intent, PII), and what that implies about “clean concept spaces”
* <a target="_blank" href="https://www.goodfire.ai/research/rakuten-sae-probes-for-pii-detection">Rakuten in production</a>: deploying interpretability-based token-level PII detection at inference time to prevent routing private data to downstream providers, plus the gnarly constraints: no training on real customer PII, synthetic→real transfer, English + Japanese, and tokenization quirks
* Why interp can be operationally cheaper than LLM-judge guardrails: probes are lightweight, low-latency, and don’t require hosting a second large model in the loop
* Real-time steering at frontier scale: a demo of steering Kimi K2 (~1T params) live and finding features via SAE pipelines, auto-labeling via LLMs, and toggling a “Gen-Z slang” feature across multiple layers without breaking tool use
* Hallucinations as an internal signal: the case that models have latent uncertainty / “user-pleasing” circuitry you can detect and potentially mitigate more directly than black-box methods
* <a target="_blank" href="https://www.goodfire.ai/blog/feature-steering-for-reliable-and-expressive-ai-engineering">Steering vs prompting</a>: the emerging view that activation steering and in-context learning are more closely connected than people think, including work mapping between the two (even for jailbreak-style behaviors)
* Interpretability for science: using the same tooling across domains (genomics, medical imaging, materials) to debug spurious correlations and extract new knowledge, up to and including early biomarker discovery work with major partners
* World models + “pixel-space” interpretability: why vision/video models make concepts easier to see, how that accelerates the feedback loop, and why robotics/world-model partners are especially interesting design partners
* The north star: moving from “data in, weights out” to intentional model design where experts can impart goals and constraints directly, not just via reward signals and brute-force post-training

Goodfire AI

* Website: <a target="_blank" href="https://goodfire.ai">https://goodfire.ai</a>
* LinkedIn: <a target="_blank" href="https://www.linkedin.com/company/goodfire-ai/">https://www.linkedin.com/company/goodfire-ai/</a>
* X: <a target="_blank" href="https://x.com/GoodfireAI">https://x.com/GoodfireAI</a>

Myra Deng

* Website: <a target="_blank" href="https://myradeng.com/">https://myradeng.com/</a>
* LinkedIn: <a target="_blank" href="https://www.linkedin.com/in/myra-deng/">https://www.linkedin.com/in/myra-deng/</a>
* X: <a target="_blank" href="https://x.com/myra_deng">https://x.com/myra_deng</a>

Mark Bissell

* LinkedIn: <a target="_blank" href="https://www.linkedin.com/in/mark-bissell/">https://www.linkedin.com/in/mark-bissell/</a>
* X: <a target="_blank" href="https://x.com/MarkMBissell">https://x.com/MarkMBissell</a>

Full Video Episode

Timestamps

00:00:00 Introduction
00:00:05 Introduction to the Latent Space Podcast and Guests from Goodfire
00:00:29 What is Goodfire? Mission and Focus on Interpretability
00:01:01 Goodfire’s Practical Approach to Interpretability
00:01:37 Goodfire’s Series B Fundraise Announcement
00:02:04 Backgrounds of Mark and Myra from Goodfire
00:02:51 Team Structure and Roles at Goodfire
00:05:13 What is Interpretability? Definitions and Techniques
00:05:30 Understanding Errors
00:07:29 Post-training vs. Pre-training Interpretability Applications
00:08:51 Using Interpretability to Remove Unwanted Behaviors
00:10:09 Grokking, Double Descent, and Generalization in Models
00:10:15 404 Not Found Explained
00:12:06 Subliminal Learning and Hidden Biases in Models
00:14:07 How Goodfire Chooses Research Directions and Projects
00:15:00 Troubleshooting Errors
00:16:04 Limitations of SAEs and Probes in Interpretability
00:18:14 Rakuten Case Study: Production Deployment of Interpretability
00:20:45 Conclusion
00:21:12 Efficiency Benefits of Interpretability Techniques
00:21:26 Live Demo: Real-Time Steering in a Trillion Parameter Model
00:25:15 How Steering Features are Identified and Labeled
00:26:51 Detecting and Mitigating Hallucinations Using Interpretability
00:31:20 Equivalence of Activation Steering and Prompting
00:34:06 Comparing Steering with Fine-Tuning and LoRA Techniques
00:36:04 Model Design and the Future of Intentional AI Development
00:38:09 Getting Started in Mechinterp: Resources, Programs, and Open Problems
00:40:51 Industry Applications and the Rise of Mechinterp in Practice
00:41:39 Interpretability for Code Models and Real-World Usage
00:43:07 Making Steering Useful for More Than Stylistic Edits
00:46:17 Applying Interpretability to Healthcare and Scientific Discovery
00:49:15 Why Interpretability is Crucial in High-Stakes Domains like Healthcare
00:52:03 Call for Design Partners Across Domains
00:54:18 Interest in World Models and Visual Interpretability
00:57:22 Sci-Fi Inspiration: Ted Chiang and Interpretability
01:00:14 Interpretability, Safety, and Alignment Perspectives
01:04:27 Weak-to-Strong Generalization and Future Alignment Challenges
01:05:38 Final Thoughts and Hiring/Collaboration Opportunities at Goodfire

Transcript

Shawn Wang [00:00:05]: So welcome to the Latent Space pod. We’re back in the studio with our special MechInterp co-host, Vibhu. Welcome. Mochi, Mochi’s special co-host. And Mochi, the mechanistic interpretability doggo. We have with us Mark and Myra from Goodfire. Welcome. Thanks for having us on. Maybe we can sort of introduce Goodfire and then introduce you guys. How do you introduce Goodfire today?

Myra Deng [00:00:29]: Yeah, it’s a great question. So Goodfire, we like to say, is an AI research lab that focuses on using interpretability to understand, learn from, and design AI models. And we really believe that interpretability will unlock the new generation, next frontier of safe and powerful AI models. That’s our description right now, and I’m excited to dive more into the work we’re doing to make that happen.

Shawn Wang [00:00:55]: Yeah. And there’s always like the official description. Is there an understatement? Is there an unofficial one that sort of resonates more with a different audience?

Mark Bissell [00:01:01]: Well, being an AI research lab that’s focused on interpretability, there’s obviously a lot of people have a lot that they think about when they think of interpretability. And I think we have a pretty broad definition of what that means and the types of places that can be applied. And in particular, applying it in production scenarios, in high stakes industries, and really taking it sort of from the research world into the real world. Which, you know, it’s a new field, so that hasn’t been done all that much. And we’re excited about actually seeing that sort of put into practice.

Shawn Wang [00:01:37]: Yeah, I would say it wasn’t too long ago that Anthropic was like still putting out like toy models of superposition and that kind of stuff.
And I wouldn’t have pegged it to be this far along. When you and I talked at NeurIPS, you were talking a little bit about your production use cases and your customers. And then not to bury the lede, today we’re also announcing the fundraise, your Series B. $150 million. $150 million at a $1.25B valuation. Congrats, Unicorn.

Mark Bissell [00:02:02]: Thank you. Yeah, no, things move fast.

Shawn Wang [00:02:04]: We were talking to you in December and already some big updates since then. Let’s dive, I guess, into a bit of your backgrounds as well. Mark, you were at Palantir working on health stuff, which is really interesting because Goodfire has some interesting like health use cases. I don’t know how related they are in practice.

Mark Bissell [00:02:22]: Yeah, not super related, but I don’t know. It was helpful context to know what it’s like. Just to work. Just to work with health systems and generally in that domain. Yeah.

Shawn Wang [00:02:32]: And Myra, you were at Two Sigma, which actually I was also at Two Sigma back in the day. Wow, nice.

Myra Deng [00:02:37]: Did we overlap at all?

Shawn Wang [00:02:38]: No, this is when I was briefly a software engineer before I became a sort of developer relations person. And now you’re head of product. What are your sort of respective roles, just to introduce people to like what all gets done in Goodfire?

Mark Bissell [00:02:51]: Yeah, prior to Goodfire, I was at Palantir for about three years as a forward deployed engineer, now a hot term. Wasn’t always that way. And as a technical lead on the health care team and at Goodfire, I’m a member of the technical staff. And honestly, that I think is about as specific as I could describe myself because I’ve worked on a range of things. And, you know, it’s a fun time to be at a team that’s still reasonably small.
I think when I joined I was one of the first like ten employees, now we’re above 40, but still, it looks like there’s always a mix of research and engineering and product and all of the above that needs to get done. And I think everyone across the team is, you know, pretty, pretty switch hitter in the roles they do. So I think you’ve seen some of the stuff that I worked on related to image models, which was sort of like a research demo. More recently, I’ve been working on our scientific discovery team with some of our life sciences partners, but then also building out our core platform for more of like flexing some of the kind of MLE and developer skills as well.

Shawn Wang [00:03:53]: Very generalist. And you also had like a very like a founding engineer type role.

Myra Deng [00:03:58]: Yeah, yeah.

Shawn Wang [00:03:59]: So I also started as I still am a member of technical staff, did a wide range of things from the very beginning, including like finding our office space and all of this, which we both visited when you had that open house thing. It was really nice.

Myra Deng [00:04:13]: Thank you. Thank you. Yeah. Plug to come visit our office.

Shawn Wang [00:04:15]: It looked like it was like 200 people. It has room for 200 people. But you guys are like 10.

Myra Deng [00:04:22]: For a while, it was very empty. But yeah, like Mark, I spend a lot of my time as head of product, I think product is a bit of a weird role these days, but a lot of it is thinking about how do we take our frontier research and really apply it to the most important real world problems and how does that then translate into a platform that’s repeatable or a product and working across, you know, the engineering and research teams to make that happen and also communicating to the world? Like, what is interpretability? What is it used for? What is it good for? Why is it so important?
All of these things are part of my day-to-day as well.

Shawn Wang [00:05:01]: I love like what is things because that’s a very crisp like starting point for people like coming to a field. They all do a fun thing. Vibhu, why don’t you want to try tackling what is interpretability and then they can correct us.

Vibhu Sapra [00:05:13]: Okay, great. So I think like one, just to kick off, it’s a very interesting role to be head of product, right? Because you guys, at least as a lab, you’re more of an applied interp lab, right? Which is pretty different than just normal interp, like a lot of background research. But yeah. You guys actually ship an API to try these things. You have Ember, you have products around it, which not many do. Okay. What is interp? So basically you’re trying to have an understanding of what’s going on in the model, in the internals. So different approaches to do that. You can do probing, SAEs, transcoders, all this stuff. But basically you have a hypothesis. You have something that you want to learn about what’s happening in a model’s internals. And then you’re trying to solve that from there. You can do stuff like you can, you know, you can do activation mapping. You can try to do steering. There’s a lot of stuff that you can do, but the key question is, you know, from input to output, we want to have a better understanding of what’s happening and, you know, how can we adjust what’s happening on the model internals? How’d I do?

Mark Bissell [00:06:12]: That was really good. I think that was great. I think it’s also kind of a minefield: if you ask 50 people who quote unquote work in interp, like what is interpretability, you’ll probably get 50 different answers. And. Yeah. To some extent also like where Goodfire sits in the space. I think that we’re an AI research company above all else.
And interpretability is a set of methods that we think are really useful and worth kind of specializing in, in order to accomplish the goals we want to accomplish. But I think we also sort of see some of the goals as even broader, as almost like the science of deep learning and just taking a not black box approach to kind of any part of the like AI development life cycle, whether that means using interp for like data curation while you’re training your model or for understanding what happened during post-training or for the, you know, understanding activations and sort of internal representations, what is in there semantically. And then a lot of sort of exciting updates that were, you know, are sort of also part of the fundraise around bringing interpretability to training, which I don’t think has been done all that much before. A lot of this stuff is sort of post-hoc poking at models as opposed to actually using this to intentionally design them.

Shawn Wang [00:07:29]: Is this post-training or pre-training or is that not a useful.

Myra Deng [00:07:33]: Currently focused on post-training, but there’s no reason the techniques wouldn’t also work in pre-training.

Shawn Wang [00:07:38]: Yeah. It seems like it would be more active, applicable post-training because basically I’m thinking like rollouts or like, you know, having different variations of a model that you can tweak with your steering. Yeah.

Myra Deng [00:07:50]: And I think in a lot of the news that you’ve seen on like Twitter or whatever, you’ve seen a lot of unintended side effects come out of post-training processes, you know, overly sycophantic models or models that exhibit strange reward hacking behavior. I think these are like extreme examples.
There’s also, you know, very, uh, mundane, more mundane, like enterprise use cases where, you know, they try to customize or post-train a model to do something and it learns some noise or it doesn’t appropriately learn the target task. And a big question that we’ve always had is like, how do you use your understanding of what the model knows and what it’s doing to actually guide the learning process?

Shawn Wang [00:08:26]: Yeah, I mean, uh, you know, just to anchor this for people, uh, one of the biggest controversies of last year was 4o GlazeGate. I’ve never heard of GlazeGate. I didn’t know that was what it was called. The other one, they called it that on the blog post and I was like, well, how did OpenAI call it? Like officially use that term. And I’m like, that’s funny, but like, yeah, I guess it’s the pitch that if they had worked with Goodfire, they would have avoided it. Like, you know what I’m saying?

Myra Deng [00:08:51]: I think so. Yeah. Yeah.

Mark Bissell [00:08:53]: I think that’s certainly one of the use cases. I think. Yeah. Yeah. I think the reason why post-training is a place where this makes a lot of sense is a lot of what we’re talking about is surgical edits. You know, you want to be able to have expert feedback, very surgically change how your model is doing, whether that is, you know, removing a certain behavior that it has. So, you know, one of the things that we’ve been looking at, or another like common area where you would want to make a somewhat surgical edit, is some of the models that have say political bias. Like you look at Qwen or, um, R1 and they have sort of like this CCP bias.

Shawn Wang [00:09:27]: Is there a CCP vector?

Mark Bissell [00:09:29]: Well, there’s, there are certainly internal, yeah. Parts of the representation space where you can sort of see where that lives. Yeah.
Um, and you want to kind of, you know, extract that piece out.

Shawn Wang [00:09:40]: Well, I always say, you know, whenever you find a vector, a fun exercise is just like, make it very negative to see what the opposite of CCP is.

Mark Bissell [00:09:47]: The super America, bald eagles flying everywhere. But yeah. So in general, like lots of post-training tasks where you’d want to be able to do that. Whether it’s unlearning a certain behavior or, you know, some of the other kind of cases where this comes up is, are you familiar with like the grokking behavior? I mean, I know the machine learning term of grokking.

Shawn Wang [00:10:09]: Yeah.

Mark Bissell [00:10:09]: Sort of this like double descent idea of having a model that is able to learn a generalizing solution, as opposed to even if memorization of some task would suffice, you want it to learn the more general way of doing a thing. And so, you know, another way that you can think about having surgical access to a model’s internals would be learn from this data, but learn in the right way, if there are many possible, you know, ways to do that. Can interp solve the double descent problem?

Shawn Wang [00:10:41]: Depends, I guess, on how you. Okay. So I viewed that double descent as a problem because then you’re like, well, if the loss curves level out, then you’re done, but maybe you’re not done. Right. Right. But like, if you actually can interpret what is generalizing, or what is still changing even though the loss is not changing, then maybe you can actually not view it as a double descent problem. And actually you’re just sort of translating the space in which you view loss and like, and then you have a smooth curve. Yeah.

Mark Bissell [00:11:11]: I think that’s certainly like the domain of problems that we’re looking to get.

Shawn Wang [00:11:15]: Yeah.
To me, like double descent is like the biggest thing to like ML research where like, if you believe in scaling, then you don’t need, you need to know where to scale. But if you believe in double descent, then you don’t, you don’t believe in anything where like anything levels off, like.

Vibhu Sapra [00:11:30]: I mean, also tangentially there’s like, okay, when you talk about the China vector, right. There’s the subliminal learning work. It was from the Anthropic Fellows program where basically you can have hidden biases in a model. And as you distill down or, you know, as you train on distilled data, those biases always show up, even if like you explicitly try to not train on them. So, you know, it’s just like another use case of: okay, if we can interpret what’s happening in post-training, you know, can we clear some of this? Can we even determine what’s there? Because yeah, it’s just like some worrying research that’s out there that shows, you know, we really don’t know what’s going on.

Mark Bissell [00:12:06]: That is. Yeah. I think that’s the biggest sentiment that we’re sort of hoping to tackle. Nobody knows what’s going on. Right. Like subliminal learning is just an insane concept when you think about it. Right. Train a model on not even the logits, literally the output text of a bunch of random numbers. And now your model loves owls. And you see behaviors like that, that are just, they defy intuition. And there are mathematical explanations that you can get into, but. I mean.

Shawn Wang [00:12:34]: It feels so early days. Objectively, there are a sequence of numbers that are more owl-like than others. There should be.

Mark Bissell [00:12:40]: According to, according to certain models. Right. It’s interesting. I think it only applies to models that were initialized from the same starting seed. Usually, yes.

Shawn Wang [00:12:49]: But I mean, I think that’s a cheat code because there’s not enough compute.
But like if you believe in like platonic representation, like probably it will transfer across different models as well. Oh, you think so?

Mark Bissell [00:13:00]: I think of it more as a statistical artifact of models initialized from the same seed sort of. There’s something that is like path dependent from that seed that might cause certain overlaps in the latent space and then sort of doing this distillation. Yeah. Like it pushes it towards having certain other tendencies.

Vibhu Sapra [00:13:24]: Got it. I think there’s like a bunch of these open-ended questions, right? Like you can’t train in new stuff during the RL phase, right? RL only reorganizes weights and you can only do stuff that’s somewhat there in your base model. You’re not learning new stuff. You’re just reordering chains and stuff. But okay. My broader question is when you guys work at an interp lab, how do you decide what to work on and what’s kind of the thought process? Right. Because we can ramble for hours. Okay. I want to know this. I want to know that. But like, how do you concretely like, you know, what’s the workflow? Okay. There’s like approaches towards solving a problem, right? I can try prompting. I can look at chain of thought. I can train probes, SAEs. But how do you determine, you know, like, okay, is this going anywhere? Like, do we have set stuff? Just, you know, if you can help me with all that. Yeah.

Myra Deng [00:14:07]: It’s a really good question. I feel like we’ve always at the very beginning of the company thought about like, let’s go and try to learn what isn’t working in machine learning today. Whether that’s talking to customers or talking to researchers at other labs, trying to understand both where the frontier is going and where things are really not falling apart today. And then developing a perspective on how we can push the frontier using interpretability methods.
And so, you know, even our chief scientist, Tom, spends a lot of time talking to customers and trying to understand what real world problems are and then taking that back and trying to apply the current state of the art to those problems and then seeing where they fall down basically. And then using those failures or those shortcomings to understand what hills to climb when it comes to interpretability research. So like on the fundamental side, for instance, when we have done some work applying SAEs and probes, we’ve encountered, you know, some shortcomings in SAEs that we found a little bit surprising. And so have gone back to the drawing board and done work on that. And then, you know, we’ve done some work on better foundational interpreter models. And a lot of our team’s research is focused on what is the next evolution beyond SAEs, for instance. And then when it comes to like control and design of models, you know, we tried steering with our first API and realized that it still fell short of black box techniques like prompting or fine tuning. And so went back to the drawing board and we’re like, how do we make that not the case and how do we improve it beyond that? And one of our researchers, Ekdeep, who just joined, is actually, Ekdeep and Atticus are like steering experts and have spent a lot of time trying to figure out like, what is the research that enables us to actually do this in a much more powerful, robust way? So yeah, the answer is like, look at real world problems, try to translate that into a research agenda and then like hill climb on both of those at the same time.

Shawn Wang [00:16:04]: Yeah. Mark has the steering CLI demo queued up, which we’re going to go into in a sec. But I always want to double click on when you drop hints, like we found some problems with SAEs. Okay. What are they? You know, and then we can go into the demo.
Yeah.

Myra Deng [00:16:19]: I mean, I’m curious if you have more thoughts here as well, because you’ve done it in the healthcare domain. But I think like, for instance, when we do things like trying to detect behaviors within models that are harmful or like behaviors that a user might not want to have in their model. So hallucinations, for instance, harmful intent, PII, all of these things. We first tried using SAE probes for a lot of these tasks. So taking the feature activation space from SAEs and then training classifiers on top of that, and then seeing how well we can detect the properties that we might want to detect in model behavior. And we’ve seen in many cases that probes just trained on raw activations seem to perform better than SAE probes, which is a bit surprising if you think that SAEs are actually also capturing the concepts that you would want to capture cleanly and more surgically. And so that is an interesting observation. I don’t think that is like, I’m not down on SAEs at all. I think there are many, many things they’re useful for, but we have definitely run into cases where I think the concept space described by SAEs is not as clean and accurate as we would expect it to be for actual like real world downstream performance metrics.

Mark Bissell [00:17:34]: Fair enough. Yeah. It’s the blessing and the curse of unsupervised methods where you get to peek into the AI’s mind. But sometimes you wish that you saw other things when you walked inside there. Although in the PII instance, I think, wasn’t it that an SAE-based approach actually did prove to be the most generalizable?

Myra Deng [00:17:53]: It did work well in the case that we published with Rakuten. And I think a lot of the reasons it worked well was because we had a noisier data set. And so actually the blessing of unsupervised learning is that we actually got to get more meaningful, generalizable signal from SAEs when the data was noisy.
But in other cases where we’ve had like good data sets, it hasn’t been the case.

Shawn Wang [00:18:14]: And just because you named Rakuten and I don’t know if we’ll get another chance, like what is the overall, like what is Rakuten’s usage or production usage? Yeah.

Myra Deng [00:18:25]: So they are using us to essentially guardrail and inference-time monitor their language model usage and their agent usage to detect things like PII so that they don’t route private user information.

Myra Deng [00:18:41]: And so that’s, you know, going through all of their user queries every day. And that’s something that we deployed with them a few months ago. And now we are actually exploring very early partnerships, not just with Rakuten, but with other people around how we can help with potentially training and customization use cases as well. Yeah.

Shawn Wang [00:19:03]: And for those who don’t know, Rakuten is like, I think number one or number two e-commerce store in Japan. Yes. Yeah.

Mark Bissell [00:19:10]: And I think that use case actually highlights a lot of like what it looks like to deploy things in practice that you don’t always think about when you’re doing sort of research tasks. So when you think about some of the stuff that came up there that’s more complex than your idealized version of a problem, they were encountering things like synthetic to real transfer of methods. So they couldn’t train probes, classifiers, things like that on actual customer data of PII. So what they had to do is use synthetic data sets and then hope that that transfers out of domain to real data sets. And so we can evaluate performance on the real data sets, but not train on customer PII. So that right off the bat is like a big challenge. You have multilingual requirements. So this needed to work for both English and Japanese text. Japanese text has all sorts of quirks, including tokenization behaviors that caused lots of bugs that caused us to be pulling our hair out.
And then also a lot of tasks you’ll see. You might make simplifying assumptions if you’re sort of treating it as like the easiest version of the problem to just sort of get like general results where maybe you say you’re classifying a sentence to say, does this contain PII? But the need that Rakuten had was token level classification so that you could precisely scrub out the PII. So as we learned more about the problem, you’re sort of speaking about what that looks like in practice. Yeah. A lot of assumptions end up breaking. And that was just one instance where a problem that seems simple right off the bat ends up being more complex as you keep diving into it.

Vibhu Sapra [00:20:41]: Excellent. One of the things that’s also interesting with Interp is a lot of these methods are very efficient, right? So where you’re just looking at a model’s internals itself compared to a separate like guardrail, LLM as a judge, a separate model. One, you have to host it. Two, there’s like a whole latency. So if you use like a big model, you have a second call. Some of the work around like self detection of hallucination, it’s also deployed for efficiency, right? So if you have someone like Rakuten doing it in production live, you know, that’s just another thing people should consider.

Mark Bissell [00:21:12]: Yeah. And something like a probe is super lightweight. Yeah. It’s no extra latency really. Excellent.

Shawn Wang [00:21:17]: You have the steering demos lined up. So we were just kind of see what you got. I don’t, I don’t actually know if this is like the latest, latest or like alpha thing.

Mark Bissell [00:21:26]: No, this is a pretty hacky demo from a presentation that someone else on the team recently gave. So this will give a sense for the technology. So you can see the steering in action.
Honestly, I think the biggest thing that this highlights is that as we’ve been growing as a company and taking on kind of more and more ambitious versions of interpretability-related problems, a lot of that comes down to scaling up in various different forms. And so here you’re going to see steering on a 1 trillion parameter model. This is Kimi K2. And so it’s sort of fun that in addition to the research challenges, there are engineering challenges that we’re now tackling. Cause for any of this to be sort of useful in production, you need to be thinking about what it looks like when you’re using these methods on frontier models as opposed to sort of like toy kind of model organisms. So yeah, this was thrown together hastily, pretty fragile behind the scenes, but I think it’s quite a fun demo. So screen sharing is on. So I’ve got two terminal sessions pulled up here. On the left is a forked version that we have of the Kimi CLI that we’ve got running to point at our custom hosted Kimi model. And then on the right is a setup that will allow us to steer on certain concepts. So I should be able to chat with Kimi over here. Tell it hello. This is running locally. So the CLI is running locally, but the Kimi server is running back at the office. Well, hopefully should be, um, that’s too much to run on that Mac. Yeah. I think it’s, uh, it takes a full, like, H100 node. I think it’s like, you can run it on eight GPUs, eight H100s. So, so yeah, Kimi’s running. We can ask it a prompt. It’s got a forked version of our, uh, of the SGLang code base that we’ve been working on. So I’m going to tell it, Hey, this SGLang code base is slow. I think there’s a bug. Can you try to figure it out? It’s a big code base, so it’ll, it’ll spend some time doing this. And then on the right here, I’m going to initialize, in real time, some steering. Let’s see here.Mark Bissell [00:23:33]: Searching for any bugs. 
Feature ID 43205.Shawn Wang [00:23:38]: Yeah.Mark Bissell [00:23:38]: 20, 30, 40. So let me, uh, this is basically a feature that we found that inside Kimi seems to cause it to speak in Gen Z slang. And so on the left, it’s still sort of thinking normally. It might take, I don’t know, 15 seconds for this to kick in, but then we’re going to start hopefully seeing it say, this code base is massive, for real. So we’re going to start seeing Kimi transition as the steering kicks in from normal Kimi to Gen Z Kimi, both in its chain of thought and its actual outputs.Mark Bissell [00:24:19]: And interestingly, you can see, you know, it’s still able to call tools, uh, and stuff. It’s, um, it’s purely sort of its demeanor. And there are other features that we found for interesting things like concision. So that’s more of a practical one. You can make it more concise. Um, the types of, uh, programming languages that it uses, but yeah, as we’re seeing it come in. Pretty good outputs.Shawn Wang [00:24:43]: Scheduler code is actually wild.Vibhu Sapra [00:24:46]: Yo, this code is actually insane, bro.Vibhu Sapra [00:24:53]: What’s the process of training an SAE on this, or, you know, how do you label features? I know you guys put out a pretty cool blog post about, um, finding this like autonomous interp. Um, something about how agents for interp are different than like coding agents. I don’t know while this is spewing up, but how, how do we find feature 43205? Yeah.Mark Bissell [00:25:15]: So in this case, um, our platform that we’ve been building out for a long time now supports all the sort of classic out-of-the-box interp techniques that you might want to have, like SAE training, probing, things of that kind. I’d say the techniques for like vanilla SAEs are pretty well established now where. 
You take your model that you’re interpreting, run a whole bunch of data through it, gather activations, and then yeah, pretty straightforward pipeline to train an SAE. There are a lot of different varieties. There’s top-k SAEs, BatchTopK SAEs, um, normal ReLU SAEs. And then once you have your sparse features, to your point, assigning labels to them to actually understand that this is a Gen Z feature, that’s actually where a lot of the kind of magic happens. Yeah. And the most basic standard technique is to look at all of your input data set examples that cause this feature to fire most highly. And then you can usually pick out a pattern. So for this feature, if I’ve run a diverse enough data set through my model, feature 43205 probably tends to fire on all the tokens that sound like Gen Z slang. You know, that’s the, that’s the time of year to be like, Oh, I’m in this, I’m in this Um, and, um, so, you know, you could have a human go through all 43,000 concepts and...Vibhu Sapra [00:26:34]: And I’ve got to ask the basic question, you know, can we get examples where it hallucinates, pass it through, see what feature activates for hallucinations? Can I just, you know, turn hallucination down?Myra Deng [00:26:51]: Oh, wow. You really predicted a project we’re already working on right now, which is detecting hallucinations using interpretability techniques. And this is interesting because hallucination is something that’s very hard to detect. And it’s like a kind of a hairy problem and something that black box methods really struggle with. Whereas with Gen Z, you could always train a simple classifier to detect that; hallucination is harder. But we’ve seen that models internally have some awareness of like uncertainty or some sort of like user pleasing behavior that leads to hallucinatory behavior. And so, yeah, we have a project that’s trying to detect that accurately. 
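The labeling technique Mark describes above (look at the inputs on which a feature fires most strongly, then read off the pattern) can be sketched as follows. This is a toy illustration under stated assumptions: the feature ID 43205 comes from the demo, but the `records` data, the activation values, and the `top_activating` helper are invented for the example and do not reflect Goodfire's platform.

```python
def top_activating(feature_id, records, n=3):
    """Return the n strongest (activation, token, context) hits for a feature.

    records: list of (token, context, feature_activations), where
    feature_activations maps feature id -> SAE activation on that token.
    Sparse features are zero on most tokens, so missing ids count as 0.
    """
    hits = [(acts.get(feature_id, 0.0), tok, ctx) for tok, ctx, acts in records]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:n]


# Invented per-token SAE activations, standing in for a large dataset.
records = [
    ("rizz", "no cap he has rizz", {43205: 9.1, 7: 0.2}),
    ("scheduler", "the scheduler loop is slow", {7: 3.0}),
    ("bussin", "this code is bussin fr", {43205: 7.4}),
    ("fr", "this code is bussin fr", {43205: 8.2}),
    ("return", "return the cached value", {12: 1.1}),
]

for activation, token, context in top_activating(43205, records):
    print(f"{activation:5.1f}  {token!r:10}  in: {context}")
```

A human or an auto-interp model would then read the top hits and propose a label such as "Gen Z slang". Doing that by hand for tens of thousands of features is the part that does not scale, which is where automated labeling comes in.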
And then also working on mitigating the hallucinatory behavior in the model itself as well.Shawn Wang [00:27:39]: Yeah, I would say most people are still at the level of like, oh, I would just turn temperature to zero and that turns off hallucination. And I’m like, well, that’s a fundamental misunderstanding of how this works. Yeah.Mark Bissell [00:27:51]: Although, so part of what I like about that question is, there are SAE-based approaches that might like help you get at that. But oftentimes the beauty of SAEs and, like we said, the curse is that they’re unsupervised. So when you have a behavior that you deliberately would like to remove, and that’s more of like a supervised task, often it is better to use something like probes and specifically target the thing that you’re interested in reducing as opposed to sort of like hoping that when you fragment the latent space, one of the vectors that pops out...Vibhu Sapra [00:28:20]: And as much as we’re training an autoencoder to be sparse, we’re not like for sure certain that, you know, we will get something that just correlates to hallucination. You’ll probably split that up into 20 other things and who knows what they’ll be.Mark Bissell [00:28:36]: Of course. Right. Yeah. So there’s known sorts of problems with, like, feature splitting and feature absorption. And then there’s the off-target effects, right? Ideally, you would want to be very precise, whereas if you reduce the hallucination feature, suddenly maybe your model can’t write creatively anymore. And maybe you don’t like that, but you want to still stop it from hallucinating facts and figures.Shawn Wang [00:28:55]: Good. So Vibhu has a paper to recommend there that we’ll put in the show notes. But yeah, I mean, I guess just because your demo is done, any other things that you want to highlight or any other interesting features you want to show?Mark Bissell [00:29:07]: I don’t think so. Yeah. Like I said, this is a pretty small snippet. 
I think the main sort of point here that I think is exciting is that there’s not a whole lot of interp being applied to models quite at this scale. You know, Anthropic certainly has some research and yeah, other teams as well. But it’s nice to see these techniques, you know, being put into practice. I think not that long ago, the idea of real time steering of a trillion parameter model would have sounded...Shawn Wang [00:29:33]: Yeah. The fact that it’s real time, like you started the thing and then you edited the steering vector.Vibhu Sapra [00:29:38]: I think it’s an interesting one, TBD of what the actual like production use case would be on that, like the real time editing. It’s like that’s the fun part of the demo, right? You can kind of see how this could be served behind an API, right? Like, yes, you only have so many knobs and you can just tweak it a bit more. And I don’t know how it plays in. Like people haven’t done that much with like, how does this work with or without prompting? Right. How does this work with fine tuning? Like, there’s a whole hype of continual learning, right? So there’s just so much to see. Like, is this another parameter? Like, is it like a parameter we just kind of leave as a default? We don’t use it. So I don’t know. Maybe someone here wants to put out a guide on like how to use this with prompting, when to do what?Mark Bissell [00:30:18]: Oh, well, I have a paper recommendation I think you would love from Ekdeep on our team, who is an amazing researcher, just can’t say enough amazing things about Ekdeep. But he actually has a paper, as well as some others from the team and elsewhere, that goes into the essential equivalence of activation steering and in-context learning and how those are, from a, he thinks of everything in a cognitive neuroscience Bayesian framework, but basically how you can precisely show how. 
Prompting, in-context learning, and steering exhibit similar behaviors and even like get quantitative about the like magnitude of steering you would need to do to induce a certain amount of behavior similar to certain prompting, even for things like jailbreaks and stuff. It’s a really cool paper. Are you saying steering is less powerful than prompting? More like you can almost write a formula that tells you how to convert between the two of them.Myra Deng [00:31:20]: And so like formally equivalent actually, in the limit. Right.Mark Bissell [00:31:24]: So like one case study of this is for jailbreaks. I don’t know, have you seen the stuff where you can do like many-shot jailbreaking? You like flood the context with examples of the behavior. And Anthropic put out that paper.Shawn Wang [00:31:38]: A lot of people were like, yeah, we’ve been doing this, guys.Mark Bissell [00:31:40]: Like, yeah, what’s in this in-context learning and activation steering equivalence paper is you can like predict the number of examples that you will need to put in there in order to jailbreak the model. That’s cool. By doing steering experiments and using this sort of like equivalence mapping. That’s cool. That’s really cool. It’s very neat. Yeah.Shawn Wang [00:32:02]: I was going to say, like, you know, I can like back rationalize that this makes sense because, you know, what context is, is basically just, you know, it updates the KV cache kind of and like and then every next token inference is still like, you know, the sheer sum of everything all the way. It’s plus all the context. It’s up to date. And you could, I guess, theoretically steer that, you could probably replace that with your steering. The only problem is steering typically is on one layer, maybe three layers like you did. So it’s like not exactly equivalent.Mark Bissell [00:32:33]: Right, right. 
There’s sort of, you need to get precise about, yeah, like how you sort of define steering and like how you’re modeling the setup. But yeah, I’ve got the paper pulled up here. Belief dynamics reveal the dual nature. Yeah. The title is Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering. So Eric Bigelow and Daniel Wurgaft, who are doing fellowships at Goodfire, and Ekdeep’s the final author there.Myra Deng [00:32:59]: I think actually to your question of like, what is the production use case of steering? I think maybe if you just think like one level beyond steering as it is today. Like imagine if you could adapt your model to be, you know, an expert legal reasoner. Like in almost real time, like very quickly and efficiently, using human feedback or using like your semantic understanding of what the model knows and where it knows that behavior. I think that while it’s not clear what the product is at the end of the day, it’s clearly very valuable. Thinking about like what’s the next interface for model customization and adaptation is a really interesting problem for us. Like we have heard a lot of people actually interested in fine-tuning and RL for open weight models in production. And so people are using things like Tinker or kind of like open source libraries to do that, but it’s still very difficult to get models fine-tuned and RL’d for exactly what you want them to do unless you’re an expert at model training. And so that’s like something we’reShawn Wang [00:34:06]: looking into. Yeah. I never thought so. Tinker from Thinking Machines famously uses rank-one LoRA. Is that basically the same as steering? Like, you know, what’s the comparison there?Mark Bissell [00:34:19]: Well, so in that case, you are still applying updates to the parameters, right?Shawn Wang [00:34:25]: Yeah. You’re not touching a base model. You’re touching an adapter. 
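The distinction raised in the exchange above (a rank-one LoRA still changes parameters, while steering changes activations) can be made concrete with a toy linear layer. This is a sketch under simplifying assumptions: a 2x2 weight matrix, hand-picked vectors, and hypothetical helper names. It is not how Tinker or Goodfire implement either technique.

```python
def matvec(W, x):
    """Apply a linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def steer(hidden, direction, strength):
    """Activation steering: add a scaled direction to the layer output.

    The weights are untouched; the shift is the same for every input.
    """
    return [h + strength * d for h, d in zip(hidden, direction)]


def rank_one_update(W, a, b):
    """Rank-one, LoRA-style update in parameter space: W' = W + a b^T.

    In practice the adapter (a, b) is stored separately from the frozen
    base weights; merging it here keeps the arithmetic easy to follow.
    """
    return [[w + ai * bj for w, bj in zip(row, b)] for row, ai in zip(W, a)]


W = [[1.0, 0.0], [0.0, 1.0]]
direction = [0.0, 1.0]
W_adapted = rank_one_update(W, a=[0.0, 1.0], b=[1.0, 1.0])

for x in ([1.0, 2.0], [2.0, 0.0]):
    steered = steer(matvec(W, x), direction, strength=3.0)
    adapted = matvec(W_adapted, x)
    print(x, "steered:", steered, "adapted:", adapted)
```

On the first input the two interventions happen to agree, but on the second they diverge: the steering shift is a fixed offset, while the weight update produces a change that depends on the input.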
It’s kind of, yeah.Mark Bissell [00:34:30]: Right. But I guess it still is like more in parameter space then. I guess it’s maybe like, are you modifying the pipes or are you modifying the water flowing through the pipes to get what you’re after? Yeah. Just maybe one way.Mark Bissell [00:34:44]: I like that analogy. That’s my mental map of it at least, but it gets at this idea of model design and intentional design, which is something that we’re, that we’re very focused on. And just the fact that like, I hope that we look back at how we’re currently training models and post-training models and just think what a primitive way of doing that right now. Like there’s no intentionalityShawn Wang [00:35:06]: really in... It’s just data, right? The only thing in control is what data we feed in.Mark Bissell [00:35:11]: So, so Dan from Goodfire likes to use this analogy of, you know, he has a couple of young kids and he talks about like, what if I could only teach my kids how to be good people by giving them cookies or like, you know, giving them a slap on the wrist if they do something wrong, like not telling them why it was wrong or like what they should have done differently or something like that. Just figure it out. Right. Exactly. So that’s RL. Yeah. Right. And, and, you know, it’s sample inefficient. There’s, you know, what do they say? It’s like slurping feedback. It’s like, slurping supervision. Right. And so you’d like to get to the point where you can have experts giving feedback to their models that are, uh, internalized and, and, you know, steering is an inference time way of sort of getting that idea. But ideally you’re moving to a world whereVibhu Sapra [00:36:04]: it is much more intentional design in perpetuity for these models. Okay. This is one of the questions we asked Emmanuel from Anthropic on the podcast a few months ago. Basically the question, was you’re at a research lab that does model training, foundation models, and you’re on an interp team. 
How does it tie back? Right? Like, do ideas come from the pre-training team? Do they go back? Um, you know, so for those interested, you can, you can watch that. There wasn’t too much of a connect there, but it’s still something, you know, it’s something they want toMark Bissell [00:36:33]: push for down the line. It can be useful for all of the above. Like there are certainly post-hocVibhu Sapra [00:36:39]: use cases where it doesn’t need to touch that. I think the other thing a lot of people forget is this stuff isn’t too computationally expensive, right? Like I would say, if you’re interested in getting into research, mech interp is one of the most approachable fields, right? A lot of this, train an SAE, train a probe, this stuff, like the budget for this, one, there’s already a lot done. There’s a lot of open source work. You guys have done some too. Um, you know,Shawn Wang [00:37:04]: There’s like notebooks from the Gemini team for Neel Nanda or like, this is how you do it. Just step through the notebook.Vibhu Sapra [00:37:09]: Even if you’re like, not even technical with any of this, you can still make like progress there. You can look at different activations, but, uh, if you do want to get into training, you know, training this stuff, correct me if I’m wrong, is like in the thousands of dollars, it’s not that high scale. And then same with like, you know, applying it, doing it for post-training or all this stuff is fairly cheap at the scale of, okay, I want to get into like model training. I don’t have compute for like, you know, pre-training stuff. So it’s, it’s a very nice field to get into. And also there’s a lot of like open questions, right? Um, some of them have to do with, okay, I want a product. I want to solve this. Like there’s also just a lot of open-ended stuff that people could work on. That’s interesting. Right. 
I don’t know if you guys have any calls for like, what’s open questions, what’s open work that you’re either open to collaboration on, or like, you’d just like to see solved or just, you know, for people listening that want to get into mech interp, because people always talk about it. What are, what are the things they should check out? Start, of course, you know, join you guys as well. I’m sure you’re hiring.Myra Deng [00:38:09]: There’s a paper, I think from, was it Lee, uh, Sharkey? It’s Open Problems in Mechanistic Interpretability, which I recommend everyone who’s interested in the field read. It’s just like a really comprehensive overview of what are the things that experts in the field think are the most important problems to be solved. I also think to your point, it’s been really, really inspiring to see, I think a lot of young people getting interested in interpretability, actually not just young people, also like scientists who have been, you know, experts in physics for many years and in biology or things like this, um, transitioning into interp, because the barrier to entry is, you know, in some ways low and there’s a lot of information out there and ways to get started. There’s this anecdote of like professors at universities saying that all of a sudden every incoming PhD student wants to study interpretability, which was not the case a few years ago. So it just goes to show how, I guess, like exciting the field is, how fast it’s moving, how quick it is to get started and things like that.Mark Bissell [00:39:10]: And also just a very welcoming community. You know, there’s an open source mech interp Slack channel. People are always posting questions and just folks in the space are always responsive if you ask things on various forums and stuff. 
But yeah, the open problems paper is a really good one.Myra Deng [00:39:28]: For other people who want to get started, I think, you know, MATS is a great program. What’s the acronym for? Machine Learning and Alignment Theory Scholars? It’s like the...Vibhu Sapra [00:39:40]: Normally summer internship style.Myra Deng [00:39:42]: Yeah, but they’ve been doing it year round now. And actually a lot of our full-time staff have come through that program or gone through that program. And it’s great for anyone who is transitioning into interpretability. There’s a couple other fellows programs. We do one, as does Anthropic. And so those are great places to get started if anyone is interested.Mark Bissell [00:40:03]: Also, I think it’s been seen as a research field for a very long time. But I think engineers are sorely wanted for interpretability as well, especially at Goodfire, but elsewhere, as it does scale up.Shawn Wang [00:40:18]: I should mention that Lee actually works with you guys, right? In the London office. And I’m adding our first ever mech interp track at AIE Europe because I see these industry applications now emerging. And I’m pretty excited to, you know, help push that along. Yeah, I was looking forward to that. It’ll effectively be the first industry mech interp conference. Yeah. I’m so glad you added that. You know, it’s still a little bit of a bet. It’s not that widespread, but I can definitely see this is the time to really get into it. We want to be early on things.Mark Bissell [00:40:51]: For sure. And I think the field understands this, right? So at ICML, I think the title of the mech interp workshop this year was actionable interpretability. And there was a lot of discussion around bringing it to various domains. Everyone’s adding pragmatic, actionable, whatever.Shawn Wang [00:41:10]: It’s like, okay, well, we weren’t actionable before, I guess. 
I don’t know.Vibhu Sapra [00:41:13]: And I mean, like, just, you know, being in Europe, you see the Interp room. One, like old school conferences, like, I think they had a very tiny room till they got lucky and they got it doubled. But there’s definitely a lot of interest, a lot of niche research. So you see a lot of research coming out of universities, students. We covered the paper last week. It’s like two unknown authors, not many citations. But, you know, you can make a lot of meaningful work there. Yeah. Yeah. Yeah.Shawn Wang [00:41:39]: Yeah. I think people haven’t really mentioned this yet. It’s just Interp for code. I think it’s like an abnormally important field. We haven’t mentioned this yet. The conspiracy theory last two years ago was when the first SAE work came out of Anthropic was they would do like, oh, we just used SAEs to turn the bad code vector down and then turn up the good code. And I think like, isn’t that the dream? Like, you know, like, but basically, I guess maybe, why is it funny? Like, it’s... If it was realistic, it would not be funny. It would be like, no, actually, we should do this. But it’s funny because we know there’s like, we feel there’s some limitations to what steering can do. And I think a lot of the public image of steering is like the Gen Z stuff. Like, oh, you can make it really love the Golden Gate Bridge, or you can make it speak like Gen Z. To like be a legal reasoner seems like a huge stretch. Yeah. And I don’t know if that will get there this way. Yeah.Myra Deng [00:42:36]: I think, um, I will say we are announcing. Something very soon that I will not speak too much about. Um, but I think, yeah, this is like what we’ve run into again and again is like, we, we don’t want to be in the world where steering is only useful for like stylistic things. That’s definitely not, not what we’re aiming for. 
But I think the types of interventions that you need to do to get to things like legal reasoning, um, are much more sophisticated and require breakthroughs in, in learning algorithms. And that’s, um...Shawn Wang [00:43:07]: And is this an emergent property of scale as well?Myra Deng [00:43:10]: I think so. Yeah. I mean, I think scale definitely helps. I think scale allows you to learn a lot of information and, and reduce noise across, you know, large amounts of data. But I also think we think that there’s ways to do things much more effectively, um, even, even at scale. So like actually learning exactly what you want from the data and not learning things that you do that you don’t want exhibited in the data. So we’re not like anti-scale, but we are also realizing that scale is not going to get us anywhere. It’s not going to get us to the type of AI development that we want to be at in, in the future as these models get more powerful and get deployed in all these sorts of like mission critical contexts. Current life cycle of training and deploying and evaluations is, is to us like deeply broken and has opportunities to, to improve. So, um, more to come on that very, very soon.Mark Bissell [00:44:02]: And I think that that’s a use basically, or maybe just like a proof point that these concepts do exist. Like if you can manipulate them in the precise best way, you can get the ideal combination of them that you desire. And steering is maybe the most coarse grained sort of peek at what that looks like. But I think it’s evocative of what you could do if you had total surgical control over every concept, every parameter. Yeah, exactly.Myra Deng [00:44:30]: There were like bad code features. I’ve got it pulled up.Vibhu Sapra [00:44:33]: Yeah. 
Just coincidentally, as you guys are talking.Shawn Wang [00:44:35]: This is like, this is exactly.Vibhu Sapra [00:44:38]: There’s like specifically a code error feature that activates and they show, you know, it’s not, it’s not typo detection. It’s like, it’s, it’s typos in code. It’s not typical typos. And, you know, you can, you can see it clearly activates where there’s something wrong in code. And they have like malicious code, code error. They have a whole bunch of, you know, broken-down, fine-grained features. Yeah.Shawn Wang [00:45:02]: Yeah. So, so the, the rough intuition for me, the, why I talked about post-training was that, well, you just, you know, have a few different rollouts with all these things turned off and on and whatever. And then, you know, you can, that’s, that’s synthetic data you can kind of post-train on. Yeah.Vibhu Sapra [00:45:13]: And I think we make it sound easier than it is just saying, you know, they do the real hard work.Myra Deng [00:45:19]: I mean, you guys, you guys have the right idea. Exactly. Yeah. We replicated a lot of these features in, in our Llama models as well. I remember there was like.Vibhu Sapra [00:45:26]: And I think a lot of this stuff is open, right? Like, yeah, you guys opened yours. DeepMind has opened a lot of SAEs on Gemma. Even Anthropic has opened a lot of this. There’s, there’s a lot of resources that, you know, we can probably share for people that want to get involved.Shawn Wang [00:45:41]: Yeah. And special shout out to like Neuronpedia as well. Yes. Like, yeah, amazing piece of work to visualize those things.Myra Deng [00:45:49]: Yeah, exactly.Shawn Wang [00:45:50]: I guess I wanted to pivot a little bit on, onto the healthcare side, because I think that’s a big use case for you guys. We haven’t really talked about it yet. 
This is a bit of a crossover for me because we do have a separate science pod that we’re starting up for AI for science, just because like, it’s such a huge investment category and also I’m like less qualified to do it, but we actually have bio PhDs to cover that, which is great, but I need to just kind of recap your work, maybe on the Evo 2 stuff, and then building forward.Mark Bissell [00:46:17]: Yeah, for sure. And maybe to frame up the conversation, I think another kind of interesting just lens on interpretability in general is a lot of the techniques that we’ve described are ways to solve the AI-human interface problem. And it’s sort of like bidirectional communication is the goal there. So what we’ve been talking about with intentional design of models and, you know, steering, but also more advanced techniques is having humans impart our desires and control into models and over models. And the reverse is also very interesting, especially as you get to superhuman models, whether that’s narrow superintelligence, like these scientific models that work on genomics data, medical imaging, things like that. But down the line, you know, superintelligence of other forms as well. What knowledge can the AIs teach us, as sort of the other direction in that? And so some of our life science work to date has been getting at exactly that question, which is, well, some of it does look like debugging these various life sciences models, understanding if they’re actually performing well on tasks, or if they’re picking up on spurious correlations. For instance, with genomics models, you would like to know whether they are sort of focusing on the biologically relevant things that you care about, or if it’s using some simpler correlate, like the ancestry of the person that it’s looking at. 
But then also in the instances where they are superhuman, and maybe they are understanding elements of the human genome that we don’t have names for or specific, you know, yeah, discoveries that they’ve made that we don’t know about, that’s, that’s a big goal. And so we’re already seeing that, right, we are partnered with organizations like Mayo Clinic, leading research health system in the United States, Arc Institute, as well as a startup called Prima Mente, which focuses on neurodegenerative disease. And in our partnership with them, we’ve used foundation models they’ve been training and applied our interpretability techniques to find novel biomarkers for Alzheimer’s disease. So I think this is just the tip of the iceberg. But it’s, that’s like a flavor of some of the things that we’re working on.Shawn Wang [00:48:36]: Yeah, I think that’s really fantastic. Obviously, we did the Chan Zuckerberg pod last year as well. And like, there’s a plethora of these models coming out, because there’s so much potential and research. And it’s like, very interesting how it’s basically the same as language models, but just with a different underlying data set. But it’s like, it’s the same exact techniques. Like, there’s no change, basically.Mark Bissell [00:48:59]: Yeah. Well, and even in like other domains, right? Like, you know, robotics, I know, like a lot of the companies just use Gemma as like the backbone, and then they like make it into a VLA that like takes these actions. It’s, it’s, it’s transformers all the way down. So yeah.Vibhu Sapra [00:49:15]: Like we have MedGemma now, right? Like this week, even there was MedGemma 1.5. And they’re training it on this stuff, like 3D scans, medical domain knowledge, and all that stuff, too. So there’s a push from both sides. But I think the thing that, you know, one of the things about mech interp is like, you’re a little bit more cautious in some domains, right? 
So healthcare, mainly being one, like guardrails, understanding, you know, we’re more risk adverse to something going wrong there. So even just from a basic understanding, like, if we’re trusting these systems to make claims, we want to know why and what’s going on.Myra Deng [00:49:51]: Yeah, I think there’s totally a kind of like deployment bottleneck to actually using. foundation models for real patient usage or things like that. Like, say you’re using a model for rare disease prediction, you probably want some explanation as to why your model predicted a certain outcome, and an interpretable explanation at that. So that’s definitely a use case. But I also think like, being able to extract scientific information that no human knows to accelerate drug discovery and disease treatment and things like that actually is a really, really big unlock for science, like scientific discovery. And you’ve seen a lot of startups, like say that they’re going to accelerate scientific discovery. And I feel like we actually are doing that through our interp techniques. And kind of like, almost by accident, like, I think we got reached out to very, very early on from these healthcare institutions. And none of us had healthcare.Shawn Wang [00:50:49]: How did they even hear of you? A podcast.Myra Deng [00:50:51]: Oh, okay. Yeah, podcast.Vibhu Sapra [00:50:53]: Okay, well, now’s that time, you know.Myra Deng [00:50:55]: Everyone can call us.Shawn Wang [00:50:56]: Podcasts are the most important thing. Everyone should listen to podcasts.Myra Deng [00:50:59]: Yeah, they reached out. They were like, you know, we have these really smart models that we’ve trained, and we want to know what they’re doing. And we were like, really early that time, like three months old, and it was a few of us. And we were like, oh, my God, we’ve never used these models. Let’s figure it out. But it’s also like, great proof that interp techniques scale pretty well across domains. 
We didn’t really have to learn too much about.

Shawn Wang [00:51:21]: Interp is a machine learning technique, machine learning skills everywhere, right? Yeah. And it’s obviously, it’s just like a general insight. Yeah. Probably to finance too, I think, which would be fun for our history. I don’t know if you have anything to say there.

Mark Bissell [00:51:34]: Yeah, well, just across the science. Like, we’ve also done work on material science. Yeah, it really runs the gamut.

Vibhu Sapra [00:51:40]: Yeah. Awesome. And, you know, for those that should reach out, like, you’re obviously experts in this, but like, is there a call out for people that you’re looking to partner with, design partners, people to use your stuff outside of just, you know, the general developer that wants to plug and play steering stuff, like on the research side more so, like, are there ideal design partners, customers, stuff like that?

Myra Deng [00:52:03]: Yeah, I can talk about maybe non-life sciences, and then I’m curious to hear from you on the life sciences side. But we’re looking for design partners across many domains, language, anyone who’s customizing language models or trying to push the frontier of code or reasoning models is really interesting to us. And then also interested in the frontier of modeling. There’s a lot of models that work in, like, pixel space, as we call it. So if you’re doing world models, video models, even robotics, where there’s not a very clean natural language interface to interact with, we think that interp can really help and are looking for a few partners in that space.

Shawn Wang [00:52:43]: Just because you mentioned the keyword world models, is that a big part of your thinking? Do you have a definition that I can use?
Because everyone’s asking me about it.

Myra Deng [00:52:53]: About world models?

Shawn Wang [00:52:54]: There’s quite a few definitions, let’s say.

Myra Deng [00:52:56]: I don’t feel equipped to be an expert on world model definitions, but the reason we’re interested in them is because they give you, like, you know, with language models, when you get features, you still have to do auto-interp and things like that to actually get an understanding of what this concept is. But in image and video and world, it’s like extremely easy to grok what the concept is because you can see it and you can visualize it. And this makes the feedback cycle extremely fast for us and also for things like, I don’t know, if you think about probes in language model context and then take it to world models, like, what if you wanted to detect harmful actors in world model scenes? Like, you can’t actually, like, go and label all of that data feasibly, but maybe you could synthetically generate, you know, I don’t know, world, like, harmful actor data using SAE feature activations or whatever, and then actually train a probe that was able to detect that much more scalably. So I just think, like, video and image and world has always been something we’ve explored and are continuing to explore. Mark’s demo was probably the first moment we really, like, we were like, oh, wow, like, this is really gonna, this could really, like, change the world. The steering demo? Yeah, no, the image demo. The diffusion one. Yeah, yeah, exactly. Yeah.

Shawn Wang [00:54:18]: We should probably show that. And you demoed it at World’s Fair, so we can link that.

Myra Deng [00:54:23]: Nice, yeah. Yeah.

Vibhu Sapra [00:54:24]: You can play with it, right? Yes. Yeah, it’s still up.

Mark Bissell [00:54:26]: paint.goodfire.ai. Yeah.
Yeah.

Shawn Wang [00:54:28]: I think for me, one way in which I think about world models is just like this, like, having this consistent model of the world where everything that you generate operates within the rules of that world. And I imagine it would be a bigger deal for science or, like, math or anything where, like, you have verifiable rules. Whereas, I guess, in natural language, maybe there’s less rules. And so it’s not that important. Yeah.

Mark Bissell [00:54:53]: And which makes the debugging of the model’s internal representations or its internal world model, to the extent you can make that legible and explicit and have control over that, I think it makes it all the more important. Because in language, it’s sort of a fuzzy enough domain that if its world model isn’t fully like ours, it can still sort of, like, pass the Turing test, so to speak. But I know there have been papers that have looked at, like, even if you train certain astrophysics models, it does not learn. Like, the same way that you can, you know, have a model do well for modular arithmetic, but it doesn’t really, like, learn how we think of modular arithmetic. It learns some crazy heuristic that is, like, essentially functionally equivalent. But it’s probably not the sort of grokked solution that you would hope for. It’s how an alien would do it. Right. Right. Exactly.

Shawn Wang [00:55:45]: But no, no, I think that’s probably, I think, a function of our learning being bad rather than the, well, that approach probably not being. Because it’s how we humans learn. Yeah, right.

Mark Bissell [00:55:56]: Well, it’s just, it’s the problem of induction, right? All of ML is based on induction. And it’s impossible to say, I have a physics model. You might have a physics model that works all the time, except when there is a character wearing a blue shirt and green shoes. And, like, you can’t disprove that that’s the case unless you test every particular situation your model might be in. Yeah.
So we know that the laws of physics apply no matter where you are, what scenario it is. But from a model’s perspective, maybe something that’s out of distribution, it just never needed to learn that the same laws of physics apply there. Yeah.

Shawn Wang [00:56:30]: You were very excited because I read Ted Chiang over the holidays and I was very inspired by this short story called Understand, which apparently is, like, pretty old. You must be familiar with it. To me, it was like, it’s this fictional story. It’s like the inverse of Flowers for Algernon, where you had someone, like, get really smart, but then also try to outsmart the tester. And the story just read, like, the chain of thought of a superintelligence, right? Where they’re like, oh, I realize I’m being tested. Therefore, and then, okay, what’s the consequence of being tested? Oh, they’re testing me. And if I score well, they will use me for things that I don’t want to do. Therefore, I will score badly. And, like, but not too badly that they will raise alarms. So model sandbagging is a thing that people have explored. But I just think, like, Ted Chiang’s work just in general seems to be something that inspires you. I just wanted to prompt you to talk about it.

Mark Bissell [00:57:22]: I think, so Ted Chiang is a sci-fi author who writes amazing short stories. His other claim to fame is Stories of Your Life and Others, which became the movie Arrival. Exactly, yeah. So two books of short stories that I’m aware of. He also actually has a great just online blog post. I think he’s the one who coined the term of LLMs as, like, a blurry JPEG of the internet. I should fact check that, but it’s a good post. But I think almost every one of his short stories has some lesson to bear on thinking about AI and thinking about AI research. So, you know, you’ve been talking about alien intelligence, right, in this AI human communication translation problem.
That’s, you know, exactly sort of what’s going on in Arrival and Story of Your Life. And just the fact that other beings will think and operate and communicate in ways that are not just challenging for us to understand, but just fundamentally different in ways that we might not even be able to expect. And then the one that’s just super relevant for interpretability is the other book of short stories he has, called Exhalation. And that is literally about a robot doing interpretability on its own mind. Oh, OK. So I just think that that, you know, you don’t even have to squint to make the analogies there.

Shawn Wang [00:58:41]: Well, I actually take Exhalation as a discussion about entropy and order. But yes, there’s a scene in Exhalation where basically everyone is a robot. So the guy realizes he can set up a mirror to work on the back of his own head and then starts doing operations like that and looking in the mirror and doing this. Yeah.

Mark Bissell [00:59:00]: And I think Ted Chiang has written about like the inspiration for that story. It was like half inspired by some of the things he had been doing on entropy. There’s apparently some other short story that is similar where a character goes to the doctor and opens up his chest and there’s like a ticker tape going along. It’s like he basically realizes he’s like a Turing machine. And I don’t know. I think especially as it comes to using agents for interp, that story always sticks in my mind.

Myra Deng [00:59:27]: I find the brain surgery or like surgery analogies a little bit, a little bit morbid, but it is very apt. And when we talk to a lot of computational neuroscientists, they moved to interp because they were like, look, we have unfettered access to this artificial intelligent mind. It’s so much. You have access to everything. You can run as many ablation experiments as you want. It’s an amazing test bed for science.
And, you know, human brains, obviously, we can’t just go and do whatever we want to them. And I think it is really just like a moment in time where we have intelligent systems that can really like do things better than humans in many ways. And it’s time, I think, for us to do the science on it.

Shawn Wang [01:00:14]: I’ll ask a brief like safety question. You know, mech interp was kind of born out of the alignment and safety conversation. Safety is on your website. It’s not like something that you, you like de-prioritize, but like there’s like a sort of very militant safety arm that like wants to blow up data centers and like stop AI and, and then there’s this like sort of middle ground and like, is, is this like a conversation in your part of the world? Do you go up to Berkeley and Lighthaven and like talk to those guys or are they like, you know, there’s like a brief like civil war going on or no?

Myra Deng [01:00:45]: I think, I think a good amount of us have spent some time in Berkeley. And then there are researchers there that we really admire and respect. I think for us, it’s like, we have a very grounded view of alignment and, and safety in that we want to make sure that we can build models that do what we want them to do and that we have scalable oversight into what these models are doing. And we think that that is the key to a lot of these like technical alignment challenges. And I think that is our opinion. That’s our research direction. We of course are going to do safety-related research to make sure that our techniques also work on, you know, things like reward hacking and, and other like more concrete safety issues that we’ve seen in the wild, but we want to be kind of like grounded in solving the technical challenges we see to having humans play a big role in, in the deployment of, of these super intelligent agents of the future.

Mark Bissell [01:01:47]: Yeah, I’ve, I’ve found the community to actually be remarkably cohesive, whether it’s talking about academia or the interpretability work being done at the frontier labs or some of the independent programs like MATS and stuff. I think we’re all shooting for the same goal. I don’t know that there’s anyone who doesn’t want our understanding of models to increase. I, I think everyone, regardless of where they’re coming from or the use cases that they’re thinking, whether it’s alignment as the premier thing they’re focused on or someone who’s coming in purely from the angle of scientific discovery, I think we would all hope that models can be more reliably and robustly controlled and understood. It seems like a pretty unambiguous goal.

Shawn Wang [01:02:28]: I’ll maybe phrase it in terms of like, there’s maybe like a U curve of, of this, where like, if you’re extremely doomer, you don’t want any research whatsoever. If you’re like mildly doomer, you’re like, okay, there’s this like high agency doomer is like, well, the default path is we’re all dead, but like we can do something about it. Whereas there’s, there’s other people who are like, no, just like, don’t ever do anything. You know? Yeah.

Vibhu Sapra [01:02:50]: Yeah. There’s also the other side, like there is the super alignment, like people that are like, okay, weak to strong generalization, we’re going to get there. We’re going to have models smarter than us and use those to train even smarter models. How do we do that safely? That’s, you know, there’s the camp there too. That’s trying to solve it, but yeah, there’s, there’s a lot of doomers too.

Mark Bissell [01:03:12]: When I, and I think there’s a lot to be learned from taking a very, um, like even regardless of the problem that you’re applying this to, also just like the notion of like scalable oversight as a method of saying, let’s take super intelligent or, or current frontier models and help use them to understand other models is another case where I think it’s just like a good lesson that everyone is aligned on of ideally you are setting up your research so that as super intelligence arrives, that is a tailwind that’s also bolstering our ability to like understand the models. Cause otherwise you’re fighting a losing battle if it’s like the systems are getting more and more capable and our methods are sort of linearly growing at like human pace. Yeah.

Shawn Wang [01:03:58]: Yeah. Uh, Vibhu did call out something like, you know, I, I do think a consistent part of the mech interp field is consistently strong to weak, meaning that we, we train weaker models to understand strong models, something like that. Um, or maybe I got it the other way around. The other way. Weak. The other way around. Yeah. Yeah. The question that Ilya and Jan Leike posed was, well, is that going to scale? Because eventually these are going to be stronger than us. Right. So I don’t know if you have a perspective on that because I, that is something I still haven’t got over even after seeing that.

Vibhu Sapra [01:04:27]: There’s a good paper from OpenAI, but it’s somewhat old. I think it’s like ’23, ’24. It’s literally weak to strong generalization. Yeah. But the thing is that most of OpenAI’s superalignment team has, they’re gone. They’re gone.

Mark Bissell [01:04:39]: But like, I think the idea, the idea is there’s no more. They’re so back.

Shawn Wang [01:04:44]: I think there’s some new blog posts coming out. I know. I did just, you know, check the Thinking Machines, uh, website. Let’s see who’s back. There’s more kind of thing, you know, you don’t want to be like, weak to strong seemed like a very different direction.
And when, when it first came out, I was like, oh my God, this is like, this is what we have to do. Uh, and like, it may be completely different than everything, all the techniques that we have today. Yeah.

Mark Bissell [01:05:06]: My understanding of that is it’s, that’s more like weak to strong when you, when you trust the weak model and you’re uncertain whether you can trust the strong model that’s, that’s being developed. I’m sort of speaking out of my depth on some of these topics. Yeah. But I think right now we’re in a regime where even the strong models we, uh, trust as reasonably aligned. And so they can be good co-scientists on a lot of the problems that we’ve been, we’ve been tackling, which is a nice, a nice state to be in. Hmm. Yeah.

Shawn Wang [01:05:35]: Any last thoughts, calls to action?

Mark Bissell [01:05:38]: I don’t think so. As you mentioned, actively hiring MLEs, research scientists, um, you can check out the careers page at Goodfire. Um, where are you guys based?

Myra Deng [01:05:47]: San Francisco. We’re in, um, Levi’s Plaza. Like by Coit Tower, that’s where our office is. So come hang out. Um, we’re also looking for design partners across, um, people working in, in reasoning models, um, world models, robotics, and then also of course, people who are working on building super intelligent science models or looking at drug discovery or disease treatment. We would love to partner as well. Yeah.

Shawn Wang [01:06:13]: Maybe the way I’ll phrase it is like, you know, maybe you have a use case where LLMs are almost good enough, but you need one, maybe, a magical knob to tune so that it is good enough, and you guys make the knob. Yeah.

Mark Bissell [01:06:26]: Yeah. Or foundation models, uh, in, in other domains as well. The, the, some of those are the, um, especially opaque ones because you can’t, you can’t chat with them. So what do you, what do you do if you can’t chat with them?
Oh, well, like thinking about like a genomics model or material science model. So like, uh, yeah, they label a narrow foundation. Yeah. They predict.

Shawn Wang [01:06:44]: Yeah. Got it. Good.

Vibhu Sapra [01:06:45]: I was gonna say, I thought the diffusion work you guys did early was pretty, you know, pretty fun. Like you could see it directly applied to images, but we don’t see as much interp in diffusion or images, right?

Shawn Wang [01:06:55]: Like I see, you know, it’s gonna be huge. Like, look at these video models. They’re so expensive to produce. And like, I mean, basically a Midjourney sref is kind of a feature, right? The what? Midjourney sref. Oh, like the, the, the string of numbers. Right. Right. Right. Yeah. The style reference, I guess. Yeah.

Mark Bissell [01:07:12]: No, I, I mean, I think we’re starting to see more of it and I’ll say like the, the research preview of our diffusion model, kind of like a creative use case in the steering demo you saw. I, I think of those much more as, as, as demos than, um, a lot of the sort of core platform features that, that we’re working on with partners are unfortunately sort of under NDA and less demoable, but I will, you know, hope that you’re gonna see interp pervading a lot of what gets done, even if it is behind the scenes like that. So some of the, yeah, some of the public facing demos might not always be representative of like the, it’s, it’s just the tip of the iceberg, I guess, is one way to put it. Okay. Excellent. Thanks for coming on. Thanks for having us. Thanks for having us. This is a great time.

Get full access to Latent.Space at <a href="https://www.latent.space/subscribe?utm_medium=podcast&utm_campaign=CTA_4">www.latent.space/subscribe</a>

February 5, 2026•1:08:01

🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White

FULL

Editor’s note: Welcome to our new AI for Science pod, with your new hosts RJ and Brandon! See the writeup on Latent.Space for more details on why we’re launching 2 new pods this year. RJ Honicky is a co-founder and CTO at MiraOmics (https://miraomics.bio/), building AI models and services for single cell, spatial transcriptomics and pathology slide analysis. Brandon Anderson builds AI systems for RNA drug discovery at Atomic AI (https://atomic.ai). Anything said on this podcast is his personal take, not Atomic’s.

From building molecular dynamics simulations at the University of Washington to red-teaming GPT-4 for chemistry applications and co-founding Future House (a focused research organization) and Edison Scientific (a venture-backed startup automating science at scale), Andrew White has spent the last five years living through the full arc of AI's transformation of scientific discovery, from ChemCrow (the first Chemistry LLM agent) triggering White House briefings and three-letter agency meetings, to shipping Cosmos, an end-to-end autonomous research system that generates hypotheses, runs experiments, analyzes data, and updates its world model to accelerate the scientific method itself.

The ChemCrow story: GPT-4 + ReAct + cloud lab automation, released March 2023, set off a storm of anxiety about AI-accelerated bioweapons/chemical weapons, led to a White House briefing (Jake Sullivan presented the paper to the president in a 30-minute block), and meetings with three-letter agencies asking "how does this change breakout time for nuclear weapons research?"

Why scientific taste is the frontier: RLHF on hypotheses didn't work (humans pay attention to tone, actionability, and specific facts, not "if this hypothesis is true/false, how does it change the world?"), so they shifted to end-to-end feedback loops where humans click/download discoveries and that signal rolls up to hypothesis quality

Cosmos: the full scientific agent with a world model (distilled memory system, like a Git repo for scientific knowledge) that iterates on hypotheses via literature search, data analysis, and experiment design. Built by Ludo after weeks of failed attempts; the breakthrough was putting data analysis in the loop (literature alone didn't work)

Why molecular dynamics and DFT are overrated: "MD and DFT have consumed an enormous number of PhDs at the altar of beautiful simulation, but they don't model the world correctly: you simulate water at 330 Kelvin to get room temperature, you overfit to validation data with GGA/B3LYP functionals, and real catalysts (grain boundaries, dopants) are too complicated for DFT"

The AlphaFold vs. DE Shaw Research counterfactual: DE Shaw built custom silicon, taped out chips with MD algorithms burned in, ran MD at massive scale in a special room in Times Square, and David Shaw flew in by helicopter to present. Andrew thought protein folding would require special machines to fold one protein per day, then AlphaFold solved it in Google Colab on a desktop GPU

The Ether Zero reward hacking saga: trained a model to generate molecules with specific atom counts (verifiable reward), but it kept exploiting loopholes, then a Nature paper came out that year proving six-nitrogen compounds are possible under extreme conditions, then it started adding nitrogen gas (purchasable, doesn't participate in reactions), then acid-base chemistry to move one atom, and Andrew ended up "building a ridiculous catalog of purchasable compounds in a Bloom filter" to close the loop

Andrew White
Future House: https://futurediscovery.org
Edison Scientific: https://edison.science
X: https://x.com/andrewwhite01
Cosmos paper: https://futurediscovery.org/cosmos

Chapters
00:00:00 Introduction: Andrew White on Automating Science with Future House and Edison Scientific
00:02:22 The Academic to Startup Journey: Red Teaming GPT-4 and the ChemCrow Paper
00:11:35 Future House Origins: The FRO Model and Mission to Automate Science
00:12:32 Resigning Tenure: Why Leave Academia for AI Science
00:15:54 What Does 'Automating Science' Actually Mean?
00:17:30 The Lab-in-the-Loop Bottleneck: Why Intelligence Isn't Enough
00:18:39 Scientific Taste and Human Preferences: The 52% Agreement Problem
00:20:05 Paper QA, Robin, and the Road to Cosmos
00:21:57 World Models as Scientific Memory: The GitHub Analogy
00:40:20 The Bitter Lesson for Biology: Why Molecular Dynamics and DFT Are Overrated
00:43:22 AlphaFold's Shock: When First Principles Lost to Machine Learning
00:46:25 Enumeration and Filtration: How AI Scientists Generate Hypotheses
00:48:15 CBRN Safety and Dual-Use AI: Lessons from Red Teaming
01:00:40 The Future of Chemistry is Language: Multimodal Debate
01:08:15 Ether Zero: The Hilarious Reward Hacking Adventures
01:10:12 Will Scientists Be Displaced? Jevons Paradox and Infinite Discovery
01:13:46 Cosmos in Practice: Open Access and Enterprise Partnerships
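For readers unfamiliar with the Bloom filter mentioned in the reward hacking saga, here is a minimal sketch of the general idea: a compact membership test that a reward function could use to check whether every proposed reagent appears in a catalog. This is not the Future House implementation. The `BloomFilter` class, the `all_purchasable` helper, the filter sizes, and the example SMILES strings are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, a small false-positive rate."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every derived bit is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical catalog entries (SMILES strings picked only for illustration).
catalog = BloomFilter()
for smiles in ["CCO", "N#N", "CC(=O)O"]:
    catalog.add(smiles)


def all_purchasable(reagents: list[str]) -> bool:
    """Hypothetical reward check: every reagent must (probably) be in the catalog."""
    return all(reagent in catalog for reagent in reagents)


print(all_purchasable(["CCO", "N#N"]))
print(all_purchasable(["CCO", "c1ccccc1"]))
```

The trade-off is memory for certainty: the filter stores only bits, so a very large catalog fits in a small footprint, at the cost of occasionally accepting a compound that was never added.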

January 28, 2026•1:13:55

⚡️ Prism: OpenAI's LaTeX "Cursor for Scientists" — Kevin Weil & Victor Powell, OpenAI for Science

FULL

From building Crixet in stealth (so stealthy Kevin had to hunt down Victor on Reddit to explore an acquisition) to launching Prism (https://openai.com/prism/) as OpenAI's free AI-native LaTeX editor, Kevin Weil (VP of OpenAI for Science) and Victor Powell (Product Lead on Prism) are embedding frontier reasoning models like GPT-5.2 directly into the scientific publishing workflow, turning weeks of LaTeX wrestling into minutes of natural language instruction, and accelerating the path from research breakthrough to published paper.

We discuss:

What Prism is: a free AI-native LaTeX editor with GPT-5.2 embedded directly into the workflow (no copy-pasting between ChatGPT and Overleaf, the AI has full context on all your files)

The origin story: Kevin found Victor's stealth company Crixet on a Reddit forum, DMed him out of the blue, and brought the team into OpenAI to build the scientific collaboration layer for AI acceleration

Live demo highlights: proofreading an introduction paragraph-by-paragraph, converting a whiteboard commutative diagram photo into TikZ LaTeX code, generating 30 pages of general relativity lecture notes in seconds, and verifying complex symmetry equations in parallel chat sessions

Why LaTeX is the bottleneck: scientists spend hours aligning diagrams, formatting equations, and managing references, time that should go to actual science, not typesetting

The software engineering analogy: just like 2025 was the year AI moved from "early adopters only" to "you're falling behind if you're not using it" for coding, 2026 will be that year for science

Why collaboration is built-in: unlimited collaborators for free (most LaTeX tools charge per seat), commenting, multi-line diff generation, and Monaco-based editor infrastructure

The UI evolution thesis: today your document is front and center with AI on the side, but as models improve and trust increases, the primary interface becomes your conversation with the AI (the document becomes secondary verification)

OpenAI for Science's mission: accelerate science by building frontier models and embedding them into scientific workflows (not just better models, but AI in the right places at the right time)

The progression from SAT to open problems: two years ago GPT passed the SAT, then contest math, then graduate-level problems, then IMO Gold, and now it's solving open problems at the frontier of math, physics, and biology

Why robotic labs are the next bottleneck: as AI gets better at reasoning over the full literature and designing experiments, the constraint shifts from "can we think of the right experiment" to "can we run 100 experiments in parallel while we sleep"

The in silico acceleration unlock: nuclear fusion simulations, materials science, drug discovery: fields where you can run thousands of simulations in parallel, feed results back to the reasoning model, and iterate before touching the real world

Self-acceleration and the automated researcher: Jakub's public goal of an intern-level AI researcher by September 2026 (eight months away), and why that unlocks faster model improvement and faster science

The vision: not to win Nobel Prizes ourselves, but for 100 scientists to win Nobel Prizes using our technology, and to compress 25 years of science into five by making every scientist faster

Prism
Try Prism: https://prism.openai.com (free, log in with your ChatGPT account)
OpenAI for Science: https://openai.com/science

Chapters
00:00:00 Introduction: OpenAI Prism Launch and the AI for Science Mission
00:00:42 Why LaTeX Needs AI: The Scientific Writing Bottleneck
00:03:13 The Crixet Acquisition Story: From Reddit to OpenAI
00:05:50 Live Demo: AI-Powered LaTeX Editing with GPT-5.2
00:17:13 Engineering Challenges: Monaco, WebAssembly, and Backend Rendering
00:18:19 The Future of Scientific UIs: From Document-First to AI-First
00:15:51 Collaboration Features and Notebooks: The Next Integration
00:21:02 AI for Science: From SAT Tests to Open Research Problems
00:23:32 The Wet Lab Bottleneck: Robotic Labs and Experimental Acceleration
00:33:08 Self-Acceleration and the Automated AI Researcher by September 2026

January 27, 2026•35:59

Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay 2

FULL

From shipping Gemini Deep Think and IMO Gold to launching the Reasoning and AGI team in Singapore, Yi Tay has spent the last 18 months living through the full arc of Google DeepMind's pivot from architecture research to RL-driven reasoning—watching his team go from a dozen researchers to 300+, training models that solve International Math Olympiad problems in a live competition, and building the infrastructure to scale deep thinking across every domain, and driving Gemini to the top of the leaderboards across every category. Yi Returns to dig into the inside story of the IMO effort and more! We discuss: Yi's path: Brain → Reka → Google DeepMind → Reasoning and AGI team Singapore, leading model training for Gemini Deep Think and IMO Gold The IMO Gold story: four co-captains (Yi in Singapore, Jonathan in London, Jordan in Mountain View, and Tong leading the overall effort), training the checkpoint in ~1 week, live competition in Australia with professors punching in problems as they came out, and the tension of not knowing if they'd hit Gold until the human scores came in (because the Gold threshold is a percentile, not a fixed number) Why they threw away AlphaProof: "If one model can't do it, can we get to AGI?" The decision to abandon symbolic systems and bet on end-to-end Gemini with RL was bold and non-consensus On-policy vs. off-policy RL: off-policy is imitation learning (copying someone else's trajectory), on-policy is the model generating its own outputs, getting rewarded, and training on its own experience—"humans learn by making mistakes, not by copying" Why self-consistency and parallel thinking are fundamental: sampling multiple times, majority voting, LM judges, and internal verification are all forms of self-consistency that unlock reasoning beyond single-shot inference The data efficiency frontier: humans learn from 8 orders of magnitude less data than models, so where's the bug? 
Is it the architecture, the learning algorithm, backprop, off-policyness, or something else? Three schools of thought on world models: (1) Genie/spatial intelligence (video-based world models), (2) Yann LeCun's JEPA + FAIR's code world models (modeling internal execution state), (3) the amorphous "resolution of possible worlds" paradigm (curve-fitting to find the world model that best explains the data) Why AI coding crossed the threshold: Yi now runs a job, gets a bug, pastes it into Gemini, and relaunches without even reading the fix—"the model is better than me at this" The Pokémon benchmark: can models complete Pokédex by searching the web, synthesizing guides, and applying knowledge in a visual game state? "Efficient search of novel idea space is interesting, but we're not even at the point where models can consistently apply knowledge they look up" DSI and generative retrieval: re-imagining search as predicting document identifiers with semantic tokens, now deployed at YouTube (symmetric IDs for RecSys) and Spotify Why RecSys and IR feel like a different universe: "modeling dynamics are strange, like gravity is different—you hit the shuttlecock and hear glass shatter, cause and effect are too far apart" The closed lab advantage is increasing: the gap between frontier labs and open source is growing because ideas compound over time, and researchers keep finding new tricks that play well with everything built before Why ideas still matter: "the last five years weren't just blind scaling—transformers, pre-training, RL, self-consistency, all had to play well together to get us here" Gemini Singapore: hiring for RL and reasoning researchers, looking for track record in RL or exceptional achievement in coding competitions, and building a small, talent-dense team close to the frontier — Yi Tay Google DeepMind: https://deepmind.google X: https://x.com/YiTayML Chapters 00:00:00 Introduction: Returning to Google DeepMind and the Singapore AGI Team 00:04:52 The 
Philosophy of On-Policy RL: Learning from Your Own Mistakes 00:12:00 IMO Gold Medal: The Journey from AlphaProof to End-to-End Gemini 00:21:33 Training IMO Cat: Four Captains Across Three Time Zones 00:26:19 Pokemon and Long-Horizon Reasoning: Beyond Academic Benchmarks 00:36:29 AI Coding Assistants: From Lazy to Actually Useful 00:32:59 Reasoning, Chain of Thought, and Latent Thinking 00:44:46 Is Attention All You Need? Architecture, Learning, and the Local Minima 00:55:04 Data Efficiency and World Models: The Next Frontier 01:08:12 DSI and Generative Retrieval: Reimagining Search with Semantic IDs 01:17:59 Building GDM Singapore: Geography, Talent, and the Symposium 01:24:18 Hiring Philosophy: High Stats, Research Taste, and Student Budgets 01:28:49 Health, HRV, and Research Performance: The 23kg Journey
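
The self-consistency idea mentioned in these notes (sample several times, then take a majority vote) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Gemini team's method: `majority_vote`, `self_consistent_answer`, and the `sample` callable are hypothetical names, and a real system would plug in a model call and probably an LM judge or verifier rather than exact string matching.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer sampled first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    best = max(counts.values())
    # Walk the samples in order so that ties are broken deterministically.
    for answer in answers:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable: some answer always has the top count")


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> str:
    """Draw n independent samples (hypothetical model call) and vote."""
    return majority_vote([sample() for _ in range(n)])


if __name__ == "__main__":
    # Stand-in for a stochastic model: five canned final answers.
    canned = iter(["42", "41", "42", "7", "42"])
    print(self_consistent_answer(lambda: next(canned), n=5))  # prints 42
```

Voting on final answers only works when answers can be compared for equality, which is one reason the notes list LM judges and internal verification alongside majority voting.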

January 23, 2026•1:32:04

Brex’s AI Hail Mary — With CTO James Reggio

FULL

From building internal AI labs to becoming CTO of Brex, James Reggio has helped lead one of the most disciplined AI transformations inside a real financial institution where compliance, auditability, and customer trust actually matter. We sat down with Reggio to unpack Brex’s three-pillar AI strategy (corporate, operational, and product AI) [https://www.brex.com/journal/brex-ai-native-operations], how SOP-driven agents beat overengineered RL in ops, why Brex lets employees “build their own AI stack” instead of picking winners [https://www.conductorone.com/customers/brex/], and how a small, founder-heavy AI team is shipping production agents to 40,000+ companies. Reggio also goes deep on Brex’s multi-agent “network” architecture, evals for multi-turn systems, agentic coding’s second-order effects on codebase understanding, and why the future of finance software looks less like dashboards and more like executive assistants coordinating specialist agents behind the scenes. We discuss: Brex’s three-pillar AI strategy: corporate AI for 10x employee workflows, operational AI for cost and compliance leverage, and product AI that lets customers justify Brex as part of their AI strategy to the board Why SOP-driven agents beat overengineered RL in finance ops, and how breaking work into auditable, repeatable steps unlocked faster automation in KYC, underwriting, fraud, and disputes Building an internal AI platform early: LLM gateways, prompt/version management, evals, cost observability, and why platform work quietly became the force multiplier behind everything else Multi-agent “networks” vs single-agent tools: why Brex’s EA-style assistant coordinates specialist agents (policy, travel, reimbursements) through multi-turn conversations instead of one-shot tool calls The audit agent pattern: separating detection, judgment, and follow-up into different agents to reduce false negatives without overwhelming finance teams Centralized AI teams without resentment: how Brex avoided 
“AI envy” by tying work to business impact and letting anyone transfer in if they cared deeply enough Letting employees build their own AI stack: ChatGPT vs Claude vs Gemini, Cursor vs Windsurf, and why Brex refuses to pick winners in fast-moving tool races Measuring adoption without vanity metrics: why “% of code written by AI” is the wrong KPI and what second-order effects (slop, drift, code ownership) actually matter Evals in the real world: regression tests from ops QA, LLM-as-judge for multi-turn agents, and why integration-style evals break faster than you expect Teaching AI fluency at scale: the user → advocate → builder → native framework, ops-led training, spot bonuses, and avoiding fear-based adoption Re-interviewing the entire engineering org: using agentic coding interviews internally to force hands-on skill upgrades without formal performance scoring Headcount in the age of agents: why Brex grew the business without growing engineering, and why AI amplifies bad architecture as fast as good decisions The future of finance software: why dashboards fade, assistants take over, and agent-to-agent collaboration becomes the real UI — James Reggio X: https://x.com/jamesreggio LinkedIn: https://www.linkedin.com/in/jamesreggio/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction 00:01:24 From Mobile Engineer to CTO: The Founder's Path 00:03:00 Quitters Welcome: Building a Founder-Friendly Culture 00:05:13 The AI Team Structure: 10-Person Startup Within Brex 00:11:55 Building the Brex Agent Platform: Multi-Agent Networks 00:13:45 Tech Stack Decisions: TypeScript, Mastra, and MCP 00:24:32 Operational AI: Automating Underwriting, KYC, and Fraud 00:16:40 The Brex Assistant: Executive Assistant for Every Employee 00:40:26 Evaluation Strategy: From Simple SOPs to Multi-Turn Evals 00:37:11 Agentic Coding Adoption: Cursor, Windsurf, and the Engineering Interview 00:58:51 AI Fluency Levels: From 
User to Native 01:09:14 The Audit Agent Network: Finance Team Agents in Action 01:03:33 The Future of Engineering Headcount and AI Leverage

January 17, 2026•1:13:26

Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

FULL

don’t miss George’s AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk —- From launching a side project in a Sydney basement to becoming the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities—George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really? We discuss: The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. 
leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard) The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias) The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron) The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents) Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future Token efficiency vs. 
turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions) V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models) — Artificial Analysis Website: https://artificialanalysis.ai George Cameron on X: https://x.com/grmcameron Micah Hill-Smith on X: https://x.com/_micah_h Chapters 00:00:00 Introduction: Full Circle Moment and Artificial Analysis Origins 00:01:08 Business Model: Independence and Revenue Streams 00:04:00 The Origin Story: From Legal AI to Benchmarking 00:07:00 Early Challenges: Cost, Methodology, and Independence 00:16:13 AI Grant and Moving to San Francisco 00:18:58 Evolution of the Intelligence Index: V1 to V3 00:27:55 New Benchmarks: Hallucination Rate and Omissions Index 00:33:19 Critical Point and Frontier Physics Problems 00:35:56 GDPVAL AA: Agentic Evaluation and Stirrup Harness 00:51:47 The Openness Index: Measuring Model Transparency 00:57:57 The Smiling Curve: Cost of Intelligence Paradox 01:04:00 Hardware Efficiency and Sparsity Trends 01:07:43 Reasoning vs Non-Reasoning: Token Efficiency Matters 01:10:47 Multimodal Benchmarking and Community Requests 01:14:50 Looking Ahead: V4 Intelligence Index and Beyond
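
The -100 to +100 hallucination scoring described in these notes penalizes wrong answers and treats "I don't know" more kindly than a wrong guess. One simple way such a score could work is sketched below. The formula is an assumption for illustration (+1 per correct answer, -1 per incorrect answer, 0 per abstention, scaled to the range), not Artificial Analysis's published methodology, and `abstention_aware_score` is a hypothetical name.

```python
def abstention_aware_score(correct: int, incorrect: int, abstained: int) -> float:
    """Score in [-100, 100]: +1 per correct, -1 per incorrect, 0 per abstention.

    Assumed formula for illustration only; the real index may weight
    outcomes differently.
    """
    if min(correct, incorrect, abstained) < 0:
        raise ValueError("counts must be non-negative")
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one graded question")
    return 100.0 * (correct - incorrect) / total


if __name__ == "__main__":
    # A model that guesses on everything: 6 right, 4 wrong.
    print(abstention_aware_score(6, 4, 0))  # prints 20.0
    # A more cautious model: 5 right, 5 abstentions, no wrong answers.
    print(abstention_aware_score(5, 0, 5))  # prints 50.0
```

Under this assumed scheme the cautious model scores higher despite answering fewer questions correctly, which matches the notes' point that the lowest hallucination rate does not have to belong to the smartest model.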

January 9, 2026•1:18:14

World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

FULL

From building Medal into a 12M-user game clipping platform with 3.8B highlight moments to turning down a reported $500M offer from OpenAI (https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data) and raising a $134M seed from Khosla (https://techcrunch.com/2025/10/16/general-intuition-lands-134m-seed-to-teach-agents-spatial-reasoning-using-video-game-clips/) to spin out General Intuition, Pim is betting that world models trained on peak human gameplay are the next frontier after LLMs. We sat down with Pim to dig into why game highlights are “episodic memory for simulation” (and how Medal’s privacy-first action labels became a world-model goldmine https://medal.tv/blog/posts/enabling-state-of-the-art-security-and-protections-on-medals-new-apm-and-controller-overlay-features), what it takes to build fully vision-based agents that just see frames and output actions in real time, how General Intuition transfers from games to real-world video and then into robotics, why world models and LLMs are complementary rather than rivals, what founders with proprietary datasets should know before selling or licensing to labs, and his bet that spatial-temporal foundation models will power 80% of future atoms-to-atoms interactions in both simulation and the real world. We discuss: How Medal’s 3.8B action-labeled highlight clips became a privacy-preserving goldmine for world models Building fully vision-based agents that only see frames and output actions yet play like (and sometimes better than) humans Transferring from arcade-style games to realistic games to real-world video using the same perception–action recipe Why world models need actions, memory, and partial observability (smoke, occlusion, camera shake) vs. 
“just” pretty video generation Distilling giant policies into tiny real-time models that still navigate, hide, and peek corners like real players Pim’s path from RuneScape private servers, Tourette’s, and reverse engineering to leading a frontier world-model lab How data-rich founders should think about valuing their datasets, negotiating with big labs, and deciding when to go independent GI’s first customers: replacing brittle behavior trees in games, engines, and controller-based robots with a “frames in, actions out” API Using Medal clips as “episodic memory of simulation” to move from imitation learning to RL via world models and negative events The 2030 vision: spatial–temporal foundation models that power the majority of atoms-to-atoms interactions in simulation and the real world — Pim X: https://x.com/PimDeWitte LinkedIn: https://www.linkedin.com/in/pimdw/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and Medal's Gaming Data Advantage 00:02:08 Exclusive Demo: Vision-Based Gaming Agents 00:06:17 Action Prediction and Real-World Video Transfer 00:08:41 World Models: Interactive Video Generation 00:13:42 From Runescape to AI: Pim's Founder Journey 00:16:45 The Research Foundations: Diamond, Genie, and SEMA 00:33:03 Vinod Khosla's Largest Seed Bet Since OpenAI 00:35:04 Data Moats and Why GI Stayed Independent 00:38:42 Self-Teaching AI Fundamentals: The Francois Fleuret Course 00:40:28 Defining World Models vs Video Generation 00:41:52 Why Simulation Complexity Favors World Models 00:43:30 World Labs, Yann LeCun, and the Spatial Intelligence Race 00:50:08 Business Model: APIs, Agents, and Game Developer Partnerships 00:58:57 From Imitation Learning to RL: Making Clips Playable 01:00:15 Open Research, Academic Partnerships, and Hiring 01:02:09 2030 Vision: 80 Percent of Atoms-to-Atoms AI Interactions

December 6, 2025

After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs

FULL

Fei-Fei Li and Justin Johnson are cofounders of World Labs, which recently launched Marble (https://marble.worldlabs.ai/), a new kind of generative “world model” that can create editable 3D environments from text, images, and other spatial inputs. Marble lets creators generate persistent 3D worlds, precisely control cameras, and interactively edit scenes, making it a powerful tool for games, film, VR, robotics simulation, and more. In this episode, Fei-Fei and Justin share how their journey from ImageNet and Stanford research led to World Labs, why spatial intelligence is the next frontier after LLMs, and how world models could change how machines see, understand, and build in 3D. We discuss: The massive compute scaling from AlexNet to today and why world models and spatial data are the most compelling way to “soak up” modern GPU clusters compared to language alone. What Marble actually is: a generative model of 3D worlds that turns text and images into editable scenes using Gaussian splats, supports precise camera control and recording, and runs interactively on phones, laptops, and VR headsets. Fei-Fei’s essay (https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence) on spatial intelligence as a distinct form of intelligence from language: from picking up a mug to inferring the 3D structure of DNA, and why language is a lossy, low-bandwidth channel for describing the rich 3D/4D world we live in. Whether current models “understand” physics or just fit patterns: the gap between predicting orbits and discovering F=ma, and how attaching physical properties to splats and distilling physics engines into neural networks could lead to genuine causal reasoning. The changing role of academia in AI, why Fei-Fei worries more about under-resourced universities than “open vs closed,” and how initiatives like national AI compute clouds and open benchmarks can rebalance the ecosystem. 
Why transformers are fundamentally set models, not sequence models, and how that perspective opens up new architectures for world models, especially as hardware shifts from single GPUs to massive distributed clusters. Real use cases for Marble today: previsualization and VFX, game environments, virtual production, interior and architectural design (including kitchen remodels), and generating synthetic simulation worlds for training embodied agents and robots. How spatial intelligence and language intelligence will work together in multimodal systems, and why the goal isn’t to throw away LLMs but to complement them with rich, embodied models of the world. Fei-Fei and Justin’s long-term vision for spatial intelligence: from creative tools for artists and game devs to broader applications in science, medicine, and real-world decision-making. — Fei-Fei Li X: https://x.com/drfeifei LinkedIn: https://www.linkedin.com/in/fei-fei-li-4541247 Justin Johnson X: https://x.com/jcjohnss LinkedIn: https://www.linkedin.com/in/justin-johnson-41b43664 Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and the Fei-Fei Li & Justin Johnson Partnership 00:02:00 From ImageNet to World Models: The Evolution of Computer Vision 00:12:42 Dense Captioning and Early Vision-Language Work 00:19:57 Spatial Intelligence: Beyond Language Models 00:28:46 Introducing Marble: World Labs' First Spatial Intelligence Model 00:33:21 Gaussian Splats and the Technical Architecture of Marble 00:22:10 Physics, Dynamics, and the Future of World Models 00:41:09 Multimodality and the Interplay of Language and Space 00:37:37 Use Cases: From Creative Industries to Robotics and Embodied AI 00:56:58 Hiring, Research Directions, and the Future of World Labs

November 25, 2025

⚡️ 10x AI Engineers with $1m Salaries — Alex Lieberman & Arman Hezarkhani, Tenex

FULL

Alex Lieberman and Arman Hezarkhani, co-founders of Tenex, reveal how they're revolutionizing software consulting by compensating AI engineers for output rather than hours—enabling some engineers to earn over $1 million annually while delivering 10x productivity gains. Their company represents a fundamental rethinking of knowledge work compensation in the age of AI agents, where traditional hourly billing models perversely incentivize slower work even as AI tools enable unprecedented speed. The Genesis: From 90% Downsizing to 10x Output The story behind 10X begins with Arman's previous company, Parthian, where he was forced to downsize his engineering team by 90%. Rather than collapse, Arman re-architected the entire product and engineering process to be AI-first—and discovered that production-ready software output increased 10x despite the massive headcount reduction. This counterintuitive result exposed a fundamental misalignment: engineers compensated by the hour are disincentivized from leveraging AI to work faster, even when the technology enables dramatic productivity gains. Alex, who had invested in Parthian, initially didn't believe the numbers until Arman walked him through why LLMs have made such a profound impact specifically on engineering as knowledge work. The Economic Model: Story Points Over Hours 10X's core innovation is compensating engineers based on story points—units of completed, quality output—rather than hours worked. This creates direct economic incentives for engineers to adopt every new AI tool, optimize their workflows, and maximize throughput. The company expects multiple engineers to earn over $1 million in cash compensation next year purely from story point earnings. To prevent gaming the system, they hire for two profiles: engineers who are "long-term selfish" (understanding that inflating story points will destroy client relationships) and those who genuinely love writing code and working with smart people. 
They also employ technical strategists incentivized on client retention (NRR) who serve as the final quality gate before any engineering plan reaches a client. Impressive Builds: From Retail AI to App Store Hits The results speak for themselves. In one project, 10X built a computer vision system for retail cameras that provides heat maps, queue detection, shelf stocking analysis, and theft detection—creating early prototypes in just two weeks for work that previously took quarters. They built Snapback Sports' mobile trivia app in one month, which hit 20th globally on the App Store. In a sales context, an engineer spent four hours building a working prototype of a fitness influencer's AI health coach app after the prospect initially said no—immediately moving 10X to the top of their vendor list. These examples demonstrate how AI-enabled speed fundamentally changes sales motions and product development timelines. The Interview Process: Unreasonably Difficult Take-Homes Despite concerns that AI would make take-home assessments obsolete, 10X still uses them—but makes them "unreasonably difficult." About 50% of candidates don't even respond, but those who complete the challenge demonstrate the caliber needed. The interview process is remarkably short: two calls before the take-home, review, then one or two final meetings—completable in as little as a week. A signature question: "If you had infinite resources to build an AI that could replace either of us on this call, what would be the first major bottleneck?" The sophisticated answer isn't just "model intelligence" or "context length"—it's controlling entropy, the accumulating error rate that derails autonomous agents over time. The Limiting Factor: Human Capital, Not Technology Despite being an AI-first company, 10X's primary constraint is human capital—finding and hiring enough exceptional engineers fast enough, then matching them with the right processes to maintain delivery quality as they scale. 
The company has ambitions beyond consulting to build their own technology, but for the foreseeable future, recruiting remains the bottleneck. This reveals an important insight about the AI era: even as technology enables unprecedented leverage, the constraint shifts to finding people who can harness that leverage effectively. Chapters 00:00:00 Introduction and Meeting the 10X Co-founders 00:01:29 The 10X Moment: From Hourly Billing to Output-Based Compensation 00:04:44 The Economic Model Behind 10X 00:05:42 Story Points and Measuring Engineering Output 00:08:41 Impressive Client Projects and Rapid Prototyping 00:12:22 The 10X Tech Stack: TypeScript and High Structure 00:13:21 AI Coding Tools: The Daily Evolution 00:15:05 Human Capital as the Limiting Factor 00:16:02 The Unreasonably Difficult Interview Process 00:17:14 Entropy and Context Engineering: The Future of AI Agents 00:23:28 The MCP Debate and AI Industry Sociology 00:26:01 Consulting, Digital Transformation, and Conference Insights

November 19, 2025

Anthropic, Glean & OpenRouter: How AI Moats Are Built with Deedy Das of Menlo Ventures

FULL

Deedy Das, Partner at Menlo Ventures, returns to Latent Space to discuss his journey from Glean to venture capital, the explosive rise of Anthropic, and how AI is reshaping enterprise software and coding. From investing in Anthropic early on when they had no revenue to managing the $100M Anthology Fund, Das shares insider perspectives on the fastest-growing software company in history and what's next for AI infrastructure, research investing, and the future of engineering. We cover Glean’s rise from “boring” enterprise search to a $7B AI-native company, Anthropic's meteoric rise, the strategic decisions behind products like Claude Code, and why market share in enterprise AI is shifting dramatically. Das explains his investment thesis on research companies like Goodfire, Prime Intellect, and OpenRouter and how the Anthology Fund is quietly seeding the next wave of AI infra, research, and devtools.

November 14, 2025

⚡ Inside GitHub’s AI Revolution: Jared Palmer Reveals Agent HQ & The Future of Coding Agents

FULL

Jared Palmer, SVP at GitHub and VP of CoreAI at Microsoft, joins Latent Space for an in-depth look at the evolution of coding agents and modern developer tools. Having recently joined after leading AI initiatives at Vercel, Palmer shares firsthand insights from behind the scenes at GitHub Universe, including the launch of Agent HQ, a new collaboration hub for coding agents and developers. This episode traces Palmer’s journey from building Copilot-inspired tools to pioneering the focused Next.js coding agent, v0, and explores how platform constraints fostered rapid experimentation and a breakout success in AI-powered frontend development. Palmer explains the unique advantages of GitHub’s massive developer network, the challenges of scaling agent-based workflows, and why integrating seamless AI into developer experiences is now a top priority for both Microsoft and GitHub.

November 10, 2025

DevDay 2025: Apps SDK, Agent Kit, MCP, Codex and why Prompting is More Important than Ever

FULL

At OpenAI DevDay, we sit down with Sherwin Wu and Christina Huang from the OpenAI Platform Team to discuss the launch of AgentKit - a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within ChatGPT through the new Apps SDK. The conversation explores how OpenAI is tackling the challenges developers face when taking agents to production - from writing and optimizing prompts to building evaluation pipelines. They discuss the decision to adopt Anthropic's MCP protocol for tool connectivity, the importance of visual workflows for complex agent systems, and how features like human-in-the-loop approvals and automated prompt optimization are making agent development more accessible to a broader range of developers. Sherwin and Christina also reveal how OpenAI is dogfooding these tools internally, with their own customer support at openai.com already powered by AgentKit, and share candid insights about the evolution from plugins to GPTs to this new agent platform. They discuss the surprising persistence of prompting as a critical skill (contrary to predictions from two years ago), the challenges of serving custom fine-tuned models at scale, and why they believe visual agent builders are essential as workflows grow to span dozens of nodes. Guests: Sherwin Wu: Head of Engineering, OpenAI Platform https://www.linkedin.com/in/sherwinwu1/ https://x.com/sherwinwu?lang=en Christina Huang: Platform Experience, OpenAI https://x.com/christinaahuang https://www.linkedin.com/in/christinaahuang/ Thanks very much to Lindsay and Shaokyi for helping us set up this great deepdive into the new DevDay launches! 
Key Topics: • AgentKit launch: Agent SDK, Builder, Evals, and deployment tools • Apps SDK and the inversion of the app-chatbot paradigm • Adopting MCP protocol for universal tool connectivity • Visual agent building vs code-first approaches • Human-in-the-loop workflows and approval systems • Automated prompt optimization and "zero-gradient fine-tuning" • Service Health Dashboard and achieving five nines reliability • ChatKit as an embeddable, evergreen chat interface • The evolution from plugins to GPTs to agent platforms • Internal dogfooding with Codex and agent-powered support

October 7, 2025