Power by the hour

July 20, 2026

It is a truth universally acknowledged that an airline in possession of an airplane must be in want of engines to make it go. Yet, somewhat surprisingly, they don’t really buy engines.

Who is walking who?

July 11, 2026

One good way to annoy a neuroscientist is to compare an LLM to the brain. It’s appealing though! There are similarities! In infancy we take a complex fusion of sensory inputs and learn to make predictions in latent space, while in pre-training a stack of Transformers learns to predict which number SolidGoldMagikarp will say next on Reddit.

MOPD

July 9, 2026

We talked about this sort of thing a bit before, but now the official Multi-Teacher On-Policy Distillation paper is out, and it’s a pleasant read: “MOPD for Capability Integration in LLM Post-Training”.

Benchmarks Mean Business

June 30, 2026

The basic job of an eval is to let you judge how good your model is on a task. If enough people use the same eval we can use it to benchmark the relative performance of multiple models on a level playing field. All good, no drama.

It’s always the learning rates

June 28, 2026

Pre-training any kind of good LLM is very, very expensive. Thankfully, we have scaling laws. Lilian Weng of Thinky writes:

LLMs are complicated now

June 19, 2026

Back in 2022 and 2023 there were two big branches of machine learning happening at Meta¹. The LLM work that led to Llama was a clean, smooth stack of repeated Transformer modules; the recommendation systems graphs were, by contrast, terrifying. Luckily, the industry has remedied that state of affairs by making LLMs a lot more complicated.

¹ And many smaller ones, shout outs to all my Content Understanding and integrity peeps

FactWorld

June 12, 2026

When we started building LLMs, we mostly focused on them knowing things. They had information encoded in their weights, and they could spit it out when given sufficient prompts. But an agent doesn’t just need to know things; it needs to combine several kinds of knowledge.

Somehow, more on distillation

June 5, 2026

The capabilities in a large language model emerge, mysteriously, from the training data. Everyone agrees that you start with a big pile of data, add some compute, and at the end you can vibe code. Opinions differ on what that pile of data should look like.

We can distill it for you wholesale

May 31, 2026

There has been a lot of drama¹ about distillation: how (closed) frontier models are being used by other labs to boost their own performance on particularly hard tasks.

¹ And/or marketing.

Maybe the agents shouldn’t write the kernels

May 27, 2026

A thing you can do is take the most performance- and correctness-sensitive part of your stack and just ask a chatbot to write it for you. They will sometimes get it right!