Free-Threaded Python gets ‘supported’ status

June 21, 2025

Huge congratulations to Thomas, Matt and Sam for shepherding through PEP 779, which moves the no-GIL/free-threaded Python mode from experimental to supported:

A patchwork quilt view of AI Alignment

June 15, 2025

https://arxiv.org/abs/2505.05197

Linear Layouts in Triton

June 11, 2025

[2505.23819] Linear Layouts: Robust Code Generation of Efficient Tensor Computation

Monarch: PyTorch Single Controller

June 10, 2025

I’ve been excited for this to make it to OSS: the PyTorch team at Meta recently soft-launched Monarch on GitHub.

Toward a Theory of Tokenization in LLMs

June 10, 2025

[2404.08335] Toward a Theory of Tokenization in LLMs

Analyzing Modern GPU Cores

June 5, 2025

[2503.20481] Analyzing Modern NVIDIA GPU cores

Darwin Gödel Machines

June 1, 2025

https://open.substack.com/pub/gonzoml/p/darwin-godel-machine

Keeping a GPU busy is a lot about tiling

May 30, 2025

File this under the “gross oversimplifications” category. The basic approach to keeping GPUs busy is dividing the work into tiles: smaller sub-problems that together make up the larger result. For a GEMM you might break the output matrix into 128×128 or 128×64 tiles and let each CUDA thread block (CTA) own one tile. The GPU has many streaming multiprocessors (an A100 has 108), and in this simplified picture every SM picks up one CTA at a time. If you want to know how many SMs your own card has, you can call:
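To make the tiling arithmetic concrete, here is a rough sketch in plain Python that counts tiles and "waves" for a GEMM output under the same oversimplified model (one CTA per SM at a time). The function name `tile_schedule` and the 4096×4096 problem size are illustrative assumptions, not something from the post. The SM count is passed in as a plain number (108 for the A100 mentioned above); on a real machine, one option, assuming PyTorch with CUDA is available, is to read `torch.cuda.get_device_properties(0).multi_processor_count`.

```python
import math


def tile_schedule(m, n, tile_m, tile_n, num_sms):
    """Count output tiles and scheduling waves for an m x n GEMM result.

    Hypothetical helper for illustration. Assumes the simplified model
    where each SM runs exactly one CTA (one tile) at a time.
    """
    tiles_m = math.ceil(m / tile_m)   # tiles along the rows
    tiles_n = math.ceil(n / tile_n)   # tiles along the columns
    total_tiles = tiles_m * tiles_n   # one CTA per tile
    # A "wave" is one round in which every SM works on at most one tile.
    waves = math.ceil(total_tiles / num_sms)
    # Fraction of SM slots that actually hold a tile across all waves.
    efficiency = total_tiles / (waves * num_sms)
    return total_tiles, waves, efficiency


if __name__ == "__main__":
    for tile_m, tile_n in [(128, 128), (128, 64)]:
        tiles, waves, eff = tile_schedule(4096, 4096, tile_m, tile_n, num_sms=108)
        print(f"{tile_m}x{tile_n}: {tiles} tiles, {waves} waves, {eff:.3f} slot efficiency")
```

With these assumed sizes, 128×128 tiles give 1024 tiles in 10 waves, with the last wave only partly full, while 128×64 tiles give 2048 tiles in 19 waves with almost no idle slots in the tail. Real kernels are more complicated (several CTAs can share an SM, and smaller tiles reuse less data), so treat this as a way to reason about the tail effect only.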

Metrics for Engineering Teams

May 28, 2025

Don’t blindly tie every piece of work to top-level metrics. Even if technically feasible, the cost is too high and the risk of spurious logic chains significant.

Accidental Factors
