Fused Linear Cross-Entropy

May 24, 2025

Fused Linear Cross-Entropy is a popular optimization that combines the final linear projection and the cross-entropy loss into a single operation. This fusion is valuable for training large language models efficiently, as it can reduce memory usage significantly, particularly for large vocabularies.
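To make the memory argument concrete, here is a minimal pure-Python sketch of the idea, not the implementation from the post or from any library. It assumes the saving comes from processing tokens in chunks so that the full tokens-by-vocabulary logits matrix is never held at once. The function names, `chunk_size`, and the toy data are all hypothetical, and only the forward pass is shown; real fused kernels typically also compute gradients inside the same loop.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _row_loss(logits, target):
    # Numerically stable log-sum-exp minus the logit of the target class.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

def naive_linear_cross_entropy(hidden, weight, targets):
    # Unfused baseline: materialize every logit row (tokens x vocab) first.
    logits = [[_dot(h, w) for w in weight] for h in hidden]
    return sum(_row_loss(l, t) for l, t in zip(logits, targets)) / len(targets)

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    # Hypothetical fused-style variant: project and reduce one chunk at a time,
    # so at most chunk_size rows of logits exist at any moment.
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        stop = start + chunk_size
        chunk_logits = [[_dot(h, w) for w in weight] for h in hidden[start:stop]]
        total += sum(
            _row_loss(l, t) for l, t in zip(chunk_logits, targets[start:stop])
        )
    return total / len(targets)

# Toy data (made up): 5 tokens, hidden size 3, vocabulary of 4.
hidden = [
    [0.1, -0.2, 0.3],
    [0.5, 0.4, -0.1],
    [-0.3, 0.2, 0.6],
    [0.0, 0.1, -0.4],
    [0.7, -0.5, 0.2],
]
weight = [  # one row per vocabulary entry
    [0.2, 0.1, -0.3],
    [-0.1, 0.4, 0.2],
    [0.3, -0.2, 0.5],
    [0.0, 0.6, -0.4],
]
targets = [2, 0, 3, 1, 2]

naive_loss = naive_linear_cross_entropy(hidden, weight, targets)
chunked_loss = chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2)
```

Both functions return the same mean loss up to floating-point rounding; the difference is that the chunked version's peak logits storage scales with `chunk_size` times the vocabulary size rather than with the full token count.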

Values in AI

May 23, 2025

Daniel Schmachtenberger has made the argument:

Scientific discovery and AI

May 19, 2025

I got fooled by AI-for-science hype—here’s what it taught me

Optimizers and Hessians

May 17, 2025

https://arxiv.org/abs/2505.02809

Pyrefly

May 16, 2025

https://pyrefly.org

The First Year of Free-Threaded Python

May 16, 2025

https://labs.quansight.org/blog/free-threaded-one-year-recap

Generative modelling in the latent space

May 13, 2025

Generative modelling in latent space – Sander Dieleman

How does Triton do Warp Spec?

May 9, 2025

Kapil Sharma from the PyTorch team has a great series of posts diving into the Triton compiler process: 1, 2, 3. As covered there, Triton lowers to a series of intermediate representations, and each level has a set of transformation passes that implement optimizations. TTIR is the generic Triton IR and leverages a number of standard MLIR passes like common subexpression elimination, as well as some Triton-specific passes like managing broadcast ops. That’s then lowered to TTGIR, a GPU-specific IR¹

Note: Because of some project weirdness, warp specialization differs considerably between the release branches and main, so I’ll refer to 3.3 from here on. It’s in very active development (by teams at Meta, OpenAI and Nvidia!), so the specifics are quite likely to change over coming releases!

The Mold Linker

May 8, 2025

rui314/mold: Mold: A Modern Linker 🦠

LSP & Standards

May 7, 2025

https://www.michaelpj.com/blog/2024/09/03/lsp-good-bad-ugly.html