Optimizers and Hessians
May 17, 2025
Needless text
May 17, 2025
May 16, 2025
May 16, 2025
May 13, 2025
May 9, 2025
Kapil Sharma from the PyTorch team has a great series of posts diving into the Triton compiler process: 1, 2, 3. As covered there, Triton lowers to a series of intermediate representations, and each level has a set of transformation passes that implement optimizations. TTIR is the generic Triton IR and leverages a number of standard MLIR passes like common subexpression elimination, as well as some Triton-specific passes like managing broadcast ops. That’s then lowered to TTGIR, a GPU-specific IR. [1]
[1] Note: Because of some project weirdness, warp specialization is quite different in the release branches from main, so I’ll refer to 3.3 from here on. It’s in very active development (by teams at Meta, OpenAI and Nvidia!) so the specifics are quite likely to change over coming releases!
May 8, 2025
May 7, 2025
May 6, 2025
[Scalably Solving Assistance Games | OpenReview](https://openreview.net/forum?id=xVS7dFKoMR)
May 5, 2025
May 4, 2025
torch.compile offers some knobs for trading longer compile times for better execution performance. This is particularly useful for inference, where the same model will be running for a long time, so a one-time compile cost is amortized over many calls.
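As a rough illustration of one such knob, the sketch below uses the `mode` argument of `torch.compile`. The excerpt doesn't say which knobs the post covers, so treating `mode` as the relevant one is an assumption, and `compile_for_inference` is a hypothetical helper name, not something from the post or from PyTorch.

```python
import torch
import torch.nn as nn

# Mode strings accepted by torch.compile:
#   "default"         - balances compile time and runtime performance
#   "reduce-overhead" - reduces Python overhead (uses CUDA graphs on GPU)
#   "max-autotune"    - spends extra compile time autotuning kernels
COMPILE_MODES = ("default", "reduce-overhead", "max-autotune")


def compile_for_inference(model: nn.Module, mode: str = "max-autotune"):
    """Hypothetical helper: wrap a model with torch.compile for long-running inference."""
    if mode not in COMPILE_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {COMPILE_MODES}")
    model.eval()  # inference only: disable dropout, use running batch-norm stats
    # torch.compile is lazy: the real compilation happens on the first call.
    return torch.compile(model, mode=mode)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    compiled = compile_for_inference(model)
    with torch.inference_mode():
        out = compiled(torch.randn(8, 16))  # first call pays the compile cost
    print(out.shape)
```

The first call is slow because that is when compilation (and any autotuning) runs. Later calls with the same input shapes reuse the compiled artifact, which is why the heavier modes make more sense for a long-lived inference process than for a short script.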