Scaling RL Compute

· May 27, 2025

links-and-recs

https://gr.inc/blog/scaling-rl-compute/?

Great post by the folks at General Reasoning on the combination of factors that led to o1-type breakthroughs in inference-time compute.

But here is the key point: no one suddenly discovered that reinforcement learning was useful for reasoning. It was always useful, but getting some of the details right was the difference between a good post-training recipe and a paradigm shift in the way we use language models.

ML research is prone to these lollapalooza effects, where several positive factors coincide to produce a much larger result than expected. The launch of ChatGPT is another example: ChatGPT wasn’t a surprise for folks who had spent time with large language models and had seen attempts like Galactica before. But for many people it was a remarkable new experience, and the engagement and interaction ChatGPT saw were new to the research community. That in itself contributed to further breakthroughs and improvements.