Vibe Coding & Code Review

March 19, 2025

(Updated: Apr 22, 2026)

Vibe coding is trendy right now, and it’s part of a bigger shift toward AI-generated code. One of the questions I have about this, at scale, is how it will interact with code review.

When a company goes public in the US, it has to comply with Sarbanes-Oxley (SOX). SOX mandates a separation of duties, which in practical terms usually means having a process by which changes are tested and approved by someone other than the author before being deployed. That is often implemented through code review. Companies like Google, Meta, Snapchat, and Uber also have compliance obligations around privacy and security thanks to FTC consent decrees. These mandates don’t directly say “do code reviews” (they mostly mandate programs), but in practice code reviews are key checkpoints for compliance.

As we use more model-generated code, traditional code review processes will have to change. Maybe engineers become reviewers of AI-generated code, or we start clearly separating AI-written code from the manually reviewed code, similar to what already happens with some current codegen tools.

Companies will have to think about ownership and stewardship of changes. Code review requirements like OWNERS or Readability at Google require an approver who isn’t the original author. If the change is LLM-authored, is it acceptable for there to be no human author? Or is it owned by the person who kicked off the workflow, which would preclude them from being the reviewer? If we have automated detect-change flows set up (e.g. upgrading downstreams when a dependency changes), is it an individual or a team/oncall that owns the change?
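To make the ownership question concrete, here is a minimal sketch of one possible policy, not a description of how OWNERS or Readability actually work. It assumes the person who kicked off an LLM workflow counts as the accountable owner and is therefore excluded from approving. The `Change` fields and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """A proposed change. All field names are hypothetical."""
    change_id: str
    human_author: Optional[str] = None  # None when a model wrote the diff
    initiated_by: Optional[str] = None  # person who kicked off the workflow
    owning_team: Optional[str] = None   # team/oncall for automated flows


def accountable_owner(change: Change) -> str:
    """Pick who answers for the change, most specific party first."""
    owner = change.human_author or change.initiated_by or change.owning_team
    if owner is None:
        raise ValueError(f"{change.change_id} has no accountable owner")
    return owner


def can_approve(change: Change, reviewer: str) -> bool:
    """Separation of duties: anyone who wrote or initiated the change
    is excluded from approving it."""
    excluded = {change.human_author, change.initiated_by}
    return reviewer not in excluded
```

Under this assumed policy, a fully automated dependency upgrade with no initiator falls back to the team or oncall as owner, and any individual can still approve it. Whether that is acceptable is exactly the open question.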

The bottleneck is human attention and focus. It is plausible to envisage 10-1000x more changes in a large code base, but current review practices simply wouldn’t do an effective job: at best, many changes would be rubber-stamped. Work like the diff risk score shows you can do some degree of triage or prioritization with models. Conceptually, you could extend this to privacy, security and more specific types of risk scoring.
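As a rough illustration of what score-based triage might look like, the sketch below routes changes into three queues given a fixed human review budget. It is not the diff risk score implementation. The `risk` value is assumed to come from some model, and the names `ScoredChange`, `triage`, `human_budget` and `auto_threshold` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChange:
    change_id: str
    risk: float  # assumed model output: 0.0 (safe) to 1.0 (risky)


def triage(
    changes: list[ScoredChange],
    human_budget: int,
    auto_threshold: float = 0.2,
) -> dict[str, list[str]]:
    """Spend limited human attention on the riskiest changes.

    Changes below auto_threshold rely on automated checks only.
    Riskier changes beyond the budget are deferred rather than
    rubber-stamped.
    """
    ranked = sorted(changes, key=lambda c: c.risk, reverse=True)
    needs_human = [c for c in ranked if c.risk >= auto_threshold]
    return {
        "human": [c.change_id for c in needs_human[:human_budget]],
        "deferred": [c.change_id for c in needs_human[human_budget:]],
        "automated": [c.change_id for c in ranked if c.risk < auto_threshold],
    }
```

The same shape would work with separate scores per risk type (privacy, security and so on), each with its own budget and reviewer pool.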

Work like Policy Zones associates compliance requirements with data and asserts controls in the data flow of the system. This may be easier to scale than trying to validate every code change.
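The general idea of attaching requirements to data and checking them where data flows can be sketched in a few lines. This is a toy model under my own assumptions, not how Policy Zones is implemented: `Labeled`, `combine`, `Sink` and the label names are all hypothetical.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when data reaches a sink that is not allowed to hold it."""


@dataclass(frozen=True)
class Labeled:
    value: object
    labels: frozenset = frozenset()  # policy labels travelling with the data


def combine(a: Labeled, b: Labeled, value: object) -> Labeled:
    """Derived data inherits the labels of everything it was built from."""
    return Labeled(value, a.labels | b.labels)


@dataclass
class Sink:
    name: str
    allowed: frozenset  # labels this destination may receive
    received: list = field(default_factory=list)

    def write(self, data: Labeled) -> None:
        """Enforce the policy at the point where data flows into the sink."""
        blocked = data.labels - self.allowed
        if blocked:
            raise PolicyViolation(
                f"{self.name} may not receive {sorted(blocked)}"
            )
        self.received.append(data.value)
```

The check lives at the sink, so it holds no matter who or what wrote the code that produced the data. That is the property that makes this approach attractive when the volume of changes outgrows review capacity.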

Static analysis can also catch patterns of bad usage, like recent work in detecting scraping opportunities. This feels especially important for LLM-generated code, where a widely diffused but bad pattern might be easier to generate than a more novel but safe pattern.

I expect to see more pressure on developer infrastructure teams to build out capabilities for risk detection, automatic validation and embedding policy information into code or data. There will be an advantage to these being open, industry-standard approaches, as foundation models will do a better job with them and require less fine-tuning than they would for company-specific idioms.