Coding agents got genuinely good at writing code during 2025, and the first instinct was to treat the output like outsourced code — hand it to a senior engineer for review. That worked for a while. It doesn't scale.
AI agents write code like enthusiastic mid-level engineers. The output is competent, often impressive, occasionally brilliant, but not always as good as it could be, and it arrives faster than any human can keep up with. A single agent can produce in an afternoon what would take a developer a week. The code review backlog builds up fast, and senior engineers' time and mental capacity are both finite. Something has to give.
The problem is straightforward: AI agents produce code far faster than people can review it, and the code still needs reviewing.
The solution is equally straightforward: if AI agents can write code, they can review it too.
The more interesting insight is that a single generalist reviewer, human or AI, tends to miss things. When you're simultaneously holding security concerns, performance characteristics, and architectural consistency in your head, each one gets less attention than it deserves. The answer is to run parallel, targeted reviews, each focused on a specific problem space. That's exactly how our review process works. At the time of writing, each pull request is reviewed by four separate agents running in parallel:
- General code review — correctness, clarity, obvious bugs
- Security review — thinking like a penetration tester, looking for vulnerabilities
- Performance review — potential bottlenecks, inefficient queries, resource waste
- Architecture review — structural problems, coupling, drift from established patterns
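As a rough illustration of the fan-out, here is a minimal Python sketch that sends the same diff to four specialised reviewers concurrently and collects their findings. It is not the actual pipeline described here: `call_model` is a hypothetical stub standing in for whatever model API you use, and the reviewer briefs and the names `review_one` and `review_pull_request` are illustrative assumptions.

```python
import asyncio
from dataclasses import dataclass

# The four focus areas listed above; the brief wording is illustrative only.
REVIEWERS = {
    "general": "Review for correctness, clarity and obvious bugs.",
    "security": "Think like a penetration tester and look for vulnerabilities.",
    "performance": "Look for bottlenecks, inefficient queries and resource waste.",
    "architecture": "Look for structural problems, coupling and drift from established patterns.",
}


@dataclass
class Review:
    focus: str
    findings: str


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"[{model}] reviewed {len(prompt)} chars of prompt"


async def review_one(focus: str, brief: str, diff: str, model: str) -> Review:
    # Each reviewer sees the same diff but only its own narrow brief.
    prompt = f"{brief}\n\n{diff}"
    return Review(focus=focus, findings=await call_model(model, prompt))


async def review_pull_request(diff: str, model: str) -> list[Review]:
    # Launch every specialised review at once; gather preserves input order.
    tasks = [review_one(focus, brief, diff, model) for focus, brief in REVIEWERS.items()]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(review_pull_request("diff --git a/app.py b/app.py", model="reviewer-model"))
    for review in results:
        print(f"{review.focus}: {review.findings}")
```

Because each reviewer has a narrow brief and runs independently, adding another specialisation is just one more entry in `REVIEWERS`, and also one more model call on every pull request.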
It's relatively easy to add more specialisations, but there is a real cost to running agents at scale, so each new reviewer needs to earn its place — the signal has to be worth the noise and the bill.
One strong preference worth highlighting: run the reviews on a different model to the one that wrote the code. A model reviewing its own output is too familiar with its own reasoning and too likely to ratify its own choices; a different model brings genuinely fresh eyes. Currently the reviews run as GitHub Actions on slower, lower-cost models, which keeps costs manageable without sacrificing too much quality.
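The model-selection rule can be sketched as a small function. This Python sketch goes slightly beyond the text by assuming that "different model" is best read as "different model family" (a smaller sibling of the authoring model may share its habits), and that each candidate carries a relative cost figure. `ModelOption`, `pick_reviewer` and every model name and cost below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    family: str  # assumption: models in the same family share too much of their reasoning
    relative_cost: float  # hypothetical cost units; lower means cheaper to run


def pick_reviewer(author: ModelOption, candidates: list[ModelOption]) -> ModelOption:
    """Return the cheapest candidate that is not from the authoring model's family."""
    eligible = [m for m in candidates if m.family != author.family]
    if not eligible:
        raise ValueError("no reviewer model from a different family is available")
    return min(eligible, key=lambda m: m.relative_cost)


if __name__ == "__main__":
    writer = ModelOption("writer-large", "family-a", 10.0)
    options = [
        ModelOption("writer-small", "family-a", 1.0),
        ModelOption("other-medium", "family-b", 3.0),
        ModelOption("third-small", "family-c", 2.0),
    ]
    print(pick_reviewer(writer, options).name)  # prints "third-small"
```

Note that the cheapest option, `writer-small`, is skipped because it shares the author's family. Among the eligible models the sketch simply picks the cheapest; a real setup would also weigh review quality.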
This is still an evolving process. But the direction of travel is clear: the world is about to run far more code in production than ever before, written faster and by fewer people. The answer to that is not less rigorous review; it is more systematic, scalable review. That means AI reviewing AI, doing it in parallel, and doing it thoroughly.