The bottleneck

Why are pull requests so hard to review?

Othman Shareef · June 10, 2026 · 6 min read

Every engineering team we talk to has the same queue: code that’s written, tested, and ready, waiting on review. The author moved on hours ago. The reviewer is dreading the tab. Why is reviewing a pull request so much harder than writing one?

It wasn’t always. For most of software’s history the economics ran the other way: writing code was the slow, expensive part (hours or days of design, typing, and debugging per change), and reviewing that change was the comparatively quick read at the end. AI inverted the ratio. Code is now cheap to produce in volume, and on many teams reviewing a change takes longer than writing it did. The bottleneck moved from authoring to review, while our habits and tools are still built for the old ratio.

Reading code is harder than writing it

When you write code, you build the mental model first and the code falls out of it. When you review code, you’re handed the output and asked to reconstruct the model in reverse. Microsoft Research’s study of code review at scale, “Expectations, Outcomes, and Challenges of Modern Code Review,” found exactly this: the number-one challenge reviewers report isn’t finding defects; it’s understanding the change in the first place.

That asymmetry is the root of everything else. A reviewer doing the job properly is doing archaeology: why this file, why this approach, what else does this touch? Every missing piece of context turns into either a round-trip question or a shrug-and-approve.

Diffs outgrew our working memory

Review effectiveness falls off a cliff with size. SmartBear’s analysis of peer review (drawn from a large study at Cisco) recommends keeping reviews under roughly 400 lines of code. Beyond that, the ability to find defects drops measurably. Google’s engineering practices push the same direction: small changes get reviewed faster, more thoroughly, and with less back-and-forth.
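A size budget like that is easy to check mechanically. The sketch below is ours, not something SmartBear or Google publish. It assumes you hand it the text printed by git diff --numstat (one line per file: added, deleted, path, separated by tabs), it counts added plus deleted lines as the size of the change, and it treats 400 as a soft budget rather than a hard rule. The names parse_numstat, changed_lines, and over_budget are hypothetical.

```python
from typing import NamedTuple

# Assumption: the rough ceiling cited above, used as a soft budget.
REVIEW_LINE_BUDGET = 400


class FileChange(NamedTuple):
    path: str
    added: int
    deleted: int


def parse_numstat(numstat: str) -> list[FileChange]:
    """Parse `git diff --numstat` text: added<TAB>deleted<TAB>path per line."""
    changes = []
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, path = parts
        # Binary files are reported as "-" for both counts; treat them as 0 lines.
        changes.append(FileChange(
            path,
            int(added) if added.isdigit() else 0,
            int(deleted) if deleted.isdigit() else 0,
        ))
    return changes


def changed_lines(changes: list[FileChange]) -> int:
    """Total size of the change, counted as added plus deleted lines."""
    return sum(c.added + c.deleted for c in changes)


def over_budget(changes: list[FileChange], budget: int = REVIEW_LINE_BUDGET) -> bool:
    """True when the change is larger than the review budget."""
    return changed_lines(changes) > budget


if __name__ == "__main__":
    sample = "120\t30\tsrc/sync.py\n-\t-\tassets/logo.png\n310\t5\tsrc/batch.py\n"
    changes = parse_numstat(sample)
    print(changed_lines(changes), over_budget(changes))  # 465 True
```

A check like this works better as a nudge than a gate: a mechanical rename can blow the budget and still be a two-minute review, so a warning on the PR is probably a saner default than a failed build.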

Meanwhile real-world PRs are heading the other way. AI assistants make it cheap to generate four hundred lines before lunch, and refactors ride along with features because splitting feels like overhead. The result: a forty-file diff where three files carry the actual change, and the reviewer has to find them by scrolling.

The “why” never made it into the PR

Most PR descriptions describe the what (“Add retry logic to sync”), not the why (“Sync fails on flaky hotel wi-fi; we retry 3× with backoff; the risky part is the idempotency assumption in applyBatch”). Without the why, a reviewer either reconstructs it from the diff (slow) or reviews superficially. Neither is the reviewer’s fault: nothing in the default workflow demands that context exist.

The review surface fights you

Here’s the part we think gets too little blame: the place where review happens. The standard web diff lives on platforms designed for hosting and browsing code, not for reviewing it, and it shows. Files collapse and lose your place. There’s no triage: file one and file forty get equal billing. Comments vanish into resolved threads. Forty open tabs later, the review “session” is really an exercise in remembering where you were.

None of these is fatal alone. Together they tax exactly the thing review depends on: sustained attention on an unfamiliar change.
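The missing triage, at least, is cheap to approximate. Here is a deliberately crude sketch under our own assumptions: each file arrives as a (path, changed_lines) pair, lockfiles and generated files rarely carry the real change, and among the rest a bigger change deserves an earlier look. The pattern list and the names is_low_signal and triage are hypothetical, and size is only a rough proxy for importance.

```python
from fnmatch import fnmatch

# Assumption: file-name patterns that rarely carry the real change. Tune per repo.
LOW_SIGNAL_PATTERNS = ["*.lock", "package-lock.json", "*.snap", "*.min.js", "*_pb2.py"]


def is_low_signal(path: str) -> bool:
    """True for lockfiles, snapshots, and generated files, matched by file name."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in LOW_SIGNAL_PATTERNS)


def triage(files: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Order files for reading: hand-written files first, largest change first.

    Low-signal files sink to the bottom regardless of size; ties are broken
    by path so the order is stable from one run to the next.
    """
    return sorted(files, key=lambda f: (is_low_signal(f[0]), -f[1], f[0]))


if __name__ == "__main__":
    diff = [
        ("package-lock.json", 900),
        ("src/sync.py", 42),
        ("README.md", 3),
        ("src/batch.py", 120),
    ]
    for path, lines in triage(diff):
        print(f"{lines:>5}  {path}")
```

In this example the 900-line lockfile drops to the end and src/batch.py comes first. A real review tool would want better signals than line counts, but even this ordering stops file one and file forty from getting equal billing.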

What actually helps

  • Smaller PRs, enforced socially or by tooling. The single highest-leverage change, per both Google and the Cisco data above.
  • Authors review first. A self-review pass catches the cheap stuff before it costs a reviewer’s attention, and writing the “why” down spares the reviewer much of the archaeology.
  • A review surface built for understanding. Triage the files that matter, keep your place, see every thread in context, and run the whole read-comment-approve-merge loop without losing state between tabs.

That last one is the bet we’re making with Pyor: a review app where the diff, the conversation, and the merge live in one window, and your code never leaves your machine. We built it because we were the reviewer dreading the tab. (Yes, that’s a product plug. It’s also the reason this blog exists, so we’d rather say it plainly.)

Frequently asked questions

Why do pull requests sit unreviewed for days?

Mostly because reviewing feels expensive: large diffs, missing context, and notification noise push reviewers to defer. Reducing PR size and giving reviewers a faster surface for understanding the change shortens the queue more than reminders do.

Is AI-generated code making review harder?

It shifts the bottleneck. AI assistants speed up authoring, so more code arrives per reviewer-hour, and diffs trend larger. The review side (human understanding) has not sped up to match.
