The Research-Plan-Implement Pattern: How AI Coding Actually Works at Scale
By LazyRalph Team
If you’ve been using Claude Code, Codex, or Cursor on anything bigger than a toy project for more than a few weeks, you’ve probably hit the same wall. The model starts strong, then somewhere around the thousandth line of diff it starts rewriting code it already wrote an hour ago, contradicting its own earlier decisions, and confidently inventing APIs that don’t exist.
There’s a name for this now. The community calls it context rot, and the most widely adopted workaround is a three-phase pattern called Research-Plan-Implement — RPI for short.
This post is a practical walkthrough: where RPI came from, what it actually does, why the person who invented it publicly threw it out, and what that tells you about how to work with AI coding tools right now.
Where RPI came from
RPI wasn’t invented by a big lab or a research paper. It came from a founder at a YC-backed startup who needed to ship real features into a 300,000-line Rust codebase without the agents losing their minds halfway through.
Dex Horthy, founder of HumanLayer, published the original .claude/commands/ directory in his company’s public repo in 2025. Three files: research_codebase.md, create_plan.md, implement_plan.md. The prime directive on the research command is deliberately narrow:
“YOUR ONLY JOB IS TO DOCUMENT AND EXPLAIN THE CODEBASE AS IT EXISTS TODAY.”
Researchers are “documentarians, not critics.” They don’t propose changes. They don’t judge. They produce a clean map of how things actually are. Then a separate agent — with a fresh context — reads that map and makes a plan. Then a third agent reads the plan and implements it.
Horthy later gave a talk at AI Engineer Code Summit NYC called “No Vibes Allowed: Solving Hard Problems in Complex Codebases” where he pinned down the mechanism that makes it work: what he calls frequent intentional compaction. The goal isn’t to use as much context as possible — it’s to keep each agent instance running in the 40–60% context window range, where models are sharpest.
He calls the zone above that threshold the Dumb Zone. Performance degrades. Models forget. Hallucinations go up. The whole pattern exists to keep you out of it.
The three phases, explained without the marketing
Research. One agent, one job: read the codebase and describe what’s there. It doesn’t propose solutions. It doesn’t suggest fixes. It produces a markdown file — usually research.md or something in a thoughts/ directory — that lists the relevant files, the existing patterns, the dependencies, and the gotchas. That’s it.
Plan. A new agent starts with a clean context. It reads research.md and the original task, and it writes a plan. The plan has enough detail that implementation can be mechanical: files to create, files to modify, tests to write, sequence. If the plan is vague, the next phase will be vague. If the plan is wrong, everything after it is wrong.
Implement. A new agent starts with a clean context. It reads the plan and does it. Not “interprets” it. Does it.
The key move in all of this is that the three phases don’t share context. Each phase starts fresh. They communicate through files — artifacts written to disk — not through conversation history. That’s the whole trick.
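As a sketch, the whole pipeline is just three model calls joined by files on disk. In the code below, `run_agent` is a hypothetical stand-in for whatever fresh-context session your tooling provides (it's a stub here so the sketch runs); the artifact names and prompts follow the shape described above, not any particular framework's conventions.

```python
from pathlib import Path

def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for one fresh-context model session.
    In practice this would shell out to your coding agent; here it
    just echoes a placeholder so the sketch is runnable."""
    return f"# Output for prompt starting: {prompt[:40]}..."

def rpi(task: str, workdir: Path) -> Path:
    # Phase 1: research — document the codebase as it exists today.
    research = run_agent(
        f"Document the parts of this codebase relevant to: {task}. "
        "Do not propose changes."
    )
    (workdir / "research.md").write_text(research)

    # Phase 2: plan — a fresh context reads only the artifact on disk,
    # never the previous phase's conversation history.
    plan = run_agent(
        f"Task: {task}\n\n"
        f"Research:\n{(workdir / 'research.md').read_text()}\n\n"
        "Write a step-by-step implementation plan."
    )
    (workdir / "plan.md").write_text(plan)

    # Phase 3: implement — another fresh context executes the plan.
    result = run_agent(
        f"Implement this plan exactly:\n{(workdir / 'plan.md').read_text()}"
    )
    (workdir / "implementation.md").write_text(result)
    return workdir / "plan.md"
```

The design point is in the plumbing: each phase's input is a file read back from disk, so nothing survives between phases except what was deliberately written down.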
Boris Tane, who wrote what has become one of the most-cited practitioner blog posts on this, puts it like this:
“The research prevents ignorant changes. The plan prevents wrong changes.”
And then:
“I want implementation to be boring.”
That’s the whole shape of the pattern in one sentence. If implementation is exciting, you skipped something upstream.
What RPI actually fixes
To understand why this pattern spread, you have to understand the specific failure modes people were hitting. Here are the ones that come up constantly in practitioner blogs, HN threads, and postmortems:
Context window exhaustion. Ashley Ha wrote about what this looks like from the inside: “before I knew it, I was at 95% context capacity with half-implemented code and a dozen final_implementation_final_feature.md files.” This is the normal state of affairs for people who don’t break their work into phases. Her personal rule now: never let context exceed 60%.
Plan drift. Without a written plan in a file, the agent’s “plan” lives in its own head — its context window. As the window fills, the plan gets summarized, then summarized again, then replaced. By the time implementation is halfway done, the agent is solving a different problem than it started with.
Ignorant changes. The agent hasn’t read the codebase. It reads three files, assumes the rest, and generates code that looks plausible but fights the existing patterns. This is what the research phase prevents.
Hallucinated APIs. The model doesn’t know your codebase’s internal APIs. When it doesn’t know, it guesses. A research phase that documents the actual APIs solves this — not perfectly, but dramatically better than the alternative.
Harper Reed described the experience of losing the plot without structure as going “over my skis”:
“All of a sudden you are like ‘WHAT THE FUCK IS GOING ON!,’ and are completely lost.”
Anyone who’s spent a few hours letting an agent run on a complex task without checkpoints has had this exact moment.
The numbers
The pattern’s proponents cite some specific results, which are worth taking seriously because they come with context:
- A 300k-LOC Rust codebase (BAML) — bug fix approved within hours
- 35k LOC of features (WASM + cancellation) shipped in 7 hours using the pattern, vs. a senior engineer estimate of 3–5 days
- A study referenced in several blog posts claims 77% first-try success on complex (7+ file) tasks with plan mode, vs. 40% without
- Block’s Goose team documented a 10-phase, 32-file refactor: 9 minutes research, 4 minutes plan, 39 minutes implement. 52 minutes total. Build passed. Code review agent had zero comments.
These aren’t huge sample sizes. But the direction is consistent across sources: when the task is complex enough that a single prompt can’t hold all the context, splitting into phases produces dramatically better outcomes.
The twist: the inventor threw it out
Here’s the part that nobody seems to write about, which is a shame, because it’s the most interesting part.
In late 2025 / early 2026, Dex Horthy published a follow-up talk called “Everything We Got Wrong About RPI” and quietly rebuilt the pattern. The new version has eight phases and is called QRSPI — Questions, Research, Structure, Plan, Implement, and a few review steps. Alex Lavaee’s postmortem documents the three failures that forced the rewrite:
Failure 1: Instruction budget overflow. Frontier thinking models follow about 150–200 instructions with reasonable consistency. RPI’s system prompt had grown to 85+ instructions. Add user instructions, add CLAUDE.md context, add tool descriptions, and you’re blowing past the budget. The models don’t error. They just silently skip the alignment steps you thought were guaranteed.
Failure 2: Magic words dependency. If a pattern only works when you say the right incantation — “think hard,” “ultrathink,” “research the codebase first” — the pattern is broken. Horthy’s own line: “if a tool requires magic words for basic functionality, the tool itself is broken.”
Failure 3: The plan-reading illusion. This one is the sharpest. Horthy’s words:
“Plans are persuasive artifacts by nature. LLMs are very good at producing text that reads as authoritative.”
You read the plan. It sounds good. You approve it. Nothing about reading the plan actually validates whether the technical assumptions are right. The human review loop feels like a checkpoint but doesn’t actually check anything.
This matters for anyone adopting RPI now. The pattern works. It also has known limits. If you treat “write a plan, review it, run it” as a silver bullet, you’re going to run into exactly the failures Horthy ran into.
What the pattern is actually teaching you
Strip the jargon and RPI is saying a few things that are straightforward once you hear them:
- Agents are stateless. Every new session starts from zero. Progress has to live in files, not conversations.
- Context window pressure is real. Once you push past 40–60% utilization, the model’s quality drops measurably, and you can’t feel it happening in real time.
- Phases let you validate cheaply. Reviewing a 200-line research doc is faster and cheaper than reviewing 2000 lines of generated code.
- Written artifacts are the interface. The agent doesn’t need to remember; it needs to be able to re-read.
These four points are useful even if you never touch a three-phase pipeline. They tell you how to think about AI coding tools generally.
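The 40–60% rule can be made operational with nothing fancier than a character count. A rough sketch — the four-characters-per-token ratio and the 200k-token window are assumptions to adjust for your model, not measured constants:

```python
def context_utilization(history: list[str], window_tokens: int = 200_000) -> float:
    """Rough utilization estimate for a session transcript.

    Assumes ~4 characters per token — a crude heuristic, not a real
    tokenizer — and a 200k-token window. Adjust both for your model."""
    est_tokens = sum(len(msg) for msg in history) / 4
    return est_tokens / window_tokens

def should_compact(history: list[str], threshold: float = 0.6) -> bool:
    # Past ~60% utilization, the rule of thumb says: stop, write your
    # artifacts to disk, and start a fresh session from them.
    return context_utilization(history) >= threshold
```

The exact numbers matter less than having an explicit trigger: without one, you only notice the Dumb Zone after the output has already degraded.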
How to actually use this
If you’re starting from scratch, you don’t need a framework. You need three things:
A research file. Before you let the agent write code, have it read the relevant parts of your codebase and write a markdown file describing what it found. Read the file. If the file is wrong, the agent doesn’t understand your codebase yet, and nothing it writes next will be right.
A plan file. Based on the research file plus the task, have the agent write a plan in markdown. Files to create, files to modify, sequence. Read the plan. Edit the plan if it’s wrong. Don’t start implementation until the plan is something you’d be willing to hand to another developer.
A separate implementation session. Start a fresh chat. Give it the plan file. Ask it to implement it. If the plan was good, this part is mechanical. If the plan was bad, stop — go back and fix the plan.
That’s the whole pattern. No framework required. The frameworks exist because people want to automate the handoff between phases, which is reasonable, but the pattern works fine as a manual discipline.
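If you do want a thin layer of automation, the useful part to automate is the gate, not the handoff: refuse to start the next phase until a human has actually touched the artifact. A minimal sketch — the approval-marker convention here is this example's own invention, not part of anyone's published tooling; a git commit touching the file would work just as well:

```python
from pathlib import Path

def approved(artifact: Path, marker: str = "<!-- approved -->") -> bool:
    """Return True once a human has added an approval marker to the file.

    The artifact on disk is the only thing the next phase will see,
    so this is the cheap place to catch errors: editing a markdown
    file, rather than reviewing 2000 lines of generated code."""
    return artifact.exists() and marker in artifact.read_text()
```

A pipeline runner would call `approved(plan_path)` before launching the implementation session and stop if it returns False.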
What to avoid
A few things that come up repeatedly in the practitioner literature as pitfalls:
- Don’t use plan mode’s built-in UI for complex plans. Several practitioners have written that reviewing a multi-page plan inside the terminal UI is hard and the proceed/skip/cancel options are too coarse. Use real markdown files and edit them normally.
- Don’t do this for one-line changes. The overhead isn’t worth it. The rough rule from several blogs: if you can describe the full change in one sentence, skip the plan.
- Don’t treat the plan as a promise. It’s a starting point. Expect to come back and edit it as the implement phase surfaces issues.
- Don’t let research turn into design. The research phase describes what exists. The moment the researcher starts proposing solutions, you’re back to planning inside a too-full context window.
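That last rule is easy to check mechanically. A crude lint — the word list is this example's own heuristic, not anything from Horthy's tooling — that flags research artifacts drifting from description into design:

```python
import re

# Crude heuristic, not a real classifier: prescriptive phrasing that
# suggests the research doc is proposing changes instead of
# documenting what exists ("documentarians, not critics").
PRESCRIPTIVE = re.compile(
    r"\b(should|recommend|suggest|ought to|better to)\b",
    re.IGNORECASE,
)

def research_is_descriptive(text: str) -> bool:
    """Return True if the research artifact stays descriptive."""
    return not PRESCRIPTIVE.search(text)
```

False positives are cheap here: a flagged sentence just prompts a thirty-second human skim before the plan phase starts.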
The connection to everything else
The RPI pattern sits in the middle of a larger shift that people are starting to call context engineering — the discipline of managing what the model has in front of it at any given moment. Context engineering is the umbrella. RPI is one shape of it. Spec-driven development is another. The Ralph loop — which we’ve written about here — is a third, very different one.
What all of them share is an acknowledgement that the bottleneck for AI coding isn’t the model’s capability. It’s the delivery of exactly the right context to the model at the right time.
Or as Horthy puts it: “Everything is context engineering.”
LazyRalph is a web UI for this kind of work — running RPI-style pipelines with visible stages, artifact viewers, and clarity gates at each step. Join the waitlist below to be the first to try it.
Frequently asked questions
What is the Research-Plan-Implement (RPI) pattern?
Research-Plan-Implement is a three-phase workflow for AI coding tools where a research phase documents the existing codebase, a plan phase writes a detailed implementation plan, and an implement phase executes the plan — each phase running in a fresh AI context window so context rot doesn't degrade quality.
Who created the RPI pattern?
Dex Horthy, founder of HumanLayer (YC F24), created the pattern and published the original .claude/commands/ directory in his company's public repo in 2025. The three canonical files are research_codebase.md, create_plan.md, and implement_plan.md.
Why do the phases need fresh contexts?
LLMs degrade in quality as their context window fills. Research from Dex Horthy and practitioners shows performance drops noticeably above 40–60% utilization — what Horthy calls the 'Dumb Zone.' Running each phase in a fresh session keeps the agent sharp and avoids plan drift caused by summarized or forgotten context.
What is QRSPI and why did Dex Horthy change RPI?
QRSPI (Questions, Research, Structure, Plan, Implement, plus review steps) is Horthy's eight-phase evolution of RPI. He rebuilt it after three failure modes surfaced: instruction budget overflow (models follow 150–200 instructions reliably; RPI's system prompt grew to 85+), magic-words dependency, and the 'plan-reading illusion' — humans rubber-stamping plans that read as authoritative but rest on wrong assumptions.
When should I use RPI versus just prompting the AI directly?
Use RPI for complex, multi-file changes in existing codebases where the AI needs to understand the terrain before building. Skip it for changes you can describe in one sentence. The rough rule cited across practitioner blogs: if the plan would be shorter than the code, skip the plan.
How is RPI different from the Ralph Wiggum loop?
RPI is human-driven, phase-based, and built for complex tasks in existing codebases. The Ralph Wiggum loop is agent-driven, iterative, and built for greenfield work with clear end states. Both solve the same underlying problem — context rot — from opposite directions. The people at the frontier use both, depending on the job.