Foundations
Pattern 04 of 26
Reflection and Self-Correction
The second pass that catches what the first missed
Reflection is the agent stopping to look at what it just produced before moving forward. Did the code actually compile? Does the answer address what was asked, or a slightly different question the model found easier? Self-correction loops catch a meaningful share of errors before they reach anyone. The tradeoff is extra latency and cost, an engineering decision worth making deliberately rather than defaulting to either extreme.
Why it matters
Without reflection, a model that gets something wrong will often keep building on that wrong foundation. The confidence of the output does not correlate with its correctness. A bad answer delivered with certainty is not neutral; it is actively harmful, because the person on the other end trusts it more than they should. Reflection is what closes that gap.
Deep Dive
Reflection is a second pass. The agent produces an output, then evaluates it against the original goal before deciding whether to move on or revise. That sounds like it should be the default. It is not. Without an explicit reflection step, a model that produces a wrong answer will typically continue from that wrong answer as if it were correct, building on it rather than catching it. I have watched Claude Code do this. The model is not confused; it just did not check. Adding a check changes the behavior significantly.
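The loop is small enough to sketch. Everything below is illustrative, not any particular framework's API: `generate`, `critique`, and `revise` stand in for model calls, and the toy stubs exist only to make the control flow runnable.

```python
def answer_with_reflection(task, generate, critique, revise, max_revisions=2):
    """Produce a draft, then loop: critique it against the original task
    and revise until the critic is satisfied or the budget runs out."""
    draft = generate(task)
    for _ in range(max_revisions):
        feedback = critique(task, draft)
        if feedback is None:  # critic found nothing wrong; stop revising
            return draft
        draft = revise(task, draft, feedback)
    return draft

# Toy stand-ins for model calls: the "model" first answers 5, the critic
# actually checks the arithmetic, and the reviser applies the fix.
def generate(task):
    return "5"

def critique(task, draft):
    return None if draft == "4" else "2 + 2 is 4, not " + draft

def revise(task, draft, feedback):
    return "4"

print(answer_with_reflection("What is 2 + 2?", generate, critique, revise))
# → 4
```

The point of the structure is the explicit check between producing and committing: without the `critique` call, the wrong draft would simply be returned and built upon.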
Two papers published at NeurIPS 2023 formalized different approaches to this. Reflexion, from a Princeton-led team, treats failed attempts as verbal reinforcement: instead of updating model weights, the agent writes a text summary of what went wrong and carries that forward as context on the next attempt. Self-Refine, from CMU, showed that a model could generate feedback on its own output and use that feedback to iteratively improve the result, with no external signal at all. Both demonstrated real quality improvements on nontrivial coding, math, and reasoning tasks.
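The Reflexion idea reduces to a loop with an episodic text memory. This is a sketch of the pattern, not the paper's code: `attempt`, `evaluate`, and `reflect` are placeholders for a model call, an external check (such as running tests), and a self-written lesson, and the toy stubs below exist only to exercise the loop.

```python
def reflexion_loop(task, attempt, evaluate, reflect, max_tries=3):
    """After a failed attempt, store a verbal lesson and carry it into
    the next attempt as context; no weights are updated."""
    lessons = []  # episodic memory of verbal self-feedback
    output = None
    for _ in range(max_tries):
        output = attempt(task, lessons)
        ok, signal = evaluate(output)  # external signal, e.g. a test run
        if ok:
            return output, lessons
        lessons.append(reflect(task, output, signal))
    return output, lessons

# Toy stand-ins: the first attempt is wrong until a lesson exists.
def attempt(task, lessons):
    return "11" if lessons else "9"

def evaluate(output):
    return (output == "11", f"expected 11, got {output}")

def reflect(task, output, signal):
    return f"Last attempt failed: {signal}. Verify before answering."

result, lessons = reflexion_loop("smallest prime greater than 10?",
                                 attempt, evaluate, reflect)
print(result, len(lessons))
# → 11 1
```

Self-Refine is the same loop with one change: the evaluation signal comes from the model critiquing its own output rather than from an external check.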
Anthropic's Constitutional AI applies a related idea at training time rather than inference time. The model learns to critique its own outputs against a set of principles and revise them, so the reflection is baked into the model rather than added as a separate inference step. For production systems the tradeoff is concrete: a single-pass answer is cheaper and faster, a reflected answer is more reliable, and the right choice depends entirely on what a wrong answer actually costs in your specific use case. Writing a test is worth reflecting on. Generating a quick summary probably is not.
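That cost-based decision can itself be made explicit in code. A minimal sketch, with made-up task names and a made-up risk table: route high-stakes tasks through the reflected path and let low-stakes tasks go through in a single pass.

```python
# Illustrative reflection policy: spend the second pass only where a
# wrong answer is expensive. Task names and risk levels are invented.
RISK = {"generate_test": "high", "summarize_thread": "low"}

def run(task_type, single_pass, with_reflection):
    """Dispatch to the reflected path for high-risk tasks, defaulting
    unknown task types to the safe (reflected) path."""
    if RISK.get(task_type, "high") == "high":
        return with_reflection()  # slower, more reliable
    return single_pass()          # cheaper, good enough here

print(run("summarize_thread",
          single_pass=lambda: "fast summary",
          with_reflection=lambda: "checked summary"))
# → fast summary
```

Defaulting unknown task types to the reflected path is a deliberate choice: when you have not priced the cost of a wrong answer, pay for the check.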