Workflow Topology

Pattern 01 of 9

Prompt Chaining

Break complex tasks into sequential LLM calls, with gates between steps.

Prompt Chaining workflow diagram

Prompt chaining is the simplest multi-step LLM pattern you will encounter. You split a complex task into discrete subtasks, run each one as its own LLM call, and pass the output of one step as input to the next. Between steps, you can add gates: code that validates whether the output is good enough to proceed, transforms the format, or injects additional context before the next call. The pipeline is deterministic in structure even if the content inside each step is not.
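The structure above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `call_llm` is a hypothetical stand-in for your provider's completion API, stubbed here so the pipeline shape can run on its own, and the prompt templates are invented examples.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stub: replace with a real call to your model provider.
    return f"[model output for: {prompt[:40]}]"

def run_chain(task_input: str, steps: list) -> str:
    """Run each step's prompt template on the previous step's output."""
    text = task_input
    for template in steps:
        # Each step is its own LLM call; its output feeds the next step.
        text = call_llm(template.format(input=text))
    return text

result = run_chain(
    "Quarterly earnings report ...",
    [
        "Summarize the following document:\n{input}",
        "Translate this summary into French:\n{input}",
        "Format this text as HTML:\n{input}",
    ],
)
```

The structure is deterministic: the same three steps always run in the same order, even though the text produced inside each step varies.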

Why it matters

Most real tasks are too complex for a single LLM call to handle reliably. When you try to do everything in one prompt, the model has to juggle too many concerns at once and quality degrades. Chaining lets each call focus on exactly one thing. Failures are easier to isolate because you know which step in the sequence produced the wrong output. Testing improves for the same reason.

Deep Dive

The mental model is a Unix pipe. Each step takes input, does one thing, and produces output. The difference from a Unix pipe is that each node is a language model call rather than a binary. That means you can define the transformation in natural language, which is powerful when the transformation is hard to express as code. Summarizing a document, extracting structured fields from unstructured text, translating tone while preserving meaning: these are cases where you want an LLM doing the work, not a regex.

Gates are what separate naive chaining from production-quality chaining. A gate is code that runs between steps and decides whether to continue, retry, branch, or halt. A simple gate might check that the previous output is valid JSON before feeding it to the next step. A more sophisticated gate might call a classifier model to score output quality, then decide whether to retry with a different prompt. The gate is deterministic code, not another LLM call, which keeps your pipeline predictable. When your chain has gates, you have a controlled pipeline. Without them, you have a sequence of hopes.
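A simple validity gate of the kind described above might look like the following sketch. `call_llm` is again a hypothetical stub, and the retry budget is an arbitrary illustrative choice; the point is that the gate itself is plain deterministic code.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stub; a real implementation would call your model API.
    return '{"fields": ["name", "date"]}'

def json_gate(output: str):
    """Gate: return parsed JSON if valid, else None to signal a retry."""
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        return None

def step_with_gate(prompt: str, max_retries: int = 2):
    """Run one chain step, retrying until the gate passes or retries run out."""
    for _attempt in range(max_retries + 1):
        parsed = json_gate(call_llm(prompt))
        if parsed is not None:
            return parsed  # valid output proceeds to the next step
    raise ValueError("step failed gate after retries")

fields = step_with_gate("Extract the fields from this contract as JSON: ...")
```

A quality-scoring gate would follow the same shape: the scoring may involve a classifier call, but the continue/retry/halt decision stays in code.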

The failure mode people hit first is building chains that are too long. Each step adds latency and cost. Each step also adds error surface: if step 3 produces a slightly malformed output, steps 4 through 8 may all degrade silently. A good heuristic is to ask whether a step could be collapsed into the previous one without losing reliability. If yes, collapse it. Keep chains as short as the task allows.

The other failure mode is building chains where the output of each step is a long document fed wholesale to the next step. The model in step 4 does not need all of the output from step 3. Summarize, filter, or extract only what the next step actually needs.

Examples

Summarize document, then translate, then format as HTML
Extract legal clauses, then classify risk level per clause, then generate summary report
Scrape product page, then extract structured fields, then generate ad copy
Transcribe audio, then clean transcript, then extract action items

Go Deeper

Article: Building Effective Agents

Related Patterns