Why AI Agent Patterns Matter in 2026
Your production AI agent just crashed mid-task. Again. The logs show it looped 47 times trying to parse a malformed API response before hitting the iteration limit. Sound familiar?
This is the reality of building AI agents without understanding the core design patterns that power every successful implementation. While GPT-4 solves only 4% of Game of 24 puzzles with chain-of-thought prompting, agents using the Tree of Thoughts planning pattern achieve 74% success. The difference is not the model - it is the architecture.
AI agent design patterns are reusable architectural solutions that solve common problems in autonomous AI systems. They define how agents reason about tasks, take actions, learn from mistakes, and plan multi-step workflows. The three patterns covered in this guide - ReAct, Reflection, and Planning - power every major AI coding assistant, search engine, and automation tool shipping in 2026.
What You Will Learn
This implementation guide provides production-ready code for each pattern:
- ReAct Pattern: Full LangChain and LangGraph implementations with tool integration
- Reflection Pattern: Reflexion agent with memory persistence using vector databases
- Planning Pattern: Plan-and-Execute with adaptive replanning
- Pattern Combinations: How Claude Code and Perplexity combine patterns
- Production Deployment: Error handling, observability, and scaling strategies
All code examples are verified against January 2026 framework versions: LangChain 0.3.x, LangGraph 1.x, OpenAI Agents SDK 0.6.x, and Claude Agent SDK 1.x.
ReAct Pattern: Reasoning + Acting Implementation
ReAct is a pattern that interleaves reasoning traces with action execution, allowing the agent to think through problems step-by-step while taking actions and observing their outcomes. Introduced by Yao et al. in their 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" (ICLR 2023), it remains the foundational pattern for tool-using AI agents.
How ReAct Works
Unlike pure chain-of-thought (reasoning only) or action-only approaches, ReAct creates a feedback loop:
- Thought: Agent reasons about the current state and what action to take next
- Action: Agent invokes a tool (search, calculate, API call, etc.)
- Observation: Agent receives the result from the tool execution
- Repeat: Loop continues until the agent has enough information to answer
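Stripped of any framework, the loop above is just a while-loop. A minimal sketch with a scripted stand-in for the LLM (the `fake_llm` responses and `tools` dict here are illustrative, not a real model or API):

```python
def react_loop(fake_llm, tools, question, max_iterations=10):
    """Bare-bones ReAct loop: ask the model for a thought + action,
    run the tool, feed the observation back, repeat until Finish."""
    transcript = f"Question: {question}"
    for _ in range(max_iterations):
        thought, action, arg = fake_llm(transcript)
        transcript += f"\nThought: {thought}\nAction: {action}[{arg}]"
        if action == "Finish":
            return arg  # final answer
        observation = tools[action](arg)
        transcript += f"\nObservation: {observation}"
    return None  # hit the iteration limit

# Scripted "LLM": decides based on what is already in the transcript
def fake_llm(transcript):
    if "Observation" not in transcript:
        return ("I should look this up", "Search", "capital of France")
    return ("I now know the answer", "Finish", "Paris")

tools = {"Search": lambda q: "Paris is the capital of France."}
print(react_loop(fake_llm, tools, "What is the capital of France?"))  # Paris
```

The real implementations below replace `fake_llm` with an actual model call and `tools` with tool bindings, but the control flow is exactly this.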
ReAct Benchmark Results (Verified from Original Paper)
| Benchmark | Baseline (Act-only) | Chain-of-Thought | ReAct | ReAct + CoT |
|---|---|---|---|---|
| HotpotQA (multi-hop QA) | 29.4% | 34.3% | 34.3% | 47.8% |
| Fever (fact verification) | 56.3% | 64.1% | 71.1% | 69.7% |
| ALFWorld (decision-making) | 45% | - | 79% | - |
| WebShop (e-commerce) | 29.1% | - | 39.3% | - |
Source: Yao et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models" - ICLR 2023
Full Python Implementation with LangChain
Here is a production-ready ReAct agent using LangChain's create_react_agent:
"""
ReAct Agent Implementation with LangChain
Requires: pip install langchain langchain-openai langchain-community
"""
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.tools import tool
from langchain_core.prompts import PromptTemplate
import os
# Set up environment
os.environ["OPENAI_API_KEY"] = "your-api-key"
# Define custom tools
@tool
def search_wikipedia(query: str) -> str:
"""Search Wikipedia for information about a topic.
Use this tool when you need factual information about people, places, events, or concepts.
"""
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
api_wrapper = WikipediaAPIWrapper(top_k_results=2, doc_content_chars_max=1000)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper)
return wiki.run(query)
@tool
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression.
Use this tool for any calculations. Input should be a valid Python math expression.
Examples: "2 + 2", "math.sqrt(16)", "15 * 7 / 3"
"""
import math
try:
result = eval(expression, {"__builtins__": {}, "math": math})
return str(result)
except Exception as e:
return f"Error evaluating expression: {e}"
# Initialize tools and model
tools = [search_wikipedia, calculate]
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# ReAct prompt template
react_prompt = PromptTemplate.from_template('''Answer the following questions as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}''')
# Create ReAct agent
agent = create_react_agent(llm, tools, react_prompt)
# Wrap in executor with error handling
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True, # Show reasoning trace
max_iterations=10, # Prevent infinite loops
handle_parsing_errors=True,
return_intermediate_steps=True # For debugging
)
# Execute agent
def run_react_agent(question: str):
"""Run the ReAct agent and return structured response."""
try:
result = agent_executor.invoke({"input": question})
return {
"answer": result["output"],
"steps": result.get("intermediate_steps", []),
"success": True
}
except Exception as e:
return {
"answer": None,
"error": str(e),
"success": False
}
# Example usage
if __name__ == "__main__":
question = "What is the population of Paris, and what is the square root of that number?"
result = run_react_agent(question)
print(f"\nFinal Answer: {result['answer']}")
ReAct with LangGraph (Full Control)
For maximum flexibility, implement ReAct from scratch with LangGraph's state graph:
"""
ReAct Agent Implementation with LangGraph (from scratch)
Requires: pip install langgraph langchain-openai
"""
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage, ToolMessage, SystemMessage, HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
import json
# Define agent state
class AgentState(TypedDict):
"""State maintained across the agent's reasoning loop."""
messages: Annotated[Sequence[BaseMessage], add_messages]
iteration_count: int
# Define tools
@tool
def search_web(query: str) -> str:
"""Search the web for current information."""
# Replace with real search API in production
return f"Search results for '{query}': [Relevant information here]"
@tool
def run_code(code: str) -> str:
"""Execute Python code and return the result."""
try:
local_vars = {}
exec(code, {"__builtins__": __builtins__}, local_vars)
return str(local_vars.get('result', 'Code executed successfully'))
except Exception as e:
return f"Error: {e}"
# Initialize model with tools
tools = [search_web, run_code]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
model_with_tools = model.bind_tools(tools)
tools_by_name = {tool.name: tool for tool in tools}
# Define graph nodes
def call_model(state: AgentState) -> dict:
"""Call the LLM with current state."""
system_message = SystemMessage(content="""You are a helpful AI assistant that uses tools to answer questions.
For each question:
1. Think about what information you need
2. Use available tools to gather that information
3. Synthesize the observations into a clear answer""")
messages = [system_message] + list(state["messages"])
response = model_with_tools.invoke(messages)
return {
"messages": [response],
"iteration_count": state["iteration_count"] + 1
}
def execute_tools(state: AgentState) -> dict:
"""Execute tool calls from the last message."""
last_message = state["messages"][-1]
outputs = []
for tool_call in last_message.tool_calls:
tool_name = tool_call["name"]
tool_result = tools_by_name[tool_name].invoke(tool_call["args"])
outputs.append(
ToolMessage(
content=tool_result if isinstance(tool_result, str) else json.dumps(tool_result),
name=tool_name,
tool_call_id=tool_call["id"]
)
)
return {"messages": outputs}
def should_continue(state: AgentState) -> str:
"""Determine if agent should continue or stop."""
last_message = state["messages"][-1]
if state["iteration_count"] >= 10:
return "end"
if hasattr(last_message, "tool_calls") and last_message.tool_calls:
return "continue"
return "end"
# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", execute_tools)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue, {"continue": "tools", "end": END})
workflow.add_edge("tools", "agent")
graph = workflow.compile()
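One gotcha when invoking the compiled graph: because `call_model` reads and increments `iteration_count`, the initial state must include it explicitly or the first node raises a `KeyError`. A sketch of a correct invocation (the `make_initial_state` helper is illustrative, not part of LangGraph; LangGraph's `add_messages` reducer accepts `(role, content)` tuples as message shorthand):

```python
def make_initial_state(question: str) -> dict:
    """Build the initial AgentState for one run of the ReAct graph."""
    return {
        "messages": [("user", question)],
        "iteration_count": 0,  # must be present: call_model reads and increments it
    }

state = make_initial_state("What is 17 * 23?")
# result = graph.invoke(state)            # requires a configured OPENAI_API_KEY
# print(result["messages"][-1].content)   # final answer
```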
When to Use ReAct
Ideal for:
- External tool integration (search, databases, APIs)
- Multi-step problem solving requiring intermediate data
- Research tasks with web search
- Interactive debugging scenarios
- Tasks where transparency of reasoning is important
Avoid when:
- Simple direct questions (overhead not justified)
- Latency-critical applications (<500ms required)
- Tasks requiring long-term planning with dependencies
- High-accuracy requirements (add the Reflection pattern instead)
Production Examples Using ReAct
| Product | How ReAct is Used |
|---|---|
| GitHub Copilot Chat | Agent Mode for multi-file edits with RAG + ReAct loop |
| Perplexity AI | Search + reasoning + citation in "Pro Search" mode |
| Claude Code | Tool use with terminal, files, LSP integration |
| ChatGPT Plugins | Function calling loop with observation handling |
Reflection Pattern: Self-Improving Agents
The Reflection pattern adds a self-evaluation layer where agents critique their own outputs, checking for accuracy, verifying constraints, and identifying logical gaps. The breakthrough Reflexion paper by Shinn et al. (NeurIPS 2023) demonstrated that agents can learn through linguistic feedback, improving performance without weight updates.
The Reflexion Architecture
Reflexion implements a three-component system:
- Actor: The LLM that generates solutions (attempts the task)
- Evaluator: Assesses the solution quality (pass/fail with feedback)
- Self-Reflection: Generates verbal feedback on failures, stored in memory
The key insight is that reflections are stored in episodic memory and retrieved for future attempts, enabling the agent to learn from past mistakes.
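That store-and-retrieve step can be illustrated without any vector database. A toy in-memory sketch (keyword overlap stands in for the embedding similarity a production system would use; `EpisodicMemory` is an illustrative name, not from the paper or any library):

```python
class EpisodicMemory:
    """Toy episodic memory: stores reflections and retrieves the most
    relevant ones for a new task by keyword overlap."""

    def __init__(self):
        self.entries = []  # list of (task, reflection) pairs

    def store(self, task: str, reflection: str) -> None:
        self.entries.append((task, reflection))

    def retrieve(self, task: str, k: int = 2) -> list:
        words = set(task.lower().split())
        scored = [
            (len(words & set(past_task.lower().split())), reflection)
            for past_task, reflection in self.entries
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        # Only return reflections that share at least one keyword
        return [reflection for score, reflection in scored[:k] if score > 0]

memory = EpisodicMemory()
memory.store("parse the CSV export", "Remember to handle quoted commas.")
memory.store("sort user records", "Sort keys must be case-insensitive.")
print(memory.retrieve("parse a new CSV file"))  # the CSV reflection surfaces
```

The production version later in this section swaps the keyword overlap for embeddings in Chroma, but the contract is the same: store on failure, retrieve by similarity on the next attempt.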
Reflection Benchmark Results
| Benchmark | GPT-4 Baseline | Reflexion | Improvement |
|---|---|---|---|
| HumanEval (Python) | 80.0% pass@1 | 91.0% | +11% |
| HumanEval (Rust) | 40.6% pass@1 | 55.9% | +15.3% |
| ALFWorld | 24% (2 trials) | 97% (12 trials) | +73% |
| HotpotQA | 31.0% | 51.0% | +20% |
Source: Shinn et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning" - NeurIPS 2023
Full Reflexion Agent Implementation
"""
Reflexion Agent Implementation with LangGraph
Based on Shinn et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning"
"""
from typing import TypedDict, List, Optional, Annotated
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
# State definition
class ReflexionState(TypedDict):
"""State for the Reflexion agent."""
task: str
current_solution: Optional[str]
evaluation: Optional[str]
reflections: List[str] # Memory of past reflections
attempts: int
messages: Annotated[List[BaseMessage], add_messages]
# Initialize models (different models for different roles)
actor_model = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
evaluator_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
reflector_model = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)
MAX_ATTEMPTS = 5
# Actor Node: Generate solution
def actor_node(state: ReflexionState) -> dict:
"""Generate or improve a solution based on task and past reflections."""
reflection_context = ""
if state["reflections"]:
reflection_context = "\n\nPrevious attempts and learnings:\n"
for i, reflection in enumerate(state["reflections"], 1):
reflection_context += f"\nAttempt {i} Reflection:\n{reflection}\n"
prompt = f"""You are an expert problem solver. Generate a solution for the following task.
Task: {state["task"]}
{reflection_context}
Based on any previous reflections, generate an improved solution.
Focus on avoiding past mistakes and incorporating lessons learned.
Provide your solution:"""
response = actor_model.invoke([HumanMessage(content=prompt)])
return {
"current_solution": response.content,
"attempts": state["attempts"] + 1,
"messages": [AIMessage(content=f"Attempt {state['attempts'] + 1}:\n{response.content}")]
}
# Evaluator Node: Assess solution quality
def evaluator_node(state: ReflexionState) -> dict:
"""Evaluate the current solution and determine if it passes."""
prompt = f"""You are a strict evaluator. Assess if this solution correctly solves the task.
Task: {state["task"]}
Solution:
{state["current_solution"]}
Evaluate:
1. Is the solution correct and complete?
2. Are there any bugs, errors, or missing elements?
3. Does it fully address the task requirements?
Respond with exactly one of:
- PASS: [brief explanation]
- FAIL: [specific issues that need to be fixed]"""
response = evaluator_model.invoke([HumanMessage(content=prompt)])
return {
"evaluation": response.content,
"messages": [AIMessage(content=f"Evaluation: {response.content}")]
}
# Self-Reflection Node: Generate improvement insights
def reflection_node(state: ReflexionState) -> dict:
"""Generate verbal reflection on why the solution failed."""
prompt = f"""You are a thoughtful self-reflector. The solution failed evaluation.
Task: {state["task"]}
Failed Solution:
{state["current_solution"]}
Evaluation Feedback:
{state["evaluation"]}
Generate a reflection that:
1. Identifies the specific mistakes made
2. Explains WHY these mistakes occurred
3. Provides concrete strategies to avoid them in the next attempt
4. Suggests specific improvements to make
Your reflection (be specific and actionable):"""
response = reflector_model.invoke([HumanMessage(content=prompt)])
updated_reflections = state["reflections"] + [response.content]
return {
"reflections": updated_reflections,
"messages": [AIMessage(content=f"Reflection: {response.content}")]
}
# Routing logic
def should_continue(state: ReflexionState) -> str:
if state["evaluation"] and state["evaluation"].startswith("PASS"):
return "end"
if state["attempts"] >= MAX_ATTEMPTS:
return "end"
return "reflect"
# Build the Reflexion graph
workflow = StateGraph(ReflexionState)
workflow.add_node("actor", actor_node)
workflow.add_node("evaluator", evaluator_node)
workflow.add_node("reflection", reflection_node)
workflow.set_entry_point("actor")
workflow.add_edge("actor", "evaluator")
workflow.add_conditional_edges("evaluator", should_continue, {"reflect": "reflection", "end": END})
workflow.add_edge("reflection", "actor") # Loop back for retry
reflexion_graph = workflow.compile()
Memory System for Reflection Agents
For production systems, persist reflections in a vector database for semantic retrieval:
"""
Persistent Memory System for Reflexion using Chroma
"""
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
class ReflectionMemory:
"""Persistent memory for storing and retrieving reflections."""
def __init__(self, persist_directory: str = "./reflexion_memory"):
self.embeddings = OpenAIEmbeddings()
self.vectorstore = Chroma(
collection_name="reflections",
embedding_function=self.embeddings,
persist_directory=persist_directory
)
def store_reflection(self, task: str, reflection: str, success: bool):
"""Store a reflection with metadata."""
doc = Document(
page_content=reflection,
metadata={"task": task, "success": success}
)
self.vectorstore.add_documents([doc])
def retrieve_relevant_reflections(self, task: str, k: int = 3) -> list:
"""Retrieve reflections similar to the current task."""
docs = self.vectorstore.similarity_search(task, k=k)
return [
{"reflection": doc.page_content, "task": doc.metadata.get("task")}
for doc in docs
]
Use Cases for Reflection Pattern
| Use Case | Why Reflection Helps | Expected Improvement |
|---|---|---|
| Code Generation | Catches bugs through self-review | +10-15% pass@1 |
| Creative Writing | Iterative quality improvement | Better coherence |
| Complex Reasoning | Validates logical chains | +20% accuracy |
| Test Generation | Improves coverage through reflection | 65% to 85% coverage |
Planning Pattern: Goal-Oriented Decomposition
Plan-and-execute agents separate planning from execution, achieving 92% task accuracy compared to 85% for ReAct patterns. This pattern is essential for complex multi-step workflows where understanding the full task structure upfront leads to better outcomes.
Planning Frameworks Comparison
| Framework | Architecture | Key Innovation |
|---|---|---|
| BabyAGI | Task Creation > Prioritization > Execution | Three-agent task loop |
| AutoGPT | Recursive goal decomposition | Self-prompted planning |
| LangGraph PlanAndExecute | Planner + Executor + Replanner | Adaptive replanning |
| Tree of Thoughts | Branch exploration + backtracking | 74% on Game of 24 |
| ReWOO | Plan-first, execute-all | 80% token reduction |
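The ReWOO row is worth unpacking: instead of alternating LLM calls and tool calls, the planner emits the entire plan up front with placeholder variables (`#E1`, `#E2`, ...) that later steps reference, so execution needs no further LLM round-trips. A minimal sketch with stub tools (the plan format and `run_rewoo_plan` helper are illustrative, not the paper's exact interface):

```python
import re

def run_rewoo_plan(plan: list, tools: dict) -> dict:
    """Execute a ReWOO-style plan. Each step is (evidence_var, tool_name,
    arg_template); arg templates may reference earlier evidence as #E1, #E2, ..."""
    evidence = {}
    for var, tool_name, arg_template in plan:
        # Substitute previously collected evidence into the argument
        arg = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg_template)
        evidence[var] = tools[tool_name](arg)
    return evidence

# Stub tools standing in for real search / calculator calls
tools = {
    "Search": lambda q: "2161000" if "Paris" in q else "unknown",
    "Calc": lambda expr: str(round(eval(expr, {"__builtins__": {}}), 1)),
}

# One LLM call would produce this whole plan; executing it needs zero further calls
plan = [
    ("#E1", "Search", "population of Paris"),
    ("#E2", "Calc", "#E1 ** 0.5"),
]
print(run_rewoo_plan(plan, tools))
```

Because every tool runs against substituted evidence rather than a fresh reasoning trace, the token savings come directly from skipping the per-step Thought/Observation transcript.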
Tree of Thoughts Benchmark Results
Tree of Thoughts demonstrates the power of deliberate planning:
| Task | Chain-of-Thought | Tree of Thoughts | Improvement |
|---|---|---|---|
| Game of 24 | 4% success | 74% success | +70% |
| Creative Writing | 6.19 coherency | 7.67 coherency | +24% |
| Mini Crosswords | <2% solved | 20% solved | +18% |
Source: Yao et al. (2023) "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" - NeurIPS 2023
Plan-and-Execute Implementation with LangGraph
"""
Plan-and-Execute Agent Implementation with LangGraph
Based on Plan-and-Solve paper and BabyAGI project
"""
from typing import TypedDict, List, Optional
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from pydantic import BaseModel, Field
import json
# Pydantic models for structured output
class Task(BaseModel):
"""A single task in the plan."""
id: int = Field(description="Unique task identifier")
description: str = Field(description="What needs to be done")
dependencies: List[int] = Field(default=[], description="IDs of tasks this depends on")
status: str = Field(default="pending")
result: Optional[str] = Field(default=None)
class Plan(BaseModel):
"""A complete plan with tasks."""
goal: str = Field(description="The overall goal")
tasks: List[Task] = Field(description="Ordered list of tasks")
# State definition
class PlanExecuteState(TypedDict):
goal: str
plan: Optional[Plan]
current_task_idx: int
completed_results: List[str]
final_answer: Optional[str]
# Initialize models
planner_model = ChatOpenAI(model="gpt-4o", temperature=0) # Strong planner
executor_model = ChatOpenAI(model="gpt-4o-mini", temperature=0) # Efficient executor
# Planner Node
def planner_node(state: PlanExecuteState) -> dict:
"""Create a plan to achieve the goal."""
prompt = f"""Create a detailed step-by-step plan to achieve this goal.
Goal: {state["goal"]}
Requirements:
1. Break down into discrete, actionable tasks
2. Order tasks logically (dependencies first)
3. Each task should be independently executable
Return as JSON:
{{"goal": "the goal", "tasks": [{{"id": 1, "description": "task", "dependencies": []}}]}}"""
response = planner_model.invoke([HumanMessage(content=prompt)])
# Parse the plan
content = response.content
if "```json" in content:
content = content.split("```json")[1].split("```")[0]
plan_data = json.loads(content)
plan = Plan(**plan_data)
return {"plan": plan, "current_task_idx": 0}
# Executor Node
def executor_node(state: PlanExecuteState) -> dict:
"""Execute the current task."""
plan = state["plan"]
task_idx = state["current_task_idx"]
if task_idx >= len(plan.tasks):
return {}
current_task = plan.tasks[task_idx]
context = ""
if state["completed_results"]:
context = "\n\nPrevious results:\n"
for i, result in enumerate(state["completed_results"]):
context += f"Task {i+1}: {result[:200]}...\n"
prompt = f"""Execute this task:
Goal: {plan.goal}
Current Task: {current_task.description}
{context}
Provide a thorough result:"""
response = executor_model.invoke([HumanMessage(content=prompt)])
current_task.status = "completed"
current_task.result = response.content
updated_results = state["completed_results"] + [response.content]
return {
"plan": plan,
"current_task_idx": task_idx + 1,
"completed_results": updated_results
}
# Synthesizer Node
def synthesizer_node(state: PlanExecuteState) -> dict:
"""Synthesize final answer from all task results."""
plan = state["plan"]
prompt = f"""Synthesize these results into a final answer:
Goal: {plan.goal}
Task Results:
{json.dumps([{"task": t.description, "result": t.result} for t in plan.tasks], indent=2)}
Final Answer:"""
response = executor_model.invoke([HumanMessage(content=prompt)])
return {"final_answer": response.content}
# Routing logic
def should_continue_execution(state: PlanExecuteState) -> str:
if state["current_task_idx"] >= len(state["plan"].tasks):
return "synthesize"
return "execute"
# Build the graph
workflow = StateGraph(PlanExecuteState)
workflow.add_node("planner", planner_node)
workflow.add_node("executor", executor_node)
workflow.add_node("synthesizer", synthesizer_node)
workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges("executor", should_continue_execution, {"execute": "executor", "synthesize": "synthesizer"})
workflow.add_edge("synthesizer", END)
plan_execute_graph = workflow.compile()
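As with any LangGraph state, `invoke` needs every field initialized: `executor_node` reads `completed_results` before anything has written it, so an empty list must be supplied up front. A sketch (the `make_plan_state` helper is illustrative):

```python
def make_plan_state(goal: str) -> dict:
    """Initial PlanExecuteState for one run of the plan-and-execute graph."""
    return {
        "goal": goal,
        "plan": None,             # filled in by planner_node
        "current_task_idx": 0,
        "completed_results": [],  # executor_node reads this on the first task
        "final_answer": None,
    }

state = make_plan_state("Compare the 2024 revenue of Apple and Microsoft")
# result = plan_execute_graph.invoke(state)   # requires a configured OPENAI_API_KEY
# print(result["final_answer"])
```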
ReAct vs Plan-and-Execute Comparison
| Metric | ReAct | Plan-and-Execute |
|---|---|---|
| Response Time | ~2-5s (faster) | ~5-15s |
| Token Usage | 2000-3000 | 3000-4500 |
| Task Accuracy | 85% | 92% |
| API Calls | 3-5 | 5-8 |
| Cost per Task | $0.06-0.09 | $0.09-0.14 |
Choose Plan-and-Execute when: Complex multi-step tasks with dependencies, high-accuracy requirements (financial analysis), long-term planning scenarios, tasks requiring strategic decision-making.
Choose ReAct when: Simple direct objectives, real-time interactive scenarios, cost-sensitive applications, quick responses needed.
Pattern Combinations in Production
Modern production systems rarely use a single pattern. Here is how major AI products combine patterns for optimal results.
Claude Code Architecture
Claude Code combines Reflection and Planning patterns:
- Plan Mode Check: For complex requests, forces planning first
- Planning Phase: Goal decomposition and task prioritization
- ReAct Execution: For each task - thought, action (LSP, terminal, files), observation
- Reflection Check: After key milestones, self-critique and approach updates
Perplexity AI Architecture (200M queries/day)
Perplexity uses ReAct + Multi-Agent:
- Query Analysis: Understand user intent
- Plan Generation: For Pro Search mode
- Retrieval Agent: Search stack execution
- Synthesis Agent: GPT-5/Claude 4.5 for answer generation
- Verification Agent: Citation checking and grounding
Pattern Selection Decision Tree
Use this logic to select the right pattern combination:
- Is the task simple and direct? YES: Use Direct Prompting (no agent needed)
- Does quality matter more than speed? YES: Add Reflection pattern
- Is the task genuinely complex with dependencies? YES: Consider Planning pattern
- Are multiple specialized skills needed? YES: Use Multi-Agent system
- Default: ReAct with appropriate tools
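The decision tree above maps directly onto a routing function. A sketch with boolean flags standing in for the classifier calls a real system would make (`select_pattern` is an illustrative helper, not a library API):

```python
def select_pattern(simple: bool, quality_critical: bool,
                   complex_dependencies: bool, multi_skill: bool) -> list:
    """Map the decision tree onto an ordered list of patterns to apply."""
    if simple:
        return ["direct_prompting"]    # no agent needed
    patterns = []
    if complex_dependencies:
        patterns.append("planning")
    if multi_skill:
        patterns.append("multi_agent")
    if not patterns:
        patterns.append("react")       # the default workhorse
    if quality_critical:
        patterns.append("reflection")  # layered on top for quality
    return patterns

# "Summarize this paragraph" -> direct prompting
print(select_pattern(True, False, False, False))   # ['direct_prompting']
# "Refactor the billing module and verify the result" -> planning + reflection
print(select_pattern(False, True, True, False))    # ['planning', 'reflection']
```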
Combined Pattern Implementation
Here is an adaptive agent that selects patterns based on task complexity:
"""
Combined Pattern Agent: Adaptive Pattern Selection
"""
from typing import TypedDict, List, Optional, Literal
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
class CombinedState(TypedDict):
query: str
complexity: Literal["simple", "medium", "complex"]
plan: Optional[List[str]]
current_step: int
react_history: List[dict]
reflections: List[str]
final_answer: Optional[str]
# Complexity classifier
def classify_complexity(state: CombinedState) -> dict:
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = f"""Classify complexity of this task:
Task: {state["query"]}
Levels:
- SIMPLE: Direct answer, no tools, single step
- MEDIUM: Requires tools, 2-3 steps
- COMPLEX: Multi-step, dependencies, requires planning
Respond with one word: SIMPLE, MEDIUM, or COMPLEX"""
response = model.invoke([{"role": "user", "content": prompt}])
complexity = response.content.strip().upper()
return {"complexity": complexity.lower() if complexity in ["SIMPLE", "MEDIUM", "COMPLEX"] else "medium"}
# Route based on complexity
def route_by_complexity(state: CombinedState) -> str:
if state["complexity"] == "simple":
return "direct"
elif state["complexity"] == "complex":
return "plan"
else:
return "react"
Production Deployment Guide
Error Handling Best Practices
"""
Production-grade error handling for AI agents
"""
from tenacity import retry, stop_after_attempt, wait_exponential
import logging
logger = logging.getLogger(__name__)
class AgentError(Exception):
"""Base exception for agent errors."""
pass
class MaxIterationsError(AgentError):
"""Agent exceeded maximum iterations."""
pass
class ReasoningLoopError(AgentError):
"""Agent stuck in reasoning loop."""
pass
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def safe_tool_call(tool, args):
"""Execute tool with retry logic."""
try:
return tool.invoke(args)
except Exception as e:
logger.warning(f"Tool {tool.name} failed: {e}")
raise
def detect_reasoning_loop(history: list, window: int = 3) -> bool:
"""Detect if agent is stuck repeating the same actions."""
if len(history) < window * 2:
return False
recent = history[-window:]
previous = history[-window*2:-window]
recent_actions = [h.get("action") for h in recent]
previous_actions = [h.get("action") for h in previous]
return recent_actions == previous_actions
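A quick check of the detector's behavior, with the function repeated here so the snippet stands alone: six identical actions trip it, while varied actions and short histories do not.

```python
def detect_reasoning_loop(history: list, window: int = 3) -> bool:
    """True when the last `window` actions exactly repeat the `window` before them."""
    if len(history) < window * 2:
        return False
    recent = [h.get("action") for h in history[-window:]]
    previous = [h.get("action") for h in history[-window * 2:-window]]
    return recent == previous

stuck = [{"action": "search('population of Paris')"}] * 6
making_progress = [{"action": f"search('step {i}')"} for i in range(6)]

print(detect_reasoning_loop(stuck))            # True: same action six times
print(detect_reasoning_loop(making_progress))  # False: actions keep changing
print(detect_reasoning_loop(stuck[:4]))        # False: not enough history yet
```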
Observability Setup
"""
Structured tracing for AI agents
"""
from dataclasses import dataclass, asdict
from datetime import datetime
import json
@dataclass
class AgentTrace:
trace_id: str
timestamp: str
pattern: str # "react", "reflection", "planning"
input_query: str
steps: list
tools_called: list
tokens_used: int
latency_ms: int
success: bool
error: str = None
def create_trace(trace_id, pattern, query, steps, tools, tokens, latency, success, error=None):
trace = AgentTrace(
trace_id=trace_id,
timestamp=datetime.utcnow().isoformat(),
pattern=pattern,
input_query=query,
steps=steps,
tools_called=tools,
tokens_used=tokens,
latency_ms=latency,
success=success,
error=error
)
return asdict(trace)
Cost and Performance Comparison
| Pattern | Avg Latency | Token Overhead | Cost/Query |
|---|---|---|---|
| Direct Prompting | 1-2s | Baseline | $0.01-0.02 |
| ReAct (3 steps) | 5-10s | +200-300% | $0.06-0.09 |
| Reflection (2 iter) | 8-15s | +100-200% | $0.08-0.12 |
| Plan-and-Execute | 10-20s | +300-400% | $0.12-0.18 |
| Combined (all) | 15-30s | +500-600% | $0.15-0.25 |
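These per-query figures compound quickly at scale. A back-of-envelope projection using the upper bounds from the table above (the query volume is an illustrative assumption):

```python
# Upper-bound cost per query from the table above, in USD
cost_per_query = {
    "direct": 0.02,
    "react": 0.09,
    "reflection": 0.12,
    "plan_and_execute": 0.18,
    "combined": 0.25,
}

queries_per_day = 10_000  # illustrative volume

for pattern, cost in cost_per_query.items():
    monthly = cost * queries_per_day * 30
    print(f"{pattern:>18}: ${monthly:,.0f}/month")
```

At 10,000 queries a day, the gap between direct prompting and combined patterns is tens of thousands of dollars a month, which is why routing by complexity matters.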
Framework Recommendations (January 2026)
| Use Case | Recommended Framework | Why |
|---|---|---|
| General agents | LangGraph | Flexible, production-ready |
| OpenAI-native | OpenAI Agents SDK | Best GPT integration |
| Anthropic-native | Claude Agent SDK | MCP support, tool use |
| Multi-agent | CrewAI or AutoGen | Role-based collaboration |
| Simple prototyping | LangChain AgentExecutor | Quick start |
Getting Started: Your Implementation Roadmap
Step-by-Step Implementation Checklist
- Start with ReAct: Implement a basic ReAct agent with 2-3 tools. This handles 80% of use cases and builds foundational understanding.
- Add Observability: Implement structured tracing from day one. You cannot debug what you cannot observe.
- Implement Loop Detection: Add maximum iteration limits and action pattern detection to prevent infinite loops.
- Add Reflection for Quality: Once ReAct works, add a reflection step for code generation or high-stakes outputs.
- Implement Planning for Complexity: For multi-step workflows, add plan-and-execute with adaptive replanning.
- Combine Patterns: Use complexity classification to route requests to appropriate pattern combinations.
Common Pitfalls and Solutions
| Pitfall | Solution |
|---|---|
| Agent loops forever | Set max_iterations, implement loop detection, add circuit breakers |
| Tool errors crash agent | Use retry with exponential backoff, handle_parsing_errors=True |
| Costs spiral out of control | Route by complexity, use smaller models for execution |
| Reflection does not improve quality | Use different models for actor/evaluator, be specific in evaluation prompts |
| Plans are too vague | Use stronger model for planning (GPT-4o), require specific task descriptions |
Required Dependencies
```shell
# Core frameworks
pip install langchain langchain-openai langgraph

# Vector database for memory
pip install chromadb

# Error handling
pip install tenacity

# Optional: alternative frameworks
pip install crewai pyautogen openai-agents
```
Next Steps
Ready to implement these patterns in your own projects? Here is what to do next:
- Clone the code examples from the LangGraph tutorials: https://langchain-ai.github.io/langgraph/tutorials/
- Read the original papers for deeper understanding of the theory
- Subscribe to our YouTube channel for video walkthroughs of these implementations
Frequently Asked Questions
What is the ReAct pattern in AI agents?
ReAct (Reasoning and Acting) is an agent pattern that interleaves reasoning traces with action execution in a Thought-Action-Observation loop. The agent thinks about what to do, takes an action using external tools, observes the result, and uses that information for the next reasoning step. ReAct achieves 47.8% accuracy on HotpotQA multi-hop QA tasks versus 29.4% baseline.
How does the Reflection pattern improve AI agent performance?
The Reflection pattern enables agents to critique and improve their own outputs through self-evaluation. After generating an initial response, the agent assesses it for accuracy, identifies gaps, and iteratively refines the output. Reflexion agents achieve 91% pass@1 on HumanEval coding benchmarks versus GPT-4's baseline of 80%, without any fine-tuning.
When should I use Plan-and-Execute instead of ReAct?
Use Plan-and-Execute for complex multi-step tasks with dependencies, high-accuracy requirements, and long-term planning scenarios. It achieves 92% task accuracy versus 85% for ReAct but costs 2x more in API calls. ReAct is better for simple objectives requiring quick responses and real-time interactive scenarios.
What frameworks support AI agent patterns in 2026?
LangGraph (LangChain) is the leading framework with native support for ReAct, Reflection, and Plan-and-Execute patterns. OpenAI Agents SDK 0.6.x, Claude Agent SDK 1.x, CrewAI, and AutoGen also provide production-ready implementations. LangGraph offers the most flexible graph-based architecture for custom agent workflows.
How do production AI systems combine these patterns?
Real production systems combine 2-3 patterns for optimal results. Perplexity uses ReAct plus Multi-Agent architecture for search with separate retrieval, synthesis, and verification agents. Claude Code uses Reflection plus Planning with a plan mode that forces architectural thinking before execution. GitHub Copilot Chat uses ReAct with RAG for multi-file code edits.
What benchmarks prove these patterns work?
Verified benchmarks from academic papers show: ReAct achieves 47.8% on HotpotQA (vs 29.4% baseline), Reflexion reaches 91% on HumanEval (vs GPT-4's 80%), Tree of Thoughts solves 74% of Game of 24 puzzles (vs 4% for chain-of-thought), and Plan-and-Execute achieves 92% task accuracy (vs 85% for ReAct alone).
How do I detect and prevent reasoning loops in AI agents?
Implement loop detection by tracking the last N actions in a sliding window and comparing for repeated patterns. Set maximum iteration limits (typically 10-15 iterations). Use exponential backoff with retry logic for tool failures. Monitor token usage and implement circuit breakers that fall back to simpler patterns when agents exceed thresholds.
What is the cost difference between AI agent patterns?
Direct prompting costs approximately $0.01-0.02 per query. ReAct with 3 steps costs $0.06-0.09 with 200-300% token overhead. Reflection with 2 iterations costs $0.08-0.12. Plan-and-Execute costs $0.12-0.18 with 300-400% token overhead. Combined patterns can cost $0.15-0.25 per query with 500-600% token overhead.
Which pattern should I start with for my first AI agent?
Start with the ReAct pattern for your first AI agent. It is the most widely understood, battle-tested, and suitable for 80% of use cases. ReAct provides a good balance of capability and simplicity. Once you have ReAct working, add Reflection for quality-critical tasks like code generation, or Planning for complex multi-step workflows.
How do I implement memory persistence for Reflection agents?
Use a vector database like Chroma or Pinecone to store reflections with embeddings for semantic similarity search. Store metadata including the original task, success/failure status, and timestamp. Retrieve relevant reflections by similarity to the current task, filtering by outcome type. This enables agents to learn from past failures without weight updates.
Conclusion
The three AI agent design patterns covered in this guide - ReAct, Reflection, and Planning - represent the architectural foundation of every major AI coding assistant, search engine, and automation tool shipping in 2026.
ReAct grounds reasoning in tool observations, achieving 47.8% accuracy on multi-hop QA versus 29.4% baseline. Reflection enables self-improvement that reaches 91% on HumanEval, surpassing GPT-4's 80%. Planning unlocks complex problem-solving, with Tree of Thoughts achieving 74% on puzzles that chain-of-thought solves only 4% of the time.
Production systems like Claude Code, Perplexity, and GitHub Copilot combine these patterns for optimal results. The key is not choosing a single pattern, but understanding when to apply each and how to combine them effectively.
Start with ReAct for your first agent. Add Reflection for quality-critical outputs. Use Planning for complex multi-step workflows. Implement observability from day one. Your production AI agents will thank you.