Introduction
Most LLM systems are fundamentally reactive: you ask a question, they generate an answer, and that’s it. If the first answer is wrong, there’s no self-correction. If the task requires multiple steps, there’s no iteration. If results don’t meet expectations, there’s no refinement.
The React Loop (an iterative extension of the ReAct pattern: Reason + Act) changes this paradigm entirely. It transforms a static, one-shot LLM system into a dynamic, iterative agent that can:
- Sense its environment and gather context
- Reason about what actions to take
- Act by executing tools and generating responses
- Observe the results of its actions
- Evaluate whether it succeeded or needs to try again
- Learn from outcomes to improve future iterations
The core insight:
Real-world problems rarely have single-shot solutions. They require iteration, feedback, and refinement—exactly what the React Loop provides.
Think of it as:
- A closed-loop control system for AI agents
- The OODA loop (Observe-Orient-Decide-Act) for LLMs
- Test-driven development for AI reasoning
- Continuous integration for agent decision-making
Why this matters:
Traditional approaches are fundamentally limited:
| Single-Shot LLM | RAG System | React Loop Agent |
|---|---|---|
| Generate once, hope it’s right | Retrieve once, generate | Iterate until successful |
| No error recovery | No retrieval refinement | Self-correcting |
| Linear execution | Static retrieval | Dynamic planning |
| Not auditable | Limited observability | Full traceability |
When React Loops shine:
- Complex multi-step tasks (system debugging, data analysis)
- Uncertain environments (incomplete information, changing requirements)
- Quality-critical outputs (code generation, research synthesis)
- Multi-tool orchestration (file operations, API calls, database queries)
What you’ll learn:
- The six-layer architecture of React Loops
- Three production implementation patterns (planning, RAG, multi-agent)
- Loop control logic and termination conditions
- Memory integration for cross-iteration consistency
- Safety, observability, and failure mode handling
- Complete Python implementation with real examples
This guide takes you from understanding the fundamentals to deploying production-grade React Loop agents that can solve complex problems autonomously.
1. Why the React Loop Matters
The limitations of traditional LLM workflows
Traditional LLM systems suffer from fundamental architectural constraints:
1. Linear execution
# Traditional approach: One-shot generation
query = "Debug why the authentication service is failing"
response = llm.generate(query)
# If response is incomplete or wrong, you're stuck
Problem: No mechanism to gather more information, try different approaches, or refine the answer.
2. Single-shot responses
# Traditional RAG
query = "What caused the production outage yesterday?"
context = vector_db.search(query, top_k=5)
answer = llm.generate(f"Context: {context}\nQuestion: {query}")
# What if the retrieved context doesn't contain the root cause?
# What if you need to search logs, then metrics, then recent deployments?
# Traditional system: Give up or return incomplete answer
Problem: Cannot iterate on retrieval or explore multiple sources.
3. Hard to debug
# Traditional system
result = black_box_llm(query)
# Why did it produce this output?
# What information did it consider?
# Where did reasoning go wrong?
# → No visibility into decision process
Problem: Opaque reasoning, no audit trail, impossible to debug failures.
4. Poor at iterative problem-solving
# Traditional approach to complex task
query = "Find all security vulnerabilities in this codebase and fix them"
response = llm.generate(query)
# Requires:
# 1. Understand codebase structure
# 2. Identify vulnerability patterns
# 3. Check each file
# 4. Generate fixes
# 5. Verify fixes don't break functionality
# → Too complex for single-shot generation
Problem: Multi-step tasks require planning, execution, validation, and refinement—impossible in one shot.
What the React Loop solves
1. Iteration over actions
# React Loop approach
loop = ReactLoop(max_iterations=10)
while not loop.is_complete():
    # Try, evaluate, refine, repeat
    action = loop.plan_next_action()
    result = loop.execute(action)
    evaluation = loop.evaluate(result)
    if evaluation.success:
        break
    loop.refine_approach(evaluation.feedback)
Benefit: Can try multiple approaches until success.
2. Feedback incorporation
# Iteration 1
action = "Search logs for 'error'"
result = "Found 1000 errors"
evaluation = "Too broad, need to filter"
# Iteration 2 (refined based on feedback)
action = "Search logs for 'authentication error' in last hour"
result = "Found 3 relevant errors"
evaluation = "Specific enough, proceed with analysis"
Benefit: Each iteration improves based on previous results.
3. Dynamic planning
# React Loop adjusts plan based on what it discovers
initial_plan = ["Check service health", "Review logs", "Analyze metrics"]
# After checking service health, discovers unexpected issue
revised_plan = [
"Investigate database connection pool exhaustion", # New priority
"Check recent configuration changes",
"Review resource utilization"
]
Benefit: Adapts strategy as new information emerges.
4. Self-monitoring
# React Loop tracks its own progress
loop_state = {
"iterations": 3,
"confidence": 0.65, # Not yet confident
"actions_taken": ["search_logs", "query_database", "check_config"],
"findings": ["connection_timeout", "pool_exhausted"],
"next_action": "investigate_pool_configuration"
}
# Can make meta-decisions: "I'm not making progress, try different approach"
Benefit: Awareness of progress, ability to change strategy when stuck.
The result: Autonomous, auditable agents
Autonomous:
# Agent handles complex task end-to-end
task = "Find and fix the production authentication bug"
agent = ReactLoopAgent(task)
result = agent.run()
# Agent internally:
# - Checked logs (iteration 1)
# - Found error pattern (iteration 2)
# - Traced to configuration (iteration 3)
# - Generated fix (iteration 4)
# - Validated fix (iteration 5)
# - Deployed (iteration 6)
Auditable:
# Full audit trail of decision process
print(agent.get_trace())
"""
Iteration 1: Searched logs for authentication errors
Action: search_logs(pattern='auth', time_range='1h')
Result: Found 47 errors
Evaluation: Too many, need to narrow down
Iteration 2: Filtered to critical errors
Action: search_logs(pattern='auth AND severity:critical')
Result: Found 3 errors, all from auth-service
Evaluation: Specific, proceed to root cause
Iteration 3: Analyzed auth-service configuration
Action: read_config('auth-service')
Result: Found misconfigured JWT secret rotation
Evaluation: Root cause identified
...
"""
2. Mental Model
The six-phase control loop
Think of the React Loop as a closed-loop control system for AI agents:
┌─────────────────────────────────────────────────────────┐
│ React Loop Cycle │
└─────────────────────────────────────────────────────────┘
┌──────────────┐
│ Perception │ ← Sense environment, gather context
└──────┬───────┘
↓
┌──────────────┐
│ Reasoning │ ← Plan next action, prioritize
└──────┬───────┘
↓
┌──────────────┐
│ Action │ ← Execute tools, generate output
└──────┬───────┘
↓
┌──────────────┐
│ Observation │ ← Gather results, logs, metrics
└──────┬───────┘
↓
┌──────────────┐
│ Evaluation │ ← Check success, identify issues
└──────┬───────┘
↓
┌──────────────┐
│Memory Update │ ← Store learnings, update state
└──────┬───────┘
↓
↺ (Repeat until complete or max iterations)
The six layers explained
Layer 1: Perception
- Purpose: Understand current state
- Inputs: User query, environment state, memory, tool outputs
- Outputs: Normalized context for reasoning
Layer 2: Reasoning
- Purpose: Decide what to do next
- Inputs: Perception context, goal, past actions
- Outputs: Structured action plan
Layer 3: Action
- Purpose: Execute decisions
- Inputs: Action plan from reasoning
- Outputs: Tool results, generated responses
Layer 4: Observation
- Purpose: Gather execution results
- Inputs: Action outputs, logs, metrics
- Outputs: Structured observations
Layer 5: Evaluation
- Purpose: Assess progress toward goal
- Inputs: Observations, goal, success criteria
- Outputs: Success/failure signal, confidence score, feedback
Layer 6: Memory Update
- Purpose: Learn from iteration
- Inputs: Actions taken, results, evaluation
- Outputs: Updated memory state
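Before looking at each layer in depth, here is how the six layers compose into a single loop. This is a minimal sketch: the layer classes are implemented in section 3, and the 0.9 confidence threshold is an assumed default (section 4 covers richer termination logic).
async def react_loop(goal, perception, reasoning, action, observation,
                     evaluation_layer, memory_update, memory, max_iterations=10):
    """Minimal six-phase loop; the layer classes are defined in section 3."""
    for iteration in range(max_iterations):
        context = await perception.perceive(goal, iteration)             # 1. Perceive
        plan = await reasoning.reason(context)                           # 2. Reason
        result = await action.execute(plan)                              # 3. Act
        obs = await observation.observe(plan, result)                    # 4. Observe
        evaluation = await evaluation_layer.evaluate(goal, obs, memory)  # 5. Evaluate
        await memory_update.update(plan, obs, evaluation)                # 6. Update memory
        if evaluation.next_step == "stop" or evaluation.confidence >= 0.9:
            break  # assumed termination rule; see section 4 for full control logic
    return memory.get_findings()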
Concrete example: Debugging workflow
# Task: "Find why API response time increased"
# Iteration 1
Perception: User reports slow API, no other context
Reasoning: Need to gather metrics first
Action: query_metrics(service='api', metric='latency', time_range='24h')
Observation: Latency increased from 100ms to 800ms at 2pm
Evaluation: Found when, but not why. Need to investigate further.
Memory Update: Store timestamp of latency spike
# Iteration 2
Perception: Latency spike at 2pm, need to find cause
Reasoning: Check what changed around 2pm
Action: query_deployment_log(time_range='1pm-3pm')
Observation: Deployment of v2.3.1 at 1:55pm
Evaluation: Correlation found, but need to confirm causation
Memory Update: Store deployment as potential cause
# Iteration 3
Perception: Deployment v2.3.1 correlates with latency increase
Reasoning: Compare v2.3.1 changes to v2.3.0
Action: git_diff(from='v2.3.0', to='v2.3.1')
Observation: Added synchronous database call in hot path
Evaluation: Root cause identified! Synchronous call blocks requests.
Memory Update: Store root cause: synchronous DB call added
# Iteration 4
Perception: Root cause is synchronous DB call
Reasoning: Need to fix by making it asynchronous
Action: generate_fix(issue='synchronous_db_call', approach='async')
Observation: Generated code patch using async/await
Evaluation: Fix looks correct, need to validate
Memory Update: Store proposed fix
# Iteration 5
Perception: Have proposed fix, need validation
Reasoning: Run tests to ensure fix works
Action: run_tests(patch='async_db_fix.patch')
Observation: All tests pass, latency reduced to 120ms in staging
Evaluation: Fix validated! Ready to deploy.
Memory Update: Mark task complete
Key insight: Each iteration builds on previous learnings, progressively narrowing toward the solution.
3. Core Components of a React Loop
3.1 Perception Layer
Purpose: Collect and normalize all relevant context for decision-making.
Implementation:
class PerceptionLayer:
def __init__(self, memory: AgentMemory, tools: ToolRegistry):
self.memory = memory
self.tools = tools
async def perceive(
self,
user_input: str,
iteration: int
) -> PerceptionContext:
"""
Gather all context needed for reasoning
"""
# 1. User input (current query or feedback)
normalized_input = self.normalize_input(user_input)
# 2. Memory (what agent knows)
working_memory = self.memory.get_working_memory()
long_term_memory = self.memory.relevant_memories(user_input)
# 3. Environment state (tool availability, system status)
available_tools = self.tools.list_available()
system_state = self.get_system_state()
# 4. Past actions (what's been tried)
action_history = self.memory.get_action_history()
return PerceptionContext(
input=normalized_input,
working_memory=working_memory,
long_term_memory=long_term_memory,
available_tools=available_tools,
system_state=system_state,
action_history=action_history,
iteration=iteration
)
Example perception context:
perception = PerceptionContext(
input="Find security vulnerabilities in authentication code",
working_memory={
"task": "security_audit",
"target": "auth_service",
"files_checked": ["auth.py", "jwt.py"],
"findings": ["weak_password_hash"]
},
long_term_memory=[
"User prefers detailed security reports",
"Previous audit found XSS vulnerabilities"
],
available_tools=[
"read_file", "grep", "static_analysis", "run_tests"
],
system_state={
"codebase": "/app/auth",
"language": "python"
},
action_history=[
{"action": "read_file", "file": "auth.py", "result": "success"},
{"action": "static_analysis", "tool": "bandit", "result": "3 issues"}
],
iteration=2
)
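The PerceptionContext above can be a plain dataclass. A minimal sketch matching the fields used in this guide (the shape is illustrative, not a fixed API):
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class PerceptionContext:
    """Normalized context handed from Perception to Reasoning."""
    input: str
    working_memory: Dict[str, Any] = field(default_factory=dict)
    long_term_memory: List[str] = field(default_factory=list)
    available_tools: List[str] = field(default_factory=list)
    system_state: Dict[str, Any] = field(default_factory=dict)
    action_history: List[Dict[str, Any]] = field(default_factory=list)
    iteration: int = 0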
3.2 Reasoning Layer
Purpose: Generate the next action plan based on current context.
Implementation:
class ReasoningLayer:
def __init__(self, llm: LLM):
self.llm = llm
async def reason(
self,
context: PerceptionContext
) -> ActionPlan:
"""
Decide what to do next
"""
prompt = f"""
You are an autonomous agent working on a task.
Current situation:
Task: {context.input}
Iteration: {context.iteration}
Working memory: {context.working_memory}
Past actions: {context.action_history}
Available tools: {context.available_tools}
Based on this context, what should be the next action?
Respond in JSON format:
{{
"reasoning": "Why this action makes sense",
"action": "tool_name",
"parameters": {{}},
"expected_outcome": "What we expect to learn/achieve",
"confidence": 0.0-1.0
}}
"""
response = await self.llm.generate(prompt)
plan = ActionPlan.from_json(response)
return plan
Example reasoning output:
plan = ActionPlan(
reasoning="Found weak password hashing in auth.py. Need to check if JWT implementation has timing vulnerabilities.",
action="static_analysis",
parameters={
"tool": "semgrep",
"rules": "jwt-security",
"target": "jwt.py"
},
expected_outcome="Identify timing attack vulnerabilities in JWT validation",
confidence=0.85
)
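ActionPlan.from_json does the unglamorous work of turning raw LLM output into a typed object. A minimal sketch, assuming the JSON prompt format above (the fence-stripping step is a defensive convention, not a library feature); Evaluation.from_json in section 3.5 can follow the same pattern:
import json
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class ActionPlan:
    reasoning: str
    action: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    expected_outcome: str = ""
    confidence: float = 0.0

    @classmethod
    def from_json(cls, response: str) -> "ActionPlan":
        """Parse an LLM response, tolerating markdown code fences."""
        text = response.strip()
        if text.startswith("```"):
            # Drop a leading ```json fence line and a trailing ``` fence
            text = text.split("\n", 1)[1] if "\n" in text else text
            text = text.rsplit("```", 1)[0]
        data = json.loads(text)
        return cls(
            reasoning=data.get("reasoning", ""),
            action=data["action"],  # required: the tool to invoke
            parameters=data.get("parameters", {}),
            expected_outcome=data.get("expected_outcome", ""),
            confidence=float(data.get("confidence", 0.0)),
        )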
3.3 Action Layer
Purpose: Execute the planned action safely and return results.
Implementation:
class ActionLayer:
def __init__(self, tools: ToolRegistry, safety: SafetyChecker):
self.tools = tools
self.safety = safety
async def execute(
self,
plan: ActionPlan
) -> ActionResult:
"""
Execute action with safety checks
"""
        # 1. Pre-execution safety check (is_safe returns a (bool, reason) tuple,
        # matching the SafetyChecker defined in section 7.2)
        is_safe, block_reason = self.safety.is_safe(plan)
        if not is_safe:
            return ActionResult(
                success=False,
                error="Action blocked by safety policy",
                details=block_reason
            )
# 2. Execute tool
try:
tool = self.tools.get(plan.action)
result = await tool.execute(**plan.parameters)
return ActionResult(
success=True,
data=result,
execution_time=result.elapsed_ms,
logs=result.logs
)
except Exception as e:
return ActionResult(
success=False,
error=str(e),
exception_type=type(e).__name__
)
Example action execution:
# Action: Run security analysis
action_result = ActionResult(
success=True,
data={
"tool": "semgrep",
"findings": [
{
"rule": "jwt-timing-attack",
"severity": "high",
"file": "jwt.py",
"line": 42,
"message": "JWT signature comparison vulnerable to timing attacks"
}
]
},
execution_time=1250, # ms
logs=["Scanned jwt.py", "Applied 15 security rules", "Found 1 high severity issue"]
)
3.4 Observation Layer
Purpose: Standardize action results for evaluation.
Implementation:
class ObservationLayer:
async def observe(
self,
action: ActionPlan,
result: ActionResult
) -> Observation:
"""
Structure results for evaluation
"""
return Observation(
action_taken=action.action,
parameters=action.parameters,
success=result.success,
data=result.data,
error=result.error if not result.success else None,
execution_time=result.execution_time,
logs=result.logs,
timestamp=datetime.utcnow()
)
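As at the other layer boundaries, Observation can be a simple dataclass. A sketch matching the fields used above:
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

@dataclass
class Observation:
    """Structured record of one executed action, passed to Evaluation."""
    action_taken: str
    parameters: Dict[str, Any]
    success: bool
    data: Any = None
    error: Optional[str] = None
    execution_time: Optional[float] = None
    logs: List[str] = field(default_factory=list)
    timestamp: datetime = field(default_factory=datetime.utcnow)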
3.5 Evaluation Layer
Purpose: Assess whether the action moved us closer to the goal.
Implementation:
class EvaluationLayer:
def __init__(self, llm: LLM):
self.llm = llm
async def evaluate(
self,
goal: str,
observation: Observation,
memory: AgentMemory
) -> Evaluation:
"""
Determine if we're making progress
"""
prompt = f"""
Goal: {goal}
Action taken: {observation.action_taken}
Parameters: {observation.parameters}
Result: {observation.data}
Success: {observation.success}
Previous findings: {memory.get_findings()}
Evaluate:
1. Did this action help achieve the goal?
2. What did we learn?
3. Should we continue, refine approach, or stop?
4. Confidence in current progress (0-1)
Respond in JSON format:
{{
"progress_made": true/false,
"learnings": ["key insight 1", "key insight 2"],
"next_step": "continue" / "refine" / "stop",
"confidence": 0.0-1.0,
"feedback": "What to do differently next time"
}}
"""
response = await self.llm.generate(prompt)
evaluation = Evaluation.from_json(response)
return evaluation
Example evaluation:
evaluation = Evaluation(
progress_made=True,
learnings=[
"JWT implementation has timing attack vulnerability on line 42",
"Using string comparison instead of constant-time comparison"
],
next_step="continue", # Found issue, now need to generate fix
confidence=0.90,
feedback="Found critical security issue. Next: generate fix using constant-time comparison."
)
3.6 Memory Update Layer
Purpose: Store learnings and update state for next iteration.
Implementation:
class MemoryUpdateLayer:
def __init__(self, memory: AgentMemory):
self.memory = memory
async def update(
self,
action: ActionPlan,
observation: Observation,
evaluation: Evaluation
) -> None:
"""
Update agent memory with new information
"""
# 1. Update action history
self.memory.add_action(
action=action.action,
parameters=action.parameters,
result=observation.data,
success=observation.success
)
# 2. Store learnings
for learning in evaluation.learnings:
self.memory.add_learning(learning)
# 3. Update working memory
if evaluation.progress_made:
self.memory.update_working_memory(
findings=evaluation.learnings,
confidence=evaluation.confidence
)
# 4. Store for long-term if significant
if evaluation.confidence > 0.8:
self.memory.store_long_term(
task_type="security_audit",
finding=evaluation.learnings,
confidence=evaluation.confidence
)
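The AgentMemory threaded through these layers is left abstract in this guide. Below is a minimal in-process sketch whose method names follow their usage in the examples; a production version would back long-term storage with a vector store as shown in section 6.
from datetime import datetime
from typing import Any, Dict, List

class AgentMemory:
    """Minimal in-memory store matching the calls used throughout this guide."""
    def __init__(self):
        self.goal: str = ""
        self.actions: List[Dict[str, Any]] = []
        self.learnings: List[str] = []
        self.working: Dict[str, Any] = {}
        self.long_term: List[Dict[str, Any]] = []

    def set_goal(self, goal: str):
        self.goal = goal

    def add_action(self, action, parameters, result, success):
        self.actions.append({
            "action": action, "parameters": parameters,
            "result": result, "success": success,
            "timestamp": datetime.utcnow().isoformat()
        })

    def add_learning(self, learning: str):
        # Deduplicate so repeated iterations don't re-record known findings
        if learning not in self.learnings:
            self.learnings.append(learning)

    def update_working_memory(self, findings, confidence):
        self.working.update({"findings": findings, "confidence": confidence})

    def store_long_term(self, **record):
        self.long_term.append(record)

    def get_action_history(self) -> List[Dict[str, Any]]:
        return self.actions

    def get_working_memory(self) -> Dict[str, Any]:
        return self.working

    def get_findings(self) -> List[str]:
        return self.learnings

    def relevant_memories(self, query: str) -> List[str]:
        # Naive keyword overlap; a real implementation would use vector search
        words = query.lower().split()
        return [m for m in self.learnings if any(w in m.lower() for w in words)]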
4. Loop Control Logic
4.1 Iteration Rules
Critical: React Loops must have bounded execution to prevent runaway behavior.
Implementation:
class LoopController:
    def __init__(
        self,
        max_iterations: int = 10,
        success_threshold: float = 0.9,
        timeout_seconds: int = 300
    ):
        self.max_iterations = max_iterations
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.start_time = None
        self.evaluation_history: List[Evaluation] = []  # used by _detect_stagnation

    def start(self):
        """Begin timeout tracking (call once before the first iteration)"""
        self.start_time = time.time()
def should_continue(
self,
iteration: int,
evaluation: Evaluation
) -> Tuple[bool, str]:
"""
Determine if loop should continue
Returns: (should_continue, reason)
"""
# 1. Check max iterations
if iteration >= self.max_iterations:
return False, f"Max iterations reached ({self.max_iterations})"
# 2. Check success threshold
if evaluation.confidence >= self.success_threshold:
return False, f"Success threshold met (confidence: {evaluation.confidence:.2%})"
# 3. Check timeout
if self.start_time:
elapsed = time.time() - self.start_time
if elapsed > self.timeout_seconds:
return False, f"Timeout reached ({elapsed:.1f}s)"
# 4. Check evaluation signal
if evaluation.next_step == "stop":
return False, "Agent decided to stop"
# 5. Check repeated failures
if self._detect_stagnation(evaluation):
return False, "No progress in last 3 iterations"
# Continue iterating
return True, "Continuing iteration"
def _detect_stagnation(self, evaluation: Evaluation) -> bool:
"""Detect if agent is stuck"""
# Track last N evaluations
recent_evaluations = self.evaluation_history[-3:]
if len(recent_evaluations) < 3:
return False
# All recent iterations made no progress
no_progress = all(not e.progress_made for e in recent_evaluations)
# Confidence not improving
confidence_flat = (
recent_evaluations[-1].confidence - recent_evaluations[0].confidence < 0.05
)
return no_progress and confidence_flat
Example usage:
controller = LoopController(
max_iterations=10,
success_threshold=0.90,
timeout_seconds=300 # 5 minutes
)
controller.start()  # begin timeout tracking
iteration = 0
while True:
# Execute iteration
evaluation = agent.iterate()
# Check if should continue
should_continue, reason = controller.should_continue(iteration, evaluation)
if not should_continue:
print(f"Loop terminating: {reason}")
break
iteration += 1
4.2 Decision Criteria
Factors that influence loop termination:
1. Confidence levels
class ConfidenceBasedControl:
def __init__(self):
self.confidence_thresholds = {
"critical_task": 0.95, # High stakes, need high confidence
"exploratory": 0.70, # Discovery mode, lower threshold
"routine": 0.85 # Standard tasks
}
def should_stop(
self,
task_type: str,
current_confidence: float
) -> bool:
threshold = self.confidence_thresholds.get(task_type, 0.85)
return current_confidence >= threshold
2. Task completeness
class TaskCompletenessChecker:
    def __init__(self, goal: str):
        self.goal = goal
        self.required_steps = self.decompose_goal(goal)
        self.completed_steps = set()

    def decompose_goal(self, goal: str) -> List[str]:
        """Break the goal into required steps (in practice, via an LLM or task template)"""
        return [goal]  # trivial fallback so the class works standalone
def is_complete(self) -> bool:
"""Check if all required steps are done"""
return self.completed_steps >= set(self.required_steps)
def mark_complete(self, step: str):
"""Mark step as completed"""
if step in self.required_steps:
self.completed_steps.add(step)
3. Resource limits
class ResourceLimits:
def __init__(
self,
max_cost_usd: float = 1.0,
max_api_calls: int = 50,
max_time_seconds: int = 600
):
self.max_cost = max_cost_usd
self.max_api_calls = max_api_calls
self.max_time = max_time_seconds
self.current_cost = 0.0
self.current_api_calls = 0
self.start_time = time.time()
def check_limits(self) -> Tuple[bool, Optional[str]]:
"""Returns (within_limits, violation_message)"""
if self.current_cost > self.max_cost:
return False, f"Cost limit exceeded (${self.current_cost:.2f})"
if self.current_api_calls > self.max_api_calls:
return False, f"API call limit exceeded ({self.current_api_calls})"
elapsed = time.time() - self.start_time
if elapsed > self.max_time:
return False, f"Time limit exceeded ({elapsed:.1f}s)"
return True, None
def record_call(self, cost: float):
"""Record API call and cost"""
self.current_cost += cost
self.current_api_calls += 1
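Wiring these limits into the loop is one check per iteration. A usage sketch (the $0.01 per-call cost and the agent.iterate() hook are illustrative assumptions):
limits = ResourceLimits(max_cost_usd=2.0, max_api_calls=50, max_time_seconds=600)
while True:
    within_limits, violation = limits.check_limits()
    if not within_limits:
        print(f"Loop terminating: {violation}")
        break
    evaluation = agent.iterate()   # one full perceive→reason→act→evaluate cycle
    limits.record_call(cost=0.01)  # assumed cost of one LLM call
    if evaluation.next_step == "stop":
        break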
5. Implementation Patterns
Pattern 1: Agentic Planning Loop
Use case: Complex tasks requiring dynamic planning and execution.
Architecture:
class AgenticPlanningLoop:
    def __init__(self, llm: LLM, tools: ToolRegistry):
        self.llm = llm
        self.tools = tools
        self.memory = AgentMemory()
        self.resources = ResourceBudget()  # cost guard used by is_feasible (see section 8)
async def run(self, goal: str) -> Result:
"""
Execute agentic planning loop
"""
# Initialize
self.memory.set_goal(goal)
iteration = 0
max_iterations = 10
while iteration < max_iterations:
# 1. Perceive current state
context = await self.perceive(goal, iteration)
# 2. Generate plan for next action
plan = await self.plan(context)
# 3. Check plan feasibility
if not self.is_feasible(plan):
plan = await self.revise_plan(plan, "infeasible")
# 4. Execute action
result = await self.execute(plan)
# 5. Evaluate outcome
evaluation = await self.evaluate(goal, result)
# 6. Update memory
await self.update_memory(plan, result, evaluation)
# 7. Check if done
if evaluation.task_complete:
return Result(
success=True,
output=evaluation.final_output,
trace=self.memory.get_trace()
)
# 8. Revise plan if needed
if not evaluation.on_track:
await self.revise_strategy(evaluation.feedback)
iteration += 1
# Max iterations reached
return Result(
success=False,
error="Max iterations reached without completing goal",
trace=self.memory.get_trace()
)
async def plan(self, context: Context) -> Plan:
"""Generate next action plan"""
prompt = f"""
Goal: {context.goal}
Current progress: {context.progress}
Available tools: {context.tools}
Past actions: {context.history}
What should be the next action to achieve the goal?
Provide plan in JSON: {{"action": "...", "parameters": {{...}}, "reasoning": "..."}}
"""
response = await self.llm.generate(prompt)
return Plan.from_json(response)
def is_feasible(self, plan: Plan) -> bool:
"""Check if plan can be executed"""
# Check tool availability
if plan.action not in self.tools.available():
return False
# Check parameter validity
tool = self.tools.get(plan.action)
if not tool.validate_parameters(plan.parameters):
return False
        # Check resource constraints (ResourceBudget maps action names to costs)
        if not self.resources.can_afford(plan.action):
            return False
return True
async def revise_plan(self, plan: Plan, reason: str) -> Plan:
"""Revise infeasible plan"""
prompt = f"""
Original plan: {plan}
Issue: {reason}
Provide alternative plan that addresses the issue.
"""
response = await self.llm.generate(prompt)
return Plan.from_json(response)
Example execution:
# Task: "Analyze security vulnerabilities and generate fixes"
loop = AgenticPlanningLoop(llm=gpt4, tools=security_tools)
result = await loop.run("Find and fix all security vulnerabilities in auth module")
# Trace shows:
# Iteration 1: Plan: scan_codebase() → Found 5 files
# Iteration 2: Plan: static_analysis(tool='bandit') → Found 3 issues
# Iteration 3: Plan: analyze_issue(issue_id=1) → SQL injection risk
# Iteration 4: Plan: generate_fix(issue_id=1) → Parameterized queries
# Iteration 5: Plan: test_fix(fix_id=1) → Tests pass
# Iteration 6: Plan: analyze_issue(issue_id=2) → XSS vulnerability
# ...
Pattern 2: RAG React Loop
Use case: Iterative retrieval and answer generation.
Architecture:
class RAGReactLoop:
def __init__(
self,
llm: LLM,
vector_store: VectorStore,
graph_store: Optional[GraphStore] = None
):
self.llm = llm
self.vector_store = vector_store
self.graph_store = graph_store
async def run(self, query: str) -> Answer:
"""
Iteratively retrieve and generate until confident
"""
iteration = 0
max_iterations = 5
retrieved_context = []
while iteration < max_iterations:
# 1. Retrieve relevant information
if iteration == 0:
# Initial semantic search
docs = await self.vector_store.search(query, top_k=5)
else:
# Refined search based on gaps
                refined_query = await self.refine_query(query, gaps)
docs = await self.vector_store.search(refined_query, top_k=5)
retrieved_context.extend(docs)
# 2. Generate answer from context
answer = await self.generate_answer(query, retrieved_context)
# 3. Evaluate quality
evaluation = await self.evaluate_answer(query, answer, retrieved_context)
# 4. Check if good enough
if evaluation.confidence > 0.85 and not evaluation.has_gaps:
return Answer(
text=answer,
sources=retrieved_context,
confidence=evaluation.confidence,
iterations=iteration + 1
)
# 5. Identify gaps and iterate
gaps = evaluation.identified_gaps
# Optional: Use graph traversal for missing info
if self.graph_store and gaps:
graph_docs = await self.graph_traverse(gaps)
retrieved_context.extend(graph_docs)
iteration += 1
# Return best effort
return Answer(
text=answer,
sources=retrieved_context,
confidence=evaluation.confidence,
iterations=iteration,
warning="Max iterations reached, answer may be incomplete"
)
    async def refine_query(self, original_query: str, gaps: List[str]) -> str:
        """Reformulate query to address gaps"""
        prompt = f"""
        Original query: {original_query}
        Information gaps: {gaps}
        Reformulate the query to better retrieve missing information.
        """
        return await self.llm.generate(prompt)
async def evaluate_answer(
self,
query: str,
answer: str,
context: List[Doc]
) -> Evaluation:
"""Evaluate answer quality and identify gaps"""
prompt = f"""
Query: {query}
Generated answer: {answer}
Retrieved context: {context}
Evaluate:
1. Does the answer fully address the query?
2. Is the answer grounded in the retrieved context?
3. What information is missing (gaps)?
4. Confidence level (0-1)?
Return JSON: {{"confidence": 0.8, "has_gaps": false, "identified_gaps": []}}
"""
response = await self.llm.generate(prompt)
return Evaluation.from_json(response)
Pattern 3: Multi-Agent Loops
Use case: Complex tasks requiring coordination of specialized agents.
Architecture:
class MultiAgentLoop:
def __init__(
self,
master: LLM,
sub_agents: Dict[str, Agent]
):
self.master = master
self.sub_agents = sub_agents
async def run(self, task: str) -> Result:
"""
Master coordinates sub-agents to complete task
"""
# 1. Master decomposes task
subtasks = await self.decompose_task(task)
# 2. Assign subtasks to specialized agents
assignments = self.assign_subtasks(subtasks)
# 3. Execute sub-agents in parallel or sequence
results = {}
for agent_name, subtask in assignments.items():
agent = self.sub_agents[agent_name]
result = await agent.run(subtask)
results[agent_name] = result
# 4. Master evaluates results
evaluation = await self.evaluate_results(task, results)
# 5. If incomplete, iterate
if not evaluation.complete:
# Identify what's missing
missing = evaluation.missing_elements
# Re-assign or refine
refined_assignments = self.refine_assignments(missing)
additional_results = await self.execute_agents(refined_assignments)
# Merge results
results.update(additional_results)
# 6. Master synthesizes final output
final = await self.synthesize(task, results)
return Result(
success=True,
output=final,
sub_results=results
)
async def decompose_task(self, task: str) -> List[Subtask]:
"""Master decomposes complex task"""
prompt = f"""
Task: {task}
Decompose into subtasks that can be handled by specialized agents:
- Code analysis agent
- Research agent
- Testing agent
- Documentation agent
Return JSON list of subtasks with assigned agent.
"""
        response = await self.master.generate(prompt)
        subtask_specs = json.loads(response)  # expect a JSON array of subtask objects
        return [Subtask.from_json(s) for s in subtask_specs]
def assign_subtasks(self, subtasks: List[Subtask]) -> Dict[str, Subtask]:
"""Map subtasks to agents"""
assignments = {}
for subtask in subtasks:
agent_name = subtask.assigned_agent
if agent_name in self.sub_agents:
assignments[agent_name] = subtask
return assignments
Example multi-agent execution:
# Task: "Conduct full security audit of web application"
master = GPT4()
sub_agents = {
"code_analyzer": CodeAnalysisAgent(),
"penetration_tester": PenTestAgent(),
"compliance_checker": ComplianceAgent(),
"report_generator": ReportAgent()
}
loop = MultiAgentLoop(master, sub_agents)
result = await loop.run("Security audit of web app")
# Master decomposes:
# 1. Static code analysis → code_analyzer
# 2. Dynamic pen testing → penetration_tester
# 3. Compliance verification → compliance_checker
# 4. Report generation → report_generator
# Each agent runs its own React Loop
# Master synthesizes results into final audit report
6. Memory & State Integration
Memory types in React Loops
1. Working Memory (Current Task)
class WorkingMemory:
"""Tracks current task state"""
def __init__(self):
self.goal = None
self.current_step = None
self.completed_steps = []
self.findings = []
self.confidence = 0.0
def update(self, step: str, result: Any, confidence: float):
"""Update after each iteration"""
self.completed_steps.append(step)
if result:
self.findings.append(result)
self.confidence = max(self.confidence, confidence)
def get_state(self) -> Dict:
"""Get current state for next iteration"""
return {
"goal": self.goal,
"progress": f"{len(self.completed_steps)} steps completed",
"findings": self.findings,
"confidence": self.confidence
}
2. Long-Term Memory (Knowledge Base)
class LongTermMemory:
"""Stores accumulated knowledge"""
def __init__(self, vector_db: VectorStore):
self.vector_db = vector_db
async def store(self, knowledge: str, metadata: Dict):
"""Store for future retrieval"""
await self.vector_db.insert(
text=knowledge,
metadata={
**metadata,
"timestamp": datetime.utcnow(),
"task_type": metadata.get("task_type")
}
)
async def recall(self, query: str) -> List[str]:
"""Retrieve relevant past knowledge"""
results = await self.vector_db.search(query, top_k=3)
return [r.text for r in results]
3. Procedural Memory (How-To Knowledge)
class ProceduralMemory:
"""Stores learned procedures"""
def __init__(self):
self.procedures = {}
def learn_procedure(self, task_type: str, steps: List[str]):
"""Learn successful procedure"""
self.procedures[task_type] = steps
def get_procedure(self, task_type: str) -> Optional[List[str]]:
"""Retrieve known procedure"""
return self.procedures.get(task_type)
Integrated memory system:
class IntegratedMemory:
    def __init__(self, vector_db: VectorStore):
        self.working = WorkingMemory()
        self.long_term = LongTermMemory(vector_db)
        self.procedural = ProceduralMemory()
async def prepare_context(self, task: str) -> MemoryContext:
"""Gather all relevant memory for iteration"""
return MemoryContext(
working_state=self.working.get_state(),
relevant_knowledge=await self.long_term.recall(task),
known_procedures=self.procedural.get_procedure(task)
)
Why memory matters:
# Without memory: Each iteration starts from scratch
# Iteration 1: Finds issue A
# Iteration 2: Forgets about A, finds issue B
# Iteration 3: Finds issue A again (duplicate work)
# With memory: Cumulative progress
# Iteration 1: Finds issue A, stores in memory
# Iteration 2: Remembers A, finds issue B, stores both
# Iteration 3: Remembers A and B, finds issue C
# → Builds comprehensive understanding
7. Safety, Logging, and Observability
7.1 Structured Logging
Implementation:
class ReactLoopLogger:
def __init__(self):
self.iterations = []
def log_iteration(
self,
iteration: int,
perception: PerceptionContext,
reasoning: ActionPlan,
action_result: ActionResult,
evaluation: Evaluation
):
"""Log complete iteration"""
self.iterations.append({
"iteration": iteration,
"timestamp": datetime.utcnow().isoformat(),
"perception": {
"input": perception.input,
"working_memory": perception.working_memory,
"available_tools": perception.available_tools
},
"reasoning": {
"reasoning": reasoning.reasoning,
"action": reasoning.action,
"parameters": reasoning.parameters,
"confidence": reasoning.confidence
},
"action": {
"success": action_result.success,
"execution_time_ms": action_result.execution_time,
"error": action_result.error
},
"evaluation": {
"progress_made": evaluation.progress_made,
"confidence": evaluation.confidence,
"next_step": evaluation.next_step
}
})
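    # Helper methods assumed by the production loop in section 9
    # (minimal sketch; the names and signatures follow their call sites there)
    def log_safety_block(self, plan: ActionPlan, reason: str):
        """Record an action blocked by a safety policy"""
        self.iterations.append({
            "timestamp": datetime.utcnow().isoformat(),
            "safety_block": {"action": plan.action, "reason": reason}
        })

    def log_error(self, error: Exception):
        """Record a loop-level exception"""
        self.iterations.append({
            "timestamp": datetime.utcnow().isoformat(),
            "error": str(error)
        })

    def get_actions(self) -> List[str]:
        """List all actions taken, for metrics reporting"""
        return [it["reasoning"]["action"] for it in self.iterations if "reasoning" in it]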
def export_trace(self) -> str:
"""Export human-readable trace"""
trace = []
for it in self.iterations:
trace.append(f"""
Iteration {it['iteration']}:
Reasoning: {it['reasoning']['reasoning']}
Action: {it['reasoning']['action']}({it['reasoning']['parameters']})
Success: {it['action']['success']}
Progress: {it['evaluation']['progress_made']}
Confidence: {it['evaluation']['confidence']:.2%}
""")
return "\n".join(trace)
7.2 Safety Mechanisms
Pre-execution safety checks:
class SafetyChecker:
def __init__(self, policies: List[SafetyPolicy]):
self.policies = policies
def is_safe(self, plan: ActionPlan) -> Tuple[bool, Optional[str]]:
"""Check if action is safe to execute"""
for policy in self.policies:
is_safe, reason = policy.check(plan)
if not is_safe:
return False, f"Blocked by {policy.name}: {reason}"
return True, None
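from abc import ABC, abstractmethod

# The example policies below subclass this minimal base class
# (an assumed sketch; the guide leaves SafetyPolicy abstract):
class SafetyPolicy(ABC):
    @property
    def name(self) -> str:
        return type(self).__name__  # used in SafetyChecker's block message

    @abstractmethod
    def check(self, plan: ActionPlan) -> Tuple[bool, Optional[str]]:
        """Return (is_safe, reason_if_blocked)"""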
# Example policies
class NoDestructiveActions(SafetyPolicy):
def check(self, plan: ActionPlan) -> Tuple[bool, Optional[str]]:
destructive = ["delete", "drop", "rm", "remove"]
if any(word in plan.action.lower() for word in destructive):
return False, "Destructive actions not allowed"
return True, None
class RateLimitPolicy(SafetyPolicy):
def __init__(self, max_per_minute: int = 10):
self.max_per_minute = max_per_minute
self.call_times = deque()
def check(self, plan: ActionPlan) -> Tuple[bool, Optional[str]]:
now = time.time()
# Remove calls older than 1 minute
while self.call_times and self.call_times[0] < now - 60:
self.call_times.popleft()
if len(self.call_times) >= self.max_per_minute:
return False, f"Rate limit exceeded ({self.max_per_minute}/min)"
self.call_times.append(now)
return True, None
7.3 Observability
Metrics to track:
class ReactLoopMetrics:
def __init__(self):
self.metrics = {
"total_iterations": 0,
"successful_completions": 0,
"timeout_failures": 0,
"max_iteration_failures": 0,
"avg_iterations_to_complete": 0.0,
"avg_confidence": 0.0,
"action_distribution": {},
"evaluation_trends": []
}
def record_completion(
self,
iterations: int,
final_confidence: float,
actions_used: List[str]
):
"""Record successful completion"""
self.metrics["successful_completions"] += 1
self.metrics["total_iterations"] += iterations
# Update averages
total_completions = self.metrics["successful_completions"]
self.metrics["avg_iterations_to_complete"] = (
self.metrics["total_iterations"] / total_completions
)
# Track action usage
for action in actions_used:
self.metrics["action_distribution"][action] = (
self.metrics["action_distribution"].get(action, 0) + 1
)
def get_dashboard(self) -> Dict:
"""Get metrics for monitoring dashboard"""
return {
"success_rate": (
self.metrics["successful_completions"] /
(self.metrics["successful_completions"] +
self.metrics["timeout_failures"] +
self.metrics["max_iteration_failures"])
if self.metrics["successful_completions"] > 0 else 0.0
),
"avg_iterations": self.metrics["avg_iterations_to_complete"],
"most_used_actions": sorted(
self.metrics["action_distribution"].items(),
key=lambda x: x[1],
reverse=True
)[:5]
}
8. Failure Modes & Mitigation
Common failure modes
1. Infinite loops (runaway iteration)
Problem:
# Agent gets stuck repeating same action
Iteration 1: search_logs("error")
Iteration 2: search_logs("error") # Same action
Iteration 3: search_logs("error") # Still same
# → Never makes progress
Mitigation:
class InfiniteLoopDetector:
def __init__(self, window_size: int = 3):
self.action_history = deque(maxlen=window_size)
def detect(self, action: str) -> bool:
"""Detect if repeating same action"""
self.action_history.append(action)
if len(self.action_history) == self.action_history.maxlen:
# All recent actions are identical
return len(set(self.action_history)) == 1
return False
def intervene(self) -> str:
"""Suggest different approach"""
return "Detected repeated action. Try alternative approach."
2. Action errors (tool failures)
Problem:
# Tool fails but agent doesn't handle it properly
result = execute_tool("query_database", invalid_params)
# Tool crashes, agent has no recovery strategy
Mitigation:
class ActionErrorHandler:
async def execute_with_recovery(
self,
action: ActionPlan
) -> ActionResult:
"""Execute with error handling and retry"""
max_retries = 3
for attempt in range(max_retries):
try:
# Pre-condition check
if not self.validate_preconditions(action):
return ActionResult(
success=False,
error="Preconditions not met"
)
# Execute
result = await self.execute(action)
# Post-condition check
if not self.validate_postconditions(result):
return ActionResult(
success=False,
error="Postconditions not met"
)
return result
except Exception as e:
if attempt == max_retries - 1:
# Final attempt failed
return ActionResult(
success=False,
error=f"Failed after {max_retries} attempts: {e}"
)
# Retry with exponential backoff
await asyncio.sleep(2 ** attempt)
3. Hallucinated evaluations (false confidence)
Problem:
# Agent thinks it succeeded but actually failed
evaluation = Evaluation(
progress_made=True, # Wrong!
confidence=0.95, # Overconfident!
task_complete=True # False positive
)
Mitigation:
class EvaluationVerifier:
async def verify_evaluation(
self,
claimed_evaluation: Evaluation,
actual_results: ActionResult,
memory: AgentMemory
) -> Evaluation:
"""Verify evaluation against ground truth"""
# 1. Check if claimed success matches actual result
        if claimed_evaluation.progress_made and not actual_results.success:
# Hallucination detected
return Evaluation(
progress_made=False,
confidence=0.0,
feedback="Action failed, progress claim incorrect"
)
# 2. Check against memory for consistency
if self.contradicts_memory(claimed_evaluation, memory):
return Evaluation(
progress_made=False,
confidence=0.0,
feedback="Evaluation contradicts known facts"
)
# 3. If verifiable, check against external source
if self.can_verify_externally(claimed_evaluation):
verified = await self.external_verification(claimed_evaluation)
if not verified:
claimed_evaluation.confidence *= 0.5 # Reduce confidence
return claimed_evaluation
4. Resource exhaustion (cost/time blowup)
Problem:
# Agent keeps calling expensive APIs
for i in range(100): # Uncontrolled loop
expensive_api_call() # $1 per call
# → $100 bill
Mitigation:
class ResourceBudget:
def __init__(self, max_cost: float = 10.0):
self.max_cost = max_cost
self.spent = 0.0
self.action_costs = {
"llm_call": 0.01,
"vector_search": 0.001,
"api_call": 1.0
}
def can_afford(self, action: str) -> bool:
"""Check if action is within budget"""
cost = self.action_costs.get(action, 0.0)
return (self.spent + cost) <= self.max_cost
def charge(self, action: str):
"""Deduct cost"""
cost = self.action_costs.get(action, 0.0)
self.spent += cost
def prioritize_actions(
self,
possible_actions: List[ActionPlan]
) -> List[ActionPlan]:
"""Return actions sorted by cost-effectiveness"""
return sorted(
possible_actions,
key=lambda a: self.action_costs.get(a.action, float('inf'))
)
9. Production Architecture Example
Complete production-ready React Loop
from typing import Optional
import asyncio
class ProductionReactLoop:
"""
Complete production React Loop implementation
"""
def __init__(
self,
llm: LLM,
tools: ToolRegistry,
memory: AgentMemory,
safety: SafetyChecker,
logger: ReactLoopLogger,
metrics: ReactLoopMetrics
):
# Core components
self.perception = PerceptionLayer(memory, tools)
self.reasoning = ReasoningLayer(llm)
self.action = ActionLayer(tools, safety)
self.observation = ObservationLayer()
self.evaluation = EvaluationLayer(llm)
self.memory_update = MemoryUpdateLayer(memory)
# Control & monitoring
self.controller = LoopController(
max_iterations=10,
success_threshold=0.90,
timeout_seconds=300
)
self.safety = safety
self.logger = logger
self.metrics = metrics
# State
self.memory = memory
self.iteration = 0
async def run(self, goal: str) -> Result:
"""
Execute complete React Loop
"""
        # Initialize
        self.memory.set_goal(goal)
        self.controller.start()
        evaluation = None  # guard for the finally block below
try:
while True:
# 1. PERCEPTION: Gather context
context = await self.perception.perceive(
user_input=goal,
iteration=self.iteration
)
# 2. REASONING: Plan next action
plan = await self.reasoning.reason(context)
                # 3. SAFETY: Check if action is safe
                is_safe, block_reason = self.safety.is_safe(plan)
                if not is_safe:
                    self.logger.log_safety_block(plan, block_reason)
                    return Result(
                        success=False,
                        error=f"Action blocked by safety policy: {block_reason}",
                        trace=self.logger.export_trace(),
                        iterations=self.iteration
                    )
# 4. ACTION: Execute plan
action_result = await self.action.execute(plan)
# 5. OBSERVATION: Collect results
observation = await self.observation.observe(plan, action_result)
# 6. EVALUATION: Assess progress
evaluation = await self.evaluation.evaluate(
goal=goal,
observation=observation,
memory=self.memory
)
# 7. MEMORY UPDATE: Store learnings
await self.memory_update.update(plan, observation, evaluation)
# 8. LOGGING: Record iteration
self.logger.log_iteration(
iteration=self.iteration,
perception=context,
reasoning=plan,
action_result=action_result,
evaluation=evaluation
)
# 9. CONTROL: Check if should continue
should_continue, reason = self.controller.should_continue(
iteration=self.iteration,
evaluation=evaluation
)
if not should_continue:
# Loop terminating
return Result(
success=(evaluation.confidence >= self.controller.success_threshold),
output=self.memory.get_findings(),
reason=reason,
trace=self.logger.export_trace(),
iterations=self.iteration + 1,
confidence=evaluation.confidence
)
self.iteration += 1
except Exception as e:
# Error handling
self.logger.log_error(e)
return Result(
success=False,
error=str(e),
trace=self.logger.export_trace(),
iterations=self.iteration
)
        finally:
            # Record metrics (skip if the loop failed before the first evaluation)
            if evaluation is not None:
                self.metrics.record_completion(
                    iterations=self.iteration,
                    final_confidence=evaluation.confidence,
                    actions_used=self.logger.get_actions()
                )
# Usage example
async def main():
# Setup
llm = GPT4()
tools = ToolRegistry()
tools.register("search_logs", SearchLogsTool())
tools.register("query_database", DatabaseTool())
tools.register("read_file", FileReadTool())
memory = AgentMemory()
safety = SafetyChecker(policies=[
NoDestructiveActions(),
RateLimitPolicy(max_per_minute=10)
])
logger = ReactLoopLogger()
metrics = ReactLoopMetrics()
# Create loop
loop = ProductionReactLoop(llm, tools, memory, safety, logger, metrics)
# Execute
result = await loop.run("Debug authentication service failures")
# Output
print(f"Success: {result.success}")
print(f"Iterations: {result.iterations}")
print(f"Confidence: {result.confidence:.2%}")
print(f"\nTrace:\n{result.trace}")
Architecture diagram
┌─────────────────────────────────────────────────────────────┐
│ Production React Loop │
└─────────────────────────────────────────────────────────────┘
User Input (Goal)
↓
┌──────────────────┐
│ Loop Controller │ ← Max iterations, timeout, success threshold
└────────┬─────────┘
↓
┌─────────────────────────────────────────────────────┐
│ ITERATION CYCLE │
│ │
│ ┌─────────────┐ │
│ │ 1. Perceive │ ← Gather context from memory/tools│
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 2. Reason │ ← LLM plans next action │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 3. Safety │ ← Check action is safe │
│ │ Check │ │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 4. Act │ ← Execute tool/generate response │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 5. Observe │ ← Collect results & logs │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 6. Evaluate │ ← LLM assesses progress │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 7. Update │ ← Store in memory │
│ │ Memory │ │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 8. Log │ ← Record for audit trail │
│ └──────┬──────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ 9. Control │ ← Check termination conditions │
│ │ Decision │ │
│ └──────┬──────┘ │
│ │ │
│ ┌────┴────┐ │
│ │ Done? │ │
│ └─┬────┬──┘ │
│ │Yes │No │
│ │ └─────────┐ │
│ │ ↓ │
│ │ Next Iteration │
│ │ ↑ │
│ │ └────────────────────┐ │
│ ↓ │ │
└──────────────────────────────────────────│─────────┘
│ │
↓ ↺
Final Result
- Success/Failure
- Output
- Trace
- Metrics
Parallel Components:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Sub-Agent 1 │ │ Sub-Agent 2 │ │ Sub-Agent 3 │
└──────────────┘ └──────────────┘ └──────────────┘
↓ ↓ ↓
└──────────────────┴──────────────────┘
↓
Master Aggregator
10. Implementation Best Practices
Production checklist
1. Use structured outputs (JSON) for actions
# Bad: Unstructured text
action = "Search logs for errors"
# Good: Structured JSON
action = {
"action": "search_logs",
"parameters": {
"pattern": "error",
"time_range": "1h",
"max_results": 100
},
"expected_outcome": "Identify error patterns"
}
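Structured actions are only useful if they are validated before execution. A minimal stdlib sketch (the required keys follow the format above):
import json
from typing import Any, Dict, Tuple

REQUIRED_KEYS = {"action", "parameters"}

def validate_action(raw: str) -> Tuple[bool, Dict[str, Any], str]:
    """Parse and sanity-check a structured action before executing it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, {}, f"Invalid JSON: {e}"
    if not isinstance(data, dict):
        return False, {}, "Expected a JSON object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return False, {}, f"Missing keys: {sorted(missing)}"
    if not isinstance(data["parameters"], dict):
        return False, {}, "parameters must be an object"
    return True, data, ""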
2. Enforce hard limits
class SafeReactLoop:
def __init__(self):
# MANDATORY limits
self.max_iterations = 10 # Prevent infinite loops
self.max_cost_usd = 1.0 # Prevent cost blowup
self.timeout_seconds = 300 # 5 minute max
self.max_tool_calls = 50 # Prevent API abuse
# Enforce in controller
self.controller = LoopController(
max_iterations=self.max_iterations,
timeout_seconds=self.timeout_seconds
)
3. Store complete loop history
class AuditableReactLoop:
def log_complete_history(self):
"""Store everything for debugging"""
return {
"goal": self.goal,
"iterations": [
{
"iteration": i,
"perception": {...}, # What agent saw
"reasoning": {...}, # Why it chose this action
"action": {...}, # What it did
"observation": {...}, # What happened
"evaluation": {...} # How it assessed results
}
for i in range(self.iteration)
],
"final_result": self.result,
"metrics": {
"total_time": self.elapsed_time,
"total_cost": self.total_cost,
"iterations": self.iteration,
"success": self.result.success
}
}
4. Use adaptive iteration based on confidence
class AdaptiveController:
    def __init__(self):
        self.confidence_history: List[float] = []

    def should_continue(self, evaluation: Evaluation) -> bool:
        """Adapt based on confidence trajectory"""
        self.confidence_history.append(evaluation.confidence)
        # High confidence → stop early
        if evaluation.confidence > 0.95:
            return False
        recent_confidences = self.confidence_history[-3:]
        # Confidence improving → continue
        if self.is_improving(recent_confidences):
            return True
        # Confidence flat → try different approach or stop
        if self.is_stagnant(recent_confidences):
            return False
        return True

    def is_improving(self, confidences: List[float]) -> bool:
        return len(confidences) >= 2 and confidences[-1] - confidences[0] > 0.05

    def is_stagnant(self, confidences: List[float]) -> bool:
        return len(confidences) >= 3 and abs(confidences[-1] - confidences[0]) < 0.05
5. Separate reasoning from execution
# Good: Clear separation
class SeparatedLoop:
async def run(self):
# Reasoning (LLM) - stateless, pure function
plan = await self.llm.generate(prompt)
# Validation - deterministic check
if not self.validator.is_valid(plan):
plan = await self.llm.revise(plan, "Invalid format")
# Execution - controlled environment
result = await self.sandboxed_executor.execute(plan)
6. Integrate long-term memory
class MemoryIntegratedLoop:
async def prepare_context(self, goal: str):
"""Pull from long-term memory"""
# Recall similar past tasks
similar_tasks = await self.long_term_memory.search(
query=goal,
top_k=3
)
# Recall learned procedures
procedure = self.procedural_memory.get_procedure(
task_type=classify_task(goal)
)
return {
"goal": goal,
"past_learnings": similar_tasks,
"known_procedure": procedure
}
7. Use hybrid retrieval for better grounding
class HybridRetrievalLoop:
async def retrieve_context(self, query: str):
"""Combine vector + graph + keyword search"""
# Semantic search
vector_results = await self.vector_db.search(query)
# Graph traversal for relationships
if entities := extract_entities(query):
graph_results = await self.graph_db.traverse(entities)
else:
graph_results = []
# Keyword fallback for exact matches
keyword_results = await self.keyword_search(query)
# Merge and deduplicate
return merge_results([
vector_results,
graph_results,
keyword_results
])
8. Maintain audit trails
class CompliantLoop:
def __init__(self):
self.audit_log = []
def log_decision(
self,
iteration: int,
decision: str,
reasoning: str,
outcome: str
):
"""Log for compliance/debugging"""
entry = {
"timestamp": datetime.utcnow().isoformat(),
"iteration": iteration,
"decision": decision,
"reasoning": reasoning,
"outcome": outcome,
"confidence": self.current_confidence
}
self.audit_log.append(entry)
# Persist to secure storage
self.audit_store.append(entry)
11. When to Use the React Loop
Use React Loops when:
1. Complex, multi-step tasks
# Perfect for React Loop
tasks = [
"Debug production outage and implement fix",
"Research competitor features and generate PRD",
"Analyze codebase for security vulnerabilities",
"Generate comprehensive test suite for module"
]
# Each requires: discovery → analysis → planning → execution → validation
2. Tasks with uncertain data or retrieval quality
# Example: Research task with unclear sources
query = "What caused the 2008 financial crisis?"
# React Loop can:
# Iteration 1: Broad search → find many sources
# Iteration 2: Evaluate sources → identify gaps
# Iteration 3: Targeted search → fill gaps
# Iteration 4: Synthesize → generate answer
# Iteration 5: Validate → check against known facts
3. Tasks requiring iterative refinement
# Example: Code generation
goal = "Implement user authentication API"
# React Loop process:
# Iteration 1: Generate basic structure
# Iteration 2: Add error handling (based on evaluation)
# Iteration 3: Add input validation (based on security review)
# Iteration 4: Add tests (based on coverage check)
# Iteration 5: Optimize (based on performance analysis)
4. Tasks needing multi-agent orchestration
# Example: Full-stack feature implementation
goal = "Implement payment processing feature"
# Master agent coordinates:
# - Backend agent: API endpoints
# - Frontend agent: UI components
# - Database agent: Schema changes
# - Testing agent: E2E tests
# Each agent runs its own React Loop
# Master synthesizes results
Do NOT use React Loops for:
1. Single-step deterministic responses
# Bad fit for React Loop
queries = [
"What is 2 + 2?",
"What's the capital of France?",
"Translate 'hello' to Spanish"
]
# Single LLM call is sufficient
# React Loop adds unnecessary overhead
2. Low-latency applications
# Bad: Real-time chat
# React Loop: 10-20 seconds per response
# Requirement: <1 second response
# Good: Autocomplete
# Single model: <200ms
# Meets requirement
3. Small, simple queries
# Overkill for React Loop
simple_queries = [
"List files in directory",
"Get current time",
"Check if user exists"
]
# Simple tool call or single LLM call is enough
Decision framework
def should_use_react_loop(task: Task) -> bool:
"""Decision tree"""
# Multi-step required?
if task.estimated_steps <= 1:
return False
# Can afford latency?
if task.latency_requirement < 5: # seconds
return False
# Task complexity
if task.complexity == "trivial":
return False
# Need iterative refinement?
if task.requires_iteration:
return True
# Need multi-agent coordination?
if task.requires_multiple_agents:
return True
# Default: probably not needed
return False
12. Performance & Scaling
Optimization strategies
1. Parallelize sub-agent actions
# Bad: Sequential execution
results = []
for agent, subtask in zip(agents, subtasks):
    result = await agent.run(subtask)
    results.append(result)
# Total time: sum of all agent times

# Good: Parallel execution
results = await asyncio.gather(*[
    agent.run(subtask)
    for agent, subtask in zip(agents, subtasks)
])
# Total time: max of all agent times
2. Cache repeated retrievals
class CachedRetrieval:
    def __init__(self, vector_db: VectorStore):
        self.vector_db = vector_db
        self.cache = {}

    async def retrieve(self, query: str):
        """Cache retrieval results"""
        cache_key = query  # exact-match key; normalize or embed for fuzzy hits
        if cache_key in self.cache:
            return self.cache[cache_key]
        results = await self.vector_db.search(query)
        self.cache[cache_key] = results
        return results
3. Limit context size
class ContextManager:
def __init__(self, max_context_tokens: int = 8000):
self.max_tokens = max_context_tokens
def prepare_context(self, history: List[Dict]) -> str:
"""Truncate history to fit context window"""
# Keep most recent iterations
recent = history[-5:]
# Summarize older iterations
older = history[:-5]
summary = self.summarize(older) if older else ""
context = summary + format_history(recent)
# Truncate if still too long
if count_tokens(context) > self.max_tokens:
context = truncate_to_tokens(context, self.max_tokens)
return context
4. Use async execution
class AsyncReactLoop:
async def execute_tools(self, actions: List[ActionPlan]):
"""Execute I/O-bound actions concurrently"""
# All tool calls happen in parallel
results = await asyncio.gather(*[
self.execute_tool(action)
for action in actions
])
return results
5. Monitor convergence
class ConvergenceMonitor:
def is_converging(self, evaluations: List[Evaluation]) -> bool:
"""Check if making progress"""
if len(evaluations) < 3:
return True # Too early to tell
# Extract confidence trajectory
confidences = [e.confidence for e in evaluations[-5:]]
# Check if improving
slope = self.calculate_slope(confidences)
if slope > 0.05:
return True # Improving
if slope < -0.05:
return False # Getting worse
# Flat - check if already high
return confidences[-1] > 0.85
def calculate_slope(self, values: List[float]) -> float:
"""Linear regression slope"""
n = len(values)
x = list(range(n))
x_mean = sum(x) / n
y_mean = sum(values) / n
numerator = sum((x[i] - x_mean) * (values[i] - y_mean) for i in range(n))
denominator = sum((x[i] - x_mean) ** 2 for i in range(n))
return numerator / denominator if denominator != 0 else 0
13. Resources & References
Research Papers
Foundational work:
- ReAct: Synergizing Reasoning and Acting in Language Models - Original ReAct paper (Yao et al., 2022)
- Reflexion: Language Agents with Verbal Reinforcement Learning - Self-reflection for agents
- AutoGPT and the Future of Autonomous Agents - Autonomous agent patterns
Frameworks & Tools
Production implementations:
- LangGraph - State machine framework for agentic loops
- AutoGen - Microsoft’s multi-agent conversation framework
- CrewAI - Role-based multi-agent orchestration
Practical Guides
Implementation tutorials:
- Building Production AI Agents - Anthropic’s agent patterns
- RAG + Agentic Loops - Combining retrieval with iteration
- Multi-Agent Systems Guide - Lil’Log comprehensive overview
Best Practices
Production deployment:
- Agent Safety Guidelines - Safety considerations for production agents
- LLM Observability - Monitoring and debugging agent systems
- Prompt Engineering for Agents - Effective prompting strategies
Final Takeaway
The React Loop transforms LLMs into autonomous agents
Key principles:
- Iteration over single-shot: Multiple attempts beat hoping to get it right the first time
- Feedback incorporation: Each iteration learns from previous results
- Dynamic planning: Adjust strategy based on what’s discovered
- Self-monitoring: Agents track their own progress and know when to stop
- Bounded execution: Hard limits prevent runaway behavior
- Auditability: Every decision is logged and traceable
The mental model:
Perceive → Reason → Act → Observe → Evaluate → Update → Repeat
This closed-loop control system enables:
- Autonomous problem-solving (complex multi-step tasks)
- Self-correction (recover from errors)
- Adaptive behavior (adjust to changing conditions)
- Production reliability (bounded, safe, observable)
When to use:
- Complex tasks requiring multiple steps
- Uncertain environments with incomplete information
- Quality-critical outputs needing refinement
- Multi-agent coordination
When not to use:
- Simple single-step queries
- Low-latency requirements (<5s)
- Deterministic responses
- Trivial tasks
The bottom line:
The React Loop is the foundation of production-ready autonomous AI agents. It takes LLMs from reactive responders to proactive problem-solvers.
Master the React Loop, and you unlock the ability to build agents that can tackle real-world complexity with reliability, safety, and auditability.