Introduction
The rise of large language models (LLMs) has transformed how we think about conversational agents. Early chatbots were essentially question‑answer machines—they took a user’s prompt, generated a textual response, and that was the end of the interaction. While useful, this model quickly hit a ceiling when real‑world problems demanded action: fetching data from APIs, orchestrating multi‑step processes, and making decisions based on evolving context.
Enter agentic workflows—a paradigm where LLMs act as orchestrators that can invoke external tools, maintain state across turns, and reason about long‑term goals. The Open-Action Protocol (OAP) 2.0 is the latest open standard that formalizes this capability. It provides a language‑agnostic schema for describing actions, pre‑conditions, post‑conditions, and state transitions, enabling developers to build robust, composable agents without reinventing the wheel.
In this article we will:
- Demystify the core concepts behind agentic workflows.
- Explore the new features introduced in Open‑Action Protocol 2.0.
- Walk through practical, end‑to‑end implementations using Python.
- Discuss best practices, pitfalls, and future directions.
By the end, you’ll have a concrete roadmap for turning a simple chatbot into a fully‑featured autonomous assistant that can handle complex, multi‑step tasks reliably.
1. From Chatbots to Agentic Systems
1.1 The Limitations of Traditional Chatbots
| Limitation | Example |
|---|---|
| Statelessness | A bot cannot remember a user’s preference across sessions. |
| No External Interaction | The bot cannot book a calendar event or query a database. |
| Single‑Turn Reasoning | Complex logic that requires branching is impossible. |
These constraints stem from the fact that most chatbots treat the LLM as a pure text generator without a well‑defined contract for side effects.
1.2 What Makes an Agent “Agentic”?
An agentic system satisfies three essential properties:
- Actionability – It can invoke external functions (APIs, scripts, hardware) through a defined protocol.
- Statefulness – It maintains a mutable state that persists across turns, enabling planning and feedback loops.
- Goal‑Directed Reasoning – It can decompose high‑level objectives into sub‑tasks, monitor progress, and adapt when conditions change.
Open‑Action Protocol 2.0 provides the scaffolding to express these properties in a standardized, interoperable way.
2. Overview of Open‑Action Protocol 2.0
Open‑Action Protocol (OAP) is an open‑source specification hosted under the Apache 2.0 license. Version 2.0 builds on the original 1.x by adding:
- Typed Action Schemas – Strong typing for inputs/outputs.
- Conditional Execution Graphs – Branching logic expressed as directed acyclic graphs (DAGs).
- State Versioning – Immutable snapshots for auditability and rollback.
- Security Extensions – Scoped permissions and token‑based authentication.
- Streaming Responses – Progressive updates for long‑running actions.
2.1 Core Data Model
{
"action_id": "string",
"name": "string",
"description": "string",
"input_schema": { "type": "object", "properties": { /* ... */ } },
"output_schema": { "type": "object", "properties": { /* ... */ } },
"preconditions": [ "state.key == value", "user.role == 'admin'" ],
"postconditions": [ "state.key = result.value" ],
"security": { "scopes": ["read:email", "write:calendar"] }
}
action_id: Globally unique identifier (UUID v4 recommended).input_schema/output_schema: JSON‑Schema definitions that enable automatic validation.preconditions/postconditions: Declarative expressions evaluated against the agent state.security: Optional OAuth‑style scopes that the runtime must enforce.
2.2 Execution Graph
The protocol allows you to declare an execution graph where each node is an action, and edges encode data flow and conditional branching.
{
"graph_id": "pipeline-123",
"nodes": [
{ "id": "fetch_email", "action_id": "email.fetch" },
{ "id": "extract_intent", "action_id": "nlp.intent_extractor" },
{ "id": "schedule_meeting", "action_id": "calendar.create", "condition": "intent == 'schedule_meeting'" }
],
"edges": [
{ "source": "fetch_email", "target": "extract_intent" },
{ "source": "extract_intent", "target": "schedule_meeting" }
]
}
The runtime evaluates conditions at runtime, enabling dynamic paths based on LLM inference results.
3. Key Concepts in Agentic Workflows
3.1 Action Types
| Type | Description | Typical Use‑Case |
|---|---|---|
| Synchronous | Returns a result immediately. | Simple calculations, string transformations. |
| Asynchronous | Returns a task ID; result arrives later. | Long‑running data processing, video rendering. |
| Streaming | Emits incremental updates (e.g., progress bars). | File uploads, live transcription. |
| Human‑in‑the‑Loop | Pauses execution awaiting user confirmation. | Approving a financial transaction. |
3.2 State Management
The agent state is a JSON object that persists across turns. OAP 2.0 encourages immutable versioning:
{
"state_id": "state-2024-09-01-001",
"snapshot": {
"user": { "id": "U123", "preferences": { "timezone": "UTC" } },
"session": { "step": 3, "last_action": "extract_intent" }
},
"parent_state_id": "state-2024-09-01-000"
}
Each action can read from the current state and write new values via postconditions. This design makes debugging deterministic and audit-friendly.
3.3 Conditional Execution
OAP 2.0 uses a mini expression language (similar to JMESPath) for conditions:
state.session.step >= 2result.intent in ['schedule_meeting', 'cancel_meeting']user.role == 'admin'
These expressions are evaluated safely by the runtime, preventing arbitrary code execution.
4. Architecture of an OAP‑Powered Agent
Below is a high‑level diagram (described in text for accessibility):
- LLM Core – Generates natural language and decides which action(s) to invoke based on the prompt and current state.
- OAP Runtime – Parses the LLM’s action plan, validates against schemas, enforces security, and orchestrates execution.
- Action Registry – A catalog of JSON‑defined actions (could be micro‑services, serverless functions, or local scripts).
- State Store – Persistent key‑value store (e.g., Redis, PostgreSQL) that holds immutable state snapshots.
- Security Layer – OAuth2 or API‑key validation based on the
security.scopesfield. - Feedback Loop – The runtime feeds results back into the LLM prompt, enabling iterative refinement.
4.1 Data Flow Example
User → LLM Prompt → LLM decides "fetch_email" → OAP Runtime validates → fetch_email service runs → result stored in state → LLM receives result → decides next action "extract_intent" → …
5. Practical Example: Automated Email Triage
We’ll build a mini‑assistant that:
- Fetches unread emails from Gmail.
- Extracts the user’s intent (e.g., “schedule meeting”, “forward to colleague”).
- Executes the appropriate downstream action (calendar creation, email forwarding, etc.).
5.1 Prerequisites
- Python 3.10+
openailibrary (for LLM access)fastapi(to expose actions as HTTP endpoints)redis(for state storage)pydantic(for schema validation)
5.2 Defining Actions
Create a file actions.yaml (OAP supports JSON/YAML) with three actions:
- action_id: "email.fetch_unread"
name: "Fetch Unread Emails"
description: "Retrieves the latest unread emails from the user's Gmail inbox."
input_schema:
type: object
properties:
max_results:
type: integer
default: 5
required: []
output_schema:
type: object
properties:
emails:
type: array
items:
type: object
properties:
id: { type: string }
subject: { type: string }
snippet: { type: string }
from: { type: string }
required: [id, subject, from]
preconditions: []
postconditions:
- "state.emails = result.emails"
security:
scopes: ["read:gmail"]
- action_id: "nlp.intent_extractor"
name: "Intent Extractor"
description: "Uses an LLM to classify the intent of each email."
input_schema:
type: object
properties:
emails:
type: array
items:
type: object
properties:
id: { type: string }
subject: { type: string }
snippet: { type: string }
from: { type: string }
required: [id, subject, from]
required: [emails]
output_schema:
type: object
properties:
intents:
type: array
items:
type: object
properties:
email_id: { type: string }
intent: { type: string, enum: ["schedule_meeting", "forward", "ignore"] }
confidence: { type: number }
required: [email_id, intent]
required: [intents]
preconditions: [ "state.emails != null" ]
postconditions:
- "state.intents = result.intents"
security: {}
- action_id: "calendar.create_meeting"
name: "Create Calendar Meeting"
description: "Creates a meeting in the user's Google Calendar based on extracted details."
input_schema:
type: object
properties:
email_id: { type: string }
date_time: { type: string, format: "date-time" }
participants:
type: array
items: { type: string, format: "email" }
required: [email_id, date_time, participants]
output_schema:
type: object
properties:
event_id: { type: string }
required: [event_id]
preconditions:
- "state.intents[*].intent == 'schedule_meeting'"
postconditions:
- "state.meetings.append(result.event_id)"
security:
scopes: ["write:calendar"]
5.3 Implementing the Runtime (Python)
# runtime.py
import uuid, json, asyncio
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel, Field, validator
from typing import List, Dict, Any
import redis
import httpx
app = FastAPI()
r = redis.Redis(host="localhost", port=6379, db=0)
# ---------- Helper Functions ----------
def load_actions() -> Dict[str, Dict]:
"""Load actions from YAML file and index by action_id."""
import yaml, pathlib
path = pathlib.Path(__file__).parent / "actions.yaml"
with open(path) as f:
raw = yaml.safe_load(f)
return {a["action_id"]: a for a in raw}
ACTIONS = load_actions()
def get_state(state_id: str) -> Dict:
raw = r.get(state_id)
if not raw:
raise HTTPException(status_code=404, detail="State not found")
return json.loads(raw)
def save_state(state: Dict) -> str:
state_id = f"state:{uuid.uuid4()}"
r.set(state_id, json.dumps(state))
return state_id
def evaluate_expression(expr: str, context: Dict) -> bool:
"""Very small safe evaluator for pre/post conditions."""
# For demo purposes we use eval with a restricted globals dict.
# In production replace with a proper expression parser (e.g., jmespath).
allowed_names = {"state": context.get("state", {}), "result": context.get("result", {})}
return eval(expr, {"__builtins__": {}}, allowed_names)
# ---------- Request Models ----------
class ActionRequest(BaseModel):
action_id: str
inputs: Dict[str, Any] = Field(default_factory=dict)
state_id: str
# ---------- Core Endpoint ----------
@app.post("/execute")
async def execute_action(req: ActionRequest):
# 1️⃣ Retrieve action definition
action = ACTIONS.get(req.action_id)
if not action:
raise HTTPException(status_code=404, detail="Action not found")
# 2️⃣ Load current state
state = get_state(req.state_id)
# 3️⃣ Validate input against schema (simplified)
# In production use jsonschema.validate()
for key, schema in action["input_schema"].get("properties", {}).items():
if schema.get("required", False) and key not in req.inputs:
raise HTTPException(status_code=400, detail=f"Missing required input: {key}")
# 4️⃣ Check preconditions
for cond in action.get("preconditions", []):
if not evaluate_expression(cond, {"state": state}):
raise HTTPException(status_code=400, detail=f"Precondition failed: {cond}")
# 5️⃣ Dispatch to concrete implementation
result = await dispatch(action["action_id"], req.inputs)
# 6️⃣ Apply postconditions
for cond in action.get("postconditions", []):
# Example: "state.emails = result.emails"
lhs, rhs = cond.split("=")
lhs = lhs.strip()
rhs = rhs.strip()
# Resolve RHS
value = eval(rhs, {"__builtins__": {}}, {"state": state, "result": result})
# Simple dotted assignment
parts = lhs.split(".")
target = state
for part in parts[:-1]:
target = target.setdefault(part, {})
target[parts[-1]] = value
# 7️⃣ Persist new state version
new_state_id = save_state(state)
return {"state_id": new_state_id, "result": result}
5.3.1 Dispatch Implementation
async def dispatch(action_id: str, inputs: Dict) -> Dict:
if action_id == "email.fetch_unread":
return await fetch_unread_emails(inputs.get("max_results", 5))
if action_id == "nlp.intent_extractor":
return await extract_intent(inputs["emails"])
if action_id == "calendar.create_meeting":
return await create_meeting(inputs)
raise NotImplementedError(f"No dispatcher for {action_id}")
# Example of a simple async Gmail fetch (using Google API client)
async def fetch_unread_emails(max_results: int) -> Dict:
# Placeholder implementation – replace with actual Google API calls
await asyncio.sleep(0.5) # Simulate network latency
return {
"emails": [
{"id": "msg-1", "subject": "Project kickoff", "snippet": "Let’s schedule...", "from": "alice@example.com"},
{"id": "msg-2", "subject": "Invoice #1234", "snippet": "Please see attached", "from": "billing@example.com"}
][:max_results]
}
async def extract_intent(emails: List[Dict]) -> Dict:
# Use OpenAI's GPT‑4 to classify intent
prompt = "Classify the intent of each email. Return JSON with fields email_id, intent, confidence."
# Build a concise representation
for e in emails:
prompt += f"\n- ID: {e['id']}, Subject: {e['subject']}"
# Call the LLM (simplified)
import openai
resp = await openai.ChatCompletion.acreate(
model="gpt-4o-mini",
messages=[{"role": "user", "content": prompt}],
temperature=0.0,
)
# Assume the model returns a JSON string
json_str = resp.choices[0].message.content.strip()
return {"intents": json.loads(json_str)}
async def create_meeting(payload: Dict) -> Dict:
# Placeholder: pretend we called Google Calendar API
await asyncio.sleep(0.3)
return {"event_id": f"event-{uuid.uuid4()}"}
5.4 Orchestrating the Workflow
Instead of manually calling each endpoint, we can declare a graph in JSON and let the runtime walk it.
{
"graph_id": "email_triage_v1",
"nodes": [
{ "id": "fetch", "action_id": "email.fetch_unread", "inputs": {"max_results": 5} },
{ "id": "intent", "action_id": "nlp.intent_extractor", "inputs": {} },
{ "id": "schedule", "action_id": "calendar.create_meeting", "inputs": {} }
],
"edges": [
{ "source": "fetch", "target": "intent" },
{ "source": "intent", "target": "schedule", "condition": "state.intents[*].intent == 'schedule_meeting'" }
]
}
A tiny graph executor walks the nodes, respects conditions, and updates state automatically.
# graph_executor.py (simplified)
async def run_graph(graph: Dict, initial_state_id: str):
current_state_id = initial_state_id
for node in graph["nodes"]:
# Resolve inputs that reference state
inputs = {}
for k, v in node.get("inputs", {}).items():
if isinstance(v, str) and v.startswith("$state."):
# e.g., "$state.emails"
path = v.split(".", 1)[1]
state = get_state(current_state_id)
for part in path.split("."):
state = state.get(part, {})
inputs[k] = state
else:
inputs[k] = v
# Execute
resp = await execute_action(
ActionRequest(action_id=node["action_id"], inputs=inputs, state_id=current_state_id)
)
current_state_id = resp["state_id"]
# Conditional edge handling
# (simplified: break if condition fails)
for edge in graph["edges"]:
if edge["source"] == node["id"]:
condition = edge.get("condition")
if condition and not evaluate_expression(condition, {"state": get_state(current_state_id)}):
# Skip downstream node
continue
return current_state_id
Running the graph:
import asyncio, json
async def main():
# Load graph definition
with open("triage_graph.json") as f:
graph = json.load(f)
# Create initial empty state
root_state = {}
root_state_id = save_state(root_state)
final_state_id = await run_graph(graph, root_state_id)
final_state = get_state(final_state_id)
print("Workflow completed. Final state:", json.dumps(final_state, indent=2))
asyncio.run(main())
Result (example):
{
"emails": [
{"id": "msg-1", "subject": "Project kickoff", "from": "alice@example.com"},
{"id": "msg-2", "subject": "Invoice #1234", "from": "billing@example.com"}
],
"intents": [
{"email_id": "msg-1", "intent": "schedule_meeting", "confidence": 0.92},
{"email_id": "msg-2", "intent": "ignore", "confidence": 0.88}
],
"meetings": ["event-3f2c9b1a-5d7e-4a9f-9b1c-7f6e2d4c9a5b"]
}
The assistant has automatically fetched emails, classified intent, and created a calendar entry, all while maintaining an immutable audit trail.
6. Building Your Own Agentic Application
6.1 Step‑by‑Step Checklist
Define Business Objectives – What high‑level goals should the agent achieve?
Model the State – Sketch a JSON schema that captures all mutable data (user profile, session progress, external resources).
Catalog Required Actions – For each external capability, write an OAP action definition (input/output, security scopes).
Create Execution Graphs – Use DAGs to encode typical workflows.
Implement Runtime – Leverage existing OAP SDKs (e.g.,
openaction-py,openaction-js) or roll your own as shown above.Integrate LLM Prompting – Prompt the LLM to select actions based on the current state. Example prompt template:
You are an autonomous assistant. Given the current state: {{state_json}} Choose the next action(s) from the catalog: {{action_catalog}} Return a JSON array of action IDs and input mappings.Test End‑to‑End – Simulate realistic conversation flows, verify state snapshots, and assert security constraints.
Deploy with Observability – Log state versions, action outcomes, and latency metrics.
6.2 Sample Prompt Template (Python)
def build_prompt(state: dict, catalog: list) -> str:
return f"""
You are an autonomous AI assistant. Your goal is to help the user manage their inbox.
Current state (JSON):
```json
{json.dumps(state, indent=2)}
Available actions (JSON schema):
{json.dumps(catalog, indent=2)}
Select the single most appropriate action and provide the required inputs. Return ONLY a JSON object: {{ “action_id”: “<action_id>”, “inputs”: {{ … }} }} """
Pass this prompt to the LLM, parse the JSON response, and feed it into the OAP runtime.
---
## 7. Best Practices & Gotchas
### 7.1 Security First
- **Least‑Privilege Scopes** – Grant actions only the permissions they truly need.
- **Input Sanitization** – Even though OAP validates JSON schemas, always double‑check for injection vectors when forwarding data to external services.
- **Audit Trails** – Store immutable state snapshots in a tamper‑evident log (e.g., append‑only database, blockchain‑style ledger) for compliance.
### 7.2 Managing State Size
- **Chunk Large Payloads** – Store bulky data (e.g., PDFs, images) in object storage (S3, GCS) and keep only references in the state.
- **Prune Historical Versions** – Retain snapshots for a configurable window (e.g., 30 days) to balance observability and storage costs.
### 7.3 Handling Failures
- **Retry Policies** – Wrap asynchronous actions with exponential backoff.
- **Compensation Actions** – Define “undo” actions in the graph for critical side effects (e.g., cancel a meeting if subsequent steps fail).
- **Human‑in‑the‑Loop** – For high‑risk decisions, pause execution and request explicit user confirmation via a UI widget.
### 7.4 Performance Optimizations
- **Parallel Execution** – OAP DAGs can be traversed concurrently when nodes have no dependency edges.
- **Streaming Results** – Use the streaming action type for long uploads; the LLM can adapt its plan while progress is reported.
- **Cache Deterministic Calls** – Memoize pure functions (e.g., sentiment analysis) to avoid redundant LLM calls.
---
## 8. Real‑World Use Cases
| Industry | Scenario | Benefits |
|----------|----------|----------|
| **Customer Support** | An AI triage bot pulls tickets, extracts urgency, and auto‑assigns to agents or resolves simple queries. | Faster response times, reduced manual workload. |
| **Healthcare** | A virtual assistant reviews patient messages, schedules appointments, and orders lab tests after verification. | Improved patient engagement, compliance with HIPAA (through scoped actions). |
| **Finance** | An autonomous compliance reviewer scans transaction logs, flags anomalies, and triggers manual review workflows. | Higher detection rates, auditability via immutable state. |
| **E‑Commerce** | A shopping assistant fetches product availability, applies coupons, and completes checkout via payment gateway actions. | Seamless UX, reduced cart abandonment. |
| **DevOps** | An incident response bot gathers logs, triggers rollbacks, and posts incident summaries to Slack. | Faster MTTR (Mean Time to Recovery), consistent post‑mortems. |
---
## 9. Future Directions for Open‑Action Protocol
1. **Standardized Agent Personas** – A schema to describe personality traits, tone, and risk tolerance, enabling LLMs to adapt their decision style.
2. **Federated Action Registries** – Peer‑to‑peer discovery of actions across organizations while preserving privacy.
3. **Formal Verification** – Integration with model‑checking tools to prove that a workflow satisfies safety properties (e.g., “no action ever writes to `/etc` without admin scope”).
4. **Hybrid Reasoning Engines** – Combining symbolic planners (PDDL) with LLM‑driven intuition for optimal decision making.
The community is already contributing extensions via the `openaction/extensions` GitHub organization; keep an eye on the roadmap for upcoming releases.
---
## Conclusion
The Open‑Action Protocol 2.0 marks a pivotal step in moving LLMs from *talking machines* to **autonomous agents** capable of orchestrating real‑world actions. By providing a **typed, secure, and versioned** contract for actions, OAP eliminates the ad‑hoc glue code that has long plagued AI integration projects.
In this article we:
* Unpacked the limitations of traditional chatbots and the three pillars of agentic behavior.
* Explored the enriched data model and execution graph features of OAP 2.0.
* Walked through a complete, production‑style implementation of an email‑triage agent, demonstrating state management, conditional branching, and security scoping.
* Highlighted best practices, common pitfalls, and concrete industry use cases.
* Looked ahead to emerging extensions that will further empower developers to build trustworthy, composable AI agents.
Whether you are a **product manager** looking to prototype AI‑driven workflows, a **backend engineer** tasked with building secure action services, or a **researcher** exploring the boundaries of autonomous reasoning, mastering the Open‑Action Protocol gives you a solid, standards‑based foundation to build on.
Start by defining a small, high‑impact workflow, encode it in OAP, and let the LLM do the heavy lifting of orchestration. As you iterate, you’ll discover the true power of **agentic AI**—systems that not only understand language but also *act* on it in a predictable, auditable, and scalable way.
---
## Resources
- **Open‑Action Protocol Official Repository** – Comprehensive spec, SDKs, and community discussions.
[Open‑Action GitHub](https://github.com/openaction/openaction)
- **OpenAI GPT‑4o‑mini Documentation** – Details on prompt engineering, streaming, and function calling.
[OpenAI API Docs](https://platform.openai.com/docs)
- **Google Workspace APIs (Gmail, Calendar)** – Guides for OAuth scopes, rate limits, and best practices.
[Google Workspace Developer Guide](https://developers.google.com/workspace)
- **JSON Schema Validation in Python (jsonschema library)** – Robust schema enforcement for OAP actions.
[jsonschema PyPI](https://pypi.org/project/jsonschema/)
- **Secure OAuth 2.0 Scopes Overview** – Understanding least‑privilege token design.
[OAuth 2.0 Scopes RFC](https://datatracker.ietf.org/doc/html/rfc6749#section-3.3)
- **State Versioning Patterns (Event Sourcing)** – Architectural patterns for immutable state logs.
[Martin Fowler – Event Sourcing](https://martinfowler.com/eaaDev/EventSourcing.html)