TL;DR: Idempotency keys let your payment service safely absorb retries and network glitches without creating duplicate charges. By storing each key together with the result of its first execution and wiring it into a well‑defined request flow, you gain strong consistency, auditability, and simple rollback paths for production systems.
In the fast‑moving world of online commerce, a single extra charge can cascade into refunds, chargebacks, and lost trust. Yet the same mechanisms that make HTTP services robust (retries, load balancers, and circuit breakers) also increase the chance that a client will issue the same request twice. Implementing idempotency keys is the industry‑standard answer, but turning the concept into a production‑grade feature requires careful architecture, defensive patterns, and observability. This post walks through a real‑world design that has run at scale behind a multi‑tenant payment platform, shows code snippets in Python and SQL, and lists the operational practices you need to keep the system safe day after day.
Why Idempotency Matters in Payments
- Network retries are inevitable – HTTP client libraries, mobile SDKs, and some proxies or edge nodes retry requests that time out, and a POST that timed out may already have been processed.
- User‑initiated double clicks – A frustrated shopper may tap “Pay” twice, or reload and resubmit if the first response never arrived.
- Downstream service flakiness – If your fraud‑check service times out, the orchestrator may retry the whole transaction.
When any of these scenarios happens without idempotency, the same credit‑card token can be charged multiple times, leading to:
- Customer support tickets that cost minutes of engineer time.
- Increased chargeback ratios, which affect merchant onboarding.
- Compliance headaches (PCI DSS requires accurate transaction logs).
The remedy is to make the operation itself idempotent: a repeated request must return exactly the same outcome as the first, no matter how often it arrives while its key is retained.
Core Architecture
High‑Level Request Flow
```text
client ──► API Gateway ──► Payment Service ──► Idempotency Store
                                  │                    │
                                  ▼                    ▼
                           Business Logic       Duplicate Check
                                  │                    │
                                  ▼                    ▼
                           DB Transaction     Return Cached Result
```
- API Gateway extracts the `Idempotency-Key` header (or body field) and forwards it unchanged.
- Payment Service receives the key and performs a single atomic “check‑and‑write” against the Idempotency Store.
- If the key is new, the service:
  - Starts a DB transaction that includes the payment request.
  - Persists the key together with a `pending` status and a UUID for the eventual result.
  - Executes the business logic (card authorization, ledger entry); webhooks go out only after the commit, as described under Retry‑Safe Side Effects.
  - Updates the key record with the final status (`success`, `failure`) and the serialized response payload.
- If the key already exists, the service fetches the stored outcome and returns it immediately, bypassing the business logic entirely.
The crucial invariant is atomicity: the check‑and‑write must be a single ACID operation, otherwise two concurrent requests could both think they own the key and double‑charge. Below is a Python example using PostgreSQL’s INSERT … ON CONFLICT clause.
```python
# payment_service.py
import uuid

from psycopg2.extras import Json


def process_payment(request_body, idempotency_key, db_conn):
    """Handle a payment request with idempotency protection.

    `external_card_authorize` is a placeholder for the real processor call.
    """
    # `with db_conn` commits when the block exits normally and rolls back on
    # any exception, so a failed authorization never leaves a claimed key behind.
    with db_conn, db_conn.cursor() as cur:
        # 1️⃣ Try to claim the key atomically. A concurrent INSERT of the same
        # key waits until this transaction commits or rolls back.
        cur.execute("""
            INSERT INTO idempotency_keys (key, status, result, created_at)
            VALUES (%s, 'pending', NULL, now())
            ON CONFLICT (key) DO NOTHING
            RETURNING key;
        """, (idempotency_key,))  # the table has no `id` column, so return `key`
        claim = cur.fetchone()

        # 2️⃣ If claim is None, the key already exists → replay the stored result
        if claim is None:
            cur.execute("""
                SELECT status, result FROM idempotency_keys
                WHERE key = %s;
            """, (idempotency_key,))
            status, result = cur.fetchone()
            if status == 'pending':
                # Defensive: only reachable if a pending row was committed elsewhere
                return {"error": "Request still in progress", "status": status}
            # psycopg2 already decodes JSONB into Python objects (no json.loads);
            # failed attempts are replayed exactly as they were first returned
            return result

        # 3️⃣ New request – proceed with payment logic. The card call runs inside
        # the open transaction, so keep processor timeouts short.
        payment_id = uuid.uuid4()
        payment_success = external_card_authorize(request_body)  # placeholder

        # 4️⃣ Store the final outcome; it commits atomically with the claim
        outcome = {
            "payment_id": str(payment_id),
            "status": "succeeded" if payment_success else "failed"
        }
        cur.execute("""
            UPDATE idempotency_keys
            SET status = %s,
                result = %s,
                updated_at = now()
            WHERE key = %s;
        """, ('success' if payment_success else 'failure',
              Json(outcome), idempotency_key))
        return outcome
```
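The atomic claim is easiest to see under concurrency. The sketch below is a hypothetical, single‑process stand‑in for the `idempotency_keys` table (a dict guarded by a `threading.Lock`), not a replacement for the database: eight threads submit the same key at once, only the first claimer calls the hypothetical `fake_authorize` function, and the others wait for and replay its stored outcome, much as a conflicting `INSERT … ON CONFLICT` waits for the owning transaction to commit.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class InMemoryIdempotencyStore:
    """Hypothetical single-process stand-in for the idempotency_keys table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}  # key -> stored outcome (None while pending)
        self._done = {}     # key -> Event set once the outcome is stored

    def claim(self, key):
        """Atomic check-and-write: True only for the first caller of `key`."""
        with self._lock:
            if key in self._done:
                return False
            self._results[key] = None
            self._done[key] = threading.Event()
            return True

    def complete(self, key, outcome):
        with self._lock:
            self._results[key] = outcome
            done = self._done[key]
        done.set()

    def wait_for_result(self, key, timeout=5.0):
        # Losers block until the owner finishes, then replay its outcome.
        with self._lock:
            done = self._done[key]
        done.wait(timeout)
        with self._lock:
            return self._results[key]


def process_payment_in_memory(store, key, authorize):
    # Error handling omitted: if authorize() raised, the key would stay pending.
    if not store.claim(key):
        return store.wait_for_result(key)
    ok = authorize()
    outcome = {"payment_id": str(uuid.uuid4()),
               "status": "succeeded" if ok else "failed"}
    store.complete(key, outcome)
    return outcome


def demo(n_threads=8):
    calls = []

    def fake_authorize():  # hypothetical card call; records each invocation
        calls.append(1)
        return True

    store = InMemoryIdempotencyStore()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(
            lambda _: process_payment_in_memory(store, "order-42", fake_authorize),
            range(n_threads)))
    return results, len(calls)
```

Calling `demo()` should show a single authorization and eight identical responses; the lock plays the role that the unique index plays in PostgreSQL.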
Storage Choices
| Storage Type | Pros | Cons | Typical Use |
|---|---|---|---|
| PostgreSQL (row‑level lock) | Strong ACID guarantees, easy to join with business tables | Write‑heavy workloads can hit contention at extreme QPS | Small‑to‑medium SaaS, single‑region |
| Redis (SETNX + Lua script) | Sub‑millisecond latency, natural for caching | Volatile unless persisted, requires careful eviction policy | High‑throughput micro‑service, multi‑region read‑through |
| DynamoDB (Conditional Put) | Fully managed, auto‑scales, TTL support | Reads are eventually consistent by default (strongly consistent reads cost extra); higher latency than Redis | Global‑scale, serverless back‑ends |
In our production system we use PostgreSQL for its transactional guarantees and because the idempotency table lives next to the ledger tables, enabling a single COMMIT that covers both the payment and the key.
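For the Redis row of the table, the claim can be a single command. Redis's documentation describes `SETNX` as replaceable by `SET` with the `NX` option, and `SET key value NX EX seconds` also attaches the TTL atomically, so a crash cannot leave a claim without an expiry. The sketch below assumes a redis-py client (for example `redis.Redis()`); the `idem:` prefix, the helper names, and the 24‑hour default TTL are illustrative choices, not part of the production design.

```python
from typing import Any, Optional, Protocol


class RedisLike(Protocol):
    """The two redis-py client methods this sketch relies on."""

    def set(self, name: str, value: Any, ex: Optional[int] = None,
            nx: bool = False) -> Optional[bool]: ...

    def get(self, name: str) -> Any: ...


def _redis_key(key: str) -> str:
    return f"idem:{key}"  # hypothetical namespace prefix


def claim_key(client: RedisLike, key: str, ttl_seconds: int = 24 * 3600) -> bool:
    """Atomically claim `key`: SET ... NX EX is truthy only if the key was absent."""
    return bool(client.set(_redis_key(key), "pending", ex=ttl_seconds, nx=True))


def store_result(client: RedisLike, key: str, result_json: str,
                 ttl_seconds: int = 24 * 3600) -> None:
    """Replace the pending marker with the serialized outcome, keeping a TTL."""
    client.set(_redis_key(key), result_json, ex=ttl_seconds)


def cached_result(client: RedisLike, key: str) -> Any:
    """Return "pending" or the result JSON, or None if the key is unknown.

    redis-py returns bytes unless the client uses decode_responses=True.
    """
    return client.get(_redis_key(key))
```

A caller that loses the claim and reads back `pending` should treat the request as still in flight rather than as a result to replay.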
Safety Patterns
Duplicate Detection via Deterministic Keys
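A deterministic key is derived from request data that does not change across retries (e.g., merchant_id:order_id:timestamp, where timestamp is fixed when the order is created rather than when each request is sent). Because every retry of the same order yields the same key and a different order yields a different one, this greatly reduces the risk of a client accidentally reusing a stale key for a different transaction.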
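```python
import hashlib

def make_key(merchant_id: str, order_id: str, timestamp: str) -> str:
    # `timestamp` must be the order's creation time, not the send time,
    # otherwise each retry would produce a new key.
    raw = f"{merchant_id}:{order_id}:{timestamp}"
    return hashlib.sha256(raw.encode()).hexdigest()
```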
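If the client cannot derive a deterministic key (e.g., a mobile SDK without a stable order ID), the server should issue a UUID on the first request and require the client to echo it back on retries. Stripe’s API uses a closely related pattern in which the client generates the key itself (Stripe suggests a V4 UUID) and resends it in the `Idempotency-Key` header on every retry; see “Idempotency Keys” in the Stripe API reference.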
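Deterministic keys shrink the chance of accidental reuse, but the server can also detect it. One complementary guard, offered as an assumption rather than part of the design above, is to store a fingerprint of the request body next to the key and reject a replay whose body differs. In this sketch the module‑level dict stands in for a hypothetical fingerprint column on `idempotency_keys`, and `check_key_payload` and `KeyReuseError` are illustrative names.

```python
import hashlib
import json
import threading


def request_fingerprint(body: dict) -> str:
    """SHA-256 of the body serialized canonically (sorted keys, no extra spaces)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class KeyReuseError(Exception):
    """Raised when an idempotency key is replayed with a different payload."""


_fingerprints = {}  # key -> fingerprint; hypothetical column on idempotency_keys
_lock = threading.Lock()


def check_key_payload(key: str, body: dict) -> bool:
    """Return False on first use of `key`, True for a genuine retry.

    Raises KeyReuseError if `key` was first used with a different body.
    """
    fp = request_fingerprint(body)
    with _lock:
        stored = _fingerprints.get(key)
        if stored is None:
            _fingerprints[key] = fp  # first use of this key
            return False
    if stored != fp:
        raise KeyReuseError(f"idempotency key {key!r} reused with a different body")
    return True
```

In a request handler this check would run right after the key lookup, and a mismatch would typically map to a 4xx error instead of a replay of the stored result.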
Retry‑Safe Side Effects
All side‑effects (webhook dispatch, ledger entry, external notifications) must be idempotent themselves or be gated behind the same key. A common technique is to store a sent flag alongside the key and only fire the webhook after the transaction commits.
```sql
-- idempotency_keys table (PostgreSQL)
CREATE TABLE idempotency_keys (
    key          TEXT PRIMARY KEY,
    status       TEXT NOT NULL,   -- pending, success, failure
    result       JSONB,
    created_at   TIMESTAMPTZ NOT NULL,
    updated_at   TIMESTAMPTZ,
    webhook_sent BOOLEAN DEFAULT FALSE
);
```
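When the payment succeeds, record the outcome inside the payment transaction, and set the flag only after that transaction has committed and the webhook call has returned:
```sql
-- Inside the payment transaction: store the final outcome
UPDATE idempotency_keys
SET status = 'success',
    result = '{"payment_id":"...","status":"succeeded"}',
    updated_at = now()
WHERE key = $1;

-- After COMMIT, once the webhook has been delivered
UPDATE idempotency_keys
SET webhook_sent = TRUE
WHERE key = $1;
```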
If a retry hits the same key after the webhook was already sent, the service simply returns the stored result and skips the second webhook call.
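The sent‑flag pattern can be sketched in memory. `WebhookGate` below is a hypothetical helper standing in for the `webhook_sent` column: it lets only one caller dispatch a given key and records the key as sent only after `send()` returns. That ordering means a crash between sending and marking leads to a resend (at‑least‑once delivery) rather than a silently lost webhook, so receivers should still tolerate duplicates.

```python
import threading
from typing import Callable, Dict


class WebhookGate:
    """Hypothetical in-memory stand-in for the webhook_sent column."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, str] = {}  # key -> "sending" or "sent"

    def dispatch_once(self, key: str, payload: dict,
                      send: Callable[[dict], None]) -> bool:
        """Send `payload` for `key` unless it is already sent or being sent.

        Call this only after the payment transaction has committed.
        Returns True if this call delivered the webhook.
        """
        with self._lock:
            if key in self._state:
                return False              # already sent, or in flight elsewhere
            self._state[key] = "sending"  # claim the dispatch
        try:
            send(payload)
        except Exception:
            with self._lock:
                del self._state[key]      # release so a later retry can resend
            raise
        with self._lock:
            self._state[key] = "sent"     # mark only after delivery succeeded
        return True
```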
Expiration & Cleanup
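Idempotency keys are not needed forever. Set a TTL (time‑to‑live) that matches your business window, typically 24 hours for card payments. PostgreSQL has no built‑in row expiry, so schedule a `DELETE` (via cron, a background worker, or an extension such as pg_cron) and let autovacuum reclaim the space; `VACUUM` by itself never removes live rows. Redis can use built‑in key expiry.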
```bash
# Bash script scheduled via cron to purge old keys (>48h)
psql -d payments -c "
  DELETE FROM idempotency_keys
  WHERE created_at < now() - interval '48 hours';
"
```
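On a large table, one big `DELETE` holds row locks on every expired row until it commits and leaves a burst of dead tuples for autovacuum. A common alternative, sketched here as an assumption rather than the production script, deletes in fixed‑size batches, each in its own short transaction. `conn` is assumed to be a psycopg2 connection; `purge_expired_keys` and the default batch size are illustrative.

```python
PURGE_SQL = """
    DELETE FROM idempotency_keys
    WHERE key IN (
        SELECT key FROM idempotency_keys
        WHERE created_at < now() - (%s * interval '1 hour')
        LIMIT %s
    );
"""


def purge_expired_keys(conn, max_age_hours: int = 48, batch_size: int = 10_000) -> int:
    """Delete expired keys in batches and return the total number removed.

    Each batch runs in its own transaction (`with conn` commits on exit), so
    locks are held only briefly instead of for one long-running DELETE.
    """
    total = 0
    while True:
        with conn, conn.cursor() as cur:
            cur.execute(PURGE_SQL, (max_age_hours, batch_size))
            deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:  # a short batch means nothing expired is left
            return total
```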
Production‑Ready Workflow
Deploying with Feature Flags
Roll out idempotency support behind a flag (e.g., enable_idempotency). This lets you:
- Enable it for a subset of merchants.
- Observe latency impact before full adoption.
- Quickly roll back if an unexpected deadlock appears.
Feature flags can be managed with LaunchDarkly or an internal config service. Example snippet:
```python
if config.get('enable_idempotency'):
    response = process_payment(body, idempotency_key, db)
else:
    response = process_without_idempotency(body, db)
```
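The snippet above toggles idempotency globally, while the rollout plan calls for a subset of merchants. One way to get that, assuming the flag service can supply an allowlist and a rollout percentage (both hypothetical parameters here), is to bucket merchants with a stable hash. Python's built‑in `hash()` for strings is randomized per process unless `PYTHONHASHSEED` is set, so this sketch uses SHA‑256 to keep each merchant in the same cohort across processes and deploys.

```python
import hashlib
from typing import Iterable


def idempotency_enabled(merchant_id: str, allowlist: Iterable[str] = (),
                        rollout_percent: int = 0) -> bool:
    """Hypothetical per-merchant gate for the idempotent payment path.

    Allowlisted merchants are always enabled; the rest are placed in one of
    100 buckets by a stable hash and enabled when bucket < rollout_percent.
    """
    if merchant_id in set(allowlist):
        return True
    digest = hashlib.sha256(merchant_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Raising `rollout_percent` only ever adds merchants, because a merchant's bucket never changes.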
Monitoring and Alerting
- Key Collision Rate – Percentage of requests that hit an existing key. A sudden spike may indicate client misuse or a bug that’s re‑using keys incorrectly.
- Pending‑Too‑Long – Keys stuck in `pending` for > 30 seconds suggest downstream timeouts or deadlocks.
- Webhook Duplicate Errors – Count of “webhook already sent” warnings; should be zero in a correctly idempotent flow.
Prometheus queries:
```promql
# Collision rate (share of payment requests that replayed an existing key, any status)
sum(rate(http_requests_total{handler="payment", idempotency="hit"}[5m]))
/
sum(rate(http_requests_total{handler="payment"}[5m]))

# Pending duration histogram (p95)
histogram_quantile(0.95, sum(rate(idempotency_pending_seconds_bucket[5m])) by (le))
```
Testing in CI
- Property‑based testing – Generate random request payloads with the same idempotency key, assert that the second call returns the exact same JSON.
- Chaos engineering – Introduce network latency and forced DB deadlocks to verify that the `ON CONFLICT` path remains safe.
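The property‑based idea can be sketched without extra dependencies. Libraries such as Hypothesis are the usual choice in Python; this version uses a seeded `random.Random` so it runs with the standard library alone. `make_handler` is a hypothetical in‑memory handler; in CI the same property would be checked against the real service and the Postgres container.

```python
import random
import uuid


def make_handler():
    """Hypothetical in-memory idempotent handler; returns (handle, charges)."""
    results = {}
    charges = []

    def handle(key, body):
        if key in results:
            return results[key]   # replay: no new charge
        charges.append(body)      # stands in for the card authorization
        results[key] = {"payment_id": str(uuid.uuid4()),
                        "amount": body["amount"], "currency": body["currency"]}
        return results[key]

    return handle, charges


def check_replay_property(trials: int = 200, seed: int = 0) -> bool:
    """Random payloads: every retry returns the first response; one charge per key."""
    rng = random.Random(seed)
    handle, charges = make_handler()
    for i in range(trials):
        key = f"key-{i}"
        body = {"amount": rng.randint(1, 10_000),
                "currency": rng.choice(["usd", "eur", "gbp"])}
        first = handle(key, body)
        for _ in range(rng.randint(1, 5)):
            assert handle(key, dict(body)) == first, key
    assert len(charges) == trials
    return True
```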
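```yaml
# .github/workflows/idempotency.yml
name: Idempotency Integration Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: payments
        ports: ["5432:5432"]
        # Wait until Postgres accepts connections before the steps run
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run property tests
        run: pytest -m idempotency
```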
Key Takeaways
- Atomic claim of the idempotency key is non‑negotiable; use `INSERT … ON CONFLICT` (SQL) or `SETNX` (Redis) to guarantee single ownership.
- Deterministic keys reduce client error surface; if not possible, let the server issue a UUID and require it on retries.
- Side‑effects must be gated by the same key to avoid duplicate webhook or ledger entries.
- TTL and cleanup keep the store lean; 24‑48 hours is a practical window for most card‑payment flows.
- Observability (collision rate, pending latency) and feature‑flag rollout are essential to move from prototype to production safely.