System Design: Payment Gateway

Apr 7, 2026

A payment gateway design is one of the highest-stakes system design questions. Every architectural decision has a financial consequence — double charges, lost transactions, or fraud exposure. The interviewer is testing your understanding of consistency, idempotency, and the realities of financial systems.

Requirements

Functional:

Accept payment instrument (card, bank account, wallet) and charge it
Support 3DS (3D Secure) authentication flow for card payments
Async settlement — funds eventually transferred to merchant
Refunds and partial refunds
Transaction status API — client can poll for async results
Webhook notifications on payment state changes
Idempotent payment requests — retrying doesn’t double-charge

Non-functional:

No double charges under any failure scenario
Transaction durability — no lost payments once accepted
p99 latency < 3s for synchronous payment initiation
High availability for the charge endpoint (payment failure = revenue loss)
PCI DSS compliance — cardholder data must be handled securely
Audit trail for all transaction state changes

Key Decisions

Idempotency: The Most Important Requirement

The defining challenge of payment systems: a client submits a payment, the server processes it, but the response is lost in transit. The client retries. Does the payment happen twice?

Solution: Client-generated idempotency keys

The client generates a unique idempotency key (UUID) for each payment intent. The server stores (idempotency_key, result). On retry with the same key, the server returns the stored result without re-processing.

Client sends: POST /payments
  { amount: 100, currency: "USD", idempotency_key: "uuid-1234" }

Server:
  1. Atomically claim key "uuid-1234" (insert if absent, status PROCESSING)
  2. Claim succeeded → process payment → store result under the key → return result
  3. Key already exists → return the stored result (or 202 if still PROCESSING)

Client retry (same idempotency key):
  → Server returns stored result, no re-processing

Implementation details:

Idempotency key has a TTL (24 hours is typical)
Store the key atomically with the result: INSERT INTO idempotency_keys ... ON CONFLICT DO NOTHING
The idempotency check and payment creation must happen in the same transaction
If the payment is still in-flight (processing), return 202 Accepted with status: PROCESSING
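The implementation details above can be sketched roughly as follows. This is a minimal illustration using SQLite in place of Postgres; the table layout, the `handle_payment` function, and the `charge` callback are assumptions made for the example, not part of any real gateway. TTL expiry of old keys is left out.

```python
import json
import sqlite3

# Hypothetical schema, not from the article: keys are claimed before processing.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE idempotency_keys (
        key    TEXT PRIMARY KEY,
        status TEXT NOT NULL,   -- PROCESSING or DONE
        result TEXT             -- JSON body, set once processing finishes
    );
    CREATE TABLE payments (
        idempotency_key TEXT PRIMARY KEY REFERENCES idempotency_keys (key),
        amount_cents    INTEGER NOT NULL,
        currency        TEXT NOT NULL,
        status          TEXT NOT NULL
    );
""")


def handle_payment(db, key, amount_cents, currency, charge):
    """Return (http_status, body). `charge` stands in for the processor call."""
    with db:  # one transaction: claim the key and create the payment together
        claimed = db.execute(
            "INSERT INTO idempotency_keys (key, status) VALUES (?, 'PROCESSING') "
            "ON CONFLICT (key) DO NOTHING",
            (key,),
        ).rowcount == 1
        if claimed:
            db.execute(
                "INSERT INTO payments VALUES (?, ?, ?, 'INITIATED')",
                (key, amount_cents, currency),
            )
    if not claimed:
        status, result = db.execute(
            "SELECT status, result FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        if status == "PROCESSING":
            return 202, {"status": "PROCESSING"}  # original request still in flight
        return 200, json.loads(result)  # replay the stored result, no re-processing

    result = charge(amount_cents, currency)
    with db:
        db.execute(
            "UPDATE idempotency_keys SET status = 'DONE', result = ? WHERE key = ?",
            (json.dumps(result), key),
        )
    return 200, result
```

Claiming the key before calling the processor is what closes the race between two concurrent requests with the same key: only one insert succeeds. A real system would also need a way to recover keys stuck in PROCESSING if the handler crashes mid-charge.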

Exactly-Once Charge

Between your system and the payment processor (Stripe, Braintree, Adyen), you face the same problem: your call to the processor might fail after the processor charged the card.

Solution: Use the processor’s own idempotency keys. Stripe, for example, accepts an Idempotency-Key header — retry the same API call with the same key and Stripe returns the original result.

If the processor doesn’t support idempotency keys (older processors), you must query the processor for the transaction state before retrying a charge.
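For the second case, a processor without idempotency keys, the query-before-retry approach might look like the sketch below. `FakeProcessor`, its `lookup` method, and `charge_once` are hypothetical names invented for this illustration; a real processor's status-query API will differ, and the lookup itself can also time out.

```python
class FakeProcessor:
    """Hypothetical stand-in for a processor without idempotency keys."""

    def __init__(self):
        self.charges = []              # every charge that actually hit the card
        self.drop_next_response = True

    def charge(self, reference, amount_cents):
        self.charges.append((reference, amount_cents))
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("charged, but the response was lost")
        return "succeeded"

    def lookup(self, reference):
        """Report whether a charge with this merchant reference exists."""
        return any(ref == reference for ref, _ in self.charges)


def charge_once(processor, reference, amount_cents, max_attempts=3):
    for _ in range(max_attempts):
        # Ask first: an earlier attempt may have gone through unnoticed.
        if processor.lookup(reference):
            return "succeeded"
        try:
            return processor.charge(reference, amount_cents)
        except TimeoutError:
            pass  # outcome unknown; the next loop iteration queries before retrying
    return "succeeded" if processor.lookup(reference) else "unknown"
```

In this simulation the first call charges the card and loses the response; the second pass finds the charge via `lookup` and does not charge again.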

Transaction State Machine

Payments are never just “succeeded” or “failed” — they go through states:

INITIATED → AUTHORIZING → AUTHORIZED → CAPTURING → CAPTURED → SETTLED
                       ↓            ↓
                  AUTH_FAILED    CAPTURE_FAILED
                       ↓
                    FAILED
                    
CAPTURED → REFUNDING → REFUNDED (full or partial)

Each state transition is an event, stored immutably. Never update a payment record in place — append state transition events (event sourcing is natural here).
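One way to enforce the state machine is a table of allowed transitions plus an append-only event list, from which the current state is derived. The sketch below is one reading of the diagram above (in particular, it assumes CAPTURE_FAILED is reached from CAPTURING); the `Payment` class and `ALLOWED` table are illustrative names, not an established API.

```python
from dataclasses import dataclass, field

# Allowed transitions, as read from the diagram (an interpretation).
ALLOWED = {
    "INITIATED": {"AUTHORIZING"},
    "AUTHORIZING": {"AUTHORIZED", "AUTH_FAILED"},
    "AUTH_FAILED": {"FAILED"},
    "AUTHORIZED": {"CAPTURING"},
    "CAPTURING": {"CAPTURED", "CAPTURE_FAILED"},
    "CAPTURED": {"SETTLED", "REFUNDING"},
    "REFUNDING": {"REFUNDED"},
}


@dataclass
class Payment:
    payment_id: str
    # Append-only list of (from_state, to_state); entries are never edited.
    events: list = field(default_factory=list)

    @property
    def state(self):
        # Current state is derived from the last event, not stored separately.
        return self.events[-1][1] if self.events else "INITIATED"

    def transition(self, new_state):
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.events.append((self.state, new_state))
```

Rejecting anything outside the table means a late or duplicated message (for example, a second "captured" event) fails loudly instead of silently rewriting history.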

Ledger Consistency

The accounting invariant: for every debit, there must be an equal credit. No money appears or disappears.

Double-entry ledger:

-- Every transaction creates two ledger entries with opposite signs
INSERT INTO ledger_entries (account_id, amount, type, tx_id) VALUES
  ('merchant_escrow_account', +100.00, 'CREDIT', 'tx-123'),  -- merchant receives
  ('customer_account',        -100.00, 'DEBIT',  'tx-123');  -- customer pays

-- With signed amounts, an account's balance is SUM(amount),
-- and SUM(amount) over any tx_id must be zero.
-- If these two rows aren't both committed, the books don't balance

Never update a ledger entry. Every adjustment is a new entry (reversal = negative entry).
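A small sketch of these rules, using SQLite and integer cents. The simplified schema (signed `amount_cents`, no `type` column) and the helper names are assumptions for illustration. Both rows of a transfer are written in one transaction, a partial refund is just another pair of entries, and a query checks that every `tx_id` sums to zero.

```python
import sqlite3

ledger = sqlite3.connect(":memory:")
ledger.execute(
    "CREATE TABLE ledger_entries ("
    " account_id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL,"  # signed: positive = credit, negative = debit
    " tx_id TEXT NOT NULL)"
)


def post_transfer(db, tx_id, from_account, to_account, amount_cents):
    """Append both sides of a transfer atomically; rows are never updated."""
    with db:  # both inserts commit together or not at all
        db.executemany(
            "INSERT INTO ledger_entries (account_id, amount_cents, tx_id) "
            "VALUES (?, ?, ?)",
            [(to_account, amount_cents, tx_id), (from_account, -amount_cents, tx_id)],
        )


def balance(db, account_id):
    return db.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger_entries "
        "WHERE account_id = ?",
        (account_id,),
    ).fetchone()[0]


def unbalanced_transactions(db):
    """Any tx_id whose entries do not sum to zero violates the invariant."""
    rows = db.execute(
        "SELECT tx_id FROM ledger_entries GROUP BY tx_id "
        "HAVING SUM(amount_cents) != 0"
    ).fetchall()
    return [tx_id for (tx_id,) in rows]
```

For example, `post_transfer(ledger, "tx-123", "customer_account", "merchant_escrow_account", 10000)` followed by a 4000-cent transfer in the opposite direction under a new `tx_id` leaves the merchant balance at 6000 without touching the original rows.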

Architecture

Client
  │
  ├─ POST /payments (synchronous initiation)
  │        │
  ▼        ▼
API Gateway (auth, rate limiting, TLS termination)
  │
  ▼
Payment Service (core)
  ├── Idempotency check → Idempotency DB (Redis or Postgres with TTL)
  ├── Validate → fraud scoring (sync, pre-auth)
  ├── Write INITIATED record → Payments DB (Postgres)
  ├── 3DS required? → Return requires_action with redirect URL
  └── Call Payment Processor (Stripe/Adyen)
         │ success → record AUTHORIZED in DB
         │ failure → record FAILED in DB
         └── Async: publish PaymentAuthorized/Failed event → Kafka

Settlement Service (async)
  Listens to events → triggers capture (AUTHORIZED → CAPTURED) → SETTLED

Notification Service
  Listens to events → sends webhooks to merchants

Ledger Service
  Listens to CAPTURED/SETTLED events → records double-entry ledger rows

PCI Scope Reduction

Handling raw card numbers (PAN) puts your entire system in PCI scope — expensive, complex compliance.

Best practice: Use a payment processor’s tokenization. The client sends card data directly to Stripe’s JS library or Adyen’s hosted fields. The processor returns a token. Your backend only ever sees the token — never the card number.

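This dramatically reduces your PCI scope: typically to SAQ A when the card form is fully hosted by the processor, or to SAQ A-EP when your own page loads the processor's script. Both are far lighter than the full SAQ D.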

Never store: full card numbers (PAN), CVV/CVC (PCI DSS prohibits retaining it after authorization, even encrypted), magnetic stripe data.

3D Secure (3DS) Flow

3DS is an additional authentication step where the cardholder proves identity to their bank.

1. Your frontend initiates payment with card token
2. Your backend calls processor → processor returns requires_action with 3DS URL
3. Your frontend redirects user to 3DS page (bank's page)
4. User authenticates (OTP, biometric)
5. Bank redirects back to your return URL with a result token
6. Your backend calls processor to complete the payment with the 3DS token
7. If authentication succeeded, the payment proceeds (fraud liability generally shifts to the issuing bank)

Why 3DS matters architecturally: The payment flow is asynchronous — you initiate, wait for user action, then complete. Your system must store the pending payment state and pick up where it left off when the user returns.
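The pause-and-resume shape can be sketched as two handlers sharing a stored record. Everything here is hypothetical: the in-memory `payments` dict stands in for the payments DB, the `bank.example` URL is a placeholder for the 3DS URL the processor would return, and the processor calls themselves are omitted.

```python
payments = {}  # hypothetical store; a real system would persist this durably


def initiate(payment_id, card_token, requires_3ds):
    """First half of the flow: either authorize directly or park the payment."""
    if requires_3ds:
        payments[payment_id] = {"card_token": card_token, "status": "REQUIRES_ACTION"}
        return {
            "status": "requires_action",
            # Placeholder URL; in practice this comes from the processor.
            "redirect_url": f"https://bank.example/3ds?payment={payment_id}",
        }
    payments[payment_id] = {"card_token": card_token, "status": "AUTHORIZED"}
    return {"status": "authorized"}


def complete(payment_id, authenticated):
    """Second half: runs when the bank redirects the user to the return URL."""
    record = payments[payment_id]  # pick up where initiate() left off
    if record["status"] == "REQUIRES_ACTION":
        record["status"] = "AUTHORIZED" if authenticated else "FAILED"
    # Already-completed payments are returned unchanged, so a reloaded
    # return URL does not re-run the completion step.
    return record["status"]
```

Because `complete` only acts on a payment that is still waiting, the user reloading the return page has no effect on the outcome.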

Failure Modes

Network timeout calling the processor:

Retry with the same idempotency key
If processor confirms the charge succeeded: record as AUTHORIZED
If processor confirms the charge failed: record as FAILED
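If the processor can’t be reached after N retries: leave the payment in its in-flight state (AUTHORIZING in the state machine above) and keep retrying asynchronously via a background job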
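The timeout handling above amounts to a bounded retry loop that reuses one idempotency key. In this sketch, `authorize_with_retry` and the `call_processor` callback are illustrative names, and the `"succeeded"` return value is an assumed convention. A return of `None` means the outcome is still unknown, so the caller should leave the payment in its pending state and hand it to an async job.

```python
import time


def authorize_with_retry(call_processor, idempotency_key, max_attempts=3, base_delay=0.5):
    """Return "AUTHORIZED", "FAILED", or None when the outcome is unresolved."""
    for attempt in range(max_attempts):
        try:
            # The same key on every attempt, so the processor can deduplicate.
            outcome = call_processor(idempotency_key)
        except TimeoutError:
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
            continue
        # The processor gave a definite answer either way.
        return "AUTHORIZED" if outcome == "succeeded" else "FAILED"
    return None  # unresolved: keep the payment pending and retry asynchronously
```

Only a definite answer from the processor moves the payment to a terminal state; a timeout never does.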

Double webhook from processor:

Process the first; the second must be idempotent
Store processor_event_id with uniqueness constraint — duplicate events fail the insert and are discarded
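A minimal sketch of the uniqueness-constraint approach, again with SQLite standing in for the real database. The `processor_events` table and `handle_webhook` function are assumptions for illustration. The event is recorded and applied in one transaction, so a failure while applying rolls the record back and the processor's redelivery can be processed.

```python
import sqlite3

events_db = sqlite3.connect(":memory:")
events_db.execute(
    "CREATE TABLE processor_events ("
    " processor_event_id TEXT PRIMARY KEY,"  # uniqueness constraint does the dedup
    " payload TEXT NOT NULL)"
)


def handle_webhook(db, processor_event_id, payload, apply_event):
    """Return True if the event was applied, False if it was a duplicate."""
    try:
        with db:  # rolls back the insert if apply_event raises
            db.execute(
                "INSERT INTO processor_events (processor_event_id, payload) "
                "VALUES (?, ?)",
                (processor_event_id, payload),
            )
            apply_event(payload)
    except sqlite3.IntegrityError:
        return False  # already seen: discard without re-applying
    return True
```

For the rollback to cover the side effects too, `apply_event` would need to write to the same database inside this transaction.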

Database write fails after processor charges:

The processor charged the card, but your DB write failed
Recovery: reconciliation job compares your DB records with processor records on a schedule
Any processor transaction with no corresponding DB record is flagged for investigation
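At its core, the reconciliation job is a comparison of two sets of transaction IDs. The sketch below assumes both sides can be loaded as dictionaries mapping a processor transaction ID to an amount in cents; the `reconcile` function and its output keys are hypothetical. It also reports the reverse gap and amount mismatches, which go slightly beyond the case described above.

```python
def reconcile(db_records, processor_records):
    """Compare {transaction_id: amount_cents} maps and list discrepancies."""
    db_ids, processor_ids = set(db_records), set(processor_records)
    return {
        # Charged at the processor but unknown to us: the case described above.
        "missing_in_db": sorted(processor_ids - db_ids),
        # Recorded by us but absent at the processor.
        "missing_at_processor": sorted(db_ids - processor_ids),
        # Present on both sides with different amounts.
        "amount_mismatch": sorted(
            tx for tx in db_ids & processor_ids
            if db_records[tx] != processor_records[tx]
        ),
    }
```

Each non-empty list would be flagged for investigation rather than corrected automatically.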

Capture failure after authorization:

Authorization succeeded, capture failed
Don’t leave an uncaptured auth indefinitely (authorizations expire, typically after about 7 days, though the window varies by card network)
Retry capture; if repeatedly fails, void the authorization and notify merchant
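The retry-then-void policy can be sketched as below. The `capture`, `void`, and `notify_merchant` callbacks are placeholders for real processor and notification calls, the use of `RuntimeError` for a failed capture is an assumption, and `VOIDED` is a hypothetical state that does not appear in the state machine above.

```python
def capture_or_void(capture, void, notify_merchant, max_attempts=3):
    """Try to capture; after repeated failures, release the authorization."""
    for _ in range(max_attempts):
        try:
            capture()
            return "CAPTURED"
        except RuntimeError:
            continue  # assumed failure signal; try the capture again
    # Give up: release the hold on the customer's funds and tell the merchant.
    void()
    notify_merchant("capture failed repeatedly; authorization voided")
    return "VOIDED"
```

Voiding explicitly, instead of waiting for the authorization to expire, frees the customer's held funds sooner.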

EM Talking Points

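Why not use an RDBMS for everything? Payment and ledger records need ACID guarantees, so Postgres is the right choice for them. Redis suits idempotency-key lookups (speed, built-in TTL), although only keys kept in Postgres can share a transaction with the payment write. A columnar store suits analytics and reporting.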
How do you handle refunds? Refunds are new ledger entries that reverse the original. A partial refund is a reversing entry for part of the amount. Refund processing mirrors charging: idempotent and driven by a state machine.
How do you reconcile at end of day? Settlement files from processors compared against your ledger. Any discrepancy triggers investigation. This is standard operations in fintech.
What if Stripe is down? Failover to a secondary processor (Adyen, Braintree) is possible but complex: the APIs differ, and card tokens issued by one processor can't be used with another. Many teams accept Stripe downtime rather than build multi-processor fallback.
Fraud detection placement: Pre-auth scoring (fast, rules-based: block obviously fraudulent requests before sending to processor). Post-auth scoring (ML-based, async: flag for review, trigger dispute if fraud confirmed after settlement).