SkillHub

System Design: Fraud Detection System

Fraud detection sits at the intersection of real-time systems, machine learning serving, and operational decision-making. The core design tension: decisions must be fast (before the transaction completes) and accurate (false positives cost customers; false negatives cost money). Here’s how to design for both.


Requirements

Functional:

  • Score every transaction for fraud risk before authorization (synchronous, < 200ms)
  • Async deeper analysis for flagged transactions (minutes to hours)
  • Rule-based engine: “block if transaction > $5000 AND new device AND new country”
  • ML scoring: multi-feature risk probability score
  • Case management: analysts review flagged cases, mark fraud/not fraud
  • Feedback loop: analyst decisions feed back into model training
  • Account takeover detection (ATO): suspicious login, device fingerprint, velocity

Non-functional:

  • Pre-auth decision in < 200ms p99 (synchronous path)
  • High availability — if fraud service is down, fail open (allow transaction) or fail closed?
  • False positive rate < 0.5% (no more than 1 in 200 legitimate transactions blocked)
  • False negative rate acceptable up to ~0.1% (fraud loss budget)

Key Design Decisions

Fail Open vs Fail Closed

If the fraud service is unreachable, does the transaction proceed?

  • Fail open (allow): Revenue-first. Users aren’t blocked during outages. Fraud loss increases during downtime.
  • Fail closed (block): Safety-first. No fraud during outages. Revenue loss during downtime.

Most e-commerce systems fail open for availability. Accept that a fraud service outage increases fraud loss, rather than stopping all revenue. Use circuit breakers to detect degradation quickly and alert.

Exception: High-risk categories (crypto purchases, large transfers) may fail closed — the fraud cost exceeds the revenue cost of blocking.
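The fail-open logic above can be sketched as a minimal counter-based circuit breaker (the class name and threshold are illustrative assumptions, and a real breaker would add a half-open state with timed recovery):

```python
class FraudCircuitBreaker:
    """Minimal sketch: count consecutive scorer failures and, once the
    circuit is open, skip the call entirely."""

    def __init__(self, failure_threshold=5):
        self.failures = 0
        self.threshold = failure_threshold

    def call(self, score_fn, txn, high_risk=False):
        if self.failures >= self.threshold:
            # Circuit open: don't even call the degraded service.
            return "BLOCK" if high_risk else "ALLOW"
        try:
            decision = score_fn(txn)
            self.failures = 0  # a healthy call resets the counter
            return decision
        except Exception:
            self.failures += 1
            # Fail open for normal traffic, fail closed for high-risk categories.
            return "BLOCK" if high_risk else "ALLOW"
```

Note the asymmetry: the same outage yields ALLOW for ordinary purchases but BLOCK for the high-risk categories where fraud cost dominates.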

Real-Time vs Async Scoring

Two tiers:

Tier 1 (synchronous, < 200ms): Fast rules + lightweight ML model. Must complete before the payment processor call. Purpose: block obvious fraud immediately.

Tier 2 (async, seconds to minutes): Deep ML analysis, network graph analysis, behavioral analysis. Runs after the transaction is authorized. Purpose: flag transactions for review, trigger holds, initiate disputes for post-settlement fraud.

The split is deliberate: a deep GBM model scoring 300 features in 200ms is possible but expensive. Keep the sync path lightweight; move complexity to the async path.
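A minimal sketch of the Tier-1 decision, combining the hard rule from the requirements with the lightweight model's score (function names and thresholds are illustrative, not tuned values):

```python
def evaluate_rules(txn):
    """One hard rule from the requirements: block if transaction > $5000
    AND new device AND new country. Real engines hold many such rules."""
    hits = []
    if txn["amount"] > 5000 and txn["new_device"] and txn["new_country"]:
        hits.append("big_amount_new_device_new_country")
    return hits

def sync_decision(txn, model_score, block_at=0.9, review_at=0.6):
    """Tier-1 decision: hard rules first, then the lightweight model score."""
    if evaluate_rules(txn):
        return "BLOCK"
    if model_score >= block_at:
        return "BLOCK"
    if model_score >= review_at:
        return "REVIEW"  # authorize now, queue for Tier-2 async analysis
    return "ALLOW"
```

REVIEW is deliberately not a customer-facing outcome: the transaction is authorized and the deeper async path decides whether to hold or escalate.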


Architecture

Transaction Event (from payment service)
  │
  ├─── Sync path (pre-auth, < 200ms):
  │         │
  │    ┌────▼────────────────────────────────┐
  │    │  Fraud Scoring Service              │
  │    │  1. Feature extraction (from cache) │
  │    │  2. Rule engine evaluation          │
  │    │  3. Lightweight ML model inference  │
  │    │  4. Return: ALLOW / REVIEW / BLOCK  │
  │    └─────────────────────────────────────┘
  │         │
  │    Feature Cache (Redis):
  │    - User transaction velocity (last 1h, 24h, 7d)
  │    - Device history
  │    - IP risk score (GeoIP + VPN detection)
  │    - Merchant category risk
  │
  └─── Async path (post-auth):
            │
       Kafka topic: transaction-events
            │
       ┌────▼─────────────────────────────────┐
       │  Deep Analysis Worker                │
       │  1. Full ML model (300+ features)    │
       │  2. Graph analysis (account network) │
       │  3. Behavioral profiling             │
       │  4. Device fingerprint correlation   │
       └────┬─────────────────────────────────┘
            │
       ┌────▼────────────────┐
       │  Case Management    │  ← Analyst reviews
       │  (flagged txns)     │    FRAUD / NOT_FRAUD decision
       └────┬────────────────┘
            │
       Feedback → Model Training Pipeline

Feature Engineering for Fraud Detection

The quality of features determines model accuracy. Critical features:

Velocity features (require fast computation):

  • Transaction count in last 1/4/24 hours per user
  • Dollar amount in last 1/4/24 hours per user
  • Transaction count in last hour per card
  • Failed attempts in last hour

Device and location features:

  • Is this device seen before for this account?
  • Is this IP address associated with a VPN/proxy/Tor?
  • Geo-distance from last transaction (impossible travel: two transactions 5000 miles apart in 30 minutes)
  • New device + new country combination (very high risk signal)
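The impossible-travel check above can be sketched with a haversine distance and an implied-speed threshold (the 600 mph ceiling is an assumed commercial-flight bound):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    r = 3958.8  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(prev, curr, max_mph=600):
    """Flag when the implied speed between consecutive transactions
    exceeds what any commercial flight could achieve."""
    hours = (curr["ts"] - prev["ts"]) / 3600
    if hours <= 0:
        return True  # simultaneous transactions in two places
    dist = haversine_miles(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    return dist / hours > max_mph
```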

Behavioral features:

  • Transaction at unusual hour for this user (1am when user typically transacts 9am-5pm)
  • Merchant category deviation from user’s history
  • Amount deviation (user typically spends $20–100, this is $2000)
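One way to sketch the amount-deviation feature is a robust z-score over the user's history, using median and MAD so a few past outliers don't mask a new one (the 1.4826 factor scales MAD to be comparable to a standard deviation):

```python
import statistics

def amount_deviation_score(history, amount):
    """Robust z-score of the current amount against the user's past amounts.
    Higher means more anomalous; the threshold to act on is tuned elsewhere."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        return 0.0 if amount == med else float("inf")
    return abs(amount - med) / (1.4826 * mad)
```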

Network features:

  • Is this email address / card / device linked to known fraud accounts?
  • Count of accounts sharing this device
  • Count of accounts sharing this IP in last 24 hours
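A minimal in-memory sketch of the device/IP linkage counts (production would back this with Redis sets or a graph store, and would add the 24-hour window this sketch omits):

```python
from collections import defaultdict

class LinkIndex:
    """Maps (kind, key) -> set of account ids seen with that device or IP."""

    def __init__(self):
        self.accounts_by_key = defaultdict(set)

    def record(self, account_id, device_id, ip):
        self.accounts_by_key[("device", device_id)].add(account_id)
        self.accounts_by_key[("ip", ip)].add(account_id)

    def accounts_sharing_device(self, device_id):
        return len(self.accounts_by_key.get(("device", device_id), set()))

    def accounts_sharing_ip(self, ip):
        return len(self.accounts_by_key.get(("ip", ip), set()))
```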

Feature computation challenge: Velocity features (count in last hour) can’t be pre-computed cheaply. Use Redis sorted sets keyed by timestamp: ZADD on each transaction, ZCOUNT for the rolling-window count (O(log N)), and ZREMRANGEBYSCORE periodically to prune entries older than the largest window you track:

ZADD user:123:tx_timestamps <timestamp_ms> <tx_id>
ZCOUNT user:123:tx_timestamps <1h_ago_ms> +inf
ZREMRANGEBYSCORE user:123:tx_timestamps -inf <7d_ago_ms>
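The same rolling-window logic, sketched in pure Python as an in-memory analogue of the sorted-set pattern (comments mark the corresponding Redis commands):

```python
import bisect

class VelocityWindow:
    """Sorted list of timestamps, mirroring a Redis ZSET scored by time."""

    def __init__(self, max_window_ms=7 * 24 * 3600 * 1000):
        self.timestamps = []  # kept sorted, like ZSET scores
        self.max_window_ms = max_window_ms

    def add(self, ts_ms):
        # ~ ZADD user:{id}:tx_timestamps ts_ms tx_id
        bisect.insort(self.timestamps, ts_ms)
        # ~ ZREMRANGEBYSCORE ... -inf cutoff, keeps memory bounded
        cutoff = ts_ms - self.max_window_ms
        i = bisect.bisect_right(self.timestamps, cutoff)
        del self.timestamps[:i]

    def count_since(self, cutoff_ms):
        # ~ ZCOUNT user:{id}:tx_timestamps cutoff_ms +inf
        return len(self.timestamps) - bisect.bisect_left(self.timestamps, cutoff_ms)
```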

ML Model Serving

Model type: Gradient Boosting (XGBoost, LightGBM) is the industry standard for tabular fraud features. Performs better than deep learning for structured data with this feature profile.

Sync path model: Smaller, faster model: 20–50 features, inference < 20ms. An AUC around 0.85 is acceptable, since the async path catches what it misses.

Async path model: Full model. 300+ features including graph features. Inference 100ms–1s. Higher AUC.

Serving infrastructure:

  • Model stored in object storage (S3), versioned
  • Served via TensorFlow Serving, Triton Inference Server, or BentoML
  • Models hot-swapped without service restarts
  • A/B testing: route X% of traffic to new model, compare fraud rates before full cutover
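The A/B routing step can be sketched with a stable hash split, so each user consistently hits the same model for the duration of the test (the names and the 5% default are illustrative):

```python
import hashlib

def model_for_user(user_id, challenger_pct=5):
    """Deterministic traffic split: hash the user id into a 0-99 bucket
    and route that fixed bucket to the challenger model."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < challenger_pct else "champion"
```

Hashing on user id (rather than per-request randomness) keeps each user's experience, and the fraud labels that later arrive for them, attributable to exactly one model.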

Model drift monitoring: Fraud patterns change (fraudsters adapt). Monitor:

  • Feature distribution shift (PSI — Population Stability Index)
  • Model score distribution shift
  • False positive/negative rate over time

Retrain on a regular cadence, incorporating feedback from analyst labels.
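The PSI check can be sketched as follows, over pre-binned score or feature fractions (binning is assumed to happen elsewhere; the 0.1/0.25 cutoffs are conventional rules of thumb, not universal constants):

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index: sum((a - e) * ln(a / e)) over bins.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total
```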


False Positive Cost

Blocking a legitimate transaction is expensive:

  • Customer experience damage (frustrated user, potential churn)
  • Support cost (customer calls to dispute the block)
  • Revenue loss

The false positive / false negative trade-off:

  • Tighter threshold → fewer false negatives (catch more fraud) → more false positives (block legit users)
  • Looser threshold → fewer false positives → more false negatives (miss more fraud)

Optimizing the threshold: Set different thresholds per risk segment. High-risk merchants (crypto, gift cards) → tighter. Low-risk merchants → looser. Premium customers with long history → much looser (their fraud rate is lower and the business cost of blocking them is higher).

Soft declines vs hard declines:

  • Hard decline: transaction blocked outright
  • Soft decline with step-up: “We noticed unusual activity. Please verify via OTP” → user verifies → proceeds. Preserves revenue, reduces false positives at slight friction cost.
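Per-segment thresholds and the soft-decline band can be sketched together (all threshold values here are illustrative, not tuned):

```python
# Illustrative per-segment thresholds; real values come from threshold tuning.
SEGMENT_THRESHOLDS = {
    "high_risk_merchant": {"block": 0.80, "step_up": 0.50},
    "default":            {"block": 0.90, "step_up": 0.70},
    "trusted_customer":   {"block": 0.97, "step_up": 0.90},
}

def decline_decision(score, segment="default"):
    """Hard decline above the block threshold; soft decline (OTP step-up)
    in the gray zone between the two thresholds; otherwise approve."""
    t = SEGMENT_THRESHOLDS.get(segment, SEGMENT_THRESHOLDS["default"])
    if score >= t["block"]:
        return "HARD_DECLINE"
    if score >= t["step_up"]:
        return "STEP_UP_OTP"
    return "APPROVE"
```

The same score produces different outcomes by segment, which is exactly the point: a 0.85 that hard-declines at a gift-card merchant only triggers an OTP, or nothing at all, for a long-tenured customer.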

Feedback Loop

Analyst decisions are the training signal. If an analyst marks a REVIEW transaction as NOT_FRAUD, the model learns from this.

Important: Labels are delayed and noisy.

  • Chargebacks (confirmed fraud) arrive weeks after the transaction
  • Analyst decisions introduce human bias
  • Not all fraud is disputed (some users don’t notice small fraudulent charges)

Training data pipeline:

  • Transaction events → feature store
  • Labels from analyst decisions + chargebacks (delayed labels)
  • Model training job (weekly or on-demand)
  • Champion/challenger testing before production deployment
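The delayed-label join can be sketched by holding back recent transactions until chargebacks have had time to arrive (the 45-day maturity window and the label precedence are assumptions):

```python
def build_training_labels(transactions, chargebacks, analyst_labels,
                          now_ts, maturity_days=45):
    """Label only transactions old enough for chargebacks to have landed.
    A chargeback (confirmed fraud) takes precedence over an analyst label."""
    mature_cutoff = now_ts - maturity_days * 86400
    labeled = []
    for tx in transactions:
        if tx["ts"] > mature_cutoff:
            continue                             # too recent: label not trustworthy yet
        if tx["id"] in chargebacks:
            label = 1                            # confirmed fraud
        elif tx["id"] in analyst_labels:
            label = analyst_labels[tx["id"]]     # 1 = FRAUD, 0 = NOT_FRAUD
        else:
            label = 0                            # undisputed => assume legitimate (noisy!)
        labeled.append((tx["id"], label))
    return labeled
```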

EM Talking Points

  • Why not just use rules? Rules are interpretable and fast but brittle. Fraudsters learn the rules and adapt. ML generalizes to new fraud patterns. Ideal: rules for known patterns + ML for novel patterns.
  • How do you detect account takeover vs payment fraud? ATO: behavioral signals at login (unusual device, IP, typing cadence). Payment fraud: signals at transaction time. Two models, two pipelines, shared feature infrastructure.
  • Velocity limits as a fraud signal: 10 failed card attempts in 5 minutes is a carding attack. This is a rule, not ML. Rules handle these obvious cases; ML handles the subtle ones.
  • Graph analysis: Fraudsters often reuse devices, IP addresses, email patterns across multiple accounts. Querying a graph of account-device-IP relationships reveals rings. Graph DB (Neo4j) or graph compute (Spark GraphX) for batch; in-memory graph for real-time.