# How the Portal Works (Overview)

Purpose: explain what the portal does, which systems own which data, and how freshness is managed.

## Core Pieces and Responsibilities

- Portal UI (Next.js) + BFF API (NestJS): handles all user traffic and calls external systems.
- Postgres: stores portal users and the cross-system mapping `user_id ↔ whmcs_client_id ↔ sf_account_id`.
- Redis cache: reduces load on the upstream systems with two kinds of cache: **global** (e.g. product catalog) and **account-scoped** (e.g. eligibility). Account scoping keeps one customer's data from being served to another.
- WHMCS: system of record for billing (clients, addresses, invoices, payment methods, subscriptions).
- Salesforce: system of record for CRM (accounts/contacts), product catalog/pricebook, orders, and support cases.
- Freebit: SIM provisioning only, used during mobile/SIM order fulfillment.

## High-Level Data Flows

- Sign-up: portal verifies the customer number in Salesforce → creates a WHMCS client (billing account) → stores the portal user + mapping → updates Salesforce with portal status + WHMCS ID.
- Login/Linking: existing WHMCS users validate their WHMCS credentials; we create the portal user, map IDs, and mark the Salesforce account as portal-active.
- Services & Checkout: products/prices come from the Salesforce portal pricebook; eligibility is checked per account; we require a WHMCS payment method before allowing checkout.
- Orders: created in Salesforce with an address snapshot; Salesforce change events trigger fulfillment, which creates the matching WHMCS order and updates Salesforce statuses.
- Billing: invoices, payment methods, and subscriptions are read from WHMCS; secure SSO links are generated for paying invoices inside WHMCS.
- Support: cases are created and read directly in Salesforce with Origin set to “Portal Website”.
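
The sign-up flow above can be sketched as four ordered steps. This is a minimal illustration, not the portal's actual code: the `SalesforcePort`, `WhmcsPort`, and `PortalDb` interfaces, their method names, and the field types are all hypothetical.

```typescript
// Cross-system mapping stored in Postgres (field types are assumptions).
interface IdMapping {
  userId: string;
  whmcsClientId: number;
  sfAccountId: string;
}

// Hypothetical ports; the real service and method names are not shown in this document.
interface SalesforcePort {
  findAccountByCustomerNumber(customerNumber: string): Promise<{ id: string } | null>;
  markPortalActive(accountId: string, whmcsClientId: number): Promise<void>;
}
interface WhmcsPort {
  createClient(email: string): Promise<{ clientId: number }>;
}
interface PortalDb {
  createUserWithMapping(row: IdMapping & { email: string }): Promise<void>;
}

async function signUp(
  deps: { sf: SalesforcePort; whmcs: WhmcsPort; db: PortalDb },
  input: { userId: string; email: string; customerNumber: string },
): Promise<IdMapping> {
  // 1. Verify the customer number in Salesforce; stop early with a clear message.
  const account = await deps.sf.findAccountByCustomerNumber(input.customerNumber);
  if (!account) throw new Error("Customer Number not found");

  // 2. Create the WHMCS client (billing account).
  const { clientId } = await deps.whmcs.createClient(input.email);

  // 3. Store the portal user and the cross-system mapping.
  const mapping: IdMapping = {
    userId: input.userId,
    whmcsClientId: clientId,
    sfAccountId: account.id,
  };
  await deps.db.createUserWithMapping({ ...mapping, email: input.email });

  // 4. Update Salesforce with portal status and the WHMCS ID.
  await deps.sf.markPortalActive(account.id, clientId);
  return mapping;
}
```

The sketch omits compensation if a later step fails (for example, a WHMCS client created but the Salesforce update rejected); how the portal handles that case is not covered here.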

## Data Ownership Cheat Sheet

- Identity & session: Portal DB (hashed passwords, no WHMCS/SF credentials stored).
- Billing profile & addresses: WHMCS (authoritative); the portal writes changes back to WHMCS.
- Orders & order status: Salesforce (source of truth); WHMCS receives the billing/provisioning copy during fulfillment.
- Support cases: Salesforce (the portal only filters results to the account’s own cases).

## Caching & Freshness (Redis)

- Services catalog: event-driven (Salesforce CDC) with a 12h safety TTL; "volatile" entries use a 60s TTL. Per-account eligibility is event-driven with the same 12h safety TTL.
- Orders: event-driven (Salesforce CDC), no TTL; invalidated when Salesforce emits order/order-item changes or when we create/provision an order.
- Invoices: list cached 90s; invoice detail cached 5m; invalidated by WHMCS webhooks and by write operations.
- Subscriptions/services: list cached 5m; single subscription cached 10m; invalidated on WHMCS cache busts (webhooks or profile updates).
- Payment methods: cached 15m; payment gateways list cached 1h.
- WHMCS client profile: cached 30m; cleared after profile/address changes.
- Signup account lookup (Salesforce customer number): cached 30s to keep the form responsive.
- Support cases: read live from Salesforce (no cache).
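
For quick reference, the TTLs listed above can be collected into one table. The sketch below only transcribes those values; the key names and the `isExpiredByTtl` helper are illustrative and are not the portal's actual cache keys or API.

```typescript
const MINUTE = 60;
const HOUR = 60 * MINUTE;

// Illustrative names for the caches described in this section.
type CacheName =
  | "servicesCatalog"
  | "servicesVolatile"
  | "eligibility"
  | "orders"
  | "invoiceList"
  | "invoiceDetail"
  | "subscriptionList"
  | "subscriptionDetail"
  | "paymentMethods"
  | "paymentGateways"
  | "whmcsClientProfile"
  | "signupAccountLookup";

// TTLs in seconds; `null` means "no TTL" (event-driven invalidation only).
const CACHE_TTL_SECONDS: Record<CacheName, number | null> = {
  servicesCatalog: 12 * HOUR, // safety TTL; normally invalidated by Salesforce events
  servicesVolatile: 60,
  eligibility: 12 * HOUR, // safety TTL; normally invalidated by Salesforce events
  orders: null,
  invoiceList: 90,
  invoiceDetail: 5 * MINUTE,
  subscriptionList: 5 * MINUTE,
  subscriptionDetail: 10 * MINUTE,
  paymentMethods: 15 * MINUTE,
  paymentGateways: 1 * HOUR,
  whmcsClientProfile: 30 * MINUTE,
  signupAccountLookup: 30,
};

// True when an entry of the given age has outlived its TTL.
// Entries without a TTL never expire by age alone.
function isExpiredByTtl(name: CacheName, ageSeconds: number): boolean {
  const ttl = CACHE_TTL_SECONDS[name];
  return ttl !== null && ageSeconds >= ttl;
}
```

Support cases are absent from the table because they are read live and never cached.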

## What Happens on Errors

- We prefer to fail safely with clear messages: for example, missing Customer Number, duplicate account, or missing payment method stops the action and tells the user what to fix.
- If WHMCS or Salesforce is briefly unavailable, the portal surfaces a friendly “try again later” message rather than partial data.
- Fulfillment writes error codes/messages back to Salesforce (e.g., missing payment method) so the team can see why provisioning was paused.
- Caches are cleared on writes and key webhooks so stale data is minimized; when cache access fails, we fall back to live reads.
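
The "fall back to live reads" behavior can be sketched as a read-through helper that treats any cache failure as a miss. This is a hedged illustration: `JsonCache` and `readThrough` are hypothetical names standing in for the portal's Redis wrapper, whose real interface is not shown in this document.

```typescript
// Minimal cache shape; a hypothetical stand-in for a Redis client wrapper.
interface JsonCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function readThrough<T>(
  cache: JsonCache,
  key: string,
  ttlSeconds: number,
  loadLive: () => Promise<T>,
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  } catch (err) {
    // Cache unavailable or entry unreadable: treat it as a miss and read live.
  }

  // Errors from the live system are NOT swallowed; they propagate so the
  // caller can show a "try again later" message instead of partial data.
  const fresh = await loadLive();

  try {
    await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  } catch (err) {
    // Best effort: failing to repopulate the cache must not fail the request.
  }
  return fresh;
}
```

Only cache errors are absorbed. A failure in `loadLive` (WHMCS or Salesforce being down) still surfaces to the caller, which matches the "no partial data" rule above.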

## Public vs Account API Boundary (Security + Caching)

The BFF exposes two “flavors” of service catalog endpoints:

- **Public catalog (never personalized)**: `GET /api/public/services/*`
  - Ignores cookies/tokens (no optional session attach).
  - Safe to cache publicly (subject to TTL) and to rate-limit heavily.
- **Account catalog (authenticated + personalized)**: `GET /api/account/services/*`
  - Requires auth and can return account-specific catalog variants (e.g. SIM family discount availability).
  - Uses `Cache-Control: private, no-store` at the HTTP layer; server-side caching is handled in Redis.
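
The boundary can be expressed as a small routing rule that picks the HTTP caching policy from the path prefix. The sketch below is an assumption about shape only: `catalogScope` and `cacheControlHeader` are hypothetical helpers, and the public TTL values are passed in as parameters because this document does not state them.

```typescript
type CatalogScope = "public" | "account";

// Classify a request path by the two catalog prefixes described above.
function catalogScope(path: string): CatalogScope | null {
  if (path.startsWith("/api/public/services/")) return "public";
  if (path.startsWith("/api/account/services/")) return "account";
  return null; // not a catalog endpoint
}

// Build the Cache-Control value for a catalog response.
// The public TTLs are caller-supplied; the real values are not given here.
function cacheControlHeader(
  scope: CatalogScope,
  publicTtl: { maxAgeSeconds: number; sMaxAgeSeconds: number },
): string {
  return scope === "public"
    ? `public, max-age=${publicTtl.maxAgeSeconds}, s-maxage=${publicTtl.sMaxAgeSeconds}`
    : "private, no-store";
}
```

For example, `cacheControlHeader("account", ...)` always yields `private, no-store` regardless of the TTLs supplied, so a personalized response can never be stored by a shared cache.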

### How "public caching" works (and why high traffic usually won't hit Salesforce)

There are **two independent caching layers** involved:

- **Redis (server-side) catalog cache**:
  - Catalog reads are cached in Redis via `ServicesCacheService`.
  - Catalog and eligibility entries are invalidated primarily by Salesforce events: **CDC** events (Product2 / PricebookEntry) for the catalog and an account **Platform Event** for eligibility updates.
  - A **12 hour safety TTL** (configurable via `SERVICES_CACHE_SAFETY_TTL_SECONDS`) lets the cache self-heal if events are missed.
  - Result: even if the public catalog is requested millions of times, the BFF typically serves from Redis and re-queries Salesforce only when a relevant change event arrives, the safety TTL expires, or the cache is cold.

- **HTTP cache (browser/CDN)**:
  - Public catalog responses include `Cache-Control: public, max-age=..., s-maxage=...`.
  - This reduces load on the BFF by allowing browsers/shared caches/CDNs to reuse responses for the TTL window.
  - This layer is TTL-based, so **staleness up to the TTL** is expected unless your CDN is configured for explicit purge.
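
To make the server-side layer concrete, here is a minimal in-memory sketch of "event-driven invalidation plus a safety TTL". It is not `ServicesCacheService`: the `SafetyTtlCache` class is hypothetical, it is synchronous for brevity (real Redis calls are asynchronous), and it tracks expiry itself, whereas Redis expires keys natively.

```typescript
// In-memory stand-in for the Redis catalog cache, with an injectable clock
// so the expiry behavior can be exercised without waiting.
class SafetyTtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly safetyTtlMs: number;
  private readonly now: () => number;

  constructor(safetyTtlMs: number, now: () => number = () => Date.now()) {
    this.safetyTtlMs = safetyTtlMs;
    this.now = now;
  }

  // Serve from cache when present and unexpired; otherwise call `load`
  // (the Salesforce query in the real system) and store the result.
  getOrLoad(key: string, load: () => T): T {
    const entry = this.entries.get(key);
    if (entry !== undefined && entry.expiresAt > this.now()) return entry.value;

    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.safetyTtlMs });
    return value;
  }

  // Called when a relevant change event arrives; the next read reloads.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

With this shape, the upstream loader runs only on a cold key, after `invalidate`, or after the safety TTL elapses, no matter how many reads arrive in between.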

### What to worry about at "million visits" scale

- **CDN cookie forwarding / cache key fragmentation**:
  - Browsers will still send cookies to `/api/public/*` by default; the BFF ignores them, but a CDN might treat cookies as part of the cache key unless configured not to.
  - Make sure your CDN/proxy config does **not** include cookies (and ideally not `Authorization`) in the cache key for `/api/public/services/*`.

- **BFF + Redis load (even if Salesforce is protected)**:
  - Redis caching prevents Salesforce read amplification, but the BFF/Redis still need to handle request volume.
  - Rate limiting on public endpoints is intentional to cap abuse and protect infrastructure.

- **CDC subscription health / fallback behavior**:
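  - If Salesforce CDC subscriptions are disabled or unhealthy, invalidations may not arrive. Services caches then stay stale until the 12h safety TTL expires or they are cleared manually; order caches have no TTL, so they stay stale until cleared or until the portal itself creates/provisions the order.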
  - Monitor the CDC subscriber and cache health metrics (`GET /api/health/services/cache`).
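
The cache-key concern can be illustrated with a function that derives a key from the method, path, and query only. This is a conceptual sketch of what a CDN or proxy rule should achieve, not configuration for any specific CDN; `publicCatalogCacheKey` and the `IncomingRequest` shape are hypothetical.

```typescript
interface IncomingRequest {
  method: string;
  url: string; // path plus optional query string
  headers: Record<string, string>;
}

// Returns a cache key for public catalog requests, or null when the
// request must not be served from the shared cache.
function publicCatalogCacheKey(req: IncomingRequest): string | null {
  const q = req.url.indexOf("?");
  const path = q === -1 ? req.url : req.url.slice(0, q);
  const query = q === -1 ? "" : req.url.slice(q + 1);

  if (req.method !== "GET" || !path.startsWith("/api/public/services/")) return null;

  // Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one entry.
  const sortedQuery = query.split("&").filter(Boolean).sort().join("&");

  // req.headers (Cookie, Authorization) is deliberately not consulted:
  // including it would create one cache entry per visitor.
  return sortedQuery ? `GET ${path}?${sortedQuery}` : `GET ${path}`;
}
```

Two visitors with different session cookies get the same key, so the shared cache stores a single copy of each public catalog response.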

### Future work (monitoring + resilience)

- **CDC subscriber monitoring**: alert on disconnects and sustained lack of events (time since last processed event).
- **Replay cursor persistence**: store/restore a replay position across restarts to reduce missed-event risk.
- **Operational runbook**: document the “flush services caches” procedure for incidents where events were missed for an extended period.
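
As a starting point for the subscriber monitoring item, a health check could classify the subscriber from its connection state and the time since its last activity. This is a sketch under assumptions: the `SubscriberState` shape, the `subscriberHealth` function, and the idea of a single silence threshold are all hypothetical and do not describe existing code.

```typescript
type SubscriberHealth = "ok" | "disconnected" | "silent";

interface SubscriberState {
  connected: boolean;
  // Time of the last processed event, or of the last (re)connect if no
  // event has been processed since then. Milliseconds since epoch.
  lastActivityAtMs: number;
}

function subscriberHealth(
  state: SubscriberState,
  nowMs: number,
  maxSilenceMs: number,
): SubscriberHealth {
  if (!state.connected) return "disconnected";
  // Connected but quiet for too long: events may be getting missed.
  if (nowMs - state.lastActivityAtMs > maxSilenceMs) return "silent";
  return "ok";
}
```

A sensible `maxSilenceMs` depends on how often catalog data actually changes: a quiet period is normal when nothing changed in Salesforce, so the threshold should be tuned to avoid alerting on legitimate silence.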