Update Cache Documentation to Reflect Safety TTL Implementation

- Revised README and system overview documentation to clarify the introduction of a 12-hour safety TTL for cache invalidation alongside CDC events.
- Enhanced explanations regarding the caching strategy for services, emphasizing the importance of the safety TTL in maintaining data freshness and self-healing capabilities.
- Updated comments in the OrdersCacheService to align with the new caching approach, ensuring consistency across documentation and code.
This commit is contained in:
barsa 2025-12-25 15:50:41 +09:00
parent 0f8435e6bd
commit a688121a16
3 changed files with 48 additions and 19 deletions

View File

@ -14,8 +14,8 @@ Redis-backed caching system with CDC (Change Data Capture) event-driven invalida
┌───────────────────────────▼─────────────────────────────────┐
│ Domain-Specific Cache Services │
│ - OrdersCacheService (CDC-driven, no TTL)
│ - CatalogCacheService (CDC-driven, no TTL)
│ - OrdersCacheService (CDC-driven)
│ - ServicesCacheService (CDC-driven + safety TTL)
│ - WhmcsCacheService (TTL-based) │
│ │
│ Features: │
@ -61,7 +61,13 @@ Redis-backed caching system with CDC (Change Data Capture) event-driven invalida
### 1. CDC-Driven (Orders, Catalog)
**No TTL** - Cache persists indefinitely until CDC event triggers invalidation.
**Event-driven invalidation + safety TTL** - Cache is invalidated on CDC events, and also expires after a long TTL as a safety net.
Why: CDC is the primary freshness mechanism, but a safety TTL helps self-heal if events are missed (deploy downtime, subscriber issues, replay gaps).
Config:
- `SERVICES_CACHE_SAFETY_TTL_SECONDS` (default: 12 hours, set to `0` to disable)
**Pros:**
@ -74,7 +80,7 @@ Redis-backed caching system with CDC (Change Data Capture) event-driven invalida
```typescript
@Injectable()
export class OrdersCacheService {
// No TTL = CDC-only invalidation
// CDC invalidation + safety TTL (service-specific)
async getOrderSummaries(
sfAccountId: string,
fetcher: () => Promise<OrderSummary[]>
@ -216,7 +222,7 @@ async getMyData(id: string, fetcher: () => Promise<MyData>): Promise<MyData> {
const fetchPromise = (async () => {
try {
const fresh = await fetcher();
await this.cache.set(key, fresh); // No TTL = CDC-driven
await this.cache.set(key, fresh); // CDC-driven (TTL varies by domain)
return fresh;
} finally {
this.inflightRequests.delete(key);

View File

@ -31,15 +31,16 @@ interface LegacyCatalogCachePayload<T> {
* product dependency tracking for granular invalidation.
*
* Features:
* - CDC-driven invalidation: No TTL, cache persists until CDC event
* - Event-driven invalidation: CDC / Platform Events invalidate caches on change
* - Safety TTL: long TTL to self-heal if events are missed
* - Product dependency tracking: Granular invalidation by product IDs
* - Request coalescing: Prevents thundering herd on cache miss
* - Metrics tracking: Monitors hits, misses, and invalidations
*
* Cache buckets:
* - catalog: Product catalog data (CDC-driven)
* - static: Static reference data (CDC-driven)
* - eligibility: Account eligibility data (CDC-driven)
* - catalog: Product catalog data (event-driven + safety TTL)
* - static: Static reference data (event-driven + safety TTL)
* - eligibility: Account eligibility data (event-driven + safety TTL)
* - volatile: Frequently changing data (60s TTL)
*/
@Injectable()
@ -263,8 +264,8 @@ export class ServicesCacheService {
// Store and link dependencies separately
if (dependencies) {
await this.storeDependencies(key, dependencies);
await this.linkDependencies(key, dependencies);
await this.storeDependencies(key, dependencies, ttlSeconds);
await this.linkDependencies(key, dependencies, ttlSeconds);
}
return fresh;
@ -293,8 +294,8 @@ export class ServicesCacheService {
}
if (cached.dependencies) {
await this.storeDependencies(key, cached.dependencies);
await this.linkDependencies(key, cached.dependencies);
await this.storeDependencies(key, cached.dependencies, ttlSeconds);
await this.linkDependencies(key, cached.dependencies, ttlSeconds);
}
return normalizedValue;
@ -356,11 +357,19 @@ export class ServicesCacheService {
/**
* Store dependencies metadata for a cache key
*/
private async storeDependencies(key: string, dependencies: CacheDependencies): Promise<void> {
private async storeDependencies(
key: string,
dependencies: CacheDependencies,
ttlSeconds: number | null
): Promise<void> {
const normalized = this.normalizeDependencies(dependencies);
if (normalized) {
const metaKey = this.buildDependencyMetaKey(key);
await this.cache.set(metaKey, normalized);
if (ttlSeconds === null) {
await this.cache.set(metaKey, normalized);
} else {
await this.cache.set(metaKey, normalized, ttlSeconds);
}
}
}
@ -387,7 +396,11 @@ export class ServicesCacheService {
return { productIds: Array.from(new Set(productIds)) };
}
private async linkDependencies(key: string, dependencies: CacheDependencies): Promise<void> {
private async linkDependencies(
key: string,
dependencies: CacheDependencies,
ttlSeconds: number | null
): Promise<void> {
const normalized = this.normalizeDependencies(dependencies);
if (!normalized) {
return;
@ -400,7 +413,11 @@ export class ServicesCacheService {
if (!existing.includes(key)) {
existing.push(key);
}
await this.cache.set(indexKey, { keys: existing });
if (ttlSeconds === null) {
await this.cache.set(indexKey, { keys: existing });
} else {
await this.cache.set(indexKey, { keys: existing }, ttlSeconds);
}
}
}
}

View File

@ -29,7 +29,7 @@ Purpose: explain what the portal does, which systems own which data, and how fre
## Caching & Freshness (Redis)
- Services catalog: event-driven (Salesforce CDC), no TTL; "volatile" bits use 60s TTL; eligibility per account is cached without TTL and invalidated on change.
- Services catalog: event-driven (Salesforce CDC) with a 12h safety TTL; "volatile" bits use 60s TTL; eligibility per account is event-driven with the same 12h safety TTL.
- Orders: event-driven (Salesforce CDC), no TTL; invalidated when Salesforce emits order/order-item changes or when we create/provision an order.
- Invoices: list cached 90s; invoice detail cached 5m; invalidated by WHMCS webhooks and by write operations.
- Subscriptions/services: list cached 5m; single subscription cached 10m; invalidated on WHMCS cache busts (webhooks or profile updates).
@ -62,7 +62,7 @@ There are **two independent caching layers** involved:
- **Redis (server-side) catalog cache**:
- Catalog reads are cached in Redis via `ServicesCacheService`.
- For catalog data (plans/addons/etc) the TTL is intentionally **null** (no TTL): values persist until explicitly invalidated.
- Catalog + eligibility data are primarily invalidated by Salesforce events, but we also apply a **12 hour safety TTL** (configurable via `SERVICES_CACHE_SAFETY_TTL_SECONDS`) to self-heal if events are missed.
- Invalidation is driven by Salesforce **CDC** events (Product2 / PricebookEntry) and an account **Platform Event** for eligibility updates.
- Result: even if the public catalog is requested millions of times, the BFF typically serves from Redis and only re-queries Salesforce when a relevant Salesforce change event arrives (or on cold start / cache miss).
@ -84,3 +84,9 @@ There are **two independent caching layers** involved:
- **CDC subscription health / fallback behavior**:
- If Salesforce CDC subscriptions are disabled or unhealthy, invalidations may not arrive and Redis caches can become stale until manually cleared.
- Monitor the CDC subscriber and cache health metrics (`GET /api/health/services/cache`).
### Future work (monitoring + resilience)
- **CDC subscriber monitoring**: alert on disconnects and sustained lack of events (time since last processed event).
- **Replay cursor persistence**: store/restore a replay position across restarts to reduce missed-event risk.
- **Operational runbook**: document the “flush services caches” procedure for incidents where events were missed for an extended period.