Assist_Design/docs/CDC_API_USAGE_ANALYSIS.md


# CDC Cache Strategy Analysis: API Usage & Optimization
## 🎯 Your Key Questions Answered
### Question 1: What happens when a customer is offline for 7 days?
**Good News:** Your current architecture is already optimal!
#### How CDC Cache Works
```
Product changes in Salesforce
        ↓
CDC Event: Product2ChangeEvent
        ↓
CatalogCdcSubscriber receives event
        ↓
Invalidates ALL catalog caches (deletes cache keys)
        ↓
Redis: catalog:internet:plans → DELETED
Redis: catalog:sim:plans → DELETED
Redis: catalog:vpn:plans → DELETED
```
**Key Point:** The CDC subscriber **deletes** cache entries; it does not **update** them.
#### Offline Customer Scenario
```
Day 1: Customer logs in, fetches catalog
  → Cache populated: catalog:internet:plans
Day 2: Product changes in Salesforce
  → CDC invalidates cache
  → Redis: catalog:internet:plans → DELETED
Day 3-7: Customer offline (not logged in)
  → No cache exists (already deleted on Day 2)
  → No API calls made (customer is offline)
Day 8: Customer logs back in
  → Cache miss (was deleted on Day 2)
  → Fetches fresh data from Salesforce (1 API call)
  → Cache populated again
**Result:** You're NOT keeping stale cache for offline users. Cache is deleted when data changes, regardless of who's online.
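The timeline above can be reproduced with a small in-memory simulation. This is a minimal sketch, not the project's code: `CatalogCache`, `fetchPlans`, and the `apiCalls` counter are hypothetical stand-ins for the Redis-backed cache and the Salesforce client.

```typescript
// Hypothetical in-memory stand-in for the Redis-backed catalog cache.
class CatalogCache {
  private readonly store = new Map<string, string>();
  apiCalls = 0; // number of simulated Salesforce fetches

  // Cache-aside read: return the cached value, or fetch and store it on a miss.
  get(key: string, fetchFromSalesforce: () => string): string {
    const cached = this.store.get(key);
    if (cached !== undefined) {
      return cached;
    }
    this.apiCalls += 1;
    const fresh = fetchFromSalesforce();
    this.store.set(key, fresh);
    return fresh;
  }

  // What the CDC subscriber does on a change event: delete, never update.
  invalidateAll(): void {
    this.store.clear();
  }

  has(key: string): boolean {
    return this.store.has(key);
  }
}

const cache = new CatalogCache();
const key = "catalog:internet:plans";
let catalogVersion = 1; // bumped whenever the product changes in Salesforce
const fetchPlans = (): string => `plans-v${catalogVersion}`;

const day1 = cache.get(key, fetchPlans);   // Day 1: miss, 1 API call
catalogVersion = 2;                        // Day 2: product changes...
cache.invalidateAll();                     // ...and the CDC handler deletes the key
const cachedWhileOffline = cache.has(key); // Days 3-7: nothing cached, no calls
const day8 = cache.get(key, fetchPlans);   // Day 8: miss, 1 API call, fresh data

console.log(day1, cachedWhileOffline, day8, cache.apiCalls);
// prints: plans-v1 false plans-v2 2
```

The offline period costs nothing: two API calls in total, and the returning customer never sees the Day 1 data.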
---
### Question 2: Should we stop invalidating cache for offline customers?
**Answer: NO - Current approach is correct!**
#### Why Current Approach is Optimal
**Option 1: Track online users and selective invalidation** ❌
```typescript
// BAD: Track who's online
if (userIsOnline(userId)) {
  await catalogCache.invalidate(userId);
}
```
**Problems:**
- Complex: Need to track online users
- Race conditions: User might log in right after check
- Memory overhead: Store online user list
- Still need to invalidate on login anyway
- Doesn't save API calls
**Option 2: Current approach - Invalidate everything** ✅
```typescript
// GOOD: Simple global invalidation
await catalogCache.invalidateAllCatalogs();
```
**Benefits:**
- Simple: No tracking needed
- Correct: Data is always fresh when requested
- Efficient: Deleted cache uses 0 memory
- On-demand: Only fetches when user actually requests
---
### Question 3: How many API calls does CDC actually save?
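Here is a worked estimate. The TTL figures are a worst case that assumes every catalog check lands after the previous cache entry has expired: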
#### Scenario: 100 Active Users, 10 Products in Catalog
##### WITHOUT CDC (TTL-based: 5 minutes)
```
Assumptions:
- Cache TTL: 5 minutes (300 seconds)
- Average user session: 30 minutes
- User checks catalog: 3 times per session
- Active users per day: 100
API Calls per User per Day:
- User logs in, cache is expired/empty
- Check 1: Cache miss → 1 API call → Cache populated
- After 5 minutes: Cache expires → DELETED
- Check 2: Cache miss → 1 API call → Cache populated
- After 5 minutes: Cache expires → DELETED
- Check 3: Cache miss → 1 API call → Cache populated
Total: 3 API calls per user per day
For 100 users:
- 100 users × 3 API calls = 300 API calls/day
- Per month: 300 × 30 = 9,000 API calls
```
##### WITH CDC (Event-driven: null TTL)
```
Assumptions:
- No TTL (cache lives forever until invalidated)
- Product changes: 5 times per day (realistic for production)
- Active users per day: 100
API Calls:
Day starts (8:00 AM):
- User 1 logs in → Cache miss → 1 API call → Cache populated
- Users 2-100 log in → Cache HIT → 0 API calls ✅
Product change at 10:00 AM:
- CDC invalidates cache → All cache DELETED
- Next user (User 23) → Cache miss → 1 API call → Cache populated
- Other users → Cache HIT → 0 API calls ✅
Product change at 2:00 PM:
- CDC invalidates cache → All cache DELETED
- Next user (User 67) → Cache miss → 1 API call → Cache populated
- Other users → Cache HIT → 0 API calls ✅
... (3 more product changes)
Total: 5 API calls per day (one per product change)
Per month: 5 × 30 = 150 API calls
```
#### Comparison
| Metric | TTL (5 min) | CDC (Event) | Savings |
|--------|-------------|-------------|---------|
| API calls/day | 300 | 5 | **98.3%** |
| API calls/month | 9,000 | 150 | **98.3%** |
| Cache hit ratio | ~0% | ~99% | - |
| Data freshness | Up to 5 min stale | < 5 sec stale | - |
**Savings: 8,850 API calls per month!** 🎉
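The arithmetic behind the comparison table can be checked with a few lines of TypeScript. This is only a sketch of the estimate; the variable names are illustrative and the inputs are the assumptions listed in the two scenarios above.

```typescript
// Assumptions copied from the two scenarios above.
const activeUsersPerDay = 100;
const catalogChecksPerSession = 3; // each check is a miss under the 5-minute TTL
const productChangesPerDay = 5;    // each change causes one refetch under CDC
const daysPerMonth = 30;

const ttlCallsPerDay = activeUsersPerDay * catalogChecksPerSession; // 300
const cdcCallsPerDay = productChangesPerDay;                        // 5

const ttlCallsPerMonth = ttlCallsPerDay * daysPerMonth; // 9000
const cdcCallsPerMonth = cdcCallsPerDay * daysPerMonth; // 150

const savedPerMonth = ttlCallsPerMonth - cdcCallsPerMonth;
const savingsPercent = (savedPerMonth / ttlCallsPerMonth) * 100;

console.log(savedPerMonth, savingsPercent.toFixed(1));
// prints: 8850 98.3
```

Changing `productChangesPerDay` or `catalogChecksPerSession` shows how sensitive the savings are to those two assumptions.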
---
### Question 4: Do we even need to call Salesforce API with CDC?
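**YES - a CDC event describes a change to one record; it does not contain the catalog data your cache stores!**
#### What CDC Events Contain
```json
{
  "payload": {
    "Id": "01t5g000002AbcdEAC",
    "Name": "Internet Home 1G",
    "changeType": "UPDATE",
    "changedFields": ["Name", "UnitPrice"],
    "entityName": "Product2"
  },
  "replayId": 12345
}
```
**Notice:** This example is simplified. In real change events the record ID, `changeType`, and `changedFields` sit inside a `ChangeEventHeader` object, and an update carries the new values of the changed fields only. Unchanged fields and related records are not included, so the event alone cannot rebuild a cached catalog response.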
#### You Still Need to Fetch Data
```
CDC Event received
        ↓
Invalidate cache (delete Redis key)
        ↓
Customer requests catalog
        ↓
Cache miss (key was deleted)
        ↓
Fetch from Salesforce API ← STILL NEEDED
        ↓
Store in cache
        ↓
Return to customer
```
#### CDC vs Data Fetch
| What | Purpose | API Cost |
|------|---------|----------|
| **CDC Event** | Notification that data changed | No API request* |
| **Salesforce Query** | Fetch actual data | 1 API call |
*CDC event delivery is metered against the org's event delivery allocation, which is separate from the API request limit.
#### Why This is Still Efficient
**Without CDC:**
```
Every 5 minutes: Fetch from Salesforce (whether changed or not)
Result: 288 API calls/day per cached item
```
**With CDC:**
```
Only when data actually changes: Fetch from Salesforce
Product changes 5 times/day
First user after change: 1 API call
Other 99 users: Cache hit
Result: 5 API calls/day total
```
---
## 🚀 Optimization Strategies
Your current approach is already excellent, but here are some additional optimizations:
### Strategy 1: Hybrid TTL (Recommended) ✅
Add a **long backup TTL** to clean up unused cache entries:
```typescript
// Current: No TTL
private readonly CATALOG_TTL: number | null = null;
// Optimized: Add backup TTL
private readonly CATALOG_TTL: number | null = 86400; // 24 hours
private readonly STATIC_TTL: number | null = 604800; // 7 days
```
**Why?**
- **Primary invalidation:** CDC events (real-time)
- **Backup cleanup:** TTL removes unused entries after 24 hours
- **Memory efficient:** Old cache entries don't accumulate
- **Still event-driven:** Most invalidations happen via CDC
**Benefit:** Prevents memory bloat from abandoned cache entries
**Trade-off:** Minimal - at most one extra fetch per key per TTL window, and only when no CDC event invalidates the key first
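How a `number | null` TTL can coexist with event-driven deletion is sketched below. This is an illustration under stated assumptions, not the project's cache service: `TtlCache` is a hypothetical in-memory stand-in, and the clock is injected so expiry can be demonstrated without waiting. With Redis, the equivalent of the TTL branch would be writing the key with an expiry (for example `SET key value EX 86400`).

```typescript
type TtlSeconds = number | null; // null means "no expiry", as in CATALOG_TTL above

interface Entry<T> {
  value: T;
  expiresAt: number | null; // epoch milliseconds, or null for no expiry
}

// Hypothetical in-memory cache; `now` is injectable so expiry can be tested.
class TtlCache<T> {
  private readonly store = new Map<string, Entry<T>>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: T, ttlSeconds: TtlSeconds): void {
    const expiresAt = ttlSeconds === null ? null : this.now() + ttlSeconds * 1000;
    this.store.set(key, { value, expiresAt });
  }

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (entry.expiresAt !== null && this.now() >= entry.expiresAt) {
      this.store.delete(key); // backup cleanup: the TTL elapsed without a CDC event
      return undefined;
    }
    return entry.value;
  }

  // Primary invalidation path, called from the CDC subscriber.
  delete(key: string): void {
    this.store.delete(key);
  }
}

let clockMs = 0;
const ttlCache = new TtlCache<string>(() => clockMs);
ttlCache.set("catalog:internet:plans", "plans", 86400); // 24-hour backup TTL

clockMs = 23 * 3600 * 1000;
const at23h = ttlCache.get("catalog:internet:plans"); // still cached
clockMs = 24 * 3600 * 1000;
const at24h = ttlCache.get("catalog:internet:plans"); // expired

console.log(at23h, at24h);
// prints: plans undefined
```

Note that the TTL here is fixed from the time of the write; reads do not extend it, which is why a long-lived key costs one extra fetch per TTL window.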
---
### Strategy 2: Cache Warming (Advanced) 🔥
Pre-populate cache when CDC event received:
```typescript
// Current: Invalidate and wait for next request
async handleProductEvent() {
  await this.invalidateAllCatalogs(); // Delete cache
}
// Optimized: Invalidate AND warm cache
async handleProductEvent() {
  this.logger.log("Product changed, warming cache");
  // Invalidate old cache
  await this.invalidateAllCatalogs();
  // Warm cache with fresh data (awaited here, so the handler waits for warming to finish)
  await this.cacheWarmingService.warmCatalogCache();
}
```
**Implementation:**
```typescript
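@Injectable()
export class CacheWarmingService {
  // internetCatalog, simCatalog, vpnCatalog, and logger are assumed to be
  // injected through the constructor (omitted here).
  async warmCatalogCache(): Promise<void> {
    // Each getPlans() call is assumed to miss the invalidated cache, fetch
    // from Salesforce, and repopulate the cache as a side effect.
    await Promise.all([
      this.internetCatalog.getPlans(),
      this.simCatalog.getPlans(),
      this.vpnCatalog.getPlans(),
    ]);
    this.logger.log("Cache warmed with fresh data");
  }
}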
```
**Benefits:**
- Zero latency for first user after change
- Proactive data freshness
- Better user experience
**Costs:**
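- Up to one fetch per warmed catalog on every CDC event, including catalogs nobody requests before the next change (a handful of calls per day at 5 changes/day)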
- Background processing overhead
**When to use:**
- High-traffic applications
- Low latency requirements
- Salesforce API limit is not a concern
---
### Strategy 3: Selective Invalidation (Most Efficient) 🎯
Invalidate only affected cache keys instead of everything:
```typescript
// Current: Invalidate everything
async handleProductEvent(data: unknown) {
  await this.invalidateAllCatalogs(); // Nukes all catalog cache
}
// Optimized: Invalidate only affected catalogs
async handleProductEvent(data: unknown) {
  const payload = this.extractPayload(data);
  const productId = this.extractStringField(payload, ["Id"]);
  // Fetch product type to determine which catalog to invalidate
  const productType = await this.getProductType(productId);
  if (productType === "Internet") {
    await this.cache.delPattern("catalog:internet:*");
  } else if (productType === "SIM") {
    await this.cache.delPattern("catalog:sim:*");
  } else if (productType === "VPN") {
    await this.cache.delPattern("catalog:vpn:*");
  }
}
```
**Benefits:**
- More targeted invalidation
- Unaffected catalogs remain cached
- Even higher cache hit ratio
**Costs:**
- More complex logic
- Need to determine product type (might require API call)
- Edge cases (product changes type)
**Trade-off Analysis:**
- **Saves:** ~2 API calls per product change
- **Costs:** 1 API call to determine product type
- **Net savings:** ~1 API call per event
**Verdict:** Probably not worth the complexity for typical use cases
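If selective invalidation is ever adopted, the type-to-pattern decision can be isolated in a pure function, which makes the edge cases easier to test. The sketch below is hypothetical: `CATALOG_PATTERNS` and `patternsToInvalidate` are illustrative names, and falling back to every catalog for an unknown type is an assumption chosen to match the current global behavior.

```typescript
// Hypothetical mapping from a product type to the cache key pattern it affects.
const CATALOG_PATTERNS = new Map<string, string>([
  ["Internet", "catalog:internet:*"],
  ["SIM", "catalog:sim:*"],
  ["VPN", "catalog:vpn:*"],
]);

// Returns the key patterns to delete for a changed product.
// An unknown or missing type falls back to every catalog pattern.
function patternsToInvalidate(productType: string | undefined): string[] {
  const pattern =
    productType === undefined ? undefined : CATALOG_PATTERNS.get(productType);
  return pattern === undefined
    ? Array.from(CATALOG_PATTERNS.values())
    : [pattern];
}

console.log(patternsToInvalidate("SIM").length);     // one pattern: catalog:sim:*
console.log(patternsToInvalidate(undefined).length); // all three patterns
```

A product whose type changes would need both its old and new catalogs invalidated; this sketch does not cover that case, which is part of the complexity cost noted above.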
---
### Strategy 4: User-Specific Cache Keys (Advanced) 👥
Currently, your cache keys are **global** (shared by all users):
```typescript
// Current: Global cache key
buildCatalogKey("internet", "plans") // → "catalog:internet:plans"
```
**Why offline users are not a problem with global keys:**
```
Catalog cache key: "catalog:internet:plans" (shared by ALL users)
- 100 users share same cache entry
- 1 offline user's cache doesn't matter (they don't request it)
- Cache is deleted when data changes (correct behavior)
```
**Alternative: User-specific cache keys:**
```typescript
// User-specific cache key
buildCatalogKey("internet", "plans", userId) // → "catalog:internet:plans:user123"
```
**Analysis:**
| Aspect | Global Keys | User-Specific Keys |
|--------|-------------|-------------------|
| Memory usage | Low (1 entry) | High (100 entries for 100 users) |
| API calls | 5/day total | 5/day per user = 500/day |
| Cache hit ratio | 99% | Lower (~70%) |
| CDC invalidation | Delete 1 key | Delete 100 keys |
| Offline user impact | None | Would need to track |
**Verdict:** ❌ Don't use user-specific keys for global catalog data
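The memory and invalidation rows of the table follow directly from how many distinct keys each scheme produces. The sketch below assumes a `buildCatalogKey` that simply joins its parts with `:` and appends the user ID unchanged; the real helper may differ.

```typescript
// Hypothetical key builder mirroring the two call shapes shown above.
function buildCatalogKey(catalog: string, resource: string, userId?: string): string {
  const parts = ["catalog", catalog, resource];
  if (userId !== undefined) {
    parts.push(userId);
  }
  return parts.join(":");
}

// 100 users requesting the same internet plans.
const userIds = Array.from({ length: 100 }, (_, i) => `user${i + 1}`);

const globalKeys = new Set(userIds.map(() => buildCatalogKey("internet", "plans")));
const perUserKeys = new Set(userIds.map((id) => buildCatalogKey("internet", "plans", id)));

console.log(globalKeys.size, perUserKeys.size);
// prints: 1 100
```

Every distinct key is one entry to store, one entry to delete on a CDC event, and one potential cache miss after the delete.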
**When user-specific keys make sense:**
- Eligibility data (already user-specific in your code ✅)
- Order history (user-specific)
- Personal settings
---
## 📊 Recommended Configuration
Based on your architecture, here's my recommendation:
### Option A: Hybrid TTL (Recommended for Most Cases) ✅
```typescript
// apps/bff/src/modules/catalog/services/catalog-cache.service.ts
export class CatalogCacheService {
  // Primary: CDC invalidation (real-time)
  // Backup: TTL cleanup (memory management)
  private readonly CATALOG_TTL = 86400; // 24 hours (backup)
  private readonly STATIC_TTL = 604800; // 7 days (rarely changes)
  private readonly ELIGIBILITY_TTL = 3600; // 1 hour (user-specific)
  private readonly VOLATILE_TTL = 60; // 1 minute (real-time data)
}
```
**Rationale:**
- ✅ CDC provides real-time invalidation (primary mechanism)
- ✅ TTL provides backup cleanup (prevent memory bloat)
- ✅ Simple to implement (just change constants)
- ✅ No additional complexity
- ✅ 99%+ cache hit ratio maintained
**API Call Impact:**
- Active users: 0 additional calls (CDC handles invalidation)
- Inactive users: 0 additional calls (cache expired, user offline)
- Edge cases: ~1-2 additional calls/day (TTL expires before CDC event)
---
### Option B: Aggressive CDC-Only (Current Approach) ⚡
```typescript
// Keep current configuration
private readonly CATALOG_TTL: number | null = null; // No TTL
private readonly STATIC_TTL: number | null = null; // No TTL
private readonly ELIGIBILITY_TTL: number | null = null; // No TTL
```
**When to use:**
- Low traffic (memory not a concern)
- Frequent product changes (CDC invalidates often anyway)
- Maximum data freshness required
**Trade-off:**
- Unused cache entries never expire
- Memory usage grows over time
- Need Redis memory monitoring
---
### Option C: Cache Warming (High-Traffic Sites) 🔥
```typescript
// Combine Hybrid TTL + Cache Warming
export class CatalogCdcSubscriber {
  async handleProductEvent() {
    // 1. Invalidate cache
    await this.catalogCache.invalidateAllCatalogs();
    // 2. Warm cache (background)
    this.cacheWarmingService.warmCatalogCache().catch(err => {
      this.logger.warn("Cache warming failed", err);
    });
  }
}
```
**When to use:**
- High traffic (1000+ users/day)
- Zero latency requirement
- Salesforce API limits are generous
**Benefit:**
- First user after CDC event: no Salesforce round trip, provided warming has finished (served from the warm cache)
- All users: Consistent performance
---
## 🎯 Final Recommendation
For your use case, I recommend **Option A: Hybrid TTL**:
```typescript
// Change these lines in catalog-cache.service.ts
private readonly CATALOG_TTL = 86400; // 24 hours (was: null)
private readonly STATIC_TTL = 604800; // 7 days (was: null)
private readonly ELIGIBILITY_TTL = 3600; // 1 hour (was: null)
private readonly VOLATILE_TTL = 60; // Keep as is
```
### Why This is Optimal
1. **Primary invalidation: CDC (real-time)**
- Product changes → Cache invalidated within 5 seconds
- 99% of invalidations happen via CDC
2. **Backup cleanup: TTL (memory management)**
- Unused cache entries expire after 24 hours
- Prevents memory bloat
- ~1% of invalidations happen via TTL
3. **Best of both worlds:**
- Real-time data freshness (CDC)
- Memory efficiency (TTL)
- Simple implementation (no complexity)
### API Usage with Hybrid TTL
```
100 active users, 10 products, 5 product changes/day
Daily API Calls:
- CDC invalidations: 5 events × 1 API call = 5 calls
- TTL expirations: ~2 calls (keys that reach 24h without a CDC event)
- Total: ~7 API calls/day
Monthly: ~210 API calls
Compare to TTL-only: 9,000 API calls/month
Savings: 97.7% ✅
```
---
## 📈 Monitoring
Add these metrics to track cache efficiency:
```typescript
export interface CatalogCacheMetrics {
  invalidations: {
    cdc: number; // Invalidations from CDC events
    ttl: number; // Invalidations from TTL expiry
    manual: number; // Manual invalidations
  };
  apiCalls: {
    total: number; // Total Salesforce API calls
    cacheMiss: number; // API calls due to cache miss
    cacheHit: number; // Requests served from cache
  };
  cacheHitRatio: number; // Percentage of cache hits
}
```
**Healthy metrics:**
- Cache hit ratio: > 95%
- CDC invalidations: 5-10/day
- TTL invalidations: < 5/day
- API calls: < 20/day
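The hit ratio and its health threshold can be derived from two counters. This is a minimal sketch: `CacheCounters`, `cacheHitRatio`, and `isHitRatioHealthy` are hypothetical helpers, with field names borrowed from `CatalogCacheMetrics` above and the 95% threshold taken from the list of healthy metrics.

```typescript
// Hypothetical counters; field names follow CatalogCacheMetrics above.
interface CacheCounters {
  cacheHit: number;  // requests served from cache
  cacheMiss: number; // requests that fell through to Salesforce
}

// Hit ratio as a percentage; returns 0 when there have been no requests.
function cacheHitRatio(counters: CacheCounters): number {
  const requests = counters.cacheHit + counters.cacheMiss;
  return requests === 0 ? 0 : (counters.cacheHit / requests) * 100;
}

// Threshold from the healthy-metrics list: hit ratio above 95%.
function isHitRatioHealthy(counters: CacheCounters): boolean {
  return cacheHitRatio(counters) > 95;
}

// Example: 300 catalog requests in a day, 5 of them misses after CDC events.
const today: CacheCounters = { cacheHit: 295, cacheMiss: 5 };

console.log(cacheHitRatio(today).toFixed(1), isHitRatioHealthy(today));
// prints: 98.3 true
```

Guarding the zero-request case avoids reporting `NaN` right after a deploy or a counter reset.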
---
## 🎓 Summary
**Your Questions Answered:**
1. **Offline customers:** ✅ Current approach is correct - CDC invalidation deletes stale cache entries rather than keeping them
2. **Stop invalidating for offline users?** ❌ No - invalidating everything is simpler and more correct
3. **API usage:** ✅ CDC saves 98%+ of API calls (9,000 → 150/month)
4. **Need Salesforce API?:** ✅ Yes - CDC notifies, API fetches data
**Recommended Configuration:**
```typescript
CATALOG_TTL = 86400 // 24 hours (backup cleanup)
STATIC_TTL = 604800 // 7 days
ELIGIBILITY_TTL = 3600 // 1 hour
VOLATILE_TTL = 60 // 1 minute
```
**Result:**
- 📉 98% reduction in API calls
- 🚀 < 5 second data freshness
- 💾 Memory-efficient (TTL cleanup)
- 🎯 Simple to maintain (no complexity)
Your CDC setup is **already excellent** - just add the backup TTL for memory management!