Assist_Design/docs/CDC_API_USAGE_ANALYSIS.md
barsa 1334c0f9a6 Enhance Salesforce integration and caching mechanisms
- Added new environment variables for Salesforce event channels and Change Data Capture (CDC) to improve cache invalidation and event handling.
- Updated Salesforce module to include new guards for write operations, enhancing request rate limiting.
- Refactored various services to utilize caching for improved performance and reduced API calls, including updates to the Orders and Catalog modules.
- Enhanced error handling and logging in Salesforce services to provide better insights during operations.
- Improved cache TTL configurations for better memory management and data freshness across catalog and order services.
2025-11-06 16:32:29 +09:00


CDC Cache Strategy Analysis: API Usage & Optimization

🎯 Your Key Questions Answered

Question 1: What happens when a customer is offline for 7 days?

Good News: Your current architecture is already optimal!

How CDC Cache Works

Product changes in Salesforce
    ↓
CDC Event: Product2ChangeEvent
    ↓
CatalogCdcSubscriber receives event
    ↓
Invalidates ALL catalog caches (deletes cache keys)
    ↓
Redis: catalog:internet:plans → DELETED
Redis: catalog:sim:plans → DELETED
Redis: catalog:vpn:plans → DELETED

Key Point: CDC deletes cache entries, it doesn't update them.

Offline Customer Scenario

Day 1: Customer logs in, fetches catalog
  → Cache populated: catalog:internet:plans
  
Day 2: Product changes in Salesforce
  → CDC invalidates cache
  → Redis: catalog:internet:plans → DELETED
  
Day 3-7: Customer offline (not logged in)
  → No cache exists (already deleted on Day 2)
  → No API calls made (customer is offline)
  
Day 8: Customer logs back in
  → Cache miss (was deleted on Day 2)
  → Fetches fresh data from Salesforce (1 API call)
  → Cache populated again

Result: You're NOT keeping stale cache for offline users. Cache is deleted when data changes, regardless of who's online.
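The timeline above can be replayed with a small in-memory simulation. This is a minimal sketch, not the project's actual cache service: `SimulatedCatalogCache`, `getOrLoad`, and the synchronous loader are hypothetical stand-ins for Redis and the Salesforce query, and the second plan name is made-up sample data.

```typescript
// Hypothetical in-memory stand-in for the Redis-backed catalog cache.
class SimulatedCatalogCache<T> {
  private readonly entries = new Map<string, T>();
  /** Number of times the loader (the "Salesforce API call") actually ran. */
  public loaderCalls = 0;

  /** Read-through: return the cached value, or load and store it on a miss. */
  getOrLoad(key: string, loader: () => T): T {
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      return cached;
    }
    this.loaderCalls += 1;
    const fresh = loader();
    this.entries.set(key, fresh);
    return fresh;
  }

  /** What a CDC event triggers: delete every catalog key, update nothing. */
  invalidateAll(): void {
    this.entries.clear();
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

const catalogCache = new SimulatedCatalogCache<string[]>();
const plansKey = "catalog:internet:plans";
let salesforcePlans = ["Internet Home 1G"];
const loadPlans = (): string[] => [...salesforcePlans];

// Day 1: customer logs in, cache is populated (1 API call).
catalogCache.getOrLoad(plansKey, loadPlans);

// Day 2: product changes in Salesforce, the CDC event deletes the key.
salesforcePlans = ["Internet Home 1G", "Internet Home 10G"]; // made-up sample data
catalogCache.invalidateAll();

// Days 3-7: customer offline, nothing is cached and nothing is fetched.
const cachedWhileOffline = catalogCache.has(plansKey);

// Day 8: customer returns, cache miss, one fresh fetch (2nd API call in total).
const day8Plans = catalogCache.getOrLoad(plansKey, loadPlans);
```

The loader runs exactly twice over the eight days, which is the point of the scenario: the offline days cost neither memory nor API calls.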


Question 2: Should we stop invalidating cache for offline customers?

Answer: NO - Current approach is correct!

Why Current Approach is Optimal

Option 1: Track online users and invalidate selectively

// BAD: Track who's online
if (userIsOnline(userId)) {
  await catalogCache.invalidate(userId);
}

Problems:

  • Complex: Need to track online users
  • Race conditions: User might log in right after check
  • Memory overhead: Store online user list
  • Still need to invalidate on login anyway
  • Doesn't save API calls

Option 2: Current approach - Invalidate everything

// GOOD: Simple global invalidation
await catalogCache.invalidateAllCatalogs();

Benefits:

  • Simple: No tracking needed
  • Correct: Data is always fresh when requested
  • Efficient: Deleted cache uses 0 memory
  • On-demand: Only fetches when user actually requests

Question 3: How many API calls does CDC actually save?

Here are illustrative numbers under explicit assumptions:

Scenario: 100 Active Users, 10 Products in Catalog

WITHOUT CDC (TTL-based: 5 minutes)
Assumptions:
- Cache TTL: 5 minutes (300 seconds)
- Average user session: 30 minutes
- User checks catalog: 3 times per session
- Active users per day: 100

API Calls per User per Day:
- User logs in, cache is expired/empty
- Check 1: Cache miss → 1 API call → Cache populated
- After 5 minutes: Cache expires → DELETED
- Check 2: Cache miss → 1 API call → Cache populated  
- After 5 minutes: Cache expires → DELETED
- Check 3: Cache miss → 1 API call → Cache populated

Total: 3 API calls per user per day (worst case: each check lands on an expired entry that no other user has repopulated)

For 100 users:
- 100 users × 3 API calls = 300 API calls/day
- Per month: 300 × 30 = 9,000 API calls

WITH CDC (Event-driven: null TTL)
Assumptions:
- No TTL (cache lives forever until invalidated)
- Product changes: 5 times per day (realistic for production)
- Active users per day: 100

API Calls:
Day starts (8:00 AM):
- User 1 logs in → Cache miss → 1 API call → Cache populated
- Users 2-100 log in → Cache HIT → 0 API calls ✅

Product change at 10:00 AM:
- CDC invalidates cache → All cache DELETED
- Next user (User 23) → Cache miss → 1 API call → Cache populated
- Other users → Cache HIT → 0 API calls ✅

Product change at 2:00 PM:
- CDC invalidates cache → All cache DELETED
- Next user (User 67) → Cache miss → 1 API call → Cache populated
- Other users → Cache HIT → 0 API calls ✅

... (3 more product changes)

Total: 5 API calls per day (one per product change)
Per month: 5 × 30 = 150 API calls

Comparison

| Metric | TTL (5 min) | CDC (Event) | Savings |
|---|---|---|---|
| API calls/day | 300 | 5 | 98.3% |
| API calls/month | 9,000 | 150 | 98.3% |
| Cache hit ratio | ~0% | ~99% | - |
| Data freshness | Up to 5 min stale | < 5 sec stale | - |

Savings: 8,850 API calls per month! 🎉
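The arithmetic behind the comparison can be written down as a few small functions, which makes it easy to plug in your own traffic numbers. This is a sketch of the model used above (worst-case TTL misses, one refetch per product change); the `Scenario` shape and function names are made up for illustration.

```typescript
/** Inputs taken from the scenario above. */
interface Scenario {
  activeUsersPerDay: number;
  catalogChecksPerUser: number;
  productChangesPerDay: number;
  daysPerMonth: number;
}

/** Worst case for a short TTL: every catalog check lands on an expired entry. */
function ttlCallsPerDay(s: Scenario): number {
  return s.activeUsersPerDay * s.catalogChecksPerUser;
}

/** Event-driven: one refetch by the first user after each product change. */
function cdcCallsPerDay(s: Scenario): number {
  return s.productChangesPerDay;
}

/** Percentage of calls avoided, rounded to one decimal place. */
function savingsPercent(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

const scenario: Scenario = {
  activeUsersPerDay: 100,
  catalogChecksPerUser: 3,
  productChangesPerDay: 5,
  daysPerMonth: 30,
};

const ttlMonthly = ttlCallsPerDay(scenario) * scenario.daysPerMonth; // 9000
const cdcMonthly = cdcCallsPerDay(scenario) * scenario.daysPerMonth; // 150
const saved = savingsPercent(ttlMonthly, cdcMonthly); // 98.3
```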


Question 4: Do we even need to call Salesforce API with CDC?

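YES - CDC events describe what changed on one record; they don't carry the full catalog data!

What CDC Events Contain

{
  "payload": {
    "ChangeEventHeader": {
      "entityName": "Product2",
      "recordIds": ["01t5g000002AbcdEAC"],
      "changeType": "UPDATE",
      "changedFields": ["Name"]
    },
    "Name": "Internet Home 1G"
  },
  "replayId": 12345
}

Notice: this simplified event carries only the fields that changed on a single Product2 record. It does NOT contain the assembled catalog response that is stored in the cache, so the cache cannot be rebuilt from the event alone.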
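Since handlers typically receive the event as untyped data, it helps to validate the part you rely on before acting on it. The sketch below assumes the payload nests its metadata under `ChangeEventHeader`, as in Salesforce's change event format; the JSON shown above is simplified, and `parseChangeEventHeader` is a hypothetical helper, not a function from this codebase.

```typescript
/** Subset of the header Salesforce attaches to a change event. */
interface ChangeEventHeader {
  entityName: string;
  recordIds: string[];
  changeType: string; // e.g. "CREATE", "UPDATE", "DELETE", "UNDELETE"
  changedFields: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

/** Returns the validated header, or null if the payload has an unexpected shape. */
function parseChangeEventHeader(payload: unknown): ChangeEventHeader | null {
  if (!isRecord(payload)) return null;
  const header = payload["ChangeEventHeader"];
  if (!isRecord(header)) return null;

  const entityName = header["entityName"];
  const recordIds = header["recordIds"];
  const changeType = header["changeType"];
  const changedFields = header["changedFields"];

  if (typeof entityName !== "string" || typeof changeType !== "string") return null;
  if (!isStringArray(recordIds)) return null;

  return {
    entityName,
    recordIds,
    changeType,
    // changedFields is only meaningful for updates; default to an empty list.
    changedFields: isStringArray(changedFields) ? changedFields : [],
  };
}

const samplePayload: unknown = {
  ChangeEventHeader: {
    entityName: "Product2",
    recordIds: ["01t5g000002AbcdEAC"],
    changeType: "UPDATE",
    changedFields: ["Name"],
  },
  Name: "Internet Home 1G",
};

const parsedHeader = parseChangeEventHeader(samplePayload);
const malformed = parseChangeEventHeader({ Id: "01t5g000002AbcdEAC" });
```

Returning `null` for an unexpected shape lets the subscriber fall back to the safe default of invalidating everything instead of throwing inside the event loop.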

You Still Need to Fetch Data

CDC Event received
    ↓
Invalidate cache (delete Redis key)
    ↓
Customer requests catalog
    ↓
Cache miss (key was deleted)
    ↓
Fetch from Salesforce API ← STILL NEEDED
    ↓
Store in cache
    ↓
Return to customer

CDC vs Data Fetch

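| What | Purpose | API Cost |
|---|---|---|
| CDC Event | Notification that data changed | 0 API requests* |
| Salesforce Query | Fetch actual data | 1 API call |

*Delivered CDC events count against Salesforce's separate event delivery allocation, not the daily API request limit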

Why This is Still Efficient

Without CDC:

Every 5 minutes: Fetch from Salesforce (whether changed or not)
Result: up to 288 API calls/day per cached item (24 hours × 12 expirations per hour)

With CDC:

Only when data actually changes: Fetch from Salesforce
Product changes 5 times/day
First user after change: 1 API call
Other 99 users: Cache hit
Result: 5 API calls/day total

🚀 Optimization Strategies

Your current approach is already excellent, but here are some additional optimizations:

Strategy 1: Hybrid TTL (Recommended)

Add a long backup TTL to clean up unused cache entries:

// Current: No TTL
private readonly CATALOG_TTL: number | null = null;

// Optimized: Add backup TTL
private readonly CATALOG_TTL: number | null = 86400; // 24 hours
private readonly STATIC_TTL: number | null = 604800; // 7 days

Why?

  • Primary invalidation: CDC events (real-time)
  • Backup cleanup: TTL expires every entry 24 hours after it was written, so an entry outlives a missed CDC event by at most a day
  • Memory efficient: Old cache entries don't accumulate
  • Still event-driven: Most invalidations happen via CDC

Benefit: Prevents memory bloat from abandoned cache entries

Trade-off: Minimal - a key that sees no CDC event for 24 hours costs one extra refetch when it expires
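How the backup TTL and CDC deletion interact can be seen in a small in-memory model. This is a sketch under assumptions: `TtlCache` and its injectable clock are hypothetical and only mimic expiry so the example runs without Redis (in Redis itself the expiry would come from something like `SET key value EX 86400`).

```typescript
type Clock = () => number; // current time in milliseconds

// Hypothetical in-memory cache that mimics key expiry.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number | null }>();
  private readonly now: Clock;

  constructor(now: Clock = () => Date.now()) {
    this.now = now;
  }

  /** ttlSeconds = null keeps the entry until it is explicitly deleted. */
  set(key: string, value: T, ttlSeconds: number | null): void {
    const expiresAt = ttlSeconds === null ? null : this.now() + ttlSeconds * 1000;
    this.entries.set(key, { value, expiresAt });
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt !== null && this.now() >= entry.expiresAt) {
      this.entries.delete(key); // backup cleanup: expired by TTL
      return undefined;
    }
    return entry.value;
  }

  /** Primary invalidation path: called when a CDC event arrives. */
  delete(key: string): void {
    this.entries.delete(key);
  }
}

const HOUR_MS = 3600 * 1000;
const CATALOG_TTL = 86400; // 24 hours, in seconds
let fakeNow = 0;
const ttlCache = new TtlCache<string>(() => fakeNow);

ttlCache.set("catalog:internet:plans", "plans-v1", CATALOG_TTL);

fakeNow += 23 * HOUR_MS;
const after23h = ttlCache.get("catalog:internet:plans"); // still cached

fakeNow += 2 * HOUR_MS;
const after25h = ttlCache.get("catalog:internet:plans"); // expired by backup TTL

ttlCache.set("catalog:internet:plans", "plans-v2", CATALOG_TTL);
ttlCache.delete("catalog:internet:plans"); // CDC event arrives long before the TTL
const afterCdc = ttlCache.get("catalog:internet:plans");
```

In the normal case the CDC delete wins and the TTL never fires; the TTL only matters for keys that nobody touches and no event invalidates.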


Strategy 2: Cache Warming (Advanced) 🔥

Pre-populate cache when CDC event received:

// Current: Invalidate and wait for next request
async handleProductEvent() {
  await this.invalidateAllCatalogs(); // Delete cache
}

// Optimized: Invalidate AND warm cache
async handleProductEvent() {
  this.logger.log("Product changed, warming cache");
  
  // Invalidate old cache
  await this.invalidateAllCatalogs();
  
  // Warm cache with fresh data before the handler returns
  await this.cacheWarmingService.warmCatalogCache();
}

Implementation:

@Injectable()
export class CacheWarmingService {
  async warmCatalogCache(): Promise<void> {
    // Each getPlans() call is expected to miss the just-invalidated cache,
    // query Salesforce, and store the result, so the return values are unused
    await Promise.all([
      this.internetCatalog.getPlans(),
      this.simCatalog.getPlans(),
      this.vpnCatalog.getPlans(),
    ]);
    
    this.logger.log("Cache warmed with fresh data");
  }
}

Benefits:

  • No cache-miss latency for the first user after a change
  • Proactive data freshness
  • Better user experience

Costs:

  • One API call per warmed catalog per CDC event (3 catalogs × 5 events/day = 15 calls/day, still negligible)
  • Background processing overhead

When to use:

  • High-traffic applications
  • Low latency requirements
  • Salesforce API limit is not a concern

Strategy 3: Selective Invalidation (Most Efficient) 🎯

Invalidate only affected cache keys instead of everything:

// Current: Invalidate everything
async handleProductEvent(data: unknown) {
  await this.invalidateAllCatalogs(); // Nukes all catalog cache
}

// Optimized: Invalidate only affected catalogs
async handleProductEvent(data: unknown) {
  const payload = this.extractPayload(data);
  const productId = this.extractStringField(payload, ["Id"]);
  
  // Fetch product type to determine which catalog to invalidate
  const productType = await this.getProductType(productId);
  
  if (productType === "Internet") {
    await this.cache.delPattern("catalog:internet:*");
  } else if (productType === "SIM") {
    await this.cache.delPattern("catalog:sim:*");
  } else if (productType === "VPN") {
    await this.cache.delPattern("catalog:vpn:*");
  }
}

Benefits:

  • More targeted invalidation
  • Unaffected catalogs remain cached
  • Even higher cache hit ratio

Costs:

  • More complex logic
  • Need to determine product type (might require API call)
  • Edge cases (product changes type)

Trade-off Analysis:

  • Saves: ~2 API calls per product change
  • Costs: 1 API call to determine product type
  • Net savings: ~1 API call per event

Verdict: Probably not worth the complexity for typical use cases
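If you do try selective invalidation, the type-to-pattern mapping is easier to keep correct as data than as an if/else chain. A minimal sketch follows; the three patterns come from the example above, while the `catalog:*` fallback and the function name `patternToInvalidate` are assumptions made for illustration.

```typescript
// Product type → cache key pattern, taken from the example above.
const CATALOG_PATTERNS = new Map<string, string>([
  ["Internet", "catalog:internet:*"],
  ["SIM", "catalog:sim:*"],
  ["VPN", "catalog:vpn:*"],
]);

// Assumption: when the type is unknown, fall back to the current behavior
// and invalidate every catalog rather than risk serving stale data.
const ALL_CATALOGS_PATTERN = "catalog:*";

function patternToInvalidate(productType: string | null | undefined): string {
  if (productType === null || productType === undefined) {
    return ALL_CATALOGS_PATTERN;
  }
  return CATALOG_PATTERNS.get(productType) ?? ALL_CATALOGS_PATTERN;
}

const simPattern = patternToInvalidate("SIM");
const unknownPattern = patternToInvalidate("Router");
const missingPattern = patternToInvalidate(null);
```

The fallback keeps the failure mode safe: an unrecognized or missing product type degrades to the global invalidation you already have, not to a stale catalog.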


Strategy 4: User-Specific Cache Keys (Advanced) 👥

Currently, your cache keys are global (shared by all users):

// Current: Global cache key
buildCatalogKey("internet", "plans") // → "catalog:internet:plans"

Problem with offline users:

Catalog cache key: "catalog:internet:plans" (shared by ALL users)
- 100 users share same cache entry
- 1 offline user's cache doesn't matter (they don't request it)
- Cache is deleted when data changes (correct behavior)

Alternative: User-specific cache keys:

// User-specific cache key
buildCatalogKey("internet", "plans", userId) // → "catalog:internet:plans:user123"

Analysis:

| Aspect | Global Keys | User-Specific Keys |
|---|---|---|
| Memory usage | Low (1 entry) | High (100 entries for 100 users) |
| API calls | 5/day total | 5/day per user = 500/day |
| Cache hit ratio | 99% | Lower (~70%) |
| CDC invalidation | Delete 1 key | Delete 100 keys |
| Offline user impact | None | Would need to track |

Verdict: Don't use user-specific keys for global catalog data

When user-specific keys make sense:

  • Eligibility data (already user-specific in your code)
  • Order history (user-specific)
  • Personal settings
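The difference between the two key schemes is small in code and large in cost. The sketch below uses a hypothetical reimplementation of `buildCatalogKey`, inferred only from the two example keys shown above (the project's real helper may differ), plus a rough estimator for the "API calls" row of the table.

```typescript
// Hypothetical reimplementation, inferred from the example keys above.
function buildCatalogKey(...parts: string[]): string {
  return ["catalog", ...parts].join(":");
}

/** Rough daily Salesforce calls: one refetch per change, per distinct cache entry. */
function estimateDailyCalls(changesPerDay: number, cacheEntries: number): number {
  return changesPerDay * cacheEntries;
}

const sharedKey = buildCatalogKey("internet", "plans");
const perUserKey = buildCatalogKey("internet", "plans", "user123");

const sharedCalls = estimateDailyCalls(5, 1); // one entry shared by all users
const perUserCalls = estimateDailyCalls(5, 100); // one entry per user, 100 users
```

With a shared key the cost is independent of the number of users; with per-user keys it scales linearly, which is why the table lands on 5 versus 500 calls per day.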

Based on your architecture, here's my recommendation:

Option A: Hybrid TTL (Recommended)

// apps/bff/src/modules/catalog/services/catalog-cache.service.ts

export class CatalogCacheService {
  // Primary: CDC invalidation (real-time)
  // Backup: TTL cleanup (memory management)
  private readonly CATALOG_TTL = 86400;        // 24 hours (backup)
  private readonly STATIC_TTL = 604800;        // 7 days (rarely changes)
  private readonly ELIGIBILITY_TTL = 3600;     // 1 hour (user-specific)
  private readonly VOLATILE_TTL = 60;          // 1 minute (real-time data)
}

Rationale:

  • CDC provides real-time invalidation (primary mechanism)
  • TTL provides backup cleanup (prevent memory bloat)
  • Simple to implement (just change constants)
  • No additional complexity
  • 99%+ cache hit ratio maintained

API Call Impact:

  • Active users: 0 additional calls (CDC handles invalidation)
  • Inactive users: 0 additional calls (cache expired, user offline)
  • Edge cases: ~1-2 additional calls/day (TTL expires before CDC event)

Option B: Aggressive CDC-Only (Current Approach)

// Keep current configuration
private readonly CATALOG_TTL: number | null = null;  // No TTL
private readonly STATIC_TTL: number | null = null;   // No TTL
private readonly ELIGIBILITY_TTL: number | null = null; // No TTL

When to use:

  • Low traffic (memory not a concern)
  • Frequent product changes (CDC invalidates often anyway)
  • Maximum data freshness required

Trade-off:

  • Unused cache entries never expire
  • Memory usage grows over time
  • Need Redis memory monitoring

Option C: Cache Warming (High-Traffic Sites) 🔥

// Combine Hybrid TTL + Cache Warming

export class CatalogCdcSubscriber {
  async handleProductEvent() {
    // 1. Invalidate cache
    await this.catalogCache.invalidateAllCatalogs();
    
    // 2. Warm cache (background)
    this.cacheWarmingService.warmCatalogCache().catch(err => {
      this.logger.warn("Cache warming failed", err);
    });
  }
}

When to use:

  • High traffic (1000+ users/day)
  • Zero latency requirement
  • Salesforce API limits are generous

Benefit:

  • First user after CDC event: no cache-miss latency (cache already warm)
  • All users: Consistent performance

🎯 Final Recommendation

For your use case, I recommend Option A: Hybrid TTL:

// Change these lines in catalog-cache.service.ts

private readonly CATALOG_TTL = 86400;      // 24 hours (was: null)
private readonly STATIC_TTL = 604800;      // 7 days (was: null)
private readonly ELIGIBILITY_TTL = 3600;   // 1 hour (was: null)
private readonly VOLATILE_TTL = 60;        // Keep as is

Why This is Optimal

  1. Primary invalidation: CDC (real-time)
    • Product changes → Cache invalidated within 5 seconds
    • 99% of invalidations happen via CDC

  2. Backup cleanup: TTL (memory management)
    • Unused cache entries expire after 24 hours
    • Prevents memory bloat
    • ~1% of invalidations happen via TTL

  3. Best of both worlds:
    • Real-time data freshness (CDC)
    • Memory efficiency (TTL)
    • Simple implementation (no complexity)

API Usage with Hybrid TTL

100 active users, 10 products, 5 product changes/day

Daily API Calls:
- CDC invalidations: 5 events × 1 API call = 5 calls
- TTL expirations: ~2 calls (entries that expired after 24h without a CDC event, refetched on the next request)
- Total: ~7 API calls/day

Monthly: ~210 API calls

Compare to TTL-only: 9,000 API calls/month
Savings: 97.7% ✅

📈 Monitoring

Add these metrics to track cache efficiency:

export interface CatalogCacheMetrics {
  invalidations: {
    cdc: number;          // Invalidations from CDC events
    ttl: number;          // Invalidations from TTL expiry
    manual: number;       // Manual invalidations
  };
  apiCalls: {
    total: number;        // Total Salesforce API calls
    cacheMiss: number;    // API calls due to cache miss
    cacheHit: number;     // Requests served from cache
  };
  cacheHitRatio: number;  // Percentage of cache hits
}

Healthy metrics:

  • Cache hit ratio: > 95%
  • CDC invalidations: 5-10/day
  • TTL invalidations: < 5/day
  • API calls: < 20/day
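To turn the thresholds above into something you can alert on, the hit ratio and health check can be computed from raw counters. This is a sketch only: `CacheCounters`, `DailySnapshot`, and `findUnhealthyMetrics` are hypothetical names, and the CDC invalidation range is left out because it depends on how often products actually change.

```typescript
interface CacheCounters {
  hits: number;
  misses: number;
}

/** Cache hit ratio as a percentage; 0 when there has been no traffic yet. */
function cacheHitRatio(counters: CacheCounters): number {
  const total = counters.hits + counters.misses;
  return total === 0 ? 0 : (counters.hits / total) * 100;
}

interface DailySnapshot {
  cacheHitRatio: number;
  ttlInvalidations: number;
  apiCalls: number;
}

/** Returns the names of metrics outside the healthy ranges listed above. */
function findUnhealthyMetrics(snapshot: DailySnapshot): string[] {
  const unhealthy: string[] = [];
  if (snapshot.cacheHitRatio <= 95) unhealthy.push("cacheHitRatio");
  if (snapshot.ttlInvalidations >= 5) unhealthy.push("ttlInvalidations");
  if (snapshot.apiCalls >= 20) unhealthy.push("apiCalls");
  return unhealthy;
}

// 300 catalog requests in a day, 5 of them cache misses (one per product change).
const hitRatioToday = cacheHitRatio({ hits: 295, misses: 5 });

const problemsToday = findUnhealthyMetrics({
  cacheHitRatio: hitRatioToday,
  ttlInvalidations: 2,
  apiCalls: 7,
});
```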

🎓 Summary

Your Questions Answered:

  1. Offline customers: Current approach is correct - CDC deletes stale cache entries instead of keeping them
  2. Stop invalidating for offline?: No - simpler and more correct to invalidate all
  3. API usage: CDC saves 98%+ of API calls (9,000 → 150/month)
  4. Need Salesforce API?: Yes - CDC notifies, API fetches data

Recommended Configuration:

CATALOG_TTL = 86400        // 24 hours (backup cleanup)
STATIC_TTL = 604800        // 7 days
ELIGIBILITY_TTL = 3600     // 1 hour
VOLATILE_TTL = 60          // 1 minute

Result:

  • 📉 98% reduction in API calls
  • 🚀 < 5 second data freshness
  • 💾 Memory-efficient (TTL cleanup)
  • 🎯 Simple to maintain (no complexity)

Your CDC setup is already excellent - just add the backup TTL for memory management!