# Salesforce Shadow Data Sync Plan
## Objectives
- Reduce repetitive Salesforce reads for hot catalog and eligibility data.
- Provide resilient fallbacks when Salesforce limits are reached by serving data from Postgres shadow tables.
- Maintain data freshness within minutes via event-driven updates, with scheduled backstops.
## Scope
- **Catalog metadata**: `Product2`, `PricebookEntry`, add-on metadata (SIM/Internet/VPN).
- **Pricing snapshots**: Unit price, currency, and active flags per SKU.
- **Account eligibility**: `Account.Internet_Eligibility__c` and related readiness fields used by personalized catalogs.
## Proposed Schema (Postgres)
```sql
CREATE TABLE sf_product_shadow (
  product_id TEXT PRIMARY KEY,
  sku TEXT UNIQUE NOT NULL,
  name TEXT NOT NULL,
  item_class TEXT,
  offering_type TEXT,
  plan_tier TEXT,
  vpn_region TEXT,
  updated_at TIMESTAMP WITH TIME ZONE NOT NULL,
  raw_payload JSONB NOT NULL
);

CREATE TABLE sf_pricebook_shadow (
  pricebook_entry_id TEXT PRIMARY KEY,
  product_id TEXT NOT NULL REFERENCES sf_product_shadow(product_id) ON DELETE CASCADE,
  pricebook_id TEXT NOT NULL,
  unit_price NUMERIC(12,2) NOT NULL,
  currency_iso_code TEXT NOT NULL,
  is_active BOOLEAN NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE NOT NULL,
  raw_payload JSONB NOT NULL
);

CREATE TABLE sf_account_eligibility_shadow (
  account_id TEXT PRIMARY KEY,
  internet_eligibility TEXT,
  eligibility_source TEXT,
  updated_at TIMESTAMP WITH TIME ZONE NOT NULL,
  raw_payload JSONB NOT NULL
);
```
## Sync Strategy
| Phase | Approach | Tooling |
| --- | --- | --- |
| Backfill | Bulk API v2 query for each object (Product2, PricebookEntry, Account) to seed tables. | New CLI job (`pnpm nx run bff:salesforce-backfill-shadow`) |
| Incremental updates | Subscribe to Platform Events or Change Data Capture streams for Product2, PricebookEntry, and Account. Push events onto the existing SalesforceRequestQueue, then enqueue them to a BullMQ worker that upserts into the shadow tables. | Extend provisioning queue or add new `SF_SHADOW_SYNC` queue |
| Catch-up | Nightly scheduled Bulk API delta query (using `SystemModstamp`) to reconcile missed events. | Cron worker (same Bull queue) |
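As a rough sketch of the catch-up phase, the nightly delta query can be built from a `SystemModstamp` cursor. The helper below is hypothetical (a real job would submit the resulting SOQL to a Bulk API v2 query job); note that SOQL datetime literals are unquoted:

```typescript
// Hypothetical builder for the nightly reconciliation query. The cursor is
// the SystemModstamp of the last successful sync for the given object.
function buildDeltaSoql(object: string, fields: string[], sinceIso: string): string {
  return (
    `SELECT ${fields.join(", ")} FROM ${object} ` +
    `WHERE SystemModstamp > ${sinceIso} ORDER BY SystemModstamp ASC`
  );
}

// Example: delta query for Product2 rows changed since the cursor.
const soql = buildDeltaSoql(
  "Product2",
  ["Id", "Name", "SystemModstamp"],
  "2024-06-01T00:00:00Z",
);
```

The same builder would be reused for `PricebookEntry` and `Account` with their respective field lists.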
### Upsert Flow
1. Event payload arrives from Salesforce Pub/Sub and is persisted to the queue (reusing the `SalesforceRequestQueueService` backoff).
2. Worker normalizes the payload (maps relationship fields, handles deletions).
3. Worker performs a PostgreSQL `INSERT ... ON CONFLICT` upsert inside a transaction to keep product ↔ pricebook relationships consistent.
4. Worker invalidates Redis keys (`catalog:*`, `eligibility:*`) via `CatalogCacheService.invalidateAllCatalogs()`, or uses targeted invalidation when a specific SKU/account changes.
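The normalization step above could look roughly like the following. The event shape and field names are illustrative, not the exact Change Data Capture payload:

```typescript
// Hypothetical shape of a Product2 change event and of the row we upsert
// into sf_product_shadow. Only a subset of columns is shown.
interface ProductChangeEvent {
  ChangeEventHeader: { recordIds: string[]; changeType: "CREATE" | "UPDATE" | "DELETE" };
  StockKeepingUnit?: string;
  Name?: string;
}

interface ProductShadowRow {
  product_id: string;
  sku: string | null;
  name: string | null;
  deleted: boolean;
  raw_payload: string;
}

// Map an incoming event to the shadow-table row, flagging deletions so the
// upsert can tombstone or remove the record.
function normalizeProductEvent(evt: ProductChangeEvent): ProductShadowRow {
  return {
    product_id: evt.ChangeEventHeader.recordIds[0],
    sku: evt.StockKeepingUnit ?? null,
    name: evt.Name ?? null,
    deleted: evt.ChangeEventHeader.changeType === "DELETE",
    raw_payload: JSON.stringify(evt),
  };
}

const row = normalizeProductEvent({
  ChangeEventHeader: { recordIds: ["01t000000000001"], changeType: "UPDATE" },
  StockKeepingUnit: "SIM-10GB",
  Name: "SIM 10GB",
});
```

The worker would then bind `row` into an `INSERT ... ON CONFLICT (product_id) DO UPDATE` statement, inside the same transaction as any dependent pricebook rows.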
## Integration Points
- **Catalog services**: attempt to read from shadow tables via Prisma before falling back to Salesforce query; only hit Salesforce on cache miss _and_ shadow miss.
- **Eligibility lookup**: `InternetCatalogService.getPlansForUser` first loads from `sf_account_eligibility_shadow`; if the row is stale (>15 min), it falls back to Salesforce and refreshes the row asynchronously.
- **Order flows**: continue writing through live Salesforce, but use shadow data for price lookups where possible.
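The shadow-first read decision hinges on the 15-minute staleness threshold above; a minimal sketch (helper name is hypothetical):

```typescript
// Serve the shadow row only while it is fresher than the threshold;
// otherwise fall back to Salesforce and refresh the row asynchronously.
const STALENESS_LIMIT_MS = 15 * 60 * 1000;

function isShadowFresh(updatedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() - updatedAt.getTime() <= STALENESS_LIMIT_MS;
}

// A row updated 1 minute ago is served; one updated 16 minutes ago is not.
const fresh = isShadowFresh(new Date(Date.now() - 60_000));
const stale = isShadowFresh(new Date(Date.now() - 16 * 60_000));
```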
## Monitoring & Alerts
- Add Prometheus counters: `sf_shadow_sync_events_total`, `sf_shadow_sync_failures_total`.
- Track lag metrics: `MAX(now() - updated_at)` per table.
- Hook into existing queue health endpoint to expose shadow worker backlog.
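The per-table lag metric can be derived from the `MAX(updated_at)` query above; a hypothetical helper for computing the gauge value:

```typescript
// Convert the newest updated_at in a shadow table into a lag gauge in
// seconds; clamp at zero in case of minor clock skew between DB and app.
function shadowLagSeconds(maxUpdatedAt: Date, now: Date = new Date()): number {
  return Math.max(0, (now.getTime() - maxUpdatedAt.getTime()) / 1000);
}

// Example: a table whose newest row is 2 minutes old has ~120s of lag.
const lag = shadowLagSeconds(new Date(Date.now() - 120_000));
```

The value would be exported as a Prometheus gauge alongside the event counters.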
## Rollout Checklist
1. Implement schema migrations (SQL or Prisma) under feature flag.
2. Build bulk backfill command; run in staging, verify record counts vs Salesforce SOQL.
3. Enable event ingestion in staging, monitor for 48h, validate cache invalidation.
4. Update catalog services to prefer shadow reads; release behind environment variable `ENABLE_SF_SHADOW_READS`.
5. Roll to production gradually: run backfill, enable read flag, then enable event consumer.
6. Document operational runbooks (replay events, manual backfill, clearing caches).
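Step 4's environment-variable gate can be a one-line check; the flag name follows the plan, while the helper itself is a hypothetical sketch:

```typescript
// Gate shadow reads behind ENABLE_SF_SHADOW_READS so rollout can flip the
// flag per environment without a redeploy of the event consumer.
function shadowReadsEnabled(env: Record<string, string | undefined> = process.env): boolean {
  return env.ENABLE_SF_SHADOW_READS === "true";
}
```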
## Open Questions
- Do we mirror additional fields (e.g., localization strings) needed for future UX changes?
- Should eligibility sync include other readiness signals (credit status, serviceability flags)?
- What is the retention strategy for the `raw_payload` column (e.g., prune older versions weekly)?