# Provisioning Runbook (Salesforce Platform Events → Portal → WHMCS)
This runbook helps operators diagnose issues in the order fulfillment path.
## Paths & Channels
- Salesforce Platform Event: `OrderProvisionRequested__e`
- Backend health: `GET /health`
## Required Env (Backend)
- `SF_LOGIN_URL`, `SF_CLIENT_ID`, `SF_USERNAME`
- `SF_PRIVATE_KEY_PATH` (prod: `/app/secrets/sf-private.key`)
- `SF_EVENTS_ENABLED=true`
- `SF_PROVISION_EVENT_CHANNEL=/event/OrderProvisionRequested__e`
- `SF_EVENTS_REPLAY=LATEST` (or `ALL`)
- `PORTAL_PRICEBOOK_ID`
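
Checking these variables once at startup makes a typo show up immediately, rather than later as "No events received". The following is a minimal TypeScript sketch. It assumes configuration arrives as a plain key/value map. `validateProvisioningEnv` and its return shape are hypothetical and do not come from the BFF's config loader. Only the accepted values (`true`, `LATEST`/`ALL`, and the `/event/` channel prefix) come from this runbook.

```typescript
// Hypothetical startup check for the provisioning settings listed in this runbook.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = [
  "SF_LOGIN_URL",
  "SF_CLIENT_ID",
  "SF_USERNAME",
  "SF_PRIVATE_KEY_PATH",
  "SF_EVENTS_ENABLED",
  "SF_PROVISION_EVENT_CHANNEL",
  "SF_EVENTS_REPLAY",
  "PORTAL_PRICEBOOK_ID",
];

// Returns human-readable problems; an empty list means every check passed.
function validateProvisioningEnv(env: Env): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is missing`);
    }
  }

  // Without this flag the subscriber never starts ("No events received").
  const enabled = env.SF_EVENTS_ENABLED;
  if (enabled && enabled !== "true") {
    problems.push("SF_EVENTS_ENABLED must be 'true'");
  }

  // Only the two documented replay modes are accepted.
  const replay = env.SF_EVENTS_REPLAY;
  if (replay && replay !== "LATEST" && replay !== "ALL") {
    problems.push(`SF_EVENTS_REPLAY must be LATEST or ALL, got '${replay}'`);
  }

  // Platform Event channels take the form /event/<EventApiName>.
  const channel = env.SF_PROVISION_EVENT_CHANNEL;
  if (channel && !channel.startsWith("/event/")) {
    problems.push("SF_PROVISION_EVENT_CHANNEL must start with /event/");
  }

  return problems;
}
```

An empty result only shows that the settings are present and well formed. Whether the JWT credentials work can only be confirmed by a real Salesforce login.
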
## Common Symptoms and Fixes
- No events received
  - Verify the Flow publishes `OrderProvisionRequested__e` on Order approval
  - Confirm the BFF has `SF_EVENTS_ENABLED=true` and valid SF JWT settings
  - Check the BFF logs for the subscription start on the expected channel
- Event replays not advancing
  - Ensure Redis is healthy; the last `replayId` is stored under `sf:pe:replay:<channel>`
  - If needed, set `SF_EVENTS_REPLAY=ALL` for a one-time backfill, then revert to `LATEST`
- 409 Payment method missing
  - The customer has no WHMCS payment method
  - Ask the customer to add a payment method, then retry fulfillment
- WHMCS add/accept errors
  - Check product mappings: `Product2.WH_Product_ID__c` and `Product2.Billing_Cycle__c`
  - The backend logs include an item mapping report; fix any missing mappings
- Salesforce status not updated
  - On success, the backend updates `Activation_Status__c` and `WHMCS_Order_ID__c`
  - Verify the connected app JWT config and that the API user has Order update permissions
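
The replay behavior described above can be sketched as a small pure function. This sketch makes three assumptions. First, the subscriber uses the CometD-based Salesforce Streaming API, where replay ID `-1` requests only new events and `-2` requests every event still in the retention window. Second, `<channel>` in the Redis key is the value of `SF_PROVISION_EVENT_CHANNEL`. Third, `LATEST` resumes from a stored checkpoint when one exists. The helper names are hypothetical, so confirm all three points against the subscriber code.

```typescript
// Hypothetical helpers for choosing where the Platform Event subscription starts.
type ReplayMode = "LATEST" | "ALL";

// Streaming API (CometD) replay presets.
const REPLAY_NEW_ONLY = -1; // only events published after subscribing
const REPLAY_ALL_RETAINED = -2; // every event still in the retention window

// Redis key holding the last processed replayId, as documented above.
function replayKey(channel: string): string {
  return `sf:pe:replay:${channel}`;
}

// `stored` is the raw Redis value for replayKey(channel), or null if absent.
function resolveReplayId(mode: ReplayMode, stored: string | null): number {
  // ALL is the one-time backfill switch, so the checkpoint is ignored.
  if (mode === "ALL") return REPLAY_ALL_RETAINED;

  // Assumption: LATEST resumes after the last processed event when one is saved.
  if (stored !== null) {
    const id = Number(stored);
    if (Number.isInteger(id) && id > 0) return id;
  }

  // No usable checkpoint: start from new events only.
  return REPLAY_NEW_ONLY;
}
```

If these assumptions hold, a missing or unreadable Redis value makes the subscriber fall back to `-1`. Events published while the subscriber was down are then skipped, and `ALL` is the way to recover them.
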
## Verification Steps
1. In SF, create an Order with OrderItems
2. Approve the Order → the Flow sets `Activation_Status__c = Activating` and publishes `OrderProvisionRequested__e`
3. Check `/health`: the database and Redis are connected and the environment is correct
4. Tail the logs and confirm: Platform Event enqueued → Guard sees status=Activating → WHMCS add → WHMCS accept → Activated
5. Verify the SF fields are updated and the WHMCS order/service IDs exist
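
Step 4 can be made mechanical by scanning the logs for the documented milestones in order. The helper below is hypothetical. It assumes every relevant log line contains the Salesforce Order ID, and that the cheatsheet's messages and step names appear verbatim in the log text. The real log format may differ on both points.

```typescript
// Hypothetical helper: report how far one provisioning run got, based on log text.
// Milestone strings come from the Logging Cheatsheet; real log formats may differ.
const MILESTONES = [
  "Platform Event enqueued for provisioning",
  "Starting fulfillment orchestration",
  "whmcs_create",
  "whmcs_accept",
  "sf_success_update",
] as const;

interface Progress {
  reached: string[];
  missing: string | null; // first milestone not seen, or null if all were seen
}

function checkProgress(logLines: string[], orderId: string): Progress {
  // Assumption: every line of interest mentions the Salesforce Order ID.
  const lines = logLines.filter((line) => line.includes(orderId));
  const reached: string[] = [];
  let searchFrom = 0;

  for (const milestone of MILESTONES) {
    // Milestones must appear in order, so only search after the previous match.
    const idx = lines.findIndex((line, i) => i >= searchFrom && line.includes(milestone));
    if (idx === -1) return { reached, missing: milestone };
    reached.push(milestone);
    searchFrom = idx + 1;
  }
  return { reached, missing: null };
}
```

The first missing milestone points at the step to investigate. `missing: null` means the run reached `sf_success_update`.
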
## Logging Cheatsheet
- "Platform Event enqueued for provisioning" — subscriber enqueue
- "Starting fulfillment orchestration" — orchestrator start
- Step logs: `validation`, `sf_status_update`, `order_details`, `mapping`, `whmcs_create`, `whmcs_accept`, `sf_success_update`
- On error: orchestrator updates SF with `Activation_Status__c='Failed'`
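
The step order and failure behavior above can be read as a sequential pipeline. The names `runFulfillment`, the injected `steps` and `markFailed` callbacks, and the synchronous signatures are all hypothetical. The real steps call Salesforce and WHMCS and are likely asynchronous. The sketch shows only the documented behavior: steps run in the listed order, the first failure stops the run, and the Order is marked `Failed`.

```typescript
// Hypothetical sketch of the documented orchestration order.
// Step names match the cheatsheet's log keys; step bodies are injected placeholders.
const STEPS = [
  "validation",
  "sf_status_update",
  "order_details",
  "mapping",
  "whmcs_create",
  "whmcs_accept",
  "sf_success_update",
] as const;

type StepName = (typeof STEPS)[number];
type StepFn = () => void; // throws on failure (synchronous for brevity)

interface RunResult {
  completed: StepName[];
  failedStep: StepName | null;
  error: string | null;
}

function runFulfillment(
  orderId: string,
  steps: Record<StepName, StepFn>,
  markFailed: (orderId: string, reason: string) => void,
  log: (message: string) => void = () => {},
): RunResult {
  log(`Starting fulfillment orchestration ${orderId}`);
  const completed: StepName[] = [];

  for (const name of STEPS) {
    try {
      steps[name]();
      completed.push(name);
      log(`${name} ok`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      // Documented behavior: on error, Activation_Status__c becomes 'Failed'.
      markFailed(orderId, reason);
      return { completed, failedStep: name, error: reason };
    }
  }
  return { completed, failedStep: null, error: null };
}
```
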
## Security Notes
- No inbound Salesforce webhooks are used for provisioning.
- The BFF authenticates to Salesforce via JWT; grant API access and read access to the Platform Event object via a Permission Set.
- No WHMCS webhooks are consumed; the portal uses the WHMCS API for billing operations.
- Health endpoint
  - `/health` includes an `integrations.redis` probe to confirm queue/replay storage availability.
## Ops: Manual Retry Flow
- Click "Provision / Retry" on the Order in Salesforce.
- If `Activation_Status__c = Activating`, the button shows the toast "Already in progress".
- Otherwise, it sets `Activation_Status__c = Activating`, clears the last error fields, and lets the record-triggered Flow publish the event.

The portal does not auto-retry jobs. Network errors, 5xx responses, and timeouts mark the Order as `Failed` and set:

- `Activation_Error_Code__c` (e.g., 429, 503, ETIMEOUT)
- `Activation_Error_Message__c` (short reason)
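
Any failure has to be reduced to the two fields above. The following hypothetical sketch shows one way to do that. The `CaughtError` shape, the 255-character limit, and the `isLikelyTransient` heuristic are assumptions, not the portal's implementation. Check the real field lengths in the Salesforce org before relying on the truncation.

```typescript
// Hypothetical mapping from a caught error to the two Salesforce error fields.
interface CaughtError {
  httpStatus?: number; // set for HTTP responses, e.g. 429 or 503
  code?: string; // set for network-level failures, e.g. a timeout code
  message: string;
}

interface ActivationErrorFields {
  Activation_Error_Code__c: string;
  Activation_Error_Message__c: string;
}

const MAX_MESSAGE_LENGTH = 255; // assumed field length; check the org's definition

function toActivationErrorFields(err: CaughtError): ActivationErrorFields {
  const code =
    err.httpStatus !== undefined ? String(err.httpStatus) : err.code ?? "UNKNOWN";
  // Keep the reason short: first line only, cut to the assumed limit.
  const firstLine = (err.message.split("\n")[0] ?? "").trim();
  const message =
    firstLine.length > MAX_MESSAGE_LENGTH
      ? firstLine.slice(0, MAX_MESSAGE_LENGTH - 1) + "…"
      : firstLine;
  return { Activation_Error_Code__c: code, Activation_Error_Message__c: message };
}

// Heuristic: rate limits, 5xx, and network failures may succeed on a later retry.
function isLikelyTransient(err: CaughtError): boolean {
  if (err.httpStatus !== undefined) return err.httpStatus === 429 || err.httpStatus >= 500;
  return err.code !== undefined;
}
```

The portal never retries on its own. A transient classification is therefore only a hint to the operator that clicking "Provision / Retry" is worth trying once the upstream service recovers. A 409 for a missing payment method is not transient, because it needs customer action first.
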
---
## Escalation Paths
| Condition                    | Escalate to           | Notes                                               |
| ---------------------------- | --------------------- | --------------------------------------------------- |
| Issue persists >30 minutes   | Salesforce admin      | Check Flow configuration, Platform Event publishing |
| WHMCS returns 5xx >5 times   | WHMCS hosting support | Server may be overloaded or down                    |
| Event replay doesn't recover | Development team      | May need code investigation                         |
| Product mapping errors       | Salesforce admin      | Add missing `WH_Product_ID__c` values               |
| Payment method issues        | Customer support      | Guide customer to add payment method in WHMCS       |

For general incident response procedures, see [Incident Response Runbook](./incident-response.md).

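
For alerting or a triage script, the escalation table can be encoded as rules. This is a hypothetical sketch. The `IncidentState` fields are assumptions about what a triage tool could know. The thresholds (more than 30 minutes, more than five WHMCS 5xx responses) are copied from the table.

```typescript
// Hypothetical encoding of the escalation table as rules for a triage script.
type Team =
  | "Salesforce admin"
  | "WHMCS hosting support"
  | "Development team"
  | "Customer support";

interface IncidentState {
  minutesOpen: number; // how long the provisioning issue has persisted
  whmcs5xxCount: number; // WHMCS 5xx responses seen for this incident
  replayStuck: boolean; // replay still not advancing after the fixes above
  mappingErrors: boolean; // missing WH_Product_ID__c values reported
  paymentMethodMissing: boolean; // e.g. the 409 payment-method error
}

function escalationTargets(s: IncidentState): Team[] {
  const targets: Team[] = [];
  const add = (team: Team): void => {
    if (targets.indexOf(team) === -1) targets.push(team); // avoid duplicates
  };

  // Rules in the same order as the table rows.
  if (s.minutesOpen > 30) add("Salesforce admin");
  if (s.whmcs5xxCount > 5) add("WHMCS hosting support");
  if (s.replayStuck) add("Development team");
  if (s.mappingErrors) add("Salesforce admin");
  if (s.paymentMethodMissing) add("Customer support");

  return targets;
}
```
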
---
## SLA Expectations
| Metric | Target | Warning | Critical |
| ----------------------- | ---------- | ----------- | ----------- |
| Provisioning completion | <5 seconds | >10 seconds | >30 seconds |
| Event processing delay | <1 second | >5 seconds | >30 seconds |
| Error rate | <1% | >1% | >5% |
### Performance Monitoring
- Monitor provisioning duration in logs (from "Platform Event enqueued" to "Activated")
- Track WHMCS API response times
- Alert on Salesforce update failures
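
Grading a measurement against the SLA table is a threshold comparison. The sketch below is hypothetical. The table leaves gaps: provisioning between 5 and 10 seconds is neither on target nor a warning, and an error rate of exactly 1% matches no column. The sketch reports such values as `above target` rather than guessing a severity. The duration helper follows the runbook's definition, from "Platform Event enqueued" to "Activated".

```typescript
// Hypothetical helper that grades a measurement against the SLA table.
type Grade = "target" | "above target" | "warning" | "critical";

interface Thresholds {
  target: number; // met when value < target
  warning: number; // warning when value > warning
  critical: number; // critical when value > critical
}

// Values copied from the SLA table: seconds for durations, percent for error rate.
const SLA = {
  provisioningSeconds: { target: 5, warning: 10, critical: 30 },
  eventDelaySeconds: { target: 1, warning: 5, critical: 30 },
  errorRatePercent: { target: 1, warning: 1, critical: 5 },
};

function grade(value: number, t: Thresholds): Grade {
  if (value > t.critical) return "critical";
  if (value > t.warning) return "warning";
  if (value < t.target) return "target";
  return "above target"; // the gap the table does not classify
}

// Provisioning duration: "Platform Event enqueued" timestamp to "Activated" timestamp.
function provisioningSeconds(enqueuedAt: Date, activatedAt: Date): number {
  return (activatedAt.getTime() - enqueuedAt.getTime()) / 1000;
}
```
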
---
## Manual Intervention Checklist
When provisioning fails and a retry does not resolve it, follow these steps:

1. **Check the Salesforce Order**
   - Open the Order in Salesforce
   - Review `Activation_Status__c`, `Activation_Error_Code__c`, and `Activation_Error_Message__c`
   - Check whether `WHMCS_Order_ID__c` was set, which indicates a partially completed run
2. **Verify Customer Data**
   - Confirm the customer has a valid WHMCS payment method via the WHMCS `GetPayMethods` API
   - Check the `id_mappings` table for the correct portal-WHMCS-SF linkage
3. **Validate Product Mappings**
   - For each OrderItem, verify `Product2.WH_Product_ID__c` is set
   - Verify `Product2.Billing_Cycle__c` matches WHMCS expectations
4. **Check BFF Logs**
   - Search the logs for the Salesforce Order ID
   - Identify the specific step that failed
   - Look for external API errors (WHMCS, Salesforce)
5. **Manual Recovery**
   - If the WHMCS order was created but SF was not updated:
     - Manually update `WHMCS_Order_ID__c` and `Activation_Status__c` in Salesforce
   - If the WHMCS order was not created:
     - Fix the root cause (payment method, mapping)
     - Retry via Salesforce (set `Activation_Status__c = Activating`)
6. **Verify Resolution**
   - Confirm the Salesforce Order shows `Activated`
   - Confirm WHMCS has the order and services
   - Confirm the customer can see their subscription in the portal
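
Steps 2, 3, and 5 combine into one decision, which this hypothetical sketch encodes. The input fields are assumptions about what an operator collects from WHMCS (including `GetPayMethods`) and from Salesforce. The mapping check only tests that `WH_Product_ID__c` and `Billing_Cycle__c` are populated. It does not test whether the billing cycle matches what WHMCS expects. The branch where both systems already reference the order is not covered by the checklist, so the sketch only suggests comparing them.

```typescript
// Hypothetical decision helper for the Manual Intervention Checklist.
interface OrderItemMapping {
  orderItemId: string;
  whProductId: string | null; // Product2.WH_Product_ID__c
  billingCycle: string | null; // Product2.Billing_Cycle__c
}

interface RecoveryInput {
  whmcsOrderCreated: boolean; // WHMCS actually has the order
  sfHasWhmcsOrderId: boolean; // WHMCS_Order_ID__c is populated on the SF Order
  hasPaymentMethod: boolean; // outcome of the GetPayMethods check
  items: OrderItemMapping[];
}

// OrderItems whose product mapping fields are empty (presence check only).
function unmappedItems(items: OrderItemMapping[]): string[] {
  return items.filter((i) => !i.whProductId || !i.billingCycle).map((i) => i.orderItemId);
}

function recoveryActions(input: RecoveryInput): string[] {
  // WHMCS succeeded but Salesforce was not updated: update SF by hand.
  if (input.whmcsOrderCreated && !input.sfHasWhmcsOrderId) {
    return ["Manually update WHMCS_Order_ID__c and Activation_Status__c in Salesforce"];
  }
  if (input.whmcsOrderCreated) {
    // Not covered by the checklist: both systems already reference the order.
    return ["Compare both systems, then continue with Verify Resolution"];
  }

  // WHMCS order was not created: fix root causes first, then retry.
  const actions: string[] = [];
  if (!input.hasPaymentMethod) actions.push("Ask the customer to add a payment method");
  const missing = unmappedItems(input.items);
  if (missing.length > 0) actions.push(`Fix product mappings for ${missing.join(", ")}`);
  actions.push("Retry via Salesforce (set Activation_Status__c = Activating)");
  return actions;
}
```
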
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Queue Management](./queue-management.md)
- [External Dependencies](./external-dependencies.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)