# Provisioning Runbook (Salesforce Platform Events → Portal → WHMCS)
This runbook helps operators diagnose issues in the order fulfillment path.
## Paths & Channels
- Salesforce Platform Event: `OrderProvisionRequested__e`
- Backend health: `GET /health`
## Required Env (Backend)
- `SF_LOGIN_URL`, `SF_CLIENT_ID`, `SF_USERNAME`
- `SF_PRIVATE_KEY_PATH` (prod: `/app/secrets/sf-private.key`)
- `SF_EVENTS_ENABLED=true`
- `SF_PROVISION_EVENT_CHANNEL=/event/OrderProvisionRequested__e`
- `SF_EVENTS_REPLAY=LATEST` (or `ALL`)
- `PORTAL_PRICEBOOK_ID`
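
Before digging into logs, it can help to confirm that every variable above is actually set in the backend's environment. The following is a minimal sketch, not part of the codebase: the helper name `missing_env` is hypothetical, and it only checks presence, not whether the values are valid.

```python
import os

# Variable names copied from the "Required Env (Backend)" list.
REQUIRED_ENV = [
    "SF_LOGIN_URL",
    "SF_CLIENT_ID",
    "SF_USERNAME",
    "SF_PRIVATE_KEY_PATH",
    "SF_EVENTS_ENABLED",
    "SF_PROVISION_EVENT_CHANNEL",
    "SF_EVENTS_REPLAY",
    "PORTAL_PRICEBOOK_ID",
]


def missing_env(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the current process environment; pass a dict to
    check a different source (hypothetical helper, for illustration).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not str(env.get(name, "")).strip()]
```

Calling `missing_env()` inside the backend container returns an empty list when all variables are present.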
## Common Symptoms and Fixes
- No events received
  - Verify Flow publishes `OrderProvisionRequested__e` on Order approval
  - Confirm the BFF has `SF_EVENTS_ENABLED=true` and valid SF JWT settings
  - Check BFF logs for subscription start on the expected channel
- Event replays not advancing
  - Ensure Redis is healthy; last `replayId` is stored under `sf:pe:replay:<channel>`
  - If needed, set `SF_EVENTS_REPLAY=ALL` for a one-time backfill, then revert to `LATEST`
- 409 Payment method missing
  - Customer has no WHMCS payment method
  - Ask the customer to add a payment method, then retry fulfillment
- WHMCS Add/Accept errors
  - Check product mappings: `Product2.WH_Product_ID__c` and `Billing_Cycle__c`
  - Backend logs show the item mapping report; fix missing mappings
- Salesforce status not updated
  - Backend updates `Activation_Status__c` and `WHMCS_Order_ID__c` on success
  - Verify connected app JWT config and that the API user has Order update permissions
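
For the "Event replays not advancing" case, reading the stored replay position directly can show whether it is changing. The sketch below makes two assumptions that this runbook does not confirm: the channel name is substituted verbatim for `<channel>` in the key, and the client is any object with a `get(key)` method (a redis-py client fits this shape). Both function names are hypothetical.

```python
def replay_key(channel):
    """Build the Redis key for a channel's last replayId.

    Assumes the channel is substituted verbatim for <channel>.
    """
    return f"sf:pe:replay:{channel}"


def read_replay_id(client, channel):
    """Return the stored replayId as text, or None if nothing is stored.

    `client` is any object exposing get(key), e.g. a redis-py client.
    """
    raw = client.get(replay_key(channel))
    if raw is None:
        return None
    # redis-py returns bytes unless decode_responses=True is set.
    return raw.decode() if isinstance(raw, bytes) else str(raw)
```

A `None` result, or a value that stays the same while new events are being published, points at the replay storage rather than at Salesforce.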
## Verification Steps
1. In SF, create an Order with OrderItems
2. Approve Order → Flow sets `Activation_Status__c = Activating` and publishes `OrderProvisionRequested__e`
3. Check `/health`: database/redis connected, environment correct
4. Tail logs; confirm: Platform Event enqueued → Guard sees status=Activating → WHMCS add → WHMCS accept → Activated
5. Verify SF fields updated and WHMCS order/service IDs exist
## Logging Cheatsheet
- "Platform Event enqueued for provisioning" — subscriber enqueue
- "Starting fulfillment orchestration" — orchestrator start
- Step logs: `validation`, `sf_status_update`, `order_details`, `mapping`, `whmcs_create`, `whmcs_accept`, `sf_success_update`
- On error: orchestrator updates SF with `Activation_Status__c='Failed'`
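
When scanning a long log excerpt, a small helper can report how far an order got through the step sequence. This is a rough sketch under two assumptions: the steps run in the order listed above, and each step name appears verbatim somewhere in its log line. The actual log format is not specified here, and plain substring matching can produce false positives (for example, `mapping` may also appear in unrelated messages).

```python
# Step names from the cheatsheet, assumed to be in execution order.
STEPS = [
    "validation",
    "sf_status_update",
    "order_details",
    "mapping",
    "whmcs_create",
    "whmcs_accept",
    "sf_success_update",
]


def last_step_seen(log_lines):
    """Return the furthest step (by list order) mentioned in the logs.

    Returns None when no step name appears. Hypothetical helper that
    relies on substring matching only.
    """
    last = None
    for step in STEPS:
        if any(step in line for line in log_lines):
            last = step
    return last
```

Filter the log lines to a single Order ID first; the failure most likely occurred at or just after the step this returns.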
## Security Notes
- No inbound Salesforce webhooks are used for provisioning.
- BFF authenticates to Salesforce via JWT; grant API access and Platform Event object read via Permission Set.
- No WHMCS webhooks are consumed; the portal uses the WHMCS API for billing operations.
- Health endpoint
  - `/health` includes `integrations.redis` probe to confirm queue/replay storage availability.
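
The Redis probe can be pulled out of the health response with a few lines of standard-library code. This sketch assumes `/health` returns a JSON object with an `integrations` object containing a `redis` entry. The shape of that entry is not documented here, so the helper returns it unchanged rather than interpreting it. Function names and the base URL argument are hypothetical.

```python
import json
import urllib.request


def fetch_health(base_url, timeout=5):
    """GET <base_url>/health and return the parsed JSON body."""
    url = f"{base_url.rstrip('/')}/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def redis_probe(payload):
    """Return the `integrations.redis` entry, or None if it is absent."""
    integrations = payload.get("integrations")
    if not isinstance(integrations, dict):
        return None
    return integrations.get("redis")
```

For example, `redis_probe(fetch_health(base_url))` prints whatever the backend reports for Redis; a `None` result means the probe is missing from the response.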
## Ops: Manual Retry Flow
- Click "Provision / Retry" on the Order in Salesforce.
  - If `Activation_Status__c = Activating`, show an "Already in progress" toast.
  - Otherwise, set `Activation_Status__c = Activating`, clear the last error fields, and let the Record-Triggered Flow publish the event.

The portal does not auto-retry jobs. Network errors, 5xx responses, and timeouts mark the Order as Failed and set:

- `Activation_Error_Code__c` (e.g., 429, 503, ETIMEOUT)
- `Activation_Error_Message__c` (short reason)
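
The retry button's decision can be expressed as a small pure function. This is only an illustration of the logic described above, written in Python rather than as the actual Salesforce implementation. The function name and return shape are hypothetical, and "clear the last error fields" is assumed to mean the two `Activation_Error_*` fields.

```python
def retry_action(order):
    """Decide what "Provision / Retry" should do for an Order record.

    `order` is a dict of field values. Returns the toast text (or None)
    and the field updates to apply. Illustrative sketch only.
    """
    if order.get("Activation_Status__c") == "Activating":
        return {"toast": "Already in progress", "updates": {}}
    return {
        "toast": None,
        "updates": {
            # Setting the status lets the Flow publish the event again.
            "Activation_Status__c": "Activating",
            # Assumed to be the "last error fields" that get cleared.
            "Activation_Error_Code__c": None,
            "Activation_Error_Message__c": None,
        },
    }
```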
---
## Escalation Paths
| Condition | Escalate to | Action / notes |
| ---------------------------- | --------------------- | --------------------------------------------------- |
| Issue persists >30 minutes | Salesforce admin | Check Flow configuration, Platform Event publishing |
| WHMCS returns 5xx >5 times | WHMCS hosting support | Server may be overloaded or down |
| Event replay doesn't recover | Development team | May need code investigation |
| Product mapping errors | Salesforce admin | Add missing `WH_Product_ID__c` values |
| Payment method issues | Customer support | Guide customer to add payment method in WHMCS |

For general incident response procedures, see [Incident Response Runbook](./incident-response.md).

---
## SLA Expectations
| Metric | Target | Warning | Critical |
| ----------------------- | ---------- | ----------- | ----------- |
| Provisioning completion | <5 seconds | >10 seconds | >30 seconds |
| Event processing delay | <1 second | >5 seconds | >30 seconds |
| Error rate | <1% | >1% | >5% |
### Performance Monitoring
- Monitor provisioning duration in logs (from "Platform Event enqueued" to "Activated")
- Track WHMCS API response times
- Alert on Salesforce update failures
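
The timing thresholds in the SLA table can be applied mechanically to a measured duration. The sketch below is an assumption-laden illustration: the metric keys and the `classify` helper are hypothetical, and because the table leaves the band between target and warning undefined, it is labeled `above_target` here.

```python
# Thresholds in seconds from the SLA table: (target, warning, critical).
SLA_SECONDS = {
    "provisioning_completion": (5, 10, 30),
    "event_processing_delay": (1, 5, 30),
}


def classify(metric, seconds):
    """Map a measured duration to an SLA band for the given metric."""
    target, warning, critical = SLA_SECONDS[metric]
    if seconds > critical:
        return "critical"
    if seconds > warning:
        return "warning"
    if seconds < target:
        return "ok"
    # Between target and warning: not defined by the table (assumption).
    return "above_target"
```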
---
## Manual Intervention Checklist
When a retry from Salesforce does not clear the failure, follow these steps:
1. **Check Salesforce Order**
   - Open the Order in Salesforce
   - Review `Activation_Status__c`, `Activation_Error_Code__c`, `Activation_Error_Message__c`
   - Check if `WHMCS_Order_ID__c` was partially set
2. **Verify Customer Data**
   - Confirm customer has valid WHMCS payment method via `GetPayMethods`
   - Check `id_mappings` table for correct portal-WHMCS-SF linkage
3. **Validate Product Mappings**
   - For each OrderItem, verify `Product2.WH_Product_ID__c` is set
   - Verify `Product2.Billing_Cycle__c` matches WHMCS expectations
4. **Check BFF Logs**
   - Search for the Salesforce Order ID in logs
   - Identify the specific step that failed
   - Look for external API errors (WHMCS, Salesforce)
5. **Manual Recovery**
   - If WHMCS order was created but SF not updated:
     - Manually update `WHMCS_Order_ID__c` and `Activation_Status__c` in Salesforce
   - If WHMCS order was not created:
     - Fix the root cause (payment method, mapping)
     - Retry via Salesforce (set `Activation_Status__c = Activating`)
6. **Verify Resolution**
   - Confirm Salesforce Order shows `Activated`
   - Confirm WHMCS has the order and services
   - Confirm customer can see their subscription in the portal
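
Step 3 (product mappings) is easy to automate once the OrderItems have been exported. The following sketch assumes each item is a dict shaped like a Salesforce query row with a nested `Product2` object; that shape and the helper name `unmapped_items` are assumptions for illustration. It checks only that the two fields are populated, not that `Billing_Cycle__c` holds a value WHMCS accepts.

```python
# Mapping fields named in this runbook.
REQUIRED_PRODUCT_FIELDS = ("WH_Product_ID__c", "Billing_Cycle__c")


def unmapped_items(order_items):
    """Return (item Id, missing field names) for incomplete mappings.

    Each item is assumed to look like:
    {"Id": ..., "Product2": {"WH_Product_ID__c": ..., "Billing_Cycle__c": ...}}
    """
    problems = []
    for item in order_items:
        product = item.get("Product2") or {}
        missing = [
            field
            for field in REQUIRED_PRODUCT_FIELDS
            if product.get(field) in (None, "")
        ]
        if missing:
            problems.append((item.get("Id"), missing))
    return problems
```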
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Queue Management](./queue-management.md)
- [External Dependencies](./external-dependencies.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)