Assist_Design/docs/operations/provisioning-runbook.md
barsa 72d0b66be7 Enhance Documentation Structure and Update Operational Runbooks
2025-12-23 15:55:58 +09:00


Provisioning Runbook (Salesforce Platform Events → Portal → WHMCS)

This runbook helps operators diagnose issues in the order fulfillment path.

Paths & Channels

  • Salesforce Platform Event: OrderProvisionRequested__e
  • Backend health: GET /health

Required Env (Backend)

  • SF_LOGIN_URL, SF_CLIENT_ID, SF_USERNAME
  • SF_PRIVATE_KEY_PATH (prod: /app/secrets/sf-private.key)
  • SF_EVENTS_ENABLED=true
  • SF_PROVISION_EVENT_CHANNEL=/event/OrderProvisionRequested__e
  • SF_EVENTS_REPLAY=LATEST (or ALL)
  • PORTAL_PRICEBOOK_ID
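A quick pre-flight check of these variables can save a round of log reading. The sketch below is illustrative only: the variable names come from the list above, but the function name `check_provisioning_env` and the validation rules (non-empty values, `/event/` channel prefix) are assumptions, not behavior confirmed by the backend.

```python
# Variable names are taken from the list above; the rules applied to them are assumptions.
REQUIRED_VARS = [
    "SF_LOGIN_URL",
    "SF_CLIENT_ID",
    "SF_USERNAME",
    "SF_PRIVATE_KEY_PATH",
    "SF_EVENTS_ENABLED",
    "SF_PROVISION_EVENT_CHANNEL",
    "SF_EVENTS_REPLAY",
    "PORTAL_PRICEBOOK_ID",
]


def check_provisioning_env(env):
    """Return a list of problems found in env; an empty list means nothing obvious is wrong."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    enabled = env.get("SF_EVENTS_ENABLED")
    if enabled and enabled != "true":
        problems.append("SF_EVENTS_ENABLED should be 'true'")

    replay = env.get("SF_EVENTS_REPLAY")
    if replay and replay not in ("LATEST", "ALL"):
        problems.append("SF_EVENTS_REPLAY should be LATEST or ALL")

    channel = env.get("SF_PROVISION_EVENT_CHANNEL")
    if channel and not channel.startswith("/event/"):
        problems.append("SF_PROVISION_EVENT_CHANNEL should start with /event/")

    return problems
```

Passing `dict(os.environ)` from inside the backend container would check the live configuration; the function only reads values and never prints secrets.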

Common Symptoms and Fixes

  • No events received

    • Verify Flow publishes OrderProvisionRequested__e on Order approval
    • Confirm the BFF has SF_EVENTS_ENABLED=true and valid Salesforce JWT settings (SF_LOGIN_URL, SF_CLIENT_ID, SF_USERNAME, SF_PRIVATE_KEY_PATH)
    • Check BFF logs for subscription start on the expected channel
  • Event replays not advancing

    • Ensure Redis is healthy; last replayId is stored under sf:pe:replay:<channel>
    • If needed, set SF_EVENTS_REPLAY=ALL for a one-time backfill, then revert to LATEST
  • 409 Payment method missing

    • Cause: the customer has no payment method in WHMCS
    • Ask the customer to add a payment method, then retry fulfillment
  • WHMCS Add/Accept errors

    • Check product mappings: Product2.WH_Product_ID__c and Billing_Cycle__c
    • Backend logs show the item mapping report; fix missing mappings
  • Salesforce status not updated

    • Backend updates Activation_Status__c and WHMCS_Order_ID__c on success
    • Verify connected app JWT config and that the API user has Order update permissions
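For the product-mapping symptom above, the check can be expressed as a small helper. This is a sketch under assumptions: the field names `WH_Product_ID__c` and `Billing_Cycle__c` come from this runbook, but the input shape (a list of OrderItem dicts with a nested `Product2` dict, as a SOQL relationship query might return) and the function name `find_mapping_gaps` are hypothetical.

```python
MAPPING_FIELDS = ("WH_Product_ID__c", "Billing_Cycle__c")


def find_mapping_gaps(order_items):
    """Return (product_name, missing_field) pairs for items without a complete WHMCS mapping.

    order_items is assumed to be a list of dicts, each with a nested "Product2" dict.
    """
    gaps = []
    for item in order_items:
        product = item.get("Product2") or {}
        for field in MAPPING_FIELDS:
            # Treat both an absent field and an empty string as "not mapped".
            if product.get(field) in (None, ""):
                gaps.append((product.get("Name", "<unknown>"), field))
    return gaps
```

An empty result means every item carries both fields; it does not prove that the values match what WHMCS expects.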

Verification Steps

  1. In SF, create an Order with OrderItems
  2. Approve Order → Flow sets Activation_Status__c = Activating and publishes OrderProvisionRequested__e
  3. Check GET /health: the database and Redis report connected, and the environment is the expected one
  4. Tail logs; confirm: Platform Event enqueued → Guard sees status=Activating → WHMCS add → WHMCS accept → Activated
  5. Verify SF fields updated and WHMCS order/service IDs exist

Logging Cheatsheet

  • "Platform Event enqueued for provisioning" — subscriber enqueue
  • "Starting fulfillment orchestration" — orchestrator start
  • Step logs: validation, sf_status_update, order_details, mapping, whmcs_create, whmcs_accept, sf_success_update
  • On error: orchestrator updates SF with Activation_Status__c='Failed'
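When a fulfillment stalls, the step names above can be used to find how far the orchestrator got. The following is a minimal sketch, assuming the steps run in the order listed and that each step name appears verbatim in at least one log line for the order; neither assumption is confirmed by this runbook, and short names such as `mapping` may also match unrelated lines.

```python
# Step names from the cheatsheet above; the ordering is assumed to match execution order.
STEPS = [
    "validation",
    "sf_status_update",
    "order_details",
    "mapping",
    "whmcs_create",
    "whmcs_accept",
    "sf_success_update",
]


def last_step_reached(log_lines):
    """Return the last step, walking STEPS in order, that appears in log_lines (or None)."""
    reached = None
    for step in STEPS:
        if any(step in line for line in log_lines):
            reached = step
        else:
            break  # the first step with no log line marks where processing stopped
    return reached
```

Feed it only the lines already filtered by Salesforce Order ID so that steps from other orders do not mask a failure.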

Security Notes

  • No inbound Salesforce webhooks are used for provisioning.

  • The BFF authenticates to Salesforce via JWT; grant API access and read access to the Platform Event object via a Permission Set.

  • No WHMCS webhooks are consumed; the portal uses the WHMCS API for billing operations.

  • Health endpoint

    • /health includes integrations.redis probe to confirm queue/replay storage availability.
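The Redis probe can be read programmatically as well as by eye. The sketch below only assumes that `/health` returns a JSON object in which `integrations.redis` is a nested key; the shape of the value under that key is not documented here, so the helper returns it as-is rather than interpreting it. The names `get_path` and `fetch_health` are illustrative.

```python
import json
from urllib.request import urlopen


def get_path(payload, dotted_path):
    """Walk a dotted path such as 'integrations.redis'; return None if any key is missing."""
    node = payload
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/health and parse the JSON body (base_url is supplied by the operator)."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
        return json.load(response)
```

A `None` from `get_path(fetch_health(base_url), "integrations.redis")` means the probe is absent from the response, which is itself worth investigating.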

Ops: Manual Retry Flow

  • Click "Provision / Retry" on the Order in Salesforce.
    • If Activation_Status__c is already Activating, the action shows the toast "Already in progress".
    • Otherwise, it sets Activation_Status__c = Activating and clears the last error fields; the record-triggered Flow then publishes the event.

The portal does not auto-retry jobs. Network errors, 5xx responses, and timeouts mark the Order as Failed and populate:

  • Activation_Error_Code__c (e.g., 429, 503, ETIMEOUT)
  • Activation_Error_Message__c (short reason)
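The retry rule described in this section can be written out as a small decision function. This is a sketch of the logic only, not the Salesforce action itself; it assumes the "last error fields" are the two fields listed above, and `plan_retry` is a hypothetical name.

```python
def plan_retry(order):
    """Return (field_updates, toast_message) for a "Provision / Retry" click.

    order is a dict of Salesforce Order fields; an empty update dict means "do nothing".
    """
    if order.get("Activation_Status__c") == "Activating":
        return {}, "Already in progress"

    updates = {
        "Activation_Status__c": "Activating",
        # Assumption: these two fields are the "last error fields" to clear.
        "Activation_Error_Code__c": None,
        "Activation_Error_Message__c": None,
    }
    return updates, None
```

Keeping the "already Activating" guard first prevents a second event from being published while a fulfillment is still running.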

Escalation Paths

| Condition | Escalation Contact | Notes |
|---|---|---|
| Issue persists >30 minutes | Salesforce admin | Check Flow configuration, Platform Event publishing |
| WHMCS returns 5xx >5 times | WHMCS hosting support | Server may be overloaded or down |
| Event replay doesn't recover | Development team | May need code investigation |
| Product mapping errors | Salesforce admin | Add missing WH_Product_ID__c values |
| Payment method issues | Customer support | Guide the customer to add a payment method in WHMCS |

For general incident response procedures, see Incident Response Runbook.


SLA Expectations

| Metric | Target | Warning | Critical |
|---|---|---|---|
| Provisioning completion | <5 seconds | >10 seconds | >30 seconds |
| Event processing delay | <1 second | >5 seconds | >30 seconds |
| Error rate | <1% | >1% | >5% |
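The thresholds above translate directly into an alert classifier. The sketch below is one possible reading: the numbers come from the SLA table, while the metric keys and the extra `above_target` bucket (for values between Target and Warning) are assumptions introduced for illustration.

```python
# (target, warning, critical) per metric; values from the SLA table, key names hypothetical.
THRESHOLDS = {
    "provisioning_completion_s": (5.0, 10.0, 30.0),
    "event_processing_delay_s": (1.0, 5.0, 30.0),
    "error_rate_pct": (1.0, 1.0, 5.0),
}


def classify(metric, value):
    """Map a measured value to 'ok', 'above_target', 'warning', or 'critical'."""
    target, warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    if value < target:
        return "ok"
    # Not below target, but not past the warning threshold either.
    return "above_target"
```

For example, a 7-second provisioning run is past the 5-second target but under the 10-second warning line, so it lands in `above_target` rather than raising an alert.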

Performance Monitoring

  • Monitor provisioning duration in logs (from "Platform Event enqueued" to "Activated")
  • Track WHMCS API response times
  • Alert on Salesforce update failures
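Provisioning duration can be derived from two log lines. The sketch below assumes each line starts with an ISO 8601 timestamp followed by a space and the message; the actual log format is not specified in this runbook, and the default markers simply reuse the phrases quoted above.

```python
from datetime import datetime


def provisioning_duration(log_lines,
                          start_marker="Platform Event enqueued",
                          end_marker="Activated"):
    """Seconds from the first start_marker line to the next end_marker line, or None."""
    started = None
    for line in log_lines:
        # Assumed layout: "<ISO timestamp> <message>"
        timestamp, _, message = line.partition(" ")
        if started is None and start_marker in message:
            started = datetime.fromisoformat(timestamp)
        elif started is not None and end_marker in message:
            return (datetime.fromisoformat(timestamp) - started).total_seconds()
    return None
```

A `None` result means the end marker never appeared after the start marker, which usually points to a fulfillment that failed or is still running.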

Manual Intervention Checklist

When a retry from Salesforce does not resolve a failed Order, follow these steps:

  1. Check Salesforce Order

    • Open the Order in Salesforce
    • Review Activation_Status__c, Activation_Error_Code__c, Activation_Error_Message__c
    • Check if WHMCS_Order_ID__c was partially set
  2. Verify Customer Data

    • Confirm customer has valid WHMCS payment method via GetPayMethods
    • Check id_mappings table for correct portal-WHMCS-SF linkage
  3. Validate Product Mappings

    • For each OrderItem, verify Product2.WH_Product_ID__c is set
    • Verify Product2.Billing_Cycle__c matches WHMCS expectations
  4. Check BFF Logs

    • Search for the Salesforce Order ID in logs
    • Identify the specific step that failed
    • Look for external API errors (WHMCS, Salesforce)
  5. Manual Recovery

    • If WHMCS order was created but SF not updated:
      • Manually update WHMCS_Order_ID__c and Activation_Status__c in Salesforce
    • If WHMCS order was not created:
      • Fix the root cause (payment method, mapping)
      • Retry via Salesforce (set Activation_Status__c = Activating)
  6. Verify Resolution

    • Confirm Salesforce Order shows Activated
    • Confirm WHMCS has the order and services
    • Confirm customer can see their subscription in the portal
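The branch in step 5 can be summarized as a small decision helper. This is only a restatement of the checklist as code, under the assumption that the operator has already looked up whether the order exists in WHMCS; `recovery_action` and its return strings are illustrative.

```python
def recovery_action(activation_status, whmcs_order_exists):
    """Suggest the manual recovery path from step 5 of the checklist."""
    if activation_status == "Activated":
        return "nothing to recover; verify resolution"
    if whmcs_order_exists:
        # WHMCS order was created but Salesforce was not updated.
        return "update WHMCS_Order_ID__c and Activation_Status__c in Salesforce"
    # WHMCS order was never created.
    return "fix the root cause, then set Activation_Status__c = Activating to retry"
```

Checking WHMCS before retrying matters because a retry on an order that already exists in WHMCS could create a duplicate.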