# External Dependencies Runbook

This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.

---

## System Overview

| System                 | Purpose                          | Integration                | Health Check    |
| ---------------------- | -------------------------------- | -------------------------- | --------------- |
| **Salesforce**         | CRM, Orders, Catalog             | REST API + Platform Events | JWT auth test   |
| **WHMCS**              | Billing, Payments, Subscriptions | REST API                   | API action test |
| **Freebit**            | SIM Management                   | REST API                   | OEM auth test   |
| **SFTP (fs.mvno.net)** | Call/SMS Records                 | SFTP                       | Connection test |
| **Redis**              | Cache, Sessions, Queues          | Direct connection          | PING command    |
| **PostgreSQL**         | User data, Mappings              | Direct connection          | Query test      |

---

## Salesforce

### Configuration

| Variable                     | Description                                             |
| ---------------------------- | ------------------------------------------------------- |
| `SF_LOGIN_URL`               | Login URL (login.salesforce.com or test.salesforce.com) |
| `SF_CLIENT_ID`               | Connected App Consumer Key                              |
| `SF_USERNAME`                | Integration user username                               |
| `SF_PRIVATE_KEY_PATH`        | Path to JWT private key                                 |
| `SF_EVENTS_ENABLED`          | Enable Platform Event subscription                      |
| `SF_PROVISION_EVENT_CHANNEL` | Platform Event channel for provisioning                 |
| `PORTAL_PRICEBOOK_ID`        | Salesforce Pricebook ID for catalog                     |

### Health Check

```bash
# Check Salesforce connectivity via BFF health endpoint
curl -s http://localhost:4000/health | jq '.'

# Test JWT authentication manually
# The BFF authenticates automatically; check logs for auth errors
grep "Salesforce" /var/log/bff/combined.log | tail -20
```

### Common Issues

**JWT Authentication Failure**

- Verify the private key file exists and is readable
- Check the Connected App settings in Salesforce
- Ensure the integration user is pre-authorized for the Connected App
- Verify `SF_USERNAME` matches the user assigned to the Connected App

**Platform Events Not Being Received**

- Check `SF_EVENTS_ENABLED=true`
- Verify Platform Event permissions for the integration user
- Check Redis for the replay ID: `redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"`
- Set `SF_EVENTS_REPLAY=ALL` temporarily to catch up on missed events

**API Limits**

- Salesforce enforces daily API call limits
- Monitor usage in Salesforce Setup > API Usage
- Consider caching frequently accessed data

### Expected Response Times

| Operation      | Expected  | Alert Threshold |
| -------------- | --------- | --------------- |
| Query          | <500ms    | >2s             |
| Update         | <1s       | >3s             |
| Platform Event | Real-time | >5s delay       |

---

## WHMCS

### Configuration

| Variable                         | Description                         |
| -------------------------------- | ----------------------------------- |
| `WHMCS_API_URL`                  | WHMCS API endpoint URL              |
| `WHMCS_API_IDENTIFIER`           | API credentials identifier          |
| `WHMCS_API_SECRET`               | API credentials secret              |
| `WHMCS_CUSTOMER_NUMBER_FIELD_ID` | Custom field ID for Customer Number |

### Health Check

```bash
# Test WHMCS API directly
curl -s -X POST "$WHMCS_API_URL" \
  -d "identifier=$WHMCS_API_IDENTIFIER" \
  -d "secret=$WHMCS_API_SECRET" \
  -d "action=GetClients" \
  -d "responsetype=json" \
  -d "limitnum=1"

# Should return: {"result":"success","totalresults":...}
```

### Common Issues

**Authentication Failure**

- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
- Check IP whitelist settings (if configured)
- Ensure the API credentials have the required permissions

**Rate Limiting**

- WHMCS may rate-limit excessive requests
- Check for 429 responses in logs
- Implement request queuing if needed

**Field Mapping Issues**

- Payment method fields may use different names between WHMCS versions
- Check [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md) for field mapping

### Expected Response Times

| Operation   | Expected | Alert Threshold |
| ----------- | -------- | --------------- |
| GetInvoices | <500ms   | >2s             |
| AddOrder    | <1s      | >3s             |
| AcceptOrder | <1s      | >3s             |
| SSO Token   | <500ms   | >2s             |

---

## Freebit

### Configuration

| Variable           | Description            |
| ------------------ | ---------------------- |
| `FREEBIT_BASE_URL` | Freebit API base URL   |
| `FREEBIT_OEM_ID`   | OEM identifier         |
| `FREEBIT_OEM_KEY`  | OEM authentication key |
| `FREEBIT_TIMEOUT`  | Request timeout (ms)   |

### Health Check

```bash
# Check Freebit OEM authentication
# The BFF handles auth automatically; check logs for auth errors
grep "Freebit" /var/log/bff/combined.log | tail -20

# Check for auth token in cache
redis-cli GET "freebit:auth:token"
```

### Common Issues

**OEM Authentication Failure**

- Verify `FREEBIT_OEM_ID` and `FREEBIT_OEM_KEY`
- Check Freebit API endpoint accessibility
- Auth tokens are cached; clear the cache if credentials changed

**SIM Operations Failing**

- Verify the SIM account identifier (phone number) format
- Check the 30-minute operation gap requirements
- See [Freebit SIM Management](../integrations/sim/freebit.md) for operation constraints

**Network Type Changes Delayed**

- Network type changes are queued with a 30-minute delay
- Check the BullMQ queue for pending jobs

### Expected Response Times

| Operation     | Expected | Alert Threshold |
| ------------- | -------- | --------------- |
| Auth (cached) | <100ms   | >500ms          |
| GetDetail     | <1s      | >3s             |
| Plan Change   | <2s      | >5s             |
| Top-up        | <2s      | >5s             |

---

## SFTP (fs.mvno.net)

### Configuration

| Variable                | Description             |
| ----------------------- | ----------------------- |
| `SFTP_HOST`             | SFTP server hostname    |
| `SFTP_PORT`             | SFTP port (default: 22) |
| `SFTP_USERNAME`         | SFTP username           |
| `SFTP_PRIVATE_KEY_PATH` | Path to SSH private key |

### Health Check

```bash
# Test SFTP connectivity (variables quoted; port falls back to 22 if SFTP_PORT is unset)
sftp -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" "$SFTP_USERNAME@$SFTP_HOST" << EOF
ls
exit
EOF
```

### Common Issues

**Connection Refused**

- Verify the SFTP server is accessible
- Check firewall rules
- Verify the SSH key fingerprint

**Authentication Failure**

- Verify the SSH private key is correct
- Check key permissions (should be 600)
- Ensure the public key is authorized on the SFTP server

**Files Not Found**

- Call/SMS records become available two months behind the current date
- File naming: `PASI_talk-detail-YYYYMM.csv`, `PASI_sms-detail-YYYYMM.csv`

### Data Availability

| Record Type  | Availability    | File Pattern                  |
| ------------ | --------------- | ----------------------------- |
| Call Details | 2 months behind | `PASI_talk-detail-YYYYMM.csv` |
| SMS Details  | 2 months behind | `PASI_sms-detail-YYYYMM.csv`  |

---

## Credential Rotation

### Salesforce JWT Key Rotation

1. Generate a new key pair
2. Upload the new public key to the Connected App
3. Update `SF_PRIVATE_KEY_PATH` or `SF_PRIVATE_KEY_BASE64`
4. Deploy and verify authentication
5. Remove the old key from the Connected App after verification

### WHMCS API Credentials Rotation

1. Create new API credentials in WHMCS Admin
2. Update `WHMCS_API_IDENTIFIER` and `WHMCS_API_SECRET`
3. Deploy and verify that API calls work
4. Disable the old API credentials

### Freebit Key Rotation

1. Request a new OEM key from Freebit
2. Update `FREEBIT_OEM_KEY`
3. Clear the cached auth token: `redis-cli DEL "freebit:auth:token"`
4. Deploy and verify authentication

### SSH Key Rotation (SFTP)

1. Generate a new SSH key pair
2. Provide the public key to the SFTP administrator
3. Wait for the key to be authorized
4. Update `SFTP_PRIVATE_KEY_PATH`
5. Test connectivity
6. Request removal of the old key from the SFTP server

---

## Monitoring Recommendations

### Alerting Thresholds

| System     | Metric        | Warning | Critical |
| ---------- | ------------- | ------- | -------- |
| Salesforce | Response time | >2s     | >5s      |
| Salesforce | Error rate    | >1%     | >5%      |
| WHMCS      | Response time | >2s     | >5s      |
| WHMCS      | Error rate    | >1%     | >5%      |
| Freebit    | Response time | >3s     | >10s     |
| Redis      | Response time | >100ms  | >500ms   |
| PostgreSQL | Response time | >500ms  | >2s      |

### Key Metrics to Monitor

- External API response times
- Error rates per integration
- Authentication success/failure rates
- Cache hit rates
- Queue depths (for async operations)

### Health Check Schedule

| System     | Check Frequency  | Method             |
| ---------- | ---------------- | ------------------ |
| Salesforce | Every 5 minutes  | Query test         |
| WHMCS      | Every 5 minutes  | GetClients call    |
| Freebit    | Every 15 minutes | Auth token refresh |
| Redis      | Every 1 minute   | PING               |
| PostgreSQL | Every 1 minute   | SELECT 1           |
| SFTP       | Every 1 hour     | Connection test    |

---

## Fallback Behaviors

| System Down | User Impact             | Fallback                             |
| ----------- | ----------------------- | ------------------------------------ |
| Salesforce  | No orders, no catalog   | Show cached catalog, queue orders    |
| WHMCS       | No billing, no payments | Show cached invoices, block checkout |
| Freebit     | No SIM management       | Show cached data, disable actions    |
| Redis       | Slow performance        | Direct API calls (no cache)          |
| PostgreSQL  | Portal unusable         | Display maintenance message          |

---

## Related Documents

- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md)
- [Freebit SIM Management](../integrations/sim/freebit.md)

---

**Last Updated:** December 2025
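
---

## Appendix: Threshold Classification Sketch

The response-time rows of the Alerting Thresholds table in Monitoring Recommendations can be applied mechanically during a manual check. The following is a minimal sketch, not part of the BFF: the function name `classify_latency` and the lowercase system keys are hypothetical, and the millisecond values are simply the table's warning/critical response-time thresholds converted from seconds. Error-rate thresholds are not covered.

```bash
# Hypothetical helper (illustrative only): classify a measured response time
# against the response-time thresholds from the Alerting Thresholds table.
# Usage: classify_latency SYSTEM ELAPSED_MS  ->  prints ok | warning | critical
# Written in POSIX sh style, so its variables are global rather than local.
classify_latency() {
  system="$1"
  elapsed_ms="$2"

  # Warning/critical limits in milliseconds, taken from the table.
  case "$system" in
    salesforce|whmcs) warn_ms=2000; crit_ms=5000 ;;
    freebit)          warn_ms=3000; crit_ms=10000 ;;
    redis)            warn_ms=100;  crit_ms=500 ;;
    postgresql)       warn_ms=500;  crit_ms=2000 ;;
    *) echo "unknown system: $system" >&2; return 2 ;;
  esac

  # The table uses strict ">" comparisons, so a value equal to a limit is not an alert.
  if [ "$elapsed_ms" -gt "$crit_ms" ]; then
    echo "critical"
  elif [ "$elapsed_ms" -gt "$warn_ms" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

classify_latency freebit 4200   # prints "warning"
classify_latency redis 80       # prints "ok"
classify_latency whmcs 6000     # prints "critical"
```

The elapsed time would come from whatever measurement is at hand, for example a timed API call converted to whole milliseconds. Keeping the limits in one `case` statement means a change to the table needs only a one-line change here.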