External Dependencies Runbook
This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.
System Overview
| System | Purpose | Integration | Health Check |
|---|---|---|---|
| Salesforce | CRM, Orders, Catalog | REST API + Platform Events | JWT auth test |
| WHMCS | Billing, Payments, Subscriptions | REST API | API action test |
| Freebit | SIM Management | REST API | OEM auth test |
| SFTP (fs.mvno.net) | Call/SMS Records | SFTP | Connection test |
| Redis | Cache, Sessions, Queues | Direct connection | PING command |
| PostgreSQL | User data, Mappings | Direct connection | Query test |
Salesforce
Configuration
| Variable | Description |
|---|---|
| SF_LOGIN_URL | Login URL (login.salesforce.com or test.salesforce.com) |
| SF_CLIENT_ID | Connected App Consumer Key |
| SF_USERNAME | Integration user username |
| SF_PRIVATE_KEY_PATH | Path to JWT private key |
| SF_EVENTS_ENABLED | Enable Platform Event subscription |
| SF_PROVISION_EVENT_CHANNEL | Platform Event channel for provisioning |
| PORTAL_PRICEBOOK_ID | Salesforce Pricebook ID for catalog |
Health Check
# Check Salesforce connectivity via BFF health endpoint
curl -s http://localhost:4000/health | jq '.'
# Check JWT authentication
# The BFF authenticates automatically, so look for auth errors in the logs
grep "Salesforce" /var/log/bff/combined.log | tail -20
Common Issues
JWT Authentication Failure
- Verify private key file exists and is readable
- Check Connected App settings in Salesforce
- Ensure integration user is pre-authorized for Connected App
- Verify SF_USERNAME matches the user assigned to the Connected App
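The first check in this list can be scripted. The following is a minimal sketch, assuming the key at SF_PRIVATE_KEY_PATH is PEM-encoded; check_private_key is a hypothetical helper name and is not part of the BFF.

```shell
# check_private_key PATH
# Hypothetical helper: confirms the key file exists, is readable by the
# current user, and starts with a PEM header (assumes a PEM-encoded key).
check_private_key() {
  key_path=$1
  if [ ! -f "$key_path" ]; then
    echo "missing: $key_path"
    return 1
  fi
  if [ ! -r "$key_path" ]; then
    echo "unreadable: $key_path"
    return 1
  fi
  if ! head -n 1 "$key_path" | grep -q '^-----BEGIN .*PRIVATE KEY-----'; then
    echo "not a PEM private key: $key_path"
    return 1
  fi
  echo "ok: $key_path"
}

# Usage (run as the same OS user as the BFF process):
#   check_private_key "$SF_PRIVATE_KEY_PATH"
```

Running it as the BFF's own user matters, because a key that is readable by an administrator may still be unreadable by the service account.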
Platform Events Not Receiving
- Check SF_EVENTS_ENABLED=true
- Verify Platform Event permissions for integration user
- Check Redis for replay ID: redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"
- Set SF_EVENTS_REPLAY=ALL temporarily to catch up on missed events
API Limits
- Salesforce has daily API call limits
- Monitor usage in Salesforce Setup > API Usage
- Consider caching frequently accessed data
Expected Response Times
| Operation | Expected | Alert Threshold |
|---|---|---|
| Query | <500ms | >2s |
| Update | <1s | >3s |
| Platform Event | Real-time | >5s delay |
WHMCS
Configuration
| Variable | Description |
|---|---|
| WHMCS_API_URL | WHMCS API endpoint URL |
| WHMCS_API_IDENTIFIER | API credentials identifier |
| WHMCS_API_SECRET | API credentials secret |
| WHMCS_CUSTOMER_NUMBER_FIELD_ID | Custom field ID for Customer Number |
Health Check
# Test WHMCS API directly
curl -X POST "$WHMCS_API_URL" \
-d "identifier=$WHMCS_API_IDENTIFIER" \
-d "secret=$WHMCS_API_SECRET" \
-d "action=GetClients" \
-d "responsetype=json" \
-d "limitnum=1"
# Should return: {"result":"success","totalresults":...}
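To use the call above in an automated check, the response body has to be inspected rather than read by eye. The sketch below is one way to do that; whmcs_response_ok is a hypothetical helper name, and the grep-based match is a deliberately coarse substitute for a real JSON parser.

```shell
# whmcs_response_ok JSON
# Succeeds when the response body reports "result":"success".
# Uses grep instead of a JSON parser to avoid extra dependencies; it does
# not validate the rest of the payload.
whmcs_response_ok() {
  printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*"success"'
}

# Usage:
#   response=$(curl -s -X POST "$WHMCS_API_URL" \
#     -d "identifier=$WHMCS_API_IDENTIFIER" \
#     -d "secret=$WHMCS_API_SECRET" \
#     -d "action=GetClients" \
#     -d "responsetype=json" \
#     -d "limitnum=1")
#   whmcs_response_ok "$response" || echo "WHMCS health check failed"
```

If jq is available on the host, replacing the grep with a jq filter on .result would be more robust.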
Common Issues
Authentication Failure
- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
- Check IP whitelist settings (if configured)
- Ensure API credentials have required permissions
Rate Limiting
- WHMCS may rate limit excessive requests
- Check for 429 responses in logs
- Implement request queuing if needed
Field Mapping Issues
- Payment method fields may use different names between WHMCS versions
- Check WHMCS Troubleshooting for field mapping
Expected Response Times
| Operation | Expected | Alert Threshold |
|---|---|---|
| GetInvoices | <500ms | >2s |
| AddOrder | <1s | >3s |
| AcceptOrder | <1s | >3s |
| SSO Token | <500ms | >2s |
Freebit
Configuration
| Variable | Description |
|---|---|
| FREEBIT_BASE_URL | Freebit API base URL |
| FREEBIT_OEM_ID | OEM identifier |
| FREEBIT_OEM_KEY | OEM authentication key |
| FREEBIT_TIMEOUT | Request timeout (ms) |
Health Check
# Check Freebit OEM authentication
# The BFF handles auth automatically; check logs for auth errors
grep "Freebit" /var/log/bff/combined.log | tail -20
# Check for auth token in cache
redis-cli GET "freebit:auth:token"
Common Issues
OEM Authentication Failure
- Verify FREEBIT_OEM_ID and FREEBIT_OEM_KEY
- Check Freebit API endpoint accessibility
- Auth tokens are cached; clear cache if credentials changed
SIM Operations Failing
- Verify SIM account identifier (phone number) format
- Check 30-minute operation gap requirements
- See Freebit SIM Management for operation constraints
Network Type Changes Delayed
- Network type changes are queued with a 30-minute delay
- Check BullMQ queue for pending jobs
Expected Response Times
| Operation | Expected | Alert Threshold |
|---|---|---|
| Auth (cached) | <100ms | >500ms |
| GetDetail | <1s | >3s |
| Plan Change | <2s | >5s |
| Top-up | <2s | >5s |
SFTP (fs.mvno.net)
Configuration
| Variable | Description |
|---|---|
| SFTP_HOST | SFTP server hostname |
| SFTP_PORT | SFTP port (default: 22) |
| SFTP_USERNAME | SFTP username |
| SFTP_PRIVATE_KEY_PATH | Path to SSH private key |
Health Check
# Test SFTP connectivity
sftp -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" "$SFTP_USERNAME@$SFTP_HOST" << EOF
ls
exit
EOF
Common Issues
Connection Refused
- Verify SFTP server is accessible
- Check firewall rules
- Verify SSH key fingerprint
Authentication Failure
- Verify SSH private key is correct
- Check key permissions (should be 600)
- Ensure public key is authorized on SFTP server
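The permission check in this list can be automated. The sketch below assumes GNU coreutils stat (typical on Linux hosts); key_mode_ok is a hypothetical helper name.

```shell
# key_mode_ok PATH
# Succeeds when the file mode is exactly 600 (owner read/write only).
# Assumes GNU coreutils `stat`; on macOS/BSD use `stat -f '%Lp'` instead.
key_mode_ok() {
  mode=$(stat -c '%a' "$1") || return 1
  [ "$mode" = "600" ]
}

# Usage:
#   key_mode_ok "$SFTP_PRIVATE_KEY_PATH" || chmod 600 "$SFTP_PRIVATE_KEY_PATH"
```

OpenSSH clients refuse private keys that are readable by group or others, so a wrong mode typically surfaces as an authentication failure rather than a clear permission error.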
Files Not Found
- Call/SMS records are available 2 months behind current date
- File naming: PASI_talk-detail-YYYYMM.csv, PASI_sms-detail-YYYYMM.csv
Data Availability
| Record Type | Availability | File Pattern |
|---|---|---|
| Call Details | 2 months behind | PASI_talk-detail-YYYYMM.csv |
| SMS Details | 2 months behind | PASI_sms-detail-YYYYMM.csv |
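When a "file not found" error appears, it helps to compute which file names should exist before assuming a fault. The sketch below assumes that "2 months behind" means the newest file covers the month two months before the current one; both function names are hypothetical.

```shell
# latest_record_month YEAR MONTH
# Prints YYYYMM for the month two months before the given one.
# Assumption: "2 months behind" means the newest file covers that month.
latest_record_month() {
  year=$1
  month=${2#0}   # drop a leading zero so 08 and 09 are not read as octal
  total=$(( $year * 12 + $month - 1 - 2 ))
  printf '%04d%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# record_file_names YEAR MONTH
# Prints the expected call and SMS file names for that reference month.
record_file_names() {
  yyyymm=$(latest_record_month "$1" "$2")
  echo "PASI_talk-detail-$yyyymm.csv"
  echo "PASI_sms-detail-$yyyymm.csv"
}

# Usage:
#   record_file_names "$(date +%Y)" "$(date +%m)"
```

The month arithmetic is done on integers rather than with date -d "2 months ago", which can land in the wrong month when run near the end of a long month.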
Credential Rotation
Salesforce JWT Key Rotation
- Generate new key pair
- Upload new public key to Connected App
- Update SF_PRIVATE_KEY_PATH or SF_PRIVATE_KEY_BASE64
- Deploy and verify authentication
- Remove old key from Connected App after verification
WHMCS API Credentials Rotation
- Create new API credentials in WHMCS Admin
- Update WHMCS_API_IDENTIFIER and WHMCS_API_SECRET
- Deploy and verify API calls work
- Disable old API credentials
Freebit Key Rotation
- Request new OEM key from Freebit
- Update FREEBIT_OEM_KEY
- Clear cached auth token: redis-cli DEL "freebit:auth:token"
- Deploy and verify authentication
SSH Key Rotation (SFTP)
- Generate new SSH key pair
- Provide public key to SFTP administrator
- Wait for key to be authorized
- Update SFTP_PRIVATE_KEY_PATH
- Test connectivity
- Request old key removal from SFTP server
Monitoring Recommendations
Alerting Thresholds
| System | Metric | Warning | Critical |
|---|---|---|---|
| Salesforce | Response time | >2s | >5s |
| Salesforce | Error rate | >1% | >5% |
| WHMCS | Response time | >2s | >5s |
| WHMCS | Error rate | >1% | >5% |
| Freebit | Response time | >3s | >10s |
| Redis | Response time | >100ms | >500ms |
| PostgreSQL | Response time | >500ms | >2s |
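The thresholds in this table can be applied mechanically to a measured response time. The sketch below is illustrative only; classify_latency is a hypothetical helper, and it treats each threshold as strictly "greater than", matching the ">" notation in the table.

```shell
# classify_latency MILLIS WARN_MS CRIT_MS
# Prints ok, warning, or critical for a response time in milliseconds.
# Thresholds are exclusive: a value equal to WARN_MS is still "ok".
classify_latency() {
  if [ "$1" -gt "$3" ]; then
    echo critical
  elif [ "$1" -gt "$2" ]; then
    echo warning
  else
    echo ok
  fi
}

# Usage (Salesforce row: warning >2s, critical >5s):
#   classify_latency 2500 2000 5000
```

All three arguments must be whole milliseconds, so a timing reported in fractional seconds needs converting first.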
Key Metrics to Monitor
- External API response times
- Error rates per integration
- Authentication success/failure rates
- Cache hit rates
- Queue depths (for async operations)
Health Check Schedule
| System | Check Frequency | Method |
|---|---|---|
| Salesforce | Every 5 minutes | Query test |
| WHMCS | Every 5 minutes | GetClients call |
| Freebit | Every 15 minutes | Auth token refresh |
| Redis | Every 1 minute | PING |
| PostgreSQL | Every 1 minute | SELECT 1 |
| SFTP | Every 1 hour | Connection test |
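The Redis and PostgreSQL rows have no command shown elsewhere in this runbook, so the sketch below illustrates one possible shape for them. It assumes redis-cli and psql are installed on the host, that redis-cli can reach Redis with its default connection settings, and that a DATABASE_URL variable holds the PostgreSQL connection string; all function names and DATABASE_URL are assumptions, not confirmed portal configuration.

```shell
# run_check NAME COMMAND [ARGS...]
# Runs COMMAND quietly and prints "NAME: ok" or "NAME: FAILED".
# Returns non-zero on failure so a scheduler or monitor can react.
run_check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: ok"
  else
    echo "$name: FAILED"
    return 1
  fi
}

# Redis: a healthy server answers PING with PONG.
redis_ping() {
  [ "$(redis-cli PING 2>/dev/null)" = "PONG" ]
}

# PostgreSQL: run SELECT 1 (DATABASE_URL is an assumed variable name).
postgres_select_one() {
  [ "$(psql "$DATABASE_URL" -tAc 'SELECT 1' 2>/dev/null)" = "1" ]
}

# Usage:
#   run_check redis redis_ping
#   run_check postgres postgres_select_one
```

Wrapping each probe in run_check keeps the output format uniform, which makes it easier to schedule the checks at the frequencies listed in the table and to parse the results.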
Fallback Behaviors
| System Down | User Impact | Fallback |
|---|---|---|
| Salesforce | No orders, no catalog | Show cached catalog, queue orders |
| WHMCS | No billing, no payments | Show cached invoices, block checkout |
| Freebit | No SIM management | Show cached data, disable actions |
| Redis | Slow performance | Direct API calls (no cache) |
| PostgreSQL | Portal unusable | Display maintenance message |
Related Documents
- Incident Response
- Provisioning Runbook
- Salesforce Requirements
- WHMCS Troubleshooting
- Freebit SIM Management
Last Updated: December 2025