# External Dependencies Runbook

This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.

---

## System Overview

| System                 | Purpose                          | Integration                | Health Check    |
| ---------------------- | -------------------------------- | -------------------------- | --------------- |
| **Salesforce**         | CRM, Orders, Catalog             | REST API + Platform Events | JWT auth test   |
| **WHMCS**              | Billing, Payments, Subscriptions | REST API                   | API action test |
| **Freebit**            | SIM Management                   | REST API                   | OEM auth test   |
| **SFTP (fs.mvno.net)** | Call/SMS Records                 | SFTP                       | Connection test |
| **Redis**              | Cache, Sessions, Queues          | Direct connection          | PING command    |
| **PostgreSQL**         | User data, Mappings              | Direct connection          | Query test      |

---

## Salesforce

### Configuration

| Variable                     | Description                                             |
| ---------------------------- | ------------------------------------------------------- |
| `SF_LOGIN_URL`               | Login URL (login.salesforce.com or test.salesforce.com) |
| `SF_CLIENT_ID`               | Connected App Consumer Key                              |
| `SF_USERNAME`                | Integration user username                               |
| `SF_PRIVATE_KEY_PATH`        | Path to JWT private key                                 |
| `SF_EVENTS_ENABLED`          | Enable Platform Event subscription                      |
| `SF_PROVISION_EVENT_CHANNEL` | Platform Event channel for provisioning                 |
| `PORTAL_PRICEBOOK_ID`        | Salesforce Pricebook ID for catalog                     |

### Health Check

```bash
# Check Salesforce connectivity via BFF health endpoint
curl -s http://localhost:4000/health | jq '.'

# Verify JWT authentication
# The BFF authenticates automatically, so check the logs for auth errors
grep "Salesforce" /var/log/bff/combined.log | tail -20
```

### Common Issues

**JWT Authentication Failure**

- Verify private key file exists and is readable
- Check Connected App settings in Salesforce
- Ensure integration user is pre-authorized for the Connected App
- Verify `SF_USERNAME` matches the user assigned to the Connected App

**Platform Events Not Received**

- Check that `SF_EVENTS_ENABLED=true`
- Verify Platform Event permissions for integration user
- Check Redis for replay ID: `redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"`
- Set `SF_EVENTS_REPLAY=ALL` temporarily to catch up on missed events

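The replay ID lookup can be wrapped in a small helper so it works for any channel. This is a minimal sketch: the function names and the `REDIS_CLI` override are hypothetical, and the `sf:pe:replay:` prefix is inferred from the single key shown in this section, so it may not hold for other channels.

```bash
# Hypothetical helpers; the key prefix is inferred from the one example key
# in this runbook and may differ in the actual deployment.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Build the Redis key that holds the replay ID for a Platform Event channel.
sf_replay_key() {
  printf 'sf:pe:replay:%s\n' "$1"
}

# Print the stored replay ID, or a notice when none is stored.
sf_replay_id() {
  local key value
  key="$(sf_replay_key "$1")"
  value="$($REDIS_CLI GET "$key")"
  if [ -n "$value" ] && [ "$value" != "(nil)" ]; then
    printf '%s\n' "$value"
  else
    printf 'no replay ID stored for %s\n' "$1"
  fi
}
```

For example, `sf_replay_id "/event/OrderProvisionRequested__e"` prints the stored value. A missing replay ID may be one reason the subscriber is not resuming where it left off.
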
**API Limits**

- Salesforce has daily API call limits
- Monitor usage in Salesforce Setup > API Usage
- Consider caching frequently accessed data

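Daily API consumption can also be checked from the command line. The sketch below only does the arithmetic: it assumes you have already read `Max` and `Remaining` from the `DailyApiRequests` entry of the Salesforce REST limits resource. The function names and the 80%/95% cut-offs are illustrative assumptions, not values defined by this runbook.

```bash
# Percentage of the daily API allowance already consumed.
# $1 = Max, $2 = Remaining (both taken from DailyApiRequests).
sf_api_usage_percent() {
  local max="$1" remaining="$2"
  if [ "$max" -le 0 ]; then
    echo "max must be positive" >&2
    return 1
  fi
  echo $(( (max - remaining) * 100 / max ))
}

# Map usage to a status. The 80/95 cut-offs are illustrative assumptions.
sf_api_usage_status() {
  local pct
  pct="$(sf_api_usage_percent "$1" "$2")" || return 1
  if [ "$pct" -ge 95 ]; then
    echo critical
  elif [ "$pct" -ge 80 ]; then
    echo warning
  else
    echo ok
  fi
}
```

With a limit of 15000 calls and 3000 remaining, `sf_api_usage_status 15000 3000` reports `warning`.
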
### Expected Response Times

| Operation      | Expected  | Alert Threshold |
| -------------- | --------- | --------------- |
| Query          | <500ms    | >2s             |
| Update         | <1s       | >3s             |
| Platform Event | Real-time | >5s delay       |

---

## WHMCS

### Configuration

| Variable                         | Description                         |
| -------------------------------- | ----------------------------------- |
| `WHMCS_API_URL`                  | WHMCS API endpoint URL              |
| `WHMCS_API_IDENTIFIER`           | API credentials identifier          |
| `WHMCS_API_SECRET`               | API credentials secret              |
| `WHMCS_CUSTOMER_NUMBER_FIELD_ID` | Custom field ID for Customer Number |

### Health Check

```bash
# Test WHMCS API directly
curl -X POST "$WHMCS_API_URL" \
  -d "identifier=$WHMCS_API_IDENTIFIER" \
  -d "secret=$WHMCS_API_SECRET" \
  -d "action=GetClients" \
  -d "responsetype=json" \
  -d "limitnum=1"

# Should return: {"result":"success","totalresults":...}
```

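For scripted checks, the response can be tested for the expected `"result":"success"` field instead of being read by eye. The helper name below is hypothetical. It is a rough sketch that matches the field with `grep` rather than parsing the JSON.

```bash
# Succeeds when a WHMCS JSON response reports "result":"success".
# Reads the response body from stdin; tolerates whitespace around the colon.
whmcs_result_ok() {
  grep -Eq '"result"[[:space:]]*:[[:space:]]*"success"'
}
```

Piping the output of the `curl` call above into `whmcs_result_ok` gives an exit status that a monitoring script can act on.
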
### Common Issues

**Authentication Failure**

- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
- Check IP whitelist settings (if configured)
- Ensure API credentials have required permissions

**Rate Limiting**

- WHMCS may rate-limit excessive requests
- Check for 429 responses in logs
- Implement request queuing if needed

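One way to check for 429 responses is to count matching log lines. This sketch assumes that each relevant log line contains the word `WHMCS` and the status code as a standalone number. The actual BFF log format is not documented here, so adjust the patterns to match it. The function name is hypothetical.

```bash
# Count log lines that mention WHMCS and a standalone 429 status code.
# $1 = log file (the line format is an assumption, not the BFF's actual format)
count_whmcs_429() {
  grep 'WHMCS' "$1" | grep -Ec '(^|[^0-9])429([^0-9]|$)'
}
```

For example, `count_whmcs_429 /var/log/bff/combined.log` prints the number of matching lines. A count that keeps rising suggests throttling.
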
**Field Mapping Issues**

- Payment method fields may use different names between WHMCS versions
- Check [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md) for field mapping

### Expected Response Times

| Operation   | Expected | Alert Threshold |
| ----------- | -------- | --------------- |
| GetInvoices | <500ms   | >2s             |
| AddOrder    | <1s      | >3s             |
| AcceptOrder | <1s      | >3s             |
| SSO Token   | <500ms   | >2s             |

---

## Freebit

### Configuration

| Variable           | Description            |
| ------------------ | ---------------------- |
| `FREEBIT_BASE_URL` | Freebit API base URL   |
| `FREEBIT_OEM_ID`   | OEM identifier         |
| `FREEBIT_OEM_KEY`  | OEM authentication key |
| `FREEBIT_TIMEOUT`  | Request timeout (ms)   |

### Health Check

```bash
# Check Freebit OEM authentication
# The BFF handles auth automatically; check logs for auth errors
grep "Freebit" /var/log/bff/combined.log | tail -20

# Check for auth token in cache
redis-cli GET "freebit:auth:token"
```

### Common Issues

**OEM Authentication Failure**

- Verify `FREEBIT_OEM_ID` and `FREEBIT_OEM_KEY`
- Check Freebit API endpoint accessibility
- Auth tokens are cached; clear cache if credentials changed

**SIM Operations Failing**

- Verify SIM account identifier (phone number) format
- Check 30-minute operation gap requirements
- See [Freebit SIM Management](../integrations/sim/freebit.md) for operation constraints

**Network Type Changes Delayed**

- Network type changes are queued with a 30-minute delay
- Check BullMQ queue for pending jobs

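When diagnosing a rejected SIM operation, it helps to know how much of the 30-minute gap is left. The sketch below assumes a flat 1800-second gap between consecutive operations on the same SIM. Which operations the rule covers is described in the Freebit SIM Management document, not here, and the names used are hypothetical.

```bash
# Minimum gap between operations on the same SIM, in seconds (30 minutes).
FREEBIT_MIN_GAP_SECONDS=1800

# Print how many seconds remain before the next operation is allowed.
# $1 = epoch seconds of the last operation, $2 = current epoch seconds
freebit_wait_remaining() {
  local last="$1" now="$2" elapsed remaining
  elapsed=$(( now - last ))
  remaining=$(( FREEBIT_MIN_GAP_SECONDS - elapsed ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

Call it as `freebit_wait_remaining "$last_op_epoch" "$(date +%s)"`. A result of `0` means the gap has already elapsed.
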
### Expected Response Times

| Operation     | Expected | Alert Threshold |
| ------------- | -------- | --------------- |
| Auth (cached) | <100ms   | >500ms          |
| GetDetail     | <1s      | >3s             |
| Plan Change   | <2s      | >5s             |
| Top-up        | <2s      | >5s             |

---

## SFTP (fs.mvno.net)

### Configuration

| Variable                | Description             |
| ----------------------- | ----------------------- |
| `SFTP_HOST`             | SFTP server hostname    |
| `SFTP_PORT`             | SFTP port (default: 22) |
| `SFTP_USERNAME`         | SFTP username           |
| `SFTP_PRIVATE_KEY_PATH` | Path to SSH private key |

### Health Check

```bash
# Test SFTP connectivity (quotes guard against spaces; SFTP_PORT falls back to 22)
sftp -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" "$SFTP_USERNAME@$SFTP_HOST" << EOF
ls
exit
EOF
```

### Common Issues

**Connection Refused**

- Verify SFTP server is accessible
- Check firewall rules
- Verify SSH key fingerprint

**Authentication Failure**

- Verify SSH private key is correct
- Check key permissions (should be 600)
- Ensure public key is authorized on SFTP server

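The permission check can be scripted so it runs before each connection attempt. This is a small sketch with hypothetical function names. It tries GNU `stat` first and falls back to the BSD form, since the flags differ between platforms.

```bash
# Print the octal permission bits of a file (GNU stat first, then BSD stat).
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Succeed only when the key file exists and has mode 600.
key_mode_ok() {
  [ -f "$1" ] && [ "$(file_mode "$1")" = "600" ]
}
```

For example, `key_mode_ok "$SFTP_PRIVATE_KEY_PATH" || chmod 600 "$SFTP_PRIVATE_KEY_PATH"` tightens the mode only when needed.
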
**Files Not Found**

- Call/SMS records lag 2 months behind the current date
- File naming: `PASI_talk-detail-YYYYMM.csv`, `PASI_sms-detail-YYYYMM.csv`

### Data Availability

| Record Type  | Availability    | File Pattern                  |
| ------------ | --------------- | ----------------------------- |
| Call Details | 2 months behind | `PASI_talk-detail-YYYYMM.csv` |
| SMS Details  | 2 months behind | `PASI_sms-detail-YYYYMM.csv`  |

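To work out which file should exist, the two-month lag can be computed from the current year and month. The sketch below assumes "2 months behind" means the newest file covers the calendar month two months before the current one. The function name is hypothetical.

```bash
# Print the YYYYMM period that lies two months before the given year/month.
# $1 = four-digit year, $2 = month (1-12, a leading zero is allowed)
latest_cdr_period() {
  local year="$1" month="${2#0}"
  # Count months from year 0 with January as index 0, then step back two
  local index=$(( year * 12 + month - 1 - 2 ))
  printf '%04d%02d\n' $(( index / 12 )) $(( index % 12 + 1 ))
}
```

With `period="$(latest_cdr_period "$(date +%Y)" "$(date +%m)")"`, the expected files are `PASI_talk-detail-$period.csv` and `PASI_sms-detail-$period.csv`.
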
---

## Credential Rotation

### Salesforce JWT Key Rotation

1. Generate new key pair
2. Upload new public key to Connected App
3. Update `SF_PRIVATE_KEY_PATH` or `SF_PRIVATE_KEY_BASE64`
4. Deploy and verify authentication
5. Remove old key from Connected App after verification

### WHMCS API Credentials Rotation

1. Create new API credentials in WHMCS Admin
2. Update `WHMCS_API_IDENTIFIER` and `WHMCS_API_SECRET`
3. Deploy and verify API calls work
4. Disable old API credentials

### Freebit Key Rotation

1. Request new OEM key from Freebit
2. Update `FREEBIT_OEM_KEY`
3. Clear cached auth token: `redis-cli DEL "freebit:auth:token"`
4. Deploy and verify authentication

### SSH Key Rotation (SFTP)

1. Generate new SSH key pair
2. Provide public key to SFTP administrator
3. Wait for key to be authorized
4. Update `SFTP_PRIVATE_KEY_PATH`
5. Test connectivity
6. Request old key removal from SFTP server

---

## Monitoring Recommendations

### Alerting Thresholds

| System     | Metric        | Warning | Critical |
| ---------- | ------------- | ------- | -------- |
| Salesforce | Response time | >2s     | >5s      |
| Salesforce | Error rate    | >1%     | >5%      |
| WHMCS      | Response time | >2s     | >5s      |
| WHMCS      | Error rate    | >1%     | >5%      |
| Freebit    | Response time | >3s     | >10s     |
| Redis      | Response time | >100ms  | >500ms   |
| PostgreSQL | Response time | >500ms  | >2s      |

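The response-time rows of this table can be encoded directly in a check script. The following is a minimal sketch with a hypothetical function name. It takes a measured latency in whole milliseconds and treats the thresholds as strict "greater than" comparisons, as the table writes them.

```bash
# Print ok, warning, or critical for a measured response time.
# $1 = system name, $2 = response time in whole milliseconds
classify_response_time() {
  local system="$1" ms="$2" warn crit
  case "$system" in
    Salesforce|WHMCS) warn=2000; crit=5000 ;;
    Freebit)          warn=3000; crit=10000 ;;
    Redis)            warn=100;  crit=500 ;;
    PostgreSQL)       warn=500;  crit=2000 ;;
    *) echo "unknown system: $system" >&2; return 1 ;;
  esac
  if [ "$ms" -gt "$crit" ]; then
    echo critical
  elif [ "$ms" -gt "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}
```

For example, `classify_response_time Redis 150` prints `warning`. Error-rate thresholds are not covered by this sketch.
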
### Key Metrics to Monitor

- External API response times
- Error rates per integration
- Authentication success/failure rates
- Cache hit rates
- Queue depths (for async operations)

### Health Check Schedule

| System     | Check Frequency  | Method             |
| ---------- | ---------------- | ------------------ |
| Salesforce | Every 5 minutes  | Query test         |
| WHMCS      | Every 5 minutes  | GetClients call    |
| Freebit    | Every 15 minutes | Auth token refresh |
| Redis      | Every 1 minute   | PING               |
| PostgreSQL | Every 1 minute   | SELECT 1           |
| SFTP       | Every 1 hour     | Connection test    |

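The Redis and PostgreSQL checks in this schedule are simple enough to sketch as shell functions. The function names are hypothetical, and so are the `REDIS_CLI` and `PSQL` overrides and the `DATABASE_URL` variable, which this runbook does not define. Substitute whatever connection settings the deployment actually uses.

```bash
# Command names are overridable so the checks can be exercised with stubs.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
PSQL="${PSQL:-psql}"

# Succeed when Redis answers PING with PONG.
check_redis() {
  [ "$($REDIS_CLI PING 2>/dev/null)" = "PONG" ]
}

# Succeed when PostgreSQL answers SELECT 1 with 1.
# DATABASE_URL is an assumed variable name for the connection string.
check_postgres() {
  [ "$($PSQL -tAc 'SELECT 1' "$DATABASE_URL" 2>/dev/null)" = "1" ]
}
```

A scheduler can then run `check_redis || echo "Redis check failed"` once a minute and route the message to whatever alerting is in place.
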
---

## Fallback Behaviors

| System Down | User Impact             | Fallback                             |
| ----------- | ----------------------- | ------------------------------------ |
| Salesforce  | No orders, no catalog   | Show cached catalog, queue orders    |
| WHMCS       | No billing, no payments | Show cached invoices, block checkout |
| Freebit     | No SIM management       | Show cached data, disable actions    |
| Redis       | Slow performance        | Direct API calls (no cache)          |
| PostgreSQL  | Portal unusable         | Display maintenance message          |

---

## Related Documents

- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md)
- [Freebit SIM Management](../integrations/sim/freebit.md)

---

**Last Updated:** December 2025