# External Dependencies Runbook
This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.

---
## System Overview
| System | Purpose | Integration | Health Check |
| ---------------------- | -------------------------------- | -------------------------- | --------------- |
| **Salesforce** | CRM, Orders, Catalog | REST API + Platform Events | JWT auth test |
| **WHMCS** | Billing, Payments, Subscriptions | REST API | API action test |
| **Freebit** | SIM Management | REST API | OEM auth test |
| **SFTP (fs.mvno.net)** | Call/SMS Records | SFTP | Connection test |
| **Redis** | Cache, Sessions, Queues | Direct connection | PING command |
| **PostgreSQL** | User data, Mappings | Direct connection | Query test |
---
## Salesforce
### Configuration
| Variable | Description |
| ---------------------------- | ------------------------------------------------------- |
| `SF_LOGIN_URL` | Login URL (login.salesforce.com or test.salesforce.com) |
| `SF_CLIENT_ID` | Connected App Consumer Key |
| `SF_USERNAME` | Integration user username |
| `SF_PRIVATE_KEY_PATH` | Path to JWT private key |
| `SF_EVENTS_ENABLED` | Enable Platform Event subscription |
| `SF_PROVISION_EVENT_CHANNEL` | Platform Event channel for provisioning |
| `PORTAL_PRICEBOOK_ID` | Salesforce Pricebook ID for catalog |
### Health Check
```bash
# Check Salesforce connectivity via the BFF health endpoint
curl -s http://localhost:4000/health | jq '.'

# There is no manual JWT test: the BFF authenticates automatically,
# so check its logs for Salesforce auth errors
grep "Salesforce" /var/log/bff/combined.log | tail -20
```
### Common Issues
**JWT Authentication Failure**
- Verify private key file exists and is readable
- Check Connected App settings in Salesforce
- Ensure integration user is pre-authorized for Connected App
- Verify `SF_USERNAME` matches the user assigned to Connected App

**Platform Events Not Receiving**
- Check `SF_EVENTS_ENABLED=true`
- Verify Platform Event permissions for integration user
- Check Redis for replay ID: `redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"`
- Set `SF_EVENTS_REPLAY=ALL` temporarily to catch up on missed events

**API Limits**
- Salesforce has daily API call limits
- Monitor usage in Salesforce Setup > API Usage
- Consider caching frequently accessed data
### Expected Response Times
| Operation | Expected | Alert Threshold |
| -------------- | --------- | --------------- |
| Query | <500ms | >2s |
| Update | <1s | >3s |
| Platform Event | Real-time | >5s delay |
---
## WHMCS
### Configuration
| Variable | Description |
| -------------------------------- | ----------------------------------- |
| `WHMCS_API_URL` | WHMCS API endpoint URL |
| `WHMCS_API_IDENTIFIER` | API credentials identifier |
| `WHMCS_API_SECRET` | API credentials secret |
| `WHMCS_CUSTOMER_NUMBER_FIELD_ID` | Custom field ID for Customer Number |
### Health Check
```bash
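# Test the WHMCS API directly (-sS hides progress output but still prints errors;
# --data-urlencode keeps credentials containing & or + intact)
curl -sS -X POST "$WHMCS_API_URL" \
  --data-urlencode "identifier=$WHMCS_API_IDENTIFIER" \
  --data-urlencode "secret=$WHMCS_API_SECRET" \
  -d "action=GetClients" \
  -d "responsetype=json" \
  -d "limitnum=1"

# Should return: {"result":"success","totalresults":...}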
```
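
The success check on the response can be scripted. The following is a minimal sketch, not part of the BFF: `whmcs_result_ok` is a hypothetical helper name, and it uses a plain string match rather than a JSON parser, so it only looks for the `"result":"success"` pair shown above.

```bash
# whmcs_result_ok BODY
# Succeeds only when the response body contains "result":"success".
whmcs_result_ok() {
  local compact
  # Strip whitespace so compact and pretty-printed JSON match the same pattern.
  compact=$(printf '%s' "$1" | tr -d '[:space:]')
  case $compact in
    *'"result":"success"'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes the response of the curl call above was captured in $body):
#   whmcs_result_ok "$body" && echo "WHMCS: ok" || echo "WHMCS: FAILED"
```

Where `jq` is available, `jq -e '.result == "success"'` is a stricter alternative because it parses the JSON instead of matching text.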
### Common Issues
**Authentication Failure**
- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
- Check IP whitelist settings (if configured)
- Ensure API credentials have required permissions

**Rate Limiting**
- WHMCS may rate limit excessive requests
- Check for 429 responses in logs
- Implement request queuing if needed

**Field Mapping Issues**
- Payment method fields may use different names between WHMCS versions
- Check [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md) for field mapping
### Expected Response Times
| Operation | Expected | Alert Threshold |
| ----------- | -------- | --------------- |
| GetInvoices | <500ms | >2s |
| AddOrder | <1s | >3s |
| AcceptOrder | <1s | >3s |
| SSO Token | <500ms | >2s |
---
## Freebit
### Configuration
| Variable | Description |
| ------------------ | ---------------------- |
| `FREEBIT_BASE_URL` | Freebit API base URL |
| `FREEBIT_OEM_ID` | OEM identifier |
| `FREEBIT_OEM_KEY` | OEM authentication key |
| `FREEBIT_TIMEOUT` | Request timeout (ms) |
### Health Check
```bash
# Check Freebit OEM authentication
# The BFF handles auth automatically; check logs for auth errors
grep "Freebit" /var/log/bff/combined.log | tail -20
# Check for auth token in cache
redis-cli GET "freebit:auth:token"
```
### Common Issues
**OEM Authentication Failure**
- Verify `FREEBIT_OEM_ID` and `FREEBIT_OEM_KEY`
- Check Freebit API endpoint accessibility
- Auth tokens are cached; if credentials changed, clear the cached token with `redis-cli DEL "freebit:auth:token"`

**SIM Operations Failing**
- Verify SIM account identifier (phone number) format
- Check that the required 30-minute gap between operations has elapsed
- See [Freebit SIM Management](../integrations/sim/freebit.md) for operation constraints

**Network Type Changes Delayed**
- Network type changes are queued with a 30-minute delay
- Check BullMQ queue for pending jobs
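
When checking the 30-minute gap by hand, the remaining wait time is simple arithmetic. This sketch covers only that calculation: `seconds_until_allowed` is a hypothetical helper, and where the last operation's timestamp comes from (logs, database, queue metadata) is an assumption left to the operator. The linked Freebit document remains the source for which operations the gap applies to.

```bash
# seconds_until_allowed LAST_OP_EPOCH [NOW_EPOCH]
# Prints how many seconds remain before the 30-minute gap has elapsed (0 if it already has).
seconds_until_allowed() {
  local last_op=$1 now=${2:-$(date +%s)}
  local gap=$((30 * 60))
  local remaining=$((last_op + gap - now))
  if [ "$remaining" -gt 0 ]; then
    echo "$remaining"
  else
    echo 0
  fi
}

# Example: an operation ran 10 minutes (600 s) ago, so 1200 s remain.
#   seconds_until_allowed "$(( $(date +%s) - 600 ))"
```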
### Expected Response Times
| Operation | Expected | Alert Threshold |
| ------------- | -------- | --------------- |
| Auth (cached) | <100ms | >500ms |
| GetDetail | <1s | >3s |
| Plan Change | <2s | >5s |
| Top-up | <2s | >5s |
---
## SFTP (fs.mvno.net)
### Configuration
| Variable | Description |
| ----------------------- | ----------------------- |
| `SFTP_HOST` | SFTP server hostname |
| `SFTP_PORT` | SFTP port (default: 22) |
| `SFTP_USERNAME` | SFTP username |
| `SFTP_PRIVATE_KEY_PATH` | Path to SSH private key |
### Health Check
```bash
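# Test SFTP connectivity (BatchMode fails fast instead of prompting for a password)
sftp -o BatchMode=yes -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" \
  "$SFTP_USERNAME@$SFTP_HOST" << EOF
ls
exit
EOF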
```
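
OpenSSH refuses private keys that group or other users can read, so checking the key file's mode before connecting can rule out one cause of authentication failure. A minimal sketch, assuming GNU `stat` (the `-c '%a'` form; BSD and macOS `stat` use different flags). `key_mode_ok` is a hypothetical helper name, and accepting `400` alongside `600` is an assumption.

```bash
# key_mode_ok KEY_PATH
# Succeeds when the key file's octal mode is 600 (or the read-only variant 400).
key_mode_ok() {
  local key_path=$1 mode
  # GNU stat prints the octal permission bits; fail with status 2 if the file is missing.
  mode=$(stat -c '%a' "$key_path" 2>/dev/null) || return 2
  [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Example:
#   key_mode_ok "$SFTP_PRIVATE_KEY_PATH" || chmod 600 "$SFTP_PRIVATE_KEY_PATH"
```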
### Common Issues
**Connection Refused**
- Verify SFTP server is accessible
- Check firewall rules
- Verify SSH key fingerprint

**Authentication Failure**
- Verify SSH private key is correct
- Check key permissions (should be 600)
- Ensure public key is authorized on SFTP server

**Files Not Found**
- Call and SMS records lag the current date by 2 months, so files for more recent months do not exist yet
- File naming: `PASI_talk-detail-YYYYMM.csv`, `PASI_sms-detail-YYYYMM.csv`
### Data Availability
| Record Type | Availability | File Pattern |
| ------------ | --------------- | ----------------------------- |
| Call Details | 2 months behind | `PASI_talk-detail-YYYYMM.csv` |
| SMS Details | 2 months behind | `PASI_sms-detail-YYYYMM.csv` |
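
To work out which files should exist, the two-month lag can be applied to the current year and month. This is a sketch under one assumption: that "2 months behind" means the newest file is for the calendar month two months before the current one. The helper names `latest_record_month` and `record_files` are hypothetical.

```bash
# latest_record_month YEAR MONTH
# Prints YYYYMM for the month two months before the given one.
latest_record_month() {
  local year=$1 month=$((10#$2))   # 10# forces base 10 so "08" and "09" parse
  # Count months since year 0 so the subtraction rolls over year boundaries.
  local total=$((year * 12 + month - 1 - 2))
  printf '%04d%02d\n' $((total / 12)) $((total % 12 + 1))
}

# record_files YEAR MONTH
# Prints the call and SMS file names expected to be the newest available.
record_files() {
  local ym
  ym=$(latest_record_month "$1" "$2")
  echo "PASI_talk-detail-${ym}.csv"
  echo "PASI_sms-detail-${ym}.csv"
}

# Example for the current date:
#   record_files "$(date +%Y)" "$(date +%m)"
```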
---
## Credential Rotation
### Salesforce JWT Key Rotation
1. Generate a new key pair
2. Upload the new public key (as an X.509 certificate) to the Connected App
3. Point `SF_PRIVATE_KEY_PATH` (or `SF_PRIVATE_KEY_BASE64`, if the key is supplied inline) at the new private key
4. Deploy and verify authentication
5. Remove the old key from the Connected App after verification
### WHMCS API Credentials Rotation
1. Create new API credentials in WHMCS Admin
2. Update `WHMCS_API_IDENTIFIER` and `WHMCS_API_SECRET`
3. Deploy and verify API calls work
4. Disable old API credentials
### Freebit Key Rotation
1. Request new OEM key from Freebit
2. Update `FREEBIT_OEM_KEY`
3. Clear cached auth token: `redis-cli DEL "freebit:auth:token"`
4. Deploy and verify authentication
### SSH Key Rotation (SFTP)
1. Generate new SSH key pair
2. Provide public key to SFTP administrator
3. Wait for key to be authorized
4. Update `SFTP_PRIVATE_KEY_PATH`
5. Test connectivity
6. Request old key removal from SFTP server
---
## Monitoring Recommendations
### Alerting Thresholds
| System | Metric | Warning | Critical |
| ---------- | ------------- | ------- | -------- |
| Salesforce | Response time | >2s | >5s |
| Salesforce | Error rate | >1% | >5% |
| WHMCS | Response time | >2s | >5s |
| WHMCS | Error rate | >1% | >5% |
| Freebit | Response time | >3s | >10s |
| Redis | Response time | >100ms | >500ms |
| PostgreSQL | Response time | >500ms | >2s |
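
The response-time rows of this table translate directly into a lookup. The sketch below is illustrative only: `latency_level` is a hypothetical helper, system names are lowercased by convention here, and a value exactly equal to a threshold is treated as not exceeding it, since the table uses `>`.

```bash
# latency_level SYSTEM MILLISECONDS
# Prints ok, warning, or critical using the response-time thresholds above.
latency_level() {
  local system=$1 ms=$2 warn crit
  case $system in
    salesforce|whmcs) warn=2000 crit=5000 ;;
    freebit)          warn=3000 crit=10000 ;;
    redis)            warn=100  crit=500 ;;
    postgresql)       warn=500  crit=2000 ;;
    *) echo "unknown system: $system" >&2; return 2 ;;
  esac
  if [ "$ms" -gt "$crit" ]; then
    echo critical
  elif [ "$ms" -gt "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}
```

For example, `latency_level redis 150` reports `warning`, while `latency_level freebit 3000` still reports `ok`.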
### Key Metrics to Monitor
- External API response times
- Error rates per integration
- Authentication success/failure rates
- Cache hit rates
- Queue depths (for async operations)
### Health Check Schedule
| System | Check Frequency | Method |
| ---------- | ---------------- | ------------------ |
| Salesforce | Every 5 minutes | Query test |
| WHMCS | Every 5 minutes | GetClients call |
| Freebit | Every 15 minutes | Auth token refresh |
| Redis | Every 1 minute | PING |
| PostgreSQL | Every 1 minute | SELECT 1 |
| SFTP | Every 1 hour | Connection test |
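
The Redis and PostgreSQL rows reduce to running a command and comparing its output, which a small wrapper can do uniformly. A minimal sketch: `run_check` is a hypothetical helper, and the `DATABASE_URL` variable in the usage comment is an assumption that does not appear in this runbook's configuration tables.

```bash
# run_check NAME EXPECTED COMMAND...
# Runs COMMAND and reports ok only if it exits 0 and prints exactly EXPECTED.
run_check() {
  local name=$1 expected=$2 actual
  shift 2
  if actual=$("$@" 2>/dev/null) && [ "$actual" = "$expected" ]; then
    echo "$name: ok"
  else
    echo "$name: FAILED"
    return 1
  fi
}

# Examples (connection settings are assumptions, adjust to the environment):
#   run_check redis PONG redis-cli PING
#   run_check postgres 1 psql "$DATABASE_URL" -tAc 'SELECT 1'
```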
---
## Fallback Behaviors
| System Down | User Impact | Fallback |
| ----------- | ----------------------- | ------------------------------------ |
| Salesforce | No orders, no catalog | Show cached catalog, queue orders |
| WHMCS | No billing, no payments | Show cached invoices, block checkout |
| Freebit | No SIM management | Show cached data, disable actions |
| Redis | Slow performance | Direct API calls (no cache) |
| PostgreSQL | Portal unusable | Display maintenance message |
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md)
- [Freebit SIM Management](../integrations/sim/freebit.md)
---
**Last Updated:** December 2025