Enhance Documentation Structure and Update Operational Runbooks
- Added a new section for operational runbooks in README.md, detailing procedures for incident response, database operations, and queue management.
- Updated the documentation structure in STRUCTURE.md to reflect the new organization of guides and resources.
- Removed the deprecated disabled-modules.md file to streamline documentation.
- Enhanced the _archive/README.md with historical notes on documentation alignment and corrections made in December 2025.
- Updated various references in the documentation to reflect the new paths and services in the integrations directory.
parent 7c929eb4dc
commit 72d0b66be7
@@ -138,13 +138,24 @@ Feature guides explaining how the portal functions:
 
 ## 🛠️ Operations
 
-| Document | Description |
-| ------------------------------------------------------------------ | ----------------------------- |
-| [Logging](./operations/logging.md) | Centralized logging system |
-| [Security Monitoring](./operations/security-monitoring.md) | Security monitoring setup |
-| [Provisioning Runbook](./operations/provisioning-runbook.md) | Provisioning procedures |
-| [Subscription Management](./operations/subscription-management.md) | Service management |
-| [Disabled Modules](./operations/disabled-modules.md) | Temporarily disabled features |
+### Runbooks
+
+| Document | Description |
+| -------------------------------------------------------------- | ----------------------------- |
+| [Incident Response](./operations/incident-response.md) | Emergency procedures |
+| [Provisioning Runbook](./operations/provisioning-runbook.md) | Order fulfillment procedures |
+| [Database Operations](./operations/database-operations.md) | Backup, recovery, maintenance |
+| [External Dependencies](./operations/external-dependencies.md) | Integration health checks |
+| [Queue Management](./operations/queue-management.md) | BullMQ job monitoring |
+| [External Processes](./operations/external-processes.md) | Team handoffs and workflows |
+
+### System Operations
+
+| Document | Description |
+| ------------------------------------------------------------------ | -------------------------- |
+| [Logging](./operations/logging.md) | Centralized logging system |
+| [Security Monitoring](./operations/security-monitoring.md) | Security monitoring setup |
+| [Subscription Management](./operations/subscription-management.md) | Service management |
 
 ---
 
@@ -178,11 +189,13 @@ Historical documents kept for reference:
 2. [Domain Types](./development/domain/types.md)
 3. [Performance](./development/portal/performance.md)
 
-### DevOps
+### DevOps / Operations
 
 1. [Deployment](./getting-started/deployment.md)
-2. [Logging](./operations/logging.md)
+2. [Incident Response](./operations/incident-response.md)
 3. [Provisioning Runbook](./operations/provisioning-runbook.md)
+4. [Database Operations](./operations/database-operations.md)
+5. [External Dependencies](./operations/external-dependencies.md)
 
 ---
 

@@ -114,13 +114,18 @@ Coding standards
 │ └── plesk-deploy.sh # ✅ Plesk deployment script
 │
 ├── 📚 docs/ # Documentation
-│ ├── README.md # ✅ Comprehensive guide
-│ ├── GETTING_STARTED.md # ✅ Quick start guide
-│ ├── RUN.md # ✅ Development workflow
-│ ├── DEPLOY.md # ✅ Production deployment
-│ ├── LOGGING.md # ✅ Logging configuration
-│ ├── SECURITY.md # ✅ Security features and best practices
-│ └── STRUCTURE.md # ✅ This file
+│ ├── README.md # ✅ Documentation index
+│ ├── STRUCTURE.md # ✅ This file
+│ ├── getting-started/ # Setup and running guides
+│ │ ├── setup.md # Initial project setup
+│ │ ├── running.md # Local development
+│ │ └── deployment.md # Production deployment
+│ ├── architecture/ # System design documents
+│ ├── how-it-works/ # Feature guides
+│ ├── integrations/ # External system integration
+│ ├── development/ # Development guides
+│ ├── operations/ # Operational runbooks
+│ └── _archive/ # Historical documents
 │
 ├── 📦 packages/ # Shared packages
 │ └── domain/ # Domain TypeScript utilities
@@ -135,11 +140,12 @@ Coding standards
 
 ### **Environment Template Approach**
 
-- **`.env.dev.example`** - Development-optimized template
-- **`.env.production.example`** - Production-optimized template
-- **`.env.example`** - Basic template for custom setups
-- **`.env`** - Your actual configuration (gitignored)
-- **Environment-specific defaults** - Appropriate values per environment
+Environment templates are located in the `env/` folder:
+
+- **`env/dev.env.sample`** - Development environment template
+- **`env/portal-backend.env.sample`** - Backend-specific variables
+- **`env/portal-frontend.env.sample`** - Frontend-specific variables
+- **`.env`** - Your actual configuration (gitignored, at project root)
 
 ### **Environment Variables**
 
@@ -206,13 +212,16 @@ pnpm prod:backup # Database backup
 
 ### **Essential Guides**
 
-- **`README.md`** - Project overview and architecture
-- **`GETTING_STARTED.md`** - Quick setup guide
-- **`RUN.md`** - Development workflow
-- **`DEPLOY.md`** - Production deployment
-- **`LOGGING.md`** - Logging configuration
-- **`SECURITY.md`** - Security features and best practices
-- **`STRUCTURE.md`** - This file
+Documentation is organized in subdirectories:
+
+- **`docs/README.md`** - Documentation index and navigation
+- **`docs/STRUCTURE.md`** - This file (project structure)
+- **`docs/getting-started/`** - Setup, running, and deployment guides
+- **`docs/architecture/`** - System design and architecture
+- **`docs/how-it-works/`** - Feature guides and workflows
+- **`docs/integrations/`** - Salesforce, WHMCS, SIM integration
+- **`docs/development/`** - BFF, Portal, Auth development guides
+- **`docs/operations/`** - Runbooks and operational procedures
 
 ### **No Redundancy**
 

@@ -27,4 +27,20 @@ Point-in-time code reviews and analysis documents:
 
 ---
 
+## Historical Notes
+
+### December 2025 Documentation Alignment
+
+A comprehensive documentation review was performed in December 2025 to align documentation with the actual codebase. The following corrections were made:
+
+1. **Removed fictional package descriptions** from `system-overview.md` that referenced non-existent `packages/contracts`, `packages/schemas`, and `packages/integrations` packages
+2. **Deleted `disabled-modules.md`** which referenced non-existent "Cases" and "Jobs" modules
+3. **Fixed path references** from `vendors/whmcs` to `integrations/whmcs` throughout documentation
+4. **Updated module lists** to reflect actual BFF modules
+5. **Created new operational runbooks**: incident-response, database-operations, external-dependencies, queue-management, external-processes
+
+Documents in this archive folder predate these corrections and may contain outdated references.
+
+---
+
 **Note:** These documents may contain outdated information. For current system behavior, refer to the active documentation in the parent `docs/` directory.

@@ -8,7 +8,7 @@ I've completely restructured the Salesforce-to-Portal order provisioning system
 
 ### **1. Dedicated WHMCS Order Service**
 
-**File**: `/apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts`
+**File**: `/apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts`
 
 - **Purpose**: Handles all WHMCS order operations (AddOrder, AcceptOrder)
 - **Features**:

@@ -8,40 +8,42 @@ I've restructured the provisioning system to **match the exact same clean modula
 
 ### **Order Creation (Existing) ↔ Order Provisioning (New)**
 
-| **Order Creation** | **Order Provisioning** | **Purpose** |
-| ------------------- | -------------------------- | ----------------------------------- |
-| `OrderValidator` | `ProvisioningValidator` | Validates requests & business rules |
-| `OrderBuilder` | `WhmcsOrderMapper` | Transforms/maps data structures |
-| `OrderItemBuilder` | _(integrated in mapper)_ | Handles item-level processing |
-| `OrderOrchestrator` | `ProvisioningOrchestrator` | Coordinates the complete workflow |
+| **Order Creation** | **Order Fulfillment** | **Purpose** |
+| ------------------- | ------------------------------ | ----------------------------------- |
+| `OrderValidator` | `OrderFulfillmentValidator` | Validates requests & business rules |
+| `OrderBuilder` | `OrderBuilder` | Transforms/maps data structures |
+| `OrderItemBuilder` | `OrderItemBuilder` | Handles item-level processing |
+| `OrderOrchestrator` | `OrderFulfillmentOrchestrator` | Coordinates the complete workflow |
 | `OrdersController` | `PlatformEventsSubscriber` | Event handling (no inbound HTTP) |
 
 ## 📁 **Clean File Structure**
 
 ```
-apps/bff/src/orders/
+apps/bff/src/modules/orders/
 ├── controllers/
 │ └── orders.controller.ts # Customer-facing operations
 ├── queue/
 │ ├── provisioning.queue.ts # Enqueue provisioning jobs
 │ └── provisioning.processor.ts # Worker processes jobs
 ├── services/
-│ # Order Creation (existing)
+│ # Order Creation
 │ ├── order-validator.service.ts # Request & business validation
 │ ├── order-builder.service.ts # Order header construction
 │ ├── order-item-builder.service.ts # Order items construction
 │ ├── order-orchestrator.service.ts # Creation workflow coordination
 │ │
-│ # Order Provisioning (new - matching structure)
-│ ├── provisioning-validator.service.ts # Provisioning validation
-│ ├── whmcs-order-mapper.service.ts # SF → WHMCS mapping
-│ ├── provisioning-orchestrator.service.ts # Provisioning workflow coordination
-│ └── order-provisioning.service.ts # Main provisioning interface
+│ # Order Fulfillment/Provisioning
+│ ├── order-fulfillment-validator.service.ts # Provisioning validation
+│ ├── order-fulfillment-orchestrator.service.ts # Provisioning workflow coordination
+│ ├── order-fulfillment-error.service.ts # Error handling
+│ ├── sim-fulfillment.service.ts # SIM-specific fulfillment
+│ ├── payment-validator.service.ts # Payment method validation
+│ └── checkout.service.ts # Checkout flow coordination
 ```
 
 ## 🎯 **Modular Provisioning Services**
 
-### **1. ProvisioningValidator**
+### **1. OrderFulfillmentValidator**
 
 **Purpose**: Validates all provisioning prerequisites
 
@@ -51,7 +53,7 @@ apps/bff/src/orders/
 - ✅ Idempotency checking
 - ✅ Request payload validation
 
-### **2. WhmcsOrderMapper**
+### **2. OrderBuilder / OrderItemBuilder**
 
 **Purpose**: Maps Salesforce OrderItems → WHMCS format
 
@@ -61,7 +63,7 @@ apps/bff/src/orders/
 - ✅ Custom fields mapping
 - ✅ Order notes generation with SF tracking
 
-### **3. ProvisioningOrchestrator**
+### **3. OrderFulfillmentOrchestrator**
 
 **Purpose**: Coordinates complete provisioning workflow
 

@@ -11,9 +11,7 @@ apps/
 portal/ # Next.js frontend
 bff/ # NestJS Backend-for-Frontend
 packages/
-domain/ # Pure domain/types/utils (isomorphic)
-logging/ # Centralized logging utilities
-validation/ # Shared validation schemas
+domain/ # Pure domain types, validation schemas, and utilities (isomorphic)
 ```
 
 ## 🎯 **Architecture Principles**
@@ -67,16 +65,26 @@ src/
 ```
 src/
 modules/ # Feature-aligned modules
-auth/ # Authentication
-billing/ # Invoice and payment management
+auth/ # Authentication and authorization
+users/ # User management
+id-mappings/ # Portal-WHMCS-Salesforce ID mappings
 catalog/ # Product catalog
-orders/ # Order processing
-subscriptions/ # Service management
+orders/ # Order creation and fulfillment
+invoices/ # Invoice management
+subscriptions/ # Service and subscription management
+currency/ # Currency handling
+support/ # Support case management
+realtime/ # Server-Sent Events API
+verification/ # ID verification
+notifications/ # User notifications
+health/ # Health check endpoints
 core/ # Core services and utilities
+infra/ # Infrastructure (database, cache, queue, email)
 integrations/ # External service integrations
 salesforce/ # Salesforce CRM integration
 whmcs/ # WHMCS billing integration
-common/ # Nest providers/interceptors/guards
+freebit/ # Freebit SIM provider integration
+sftp/ # SFTP file transfer
 main.ts # Application entry point
 ```
 
@@ -89,60 +97,67 @@ src/
 
 ## 📦 **Shared Packages**
 
-### **Layered Type System Architecture**
+### **Domain Package (`packages/domain/`)**
 
-The codebase follows a strict layering pattern to ensure single source of truth for all types and prevent drift:
+The domain package is the single source of truth for shared types, validation schemas, and utilities across both the BFF and Portal applications.
 
 ```
-@customer-portal/contracts (Pure TypeScript types)
-↓
-@customer-portal/schemas (Runtime validation with Zod)
-↓
-@customer-portal/integrations (Mappers for external APIs)
-↓
-Applications (BFF, Portal)
+packages/domain/
+├── auth/ # Authentication types and validation
+├── billing/ # Invoice and payment types
+├── catalog/ # Product catalog types
+├── checkout/ # Checkout flow types
+├── common/ # Shared utilities and base types
+├── customer/ # Customer profile types
+├── dashboard/ # Dashboard data types
+├── mappings/ # ID mapping types (Portal-WHMCS-SF)
+├── notifications/ # Notification types
+├── opportunity/ # Salesforce opportunity types
+├── orders/ # Order types and Salesforce mappings
+├── payments/ # Payment method types
+├── providers/ # Provider-specific type definitions
+├── realtime/ # SSE event types
+├── salesforce/ # Salesforce API types
+├── sim/ # SIM lifecycle and Freebit types
+├── subscriptions/ # Subscription types
+├── support/ # Support case types
+├── toolkit/ # Utility functions
+└── index.ts # Public exports
 ```
 
-#### **1. Contracts Package (`packages/contracts/`)**
+#### **Key Principles**
 
-- **Purpose**: Pure TypeScript interface definitions - single source of truth
-- **Contents**: Cross-layer contracts for billing, subscriptions, payments, SIM, orders
-- **Exports**: Organized by domain (e.g., `@customer-portal/contracts/billing`)
-- **Rule**: ZERO runtime dependencies, only pure types
+- **Framework-agnostic**: No NestJS or React dependencies
+- **Isomorphic**: Works in both Node.js and browser environments
+- **Zod-first validation**: Schemas defined with Zod for runtime validation
+- **Provider mappers**: Transform external API responses to domain types
 
-#### **2. Schemas Package (`packages/schemas/`)**
+#### **Usage**
 
-- **Purpose**: Runtime validation schemas using Zod
-- **Contents**: Matching Zod validators for each contract + integration-specific payload schemas
-- **Exports**: Organized by domain and integration provider
-- **Usage**: Validate external API responses, request payloads, and user input
+Import via `@customer-portal/domain`:
 
-#### **3. Integration Packages (`packages/integrations/`)**
+```typescript
+import { Invoice, SIM_LIFECYCLE_STAGE, OrderStatus } from "@customer-portal/domain";
+import { invoiceSchema, orderSchema } from "@customer-portal/domain/validation";
+```
 
-- **Purpose**: Transform raw provider data into shared contracts
-- **Structure**:
-- `packages/integrations/whmcs/` - WHMCS billing integration
-- `packages/integrations/freebit/` - Freebit SIM provider integration
-- **Contents**: Mappers, utilities, and helper functions
-- **Rule**: Must use `@customer-portal/schemas` for validation at boundaries
+#### **Integration with BFF**
 
-#### **4. Application Layers**
+The BFF integration layer (`apps/bff/src/integrations/`) uses domain mappers to transform raw provider data:
 
-- **BFF** (`apps/bff/`): Import from contracts/schemas, never define duplicate interfaces
-- **Portal** (`apps/portal/`): Import from contracts/schemas, use shared types everywhere
-- **Rule**: Applications only consume, never define domain types
+```
+External API → Raw Response → Domain Mapper → Domain Type → Use Everywhere
+```
 
-### **Legacy: Domain Package (Deprecated)**
+This ensures a single transformation point and consistent types across the application.
 
-- **Status**: Being phased out in favor of contracts + schemas
-- **Migration**: Re-exports now point to contracts package for backward compatibility
-- **Rule**: New code should import from `@customer-portal/contracts` or `@customer-portal/schemas`
+### **Logging**
 
-### **Logging Package**
+Centralized logging is implemented in the BFF using `nestjs-pino`:
 
-- **Purpose**: Centralized structured logging
-- **Features**: Pino-based logging with correlation IDs
-- **Security**: Automatic PII redaction [[memory:6689308]]
+- **Structured JSON logging** for production
+- **Correlation IDs** for request tracing
+- **Automatic PII redaction** for security
 
 ## 🔗 **Integration Architecture**
 

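Reviewer note: the `External API → Raw Response → Domain Mapper → Domain Type` flow added in the hunk above can be illustrated with a single mapper function. This is a minimal sketch under stated assumptions, not the actual `@customer-portal/domain` code: `RawWhmcsInvoice`, the `Invoice` fields, and `mapWhmcsInvoice` are hypothetical names, and hand-written checks stand in for the Zod schemas the documentation mentions.

```typescript
// Hypothetical raw payload shape; a real WHMCS response carries more fields.
interface RawWhmcsInvoice {
  invoiceid: string | number;
  total: string;
  status: string;
}

// Hypothetical domain type; the real one would live in packages/domain/billing.
interface Invoice {
  id: number;
  total: number;
  status: "paid" | "unpaid" | "cancelled";
}

const KNOWN_STATUSES: ReadonlyArray<Invoice["status"]> = ["paid", "unpaid", "cancelled"];

// Single transformation point: validate the raw response, then return the domain type.
function mapWhmcsInvoice(raw: RawWhmcsInvoice): Invoice {
  const id = Number(raw.invoiceid);
  const total = Number(raw.total);
  const status = raw.status.toLowerCase() as Invoice["status"];

  if (!isFinite(id) || Math.floor(id) !== id || id <= 0) {
    throw new Error("Invalid invoice id: " + String(raw.invoiceid));
  }
  if (!isFinite(total)) {
    throw new Error("Invalid invoice total: " + raw.total);
  }
  if (KNOWN_STATUSES.indexOf(status) === -1) {
    throw new Error("Unknown invoice status: " + raw.status);
  }
  return { id, total, status };
}

// Callers only ever see the validated domain type.
const invoice = mapWhmcsInvoice({ invoiceid: "42", total: "19.99", status: "Paid" });
console.log(invoice.id, invoice.total, invoice.status);
```

Keeping validation inside the mapper means a malformed provider response fails at the boundary rather than deep inside application code, which is the point of having one transformation step.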
@@ -41,10 +41,10 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
 
 - Endpoints: `GET /invoices`, `GET /invoices/:id`, `GET /invoices/:id/subscriptions`, `POST /invoices/:id/sso-link`, `POST /invoices/:id/payment-link` (apps/bff/src/invoices/invoices.controller.ts:1).
 - Service flow: resolve mapping → fetch from WHMCS via `WhmcsService` → transform/cache → return (apps/bff/src/invoices/invoices.service.ts:24).
-- List/paginate via WHMCS GetInvoices; details enriched with line items and `serviceId` links (apps/bff/src/vendors/whmcs/services/whmcs-invoice.service.ts:1).
-- Subscriptions listed via WHMCS GetClientsProducts; transformed and cached (apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1).
-- Payment methods/gateways via WHMCS; cached in Redis; also used for gating order creation/provisioning (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:1).
-- SSO links: invoice view/download/pay and payment-page with preselected method/gateway (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168).
+- List/paginate via WHMCS GetInvoices; details enriched with line items and `serviceId` links (apps/bff/src/integrations/whmcs/services/whmcs-invoice.service.ts:1).
+- Subscriptions listed via WHMCS GetClientsProducts; transformed and cached (apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1).
+- Payment methods/gateways via WHMCS; cached in Redis; also used for gating order creation/provisioning (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:1).
+- SSO links: invoice view/download/pay and payment-page with preselected method/gateway (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168).
 
 ## Orders — Creation (Portal ➝ Salesforce)
 
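Reviewer note: the invoice service flow listed in the hunk above (resolve mapping → fetch from WHMCS → transform/cache → return) is a cache-aside pattern. The sketch below illustrates it under loud assumptions: every name (`userToWhmcsClient`, `fetchWhmcsInvoices`, `getInvoices`) is hypothetical, a plain object stands in for Redis, the WHMCS call is a stub, and the code is synchronous for brevity where the real service would presumably be async.

```typescript
// Hypothetical domain shape returned to the portal.
interface InvoiceSummary {
  id: number;
  total: number;
}

// Stub ID mapping: portal user ID -> WHMCS client ID.
const userToWhmcsClient: Record<string, number> = { "user-1": 501 };

// Plain object standing in for the Redis cache.
const invoiceCache: Record<string, InvoiceSummary[]> = {};

// Counts stubbed WHMCS calls so cache hits are observable.
let whmcsCalls = 0;

// Stub for the WHMCS GetInvoices request; returns raw string fields.
function fetchWhmcsInvoices(clientId: number): { invoiceid: string; total: string }[] {
  whmcsCalls += 1;
  return [{ invoiceid: String(clientId * 10), total: "25.00" }];
}

function getInvoices(userId: string): InvoiceSummary[] {
  // 1. Resolve the portal user to a WHMCS client ID.
  const clientId = userToWhmcsClient[userId];
  if (clientId === undefined) {
    throw new Error("No WHMCS mapping for user " + userId);
  }

  // 2. Serve from cache when an entry exists.
  const key = "invoices:" + clientId;
  const cached = invoiceCache[key];
  if (cached) {
    return cached;
  }

  // 3. Fetch raw data, transform to the domain shape, then cache it.
  const invoices = fetchWhmcsInvoices(clientId).map((raw) => ({
    id: Number(raw.invoiceid),
    total: Number(raw.total),
  }));
  invoiceCache[key] = invoices;
  return invoices;
}

getInvoices("user-1"); // first call reaches the WHMCS stub
getInvoices("user-1"); // second call is served from the cache
console.log("WHMCS calls:", whmcsCalls);
```

A real implementation would also need cache expiry and invalidation, which this sketch leaves out.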
@@ -74,23 +74,23 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
 - Validate request: not already provisioned (checks `WHMCS_Order_ID__c`), ensure client has payment method; resolve mapping (apps/bff/src/orders/services/order-fulfillment-validator.service.ts:23)
 - Set SF activation status to `Activating` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:98)
 - Load SF Order details + OrderItems, map each to WHMCS items using the Product2 mapping (`WH_Product_ID__c`) and billing cycle (apps/bff/src/orders/services/order-whmcs-mapper.service.ts:1)
-- Create WHMCS order (AddOrder) with Stripe as payment method; optional promo code and tracking notes (apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts:20)
-- Accept/provision order (AcceptOrder), capture service IDs and invoice ID returned (apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts:60)
+- Create WHMCS order (AddOrder) with Stripe as payment method; optional promo code and tracking notes (apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts:20)
+- Accept/provision order (AcceptOrder), capture service IDs and invoice ID returned (apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts:60)
 - Update SF: `Status=Completed`, `Activation_Status__c=Activated`, and write back `WHMCS_Order_ID__c` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:117)
 - Error handling: On failure, set `Status=Pending Review`, `Activation_Status__c=Failed`, and write concise error code/message for operator triage (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:146).
 
 ## Subscriptions (Shown in Portal)
 
-- Data comes from WHMCS products/services via `GetClientsProducts` and is transformed into a standard Subscription list (apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1).
-- Cached per user; supports status filtering; invoice items link to `serviceId` to show related subscriptions (apps/bff/src/vendors/whmcs/transformers/whmcs-data.transformer.ts:35).
+- Data comes from WHMCS products/services via `GetClientsProducts` and is transformed into a standard Subscription list (apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1).
+- Cached per user; supports status filtering; invoice items link to `serviceId` to show related subscriptions (apps/bff/src/integrations/whmcs/transformers/whmcs-data.transformer.ts:35).
 
 ## Payments & SSO
 
-- Payment methods summary drives UI gating and provisioning validation (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:44).
+- Payment methods summary drives UI gating and provisioning validation (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:44).
 - SSO flows
 - General WHMCS SSO (dashboard/settings) via `CreateSsoToken`
-- Invoice view/download/pay SSO (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168)
-- Payment link with pre‑selected saved method or gateway (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168)
+- Invoice view/download/pay SSO (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168)
+- Payment link with pre‑selected saved method or gateway (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168)
 
 ## Caching & Performance
 
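Reviewer note: the fulfillment steps listed at the top of the hunk above (idempotency check, `Activating`, item mapping, AddOrder, AcceptOrder, Salesforce write-back, failure handling) can be sketched as one function. This is an illustration only, not the `OrderFulfillmentOrchestrator` implementation: `SfOrder`, `WhmcsClient`, and `fulfillOrder` are hypothetical, the code is synchronous for brevity, and routing a missing payment method through the failure path is an assumption the documentation does not confirm.

```typescript
// Hypothetical, simplified view of a Salesforce order.
interface SfOrder {
  id: string;
  status: string;
  activationStatus: string; // mirrors Activation_Status__c
  whmcsOrderId?: number; // mirrors WHMCS_Order_ID__c
  errorMessage?: string;
  items: { whmcsProductId: number; billingCycle: string }[];
}

// Hypothetical client wrapping the WHMCS calls named in the docs.
interface WhmcsClient {
  hasPaymentMethod(clientId: number): boolean;
  addOrder(clientId: number, items: { pid: number; billingcycle: string }[]): number;
  acceptOrder(orderId: number): void;
}

type FulfillmentResult = "fulfilled" | "already-provisioned" | "failed";

function fulfillOrder(order: SfOrder, clientId: number, whmcs: WhmcsClient): FulfillmentResult {
  // Idempotency: an order that already carries a WHMCS order ID is not provisioned twice.
  if (order.whmcsOrderId !== undefined) {
    return "already-provisioned";
  }
  try {
    if (!whmcs.hasPaymentMethod(clientId)) {
      throw new Error("PAYMENT_METHOD_MISSING");
    }
    order.activationStatus = "Activating";

    // Map each Salesforce order item to the WHMCS item shape.
    const items = order.items.map((item) => ({
      pid: item.whmcsProductId,
      billingcycle: item.billingCycle,
    }));

    const whmcsOrderId = whmcs.addOrder(clientId, items);
    whmcs.acceptOrder(whmcsOrderId);

    // Success: write the result back to the order.
    order.status = "Completed";
    order.activationStatus = "Activated";
    order.whmcsOrderId = whmcsOrderId;
    return "fulfilled";
  } catch (err) {
    // Failure: flag the order for operator triage with a concise message.
    order.status = "Pending Review";
    order.activationStatus = "Failed";
    order.errorMessage = err instanceof Error ? err.message : String(err);
    return "failed";
  }
}

// In-memory fake so the sketch runs without a WHMCS instance.
const fakeWhmcs: WhmcsClient = {
  hasPaymentMethod: () => true,
  addOrder: () => 1001,
  acceptOrder: () => undefined,
};

// Placeholder ID and initial statuses, not real Salesforce values.
const demoOrder: SfOrder = {
  id: "demo-order",
  status: "Approved",
  activationStatus: "Not Started",
  items: [{ whmcsProductId: 5, billingCycle: "monthly" }],
};
console.log(fulfillOrder(demoOrder, 77, fakeWhmcs), demoOrder.status);
```

Checking `whmcsOrderId` before any side effect is what makes a retried queue job safe to run again.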
@@ -139,8 +139,8 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
 - Salesforce events subscriber: apps/bff/src/vendors/salesforce/events/pubsub.subscriber.ts:58
 - Provisioning queue processor: apps/bff/src/orders/queue/provisioning.processor.ts:26
 - Invoices service: apps/bff/src/invoices/invoices.service.ts:24
-- Subscriptions service: apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1
-- Payment/SSO service: apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:1
+- Subscriptions service: apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1
+- Payment/SSO service: apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:1
 
 ---
 

@@ -6,21 +6,27 @@ We provide **environment-specific templates** for easy setup:
 
 ### 📁 **Available Templates:**
 
-- 🔸 **`.env.example`** - Standard environment template for all environments
-- 🔸 **Environment-specific values** - Adjust settings based on development vs production needs
+Located in the `env/` folder:
+
+- 🔸 **`env/dev.env.sample`** - Development environment template
+- 🔸 **`env/portal-backend.env.sample`** - Backend-specific variables reference
+- 🔸 **`env/portal-frontend.env.sample`** - Frontend-specific variables reference
 
 ### 🎯 **Benefits:**
 
 - ✅ **Environment-specific**: Clear separation of dev vs prod
 - ✅ **Secure defaults**: Production uses strong security settings
-- ✅ **Easy setup**: Copy the right template for your needs
+- ✅ **Easy setup**: Copy the template for your needs
 - ✅ **No confusion**: Clear instructions for each environment
 
 ## 🔧 **Environment File Structure**
 
 ```
 📦 Customer Portal
-├── .env.example # 🔸 Environment template
+├── env/
+│ ├── dev.env.sample # 🔸 Development template
+│ ├── portal-backend.env.sample # Backend variables
+│ └── portal-frontend.env.sample # Frontend variables
 ├── .env # ✅ Your actual config (gitignored)
 ├── apps/
 │ ├── bff/ # 🚀 Backend reads from root .env
@@ -42,7 +48,7 @@ We provide **environment-specific templates** for easy setup:
 
 ```bash
 # Copy development environment template
-cp .env.dev.example .env
+cp env/dev.env.sample .env
 
 # Edit with your dev values (most defaults work!)
 nano .env # Configure for local development
@@ -51,8 +57,8 @@ nano .env # Configure for local development
 **🔸 For Production:**
 
 ```bash
-# Copy production environment template
-cp .env.production.example .env
+# Start from the development template and adjust for production
+cp env/dev.env.sample .env
 
 # Edit with your production values (REQUIRED!)
 nano .env # Replace with secure production values

@@ -888,10 +888,10 @@ User Action → Cost Calculation → Invoice Creation → Payment Capture → Da
|
|||||||
|
|
||||||
### 📝 **Implementation Files Modified**:
|
### 📝 **Implementation Files Modified**:
|
||||||
|
|
||||||
1. `apps/bff/src/vendors/whmcs/types/whmcs-api.types.ts` - Added WHMCS API types
|
1. `apps/bff/src/integrations/whmcs/types/whmcs-api.types.ts` - Added WHMCS API types
|
||||||
2. `apps/bff/src/vendors/whmcs/services/whmcs-connection.service.ts` - Added API methods
|
2. `apps/bff/src/integrations/whmcs/connection/whmcs-connection.service.ts` - Added API methods
|
||||||
3. `apps/bff/src/vendors/whmcs/services/whmcs-invoice.service.ts` - Added invoice creation
|
3. `apps/bff/src/integrations/whmcs/services/whmcs-invoice.service.ts` - Added invoice creation
|
||||||
4. `apps/bff/src/vendors/whmcs/whmcs.service.ts` - Exposed new methods
|
4. `apps/bff/src/integrations/whmcs/whmcs.service.ts` - Exposed new methods
|
||||||
5. `apps/bff/src/subscriptions/sim-management.service.ts` - Complete payment flow
|
5. `apps/bff/src/subscriptions/sim-management.service.ts` - Complete payment flow
|
||||||
|
|
||||||
## 🎯 **Latest Update: Simplified Top-Up Interface (January 2025)**
|
## 🎯 **Latest Update: Simplified Top-Up Interface (January 2025)**
|
||||||
|
|||||||
@@ -61,7 +61,7 @@ The WHMCS `GetPayMethods` API returns payment method data with different field n
|
|||||||
|
|
||||||
### 1. Payment Method Transformer
|
### 1. Payment Method Transformer
|
||||||
|
|
||||||
**File**: `apps/bff/src/vendors/whmcs/transformers/whmcs-data.transformer.ts`
|
**File**: `apps/bff/src/integrations/whmcs/transformers/whmcs-data.transformer.ts`
|
||||||
|
|
||||||
**Changes Made:**
|
**Changes Made:**
|
||||||
|
|
||||||
@@ -81,7 +81,7 @@ ccType: whmcsPayMethod.cc_type || whmcsPayMethod.card_type,
|
|||||||
|
|
||||||
### 2. Payment Service Enhancement
|
### 2. Payment Service Enhancement
|
||||||
|
|
||||||
**File**: `apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts`
|
**File**: `apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts`
|
||||||
|
|
||||||
**Changes Made:**
|
**Changes Made:**
|
||||||
|
|
||||||
|
|||||||
407
docs/operations/database-operations.md
Normal file
@@ -0,0 +1,407 @@
|
|||||||
|
# Database Operations Runbook
|
||||||
|
|
||||||
|
This document covers operational procedures for the PostgreSQL database used by the Customer Portal BFF.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
| Component | Technology | Location |
|
||||||
|
| --------------- | ------------------------- | ----------------------------- |
|
||||||
|
| Database | PostgreSQL 17 | Configured via `DATABASE_URL` |
|
||||||
|
| ORM | Prisma 6 | `apps/bff/prisma/` |
|
||||||
|
| Connection Pool | Prisma connection pooling | Default: `num_physical_cpus * 2 + 1` connections unless `connection_limit` is set |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Backup Procedures
|
||||||
|
|
||||||
|
### Automated Backups
|
||||||
|
|
||||||
|
> **Note**: Configure automated backups based on your hosting environment.
|
||||||
|
|
||||||
|
**Recommended Schedule:**
|
||||||
|
|
||||||
|
- Full backup: Daily at 02:00 UTC
|
||||||
|
- WAL (transaction log) archiving: At least every 15 minutes
|
||||||
|
- Retention: 30 days
|
||||||
|
|
||||||
|
### Manual Backup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create a full database backup
|
||||||
|
pg_dump $DATABASE_URL > backup_$(date +%Y%m%d_%H%M%S).sql
|
||||||
|
|
||||||
|
# Create a compressed backup
|
||||||
|
pg_dump $DATABASE_URL | gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
|
||||||
|
|
||||||
|
# Backup specific tables
|
||||||
|
pg_dump $DATABASE_URL -t users -t id_mappings > user_data_backup.sql
|
||||||
|
```
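
One practical caveat, sketched under assumptions: if `DATABASE_URL` carries Prisma-only query parameters such as `connection_limit` (see the pool settings later in this runbook), libpq-based tools like `pg_dump` and `psql` may reject the URL. The helpers below strip the query string and build a timestamped file name. `libpq_url` and `backup_filename` are hypothetical names, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the repository.

# Print the connection URL without its query string, so Prisma-only
# parameters (e.g. connection_limit) are not passed to libpq tools.
libpq_url() {
  printf '%s\n' "${1%%\?*}"
}

# Print a UTC-timestamped name for a compressed backup.
backup_filename() {
  printf 'backup_%s.sql.gz\n' "$(date -u +%Y%m%d_%H%M%S)"
}

# Usage (requires a reachable database):
#   pg_dump "$(libpq_url "$DATABASE_URL")" | gzip > "$(backup_filename)"
```

Stripping the whole query string also drops parameters that libpq does understand, such as `sslmode`; re-add those explicitly if the server requires them.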
|
||||||
|
|
||||||
|
### Backup Verification
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Verify backup integrity (restore to temp database)
|
||||||
|
createdb portal_backup_test
|
||||||
|
psql portal_backup_test < backup_YYYYMMDD.sql
|
||||||
|
|
||||||
|
# Run basic integrity checks
|
||||||
|
psql portal_backup_test -c "SELECT COUNT(*) FROM users"
|
||||||
|
psql portal_backup_test -c "SELECT COUNT(*) FROM id_mappings"
|
||||||
|
|
||||||
|
# Clean up
|
||||||
|
dropdb portal_backup_test
|
||||||
|
```
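
Before restoring into the temporary database, a cheap pre-check can catch empty or truncated files. This is a minimal sketch, assuming backups are plain `.sql` or gzip-compressed `.sql.gz` files like those produced by the manual backup commands; `verify_backup_file` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.
# Fails if the file is missing or empty, or if a .gz file does not pass
# gzip's built-in integrity test. Prints "ok: <file>" on success.
verify_backup_file() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  case "$file" in
    *.gz)
      gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
      ;;
  esac
  echo "ok: $file"
}

# Usage:
#   verify_backup_file backup_YYYYMMDD.sql.gz && echo "safe to test-restore"
```

This only checks the file container. It does not replace the test restore and row-count checks above.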
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recovery Procedures
|
||||||
|
|
||||||
|
### Point-in-Time Recovery
|
||||||
|
|
||||||
|
**Prerequisites:**
|
||||||
|
|
||||||
|
- WAL archiving enabled
|
||||||
|
- Continuous backup configured
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Stop the application
|
||||||
|
pnpm prod:stop
|
||||||
|
|
||||||
|
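# Restore from a custom-format dump (created with pg_dump -Fc).
# Note: this restores the data as of backup time; true point-in-time recovery replays archived WAL on top of a base backup.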
|
||||||
|
pg_restore -d $DATABASE_URL backup_YYYYMMDD.dump
|
||||||
|
|
||||||
|
# Run Prisma migrations to ensure schema is current
|
||||||
|
pnpm db:migrate
|
||||||
|
|
||||||
|
# Restart the application
|
||||||
|
pnpm prod:start
|
||||||
|
```
|
||||||
|
|
||||||
|
### Restore from SQL Backup
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Stop the application to prevent writes
|
||||||
|
pnpm prod:stop
|
||||||
|
|
||||||
|
# Drop and recreate database (DESTRUCTIVE)
|
||||||
|
dropdb portal_production
|
||||||
|
createdb portal_production
|
||||||
|
|
||||||
|
# Restore from backup
|
||||||
|
psql $DATABASE_URL < backup_YYYYMMDD.sql
|
||||||
|
|
||||||
|
# Verify restoration
|
||||||
|
psql $DATABASE_URL -c "SELECT COUNT(*) FROM users"
|
||||||
|
|
||||||
|
# Restart application
|
||||||
|
pnpm prod:start
|
||||||
|
```
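
Because the drop-and-recreate step is destructive, it may be worth gating it behind an explicit confirmation. The sketch below is one possible guard: it succeeds only when the operator types the database name. `confirm_destructive` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical guard; not part of the repository.
# Reads one line from stdin and succeeds only if it equals the
# database name passed as the first argument.
confirm_destructive() {
  local expected="$1" answer
  printf 'Type "%s" to drop and recreate it: ' "$expected" >&2
  IFS= read -r answer || return 1
  [ "$answer" = "$expected" ]
}

# Usage:
#   confirm_destructive portal_production && dropdb portal_production
```

The prompt goes to stderr, so the guard also works when stdout is redirected to a log file.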
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration Management
|
||||||
|
|
||||||
|
### Running Migrations
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Development: Apply pending migrations
|
||||||
|
pnpm db:migrate
|
||||||
|
|
||||||
|
# Production: Deploy migrations
|
||||||
|
pnpm db:migrate --skip-generate
|
||||||
|
|
||||||
|
# View migration status
|
||||||
|
npx prisma migrate status
|
||||||
|
```
|
||||||
|
|
||||||
|
### Migration Checklist
|
||||||
|
|
||||||
|
Before deploying migrations to production:
|
||||||
|
|
||||||
|
1. [ ] Test migration on staging environment
|
||||||
|
2. [ ] Verify rollback procedure exists
|
||||||
|
3. [ ] Estimate migration duration
|
||||||
|
4. [ ] Schedule maintenance window if needed
|
||||||
|
5. [ ] Create backup before migration
|
||||||
|
6. [ ] Notify team of deployment
|
||||||
|
|
||||||
|
### Rollback Procedure
|
||||||
|
|
||||||
|
Prisma does not have built-in rollback. Use these approaches:
|
||||||
|
|
||||||
|
**Option 1: Restore from Backup**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Restore database to pre-migration state
|
||||||
|
psql $DATABASE_URL < pre_migration_backup.sql
|
||||||
|
|
||||||
|
# Revert migration files in codebase
|
||||||
|
git revert <migration-commit>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option 2: Manual Rollback SQL**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Create rollback SQL for each migration
|
||||||
|
# Store in: apps/bff/prisma/rollbacks/
|
||||||
|
|
||||||
|
# Example rollback
|
||||||
|
psql $DATABASE_URL < rollbacks/20240115_rollback.sql
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option 3: Reset and Reseed (Development Only)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# WARNING: Destroys all data
|
||||||
|
pnpm db:reset
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ID Mappings Data Integrity
|
||||||
|
|
||||||
|
The `id_mappings` table links portal users to WHMCS and Salesforce accounts. Corruption here causes authentication and data access failures.
|
||||||
|
|
||||||
|
### Verify Mapping Integrity
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Check for orphaned mappings (portal user deleted but mapping exists)
|
||||||
|
SELECT m.* FROM id_mappings m
|
||||||
|
LEFT JOIN users u ON m.user_id = u.id
|
||||||
|
WHERE u.id IS NULL;
|
||||||
|
|
||||||
|
-- Check for duplicate WHMCS mappings
|
||||||
|
SELECT whmcs_client_id, COUNT(*) as count
|
||||||
|
FROM id_mappings
|
||||||
|
WHERE whmcs_client_id IS NOT NULL
|
||||||
|
GROUP BY whmcs_client_id
|
||||||
|
HAVING COUNT(*) > 1;
|
||||||
|
|
||||||
|
-- Check for duplicate Salesforce mappings
|
||||||
|
SELECT sf_account_id, COUNT(*) as count
|
||||||
|
FROM id_mappings
|
||||||
|
WHERE sf_account_id IS NOT NULL
|
||||||
|
GROUP BY sf_account_id
|
||||||
|
HAVING COUNT(*) > 1;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Fix Orphaned Mappings
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Remove mappings for deleted users
|
||||||
|
DELETE FROM id_mappings
|
||||||
|
WHERE user_id NOT IN (SELECT id FROM users);
|
||||||
|
```
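
A cautious way to run this cleanup is inside a transaction that rolls back by default, so the reported row count can be reviewed first. The sketch below only prints SQL and does not connect to anything. It uses `NOT EXISTS` rather than `NOT IN`, which should behave the same here but stays correct if the subquery ever yields NULLs. `orphan_cleanup_sql` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper; prints SQL only, it does not connect to a database.
# Default mode ends with ROLLBACK (dry run); pass "commit" to make it permanent.
orphan_cleanup_sql() {
  local ending="ROLLBACK"
  [ "${1:-}" = "commit" ] && ending="COMMIT"
  cat <<SQL
BEGIN;
-- DELETE reports the affected row count, so the dry run shows the impact.
DELETE FROM id_mappings m
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = m.user_id);
${ending};
SQL
}

# Usage:
#   orphan_cleanup_sql        | psql "$DATABASE_URL"   # dry run
#   orphan_cleanup_sql commit | psql "$DATABASE_URL"   # apply
```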
|
||||||
|
|
||||||
|
### Fix Duplicate Mappings
|
||||||
|
|
||||||
|
> **Warning**: Investigate duplicates before deleting. They may indicate data issues.
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- View duplicate details before fixing
|
||||||
|
SELECT m.*, u.email FROM id_mappings m
|
||||||
|
JOIN users u ON m.user_id = u.id
|
||||||
|
WHERE m.whmcs_client_id IN (
|
||||||
|
SELECT whmcs_client_id FROM id_mappings
|
||||||
|
GROUP BY whmcs_client_id HAVING COUNT(*) > 1
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PostgreSQL Maintenance
|
||||||
|
|
||||||
|
### VACUUM and ANALYZE
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Analyze all tables for query optimization
|
||||||
|
ANALYZE;
|
||||||
|
|
||||||
|
-- Vacuum to reclaim space (non-blocking)
|
||||||
|
VACUUM;
|
||||||
|
|
||||||
|
-- Full vacuum (blocking: takes an ACCESS EXCLUSIVE lock and rewrites the table; reclaims more space)
|
||||||
|
VACUUM FULL;
|
||||||
|
|
||||||
|
-- Vacuum specific table
|
||||||
|
VACUUM ANALYZE id_mappings;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Recommended Schedule:**
|
||||||
|
|
||||||
|
- `VACUUM ANALYZE`: Daily during low-traffic hours
|
||||||
|
- `VACUUM FULL`: Monthly during maintenance window
|
||||||
|
|
||||||
|
### Index Maintenance
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Check index usage
|
||||||
|
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read
|
||||||
|
FROM pg_stat_user_indexes
|
||||||
|
ORDER BY idx_scan DESC;
|
||||||
|
|
||||||
|
-- Find unused indexes (candidates for removal)
|
||||||
|
SELECT schemaname, relname AS tablename, indexrelname AS indexname
|
||||||
|
FROM pg_stat_user_indexes
|
||||||
|
WHERE idx_scan = 0;
|
||||||
|
|
||||||
|
-- Reindex a table
|
||||||
|
REINDEX TABLE id_mappings;
|
||||||
|
|
||||||
|
-- Reindex entire database (during maintenance window)
|
||||||
|
REINDEX DATABASE portal_production;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Check Table Bloat
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Estimate table bloat
|
||||||
|
SELECT
|
||||||
|
schemaname, relname AS tablename,
pg_size_pretty(pg_relation_size(relid)) as size,
|
||||||
|
n_dead_tup as dead_rows,
|
||||||
|
n_live_tup as live_rows,
|
||||||
|
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
|
||||||
|
FROM pg_stat_user_tables
|
||||||
|
ORDER BY n_dead_tup DESC;
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Connection Pool Monitoring
|
||||||
|
|
||||||
|
### Check Active Connections
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Current connection count
|
||||||
|
SELECT COUNT(*) as connections FROM pg_stat_activity;
|
||||||
|
|
||||||
|
-- Connections by state
|
||||||
|
SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state;
|
||||||
|
|
||||||
|
-- Connections by application
|
||||||
|
SELECT application_name, COUNT(*)
|
||||||
|
FROM pg_stat_activity
|
||||||
|
GROUP BY application_name;
|
||||||
|
|
||||||
|
-- Long-running queries (>5 minutes)
|
||||||
|
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
|
||||||
|
FROM pg_stat_activity
|
||||||
|
WHERE state = 'active'
|
||||||
|
AND now() - pg_stat_activity.query_start > interval '5 minutes';
|
||||||
|
```
|
||||||
|
|
||||||
|
### Kill Stuck Connections
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Terminate a specific backend connection (pg_cancel_backend(<pid>) cancels only its running query)
|
||||||
|
SELECT pg_terminate_backend(<pid>);
|
||||||
|
|
||||||
|
-- Terminate all connections except current
|
||||||
|
SELECT pg_terminate_backend(pid)
|
||||||
|
FROM pg_stat_activity
|
||||||
|
WHERE pid <> pg_backend_pid()
|
||||||
|
AND datname = current_database();
|
||||||
|
```
|
||||||
|
|
||||||
|
### Prisma Connection Pool Settings
|
||||||
|
|
||||||
|
Configure in `DATABASE_URL` query parameters:
|
||||||
|
|
||||||
|
```
|
||||||
|
postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=10
|
||||||
|
```
|
||||||
|
|
||||||
|
| Parameter | Default | Recommended |
|
||||||
|
| ------------------ | ------- | ------------------ |
|
||||||
|
| `connection_limit` | `num_physical_cpus * 2 + 1` | 10-20 per instance |
|
||||||
|
| `pool_timeout` | 10s | 10-30s |
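
The two parameters can be appended to an existing URL, and the per-instance limit checked against the server's `max_connections`. This is a rough sketch: `with_pool_params` and `pool_budget` are hypothetical names, and the reserved-connection figure in the usage comment is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative.

# Append Prisma pool parameters to a connection URL, using "&" when the
# URL already has a query string and "?" otherwise.
with_pool_params() {
  local url="$1" limit="$2" timeout="$3" sep="?"
  case "$url" in *\?*) sep="&" ;; esac
  printf '%s%sconnection_limit=%s&pool_timeout=%s\n' "$url" "$sep" "$limit" "$timeout"
}

# Largest per-instance connection_limit that keeps all instances within the
# server's max_connections, leaving some connections reserved for admin use.
pool_budget() {
  local max_connections="$1" reserved="$2" instances="$3"
  echo $(( (max_connections - reserved) / instances ))
}

# Usage:
#   with_pool_params "postgresql://user:pass@host:5432/db" 10 10
#   pool_budget 100 10 4   # 100 max, 10 reserved, 4 BFF instances
```

The helper does not detect parameters that are already present, so it is meant for URLs that do not yet set `connection_limit` or `pool_timeout`.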
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring Queries
|
||||||
|
|
||||||
|
### Database Size
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Total database size
|
||||||
|
SELECT pg_size_pretty(pg_database_size(current_database()));
|
||||||
|
|
||||||
|
-- Size per table
|
||||||
|
SELECT
|
||||||
|
tablename,
|
||||||
|
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size
|
||||||
|
FROM pg_tables
|
||||||
|
WHERE schemaname = 'public'
|
||||||
|
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Query Performance
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Slowest queries (requires pg_stat_statements extension)
|
||||||
|
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
|
||||||
|
LIMIT 10;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Lock Monitoring
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Check for locks
|
||||||
|
SELECT
|
||||||
|
pg_locks.pid,
|
||||||
|
pg_stat_activity.query,
|
||||||
|
pg_locks.mode,
|
||||||
|
pg_locks.granted
|
||||||
|
FROM pg_locks
|
||||||
|
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
|
||||||
|
WHERE NOT pg_locks.granted;
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Emergency Procedures
|
||||||
|
|
||||||
|
### Database Unresponsive
|
||||||
|
|
||||||
|
1. Check PostgreSQL process status
|
||||||
|
2. Check disk space and memory
|
||||||
|
3. Kill long-running queries
|
||||||
|
4. Restart PostgreSQL if necessary
|
||||||
|
5. Check application connectivity after restart
|
||||||
|
|
||||||
|
### Disk Space Full
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check disk usage
|
||||||
|
df -h
|
||||||
|
|
||||||
|
# Find large files in PostgreSQL data directory
|
||||||
|
du -sh /var/lib/postgresql/data/*
|
||||||
|
|
||||||
|
# Clear transaction logs (if WAL archiving is working)
|
||||||
|
# WARNING: Only if logs are properly archived
|
||||||
|
```
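
To catch this condition before the disk is actually full, the `df` output can be turned into a simple status. The sketch below parses POSIX-format `df -P` output; the 80% and 90% thresholds are illustrative assumptions, not values from this runbook, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the 80/90 thresholds are illustrative only.

# Read `df -P` output on stdin and print the use percentage of the first
# filesystem row as a bare integer.
df_use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Map a use percentage to ok / warning / critical.
disk_status() {
  local pct="$1"
  if [ "$pct" -ge 90 ]; then echo critical
  elif [ "$pct" -ge 80 ]; then echo warning
  else echo ok
  fi
}

# Usage:
#   disk_status "$(df -P /var/lib/postgresql/data | df_use_pct)"
```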
|
||||||
|
|
||||||
|
### Corruption Detected
|
||||||
|
|
||||||
|
1. **STOP** the application immediately
|
||||||
|
2. Do not attempt repairs without backup verification
|
||||||
|
3. Restore from last known good backup
|
||||||
|
4. Investigate root cause before resuming service
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documents
|
||||||
|
|
||||||
|
- [Incident Response](./incident-response.md)
|
||||||
|
- [External Dependencies](./external-dependencies.md)
|
||||||
|
- [Provisioning Runbook](./provisioning-runbook.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** December 2025
|
||||||
@@ -1,28 +0,0 @@
|
|||||||
# Temporarily Disabled Modules
|
|
||||||
|
|
||||||
The backend currently omits two partially implemented modules from the runtime
|
|
||||||
NestJS configuration so that the public API surface only exposes completed
|
|
||||||
features.
|
|
||||||
|
|
||||||
## Cases Module
|
|
||||||
|
|
||||||
- Removed from `AppModule` and `apiRoutes` to ensure the unfinished `/cases`
|
|
||||||
endpoints are not routable.
|
|
||||||
- All existing code remains in `apps/bff/src/modules/cases/` for future
|
|
||||||
development; re-enable by importing the module in
|
|
||||||
`apps/bff/src/app.module.ts` and adding it back to the router configuration in
|
|
||||||
`apps/bff/src/core/config/router.config.ts` once the endpoints are ready.
|
|
||||||
|
|
||||||
## Jobs Module
|
|
||||||
|
|
||||||
- Temporarily excluded from `AppModule` while the reconciliation workflows are
|
|
||||||
fleshed out.
|
|
||||||
- The BullMQ processor now logs an explicit warning and acknowledges each job so
|
|
||||||
queue workers do not hang when the module is re-registered.
|
|
||||||
- When background processing is ready, restore the `JobsModule` import in
|
|
||||||
`apps/bff/src/app.module.ts` and replace the placeholder logic in
|
|
||||||
`ReconcileProcessor.process` with the real reconciliation implementation.
|
|
||||||
|
|
||||||
> **Note**: If additional queues or HTTP routes reference these modules, make
|
|
||||||
> sure they fail fast with a `501 Not Implemented` response or similar logging so
|
|
||||||
> that downstream systems have clear telemetry while the modules are disabled.
|
|
||||||
325
docs/operations/external-dependencies.md
Normal file
@@ -0,0 +1,325 @@
|
|||||||
|
# External Dependencies Runbook
|
||||||
|
|
||||||
|
This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## System Overview
|
||||||
|
|
||||||
|
| System | Purpose | Integration | Health Check |
|
||||||
|
| ---------------------- | -------------------------------- | -------------------------- | --------------- |
|
||||||
|
| **Salesforce** | CRM, Orders, Catalog | REST API + Platform Events | JWT auth test |
|
||||||
|
| **WHMCS** | Billing, Payments, Subscriptions | REST API | API action test |
|
||||||
|
| **Freebit** | SIM Management | REST API | OEM auth test |
|
||||||
|
| **SFTP (fs.mvno.net)** | Call/SMS Records | SFTP | Connection test |
|
||||||
|
| **Redis** | Cache, Sessions, Queues | Direct connection | PING command |
|
||||||
|
| **PostgreSQL** | User data, Mappings | Direct connection | Query test |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Salesforce
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
| ---------------------------- | ------------------------------------------------------- |
|
||||||
|
| `SF_LOGIN_URL` | Login URL (login.salesforce.com or test.salesforce.com) |
|
||||||
|
| `SF_CLIENT_ID` | Connected App Consumer Key |
|
||||||
|
| `SF_USERNAME` | Integration user username |
|
||||||
|
| `SF_PRIVATE_KEY_PATH` | Path to JWT private key |
|
||||||
|
| `SF_EVENTS_ENABLED` | Enable Platform Event subscription |
|
||||||
|
| `SF_PROVISION_EVENT_CHANNEL` | Platform Event channel for provisioning |
|
||||||
|
| `PORTAL_PRICEBOOK_ID` | Salesforce Pricebook ID for catalog |
|
||||||
|
|
||||||
|
### Health Check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Salesforce connectivity via BFF health endpoint
|
||||||
|
curl http://localhost:4000/health | jq '.'
|
||||||
|
|
||||||
|
# Test JWT authentication manually
|
||||||
|
# The BFF authenticates automatically; check logs for auth errors
|
||||||
|
grep "Salesforce" /var/log/bff/combined.log | tail -20
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**JWT Authentication Failure**
|
||||||
|
|
||||||
|
- Verify private key file exists and is readable
|
||||||
|
- Check Connected App settings in Salesforce
|
||||||
|
- Ensure integration user is pre-authorized for Connected App
|
||||||
|
- Verify `SF_USERNAME` matches the user assigned to Connected App
|
||||||
|
|
||||||
|
**Platform Events Not Being Received**
|
||||||
|
|
||||||
|
- Check `SF_EVENTS_ENABLED=true`
|
||||||
|
- Verify Platform Event permissions for integration user
|
||||||
|
- Check Redis for replay ID: `redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"`
|
||||||
|
- Set `SF_EVENTS_REPLAY=ALL` temporarily to catch up on missed events
|
||||||
|
|
||||||
|
**API Limits**
|
||||||
|
|
||||||
|
- Salesforce has daily API call limits
|
||||||
|
- Monitor usage in Salesforce Setup > API Usage
|
||||||
|
- Consider caching frequently accessed data
|
||||||
|
|
||||||
|
### Expected Response Times
|
||||||
|
|
||||||
|
| Operation | Expected | Alert Threshold |
|
||||||
|
| -------------- | --------- | --------------- |
|
||||||
|
| Query | <500ms | >2s |
|
||||||
|
| Update | <1s | >3s |
|
||||||
|
| Platform Event | Real-time | >5s delay |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## WHMCS
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
| -------------------------------- | ----------------------------------- |
|
||||||
|
| `WHMCS_API_URL` | WHMCS API endpoint URL |
|
||||||
|
| `WHMCS_API_IDENTIFIER` | API credentials identifier |
|
||||||
|
| `WHMCS_API_SECRET` | API credentials secret |
|
||||||
|
| `WHMCS_CUSTOMER_NUMBER_FIELD_ID` | Custom field ID for Customer Number |
|
||||||
|
|
||||||
|
### Health Check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Test WHMCS API directly
|
||||||
|
curl -X POST "$WHMCS_API_URL" \
|
||||||
|
-d "identifier=$WHMCS_API_IDENTIFIER" \
|
||||||
|
-d "secret=$WHMCS_API_SECRET" \
|
||||||
|
-d "action=GetClients" \
|
||||||
|
-d "responsetype=json" \
|
||||||
|
-d "limitnum=1"
|
||||||
|
|
||||||
|
# Should return: {"result":"success","totalresults":...}
|
||||||
|
```
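
For scripted checks, the response can be tested for the success marker instead of being read by eye. This is a minimal sketch that relies only on the `"result":"success"` field shown above; `whmcs_result_ok` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a WHMCS JSON response on stdin and succeed only
# if it contains "result":"success". A plain grep keeps it dependency-free;
# parsing with jq would be stricter.
whmcs_result_ok() {
  grep -Eq '"result"[[:space:]]*:[[:space:]]*"success"'
}

# Usage:
#   curl -sS --max-time 10 -X POST "$WHMCS_API_URL" \
#     -d "identifier=$WHMCS_API_IDENTIFIER" -d "secret=$WHMCS_API_SECRET" \
#     -d "action=GetClients" -d "responsetype=json" -d "limitnum=1" \
#     | whmcs_result_ok && echo "WHMCS healthy"
```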
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Authentication Failure**
|
||||||
|
|
||||||
|
- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
|
||||||
|
- Check IP whitelist settings (if configured)
|
||||||
|
- Ensure API credentials have required permissions
|
||||||
|
|
||||||
|
**Rate Limiting**
|
||||||
|
|
||||||
|
- WHMCS may rate limit excessive requests
|
||||||
|
- Check for 429 responses in logs
|
||||||
|
- Implement request queuing if needed
|
||||||
|
|
||||||
|
**Field Mapping Issues**
|
||||||
|
|
||||||
|
- Payment method fields may use different names between WHMCS versions
|
||||||
|
- Check [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md) for field mapping
|
||||||
|
|
||||||
|
### Expected Response Times
|
||||||
|
|
||||||
|
| Operation | Expected | Alert Threshold |
|
||||||
|
| ----------- | -------- | --------------- |
|
||||||
|
| GetInvoices | <500ms | >2s |
|
||||||
|
| AddOrder | <1s | >3s |
|
||||||
|
| AcceptOrder | <1s | >3s |
|
||||||
|
| SSO Token | <500ms | >2s |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Freebit
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
| ------------------ | ---------------------- |
|
||||||
|
| `FREEBIT_BASE_URL` | Freebit API base URL |
|
||||||
|
| `FREEBIT_OEM_ID` | OEM identifier |
|
||||||
|
| `FREEBIT_OEM_KEY` | OEM authentication key |
|
||||||
|
| `FREEBIT_TIMEOUT` | Request timeout (ms) |
|
||||||
|
|
||||||
|
### Health Check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Freebit OEM authentication
|
||||||
|
# The BFF handles auth automatically; check logs for auth errors
|
||||||
|
grep "Freebit" /var/log/bff/combined.log | tail -20
|
||||||
|
|
||||||
|
# Check for auth token in cache
|
||||||
|
redis-cli GET "freebit:auth:token"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**OEM Authentication Failure**
|
||||||
|
|
||||||
|
- Verify `FREEBIT_OEM_ID` and `FREEBIT_OEM_KEY`
|
||||||
|
- Check Freebit API endpoint accessibility
|
||||||
|
- Auth tokens are cached; clear cache if credentials changed
|
||||||
|
|
||||||
|
**SIM Operations Failing**
|
||||||
|
|
||||||
|
- Verify SIM account identifier (phone number) format
|
||||||
|
- Check 30-minute operation gap requirements
|
||||||
|
- See [Freebit SIM Management](../integrations/sim/freebit.md) for operation constraints
|
||||||
|
|
||||||
|
**Network Type Changes Delayed**
|
||||||
|
|
||||||
|
- Network type changes are queued with a 30-minute delay
|
||||||
|
- Check BullMQ queue for pending jobs
|
||||||
|
|
||||||
|
### Expected Response Times
|
||||||
|
|
||||||
|
| Operation | Expected | Alert Threshold |
|
||||||
|
| ------------- | -------- | --------------- |
|
||||||
|
| Auth (cached) | <100ms | >500ms |
|
||||||
|
| GetDetail | <1s | >3s |
|
||||||
|
| Plan Change | <2s | >5s |
|
||||||
|
| Top-up | <2s | >5s |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SFTP (fs.mvno.net)
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
| ----------------------- | ----------------------- |
|
||||||
|
| `SFTP_HOST` | SFTP server hostname |
|
||||||
|
| `SFTP_PORT` | SFTP port (default: 22) |
|
||||||
|
| `SFTP_USERNAME` | SFTP username |
|
||||||
|
| `SFTP_PRIVATE_KEY_PATH` | Path to SSH private key |
|
||||||
|
|
||||||
|
### Health Check
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Test SFTP connectivity
|
||||||
|
sftp -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" "$SFTP_USERNAME@$SFTP_HOST" << EOF
|
||||||
|
ls
|
||||||
|
exit
|
||||||
|
EOF
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Connection Refused**
|
||||||
|
|
||||||
|
- Verify SFTP server is accessible
|
||||||
|
- Check firewall rules
|
||||||
|
- Verify SSH key fingerprint
|
||||||
|
|
||||||
|
**Authentication Failure**
|
||||||
|
|
||||||
|
- Verify SSH private key is correct
|
||||||
|
- Check key permissions (should be 600)
|
||||||
|
- Ensure public key is authorized on SFTP server
|
||||||
|
|
||||||
|
**Files Not Found**
|
||||||
|
|
||||||
|
- Call/SMS records are available 2 months behind the current date
|
||||||
|
- File naming: `PASI_talk-detail-YYYYMM.csv`, `PASI_sms-detail-YYYYMM.csv`
|
||||||
|
|
||||||
|
### Data Availability
|
||||||
|
|
||||||
|
| Record Type | Availability | File Pattern |
|
||||||
|
| ------------ | --------------- | ----------------------------- |
|
||||||
|
| Call Details | 2 months behind | `PASI_talk-detail-YYYYMM.csv` |
|
||||||
|
| SMS Details | 2 months behind | `PASI_sms-detail-YYYYMM.csv` |
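
When a "file not found" error appears, it helps to compute which file names should exist. The sketch below assumes "2 months behind" means the newest file covers the month two calendar months before the current one; that reading is an interpretation and should be confirmed against the actual SFTP listing. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. Assumes "2 months behind" means the newest file is
# for the month two calendar months before the given one.

# Given YYYYMM, print the YYYYMM two months earlier.
latest_records_month() {
  local year="${1%??}" month="${1#????}"
  month="${month#0}"            # drop a leading zero so "08" is not read as octal
  month=$(( month - 2 ))
  if [ "$month" -lt 1 ]; then
    month=$(( month + 12 ))
    year=$(( year - 1 ))
  fi
  printf '%04d%02d\n' "$year" "$month"
}

# Print the expected call and SMS file names for a given current month.
expected_record_files() {
  local ym
  ym="$(latest_records_month "$1")"
  printf 'PASI_talk-detail-%s.csv\nPASI_sms-detail-%s.csv\n' "$ym" "$ym"
}

# Usage:
#   expected_record_files "$(date -u +%Y%m)"
```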
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Credential Rotation
|
||||||
|
|
||||||
|
### Salesforce JWT Key Rotation
|
||||||
|
|
||||||
|
1. Generate new key pair
|
||||||
|
2. Upload new public key to Connected App
|
||||||
|
3. Update `SF_PRIVATE_KEY_PATH` or `SF_PRIVATE_KEY_BASE64`
|
||||||
|
4. Deploy and verify authentication
|
||||||
|
5. Remove old key from Connected App after verification
|
||||||
|
|
||||||
|
### WHMCS API Credentials Rotation
|
||||||
|
|
||||||
|
1. Create new API credentials in WHMCS Admin
|
||||||
|
2. Update `WHMCS_API_IDENTIFIER` and `WHMCS_API_SECRET`
|
||||||
|
3. Deploy and verify API calls work
|
||||||
|
4. Disable old API credentials
|
||||||
|
|
||||||
|
### Freebit Key Rotation
|
||||||
|
|
||||||
|
1. Request new OEM key from Freebit
|
||||||
|
2. Update `FREEBIT_OEM_KEY`
|
||||||
|
3. Clear cached auth token: `redis-cli DEL "freebit:auth:token"`
|
||||||
|
4. Deploy and verify authentication
|
||||||
|
|
||||||
|
### SSH Key Rotation (SFTP)
|
||||||
|
|
||||||
|
1. Generate new SSH key pair
|
||||||
|
2. Provide public key to SFTP administrator
|
||||||
|
3. Wait for key to be authorized
|
||||||
|
4. Update `SFTP_PRIVATE_KEY_PATH`
|
||||||
|
5. Test connectivity
|
||||||
|
6. Request old key removal from SFTP server
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring Recommendations
|
||||||
|
|
||||||
|
### Alerting Thresholds
|
||||||
|
|
||||||
|
| System | Metric | Warning | Critical |
|
||||||
|
| ---------- | ------------- | ------- | -------- |
|
||||||
|
| Salesforce | Response time | >2s | >5s |
|
||||||
|
| Salesforce | Error rate | >1% | >5% |
|
||||||
|
| WHMCS | Response time | >2s | >5s |
|
||||||
|
| WHMCS | Error rate | >1% | >5% |
|
||||||
|
| Freebit | Response time | >3s | >10s |
|
||||||
|
| Redis | Response time | >100ms | >500ms |
|
||||||
|
| PostgreSQL | Response time | >500ms | >2s |
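
The response-time rows above can be applied mechanically. This is a small sketch that takes a measured latency and the two thresholds in milliseconds; how the latency is measured is left open, and `classify_latency` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a response time (ms) against warning and
# critical thresholds (ms). A value must exceed a threshold to trip it,
# matching the ">" notation in the table.
classify_latency() {
  local ms="$1" warn="$2" crit="$3"
  if [ "$ms" -gt "$crit" ]; then echo critical
  elif [ "$ms" -gt "$warn" ]; then echo warning
  else echo ok
  fi
}

# Usage (Salesforce thresholds: warning >2s, critical >5s):
#   classify_latency 2500 2000 5000
```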
|
||||||
|
|
||||||
|
### Key Metrics to Monitor
|
||||||
|
|
||||||
|
- External API response times
|
||||||
|
- Error rates per integration
|
||||||
|
- Authentication success/failure rates
|
||||||
|
- Cache hit rates
|
||||||
|
- Queue depths (for async operations)
|
||||||
|
|
||||||
|
### Health Check Schedule
|
||||||
|
|
||||||
|
| System | Check Frequency | Method |
|
||||||
|
| ---------- | ---------------- | ------------------ |
|
||||||
|
| Salesforce | Every 5 minutes | Query test |
|
||||||
|
| WHMCS | Every 5 minutes | GetClients call |
|
||||||
|
| Freebit | Every 15 minutes | Auth token refresh |
|
||||||
|
| Redis | Every 1 minute | PING |
|
||||||
|
| PostgreSQL | Every 1 minute | SELECT 1 |
|
||||||
|
| SFTP | Every 1 hour | Connection test |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Fallback Behaviors
|
||||||
|
|
||||||
|
| System Down | User Impact | Fallback |
|
||||||
|
| ----------- | ----------------------- | ------------------------------------ |
|
||||||
|
| Salesforce | No orders, no catalog | Show cached catalog, queue orders |
|
||||||
|
| WHMCS | No billing, no payments | Show cached invoices, block checkout |
|
||||||
|
| Freebit | No SIM management | Show cached data, disable actions |
|
||||||
|
| Redis | Slow performance | Direct API calls (no cache) |
|
||||||
|
| PostgreSQL | Portal unusable | Display maintenance message |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documents
|
||||||
|
|
||||||
|
- [Incident Response](./incident-response.md)
|
||||||
|
- [Provisioning Runbook](./provisioning-runbook.md)
|
||||||
|
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
|
||||||
|
- [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md)
|
||||||
|
- [Freebit SIM Management](../integrations/sim/freebit.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** December 2025
|
||||||
325
docs/operations/external-processes.md
Normal file
@@ -0,0 +1,325 @@
|
|||||||
|
# External Processes and Team Handoffs
|
||||||
|
|
||||||
|
This document describes operational processes that occur outside the Customer Portal but are necessary for system operation and customer service.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Process Ownership Matrix
|
||||||
|
|
||||||
|
| Process | Owner | Trigger | Dependencies | Documentation |
|
||||||
|
| ----------------------------- | ----------------- | ------------------------- | --------------------------- | ----------------------------------------------- |
|
||||||
|
| Salesforce Account Creation | Sales Team | Customer inquiry | Salesforce Admin access | Salesforce training docs |
|
||||||
|
| Customer Number Assignment | Sales Team | New customer onboarding | SF Account created | Sales procedures |
|
||||||
|
| CS Order Approval | CS Team | Order in "Pending Review" | Salesforce access | CS training docs |
|
||||||
|
| Internet Eligibility Check | CS Team | Eligibility request Case | Customer address info | CS procedures |
|
||||||
|
| WHMCS Product Setup | DevOps | New product launch | WHMCS Admin access | This document |
|
||||||
|
| Salesforce Flow Maintenance | SF Admin | Feature changes | SF Admin + Dev access | SF Flow documentation |
|
||||||
|
| Freebit Account Configuration | Partner Relations | New SIM products | Freebit partner credentials | Freebit contract docs |
|
||||||
|
| SSL Certificate Renewal | DevOps | Expiration alerts | Certificate provider access | This document |
|
||||||
|
| Database Backups | DevOps | Scheduled / On-demand | DB Admin access | [Database Operations](./database-operations.md) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Customer Onboarding Flow
|
||||||
|
|
||||||
|
### Pre-Portal Setup (Sales Team)
|
||||||
|
|
||||||
|
Before a customer can use the portal, Sales must complete these steps:
|
||||||
|
|
||||||
|
1. **Create Salesforce Account**
|
||||||
|
- Create Account record with customer details
|
||||||
|
- Assign unique `SF_Account_No__c` (Customer Number)
|
||||||
|
- Set initial account status
|
||||||
|
|
||||||
|
2. **Verify Customer Information**
|
||||||
|
- Confirm contact details
|
||||||
|
- Verify billing address
|
||||||
|
- Complete KYC requirements if applicable
|
||||||
|
|
||||||
|
3. **Internet Eligibility (if applicable)**
|
||||||
|
- Submit eligibility check via portal OR
|
||||||
|
- Manually check eligibility and update Account fields:
|
||||||
|
- `Internet_Eligibility__c`
|
||||||
|
- `Internet_Eligibility_Status__c`
|
||||||
|
|
||||||
|
### Handoff to Portal
|
||||||
|
|
||||||
|
Once Sales completes setup, the customer can:
|
||||||
|
|
||||||
|
- Sign up using their Customer Number
|
||||||
|
- Link existing WHMCS account (if migrating)
|
||||||
|
- Place orders through the portal
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Order Approval Flow
|
||||||
|
|
||||||
|
### CS Review Process
|
||||||
|
|
||||||
|
When an order is placed, CS must review and approve:
|
||||||
|
|
||||||
|
**Order Review Checklist:**
|
||||||
|
|
||||||
|
1. [ ] Verify customer identity matches Salesforce Account
|
||||||
|
2. [ ] Confirm product eligibility (Internet type matches eligibility)
|
||||||
|
3. [ ] Verify installation address is serviceable
|
||||||
|
4. [ ] Check for duplicate active services
|
||||||
|
5. [ ] Review any special instructions or notes
|
||||||
|
|
||||||
|
**Approval Actions:**
|
||||||
|
|
||||||
|
- Approve: Set Order `Status = Approved`
|
||||||
|
- Triggers provisioning workflow automatically
|
||||||
|
- Reject: Set Order `Status = Cancelled`
|
||||||
|
- Add rejection reason to Order notes
|
||||||
|
- Customer is notified via portal
|
||||||
|
|
||||||
|
**SLA:**
|
||||||
|
|
||||||
|
- Standard orders: Review within 2 business hours
|
||||||
|
- Priority orders: Review within 30 minutes
|
||||||
|
|
||||||
|
### Escalation Triggers
|
||||||
|
|
||||||
|
Escalate to supervisor if:
|
||||||
|
|
||||||
|
- Customer disputes eligibility result
|
||||||
|
- Multiple orders from same account in short period
|
||||||
|
- Order value exceeds threshold
|
||||||
|
- Address verification fails
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Internet Eligibility Process
|
||||||
|
|
||||||
|
### Request Flow
|
||||||
|
|
||||||
|
1. **Customer submits eligibility request** (Portal)
|
||||||
|
- Creates Salesforce Case (Type: Eligibility Check)
|
||||||
|
- Updates Account fields to "Pending"
|
||||||
|
- Creates/updates Opportunity (Stage: Introduction)
|
||||||
|
|
||||||
|
2. **CS reviews request** (Salesforce)
|
||||||
|
- Verify address details
|
||||||
|
- Check service availability databases
|
||||||
|
- Determine eligibility type (Apartment 1G, Home 1G, etc.)
|
||||||
|
|
||||||
|
3. **CS updates Salesforce** (Salesforce)
|
||||||
|
- Set `Internet_Eligibility__c` to result
|
||||||
|
- Set `Internet_Eligibility_Status__c = Checked`
|
||||||
|
- Update Opportunity stage (Ready or Void)
|
||||||
|
- Close the Case
|
||||||
|
|
||||||
|
4. **Customer sees result** (Portal)
|
||||||
|
- Portal reads updated Account fields
|
||||||
|
- Catalog shows eligible products
|
||||||
|
|
||||||
|
**SLA:**
|
||||||
|
|
||||||
|
- Standard check: 24-48 business hours
|
||||||
|
- Express check: 4 business hours
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cancellation Request Process
|
||||||
|
|
||||||
|
### Customer-Initiated Cancellation
|
||||||
|
|
||||||
|
1. **Customer requests cancellation** (Portal)
|
||||||
|
- Creates Salesforce Case (Type: Cancellation Request)
|
||||||
|
- Finds linked Opportunity via `WHMCS_Service_ID__c`
|
||||||
|
- Updates Opportunity stage to "△Cancelling"
|
||||||
|
- Sets `ScheduledCancellationDateAndTime__c`
|
||||||
|
|
||||||
|
2. **CS reviews request** (Salesforce)
|
||||||
|
- Verify customer authorization
|
||||||
|
- Check cancellation terms and fees
|
||||||
|
- Confirm scheduled date
|
||||||
|
|
||||||
|
3. **CS processes cancellation** (WHMCS + Salesforce)
|
||||||
|
- Cancel service in WHMCS (if not automatic)
|
||||||
|
- Update Opportunity stage to "△Cancelled"
|
||||||
|
- Close the Case
|
||||||
|
|
||||||
|
4. **Final billing** (WHMCS)
|
||||||
|
- Generate final invoice if applicable
|
||||||
|
- Process any prorated refunds
|
||||||
|
|
||||||
|
### Cancellation Types
|
||||||
|
|
||||||
|
| Type     | Notice Period          | Effective Date         |
| -------- | ---------------------- | ---------------------- |
| Internet | 30 days                | End of notice period   |
| SIM      | Immediate or scheduled | 1st of following month |
| VPN      | Immediate              | Same day               |
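
The rules in the table can be sketched as a small date helper. This is an illustration only: `ServiceType` and `effectiveCancellationDate` are hypothetical names, the notice period is counted in calendar days, and all arithmetic is done in UTC, which may not match the business timezone.

```typescript
type ServiceType = "Internet" | "SIM" | "VPN";

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: effective cancellation date per the Cancellation Types table.
function effectiveCancellationDate(type: ServiceType, requestedAt: Date): Date {
  switch (type) {
    case "Internet":
      // 30-day notice period; effective at the end of it.
      return new Date(requestedAt.getTime() + 30 * DAY_MS);
    case "SIM":
      // 1st of the following month (Date.UTC rolls month 12 over into January).
      return new Date(
        Date.UTC(requestedAt.getUTCFullYear(), requestedAt.getUTCMonth() + 1, 1),
      );
    case "VPN":
      // Immediate: effective the same day.
      return requestedAt;
  }
}
```

For a request made on 15 December, the SIM branch yields 1 January of the following year, while the Internet branch yields 14 January.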
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Product Configuration
|
||||||
|
|
||||||
|
### Adding New Products
|
||||||
|
|
||||||
|
When launching new products, coordinate between teams:
|
||||||
|
|
||||||
|
**1. Salesforce Setup (SF Admin)**
|
||||||
|
|
||||||
|
- Create Product2 record
|
||||||
|
- Set required fields:
|
||||||
|
- `Name`, `StockKeepingUnit`
|
||||||
|
- `WH_Product_ID__c` (WHMCS product ID)
|
||||||
|
- `Billing_Cycle__c`
|
||||||
|
- `Item_Class__c` (Service, Activation, Add-on)
|
||||||
|
- Add to portal Pricebook (`PORTAL_PRICEBOOK_ID`)
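
A pre-launch check for the required fields listed above might look like the following sketch. The field names come from this document; `Product2Record` and `missingProductFields` are hypothetical names, and the record is assumed to be a plain object already fetched from Salesforce.

```typescript
// Required Product2 fields named in the Salesforce Setup step.
const REQUIRED_PRODUCT_FIELDS = [
  "Name",
  "StockKeepingUnit",
  "WH_Product_ID__c",
  "Billing_Cycle__c",
  "Item_Class__c",
] as const;

type RequiredProductField = (typeof REQUIRED_PRODUCT_FIELDS)[number];
type Product2Record = Partial<Record<RequiredProductField, string | number | null>>;

// Hypothetical helper: returns the required fields that are absent or blank.
function missingProductFields(product: Product2Record): string[] {
  return REQUIRED_PRODUCT_FIELDS.filter((field) => {
    const value = product[field];
    return value === undefined || value === null || String(value).trim() === "";
  });
}
```

An empty result means the record carries every field the portal mapping relies on; it does not confirm that the values match WHMCS.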
|
||||||
|
|
||||||
|
**2. WHMCS Setup (DevOps/Billing)**
|
||||||
|
|
||||||
|
- Create product in WHMCS Products/Services
|
||||||
|
- Configure pricing and billing cycle
|
||||||
|
- Set up any required custom fields
|
||||||
|
- Test product creation via API
|
||||||
|
|
||||||
|
**3. Portal Verification (Development)**
|
||||||
|
|
||||||
|
- Verify product appears in catalog
|
||||||
|
- Test checkout flow with new product
|
||||||
|
- Confirm provisioning works correctly
|
||||||
|
|
||||||
|
**4. Documentation (All Teams)**
|
||||||
|
|
||||||
|
- Update product documentation
|
||||||
|
- Add to [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)
|
||||||
|
|
||||||
|
### Product Change Checklist
|
||||||
|
|
||||||
|
- [ ] Salesforce Product2 updated
|
||||||
|
- [ ] WHMCS product updated
|
||||||
|
- [ ] Pricing synced between systems
|
||||||
|
- [ ] Portal cache cleared
|
||||||
|
- [ ] Tested in staging environment
|
||||||
|
- [ ] Documentation updated
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Salesforce Flow Maintenance
|
||||||
|
|
||||||
|
### Record-Triggered Flows
|
||||||
|
|
||||||
|
The portal depends on these Salesforce Flows:
|
||||||
|
|
||||||
|
| Flow | Trigger | Action |
|
||||||
|
| ----------------------- | ---------------------------------- | ------------------------------------ |
|
||||||
|
| Order Approval Flow | Order Status → Approved | Publish `OrderProvisionRequested__e` |
|
||||||
|
| Eligibility Update Flow | Account eligibility fields changed | (Optional) Notify customer |
|
||||||
|
|
||||||
|
### Flow Change Procedure
|
||||||
|
|
||||||
|
1. **Development** (SF Admin + Dev)
|
||||||
|
- Clone existing Flow for modification
|
||||||
|
- Test in Salesforce Sandbox
|
||||||
|
- Document changes
|
||||||
|
|
||||||
|
2. **Deployment** (SF Admin)
|
||||||
|
- Schedule deployment during low-traffic period
|
||||||
|
- Notify development team
|
||||||
|
- Activate new Flow version
|
||||||
|
|
||||||
|
3. **Verification** (Dev + QA)
|
||||||
|
- Test affected portal functionality
|
||||||
|
- Verify Platform Events are received
|
||||||
|
- Check BFF logs for any errors
|
||||||
|
|
||||||
|
4. **Rollback Plan**
|
||||||
|
- Keep previous Flow version available
|
||||||
|
- Document rollback procedure
|
||||||
|
- Have SF Admin available during deployment
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SSL Certificate Management
|
||||||
|
|
||||||
|
### Certificate Inventory
|
||||||
|
|
||||||
|
| Domain | Provider | Expiration | Renewal Process |
|
||||||
|
| ------------------ | ------------- | ---------- | --------------- |
|
||||||
|
| portal.example.com | Let's Encrypt | Auto-renew | Automated |
|
||||||
|
| api.example.com | Let's Encrypt | Auto-renew | Automated |
|
||||||
|
| whmcs.example.com | [Provider] | [Date] | Manual |
|
||||||
|
|
||||||
|
### Renewal Procedure
|
||||||
|
|
||||||
|
**Automated (Let's Encrypt):**
|
||||||
|
|
||||||
|
- Certbot runs automatically
|
||||||
|
- Monitor for renewal failures
|
||||||
|
- Alert if cert expires within 14 days
|
||||||
|
|
||||||
|
**Manual:**
|
||||||
|
|
||||||
|
1. Generate CSR
|
||||||
|
2. Submit to certificate provider
|
||||||
|
3. Complete domain verification
|
||||||
|
4. Download and install certificate
|
||||||
|
5. Restart affected services
|
||||||
|
6. Verify certificate in browser
|
||||||
|
|
||||||
|
### Certificate Expiration Alerts
|
||||||
|
|
||||||
|
- 30 days: Warning notification
|
||||||
|
- 14 days: Urgent notification
|
||||||
|
- 7 days: Critical alert
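
The alert tiers above can be expressed as a threshold function. This is a sketch under stated assumptions: `certAlertLevel` and `daysUntil` are hypothetical names, and each tier is assumed to start at its day count inclusive (for example, exactly 14 days remaining is already "urgent").

```typescript
type CertAlert = "none" | "warning" | "urgent" | "critical";

// Whole days remaining until the certificate expires (negative once expired).
function daysUntil(expiresAt: Date, now: Date): number {
  return Math.floor((expiresAt.getTime() - now.getTime()) / 86400000);
}

// Hypothetical mapping of days remaining to the alert tiers listed above.
function certAlertLevel(daysUntilExpiry: number): CertAlert {
  if (daysUntilExpiry <= 7) return "critical";
  if (daysUntilExpiry <= 14) return "urgent";
  if (daysUntilExpiry <= 30) return "warning";
  return "none";
}
```

Checking the strictest tier first keeps the function correct for already-expired certificates, which fall through to "critical".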
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Credential and Access Management
|
||||||
|
|
||||||
|
### Access Request Process
|
||||||
|
|
||||||
|
| System | Request To | Approval By | Access Level Options |
|
||||||
|
| ---------- | ---------- | ----------- | --------------------- |
|
||||||
|
| Salesforce | SF Admin | Manager | Read-only, CS, Admin |
|
||||||
|
| WHMCS | DevOps | Manager | Staff, Admin |
|
||||||
|
| BFF/Portal | DevOps | Tech Lead | Developer, Operator |
|
||||||
|
| Database | DevOps | Tech Lead | Read-only, Read-write |
|
||||||
|
|
||||||
|
### Offboarding Checklist
|
||||||
|
|
||||||
|
When a team member leaves:
|
||||||
|
|
||||||
|
- [ ] Revoke Salesforce access
|
||||||
|
- [ ] Revoke WHMCS access
|
||||||
|
- [ ] Remove from deployment systems
|
||||||
|
- [ ] Rotate any shared credentials they had access to
|
||||||
|
- [ ] Update on-call schedules
|
||||||
|
- [ ] Transfer ownership of documentation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Communication Channels
|
||||||
|
|
||||||
|
### Team Contacts
|
||||||
|
|
||||||
|
| Team | Channel | Escalation |
|
||||||
|
| ----------- | --------------------- | ------------- |
|
||||||
|
| Development | [Slack/Teams channel] | Tech Lead |
|
||||||
|
| CS Team | [Slack/Teams channel] | CS Manager |
|
||||||
|
| Sales Team | [Slack/Teams channel] | Sales Manager |
|
||||||
|
| DevOps | [Slack/Teams channel] | Ops Lead |
|
||||||
|
| SF Admin | [Email/Slack] | IT Manager |
|
||||||
|
|
||||||
|
### Incident Communication
|
||||||
|
|
||||||
|
See [Incident Response Runbook](./incident-response.md) for incident communication procedures.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documents
|
||||||
|
|
||||||
|
- [Incident Response](./incident-response.md)
|
||||||
|
- [Provisioning Runbook](./provisioning-runbook.md)
|
||||||
|
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
|
||||||
|
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)
|
||||||
|
- [Complete Operations Guide](../how-it-works/COMPLETE-GUIDE.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** December 2025
|
||||||
327
docs/operations/incident-response.md
Normal file
@ -0,0 +1,327 @@
|
|||||||
|
# Incident Response Runbook
|
||||||
|
|
||||||
|
This document defines procedures for responding to production incidents affecting the Customer Portal.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Severity Classification
|
||||||
|
|
||||||
|
| Severity          | Definition                             | Response Time | Examples                                                          |
| ----------------- | -------------------------------------- | ------------- | ----------------------------------------------------------------- |
| **P1 - Critical** | Complete service outage or data loss   | 15 minutes    | Portal unreachable, database corruption, security breach          |
| **P2 - High**     | Major feature unavailable              | 1 hour        | Order provisioning failing, payment processing down               |
| **P3 - Medium**   | Degraded performance or partial outage | 4 hours       | Slow response times, intermittent errors, single integration down |
| **P4 - Low**      | Minor issue, workaround available      | 24 hours      | UI glitches, non-critical feature bugs                            |
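
The response-time targets in the table translate directly into a deadline calculation. The sketch below is illustrative: `Severity`, `responseDeadline`, and `isResponseOverdue` are hypothetical names, and the targets are treated as wall-clock minutes from detection.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Response-time targets from the Severity Classification table, in minutes.
const RESPONSE_MINUTES: Record<Severity, number> = {
  P1: 15,
  P2: 60,
  P3: 4 * 60,
  P4: 24 * 60,
};

// Latest time by which a responder should have picked up the incident.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + RESPONSE_MINUTES[severity] * 60000);
}

// True once the response target has been missed.
function isResponseOverdue(severity: Severity, detectedAt: Date, now: Date): boolean {
  return now.getTime() > responseDeadline(severity, detectedAt).getTime();
}
```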
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Escalation Matrix
|
||||||
|
|
||||||
|
| Level  | Scope            | Contact             | When to Escalate                                     |
| ------ | ---------------- | ------------------- | ---------------------------------------------------- |
| **L1** | Initial Response | On-call engineer    | All incidents                                        |
| **L2** | Technical Lead   | Development lead    | P1/P2 not resolved in 30 minutes                     |
| **L3** | Management       | Engineering manager | P1 not resolved in 1 hour, customer impact           |
| **L4** | External         | Vendor support      | External system failure (Salesforce, WHMCS, Freebit) |
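
The time-based rows of the matrix can be sketched as a lookup. This is a simplification, not the team's tooling: `requiredEscalation` is a hypothetical name, only elapsed time is modelled (the "customer impact" condition for L3 still needs human judgment), and L4 is left out because it depends on which external system failed rather than on elapsed time.

```typescript
type IncidentSeverity = "P1" | "P2" | "P3" | "P4";
type EscalationLevel = "L1" | "L2" | "L3";

// Highest internal level that should be engaged after `minutesUnresolved`.
function requiredEscalation(
  severity: IncidentSeverity,
  minutesUnresolved: number,
): EscalationLevel {
  // L3: P1 not resolved in 1 hour.
  if (severity === "P1" && minutesUnresolved >= 60) return "L3";
  // L2: P1/P2 not resolved in 30 minutes.
  if ((severity === "P1" || severity === "P2") && minutesUnresolved >= 30) return "L2";
  // L1: all incidents start with the on-call engineer.
  return "L1";
}
```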
|
||||||
|
|
||||||
|
### On-Call Contacts
|
||||||
|
|
||||||
|
> **Note**: Update this section with actual contact information for your team.
|
||||||
|
|
||||||
|
| Role | Contact Method | Backup |
|
||||||
|
| ----------------- | ----------------- | ------- |
|
||||||
|
| Primary On-Call | [Slack/PagerDuty] | [Phone] |
|
||||||
|
| Secondary On-Call | [Slack/PagerDuty] | [Phone] |
|
||||||
|
| Engineering Lead | [Slack/Email] | [Phone] |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Incident Scenarios
|
||||||
|
|
||||||
|
### 1. Salesforce Platform Events Not Being Received
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- Orders stuck in "Pending Review" status
|
||||||
|
- No provisioning activity in logs
|
||||||
|
- `sf:pe:replay:*` Redis keys not updating
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check BFF logs for Platform Event subscription
|
||||||
|
grep "Platform Event" /var/log/bff/combined.log | tail -50
|
||||||
|
|
||||||
|
# Check Redis replay ID
|
||||||
|
redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"
|
||||||
|
|
||||||
|
# Verify Salesforce connectivity
|
||||||
|
curl -X GET http://localhost:4000/health
|
||||||
|
```
|
||||||
|
|
||||||
|
**Resolution:**
|
||||||
|
|
||||||
|
1. Verify `SF_EVENTS_ENABLED=true` in environment
|
||||||
|
2. Check Salesforce Connected App JWT authentication
|
||||||
|
3. Verify Platform Event permissions for integration user
|
||||||
|
4. Set `SF_EVENTS_REPLAY=ALL` temporarily to replay missed events
|
||||||
|
5. Restart BFF to re-establish subscription
|
||||||
|
|
||||||
|
**Escalation:** If unresolved in 30 minutes, contact Salesforce admin.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. WHMCS API Unavailable
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- Billing pages showing "service unavailable"
|
||||||
|
- Provisioning failing with WHMCS errors
|
||||||
|
- Payment method checks failing
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
|
||||||
|
```bash
# Check WHMCS connectivity from BFF (the WHMCS API rejects calls without credentials)
curl -X POST "$WHMCS_API_URL" \
  -d "identifier=$WHMCS_API_IDENTIFIER" \
  -d "secret=$WHMCS_API_SECRET" \
  -d "action=GetClients&limitnum=1&responsetype=json"

# Check BFF logs for WHMCS errors
grep "WHMCS" /var/log/bff/error.log | tail -20
```
|
||||||
|
|
||||||
|
**Resolution:**
|
||||||
|
|
||||||
|
1. Verify WHMCS server is accessible
|
||||||
|
2. Check WHMCS API credentials (`WHMCS_API_IDENTIFIER`, `WHMCS_API_SECRET`)
|
||||||
|
3. Check WHMCS server load and resource usage
|
||||||
|
4. Contact WHMCS hosting provider if server is down
|
||||||
|
|
||||||
|
**Escalation:** If WHMCS server is down, contact hosting provider.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Redis Connection Failures
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- Authentication failing
|
||||||
|
- Cache misses on every request
|
||||||
|
- Rate limiting not working
|
||||||
|
- SSE connections dropping
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Redis connectivity
|
||||||
|
redis-cli ping
|
||||||
|
|
||||||
|
# Check Redis memory usage
|
||||||
|
redis-cli INFO memory
|
||||||
|
|
||||||
|
# Check BFF health endpoint
|
||||||
|
curl http://localhost:4000/health | jq '.checks.cache'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Resolution:**
|
||||||
|
|
||||||
|
1. Verify Redis URL in environment (`REDIS_URL`)
|
||||||
|
2. Check Redis server memory usage and eviction policy
|
||||||
|
3. Restart Redis if memory is exhausted
|
||||||
|
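4. Clear stale keys if necessary: `redis-cli FLUSHDB` (caution: deletes every key in the selected database, not only cache entries; if BullMQ queues and `sf:pe:replay:*` keys share that database, they are removed too)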
|
||||||
|
|
||||||
|
**Impact Note:** Redis failure causes:
|
||||||
|
|
||||||
|
- Token blacklist checks to fail (security risk if `AUTH_BLACKLIST_FAIL_CLOSED=false`)
|
||||||
|
- All cached data to be re-fetched from source systems
|
||||||
|
- Rate limiting to stop working
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. Database Connection Issues
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- All API requests failing with 500 errors
|
||||||
|
- Health check shows database as "fail"
|
||||||
|
- Prisma connection errors in logs
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check database connectivity
|
||||||
|
psql $DATABASE_URL -c "SELECT 1"
|
||||||
|
|
||||||
|
# Check connection count
|
||||||
|
psql $DATABASE_URL -c "SELECT count(*) FROM pg_stat_activity"
|
||||||
|
|
||||||
|
# Check BFF health endpoint
|
||||||
|
curl http://localhost:4000/health | jq '.checks.database'
|
||||||
|
```
|
||||||
|
|
||||||
|
**Resolution:**
|
||||||
|
|
||||||
|
1. Verify PostgreSQL server is running
|
||||||
|
2. Check connection pool limits (Prisma connection_limit)
|
||||||
|
3. Look for long-running queries and kill if necessary
|
||||||
|
4. Restart database if unresponsive
|
||||||
|
|
||||||
|
**Escalation:** If database is corrupted, see [Database Operations Runbook](./database-operations.md).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 5. High Error Rate / Performance Degradation
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- Increased response times (>2s average)
|
||||||
|
- Error rate above 1%
|
||||||
|
- Customer complaints
|
||||||
|
|
||||||
|
**Diagnosis:**
|
||||||
|
|
||||||
|
```bash
# Check BFF process resource usage (-d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f 'node.*bff')"

# Check recent error logs
tail -100 /var/log/bff/error.log

# Check external API response times in logs
grep "duration" /var/log/bff/combined.log | tail -20
```
|
||||||
|
|
||||||
|
**Resolution:**
|
||||||
|
|
||||||
|
1. Identify which external API is slow (Salesforce, WHMCS, Freebit)
|
||||||
|
2. Check for traffic spikes or unusual patterns
|
||||||
|
3. Scale horizontally if CPU/memory constrained
|
||||||
|
4. Enable circuit breakers or increase timeouts temporarily
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 6. Security Incident
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
|
||||||
|
- Unusual login patterns
|
||||||
|
- Suspected unauthorized access
|
||||||
|
- Data exfiltration alerts
|
||||||
|
|
||||||
|
**Immediate Actions:**
|
||||||
|
|
||||||
|
1. **DO NOT** modify logs or evidence
|
||||||
|
2. Notify security team immediately
|
||||||
|
3. Consider isolating affected systems
|
||||||
|
4. Document all observations with timestamps
|
||||||
|
|
||||||
|
**Escalation:** P1 - Immediately escalate to engineering lead and management.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Incident Response Workflow
|
||||||
|
|
||||||
|
```
1. DETECT
   ├── Automated alert received
   ├── Customer report
   └── Internal discovery

2. ASSESS
   ├── Determine severity (P1-P4)
   ├── Identify affected systems
   └── Estimate customer impact

3. RESPOND
   ├── Follow relevant scenario playbook
   ├── Communicate status
   └── Escalate if needed

4. RESOLVE
   ├── Implement fix
   ├── Verify resolution
   └── Monitor for recurrence

5. REVIEW
   ├── Document timeline
   ├── Identify root cause
   └── Create action items
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Communication Templates
|
||||||
|
|
||||||
|
### Internal Status Update
|
||||||
|
|
||||||
|
```
|
||||||
|
INCIDENT UPDATE - [P1/P2/P3/P4] - [Brief Description]
|
||||||
|
|
||||||
|
Status: [Investigating/Identified/Monitoring/Resolved]
|
||||||
|
Impact: [Description of customer impact]
|
||||||
|
Started: [Time in UTC]
|
||||||
|
Last Update: [Time in UTC]
|
||||||
|
|
||||||
|
Current Actions:
|
||||||
|
- [Action 1]
|
||||||
|
- [Action 2]
|
||||||
|
|
||||||
|
Next Update: [Time]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Customer Communication (P1/P2 only)
|
||||||
|
|
||||||
|
```
|
||||||
|
We are currently experiencing issues with [service/feature].
|
||||||
|
|
||||||
|
What's happening: [Brief, non-technical description]
|
||||||
|
Impact: [What customers may experience]
|
||||||
|
Status: Our team is actively working to resolve this issue.
|
||||||
|
|
||||||
|
We will provide updates every [30 minutes/1 hour].
|
||||||
|
|
||||||
|
We apologize for any inconvenience.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Post-Incident Review
|
||||||
|
|
||||||
|
After every P1 or P2 incident, conduct a post-incident review within 3 business days.
|
||||||
|
|
||||||
|
### Review Template
|
||||||
|
|
||||||
|
1. **Incident Summary**
|
||||||
|
- What happened?
|
||||||
|
- When did it start/end?
|
||||||
|
- Who was affected?
|
||||||
|
|
||||||
|
2. **Timeline**
|
||||||
|
- Detection time
|
||||||
|
- Response time
|
||||||
|
- Resolution time
|
||||||
|
- Key milestones
|
||||||
|
|
||||||
|
3. **Root Cause Analysis**
|
||||||
|
- What was the direct cause?
|
||||||
|
- What were contributing factors?
|
||||||
|
- Why wasn't this prevented?
|
||||||
|
|
||||||
|
4. **Action Items**
|
||||||
|
- Immediate fixes applied
|
||||||
|
- Preventive measures needed
|
||||||
|
- Monitoring improvements
|
||||||
|
- Documentation updates
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documents
|
||||||
|
|
||||||
|
- [Provisioning Runbook](./provisioning-runbook.md)
|
||||||
|
- [Database Operations](./database-operations.md)
|
||||||
|
- [External Dependencies](./external-dependencies.md)
|
||||||
|
- [Queue Management](./queue-management.md)
|
||||||
|
- [Logging Guide](./logging.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated:** December 2025
|
||||||
@ -73,3 +73,79 @@ Portal does not auto-retry jobs. Network/5xx/timeouts will mark the Order Failed
|
|||||||
|
|
||||||
- `Activation_Error_Code__c` (e.g., 429, 503, ETIMEOUT)
|
||||||
- `Activation_Error_Message__c` (short reason)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Escalation Paths
|
||||||
|
|
||||||
|
| Condition                    | Escalate To           | Notes                                               |
| ---------------------------- | --------------------- | --------------------------------------------------- |
| Issue persists >30 minutes   | Salesforce admin      | Check Flow configuration, Platform Event publishing |
| WHMCS returns 5xx >5 times   | WHMCS hosting support | Server may be overloaded or down                    |
| Event replay doesn't recover | Development team      | May need code investigation                         |
| Product mapping errors       | Salesforce admin      | Add missing `WH_Product_ID__c` values               |
| Payment method issues        | Customer support      | Guide customer to add payment method in WHMCS       |
|
||||||
|
|
||||||
|
For general incident response procedures, see [Incident Response Runbook](./incident-response.md).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SLA Expectations
|
||||||
|
|
||||||
|
| Metric | Target | Warning | Critical |
|
||||||
|
| ----------------------- | ---------- | ----------- | ----------- |
|
||||||
|
| Provisioning completion | <5 seconds | >10 seconds | >30 seconds |
|
||||||
|
| Event processing delay | <1 second | >5 seconds | >30 seconds |
|
||||||
|
| Error rate | <1% | >1% | >5% |
|
||||||
|
|
||||||
|
### Performance Monitoring
|
||||||
|
|
||||||
|
- Monitor provisioning duration in logs (from "Platform Event enqueued" to "Activated")
|
||||||
|
- Track WHMCS API response times
|
||||||
|
- Alert on Salesforce update failures
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Manual Intervention Checklist
|
||||||
|
|
||||||
|
When automated retry fails, follow these steps:
|
||||||
|
|
||||||
|
1. **Check Salesforce Order**
|
||||||
|
- Open the Order in Salesforce
|
||||||
|
- Review `Activation_Status__c`, `Activation_Error_Code__c`, `Activation_Error_Message__c`
|
||||||
|
- Check if `WHMCS_Order_ID__c` was partially set
|
||||||
|
|
||||||
|
2. **Verify Customer Data**
|
||||||
|
- Confirm customer has valid WHMCS payment method via `GetPayMethods`
|
||||||
|
- Check `id_mappings` table for correct portal-WHMCS-SF linkage
|
||||||
|
|
||||||
|
3. **Validate Product Mappings**
|
||||||
|
- For each OrderItem, verify `Product2.WH_Product_ID__c` is set
|
||||||
|
- Verify `Product2.Billing_Cycle__c` matches WHMCS expectations
|
||||||
|
|
||||||
|
4. **Check BFF Logs**
|
||||||
|
- Search for the Salesforce Order ID in logs
|
||||||
|
- Identify the specific step that failed
|
||||||
|
- Look for external API errors (WHMCS, Salesforce)
|
||||||
|
|
||||||
|
5. **Manual Recovery**
|
||||||
|
- If WHMCS order was created but SF not updated:
|
||||||
|
- Manually update `WHMCS_Order_ID__c` and `Activation_Status__c` in Salesforce
|
||||||
|
- If WHMCS order was not created:
|
||||||
|
- Fix the root cause (payment method, mapping)
|
||||||
|
- Retry via Salesforce (set `Activation_Status__c = Activating`)
|
||||||
|
|
||||||
|
6. **Verify Resolution**
|
||||||
|
- Confirm Salesforce Order shows `Activated`
|
||||||
|
- Confirm WHMCS has the order and services
|
||||||
|
- Confirm customer can see their subscription in the portal
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documents
|
||||||
|
|
||||||
|
- [Incident Response](./incident-response.md)
|
||||||
|
- [Queue Management](./queue-management.md)
|
||||||
|
- [External Dependencies](./external-dependencies.md)
|
||||||
|
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
|
||||||
|
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)
|
||||||
|
|||||||
361
docs/operations/queue-management.md
Normal file
@ -0,0 +1,361 @@
|
|||||||
|
# Queue Management Runbook
|
||||||
|
|
||||||
|
This document covers monitoring and management of BullMQ job queues used by the Customer Portal BFF.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The BFF uses BullMQ (backed by Redis) for asynchronous job processing:
|
||||||
|
|
||||||
|
| Queue | Purpose | Processor Location |
|
||||||
|
| -------------------- | --------------------------------------------- | ---------------------------------------------------- |
|
||||||
|
| `order-provisioning` | Order fulfillment after CS approval | `apps/bff/src/modules/orders/queue/` |
|
||||||
|
| `sim-management` | Delayed SIM operations (network type changes) | `apps/bff/src/modules/subscriptions/sim-management/` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Queue Configuration
|
||||||
|
|
||||||
|
### Environment Variables
|
||||||
|
|
||||||
|
| Variable | Description | Default |
|
||||||
|
| ------------------------ | ---------------------------------- | -------- |
|
||||||
|
| `REDIS_URL` | Redis connection for queues | Required |
|
||||||
|
| `QUEUE_DEFAULT_ATTEMPTS` | Default retry attempts | 3 |
|
||||||
|
| `QUEUE_BACKOFF_DELAY` | Backoff delay between retries (ms) | 5000 |
|
||||||
|
|
||||||
|
### Queue Options
|
||||||
|
|
||||||
|
```typescript
// Default queue configuration
{
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 5000,
    },
    removeOnComplete: 100, // Keep last 100 completed jobs
    removeOnFail: 500, // Keep last 500 failed jobs
  }
}
```
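
One way the environment variables could feed these defaults is sketched below. How the BFF actually wires them is not shown in this document, so `QueueJobOptions`, `positiveInt`, and `buildDefaultJobOptions` are hypothetical names, and falling back to the documented default on an invalid value is an assumed policy.

```typescript
interface QueueJobOptions {
  attempts: number;
  backoff: { type: "exponential"; delay: number };
  removeOnComplete: number;
  removeOnFail: number;
}

// Parse a positive integer from an env value, falling back to the documented default.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical builder combining env overrides with the defaults shown above.
function buildDefaultJobOptions(
  env: Record<string, string | undefined>,
): QueueJobOptions {
  return {
    attempts: positiveInt(env["QUEUE_DEFAULT_ATTEMPTS"], 3),
    backoff: {
      type: "exponential",
      delay: positiveInt(env["QUEUE_BACKOFF_DELAY"], 5000),
    },
    removeOnComplete: 100, // keep last 100 completed jobs
    removeOnFail: 500, // keep last 500 failed jobs
  };
}
```

Passing the environment in as an argument (for example `buildDefaultJobOptions(process.env)`) keeps the function easy to exercise with different settings.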
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring
|
||||||
|
|
||||||
|
### Check Queue Status
|
||||||
|
|
||||||
|
```bash
# Connect to Redis and list queue keys (SCAN-based, so it does not block the server the way KEYS can)
redis-cli --scan --pattern "bull:*"

# Check specific queue length
redis-cli LLEN "bull:order-provisioning:wait"
redis-cli LLEN "bull:order-provisioning:active"
redis-cli ZCARD "bull:order-provisioning:delayed"
redis-cli ZCARD "bull:order-provisioning:failed"
```
|
||||||
|
|
||||||
|
### Queue Key Structure
|
||||||
|
|
||||||
|
| Key Pattern | Description |
|
||||||
|
| ------------------------ | ----------------------------------- |
|
||||||
|
| `bull:{queue}:wait` | Jobs waiting to be processed |
|
||||||
|
| `bull:{queue}:active` | Jobs currently being processed |
|
||||||
|
| `bull:{queue}:delayed` | Jobs scheduled for future execution |
|
||||||
|
| `bull:{queue}:completed` | Recently completed jobs |
|
||||||
|
| `bull:{queue}:failed` | Failed jobs |
|
||||||
|
|
||||||
|
### Health Metrics
|
||||||
|
|
||||||
|
| Metric           | Warning | Critical | Action                      |
| ---------------- | ------- | -------- | --------------------------- |
| Wait queue depth | >10     | >50      | Check processor status      |
| Failed job count | >5      | >20      | Investigate failures        |
| Processing time  | >30s    | >60s     | Check external dependencies |
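
The thresholds in the table can be applied mechanically once the counts are collected (for instance from the `LLEN`/`ZCARD` commands above). The sketch below is illustrative; `QUEUE_THRESHOLDS` and `classifyQueueMetric` are hypothetical names, and the comparisons are strict, matching the `>` signs in the table.

```typescript
type HealthStatus = "ok" | "warning" | "critical";

// Thresholds copied from the Health Metrics table.
const QUEUE_THRESHOLDS = {
  waitDepth: { warning: 10, critical: 50 },
  failedCount: { warning: 5, critical: 20 },
  processingSeconds: { warning: 30, critical: 60 },
};

type QueueMetric = keyof typeof QUEUE_THRESHOLDS;

// Hypothetical classifier: strictly greater than a threshold trips that level.
function classifyQueueMetric(metric: QueueMetric, value: number): HealthStatus {
  const { warning, critical } = QUEUE_THRESHOLDS[metric];
  if (value > critical) return "critical";
  if (value > warning) return "warning";
  return "ok";
}
```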
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Order Provisioning Queue
|
||||||
|
|
||||||
|
### Purpose
|
||||||
|
|
||||||
|
Processes orders after CS approval via Salesforce Platform Events.
|
||||||
|
|
||||||
|
### Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
Salesforce Platform Event (OrderProvisionRequested__e)
|
||||||
|
↓
|
||||||
|
Event Subscriber receives event
|
||||||
|
↓
|
||||||
|
Job enqueued to 'order-provisioning' queue
|
||||||
|
↓
|
||||||
|
Processor executes fulfillment workflow
|
||||||
|
↓
|
||||||
|
Order created in WHMCS + Salesforce updated
|
||||||
|
```
|
||||||
|
|
||||||
|
### Job Data Structure
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
{
|
||||||
|
sfOrderId: "8014x000000ABCDXYZ", // Salesforce Order ID
|
||||||
|
idempotencyKey: "8014x...-1703123456789",
|
||||||
|
eventPayload: { ... } // Original Platform Event data
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Failure Reasons
|
||||||
|
|
||||||
|
| Error | Cause | Resolution |
|
||||||
|
| ------------------------ | ------------------------------ | ------------------------------------------------ |
|
||||||
|
| `PAYMENT_METHOD_MISSING` | Customer has no payment method | Customer must add payment method in WHMCS |
|
||||||
|
| `ORDER_NOT_FOUND` | Salesforce Order doesn't exist | Check Order ID, verify not deleted |
|
||||||
|
| `MAPPING_ERROR` | Product mapping missing | Add `WH_Product_ID__c` to Product2 in Salesforce |
|
||||||
|
| `WHMCS_ERROR` | WHMCS API failure | Check WHMCS connectivity and logs |
|
||||||
|
|
||||||
|
### Retry Behavior
|
||||||
|
|
||||||
|
- **Attempts**: 3 total (1 initial + 2 retries)
|
||||||
|
- **Backoff**: Exponential (5s before the first retry, 10s before the second)
|
||||||
|
- **On Final Failure**: Salesforce Order updated with error details
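
To see how the attempt count and base delay interact, the retry schedule can be computed directly. This sketch assumes the usual doubling formula for exponential backoff (base delay times 2^(retry - 1)); `retryDelaysMs` is a hypothetical helper, not part of BullMQ.

```typescript
// Delay before each retry under a doubling schedule: baseDelayMs * 2^(retry - 1).
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  // The first attempt is not a retry, so N attempts produce N - 1 delays.
  const retries = Math.max(0, attempts - 1);
  return Array.from({ length: retries }, (_, i) => baseDelayMs * 2 ** i);
}
```

With 3 attempts and a 5000 ms base delay this gives two delays, 5000 ms and 10000 ms; a 20000 ms delay would only occur if a fourth attempt were configured.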

---

## SIM Management Queue

### Purpose

Handles delayed SIM operations, particularly network type changes that require a 30-minute gap.

### Job Types

| Job Type            | Delay      | Description                   |
| ------------------- | ---------- | ----------------------------- |
| `networkTypeChange` | 30 minutes | Change between 4G/5G networks |

### Job Data Structure

```typescript
{
  subscriptionId: 29951,
  simAccount: "08077052946",
  operation: "networkTypeChange",
  params: {
    networkType: "5G"
  },
  scheduledAt: "2024-01-15T10:30:00Z"
}
```
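
A delayed job needs a delay in milliseconds at enqueue time (BullMQ accepts this as the `delay` job option). The sketch below derives that value from `scheduledAt`; the helper name `delayUntilMs` is hypothetical, and how the BFF actually computes the delay is not stated in this document.

```typescript
// Milliseconds to wait from `nowMs` until `scheduledAt` (ISO 8601); never negative.
function delayUntilMs(scheduledAt: string, nowMs: number): number {
  const targetMs = Date.parse(scheduledAt);
  if (isNaN(targetMs)) {
    throw new Error(`Invalid scheduledAt: ${scheduledAt}`);
  }
  return Math.max(0, targetMs - nowMs);
}

// Example: a change scheduled 30 minutes ahead of the current time.
const exampleDelayMs = delayUntilMs("2024-01-15T10:30:00Z", Date.parse("2024-01-15T10:00:00Z"));
```

Clamping to zero means a job whose scheduled time has already passed runs immediately instead of being rejected.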

### Common Failure Reasons

| Error                 | Cause                            | Resolution                              |
| --------------------- | -------------------------------- | --------------------------------------- |
| `FREEBIT_AUTH_FAILED` | Freebit authentication error     | Check OEM credentials                   |
| `ACCOUNT_NOT_FOUND`   | SIM account not found in Freebit | Verify account identifier               |
| `OPERATION_CONFLICT`  | Another operation pending        | Wait for previous operation to complete |

---

## Failed Job Investigation

### View Failed Jobs

```bash
# List failed jobs (using Redis CLI)
redis-cli ZRANGE "bull:order-provisioning:failed" 0 -1

# Get job details (replace {job-id} with an ID returned by the command above)
redis-cli HGETALL "bull:order-provisioning:{job-id}"
```
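
`HGETALL` returns alternating field and value entries, which are hard to read for large jobs. The sketch below folds that flat list into a summary. The field names `data`, `failedReason`, and `attemptsMade` are assumed from BullMQ's usual job hash layout, and both function names are hypothetical.

```typescript
// Folds a flat [field, value, field, value, ...] list into an object.
function parseJobHash(flat: string[]): Record<string, string> {
  const hash: Record<string, string> = {};
  for (let i = 0; i + 1 < flat.length; i += 2) {
    hash[flat[i]] = flat[i + 1];
  }
  return hash;
}

// Extracts the parts most useful for triage; the job payload is stored as JSON text.
function summarizeFailedJob(flat: string[]) {
  const hash = parseJobHash(flat);
  return {
    data: hash.data ? JSON.parse(hash.data) : undefined,
    failedReason: hash.failedReason ?? "(none recorded)",
    attemptsMade: Number(hash.attemptsMade ?? 0),
  };
}
```

The flat list could come from `redis-cli HGETALL` output split on newlines, or from a Redis client that returns the raw reply.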

### Common Investigation Steps

1. **Check job data**: Identify the order/subscription involved
2. **Check error message**: Look for specific failure reason
3. **Check external system**: Verify Salesforce/WHMCS/Freebit status
4. **Check logs**: Search BFF logs for job ID or order ID
5. **Determine if retryable**: Some errors are permanent (missing mapping), others are transient (network timeout)
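
Step 5 can be captured as a small lookup so that triage is consistent between operators. The classification below is an assumption made for illustration: this document only states that a missing mapping is permanent and a network timeout is transient, so review the other entries before relying on them. `classifyFailure` is a hypothetical helper.

```typescript
type Triage = "retry" | "fix-first" | "investigate";

// Assumed classification of the documented error codes.
const ERROR_TRIAGE = new Map<string, Triage>([
  ["PAYMENT_METHOD_MISSING", "fix-first"], // customer action needed in WHMCS
  ["ORDER_NOT_FOUND", "fix-first"],        // verify the Salesforce Order first
  ["MAPPING_ERROR", "fix-first"],          // add the product mapping first
  ["WHMCS_ERROR", "retry"],                // often transient API failure
  ["FREEBIT_AUTH_FAILED", "fix-first"],    // credentials must be corrected
  ["ACCOUNT_NOT_FOUND", "fix-first"],      // account identifier must be verified
  ["OPERATION_CONFLICT", "retry"],         // retry once the pending operation completes
]);

// Unknown codes fall through to manual investigation.
function classifyFailure(errorCode: string): Triage {
  return ERROR_TRIAGE.get(errorCode) ?? "investigate";
}
```

Defaulting unknown codes to `"investigate"` keeps new failure modes from being retried blindly.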

### Log Search

```bash
# Search logs for specific order
grep "8014x000000ABCDXYZ" /var/log/bff/combined.log

# Search for queue processing errors
grep "provisioning" /var/log/bff/error.log | tail -50
```

---

## Manual Retry Procedures

### Retry a Single Failed Job

```typescript
// Using BullMQ API in Node.js
import { Queue } from "bullmq";

// Assumed connection settings; use the BFF's actual Redis configuration
const redisConnection = { host: "localhost", port: 6379 };

const queue = new Queue("order-provisioning", { connection: redisConnection });
const job = await queue.getJob("job-id");
if (!job) throw new Error("Job not found (it may already have been removed)");
await job.retry(); // moves the job from failed back to waiting
```

### Retry All Failed Jobs

```bash
# Move all failed jobs back to waiting
redis-cli ZRANGEBYSCORE "bull:order-provisioning:failed" -inf +inf | while read -r jobId; do
  redis-cli LPUSH "bull:order-provisioning:wait" "$jobId"
  redis-cli ZREM "bull:order-provisioning:failed" "$jobId"
done
```

> **Warning**: Only retry jobs after fixing the root cause. Retrying without fixing will cause the same failure.

### Retry via Salesforce (Recommended for Provisioning)

For order provisioning, the recommended retry method is through Salesforce:

1. Open the Order in Salesforce
2. Clear error fields (`Activation_Error__c`, `Activation_Error_DateTime__c`)
3. Set `Activation_Status__c` back to "Activating"
4. The Record-Triggered Flow will publish a new Platform Event

This approach ensures proper idempotency tracking and audit trail.

---

## Clearing Stuck Jobs

### Clear All Jobs from a Queue

> **Warning**: This removes all jobs including pending work. Use only in emergencies.

```bash
# Clear all queue data
# Note: this deletes only the queue's state lists and sets; the per-job
# hashes (bull:order-provisioning:<job-id>) are left behind in Redis
redis-cli DEL \
  "bull:order-provisioning:wait" \
  "bull:order-provisioning:active" \
  "bull:order-provisioning:delayed" \
  "bull:order-provisioning:completed" \
  "bull:order-provisioning:failed"
```

### Clear Old Completed/Failed Jobs

```bash
# Remove jobs older than 7 days from completed (date -d requires GNU date)
redis-cli ZREMRANGEBYSCORE "bull:order-provisioning:completed" -inf "$(date -d '7 days ago' +%s000)"

# Remove jobs older than 30 days from failed
redis-cli ZREMRANGEBYSCORE "bull:order-provisioning:failed" -inf "$(date -d '30 days ago' +%s000)"
```
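
The cleanup commands above depend on a cutoff score in epoch milliseconds, which `date -d` can only produce on GNU systems. As a portable alternative, the cutoff and the command arguments can be computed in code. This is a sketch only; `cutoffScoreMs` and `cleanupArgs` are hypothetical helpers.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Epoch-millisecond score for "N days before now".
function cutoffScoreMs(daysAgo: number, nowMs: number = Date.now()): number {
  return nowMs - daysAgo * DAY_MS;
}

// Arguments for ZREMRANGEBYSCORE that remove entries finished before the cutoff.
function cleanupArgs(queue: string, state: "completed" | "failed", daysAgo: number, nowMs?: number): string[] {
  const key = `bull:${queue}:${state}`;
  return ["ZREMRANGEBYSCORE", key, "-inf", String(cutoffScoreMs(daysAgo, nowMs))];
}
```

The returned array can be passed to `redis-cli` or to a Redis client's raw command call, keeping the retention windows (7 and 30 days) in one place.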

---

## Queue Backlog Handling

### Symptoms of Backlog

- Wait queue depth increasing
- Jobs not being processed
- Customer orders stuck in "Activating" status

### Diagnosis

1. **Check processor is running**

   ```bash
   grep "BullMQ" /var/log/bff/combined.log | tail -20
   ```

2. **Check Redis connectivity**

   ```bash
   redis-cli PING
   ```

3. **Check for blocked jobs**

   ```bash
   redis-cli LLEN "bull:order-provisioning:active"
   # If active > 0 for extended time, jobs may be stuck
   ```

4. **Check external dependencies**
   - Salesforce API
   - WHMCS API

### Resolution

1. **Restart BFF** to reconnect queue workers
2. **Clear stuck active jobs** if processor crashed mid-job
3. **Scale horizontally** if queue depth is due to high volume
4. **Fix root cause** if jobs are failing repeatedly

---

## Alerting Configuration

### Recommended Alerts

| Alert                  | Condition                                        | Severity |
| ---------------------- | ------------------------------------------------ | -------- |
| Queue Backlog          | Wait queue > 10 for > 5 minutes                  | Warning  |
| Queue Backlog Critical | Wait queue > 50                                  | Critical |
| Failed Jobs Spike      | > 5 failures in 15 minutes                       | Warning  |
| Processor Down         | No job processed in 10 minutes with jobs waiting | Critical |
| Job Timeout            | Job active for > 5 minutes                       | Warning  |

### Monitoring Queries

```bash
# Check queue depths (for monitoring script)
WAIT=$(redis-cli LLEN "bull:order-provisioning:wait")
ACTIVE=$(redis-cli LLEN "bull:order-provisioning:active")
FAILED=$(redis-cli ZCARD "bull:order-provisioning:failed")

echo "Wait: $WAIT, Active: $ACTIVE, Failed: $FAILED"
```
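
The thresholds in the Recommended Alerts table can be expressed as a pure function over a metrics snapshot, which makes them easy to test before wiring them into a monitoring system. This is a sketch under assumptions: the `QueueSnapshot` fields and `evaluateAlerts` are hypothetical, and collecting the time-based values (for example, how long the wait queue has been above 10) is left to the monitoring script.

```typescript
interface QueueSnapshot {
  waiting: number;                   // current wait queue depth
  minutesWaitingAbove10: number;     // how long depth has stayed above 10
  failuresLast15Min: number;         // failed jobs in the last 15 minutes
  minutesSinceLastProcessed: number; // time since any job finished
  oldestActiveMinutes: number;       // age of the longest-running active job
}

interface Alert {
  name: string;
  severity: "Warning" | "Critical";
}

// Applies each row of the alert table to the snapshot.
function evaluateAlerts(s: QueueSnapshot): Alert[] {
  const alerts: Alert[] = [];
  if (s.waiting > 10 && s.minutesWaitingAbove10 > 5) {
    alerts.push({ name: "Queue Backlog", severity: "Warning" });
  }
  if (s.waiting > 50) {
    alerts.push({ name: "Queue Backlog Critical", severity: "Critical" });
  }
  if (s.failuresLast15Min > 5) {
    alerts.push({ name: "Failed Jobs Spike", severity: "Warning" });
  }
  if (s.waiting > 0 && s.minutesSinceLastProcessed >= 10) {
    alerts.push({ name: "Processor Down", severity: "Critical" });
  }
  if (s.oldestActiveMinutes > 5) {
    alerts.push({ name: "Job Timeout", severity: "Warning" });
  }
  return alerts;
}
```

Keeping the rules in one function means a threshold change is made in a single place and can be covered by a unit test.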

---

## Best Practices

### Job Design

- Include sufficient context in job data for debugging
- Use idempotency keys to prevent duplicate processing
- Keep job payloads small (< 10KB)
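
The payload size guideline can be enforced with a check before enqueueing. The sketch below measures the UTF-8 size of the JSON-serialized payload, assuming that "10KB" means 10 × 1024 bytes; `utf8ByteLength` and `isPayloadWithinLimit` are hypothetical helpers.

```typescript
// Counts UTF-8 bytes without relying on Node-specific or DOM APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte code point
      i++;        // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// True if the serialized payload fits within the limit (default 10 KiB).
function isPayloadWithinLimit(payload: unknown, limitBytes: number = 10 * 1024): boolean {
  return utf8ByteLength(JSON.stringify(payload)) < limitBytes;
}
```

Large data such as full API responses is better stored elsewhere and referenced by ID in the job payload.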

### Error Handling

- Distinguish between retryable and non-retryable errors
- Log sufficient context before throwing
- Update external systems with error status on final failure

### Monitoring

- Set up alerts for queue depth and failure rate
- Monitor job processing duration
- Track success/failure ratios over time

---

## Related Documents

- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [External Dependencies](./external-dependencies.md)
- [SIM State Machine](../integrations/sim/state-machine.md)

---

**Last Updated:** December 2025