Enhance Documentation Structure and Update Operational Runbooks

- Added a new section for operational runbooks in README.md, detailing procedures for incident response, database operations, and queue management.
- Updated the documentation structure in STRUCTURE.md to reflect the new organization of guides and resources.
- Removed the deprecated disabled-modules.md file to streamline documentation.
- Enhanced the _archive/README.md with historical notes on documentation alignment and corrections made in December 2025.
- Updated various references in the documentation to reflect the new paths and services in the integrations directory.
barsa 2025-12-23 15:55:58 +09:00
parent 7c929eb4dc
commit 72d0b66be7
17 changed files with 2007 additions and 153 deletions


@ -138,13 +138,24 @@ Feature guides explaining how the portal functions:
## 🛠️ Operations ## 🛠️ Operations
| Document | Description | ### Runbooks
| ------------------------------------------------------------------ | ----------------------------- |
| [Logging](./operations/logging.md) | Centralized logging system | | Document | Description |
| [Security Monitoring](./operations/security-monitoring.md) | Security monitoring setup | | -------------------------------------------------------------- | ----------------------------- |
| [Provisioning Runbook](./operations/provisioning-runbook.md) | Provisioning procedures | | [Incident Response](./operations/incident-response.md) | Emergency procedures |
| [Subscription Management](./operations/subscription-management.md) | Service management | | [Provisioning Runbook](./operations/provisioning-runbook.md) | Order fulfillment procedures |
| [Disabled Modules](./operations/disabled-modules.md) | Temporarily disabled features | | [Database Operations](./operations/database-operations.md) | Backup, recovery, maintenance |
| [External Dependencies](./operations/external-dependencies.md) | Integration health checks |
| [Queue Management](./operations/queue-management.md) | BullMQ job monitoring |
| [External Processes](./operations/external-processes.md) | Team handoffs and workflows |
### System Operations
| Document | Description |
| ------------------------------------------------------------------ | -------------------------- |
| [Logging](./operations/logging.md) | Centralized logging system |
| [Security Monitoring](./operations/security-monitoring.md) | Security monitoring setup |
| [Subscription Management](./operations/subscription-management.md) | Service management |
--- ---
@ -178,11 +189,13 @@ Historical documents kept for reference:
2. [Domain Types](./development/domain/types.md) 2. [Domain Types](./development/domain/types.md)
3. [Performance](./development/portal/performance.md) 3. [Performance](./development/portal/performance.md)
### DevOps ### DevOps / Operations
1. [Deployment](./getting-started/deployment.md) 1. [Deployment](./getting-started/deployment.md)
2. [Logging](./operations/logging.md) 2. [Incident Response](./operations/incident-response.md)
3. [Provisioning Runbook](./operations/provisioning-runbook.md) 3. [Provisioning Runbook](./operations/provisioning-runbook.md)
4. [Database Operations](./operations/database-operations.md)
5. [External Dependencies](./operations/external-dependencies.md)
--- ---


@ -114,13 +114,18 @@ Coding standards
│ └── plesk-deploy.sh # ✅ Plesk deployment script │ └── plesk-deploy.sh # ✅ Plesk deployment script
├── 📚 docs/ # Documentation ├── 📚 docs/ # Documentation
│ ├── README.md # ✅ Comprehensive guide │ ├── README.md # ✅ Documentation index
│ ├── GETTING_STARTED.md # ✅ Quick start guide │ ├── STRUCTURE.md # ✅ This file
│ ├── RUN.md # ✅ Development workflow │ ├── getting-started/ # Setup and running guides
│ ├── DEPLOY.md # ✅ Production deployment │ │ ├── setup.md # Initial project setup
│ ├── LOGGING.md # ✅ Logging configuration │ │ ├── running.md # Local development
│ ├── SECURITY.md # ✅ Security features and best practices │ │ └── deployment.md # Production deployment
│ └── STRUCTURE.md # ✅ This file │ ├── architecture/ # System design documents
│ ├── how-it-works/ # Feature guides
│ ├── integrations/ # External system integration
│ ├── development/ # Development guides
│ ├── operations/ # Operational runbooks
│ └── _archive/ # Historical documents
├── 📦 packages/ # Shared packages ├── 📦 packages/ # Shared packages
│ └── domain/ # Domain TypeScript utilities │ └── domain/ # Domain TypeScript utilities
@ -135,11 +140,12 @@ Coding standards
### **Environment Template Approach** ### **Environment Template Approach**
- **`.env.dev.example`** - Development-optimized template Environment templates are located in the `env/` folder:
- **`.env.production.example`** - Production-optimized template
- **`.env.example`** - Basic template for custom setups - **`env/dev.env.sample`** - Development environment template
- **`.env`** - Your actual configuration (gitignored) - **`env/portal-backend.env.sample`** - Backend-specific variables
- **Environment-specific defaults** - Appropriate values per environment - **`env/portal-frontend.env.sample`** - Frontend-specific variables
- **`.env`** - Your actual configuration (gitignored, at project root)
### **Environment Variables** ### **Environment Variables**
@ -206,13 +212,16 @@ pnpm prod:backup # Database backup
### **Essential Guides** ### **Essential Guides**
- **`README.md`** - Project overview and architecture Documentation is organized in subdirectories:
- **`GETTING_STARTED.md`** - Quick setup guide
- **`RUN.md`** - Development workflow - **`docs/README.md`** - Documentation index and navigation
- **`DEPLOY.md`** - Production deployment - **`docs/STRUCTURE.md`** - This file (project structure)
- **`LOGGING.md`** - Logging configuration - **`docs/getting-started/`** - Setup, running, and deployment guides
- **`SECURITY.md`** - Security features and best practices - **`docs/architecture/`** - System design and architecture
- **`STRUCTURE.md`** - This file - **`docs/how-it-works/`** - Feature guides and workflows
- **`docs/integrations/`** - Salesforce, WHMCS, SIM integration
- **`docs/development/`** - BFF, Portal, Auth development guides
- **`docs/operations/`** - Runbooks and operational procedures
### **No Redundancy** ### **No Redundancy**


@ -27,4 +27,20 @@ Point-in-time code reviews and analysis documents:
--- ---
## Historical Notes
### December 2025 Documentation Alignment
A comprehensive documentation review was performed in December 2025 to align documentation with the actual codebase. The following corrections were made:
1. **Removed fictional package descriptions** from `system-overview.md` that referenced non-existent `packages/contracts`, `packages/schemas`, and `packages/integrations` packages
2. **Deleted `disabled-modules.md`** which referenced non-existent "Cases" and "Jobs" modules
3. **Fixed path references** from `vendors/whmcs` to `integrations/whmcs` throughout documentation
4. **Updated module lists** to reflect actual BFF modules
5. **Created new operational runbooks**: incident-response, database-operations, external-dependencies, queue-management, external-processes
Documents in this archive folder predate these corrections and may contain outdated references.
---
**Note:** These documents may contain outdated information. For current system behavior, refer to the active documentation in the parent `docs/` directory. **Note:** These documents may contain outdated information. For current system behavior, refer to the active documentation in the parent `docs/` directory.


@ -8,7 +8,7 @@ I've completely restructured the Salesforce-to-Portal order provisioning system
### **1. Dedicated WHMCS Order Service** ### **1. Dedicated WHMCS Order Service**
**File**: `/apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts` **File**: `/apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts`
- **Purpose**: Handles all WHMCS order operations (AddOrder, AcceptOrder) - **Purpose**: Handles all WHMCS order operations (AddOrder, AcceptOrder)
- **Features**: - **Features**:


@ -8,40 +8,42 @@ I've restructured the provisioning system to **match the exact same clean modula
### **Order Creation (Existing) ↔ Order Provisioning (New)** ### **Order Creation (Existing) ↔ Order Provisioning (New)**
| **Order Creation** | **Order Provisioning** | **Purpose** | | **Order Creation** | **Order Fulfillment** | **Purpose** |
| ------------------- | -------------------------- | ----------------------------------- | | ------------------- | ------------------------------ | ----------------------------------- |
| `OrderValidator` | `ProvisioningValidator` | Validates requests & business rules | | `OrderValidator` | `OrderFulfillmentValidator` | Validates requests & business rules |
| `OrderBuilder` | `WhmcsOrderMapper` | Transforms/maps data structures | | `OrderBuilder` | `OrderBuilder` | Transforms/maps data structures |
| `OrderItemBuilder` | _(integrated in mapper)_ | Handles item-level processing | | `OrderItemBuilder` | `OrderItemBuilder` | Handles item-level processing |
| `OrderOrchestrator` | `ProvisioningOrchestrator` | Coordinates the complete workflow | | `OrderOrchestrator` | `OrderFulfillmentOrchestrator` | Coordinates the complete workflow |
| `OrdersController` | `PlatformEventsSubscriber` | Event handling (no inbound HTTP) | | `OrdersController` | `PlatformEventsSubscriber` | Event handling (no inbound HTTP) |
## 📁 **Clean File Structure** ## 📁 **Clean File Structure**
``` ```
apps/bff/src/orders/ apps/bff/src/modules/orders/
├── controllers/ ├── controllers/
│ └── orders.controller.ts # Customer-facing operations │ └── orders.controller.ts # Customer-facing operations
├── queue/ ├── queue/
│ ├── provisioning.queue.ts # Enqueue provisioning jobs │ ├── provisioning.queue.ts # Enqueue provisioning jobs
│ └── provisioning.processor.ts # Worker processes jobs │ └── provisioning.processor.ts # Worker processes jobs
├── services/ ├── services/
│ # Order Creation (existing) │ # Order Creation
│ ├── order-validator.service.ts # Request & business validation │ ├── order-validator.service.ts # Request & business validation
│ ├── order-builder.service.ts # Order header construction │ ├── order-builder.service.ts # Order header construction
│ ├── order-item-builder.service.ts # Order items construction │ ├── order-item-builder.service.ts # Order items construction
│ ├── order-orchestrator.service.ts # Creation workflow coordination │ ├── order-orchestrator.service.ts # Creation workflow coordination
│ │ │ │
│ # Order Provisioning (new - matching structure) │ # Order Fulfillment/Provisioning
│ ├── provisioning-validator.service.ts # Provisioning validation │ ├── order-fulfillment-validator.service.ts # Provisioning validation
│ ├── whmcs-order-mapper.service.ts # SF → WHMCS mapping │ ├── order-fulfillment-orchestrator.service.ts # Provisioning workflow coordination
│ ├── provisioning-orchestrator.service.ts # Provisioning workflow coordination │ ├── order-fulfillment-error.service.ts # Error handling
│ └── order-provisioning.service.ts # Main provisioning interface │ ├── sim-fulfillment.service.ts # SIM-specific fulfillment
│ ├── payment-validator.service.ts # Payment method validation
│ └── checkout.service.ts # Checkout flow coordination
``` ```
## 🎯 **Modular Provisioning Services** ## 🎯 **Modular Provisioning Services**
### **1. ProvisioningValidator** ### **1. OrderFulfillmentValidator**
**Purpose**: Validates all provisioning prerequisites **Purpose**: Validates all provisioning prerequisites
@ -51,7 +53,7 @@ apps/bff/src/orders/
- ✅ Idempotency checking - ✅ Idempotency checking
- ✅ Request payload validation - ✅ Request payload validation
### **2. WhmcsOrderMapper** ### **2. OrderBuilder / OrderItemBuilder**
**Purpose**: Maps Salesforce OrderItems → WHMCS format **Purpose**: Maps Salesforce OrderItems → WHMCS format
@ -61,7 +63,7 @@ apps/bff/src/orders/
- ✅ Custom fields mapping - ✅ Custom fields mapping
- ✅ Order notes generation with SF tracking - ✅ Order notes generation with SF tracking
### **3. ProvisioningOrchestrator** ### **3. OrderFulfillmentOrchestrator**
**Purpose**: Coordinates complete provisioning workflow **Purpose**: Coordinates complete provisioning workflow


@ -11,9 +11,7 @@ apps/
portal/ # Next.js frontend portal/ # Next.js frontend
bff/ # NestJS Backend-for-Frontend bff/ # NestJS Backend-for-Frontend
packages/ packages/
domain/ # Pure domain/types/utils (isomorphic) domain/ # Pure domain types, validation schemas, and utilities (isomorphic)
logging/ # Centralized logging utilities
validation/ # Shared validation schemas
``` ```
## 🎯 **Architecture Principles** ## 🎯 **Architecture Principles**
@ -67,16 +65,26 @@ src/
``` ```
src/ src/
modules/ # Feature-aligned modules modules/ # Feature-aligned modules
auth/ # Authentication auth/ # Authentication and authorization
billing/ # Invoice and payment management users/ # User management
id-mappings/ # Portal-WHMCS-Salesforce ID mappings
catalog/ # Product catalog catalog/ # Product catalog
orders/ # Order processing orders/ # Order creation and fulfillment
subscriptions/ # Service management invoices/ # Invoice management
subscriptions/ # Service and subscription management
currency/ # Currency handling
support/ # Support case management
realtime/ # Server-Sent Events API
verification/ # ID verification
notifications/ # User notifications
health/ # Health check endpoints
core/ # Core services and utilities core/ # Core services and utilities
infra/ # Infrastructure (database, cache, queue, email)
integrations/ # External service integrations integrations/ # External service integrations
salesforce/ # Salesforce CRM integration salesforce/ # Salesforce CRM integration
whmcs/ # WHMCS billing integration whmcs/ # WHMCS billing integration
common/ # Nest providers/interceptors/guards freebit/ # Freebit SIM provider integration
sftp/ # SFTP file transfer
main.ts # Application entry point main.ts # Application entry point
``` ```
@ -89,60 +97,67 @@ src/
## 📦 **Shared Packages** ## 📦 **Shared Packages**
### **Layered Type System Architecture** ### **Domain Package (`packages/domain/`)**
The codebase follows a strict layering pattern to ensure single source of truth for all types and prevent drift: The domain package is the single source of truth for shared types, validation schemas, and utilities across both the BFF and Portal applications.
``` ```
@customer-portal/contracts (Pure TypeScript types) packages/domain/
├── auth/ # Authentication types and validation
@customer-portal/schemas (Runtime validation with Zod) ├── billing/ # Invoice and payment types
├── catalog/ # Product catalog types
@customer-portal/integrations (Mappers for external APIs) ├── checkout/ # Checkout flow types
├── common/ # Shared utilities and base types
Applications (BFF, Portal) ├── customer/ # Customer profile types
├── dashboard/ # Dashboard data types
├── mappings/ # ID mapping types (Portal-WHMCS-SF)
├── notifications/ # Notification types
├── opportunity/ # Salesforce opportunity types
├── orders/ # Order types and Salesforce mappings
├── payments/ # Payment method types
├── providers/ # Provider-specific type definitions
├── realtime/ # SSE event types
├── salesforce/ # Salesforce API types
├── sim/ # SIM lifecycle and Freebit types
├── subscriptions/ # Subscription types
├── support/ # Support case types
├── toolkit/ # Utility functions
└── index.ts # Public exports
``` ```
#### **1. Contracts Package (`packages/contracts/`)** #### **Key Principles**
- **Purpose**: Pure TypeScript interface definitions - single source of truth - **Framework-agnostic**: No NestJS or React dependencies
- **Contents**: Cross-layer contracts for billing, subscriptions, payments, SIM, orders - **Isomorphic**: Works in both Node.js and browser environments
- **Exports**: Organized by domain (e.g., `@customer-portal/contracts/billing`) - **Zod-first validation**: Schemas defined with Zod for runtime validation
- **Rule**: ZERO runtime dependencies, only pure types - **Provider mappers**: Transform external API responses to domain types
#### **2. Schemas Package (`packages/schemas/`)** #### **Usage**
- **Purpose**: Runtime validation schemas using Zod Import via `@customer-portal/domain`:
- **Contents**: Matching Zod validators for each contract + integration-specific payload schemas
- **Exports**: Organized by domain and integration provider
- **Usage**: Validate external API responses, request payloads, and user input
#### **3. Integration Packages (`packages/integrations/`)** ```typescript
import { Invoice, SIM_LIFECYCLE_STAGE, OrderStatus } from "@customer-portal/domain";
import { invoiceSchema, orderSchema } from "@customer-portal/domain/validation";
```
- **Purpose**: Transform raw provider data into shared contracts #### **Integration with BFF**
- **Structure**:
- `packages/integrations/whmcs/` - WHMCS billing integration
- `packages/integrations/freebit/` - Freebit SIM provider integration
- **Contents**: Mappers, utilities, and helper functions
- **Rule**: Must use `@customer-portal/schemas` for validation at boundaries
#### **4. Application Layers** The BFF integration layer (`apps/bff/src/integrations/`) uses domain mappers to transform raw provider data:
- **BFF** (`apps/bff/`): Import from contracts/schemas, never define duplicate interfaces ```
- **Portal** (`apps/portal/`): Import from contracts/schemas, use shared types everywhere External API → Raw Response → Domain Mapper → Domain Type → Use Everywhere
- **Rule**: Applications only consume, never define domain types ```
### **Legacy: Domain Package (Deprecated)** This ensures a single transformation point and consistent types across the application.
- **Status**: Being phased out in favor of contracts + schemas ### **Logging**
- **Migration**: Re-exports now point to contracts package for backward compatibility
- **Rule**: New code should import from `@customer-portal/contracts` or `@customer-portal/schemas`
### **Logging Package** Centralized logging is implemented in the BFF using `nestjs-pino`:
- **Purpose**: Centralized structured logging - **Structured JSON logging** for production
- **Features**: Pino-based logging with correlation IDs - **Correlation IDs** for request tracing
- **Security**: Automatic PII redaction [[memory:6689308]] - **Automatic PII redaction** for security
## 🔗 **Integration Architecture** ## 🔗 **Integration Architecture**


@ -41,10 +41,10 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
- Endpoints: `GET /invoices`, `GET /invoices/:id`, `GET /invoices/:id/subscriptions`, `POST /invoices/:id/sso-link`, `POST /invoices/:id/payment-link` (apps/bff/src/invoices/invoices.controller.ts:1). - Endpoints: `GET /invoices`, `GET /invoices/:id`, `GET /invoices/:id/subscriptions`, `POST /invoices/:id/sso-link`, `POST /invoices/:id/payment-link` (apps/bff/src/invoices/invoices.controller.ts:1).
- Service flow: resolve mapping → fetch from WHMCS via `WhmcsService` → transform/cache → return (apps/bff/src/invoices/invoices.service.ts:24). - Service flow: resolve mapping → fetch from WHMCS via `WhmcsService` → transform/cache → return (apps/bff/src/invoices/invoices.service.ts:24).
- List/paginate via WHMCS GetInvoices; details enriched with line items and `serviceId` links (apps/bff/src/vendors/whmcs/services/whmcs-invoice.service.ts:1). - List/paginate via WHMCS GetInvoices; details enriched with line items and `serviceId` links (apps/bff/src/integrations/whmcs/services/whmcs-invoice.service.ts:1).
- Subscriptions listed via WHMCS GetClientsProducts; transformed and cached (apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1). - Subscriptions listed via WHMCS GetClientsProducts; transformed and cached (apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1).
- Payment methods/gateways via WHMCS; cached in Redis; also used for gating order creation/provisioning (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:1). - Payment methods/gateways via WHMCS; cached in Redis; also used for gating order creation/provisioning (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:1).
- SSO links: invoice view/download/pay and payment-page with preselected method/gateway (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168). - SSO links: invoice view/download/pay and payment-page with preselected method/gateway (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168).
## Orders — Creation (Portal ➝ Salesforce) ## Orders — Creation (Portal ➝ Salesforce)
@ -74,23 +74,23 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
- Validate request: not already provisioned (checks `WHMCS_Order_ID__c`), ensure client has payment method; resolve mapping (apps/bff/src/orders/services/order-fulfillment-validator.service.ts:23) - Validate request: not already provisioned (checks `WHMCS_Order_ID__c`), ensure client has payment method; resolve mapping (apps/bff/src/orders/services/order-fulfillment-validator.service.ts:23)
- Set SF activation status to `Activating` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:98) - Set SF activation status to `Activating` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:98)
- Load SF Order details + OrderItems, map each to WHMCS items using the Product2 mapping (`WH_Product_ID__c`) and billing cycle (apps/bff/src/orders/services/order-whmcs-mapper.service.ts:1) - Load SF Order details + OrderItems, map each to WHMCS items using the Product2 mapping (`WH_Product_ID__c`) and billing cycle (apps/bff/src/orders/services/order-whmcs-mapper.service.ts:1)
- Create WHMCS order (AddOrder) with Stripe as payment method; optional promo code and tracking notes (apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts:20) - Create WHMCS order (AddOrder) with Stripe as payment method; optional promo code and tracking notes (apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts:20)
- Accept/provision order (AcceptOrder), capture service IDs and invoice ID returned (apps/bff/src/vendors/whmcs/services/whmcs-order.service.ts:60) - Accept/provision order (AcceptOrder), capture service IDs and invoice ID returned (apps/bff/src/integrations/whmcs/services/whmcs-order.service.ts:60)
- Update SF: `Status=Completed`, `Activation_Status__c=Activated`, and write back `WHMCS_Order_ID__c` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:117) - Update SF: `Status=Completed`, `Activation_Status__c=Activated`, and write back `WHMCS_Order_ID__c` (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:117)
- Error handling: On failure, set `Status=Pending Review`, `Activation_Status__c=Failed`, and write concise error code/message for operator triage (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:146). - Error handling: On failure, set `Status=Pending Review`, `Activation_Status__c=Failed`, and write concise error code/message for operator triage (apps/bff/src/orders/services/order-fulfillment-orchestrator.service.ts:146).
## Subscriptions (Shown in Portal) ## Subscriptions (Shown in Portal)
- Data comes from WHMCS products/services via `GetClientsProducts` and is transformed into a standard Subscription list (apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1). - Data comes from WHMCS products/services via `GetClientsProducts` and is transformed into a standard Subscription list (apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1).
- Cached per user; supports status filtering; invoice items link to `serviceId` to show related subscriptions (apps/bff/src/vendors/whmcs/transformers/whmcs-data.transformer.ts:35). - Cached per user; supports status filtering; invoice items link to `serviceId` to show related subscriptions (apps/bff/src/integrations/whmcs/transformers/whmcs-data.transformer.ts:35).
## Payments & SSO ## Payments & SSO
- Payment methods summary drives UI gating and provisioning validation (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:44). - Payment methods summary drives UI gating and provisioning validation (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:44).
- SSO flows - SSO flows
- General WHMCS SSO (dashboard/settings) via `CreateSsoToken` - General WHMCS SSO (dashboard/settings) via `CreateSsoToken`
- Invoice view/download/pay SSO (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168) - Invoice view/download/pay SSO (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168)
- Payment link with preselected saved method or gateway (apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:168) - Payment link with preselected saved method or gateway (apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:168)
## Caching & Performance ## Caching & Performance
@ -139,8 +139,8 @@ This document explains how the portal integrates Salesforce (catalog, orders, pr
- Salesforce events subscriber: apps/bff/src/vendors/salesforce/events/pubsub.subscriber.ts:58 - Salesforce events subscriber: apps/bff/src/vendors/salesforce/events/pubsub.subscriber.ts:58
- Provisioning queue processor: apps/bff/src/orders/queue/provisioning.processor.ts:26 - Provisioning queue processor: apps/bff/src/orders/queue/provisioning.processor.ts:26
- Invoices service: apps/bff/src/invoices/invoices.service.ts:24 - Invoices service: apps/bff/src/invoices/invoices.service.ts:24
- Subscriptions service: apps/bff/src/vendors/whmcs/services/whmcs-subscription.service.ts:1 - Subscriptions service: apps/bff/src/integrations/whmcs/services/whmcs-subscription.service.ts:1
- Payment/SSO service: apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts:1 - Payment/SSO service: apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts:1
--- ---


@ -6,21 +6,27 @@ We provide **environment-specific templates** for easy setup:
### 📁 **Available Templates:** ### 📁 **Available Templates:**
- 🔸 **`.env.example`** - Standard environment template for all environments Located in the `env/` folder:
- 🔸 **Environment-specific values** - Adjust settings based on development vs production needs
- 🔸 **`env/dev.env.sample`** - Development environment template
- 🔸 **`env/portal-backend.env.sample`** - Backend-specific variables reference
- 🔸 **`env/portal-frontend.env.sample`** - Frontend-specific variables reference
### 🎯 **Benefits:** ### 🎯 **Benefits:**
- ✅ **Environment-specific**: Clear separation of dev vs prod - ✅ **Environment-specific**: Clear separation of dev vs prod
- ✅ **Secure defaults**: Production uses strong security settings - ✅ **Secure defaults**: Production uses strong security settings
- ✅ **Easy setup**: Copy the right template for your needs - ✅ **Easy setup**: Copy the template for your needs
- ✅ **No confusion**: Clear instructions for each environment - ✅ **No confusion**: Clear instructions for each environment
## 🔧 **Environment File Structure** ## 🔧 **Environment File Structure**
``` ```
📦 Customer Portal 📦 Customer Portal
├── .env.example # 🔸 Environment template ├── env/
│ ├── dev.env.sample # 🔸 Development template
│ ├── portal-backend.env.sample # Backend variables
│ └── portal-frontend.env.sample # Frontend variables
├── .env # ✅ Your actual config (gitignored) ├── .env # ✅ Your actual config (gitignored)
├── apps/ ├── apps/
│ ├── bff/ # 🚀 Backend reads from root .env │ ├── bff/ # 🚀 Backend reads from root .env
@ -42,7 +48,7 @@ We provide **environment-specific templates** for easy setup:
```bash ```bash
# Copy development environment template # Copy development environment template
cp .env.dev.example .env cp env/dev.env.sample .env
# Edit with your dev values (most defaults work!) # Edit with your dev values (most defaults work!)
nano .env # Configure for local development nano .env # Configure for local development
@ -51,8 +57,8 @@ nano .env # Configure for local development
**🔸 For Production:** **🔸 For Production:**
```bash ```bash
# Copy production environment template # Start from the development template and adjust for production
cp .env.production.example .env cp env/dev.env.sample .env
# Edit with your production values (REQUIRED!) # Edit with your production values (REQUIRED!)
nano .env # Replace with secure production values nano .env # Replace with secure production values


@ -888,10 +888,10 @@ User Action → Cost Calculation → Invoice Creation → Payment Capture → Da
### 📝 **Implementation Files Modified**: ### 📝 **Implementation Files Modified**:
1. `apps/bff/src/vendors/whmcs/types/whmcs-api.types.ts` - Added WHMCS API types 1. `apps/bff/src/integrations/whmcs/types/whmcs-api.types.ts` - Added WHMCS API types
2. `apps/bff/src/vendors/whmcs/services/whmcs-connection.service.ts` - Added API methods 2. `apps/bff/src/integrations/whmcs/connection/whmcs-connection.service.ts` - Added API methods
3. `apps/bff/src/vendors/whmcs/services/whmcs-invoice.service.ts` - Added invoice creation 3. `apps/bff/src/integrations/whmcs/services/whmcs-invoice.service.ts` - Added invoice creation
4. `apps/bff/src/vendors/whmcs/whmcs.service.ts` - Exposed new methods 4. `apps/bff/src/integrations/whmcs/whmcs.service.ts` - Exposed new methods
5. `apps/bff/src/subscriptions/sim-management.service.ts` - Complete payment flow 5. `apps/bff/src/subscriptions/sim-management.service.ts` - Complete payment flow
## 🎯 **Latest Update: Simplified Top-Up Interface (January 2025)** ## 🎯 **Latest Update: Simplified Top-Up Interface (January 2025)**


@ -61,7 +61,7 @@ The WHMCS `GetPayMethods` API returns payment method data with different field n
### 1. Payment Method Transformer ### 1. Payment Method Transformer
**File**: `apps/bff/src/vendors/whmcs/transformers/whmcs-data.transformer.ts` **File**: `apps/bff/src/integrations/whmcs/transformers/whmcs-data.transformer.ts`
**Changes Made:** **Changes Made:**
@ -81,7 +81,7 @@ ccType: whmcsPayMethod.cc_type || whmcsPayMethod.card_type,
### 2. Payment Service Enhancement ### 2. Payment Service Enhancement
**File**: `apps/bff/src/vendors/whmcs/services/whmcs-payment.service.ts` **File**: `apps/bff/src/integrations/whmcs/services/whmcs-payment.service.ts`
**Changes Made:** **Changes Made:**


@ -0,0 +1,407 @@
# Database Operations Runbook
This document covers operational procedures for the PostgreSQL database used by the Customer Portal BFF.
---
## Overview
| Component | Technology | Location |
| --------------- | ------------------------- | ----------------------------- |
| Database | PostgreSQL 17 | Configured via `DATABASE_URL` |
| ORM | Prisma 6 | `apps/bff/prisma/` |
| Connection Pool | Prisma connection pooling | Default: 10 connections |
---
## Backup Procedures
### Automated Backups
> **Note**: Configure automated backups based on your hosting environment.
**Recommended Schedule:**
- Full backup: Daily at 02:00 UTC
- Transaction log backup: Every 15 minutes
- Retention: 30 days
### Manual Backup
```bash
# Create a full database backup (quote the URL so "?" in query parameters is not glob-expanded)
pg_dump "$DATABASE_URL" > "backup_$(date +%Y%m%d_%H%M%S).sql"
# Create a compressed backup
pg_dump "$DATABASE_URL" | gzip > "backup_$(date +%Y%m%d_%H%M%S).sql.gz"
# Back up specific tables
pg_dump "$DATABASE_URL" -t users -t id_mappings > user_data_backup.sql
```
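The manual commands and the 30-day retention recommended above could be combined into a small script for cron or a similar scheduler. This is a minimal sketch, not the project's actual backup tooling: `BACKUP_DIR`, `RETENTION_DAYS`, `create_backup`, and `prune_backups` are hypothetical names, and `find -maxdepth`/`-delete` assume GNU or BSD `find`.

```bash
#!/usr/bin/env bash
# Sketch only: variable and function names are illustrative assumptions.
BACKUP_DIR="${BACKUP_DIR:-./backups}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Write a compressed, timestamped dump and print its path on success.
create_backup() {
  mkdir -p "$BACKUP_DIR" || return 1
  local file
  file="$BACKUP_DIR/backup_$(date +%Y%m%d_%H%M%S).sql.gz"
  # pipefail (scoped to a subshell) so a pg_dump failure is not masked by gzip succeeding
  if ! ( set -o pipefail; pg_dump "$DATABASE_URL" | gzip > "$file" ); then
    rm -f "$file"   # do not leave a truncated archive behind
    return 1
  fi
  echo "$file"
}

# Delete compressed backups last modified more than RETENTION_DAYS days ago.
prune_backups() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'backup_*.sql.gz' \
    -mtime +"$RETENTION_DAYS" -delete
}
```

A scheduler entry would then call `create_backup && prune_backups`, so old archives are only pruned after a new one has been written successfully.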
### Backup Verification
```bash
# Verify backup integrity (restore to temp database)
createdb portal_backup_test
psql portal_backup_test < backup_YYYYMMDD.sql
# Run basic integrity checks
psql portal_backup_test -c "SELECT COUNT(*) FROM users"
psql portal_backup_test -c "SELECT COUNT(*) FROM id_mappings"
# Clean up
dropdb portal_backup_test
```
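The verification steps above can be wrapped in a function so the scratch database is always dropped, even when the restore fails. A minimal sketch under stated assumptions: `verify_backup` is a hypothetical helper name, and the table list simply mirrors the two tables checked above.

```bash
#!/usr/bin/env bash
# Sketch only: verify_backup is an illustrative helper, not an existing project script.
verify_backup() {
  local backup_file="$1"
  local scratch_db="${2:-portal_backup_test}"
  local status=0 table count

  createdb "$scratch_db" || return 1

  # ON_ERROR_STOP makes psql exit non-zero on the first failed statement.
  if ! psql -v ON_ERROR_STOP=1 -q -d "$scratch_db" -f "$backup_file" >/dev/null; then
    status=1
  else
    for table in users id_mappings; do
      # -A (unaligned) and -t (tuples only) reduce the output to the bare count
      count=$(psql -d "$scratch_db" -At -c "SELECT COUNT(*) FROM $table") || { status=1; break; }
      echo "$table: $count rows"
    done
  fi

  # Clean up regardless of the outcome.
  dropdb "$scratch_db"
  return "$status"
}
```

Calling `verify_backup backup_YYYYMMDD.sql` prints one row count per table and returns non-zero if the restore or any count query fails.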
---
## Recovery Procedures
### Point-in-Time Recovery
**Prerequisites:**
- WAL archiving enabled
- Continuous backup configured
```bash
# Stop the application
pnpm prod:stop
# Restore from a custom-format dump (pg_restore cannot read plain SQL dumps; create the dump with pg_dump -Fc)
pg_restore -d "$DATABASE_URL" backup_YYYYMMDD.dump
# Run Prisma migrations to ensure schema is current
pnpm db:migrate
# Restart the application
pnpm prod:start
```
### Restore from SQL Backup
```bash
# Stop the application to prevent writes
pnpm prod:stop
# Drop and recreate database (DESTRUCTIVE)
dropdb portal_production
createdb portal_production
# Restore from backup
psql "$DATABASE_URL" < backup_YYYYMMDD.sql
# Verify restoration
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM users"
# Restart application
pnpm prod:start
```
---
## Migration Management
### Running Migrations
```bash
# Development: Apply pending migrations
pnpm db:migrate
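# Production: Deploy migrations
# (Prisma's production command is `prisma migrate deploy`; confirm that the
# `db:migrate` script does not run `prisma migrate dev` before using it here)
pnpm db:migrate --skip-generate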
# View migration status
npx prisma migrate status
```
### Migration Checklist
Before deploying migrations to production:
1. [ ] Test migration on staging environment
2. [ ] Verify rollback procedure exists
3. [ ] Estimate migration duration
4. [ ] Schedule maintenance window if needed
5. [ ] Create backup before migration
6. [ ] Notify team of deployment
### Rollback Procedure
Prisma Migrate has no built-in down migrations. Use one of these approaches:
**Option 1: Restore from Backup**
```bash
# Restore database to pre-migration state
psql "$DATABASE_URL" < pre_migration_backup.sql
# Revert migration files in codebase
git revert <migration-commit>
```
**Option 2: Manual Rollback SQL**
```bash
# Create rollback SQL for each migration
# Store in: apps/bff/prisma/rollbacks/
# Example rollback
psql $DATABASE_URL < rollbacks/20240115_rollback.sql
```
**Option 3: Reset and Reseed (Development Only)**
```bash
# WARNING: Destroys all data
pnpm db:reset
```
---
## ID Mappings Data Integrity
The `id_mappings` table links portal users to WHMCS and Salesforce accounts. Corruption here causes authentication and data access failures.
### Verify Mapping Integrity
```sql
-- Check for orphaned mappings (portal user deleted but mapping exists)
SELECT m.* FROM id_mappings m
LEFT JOIN users u ON m.user_id = u.id
WHERE u.id IS NULL;
-- Check for duplicate WHMCS mappings
SELECT whmcs_client_id, COUNT(*) as count
FROM id_mappings
WHERE whmcs_client_id IS NOT NULL
GROUP BY whmcs_client_id
HAVING COUNT(*) > 1;
-- Check for duplicate Salesforce mappings
SELECT sf_account_id, COUNT(*) as count
FROM id_mappings
WHERE sf_account_id IS NOT NULL
GROUP BY sf_account_id
HAVING COUNT(*) > 1;
```
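
The integrity checks above can be rehearsed against throwaway data before they are run in production. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table definitions are reduced to the columns the queries reference, and the sample rows and the helper names `find_mapping_problems` and `build_sample_db` are hypothetical.

```python
import sqlite3


def find_mapping_problems(conn):
    """Return (orphaned user_ids, duplicated whmcs_client_ids) for id_mappings."""
    orphans = [row[0] for row in conn.execute(
        "SELECT m.user_id FROM id_mappings m "
        "LEFT JOIN users u ON m.user_id = u.id "
        "WHERE u.id IS NULL ORDER BY m.user_id"
    )]
    duplicates = [row[0] for row in conn.execute(
        "SELECT whmcs_client_id FROM id_mappings "
        "WHERE whmcs_client_id IS NOT NULL "
        "GROUP BY whmcs_client_id HAVING COUNT(*) > 1 "
        "ORDER BY whmcs_client_id"
    )]
    return orphans, duplicates


def build_sample_db():
    """Create an in-memory database with hypothetical rows for a dry run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT);
        CREATE TABLE id_mappings (user_id TEXT, whmcs_client_id INTEGER, sf_account_id TEXT);
        INSERT INTO users VALUES ('u1', 'a@example.com'), ('u2', 'b@example.com');
        INSERT INTO id_mappings VALUES
            ('u1', 100, 'sf-1'),
            ('u2', 100, 'sf-2'),   -- duplicate WHMCS client ID
            ('u3', 101, 'sf-3');   -- orphan: no matching user
        """
    )
    return conn


if __name__ == "__main__":
    print(find_mapping_problems(build_sample_db()))
```

The same two queries run unchanged on PostgreSQL; only the connection setup differs.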
### Fix Orphaned Mappings
```sql
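-- Remove mappings for deleted users
-- (NOT EXISTS also catches rows whose user_id is NULL, matching the check above)
DELETE FROM id_mappings m
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = m.user_id);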
```
### Fix Duplicate Mappings
> **Warning**: Investigate duplicates before deleting. They may indicate data issues.
```sql
-- View duplicate details before fixing
SELECT m.*, u.email FROM id_mappings m
JOIN users u ON m.user_id = u.id
WHERE m.whmcs_client_id IN (
SELECT whmcs_client_id FROM id_mappings
GROUP BY whmcs_client_id HAVING COUNT(*) > 1
);
```
---
## PostgreSQL Maintenance
### VACUUM and ANALYZE
```sql
-- Analyze all tables for query optimization
ANALYZE;
-- Vacuum to mark dead rows as reusable space (does not block reads or writes)
VACUUM;
-- Full vacuum (rewrites tables under an ACCESS EXCLUSIVE lock; returns space to the OS)
VACUUM FULL;
-- Vacuum specific table
VACUUM ANALYZE id_mappings;
```
**Recommended Schedule:**
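- `VACUUM ANALYZE`: Daily during low-traffic hours (in addition to autovacuum)
- `VACUUM FULL`: At most monthly, during a maintenance window, and only for tables the bloat check flags; it takes an `ACCESS EXCLUSIVE` lock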
### Index Maintenance
```sql
-- Check index usage
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Find unused indexes (candidates for removal)
SELECT schemaname, relname AS tablename, indexrelname AS indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
-- Reindex a table
REINDEX TABLE id_mappings;
-- Reindex entire database (during maintenance window)
REINDEX DATABASE portal_production;
```
### Check Table Bloat
```sql
-- Estimate table bloat
SELECT
  schemaname, relname AS tablename,
  pg_size_pretty(pg_relation_size(relid)) AS size,
  n_dead_tup AS dead_rows,
  n_live_tup AS live_rows,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```
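
The `dead_pct` expression in the query can be mirrored in a monitoring script to decide when a table deserves attention. This is a minimal sketch; the 20% threshold and the function names are assumptions for illustration, not values taken from this runbook, and Python's float rounding may differ from SQL `ROUND` on exact ties.

```python
def dead_row_pct(n_live_tup, n_dead_tup):
    """Mirror the SQL expression: percentage of dead rows, or None for an empty table."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return None  # Same outcome as NULLIF(..., 0) yielding NULL in SQL
    return round(100.0 * n_dead_tup / total, 2)


def needs_vacuum(n_live_tup, n_dead_tup, threshold_pct=20.0):
    """Return True when the dead-row share exceeds the (assumed) threshold."""
    pct = dead_row_pct(n_live_tup, n_dead_tup)
    return pct is not None and pct > threshold_pct


if __name__ == "__main__":
    print(dead_row_pct(900, 100), needs_vacuum(900, 100))
    print(dead_row_pct(600, 400), needs_vacuum(600, 400))
```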
---
## Connection Pool Monitoring
### Check Active Connections
```sql
-- Current connection count
SELECT COUNT(*) as connections FROM pg_stat_activity;
-- Connections by state
SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state;
-- Connections by application
SELECT application_name, COUNT(*)
FROM pg_stat_activity
GROUP BY application_name;
-- Long-running queries (>5 minutes)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - pg_stat_activity.query_start > interval '5 minutes';
```
### Kill Stuck Connections
```sql
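-- Cancel a specific query but keep its connection open
SELECT pg_cancel_backend(<pid>);
-- Terminate a specific connection (also ends its running query)
SELECT pg_terminate_backend(<pid>);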
-- Terminate all connections except current
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
AND datname = current_database();
```
### Prisma Connection Pool Settings
Configure in `DATABASE_URL` query parameters:
```
postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=10
```
| Parameter          | Default                     | Recommended        |
| ------------------ | --------------------------- | ------------------ |
| `connection_limit` | `num_physical_cpus * 2 + 1` | 10-20 per instance |
| `pool_timeout`     | 10s                         | 10-30s             |
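
Because the pool settings live in the URL's query string, they can be inspected programmatically, for example to estimate how many connections all BFF instances may open in total. The sketch below also strips Prisma-specific parameters so the same URL can be passed to libpq-based tools such as `psql` and `pg_dump`, which typically reject URI parameters they do not recognise. The helper names and the contents of `PRISMA_ONLY_PARAMS` are assumptions (a partial list, not an authoritative one).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters understood by Prisma but not by libpq tools (assumed, partial list)
PRISMA_ONLY_PARAMS = {"connection_limit", "pool_timeout", "schema", "pgbouncer"}


def pool_settings(database_url):
    """Return the Prisma pool parameters that are explicitly set in the URL."""
    params = dict(parse_qsl(urlsplit(database_url).query))
    return {key: int(params[key]) for key in ("connection_limit", "pool_timeout") if key in params}


def libpq_url(database_url):
    """Return the URL without Prisma-only parameters, for use with psql or pg_dump."""
    parts = urlsplit(database_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in PRISMA_ONLY_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def total_connections(database_url, instances, default_limit):
    """Estimate the connections that all instances together may open."""
    limit = pool_settings(database_url).get("connection_limit", default_limit)
    return limit * instances


if __name__ == "__main__":
    url = "postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=10"
    print(pool_settings(url))
    print(libpq_url(url))
    print(total_connections(url, instances=3, default_limit=5))
```

Comparing the estimated total against PostgreSQL's `max_connections` helps avoid exhausting the server when instances are scaled out.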
---
## Monitoring Queries
### Database Size
```sql
-- Total database size
SELECT pg_size_pretty(pg_database_size(current_database()));
-- Size per table
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
```
### Query Performance
```sql
-- Slowest queries (requires pg_stat_statements extension)
-- PostgreSQL 13+ names these columns mean_exec_time / total_exec_time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
### Lock Monitoring
```sql
-- Check for locks
SELECT
pg_locks.pid,
pg_stat_activity.query,
pg_locks.mode,
pg_locks.granted
FROM pg_locks
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE NOT pg_locks.granted;
```
---
## Emergency Procedures
### Database Unresponsive
1. Check PostgreSQL process status
2. Check disk space and memory
3. Kill long-running queries
4. Restart PostgreSQL if necessary
5. Check application connectivity after restart
### Disk Space Full
```bash
# Check disk usage
df -h
# Find large files in PostgreSQL data directory
du -sh /var/lib/postgresql/data/*
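# Clear transaction logs (if WAL archiving is working)
# WARNING: Only if logs are properly archived. Never delete files under pg_wal
# by hand; fix archiving or free space elsewhere so PostgreSQL can recycle them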
```
### Corruption Detected
1. **STOP** the application immediately
2. Do not attempt repairs without backup verification
3. Restore from last known good backup
4. Investigate root cause before resuming service
---
## Related Documents
- [Incident Response](./incident-response.md)
- [External Dependencies](./external-dependencies.md)
- [Provisioning Runbook](./provisioning-runbook.md)
---
**Last Updated:** December 2025

@ -1,28 +0,0 @@
# Temporarily Disabled Modules
The backend currently omits two partially implemented modules from the runtime
NestJS configuration so that the public API surface only exposes completed
features.
## Cases Module
- Removed from `AppModule` and `apiRoutes` to ensure the unfinished `/cases`
endpoints are not routable.
- All existing code remains in `apps/bff/src/modules/cases/` for future
development; re-enable by importing the module in
`apps/bff/src/app.module.ts` and adding it back to the router configuration in
`apps/bff/src/core/config/router.config.ts` once the endpoints are ready.
## Jobs Module
- Temporarily excluded from `AppModule` while the reconciliation workflows are
fleshed out.
- The BullMQ processor now logs an explicit warning and acknowledges each job so
queue workers do not hang when the module is re-registered.
- When background processing is ready, restore the `JobsModule` import in
`apps/bff/src/app.module.ts` and replace the placeholder logic in
`ReconcileProcessor.process` with the real reconciliation implementation.
> **Note**: If additional queues or HTTP routes reference these modules, make
> sure they fail fast with a `501 Not Implemented` response or similar logging so
> that downstream systems have clear telemetry while the modules are disabled.

@ -0,0 +1,325 @@
# External Dependencies Runbook
This document covers health checking, monitoring, and troubleshooting for external systems integrated with the Customer Portal.
---
## System Overview
| System | Purpose | Integration | Health Check |
| ---------------------- | -------------------------------- | -------------------------- | --------------- |
| **Salesforce** | CRM, Orders, Catalog | REST API + Platform Events | JWT auth test |
| **WHMCS** | Billing, Payments, Subscriptions | REST API | API action test |
| **Freebit** | SIM Management | REST API | OEM auth test |
| **SFTP (fs.mvno.net)** | Call/SMS Records | SFTP | Connection test |
| **Redis** | Cache, Sessions, Queues | Direct connection | PING command |
| **PostgreSQL** | User data, Mappings | Direct connection | Query test |
---
## Salesforce
### Configuration
| Variable | Description |
| ---------------------------- | ------------------------------------------------------- |
| `SF_LOGIN_URL` | Login URL (login.salesforce.com or test.salesforce.com) |
| `SF_CLIENT_ID` | Connected App Consumer Key |
| `SF_USERNAME` | Integration user username |
| `SF_PRIVATE_KEY_PATH` | Path to JWT private key |
| `SF_EVENTS_ENABLED` | Enable Platform Event subscription |
| `SF_PROVISION_EVENT_CHANNEL` | Platform Event channel for provisioning |
| `PORTAL_PRICEBOOK_ID` | Salesforce Pricebook ID for catalog |
### Health Check
```bash
# Check Salesforce connectivity via BFF health endpoint
curl http://localhost:4000/health | jq '.'
# Test JWT authentication manually
# The BFF authenticates automatically; check logs for auth errors
grep "Salesforce" /var/log/bff/combined.log | tail -20
```
### Common Issues
**JWT Authentication Failure**
- Verify private key file exists and is readable
- Check Connected App settings in Salesforce
- Ensure integration user is pre-authorized for Connected App
- Verify `SF_USERNAME` matches the user assigned to Connected App
**Platform Events Not Receiving**
- Check `SF_EVENTS_ENABLED=true`
- Verify Platform Event permissions for integration user
- Check Redis for replay ID: `redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"`
- Set `SF_EVENTS_REPLAY=ALL` temporarily to catch up on missed events
**API Limits**
- Salesforce has daily API call limits
- Monitor usage in Salesforce Setup > API Usage
- Consider caching frequently accessed data
### Expected Response Times
| Operation | Expected | Alert Threshold |
| -------------- | --------- | --------------- |
| Query | <500ms | >2s |
| Update | <1s | >3s |
| Platform Event | Real-time | >5s delay |
---
## WHMCS
### Configuration
| Variable | Description |
| -------------------------------- | ----------------------------------- |
| `WHMCS_API_URL` | WHMCS API endpoint URL |
| `WHMCS_API_IDENTIFIER` | API credentials identifier |
| `WHMCS_API_SECRET` | API credentials secret |
| `WHMCS_CUSTOMER_NUMBER_FIELD_ID` | Custom field ID for Customer Number |
### Health Check
```bash
# Test WHMCS API directly
curl -X POST "$WHMCS_API_URL" \
-d "identifier=$WHMCS_API_IDENTIFIER" \
-d "secret=$WHMCS_API_SECRET" \
-d "action=GetClients" \
-d "responsetype=json" \
-d "limitnum=1"
# Should return: {"result":"success","totalresults":...}
```
### Common Issues
**Authentication Failure**
- Verify API credentials in WHMCS Admin > Setup > Staff Management > API Credentials
- Check IP whitelist settings (if configured)
- Ensure API credentials have required permissions
**Rate Limiting**
- WHMCS may rate limit excessive requests
- Check for 429 responses in logs
- Implement request queuing if needed
**Field Mapping Issues**
- Payment method fields may use different names between WHMCS versions
- Check [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md) for field mapping
### Expected Response Times
| Operation | Expected | Alert Threshold |
| ----------- | -------- | --------------- |
| GetInvoices | <500ms | >2s |
| AddOrder | <1s | >3s |
| AcceptOrder | <1s | >3s |
| SSO Token | <500ms | >2s |
---
## Freebit
### Configuration
| Variable | Description |
| ------------------ | ---------------------- |
| `FREEBIT_BASE_URL` | Freebit API base URL |
| `FREEBIT_OEM_ID` | OEM identifier |
| `FREEBIT_OEM_KEY` | OEM authentication key |
| `FREEBIT_TIMEOUT` | Request timeout (ms) |
### Health Check
```bash
# Check Freebit OEM authentication
# The BFF handles auth automatically; check logs for auth errors
grep "Freebit" /var/log/bff/combined.log | tail -20
# Check for auth token in cache
redis-cli GET "freebit:auth:token"
```
### Common Issues
**OEM Authentication Failure**
- Verify `FREEBIT_OEM_ID` and `FREEBIT_OEM_KEY`
- Check Freebit API endpoint accessibility
- Auth tokens are cached; clear cache if credentials changed
**SIM Operations Failing**
- Verify SIM account identifier (phone number) format
- Check 30-minute operation gap requirements
- See [Freebit SIM Management](../integrations/sim/freebit.md) for operation constraints
**Network Type Changes Delayed**
- Network type changes are queued with 30-minute delay
- Check BullMQ queue for pending jobs
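
The 30-minute operation gap can be checked before submitting a SIM operation, which helps explain why a request was rejected or queued. This is a minimal sketch, assuming the gap applies per SIM account and that an operation is allowed exactly at the 30-minute mark; the function names are hypothetical and do not reflect the BFF's actual implementation.

```python
from datetime import datetime, timedelta

OPERATION_GAP = timedelta(minutes=30)  # Gap stated in this runbook


def next_allowed_time(last_operation_at):
    """Return the earliest time the next operation may run for the same SIM account."""
    return last_operation_at + OPERATION_GAP


def can_run_operation(last_operation_at, now):
    """Return True when no earlier operation exists or the gap has elapsed."""
    if last_operation_at is None:
        return True
    return now >= next_allowed_time(last_operation_at)


if __name__ == "__main__":
    last = datetime(2025, 12, 23, 10, 0, 0)
    print(can_run_operation(last, datetime(2025, 12, 23, 10, 29, 0)))
    print(next_allowed_time(last).isoformat())
```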
### Expected Response Times
| Operation | Expected | Alert Threshold |
| ------------- | -------- | --------------- |
| Auth (cached) | <100ms | >500ms |
| GetDetail | <1s | >3s |
| Plan Change | <2s | >5s |
| Top-up | <2s | >5s |
---
## SFTP (fs.mvno.net)
### Configuration
| Variable | Description |
| ----------------------- | ----------------------- |
| `SFTP_HOST` | SFTP server hostname |
| `SFTP_PORT` | SFTP port (default: 22) |
| `SFTP_USERNAME` | SFTP username |
| `SFTP_PRIVATE_KEY_PATH` | Path to SSH private key |
### Health Check
```bash
# Test SFTP connectivity
sftp -i "$SFTP_PRIVATE_KEY_PATH" -P "${SFTP_PORT:-22}" "$SFTP_USERNAME@$SFTP_HOST" << EOF
ls
exit
EOF
```
### Common Issues
**Connection Refused**
- Verify SFTP server is accessible
- Check firewall rules
- Verify SSH key fingerprint
**Authentication Failure**
- Verify SSH private key is correct
- Check key permissions (should be 600)
- Ensure public key is authorized on SFTP server
**Files Not Found**
- Call/SMS records are available 2 months behind current date
- File naming: `PASI_talk-detail-YYYYMM.csv`, `PASI_sms-detail-YYYYMM.csv`
### Data Availability
| Record Type | Availability | File Pattern |
| ------------ | --------------- | ----------------------------- |
| Call Details | 2 months behind | `PASI_talk-detail-YYYYMM.csv` |
| SMS Details | 2 months behind | `PASI_sms-detail-YYYYMM.csv` |
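
When a "file not found" error appears, it helps to compute which file should exist for a given date. This is a minimal sketch, assuming "2 months behind" means the calendar month two months before the current one; the function names are hypothetical.

```python
from datetime import date


def latest_available_month(today, lag_months=2):
    """Return (year, month) of the newest record file expected on the SFTP server."""
    index = today.year * 12 + (today.month - 1) - lag_months
    return index // 12, index % 12 + 1


def record_file_names(today):
    """Build the expected call and SMS file names for the latest available month."""
    year, month = latest_available_month(today)
    stamp = f"{year:04d}{month:02d}"
    return {
        "call": f"PASI_talk-detail-{stamp}.csv",
        "sms": f"PASI_sms-detail-{stamp}.csv",
    }


if __name__ == "__main__":
    print(record_file_names(date(2025, 12, 23)))
    print(record_file_names(date(2026, 1, 15)))
```

Working with a month index (`year * 12 + month - 1`) keeps the year rollover in January and February correct without special cases.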
---
## Credential Rotation
### Salesforce JWT Key Rotation
1. Generate new key pair
2. Upload new public key to Connected App
3. Update `SF_PRIVATE_KEY_PATH` or `SF_PRIVATE_KEY_BASE64`
4. Deploy and verify authentication
5. Remove old key from Connected App after verification
### WHMCS API Credentials Rotation
1. Create new API credentials in WHMCS Admin
2. Update `WHMCS_API_IDENTIFIER` and `WHMCS_API_SECRET`
3. Deploy and verify API calls work
4. Disable old API credentials
### Freebit Key Rotation
1. Request new OEM key from Freebit
2. Update `FREEBIT_OEM_KEY`
3. Clear cached auth token: `redis-cli DEL "freebit:auth:token"`
4. Deploy and verify authentication
### SSH Key Rotation (SFTP)
1. Generate new SSH key pair
2. Provide public key to SFTP administrator
3. Wait for key to be authorized
4. Update `SFTP_PRIVATE_KEY_PATH`
5. Test connectivity
6. Request old key removal from SFTP server
---
## Monitoring Recommendations
### Alerting Thresholds
| System | Metric | Warning | Critical |
| ---------- | ------------- | ------- | -------- |
| Salesforce | Response time | >2s | >5s |
| Salesforce | Error rate | >1% | >5% |
| WHMCS | Response time | >2s | >5s |
| WHMCS | Error rate | >1% | >5% |
| Freebit | Response time | >3s | >10s |
| Redis | Response time | >100ms | >500ms |
| PostgreSQL | Response time | >500ms | >2s |
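
The response-time rows of the table can be encoded directly so that a health check script reports a consistent level. This is a minimal sketch; the thresholds are copied from the table above, while the function name and the choice to treat a value exactly at a threshold as not exceeding it are assumptions.

```python
# (warning, critical) response-time thresholds in milliseconds, from the table above
RESPONSE_TIME_THRESHOLDS_MS = {
    "salesforce": (2000, 5000),
    "whmcs": (2000, 5000),
    "freebit": (3000, 10000),
    "redis": (100, 500),
    "postgresql": (500, 2000),
}


def classify_response_time(system, elapsed_ms):
    """Return 'ok', 'warning', or 'critical' for a measured response time."""
    warning, critical = RESPONSE_TIME_THRESHOLDS_MS[system.lower()]
    if elapsed_ms > critical:
        return "critical"
    if elapsed_ms > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(classify_response_time("Redis", 150))
    print(classify_response_time("Freebit", 10001))
```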
### Key Metrics to Monitor
- External API response times
- Error rates per integration
- Authentication success/failure rates
- Cache hit rates
- Queue depths (for async operations)
### Health Check Schedule
| System | Check Frequency | Method |
| ---------- | ---------------- | ------------------ |
| Salesforce | Every 5 minutes | Query test |
| WHMCS | Every 5 minutes | GetClients call |
| Freebit | Every 15 minutes | Auth token refresh |
| Redis | Every 1 minute | PING |
| PostgreSQL | Every 1 minute | SELECT 1 |
| SFTP | Every 1 hour | Connection test |
---
## Fallback Behaviors
| System Down | User Impact | Fallback |
| ----------- | ----------------------- | ------------------------------------ |
| Salesforce | No orders, no catalog | Show cached catalog, queue orders |
| WHMCS | No billing, no payments | Show cached invoices, block checkout |
| Freebit | No SIM management | Show cached data, disable actions |
| Redis | Slow performance | Direct API calls (no cache) |
| PostgreSQL | Portal unusable | Display maintenance message |
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Troubleshooting](../integrations/whmcs/troubleshooting.md)
- [Freebit SIM Management](../integrations/sim/freebit.md)
---
**Last Updated:** December 2025

@ -0,0 +1,325 @@
# External Processes and Team Handoffs
This document describes operational processes that occur outside the Customer Portal but are necessary for system operation and customer service.
---
## Process Ownership Matrix
| Process | Owner | Trigger | Dependencies | Documentation |
| ----------------------------- | ----------------- | ------------------------- | --------------------------- | ----------------------------------------------- |
| Salesforce Account Creation | Sales Team | Customer inquiry | Salesforce Admin access | Salesforce training docs |
| Customer Number Assignment | Sales Team | New customer onboarding | SF Account created | Sales procedures |
| CS Order Approval | CS Team | Order in "Pending Review" | Salesforce access | CS training docs |
| Internet Eligibility Check | CS Team | Eligibility request Case | Customer address info | CS procedures |
| WHMCS Product Setup | DevOps | New product launch | WHMCS Admin access | This document |
| Salesforce Flow Maintenance | SF Admin | Feature changes | SF Admin + Dev access | SF Flow documentation |
| Freebit Account Configuration | Partner Relations | New SIM products | Freebit partner credentials | Freebit contract docs |
| SSL Certificate Renewal | DevOps | Expiration alerts | Certificate provider access | This document |
| Database Backups | DevOps | Scheduled / On-demand | DB Admin access | [Database Operations](./database-operations.md) |
---
## Customer Onboarding Flow
### Pre-Portal Setup (Sales Team)
Before a customer can use the portal, Sales must complete these steps:
1. **Create Salesforce Account**
- Create Account record with customer details
- Assign unique `SF_Account_No__c` (Customer Number)
- Set initial account status
2. **Verify Customer Information**
- Confirm contact details
- Verify billing address
- Complete KYC requirements if applicable
3. **Internet Eligibility (if applicable)**
- Submit eligibility check via portal OR
- Manually check eligibility and update Account fields:
- `Internet_Eligibility__c`
- `Internet_Eligibility_Status__c`
### Handoff to Portal
Once Sales completes setup, customer can:
- Sign up using their Customer Number
- Link existing WHMCS account (if migrating)
- Place orders through the portal
---
## Order Approval Flow
### CS Review Process
When an order is placed, CS must review and approve:
**Order Review Checklist:**
1. [ ] Verify customer identity matches Salesforce Account
2. [ ] Confirm product eligibility (Internet type matches eligibility)
3. [ ] Verify installation address is serviceable
4. [ ] Check for duplicate active services
5. [ ] Review any special instructions or notes
**Approval Actions:**
- Approve: Set Order `Status = Approved`
- Triggers provisioning workflow automatically
- Reject: Set Order `Status = Cancelled`
- Add rejection reason to Order notes
- Customer is notified via portal
**SLA:**
- Standard orders: Review within 2 business hours
- Priority orders: Review within 30 minutes
### Escalation Triggers
Escalate to supervisor if:
- Customer disputes eligibility result
- Multiple orders from same account in short period
- Order value exceeds threshold
- Address verification fails
---
## Internet Eligibility Process
### Request Flow
1. **Customer submits eligibility request** (Portal)
- Creates Salesforce Case (Type: Eligibility Check)
- Updates Account fields to "Pending"
- Creates/updates Opportunity (Stage: Introduction)
2. **CS reviews request** (Salesforce)
- Verify address details
- Check service availability databases
- Determine eligibility type (Apartment 1G, Home 1G, etc.)
3. **CS updates Salesforce** (Salesforce)
- Set `Internet_Eligibility__c` to result
- Set `Internet_Eligibility_Status__c = Checked`
- Update Opportunity stage (Ready or Void)
- Close the Case
4. **Customer sees result** (Portal)
- Portal reads updated Account fields
- Catalog shows eligible products
**SLA:**
- Standard check: 24-48 business hours
- Express check: 4 business hours
---
## Cancellation Request Process
### Customer-Initiated Cancellation
1. **Customer requests cancellation** (Portal)
- Creates Salesforce Case (Type: Cancellation Request)
- Finds linked Opportunity via `WHMCS_Service_ID__c`
- Updates Opportunity stage to "△Cancelling"
- Sets `ScheduledCancellationDateAndTime__c`
2. **CS reviews request** (Salesforce)
- Verify customer authorization
- Check cancellation terms and fees
- Confirm scheduled date
3. **CS processes cancellation** (WHMCS + Salesforce)
- Cancel service in WHMCS (if not automatic)
- Update Opportunity stage to "△Cancelled"
- Close the Case
4. **Final billing** (WHMCS)
- Generate final invoice if applicable
- Process any prorated refunds
### Cancellation Types
| Type | Notice Period | Effective Date |
| -------- | ---------------------- | ---------------------- |
| Internet | 30 days | End of notice period |
| SIM | Immediate or scheduled | 1st of following month |
| VPN | Immediate | Same day |
---
## Product Configuration
### Adding New Products
When launching new products, coordinate between teams:
**1. Salesforce Setup (SF Admin)**
- Create Product2 record
- Set required fields:
- `Name`, `StockKeepingUnit`
- `WH_Product_ID__c` (WHMCS product ID)
- `Billing_Cycle__c`
- `Item_Class__c` (Service, Activation, Add-on)
- Add to portal Pricebook (`PORTAL_PRICEBOOK_ID`)
**2. WHMCS Setup (DevOps/Billing)**
- Create product in WHMCS Products/Services
- Configure pricing and billing cycle
- Set up any required custom fields
- Test product creation via API
**3. Portal Verification (Development)**
- Verify product appears in catalog
- Test checkout flow with new product
- Confirm provisioning works correctly
**4. Documentation (All Teams)**
- Update product documentation
- Add to [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)
### Product Change Checklist
- [ ] Salesforce Product2 updated
- [ ] WHMCS product updated
- [ ] Pricing synced between systems
- [ ] Portal cache cleared
- [ ] Tested in staging environment
- [ ] Documentation updated
---
## Salesforce Flow Maintenance
### Record-Triggered Flows
The portal depends on these Salesforce Flows:
| Flow | Trigger | Action |
| ----------------------- | ---------------------------------- | ------------------------------------ |
| Order Approval Flow | Order Status → Approved | Publish `OrderProvisionRequested__e` |
| Eligibility Update Flow | Account eligibility fields changed | (Optional) Notify customer |
### Flow Change Procedure
1. **Development** (SF Admin + Dev)
- Clone existing Flow for modification
- Test in Salesforce Sandbox
- Document changes
2. **Deployment** (SF Admin)
- Schedule deployment during low-traffic period
- Notify development team
- Activate new Flow version
3. **Verification** (Dev + QA)
- Test affected portal functionality
- Verify Platform Events are received
- Check BFF logs for any errors
4. **Rollback Plan**
- Keep previous Flow version available
- Document rollback procedure
- Have SF Admin available during deployment
---
## SSL Certificate Management
### Certificate Inventory
| Domain | Provider | Expiration | Renewal Process |
| ------------------ | ------------- | ---------- | --------------- |
| portal.example.com | Let's Encrypt | Auto-renew | Automated |
| api.example.com | Let's Encrypt | Auto-renew | Automated |
| whmcs.example.com | [Provider] | [Date] | Manual |
### Renewal Procedure
**Automated (Let's Encrypt):**
- Certbot runs automatically
- Monitor for renewal failures
- Alert if cert expires within 14 days
**Manual:**
1. Generate CSR
2. Submit to certificate provider
3. Complete domain verification
4. Download and install certificate
5. Restart affected services
6. Verify certificate in browser
### Certificate Expiration Alerts
- 30 days: Warning notification
- 14 days: Urgent notification
- 7 days: Critical alert
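
The three alert tiers can be derived from a certificate's expiry date. This is a minimal sketch, assuming each tier applies at or below its day count; the function name and the level strings are hypothetical.

```python
from datetime import date


def certificate_alert_level(expires_on, today):
    """Return the alert tier for a certificate, or None when no alert is due."""
    days_left = (expires_on - today).days
    if days_left <= 7:
        return "critical"
    if days_left <= 14:
        return "urgent"
    if days_left <= 30:
        return "warning"
    return None


if __name__ == "__main__":
    today = date(2025, 12, 23)
    for expiry in (date(2025, 12, 30), date(2026, 1, 6), date(2026, 1, 22), date(2026, 3, 1)):
        print(expiry.isoformat(), certificate_alert_level(expiry, today))
```

An already expired certificate has a negative `days_left` and therefore also reports `critical`.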
---
## Credential and Access Management
### Access Request Process
| System | Request To | Approval By | Access Level Options |
| ---------- | ---------- | ----------- | --------------------- |
| Salesforce | SF Admin | Manager | Read-only, CS, Admin |
| WHMCS | DevOps | Manager | Staff, Admin |
| BFF/Portal | DevOps | Tech Lead | Developer, Operator |
| Database | DevOps | Tech Lead | Read-only, Read-write |
### Offboarding Checklist
When a team member leaves:
- [ ] Revoke Salesforce access
- [ ] Revoke WHMCS access
- [ ] Remove from deployment systems
- [ ] Rotate any shared credentials they had access to
- [ ] Update on-call schedules
- [ ] Transfer ownership of documentation
---
## Communication Channels
### Team Contacts
| Team | Channel | Escalation |
| ----------- | --------------------- | ------------- |
| Development | [Slack/Teams channel] | Tech Lead |
| CS Team | [Slack/Teams channel] | CS Manager |
| Sales Team | [Slack/Teams channel] | Sales Manager |
| DevOps | [Slack/Teams channel] | Ops Lead |
| SF Admin | [Email/Slack] | IT Manager |
### Incident Communication
See [Incident Response Runbook](./incident-response.md) for incident communication procedures.
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)
- [Complete Operations Guide](../how-it-works/COMPLETE-GUIDE.md)
---
**Last Updated:** December 2025

@ -0,0 +1,327 @@
# Incident Response Runbook
This document defines procedures for responding to production incidents affecting the Customer Portal.
---
## Severity Classification
| Severity | Definition | Response Time | Examples |
| ----------------- | -------------------------------------- | ------------- | ----------------------------------------------------------------- |
| **P1 - Critical** | Complete service outage or data loss | 15 minutes | Portal unreachable, database corruption, security breach |
| **P2 - High** | Major feature unavailable | 1 hour | Order provisioning failing, payment processing down |
| **P3 - Medium** | Degraded performance or partial outage | 4 hours | Slow response times, intermittent errors, single integration down |
| **P4 - Low** | Minor issue, workaround available | 24 hours | UI glitches, non-critical feature bugs |
---
## Escalation Matrix
| Level | Scope | Contact | When to Escalate |
| ------ | ---------------- | ------------------- | ---------------------------------------------------- |
| **L1** | Initial Response | On-call engineer | All incidents |
| **L2** | Technical Lead | Development lead | P1/P2 not resolved in 30 minutes |
| **L3** | Management | Engineering manager | P1 not resolved in 1 hour, customer impact |
| **L4** | External | Vendor support | External system failure (Salesforce, WHMCS, Freebit) |
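
The time-based rules in the matrix can be expressed as a small function, for example to drive paging automation. This is a minimal sketch under stated assumptions: the 30-minute and 1-hour limits are treated as inclusive, the "customer impact" trigger for L3 is not modelled, and the function and parameter names are hypothetical.

```python
def escalation_levels(severity, minutes_unresolved, external_failure=False):
    """Return the escalation levels to involve, in order, for an open incident."""
    levels = ["L1"]  # The on-call engineer handles every incident
    if severity in ("P1", "P2") and minutes_unresolved >= 30:
        levels.append("L2")
    if severity == "P1" and minutes_unresolved >= 60:
        levels.append("L3")
    if external_failure:
        levels.append("L4")  # Vendor support for Salesforce, WHMCS, or Freebit failures
    return levels


if __name__ == "__main__":
    print(escalation_levels("P1", 45))
    print(escalation_levels("P2", 90, external_failure=True))
```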
### On-Call Contacts
> **Note**: Update this section with actual contact information for your team.
| Role | Contact Method | Backup |
| ----------------- | ----------------- | ------- |
| Primary On-Call | [Slack/PagerDuty] | [Phone] |
| Secondary On-Call | [Slack/PagerDuty] | [Phone] |
| Engineering Lead | [Slack/Email] | [Phone] |
---
## Common Incident Scenarios
### 1. Salesforce Platform Events Not Receiving
**Symptoms:**
- Orders stuck in "Pending Review" status
- No provisioning activity in logs
- `sf:pe:replay:*` Redis keys not updating
**Diagnosis:**
```bash
# Check BFF logs for Platform Event subscription
grep "Platform Event" /var/log/bff/combined.log | tail -50
# Check Redis replay ID
redis-cli GET "sf:pe:replay:/event/OrderProvisionRequested__e"
# Verify Salesforce connectivity
curl -X GET http://localhost:4000/health
```
**Resolution:**
1. Verify `SF_EVENTS_ENABLED=true` in environment
2. Check Salesforce Connected App JWT authentication
3. Verify Platform Event permissions for integration user
4. Set `SF_EVENTS_REPLAY=ALL` temporarily to replay missed events
5. Restart BFF to re-establish subscription
**Escalation:** If unresolved in 30 minutes, contact Salesforce admin.
---
### 2. WHMCS API Unavailable
**Symptoms:**
- Billing pages showing "service unavailable"
- Provisioning failing with WHMCS errors
- Payment method checks failing
**Diagnosis:**
```bash
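# Check WHMCS connectivity from BFF (credentials are required for a success response)
curl -X POST "$WHMCS_API_URL" \
  -d "identifier=$WHMCS_API_IDENTIFIER" \
  -d "secret=$WHMCS_API_SECRET" \
  -d "action=GetClients&responsetype=json&limitnum=1"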
# Check BFF logs for WHMCS errors
grep "WHMCS" /var/log/bff/error.log | tail -20
```
**Resolution:**
1. Verify WHMCS server is accessible
2. Check WHMCS API credentials (`WHMCS_API_IDENTIFIER`, `WHMCS_API_SECRET`)
3. Check WHMCS server load and resource usage
4. Contact WHMCS hosting provider if server is down
**Escalation:** If WHMCS server is down, contact hosting provider.
---
### 3. Redis Connection Failures
**Symptoms:**
- Authentication failing
- Cache misses on every request
- Rate limiting not working
- SSE connections dropping
**Diagnosis:**
```bash
# Check Redis connectivity
redis-cli ping
# Check Redis memory usage
redis-cli INFO memory
# Check BFF health endpoint
curl http://localhost:4000/health | jq '.checks.cache'
```
**Resolution:**
1. Verify Redis URL in environment (`REDIS_URL`)
2. Check Redis server memory usage and eviction policy
3. Restart Redis if memory is exhausted
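4. Clear stale keys if necessary: `redis-cli FLUSHDB` (caution: deletes every key in the current Redis database, not only cached data, so sessions, BullMQ queues, and Platform Event replay IDs stored there are lost as well)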
**Impact Note:** Redis failure causes:
- Token blacklist checks to fail (security risk if `AUTH_BLACKLIST_FAIL_CLOSED=false`)
- All cached data to be re-fetched from source systems
- Rate limiting to stop working
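
The difference between failing closed and failing open on the token blacklist can be illustrated in a few lines. This is a conceptual sketch only: how the BFF implements `AUTH_BLACKLIST_FAIL_CLOSED` is not shown in this runbook, and the function name, the `lookup` callable, and the use of `ConnectionError` to signal an unreachable Redis are assumptions.

```python
def is_token_rejected(token_id, lookup, fail_closed=True):
    """Return True when the token must be rejected.

    `lookup` is any callable that returns True for blacklisted tokens and
    raises ConnectionError when the blacklist store is unreachable.
    """
    try:
        return bool(lookup(token_id))
    except ConnectionError:
        # Fail closed: reject because the blacklist cannot be consulted.
        # Fail open: accept the token, which may let revoked tokens through.
        return fail_closed


def redis_down(_token_id):
    """Stand-in for a blacklist lookup while Redis is unavailable."""
    raise ConnectionError("redis unavailable")


if __name__ == "__main__":
    print(is_token_rejected("token-1", redis_down, fail_closed=True))
    print(is_token_rejected("token-1", redis_down, fail_closed=False))
```

Failing closed trades availability for safety: during a Redis outage every authenticated request is refused, but a revoked token can never be replayed.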
---
### 4. Database Connection Issues
**Symptoms:**
- All API requests failing with 500 errors
- Health check shows database as "fail"
- Prisma connection errors in logs
**Diagnosis:**
```bash
# Check database connectivity
psql $DATABASE_URL -c "SELECT 1"
# Check connection count
psql $DATABASE_URL -c "SELECT count(*) FROM pg_stat_activity"
# Check BFF health endpoint
curl http://localhost:4000/health | jq '.checks.database'
```
**Resolution:**
1. Verify PostgreSQL server is running
2. Check connection pool limits (Prisma connection_limit)
3. Look for long-running queries and kill if necessary
4. Restart database if unresponsive
**Escalation:** If database is corrupted, see [Database Operations Runbook](./database-operations.md).
---
### 5. High Error Rate / Performance Degradation
**Symptoms:**
- Increased response times (>2s average)
- Error rate above 1%
- Customer complaints
**Diagnosis:**
```bash
# Check BFF process resource usage (-d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, -f 'node.*bff')"
# Check recent error logs
tail -100 /var/log/bff/error.log
# Check external API response times in logs
grep "duration" /var/log/bff/combined.log | tail -20
```
**Resolution:**
1. Identify which external API is slow (Salesforce, WHMCS, Freebit)
2. Check for traffic spikes or unusual patterns
3. Scale horizontally if CPU/memory constrained
4. Enable circuit breakers or increase timeouts temporarily
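
To narrow down which external API is slow, durations extracted from the logs can be averaged per upstream. This is a sketch under assumptions: the `UpstreamCall` shape is hypothetical and does not describe the actual BFF log format.

```typescript
// Hypothetical shape of one parsed log entry.
interface UpstreamCall {
  upstream: string; // e.g. "salesforce", "whmcs", "freebit"
  durationMs: number;
}

interface UpstreamSummary {
  upstream: string;
  avgMs: number;
  count: number;
}

// Average duration per upstream, slowest first.
function averageByUpstream(calls: UpstreamCall[]): UpstreamSummary[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const call of calls) {
    const entry = totals.get(call.upstream) ?? { sum: 0, count: 0 };
    entry.sum += call.durationMs;
    entry.count += 1;
    totals.set(call.upstream, entry);
  }
  return Array.from(totals.entries())
    .map(([upstream, t]) => ({ upstream, avgMs: t.sum / t.count, count: t.count }))
    .sort((a, b) => b.avgMs - a.avgMs);
}
```

An upstream whose average exceeds the 2s symptom threshold is the first candidate for a temporary timeout increase or circuit breaker.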
---
### 6. Security Incident
**Symptoms:**
- Unusual login patterns
- Suspected unauthorized access
- Data exfiltration alerts
**Immediate Actions:**
1. **DO NOT** modify logs or evidence
2. Notify security team immediately
3. Consider isolating affected systems
4. Document all observations with timestamps
**Escalation:** P1 - Immediately escalate to engineering lead and management.
---
## Incident Response Workflow
```
1. DETECT
├── Automated alert received
├── Customer report
└── Internal discovery
2. ASSESS
├── Determine severity (P1-P4)
├── Identify affected systems
└── Estimate customer impact
3. RESPOND
├── Follow relevant scenario playbook
├── Communicate status
└── Escalate if needed
4. RESOLVE
├── Implement fix
├── Verify resolution
└── Monitor for recurrence
5. REVIEW
├── Document timeline
├── Identify root cause
└── Create action items
```
---
## Communication Templates
### Internal Status Update
```
INCIDENT UPDATE - [P1/P2/P3/P4] - [Brief Description]
Status: [Investigating/Identified/Monitoring/Resolved]
Impact: [Description of customer impact]
Started: [Time in UTC]
Last Update: [Time in UTC]
Current Actions:
- [Action 1]
- [Action 2]
Next Update: [Time]
```
### Customer Communication (P1/P2 only)
```
We are currently experiencing issues with [service/feature].
What's happening: [Brief, non-technical description]
Impact: [What customers may experience]
Status: Our team is actively working to resolve this issue.
We will provide updates every [30 minutes/1 hour].
We apologize for any inconvenience.
```
---
## Post-Incident Review
After every P1 or P2 incident, conduct a post-incident review within 3 business days.
### Review Template
1. **Incident Summary**
   - What happened?
   - When did it start/end?
   - Who was affected?
2. **Timeline**
   - Detection time
   - Response time
   - Resolution time
   - Key milestones
3. **Root Cause Analysis**
   - What was the direct cause?
   - What were contributing factors?
   - Why wasn't this prevented?
4. **Action Items**
   - Immediate fixes applied
   - Preventive measures needed
   - Monitoring improvements
   - Documentation updates
---
## Related Documents
- [Provisioning Runbook](./provisioning-runbook.md)
- [Database Operations](./database-operations.md)
- [External Dependencies](./external-dependencies.md)
- [Queue Management](./queue-management.md)
- [Logging Guide](./logging.md)
---
**Last Updated:** December 2025


@ -73,3 +73,79 @@ Portal does not auto-retry jobs. Network/5xx/timeouts will mark the Order Failed
- `Activation_Error_Code__c` (e.g., 429, 503, ETIMEOUT)
- `Activation_Error_Message__c` (short reason)
---
## Escalation Paths
| Condition | Escalate To | Notes |
| ---------------------------- | --------------------- | --------------------------------------------------- |
| Issue persists >30 minutes | Salesforce admin | Check Flow configuration, Platform Event publishing |
| WHMCS returns 5xx >5 times | WHMCS hosting support | Server may be overloaded or down |
| Event replay doesn't recover | Development team | May need code investigation |
| Product mapping errors | Salesforce admin | Add missing `WH_Product_ID__c` values |
| Payment method issues | Customer support | Guide customer to add payment method in WHMCS |
For general incident response procedures, see [Incident Response Runbook](./incident-response.md).
---
## SLA Expectations
| Metric | Target | Warning | Critical |
| ----------------------- | ---------- | ----------- | ----------- |
| Provisioning completion | <5 seconds | >10 seconds | >30 seconds |
| Event processing delay | <1 second | >5 seconds | >30 seconds |
| Error rate | <1% | >1% | >5% |
### Performance Monitoring
- Monitor provisioning duration in logs (from "Platform Event enqueued" to "Activated")
- Track WHMCS API response times
- Alert on Salesforce update failures
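
The provisioning duration check can be sketched as follows, using the thresholds from the "Provisioning completion" row of the SLA table. The function names and the `"acceptable"` label for the band between target and warning are assumptions; the table itself leaves that band unnamed.

```typescript
type SlaLevel = "target" | "acceptable" | "warning" | "critical";

// Thresholds in seconds, taken from the SLA table.
const PROVISIONING_SLA = { target: 5, warning: 10, critical: 30 };

// Seconds between the "Platform Event enqueued" and "Activated" log timestamps.
function provisioningSeconds(enqueuedAt: string, activatedAt: string): number {
  return (Date.parse(activatedAt) - Date.parse(enqueuedAt)) / 1000;
}

function classifyDuration(seconds: number, sla = PROVISIONING_SLA): SlaLevel {
  if (seconds > sla.critical) return "critical";
  if (seconds > sla.warning) return "warning";
  if (seconds < sla.target) return "target";
  return "acceptable"; // between target and warning
}
```

The same `classifyDuration` helper works for the "Event processing delay" row by passing `{ target: 1, warning: 5, critical: 30 }`.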
---
## Manual Intervention Checklist
When an order fails provisioning and does not recover on its own, follow these steps:
1. **Check Salesforce Order**
   - Open the Order in Salesforce
   - Review `Activation_Status__c`, `Activation_Error_Code__c`, `Activation_Error_Message__c`
   - Check if `WHMCS_Order_ID__c` was partially set
2. **Verify Customer Data**
   - Confirm customer has valid WHMCS payment method via `GetPayMethods`
   - Check `id_mappings` table for correct portal-WHMCS-SF linkage
3. **Validate Product Mappings**
   - For each OrderItem, verify `Product2.WH_Product_ID__c` is set
   - Verify `Product2.Billing_Cycle__c` matches WHMCS expectations
4. **Check BFF Logs**
   - Search for the Salesforce Order ID in logs
   - Identify the specific step that failed
   - Look for external API errors (WHMCS, Salesforce)
5. **Manual Recovery**
   - If WHMCS order was created but SF not updated:
     - Manually update `WHMCS_Order_ID__c` and `Activation_Status__c` in Salesforce
   - If WHMCS order was not created:
     - Fix the root cause (payment method, mapping)
     - Retry via Salesforce (set `Activation_Status__c = Activating`)
6. **Verify Resolution**
   - Confirm Salesforce Order shows `Activated`
   - Confirm WHMCS has the order and services
   - Confirm customer can see their subscription in the portal
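
The branching in the Manual Recovery step can be written down as a small decision function. This is only a sketch of the checklist logic: the `OrderSnapshot` type and the action names are hypothetical, and the inputs are whatever the operator found in Salesforce and WHMCS during steps 1 and 2.

```typescript
// Hypothetical summary of what the operator found.
interface OrderSnapshot {
  activationStatus: string; // value of Activation_Status__c
  whmcsOrderExists: boolean; // whether the order is present in WHMCS
}

type RecoveryAction = "none" | "update-salesforce" | "fix-cause-and-retry";

function recoveryAction(order: OrderSnapshot): RecoveryAction {
  if (order.activationStatus === "Activated") return "none";
  // WHMCS has the order but Salesforce was never updated: sync the fields by hand.
  if (order.whmcsOrderExists) return "update-salesforce";
  // Nothing was created in WHMCS: fix the cause, then set the status back to Activating.
  return "fix-cause-and-retry";
}
```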
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Queue Management](./queue-management.md)
- [External Dependencies](./external-dependencies.md)
- [Salesforce Requirements](../integrations/salesforce/requirements.md)
- [WHMCS Mapping Reference](../integrations/salesforce/whmcs-mapping.md)


@ -0,0 +1,361 @@
# Queue Management Runbook
This document covers monitoring and management of BullMQ job queues used by the Customer Portal BFF.
---
## Overview
The BFF uses BullMQ (backed by Redis) for asynchronous job processing:
| Queue | Purpose | Processor Location |
| -------------------- | --------------------------------------------- | ---------------------------------------------------- |
| `order-provisioning` | Order fulfillment after CS approval | `apps/bff/src/modules/orders/queue/` |
| `sim-management` | Delayed SIM operations (network type changes) | `apps/bff/src/modules/subscriptions/sim-management/` |
---
## Queue Configuration
### Environment Variables
| Variable | Description | Default |
| ------------------------ | ---------------------------------- | -------- |
| `REDIS_URL` | Redis connection for queues | Required |
| `QUEUE_DEFAULT_ATTEMPTS` | Default retry attempts | 3 |
| `QUEUE_BACKOFF_DELAY` | Backoff delay between retries (ms) | 5000 |
### Queue Options
```typescript
// Default queue configuration
{
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 5000,
    },
    removeOnComplete: 100, // Keep last 100 completed jobs
    removeOnFail: 500, // Keep last 500 failed jobs
  }
}
```
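
How the environment variables feed into these defaults is not shown here; the following sketch illustrates one plausible mapping, with fallbacks equal to the documented defaults. `jobDefaultsFromEnv` and `positiveInt` are hypothetical helpers, not names from the BFF code.

```typescript
interface JobDefaults {
  attempts: number;
  backoff: { type: "exponential"; delay: number };
  removeOnComplete: number;
  removeOnFail: number;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// `env` would typically be process.env; it is a parameter here to keep the sketch testable.
function jobDefaultsFromEnv(env: Record<string, string | undefined>): JobDefaults {
  return {
    attempts: positiveInt(env["QUEUE_DEFAULT_ATTEMPTS"], 3),
    backoff: { type: "exponential", delay: positiveInt(env["QUEUE_BACKOFF_DELAY"], 5000) },
    removeOnComplete: 100,
    removeOnFail: 500,
  };
}
```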
---
## Monitoring
### Check Queue Status
```bash
# Connect to Redis and list queue keys (SCAN-based, so it does not block Redis the way KEYS can)
redis-cli --scan --pattern "bull:*"
# Check specific queue length
redis-cli LLEN "bull:order-provisioning:wait"
redis-cli LLEN "bull:order-provisioning:active"
redis-cli ZCARD "bull:order-provisioning:delayed"
redis-cli ZCARD "bull:order-provisioning:failed"
```
### Queue Key Structure
| Key Pattern | Description |
| ------------------------ | ----------------------------------- |
| `bull:{queue}:wait` | Jobs waiting to be processed |
| `bull:{queue}:active` | Jobs currently being processed |
| `bull:{queue}:delayed` | Jobs scheduled for future execution |
| `bull:{queue}:completed` | Recently completed jobs |
| `bull:{queue}:failed` | Failed jobs |
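
Because the `wait` and `active` keys are Redis lists while `delayed`, `completed`, and `failed` are sorted sets, the counting command differs per state. A small lookup, sketched below with hypothetical names, avoids mixing up `LLEN` and `ZCARD`.

```typescript
type QueueState = "wait" | "active" | "delayed" | "completed" | "failed";

// Lists are counted with LLEN, sorted sets with ZCARD.
const COUNT_COMMAND: Record<QueueState, "LLEN" | "ZCARD"> = {
  wait: "LLEN",
  active: "LLEN",
  delayed: "ZCARD",
  completed: "ZCARD",
  failed: "ZCARD",
};

// Build the redis-cli arguments that count jobs in one state of one queue.
function countCommand(queue: string, state: QueueState): string {
  return `${COUNT_COMMAND[state]} bull:${queue}:${state}`;
}
```

For example, `countCommand("sim-management", "delayed")` yields the arguments to pass to `redis-cli` for the SIM queue's delayed jobs.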
### Health Metrics
| Metric | Warning | Critical | Action |
| ---------------- | ------- | -------- | --------------------------- |
| Wait queue depth | >10 | >50 | Check processor status |
| Failed job count | >5 | >20 | Investigate failures |
| Processing time | >30s | >60s | Check external dependencies |
---
## Order Provisioning Queue
### Purpose
Processes orders after CS approval via Salesforce Platform Events.
### Flow
```
Salesforce Platform Event (OrderProvisionRequested__e)
Event Subscriber receives event
Job enqueued to 'order-provisioning' queue
Processor executes fulfillment workflow
Order created in WHMCS + Salesforce updated
```
### Job Data Structure
```typescript
{
  sfOrderId: "8014x000000ABCDXYZ", // Salesforce Order ID
  idempotencyKey: "8014x...-1703123456789",
  eventPayload: { ... } // Original Platform Event data
}
```
### Common Failure Reasons
| Error | Cause | Resolution |
| ------------------------ | ------------------------------ | ------------------------------------------------ |
| `PAYMENT_METHOD_MISSING` | Customer has no payment method | Customer must add payment method in WHMCS |
| `ORDER_NOT_FOUND` | Salesforce Order doesn't exist | Check Order ID, verify not deleted |
| `MAPPING_ERROR` | Product mapping missing | Add `WH_Product_ID__c` to Product2 in Salesforce |
| `WHMCS_ERROR` | WHMCS API failure | Check WHMCS connectivity and logs |
### Retry Behavior
- **Attempts**: 3 total (1 initial + 2 retries)
- **Backoff**: Exponential, starting at 5s and doubling (5s before the first retry, 10s before the second)
- **On Final Failure**: Salesforce Order updated with error details
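
The retry timing can be worked out with a short calculation. This sketch assumes the exponential strategy doubles the base delay on each retry, that is, `delay * 2^(retry - 1)`; with 3 attempts there are only two retries, so a 20s delay would apply only if a fourth attempt were configured.

```typescript
// Delays (ms) before each retry for a job allowed `attempts` total attempts.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(Math.round(baseDelayMs * Math.pow(2, retry - 1)));
  }
  return delays;
}

const provisioningDelays = retryDelaysMs(3, 5000); // [5000, 10000]
```

In the worst case a job therefore reaches final failure roughly 15 seconds after its first attempt, plus the time each attempt spends running.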
---
## SIM Management Queue
### Purpose
Handles delayed SIM operations, particularly network type changes that require a 30-minute gap.
### Job Types
| Job Type | Delay | Description |
| ------------------- | ---------- | ----------------------------- |
| `networkTypeChange` | 30 minutes | Change between 4G/5G networks |
### Job Data Structure
```typescript
{
  subscriptionId: 29951,
  simAccount: "08077052946",
  operation: "networkTypeChange",
  params: {
    networkType: "5G"
  },
  scheduledAt: "2024-01-15T10:30:00Z"
}
```
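
The relationship between the 30-minute gap and `scheduledAt` can be illustrated as below. The helper names are hypothetical, and the sketch assumes the delay is counted from the previous SIM operation; the resulting millisecond value is what would be passed as a BullMQ job `delay`.

```typescript
const NETWORK_CHANGE_GAP_MS = 30 * 60 * 1000; // 30-minute gap from the Job Types table

// Earliest time the network type change may run, as an ISO timestamp.
function scheduledAtAfterGap(previousOperationAt: Date): string {
  return new Date(previousOperationAt.getTime() + NETWORK_CHANGE_GAP_MS).toISOString();
}

// Milliseconds to wait from `now` until `scheduledAt`; never negative.
function delayUntilMs(scheduledAt: string, now: Date): number {
  return Math.max(0, Date.parse(scheduledAt) - now.getTime());
}
```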
### Common Failure Reasons
| Error | Cause | Resolution |
| --------------------- | -------------------------------- | --------------------------------------- |
| `FREEBIT_AUTH_FAILED` | Freebit authentication error | Check OEM credentials |
| `ACCOUNT_NOT_FOUND` | SIM account not found in Freebit | Verify account identifier |
| `OPERATION_CONFLICT` | Another operation pending | Wait for previous operation to complete |
---
## Failed Job Investigation
### View Failed Jobs
```bash
# List failed jobs (using Redis CLI)
redis-cli ZRANGE "bull:order-provisioning:failed" 0 -1
# Get job details
redis-cli HGETALL "bull:order-provisioning:{job-id}"
```
### Common Investigation Steps
1. **Check job data**: Identify the order/subscription involved
2. **Check error message**: Look for specific failure reason
3. **Check external system**: Verify Salesforce/WHMCS/Freebit status
4. **Check logs**: Search BFF logs for job ID or order ID
5. **Determine if retryable**: Some errors are permanent (missing mapping), others are transient (network timeout)
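
Step 5 can be made explicit with a lookup over the error codes that appear in this document. Only "missing mapping is permanent" and "network timeout is transient" come from the text; the placement of the other codes is an assumption and should be adjusted to match actual behavior.

```typescript
type Retryability = "transient" | "permanent" | "unknown";

// Errors that need a data or configuration fix before any retry can succeed.
const PERMANENT_CODES = new Set(["MAPPING_ERROR", "PAYMENT_METHOD_MISSING", "ORDER_NOT_FOUND"]);

// Errors that may clear on their own (timeouts, throttling, upstream outages).
const TRANSIENT_CODES = new Set(["ETIMEOUT", "429", "503", "WHMCS_ERROR"]);

function classifyError(code: string): Retryability {
  if (PERMANENT_CODES.has(code)) return "permanent";
  if (TRANSIENT_CODES.has(code)) return "transient";
  return "unknown"; // investigate manually before retrying
}
```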
### Log Search
```bash
# Search logs for specific order
grep "8014x000000ABCDXYZ" /var/log/bff/combined.log
# Search for queue processing errors
grep "provisioning" /var/log/bff/error.log | tail -50
```
---
## Manual Retry Procedures
### Retry a Single Failed Job
```typescript
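// Using the BullMQ API in Node.js (ES module, so top-level await works)
import { Queue } from "bullmq";
import IORedis from "ioredis";
// Assumes REDIS_URL is set, as it is for the BFF itself
const redisConnection = new IORedis(process.env.REDIS_URL!);
const queue = new Queue("order-provisioning", { connection: redisConnection });
const job = await queue.getJob("job-id"); // replace with the failed job's ID
if (!job) throw new Error("Job not found (it may have been removed already)");
await job.retry(); // moves the job from failed back to wait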
```
### Retry All Failed Jobs
```bash
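# Move all failed jobs back to waiting.
# Note: this edits BullMQ's Redis keys directly and skips the bookkeeping that
# job.retry() performs, so prefer the API above when it is available.
redis-cli ZRANGEBYSCORE "bull:order-provisioning:failed" -inf +inf | while read -r jobId; do
  redis-cli LPUSH "bull:order-provisioning:wait" "$jobId"
  redis-cli ZREM "bull:order-provisioning:failed" "$jobId"
done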
```
> **Warning**: Only retry jobs after fixing the root cause. Retrying without fixing will cause the same failure.
### Retry via Salesforce (Recommended for Provisioning)
For order provisioning, the recommended retry method is through Salesforce:
1. Open the Order in Salesforce
2. Clear error fields (`Activation_Error_Code__c`, `Activation_Error_Message__c`)
3. Set `Activation_Status__c` back to "Activating"
4. The Record-Triggered Flow will publish a new Platform Event
This approach ensures proper idempotency tracking and audit trail.
---
## Clearing Stuck Jobs
### Clear All Jobs from a Queue
> **Warning**: This removes all jobs including pending work. Use only in emergencies.
```bash
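# Clear the queue's state lists and sets.
# Per-job hashes (bull:order-provisioning:<job-id>) are not covered by these keys and remain in Redis.
redis-cli DEL \
  "bull:order-provisioning:wait" \
  "bull:order-provisioning:active" \
  "bull:order-provisioning:delayed" \
  "bull:order-provisioning:completed" \
  "bull:order-provisioning:failed"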
```
### Clear Old Completed/Failed Jobs
```bash
# Remove jobs older than 7 days from completed
redis-cli ZREMRANGEBYSCORE "bull:order-provisioning:completed" -inf $(date -d '7 days ago' +%s000)
# Remove jobs older than 30 days from failed
redis-cli ZREMRANGEBYSCORE "bull:order-provisioning:failed" -inf $(date -d '30 days ago' +%s000)
```
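
The `date -d '7 days ago'` form above is GNU `date` syntax. Where that is unavailable, the cutoff score can be computed directly. The sketch below uses hypothetical helper names and, like the commands above, assumes the sorted-set scores are millisecond timestamps.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Millisecond timestamp `daysAgo` days before `now`.
function cutoffMs(daysAgo: number, now: number = Date.now()): number {
  return now - daysAgo * DAY_MS;
}

// redis-cli arguments that trim one finished-state set of a queue.
function trimCommand(
  queue: string,
  state: "completed" | "failed",
  daysAgo: number,
  now: number = Date.now(),
): string {
  return `ZREMRANGEBYSCORE bull:${queue}:${state} -inf ${cutoffMs(daysAgo, now)}`;
}
```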
---
## Queue Backlog Handling
### Symptoms of Backlog
- Wait queue depth increasing
- Jobs not being processed
- Customer orders stuck in "Activating" status
### Diagnosis
1. **Check processor is running**
   ```bash
   grep "BullMQ" /var/log/bff/combined.log | tail -20
   ```
2. **Check Redis connectivity**
   ```bash
   redis-cli PING
   ```
3. **Check for blocked jobs**
   ```bash
   redis-cli LLEN "bull:order-provisioning:active"
   # If active > 0 for extended time, jobs may be stuck
   ```
4. **Check external dependencies**
   - Salesforce API
   - WHMCS API
### Resolution
1. **Restart BFF** to reconnect queue workers
2. **Clear stuck active jobs** if processor crashed mid-job
3. **Scale horizontally** if queue depth is due to high volume
4. **Fix root cause** if jobs are failing repeatedly
---
## Alerting Configuration
### Recommended Alerts
| Alert | Condition | Severity |
| ---------------------- | ------------------------------------------------ | -------- |
| Queue Backlog | Wait queue > 10 for > 5 minutes | Warning |
| Queue Backlog Critical | Wait queue > 50 | Critical |
| Failed Jobs Spike | > 5 failures in 15 minutes | Warning |
| Processor Down | No job processed in 10 minutes with jobs waiting | Critical |
| Job Timeout | Job active for > 5 minutes | Warning |
### Monitoring Queries
```bash
# Check queue depths (for monitoring script)
WAIT=$(redis-cli LLEN "bull:order-provisioning:wait")
ACTIVE=$(redis-cli LLEN "bull:order-provisioning:active")
FAILED=$(redis-cli ZCARD "bull:order-provisioning:failed")
echo "Wait: $WAIT, Active: $ACTIVE, Failed: $FAILED"
```
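
The two backlog alerts differ in that the warning needs a sustained breach while the critical alert fires on a single reading. A sketch of that evaluation over periodically collected wait-queue depths follows; the `DepthSample` shape and function name are hypothetical, and samples are assumed to be ordered oldest first.

```typescript
interface DepthSample {
  at: number; // epoch milliseconds when the depth was read
  wait: number; // LLEN of the wait list at that time
}

type BacklogSeverity = "ok" | "warning" | "critical";

function backlogSeverity(samples: DepthSample[], now: number): BacklogSeverity {
  if (samples.length === 0) return "ok";
  // Critical: latest reading above 50, regardless of duration.
  if (samples[samples.length - 1].wait > 50) return "critical";
  // Warning: walk back through the unbroken run of readings above 10.
  let breachStart: number | null = null;
  for (let i = samples.length - 1; i >= 0 && samples[i].wait > 10; i--) {
    breachStart = samples[i].at;
  }
  const sustained = breachStart !== null && now - breachStart > 5 * 60 * 1000;
  return sustained ? "warning" : "ok";
}
```

Feeding this function one sample per minute from the monitoring script above is enough to reproduce both alert conditions.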
---
## Best Practices
### Job Design
- Include sufficient context in job data for debugging
- Use idempotency keys to prevent duplicate processing
- Keep job payloads small (< 10KB)
### Error Handling
- Distinguish between retryable and non-retryable errors
- Log sufficient context before throwing
- Update external systems with error status on final failure
### Monitoring
- Set up alerts for queue depth and failure rate
- Monitor job processing duration
- Track success/failure ratios over time
---
## Related Documents
- [Incident Response](./incident-response.md)
- [Provisioning Runbook](./provisioning-runbook.md)
- [External Dependencies](./external-dependencies.md)
- [SIM State Machine](../integrations/sim/state-machine.md)
---
**Last Updated:** December 2025