# Redis-Required Token Flow Implementation Summary

## Overview

This document summarizes the implementation of the Redis-required token flow with maintenance response, Salesforce auth timeout and logging improvements, configurable queue throttling thresholds, per-user refresh token sets, and the removal of the legacy migration utilities.

## ✅ Completed Features

### 1. Redis-Required Token Flow with Maintenance Response

**Environment Variables Added:**

- `AUTH_REQUIRE_REDIS_FOR_TOKENS`: When enabled, token operations fail if Redis is unavailable
- `AUTH_MAINTENANCE_MODE`: Enables maintenance mode for the authentication service
- `AUTH_MAINTENANCE_MESSAGE`: Customizable maintenance message

**Implementation:**

- Added `checkServiceAvailability()` method in `AuthTokenService`
- Strict Redis requirement enforcement when the flag is enabled
- Graceful maintenance mode with custom messaging
- Production-safe error handling

**Files Modified:**

- `apps/bff/src/core/config/env.validation.ts`
- `apps/bff/src/modules/auth/services/token.service.ts`
- `env/portal-backend.env.sample`

### 2. Salesforce Auth Timeout + Logging

**Environment Variables Added:**

- `SF_AUTH_TIMEOUT_MS`: Configurable authentication timeout (default: 30s)
- `SF_TOKEN_TTL_MS`: Token time-to-live (default: 12 minutes)
- `SF_TOKEN_REFRESH_BUFFER_MS`: Refresh buffer time (default: 1 minute)

**Implementation:**

- Added timeout handling with `AbortController`
- Enhanced logging with timing information and error details
- Production-safe logging (sensitive data redacted)
- Re-authentication attempt logging with duration tracking
- Session expiration detection and automatic retry

**Files Modified:**

- `apps/bff/src/integrations/salesforce/services/salesforce-connection.service.ts`
- `apps/bff/src/core/config/env.validation.ts`
- `env/portal-backend.env.sample`

### 3. Queue Throttling Thresholds (Configurable)

**Environment Variables Added:**

- `WHMCS_QUEUE_CONCURRENCY`: WHMCS concurrent requests (default: 15)
- `WHMCS_QUEUE_INTERVAL_CAP`: WHMCS requests per minute (default: 300)
- `WHMCS_QUEUE_TIMEOUT_MS`: WHMCS request timeout (default: 30s)
- `SF_QUEUE_CONCURRENCY`: Salesforce concurrent requests (default: 15)
- `SF_QUEUE_LONG_RUNNING_CONCURRENCY`: Salesforce long-running requests (default: 22)
- `SF_QUEUE_INTERVAL_CAP`: Salesforce requests per minute (default: 600)
- `SF_QUEUE_TIMEOUT_MS`: Salesforce request timeout (default: 30s)
- `SF_QUEUE_LONG_RUNNING_TIMEOUT_MS`: Salesforce long-running timeout (default: 10 minutes)

**Implementation:**

- Made all queue thresholds configurable via environment variables
- Kept the existing tuned defaults (15 concurrent requests; 300 and 600 requests per minute, i.e. 5-10 RPS)
- Enhanced logging to report the configuration values actually in effect

**Files Modified:**

- `apps/bff/src/core/queue/services/whmcs-request-queue.service.ts`
- `apps/bff/src/core/queue/services/salesforce-request-queue.service.ts`
- `apps/bff/src/core/config/env.validation.ts`
- `env/portal-backend.env.sample`

### 4. Per-User Refresh Token Sets

**Implementation:**

- Enhanced `AuthTokenService` with per-user token management
- Added `REFRESH_USER_SET_PREFIX` for organizing tokens by user
- Implemented automatic cleanup of excess tokens (max 10 per user)
- Added `getUserRefreshTokenFamilies()` method for token inspection
- Optimized `revokeAllUserTokens()` to use Redis sets instead of key scanning

**New Methods:**

- `storeRefreshTokenInRedis()`: Enhanced storage with user sets
- `cleanupExcessUserTokens()`: Automatic cleanup of old tokens
- `getUserRefreshTokenFamilies()`: Get a user's active token families
- `revokeAllUserTokensFallback()`: Fallback for edge cases

**Files Modified:**

- `apps/bff/src/modules/auth/services/token.service.ts`

### 5. Migration Utilities for Existing Keys

The legacy helpers (`token-migration.service.ts`) have been removed, along with the admin-only migration endpoints.
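
To make the interaction between maintenance mode and strict Redis mode (feature 1 above) concrete, here is a minimal sketch of an availability check. It is not the actual `checkServiceAvailability()` implementation: the `AvailabilityConfig` and `Availability` types, the `checkAvailability` function, the check ordering (maintenance first), and the generic client-facing message are all assumptions made for illustration.

```typescript
// Hypothetical types and names; only the three AUTH_* variables come from this document.
interface AvailabilityConfig {
  requireRedisForTokens: boolean; // AUTH_REQUIRE_REDIS_FOR_TOKENS
  maintenanceMode: boolean;       // AUTH_MAINTENANCE_MODE
  maintenanceMessage: string;     // AUTH_MAINTENANCE_MESSAGE
}

interface Availability {
  status: "ok" | "maintenance" | "redis_unavailable";
  message?: string; // safe to return to the client
}

// Assumed ordering: maintenance mode wins, then the strict Redis requirement.
function checkAvailability(config: AvailabilityConfig, redisReady: boolean): Availability {
  if (config.maintenanceMode) {
    return { status: "maintenance", message: config.maintenanceMessage };
  }
  if (config.requireRedisForTokens && !redisReady) {
    // Keep the client-facing message generic; details belong in server logs.
    return {
      status: "redis_unavailable",
      message: "Authentication service is temporarily unavailable. Please try again later.",
    };
  }
  return { status: "ok" };
}
```

In this sketch, a caller would turn any non-`ok` status into a service-unavailable response before issuing or refreshing tokens. With the strict flag off, an unavailable Redis does not block the request, which matches the flag description above.
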
## 🚀 Deployment Instructions

### 1. Environment Configuration

Add the following to your environment file:

```bash
# Redis-required token flow
AUTH_REQUIRE_REDIS_FOR_TOKENS=false  # Set to true to require Redis
AUTH_MAINTENANCE_MODE=false          # Set to true for maintenance
AUTH_MAINTENANCE_MESSAGE=Authentication service is temporarily unavailable for maintenance. Please try again later.

# Salesforce timeouts
SF_AUTH_TIMEOUT_MS=30000
SF_TOKEN_TTL_MS=720000
SF_TOKEN_REFRESH_BUFFER_MS=60000

# Queue throttling (adjust as needed)
WHMCS_QUEUE_CONCURRENCY=15
WHMCS_QUEUE_INTERVAL_CAP=300
WHMCS_QUEUE_TIMEOUT_MS=30000
SF_QUEUE_CONCURRENCY=15
SF_QUEUE_LONG_RUNNING_CONCURRENCY=22
SF_QUEUE_INTERVAL_CAP=600
SF_QUEUE_TIMEOUT_MS=30000
SF_QUEUE_LONG_RUNNING_TIMEOUT_MS=600000
```

### 2. Migration Process

The legacy admin migration endpoints were removed. If a migration is needed in the future, plan a manual script or one-off job.

### 3. Feature Flag Rollout

1. **Phase 1:** Deploy with `AUTH_REQUIRE_REDIS_FOR_TOKENS=false`
2. **Phase 2:** If existing keys need migrating, run the one-off migration script in dry-run mode to assess impact
3. **Phase 3:** Execute the migration, if any, during a maintenance window
4. **Phase 4:** Enable `AUTH_REQUIRE_REDIS_FOR_TOKENS=true` for strict mode

## 🔧 Configuration Recommendations

### Production Settings

```bash
# Strict Redis requirement for production
AUTH_REQUIRE_REDIS_FOR_TOKENS=true

# Conservative queue settings for stability
WHMCS_QUEUE_CONCURRENCY=10
WHMCS_QUEUE_INTERVAL_CAP=200
SF_QUEUE_CONCURRENCY=12
SF_QUEUE_INTERVAL_CAP=400

# Longer timeouts for production reliability
SF_AUTH_TIMEOUT_MS=45000
WHMCS_QUEUE_TIMEOUT_MS=45000
```

### Development Settings

```bash
# Allow token operations without Redis in development
AUTH_REQUIRE_REDIS_FOR_TOKENS=false

# Higher throughput for development
WHMCS_QUEUE_CONCURRENCY=20
WHMCS_QUEUE_INTERVAL_CAP=500
SF_QUEUE_CONCURRENCY=20
SF_QUEUE_INTERVAL_CAP=800
```

## 🔍 Monitoring and Observability

### Key Metrics to Monitor

1. **Token Operations:**
   - Redis connection status
   - Token generation/refresh success rates
   - Per-user token counts
2. **Queue Performance:**
   - Queue depths and wait times
   - Request success/failure rates
   - Timeout occurrences
3. **Salesforce Auth:**
   - Authentication duration
   - Re-authentication frequency
   - Session expiration events

### Log Patterns to Watch

- `"Authentication service in maintenance mode"`
- `"Redis required for token operations but not available"`
- `"Salesforce authentication timeout"`
- `"Cleaned up excess user tokens"`

## 🛡️ Security Considerations

1. **Production Logging:** All sensitive data is redacted in production logs
2. **Token Limits:** Automatic cleanup caps refresh tokens at 10 per user, which prevents token accumulation attacks
3. **Redis Dependency:** Strict mode prevents token operations without Redis
4. **Audit Trail:** Any one-off migration job should log its operations for compliance, since the legacy migration endpoints no longer exist
5. **Graceful Degradation:** Maintenance mode provides controlled service interruption

## 📋 Testing Checklist

- [ ] Redis failover behavior with strict mode enabled
- [ ] Maintenance mode activation and messaging
- [ ] Salesforce authentication timeout handling
- [ ] Queue throttling under load
- [ ] Token migration dry-run and execution (only if a one-off migration script is used)
- [ ] Per-user token limit enforcement
- [ ] Orphaned token cleanup

## 🔄 Rollback Plan

If issues arise:

1. **Disable Strict Mode:** Set `AUTH_REQUIRE_REDIS_FOR_TOKENS=false`
2. **Exit Maintenance:** Set `AUTH_MAINTENANCE_MODE=false`
3. **Revert Queue Settings:** Restore the previous concurrency/timeout values
4. **Token Cleanup:** Run a one-off cleanup script if needed (the legacy migration service has been removed)

All changes are backward compatible and can be safely rolled back via environment variables.
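
Because every setting in this document is driven by an environment variable, a rollback amounts to changing values and restarting the service. The sketch below shows one way the queue settings could be read with the defaults documented above. It is an illustration only: the real validation lives in `env.validation.ts` and may work differently, and the names `Env`, `readPositiveInt`, and `loadQueueConfig` are hypothetical.

```typescript
// Hypothetical helpers; variable names and defaults are the ones listed in this document.
type Env = Record<string, string | undefined>;

// Returns the fallback when the variable is unset or blank, and rejects non-positive or non-integer values.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const parsed = Number(raw);
  if (!isFinite(parsed) || Math.floor(parsed) !== parsed || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

function loadQueueConfig(env: Env) {
  return {
    whmcs: {
      concurrency: readPositiveInt(env, "WHMCS_QUEUE_CONCURRENCY", 15),
      intervalCap: readPositiveInt(env, "WHMCS_QUEUE_INTERVAL_CAP", 300),
      timeoutMs: readPositiveInt(env, "WHMCS_QUEUE_TIMEOUT_MS", 30000),
    },
    salesforce: {
      concurrency: readPositiveInt(env, "SF_QUEUE_CONCURRENCY", 15),
      longRunningConcurrency: readPositiveInt(env, "SF_QUEUE_LONG_RUNNING_CONCURRENCY", 22),
      intervalCap: readPositiveInt(env, "SF_QUEUE_INTERVAL_CAP", 600),
      timeoutMs: readPositiveInt(env, "SF_QUEUE_TIMEOUT_MS", 30000),
      longRunningTimeoutMs: readPositiveInt(env, "SF_QUEUE_LONG_RUNNING_TIMEOUT_MS", 600000),
    },
  };
}
```

In a Node.js service this would typically be called as `loadQueueConfig(process.env)` at startup. Failing fast on a malformed value is a design choice of this sketch; it surfaces a bad rollback value at boot rather than under load.
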