# Rate Limit Tuning Guide

This document covers rate limiting configuration, adjustment procedures, and troubleshooting for the Customer Portal.

---

## Rate Limiting Overview

The portal uses multiple rate limiting mechanisms:

| Type                      | Scope                              | Backend             | Purpose                     |
| ------------------------- | ---------------------------------- | ------------------- | --------------------------- |
| **Auth Rate Limiting**    | Per endpoint (login, signup, etc.) | Redis               | Prevent brute force attacks |
| **Global Rate Limiting**  | Per route/controller               | Redis               | API abuse prevention        |
| **Request Queues**        | Per external API                   | In-memory (p-queue) | External API protection     |
| **SSE Connection Limits** | Per user                           | In-memory           | Resource protection         |

---

## Authentication Rate Limits

### Configuration

| Endpoint             | Env Variable                      | Default     | Window |
| -------------------- | --------------------------------- | ----------- | ------ |
| Login                | `LOGIN_RATE_LIMIT_LIMIT`          | 5 attempts  | 15 min |
| Login (TTL)          | `LOGIN_RATE_LIMIT_TTL`            | 900000 ms   | -      |
| Signup               | `SIGNUP_RATE_LIMIT_LIMIT`         | 5 attempts  | 15 min |
| Signup (TTL)         | `SIGNUP_RATE_LIMIT_TTL`           | 900000 ms   | -      |
| Password Reset       | `PASSWORD_RESET_RATE_LIMIT_LIMIT` | 5 attempts  | 15 min |
| Password Reset (TTL) | `PASSWORD_RESET_RATE_LIMIT_TTL`   | 900000 ms   | -      |
| Token Refresh        | `AUTH_REFRESH_RATE_LIMIT_LIMIT`   | 10 attempts | 5 min  |
| Token Refresh (TTL)  | `AUTH_REFRESH_RATE_LIMIT_TTL`     | 300000 ms   | -      |

### CAPTCHA Configuration

| Setting           | Env Variable                   | Default | Description                          |
| ----------------- | ------------------------------ | ------- | ------------------------------------ |
| CAPTCHA Threshold | `LOGIN_CAPTCHA_AFTER_ATTEMPTS` | 3       | Show CAPTCHA after N failed attempts |
| CAPTCHA Always On | `AUTH_CAPTCHA_ALWAYS_ON`       | false   | Require CAPTCHA for all logins       |

### Adjusting Auth Rate Limits

**In Production (requires restart):**

```bash
# Edit .env file
LOGIN_RATE_LIMIT_LIMIT=10      # Increase to 10 attempts
LOGIN_RATE_LIMIT_TTL=1800000   # Extend window to 30 minutes

# Restart backend
docker compose restart backend
```

**Temporary Increase via Redis (immediate, no restart):**

```bash
# Check current rate limit for a key
redis-cli GET "auth-login:"

# Delete a rate limit record to allow immediate retry
redis-cli DEL "auth-login:"
```

---

## Global API Rate Limits

### Configuration

Global rate limits are applied via the `@RateLimit` decorator:

```typescript
@RateLimit({ limit: 100, ttl: 60 }) // 100 requests per minute
@Controller('invoices')
export class InvoicesController { ... }
```

### Common Rate Limit Settings

| Endpoint      | Limit | TTL | Notes                 |
| ------------- | ----- | --- | --------------------- |
| Invoices      | 100   | 60s | High-traffic endpoint |
| Subscriptions | 100   | 60s | High-traffic endpoint |
| Catalog       | 200   | 60s | Cached, higher limit  |
| Orders        | 50    | 60s | Write operations      |
| Profile       | 60    | 60s | Standard limit        |

### Adjusting Global Rate Limits

Global rate limits are defined in code. To adjust:

1. Modify the `@RateLimit` decorator in the controller
2. Deploy the change

```typescript
// Before
@RateLimit({ limit: 50, ttl: 60 })

// After (double the limit)
@RateLimit({ limit: 100, ttl: 60 })
```

---

## External API Request Queues

### WHMCS Queue Configuration

| Setting      | Env Variable               | Default | Description             |
| ------------ | -------------------------- | ------- | ----------------------- |
| Concurrency  | `WHMCS_QUEUE_CONCURRENCY`  | 15      | Max parallel requests   |
| Interval Cap | `WHMCS_QUEUE_INTERVAL_CAP` | 300     | Max requests per minute |
| Timeout      | `WHMCS_QUEUE_TIMEOUT_MS`   | 30000   | Request timeout (ms)    |

### Salesforce Queue Configuration

| Setting                  | Env Variable                  | Default | Description             |
| ------------------------ | ----------------------------- | ------- | ----------------------- |
| Standard Concurrency     | `SF_QUEUE_CONCURRENCY`        | 10      | Standard operations     |
| Long-Running Concurrency | `SF_LONG_RUNNING_CONCURRENCY` | 5       | Bulk operations         |
| Interval Cap             | `SF_QUEUE_INTERVAL_CAP`       | 200     | Max requests per minute |
| Timeout                  | `SF_QUEUE_TIMEOUT_MS`         | 30000   | Request timeout (ms)    |

### Adjusting Queue Limits

**Production Adjustment:**

```bash
# Edit .env file
WHMCS_QUEUE_CONCURRENCY=20     # Increase concurrent requests
WHMCS_QUEUE_INTERVAL_CAP=500   # Increase requests per minute

# Restart backend
docker compose restart backend
```

### Queue Health Monitoring

```bash
# Check queue metrics
curl http://localhost:4000/health/queues | jq '.'

# Expected output:
{
  "whmcs": {
    "health": "healthy",
    "metrics": {
      "queueSize": 0,
      "pendingRequests": 2,
      "failedRequests": 0
    }
  },
  "salesforce": {
    "health": "healthy",
    "metrics": { ... },
    "dailyUsage": { "used": 5000, "limit": 15000 }
  }
}
```

---

## SSE Connection Limits

### Configuration

```typescript
// Per-user SSE connection limit (in-memory)
private readonly maxPerUser = 3;
```

This prevents a single user from opening unlimited SSE connections.

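
To illustrate how a per-user cap such as `maxPerUser` can be enforced, the following is a minimal in-memory sketch. The class name `ConnectionLimiter` and its `acquire`/`release` methods are hypothetical and chosen for illustration only; the portal's actual service may be structured differently.

```typescript
// Hypothetical sketch of a per-user connection counter.
// Not the portal's actual implementation; names are illustrative assumptions.
class ConnectionLimiter {
  private readonly counts = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number = 3) {
    this.maxPerUser = maxPerUser;
  }

  // Reserves a slot and returns true if the user is under the cap;
  // returns false (and reserves nothing) once the cap is reached.
  acquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) {
      return false;
    }
    this.counts.set(userId, current + 1);
    return true;
  }

  // Frees a slot when a connection closes. The entry is deleted at zero
  // so the map does not grow with users who have disconnected.
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) {
      this.counts.delete(userId);
    } else {
      this.counts.set(userId, current - 1);
    }
  }
}

// Usage: the fourth concurrent connection for the same user is rejected.
const limiter = new ConnectionLimiter(3);
const results = [1, 2, 3, 4].map(() => limiter.acquire("user-1"));

// After one connection closes, a new one is accepted again.
limiter.release("user-1");
const afterRelease = limiter.acquire("user-1");
console.log(results, afterRelease);
```

Because the counter lives in process memory, a sketch like this enforces the cap per backend instance only. With several instances behind a load balancer, a user could hold up to `maxPerUser` connections on each one.
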
### Adjusting SSE Limits

This requires a code change in `realtime-connection-limiter.service.ts`:

```typescript
// Change from
private readonly maxPerUser = 3;

// To
private readonly maxPerUser = 5;
```

---

## Bypassing Rate Limits for Testing

### Temporary Bypass via Redis

```bash
# Clear all rate limit keys for testing
redis-cli KEYS "auth-*" | xargs redis-cli DEL
redis-cli KEYS "rate-limit:*" | xargs redis-cli DEL

# Clear specific user's rate limit
redis-cli KEYS "**" | xargs redis-cli DEL
```

### Using SkipRateLimit Decorator

For development/testing routes:

```typescript
@SkipRateLimit()
@Get('test-endpoint')
async testEndpoint() { ... }
```

### Environment-Based Bypass

Add a development bypass in configuration:

```bash
# In .env (development only!)
RATE_LIMIT_BYPASS_ENABLED=true
```

```typescript
// In guard
if (this.configService.get("RATE_LIMIT_BYPASS_ENABLED") === "true") {
  return true;
}
```

> **Warning**: Never enable bypass in production!

---

## Signs of Rate Limit Issues

### User-Facing Symptoms

| Symptom                    | Possible Cause      | Investigation             |
| -------------------------- | ------------------- | ------------------------- |
| "Too many requests" errors | Rate limit exceeded | Check Redis keys, logs    |
| Login failures             | Auth rate limit     | Check `auth-login:*` keys |
| Slow API responses         | Queue backlog       | Check `/health/queues`    |
| 429 errors in logs         | Any rate limit      | Check logs for specifics  |

### Monitoring Indicators

| Metric            | Warning       | Critical | Action                   |
| ----------------- | ------------- | -------- | ------------------------ |
| 429 error rate    | >1%           | >5%      | Review rate limits       |
| Queue size        | >10           | >50      | Increase concurrency     |
| Average wait time | >1s           | >5s      | Scale or increase limits |
| CAPTCHA triggers  | Unusual spike | -        | Possible attack          |

### Log Analysis

```bash
# Find rate limit exceeded events
grep "Rate limit exceeded" /var/log/bff/combined.log | tail -20

# Find 429 responses
grep '"statusCode":429' /var/log/bff/combined.log | tail -20

# Count rate limit events by path
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq -r '.path' | sort | uniq -c | sort -rn
```

---

## Troubleshooting

### Too Many 429 Errors

**Diagnosis:**

```bash
# Check which endpoints are rate limited
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq '{path: .path, key: .key}' | head -20

# Check queue health
curl http://localhost:4000/health/queues
```

**Resolution:**

1. Identify the affected endpoint
2. Check if limit is appropriate for traffic
3. Increase limit if legitimate traffic
4. Add caching if requests are repetitive

### Legitimate Users Being Blocked

**Diagnosis:**

```bash
# Check rate limit state for specific key
redis-cli KEYS "**"
redis-cli GET "auth-login:"
```

**Resolution:**

```bash
# Clear the user's rate limit record
redis-cli DEL "auth-login:"
```

### External API Rate Limit Violations

**WHMCS Rate Limiting:**

```bash
# Check queue metrics
curl http://localhost:4000/health/queues/whmcs

# Reduce concurrency if WHMCS is overloaded
WHMCS_QUEUE_CONCURRENCY=5
WHMCS_QUEUE_INTERVAL_CAP=100
```

**Salesforce API Limits:**

```bash
# Check daily API usage
curl http://localhost:4000/health/queues/salesforce | jq '.dailyUsage'

# If approaching limit, reduce requests
# Consider caching more data
```

### Redis Connection Issues

If rate limiting fails due to Redis:

```bash
# Check Redis connectivity
redis-cli PING

# The guard fails open on Redis errors (allows request)
# Check logs for "Rate limiter error - failing open"
```

---

## Best Practices

### Setting Rate Limits

1. **Start Conservative** - Begin with lower limits, increase as needed
2. **Monitor Before Adjusting** - Understand traffic patterns first
3. **Consider User Experience** - Limits should rarely impact normal use
4. **Document Changes** - Track why limits were adjusted

### Rate Limit Strategies

| Strategy   | Use Case                | Implementation         |
| ---------- | ----------------------- | ---------------------- |
| IP-based   | Anonymous endpoints     | Default behavior       |
| User-based | Authenticated endpoints | Include user ID in key |
| Combined   | Sensitive endpoints     | IP + User-Agent hash   |
| Tiered     | Different user classes  | Custom logic           |

### Performance Considerations

- **Redis Latency** - Keep Redis co-located with BFF
- **Key Expiration** - Use TTL to prevent Redis bloat
- **Fail Open** - Rate limiter allows requests if Redis fails
- **Logging** - Log blocked requests for analysis

---

## Rate Limit Response Headers

The BFF includes standard rate limit headers:

```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704110400
Retry-After: 60
```

Clients can use these to implement backoff.

---

## Related Documents

- [Incident Response](./incident-response.md)
- [Monitoring Setup](./monitoring-setup.md)
- [External Dependencies](./external-dependencies.md)
- [Queue Management](./queue-management.md)

---

**Last Updated:** December 2025