Rate Limit Tuning Guide
This document covers rate limiting configuration, adjustment procedures, and troubleshooting for the Customer Portal.
Rate Limiting Overview
The portal uses multiple rate limiting mechanisms:
| Type | Scope | Backend | Purpose |
|---|---|---|---|
| Auth Rate Limiting | Per endpoint (login, signup, etc.) | Redis | Prevent brute force attacks |
| Global Rate Limiting | Per route/controller | Redis | API abuse prevention |
| Request Queues | Per external API | In-memory (p-queue) | External API protection |
| SSE Connection Limits | Per user | In-memory | Resource protection |
Authentication Rate Limits
Configuration
| Endpoint | Env Variable | Default | Window |
|---|---|---|---|
| Login | LOGIN_RATE_LIMIT_LIMIT | 5 attempts | 15 min |
| Login (TTL) | LOGIN_RATE_LIMIT_TTL | 900000 ms | - |
| Signup | SIGNUP_RATE_LIMIT_LIMIT | 5 attempts | 15 min |
| Signup (TTL) | SIGNUP_RATE_LIMIT_TTL | 900000 ms | - |
| Password Reset | PASSWORD_RESET_RATE_LIMIT_LIMIT | 5 attempts | 15 min |
| Password Reset (TTL) | PASSWORD_RESET_RATE_LIMIT_TTL | 900000 ms | - |
| Token Refresh | AUTH_REFRESH_RATE_LIMIT_LIMIT | 10 attempts | 5 min |
| Token Refresh (TTL) | AUTH_REFRESH_RATE_LIMIT_TTL | 300000 ms | - |
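As a rough illustration of how the variables and defaults in the table above might be resolved, the sketch below reads a limit and TTL pair for one endpoint and falls back to the documented default when a value is missing or invalid. The helper names (`loadAuthRateLimit`, `parsePositiveInt`, `AUTH_DEFAULTS`) are hypothetical and are not taken from the portal's code.

```typescript
// Hypothetical helper: names are illustrative, not the portal's actual code.
type EnvVars = Record<string, string | undefined>;

interface RateLimitConfig {
  limit: number; // max attempts per window
  ttlMs: number; // window length in milliseconds
}

// Defaults copied from the configuration table.
const AUTH_DEFAULTS: Record<string, RateLimitConfig> = {
  LOGIN: { limit: 5, ttlMs: 900000 },
  SIGNUP: { limit: 5, ttlMs: 900000 },
  PASSWORD_RESET: { limit: 5, ttlMs: 900000 },
  AUTH_REFRESH: { limit: 10, ttlMs: 300000 },
};

// Accept only positive integers; anything else falls back to the default.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const value = Number(raw);
  return raw !== undefined && Number.isInteger(value) && value > 0 ? value : fallback;
}

function loadAuthRateLimit(env: EnvVars, prefix: string): RateLimitConfig {
  const defaults = AUTH_DEFAULTS[prefix];
  if (!defaults) throw new Error(`Unknown rate limit prefix: ${prefix}`);
  return {
    limit: parsePositiveInt(env[`${prefix}_RATE_LIMIT_LIMIT`], defaults.limit),
    ttlMs: parsePositiveInt(env[`${prefix}_RATE_LIMIT_TTL`], defaults.ttlMs),
  };
}
```

Falling back on invalid input, rather than throwing, is a design choice of this sketch; a stricter deployment might prefer to fail at startup.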
CAPTCHA Configuration
| Setting | Env Variable | Default | Description |
|---|---|---|---|
| CAPTCHA Threshold | LOGIN_CAPTCHA_AFTER_ATTEMPTS | 3 | Show CAPTCHA after N failed attempts |
| CAPTCHA Always On | AUTH_CAPTCHA_ALWAYS_ON | false | Require CAPTCHA for all logins |
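The two settings interact in a simple way, which the following sketch tries to capture. It assumes the CAPTCHA appears once the failure count reaches the threshold (rather than exceeds it); the function and interface names are hypothetical.

```typescript
interface CaptchaConfig {
  afterAttempts: number; // LOGIN_CAPTCHA_AFTER_ATTEMPTS (default 3)
  alwaysOn: boolean; // AUTH_CAPTCHA_ALWAYS_ON (default false)
}

function captchaRequired(failedAttempts: number, config: CaptchaConfig): boolean {
  // Always-on mode overrides the threshold entirely.
  if (config.alwaysOn) return true;
  // Assumption: the CAPTCHA is shown once failures reach the threshold.
  return failedAttempts >= config.afterAttempts;
}
```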
Adjusting Auth Rate Limits
In Production (requires restart):
# Edit .env file
LOGIN_RATE_LIMIT_LIMIT=10 # Increase to 10 attempts
LOGIN_RATE_LIMIT_TTL=1800000 # Extend window to 30 minutes
# Restart backend
docker compose restart backend
Temporary Reset via Redis (immediate, no restart):
# Check current rate limit for a key
redis-cli GET "auth-login:<ip-hash>"
# Delete a rate limit record to allow immediate retry
redis-cli DEL "auth-login:<ip-hash>"
Global API Rate Limits
Configuration
Global rate limits are applied via the @RateLimit decorator:
@RateLimit({ limit: 100, ttl: 60 }) // 100 requests per minute
@Controller('invoices')
export class InvoicesController { ... }
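To make the meaning of the `limit` and `ttl` pair concrete, here is a minimal in-memory sketch of a fixed-window counter. This document does not say which algorithm the portal's guard uses, so the fixed-window behavior, the `FixedWindowLimiter` class, and the use of a `Map` in place of Redis are all assumptions made for illustration.

```typescript
// In-memory stand-in for a Redis-backed counter; illustrative only.
interface WindowState {
  count: number;
  resetAtMs: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly ttlSeconds: number;

  constructor(limit: number, ttlSeconds: number) {
    this.limit = limit;
    this.ttlSeconds = ttlSeconds;
  }

  // Returns true if the request identified by `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (!current || nowMs >= current.resetAtMs) {
      // First request, or the previous window expired: start a new window.
      this.windows.set(key, { count: 1, resetAtMs: nowMs + this.ttlSeconds * 1000 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

With `new FixedWindowLimiter(100, 60)`, the 101st request inside one 60-second window for the same key would be rejected, and the counter starts over once the window expires.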
Common Rate Limit Settings
| Endpoint | Limit | TTL | Notes |
|---|---|---|---|
| Invoices | 100 | 60s | High-traffic endpoint |
| Subscriptions | 100 | 60s | High-traffic endpoint |
| Catalog | 200 | 60s | Cached, higher limit |
| Orders | 50 | 60s | Write operations |
| Profile | 60 | 60s | Standard limit |
Adjusting Global Rate Limits
Global rate limits are defined in code. To adjust:
- Modify the @RateLimit decorator in the controller
- Deploy the change
// Before
@RateLimit({ limit: 50, ttl: 60 })
// After (double the limit)
@RateLimit({ limit: 100, ttl: 60 })
External API Request Queues
WHMCS Queue Configuration
| Setting | Env Variable | Default | Description |
|---|---|---|---|
| Concurrency | WHMCS_QUEUE_CONCURRENCY | 15 | Max parallel requests |
| Interval Cap | WHMCS_QUEUE_INTERVAL_CAP | 300 | Max requests per minute |
| Timeout | WHMCS_QUEUE_TIMEOUT_MS | 30000 | Request timeout (ms) |
Salesforce Queue Configuration
| Setting | Env Variable | Default | Description |
|---|---|---|---|
| Standard Concurrency | SF_QUEUE_CONCURRENCY | 10 | Standard operations |
| Long-Running Concurrency | SF_LONG_RUNNING_CONCURRENCY | 5 | Bulk operations |
| Interval Cap | SF_QUEUE_INTERVAL_CAP | 200 | Max requests per minute |
| Timeout | SF_QUEUE_TIMEOUT_MS | 30000 | Request timeout (ms) |
Adjusting Queue Limits
Production Adjustment:
# Edit .env file
WHMCS_QUEUE_CONCURRENCY=20 # Increase concurrent requests
WHMCS_QUEUE_INTERVAL_CAP=500 # Increase requests per minute
# Restart backend
docker compose restart backend
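Before raising either value, it can help to estimate which setting is the actual bottleneck. The sketch below gives an upper bound on throughput under a simplifying assumption: every request takes roughly the same average latency. The function name and the latency figures are hypothetical.

```typescript
interface QueueSettings {
  concurrency: number; // e.g. WHMCS_QUEUE_CONCURRENCY
  intervalCap: number; // e.g. WHMCS_QUEUE_INTERVAL_CAP (requests per minute)
}

// Upper bound on requests per minute for a given average upstream latency.
function maxRequestsPerMinute(settings: QueueSettings, avgLatencyMs: number): number {
  if (avgLatencyMs <= 0) throw new Error("avgLatencyMs must be positive");
  // Each concurrency slot can complete 60000 / latency requests per minute.
  const concurrencyBound = settings.concurrency * (60000 / avgLatencyMs);
  return Math.min(settings.intervalCap, concurrencyBound);
}
```

With the WHMCS defaults (15 and 300) and an assumed 2-second average latency, the concurrency bound is 450 per minute, so the interval cap of 300 is the limiting factor. At an assumed 5-second latency the bound drops to 180 per minute, and raising the interval cap alone would not help.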
Queue Health Monitoring
# Check queue metrics
curl http://localhost:4000/health/queues | jq '.'
# Expected output:
{
"whmcs": {
"health": "healthy",
"metrics": {
"queueSize": 0,
"pendingRequests": 2,
"failedRequests": 0
}
},
"salesforce": {
"health": "healthy",
"metrics": { ... },
"dailyUsage": { "used": 5000, "limit": 15000 }
}
}
SSE Connection Limits
Configuration
// Per-user SSE connection limit (in-memory)
private readonly maxPerUser = 3;
This prevents a single user from opening unlimited SSE connections.
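A per-user counter of this kind could look roughly like the sketch below. Only the `maxPerUser` value comes from this document; the class name and the `acquire` and `release` methods are hypothetical and do not describe the actual service.

```typescript
class ConnectionLimiter {
  private readonly counts = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number = 3) {
    this.maxPerUser = maxPerUser;
  }

  // Try to register a new connection; returns false when the user is at the limit.
  acquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  // Call when a connection closes so the slot becomes available again.
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}
```

Because the state lives in process memory, a limit like this applies per backend instance and is lost on restart.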
Adjusting SSE Limits
This requires a code change in realtime-connection-limiter.service.ts:
// Change from
private readonly maxPerUser = 3;
// To
private readonly maxPerUser = 5;
Bypassing Rate Limits for Testing
Temporary Bypass via Redis
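# Clear all rate limit keys for testing
# (--scan iterates incrementally instead of blocking Redis the way KEYS does;
#  xargs -r skips the DEL when nothing matches)
redis-cli --scan --pattern "auth-*" | xargs -r redis-cli DEL
redis-cli --scan --pattern "rate-limit:*" | xargs -r redis-cli DEL
# Clear specific user's rate limit
redis-cli --scan --pattern "*<ip-or-user-identifier>*" | xargs -r redis-cli DEL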
Using SkipRateLimit Decorator
For development/testing routes:
@SkipRateLimit()
@Get('test-endpoint')
async testEndpoint() { ... }
Environment-Based Bypass
Add a development bypass in configuration:
# In .env (development only!)
RATE_LIMIT_BYPASS_ENABLED=true
// In guard
if (this.configService.get("RATE_LIMIT_BYPASS_ENABLED") === "true") {
return true;
}
Warning: Never enable bypass in production!
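One way to make an accidental production bypass harder is to require two conditions instead of one. The sketch below assumes `NODE_ENV` identifies production deployments, which this document does not confirm; the function name is hypothetical.

```typescript
type BypassEnv = Record<string, string | undefined>;

function isBypassAllowed(env: BypassEnv): boolean {
  const requested = env["RATE_LIMIT_BYPASS_ENABLED"] === "true";
  // Assumption: NODE_ENV is set to "production" on production deployments.
  const isProduction = env["NODE_ENV"] === "production";
  // The flag alone is not enough; production always keeps rate limiting on.
  return requested && !isProduction;
}
```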
Signs of Rate Limit Issues
User-Facing Symptoms
| Symptom | Possible Cause | Investigation |
|---|---|---|
| "Too many requests" errors | Rate limit exceeded | Check Redis keys, logs |
| Login failures | Auth rate limit | Check auth-login:* keys |
| Slow API responses | Queue backlog | Check /health/queues |
| 429 errors in logs | Any rate limit | Check logs for specifics |
Monitoring Indicators
| Metric | Warning | Critical | Action |
|---|---|---|---|
| 429 error rate | >1% | >5% | Review rate limits |
| Queue size | >10 | >50 | Increase concurrency |
| Average wait time | >1s | >5s | Scale or increase limits |
| CAPTCHA triggers | Unusual spike | - | Possible attack |
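The numeric thresholds in the table above can be encoded directly, for example in an alerting script. In the sketch below the metric keys (`errorRate429Percent`, `queueSize`, `averageWaitSeconds`) are hypothetical names, and values exactly at a threshold are treated as not yet exceeding it, matching the ">" in the table.

```typescript
type Severity = "ok" | "warning" | "critical";

interface Threshold {
  warning: number;
  critical: number;
}

// Thresholds from the monitoring table; the metric names are illustrative.
const THRESHOLDS: Record<string, Threshold> = {
  errorRate429Percent: { warning: 1, critical: 5 },
  queueSize: { warning: 10, critical: 50 },
  averageWaitSeconds: { warning: 1, critical: 5 },
};

function classify(metric: string, value: number): Severity {
  const threshold = THRESHOLDS[metric];
  if (!threshold) throw new Error(`Unknown metric: ${metric}`);
  if (value > threshold.critical) return "critical";
  if (value > threshold.warning) return "warning";
  return "ok";
}
```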
Log Analysis
# Find rate limit exceeded events
grep "Rate limit exceeded" /var/log/bff/combined.log | tail -20
# Find 429 responses
grep '"statusCode":429' /var/log/bff/combined.log | tail -20
# Count rate limit events by path
grep "Rate limit exceeded" /var/log/bff/combined.log | \
jq -r '.path' | sort | uniq -c | sort -rn
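The same per-path count can be done in code, for instance inside a diagnostics script. This sketch assumes, as the shell pipeline above does, that each log line is a JSON object with a `path` field; the function name and the sample paths in any usage are hypothetical.

```typescript
// Count "Rate limit exceeded" events per path from JSON-lines log text.
function countRateLimitEventsByPath(logText: string): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of logText.split("\n")) {
    if (!line.includes("Rate limit exceeded")) continue;
    let path: unknown;
    try {
      path = (JSON.parse(line) as { path?: unknown }).path;
    } catch {
      continue; // skip lines that are not valid JSON objects
    }
    if (typeof path !== "string") continue;
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  // Most frequent paths first, similar to `sort -rn`.
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```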
Troubleshooting
Too Many 429 Errors
Diagnosis:
# Check which endpoints are rate limited
grep "Rate limit exceeded" /var/log/bff/combined.log | \
jq '{path: .path, key: .key}' | head -20
# Check queue health
curl http://localhost:4000/health/queues
Resolution:
- Identify the affected endpoint
- Check if limit is appropriate for traffic
- Increase limit if legitimate traffic
- Add caching if requests are repetitive
Legitimate Users Being Blocked
Diagnosis:
# Check rate limit state for specific key
redis-cli --scan --pattern "*<identifier>*"
redis-cli GET "auth-login:<hash>"
Resolution:
# Clear the user's rate limit record
redis-cli DEL "auth-login:<hash>"
External API Rate Limit Violations
WHMCS Rate Limiting:
# Check queue metrics
curl http://localhost:4000/health/queues/whmcs
# Reduce concurrency if WHMCS is overloaded (set in .env, then restart backend)
WHMCS_QUEUE_CONCURRENCY=5
WHMCS_QUEUE_INTERVAL_CAP=100
Salesforce API Limits:
# Check daily API usage
curl http://localhost:4000/health/queues/salesforce | jq '.dailyUsage'
# If approaching limit, reduce requests
# Consider caching more data
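To turn "approaching limit" into something checkable, the `dailyUsage` object can be reduced to a ratio. The 0.8 threshold below is an illustrative assumption, not a value from the portal, and the function names are hypothetical.

```typescript
interface DailyUsage {
  used: number;
  limit: number;
}

// Fraction of the daily quota consumed (0 to 1).
function usageRatio(usage: DailyUsage): number {
  if (usage.limit <= 0) throw new Error("limit must be positive");
  return usage.used / usage.limit;
}

// Assumption: 0.8 is an illustrative alert threshold.
function isApproachingLimit(usage: DailyUsage, threshold: number = 0.8): boolean {
  return usageRatio(usage) >= threshold;
}
```

For the sample health output shown earlier (5000 used of 15000), the ratio is one third, well below the assumed threshold.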
Redis Connection Issues
If rate limiting fails due to Redis:
# Check Redis connectivity
redis-cli PING
# The guard fails open on Redis errors (allows request)
# Check logs for "Rate limiter error - failing open"
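The fail-open behavior described in the comments above can be pictured as a wrapper around the limiter check. The sketch is synchronous for brevity, whereas a real Redis-backed check would be asynchronous; the function name and callback shape are hypothetical, and only the log message text comes from this document.

```typescript
// Runs a limiter check and allows the request if the check itself fails.
function allowWithFailOpen(
  check: () => boolean,
  logError: (message: string) => void,
): boolean {
  try {
    return check();
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    logError(`Rate limiter error - failing open: ${detail}`);
    return true; // fail open: availability is preferred over enforcement
  }
}
```

The trade-off is that rate limiting is effectively disabled while Redis is unreachable, which is why the log line is worth alerting on.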
Best Practices
Setting Rate Limits
- Start Conservative - Begin with lower limits, increase as needed
- Monitor Before Adjusting - Understand traffic patterns first
- Consider User Experience - Limits should rarely impact normal use
- Document Changes - Track why limits were adjusted
Rate Limit Strategies
| Strategy | Use Case | Implementation |
|---|---|---|
| IP-based | Anonymous endpoints | Default behavior |
| User-based | Authenticated endpoints | Include user ID in key |
| Combined | Sensitive endpoints | IP + User-Agent hash |
| Tiered | Different user classes | Custom logic |
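The first three strategies differ mainly in how the rate limit key is built. The sketch below is one possible shape: the `rate-limit` prefix echoes the key pattern used elsewhere in this guide, but the `user:` segment, the function names, and the FNV-1a hash (a small non-cryptographic hash chosen so the example needs no imports) are assumptions, not the portal's implementation.

```typescript
interface RequestContext {
  ip: string;
  userAgent: string;
  userId?: string;
}

// Small non-cryptographic hash (FNV-1a, 32-bit), used here only for illustration.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

type Strategy = "ip" | "user" | "combined";

function buildRateLimitKey(prefix: string, strategy: Strategy, ctx: RequestContext): string {
  switch (strategy) {
    case "ip":
      return `${prefix}:${fnv1a(ctx.ip)}`;
    case "user":
      if (!ctx.userId) throw new Error("user-based keys need an authenticated user");
      return `${prefix}:user:${ctx.userId}`;
    case "combined":
      // IP plus User-Agent, hashed together as in the table.
      return `${prefix}:${fnv1a(`${ctx.ip}|${ctx.userAgent}`)}`;
  }
}
```

A production system would more likely use a cryptographic hash for IP-derived keys so that raw addresses cannot be recovered from Redis.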
Performance Considerations
- Redis Latency - Keep Redis co-located with BFF
- Key Expiration - Use TTL to prevent Redis bloat
- Fail Open - Rate limiter allows requests if Redis fails
- Logging - Log blocked requests for analysis
Rate Limit Response Headers
The BFF includes the conventional rate limit headers in its responses:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704110400
Retry-After: 60
Clients can use these to implement backoff.
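A client-side backoff based on these headers might look like the sketch below. It assumes header names have been lower-cased, that `Retry-After` carries a number of seconds (the HTTP-date form is not handled), and that `X-RateLimit-Reset` is a Unix timestamp in seconds, as the sample value suggests. The function names and the 1-second fallback are hypothetical.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Parse a numeric header value; missing or blank values yield NaN.
function numericHeader(headers: HeaderMap, name: string): number {
  const raw = headers[name];
  return raw === undefined || raw.trim() === "" ? NaN : Number(raw);
}

// Milliseconds to wait before retrying a 429 response.
function retryDelayMs(headers: HeaderMap, nowMs: number): number {
  const retryAfterSeconds = numericHeader(headers, "retry-after");
  if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  // Assumption: the reset header is a Unix timestamp in seconds.
  const resetEpochSeconds = numericHeader(headers, "x-ratelimit-reset");
  if (Number.isFinite(resetEpochSeconds)) {
    return Math.max(0, resetEpochSeconds * 1000 - nowMs);
  }
  return 1000; // assumed fallback when neither header is usable
}
```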
Last Updated: December 2025