# Rate Limit Tuning Guide

This document covers rate limiting configuration, adjustment procedures, and troubleshooting for the Customer Portal.

---

## Rate Limiting Overview

The portal uses multiple rate limiting mechanisms:

| Type                      | Scope                              | Backend             | Purpose                     |
| ------------------------- | ---------------------------------- | ------------------- | --------------------------- |
| **Auth Rate Limiting**    | Per endpoint (login, signup, etc.) | Redis               | Prevent brute force attacks |
| **Global Rate Limiting**  | Per route/controller               | Redis               | API abuse prevention        |
| **Request Queues**        | Per external API                   | In-memory (p-queue) | External API protection     |
| **SSE Connection Limits** | Per user                           | In-memory           | Resource protection         |

---

## Authentication Rate Limits

### Configuration

| Endpoint             | Env Variable                      | Default     | Window |
| -------------------- | --------------------------------- | ----------- | ------ |
| Login                | `LOGIN_RATE_LIMIT_LIMIT`          | 5 attempts  | 15 min |
| Login (TTL)          | `LOGIN_RATE_LIMIT_TTL`            | 900000 ms   | -      |
| Signup               | `SIGNUP_RATE_LIMIT_LIMIT`         | 5 attempts  | 15 min |
| Signup (TTL)         | `SIGNUP_RATE_LIMIT_TTL`           | 900000 ms   | -      |
| Password Reset       | `PASSWORD_RESET_RATE_LIMIT_LIMIT` | 5 attempts  | 15 min |
| Password Reset (TTL) | `PASSWORD_RESET_RATE_LIMIT_TTL`   | 900000 ms   | -      |
| Token Refresh        | `AUTH_REFRESH_RATE_LIMIT_LIMIT`   | 10 attempts | 5 min  |
| Token Refresh (TTL)  | `AUTH_REFRESH_RATE_LIMIT_TTL`     | 300000 ms   | -      |
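
The table pairs each `*_RATE_LIMIT_LIMIT` variable with a `*_RATE_LIMIT_TTL` variable in milliseconds. As a minimal sketch of how such a pair could be resolved with defaults, the helper below reads both values from an environment map. The names `loadAuthRateLimit` and `readPositiveInt` are hypothetical and not taken from the portal's code; only the variable names and defaults come from the table.

```typescript
// Hypothetical helper, not the portal's actual implementation.
interface AuthRateLimit {
  limit: number; // max attempts per window
  ttlMs: number; // window length in milliseconds
}

type Env = Record<string, string | undefined>;

// Returns the fallback when the variable is unset or blank; rejects invalid values.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// prefix is one of LOGIN, SIGNUP, PASSWORD_RESET, AUTH_REFRESH (from the table).
function loadAuthRateLimit(env: Env, prefix: string, defaults: AuthRateLimit): AuthRateLimit {
  return {
    limit: readPositiveInt(env, `${prefix}_RATE_LIMIT_LIMIT`, defaults.limit),
    ttlMs: readPositiveInt(env, `${prefix}_RATE_LIMIT_TTL`, defaults.ttlMs),
  };
}

// Example: only the limit is overridden, so the 15-minute default window is kept.
const loginLimit = loadAuthRateLimit(
  { LOGIN_RATE_LIMIT_LIMIT: "10" },
  "LOGIN",
  { limit: 5, ttlMs: 900000 },
);
console.log(loginLimit.limit, loginLimit.ttlMs / 60000);
```

Rejecting non-numeric values at startup, rather than silently falling back, is a design choice of this sketch; it makes a typo in `.env` visible immediately.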

### CAPTCHA Configuration

| Setting           | Env Variable                   | Default | Description                          |
| ----------------- | ------------------------------ | ------- | ------------------------------------ |
| CAPTCHA Threshold | `LOGIN_CAPTCHA_AFTER_ATTEMPTS` | 3       | Show CAPTCHA after N failed attempts |
| CAPTCHA Always On | `AUTH_CAPTCHA_ALWAYS_ON`       | false   | Require CAPTCHA for all logins       |
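
The two settings combine into a single decision per login attempt. The sketch below shows one plausible reading, assuming the CAPTCHA appears once the failed-attempt count reaches the threshold (the table does not say whether the comparison is `>=` or `>`). `CaptchaConfig` and `requiresCaptcha` are hypothetical names.

```typescript
// Hypothetical shape mirroring the two env variables in the table.
interface CaptchaConfig {
  afterAttempts: number; // LOGIN_CAPTCHA_AFTER_ATTEMPTS
  alwaysOn: boolean; // AUTH_CAPTCHA_ALWAYS_ON
}

// Assumption: "after N failed attempts" means failedAttempts >= N.
function requiresCaptcha(failedAttempts: number, config: CaptchaConfig): boolean {
  return config.alwaysOn || failedAttempts >= config.afterAttempts;
}

const captchaDefaults: CaptchaConfig = { afterAttempts: 3, alwaysOn: false };
console.log(requiresCaptcha(2, captchaDefaults)); // false
console.log(requiresCaptcha(3, captchaDefaults)); // true
```

With the defaults, the CAPTCHA threshold (3) sits below the login limit (5), so a user would see a CAPTCHA before being rate limited.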

### Adjusting Auth Rate Limits

**In Production (requires restart):**

```bash
# Edit .env file
LOGIN_RATE_LIMIT_LIMIT=10      # Increase to 10 attempts
LOGIN_RATE_LIMIT_TTL=1800000   # Extend window to 30 minutes

# Restart backend
docker compose restart backend
```

**Temporary Reset via Redis (immediate, no restart):**

Deleting a key resets the counter for that client; it does not change the configured limit.

```bash
# Check current rate limit for a key
redis-cli GET "auth-login:<ip-hash>"

# Delete a rate limit record to allow immediate retry
redis-cli DEL "auth-login:<ip-hash>"
```

---

## Global API Rate Limits

### Configuration

Global rate limits are applied via the `@RateLimit` decorator:

```typescript
@RateLimit({ limit: 100, ttl: 60 }) // 100 requests per minute
@Controller('invoices')
export class InvoicesController { ... }
```
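
To illustrate what `limit` and `ttl` (in seconds) mean together, here is a minimal in-memory fixed-window counter. It is a sketch only: the portal's guard is Redis-backed, and this document does not state whether it uses a fixed or sliding window, so `FixedWindowLimiter` and its fields are assumptions made for illustration.

```typescript
// Illustrative only; the real guard stores its counters in Redis.
interface RateLimitOptions {
  limit: number; // max requests per window
  ttl: number; // window length in seconds, as in @RateLimit
}

interface WindowState {
  count: number;
  resetAtMs: number;
}

interface ConsumeResult {
  allowed: boolean;
  remaining: number;
  resetAtMs: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly options: RateLimitOptions;

  constructor(options: RateLimitOptions) {
    this.options = options;
  }

  // Counts one request for `key` and reports whether it fits in the current window.
  consume(key: string, nowMs: number = Date.now()): ConsumeResult {
    let state = this.windows.get(key);
    if (state === undefined || nowMs >= state.resetAtMs) {
      // Start a fresh window once the previous one has expired.
      state = { count: 0, resetAtMs: nowMs + this.options.ttl * 1000 };
      this.windows.set(key, state);
    }
    state.count += 1;
    return {
      allowed: state.count <= this.options.limit,
      remaining: Math.max(0, this.options.limit - state.count),
      resetAtMs: state.resetAtMs,
    };
  }
}

const invoicesLimiter = new FixedWindowLimiter({ limit: 100, ttl: 60 });
console.log(invoicesLimiter.consume("invoices:client-a").remaining);
```

A fixed window is the simplest scheme to reason about, but it permits short bursts of up to twice the limit across a window boundary.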

### Common Rate Limit Settings

| Endpoint      | Limit | TTL | Notes                 |
| ------------- | ----- | --- | --------------------- |
| Invoices      | 100   | 60s | High-traffic endpoint |
| Subscriptions | 100   | 60s | High-traffic endpoint |
| Catalog       | 200   | 60s | Cached, higher limit  |
| Orders        | 50    | 60s | Write operations      |
| Profile       | 60    | 60s | Standard limit        |

### Adjusting Global Rate Limits

Global rate limits are defined in code. To adjust:

1. Modify the `@RateLimit` decorator in the controller
2. Deploy the change

```typescript
// Before
@RateLimit({ limit: 50, ttl: 60 })

// After (double the limit)
@RateLimit({ limit: 100, ttl: 60 })
```

---

## External API Request Queues

### WHMCS Queue Configuration

| Setting      | Env Variable               | Default | Description             |
| ------------ | -------------------------- | ------- | ----------------------- |
| Concurrency  | `WHMCS_QUEUE_CONCURRENCY`  | 15      | Max parallel requests   |
| Interval Cap | `WHMCS_QUEUE_INTERVAL_CAP` | 300     | Max requests per minute |
| Timeout      | `WHMCS_QUEUE_TIMEOUT_MS`   | 30000   | Request timeout (ms)    |

### Salesforce Queue Configuration

| Setting                  | Env Variable                  | Default | Description             |
| ------------------------ | ----------------------------- | ------- | ----------------------- |
| Standard Concurrency     | `SF_QUEUE_CONCURRENCY`        | 10      | Standard operations     |
| Long-Running Concurrency | `SF_LONG_RUNNING_CONCURRENCY` | 5       | Bulk operations         |
| Interval Cap             | `SF_QUEUE_INTERVAL_CAP`       | 200     | Max requests per minute |
| Timeout                  | `SF_QUEUE_TIMEOUT_MS`         | 30000   | Request timeout (ms)    |

### Adjusting Queue Limits

**Production Adjustment:**

```bash
# Edit .env file
WHMCS_QUEUE_CONCURRENCY=20     # Increase concurrent requests
WHMCS_QUEUE_INTERVAL_CAP=500   # Increase requests per minute

# Restart backend
docker compose restart backend
```
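
As a rough sketch of how these variables might translate into queue options, the function below builds an object using p-queue's `concurrency`, `intervalCap`, `interval`, and `timeout` option names. The function name `whmcsQueueSettings` and the mapping itself are assumptions; the portal's actual wiring is not shown in this document. The 60000 ms interval follows from the "per minute" description in the table.

```typescript
// Field names follow p-queue's options; the mapping from env vars is assumed.
interface QueueSettings {
  concurrency: number; // max parallel requests
  intervalCap: number; // max requests started per interval
  interval: number; // interval length in ms
  timeout: number; // per-request timeout in ms
}

function whmcsQueueSettings(env: Record<string, string | undefined>): QueueSettings {
  // Falls back to the documented default when the variable is unset or invalid.
  const num = (name: string, fallback: number): number => {
    const parsed = Number(env[name]);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    concurrency: num("WHMCS_QUEUE_CONCURRENCY", 15),
    intervalCap: num("WHMCS_QUEUE_INTERVAL_CAP", 300),
    interval: 60000, // "per minute" in the table above
    timeout: num("WHMCS_QUEUE_TIMEOUT_MS", 30000),
  };
}

const adjusted = whmcsQueueSettings({
  WHMCS_QUEUE_CONCURRENCY: "20",
  WHMCS_QUEUE_INTERVAL_CAP: "500",
});
// Sustained throughput ceiling in requests per second.
console.log(adjusted.intervalCap / (adjusted.interval / 1000));
```

Raising `intervalCap` lifts the sustained throughput ceiling, while `concurrency` only bounds how many requests are in flight at once; both should stay within what the external API tolerates.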

### Queue Health Monitoring

```bash
# Check queue metrics
curl http://localhost:4000/health/queues | jq '.'

# Expected output:
{
  "whmcs": {
    "health": "healthy",
    "metrics": {
      "queueSize": 0,
      "pendingRequests": 2,
      "failedRequests": 0
    }
  },
  "salesforce": {
    "health": "healthy",
    "metrics": { ... },
    "dailyUsage": { "used": 5000, "limit": 15000 }
  }
}
```

---

## SSE Connection Limits

### Configuration

```typescript
// Per-user SSE connection limit (in-memory)
private readonly maxPerUser = 3;
```

This prevents a single user from opening unlimited SSE connections.
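
A per-user cap of this kind can be sketched as a counter map with acquire and release steps. The class below is a hypothetical illustration, not the contents of the portal's limiter service; only the `maxPerUser = 3` default comes from this document.

```typescript
// Hypothetical in-memory limiter; names are illustrative.
class ConnectionLimiter {
  private readonly counts = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number = 3) {
    this.maxPerUser = maxPerUser;
  }

  // Call when a user opens an SSE stream; false means the stream should be refused.
  tryAcquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  // Call when the stream closes so the slot becomes available again.
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) {
      this.counts.delete(userId); // drop empty entries to keep the map small
    } else {
      this.counts.set(userId, current - 1);
    }
  }
}

const sseLimiter = new ConnectionLimiter();
console.log(sseLimiter.tryAcquire("user-1")); // true
```

Because the counts live in process memory, a limit like this applies per backend instance; with several instances behind a load balancer, one user could hold up to `maxPerUser` connections on each.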

### Adjusting SSE Limits

This requires a code change in `realtime-connection-limiter.service.ts`:

```typescript
// Change from
private readonly maxPerUser = 3;

// To
private readonly maxPerUser = 5;
```

---

## Bypassing Rate Limits for Testing

### Temporary Bypass via Redis

```bash
# Clear all rate limit keys for testing
# --scan iterates with SCAN instead of the blocking KEYS command;
# xargs -r (GNU) skips DEL when nothing matches
redis-cli --scan --pattern "auth-*" | xargs -r redis-cli DEL
redis-cli --scan --pattern "rate-limit:*" | xargs -r redis-cli DEL

# Clear specific user's rate limit
redis-cli --scan --pattern "*<ip-or-user-identifier>*" | xargs -r redis-cli DEL
```

### Using SkipRateLimit Decorator

For development/testing routes:

```typescript
@SkipRateLimit()
@Get('test-endpoint')
async testEndpoint() { ... }
```

### Environment-Based Bypass

Add a development bypass in configuration:

```bash
# In .env (development only!)
RATE_LIMIT_BYPASS_ENABLED=true
```

```typescript
// In guard
if (this.configService.get("RATE_LIMIT_BYPASS_ENABLED") === "true") {
  return true;
}
```

> **Warning**: Never enable bypass in production!

---

## Signs of Rate Limit Issues

### User-Facing Symptoms

| Symptom                    | Possible Cause      | Investigation             |
| -------------------------- | ------------------- | ------------------------- |
| "Too many requests" errors | Rate limit exceeded | Check Redis keys, logs    |
| Login failures             | Auth rate limit     | Check `auth-login:*` keys |
| Slow API responses         | Queue backlog       | Check `/health/queues`    |
| 429 errors in logs         | Any rate limit      | Check logs for specifics  |

### Monitoring Indicators

| Metric            | Warning       | Critical | Action                   |
| ----------------- | ------------- | -------- | ------------------------ |
| 429 error rate    | >1%           | >5%      | Review rate limits       |
| Queue size        | >10           | >50      | Increase concurrency     |
| Average wait time | >1s           | >5s      | Scale or increase limits |
| CAPTCHA triggers  | Unusual spike | -        | Possible attack          |
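
The numeric thresholds in the table can be expressed as a small classifier, for example in an alerting script. This is a sketch under the assumption that metrics arrive as plain numbers (error rate as a fraction, wait time in seconds); `THRESHOLDS` and `classify` are hypothetical names, and the CAPTCHA row is omitted because it has no numeric threshold.

```typescript
type Severity = "ok" | "warning" | "critical";

interface Thresholds {
  warning: number;
  critical: number;
}

// Values taken from the Monitoring Indicators table.
const THRESHOLDS = {
  errorRate429: { warning: 0.01, critical: 0.05 }, // fraction of responses
  queueSize: { warning: 10, critical: 50 },
  avgWaitSeconds: { warning: 1, critical: 5 },
};

// Strictly-greater comparisons match the ">" notation in the table.
function classify(value: number, thresholds: Thresholds): Severity {
  if (value > thresholds.critical) return "critical";
  if (value > thresholds.warning) return "warning";
  return "ok";
}

console.log(classify(0.02, THRESHOLDS.errorRate429)); // "warning"
console.log(classify(51, THRESHOLDS.queueSize)); // "critical"
```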

### Log Analysis

```bash
# Find rate limit exceeded events
grep "Rate limit exceeded" /var/log/bff/combined.log | tail -20

# Find 429 responses
grep '"statusCode":429' /var/log/bff/combined.log | tail -20

# Count rate limit events by path
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq -r '.path' | sort | uniq -c | sort -rn
```

---

## Troubleshooting

### Too Many 429 Errors

**Diagnosis:**

```bash
# Check which endpoints are rate limited
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq '{path: .path, key: .key}' | head -20

# Check queue health
curl http://localhost:4000/health/queues
```

**Resolution:**

1. Identify the affected endpoint
2. Check whether the limit is appropriate for the traffic
3. Increase the limit if the traffic is legitimate
4. Add caching if requests are repetitive

### Legitimate Users Being Blocked

**Diagnosis:**

```bash
# Check rate limit state for specific key
# (--scan avoids blocking Redis the way KEYS can on a large keyspace)
redis-cli --scan --pattern "*<identifier>*"
redis-cli GET "auth-login:<hash>"
```

**Resolution:**

```bash
# Clear the user's rate limit record
redis-cli DEL "auth-login:<hash>"
```

### External API Rate Limit Violations

**WHMCS Rate Limiting:**

```bash
# Check queue metrics
curl http://localhost:4000/health/queues/whmcs

# Reduce concurrency if WHMCS is overloaded
# (set in .env, then restart the backend)
WHMCS_QUEUE_CONCURRENCY=5
WHMCS_QUEUE_INTERVAL_CAP=100
```

**Salesforce API Limits:**

```bash
# Check daily API usage
curl http://localhost:4000/health/queues/salesforce | jq '.dailyUsage'

# If approaching limit, reduce requests
# Consider caching more data
```

### Redis Connection Issues

If rate limiting fails due to Redis:

```bash
# Check Redis connectivity
redis-cli PING

# The guard fails open on Redis errors (allows request)
# Check logs for "Rate limiter error - failing open"
```

---

## Best Practices

### Setting Rate Limits

1. **Start Conservative** - Begin with lower limits, increase as needed
2. **Monitor Before Adjusting** - Understand traffic patterns first
3. **Consider User Experience** - Limits should rarely impact normal use
4. **Document Changes** - Track why limits were adjusted

### Rate Limit Strategies

| Strategy   | Use Case                | Implementation         |
| ---------- | ----------------------- | ---------------------- |
| IP-based   | Anonymous endpoints     | Default behavior       |
| User-based | Authenticated endpoints | Include user ID in key |
| Combined   | Sensitive endpoints     | IP + User-Agent hash   |
| Tiered     | Different user classes  | Custom logic           |
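
The first three strategies differ mainly in how the rate limit key is built. The sketch below shows one way to derive such keys. Everything in it is illustrative: `rateLimitKey`, `shortHash`, and the exact key layout are assumptions, and the FNV-1a-style hash is a dependency-free stand-in, not a claim about which hash the portal uses (a production system would more likely use a cryptographic hash such as SHA-256).

```typescript
type Strategy = "ip" | "user" | "combined";

interface RequestContext {
  ip: string;
  userId?: string;
  userAgent?: string;
}

// FNV-1a-style 32-bit hash over UTF-16 code units; a stand-in for a real hash.
function shortHash(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical key builder; the prefix would be e.g. "auth-login" or "rate-limit".
function rateLimitKey(prefix: string, strategy: Strategy, ctx: RequestContext): string {
  switch (strategy) {
    case "ip":
      return `${prefix}:${shortHash(ctx.ip)}`;
    case "user":
      if (ctx.userId === undefined) {
        throw new Error("user-based keys need an authenticated user ID");
      }
      return `${prefix}:user:${ctx.userId}`;
    case "combined":
      return `${prefix}:${shortHash(`${ctx.ip}|${ctx.userAgent ?? ""}`)}`;
  }
}

console.log(rateLimitKey("rate-limit", "user", { ip: "203.0.113.7", userId: "42" }));
// → rate-limit:user:42
```

Hashing the IP keeps raw addresses out of Redis key names, which matches the `auth-login:<ip-hash>` pattern used elsewhere in this guide.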

### Performance Considerations

- **Redis Latency** - Keep Redis co-located with BFF
- **Key Expiration** - Use TTL to prevent Redis bloat
- **Fail Open** - Rate limiter allows requests if Redis fails
- **Logging** - Log blocked requests for analysis

---

## Rate Limit Response Headers

The BFF includes standard rate limit headers:

```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704110400
Retry-After: 60
```

Clients can use these to implement backoff.
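
A client-side backoff based on these headers might look like the sketch below. It assumes `Retry-After` carries a number of seconds and `X-RateLimit-Reset` a Unix timestamp in seconds, as the example values suggest; neither is confirmed by this document. The function name `retryDelayMs` and the lowercase header keys are illustrative.

```typescript
type Headers = Record<string, string | undefined>;

// Parses a non-negative number of seconds; returns undefined for missing or invalid values.
function parseSeconds(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value >= 0 ? value : undefined;
}

// Returns how long to wait (in ms) before retrying a rate-limited request.
function retryDelayMs(headers: Headers, nowMs: number = Date.now(), fallbackMs: number = 1000): number {
  // Prefer Retry-After (assumed to be delta-seconds, not an HTTP date).
  const retryAfter = parseSeconds(headers["retry-after"]);
  if (retryAfter !== undefined) return retryAfter * 1000;

  // Otherwise wait until the reset time (assumed to be epoch seconds).
  const resetAt = parseSeconds(headers["x-ratelimit-reset"]);
  if (resetAt !== undefined) return Math.max(0, resetAt * 1000 - nowMs);

  return fallbackMs;
}

console.log(retryDelayMs({ "retry-after": "60" })); // 60000
```

In practice a client would usually add some random jitter on top of the computed delay so that many clients blocked at the same moment do not all retry together.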

---

## Related Documents

- [Incident Response](./incident-response.md)
- [Monitoring Setup](./monitoring-setup.md)
- [External Dependencies](./external-dependencies.md)
- [Queue Management](./queue-management.md)

---

**Last Updated:** December 2025