barsa 90ab71b94d Update README.md to Enhance Documentation Clarity and Add New Sections
- Added a new section for Release Procedures, detailing deployment and rollback processes.
- Updated the System Operations section to include Monitoring Setup, Rate Limit Tuning, and Customer Data Management for improved operational guidance.
- Reformatted the table structure for better readability and consistency across documentation.
2025-12-23 16:08:15 +09:00


Rate Limit Tuning Guide

This document covers rate limiting configuration, adjustment procedures, and troubleshooting for the Customer Portal.


Rate Limiting Overview

The portal uses multiple rate limiting mechanisms:

| Type | Scope | Backend | Purpose |
|---|---|---|---|
| Auth rate limiting | Per endpoint (login, signup, etc.) | Redis | Prevent brute-force attacks |
| Global rate limiting | Per route/controller | Redis | Prevent API abuse |
| Request queues | Per external API | In-memory (p-queue) | Protect external APIs |
| SSE connection limits | Per user | In-memory | Protect server resources |

Authentication Rate Limits

Configuration

| Endpoint | Limit variable | Default limit | TTL variable | Default TTL (window) |
|---|---|---|---|---|
| Login | `LOGIN_RATE_LIMIT_LIMIT` | 5 attempts | `LOGIN_RATE_LIMIT_TTL` | 900000 ms (15 min) |
| Signup | `SIGNUP_RATE_LIMIT_LIMIT` | 5 attempts | `SIGNUP_RATE_LIMIT_TTL` | 900000 ms (15 min) |
| Password reset | `PASSWORD_RESET_RATE_LIMIT_LIMIT` | 5 attempts | `PASSWORD_RESET_RATE_LIMIT_TTL` | 900000 ms (15 min) |
| Token refresh | `AUTH_REFRESH_RATE_LIMIT_LIMIT` | 10 attempts | `AUTH_REFRESH_RATE_LIMIT_TTL` | 300000 ms (5 min) |
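To make the mapping between variables and effective limits concrete, here is a minimal sketch, assuming the backend reads these variables as integers and falls back to the defaults above when a variable is unset. The helper names (`loadAuthLimits`, `readPositiveInt`) are hypothetical, and rejecting malformed values is a choice made in this sketch, not documented portal behavior.

```typescript
// Hypothetical sketch: resolve auth rate-limit settings from environment
// variables, using the documented defaults when a variable is unset.
type Env = Record<string, string | undefined>;

interface AuthLimit {
  limit: number; // max attempts per window
  ttlMs: number; // window length in milliseconds
}

interface AuthLimits {
  login: AuthLimit;
  signup: AuthLimit;
  passwordReset: AuthLimit;
  tokenRefresh: AuthLimit;
}

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  // Fail loudly on typos such as "10x" or "-5" instead of using a bad value.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadAuthLimits(env: Env): AuthLimits {
  const read = (prefix: string, limit: number, ttlMs: number): AuthLimit => ({
    limit: readPositiveInt(env, `${prefix}_RATE_LIMIT_LIMIT`, limit),
    ttlMs: readPositiveInt(env, `${prefix}_RATE_LIMIT_TTL`, ttlMs),
  });
  return {
    login: read("LOGIN", 5, 900_000),
    signup: read("SIGNUP", 5, 900_000),
    passwordReset: read("PASSWORD_RESET", 5, 900_000),
    tokenRefresh: read("AUTH_REFRESH", 10, 300_000),
  };
}
```

For example, `loadAuthLimits(process.env).login` yields `{ limit: 5, ttlMs: 900000 }` when neither login variable is set.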

CAPTCHA Configuration

| Setting | Env variable | Default | Description |
|---|---|---|---|
| CAPTCHA threshold | `LOGIN_CAPTCHA_AFTER_ATTEMPTS` | 3 | Show CAPTCHA after N failed attempts |
| CAPTCHA always on | `AUTH_CAPTCHA_ALWAYS_ON` | false | Require CAPTCHA for all logins |
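The two settings combine into a single yes/no decision per login attempt. A minimal sketch, assuming "after N failed attempts" means CAPTCHA is required once N failures have been recorded for the client, and that only the exact string `"true"` enables always-on mode; the function names are hypothetical.

```typescript
// Hypothetical sketch of the CAPTCHA rule: required when always-on is set,
// or once the client has reached the failed-attempt threshold.
interface CaptchaSettings {
  afterAttempts: number; // LOGIN_CAPTCHA_AFTER_ATTEMPTS (default 3)
  alwaysOn: boolean; // AUTH_CAPTCHA_ALWAYS_ON (default false)
}

function captchaSettingsFromEnv(env: Record<string, string | undefined>): CaptchaSettings {
  const raw = env.LOGIN_CAPTCHA_AFTER_ATTEMPTS;
  const parsed = raw === undefined ? NaN : Number(raw);
  return {
    // Fall back to the documented default for missing or invalid values.
    afterAttempts: Number.isInteger(parsed) && parsed > 0 ? parsed : 3,
    alwaysOn: env.AUTH_CAPTCHA_ALWAYS_ON === "true",
  };
}

function requiresCaptcha(failedAttempts: number, settings: CaptchaSettings): boolean {
  return settings.alwaysOn || failedAttempts >= settings.afterAttempts;
}
```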

Adjusting Auth Rate Limits

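In production (requires recreating the backend container):

```bash
# Edit .env file
LOGIN_RATE_LIMIT_LIMIT=10        # Increase to 10 attempts
LOGIN_RATE_LIMIT_TTL=1800000     # Extend window to 30 minutes (ms)

# Recreate the backend so the new values are loaded.
# `docker compose restart` keeps the container's existing environment
# and does not apply .env changes.
docker compose up -d --force-recreate backend
```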

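Reset a single client's counter via Redis (immediate, no restart). This does not change the limit; it clears the stored attempts for one key so that client can retry:

```bash
# Check the current record and its remaining lifetime (seconds)
redis-cli GET "auth-login:<ip-hash>"
redis-cli TTL "auth-login:<ip-hash>"

# Delete the record to allow an immediate retry
redis-cli DEL "auth-login:<ip-hash>"
```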

Global API Rate Limits

Configuration

Global rate limits are applied via the @RateLimit decorator:

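```typescript
import { Controller } from '@nestjs/common'; // RateLimit comes from the BFF's own code
@RateLimit({ limit: 100, ttl: 60 }) // 100 requests per 60-second window
@Controller('invoices')
export class InvoicesController {
  // route handlers omitted
}
```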
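The decorator's two parameters describe a window: at most `limit` requests per `ttl` seconds for a given key. The sketch below models that as an in-memory fixed-window counter. This is an assumption for illustration only: the portal keeps its counters in Redis, and whether it uses a fixed or sliding window is not stated here. `FixedWindowLimiter` is a hypothetical name.

```typescript
// In-memory fixed-window counter illustrating the { limit, ttl } semantics.
// Assumption: the real guard is Redis-backed and may use another algorithm.
interface RateLimitOptions {
  limit: number; // requests allowed per window
  ttl: number; // window length in seconds, as in @RateLimit({ limit, ttl })
}

interface Decision {
  allowed: boolean;
  remaining: number; // requests left in the current window
  resetAtMs: number; // epoch ms when the current window ends
}

class FixedWindowLimiter {
  private readonly opts: RateLimitOptions;
  private readonly windows = new Map<string, { count: number; resetAtMs: number }>();

  constructor(opts: RateLimitOptions) {
    this.opts = opts;
  }

  hit(key: string, nowMs: number = Date.now()): Decision {
    let w = this.windows.get(key);
    if (w === undefined || nowMs >= w.resetAtMs) {
      // Start a new window (a common Redis pattern: INCR, then EXPIRE on first hit).
      w = { count: 0, resetAtMs: nowMs + this.opts.ttl * 1000 };
      this.windows.set(key, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= this.opts.limit,
      remaining: Math.max(0, this.opts.limit - w.count),
      resetAtMs: w.resetAtMs,
    };
  }
}
```

A fixed window is cheap (one counter per key) but allows a burst of up to 2 × `limit` requests straddling a window boundary; sliding-window variants smooth this out at the cost of more state per key.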

Common Rate Limit Settings

| Endpoint | Limit | TTL | Notes |
|---|---|---|---|
| Invoices | 100 | 60 s | High-traffic endpoint |
| Subscriptions | 100 | 60 s | High-traffic endpoint |
| Catalog | 200 | 60 s | Cached, higher limit |
| Orders | 50 | 60 s | Write operations |
| Profile | 60 | 60 s | Standard limit |

Adjusting Global Rate Limits

Global rate limits are defined in code. To adjust:

  1. Modify the @RateLimit decorator in the controller
  2. Deploy the change

```typescript
// Before
@RateLimit({ limit: 50, ttl: 60 })

// After (double the limit)
@RateLimit({ limit: 100, ttl: 60 })
```

External API Request Queues

WHMCS Queue Configuration

| Setting | Env variable | Default | Description |
|---|---|---|---|
| Concurrency | `WHMCS_QUEUE_CONCURRENCY` | 15 | Max parallel requests |
| Interval cap | `WHMCS_QUEUE_INTERVAL_CAP` | 300 | Max requests per minute |
| Timeout | `WHMCS_QUEUE_TIMEOUT_MS` | 30000 | Request timeout (ms) |

Salesforce Queue Configuration

| Setting | Env variable | Default | Description |
|---|---|---|---|
| Standard concurrency | `SF_QUEUE_CONCURRENCY` | 10 | Standard operations |
| Long-running concurrency | `SF_LONG_RUNNING_CONCURRENCY` | 5 | Bulk operations |
| Interval cap | `SF_QUEUE_INTERVAL_CAP` | 200 | Max requests per minute |
| Timeout | `SF_QUEUE_TIMEOUT_MS` | 30000 | Request timeout (ms) |
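Since the queues are built on p-queue, the documented variables correspond naturally to p-queue's `concurrency`, `interval`, `intervalCap` and `timeout` constructor options. The sketch below shows that mapping; it is an assumption, since this guide does not show how the BFF constructs its queues, and the helper names are hypothetical. It also assumes the Salesforce interval cap and timeout apply to both lanes: if each lane is its own queue with its own cap, the effective per-minute budget is the sum of the two.

```typescript
// Hypothetical sketch: map the documented env variables onto options shaped
// like p-queue's constructor options.
type Env = Record<string, string | undefined>;

interface QueueOptions {
  concurrency: number; // max requests in flight at once
  interval: number; // rate window in ms (60000 = "per minute")
  intervalCap: number; // max requests started per interval
  timeout: number; // per-request timeout in ms
}

function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function whmcsQueueOptions(env: Env): QueueOptions {
  return {
    concurrency: intFromEnv(env, "WHMCS_QUEUE_CONCURRENCY", 15),
    interval: 60_000,
    intervalCap: intFromEnv(env, "WHMCS_QUEUE_INTERVAL_CAP", 300),
    timeout: intFromEnv(env, "WHMCS_QUEUE_TIMEOUT_MS", 30_000),
  };
}

// Assumption: cap and timeout are shared settings for both Salesforce lanes.
function salesforceQueueOptions(env: Env): { standard: QueueOptions; longRunning: QueueOptions } {
  const intervalCap = intFromEnv(env, "SF_QUEUE_INTERVAL_CAP", 200);
  const timeout = intFromEnv(env, "SF_QUEUE_TIMEOUT_MS", 30_000);
  return {
    standard: { concurrency: intFromEnv(env, "SF_QUEUE_CONCURRENCY", 10), interval: 60_000, intervalCap, timeout },
    longRunning: { concurrency: intFromEnv(env, "SF_LONG_RUNNING_CONCURRENCY", 5), interval: 60_000, intervalCap, timeout },
  };
}
```

With p-queue installed, `new PQueue(whmcsQueueOptions(process.env))` would create a queue with these settings.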

Adjusting Queue Limits

Production Adjustment:

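```bash
# Edit .env file
WHMCS_QUEUE_CONCURRENCY=20      # Increase concurrent requests
WHMCS_QUEUE_INTERVAL_CAP=500    # Increase requests per minute

# Recreate the backend so the new values are loaded
# (`docker compose restart` does not apply .env changes)
docker compose up -d --force-recreate backend
```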
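Before raising either setting, it helps to know which one is actually limiting throughput. The sketch below is a rough estimate based on Little's law (throughput = requests in flight / time per request), assuming the queue is kept full and upstream latency is roughly constant. `avgLatencyMs` is a value you would measure, not a documented default, and the function names are hypothetical.

```typescript
// Rough throughput estimate for a queue with a concurrency limit and a
// per-minute interval cap. Assumes a steady backlog and constant latency.
function concurrencyBoundPerMinute(concurrency: number, avgLatencyMs: number): number {
  // Each of `concurrency` slots completes one request every avgLatencyMs.
  return (concurrency * 60_000) / avgLatencyMs;
}

function estimatedRequestsPerMinute(
  concurrency: number,
  intervalCapPerMinute: number,
  avgLatencyMs: number,
): number {
  return Math.min(intervalCapPerMinute, Math.floor(concurrencyBoundPerMinute(concurrency, avgLatencyMs)));
}

// The setting that is currently the bottleneck; raising the other won't help.
function bottleneck(
  concurrency: number,
  intervalCapPerMinute: number,
  avgLatencyMs: number,
): "concurrency" | "intervalCap" {
  return concurrencyBoundPerMinute(concurrency, avgLatencyMs) < intervalCapPerMinute
    ? "concurrency"
    : "intervalCap";
}
```

For example, with the WHMCS defaults (concurrency 15, cap 300) and a 2 s average latency, the concurrency bound is 450 requests per minute, so the 300-per-minute cap is the limit and raising concurrency alone would not increase throughput. At 4 s latency the bound drops to 225 and concurrency becomes the bottleneck.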

Queue Health Monitoring

```bash
# Check queue metrics
curl http://localhost:4000/health/queues | jq '.'
```

Expected output:

```json
{
  "whmcs": {
    "health": "healthy",
    "metrics": {
      "queueSize": 0,
      "pendingRequests": 2,
      "failedRequests": 0
    }
  },
  "salesforce": {
    "health": "healthy",
    "metrics": { ... },
    "dailyUsage": { "used": 5000, "limit": 15000 }
  }
}
```
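For scripted checks, a typed view of this response makes the fields explicit. The sketch below is based only on the fields in the example output above; the real payload may contain more, and health values other than `"healthy"` are not documented here. All names are hypothetical.

```typescript
// Hypothetical typed view of the /health/queues response.
interface QueueMetrics {
  queueSize?: number;
  pendingRequests?: number;
  failedRequests?: number;
}

interface QueueHealth {
  health: string; // "healthy" in the example
  metrics?: QueueMetrics;
  dailyUsage?: { used: number; limit: number };
}

type QueuesHealthResponse = Record<string, QueueHealth>;

// Names of queues that do not report "healthy".
function unhealthyQueues(res: QueuesHealthResponse): string[] {
  return Object.entries(res)
    .filter(([, q]) => q.health !== "healthy")
    .map(([name]) => name);
}

// Fraction of the daily external API allowance already used, or undefined
// when the queue does not report daily usage.
function dailyUsageFraction(q: QueueHealth): number | undefined {
  if (!q.dailyUsage || q.dailyUsage.limit <= 0) return undefined;
  return q.dailyUsage.used / q.dailyUsage.limit;
}
```

The response could be loaded with `(await (await fetch("http://localhost:4000/health/queues")).json()) as QueuesHealthResponse` on Node 18 or later.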

SSE Connection Limits

Configuration

```typescript
// Per-user SSE connection limit (in-memory)
private readonly maxPerUser = 3;
```

This prevents a single user from opening unlimited SSE connections.
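A minimal sketch of how such a per-user limiter can work, assuming a simple acquire/release counter per user ID. The class and method names are hypothetical; only `maxPerUser = 3` comes from this guide.

```typescript
// Hypothetical in-memory per-user connection limiter.
class PerUserConnectionLimiter {
  private readonly maxPerUser: number;
  private readonly counts = new Map<string, number>();

  constructor(maxPerUser = 3) {
    this.maxPerUser = maxPerUser;
  }

  // Reserves a slot and returns true if the user is under the limit.
  tryAcquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  // Must be called when the SSE stream closes; otherwise slots leak
  // and the user stays locked out until the process restarts.
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}
```

Because the counts live in process memory, each BFF instance enforces its own limit: with N instances behind a load balancer, one user could hold up to N × `maxPerUser` connections, and all counts reset on restart.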

Adjusting SSE Limits

This requires a code change in `realtime-connection-limiter.service.ts`:

```typescript
// Change from
private readonly maxPerUser = 3;

// To
private readonly maxPerUser = 5;
```

Bypassing Rate Limits for Testing

Temporary Bypass via Redis

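```bash
# Clear all rate limit keys for testing.
# --scan iterates with SCAN instead of the blocking KEYS command;
# xargs -r skips DEL when nothing matches.
redis-cli --scan --pattern "auth-*" | xargs -r redis-cli DEL
redis-cli --scan --pattern "rate-limit:*" | xargs -r redis-cli DEL

# Clear a specific user's rate limit
redis-cli --scan --pattern "*<ip-or-user-identifier>*" | xargs -r redis-cli DEL
```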

Using SkipRateLimit Decorator

For development/testing routes:

```typescript
@SkipRateLimit()
@Get('test-endpoint')
async testEndpoint() { /* handler body omitted */ }
```

Environment-Based Bypass

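To add an environment-based bypass for local development, set a flag in `.env`:

```bash
# In .env (development only!)
RATE_LIMIT_BYPASS_ENABLED=true
```

Then check the flag at the start of the rate-limit guard:

```typescript
// In the guard, before any counting
if (this.configService.get("RATE_LIMIT_BYPASS_ENABLED") === "true") {
  return true; // skip rate limiting for this request
}
```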

**Warning:** Never enable this bypass in production.
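To make the rule above harder to break by accident, the guard's check can refuse the bypass unless the process is explicitly running in a non-production environment. This is a hypothetical hardening, not existing portal code: it assumes the backend sets `NODE_ENV`, and the allow-list of environment names is an assumption to adjust.

```typescript
// Hypothetical hardened bypass check: honour RATE_LIMIT_BYPASS_ENABLED only
// when NODE_ENV is explicitly a non-production value.
const BYPASS_ALLOWED_ENVS = new Set(["development", "test"]);

function isRateLimitBypassActive(env: Record<string, string | undefined>): boolean {
  if (env.RATE_LIMIT_BYPASS_ENABLED !== "true") return false;
  // Fail closed: an unset or unrecognised NODE_ENV keeps rate limiting on.
  return BYPASS_ALLOWED_ENVS.has(env.NODE_ENV ?? "");
}
```

In the guard, `if (isRateLimitBypassActive(process.env)) return true;` would replace the plain flag check, so a copied development `.env` cannot switch off rate limiting on a production host.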


Signs of Rate Limit Issues

User-Facing Symptoms

| Symptom | Possible cause | Investigation |
|---|---|---|
| "Too many requests" errors | Rate limit exceeded | Check Redis keys, logs |
| Login failures | Auth rate limit | Check `auth-login:*` keys |
| Slow API responses | Queue backlog | Check `/health/queues` |
| 429 errors in logs | Any rate limit | Check logs for specifics |

Monitoring Indicators

| Metric | Warning | Critical | Action |
|---|---|---|---|
| 429 error rate | >1% | >5% | Review rate limits |
| Queue size | >10 | >50 | Increase concurrency |
| Average wait time | >1 s | >5 s | Scale or increase limits |
| CAPTCHA triggers | Unusual spike | - | Possible attack |
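If these indicators feed an alerting script, the thresholds translate directly into a classifier. A minimal sketch: the numbers are copied from the table, the comparisons are strict to match ">1%" and similar, and the names are hypothetical. CAPTCHA triggers have no numeric threshold in the table and are left out.

```typescript
// Classify a metric reading against the warning/critical thresholds above.
type Level = "ok" | "warning" | "critical";

interface Thresholds {
  warning: number;
  critical: number;
}

type MetricName = "errorRate429Percent" | "queueSize" | "averageWaitSeconds";

const THRESHOLDS: Record<MetricName, Thresholds> = {
  errorRate429Percent: { warning: 1, critical: 5 },
  queueSize: { warning: 10, critical: 50 },
  averageWaitSeconds: { warning: 1, critical: 5 },
};

function classify(value: number, t: Thresholds): Level {
  if (value > t.critical) return "critical";
  if (value > t.warning) return "warning";
  return "ok";
}

// 429 responses as a percentage of all responses in the sampling window.
function errorRatePercent(count429: number, totalResponses: number): number {
  return totalResponses === 0 ? 0 : (count429 / totalResponses) * 100;
}
```

For example, 20 responses with status 429 out of 1,000 is a 2% error rate, which `classify` reports as `"warning"`.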

Log Analysis

```bash
# Find rate limit exceeded events
grep "Rate limit exceeded" /var/log/bff/combined.log | tail -20

# Find 429 responses
grep '"statusCode":429' /var/log/bff/combined.log | tail -20

# Count rate limit events by path
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq -r '.path' | sort | uniq -c | sort -rn
```
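The same per-path count can be done in a script, for example when logs are pulled from several hosts. A minimal sketch, assuming one JSON object per log line with a `path` field (as the `jq` filter above implies); lines that are not valid JSON are skipped rather than aborting the count. The function name is hypothetical.

```typescript
// Equivalent of: grep "Rate limit exceeded" | jq -r '.path' | sort | uniq -c | sort -rn
function countRateLimitEventsByPath(lines: Iterable<string>): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    if (!line.includes("Rate limit exceeded")) continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // not a JSON log line
    }
    const path =
      typeof entry === "object" && entry !== null && typeof (entry as { path?: unknown }).path === "string"
        ? (entry as { path: string }).path
        : "(no path)";
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  // Most frequent first, like `sort -rn`.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

The input could come from `readFileSync("/var/log/bff/combined.log", "utf8").split("\n")` using `node:fs`.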

Troubleshooting

Too Many 429 Errors

Diagnosis:

```bash
# Check which endpoints are rate limited (-c prints one compact object per line)
grep "Rate limit exceeded" /var/log/bff/combined.log | \
  jq -c '{path: .path, key: .key}' | head -20

# Check queue health
curl http://localhost:4000/health/queues
```

Resolution:

  1. Identify the affected endpoint
  2. Check whether the limit suits current traffic
  3. If the traffic is legitimate, increase the limit
  4. If requests are repetitive, add caching

Legitimate Users Being Blocked

Diagnosis:

```bash
# Check rate limit state for a specific key
# (--scan avoids the blocking KEYS command)
redis-cli --scan --pattern "*<identifier>*"
redis-cli GET "auth-login:<hash>"
```

Resolution:

```bash
# Clear the user's rate limit record
redis-cli DEL "auth-login:<hash>"
```

External API Rate Limit Violations

WHMCS Rate Limiting:

```bash
# Check queue metrics
curl http://localhost:4000/health/queues/whmcs
```

If WHMCS is overloaded, reduce concurrency in `.env` and recreate the backend container:

```bash
WHMCS_QUEUE_CONCURRENCY=5
WHMCS_QUEUE_INTERVAL_CAP=100
```

Salesforce API Limits:

```bash
# Check daily API usage
curl http://localhost:4000/health/queues/salesforce | jq '.dailyUsage'
```

If usage is approaching the limit, reduce the request rate (for example by lowering `SF_QUEUE_INTERVAL_CAP`) and consider caching more data.
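A quick calculation shows how much headroom the `dailyUsage` figures leave. A minimal sketch: the worst case assumes the queue runs at its interval cap the whole time, and the sustainable-rate estimate ignores calls that age out of the usage window, which makes it conservative. Function names are hypothetical.

```typescript
// Minutes until the remaining allowance is used up at a constant rate.
function minutesUntilExhausted(used: number, limit: number, requestsPerMinute: number): number {
  const remaining = Math.max(0, limit - used);
  return requestsPerMinute > 0 ? remaining / requestsPerMinute : Infinity;
}

// Highest steady rate that stays within the allowance for the next
// `minutesAhead` minutes (conservative: ignores calls aging out).
function sustainableRequestsPerMinute(used: number, limit: number, minutesAhead: number): number {
  const remaining = Math.max(0, limit - used);
  return minutesAhead > 0 ? Math.floor(remaining / minutesAhead) : 0;
}
```

With the figures from the example health output (5,000 of 15,000 used) and the default cap of 200 requests per minute, the remaining 10,000 calls last 50 minutes at full speed; spreading them over the next 12 hours allows about 13 requests per minute.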

Redis Connection Issues

If rate limiting fails due to Redis:

```bash
# Check Redis connectivity
redis-cli PING

# The guard fails open on Redis errors (allows request)
# Check logs for "Rate limiter error - failing open"
```
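The fail-open behavior can be expressed as a small wrapper around the counter store. A minimal sketch: the `CounterStore` interface and `isAllowed` function are hypothetical stand-ins for the Redis-backed guard; only the log message text comes from this guide.

```typescript
// Hypothetical counter store; in the portal this role is played by Redis.
interface CounterStore {
  // Increments the counter for `key` (expiring after ttlSeconds) and returns the new count.
  increment(key: string, ttlSeconds: number): Promise<number>;
}

async function isAllowed(
  store: CounterStore,
  key: string,
  limit: number,
  ttlSeconds: number,
  log: (msg: string) => void = console.warn,
): Promise<boolean> {
  try {
    const count = await store.increment(key, ttlSeconds);
    return count <= limit;
  } catch (err) {
    // Fail open: while the store is unreachable, requests are not limited.
    log(`Rate limiter error - failing open: ${err instanceof Error ? err.message : String(err)}`);
    return true;
  }
}
```

The trade-off is that during a Redis outage the login brute-force protection is effectively off, so the log line above is worth alerting on rather than only searching for after the fact.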

Best Practices

Setting Rate Limits

  1. Start Conservative - Begin with lower limits, increase as needed
  2. Monitor Before Adjusting - Understand traffic patterns first
  3. Consider User Experience - Limits should rarely impact normal use
  4. Document Changes - Track why limits were adjusted

Rate Limit Strategies

| Strategy | Use case | Implementation |
|---|---|---|
| IP-based | Anonymous endpoints | Default behavior |
| User-based | Authenticated endpoints | Include user ID in key |
| Combined | Sensitive endpoints | IP + User-Agent hash |
| Tiered | Different user classes | Custom logic |
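The first three strategies differ only in how the limiter key is built. A minimal sketch using SHA-256 from `node:crypto`: the key prefixes and the hashed inputs are assumptions for illustration, and the portal's real key format (for example `auth-login:<ip-hash>`) may hash different inputs. Tiered limits need custom logic and are not shown.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key builders for the IP-based, user-based and combined strategies.
function sha256Hex(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

function ipKey(scope: string, ip: string): string {
  return `${scope}:${sha256Hex(ip)}`;
}

function userKey(scope: string, userId: string): string {
  return `${scope}:user:${userId}`;
}

// IP + User-Agent: separates different browsers behind one shared IP
// (e.g. an office NAT). A client can get a fresh bucket by changing its UA.
function combinedKey(scope: string, ip: string, userAgent: string): string {
  // The newline separator keeps ("a", "bc") and ("ab", "c") distinct.
  return `${scope}:${sha256Hex(`${ip}\n${userAgent}`)}`;
}
```

If the hash is meant to keep raw IPs out of Redis, note that an unsalted hash of an IPv4 address can be reversed by brute force over the address space; an HMAC keyed with a server-side secret is the safer choice.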

Performance Considerations

  • Redis Latency - Keep Redis co-located with BFF
  • Key Expiration - Use TTL to prevent Redis bloat
  • Fail Open - Rate limiter allows requests if Redis fails
  • Logging - Log blocked requests for analysis

Rate Limit Response Headers

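The BFF includes the following rate limit headers. `Retry-After` is defined by the HTTP specification; the `X-RateLimit-*` headers are a widely used convention rather than a formal standard. In the example, `X-RateLimit-Reset` is a Unix timestamp in seconds.

```text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1704110400
Retry-After: 60
```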

Clients can use these to implement backoff.
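A minimal client-side sketch of that backoff, assuming `Retry-After` is sent as a number of seconds (it may also be an HTTP date, which this sketch treats as absent) and `X-RateLimit-Reset` as Unix seconds, as in the example. The function name and the 1-second fallback are hypothetical.

```typescript
// How long to wait before retrying, based on the rate limit headers.
// Accepts fetch's Headers or anything with a compatible get().
function retryDelayMs(
  headers: { get(name: string): string | null },
  nowMs: number = Date.now(),
  fallbackMs = 1_000,
): number {
  const retryAfter = headers.get("Retry-After")?.trim();
  if (retryAfter !== undefined && /^\d+$/.test(retryAfter)) {
    return Number(retryAfter) * 1000; // delta-seconds form
  }
  const reset = headers.get("X-RateLimit-Reset")?.trim();
  if (reset !== undefined && /^\d+$/.test(reset)) {
    return Math.max(0, Number(reset) * 1000 - nowMs); // wait until the window resets
  }
  return fallbackMs;
}
```

After a `fetch` call, `if (res.status === 429)` the client would wait `retryDelayMs(res.headers)` milliseconds before retrying; adding a little random jitter keeps many clients from retrying at the same instant.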



Last Updated: December 2025