# Database Operations Runbook

This document covers operational procedures for the PostgreSQL database used by the Customer Portal BFF.

---

## Overview

| Component       | Technology                | Location                      |
| --------------- | ------------------------- | ----------------------------- |
| Database        | PostgreSQL 17             | Configured via `DATABASE_URL` |
| ORM             | Prisma 6                  | `apps/bff/prisma/`            |
| Connection Pool | Prisma connection pooling | Default: 10 connections       |

---

## Backup Procedures

### Automated Backups

> **Note**: Configure automated backups based on your hosting environment.

**Recommended Schedule:**

- Full backup: Daily at 02:00 UTC
- Transaction log backup: Every 15 minutes
- Retention: 30 days
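
The retention rule above could be enforced by a small pruning step at the end of the nightly job. The sketch below assumes backups are plain files named `backup_*` in a single directory and that `find` supports `-maxdepth` and `-delete` (GNU and BSD do). The function name and the path in the usage comment are hypothetical; hosted PostgreSQL services normally manage retention themselves.

```bash
# Sketch: delete backup files older than the retention period.
prune_old_backups() {
  local dir="$1"
  local retention_days="${2:-30}"   # default matches the 30-day retention above
  # -mtime +N matches files whose age in whole days exceeds N
  find "$dir" -maxdepth 1 -type f -name 'backup_*' \
    -mtime "+${retention_days}" -print -delete
}

# Example (hypothetical path), run after the 02:00 UTC dump:
#   prune_old_backups /var/backups/portal 30
```

The function prints each file it removes, so the output can be kept in the job log.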

### Manual Backup

```bash
# Create a full database backup
pg_dump "$DATABASE_URL" > backup_$(date +%Y%m%d_%H%M%S).sql

# Create a compressed backup
pg_dump "$DATABASE_URL" | gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz

# Backup specific tables
pg_dump "$DATABASE_URL" -t users -t id_mappings > user_data_backup.sql
```
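
One caveat when `DATABASE_URL` is shared with Prisma: `pg_dump` and `psql` parse the URL with libpq, which rejects query parameters it does not recognise, such as Prisma's `connection_limit` or `pool_timeout`. A minimal sketch of a workaround, assuming it is acceptable to drop the whole query string (`libpq_url` is a hypothetical helper name; settings such as `sslmode` would then have to be supplied another way, for example through `PGSSLMODE`):

```bash
# Sketch: strip the query string so libpq tools accept a Prisma-style URL.
libpq_url() {
  # ${1%%[?]*} removes everything from the first "?" to the end of the string
  printf '%s\n' "${1%%[?]*}"
}

# Example:
#   pg_dump "$(libpq_url "$DATABASE_URL")" > backup.sql
```

A URL without a query string passes through unchanged, so the helper is safe to apply unconditionally.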

### Backup Verification

```bash
# Verify backup integrity (restore to temp database)
createdb portal_backup_test
psql portal_backup_test < backup_YYYYMMDD.sql

# Run basic integrity checks
psql portal_backup_test -c "SELECT COUNT(*) FROM users"
psql portal_backup_test -c "SELECT COUNT(*) FROM id_mappings"

# Clean up
dropdb portal_backup_test
```
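
Before spending time on a full test restore, a cheap pre-check can catch truncated or empty files. The sketch below uses a hypothetical helper, `verify_backup_file`, that only confirms the file is non-empty and, for `.gz` files, that the gzip stream is intact. It complements the restore test above and does not replace it.

```bash
verify_backup_file() {
  local file="$1"
  # -s: file exists and is not empty
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  case "$file" in
    *.gz)
      # gzip -t reads the whole stream and verifies its checksum
      gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
      ;;
  esac
  echo "ok: $file"
}

# Example:
#   verify_backup_file backup_YYYYMMDD.sql.gz && echo "safe to attempt restore"
```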

---

## Recovery Procedures

### Point-in-Time Recovery

**Prerequisites:**

- WAL archiving enabled
- Continuous backup configured

The commands below restore a `pg_dump` archive. Recovering to an arbitrary point in time additionally requires a physical base backup and WAL replay up to a `recovery_target_time`.

```bash
# Stop the application
pnpm prod:stop

# Restore from a custom-format dump (created with pg_dump -Fc)
pg_restore -d "$DATABASE_URL" backup_YYYYMMDD.dump

# Run Prisma migrations to ensure schema is current
pnpm db:migrate

# Restart the application
pnpm prod:start
```

### Restore from SQL Backup

```bash
# Stop the application to prevent writes
pnpm prod:stop

# Drop and recreate database (DESTRUCTIVE)
dropdb portal_production
createdb portal_production

# Restore from backup
psql "$DATABASE_URL" < backup_YYYYMMDD.sql

# Verify restoration
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM users"

# Restart application
pnpm prod:start
```
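
Because the drop-and-recreate step is irreversible, it can help to gate it behind an explicit confirmation. A minimal sketch, assuming an interactive operator and the `portal_production` database name used above (`confirm_destructive` is a hypothetical helper, not part of the project's tooling):

```bash
confirm_destructive() {
  local expected="$1"
  local answer
  printf 'Type the database name (%s) to continue: ' "$expected" >&2
  # read returns non-zero on EOF, which is treated as "not confirmed"
  IFS= read -r answer || return 1
  [ "$answer" = "$expected" ]
}

# Example:
#   confirm_destructive portal_production && dropdb portal_production
```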

---

## Migration Management

### Running Migrations

```bash
# Development: Apply pending migrations
pnpm db:migrate

# Production: Deploy migrations
pnpm db:migrate --skip-generate

# View migration status
pnpm exec prisma migrate status
```

### Migration Checklist

Before deploying migrations to production:

1. [ ] Test migration on staging environment
2. [ ] Verify rollback procedure exists
3. [ ] Estimate migration duration
4. [ ] Schedule maintenance window if needed
5. [ ] Create backup before migration
6. [ ] Notify team of deployment
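
Parts of the checklist can be scripted. The sketch below wraps any command and reports whether it succeeded. Used with `prisma migrate status`, it relies on that command exiting non-zero when migrations are pending, failed, or the database is unreachable (documented for Prisma 4.3.0 and later). `require_ok` is a hypothetical helper name.

```bash
require_ok() {
  local label="$1"
  shift
  # Run the remaining arguments as a command and report the outcome
  if "$@"; then
    echo "PASS: $label"
  else
    echo "FAIL: $label" >&2
    return 1
  fi
}

# Example: after the staging test, confirm nothing is pending or failed
#   require_ok "staging migrations applied" pnpm exec prisma migrate status
```

Chaining several `require_ok` calls with `&&` gives a simple gate that stops at the first failed item.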

### Rollback Procedure

Prisma does not have built-in rollback. Use these approaches:

**Option 1: Restore from Backup**

```bash
# Restore database to pre-migration state
psql $DATABASE_URL < pre_migration_backup.sql

# Revert migration files in codebase
git revert <migration-commit>
```

**Option 2: Manual Rollback SQL**

```bash
# Create rollback SQL for each migration
# Store in: apps/bff/prisma/rollbacks/

# Example rollback
psql $DATABASE_URL < rollbacks/20240115_rollback.sql
```

**Option 3: Reset and Reseed (Development Only)**

```bash
# WARNING: Destroys all data
pnpm db:reset
```

---

## ID Mappings Data Integrity

The `id_mappings` table links portal users to WHMCS and Salesforce accounts. Corruption here causes authentication and data access failures.

### Verify Mapping Integrity

```sql
-- Check for orphaned mappings (portal user deleted but mapping exists)
SELECT m.* FROM id_mappings m
LEFT JOIN users u ON m.user_id = u.id
WHERE u.id IS NULL;

-- Check for duplicate WHMCS mappings
SELECT whmcs_client_id, COUNT(*) as count
FROM id_mappings
WHERE whmcs_client_id IS NOT NULL
GROUP BY whmcs_client_id
HAVING COUNT(*) > 1;

-- Check for duplicate Salesforce mappings
SELECT sf_account_id, COUNT(*) as count
FROM id_mappings
WHERE sf_account_id IS NOT NULL
GROUP BY sf_account_id
HAVING COUNT(*) > 1;
```
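
The checks above could also run from a script or scheduled job that fails when any of them returns rows. A minimal sketch, assuming `psql` can reach the database through `DATABASE_URL`. The function names and the `PSQL_CMD` override are hypothetical; the override exists only so the runner can be exercised without a live database.

```bash
# Count the rows a query returns; prints a single integer.
count_rows() {
  "${PSQL_CMD:-psql}" "$DATABASE_URL" -At -c "SELECT COUNT(*) FROM ($1) AS q"
}

check_id_mappings() {
  local failed=0
  local n
  # Orphaned mappings: mapping rows without a matching user
  n=$(count_rows "SELECT m.user_id FROM id_mappings m LEFT JOIN users u ON m.user_id = u.id WHERE u.id IS NULL") || return 2
  [ "$n" -eq 0 ] || { echo "orphaned mappings: $n" >&2; failed=1; }
  # WHMCS client ids mapped more than once
  n=$(count_rows "SELECT whmcs_client_id FROM id_mappings WHERE whmcs_client_id IS NOT NULL GROUP BY whmcs_client_id HAVING COUNT(*) > 1") || return 2
  [ "$n" -eq 0 ] || { echo "duplicate WHMCS ids: $n" >&2; failed=1; }
  # Salesforce account ids mapped more than once
  n=$(count_rows "SELECT sf_account_id FROM id_mappings WHERE sf_account_id IS NOT NULL GROUP BY sf_account_id HAVING COUNT(*) > 1") || return 2
  [ "$n" -eq 0 ] || { echo "duplicate Salesforce ids: $n" >&2; failed=1; }
  return "$failed"
}
```

`check_id_mappings` returns 0 when all three checks are clean, 1 when any check finds rows, and 2 when a query could not be run.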
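
### Fix Orphaned Mappings

```sql
-- Remove mappings for deleted users
-- NOT EXISTS also catches rows with a NULL user_id, matching the orphan check
BEGIN;

DELETE FROM id_mappings m
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = m.user_id);

-- Review the reported row count, then run COMMIT; (or ROLLBACK; to abort)
```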

### Fix Duplicate Mappings

> **Warning**: Investigate duplicates before deleting. They may indicate data issues.

```sql
-- View duplicate details before fixing
SELECT m.*, u.email FROM id_mappings m
JOIN users u ON m.user_id = u.id
WHERE m.whmcs_client_id IN (
  SELECT whmcs_client_id FROM id_mappings
  GROUP BY whmcs_client_id HAVING COUNT(*) > 1
);
```

---

## PostgreSQL Maintenance

### VACUUM and ANALYZE

```sql
-- Analyze all tables for query optimization
ANALYZE;

-- Vacuum to reclaim space (non-blocking)
VACUUM;

-- Full vacuum (blocking, reclaims more space)
VACUUM FULL;

-- Vacuum specific table
VACUUM ANALYZE id_mappings;
```

**Recommended Schedule:**

- `VACUUM ANALYZE`: Daily during low-traffic hours
- `VACUUM FULL`: Monthly during maintenance window

### Index Maintenance

```sql
-- Check index usage
-- (pg_stat_user_indexes exposes relname/indexrelname, aliased here for readability)
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes (candidates for removal)
SELECT schemaname, relname AS tablename, indexrelname AS indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- Reindex a table
REINDEX TABLE id_mappings;

-- Reindex entire database (during maintenance window)
REINDEX DATABASE portal_production;
```

### Check Table Bloat

```sql
-- Estimate table bloat from dead-row statistics
-- (pg_stat_user_tables exposes relname and relid, not tablename)
SELECT
  schemaname, relname AS tablename,
  pg_size_pretty(pg_relation_size(relid)) as size,
  n_dead_tup as dead_rows,
  n_live_tup as live_rows,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```
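
The `dead_pct` figure is what usually drives the decision to vacuum a table manually. As a worked example of the same arithmetic, the sketch below flags a table once dead rows reach a threshold. The 20% default is an assumed value for illustration, not a project setting, and `needs_vacuum` is a hypothetical helper.

```bash
needs_vacuum() {
  local dead="$1"
  local live="$2"
  local threshold="${3:-20}"   # percent; assumed default
  local total=$((dead + live))
  # Mirrors NULLIF(...) in the query: avoid dividing by zero for empty tables
  [ "$total" -gt 0 ] || return 1
  # Integer percentage of dead rows, rounded down
  local pct=$((100 * dead / total))
  [ "$pct" -ge "$threshold" ]
}

# Example: 30 dead rows and 70 live rows is 30% dead, above the 20% default
#   needs_vacuum 30 70 && echo "consider VACUUM ANALYZE"
```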

---

## Connection Pool Monitoring

### Check Active Connections

```sql
-- Current connection count
SELECT COUNT(*) as connections FROM pg_stat_activity;

-- Connections by state
SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state;

-- Connections by application
SELECT application_name, COUNT(*)
FROM pg_stat_activity
GROUP BY application_name;

-- Long-running queries (>5 minutes)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - pg_stat_activity.query_start > interval '5 minutes';
```

### Kill Stuck Connections

```sql
-- Cancel a running query but keep its connection open
SELECT pg_cancel_backend(<pid>);

-- Terminate a specific connection (also aborts its query)
SELECT pg_terminate_backend(<pid>);

-- Terminate all connections except current
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND datname = current_database();
```

### Prisma Connection Pool Settings

Configure in `DATABASE_URL` query parameters:

```
postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=10
```

| Parameter          | Default | Recommended        |
| ------------------ | ------- | ------------------ |
| `connection_limit` | 10      | 10-20 per instance |
| `pool_timeout`     | 10s     | 10-30s             |
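
When choosing `connection_limit`, the number that matters to PostgreSQL is the total across all BFF instances. A rough sizing sketch, assuming every instance uses the same limit and a few connections are held back for maintenance sessions. The instance counts, the reserve of 5, and the `max_connections` value of 100 are illustrative; check the real value with `SHOW max_connections;`. `pool_headroom` is a hypothetical helper.

```bash
pool_headroom() {
  local instances="$1"
  local limit="$2"
  local max_connections="$3"
  local reserved="${4:-5}"   # connections kept free for admin sessions (assumed)
  # Connections left over after every instance fills its pool
  echo $((max_connections - reserved - instances * limit))
}

# Example: instances x connection_limit against max_connections = 100
#   pool_headroom 4 20 100   # prints 15
#   pool_headroom 6 20 100   # prints -25, i.e. the pools are oversubscribed
```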

---

## Monitoring Queries

### Database Size

```sql
-- Total database size
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Size per table (format('%I.%I') quotes mixed-case table names correctly)
SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) as total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC;
```

### Query Performance

```sql
-- Slowest queries (requires pg_stat_statements extension)
-- Column names as of PostgreSQL 13+; times are in milliseconds
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

### Lock Monitoring

```sql
-- Check for locks that are waiting to be granted
SELECT
  pg_locks.pid,
  pg_stat_activity.query,
  pg_locks.mode,
  pg_locks.granted
FROM pg_locks
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE NOT pg_locks.granted;
```

---

## Emergency Procedures

### Database Unresponsive

1. Check PostgreSQL process status
2. Check disk space and memory
3. Kill long-running queries
4. Restart PostgreSQL if necessary
5. Check application connectivity after restart

### Disk Space Full

```bash
# Check disk usage
df -h

# Find large files in PostgreSQL data directory
du -sh /var/lib/postgresql/data/*

# Clear transaction logs (if WAL archiving is working)
# WARNING: Only if logs are properly archived
# Never delete files under pg_wal by hand; PostgreSQL recycles WAL itself once it is archived
```
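
To catch this condition before it becomes an emergency, disk usage can be checked from a monitoring job. A minimal sketch, assuming POSIX `df -P` output and the data directory path shown above. The 85% threshold is an arbitrary example and both function names are hypothetical.

```bash
disk_usage_pct() {
  # df -P prints one line per filesystem; column 5 is the capacity, e.g. "42%"
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

disk_is_full() {
  local pct
  pct=$(disk_usage_pct "$1") || return 2
  # Succeeds when usage is at or above the threshold (default 85)
  [ "$pct" -ge "${2:-85}" ]
}

# Example:
#   disk_is_full /var/lib/postgresql/data 85 && echo "WARNING: disk almost full"
```

Alerting well below 100% leaves room for WAL to accumulate while the cause is investigated.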

### Corruption Detected

1. **STOP** the application immediately
2. Do not attempt repairs without backup verification
3. Restore from last known good backup
4. Investigate root cause before resuming service

---

## Related Documents

- [Incident Response](./incident-response.md)
- [External Dependencies](./external-dependencies.md)
- [Provisioning Runbook](./provisioning-runbook.md)

---

**Last Updated:** December 2025