# Database Operations Runbook

This document covers operational procedures for the PostgreSQL database used by the Customer Portal BFF.

---

## Overview

| Component       | Technology                | Location                      |
| --------------- | ------------------------- | ----------------------------- |
| Database        | PostgreSQL 17             | Configured via `DATABASE_URL` |
| ORM             | Prisma 6                  | `apps/bff/prisma/`            |
| Connection Pool | Prisma connection pooling | Sized via `connection_limit` in `DATABASE_URL` |

---

## Backup Procedures

### Automated Backups

> **Note**: Configure automated backups based on your hosting environment.

**Recommended Schedule:**

- Full backup: Daily at 02:00 UTC
- WAL archiving: Continuous, with segments shipped at least every 15 minutes (for example via `archive_timeout`)
- Retention: 30 days
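
Where the hosting environment does not enforce retention itself, a small script can do it. The following is a minimal sketch, not a tested production job: the function name `prune_backups` and the `backup_*.sql*` file pattern are assumptions based on the file names used in this runbook, and it relies on `find` supporting `-maxdepth` and `-delete` (GNU and BSD `find` do).

```bash
# Hypothetical helper: delete dump files older than the retention period.
#   $1 = backup directory
#   $2 = retention in days (defaults to 30)
prune_backups() {
  local dir="$1" days="${2:-30}"
  [ -d "$dir" ] || { echo "backup directory not found: $dir" >&2; return 1; }
  # -mtime +N matches files last modified more than N full days ago.
  # Each deleted file is printed so the run leaves an audit trail.
  find "$dir" -maxdepth 1 -type f -name 'backup_*.sql*' -mtime +"$days" -print -delete
}
```

A cron entry such as `0 2 * * *` would match the daily 02:00 schedule, provided the server clock is set to UTC.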

### Manual Backup

```bash
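# Create a full database backup (plain SQL)
pg_dump "$DATABASE_URL" > "backup_$(date +%Y%m%d_%H%M%S).sql"

# Create a compressed backup
pg_dump "$DATABASE_URL" | gzip > "backup_$(date +%Y%m%d_%H%M%S).sql.gz"

# Create a custom-format backup (the format pg_restore requires)
pg_dump -Fc "$DATABASE_URL" > "backup_$(date +%Y%m%d_%H%M%S).dump"

# Back up specific tables
pg_dump "$DATABASE_URL" -t users -t id_mappings > user_data_backup.sql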
```
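
Prisma-specific query parameters in `DATABASE_URL` (such as `connection_limit`, `pool_timeout`, or `schema`) are not understood by libpq, so `pg_dump` and `psql` reject a URL that contains them. A minimal sketch of a workaround, assuming a hypothetical helper named `libpq_url` and an illustrative, not exhaustive, list of parameters to strip:

```bash
# Hypothetical helper: print a copy of the URL without Prisma-only
# query parameters, keeping everything libpq understands (e.g. sslmode).
libpq_url() {
  local url="$1"
  local base="${url%%\?*}"          # everything before the first '?'
  # No query string: nothing to strip.
  if [ "$base" = "$url" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  local query="${url#*\?}"          # everything after the first '?'
  local kept="" pair key
  local IFS='&'                     # split the query string on '&'
  for pair in $query; do
    key="${pair%%=*}"
    case "$key" in
      # Assumed list of Prisma-only parameters; extend as needed.
      connection_limit|pool_timeout|schema|pgbouncer|socket_timeout|statement_cache_size) ;;
      *) kept="${kept:+$kept&}$pair" ;;
    esac
  done
  printf '%s\n' "${base}${kept:+?$kept}"
}
```

With the helper sourced, `pg_dump "$(libpq_url "$DATABASE_URL")"` can stand in for `pg_dump "$DATABASE_URL"` in the commands above.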

### Backup Verification

```bash
# Verify backup integrity (restore to a temporary database, stopping at the first error)
createdb portal_backup_test
psql -v ON_ERROR_STOP=1 portal_backup_test < backup_YYYYMMDD_HHMMSS.sql

# Run basic integrity checks and compare the counts with production
psql portal_backup_test -c "SELECT COUNT(*) FROM users"
psql portal_backup_test -c "SELECT COUNT(*) FROM id_mappings"

# Clean up
dropdb portal_backup_test
```
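
A plain-format dump that was cut short (full disk, dropped connection) can still restore partway. As a cheap first check before the full restore test, the sketch below looks for the trailer comment that `pg_dump` writes at the end of a plain SQL dump, and tests gzip integrity for compressed files. The function name `dump_is_complete` is hypothetical, and the check does not replace the restore test above.

```bash
# Hypothetical helper: succeed only if a plain SQL dump (optionally
# gzip-compressed) ends with pg_dump's completion marker.
dump_is_complete() {
  local file="$1"
  local marker='PostgreSQL database dump complete'
  case "$file" in
    *.gz)
      # Reject archives that fail gzip's own integrity check first.
      gzip -t "$file" 2>/dev/null || return 1
      gzip -dc "$file" | tail -n 20 | grep "$marker" >/dev/null
      ;;
    *)
      tail -n 20 "$file" | grep "$marker" >/dev/null
      ;;
  esac
}
```

For example, `dump_is_complete backup_YYYYMMDD_HHMMSS.sql.gz || echo "dump looks truncated"`.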

---

## Recovery Procedures

### Point-in-Time Recovery

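**Prerequisites:**

- WAL archiving enabled (`archive_mode = on` with a working `archive_command`)
- A physical base backup (for example from `pg_basebackup`) taken before the recovery target

True point-in-time recovery replays archived WAL on top of the base backup up to a chosen `recovery_target_time`; the exact steps depend on your hosting environment. `pg_restore` cannot replay WAL: the commands below restore a custom-format dump (`pg_dump -Fc`) to the moment the dump was taken.

```bash
# Stop the application
pnpm prod:stop

# Restore from a custom-format dump, replacing existing objects
pg_restore --clean --if-exists -d "$DATABASE_URL" backup_YYYYMMDD.dump

# Apply any migrations newer than the backup
# (use `migrate deploy` in production; `migrate dev` can prompt to reset the database)
npx prisma migrate deploy

# Restart the application
pnpm prod:start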
```
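
On a self-managed server, WAL replay is configured in the restored data directory rather than run through `pg_restore`. A minimal sketch for PostgreSQL 12 or later, assuming a hypothetical helper `prepare_pitr`, a WAL archive that is a plain directory of segment files, and a base backup already unpacked into the data directory while PostgreSQL is stopped:

```bash
# Hypothetical helper: prepare a restored base backup for point-in-time recovery.
#   $1 = data directory holding the restored base backup
#   $2 = WAL archive directory (assumed to be a plain directory of WAL files)
#   $3 = recovery target timestamp, e.g. '2025-12-23 01:30:00+00'
prepare_pitr() {
  local pgdata="$1" archive="$2" target="$3"
  [ -d "$pgdata" ] || { echo "data directory not found: $pgdata" >&2; return 1; }
  # Tell PostgreSQL where to fetch archived WAL and when to stop replaying it.
  cat >> "$pgdata/postgresql.conf" <<EOF
restore_command = 'cp "$archive/%f" "%p"'
recovery_target_time = '$target'
recovery_target_action = 'promote'
EOF
  # An empty recovery.signal file makes the next start enter targeted recovery.
  touch "$pgdata/recovery.signal"
}
```

After the helper runs, starting PostgreSQL replays WAL up to the target time and then promotes the server; PostgreSQL removes `recovery.signal` itself once recovery finishes. Managed database services usually expose the same operation through their own console or API instead.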

### Restore from SQL Backup

```bash
# Stop the application to prevent writes
pnpm prod:stop

# Drop and recreate the database (DESTRUCTIVE)
# The name must match the database that DATABASE_URL points at
dropdb portal_production
createdb portal_production

# Restore from backup, stopping at the first error
psql -v ON_ERROR_STOP=1 "$DATABASE_URL" < backup_YYYYMMDD.sql

# Verify restoration
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM users"

# Restart application
pnpm prod:start
```
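
The procedure above drops a database by name but restores through `DATABASE_URL`, so a mismatch between the two would wipe one database and load the backup into another. A minimal sketch of a guard, assuming hypothetical helpers `db_name_from_url` and `require_db` and a URL whose path is just the database name:

```bash
# Hypothetical helper: print the database name from a PostgreSQL URL.
db_name_from_url() {
  local url="${1%%\?*}"        # drop any query string
  printf '%s\n' "${url##*/}"   # keep what follows the last '/'
}

# Hypothetical helper: fail unless the URL points at the expected database.
#   $1 = expected database name, $2 = connection URL
require_db() {
  local expected="$1" actual
  actual="$(db_name_from_url "$2")"
  if [ "$actual" != "$expected" ]; then
    echo "refusing: URL points at '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

Running `require_db portal_production "$DATABASE_URL" || exit 1` before `dropdb` stops the procedure early when the environment is not the one the operator expects.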

---

## Migration Management

### Running Migrations

```bash
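# Development: create and apply migrations
pnpm db:migrate

# Production: apply pending migrations only
# (`--skip-generate` is a `prisma migrate dev` flag, and `migrate dev` should not run against production)
npx prisma migrate deploy

# View migration status
npx prisma migrate status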
```

### Migration Checklist

Before deploying migrations to production:

1. [ ] Test migration on staging environment
2. [ ] Verify rollback procedure exists
3. [ ] Estimate migration duration
4. [ ] Schedule maintenance window if needed
5. [ ] Create backup before migration
6. [ ] Notify team of deployment

### Rollback Procedure

Prisma Migrate does not generate or apply down migrations automatically. Use one of these approaches:

**Option 1: Restore from Backup**

```bash
# Restore database to pre-migration state
# (a plain dump restores cleanly only into an empty database, or if it was taken with --clean)
psql "$DATABASE_URL" < pre_migration_backup.sql

# Revert migration files in codebase
git revert <migration-commit>
```

**Option 2: Manual Rollback SQL**

```bash
# Create rollback SQL for each migration
# Store in: apps/bff/prisma/rollbacks/

# Example rollback
psql "$DATABASE_URL" < rollbacks/20240115_rollback.sql
```
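
Option 2 only helps if a rollback script exists for every migration. A minimal sketch of a pre-deployment check follows. It assumes, hypothetically, that migrations live in `apps/bff/prisma/migrations/` and that each migration directory has a matching `<directory name>_rollback.sql` file in the rollbacks directory; adjust both to the convention the team actually uses.

```bash
# Hypothetical helper: list migrations that have no rollback script.
#   $1 = migrations directory, $2 = rollbacks directory
# Returns non-zero when at least one rollback script is missing.
missing_rollbacks() {
  local migrations="$1" rollbacks="$2" dir name status=0
  for dir in "$migrations"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    name="$(basename "$dir")"
    if [ ! -f "$rollbacks/${name}_rollback.sql" ]; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_rollbacks apps/bff/prisma/migrations apps/bff/prisma/rollbacks` could run in CI so that the "Verify rollback procedure exists" checklist item fails loudly.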

**Option 3: Reset and Reseed (Development Only)**

```bash
# WARNING: Destroys all data
pnpm db:reset
```

---

## ID Mappings Data Integrity

The `id_mappings` table links portal users to WHMCS and Salesforce accounts. Corruption here causes authentication and data access failures.

### Verify Mapping Integrity

```sql
-- Check for orphaned mappings (portal user deleted but mapping exists)
SELECT m.* FROM id_mappings m
LEFT JOIN users u ON m.user_id = u.id
WHERE u.id IS NULL;

-- Check for duplicate WHMCS mappings
SELECT whmcs_client_id, COUNT(*) as count
FROM id_mappings
WHERE whmcs_client_id IS NOT NULL
GROUP BY whmcs_client_id
HAVING COUNT(*) > 1;

-- Check for duplicate Salesforce mappings
SELECT sf_account_id, COUNT(*) as count
FROM id_mappings
WHERE sf_account_id IS NOT NULL
GROUP BY sf_account_id
HAVING COUNT(*) > 1;
```

### Fix Orphaned Mappings

```sql
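-- Remove mappings for deleted users
-- Check the reported row count; replace COMMIT with ROLLBACK if it looks wrong
BEGIN;
DELETE FROM id_mappings m
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = m.user_id);
COMMIT;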
```

### Fix Duplicate Mappings

> **Warning**: Investigate duplicates before deleting. They may indicate data issues.

```sql
-- View duplicate details before fixing
SELECT m.*, u.email FROM id_mappings m
JOIN users u ON m.user_id = u.id
WHERE m.whmcs_client_id IN (
  SELECT whmcs_client_id FROM id_mappings
  GROUP BY whmcs_client_id HAVING COUNT(*) > 1
);
```

---

## PostgreSQL Maintenance

### VACUUM and ANALYZE

```sql
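-- Refresh planner statistics for all tables
ANALYZE;

-- Mark dead rows as reusable space (does not block reads or writes)
VACUUM;

-- Rewrite tables to return space to the operating system
-- (takes an ACCESS EXCLUSIVE lock and blocks all access to each table)
VACUUM FULL;

-- Vacuum and analyze a specific table
VACUUM ANALYZE id_mappings;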
```

**Recommended Schedule:**

- `VACUUM ANALYZE`: Daily during low-traffic hours, in addition to autovacuum
- `VACUUM FULL`: Only when a table shows significant bloat, during a maintenance window

### Index Maintenance

```sql
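-- Check index usage
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes (candidates for removal)
-- Counts start at the last statistics reset; indexes that only enforce constraints may show 0
SELECT schemaname, relname AS tablename, indexrelname AS indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- Reindex a table (blocks writes; use REINDEX TABLE CONCURRENTLY to avoid that)
REINDEX TABLE id_mappings;

-- Reindex entire database (during maintenance window)
REINDEX DATABASE portal_production;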
```

### Check Table Bloat

```sql
-- Estimate table bloat from dead-row counts
SELECT
  schemaname, relname AS tablename,
  pg_size_pretty(pg_relation_size(relid)) as size,
  n_dead_tup as dead_rows,
  n_live_tup as live_rows,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

---

## Connection Pool Monitoring

### Check Active Connections

```sql
-- Current client connection count (excludes background processes)
SELECT COUNT(*) as connections
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Connections by state
SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state;

-- Connections by application
SELECT application_name, COUNT(*)
FROM pg_stat_activity
GROUP BY application_name;

-- Long-running queries (>5 minutes)
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes';
```

### Kill Stuck Connections

```sql
-- Cancel a running query but keep its connection open
SELECT pg_cancel_backend(<pid>);

-- Terminate a specific connection
SELECT pg_terminate_backend(<pid>);

-- Terminate all other connections to the current database
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND datname = current_database();
```

### Prisma Connection Pool Settings

Configure in `DATABASE_URL` query parameters:

```
postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=10
```

| Parameter          | Default                     | Recommended        |
| ------------------ | --------------------------- | ------------------ |
| `connection_limit` | `num_physical_cpus * 2 + 1` | 10-20 per instance |
| `pool_timeout`     | 10s                         | 10-30s             |
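
Each application instance opens its own pool, so the total across instances has to fit under the server's `max_connections`. A worked sketch of that arithmetic, assuming a hypothetical helper `pool_headroom` and PostgreSQL's stock defaults (`max_connections = 100`, 3 connections reserved for superusers); substitute the real values from your server:

```bash
# Hypothetical helper: connections left on the server after every
# application instance has filled its Prisma pool.
#   $1 = number of application instances
#   $2 = connection_limit per instance
#   $3 = PostgreSQL max_connections
#   $4 = connections reserved for superusers (defaults to 3)
pool_headroom() {
  local instances="$1" limit="$2" max_conn="$3" reserved="${4:-3}"
  echo $(( max_conn - reserved - instances * limit ))
}

# Example: 4 instances x 20 connections against max_connections = 100
# leaves 100 - 3 - 80 = 17 connections for migrations, psql sessions, and monitoring.
```

A result at or below zero means the pools can exhaust the server; lower `connection_limit`, raise `max_connections`, or put a pooler such as PgBouncer in between.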

---

## Monitoring Queries

### Database Size

```sql
-- Total database size
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Size per table
SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
```

### Query Performance

```sql
-- Slowest queries (requires pg_stat_statements extension)
-- Times are in milliseconds; PostgreSQL 13+ uses the *_exec_time column names
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

### Lock Monitoring

```sql
-- Show lock requests that are still waiting (not yet granted)
SELECT
  pg_locks.pid,
  pg_stat_activity.query,
  pg_locks.mode,
  pg_locks.granted
FROM pg_locks
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE NOT pg_locks.granted;
```

---

## Emergency Procedures

### Database Unresponsive

1. Check PostgreSQL process status
2. Check disk space and memory
3. Kill long-running queries
4. Restart PostgreSQL if necessary
5. Check application connectivity after restart

### Disk Space Full

```bash
# Check disk usage
df -h

# Find large files in PostgreSQL data directory
du -sh /var/lib/postgresql/data/*

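# Check how much space WAL is using
du -sh /var/lib/postgresql/data/pg_wal

# WARNING: Never delete files from pg_wal by hand.
# If WAL is piling up, fix the failing archive_command or drop stale
# replication slots so PostgreSQL can recycle the segments itself.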
```
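
Catching the problem before the disk fills is cheaper than recovering from it. A minimal sketch of a threshold check suitable for cron or a monitoring agent, assuming hypothetical helpers `disk_usage_pct` and `disk_ok` and an 85% default threshold chosen only for illustration:

```bash
# Hypothetical helper: print the used-space percentage of the filesystem
# that holds the given path (POSIX `df -P` output, column 5).
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only while usage is below the threshold.
#   $1 = path, $2 = threshold in percent (defaults to 85)
disk_ok() {
  local pct
  pct="$(disk_usage_pct "$1")" || return 2
  [ "$pct" -lt "${2:-85}" ]
}
```

For example, `disk_ok /var/lib/postgresql/data 85 || echo "PostgreSQL volume above 85%"` can feed whatever alerting channel the team already uses.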

### Corruption Detected

1. **STOP** the application immediately
2. Do not attempt repairs without backup verification
3. Restore from last known good backup
4. Investigate root cause before resuming service

---

## Related Documents

- [Incident Response](./incident-response.md)
- [External Dependencies](./external-dependencies.md)
- [Provisioning Runbook](./provisioning-runbook.md)

---

**Last Updated:** December 2025