# Release and Deployment Procedures
This document covers pre-deployment checklists, deployment procedures, post-deployment verification, and rollback procedures for the Customer Portal.
---
## Deployment Overview
| Environment | Method | Script | Notes |
| ----------- | -------------- | ------------------ | ------------------------------------ |
| Development | Local | `pnpm dev` | Apps run locally, services in Docker |
| Production | Docker Compose | `pnpm prod:deploy` | Full containerized deployment |
| Updates | Docker Compose | `pnpm prod:update` | Zero-downtime application updates |
### Available Commands
```bash
pnpm prod:deploy # Full deployment (build + start + migrate)
pnpm prod:start # Start all production services
pnpm prod:stop # Stop all production services
pnpm prod:update # Zero-downtime update (rebuild and recreate apps)
pnpm prod:status # Show service status and health
pnpm prod:logs # Show service logs
pnpm prod:backup # Create database backup
pnpm prod:cleanup # Clean up old containers and images
```
---
## Pre-Deployment Checklist
### Code Review
- [ ] All changes have been reviewed and approved
- [ ] No console.log/console.error statements in production code
- [ ] No hardcoded secrets or credentials
- [ ] TypeScript compilation passes (`pnpm type-check`)
- [ ] Linting passes (`pnpm lint`)
- [ ] Tests pass (`pnpm test`)
### Environment Configuration
- [ ] All required environment variables are set in `.env`
- [ ] Database URL is correct for production
- [ ] Redis URL is correct for production
- [ ] External API credentials are valid (Salesforce, WHMCS, Freebit)
- [ ] CORS_ORIGIN matches production domain
- [ ] JWT_SECRET is secure and unique
**Required Environment Variables:**
```bash
DATABASE_URL # PostgreSQL connection string
REDIS_URL # Redis connection string
JWT_SECRET # Secure secret (min 32 chars)
POSTGRES_PASSWORD # Database password
CORS_ORIGIN # Frontend domain
NEXT_PUBLIC_API_BASE # BFF API URL
BFF_PORT # Backend port (usually 4000)
```
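The list above can be checked mechanically before a deployment. The following is a minimal sketch, not part of the repository's scripts: `check_required_env` is a hypothetical helper name, and it only tests that each variable is non-empty and that `JWT_SECRET` meets the 32-character minimum stated above.

```bash
# Hypothetical helper, not part of the repository scripts.
# Prints one line per problem and returns non-zero if any check fails.
check_required_env() {
  check_status=0
  for check_name in DATABASE_URL REDIS_URL JWT_SECRET POSTGRES_PASSWORD \
                    CORS_ORIGIN NEXT_PUBLIC_API_BASE BFF_PORT; do
    # Indirect lookup: read the value of the variable whose name is in check_name.
    eval "check_value=\${$check_name:-}"
    if [ -z "$check_value" ]; then
      echo "missing: $check_name"
      check_status=1
    fi
  done
  # The list above asks for a JWT secret of at least 32 characters.
  if [ -n "${JWT_SECRET:-}" ] && [ "${#JWT_SECRET}" -lt 32 ]; then
    echo "too short: JWT_SECRET"
    check_status=1
  fi
  return "$check_status"
}
```

One way to use it, assuming `.env` holds plain `KEY=value` lines: `set -a; . ./.env; set +a; check_required_env`. It does not validate that URLs or credentials are actually correct for production; those checklist items still need a human.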
### Database Migration Check
- [ ] Review pending migrations (`npx prisma migrate status`)
- [ ] Test migrations on staging/local first
- [ ] Create database backup before applying migrations
- [ ] Prepare rollback SQL if migration is destructive
- [ ] Estimate migration duration for large tables
### Dependency Check
- [ ] Run security audit (`pnpm security:check`)
- [ ] No high/critical vulnerabilities
- [ ] All dependencies are at expected versions
- [ ] Lock file is up to date (`pnpm-lock.yaml`)
### Communication
- [ ] Notify team of deployment schedule
- [ ] Schedule during low-traffic window if possible
- [ ] Prepare customer communication if downtime expected
- [ ] Ensure on-call engineer is available
---
## Deployment Procedure
### Standard Deployment (First Time)
```bash
# 1. Create database backup (if updating existing system)
pnpm prod:backup
# 2. Full deployment
pnpm prod:deploy
```
`pnpm prod:deploy` performs the following steps:
1. Validates environment configuration
2. Builds production Docker images
3. Starts database and cache services
4. Waits for database readiness
5. Runs Prisma migrations
6. Starts frontend and backend services
7. Performs health checks
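Step 4 is the part most likely to need tuning. How `pnpm prod:deploy` implements the wait is not shown in this document; the sketch below illustrates one common approach, a bounded retry loop. `wait_until` is a hypothetical helper, and the attempt count and delay are illustrative.

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times.
# Hypothetical helper, not part of the repository scripts.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_try=1
  while [ "$wu_try" -le "$wu_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause only if another attempt remains.
    if [ "$wu_try" -lt "$wu_attempts" ]; then
      sleep "$wu_delay"
    fi
    wu_try=$((wu_try + 1))
  done
  return 1
}
```

For example, `wait_until 30 2 docker compose -f docker/prod/docker-compose.yml exec -T database pg_isready -U portal -d portal_prod` would poll the database for roughly a minute before giving up.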
### Application Update (Zero-Downtime)
For updates that don't require database migrations:
```bash
# 1. Create database backup
pnpm prod:backup
# 2. Update applications
pnpm prod:update
```
This rebuilds and recreates frontend and backend containers without stopping the database.
### Database Migration Deployment
For deployments with schema changes:
```bash
# 1. Create database backup
pnpm prod:backup
# 2. Stop application to prevent writes during migration
pnpm prod:stop
# 3. Start only database
docker compose -f docker/prod/docker-compose.yml up -d database
# 4. Run migrations
docker compose -f docker/prod/docker-compose.yml run --rm backend pnpm db:migrate
# 5. Verify migration success
docker compose -f docker/prod/docker-compose.yml exec database psql -U portal -d portal_prod -c "SELECT * FROM _prisma_migrations ORDER BY finished_at DESC LIMIT 5;"
# 6. Start all services
pnpm prod:start
# 7. Verify application health
pnpm prod:status
```
---
## Post-Deployment Verification
### Immediate Checks (0-5 minutes)
- [ ] Health endpoints return `ok`
```bash
curl http://localhost:4000/health
curl http://localhost:3000/_health
```
- [ ] No error spikes in logs
```bash
pnpm prod:logs backend | grep -i error | tail -20
```
- [ ] Database migrations applied successfully
- [ ] Redis connectivity verified
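The log check can be turned into a pass/fail gate. This is a sketch under assumptions: `error_budget_ok` is a hypothetical helper, matching on the substring `error` mirrors the `grep -i error` command above, and what counts as a spike is a threshold you choose.

```bash
# error_budget_ok MAX_LINES
# Reads log text on stdin and succeeds if at most MAX_LINES lines
# contain "error" (case-insensitive). Hypothetical helper.
error_budget_ok() {
  # grep -c prints the count but exits non-zero when the count is 0,
  # so "|| true" keeps the pipeline usable under "set -e".
  eb_count=$(grep -ci 'error' || true)
  [ "$eb_count" -le "$1" ]
}
```

`pnpm prod:logs backend | tail -200 | error_budget_ok 5` succeeds when at most 5 of the last 200 log lines mention an error, assuming `pnpm prod:logs` exits rather than following the log.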
### Functional Checks (5-15 minutes)
- [ ] User can log in to portal
- [ ] Dashboard loads correctly
- [ ] Invoice list displays
- [ ] Subscription list displays
- [ ] Catalog products load
### Integration Checks (15-30 minutes)
- [ ] Salesforce connectivity verified
```bash
curl http://localhost:4000/auth/health-check | jq '.services.salesforce'
```
- [ ] WHMCS connectivity verified
```bash
curl http://localhost:4000/auth/health-check | jq '.services.whmcs'
```
- [ ] Queue health verified
```bash
curl http://localhost:4000/health/queues
```
### Monitoring Checks
- [ ] Metrics are being collected
- [ ] No alert triggers from deployment
- [ ] Log aggregation is working
- [ ] Error rates are normal
---
## Rollback Procedures
### Application Rollback (No DB Changes)
If a deployment fails and it included no database changes:
```bash
# 1. Stop current deployment
pnpm prod:stop
# 2. Checkout previous version
git checkout <previous-tag-or-commit>
# 3. Rebuild and deploy
pnpm prod:deploy
```
### Application Rollback with Docker Images
If previous images are available:
```bash
# 1. Stop current services
pnpm prod:stop
# 2. Start with previous image tags (assumes the compose file reads
#    BACKEND_IMAGE and FRONTEND_IMAGE; `docker compose up` has no -e flag)
BACKEND_IMAGE=portal-backend:previous \
FRONTEND_IMAGE=portal-frontend:previous \
docker compose -f docker/prod/docker-compose.yml up -d --no-build
```
### Database Rollback
If a database migration needs to be reverted:
**Option 1: Restore from Backup**
```bash
# 1. Stop application
pnpm prod:stop
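# 2. Start only the database (prod:stop stopped it), then restore.
#    -T disables TTY allocation so the redirected file reaches psql.
docker compose -f docker/prod/docker-compose.yml up -d database
docker compose -f docker/prod/docker-compose.yml exec -T database psql -U portal -d portal_prod < backup_YYYYMMDD_HHMMSS.sql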
# 3. Checkout previous code version
git checkout <previous-tag>
# 4. Rebuild and restart
pnpm prod:deploy
```
**Option 2: Manual Rollback SQL**
```bash
# 1. Stop application
pnpm prod:stop
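# 2. Start only the database, then apply the rollback script (if prepared)
docker compose -f docker/prod/docker-compose.yml up -d database
docker compose -f docker/prod/docker-compose.yml exec -T database psql -U portal -d portal_prod < rollback_migration_YYYYMMDD.sql
# 3. Manually remove migration record
docker compose -f docker/prod/docker-compose.yml exec -T database psql -U portal -d portal_prod -c "DELETE FROM _prisma_migrations WHERE migration_name = '20240115_migration_name';"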
# 4. Restart with previous code
git checkout <previous-tag>
pnpm prod:deploy
```
### Emergency Rollback
For critical failures requiring immediate action:
```bash
# 1. Immediately stop all services
pnpm prod:stop
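# 2. Start only the database, then restore from the most recent backup
docker compose -f docker/prod/docker-compose.yml up -d database
docker compose -f docker/prod/docker-compose.yml exec -T database psql -U portal -d portal_prod < /path/to/latest_backup.sql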
# 3. Deploy last known good version
git checkout <last-known-good-tag>
pnpm prod:deploy
# 4. Notify team
# Send incident notification
```
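The emergency steps refer to the most recent backup without saying how to find it. If backups follow the `backup_YYYYMMDD_HHMMSS.sql` naming used in the restore example and sit in one directory (both assumptions; this document does not state where `pnpm prod:backup` writes its output), the newest file can be selected by name. `latest_backup` is a hypothetical helper.

```bash
# latest_backup DIR
# Prints the newest backup_*.sql file in DIR, or fails if there is none.
# Hypothetical helper, not part of the repository scripts.
latest_backup() {
  # Timestamps in the form YYYYMMDD_HHMMSS sort chronologically as plain text,
  # and the shell expands globs in sorted order, so the last match is the newest.
  lb_latest=""
  for lb_file in "$1"/backup_*.sql; do
    [ -e "$lb_file" ] || continue   # the glob matched nothing
    lb_latest=$lb_file
  done
  [ -n "$lb_latest" ] && printf '%s\n' "$lb_latest"
}
```

Selecting by name rather than by modification time keeps the result stable even if files were copied or touched after creation. It is still worth confirming the chosen file's size and timestamp by eye before restoring.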
---
## Feature Flags
The portal does not currently use a formal feature flag system. Feature availability is controlled through:
1. **Environment Variables** - Toggle features via configuration
2. **Conditional Rendering** - Frontend checks for feature availability
3. **Backend Feature Checks** - API endpoints check configuration
### Adding a Feature Toggle
```typescript
// Backend: Check environment variable
const featureEnabled = this.configService.get("FEATURE_NEW_CHECKOUT", "false") === "true";
// Frontend: Check feature availability
if (process.env.NEXT_PUBLIC_FEATURE_NEW_CHECKOUT === "true") {
  // Render new feature
}
```
### Emergency Feature Disable
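To disable a backend feature without rebuilding images:
1. Update the environment variable in `.env`
2. Recreate the affected services so they pick up the new value (`docker compose restart` reuses the existing container configuration and does not apply environment changes):
```bash
docker compose -f docker/prod/docker-compose.yml up -d --force-recreate backend frontend
```
Frontend flags prefixed with `NEXT_PUBLIC_` are inlined by Next.js at build time, so changing one requires rebuilding the frontend (for example with `pnpm prod:update`).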
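Step 1 can be scripted so an emergency change is a single, repeatable command. A minimal sketch, assuming `.env` holds plain `KEY=value` lines with at most one assignment per name; `set_env_var` is a hypothetical helper, and `FEATURE_NEW_CHECKOUT` is the example flag from the previous section.

```bash
# set_env_var FILE NAME VALUE
# Replaces (or adds) the NAME=VALUE assignment in FILE.
# Hypothetical helper, not part of the repository scripts.
set_env_var() {
  sv_file=$1
  sv_name=$2
  sv_value=$3
  sv_tmp=$(mktemp) || return 1
  # Copy every line except an existing assignment of the same name.
  # grep exits non-zero when it selects no lines, hence "|| true".
  grep -v "^${sv_name}=" "$sv_file" > "$sv_tmp" || true
  # Append the new assignment.
  printf '%s=%s\n' "$sv_name" "$sv_value" >> "$sv_tmp"
  # Overwrite in place (rather than mv) so the file keeps its permissions.
  cat "$sv_tmp" > "$sv_file" && rm -f "$sv_tmp"
}
```

`set_env_var .env FEATURE_NEW_CHECKOUT false` removes any existing assignment and appends the new one at the end of the file. The affected services still have to be restarted or recreated afterwards for the change to take effect.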
---
## Deployment Timeline Template
| Time | Action | Owner | Notes |
| ----- | ------------------------------- | ---------- | ------------------------- |
| T-24h | Announce deployment window | Tech Lead | Notify all stakeholders |
| T-2h | Final code review | Developers | Verify all changes merged |
| T-1h | Pre-deployment checklist | DevOps | Complete all checks |
| T-30m | Create backup | DevOps | Verify backup integrity |
| T-15m | Notify team deployment starting | DevOps | Slack/Teams message |
| T-0 | Execute deployment | DevOps | Run deployment commands |
| T+5m | Immediate verification | DevOps | Health checks |
| T+15m | Functional verification | QA/DevOps | Test key flows |
| T+30m | All-clear or rollback decision | Tech Lead | Confirm success |
| T+1h | Post-deployment monitoring | DevOps | Watch metrics |
| T+24h | Close deployment | Tech Lead | Final verification |
---
## Troubleshooting
### Build Failures
```bash
# Check Docker daemon
docker info
# Check disk space
df -h
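# Clean Docker resources (-a also removes every unused image, including
# previous images you may need for an image-based rollback)
docker system prune -a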
```
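Builds often fail simply because the disk is full, so a small guard can check free space before a build starts. This is a sketch: `has_free_kb` is a hypothetical helper, and the 5 GiB figure in the usage example is an arbitrary threshold, not a measured requirement of this project.

```bash
# has_free_kb PATH MIN_KB
# Succeeds if the filesystem containing PATH has at least MIN_KB KiB available.
# Hypothetical helper, not part of the repository scripts.
has_free_kb() {
  # -P forces the portable one-line-per-filesystem format; -k reports 1 KiB blocks.
  # Column 4 of the second line is the available space.
  hf_avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$hf_avail" -ge "$2" ]
}
```

For example, `has_free_kb /var/lib/docker 5242880 || echo "less than 5 GiB free"` checks Docker's default data directory on Linux; adjust the path if the daemon uses a different data root.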
### Migration Failures
```bash
# Check migration status
npx prisma migrate status
# View migration history
docker compose exec database psql -U portal -d portal_prod -c "SELECT * FROM _prisma_migrations;"
# Reset migrations (development only! This drops all data and reapplies every migration)
npx prisma migrate reset
```
### Service Startup Failures
```bash
# Check service logs
pnpm prod:logs backend
pnpm prod:logs frontend
# Check container status
docker compose ps -a
# Check resource usage
docker stats
```
### Database Connection Issues
```bash
# Test database connectivity
docker compose exec database pg_isready -U portal -d portal_prod
# Check connection count
docker compose exec database psql -U portal -d portal_prod -c "SELECT count(*) FROM pg_stat_activity;"
```
---
## Related Documents
- [Deployment Guide](../getting-started/deployment.md)
- [Database Operations](./database-operations.md)
- [Incident Response](./incident-response.md)
- [Monitoring Setup](./monitoring-setup.md)
---
**Last Updated:** December 2025