Assist_Design/docs/plans/2026-02-25-bff-domain-structure-cleanup-design.md
barsa cc8aa917c2 fix: update Salesforce connection login URL to use test environment
- Change default login URL from production to test Salesforce environment for safer development and testing.
2026-03-02 15:23:33 +09:00

11 KiB

BFF & Domain Structure Cleanup Design

Date: 2026-02-25 Status: Approved Approach: Incremental — 5 independent PRs, each shippable alone

Context

A thorough codebase audit of the BFF and domain packages identified structural improvements to make the codebase cleaner, more predictable, and more maintainable — aligned with enterprise standards (Stripe, Netflix, Airbnb patterns).

What's already strong: Controller patterns, logging (nestjs-pino), configuration (Zod-validated env), security (Helmet, CORS, rate limiting), error registry (82 domain error codes), domain schemas (schema-first Zod approach), queue patterns (BullMQ with idempotency).

What needs improvement: TypeScript strictness, database access abstraction, provider error handling, auth module organization, and domain schema validation gaps.

PR 1: Enable TypeScript Strict Mode

Problem

apps/bff/tsconfig.json has no strict, no noImplicitAny, no strictNullChecks. Type safety bugs pass silently through compilation.

Design

  1. Enable strict: true in the base tsconfig.node.json (affects both BFF and domain)
  2. Keep strictPropertyInitialization: false in apps/bff/tsconfig.json only (NestJS DI injects via decorators — properties are initialized but not in constructors)
  3. Fix all compilation errors surfaced — each one is a real bug the compiler was hiding

What strict enables

  • noImplicitAny — no silent any inference
  • strictNullChecksnull/undefined must be handled explicitly
  • strictBindCallApply — correct types for .bind(), .call(), .apply()
  • strictFunctionTypes — contravariant parameter checking
  • noImplicitThis — explicit this typing
  • alwaysStrict — emit "use strict" in all files

Risk

Medium — will surface potentially many errors. Each fix prevents a real bug.


PR 2: Repository + Unit of Work Layer

Problem

Raw prisma.entity.findUnique() calls scattered throughout services. Tightly coupled to Prisma API, hard to test and mock, no transactional boundaries.

Design

apps/bff/src/infra/database/
├── prisma.service.ts              # Existing — unchanged
├── unit-of-work.service.ts        # NEW — transaction coordinator
├── base.repository.ts             # NEW — generic typed CRUD base
└── repositories/
    ├── user.repository.ts         # Per-entity repositories
    ├── sim.repository.ts
    ├── subscription.repository.ts
    └── index.ts                   # Barrel exports

Base Repository

Generic class typed to the Prisma model delegate. Provides:

  • findById(id) — single record by ID
  • findOne(where) — single record by arbitrary criteria
  • findMany(where, options?) — filtered list with pagination
  • create(data) — insert new record
  • update(id, data) — update by ID
  • delete(id) — soft or hard delete by ID
  • count(where?) — count matching records

Each concrete repository (e.g., UserRepository) extends BaseRepository and can add domain-specific queries.

Unit of Work

Wraps prisma.$transaction() with interactive transactions. Provides a transactional Prisma client to repositories, ensuring atomicity across multiple entity operations.

class UnitOfWork {
  async execute<T>(fn: (tx: TransactionalClient) => Promise<T>): Promise<T> {
    return this.prisma.$transaction(fn);
  }
}

Repositories accept an optional tx parameter to participate in a transaction.

Migration Strategy

Services switch from:

this.prisma.user.findUnique({ where: { id } });

to:

this.userRepository.findById(id);

Change is mechanical and can be done file-by-file.

Risk

Low — additive change, then mechanical migration.


PR 3: Provider Error Codes + Typed Error Classes

Problem

Provider error detection uses brittle string matching. Example from order-fulfillment-error.service.ts: checks if error message includes "Payment method missing". Breaks when provider messages change.

Design — Two Layers

Layer 1: Provider Error Classes (Infrastructure)

apps/bff/src/integrations/common/errors/
├── base-provider.error.ts         # BaseProviderError (extends Error)
├── whmcs.errors.ts                # WhmcsApiError, WhmcsAuthError, WhmcsNotFoundError
├── salesforce.errors.ts           # SalesforceApiError, SalesforceQueryError
├── freebit.errors.ts              # FreebitApiError, FreebitTimeoutError
└── index.ts

BaseProviderError structure:

abstract class BaseProviderError extends Error {
  abstract readonly provider: "whmcs" | "salesforce" | "freebit";
  abstract readonly providerCode: string;
  abstract readonly domainErrorCode: ErrorCode;
  readonly providerMessage: string;
  readonly httpStatus: number;
}

Each concrete error class (e.g., WhmcsPaymentMethodMissingError) carries structured metadata and maps to a domain ErrorCode.

Layer 2: Provider Error Code Enums (Domain)

packages/domain/common/
├── errors.ts                      # Existing — add provider error mapping
├── provider-errors.ts             # NEW — provider-specific error codes

Provider error code enums define the known error conditions per provider:

export const WHMCS_ERROR = {
  PAYMENT_METHOD_MISSING: "WHMCS_PAYMENT_METHOD_MISSING",
  CLIENT_NOT_FOUND: "WHMCS_CLIENT_NOT_FOUND",
  INVOICE_NOT_FOUND: "WHMCS_INVOICE_NOT_FOUND",
  // ...
} as const;

Error Flow

  1. Integration service catches raw provider error
  2. Translates to typed error class: throw new WhmcsPaymentMethodMissingError(rawMessage)
  3. UnifiedExceptionFilter catches BaseProviderError subclass
  4. Maps domainErrorCode → API error response using existing error registry

Migration

Replace all instanceof Error + string checks with instanceof WhmcsNotFoundError etc.

Risk

Low — new code first, then migrate string checks one at a time.


PR 4: Auth Feature-Based Decomposition

Problem

Auth module has 48 providers in one module. Handles login, registration, password reset, OTP, sessions, and get-started flows in a single flat structure. Hard to scan, reason about, and modify independently.

Design — 6 Feature Modules

apps/bff/src/modules/auth/
├── auth.module.ts                  # Orchestrator — imports feature modules
├── auth.controller.ts              # Thin top-level router (if needed)
├── auth-orchestrator.service.ts    # Cross-feature coordination
│
├── login/
│   ├── login.module.ts
│   ├── login.controller.ts         # /auth/login routes
│   └── services/
│       ├── login.service.ts
│       └── login-otp-workflow.service.ts
│
├── get-started/
│   ├── get-started.module.ts
│   ├── get-started.controller.ts   # /auth/get-started routes
│   └── services/
│       ├── get-started-coordinator.service.ts
│       ├── get-started-session.service.ts
│       └── get-started-otp.service.ts
│
├── tokens/
│   ├── tokens.module.ts
│   └── services/
│       ├── auth-token.service.ts
│       ├── token-blacklist.service.ts
│       └── token-refresh.service.ts
│
├── otp/
│   ├── otp.module.ts
│   └── services/
│       ├── otp-generation.service.ts
│       └── otp-verification.service.ts
│
├── sessions/
│   ├── sessions.module.ts
│   └── services/
│       ├── session-manager.service.ts
│       └── trusted-device.service.ts
│
├── password-reset/
│   ├── password-reset.module.ts
│   ├── password-reset.controller.ts  # /auth/password-reset routes
│   └── services/
│
└── shared/
    ├── guards/                     # GlobalAuthGuard, PermissionsGuard
    ├── decorators/                 # @Public, @OptionalAuth
    └── services/                   # Shared step services

Key Principles

  • Each feature module owns its full vertical slice (controller, services)
  • tokens/ and otp/ are shared infrastructure modules imported by login and get-started
  • sessions/ manages session state and trusted devices
  • Top-level auth.module.ts becomes a thin orchestrator that imports feature modules and re-exports shared guards
  • NestJS RouterModule mounts feature controllers under /auth/* prefixes
  • Shared guards and decorators remain in shared/ and are exported globally

Migration Strategy

  1. Create the new module/directory structure
  2. Move existing services into their feature modules
  3. Update module imports/exports
  4. Verify all routes still work via integration tests
  5. Remove the old flat structure

Risk

Medium — restructure touches many files and NestJS DI wiring. Must verify all auth flows work after migration.


PR 5: Tighten Domain Schema Validation

Problem

support/schema.ts uses z.string() for status, priority, and category fields instead of the defined enum schemas. The mapper already transforms Japanese Salesforce values to English display labels before parsing, so the strict enums will work correctly.

Design

Tighten the schema:

// Before (loose):
status: z.string(),
priority: z.string(),
category: z.string().nullable(),

// After (strict):
status: supportCaseStatusSchema,
priority: supportCasePrioritySchema,
category: supportCaseCategorySchema.nullable(),

Also add JSDoc comments to intentional escape hatches:

  • customer/contract.ts interfaces with [key: string]: unknown
  • common/errors.ts with z.record(z.string(), z.unknown())

Behavior Change

If an unmapped Salesforce status arrives, supportCaseSchema.parse() will throw a Zod validation error instead of silently passing through a raw Japanese string. This is the correct behavior — unknown statuses should fail loudly so the mapping can be updated.

Risk

Low — small targeted change. The getStatusDisplayLabel() fallback (?? status) currently passes through unknown values silently, which masks configuration issues.


Execution Order

PRs are independent but this order minimizes friction:

  1. PR 1 (TypeScript strict) — surfaces hidden issues that may affect other PRs
  2. PR 5 (Schema tightening) — smallest, quick win
  3. PR 3 (Provider error codes) — new infrastructure, no migration yet
  4. PR 2 (Repository + UoW) — new infrastructure, then mechanical migration
  5. PR 4 (Auth decomposition) — largest change, do last when other patterns are stable

Out of Scope (Future Work)

These items were identified in the audit but deferred:

  • Testing infrastructure — zero test files exist; separate initiative needed
  • Circuit breaker pattern — for external service resilience
  • Request correlation across async operations — AsyncLocalStorage propagation
  • Database health checks — add to /health endpoint
  • DLQ visibility — expose failed queue jobs
  • API versioning strategy — document approach for breaking changes
  • Prometheus metrics export — replace in-memory queue metrics