Enterprise Reinsurance — Cash Clearance
Full-stack serverless app for high-volume transaction clearance
Context
Designed and built a full-stack cash clearance application for Enterprise Reinsurance — React front-end, AWS-native serverless backend in TypeScript, Aurora PostgreSQL data layer. Users search large transaction sets — some returning 100K+ rows — and mark them cleared against a check. Reworked the backend from a synchronous path that timed out (3–5 minute queries) into a hybrid sync/async architecture with DynamoDB-cached search results, partitioned core tables, and concurrency-safe writes — bringing query latency under 5 seconds while serving heavy transaction volumes.
Constraints
- Regulated cash workflow — duplicate clearances against customer money would be an audit problem, not just a bug; correctness was a hard floor, not a target.
- AWS Lambda 6 MB synchronous response limit — a 100K+ row result set blew past it even after the rows themselves were cheap to fetch.
- No tolerance for a user staring at a spinner past five seconds — the workflow ran inside a finance team's daily routine, and a 15-minute Lambda timeout ceiling was meaningless if the user was already gone.
- Billions of dollars in active treaty exposure sat downstream of the clearance decisions; a wrong concurrency call would have shipped a correctness bug into a system where customer checks settle against the results.
- Two users could legitimately sit on the same treaty's open transactions at the same time — the design had to admit concurrent operators without serializing them on a database lock.
Architecture
Data Model
Transactions live in Aurora PostgreSQL as a flat ledger keyed by treaty, cedent, and accounting period — the natural partition dimensions, though we hadn't told Postgres that until the redesign. A single heavy treaty search can pull 100K+ rows across half a dozen cedents. DynamoDB stores the materialized result of each search keyed by a hash of user + filter clause + version token; the value is the full result row set plus precomputed server-side groupings — page ranges sized to fit under the Lambda 6 MB response limit. The version token on the cached snapshot doubles as the optimistic-concurrency contract for writes — it's not just a cache key, it's the workflow's source of truth for 'what set of rows this user is operating on'. The Aurora core tables are split (native partitioning) on treaty and accounting period so the first query — the one the cache can't help with — can skip whole partitions.
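A minimal sketch of how a snapshot key of this kind could be derived, assuming SHA-256 as the hash and a hypothetical `SearchFilter` shape. The field names, the `canonicalize` helper, and the choice of hash are illustrative assumptions, not the production code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical filter shape; field names are assumptions for illustration.
interface SearchFilter {
  treaty: string;
  cedent?: string;
  periodFrom: string;
  periodTo: string;
}

// Serialize the filter with sorted keys so logically identical filters
// produce the same string regardless of property order.
function canonicalize(filter: SearchFilter): string {
  const entries = Object.entries(filter)
    .filter(([, value]) => value !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(entries);
}

// Derive the snapshot key from user id, canonical filter, and version token.
// SHA-256 is an assumed choice; any stable hash would serve the same role.
function snapshotKey(userId: string, filter: SearchFilter, versionToken: string): string {
  return createHash("sha256")
    .update(JSON.stringify([userId, canonicalize(filter), versionToken]))
    .digest("hex");
}
```

Because the version token is part of the key, a change in the underlying set yields a different key, so a stale snapshot is never matched by a fresh search.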
Key Sequence
- User submits a search filter (treaty, cedent, period range) from the React grid.
- API Gateway routes the request to the read-path Lambda, which computes a hash key from the user id, filter clause, and current version token.
- Lambda checks DynamoDB for the materialized result snapshot; on hit, it returns the requested page, with the DynamoDB read itself typically completing in single-digit milliseconds.
- On miss, Lambda runs the partitioned read against Aurora, materializes the full result plus server-side page groupings into DynamoDB under the hash key, and returns the first page.
- User scrolls, sorts, or re-filters — every subsequent interaction reads from DynamoDB against the same snapshot, never hitting Aurora.
- User clicks Mark Cleared on a batch. The write-path Lambda validates the version token against the snapshot; if the underlying set has shifted since the snapshot was taken, the write fails fast and the user re-runs the search.
- Heavy mark-cleared workflows fan out through Step Functions and EventBridge; the user's click returns immediately with a job handle and the UI subscribes to status.
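The hit/miss branch of the read path in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: a `Map` stands in for DynamoDB, the `runQuery` callback stands in for the partitioned Aurora read, pages are fixed-size, and everything is synchronous to keep the control flow visible (real DynamoDB and Aurora calls would be asynchronous). All names are hypothetical.

```typescript
// A cached snapshot: rows pre-split into pages, tagged with the version it was built against.
interface Snapshot<Row> {
  versionToken: string;
  pages: Row[][];
}

interface PageResponse<Row> {
  rows: Row[];
  totalPages: number;
  cacheHit: boolean;
}

// Split rows into fixed-size pages (a simplification of the server-side groupings).
function chunk<Row>(rows: Row[], pageSize: number): Row[][] {
  const pages: Row[][] = [];
  for (let i = 0; i < rows.length; i += pageSize) {
    pages.push(rows.slice(i, i + pageSize));
  }
  return pages;
}

// Read-through lookup: serve from the snapshot on a hit; on a miss, run the
// expensive query once, materialize the paged result, then serve from it.
function readPage<Row>(
  cache: Map<string, Snapshot<Row>>, // stand-in for the DynamoDB table
  key: string,
  versionToken: string,
  pageIndex: number,
  pageSize: number,
  runQuery: () => Row[], // stand-in for the partitioned Aurora read
): PageResponse<Row> {
  let snapshot = cache.get(key);
  const cacheHit = snapshot !== undefined;
  if (snapshot === undefined) {
    snapshot = { versionToken, pages: chunk(runQuery(), pageSize) };
    cache.set(key, snapshot);
  }
  return {
    rows: snapshot.pages[pageIndex] ?? [],
    totalPages: snapshot.pages.length,
    cacheHit,
  };
}
```

In this sketch only the first call for a given key invokes `runQuery`; later page requests are answered from the stored snapshot, which mirrors the "never hitting Aurora" step.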
What I owned
- Cut search query latency from 3–5 minutes to under 5 seconds — even for result sets exceeding 100K rows — via DynamoDB caching of search results and PostgreSQL core-table splitting for partitioned reads
- Designed a hybrid sync/async execution model in the Lambda backend — interactive flows stay responsive while heavy workloads run async
- Worked around AWS Lambda payload limits with paginated server-side groupings, feeding the UI grid through chunked page requests instead of single-shot returns
- Built optimistic-concurrency safeguards so two users marking against the same transaction set cannot operate on stale data — preventing duplicate clearances
- Owned end-to-end as designer + builder: React front-end (sorting / filtering at 100K+ row scale), AWS-native serverless backend (Lambda + API Gateway + DynamoDB + Aurora PostgreSQL)
- Delivered as Lead Developer alongside the Reinsurance migration work, 2020 – 2024
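The optimistic-concurrency safeguard listed above can be illustrated with a small in-memory sketch. The `TreatyLedger` class, the numeric version counter, and the result shape are assumptions made for illustration. In a real store the compare and the version bump would need to happen atomically, for example through a conditional write.

```typescript
type ClearResult =
  | { ok: true; newVersion: number }
  | { ok: false; reason: "stale_snapshot"; currentVersion: number };

// Hypothetical in-memory stand-in for one treaty's open transaction set.
class TreatyLedger {
  private version = 1;
  private readonly cleared = new Set<string>();

  currentVersion(): number {
    return this.version;
  }

  isCleared(transactionId: string): boolean {
    return this.cleared.has(transactionId);
  }

  // Apply the write only if the caller's snapshot version still matches.
  // A mismatch means the set changed after the snapshot was taken, so the
  // write is rejected and the caller is expected to re-run the search.
  markCleared(snapshotVersion: number, transactionIds: string[]): ClearResult {
    if (snapshotVersion !== this.version) {
      return { ok: false, reason: "stale_snapshot", currentVersion: this.version };
    }
    for (const id of transactionIds) {
      this.cleared.add(id);
    }
    this.version += 1;
    return { ok: true, newVersion: this.version };
  }
}
```

Two operators who searched at the same version can both attempt a write, but only the first succeeds. The second receives `stale_snapshot` instead of clearing against rows that have already changed, and neither waits on a lock.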
Trade-offs
- Chose DynamoDB-cached materialized search results over scaling Aurora because the cost lived in the wrong layer: a bigger Aurora writer instance shaves seconds off a query that takes minutes. The cost is that the cache is now load-bearing for correctness, not just speed, and bugs in the snapshot semantics have to be fixed in DynamoDB (DDB), not in the database.
- Picked the version token on the cached snapshot as the optimistic-concurrency contract for writes over a database-level lock or strict Aurora serialization; both alternatives were legible and safe, but wrong for the latency profile — the cost is that the DDB cache became the workflow's source of truth, and that move had to be named explicitly to the data team and the finance owner before commit.
- Split the core transaction tables with native Postgres partitioning on treaty and accounting period rather than adding more indexes; the cost is a heavier write path for any write that spans partitions, but piling on speculative indexes would have slowed every clearance write in order to fix the read.
- Built a hybrid sync/async execution model — sync API for interactive reads, Step Functions + EventBridge for heavy mark-cleared workflows — rather than pushing everything async; the cost is two execution paths in the same Lambda fleet, but the interactive path never waits on a write that doesn't have to be synchronous.
- Worked around the Lambda 6 MB response ceiling with precomputed server-side page groupings stored alongside the result snapshot rather than switching to a streaming transport; the cost is more state in DDB per search, but the UI stays a normal paginated grid instead of a streaming consumer.
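One way to precompute page groupings that stay under a response ceiling is to pack rows greedily by serialized size. The sketch below assumes pages are returned as JSON arrays and that the budget passed in is set somewhat below the 6 MB limit to leave room for the response envelope. The function name and the greedy strategy are illustrative assumptions, not the shipped implementation.

```typescript
const encoder = new TextEncoder();

// Greedily pack rows into pages so that JSON.stringify(page) never exceeds
// maxPageBytes when encoded as UTF-8.
function groupIntoPages<Row>(rows: Row[], maxPageBytes: number): Row[][] {
  const pages: Row[][] = [];
  let current: Row[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const row of rows) {
    const rowBytes = encoder.encode(JSON.stringify(row)).length;
    if (rowBytes + 2 > maxPageBytes) {
      throw new Error("a single row exceeds the page budget");
    }
    // A comma separator is needed only when the page already holds a row.
    const added = rowBytes + (current.length > 0 ? 1 : 0);
    if (currentBytes + added > maxPageBytes) {
      pages.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += rowBytes + (current.length > 0 ? 1 : 0);
    current.push(row);
  }

  if (current.length > 0) {
    pages.push(current);
  }
  return pages;
}
```

Sizing by bytes rather than by row count keeps pages safe when row width varies. If each page were stored as its own DynamoDB item, the 400 KB item size limit would impose a tighter budget than the Lambda limit.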
What I'd change today
I'd ship the DynamoDB cache layer behind a feature flag from day one rather than after the redesign was the only path left; that would have let me prove the snapshot semantics on a slice of treaties before betting the whole workflow on them. I'd also write down the contract, 'the DDB cache is the source of truth for this workflow', as a versioned architectural statement, not just a walkthrough with the data team and the finance owner. And I'd build the cache-eviction story earlier: we shipped with a generous TTL that worked in practice, but the principled move is to have a deliberate answer for invalidation, not a default that's just long enough not to matter. The blast-radius lesson cost us an unnecessary rollback dry-run I'd rather have skipped.