Three minutes to five seconds: where the cache belongs in a serverless reinsurance app

The query was correct. The database was sized. The bug was that the cost lived in the wrong layer — and the fix was a cache that became load-bearing for correctness, not just speed.

Cash Clearance is the part of the reinsurance workflow where someone in finance takes a check that just landed and matches it against the open transactions it's meant to settle. Sometimes that's twenty rows. Sometimes — when the check covers a quarterly treaty settlement — it's more than 100,000 rows, spread across a half dozen cedents. The user's job is to filter that set down, scroll it, and mark transactions cleared in batches.

When I picked up the app, the first search on a heavy treaty took three to five minutes. Lambda would time out before the response made it back. The UI was unusable on exactly the workflows it had been built for.

The wrong answers

The obvious moves don't fix this, and it's worth saying why, because each one is what an outside reviewer suggests first.

Scale up Aurora. The query plan was already optimal for the schema; we weren't CPU-bound on a four-core writer. A bigger instance shaves seconds off a query that takes minutes.

Raise the Lambda timeout. The 15-minute ceiling is there, but a user staring at a spinner past five seconds is already gone — correctness isn't the problem, interactivity is.

Add more indexes. The hot path was already covered. Adding indexes to a write-heavy core table on every speculative filter combination would have slowed every clearance write to fix the read.

The bug wasn't the query. The bug was that every paged scroll, every sort flip, every re-filter re-ran the same expensive scan against the live tables. The cost lived in the wrong layer.

Three changes, in order of leverage

1. Cache the search result set in DynamoDB, keyed by the filter signature. Not the underlying rows — the materialised result of this user's search. The first query still hits Aurora, but it lands in DynamoDB with a deterministic key (a hash of the user, the filter clause, and a version token). Every subsequent page, sort, or scroll-restore reads from DDB in single-digit milliseconds. The expensive query happens once per logical search, not once per interaction.

2. Split the core transaction tables for partitioned reads. The workload was already partitionable by treaty and accounting period — we just hadn't told Postgres. Native partitioning on the dimensions the filters actually used let the planner skip whole partitions on most searches. This is what made the first query — the one the cache doesn't help with — finish in seconds instead of minutes.

3. Hybrid sync/async in the Lambda backend. Interactive reads stay on the sync API path: API Gateway → Lambda → DDB cache → response. The heavy mark-cleared workflows, which can touch thousands of rows at a time, fan out through Step Functions and EventBridge. The user's click returns immediately with a job handle; the UI subscribes to status. Nothing on the interactive path waits on a write that doesn't need to be synchronous.

The Lambda payload twist

Once the queries were fast, a different ceiling showed up. A 100,000- row result set, even trimmed to display columns, blows past Lambda's 6 MB synchronous response limit. The cache made the rows cheap to fetch; the response shape was now the bottleneck.

The fix piggybacks on the cache. The first search writes the full result into DDB along with server-side groupings — precomputed page ranges sized to fit under the payload ceiling. The UI pulls chunks on demand. The grid feels like it's streaming because, functionally, it is.

The cache is load-bearing for correctness, not just speed

Two users can sit on the same treaty's open transactions and both start marking cleared. Without coordination, you get double clearances and an angry finance lead. The version token on the cached search snapshot doubles as an optimistic-concurrency check on every write — if the underlying set has shifted since the snapshot, the write fails fast and the user re-runs the search. The DDB cache stopped being “a performance optimisation” once it became the place we read the truth from. It's where the contract lives.

What I'd take from this

It's tempting to read latency wins as database tuning stories. This one isn't. The Aurora work mattered, but the move that bought the 3–5 min → <5 sec jump was deciding that the right boundary between an interactive UI and a serverless backend is a cache layer that owns the search snapshot — not a faster database underneath the same hot path.

The pattern generalises. Anywhere you've got a sync API in front of a query that's expensive to repeat and a UI that wants to scroll, sort, and re-filter against the same logical set, the cache doesn't belong below the database — it belongs above it, in the seam between the interaction and the storage.