Raghunath Manyam
ADR-002 · June 10, 2022 · Closed

Materialize Cash Clearance search results in DynamoDB rather than scale Aurora

Context

The first search on a heavy treaty in Cash Clearance took 3–5 minutes; Lambda timed out before the response made it back. The query plan was already optimal for the schema, the hot path was already indexed, and the writer wasn't CPU-bound. Every outside reviewer said scale Aurora, raise the Lambda timeout, or add more indexes — each would have shaved seconds off a query that took minutes. The real bug was that every paged scroll, every sort flip, every re-filter re-ran the same expensive scan against the live tables. The cost lived in the wrong layer.

Decision

Cache the materialized result set in DynamoDB (DDB), keyed by a hash of user + filter clause + version token. What gets cached is not the underlying rows but the materialized result of this user's search. The first query still hits Aurora; every subsequent page, sort, or scroll-restore reads from DDB in single-digit milliseconds. Partition the core Aurora tables natively by treaty and accounting period so that the first query, the one the cache can't help with, finishes in seconds.
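As a rough illustration of the cache key described above, here is a minimal sketch in Python. The function name `cache_key`, the shape of the filter clause (a plain dict), and the choice of SHA-256 over canonical JSON are all assumptions made for the example; the ADR only states that the key is a hash of user, filter clause, and version token.

```python
import hashlib
import json


def cache_key(user_id: str, filters: dict, version_token: str) -> str:
    """Derive a deterministic key from user + filter clause + version token.

    Hypothetical helper: the hashing scheme is an assumption, not the
    production implementation.
    """
    # Serialize with sorted keys so that two logically identical filter
    # clauses produce the same key regardless of field order.
    canonical = json.dumps(
        {"user": user_id, "filters": filters, "version": version_token},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same user, same filters in a different order, same version: same key.
k1 = cache_key("u-17", {"treaty": "T-100", "period": "2022-05"}, "v3")
k2 = cache_key("u-17", {"period": "2022-05", "treaty": "T-100"}, "v3")

# A new version token yields a new key, so a stale snapshot is never
# served for it.
k3 = cache_key("u-17", {"treaty": "T-100", "period": "2022-05"}, "v4")
```

Including the version token in the key means invalidation needs no delete step in this sketch: once the token changes, old entries are simply never looked up again and can be left to expire.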

Consequences

  • Search latency went from 3–5 minutes to under 5 seconds on 100K+ row queries.
  • The expensive Aurora query now happens once per logical search, not once per interaction — Aurora load dropped by orders of magnitude.
  • The DDB cache became load-bearing for the workflow, not just an optimization — bugs in snapshot semantics have to be fixed in DDB, not in the database.
  • The pattern generalizes to every sync-API-over-expensive-query problem in the portfolio: the cache doesn't belong below the database, it belongs above it, in the seam between the interaction and the storage.
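The "once per logical search, not once per interaction" consequence can be sketched as a read-through cache. This is an illustrative model only: the `SearchCache` class is hypothetical, an in-memory dict stands in for the DynamoDB table, and sorting the cached snapshot in-process is an assumption about how sort flips are served.

```python
from typing import Callable, Dict, List, Optional


class SearchCache:
    """Read-through cache that sits above the database (hypothetical sketch)."""

    def __init__(self, run_expensive_query: Callable[[str], List[dict]],
                 page_size: int = 50) -> None:
        self._run = run_expensive_query
        self._store: Dict[str, List[dict]] = {}  # stand-in for the DDB table
        self._page_size = page_size
        self.expensive_calls = 0  # how many times the database was hit

    def page(self, key: str, page_number: int,
             sort_field: Optional[str] = None,
             descending: bool = False) -> List[dict]:
        rows = self._store.get(key)
        if rows is None:
            # First interaction for this logical search: run the expensive
            # query once and materialize the full result set.
            self.expensive_calls += 1
            rows = self._run(key)
            self._store[key] = rows
        if sort_field is not None:
            # Sort flips are served from the snapshot, not the database.
            rows = sorted(rows, key=lambda r: r[sort_field], reverse=descending)
        start = page_number * self._page_size
        return rows[start:start + self._page_size]


def fake_aurora_query(key: str) -> List[dict]:
    # Placeholder for the slow query against the live tables.
    return [{"id": i, "amount": 100 - i} for i in range(120)]


cache = SearchCache(fake_aurora_query, page_size=50)
first_page = cache.page("search-1", 0)
second_page = cache.page("search-1", 1)
by_amount = cache.page("search-1", 0, sort_field="amount")
# cache.expensive_calls is still 1 after three interactions.
```

A real DynamoDB-backed version would also have to split the result set across items, since a single DynamoDB item is limited to 400 KB; that chunking is omitted here.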

Related