Architecture · Decision log

Decisions

The architectural calls I've made on real production systems — closed ones for the audit trail, open ones for the calls I'm still deciding. Each entry: Context, Decision, Consequences, Status. A living document, not a portfolio piece.

ADR-006 · May 15, 2026
Open
Whether to adopt LLM-driven COBOL and PL/I translation as the default mainframe-sunset approach
A wave of US carriers and banks is staring at 2030 mainframe sunset dates with COBOL and PL/I codebases that nobody in the building fully understands. The pattern that worked for me on the rating-engine re-platform — engineer pairs with SME, translates one rule at a time, reconciles cent-for-cent — is 12–18 months of senior-engineer time per program. LLMs can now produce a credible first-draft translation of COBOL or PL/I procedures into modern code in minutes. The open question is whether that compresses the 12–18 month tail into something materially shorter without breaking the human-in-the-loop verification that's the actual load-bearing part of the pattern.
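The cent-for-cent reconciliation step is the part that has to survive whatever produces the first draft. A minimal sketch of such a harness, under stated assumptions: `legacy_premium` and `translated_premium` are hypothetical callables standing in for the mainframe output and the translated rule, the case format is invented for illustration, and half-up rounding is a guess rather than a documented carrier policy.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount):
    """Normalize an amount to two decimal places (rounding mode is an assumption)."""
    return Decimal(str(amount)).quantize(CENT, rounding=ROUND_HALF_UP)


def reconcile(cases, legacy_premium, translated_premium):
    """Return every case where the two implementations disagree by a cent or more.

    Both callables are hypothetical stand-ins: one replays the legacy result,
    the other runs the translated rule (human-written or LLM-drafted).
    """
    mismatches = []
    for case in cases:
        expected = to_cents(legacy_premium(case))
        actual = to_cents(translated_premium(case))
        if expected != actual:
            mismatches.append((case, expected, actual))
    return mismatches
```

An empty list is the only passing state. Any mismatch goes back to the engineer and the SME, regardless of whether a person or a model wrote the draft.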
ADR-005 · Jan 20, 2021
Closed
Split the mainframe migration into two pipelines — DMS for historical, Glue for ongoing
The Enterprise Reinsurance migration needed to move 23 mainframe DB2-LUW tables to Aurora PostgreSQL once (the historical lift) and then keep transforming them continuously into a reporting layer feeding ~50 treaty and finance consumers at cadences from 15 minutes to weekly. The naive plan was to run everything through AWS DMS — same tool for the one-time load and the ongoing replication, fewer moving parts on the slide deck. But DMS's ongoing-replication model would have anchored the reporting layer to DMS forever, and the reporting layer needed transformation logic DMS isn't built for.
ADR-004 · Sep 1, 2009
Closed
Translate rating rules side-by-side with actuaries instead of waiting for a consolidated spec
A US personal-lines carrier had committed to retiring a PL/I + IMS-DB rating engine, but no one could produce a complete list of the rules it executed. They lived as 100+ hardcoded procedures over IMS-DB lookups, authored over two decades by actuaries who had since rotated out. The obvious move — request a spec, wait for one, escalate when it doesn't arrive — would have killed the project on the same rock every mainframe re-platform dies on. The rules system and execution system were fused at runtime, which is the actual reason re-platforms fail.
ADR-003 · Aug 22, 2022
Closed
Use the cached snapshot's version token as the optimistic-concurrency contract for writes
After the DynamoDB cache cut Cash Clearance search latency from 3–5 minutes to under 5 seconds, a second problem surfaced: two users could sit on the same treaty's open transactions and both start marking cleared. Without coordination, duplicate clearances would go through — finance lead angry, audit problem, real customer money out the door twice. The conservative answer was a database-level lock or strict serialization on Aurora; both legible, both safe, both wrong for the latency profile we'd just bought.
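The version-token contract can be sketched in a few lines. This is an in-memory illustration, not the production code: `TreatyStore`, `mark_cleared`, and `StaleSnapshotError` are hypothetical names, and a real store would have to perform the compare and the write as one atomic conditional operation rather than two Python statements.

```python
class StaleSnapshotError(Exception):
    """Raised when a write is based on a snapshot that is no longer current."""


class TreatyStore:
    """In-memory stand-in for the backing store (hypothetical, single-threaded)."""

    def __init__(self):
        self._version = {}  # treaty_id -> integer version token
        self._cleared = {}  # treaty_id -> set of cleared transaction ids

    def current_version(self, treaty_id):
        return self._version.get(treaty_id, 0)

    def mark_cleared(self, treaty_id, txn_ids, seen_version):
        """Apply the write only if the caller saw the current version."""
        if seen_version != self.current_version(treaty_id):
            raise StaleSnapshotError(
                f"treaty {treaty_id}: snapshot version {seen_version} is stale"
            )
        self._cleared.setdefault(treaty_id, set()).update(txn_ids)
        # Bump the token so any other writer holding the old snapshot is rejected.
        self._version[treaty_id] = seen_version + 1
        return self._version[treaty_id]
```

Two users who loaded the same snapshot both submit with the same token. The first write wins and bumps the version; the second is rejected and has to refresh before retrying, so no lock is held while either user is reading.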
ADR-002 · Jun 10, 2022
Closed
Materialize Cash Clearance search results in DynamoDB rather than scale Aurora
The first search on a heavy treaty in Cash Clearance took 3–5 minutes; Lambda timed out before the response made it back. The query plan was already optimal for the schema, the hot path was already indexed, and the writer wasn't CPU-bound. Every outside reviewer said scale Aurora, raise the Lambda timeout, or add more indexes — each would have shaved seconds off a query that took minutes. The real bug was that every paged scroll, every sort flip, every re-filter re-ran the same expensive scan against the live tables. The cost lived in the wrong layer.
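The shape of the fix, moving the cost to the right layer, can be sketched as follows. This is an assumption-laden illustration: a plain dict stands in for DynamoDB, `run_scan` is a hypothetical callable representing the expensive Aurora query, and `SnapshotCache` is an invented name.

```python
class SnapshotCache:
    """Run the expensive scan once per treaty, then serve every view from the snapshot."""

    def __init__(self, run_scan):
        self._run_scan = run_scan  # hypothetical: the slow query against live tables
        self._snapshots = {}       # treaty_id -> {"version": int, "rows": list}
        self.scan_count = 0        # exposed so the effect is observable

    def get_snapshot(self, treaty_id):
        if treaty_id not in self._snapshots:
            self.scan_count += 1
            rows = list(self._run_scan(treaty_id))
            self._snapshots[treaty_id] = {"version": 1, "rows": rows}
        return self._snapshots[treaty_id]

    def page(self, treaty_id, sort_key, offset=0, limit=50,
             descending=False, predicate=None):
        """Filter, sort, and slice in the cache layer; never touch the live tables."""
        snapshot = self.get_snapshot(treaty_id)
        rows = snapshot["rows"]
        if predicate is not None:
            rows = [row for row in rows if predicate(row)]
        rows = sorted(rows, key=lambda row: row[sort_key], reverse=descending)
        return snapshot["version"], rows[offset:offset + limit]
```

Paged scrolls, sort flips, and re-filters all go through `page`, so the scan runs once per treaty instead of once per interaction. Invalidation and expiry are left out of the sketch.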
ADR-001 · Mar 15, 2021
Closed
Use 3 metadata-driven Glue jobs instead of ~69 per-table jobs
The Enterprise Reinsurance migration had to move 23 mainframe DB2-LUW tables to Aurora PostgreSQL and stand up a reporting layer feeding ~50 treaty and finance consumers. The first sketch, the design every senior engineer would have drawn, was one Glue job per table, three stages each, for ~69 jobs total (23 × 3). Familiar shape, easy to assign, easy to defend. I spent close to two weeks staffing and sketching toward it before I noticed I'd said the sentence "yeah, it'll be a lot to operate, but that's the cost" four times in one week. The tables didn't differ in any way that mattered to the pipeline, and the transformations were parameterizable.
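A minimal sketch of what "metadata-driven" means here, assuming the per-table differences reduce to a row of configuration. `TableSpec`, the stage names, and the rename-only transform are hypothetical; no Glue API is used, only plain Python to show the shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableSpec:
    """Hypothetical metadata row: everything that differs between tables."""
    source: str
    target: str
    key_columns: tuple[str, ...]
    renames: dict[str, str] = field(default_factory=dict)


# Assumed stage names; the source only says there were three stages.
STAGES = ("extract", "transform", "load")


def plan_runs(specs):
    """Three generic job definitions, each invoked once per table spec."""
    return [(stage, spec.source) for stage in STAGES for spec in specs]


def transform_row(spec, row):
    """One parameterized transformation: apply the spec's column renames."""
    return {spec.renames.get(column, column): value for column, value in row.items()}
```

With 23 specs, `plan_runs` still yields 69 runs, but there are only three job definitions to deploy, monitor, and fix. Adding a table becomes a metadata change rather than three new jobs.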
