Musings
Long-form journal articles on the technical decisions behind the work, plus the behavioral stories behind them. Posted when there's something worth saying.
Journal
Long-form
Every LLM tutorial assumes you can send the data to the API. In regulated reinsurance that assumption breaks before the first prompt: most of the architecture lives in the boundary, not the model.
- Vector DB vs. structured store vs. live API: where each piece of a serverless RAG actually belongs (May 27, 2026)
Most serverless RAG architectures fail not at the model but at the data layer — by trying to make the vector DB carry weight it was never designed for. The interesting decision is which piece of context belongs where, and the question that decides it is freshness.
Turning mainframe COBOL into AWS-native code with an LLM is easy to demo and hard to make repeatable. What makes it a factory rather than a party trick comes down to the pipeline shape, the per-stage model routing, and the one checkpoint that sits before the irreversible work.
Building /chat on my own site forced three decisions I usually get to defer on someone else's product: where the cost ceiling lives, what counts as a citation, and how to keep an embedding pipeline from breaking a deploy.
- Mainframe modernization without a re-platform freeze: a pattern from five years of rating-engine work (May 15, 2026)
Every mainframe modernization plan dies on the same rock: the business can't tolerate a write freeze, and the engineers can't tolerate live dual-writes against a system they don't fully understand. The way out isn't faster cutover. It's separating the rules from the engine.
The query was correct. The database was sized. The bug was that the cost lived in the wrong layer — and the fix was a cache that became load-bearing for correctness, not just speed.