Raghunath Manyam
ADR-001 · March 15, 2021 · Closed

Use 3 metadata-driven Glue jobs instead of ~69 per-table jobs

Context

The Enterprise Reinsurance migration had to move 23 mainframe DB2-LUW tables to Aurora PostgreSQL and stand up a reporting layer feeding ~50 treaty and finance consumers. The first sketch, the design every senior engineer would have drawn, was one Glue job per table with three stages each, for ~69 jobs total. Familiar shape, easy to assign, easy to defend. I spent close to two weeks staffing and sketching toward it before I noticed I'd said "yeah, it'll be a lot to operate, but that's the cost" four times in one week. When I looked again, the tables didn't differ in any way that mattered to the pipeline, and the transformations were parameterizable.

Decision

Build the ETL as 3 generalized Glue jobs (extract, transform, load) driven by a metadata table that names each source, its per-source transformation parameters, and its consumer cadence. One pattern, every source. I scrapped the per-table sketch and rebuilt the design around the metadata vocabulary.
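To make "metadata-driven" concrete, here is a minimal sketch of the transform stage in plain Python rather than PySpark, so it runs anywhere. Everything in it is hypothetical: the `SourceConfig` fields, the three operations in `TRANSFORMS`, and the table and column names are illustrations, not the project's actual metadata schema, and lists of dicts stand in for the Spark DataFrames a Glue job would process. The point is the shape: one generic `run_transform` function, with all per-source variation living in data.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable

Row = dict[str, Any]


# Hypothetical metadata row: one per source table (not the project's real schema).
@dataclass(frozen=True)
class SourceConfig:
    source_table: str
    target_table: str
    cadence: str                                        # consumer cadence, e.g. "daily"
    transforms: tuple[tuple[str, dict[str, Any]], ...]  # ordered (operation, params)


def rename(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Rename columns; unmapped columns keep their names."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def trim(rows: list[Row], columns: list[str]) -> list[Row]:
    """Strip blank padding from the named string columns."""
    return [
        {k: v.strip() if k in columns and isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def to_decimal(rows: list[Row], columns: list[str]) -> list[Row]:
    """Convert the named columns to Decimal (e.g. money amounts); None stays None."""
    return [
        {k: Decimal(str(v)) if k in columns and v is not None else v for k, v in row.items()}
        for row in rows
    ]


# The metadata "vocabulary": every operation a source row is allowed to name.
TRANSFORMS: dict[str, Callable[..., list[Row]]] = {
    "rename": rename,
    "trim": trim,
    "to_decimal": to_decimal,
}

# Hypothetical entries; the real table held one row per migrated source.
METADATA = [
    SourceConfig(
        source_table="TREATY_MASTER",
        target_table="treaty",
        cadence="daily",
        transforms=(
            ("trim", {"columns": ["TREATY_ID", "CEDANT_NM"]}),
            ("rename", {"mapping": {"TREATY_ID": "treaty_id", "CEDANT_NM": "cedant_name"}}),
        ),
    ),
    SourceConfig(
        source_table="PREMIUM_LEDGER",
        target_table="premium",
        cadence="monthly",
        transforms=(
            ("to_decimal", {"columns": ["PREM_AMT"]}),
            ("rename", {"mapping": {"PREM_AMT": "premium_amount"}}),
        ),
    ),
]


def run_transform(config: SourceConfig, rows: list[Row]) -> list[Row]:
    """The single generic transform job: apply the operations the metadata row names, in order."""
    for operation, params in config.transforms:
        rows = TRANSFORMS[operation](rows, **params)
    return rows


# Usage: the same function serves every source; only the metadata row differs.
by_source = {config.source_table: config for config in METADATA}
raw = [{"TREATY_ID": "T-001   ", "CEDANT_NM": " Acme Re "}]
print(run_transform(by_source["TREATY_MASTER"], raw))
# [{'treaty_id': 'T-001', 'cedant_name': 'Acme Re'}]
```

Onboarding another source in this sketch means appending a `SourceConfig` entry; `run_transform` itself never changes, which is the property the decision trades on.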

Consequences

  • 23 tables migrated and continuously transformed through 3 jobs: fewer surfaces to test, version, and run on-call against, and a transformation bug gets fixed once instead of 23 times.
  • The metadata table became a first-class operational artifact — adding a new source means extending metadata rows, not writing new jobs.
  • An unusual source whose transformation can't be expressed in the existing metadata vocabulary forces an extension to the schema rather than a drop-in new job. That is a cost the per-table design wouldn't have charged.
  • The two-week tax I paid defending the per-table design before changing my mind became the tripwire lesson: if you're justifying the cost of your own design more than once, you're not designing — you're defending.
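The second and third consequences pull against each other, and a load-time check makes the boundary between them visible. The sketch below is again hypothetical Python with invented names (`SUPPORTED_OPERATIONS`, `register_source`, `UnsupportedTransformError`, and the sample source tables). It treats onboarding a source as a validated row append and rejects any row that names an operation outside the current vocabulary, so an unusual source surfaces as a schema-extension task instead of quietly turning into a one-off job.

```python
from __future__ import annotations

from typing import Any

MetadataRow = dict[str, Any]

# Hypothetical vocabulary: operations the generic transform job can execute today.
SUPPORTED_OPERATIONS = frozenset({"rename", "trim", "to_decimal"})


class UnsupportedTransformError(ValueError):
    """A metadata row names an operation the generic job cannot run yet."""


def validate_row(row: MetadataRow) -> MetadataRow:
    """Reject rows whose transforms fall outside the supported vocabulary."""
    unknown = [op for op, _params in row["transforms"] if op not in SUPPORTED_OPERATIONS]
    if unknown:
        raise UnsupportedTransformError(
            f"{row['source_table']}: unsupported operation(s) {unknown}; "
            "extend the metadata vocabulary before onboarding this source"
        )
    return row


def register_source(metadata: list[MetadataRow], row: MetadataRow) -> list[MetadataRow]:
    """Onboard a source by appending a validated row; no new job is written."""
    if any(existing["source_table"] == row["source_table"] for existing in metadata):
        raise ValueError(f"{row['source_table']} is already registered")
    return [*metadata, validate_row(row)]


# Usage with hypothetical sources.
metadata: list[MetadataRow] = []

# An ordinary source fits the existing vocabulary: one new row, no new code.
metadata = register_source(metadata, {
    "source_table": "CLAIM_HISTORY",
    "cadence": "weekly",
    "transforms": [("trim", {"columns": ["CLAIM_ID"]})],
})

# An unusual source needs an operation the vocabulary doesn't have yet.
try:
    register_source(metadata, {
        "source_table": "BORDEREAU_MONTHLY",
        "cadence": "monthly",
        "transforms": [("unpivot", {"columns": ["M01", "M02"]})],
    })
except UnsupportedTransformError as err:
    print(err)
```

Failing loudly at registration keeps the cost from the third bullet explicit: someone has to add `unpivot` to the vocabulary (and to the generic job) before that source can flow through the same three jobs as everything else.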

Related