Is Batch Bad?

A batch process should not be deprecated as soon as an API appears!

Published: Wednesday, Mar 3, 2021 Last modified: Saturday, Sep 7, 2024

tl;dr A batch job should NOT be immediately deprecated once an API appears, since the API will probably take quite some time to mature!

We want a real time system! We need a single source of truth!

To share data between distributed systems you have a few choices:

  1. Synchronise the data in a “batch” job (typically daily / overnight), often as a push from the source rather than a pull
  2. Allow some (internal) database sharing via a “database-link” to query
  3. Create a RESTful API for the database, and fetch data from it

Batch

Pros:

Cons:

database-link

Pros:

Cons:

API

Pros:

Cons:

Conclusion

If enterprise-grade reliability is required, I think the resilience that batch offers is hard to beat. Distributed systems suffer from network failures, and with batch you have plenty of time to work around them, if your business process can afford the delay. It is great to have a local copy at hand, so you can operate quickly and without runtime dependencies. Batch jobs should not be deprecated immediately when an API shows up.

Why not both?

The ideal solution during this transition is probably to have the API with a tight SLO / budget, and batch as a fallback. However, it needs to be crystal clear to the client how old the data is, so that they do not get the false impression that a fallback result is “real time”.
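To make that concrete, here is a minimal sketch in Python of an API call with a time budget that falls back to the batch copy. Everything named here is hypothetical: `fetch_live` stands in for whatever API client you have (assumed to raise an `OSError` subclass such as `TimeoutError` or `ConnectionError` on failure), `batch_copy` is the local data loaded by the batch job, and the 0.2 second budget is just a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Mapping


@dataclass(frozen=True)
class Result:
    value: str
    source: str       # "api" or "batch", so the client knows where it came from
    as_of: datetime   # how fresh the value is


def lookup(
    key: str,
    fetch_live: Callable[[str, float], str],  # hypothetical API client: (key, timeout) -> value
    batch_copy: Mapping[str, str],            # local copy produced by the batch job
    batch_loaded_at: datetime,                # when the batch job last ran
    budget_seconds: float = 0.2,              # assumed time budget for the API call
) -> Result:
    try:
        value = fetch_live(key, budget_seconds)
        return Result(value, "api", datetime.now(timezone.utc))
    except OSError:
        # Covers TimeoutError and ConnectionError. The fallback result carries
        # the batch timestamp, never "now". A key missing from the batch copy
        # raises KeyError, which is left for the caller to handle.
        return Result(batch_copy[key], "batch", batch_loaded_at)
```

The point of returning `source` and `as_of` with every result is that a fallback answer can never pass itself off as live: the client always sees the timestamp of the last batch run.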

As the “real time” API matures, it’s conceivable that the batch job can be deprecated in favour of just using the API to cache some results up front, i.e. use the API to batch.
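“Using the API to batch” could look roughly like the sketch below: a scheduled job walks a list of keys and stores each answer locally with the time it was fetched. This is only an illustration under assumptions: `fetch_live` is a hypothetical API client that raises an `OSError` subclass on network failure, and the in-memory dict stands in for whatever local store you would really use.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Tuple


def warm_cache(
    keys: Iterable[str],
    fetch_live: Callable[[str], str],  # hypothetical API client: key -> value
) -> Dict[str, Tuple[str, datetime]]:
    """Fetch each key through the API and remember when it was fetched."""
    cache: Dict[str, Tuple[str, datetime]] = {}
    for key in keys:
        try:
            value = fetch_live(key)
        except OSError:
            # Network trouble for this key: skip it and carry on. There is
            # time to retry before the data is actually needed.
            continue
        cache[key] = (value, datetime.now(timezone.utc))
    return cache
```

Because this runs ahead of time, a failed key is not an outage; the job can simply be run again for whatever is missing.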