Microservice Timeouts

My microservice architecture is timing out

Published: Tuesday, Feb 9, 2021 Last modified: Friday, Jul 11, 2025

I’m still cynical after my earlier poor experience with the microservice pattern. Nonetheless, I see the pattern used more and more.

Microservices are attractive to people familiar with the Unix philosophy:

Do one thing and do it well
Separation of ownership: each service can have clear ownership and boundaries, allowing each service owner to operate and deploy independently of the others
Universal interface between them

However…

Microservices fail often

@startuml
"Purchase" -> "Auth" : Check user
"Auth" --> "Purchase" : 2s
"Purchase" -> "Payment" : Deduct balance
"Payment" --> "Purchase" : 2s
"Purchase" -> "Provision" : Supply
"Provision" --> "Purchase" : 3s
@enduml

APIs without an SLO and a strict response-time budget inevitably see their response times grow. And don’t forget unavoidable network issues!

So in the above case, where the three sequential calls add up to 7s, we can easily time out and return a 504 Gateway Timeout.
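To make the arithmetic concrete, here is a minimal sketch that adds up the sequential latencies from the diagram and tracks how much of the deadline is left after each call. The 5-second gateway timeout and the function names are assumptions for illustration; real gateways have their own configurable defaults.

```python
# Latencies taken from the sequence diagram above, in seconds.
CALL_LATENCIES_S = {"Auth": 2.0, "Payment": 2.0, "Provision": 3.0}

# Assumption: the gateway in front of Purchase gives up after 5 seconds.
GATEWAY_TIMEOUT_S = 5.0


def total_latency(latencies: dict[str, float]) -> float:
    """Sequential calls add up: the caller waits for each one in turn."""
    return sum(latencies.values())


def remaining_budget(latencies: dict[str, float], timeout_s: float) -> dict[str, float]:
    """Budget left after each call returns; negative means the deadline has passed."""
    remaining = timeout_s
    budget = {}
    for service, latency in latencies.items():
        remaining -= latency
        budget[service] = remaining
    return budget


if __name__ == "__main__":
    total = total_latency(CALL_LATENCIES_S)
    print(f"total={total}s timeout={GATEWAY_TIMEOUT_S}s times_out={total > GATEWAY_TIMEOUT_S}")
    print(remaining_budget(CALL_LATENCIES_S, GATEWAY_TIMEOUT_S))
```

With these numbers the budget is already exhausted before Provision replies, so the work may well complete downstream while the caller only ever sees the 504.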

What are the approaches to handle this case?

Introduce a queue to retry
Make the purchase API asynchronous
Monitor the APIs to start strictly enforcing low response times
Introduce caching
Make the API idempotent

Each of these solutions is actually very hard to implement!

Queues sound trivial, but they are not. You need something like AWS SQS with support for dead-letter queues (DLQs)
Asynchronous APIs often require a callback, which in turn requires an addressable endpoint
Monitoring each API via Prometheus / Grafana is non-trivial, and you will go down a rabbit hole when it comes to distributed tracing
Caching is hard
To make an API idempotent, aka “retryable”, you often need to store state so you know where you left off
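As a rough illustration of why queues are not trivial, here is a minimal in-memory sketch of retrying with a dead-letter list. It is a stand-in for the idea only, not the SQS API: `Message`, `drain`, `make_flaky_handler` and the limit of three attempts are all hypothetical names and values chosen for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: give up on a message after three tries


@dataclass
class Message:
    body: str
    attempts: int = 0


def drain(queue: deque, handler: Callable[[str], None],
          max_attempts: int = MAX_ATTEMPTS) -> list:
    """Process every message; requeue failures, dead-letter after max_attempts."""
    dead_letters = []
    while queue:
        msg = queue.popleft()
        msg.attempts += 1
        try:
            handler(msg.body)
        except Exception:
            if msg.attempts >= max_attempts:
                dead_letters.append(msg)  # park it for a human to inspect
            else:
                queue.append(msg)  # retry later, behind the other messages
    return dead_letters


def make_flaky_handler():
    """Hypothetical handler: fails once per message, and always for 'poison'."""
    seen = {}
    processed = []

    def handler(body: str) -> None:
        seen[body] = seen.get(body, 0) + 1
        if body == "poison":
            raise RuntimeError("always fails")
        if seen[body] == 1:
            raise TimeoutError("transient failure")
        processed.append(body)

    return handler, processed


if __name__ == "__main__":
    handler, processed = make_flaky_handler()
    dead = drain(deque([Message("order-1"), Message("poison")]), handler)
    print("processed:", processed)
    print("dead-lettered:", [m.body for m in dead])
```

Even in this toy version the handler runs more than once for the same message, which is exactly why retries and idempotency have to be designed together.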

Let’s imagine the Payment API also has microservice dependencies:

@startuml
"Purchase" -> "Auth" : Check user
"Auth" --> "Purchase" : 2s
"Purchase" -> "Payment" : Take payment
"Payment" -> "Account" : Check balance
"Account" --> "Payment" : 1s
"Payment" -> "Account" : Deduct balance
"Account" --> "Payment" : 1s
"Payment" --> "Purchase" : 2s
"Purchase" -> "Provision" : Supply
"Provision" --> "Purchase" : 3s
@enduml

Distributed systems are really hard

Microservices delegate things outside their domain, and so introduce interdependencies
There isn’t a universal interface. REST/HTTP is great for synchronous calls, but what happens when you need to go asynchronous?

Rebuttal via a YouTube comment

Luis Santos correctly points out the bad practices here:

Using queues without proper monitoring or a dead-letter strategy
Not having proper monitoring
Neglecting performance
Sharing a database between services
Distributed transactions
Incorrect service boundaries (this is the cause of the previous two bad practices)

Regarding the idempotence problem: you don’t need a cache. You just need to take advantage of your database’s optimistic locking mechanisms. You could use something like an upsert, an insert ignore or a conditional insert.

I’m not sure “Sharing a database between services” (read-only) is such a bad practice, since duplicating data across several databases can be far worse.
