Microservice Timeouts
My microservice architecture is timing out
Published: Tuesday, Feb 9, 2021
Last modified: Saturday, Sep 7, 2024
I’m still cynical after my earlier poor experience with the microservice pattern; nonetheless, I see it more and more often.
Microservices are attractive to people familiar with the Unix philosophy:
- Do one thing and do it well
- Separation of ownership: each service can have clear ownership and boundaries, allowing each service owner to operate and deploy independently of the others
- A universal interface between them
However…
Microservices fail often
@startuml
"Purchase" -> "Auth" : Check user
"Auth" --> "Purchase" : 2s
"Purchase" -> "Payment" : Deduct balance
"Payment" --> "Purchase" : 2s
"Purchase" -> "Provision" : Supply
"Provision" --> "Purchase" : 3s
@enduml
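APIs without an SLO and a strict response-time budget inevitably see their response times grow. And don’t forget unavoidable network issues!
The calls in the diagram above run one after another, so a purchase already takes 2 + 2 + 3 = 7 seconds. Any slowdown in one dependency can easily push the total past the gateway’s timeout, resulting in a 504 Gateway Timeout.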
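Here is a minimal Python sketch of a response-time budget for the purchase flow, using the latencies from the diagram. The `Budget` class, the `BudgetExceeded` exception and the budget values are hypothetical. A real service would pass the remaining time as the timeout on each downstream call (similar in spirit to gRPC deadline propagation) instead of simulating latencies.

```python
from dataclasses import dataclass

# Sequential calls and their latencies (seconds), taken from the diagram above.
CALLS = [("Auth", 2.0), ("Payment", 2.0), ("Provision", 3.0)]


class BudgetExceeded(Exception):
    """Raised when a call would run past the remaining response-time budget."""


@dataclass
class Budget:
    remaining: float  # seconds left before the gateway gives up

    def spend(self, service: str, latency: float) -> None:
        # A real client would use self.remaining as the call's timeout;
        # here a simulated call that needs more time than is left "times out".
        if latency > self.remaining:
            raise BudgetExceeded(
                f"{service} needs {latency}s but only {self.remaining}s remain"
            )
        self.remaining -= latency


def purchase(budget_seconds: float) -> float:
    """Run the calls in order and return the total time spent."""
    budget = Budget(budget_seconds)
    for service, latency in CALLS:
        budget.spend(service, latency)
    return budget_seconds - budget.remaining
```

With a 10-second budget, `purchase(10.0)` returns `7.0`. With 5 seconds, it raises `BudgetExceeded` at the Provision call, which is the same situation a gateway reports as a 504.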
What are the approaches to handling this?
- Introduce a queue to retry
- Make the purchase API asynchronous
- Monitor the APIs so you can start strictly enforcing low response times
- Introduce caching
- Make the API idempotent
Each of these solutions is actually very hard to implement!
- Queues sound trivial, but they are not. You need something like AWS SQS with support for dead-letter queues (DLQs)
- Asynchronous APIs often require a callback, which requires an addressable endpoint
- Monitoring each API via Prometheus/Grafana is non-trivial, and you will go down a rabbit hole when it comes to distributed tracing
- Caching is hard
- To make an API idempotent, aka “retry-able”, you often need to store state so you know where you left off
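Below is a minimal sketch of “storing state so you know where you left off”. It assumes a hypothetical `store` that maps each request ID to the set of steps already completed. A plain dict stands in for what would have to be a durable database in practice.

```python
from typing import Callable, Dict, List, Set

STEPS = ["auth", "payment", "provision"]


def run_purchase(
    store: Dict[str, Set[str]],
    request_id: str,
    actions: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run each step at most once per request_id; return the steps run this attempt."""
    done = store.setdefault(request_id, set())
    executed = []
    for step in STEPS:
        if step in done:
            continue  # finished on an earlier attempt, e.g. don't charge twice
        actions[step]()  # may raise, leaving the remaining steps for a retry
        # A crash right here would still re-run this step on retry,
        # so the steps themselves ideally need to be idempotent too.
        done.add(step)
        executed.append(step)
    return executed
```

If Provision times out, a retry with the same request ID skips Auth and Payment and resumes at Provision.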
Let’s imagine the Payment API also has microservice dependencies:
@startuml
"Purchase" -> "Auth" : Check user
"Auth" --> "Purchase" : 2s
"Purchase" -> "Payment" : Take payment
"Payment" -> "Account" : Check balance
"Account" --> "Payment" : 1s
"Payment" -> "Account" : Deduct balance
"Account" --> "Payment" : 1s
"Payment" --> "Purchase" : 2s
"Purchase" -> "Provision" : Supply
"Provision" --> "Purchase" : 3s
@enduml
Distributed systems are really hard
- Microservices delegate work outside their own domain and introduce interdependencies
- There isn’t a universal interface. REST/HTTP is great for synchronous calls, but what happens when you need to go asynchronous?
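A back-of-the-envelope calculation shows why the extra dependencies hurt. It assumes every call fails independently and uses a hypothetical 99% success rate per call. The second diagram has five calls (Auth, Payment, two Account calls and Provision), and the purchase only succeeds if all of them do:

$$
P(\text{purchase succeeds}) = \prod_{i=1}^{5} p_i = 0.99^{5} \approx 0.951
$$

So roughly 1 purchase in 20 fails, even though each individual call looks 99% reliable.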
Rebuttal via a YouTube comment
Luis Santos correctly points out the bad practices here:
- Using queues without proper monitoring or a dead-letter strategy
- Not having proper monitoring
- Neglecting performance
- Sharing a database between services
- Distributed transactions
- Incorrect service boundaries (this is the cause of the previous two bad practices)
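A dead-letter strategy can be sketched in a few lines. This is a minimal in-memory model, not the SQS API: `drain`, `MAX_ATTEMPTS` and the message shape are hypothetical. With SQS, the equivalent is a redrive policy whose `maxReceiveCount` decides when a message moves to the DLQ.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical; plays the role of SQS's maxReceiveCount


def drain(queue: deque, handler: Callable[[dict], None], dlq: list) -> list:
    """Process every message; retry failures, dead-letter them after MAX_ATTEMPTS."""
    processed = []
    while queue:
        msg = queue.popleft()
        msg["attempts"] = msg.get("attempts", 0) + 1
        try:
            handler(msg)
            processed.append(msg["id"])
        except Exception:
            if msg["attempts"] >= MAX_ATTEMPTS:
                dlq.append(msg)  # park it where monitoring can see it
            else:
                queue.append(msg)  # put it back for another attempt
    return processed
```

A message that always fails ends up in `dlq` after three attempts, rather than being retried forever or silently dropped. That DLQ is exactly what the monitoring needs to watch.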
Regarding the idempotence problem: you don’t need a cache. You just need to take advantage of your database’s optimistic locking mechanisms. You could use something like an upsert, an insert ignore or a conditional insert.
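The conditional-insert approach can be sketched with Python’s built-in `sqlite3` module. SQLite spells MySQL’s `INSERT IGNORE` as `INSERT OR IGNORE`. The `payments` table and the `record_payment` function are hypothetical. The primary key on the client-supplied `request_id` is what turns a retry into a no-op.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The uniqueness of request_id is what makes retries safe.
    conn.execute(
        "CREATE TABLE payments (request_id TEXT PRIMARY KEY, user_id TEXT, amount INTEGER)"
    )
    return conn


def record_payment(conn: sqlite3.Connection, request_id: str, user_id: str, amount: int) -> bool:
    """Record a payment at most once per request_id; True only the first time."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO payments (request_id, user_id, amount) VALUES (?, ?, ?)",
            (request_id, user_id, amount),
        )
        return cur.rowcount == 1  # 0 when the row already existed and was ignored
```

A retried request with the same `request_id` returns `False` and leaves exactly one row in `payments`.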
I’m not sure “Sharing a database between services” (read-only) is such a bad practice, since duplicating data across several databases can be far worse.