Uvicorn failing to reply to healthchecks on K8s #2077
-
My 2 cents on this:
Given this, what I would do is increase the timeout for liveness probes in Kubernetes, matching the maximum amount of time you expect requests might take, and thus the maximum theoretical amount of time new requests may stay "in queue". I'm pretty sure Uvicorn itself is not doing anything wrong here :)
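For illustration, a minimal sketch of a liveness probe tuned this way (the path, port and numbers are placeholders, not values from this thread):

```yaml
livenessProbe:
  httpGet:
    path: /health        # hypothetical healthcheck endpoint
    port: 8000
  periodSeconds: 15      # how often the kubelet probes
  timeoutSeconds: 10     # how long a single probe may take before it counts as failed
  failureThreshold: 5    # consecutive failures tolerated before the container is restarted
```

Roughly speaking, `failureThreshold * periodSeconds` is how long the pod can stay unresponsive before it gets restarted, so it should comfortably exceed your slowest legitimate request.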
-
I had a similar issue where the service was too busy to reply to HTTP-based health checks. This was affecting my readiness probe. I switched from an HTTP-based to a command-based readiness probe that does not use the endpoints in my FastAPI application at all. It helped. My health checks are trivial, though.
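For reference, a command-based probe of the kind described here might look roughly like this (the command is a hypothetical trivial check, not the one actually used):

```yaml
readinessProbe:
  exec:
    command: ["sh", "-c", "test -f /tmp/ready"]   # hypothetical trivial check, run inside the container
  periodSeconds: 10
  timeoutSeconds: 5
```

Because the check never goes through the HTTP server, a saturated event loop can't delay it.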
-
We have a FastAPI microservice with an async endpoint that does some IO-bound work and some CPU-bound work. The requests it receives are of variable size, meaning that some require more time/resources to produce a response, while others require less. Previously we ran it with Gunicorn and 16 workers and didn't have an issue with availability: the pods were always able to reply to the healthchecks, as there was almost always a worker free to respond. That setup had its own problems, such as some workers starving others and harder monitoring across processes, so recently we have been trying to move to vanilla Uvicorn to reduce complexity and rely on K8s alone to scale up and down.
We have found that with this change the service now has trouble replying in time to the K8s liveness probe, meaning that sometimes, if one of these requests takes too long, the pod gets killed. I'm trying to find a way to have the worker reply in time to the incoming healthcheck. These are the changes I tried:
Decrease the `concurrency-limit` and increase the `backlog` (a rough sketch of these two settings is included below). Both of these changes were mostly inconsequential, as we never reach the limit for either of them; decreasing the concurrency limit further would just mean that callers have to wait a bit longer before their request is accepted, since they would get 503 errors in the meantime.

What I've done to test the availability of the service is to run Apache Benchmark locally against the aforementioned endpoint (100 requests, 25 concurrency) while also running AB against the healthcheck endpoint (1000 requests, 25 concurrency) and seeing how long it takes. I've gotten mixed results: the main test takes roughly 500 seconds to complete, and I get slightly different results depending on when I launch the healthcheck load test.
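For context, this is roughly what the two settings above look like via Uvicorn's Python API (the CLI equivalents are `--limit-concurrency` and `--backlog`; the import string and numbers are placeholders, not our real values):

```python
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",          # placeholder import string for the FastAPI app
        host="0.0.0.0",
        port=8000,
        limit_concurrency=64,    # beyond this many concurrent connections/tasks, new requests get a 503
        backlog=2048,            # size of the socket accept queue
    )
```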
Currently my hypothesis is that it has something to do with the request backlog: the healthchecks get queued up, but they wait to be processed until the worker wraps up some of the requests it's currently handling. I thought that, with the main thread virtually unblocked, these requests would all start processing as soon as they arrive instead of waiting, but it seems that's not the case.
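To illustrate what I mean (the names here are made up, not our real code): if the CPU-bound part runs inline in the `async def` handler, it blocks the single event-loop thread, and everything queued behind it, healthchecks included, has to wait; offloading it keeps the loop free to answer the probe.

```python
import anyio
from fastapi import FastAPI

app = FastAPI()

def heavy_cpu_work(payload: dict) -> dict:
    # stand-in for the CPU-bound part of the real endpoint
    return {"items": len(payload)}

@app.post("/process")
async def process(payload: dict):
    # Calling heavy_cpu_work(payload) inline here would block the event loop,
    # so every queued request (including /health) would wait for it to finish.
    # Pushing it to a worker thread keeps the loop free to accept connections
    # and answer the liveness probe:
    result = await anyio.to_thread.run_sync(heavy_cpu_work, payload)
    return result

@app.get("/health")
async def health():
    return {"status": "ok"}
```

(For pure-Python CPU work a thread still competes for the GIL, so a process pool may be the more honest fix, but at least the loop isn't held for the entire duration of the request.)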
Am I fundamentally misunderstanding something?
How can I increase the availability of this service? What I have left to try that doesn't require a full refactor is making the endpoint sync, which I'm currently testing.
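For completeness, the sync variant I'm testing looks roughly like this (hypothetical names again); FastAPI runs plain `def` endpoints in a worker threadpool, so the event loop itself isn't occupied while the request is being processed:

```python
from fastapi import FastAPI

app = FastAPI()

def heavy_cpu_work(n: int) -> int:
    # stand-in for the real CPU-bound workload
    return sum(i * i for i in range(n))

@app.post("/process-sync")
def process_sync(n: int = 1_000_000):
    # A non-async handler is executed in Starlette's threadpool,
    # not on the event loop thread.
    return {"result": heavy_cpu_work(n)}
```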