Uvicorn failing to reply to healthchecks on K8s #2077

nhabbash · 2023-08-21T08:40:30Z

nhabbash
Aug 21, 2023

We have a FastAPI microservice that has an async endpoint that does some IO-bound work and some CPU-bound work. The requests it receives are of variable size, meaning that some might require more time/resources to send back a response, while other less. Previously we were running it using Gunicorn and 16 workers, and didn't have an issue with availability - the pods were always able to reply to the healthchecks as there was almost always a worker available to reply. The setup had some issues, such as some workers starving others, harder monitoring between the processes, so recently we have been trying to move to vanilla Uvicorn to reduce complexity and rely just on K8s to scale up and down.
We have found that with this change, the service now has issues replying in time to K8s liveness probe, meaning that sometime, if one of these requests takes too much, the pod gets killed. I'm trying to find a way to have the worker reply in time to the healthcheck that's coming, these are the changes that I tried:

Move all CPU-bound operations to a threadpool to avoid blocking the main thread and allow it to process incoming requests. A processpool might be better but we had some issues with serialization, a threadpool should not increase speed but ideally it should increase availability by unblocking the main thread.
Add a sync and an async healthchecks. The rationale here was that sync endpoints are run in a separate threadpool and thus wouldn't be blocked or slowed by what the main thread is doing. Seems like I was wrong on this and sync healthchecks still take a lot of time to get a response here.
Set a concurrency-limit and increase the backlog, both of these changes were mostly inconsequential as we never reach the limit for either of these. Decreasing it further would just mean that the caller would have to wait a bit more before sending the request as it would get 503 erros.

What I've done to test the availability of the service is running Apache Benchmark locally on the aforementioned endpoint (100 requests, 25 concurrency), testing at the same time AB on the healthcheck endpoint (1000 requests, 25 concurrency) and see how much it takes, but I've gotten mixed results - the main test takes roughly 500 seconds to complete, and I get slightly different results depending on when I launch the healthcheck load test.

Currently my hypothesis is that it has something to do with the request backlog - the healthchecks get queued up, but are waiting to be processed until the worker wraps up some of the requests it's currently processing. I thought that, by having the main thread virtually unblocked these requests would all start processing as soon as they come instead of waiting, but it seems like it's not that way.
Am I fundamentally misunderstanding something?

How can I increase availability of this service? What I have left to try that doesn't require full-refactoring is moving the endpoint to be sync, which I'm currently testing.

gi0baro · 2023-08-26T12:15:36Z

gi0baro
Aug 26, 2023

My 2 cents on this:

while it is true that everything you run in a threadpool won't run on the main thread and thus, as you said, won't be blocked by what the main thread is doing, it is also true that if the main thread holds a GIL lock, whatever you do in the threadpool still need for the GIL to be released before proceeding. A processpool shouldn't suffer the same issue, but as you said makes serialization harder
IMHO the fact kubernetes is killing your pod because it doesn't reply to lifechecks is correct. Even if you are processing a single request, failing to reply to the liveness probe also mean you can't reply to any request at all, and thus, are you actually available as pod?

Given this, what I would to is to increase the timeout for liveness probes in kubernetes, matching the maximum amount of time you expect requests might take, and thus the maximum theoretical amount of time new requests may stay "in queue".

I'm pretty sure Uvicorn itself is not doing anything wrong here :)

2 replies

gi0baro Aug 26, 2023

Also, a clarification: even if you process requests in a separate thread pool in your application, all the pre and post handling made by uvicorn still runs in the main thread.

dulacp Apr 18, 2024

@nhabbash, are you using the multiprocessing library for your CPU-bound work? I'm asking because I faced a similar issue and was able to fix it in the end. Now the health probes are responding 100% of the time.

There is a sneaky leaky-abstraction to be aware: uvicorn might load your api in a child spawn process either when using the --reload flag or when specifying more than 1 worker.

And in Kubernetes with a single worker, the multiprocessing library won't be configured by uvicorn to use "spawn" processes instead of "forks".

Do you think it could be related to your issue?

anton-daneyko-ultramarin · 2025-02-04T13:05:36Z

anton-daneyko-ultramarin
Feb 4, 2025

I had a similar issue where the service was too busy to reply to HTTP based health checks. This was affecting my readiness probe. I switched from HTTP based to command based readiness probe that does not use the endpoints in my FastAPI application at all. It helped. My health checks are trivial, though.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uvicorn failing to reply to healthchecks on K8s #2077

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Uvicorn failing to reply to healthchecks on K8s #2077

nhabbash Aug 21, 2023

Replies: 2 comments · 2 replies

gi0baro Aug 26, 2023

gi0baro Aug 26, 2023

dulacp Apr 18, 2024

anton-daneyko-ultramarin Feb 4, 2025

nhabbash
Aug 21, 2023

Replies: 2 comments 2 replies

gi0baro
Aug 26, 2023

anton-daneyko-ultramarin
Feb 4, 2025