Requests distribution #2467
-
Hey 👋 Recently we noticed that the request distribution between different processes on a single machine is very uneven. We use `supervisor` to run several uvicorn processes; here is an example of the `supervisor` config:
and here is a simple app for testing:

```python
import asyncio
import os

from fastapi import FastAPI

app = FastAPI()

# Global counter and lock
request_count = 0
request_count_lock = asyncio.Lock()


# Safely increment the request count
async def increment_request_count():
    global request_count
    async with request_count_lock:
        request_count += 1


@app.get("/")
async def read_root():
    await increment_request_count()
    return {"pid": os.getpid(), "count": request_count}
```

and a simple script to test it:

```python
import asyncio

import httpx
from httpx import Limits


async def call(client):
    response = await client.get('http://127.0.0.1:8000')
    print(response.text)
    return response.json()


async def main():
    results = {}
    async with httpx.AsyncClient(limits=Limits(max_connections=10, max_keepalive_connections=10)) as client:
        for result in (await asyncio.gather(*[call(client) for _ in range(1000)])):
            results[result['pid']] = result['count']
    print(results)


if __name__ == '__main__':
    asyncio.run(main())
```

This setup results in poor request distribution. I tried using uvicorn as a process manager, but the result is the same. Previously, there was a similar discussion, but about
Sorry for the long intro 😅 Here is my question: If it's not possible to do that using just Thank you!!
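For context, a minimal sketch of what "using uvicorn as a process manager" can look like; the module path `main:app` and the worker count are assumptions, not the poster's actual setup:

```python
# Illustrative only: let uvicorn itself spawn and supervise the worker
# processes instead of running one single-worker uvicorn per supervisor entry.
import uvicorn

if __name__ == "__main__":
    # With workers > 1 the app must be passed as an import string.
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)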
-
Hi @SlavaSkvortsov, right... I don't know where the issue lies. We do set it (Line 512 in a507532). Any ideas @graingert @abersheeran? My recommendation would be to rely on an external load balancer, e.g. nginx.
-
Please make sure you use the new multi-process manager and increase the concurrency (the test script you gave establishes at most 10 connections, so the result is largely down to chance).
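To illustrate that suggestion, the test script from the original post could be adjusted roughly like this; the connection limit and request count below are arbitrary, the point is simply to allow far more than 10 concurrent connections:

```python
import asyncio
from collections import Counter

import httpx
from httpx import Limits


async def call(client: httpx.AsyncClient) -> dict:
    response = await client.get("http://127.0.0.1:8000")
    return response.json()


async def main() -> None:
    # Allow many concurrent connections so requests actually hit the
    # listening socket in parallel rather than trickling in over 10 sockets.
    limits = Limits(max_connections=200, max_keepalive_connections=200)
    async with httpx.AsyncClient(limits=limits) as client:
        results = await asyncio.gather(*[call(client) for _ in range(5000)])
    # Count how many responses each worker PID produced.
    print(Counter(result["pid"] for result in results))


if __name__ == "__main__":
    asyncio.run(main())
```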
-
We have the same problem with gunicorn. Our application is structured as an asynchronous gunicorn worker that processes most requests in a synchronous thread (there is only one such thread). Perhaps if you reproduce this load profile, the balancing problem will become visible.
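A rough sketch of how that load profile might be reproduced, assuming the blocking work is funnelled through a single-thread executor; the endpoint and the 100 ms of simulated work are made up for illustration:

```python
import asyncio
import os
import time
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

# A single worker thread, mimicking "most requests are processed
# in one synchronous thread".
blocking_executor = ThreadPoolExecutor(max_workers=1)


def blocking_work() -> None:
    # Stand-in for the synchronous part of the real application.
    time.sleep(0.1)


@app.get("/")
async def read_root() -> dict:
    loop = asyncio.get_running_loop()
    # The async endpoint awaits the single-threaded executor, so requests
    # queue behind each other inside one worker process.
    await loop.run_in_executor(blocking_executor, blocking_work)
    return {"pid": os.getpid()}
```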
-
@SlavaSkvortsov does https://github.com/encode/uvicorn/pull/2472/files fix it for you?
-
Hi, I stumbled upon this discussion as I'm also experiencing this problem when running multiple workers in uvicorn. The FastAPI app is rather simple: I have a few endpoints that talk to a DB querying some products. I'm using MSSQL Express 2019 (as this is a requirement..). Currently I'm running the app inside WSL2. I'm trying to do my best to have everything asynchronous. One of the endpoints looks like this:

```python
@products_router.get(
    "/",
    summary="Get list of all products, paginated",
)
async def get_all_products(
    session: DBEngineDep,
    params: TotalCursorParams = Depends(),
) -> TotalCursorPage[Product]:
    """
    Get list of all products, paginated
    """
    all_products: TotalCursorPage[Product] = await insertgt.get_all_products(
        session=session,
        params=params,
    )
    if all_products.total is not None and all_products.total > 0:
        return all_products
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="No products found",
        )
```

A single call to it takes ~400-500 ms.
I start the application from the `__main__` block:

```python
if __name__ == "__main__":
    workers: int = TypeAdapter(int).validate_python(os.getenv("WORKERS", 6))
    reload: bool = TypeAdapter(bool).validate_python(os.getenv("RELOAD", False))
    uvicorn.run(
        app="main:app",
        host="0.0.0.0",
        port=8000,
        reload=reload,
        workers=workers,
        # uvloop works only on Linux!
        loop="uvloop",
        log_config="log-config.yaml",
        log_level="debug",
    )
```

I'd expect that if I start the app with 6 workers, all of them would pick up requests and load balance equally. But unfortunately, this isn't the case. I test the app with Locust. After the initial load I usually see ~3-4 out of 6 workers being active and serving responses; then, after some of the users finish querying the DB, all remaining users are served by just 1 worker.

I must add that when I issue another request "on the side" (outside the Locust tests) the app is still responsive and I get the response back - so I think

Here is a very trivial example of this behavior: the first 2 users queried the DB in ~1 min., then we observe a huge bottleneck (1 worker serving requests, the others doing nothing).
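For reference, a Locust test along the lines described above might look roughly like this; the user class, wait times, and endpoint path are assumptions rather than the actual test file:

```python
from locust import HttpUser, between, task


class ProductsUser(HttpUser):
    # Each simulated user pauses briefly between requests.
    wait_time = between(0.5, 1.5)

    @task
    def list_products(self) -> None:
        # Repeatedly hit the paginated products endpoint.
        self.client.get("/")
```

Running it with around 10 concurrent users would mirror the scenario described here.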
I observed that when I kill the currently struggling worker process, the other workers pick up the load and start serving responses, so RPS increases for some time; then the situation repeats. Most probably I will have to deploy this app (stack) in a container running on a PC, not as a web app (k8s etc.), so running it with multiple healthy workers makes sense to me.

I prepared a video which demonstrates this issue, I will try to attach it:

- Test & app startup: fastapi-uvicorn-workers-bottlenecks_001.mp4
- App initially serving requests with a few workers, then the bottleneck starts: fastapi-uvicorn-workers-bottlenecks_002.mp4
- I had to kill "the one" worker process and other processes started serving again, but not all of them: fastapi-uvicorn-workers-bottlenecks_003.mp4
- Issue repeats: fastapi-uvicorn-workers-bottlenecks_004.mp4
-
@graingert answering your questions:
Yes, I tested the real app with
I observed that when I have 10 users trying to get all DB items, it feels like some users become tied to a particular worker; when a user finishes all of its requests, that worker stops doing any work. The users that are left queue on a single worker until I kill that worker process. It's more visible in the videos I attached, I think.

I've been able to create a mock app that behaves similarly to the real app. The key is to have some CPU load generated; initially I tried just simple

Here's the mock app: uvicorn-workers-troubleshooting-app.tar.gz

I used
And the recordings that I did (I had to split them into ~10 MB files):

- troubleshooting-app-all_001.mp4
- In the below video, around 0:05 (5 sec.), we can see that as soon as 3 users finish their requests, workers become idle and stop picking up requests even though there are still 7 users left to be served; only 2 workers serve the responses for a while. Then around 1:21 another user finishes, and only 1 worker is still serving responses: troubleshooting-app-all_002.mp4
- In the below video, around 0:06 I killed that one busy worker, and 4 workers picked up the work - still not all of them (the app starts with 6 workers). Then around 0:50 the problem repeats: troubleshooting-app-all_003.mp4

This issue also happens on macOS - I used the same example app that is attached.

I hope this helps
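The attached archive is not reproduced in this thread, but a minimal endpoint that burns CPU per request, in the spirit of the mock app described above, might be sketched like this; the hashing loop and iteration count are assumptions:

```python
import hashlib
import os

from fastapi import FastAPI

app = FastAPI()


def burn_cpu(iterations: int = 200_000) -> str:
    # Pure-Python hashing loop: keeps the worker busy on the CPU for a
    # noticeable amount of time per request.
    digest = b"seed"
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


@app.get("/")
async def read_root() -> dict:
    # CPU-bound work inside the endpoint, so a busy worker cannot make
    # progress on anything else while it runs.
    return {"pid": os.getpid(), "digest": burn_cpu()}
```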