Requests distribution #2467
-
Hey 👋 Recently we noticed that the request distribution between different processes on a single machine is very uneven. We use `supervisor` to run several uvicorn processes; here is an example of the `supervisor` config:
and here is a simple app for testing:

```python
import asyncio
import os

from fastapi import FastAPI

app = FastAPI()

# Global counter and lock
request_count = 0
request_count_lock = asyncio.Lock()


# Safely increment the request count
async def increment_request_count():
    global request_count
    async with request_count_lock:
        request_count += 1


@app.get("/")
async def read_root():
    await increment_request_count()
    return {"pid": os.getpid(), "count": request_count}
```

and a simple script to test it:

```python
import asyncio

import httpx
from httpx import Limits


async def call(client):
    response = await client.get('http://127.0.0.1:8000')
    print(response.text)
    return response.json()


async def main():
    results = {}
    async with httpx.AsyncClient(limits=Limits(max_connections=10, max_keepalive_connections=10)) as client:
        for result in (await asyncio.gather(*[call(client) for _ in range(1000)])):
            results[result['pid']] = result['count']
    print(results)


if __name__ == '__main__':
    asyncio.run(main())
```

This setup results in poor request distribution. I tried using uvicorn as a process manager, but the result is the same. Previously, there was a similar discussion, but about
Sorry for the long intro 😅 Here is my question: If it's not possible to do that using just Thank you!!
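For context, a minimal sketch of what "using uvicorn as a process manager" can look like; the module path `main:app` and the worker count are assumptions, not the poster's actual setup:

```python
# Illustrative only: let uvicorn itself spawn and supervise the worker
# processes instead of running one single-worker uvicorn per supervisor entry.
import uvicorn

if __name__ == "__main__":
    # With workers > 1 the app must be passed as an import string.
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)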
-
Hi @SlavaSkvortsov, right... I don't know where the issue lies. We do set it (Line 512 in a507532). Any ideas @graingert @abersheeran? My recommendation would be to rely on an external load balancer, e.g. nginx.
-
Please make sure you use the new multi-process manager and increase the concurrency (the test script you gave establishes at most 10 connections, so the result is largely down to chance).
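To illustrate that suggestion, the test script from the original post could be adjusted roughly like this; the connection limit and request count below are arbitrary, the point is simply to allow far more than 10 concurrent connections:

```python
import asyncio
from collections import Counter

import httpx
from httpx import Limits


async def call(client: httpx.AsyncClient) -> dict:
    response = await client.get("http://127.0.0.1:8000")
    return response.json()


async def main() -> None:
    # Allow many concurrent connections so requests actually hit the
    # listening socket in parallel rather than trickling in over 10 sockets.
    limits = Limits(max_connections=200, max_keepalive_connections=200)
    async with httpx.AsyncClient(limits=limits) as client:
        results = await asyncio.gather(*[call(client) for _ in range(5000)])
    # Count how many responses each worker PID produced.
    print(Counter(result["pid"] for result in results))


if __name__ == "__main__":
    asyncio.run(main())
```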
-
We have the same problem with gunicorn. Our application is structured as an asynchronous gunicorn worker that processes most requests in a synchronous thread (there is only one such thread). Perhaps if you reproduce this load profile, the balancing problem will become visible.
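A rough sketch of how that load profile might be reproduced, assuming the blocking work is funnelled through a single-thread executor; the endpoint and the 100 ms of simulated work are made up for illustration:

```python
import asyncio
import os
import time
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()

# A single worker thread, mimicking "most requests are processed
# in one synchronous thread".
blocking_executor = ThreadPoolExecutor(max_workers=1)


def blocking_work() -> None:
    # Stand-in for the synchronous part of the real application.
    time.sleep(0.1)


@app.get("/")
async def read_root() -> dict:
    loop = asyncio.get_running_loop()
    # The async endpoint awaits the single-threaded executor, so requests
    # queue behind each other inside one worker process.
    await loop.run_in_executor(blocking_executor, blocking_work)
    return {"pid": os.getpid()}
```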
-
@SlavaSkvortsov does https://github.com/encode/uvicorn/pull/2472/files fix it for you?
-
Hi, I stumbled upon this discussion as I'm also experiencing this problem when running multiple workers in uvicorn. The FastAPI app is rather simple: I have a few endpoints that talk to a DB querying some products. I'm using MSSQL Express 2019 (as this is a requirement..). Currently I'm running the app inside WSL2. I'm trying to do my best to have everything asynchronous. One of the endpoints looks like this:

```python
@products_router.get(
    "/",
    summary="Get list of all products, paginated",
)
async def get_all_products(
    session: DBEngineDep,
    params: TotalCursorParams = Depends(),
) -> TotalCursorPage[Product]:
    """
    Get list of all products, paginated
    """
    all_products: TotalCursorPage[Product] = await insertgt.get_all_products(
        session=session,
        params=params,
    )
    if all_products.total is not None and all_products.total > 0:
        return all_products
    else:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="No products found",
        )
```

A single call to it takes ~400-500 ms.
I start the application from the `__main__` block:

```python
if __name__ == "__main__":
    workers: int = TypeAdapter(int).validate_python(os.getenv("WORKERS", 6))
    reload: bool = TypeAdapter(bool).validate_python(os.getenv("RELOAD", False))
    uvicorn.run(
        app="main:app",
        host="0.0.0.0",
        port=8000,
        reload=reload,
        workers=workers,
        # uvloop works only on Linux!
        loop="uvloop",
        log_config="log-config.yaml",
        log_level="debug",
    )
```

I'd expect that if I start the app with 6 workers, all of them would pick up requests and load balance equally. But unfortunately, this isn't the case. I test the app with Locust. After the initial load I usually see ~3-4 out of 6 workers being active and serving responses; then, after some of the users finish querying the DB, all remaining users are served by just 1 worker.

I must add that when I issue another request "on the side" (outside the Locust tests) the app is still responsive and I get the response back - so I think

Here is a very trivial example of this behavior: the first 2 users queried the DB in ~1 min., then we observe a huge bottleneck (1 worker serving requests, the others doing nothing).
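For reference, a Locust test along the lines described above might look roughly like this; the user class, wait times, and endpoint path are assumptions rather than the actual test file:

```python
from locust import HttpUser, between, task


class ProductsUser(HttpUser):
    # Each simulated user pauses briefly between requests.
    wait_time = between(0.5, 1.5)

    @task
    def list_products(self) -> None:
        # Repeatedly hit the paginated products endpoint.
        self.client.get("/")
```

Running it with around 10 concurrent users would mirror the scenario described here.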
I observed that when I kill the currently struggling worker process, the other workers pick up the load and start serving responses, so RPS increases for some time; then the situation repeats. Most probably I will have to deploy this app (stack) in a container running on a PC, not as a web app (k8s etc.), so running it with multiple healthy workers makes sense to me.

I prepared a video which demonstrates this issue, I will try to attach it:

- Test & app startup: fastapi-uvicorn-workers-bottlenecks_001.mp4
- App initially serving requests with a few workers, then the bottleneck starts: fastapi-uvicorn-workers-bottlenecks_002.mp4
- I had to kill "the one" worker process and other processes started serving again, but not all of them: fastapi-uvicorn-workers-bottlenecks_003.mp4
- Issue repeats: fastapi-uvicorn-workers-bottlenecks_004.mp4
-
@graingert answering your questions:
Yes, I tested the real app with
I observed that when I have 10 users trying to get all DB items, it feels like some users become tied to a particular worker; when a user finishes all of its requests, that worker stops doing any work. The users that are left queue on a single worker until I kill that worker process. It's more visible in the videos I attached, I think.

I've been able to create a mock app that behaves similarly to the real app. The key is to have some CPU load generated; initially I tried just simple

Here's the mock app: uvicorn-workers-troubleshooting-app.tar.gz

I used
And the recordings that I did (I had to split them into ~10 MB files):

- troubleshooting-app-all_001.mp4
- In the below video, around 0:05 (5 sec.), we can see that as soon as 3 users finish their requests, workers become idle and stop picking up requests even though there are still 7 users left to be served; only 2 workers serve the responses for a while. Then around 1:21 another user finishes, and only 1 worker is still serving responses: troubleshooting-app-all_002.mp4
- In the below video, around 0:06 I killed that one busy worker, and 4 workers picked up the work - still not all of them (the app starts with 6 workers). Then around 0:50 the problem repeats: troubleshooting-app-all_003.mp4

This issue also happens on macOS - I used the same example app that is attached.

I hope this helps
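The attached archive is not reproduced in this thread, but a minimal endpoint that burns CPU per request, in the spirit of the mock app described above, might be sketched like this; the hashing loop and iteration count are assumptions:

```python
import hashlib
import os

from fastapi import FastAPI

app = FastAPI()


def burn_cpu(iterations: int = 200_000) -> str:
    # Pure-Python hashing loop: keeps the worker busy on the CPU for a
    # noticeable amount of time per request.
    digest = b"seed"
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


@app.get("/")
async def read_root() -> dict:
    # CPU-bound work inside the endpoint, so a busy worker cannot make
    # progress on anything else while it runs.
    return {"pid": os.getpid(), "digest": burn_cpu()}
```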