
FastAPI Integration with OpenAI Client Returns 500 Error on /chat/completions Endpoint #169

Open
Abhiraj-Alois opened this issue Feb 19, 2025 · 0 comments


When using LMStudio with FastAPI and the OpenAI async client, calls to the /chat/completions endpoint fail and the FastAPI route returns a 500 error, even though the same endpoint works correctly when accessed directly. LMStudio answers with a 200 status code, but the response body contains only the error "Unexpected endpoint or method. (POST /chat/completions)".

Environment

  • LMStudio Version: 0.3.10
  • Python Version: 3.12.3
  • LMStudio Host Operating System: Windows
  • Client Code Operating System: Ubuntu

Current Behavior

The API endpoint returns a 500 error with the message "Unexpected endpoint or method. (POST /chat/completions)" when called through FastAPI, despite the endpoint being correct and functional when tested directly.

Expected Behavior

The endpoint should process the request and return a valid response, as it does when tested directly through LMStudio's interface.
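
For reference, what the AsyncOpenAI client sends under the hood is a plain POST to <base_url>/chat/completions on LMStudio's OpenAI-compatible server. A minimal sketch of issuing that request by hand (httpx, the timeout value, and the redacted IP are illustrative only):

import asyncio
import httpx

async def raw_check() -> None:
    # Equivalent of the request the OpenAI client builds for chat.completions.create().
    payload = {
        "model": "llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": "When was Valentine's Day?"}],
        "temperature": 0.2,
    }
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "http://192.168.X.XXX:1234/v1/chat/completions",
            json=payload,
            timeout=60,
        )
        print(resp.status_code)
        print(resp.json())

asyncio.run(raw_check())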

Example Code:

@router.get("/test_llm")
async def test_llm():
    llm = LMStudioLLM()

    prompt = "When was Valentine's Day?"
    response = await llm._call(prompt)

    md(response)  # md() is a logging/print helper defined elsewhere
    return {"response": response}

import os
import asyncio
from typing import List, Optional

from openai import AsyncOpenAI, APIConnectionError, OpenAIError

LMSTUDIO_MODEL = os.getenv("LMSTUDIO_MODEL", "llama-3.2-3b-instruct")
LMSTUDIO_BASE_URL = os.getenv("LMSTUDIO_BASE_URL", "http://192.168.X.XXX:1234/v1")
LMSTUDIO_API_KEY = os.getenv("LMSTUDIO_API_KEY", "lm_studio")

async def lmstudio_llm(inputs: str) -> Optional[str]:
    client = AsyncOpenAI(base_url=LMSTUDIO_BASE_URL, api_key=LMSTUDIO_API_KEY)
    md(f"[DEBUG] LLM API Endpoint: {LMSTUDIO_BASE_URL}")
    md(f"[DEBUG] LLM API Model: {LMSTUDIO_MODEL}")
    md(f"[DEBUG] LLM API Key Exists: {bool(LMSTUDIO_API_KEY)}")
    # Retry up to three times; connection errors and empty responses are retried,
    # while other OpenAI errors abort immediately.
    for attempt in range(3):
        try:
            response = await client.chat.completions.create(
                model=LMSTUDIO_MODEL,
                messages=[{"role": "user", "content": inputs}],
                temperature=0.2,
                stream=False
            )
            
            if response and response.choices:
                return response.choices[0].message.content.strip()
            else:
                md(f"[ERROR] Empty response or no choices in response: {response}")

        except APIConnectionError as e:
            md(f"[ERROR] Connection failed (attempt {attempt+1}/3): {e}")

        except OpenAIError as e:
            md(f"[ERROR] OpenAI API error: {e}")
            return None 

        await asyncio.sleep(5) 

    md("[ERROR] Failed to connect after multiple attempts.")
    return None

class LMStudioLLM:
    async def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        response = await lmstudio_llm(prompt)
        if response is None:
            raise RuntimeError("LLM failed to generate a response.")
        return response
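
For completeness, "tested directly" here means running the module above as a standalone script, along the lines of the following driver (the __main__ guard is illustrative); run that way, a normal completion is printed:

if __name__ == "__main__":
    # Outside FastAPI, the same call chain returns a normal completion.
    print(asyncio.run(LMStudioLLM()._call("When was Valentine's Day?")))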

Error Log:

[DEBUG] LLM API Endpoint: http://192.168.X.XXX:1234/v1
[DEBUG] LLM API Model: llama-3.2-3b-instruct
[DEBUG] LLM API Key Exists: True
[ERROR] Empty response or no choices in response: 
ChatCompletion(id=None, choices=None, created=None, 
model=None, object=None, service_tier=None, 
system_fingerprint=None, usage=None, error='Unexpected 
endpoint or method. (POST /chat/completions)')
[ERROR] Empty response or no choices in response: 
ChatCompletion(id=None, choices=None, created=None, 
model=None, object=None, service_tier=None, 
system_fingerprint=None, usage=None, error='Unexpected 
endpoint or method. (POST /chat/completions)')
[ERROR] Empty response or no choices in response: 
ChatCompletion(id=None, choices=None, created=None, 
model=None, object=None, service_tier=None, 
system_fingerprint=None, usage=None, error='Unexpected 
endpoint or method. (POST /chat/completions)')
[ERROR] Failed to connect after multiple attempts.
INFO:     192.168.X.XXX:45184 - "GET /api/test_llm HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "backend/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "backend/.venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "backend/.venv/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "backend/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "backend/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "backend/app/routes/batch_documents.py", line 119, in test_llm
    response1 = await llm._call(prompt2)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "backend/app/services/lmstudio_llm.py", line 63, in _call
    raise RuntimeError("LLM failed to generate a response.")
RuntimeError: LLM failed to generate a response.

Steps to Reproduce

  • Set up the LMStudio server with the specified configuration (a connectivity-check sketch follows this list)
  • Implement the FastAPI endpoint as shown in the code example
  • Make a GET request to the /test_llm endpoint
  • Observe the 500 error response
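
One way to confirm, before touching the FastAPI route, that the base URL is reachable from the machine running the code (a minimal sketch, assuming LMStudio's OpenAI-compatible /v1/models listing endpoint; requests and the timeout are illustrative):

import requests

BASE_URL = "http://192.168.X.XXX:1234/v1"

# A successful response here confirms the host, port and /v1 prefix are reachable
# from the client machine and should list the models loaded in LMStudio.
resp = requests.get(f"{BASE_URL}/models", timeout=10)
print(resp.status_code)
print(resp.json())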

Additional Context

  • The same code works when the module is run directly as a standalone script
  • Multiple retry attempts produce the same error
  • I've tried multiple approaches, including the LMStudio integration from llama_index and the direct AsyncOpenAI client
  • All configuration parameters match between the working and non-working scenarios
  • Tested with both the async and sync clients; the result is the same

Questions

  • Is there a specific way to configure FastAPI to work with LMStudio's OpenAI-compatible API?
  • Are there known issues with async OpenAI client usage in FastAPI applications?
  • Is there a recommended approach for integrating LMStudio with FastAPI? (a sketch of the kind of setup I mean follows this list)
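
To make the last question concrete, the setup I have in mind is roughly the following: one AsyncOpenAI client created at application startup and reused by the routes. This is only a sketch using the same environment variables as above; the lifespan wiring and route are illustrative, not something confirmed to work against LMStudio:

import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncOpenAI

LMSTUDIO_MODEL = os.getenv("LMSTUDIO_MODEL", "llama-3.2-3b-instruct")
LMSTUDIO_BASE_URL = os.getenv("LMSTUDIO_BASE_URL", "http://192.168.X.XXX:1234/v1")
LMSTUDIO_API_KEY = os.getenv("LMSTUDIO_API_KEY", "lm_studio")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create a single client for the whole application instead of one per request.
    app.state.llm_client = AsyncOpenAI(base_url=LMSTUDIO_BASE_URL, api_key=LMSTUDIO_API_KEY)
    yield
    await app.state.llm_client.close()

app = FastAPI(lifespan=lifespan)

@app.get("/test_llm")
async def test_llm():
    response = await app.state.llm_client.chat.completions.create(
        model=LMSTUDIO_MODEL,
        messages=[{"role": "user", "content": "When was Valentine's Day?"}],
        temperature=0.2,
    )
    return {"response": response.choices[0].message.content}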