When using LMStudio with FastAPI and the OpenAI async client, calls to the /chat/completions endpoint fail, despite the same endpoint working correctly when accessed directly. The server responds with a 200 status code, but the response body carries an error reporting an unexpected chat/completions endpoint, and the FastAPI route then returns a 500 Internal Server Error.
Environment
LMStudio Version: 0.3.10
Python Version: 3.12.3
LMStudio Host Operating System: Windows
Client Code Operating System: Ubuntu
Current Behavior
The API endpoint returns a 500 error with the message "Unexpected endpoint or method. (POST /chat/completions)" when called through FastAPI, despite the endpoint being correct and functional when tested directly.
Expected Behavior
The endpoint should process the request and return a valid response, as it does when tested directly through LMStudio's interface.
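For context, a direct request of the following shape succeeds against the same LMStudio server. This is only a sketch of the working direct call (it assumes the requests library is available; the host and model are the placeholders from my setup):

import requests

# Direct call to LMStudio's OpenAI-compatible endpoint; this path works.
resp = requests.post(
    "http://192.168.X.XXX:1234/v1/chat/completions",
    json={
        "model": "llama-3.2-3b-instruct",
        "messages": [{"role": "user", "content": "When was Valentine's Day?"}],
        "temperature": 0.2,
    },
    timeout=60,
)
print(resp.status_code)
print(resp.json()["choices"][0]["message"]["content"])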
Example Code:
# Route (backend/app/routes/batch_documents.py, excerpt); router setup and imports omitted.
@router.get("/test_llm")
async def test_llm():
    llm = LMStudioLLM()
    prompt = "When was Valentine's Day?"
    response = await llm._call(prompt)
    md(response)
    return {"response": response}
# Service (backend/app/services/lmstudio_llm.py, excerpt).
# Imports added here for completeness; md() is a project helper used for logging output.
import asyncio
import os
from typing import List, Optional

from openai import APIConnectionError, AsyncOpenAI, OpenAIError

LMSTUDIO_MODEL = os.getenv("LMSTUDIO_MODEL", "llama-3.2-3b-instruct")
LMSTUDIO_BASE_URL = os.getenv("LMSTUDIO_BASE_URL", "http://192.168.X.XXX:1234/v1")
LMSTUDIO_API_KEY = os.getenv("LMSTUDIO_API_KEY", "lm_studio")


async def lmstudio_llm(inputs: str) -> Optional[str]:
    client = AsyncOpenAI(base_url=LMSTUDIO_BASE_URL, api_key=LMSTUDIO_API_KEY)
    md(f"[DEBUG] LLM API Endpoint: {LMSTUDIO_BASE_URL}")
    md(f"[DEBUG] LLM API Model: {LMSTUDIO_MODEL}")
    md(f"[DEBUG] LLM API Key Exists: {bool(LMSTUDIO_API_KEY)}")
    for attempt in range(3):
        try:
            response = await client.chat.completions.create(
                model=LMSTUDIO_MODEL,
                messages=[{"role": "user", "content": inputs}],
                temperature=0.2,
                stream=False,
            )
            if response and response.choices:
                return response.choices[0].message.content.strip()
            else:
                md(f"[ERROR] Empty response or no choices in response: {response}")
        except APIConnectionError as e:
            md(f"[ERROR] Connection failed (attempt {attempt + 1}/3): {e}")
        except OpenAIError as e:
            md(f"[ERROR] OpenAI API error: {e}")
            return None
        await asyncio.sleep(5)
    md("[ERROR] Failed to connect after multiple attempts.")
    return None


class LMStudioLLM:
    async def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        response = await lmstudio_llm(prompt)
        if response is None:
            raise RuntimeError("LLM failed to generate a response.")
        return response
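When this file is run on its own (outside FastAPI), the call succeeds. The standalone invocation is essentially the following sketch; the __main__ guard shown here is illustrative rather than the exact code in my file:

# Illustrative standalone runner; this path works when the file is executed directly.
if __name__ == "__main__":
    result = asyncio.run(lmstudio_llm("When was Valentine's Day?"))
    print(result)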
Error Log:
[DEBUG] LLM API Endpoint: http://192.168.X.XXX:1234/v1
[DEBUG] LLM API Model: llama-3.2-3b-instruct
[DEBUG] LLM API Key Exists: True
[ERROR] Empty response or no choices in response:
ChatCompletion(id=None, choices=None, created=None,
model=None, object=None, service_tier=None,
system_fingerprint=None, usage=None, error='Unexpected
endpoint or method. (POST /chat/completions)')
[ERROR] Empty response or no choices in response:
ChatCompletion(id=None, choices=None, created=None,
model=None, object=None, service_tier=None,
system_fingerprint=None, usage=None, error='Unexpected
endpoint or method. (POST /chat/completions)')
[ERROR] Empty response or no choices in response:
ChatCompletion(id=None, choices=None, created=None,
model=None, object=None, service_tier=None,
system_fingerprint=None, usage=None, error='Unexpected
endpoint or method. (POST /chat/completions)')
[ERROR] Failed to connect after multiple attempts.
INFO: 192.168.X.XXX:45184 - "GET /api/test_llm HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "backend/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "backend/.venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "backend/.venv/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/applications.py", line 112, in __call__
await self.middleware_stack(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "backend/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "backend/.venv/lib/python3.12/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "backend/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "backend/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "backend/app/routes/batch_documents.py", line 119, in test_llm
response1 = await llm._call(prompt2)
^^^^^^^^^^^^^^^^^^^^^^^^
File "backend/app/services/lmstudio_llm.py", line 63, in _call
raise RuntimeError("LLM failed to generate a response.")
RuntimeError: LLM failed to generate a response.
Steps to Reproduce
Set up LMStudio server with the specified configuration
Implement the FastAPI endpoint as shown in the code example
Make a GET request to the /test_llm endpoint (a request sketch follows these steps)
Observe the 500 error response
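Concretely, step 3 is just a plain GET to the FastAPI route (sketch; host and port are placeholders for my backend, and the route is mounted under /api as shown in the log):

import requests

# Placeholder host/port for the FastAPI backend; the route is served under /api.
resp = requests.get("http://<backend-host>:8000/api/test_llm", timeout=120)
print(resp.status_code)  # 500 Internal Server Error in the failing scenario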
Additional Context
The same code works when the service file is run directly as a standalone script
Multiple retry attempts all produce the same error
I've tried multiple approaches, including the LMStudio integration from llama_index and the AsyncOpenAI client directly
All configuration parameters match between the working and non-working scenarios
Tested with both the async and sync clients, with the same result (a sync sketch follows this list)
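The synchronous variant I tested was essentially the following sketch, reusing the LMSTUDIO_* values defined above (a sketch of the sync OpenAI client call, not the exact code from my project):

from openai import OpenAI

# Synchronous variant of the same call; in my setup it fails with the same
# "Unexpected endpoint or method" error as the async version.
client = OpenAI(base_url=LMSTUDIO_BASE_URL, api_key=LMSTUDIO_API_KEY)
response = client.chat.completions.create(
    model=LMSTUDIO_MODEL,
    messages=[{"role": "user", "content": "When was Valentine's Day?"}],
    temperature=0.2,
    stream=False,
)
if response and response.choices:
    print(response.choices[0].message.content)
else:
    print(f"Empty response: {response}")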
Questions
Is there a specific way to configure FastAPI to work with LMStudio's OpenAI-compatible API?
Are there known issues with async OpenAI client usage in FastAPI applications?
Is there a recommended approach for integrating LMStudio with FastAPI?