Bug description
I ran the LM Studio Local LLM API Server (OpenAI compatibility mode). The ChatClient.call() API works perfectly, but ChatClient.stream() does not work!
I enabled debug logging, added the SimpleLoggerAdvisor, and saw the following log output for the streaming request:
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.web.servlet.DispatcherServlet : POST "/stream", parameters={}
DEBUG 54629 --- [streaming] [nio-8080-exec-6] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to dev.danvega.streaming.ChatController#chatWithStream(String)
DEBUG 54629 --- [streaming] [nio-8080-exec-6] m.m.a.RequestResponseBodyMethodProcessor : Read "application/json;charset=UTF-8" to ["{"message": "What is the largest country on Earth?"}"]
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.a.c.c.advisor.SimpleLoggerAdvisor : request: AdvisedRequest[chatModel=OpenAiChatModel [defaultOptions=OpenAiChatOptions: {"streamUsage":false,"model":"meta-llama-3.1-8b-instruct","temperature":0.7}], userText={"message": "What is the largest country on Earth?"}, systemText=, chatOptions=OpenAiChatOptions: {"streamUsage":false,"model":"meta-llama-3.1-8b-instruct","temperature":0.7}, media=[], functionNames=[], functionCallbacks=[], messages=[], userParams={}, systemParams={}, advisors=[org.springframework.ai.chat.client.advisor.observation.ObservableRequestResponseAdvisor@6e5f4d0], advisorParams={}]
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.w.r.f.client.ExchangeFunctions : [8220a7c] HTTP POST http://localhost:1234/v1/chat/completions
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.http.codec.json.Jackson2JsonEncoder : [8220a7c] Encoding [ChatCompletionRequest[messages=[ChatCompletionMessage[rawContent={"message": "What is the largest co (truncated)...]
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.w.c.request.async.WebAsyncManager : Started async request for "/stream"
DEBUG 54629 --- [streaming] [nio-8080-exec-6] o.s.web.servlet.DispatcherServlet : Exiting but response remains open for further handling
But I could not see anything in the LM Studio server log!
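For context, this is roughly how the SimpleLoggerAdvisor was registered (a minimal sketch; the ChatClient bean shown here is illustrative, and the builder itself comes from the Spring AI auto-configuration):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class ChatClientConfig {

    // ChatClient.Builder is auto-configured by spring-ai-openai-spring-boot-starter
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // logs requests/responses when DEBUG is enabled for
                // org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }
}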
Environment
Spring AI version: 1.0.0-M6
Java version: 21
OS: macOS and Windows 11
Steps to reproduce
I used the spring-ai-openai-spring-boot-starter library with this configuration:
ai:
  openai:
    api-key: lm-studio
    base-url: http://localhost:1234
    chat:
      options:
        model: meta-llama-3.1-8b-instruct
    embedding:
      options:
        model: text-embedding-nomic-embed-text-v1.5-embedding
Expected behavior
ChatClient.stream() should stream the response, just as streaming works when I call the LM Studio Local LLM API Server (OpenAI compatibility mode) directly with curl (stream enabled), and just as the ChatClient.call() API already works with the above configuration.
Minimal Complete Reproducible example
You can run an LM Studio Local LLM API Server (OpenAI compatibility mode), clone this example from @danvega, and apply my configs. You will see that ChatClient.call() works, but stream() does not!
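For reference, the controller in that example looks roughly like this (a sketch, not the exact code from the repo; only the chatWithStream handler and the /stream path are taken from the logs above, the rest is illustrative):

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Blocking variant: works fine against LM Studio
    @PostMapping("/chat")
    String chat(@RequestBody String message) {
        return chatClient.prompt(message).call().content();
    }

    // Streaming variant: hangs, and nothing shows up in the LM Studio server log
    @PostMapping("/stream")
    Flux<String> chatWithStream(@RequestBody String message) {
        return chatClient.prompt(message).stream().content();
    }
}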
Comment From: dev-jonghoonpark
LM Studio Local LLM API Server (OpenAI compatibility mode)
The link seems to be invalid, returning a 404 error.
Comment From: aperepel
It is still an issue as of today's latest version, it seems. With the regular call version, everything works. The streaming version doesn't even trigger the model load in LM Studio.
See the code below to reproduce (replace with any small model).
OpenAiApi openAiApi = OpenAiApi.builder()
.baseUrl("http://localhost:1234")
.apiKey(new NoopApiKey())
.restClientBuilder(RestClient.builder()
// Force HTTP/1.1 for both streaming and non-streaming
.requestFactory(new JdkClientHttpRequestFactory(HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1)
.connectTimeout(Duration.ofSeconds(30))
.build())))
.build();
OpenAiChatModel chatModel = OpenAiChatModel.builder()
.openAiApi(openAiApi)
.defaultOptions(OpenAiChatOptions.builder()
.model("medgemma-27b-text-it@8bit").build())
.build();
// Non-streaming call works
// String response = ChatClient.builder(chatModel)
// .build()
// .prompt("Hello")
// .call()
// .content();
//
// System.out.println(response);
System.out.println("Starting streaming chat response...\n");
// streaming version hangs
ChatClient.builder(chatModel)
.build()
.prompt("Hello")
.stream()
.content()
.doOnNext(chunk -> {
System.out.print(chunk);
System.out.flush(); // Ensure immediate output
})
.doOnComplete(() -> System.out.println("\n\nStreaming completed."))
.doOnError(error -> System.err.println("Error during streaming: " + error.getMessage()))
.blockLast(); // Block to wait for the stream to complete
Comment From: jspw
I've successfully resolved this streaming issue with LM Studio after an incredible debugging journey that led to a surprising discovery.
The Initial Problem
After extensive testing, I initially thought the issue was about HTTP/1.1 vs HTTP/2, or that Spring AI's default HttpClient configuration doesn't work properly with LM Studio's streaming implementation.
My Investigation Journey
I went through multiple theories and tests:
- Without custom config: Streaming fails completely ❌
- With custom Reactor Netty HttpClient: Streaming works perfectly ✅
- With Apache HttpClient 5: Streaming fails ❌
- Protocol doesn't matter: Whether I set HTTP/1.1, HTTP/2, or nothing at all - it works as long as I provide a custom Reactor Netty HttpClient
Even when I explicitly configured HTTP/2, the logs showed it gracefully falls back to HTTP/1.1 when connecting to LM Studio, and streaming works fine.
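For reference, the custom Reactor Netty HttpClient configuration I was testing looked roughly like this (a sketch only; it assumes the OpenAiApi builder's webClientBuilder(...) hook, and exposing the api object as a bean is purely illustrative — wire it into your ChatModel however your app builds it):

import java.time.Duration;

import org.springframework.ai.model.NoopApiKey;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;

@Configuration
class LmStudioClientConfig {

    @Bean
    OpenAiApi openAiApi() {
        // Explicit Reactor Netty HttpClient backing the WebClient used by ChatClient.stream()
        HttpClient nettyHttpClient = HttpClient.create()
                .responseTimeout(Duration.ofSeconds(60));

        return OpenAiApi.builder()
                .baseUrl("http://localhost:1234")
                .apiKey(new NoopApiKey())
                .webClientBuilder(WebClient.builder()
                        .clientConnector(new ReactorClientHttpConnector(nettyHttpClient)))
                .build();
    }
}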
The Plot Twist 🤯
Here's where it gets CRAZY: I commented out ALL my custom configuration code to test something, and streaming still worked!
This led me to the real discovery...
The ACTUAL Solution (Mind-Blowingly Simple)
The entire issue was simply a missing dependency! Just add this to your pom.xml:
<dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty-http</artifactId>
</dependency>
That's it! No custom configuration needed at all!
What Really Happened
- Spring AI requires Reactor Netty for streaming functionality
- Transitive dependency resolution sometimes doesn't include the correct version
- Missing/incompatible reactor-netty-http → Streaming fails with LM Studio
- Explicit dependency declaration → Correct version loaded → Everything works!
The Real Root Cause
The problem was never about:
- ❌ HTTP protocol versions (HTTP/1.1 vs HTTP/2)
- ❌ Custom HttpClient configuration
- ❌ LM Studio compatibility issues
- ❌ Blocking vs reactive clients
It was simply: Missing reactor-netty-http dependency ✅
Comment From: jspw
The fact is, Spring AI uses WebClient (spring-webflux), and WebClient relies on reactor-netty-http's HttpClient for streaming. But reactor-netty-http is an optional dependency of spring-webflux, so we have to add it explicitly.
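If you want to double-check your classpath, a quick probe like this (purely illustrative, not part of Spring AI) shows whether Reactor Netty's HttpClient is actually available:

import org.springframework.util.ClassUtils;

public class ReactorNettyCheck {

    public static void main(String[] args) {
        // Per the discussion above, ChatClient.stream() against LM Studio only worked
        // once reactor-netty-http (and thus Reactor Netty's HttpClient) was on the classpath.
        boolean present = ClassUtils.isPresent(
                "reactor.netty.http.client.HttpClient",
                ReactorNettyCheck.class.getClassLoader());
        System.out.println("reactor-netty-http on classpath: " + present);
    }
}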