1. Configuration file

```yaml
spring:
  ai:
    openai:
      base-url: http://192.168.1.123:11434
      chat:
        options:
          model: qwen2.5:7b-instruct-fp16
          stream-usage: true
    mcp:
      client:
        type: async
        sse:
          connections:
            mcp-server-chart:
              url: http://localhost:1122/
```

2. Code

```java
@Primary
@Bean
public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory, ToolCallbackProvider tools) {
    return builder.defaultSystem("""
                    You are an intelligent AI assistant
                    """)
            .defaultAdvisors(
                    MessageChatMemoryAdvisor.builder(chatMemory).build(),
                    new SimpleLoggerAdvisor(
                            ModelOptionsUtils::toJsonStringPrettyPrinter,
                            ModelOptionsUtils::toJsonStringPrettyPrinter,
                            0
                    )
            )
            .defaultToolCallbacks(tools)
            .build();
}
```

If I comment out `.defaultToolCallbacks(tools)`, the response streams back normally. With it enabled, the AI's reply is returned all at once instead of as a stream.
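One way to tell whether the response is truly buffered (rather than merely rendered all at once by the client) is to log the arrival offset of each chunk. Below is a minimal sketch, assuming the `chatClient` bean above and Spring WebFlux; the controller class and endpoint path are made up for illustration:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class StreamDebugController {

    private final ChatClient chatClient;

    public StreamDebugController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // Logs how long after the request each chunk arrives. With real streaming
    // the offsets spread out over the generation; with a buffered response
    // they all land in a single burst at the end.
    @GetMapping(path = "/stream-debug", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamDebug(String question) {
        long start = System.currentTimeMillis();
        return chatClient.prompt()
                .user(question)
                .stream()
                .content()
                .doOnNext(chunk -> System.out.printf("+%dms: %s%n",
                        System.currentTimeMillis() - start, chunk));
    }
}
```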

Comment From: sunyuhan1998

Hi @benxiaohai061! I have a very simple MCP server-side and client-side example that, like yours, uses Ollama to serve the model: MCP-EXAMPLE. In my example the streaming output is returned correctly both with and without `defaultToolCallbacks(tools)`, so perhaps you can locate the problem by comparing against it. If the problem persists, could you provide a minimal reproducible project? I may then be able to help you track it down.

Comment From: jichengda

I ran into the same problem. Without tools, questions to the AI stream back normally. Once tools are used, the responses are no longer streamed.

Comment From: jichengda

```java
@GetMapping(path = "/stream", produces = "text/html;charset=utf-8")
public Flux<String> stream(String question, String chatId) {
    // This streams normally:
    // return chatClient.prompt().user(question).stream().content();

    // This does not stream; the whole response arrives in one blocking chunk:
    return chatClient.prompt().user(question).tools(new DataTools()).stream().content();
}
```
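One variable worth ruling out before blaming the tool-calling path is HTTP-level buffering: a `text/html` response body may be buffered by the browser or by intermediaries, whereas `text/event-stream` is flushed per event. A minimal variant of the endpoint above, as a sketch rather than a confirmed fix:

```java
@GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(String question, String chatId) {
    // Same call as before; only the response media type changes. If chunks
    // now arrive incrementally, the buffering happened at the HTTP layer
    // rather than in the tool-calling code path.
    return chatClient.prompt().user(question).tools(new DataTools()).stream().content();
}
```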