1. Configuration file

```yaml
spring:
  ai:
    openai:
      base-url: http://192.168.1.123:11434
      chat:
        options:
          model: qwen2.5:7b-instruct-fp16
          stream-usage: true
    mcp:
      client:
        type: async
        sse:
          connections:
            mcp-server-chart:
              url: http://localhost:1122/
```
2. Code

```java
@Primary
@Bean
public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory, ToolCallbackProvider tools) {
    return builder
            .defaultSystem("""
                    You are an intelligent AI assistant
                    """)
            .defaultAdvisors(
                    MessageChatMemoryAdvisor.builder(chatMemory).build(),
                    new SimpleLoggerAdvisor(
                            ModelOptionsUtils::toJsonStringPrettyPrinter,
                            ModelOptionsUtils::toJsonStringPrettyPrinter,
                            0
                    )
            )
            .defaultToolCallbacks(tools)
            .build();
}
```
If I comment out `.defaultToolCallbacks(tools)`, the response streams back normally. With it in place, the AI's reply comes back all at once instead of as a stream.
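For what it's worth, the difference is easy to observe by logging each emitted chunk. Below is a minimal probe sketch (the controller, endpoint path, and logger are my own illustration, not part of the original report): when streaming works you should see many small chunks logged over time, while the broken case emits one large chunk at the end.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class StreamProbeController {

    private static final Logger log = LoggerFactory.getLogger(StreamProbeController.class);

    private final ChatClient chatClient;

    public StreamProbeController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // Hypothetical probe endpoint: logs every chunk as it arrives.
    // Streaming case: many small chunks logged incrementally.
    // Buffered case: a single large chunk at the very end.
    @GetMapping(value = "/probe", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> probe(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .stream()
                .content()
                .doOnNext(chunk -> log.info("chunk ({} chars): {}", chunk.length(), chunk));
    }
}
```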
Comment From: sunyuhan1998
Hi @benxiaohai061 ! I have a very simple MCP server-side and client-side example that uses Ollama to serve the model, just as you do: MCP-EXAMPLE. In my example the streaming output is returned correctly both with and without `defaultToolCallbacks(tools)`, so comparing against it may help you locate the problem. If the issue persists, could you provide a minimized example project? Then I may be able to help track it down.
Comment From: jichengda
I ran into the same problem: without tools, questions to the AI stream back normally, but once tools are used the response is no longer streamed.
Comment From: jichengda
```java
@GetMapping(path = "/stream", produces = "text/html;charset=utf-8")
public Flux<String> stream(String question, String chatId) {
    // return chatClient.prompt().user(question).stream().content(); // this variant streams normally
    return chatClient.prompt().user(question).tools(new DataTools()).stream().content(); // this variant does not stream; it blocks and returns everything at once
}
```
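For context, the `DataTools` class referenced above is not shown in this thread. A minimal hypothetical stand-in, assuming Spring AI's `@Tool` annotation, could look like the sketch below (the method name and description are placeholders, not the reporter's actual tool):

```java
import org.springframework.ai.tool.annotation.Tool;

// Hypothetical stand-in for the DataTools class used above:
// a plain object whose @Tool-annotated methods are registered
// per request via .tools(new DataTools()).
public class DataTools {

    @Tool(description = "Return the current server time")
    public String currentTime() {
        return java.time.LocalDateTime.now().toString();
    }
}
```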