Bug description
When adding a MessageChatMemoryAdvisor to my chat client, I would expect it to persist all messages, including the tool calls that the model makes. However, when getting the final chatResponse, I can only see the user message and the final model response in the chat memory.
Environment
Java 21, Spring AI M4, InMemoryChatMemory
Steps to reproduce
1. Create a chat client with at least one tool and add a MessageChatMemoryAdvisor with an InMemoryChatMemory instance passed into it.
2. Make a user query that invokes the tool.
3. Investigate the ChatMemory instance and see that only the user input and the model response are included in the history.
Expected behavior
I expected to also be able to see the model's tool calls in the ChatHistory.
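To make the setup concrete, here is a minimal reproduction sketch, assuming the M4-era API; the chatModel bean and the "changeChannelName" function name are placeholders, not part of the original report:

```java
// Sketch only: assumes Spring AI 1.0.0-M4 APIs and a registered
// "changeChannelName" function bean (hypothetical).
ChatMemory chatMemory = new InMemoryChatMemory();

ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
        .defaultFunctions("changeChannelName")
        .build();

chatClient.prompt()
        .user("Change channel 1 name to X")   // triggers the tool call
        .call()
        .content();

// Only the UserMessage and the final AssistantMessage show up here;
// the intermediate tool-call and tool-response messages are missing.
chatMemory.get("default", 100).forEach(System.out::println);
```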
Comment From: leooooow
Yes, I've noticed that in the current implementation, if the message type is 'tool', the process of calling the tool and sending the tool result back to the AI model is handled recursively and internally, without being exposed externally.
// org.springframework.ai.openai.OpenAiChatModel#internalCall
if (!isProxyToolCalls(prompt, this.defaultOptions)
        && isToolCall(response, Set.of(OpenAiApi.ChatCompletionFinishReason.TOOL_CALLS.name(),
                OpenAiApi.ChatCompletionFinishReason.STOP.name()))) {
    var toolCallConversation = handleToolCalls(prompt, response);
    // Recursively call the call method with the tool call message
    // conversation that contains the call responses.
    return this.internalCall(new Prompt(toolCallConversation, prompt.getOptions()), response);
}
My current approach is to define a ToolWrapper and register it in the context, so that when a tool is called, it can persist messages of the 'tool' type.
@Override
public String call(String functionInput, ToolContext toolContext) {
    // Delegate to the wrapped tool, then persist the result as a 'tool' message.
    String response = this.tool.call(functionInput, toolContext);
    saveToolResultMessage(toolContext, this.toolCallId, response);
    return response;
}
Comment From: ThomasVitale
Thanks for raising this issue. The memory implementation in ChatClient doesn't currently support storing the intermediate tool messages, but work is in progress to add that support.
In the meantime, there are two possible ways to get access to those tool-related messages:
- Handle the tool execution logic externally. This scenario is described in the docs: Framework-Controlled Tool Execution vs. User-Controlled Tool Execution. A minimal sketch of this option follows the code example below.
- Extract the tool messages from the ToolContext, as also suggested by @leooooow. This scenario is already supported by the framework, which populates the ToolContext with the entire conversation history up to the tool call. From within a tool, you can extract the message history directly as follows:
class CustomerTools {

    @Tool(description = "Retrieve customer information")
    Customer getCustomerInfo(Long id, ToolContext toolContext) {
        List<Message> toolCallHistory = toolContext.getToolCallHistory();
        // Do something with the toolCallHistory
        return customerRepository.findById(id, (String) toolContext.getContext().get("tenantId"));
    }
}
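And for the first option, a minimal sketch of user-controlled tool execution, adapted from the pattern in the docs; the chatModel and CustomerTools names are assumptions. Because you drive the tool loop yourself, you can persist every intermediate message:

```java
ChatOptions chatOptions = ToolCallingChatOptions.builder()
        .toolCallbacks(ToolCallbacks.from(new CustomerTools()))
        .internalToolExecutionEnabled(false)   // disable framework-controlled execution
        .build();

ToolCallingManager toolCallingManager = ToolCallingManager.builder().build();

Prompt prompt = new Prompt("Tell me more about customer 42", chatOptions);
ChatResponse chatResponse = chatModel.call(prompt);

while (chatResponse.hasToolCalls()) {
    ToolExecutionResult toolExecutionResult =
            toolCallingManager.executeToolCalls(prompt, chatResponse);
    // conversationHistory() contains the tool-call and tool-response messages,
    // so this is the point where you can store them in your ChatMemory.
    prompt = new Prompt(toolExecutionResult.conversationHistory(), chatOptions);
    chatResponse = chatModel.call(prompt);
}
```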
Comment From: rwankar
My current approach is to define a ToolWrapper and register it in the context, so that when a tool is called, it can persist messages of the 'tool' type.
Can you describe this approach in detail, please? I'm using the low-level API to define a FunctionToolCallback, and instead of handling the toolCallHistory in every tool, I would prefer to have this ToolWrapper so the code lives in just one place.
Comment From: rwankar
I wrote a wrapper around BiFunction<> to intercept the calls. However, toolContext.getToolCallHistory() returns null.
Also, if I have two tools that the LLM calls in succession for the same user prompt, is the tool history per tool or per thread?
For example, tool1 is called; at that moment the history will be null. Now tool2 is called (within the same thread). At that point, will it show the call to tool1, or will it be null since tool2 has never been called before?
I'm now attempting to try the "User-Controlled tool execution" steps.
Comment From: ls-rein-martha
Spring AI v1.0.0
I use manual tool execution as a workaround too. With the streaming chat model, the security context is also lost, so anyone who uses it will need to fall back to manual tool calling for now.
One of the problems is that when calling the prompt manually for a tool call, the chat memory duplicates the user message. To work around that, I added a CustomMessageChatMemoryAdvisor, basically a copy and paste of MessageChatMemoryAdvisor, with an isUserPrompt attribute so you can check whether the request is a real user prompt or a tool call.
Add it to the builder:
// Defaults to true so the advisor behaves like the original for real user prompts.
private boolean isUserPrompt = true;

public CustomMessageChatMemoryAdvisor.Builder isUserPrompt(boolean isUserPrompt) {
    this.isUserPrompt = isUserPrompt;
    return this;
}
Then add the code below in the before(...) method (just before the step 4 comment):
// 3.5. CUSTOM check: skip adding the user message to memory if it's not a user prompt
if (!this.isUserPrompt) {
    return processedChatClientRequest;
}
When you issue the prompt for the tool call, set it to false, since that request is a tool call and not a real user prompt.
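For instance (CustomMessageChatMemoryAdvisor is the copied class described above, so this builder API is an assumption):

```java
// Advisor instance for the manual tool-call round: skip re-saving the user message.
var toolCallAdvisor = CustomMessageChatMemoryAdvisor.builder(chatMemory)
        .isUserPrompt(false)
        .build();
```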
WARN: You will need to handle additional cases if you have a tool-approval feature.
Comment From: LiveNathan
This is a critical production blocker
I encountered the same issue, and this essentially makes Spring AI unusable for production tool-calling applications.
The Problem
When tool calls aren't persisted in chat memory, subsequent requests completely break:
1. First request: LLM calls tool correctly
2. Tool executes, response stored as "Channel 1 name changed"
3. Second request: LLM sees the previous response in history but not the tool call
4. LLM assumes it can skip the tool and just responds with text like "Channel 2 name changed"
5. No actual tool execution occurs
Impact
- Can't use the same tool twice in a conversation
- LLM essentially "lies" about performing actions it never executed
- Completely breaks any stateful tool-based workflows
- Forces us to disable chat memory entirely, degrading user experience
Reproduction
// First call works
"change channel 1 name to X" → Tool called ✓
// All subsequent calls fail
"change channel 2 name to Y" → No tool call, just text response ✗
This isn't an edge case - it's the core functionality of tool calling with conversation context. Without this fix, we're forced to:
- Disable memory (poor UX)
- Clear memory between each request (defeats the purpose)
- Switch to another framework
Priority: Critical - This blocks any production deployment of tool-calling applications.