Bug description
When adding a MessageChatMemoryAdvisor to my chat client, I would expect it to persist all messages, including the tool calls that the model makes. However, when getting the final chatResponse, I can only see the user message and the final model response in the chat memory.
Environment
Java 21, Spring AI M4, InMemoryChatMemory
Steps to reproduce
1. Create a chat client with at least one tool and add a MessageChatMemoryAdvisor with an InMemoryChatMemory instance passed into it.
2. Make a user query that invokes the tool.
3. Investigate the ChatMemory instance and see that only the user input and the model response are included in the history.
Expected behavior
I expected to also be able to see the model's tool calls in the ChatHistory.
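To make the setup concrete, here is a minimal reproduction sketch, assuming the M4-era API; the chatModel bean and the "changeChannelName" function name are placeholders, not part of the original report:

```java
// Sketch only: assumes Spring AI 1.0.0-M4 APIs and a registered
// "changeChannelName" function bean (hypothetical).
ChatMemory chatMemory = new InMemoryChatMemory();

ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
        .defaultFunctions("changeChannelName")
        .build();

chatClient.prompt()
        .user("Change channel 1 name to X")   // triggers the tool call
        .call()
        .content();

// Only the UserMessage and the final AssistantMessage show up here;
// the intermediate tool-call and tool-response messages are missing.
chatMemory.get("default", 100).forEach(System.out::println);
```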
Comment From: leooooow
Yes, I've noticed that in the current implementation, if the message type is 'tool', the process of calling the tool and sending the tool result back to the AI model is handled recursively and internally, without being exposed externally.
// org.springframework.ai.openai.OpenAiChatModel#internalCall
if (!isProxyToolCalls(prompt, this.defaultOptions)
        && isToolCall(response, Set.of(OpenAiApi.ChatCompletionFinishReason.TOOL_CALLS.name(),
                OpenAiApi.ChatCompletionFinishReason.STOP.name()))) {
    var toolCallConversation = handleToolCalls(prompt, response);
    // Recursively call the call method with the tool call message
    // conversation that contains the call responses.
    return this.internalCall(new Prompt(toolCallConversation, prompt.getOptions()), response);
}
My current approach is to define a ToolWrapper and register it in the context, so that when a tool is called, it can persist messages of the 'tool' type.
@Override
public String call(String functionInput, ToolContext toolContext) {
    // Delegate to the wrapped tool, then persist the result as a 'tool' message.
    String response = this.tool.call(functionInput, toolContext);
    saveToolResultMessage(toolContext, this.toolCallId, response);
    return response;
}
Comment From: ThomasVitale
Thanks for raising this issue. The memory implementation in ChatClient doesn't currently support storing the intermediate tool messages, but work is in progress to add that support.
In the meantime, there are two possible ways to get access to those tool-related messages:
- Handle the tool execution logic externally. This scenario is described in the docs: Framework-Controlled Tool Execution vs. User-Controlled Tool Execution. A minimal sketch of this option follows the code example below.
- Extract the tool messages from the ToolContext, as also suggested by @leooooow. This scenario is already supported by the framework, which populates the ToolContext with the entire conversation history up to the tool call. From within a tool, you can extract the message history directly as follows:
class CustomerTools {

    @Tool(description = "Retrieve customer information")
    Customer getCustomerInfo(Long id, ToolContext toolContext) {
        List<Message> toolCallHistory = toolContext.getToolCallHistory();
        // Do something with the toolCallHistory
        return customerRepository.findById(id, (String) toolContext.getContext().get("tenantId"));
    }
}
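And for the first option, a minimal sketch of user-controlled tool execution, adapted from the pattern in the docs; the chatModel and CustomerTools names are assumptions. Because you drive the tool loop yourself, you can persist every intermediate message:

```java
ChatOptions chatOptions = ToolCallingChatOptions.builder()
        .toolCallbacks(ToolCallbacks.from(new CustomerTools()))
        .internalToolExecutionEnabled(false)   // disable framework-controlled execution
        .build();

ToolCallingManager toolCallingManager = ToolCallingManager.builder().build();

Prompt prompt = new Prompt("Tell me more about customer 42", chatOptions);
ChatResponse chatResponse = chatModel.call(prompt);

while (chatResponse.hasToolCalls()) {
    ToolExecutionResult toolExecutionResult =
            toolCallingManager.executeToolCalls(prompt, chatResponse);
    // conversationHistory() contains the tool-call and tool-response messages,
    // so this is the point where you can store them in your ChatMemory.
    prompt = new Prompt(toolExecutionResult.conversationHistory(), chatOptions);
    chatResponse = chatModel.call(prompt);
}
```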
Comment From: rwankar
My current approach is to define a ToolWrapper and register it in the context, so that when a tool is called, it can persist messages of the 'tool' type.
Can you describe this approach in detail, please? I'm using the low-level API to define a FunctionToolCallback, and instead of handling the toolCallHistory in every tool, I would prefer to have this ToolWrapper so the code lives in just one place.
Comment From: rwankar
I wrote a wrapper around BiFunction<> to intercept the calls. However, toolContext.getToolCallHistory() returns null.
Also, if I have two tools that the LLM calls in succession for the same user prompt, is the tool history per tool or per thread?
For example, tool1 is called; at that moment the history will be null. Now tool2 is called (within the same thread). At that point, will it show the call to tool1, or will it be null since tool2 has never been called before?
I'm now attempting to try the "User-Controlled tool execution" steps.
Comment From: ls-rein-martha
Spring AI v1.0.0
I use manual tool execution as a workaround too. With the streaming chat model, the security context is also lost, so anyone who uses it will need to fall back to manual tool calling for now.
One of the problems is that when calling the prompt manually for a tool call, the chat memory duplicates the user message. To work around that, I added a CustomMessageChatMemoryAdvisor, basically a copy and paste of MessageChatMemoryAdvisor, with an isUserPrompt attribute so you can check whether the request is a real user prompt or a tool call.
Add it to the builder:
// Defaults to true so the advisor behaves like the original for real user prompts.
private boolean isUserPrompt = true;

public CustomMessageChatMemoryAdvisor.Builder isUserPrompt(boolean isUserPrompt) {
    this.isUserPrompt = isUserPrompt;
    return this;
}
Then add the code below in the before(...) method (just before the step 4 comment):
// 3.5. CUSTOM check: skip adding the user message to memory if it's not a user prompt
if (!this.isUserPrompt) {
    return processedChatClientRequest;
}
When you issue the prompt for the tool call, set it to false, since that request is a tool call and not a real user prompt.
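For instance (CustomMessageChatMemoryAdvisor is the copied class described above, so this builder API is an assumption):

```java
// Advisor instance for the manual tool-call round: skip re-saving the user message.
var toolCallAdvisor = CustomMessageChatMemoryAdvisor.builder(chatMemory)
        .isUserPrompt(false)
        .build();
```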
WARN: You will need to handle additional cases if you have a tool-approval feature.
Comment From: LiveNathan
This is a critical production blocker
I encountered the same issue, and this essentially makes Spring AI unusable for production tool-calling applications.
The Problem
When tool calls aren't persisted in chat memory, subsequent requests completely break:
1. First request: LLM calls tool correctly
2. Tool executes, response stored as "Channel 1 name changed"
3. Second request: LLM sees the previous response in history but not the tool call
4. LLM assumes it can skip the tool and just responds with text like "Channel 2 name changed"
5. No actual tool execution occurs
Impact
- Can't use the same tool twice in a conversation
- LLM essentially "lies" about performing actions it never executed
- Completely breaks any stateful tool-based workflows
- Forces us to disable chat memory entirely, degrading user experience
Reproduction
// First call works
"change channel 1 name to X" → Tool called ✓
// All subsequent calls fail
"change channel 2 name to Y" → No tool call, just text response ✗
This isn't an edge case - it's the core functionality of tool calling with conversation context. Without this fix, we're forced to:
- Disable memory (poor UX)
- Clear memory between each request (defeats the purpose)
- Switch to another framework
Priority: Critical - This blocks any production deployment of tool-calling applications.