🐛 Bug description
When using streaming mode (`Flux<ChatResponse>`), the `toolCalls` property of the `AssistantMessage` built from the model response is not populated, so the message saved to chat memory loses the tool call information even though the model actually requested tool calls.

This behavior becomes particularly problematic when `internalToolExecutionEnabled=false`, where tool execution is intended to be controlled manually by the user. In such workflows, it is necessary to retrieve the tool call information from the last assistant message in memory, but that data is missing because of the issue above.

Note: this issue is not caused by setting `internalToolExecutionEnabled=false`. It is only exacerbated by it, since downstream components rely on consistent `toolCalls` data across both streaming and non-streaming modes.
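For concreteness, here is a minimal sketch of the manual-execution step that this breaks, assuming the `chatMemory` bean from the MCRE below; the method name `pendingToolCalls` is illustrative, not framework API:

```java
import java.util.List;

import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;

// Read the tool calls the model requested from the last assistant message
// that MessageChatMemoryAdvisor persisted for this conversation.
List<AssistantMessage.ToolCall> pendingToolCalls(ChatMemory chatMemory, String conversationId) {
    List<Message> history = chatMemory.get(conversationId);
    Message last = history.get(history.size() - 1);
    if (last instanceof AssistantMessage assistant) {
        // Expected: the tool calls from the model response.
        // Actual in streaming mode: an empty list, so manual execution never starts.
        return assistant.getToolCalls();
    }
    return List.of();
}
```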
💻 Environment

- Spring AI Version: 1.0.0
- Java Version: 17
- Model: Qwen2.5-72B-Instruct
- Usage Mode: Streaming (`Flux<ChatResponse>`)
- Tool Execution Mode: `internalToolExecutionEnabled=false`
- Vector store: Not involved
🪜 Steps to reproduce

1. Configure a Spring AI chat client with streaming mode enabled.
2. Ensure that the response from the model includes tool calls.
3. Use `org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor`.
4. Inspect the `AssistantMessage` → `toolCalls` is missing.
5. Attempt to retrieve the tool calls from memory (e.g., `ChatMemory#get`) → it fails; see the sketch after this list.
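A sketch of steps 4 and 5, assuming the `chatMemory` bean and the `callWithMemory` method from the MCRE below; the user text is arbitrary:

```java
// Drain the stream so that MessageChatMemoryAdvisor persists the assistant reply.
callWithMemory(conversationId, "What's the weather in Berlin?").blockLast();

// Locate the most recent AssistantMessage the advisor stored.
AssistantMessage stored = chatMemory.get(conversationId).stream()
        .filter(AssistantMessage.class::isInstance)
        .map(AssistantMessage.class::cast)
        .reduce((first, second) -> second)
        .orElseThrow();

// Streaming mode prints [] even though the raw model response contained tool
// calls; the same exchange via .call() prints the expected ToolCall entries.
System.out.println(stored.getToolCalls());
```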
✅ Expected behavior

The `toolCalls` property from the `GenerationMetadata` should be correctly propagated to the resulting `AssistantMessage`, regardless of whether streaming or non-streaming mode is used. This ensures consistent memory behavior and supports downstream workflows such as manual tool execution confirmation.
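Stated as a test-style sketch (`assertEquals` is JUnit; `lastAssistantMessage` is a hypothetical helper that returns the newest `AssistantMessage` in memory, along the lines of the `pendingToolCalls` sketch above):

```java
// The same user turn, sent once via .call() and once via .stream(), should
// leave chat memory equivalent with respect to tool calls.
List<AssistantMessage.ToolCall> blockingCalls =
        lastAssistantMessage(chatMemory, blockingConversationId).getToolCalls();
List<AssistantMessage.ToolCall> streamingCalls =
        lastAssistantMessage(chatMemory, streamingConversationId).getToolCalls();

// Today this fails: streamingCalls is empty while blockingCalls is populated.
assertEquals(blockingCalls, streamingCalls);
```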
🧪 Minimal Complete Reproducible example
My AI config:

```java
@Bean
public ChatMemoryRepository chatMemoryRepository() {
    return new InMemoryChatMemoryRepository();
}

@Bean
public ChatMemory chatMemory(ChatMemoryRepository chatMemoryRepository) {
    return MessageWindowChatMemory.builder()
        .maxMessages(10)
        .chatMemoryRepository(chatMemoryRepository)
        .build();
}

@Bean
public OpenAiChatModel chatModel(OpenAiApi openAiApi, ToolCallingManager toolCallingManager,
        List<AgentToolsProvider> agentToolsProviders) {
    AgentToolsProvider[] providers = agentToolsProviders.toArray(new AgentToolsProvider[0]);
    ToolCallback[] toolCallbacks = ToolCallbacks.from((Object[]) providers);
    OpenAiChatOptions chatOptions = OpenAiChatOptions.builder()
        .temperature(0.6)
        .model("qwen2.5-72b-instruct")
        // The application, not the framework, executes the tools.
        .internalToolExecutionEnabled(false)
        .toolCallbacks(toolCallbacks)
        .build();
    return OpenAiChatModel.builder()
        .defaultOptions(chatOptions)
        .toolCallingManager(toolCallingManager)
        .openAiApi(openAiApi)
        .build();
}

@Bean
ChatClient chatClient(OpenAiChatModel chatModel, ChatMemory chatMemory) {
    return ChatClient.builder(chatModel)
        .defaultAdvisors(
            new SimpleLoggerAdvisor(),
            MessageChatMemoryAdvisor.builder(chatMemory).build()
        )
        // systemResource is the system-prompt Resource injected elsewhere in this config class.
        .defaultSystem(systemResource)
        .build();
}
```
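For reference, the manual execution step this configuration is built for could look roughly like this (a sketch: `chatResponse` stands for the final aggregated response of the exchange, and `ToolExecutionResult` / `executeToolCalls` are the Spring AI tool-calling APIs):

```java
// With internalToolExecutionEnabled(false), the framework hands the tool
// calls back to the application; we execute them and send the results back.
if (chatResponse.hasToolCalls()) {
    ToolExecutionResult toolExecutionResult = toolCallingManager.executeToolCalls(prompt, chatResponse);
    Prompt followUp = new Prompt(toolExecutionResult.conversationHistory(), chatOptions);
    // ... call the model again with followUp
}
```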
Just chat with the AI:

```java
private Flux<ChatResponse> callWithMemory(String conversationId, String userText) {
    // Seed the prompt with the conversation history, then stream the reply.
    Prompt promptWithMemory = new Prompt(chatMemory.get(conversationId), chatModel.getDefaultOptions());
    return client.prompt(promptWithMemory)
        .user(userText)
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
        .stream()
        .chatResponse();
}
```
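Draining this Flux is what triggers MessageChatMemoryAdvisor to persist the aggregated AssistantMessage; in streaming mode, that persisted message is the one missing its toolCalls.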
Comment From: WOONBE
I'd like to contribute to this issue!
Comment From: ilayaperumalg
@EkkoWH Thanks for the detailed report. @WOONBE Thanks for the PR fixing it.
Comment From: EkkoWH

> I'd like to contribute to this issue!

@WOONBE Thank you very much for your PR!
Comment From: EkkoWH

> @EkkoWH Thanks for the detailed report. @WOONBE Thanks for the PR fixing it.

@ilayaperumalg Thanks for your reply!