Problem Statement

The current architecture implements tool calling as internal recursion within ChatModel implementations, which causes several critical issues that have been reported in #2101 and #4979:

1. Tool Call Messages Are Lost

When multiple rounds of tool execution occur, intermediate messages (AssistantMessage with toolCalls and ToolResponseMessage) are not persisted in the final ChatResponse.

Current Flow:

ChatClient → ChatModel.call()
              ↓ (Hidden recursion)
              internalCall() → executeTools() → internalCall() → ...
              ↓
              ChatResponse (only final result, tool messages lost)

Impact:

  • ChatMemory cannot track tool executions (#2101)
  • Advisors cannot observe the tool-calling process (#4979)
  • Debugging multi-step tool executions is nearly impossible
  • Audit trails are incomplete

2. Poor Observability

Advisors operate at the ChatClient level but cannot intercept tool executions happening inside ChatModel:

public class LoggingAdvisor implements CallAdvisor {
    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        ChatClientResponse response = chain.nextCall(request);

        // ❌ Cannot see:
        // - Which tools were called
        // - Tool arguments
        // - Tool execution results
        // - Tool execution time

        return response;
    }
}

3. Increased Implementation Complexity

Every ChatModel implementation must handle tool calling logic independently:

  • OpenAiChatModel - 100+ lines of tool handling code
  • OllamaChatModel - duplicate logic
  • AnthropicChatModel - duplicate logic
  • VertexAiGeminiChatModel - duplicate logic

This violates DRY principles and increases maintenance burden.
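To make the duplication concrete, here is a minimal standalone sketch (plain Java, no Spring types; `ModelApi`, `ModelResponse`, and `internalCall` are invented names) of the loop that every ChatModel currently re-implements:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the real types; all names are illustrative only.
class InternalToolLoopSketch {

    record ModelResponse(List<String> toolCalls, String text) { }

    interface ModelApi {
        ModelResponse complete(List<String> messages);
    }

    // The pattern each ChatModel duplicates: call the API, execute any
    // requested tools, append the results, and call the API again.
    static ModelResponse internalCall(ModelApi api, List<String> messages) {
        ModelResponse response = api.complete(messages);
        while (!response.toolCalls().isEmpty()) {
            for (String toolCall : response.toolCalls()) {
                // "Execute" the tool and append its result to the history
                messages.add("tool-result:" + toolCall);
            }
            // Hidden recursion: these intermediate messages never leave
            // this method, so callers only ever see the final text
            response = api.complete(messages);
        }
        return response;
    }
}
```

Because the loop lives inside each model implementation, the tool-result messages it appends are invisible to advisors and to ChatMemory, which is exactly the problem described in sections 1 and 2.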

4. Inconsistent with Spring Ecosystem

Spring's interceptor/filter patterns allow full lifecycle observation, but Spring AI's current design hides critical processing steps:

// Spring MVC - full lifecycle visibility ✅
public class LoggingInterceptor implements HandlerInterceptor {
    public void preHandle() { }
    public void postHandle() { }    // Can observe all processing
    public void afterCompletion() { }
}

// Spring AI - hidden tool execution ❌
public class LoggingAdvisor implements CallAdvisor {
    // Cannot observe tool execution happening in ChatModel
}

Proposed Solution

Move tool execution logic from ChatModel internal recursion to a dedicated ToolExecutionAdvisor at the ChatClient layer.

Architecture Change

Before (Internal Recursion):

ChatClient (Advisors)
    ↓
ChatModel.call()
    ↓ (Black box)
    internalCall() → Tool Execution → internalCall() (recursive)
    ↓
ChatResponse (incomplete)

After (External Recursion):

ChatClient
    ↓
Advisor Chain:
    1. MessageChatMemoryAdvisor (load history)
    2. ToolExecutionAdvisor (NEW!)
        ↓
        ChatModel.call() (simple AI call, no tool handling)
        ↓
        Detect tool calls?
            Yes → Execute tools → Generate ToolResponseMessage
                → Recursive call to ChatClient (external recursion)
            No → Return response
    3. MessageChatMemoryAdvisor (save history) ✅
    4. LoggingAdvisor (observe everything) ✅
    ↓
ChatResponse (complete with all messages)

Pseudocode Implementation

1. Simplified ChatModel (No Tool Handling)

public class OpenAiChatModel implements ChatModel {

    @Override
    public ChatResponse call(Prompt prompt) {
        // Only call OpenAI API, no tool execution logic
        ChatCompletionRequest request = createRequest(prompt);
        ChatCompletion completion = openAiApi.chatCompletion(request);

        // Return response directly, even if it contains tool calls
        return buildChatResponse(completion);

        // ✅ Removed: 100+ lines of tool execution and recursive logic
    }
}

2. New ToolExecutionAdvisor

public class ToolExecutionAdvisor implements CallAdvisor {

    private final ToolCallingManager toolCallingManager;
    private final int maxIterations;

    @Override
    public int getOrder() {
        // Execute after MessageChatMemoryAdvisor loads history
        return MessageChatMemoryAdvisor.DEFAULT_ORDER + 10;
    }

    @Override
    public ChatClientResponse adviseCall(
            ChatClientRequest request, 
            CallAdvisorChain chain) {

        return executeWithToolSupport(request, chain, 0);
    }

    private ChatClientResponse executeWithToolSupport(
            ChatClientRequest request,
            CallAdvisorChain chain,
            int iteration) {

        // Prevent infinite loops
        if (iteration >= maxIterations) {
            throw new IllegalStateException("Max tool iterations exceeded: " + maxIterations);
        }

        // Call next advisor (eventually reaches ChatModel)
        ChatClientResponse response = chain.nextCall(request);

        // Check for tool calls
        AssistantMessage message = response.chatResponse()
            .getResult()
            .getOutput();

        if (CollectionUtils.isEmpty(message.getToolCalls())) {
            // No tool calls, return response
            return response;
        }

        // Execute tools
        ToolExecutionResult toolResult = toolCallingManager.executeToolCalls(
            request.prompt(), 
            response.chatResponse()
        );

        // Build conversation history with tool messages
        List<Message> conversationHistory = toolResult.conversationHistory();

        // Update context (allows MessageChatMemoryAdvisor to save tool messages)
        Map<String, Object> updatedContext = new HashMap<>(request.context());
        updatedContext.put("conversationHistory", conversationHistory);

        if (toolResult.returnDirect()) {
            // Return tool result directly
            return ChatClientResponse.builder()
                .from(response)
                .chatResponse(buildToolResultResponse(toolResult))
                .advisorContext(updatedContext)
                .build();
        }

        // Recursive call with tool results (EXTERNAL RECURSION)
        ChatClientRequest newRequest = ChatClientRequest.builder()
            .from(request)
            .prompt(new Prompt(conversationHistory, request.prompt().getOptions()))
            .context(updatedContext)
            .build();

        // Re-invoke the downstream chain with the augmented prompt
        // (assumes the advisor chain can be traversed again)
        return executeWithToolSupport(newRequest, chain, iteration + 1);
    }
}
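The maxIterations guard can be exercised in isolation. The sketch below (plain Java, no Spring types; names are invented) mirrors the counting logic above: each round of tool execution increments the counter, and the loop aborts once the limit is reached instead of recursing forever:

```java
class IterationGuardSketch {

    // Returns the number of tool rounds performed before the model
    // produced a final answer, or throws if the limit is hit first.
    static int runToolRounds(int maxIterations, int roundsUntilFinalAnswer) {
        int iteration = 0;
        while (true) {
            // Prevent infinite loops, as in the proposed advisor
            if (iteration >= maxIterations) {
                throw new IllegalStateException("Max tool iterations exceeded: " + maxIterations);
            }
            if (iteration >= roundsUntilFinalAnswer) {
                return iteration; // the model produced a final answer
            }
            iteration++; // one more round of tool execution
        }
    }
}
```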

3. Configuration

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(
            ChatModel chatModel,
            ChatMemory chatMemory,
            ToolCallingManager toolCallingManager) {

        return ChatClient.builder(chatModel)
            .defaultAdvisors(
                // 1. Load conversation history
                MessageChatMemoryAdvisor.builder()
                    .chatMemory(chatMemory)
                    .order(100)
                    .build(),

                // 2. Handle tool execution (NEW!)
                ToolExecutionAdvisor.builder(toolCallingManager)
                    .maxIterations(10)
                    .order(110)
                    .build(),

                // 3. Logging - now can observe everything!
                //    (its getOrder() returns 120)
                new LoggingAdvisor()
            )
            .build();
    }
}

Benefits

✅ 1. Complete Message Persistence

All tool-related messages automatically flow through MessageChatMemoryAdvisor:

User: "Set an alarm for 10 AM"
  ↓ (saved by Memory Advisor)
Assistant: ToolCall[SetAlarm(time="10:00")]
  ↓ (saved by Memory Advisor)
Tool: "Alarm set successfully"
  ↓ (saved by Memory Advisor)
Assistant: "I've set your alarm for 10 AM"
  ↓ (saved by Memory Advisor)

Fixes #2101
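The flow above can be modeled with a plain list standing in for ChatMemory (a standalone sketch; no Spring types, all names illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class MemoryFlowSketch {

    // With external recursion every message crosses the advisor chain,
    // so a memory advisor can persist the full conversation, including
    // the tool-call request and the tool result.
    static List<String> runConversation() {
        List<String> memory = new ArrayList<>();
        memory.add("user: Set an alarm for 10 AM");
        memory.add("assistant: ToolCall[SetAlarm(time=10:00)]"); // previously lost
        memory.add("tool: Alarm set successfully");              // previously lost
        memory.add("assistant: I've set your alarm for 10 AM");
        return memory;
    }
}
```

Under the current internal recursion, only the first and last of these four entries would ever reach ChatMemory.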

✅ 2. Full Observability

Advisors can now observe the complete tool execution lifecycle:

public class LoggingAdvisor implements CallAdvisor {
    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        ChatClientResponse response = chain.nextCall(request);

        // ✅ Can now see:
        AssistantMessage message = response.chatResponse().getResult().getOutput();
        message.getToolCalls().forEach(toolCall -> {
            log.info("Tool: {}, Args: {}", toolCall.name(), toolCall.arguments());
        });

        // ✅ Can see tool results from context
        @SuppressWarnings("unchecked")
        List<Message> history = (List<Message>) response.advisorContext().get("conversationHistory");
        // Extract and log the ToolResponseMessages in the history

        return response;
    }
}

Fixes #4979

✅ 3. Simplified ChatModel Implementations

Each model implementation reduces from ~150 lines to ~20 lines:

// Before: OpenAiChatModel (150+ lines)
- API call
- Tool detection
- Tool execution
- Recursive calls
- Error handling

// After: OpenAiChatModel (20 lines)
- API call
- Return response

✅ 4. Flexible Tool Execution Strategies

Users can customize tool execution behavior:

ToolExecutionAdvisor.builder(toolManager)
    .maxIterations(5)
    .timeout(Duration.ofSeconds(30))
    .onToolCall(toolCall -> auditLog.record(toolCall))
    .onToolError(error -> alertService.send(error))
    .build()

✅ 5. Tool Approval Workflows

Easy to implement approval flows:

public class ToolApprovalAdvisor implements CallAdvisor {
    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        ChatClientResponse response = chain.nextCall(request);

        // Check for tool calls in the raw model response
        AssistantMessage message = response.chatResponse().getResult().getOutput();
        if (!CollectionUtils.isEmpty(message.getToolCalls())) {
            // Request user approval before the tools are executed
            if (!userApprovalService.approve(message.getToolCalls())) {
                throw new ToolExecutionDeniedException();
            }
        }

        return response;
    }
}

// Order: ToolExecutionAdvisor (110) wraps ToolApprovalAdvisor (115), so the
// approval check sees the raw model response (with its tool calls) before
// ToolExecutionAdvisor executes the requested tools.

✅ 6. Consistent with Spring Patterns

Aligns with Spring's interceptor/filter design philosophy where all processing is observable.

Migration Path

Phase 1: Introduce New API (Backward Compatible)

  1. Add ToolExecutionAdvisor
  2. Keep existing ChatModel tool handling for backward compatibility
  3. Users can opt-in to new approach
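As a hedged sketch of what the opt-in could look like: Spring AI's ToolCallingChatOptions already exposes an internalToolExecutionEnabled flag that disables the model-internal tool loop; ToolExecutionAdvisor below is the API proposed in this issue, not an existing class.

```java
// Opt-in sketch: internal execution stays the default elsewhere, but is
// disabled for this client so the (proposed, not yet existing)
// ToolExecutionAdvisor can drive the recursion instead.
ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultOptions(ToolCallingChatOptions.builder()
        // Turn off the ChatModel's internal tool-execution recursion
        .internalToolExecutionEnabled(false)
        .build())
    .defaultAdvisors(
        ToolExecutionAdvisor.builder(toolCallingManager)
            .maxIterations(10)
            .build())
    .build();
```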

Phase 2: Deprecate Old Approach

@Deprecated(since = "2.0", forRemoval = true)
protected ChatResponse handleToolCalls(Prompt prompt, ChatResponse response) {
    // Old internal tool handling logic
}

Phase 3: Remove (Spring AI 3.0)

Completely remove internal tool calling logic from ChatModel implementations.

Related Issues

  • #2101 - Tool call messages not persisted in chat memory (open for 24 months)

  • #4979 - Advisors cannot observe tool execution (open for 8 months)

Conclusion

This architectural change transforms tool execution from a hidden internal process to a visible, controllable external process, providing:

  • Complete message persistence
  • Full observability
  • Simplified implementation
  • Flexible customization
  • Consistency with Spring design patterns

This is not just a bug fix; it is a fundamental architectural improvement that resolves multiple long-standing issues while simplifying the framework.

Comment From: myifeng

+1

Comment From: ilayaperumalg

@Tangerg Thanks for the detailed inputs.

I assume you are already aware of the existing ToolCallAdvisor which takes care of handling the tool execution via recursive advisor approach.

While Spring AI, by default, uses the underlying model's internal recursive approach, one can switch to the ToolCallAdvisor to make use of advisor layer advantages.

Comment From: myifeng

> @Tangerg Thanks for the detailed inputs.
>
> I assume you are already aware of the existing ToolCallAdvisor which takes care of handling the tool execution via recursive advisor approach.
>
> While Spring AI, by default, uses the underlying model's internal recursive approach, one can switch to the ToolCallAdvisor to make use of advisor layer advantages.

@ilayaperumalg Help~ #5013