Problem

When using Spring AI's Google GenAI integration with tool calling and chat memory, the Google API sometimes returns multiple candidates/results, some of which contain empty text content. Spring AI adds all of these results (including the empty ones) to the conversation history via MessageChatMemoryAdvisor. When the empty messages are replayed in a subsequent request, Google's API rejects the request with a 400 error stating that it "must include at least one parts field".

Environment

  • Spring AI Version: 1.1.0-M3
  • Spring Boot Version: 3.5.6
  • Java Version: 25
  • Google GenAI Model: gemini-2.0-flash-001

Reproduction Steps

  1. Clone the reproduction repository: https://github.com/LiveNathan/cheapest-llm-tool-calling
  2. Set up environment variable: GEMINI_API_KEY
  3. Remove the EmptyMessageFilterAdvisor from GoogleNativeProvider
  4. Enable logging in logback-test.xml
  5. Run the test: ./mvnw test -Dtest=LlmToolCallingBenchmarkTest
  6. The test will fail on the 4th prompt in a multi-turn conversation with tool calling

Code to Reproduce

// Simplified version showing the issue
MockMixingConsoleService mockConsoleService = new MockMixingConsoleService();

GoogleGenAiChatModel chatModel = GoogleGenAiChatModel.builder()
    .genAiClient(Client.builder().apiKey(apiKey).build())
    .defaultOptions(GoogleGenAiChatOptions.builder()
        .model("gemini-2.0-flash-001")
        .temperature(0.1)
        .build())
    .build();

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(
        MessageChatMemoryAdvisor.builder(
            MessageWindowChatMemory.builder().build()
        ).build(),
        new SimpleLoggerAdvisor())
    .build();

// Multi-turn conversation with tool calling
chatClient.prompt()
    .user("Name channels 1-7: Kick, Snare, Hi-hat, Tom 1, Tom 2, Overheads L, Overheads R")
    .tools(mockConsoleService)
    .call();

chatClient.prompt()
    .user("Add bass on channel 8 and guitar on channel 9")
    .tools(mockConsoleService)
    .call();

chatClient.prompt()
    .user("What's on channel 6? Now swap it with what's on channel 9")
    .tools(mockConsoleService)
    .call();

// This 4th call fails
chatClient.prompt()
    .user("Add lead vocal on channel 12, backing vocals on 13-14")
    .tools(mockConsoleService)
    .call();

Error Output

2025-10-05T17:41:49.583-05:00 DEBUG --- SimpleLoggerAdvisor : response: {
  "results" : [ {
    "output" : {
      "text" : "OK. Channel 6 is currently assigned to Overheads L..."
    }
  }, {
    "output" : {
      "text" : ""  // <-- Empty text in second result
    }
  } ]
}

// Next request includes the empty message:
AssistantMessage [messageType=ASSISTANT, toolCalls=[], textContent=, ...]

// Google API rejects this:
java.lang.RuntimeException: Failed to generate content
Caused by: com.google.genai.errors.ClientException: 400 . Unable to submit request 
because it must include at least one parts field, which describes the prompt input.

Root Cause Analysis

  1. Google's Gemini API returns multiple candidates/results for a single request, particularly when tool calling is involved
  2. Some of these results have empty text content
  3. Spring AI's GoogleGenAiChatModel returns all results in the ChatResponse, including the empty ones
  4. MessageChatMemoryAdvisor stores every returned assistant message, so the empty ones end up in the chat memory
  5. When the empty AssistantMessage with textContent="" is sent in the next request, Google's API rejects it with a 400 error

Expected Behavior

Spring AI should do one of the following:

  • Filter out empty assistant messages before adding them to the conversation history
  • Use only the first non-empty result from Google's response
  • Handle multiple candidates properly without breaking subsequent requests
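
To make the "first non-empty result" option concrete, here is a minimal, framework-free sketch. It works on plain strings rather than Spring AI's Generation objects, and the class and method names (FirstNonEmptyCandidate, firstNonBlank) are hypothetical, not part of Spring AI or the Google SDK.

```java
import java.util.List;
import java.util.Optional;

public class FirstNonEmptyCandidate {

    /** Returns the first candidate text that is neither null nor blank, if any. */
    public static Optional<String> firstNonBlank(List<String> candidateTexts) {
        return candidateTexts.stream()
                .filter(text -> text != null && !text.isBlank())
                .findFirst();
    }

    public static void main(String[] args) {
        // Mirrors the logged response: one real answer followed by an empty result.
        List<String> candidates =
                List.of("OK. Channel 6 is currently assigned to Overheads L...", "");
        System.out.println(firstNonBlank(candidates).orElse("<no usable candidate>"));
    }
}
```

A real implementation would also need to keep results that carry tool calls but no text; this sketch only looks at text.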

Workaround

Create a custom advisor to filter empty messages:

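import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClientRequest;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.advisor.api.CallAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAdvisorChain;
import org.springframework.ai.chat.client.advisor.api.StreamAdvisor;
import org.springframework.ai.chat.client.advisor.api.StreamAdvisorChain;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import reactor.core.publisher.Flux;

public class EmptyMessageFilterAdvisor implements CallAdvisor, StreamAdvisor {
    private static final Logger logger = LoggerFactory.getLogger(EmptyMessageFilterAdvisor.class);

    @Override
    public String getName() {
        return "EmptyMessageFilterAdvisor";
    }

    @Override
    public int getOrder() {
        // Advisors are sorted by order. MessageChatMemoryAdvisor should default to a
        // much lower value, so this advisor sits closer to the model and filters each
        // response before the memory advisor stores it.
        return 0;
    }

    /** True when the message carries neither text nor tool calls. */
    private static boolean isEmpty(AssistantMessage message) {
        String text = message.getText();
        return (text == null || text.isBlank()) && !message.hasToolCalls();
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        ChatClientResponse response = chain.nextCall(request);
        ChatResponse chatResponse = response.chatResponse();
        if (chatResponse == null) {
            return response;
        }

        List<Generation> filteredResults = chatResponse.getResults().stream()
                .filter(generation -> {
                    AssistantMessage output = generation.getOutput();
                    boolean empty = output != null && isEmpty(output);
                    if (empty) {
                        logger.debug("Filtering out empty assistant message");
                    }
                    return !empty;
                })
                .toList();

        // Rebuild the response only if something was actually removed.
        if (filteredResults.size() != chatResponse.getResults().size()) {
            ChatResponse filteredChatResponse = new ChatResponse(
                    filteredResults,
                    chatResponse.getMetadata()
            );
            return ChatClientResponse.builder()
                    .chatResponse(filteredChatResponse)
                    .context(response.context()) // keep the advisor context from the original response
                    .build();
        }

        return response;
    }

    @Override
    public Flux<ChatClientResponse> adviseStream(ChatClientRequest request, StreamAdvisorChain chain) {
        // Drop streamed chunks whose assistant message is empty; pass everything else through.
        return chain.nextStream(request)
                .filter(response -> {
                    ChatResponse chatResponse = response.chatResponse();
                    if (chatResponse == null || chatResponse.getResult() == null) {
                        return true;
                    }
                    AssistantMessage output = chatResponse.getResult().getOutput();
                    return output == null || !isEmpty(output);
                });
    }
}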

Then add it to the ChatClient:

ChatClient.builder(chatModel)
    .defaultAdvisors(
        new EmptyMessageFilterAdvisor(),  // Sorted by getOrder(); must see the response before the memory advisor stores it
        MessageChatMemoryAdvisor.builder(MessageWindowChatMemory.builder().build()).build(),
        new SimpleLoggerAdvisor())
    .build();
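
Why the filter takes effect depends on advisor ordering rather than on its position in the defaultAdvisors list. The sketch below is a simplified simulation, not Spring AI code: it assumes advisors are sorted by ascending order value for the request and unwound in reverse for the response, and it assumes MessageChatMemoryAdvisor's default order is Integer.MIN_VALUE + 1000. The NamedAdvisor record and responseOrder method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AdvisorOrderSketch {

    /** Hypothetical stand-in for an advisor: just a name and an order value. */
    public record NamedAdvisor(String name, int order) {}

    /** Returns advisor names in the sequence in which they would see the response. */
    public static List<String> responseOrder(List<NamedAdvisor> advisors) {
        List<NamedAdvisor> sorted = new ArrayList<>(advisors);
        // Request phase: lowest order value runs first.
        sorted.sort(Comparator.comparingInt(NamedAdvisor::order));
        // Response phase: the chain unwinds, so the sequence is reversed.
        List<String> names = new ArrayList<>();
        for (int i = sorted.size() - 1; i >= 0; i--) {
            names.add(sorted.get(i).name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<NamedAdvisor> advisors = List.of(
                new NamedAdvisor("EmptyMessageFilterAdvisor", 0),
                // Assumed default order of the memory advisor.
                new NamedAdvisor("MessageChatMemoryAdvisor", Integer.MIN_VALUE + 1000));
        // Prints [EmptyMessageFilterAdvisor, MessageChatMemoryAdvisor]
        System.out.println(responseOrder(advisors));
    }
}
```

Under these assumptions the filter handles the response first, so the memory advisor only ever stores the filtered results.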

Suggested Fix

The issue could be fixed in GoogleGenAiChatModel in one of these ways:

  1. Filter out empty results before returning the ChatResponse
  2. Include only the first non-empty result in the response
  3. Ensure empty assistant messages are handled correctly when constructing requests to Google's API
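
As an illustration of the request-side option, the following framework-free sketch drops assistant messages that have neither text nor tool calls before a request would be built. HistoryMessage is a hypothetical, simplified stand-in for Spring AI's message types, and none of these names come from GoogleGenAiChatModel.

```java
import java.util.List;

public class HistorySanitizer {

    /** Hypothetical, simplified stand-in for a chat message in the history. */
    public record HistoryMessage(String role, String text, List<String> toolCalls) {}

    /** An assistant message with no text and no tool calls has nothing to send. */
    public static boolean isEmptyAssistantMessage(HistoryMessage message) {
        boolean blank = message.text() == null || message.text().isBlank();
        return "assistant".equals(message.role()) && blank && message.toolCalls().isEmpty();
    }

    /** Returns the history without empty assistant messages, preserving order. */
    public static List<HistoryMessage> dropEmptyAssistantMessages(List<HistoryMessage> history) {
        return history.stream()
                .filter(message -> !isEmptyAssistantMessage(message))
                .toList();
    }

    public static void main(String[] args) {
        List<HistoryMessage> history = List.of(
                new HistoryMessage("user", "What's on channel 6?", List.of()),
                new HistoryMessage("assistant", "OK. Channel 6 is currently assigned to Overheads L...", List.of()),
                new HistoryMessage("assistant", "", List.of())); // the empty result from the log
        System.out.println(dropEmptyAssistantMessages(history).size() + " messages kept");
    }
}
```

Messages that carry tool calls are kept even when their text is empty, since they still have content to send.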

Additional Notes

  • Setting candidateCount(1) does not resolve the issue, as Google can still return multiple results with one being empty
  • This issue only manifests in multi-turn conversations with tool calling and memory enabled
  • The issue has been confirmed with the gemini-2.0-flash-001 model
  • Full reproduction code and benchmark tests are available at: https://github.com/LiveNathan/cheapest-llm-tool-calling