Bug description
This is linked to the PR https://github.com/spring-projects/spring-ai/pull/2029 where I try to fix this.
Gemini can respond with multiple parts, some containing text, some containing tool calls. A response can look like:
part 1: message: " I understand what you want to do, I will check ..."
part 2: function call
part 3: message "if it is not enough you will need to check yourself .... "
Today, the response is processed by Spring AI in such a way that functions are executed only if all parts of the response are function calls.
It seems it is OK to switch the code at https://github.com/spring-projects/spring-ai/blob/4fc6edd80c42801ab8aec6530c34a32c73604390/models/spring-ai-vertex-ai-gemini/src/main/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatModel.java#L598 to use `anyMatch` instead of `allMatch` when checking for function calls in the response. As long as there is at least one function call in the response, it should be executed.
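To illustrate the difference, here is a minimal, self-contained sketch. The `Part` record below is a hypothetical stand-in for the SDK's part type, not the real Vertex AI class; it only shows why `allMatch` misses the function call in a mixed response while `anyMatch` finds it:

```java
import java.util.List;

public class AnyMatchDemo {

    // Hypothetical stand-in for a Gemini response part: either text or a function call.
    record Part(String text, String functionCallName) {
        boolean hasFunctionCall() {
            return functionCallName != null;
        }
    }

    public static void main(String[] args) {
        // A mixed response like the one described: one text part, one function call part.
        List<Part> parts = List.of(
                new Part("I understand what you want to do, I will check ...", null),
                new Part(null, "getCurrentWeather"));

        // Current behavior: allMatch is false because of the text part,
        // so no tool is executed even though Gemini asked for one.
        System.out.println(parts.stream().allMatch(Part::hasFunctionCall)); // false

        // Proposed behavior: anyMatch detects the function call, so it gets executed.
        System.out.println(parts.stream().anyMatch(Part::hasFunctionCall)); // true
    }
}
```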
Also, the response sent back to the system is filtered; I think the response should send back all parts returned by the API ( https://github.com/spring-projects/spring-ai/blob/4fc6edd80c42801ab8aec6530c34a32c73604390/models/spring-ai-vertex-ai-gemini/src/main/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatModel.java#L600 ).
Environment
Spring AI 1.0.0-M6, Gemini API (2.0 Flash)
Steps to reproduce
Use Spring AI with a Gemini model, add the "CurrentWeatherService" tool, and send the prompt:
The procedure can be either to check the temperature for a city and then if the temperature you will need to update the fan speed from 0-100 depending on the temperature 0-30. Explain the procedure is more clean word and process it for the city Tokyo
It should trigger a response with 2 parts: 1 small message and 1 function call.
Expected behavior
Spring AI should execute a function call if one appears in any part of the response.
Also, the returned response should contain all generations and all parts. Currently, when I push a prompt, the model asks for a function call, and the prompt with the function execution result is pushed back, only the last generation is returned. This means that if we continue the chat, some data is missing.
Minimal Complete Reproducible example
Same as "Steps to reproduce" above.
Comment From: thomasflad
@GregoireW Could this be related to the issue I have here
Comment From: GregoireW
@thomasflad to be 100% sure, you would have to set a breakpoint on the code I referenced in the first message and check.
But if you get a text message (possibly empty) as the response to your call, then no function call will have been made even if Gemini asked for one. If that is what you see, I guess this could be related to your issue.
Comment From: mands
Hitting the same issue: when Gemini returns a text part along with a function call part, only the text part is returned in the `AssistantMessage`.

The change above does work, but results in dropping the text message AFAIK. Luckily the core method, `VertexAiGeminiChatModel#responseCandidateToGeneration`, is `protected`, so it's possible to override it and provide your own implementation of `VertexAiGeminiChatModel`.

The following works for me, returning all text and function call parts from a Vertex response:
```java
@Override
protected List<Generation> responseCandidateToGeneration(Candidate candidate) {
    // TODO - the candidateIndex (e.g. choice) must be assigned to the generation
    int candidateIndex = candidate.getIndex();
    var candidateFinishReason = candidate.getFinishReason();

    Map<String, Object> messageMetadata = Map.of(
            "candidateIndex", candidateIndex,
            "finishReason", candidateFinishReason);

    var chatGenerationMetadata = ChatGenerationMetadata.builder()
            .finishReason(candidateFinishReason.name())
            .build();

    // Map every part of the candidate to its own Generation,
    // keeping both text parts and function call parts.
    return candidate.getContent().getPartsList().stream().map(part -> {
        AssistantMessage assistantMessage;
        if (part.hasFunctionCall()) {
            FunctionCall functionCall = part.getFunctionCall();
            var functionName = functionCall.getName();
            var functionArguments = structToJson(functionCall.getArgs());
            var assistantToolCall = new AssistantMessage.ToolCall("", "function", functionName, functionArguments);
            assistantMessage = new AssistantMessage("", messageMetadata, List.of(assistantToolCall));
        }
        else {
            assistantMessage = new AssistantMessage(part.getText(), messageMetadata);
        }
        return new Generation(assistantMessage, chatGenerationMetadata);
    }).toList();
}
```
Would be great to get this upstream if anyone from the Spring AI team sees it.
Comment From: GregoireW
Your change seems nicer than mine (not too hard ;) ), but I'm not sure how you get all the function calls and all the text parts.
I took your code, and using

```java
ChatResponse response = ChatClient.create(chatModel)
    .prompt(new Prompt(promptMessage))
    .toolCallbacks(functionCallbacks)
    .call()
    .chatResponse();
```
the response still contains a single generation with the last message; the intermediate generations (the first reply, the function call, the function call response) are not exposed.
Comment From: mands
Oh right, I think that may be because I'm doing the tool calling myself rather than letting the `ChatClient` handle it (sorry, I should have mentioned that!). The docs cover this at https://docs.spring.io/spring-ai/reference/api/tools.html#_user_controlled_tool_execution - my code isn't much different from the sample provided there.
This way, within my tool-calling loop, I can keep the text output and handle each tool call within the output parts.
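For reference, a user-controlled tool-execution loop has roughly this shape. The sketch below is self-contained and uses hypothetical stand-in types (`ModelResponse`, `callModel`, a string-based history) rather than the real Spring AI API, purely to show how keeping every part in the conversation history preserves the interleaved text alongside the tool calls:

```java
import java.util.ArrayList;
import java.util.List;

public class ManualToolLoop {

    // Hypothetical stand-in for a model response: a list of parts
    // plus a flag saying whether any of them is a tool call.
    record ModelResponse(List<String> parts, boolean hasToolCalls) {}

    private static int calls = 0;

    // Fake model: the first call answers with text plus a tool call,
    // the second call returns the final answer.
    static ModelResponse callModel(List<String> history) {
        if (++calls == 1) {
            return new ModelResponse(
                    List.of("I will check the weather first...", "toolCall:CurrentWeatherService"),
                    true);
        }
        return new ModelResponse(List.of("Tokyo is 22C, so the fan speed should be about 73."), false);
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>(List.of("user: set the fan speed for Tokyo"));
        ModelResponse response = callModel(history);
        while (response.hasToolCalls()) {
            // Keep every part (text AND tool calls) instead of dropping the text,
            // then append the tool result and call the model again.
            history.addAll(response.parts());
            history.add("toolResult: 22C");
            response = callModel(history);
        }
        history.addAll(response.parts());
        history.forEach(System.out::println);
    }
}
```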
However, it does start to get quite painful, especially as I use the `returnDirect` functionality as well. It feels a bit like the core messaging abstraction Spring AI provides doesn't map to Gemini very well.