Bug description
This is linked to the PR https://github.com/spring-projects/spring-ai/pull/2029 where I try to fix this.
Gemini can respond with multiple parts, some containing text, some containing tool calls. A response can look like:
part 1: message: " I understand what you want to do, I will check ..."
part 2: function call
part 3: message "if it is not enough you will need to check yourself .... "
Today, the response is processed by Spring AI in such a way that functions are executed only if all parts of the response are function calls.
It seems it is OK to switch the code at https://github.com/spring-projects/spring-ai/blob/4fc6edd80c42801ab8aec6530c34a32c73604390/models/spring-ai-vertex-ai-gemini/src/main/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatModel.java#L598 to use `anyMatch` instead of `allMatch` when checking for function calls in the response. As long as there is at least one function call in the response, it should be executed.
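To illustrate the difference, here is a minimal, self-contained sketch. The `Part` record below is a hypothetical stand-in for the SDK's part type, not the real Vertex AI class; it only shows why `allMatch` misses the function call in a mixed response while `anyMatch` finds it:

```java
import java.util.List;

public class AnyMatchDemo {

    // Hypothetical stand-in for a Gemini response part: either text or a function call.
    record Part(String text, String functionCallName) {
        boolean hasFunctionCall() {
            return functionCallName != null;
        }
    }

    public static void main(String[] args) {
        // A mixed response like the one described: one text part, one function call part.
        List<Part> parts = List.of(
                new Part("I understand what you want to do, I will check ...", null),
                new Part(null, "getCurrentWeather"));

        // Current behavior: allMatch is false because of the text part,
        // so no tool is executed even though Gemini asked for one.
        System.out.println(parts.stream().allMatch(Part::hasFunctionCall)); // false

        // Proposed behavior: anyMatch detects the function call, so it gets executed.
        System.out.println(parts.stream().anyMatch(Part::hasFunctionCall)); // true
    }
}
```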
Also, the response sent back to the system is filtered; I think the response should send back all parts returned by the API ( https://github.com/spring-projects/spring-ai/blob/4fc6edd80c42801ab8aec6530c34a32c73604390/models/spring-ai-vertex-ai-gemini/src/main/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatModel.java#L600 ).
Environment
Spring AI 1.0.0-M6, Gemini API (2.0 Flash)
Steps to reproduce
Use Spring AI with a Gemini model, add the "CurrentWeatherService" tool, and send the prompt:
The procedure can be either to check the temperature for a city and then if the temperature you will need to update the fan speed from 0-100 depending on the temperature 0-30. Explain the procedure is more clean word and process it for the city Tokyo
It should trigger a response with 2 parts: 1 small message and 1 function call.
Expected behavior
Spring AI should execute a function call if one appears in any part of the response.
Also, the returned response should contain all generations and all parts. Currently, when I push a prompt, the model asks for a function call, and the prompt with the function execution result is pushed back, only the last generation is returned. This means that if we continue the chat, some data is missing.
Minimal Complete Reproducible example
Same as "Steps to reproduce" above.
Comment From: thomasflad
@GregoireW Could this be related to the issue I have here
Comment From: GregoireW
@thomasflad to be 100% sure, you would have to set a breakpoint on the code I referenced in the first message and check.
But if you get a text message (possibly empty) as the response to your call, then no function call will have been made even if Gemini asked for one. If that is what you see, I guess this could be related to your issue.
Comment From: mands
Hitting the same issue: when Gemini returns a text part along with a function call part, only the text part is returned in the `AssistantMessage`.

The change above does work, but results in dropping the text message AFAIK. Luckily the core method, `VertexAiGeminiChatModel#responseCandidateToGeneration`, is `protected`, so it's possible to override it and provide your own implementation of `VertexAiGeminiChatModel`.

The following works for me, returning all text and function call parts from a Vertex response:
```java
@Override
protected List<Generation> responseCandidateToGeneration(Candidate candidate) {
    // TODO - the candidateIndex (e.g. choice) must be assigned to the generation
    int candidateIndex = candidate.getIndex();
    var candidateFinishReason = candidate.getFinishReason();

    Map<String, Object> messageMetadata = Map.of(
            "candidateIndex", candidateIndex,
            "finishReason", candidateFinishReason);

    var chatGenerationMetadata = ChatGenerationMetadata.builder()
            .finishReason(candidateFinishReason.name())
            .build();

    // Map every part of the candidate to its own Generation,
    // keeping both text parts and function call parts.
    return candidate.getContent().getPartsList().stream().map(part -> {
        AssistantMessage assistantMessage;
        if (part.hasFunctionCall()) {
            FunctionCall functionCall = part.getFunctionCall();
            var functionName = functionCall.getName();
            var functionArguments = structToJson(functionCall.getArgs());
            var assistantToolCall = new AssistantMessage.ToolCall("", "function", functionName, functionArguments);
            assistantMessage = new AssistantMessage("", messageMetadata, List.of(assistantToolCall));
        }
        else {
            assistantMessage = new AssistantMessage(part.getText(), messageMetadata);
        }
        return new Generation(assistantMessage, chatGenerationMetadata);
    }).toList();
}
```
Would be great to get this upstream if anyone from the Spring AI team sees it.
Comment From: GregoireW
Your change seems nicer than mine (not too hard ;) ), but I'm not sure how you get all the function calls and all the text parts.
I took your code, and using

```java
ChatResponse response = ChatClient.create(chatModel)
    .prompt(new Prompt(promptMessage))
    .toolCallbacks(functionCallbacks)
    .call()
    .chatResponse();
```
the response still contains a single generation with the last message; the intermediate generations (the first reply, the function call, the function call response) are not exposed.
Comment From: mands
Oh right, I think that may be because I'm doing the tool calling myself rather than letting the `ChatClient` handle it (sorry, I should have mentioned that!). The docs cover this at https://docs.spring.io/spring-ai/reference/api/tools.html#_user_controlled_tool_execution - my code isn't much different from the sample provided there.
This way, within my tool-calling loop, I can keep the text output and handle each tool call within the output parts.
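For reference, a user-controlled tool-execution loop has roughly this shape. The sketch below is self-contained and uses hypothetical stand-in types (`ModelResponse`, `callModel`, a string-based history) rather than the real Spring AI API, purely to show how keeping every part in the conversation history preserves the interleaved text alongside the tool calls:

```java
import java.util.ArrayList;
import java.util.List;

public class ManualToolLoop {

    // Hypothetical stand-in for a model response: a list of parts
    // plus a flag saying whether any of them is a tool call.
    record ModelResponse(List<String> parts, boolean hasToolCalls) {}

    private static int calls = 0;

    // Fake model: the first call answers with text plus a tool call,
    // the second call returns the final answer.
    static ModelResponse callModel(List<String> history) {
        if (++calls == 1) {
            return new ModelResponse(
                    List.of("I will check the weather first...", "toolCall:CurrentWeatherService"),
                    true);
        }
        return new ModelResponse(List.of("Tokyo is 22C, so the fan speed should be about 73."), false);
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>(List.of("user: set the fan speed for Tokyo"));
        ModelResponse response = callModel(history);
        while (response.hasToolCalls()) {
            // Keep every part (text AND tool calls) instead of dropping the text,
            // then append the tool result and call the model again.
            history.addAll(response.parts());
            history.add("toolResult: 22C");
            response = callModel(history);
        }
        history.addAll(response.parts());
        history.forEach(System.out::println);
    }
}
```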
However, it does start to get quite painful, especially as I use the `returnDirect` functionality as well. It feels a bit like the core messaging abstraction Spring AI provides doesn't map to Gemini very well.