Spring AI ChatModel streaming chat response data is too large.

When streaming with Flux<ChatClientResponse>, the context is returned with every response chunk, which makes the total data volume large. This could be optimized, for example by including the context only in the response where chatResponse.result.metadata.finishReason == "stop".
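
To make the proposal concrete, here is a minimal plain-Java sketch of the idea, not Spring AI code. The `Chunk` record, the `trim` and `trimAll` helpers, and the `conversationId` context key are hypothetical stand-ins invented for illustration. The finish reason is compared case-insensitively only as a precaution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ContextTrimSketch {

    // Hypothetical stand-in for one streamed chunk; this is NOT a Spring AI type.
    record Chunk(String text, String finishReason, Map<String, Object> context) {}

    // Keep the context only on the chunk whose finish reason is "stop";
    // every other chunk is copied with an empty context.
    static Chunk trim(Chunk chunk) {
        boolean isFinal = "stop".equalsIgnoreCase(chunk.finishReason());
        return isFinal ? chunk : new Chunk(chunk.text(), chunk.finishReason(), Map.of());
    }

    // Apply the per-chunk rule to a whole sequence of chunks, preserving order.
    static List<Chunk> trimAll(List<Chunk> chunks) {
        List<Chunk> out = new ArrayList<>();
        for (Chunk c : chunks) {
            out.add(trim(c));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical context payload repeated on every chunk.
        Map<String, Object> ctx = Map.of("conversationId", "42");
        List<Chunk> stream = List.of(
                new Chunk("Hel", null, ctx),
                new Chunk("lo", null, ctx),
                new Chunk("", "stop", ctx));
        for (Chunk c : trimAll(stream)) {
            System.out.println("'" + c.text() + "' context entries: " + c.context().size());
        }
    }
}
```

In a reactive pipeline, a per-chunk function like `trim` would presumably be applied with `Flux.map` to the stream returned by `chatClientResponse()`. Whether the framework should do this by default is what this issue asks.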

Comment From: 192902649

// Streams the chat response; each emitted ChatClientResponse currently carries the full context.
public Flux<ChatClientResponse> chatV3(@PathVariable("chatId") String chatId, @RequestParam("message") String message) {
    return ollamaChatClient.prompt()
            .user(message)
            // Scope chat memory to this conversation.
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, chatId))
            .stream()
            .chatClientResponse();
}

Comment From: vinupreethi

@ilayaperumalg I would like to work on this.

Comment From: ilayaperumalg

@vinupreethi Sure, please go ahead. Thank you for your interest!