Bug description

When using the Spring AI chatClient.prompt().stream() API with OpenAI (tested on version 1.1.0-M3), the reported token usage is always zero.

Even when enabling the streamUsage(true) option on OpenAiChatOptions and collecting the final chunk via .stream().chatResponse(), the Usage object remains zero. This makes it impossible to track token consumption either in real time or after the stream completes.

This seems related to GitHub Issue #814, but the problem persists even after following the approaches suggested there.

Environment

Spring AI version: 1.1.0-M3

Java version: 17

Model: OpenAI GPT-3.5 / GPT-4

OS: Linux / macOS

Vector store: N/A

Steps to reproduce

Configure a Spring AI OpenAI chat client.

Use the streaming API with chatClient.prompt().stream() or .stream().chatResponse().

Enable token usage streaming:

OpenAiChatOptions.builder().streamUsage(true).build()

Send a prompt that generates multiple chunks.

Observe the Usage object in ChatResponse.metadata.usage() or via SimpleLoggerAdvisor.

Example code:

Flux<ChatResponse> chatResponseFlux = chatClient.prompt()
        .options(OpenAiChatOptions.builder().streamUsage(true).build())
        .advisors(new SimpleLoggerAdvisor())
        .user("Hi")
        .stream()
        .chatResponse();

chatResponseFlux.collectList().block().forEach(response -> {
    Usage usage = response.getMetadata().getUsage(); // Always zero
    System.out.println("Total tokens used: " + usage.getTotalTokens());
});

Expected behavior

Each chunk in the streaming response should include incremental token usage.

At minimum, the final chunk should reliably provide the total usage.

Usage data should be correctly reflected in SimpleLoggerAdvisor and ChatResponse.metadata.usage().

Actual behavior

All intermediate chunks report zero usage.

Even the final chunk, collected via .collectList().block(), reports zero usage.

Tested on Spring AI 1.1.0-M3 — issue persists.


Additional notes / attempts

Using .stream().chatResponse() and collecting the last chunk does not solve the problem.

Usage is zero in all tests, making real-time or post-stream token accounting impossible.

This appears to be a limitation in Spring AI’s deserialization/handling of OpenAiUsage in streaming scenarios.
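For reference, when stream_options.include_usage is enabled, the OpenAI Chat Completions API ends the stream with one extra chunk whose choices array is empty and which carries the aggregate usage (the id, model, and token counts below are purely illustrative). An intermediate gateway that strips or drops this final chunk would produce exactly the zero-usage symptom described above:

```
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","model":"gpt-4","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}

data: [DONE]
```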

Comment From: mrfnir

Has anyone managed to find a way to solve this issue?

Comment From: mrfnir

To anyone facing the same problem: fortunately, just as I was close to giving up, I noticed an important point. If you access the OpenAI services through an intermediate service provider, some of these providers disrupt parts of the service. After switching providers, the problem was solved and I now receive the token consumption details. I hope this helps you :)
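Building on that finding, one thing worth checking is what spring.ai.openai.base-url resolves to. The sketch below is a hypothetical application.properties configuration (assuming Spring AI's standard OpenAI property bindings; the stream-usage property name is my assumption from the OpenAiChatOptions field) that points the client directly at the official endpoint, bypassing any intermediate proxy:

```
# application.properties -- hypothetical direct-to-OpenAI configuration,
# bypassing any intermediate provider that might strip the final usage chunk
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.base-url=https://api.openai.com
spring.ai.openai.chat.options.stream-usage=true
```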

Comment From: ilayaperumalg

@mrfnir Thank you for updating your feedback and helping the community!