I wanted to obtain the number of tokens consumed when calling the large model, so I wrote a custom Advisor, but it didn't take effect.

Here is my Advisor definition:

```java
public class CustomizeLoggerAdvisor implements CallAdvisor, StreamAdvisor {

    private Integer order;

    @Override
    public String getName() {
        return this.getClass().getSimpleName();
    }

    @Override
    public int getOrder() {
        return null != order ? order : 1;
    }

    private ChatClientRequest before(ChatClientRequest request) {
        log.info("AI Request: {}", request.prompt().getContents());
        return request;
    }

    private void observeAfter(ChatClientResponse advisedResponse) {
        ChatResponse response = advisedResponse.chatResponse();
        if (null == response) {
            log.info("AI Response is null");
            return;
        }
        ChatResponseMetadata responseMetadata = response.getMetadata();
        Usage usage = responseMetadata.getUsage();
        log.info("total tokens:{}", usage.getTotalTokens());
        log.info("input tokens:{}", usage.getPromptTokens());
        log.info("output tokens:{}", usage.getCompletionTokens());
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest advisedRequest, CallAdvisorChain chain) {
        advisedRequest = this.before(advisedRequest);
        ChatClientResponse advisedResponse = chain.nextCall(advisedRequest);
        this.observeAfter(advisedResponse);
        return advisedResponse;
    }

    @Override
    public Flux<ChatClientResponse> adviseStream(ChatClientRequest advisedRequest, StreamAdvisorChain chain) {
        advisedRequest = this.before(advisedRequest);
        Flux<ChatClientResponse> advisedResponses = chain.nextStream(advisedRequest);
        return new ChatClientMessageAggregator().aggregateChatClientResponse(advisedResponses, this::observeAfter);
    }
}
```

I added my Advisor to the ChatClient:

```java
ChatClient.builder(openedAiChatModel)
        .defaultAdvisors(new CustomizeLoggerAdvisor())
        .build();
```

After calling the model, the token counts I got were all 0.

Comment From: YunKuiLu

What model are you using? Can you provide a minimal reproduction project?

Comment From: checkHup

> What model are you using? Can you provide a minimal reproduction project?

Thank you for your response. I will post the core code below and ask for your help analyzing it. If anything is missing, I will provide it.

The model I am using is Tongyi Qianwen: Qwen3-4B

yml:

```yaml
spring:
  ai:
    openai:
      base-url: http://192.168.8.11:8000 # vLLM ip
      api-key: 1
      chat:
        options:
          model: /home/ai/models/Qwen/Qwen3-4B
          max-completion-tokens: 2000
          temperature: 0.5
          top-p: 0.8
          frequency-penalty: 1.0
          presence-penalty: 0.5
```

pom: spring-ai-bom:1.1.0-M1

Method of model invocation:

```java
@Bean
ChatClient chatClient(List<McpAsyncClient> mcpClients) {

    var toolCallbackProvider = new AsyncMcpToolCallbackProvider(mcpClients);

    OpenAiChatOptions options = OpenAiChatOptions
            .builder()
            .model(model)
            .parallelToolCalls(true)
            .temperature(temperature)
            .topP(topP)
            .maxCompletionTokens(maxCompletionTokens)
            .frequencyPenalty(frequencyPenalty)
            .presencePenalty(presencePenalty)
            .build();

    return ChatClient
            .builder(openedAiChatModel)
            .defaultSystem("You are a versatile assistant")
            .defaultToolCallbacks(toolCallbackProvider.getToolCallbacks())
            .defaultOptions(options)
            .build();
}
```

```java
private final ChatClient chatClient;

this.chatClient
        .prompt()
        .user(promptUserSpec)
        .advisors(new CustomizeLoggerAdvisor())
        .stream()
        .chatResponse()
        .map(data -> {
            String text = data.getResult().getOutput().getText();
            if (text == null) {
                text = "";
            }
            if (text.contains("\n")) {
                text = text.replaceAll("\n", "<br>");
            }
            ServerSentEvent<Object> sse = ServerSentEvent.builder()
                    //.id(UUID.randomUUID().toString())
                    .data(text)
                    .build();
            return sse;
        })
        .doOnError(error -> {
            log.info(error.getMessage());
        });
```

Comment From: YunKuiLu

Are you running qwen3-4b with vLLM locally? I don't have this environment set up yet, so it might take some time to reproduce.

While waiting, you can try to debug this piece of code to see if there are any clues.

https://github.com/spring-projects/spring-ai/blob/84efb6a63628e80a1bb848524d3b8c43a9fbcd3c/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/OpenAiChatModel.java#L230-L238

Comment From: checkHup


Thank you for your reply. I did deploy the qwen3-4b model with vLLM 0.10.1.1, but I also tried the GPT-OSS-20B model and still couldn't obtain the token consumption. While waiting for your reply, I will try debugging the code you pointed to.

Comment From: YunKuiLu

@checkHup You missed adding stream_options={"include_usage": True} in the ChatOptions.

You can set it up like this:

```java
ChatClient
        .builder(openAiChatModel)
        .defaultOptions(OpenAiChatOptions.builder().streamUsage(true).build())
        .defaultSystem("You are a versatile assistant")
        .build();
```

or

```yaml
openai:
  base-url: http://127.0.0.1:8000
  api-key: 1
  chat:
    options:
      model: Qwen/Qwen3-4B
      stream-usage: true     # this
```
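
For a single request, the same flag can also be supplied through the request spec rather than the client defaults; a minimal sketch, assuming the `chatClient` built earlier:

```java
// Sketch: enable usage reporting only for this streaming call.
Flux<ChatResponse> responses = chatClient.prompt()
        .user("Hello")
        .options(OpenAiChatOptions.builder().streamUsage(true).build())
        .stream()
        .chatResponse();
```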

Comment From: checkHup

> @checkHup You missed adding `stream_options={"include_usage": True}` in the ChatOptions.

Thank you so much for helping me solve the problem that had been bothering me for a long time. I've learned this now. Thank you again for your guidance.

Comment From: checkHup

I will close this issue. Thank you to the kind-hearted @YunKuiLu.

Comment From: checkHup

> @checkHup You missed adding `stream_options={"include_usage": True}` in the ChatOptions.

Excuse me again. I tried it and can now get the token consumption, but I can't get the number of MCP tools that were called, even though the MCP tool is invoked successfully and returns normally.

```java
private void observeAfter(ChatClientResponse advisedResponse) {
    ChatResponse response = advisedResponse.chatResponse();
    if (null == response) {
        log.info("AI Response is null");
        return;
    }
    ChatResponseMetadata responseMetadata = response.getMetadata();
    Usage usage = responseMetadata.getUsage();
    log.info("total tokens:{}", usage.getTotalTokens());
    log.info("input tokens:{}", usage.getPromptTokens());
    log.info("output tokens:{}", usage.getCompletionTokens());
    AssistantMessage assistantMessage = response.getResult().getOutput();
    List<AssistantMessage.ToolCall> toolCallList = assistantMessage.getToolCalls();
    log.info("choose [{}] tools to use", toolCallList.size());
}

@Override
public ChatClientResponse adviseCall(ChatClientRequest advisedRequest, CallAdvisorChain chain) {
    advisedRequest = this.before(advisedRequest);
    ChatClientResponse advisedResponse = chain.nextCall(advisedRequest);
    this.observeAfter(advisedResponse);
    return advisedResponse;
}
```

I'm not sure whether this is the right way to get it; could you please advise? The number of tools I get is always 0:

```java
AssistantMessage assistantMessage = response.getResult().getOutput();
List<AssistantMessage.ToolCall> toolCallList = assistantMessage.getToolCalls();
```

Comment From: YunKuiLu

Currently, there's no way to get the number of executed tools from ChatClientResponse. You might want to create a new issue to track this.

Comment From: checkHup

> Currently, there's no way to get the number of executed tools from ChatClientResponse. You might want to create a new issue to track this.

Okay, I understand. I'll keep an eye on how Spring AI develops. Thank you very much for your answer.

Comment From: YunKuiLu

There’s a somewhat tricky workaround, though. Use the Observation capability to monitor tool execution, record tool events into ThreadLocal when invoking tools, and then retrieve them from ThreadLocal after the call completes.

  1. Add the dependency
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
  2. Use the Observation capability to monitor tool execution.
```java
@Configuration
public class ObservationConfig {

    @Bean
    ToolExecuteObservationHandler toolExecuteObservationHandler() {
        return new ToolExecuteObservationHandler(List.of(ToolCallingObservationContext.class));
    }


    @Slf4j
    public static class ToolExecuteObservationHandler implements ObservationHandler {

        private final List<Class<? extends Observation.Context>> supportedContextTypes;

        public ToolExecuteObservationHandler(List<Class<? extends Observation.Context>> supportedContextTypes) {

            Assert.notNull(supportedContextTypes, "SupportedContextTypes must not be null");

            this.supportedContextTypes = supportedContextTypes;
        }

        @Override
        public boolean supportsContext(Observation.Context context) {
            return (context == null) ? false : this.supportedContextTypes.stream().anyMatch(clz -> clz.isInstance(context));
        }

        @Override
        public void onStart(Observation.Context context) {
            log.info("onStart: {}", context);
        }

        @Override
        public void onEvent(Observation.Event event, Observation.Context context) {
            log.info("onEvent: {} {}", event, context);
        }

        @Override
        public void onStop(Observation.Context context) {
            log.info("onStop: {}", context);
        }

        @Override
        public void onError(Observation.Context context) {
            log.error("onError: {}", context);
        }
    }
}
```
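
To illustrate the ThreadLocal part of the workaround, here is a minimal sketch; the `ToolCallCounter` class is hypothetical, not part of Spring AI:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Spring AI: a ThreadLocal counter that the
// observation handler increments for each tool execution and the caller reads
// after the call completes.
public final class ToolCallCounter {

    private static final ThreadLocal<AtomicInteger> COUNT =
            ThreadLocal.withInitial(AtomicInteger::new);

    public static void reset() {
        COUNT.get().set(0);
    }

    public static void increment() {
        COUNT.get().incrementAndGet();
    }

    public static int get() {
        return COUNT.get().get();
    }
}
```

In the handler's `onStart` (or `onStop`) you would call `ToolCallCounter.increment()`, reset the counter before the chat call, and read `ToolCallCounter.get()` afterwards. Note this only works when the tools execute on the calling thread; with `stream()` the execution can hop threads, so the ThreadLocal may not see the increments, which is what makes this workaround tricky.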

Comment From: checkHup

> There’s a somewhat tricky workaround, though. Use the Observation capability to monitor tool execution, record tool events into ThreadLocal when invoking tools, and then retrieve them from ThreadLocal after the call completes.


Thank you for your guidance. For now, I call the large model service and the MCP service separately. I use a conversationId to identify each question-and-answer exchange and pass it to the MCP service; after each tool call the MCP service increments a counter stored in Redis under that conversationId, so the number of tool calls can be obtained.
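
A minimal sketch of that counting approach on the MCP-service side, assuming Spring Data Redis is on the classpath; the class name and key scheme are illustrative, not from the actual project:

```java
// Hypothetical counter on the MCP service side: increments a per-conversation
// counter in Redis on every tool call so the caller can read the total afterwards.
@Component
public class RedisToolCallCounter {

    private final StringRedisTemplate redisTemplate;

    public RedisToolCallCounter(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void recordCall(String conversationId) {
        // INCR is atomic, so concurrent tool calls within one conversation are counted correctly
        redisTemplate.opsForValue().increment("tool-calls:" + conversationId);
    }

    public long getCallCount(String conversationId) {
        String value = redisTemplate.opsForValue().get("tool-calls:" + conversationId);
        return value == null ? 0L : Long.parseLong(value);
    }
}
```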