Bug description
I'm experiencing what appears to be multiple API calls to the LLM provider when calling different methods on the same `ChatClient.CallResponseSpec` instance. When I call both `chatResponse()` and `entity()` on the same response object, it seems like two separate API calls are being made instead of reusing a cached response.
I'm not entirely sure if this is a bug in Spring AI or if I'm misunderstanding how the API is supposed to work, but I'm looking for guidance on the correct approach.
My use case requires both structured output parsing (via `.entity()`) and metadata access (via `.chatResponse().getMetadata()`), specifically for token usage tracking. Currently, I can't find a way to get both from what appears to be a single API call.
The observed behavior includes:
- What seems like doubled API costs
- Significantly increased response times (2+ seconds instead of milliseconds)
- Difficulty implementing proper token usage tracking
Environment
- Spring AI version: 1.0.0-M7
- Java version: 17
- Spring Boot version: 3.4.4
- LLM Provider: OpenAI (also using Anthropic)
- Maven dependencies: spring-ai-starter-model-openai, spring-ai-starter-model-anthropic
Steps to reproduce
- Create a `ChatClient` instance
- Use `.entity()` for structured output:

```java
// This works for structured output but provides no access to metadata
MyStructuredResponse modelAnswer = getChatClient()
    .prompt(prompt)
    .call()
    .entity(MyStructuredResponse.class);

// No way to access token usage from this call
// Would need a separate call to get metadata:
Usage usage = getChatClient()
    .prompt(prompt) // Different prompt due to augmentation
    .call()
    .chatResponse()
    .getMetadata()
    .getUsage();
```
The issue is that `.entity()` doesn't provide any way to access the `ChatResponse` or its metadata from the same call.
Expected behavior
I would expect a way to access metadata (particularly token usage) when using structured output. Possible solutions could include:
- An enhanced method like:

```java
StructuredResponseWithMetadata<MyClass> result = response.entityWithMetadata(MyClass.class);
MyClass data = result.getEntity();
Usage usage = result.getMetadata().getUsage();
```

- Or making metadata accessible on the structured response itself:

```java
CallResponseSpec response = getChatClient().prompt(prompt).call();
MyClass data = response.entity(MyClass.class);
Usage usage = response.getLastCallMetadata().getUsage(); // Access metadata from the entity() call
```
The key need is to track token usage for cost management when using structured output.
Minimal Complete Reproducible example
```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TestController {

    @Autowired
    private ChatClient chatClient;

    @GetMapping("/test-double-call")
    public ResponseEntity<String> testDoubleCall() {
        String prompt = "Return a JSON object with a 'message' field containing 'Hello World'";

        // Time the total operation
        long totalStart = System.nanoTime();

        // Get the response spec
        ChatClient.CallResponseSpec response = chatClient.prompt(prompt).call();

        // First access - measure time
        long firstStart = System.nanoTime();
        Usage usage = response.chatResponse().getMetadata().getUsage();
        long firstEnd = System.nanoTime();
        System.out.println("First call (chatResponse) duration: " + (firstEnd - firstStart) / 1_000_000 + " ms");

        // Second access - measure time
        long secondStart = System.nanoTime();
        String content = response.entity(String.class);
        long secondEnd = System.nanoTime();
        System.out.println("Second call (entity) duration: " + (secondEnd - secondStart) / 1_000_000 + " ms");

        long totalEnd = System.nanoTime();
        System.out.println("Total duration: " + (totalEnd - totalStart) / 1_000_000 + " ms");

        return ResponseEntity.ok("Check console logs - you'll see two long durations instead of one API call + fast cached access");
    }
}
```
Additional context
I understand that `.entity()` must perform prompt augmentation to include structured output instructions, so it makes sense that it can't reuse a previous response. However, this creates a gap in the API where structured output users cannot access token usage metadata.
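To make my understanding concrete, here is a toy model of the behavior I think I'm seeing. It uses only stand-in types: `LazyResponseSpecModel`, `fakeModel`, and `modelCalls` are hypothetical names invented for this sketch, and none of it is Spring AI's actual implementation. It only assumes that each terminal method builds its own prompt and triggers its own model call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a lazy response spec; NOT Spring AI's CallResponseSpec.
class LazyResponseSpecModel {

    // Counts how many times the fake model is invoked.
    static final AtomicInteger modelCalls = new AtomicInteger();

    // Fake model: echoes the prompt it received and bumps the call counter.
    static final Function<String, String> fakeModel = prompt -> {
        modelCalls.incrementAndGet();
        return "response to: " + prompt;
    };

    private final String userPrompt;

    LazyResponseSpecModel(String userPrompt) {
        this.userPrompt = userPrompt;
    }

    // Terminal operation 1: sends the prompt unchanged.
    String chatResponse() {
        return fakeModel.apply(userPrompt);
    }

    // Terminal operation 2: appends format instructions, so the prompt differs
    // from the one used by chatResponse() and an earlier response cannot be reused.
    String entity(String formatInstructions) {
        return fakeModel.apply(userPrompt + "\n" + formatInstructions);
    }

    public static void main(String[] args) {
        LazyResponseSpecModel spec = new LazyResponseSpecModel("Say hello");
        spec.chatResponse();
        spec.entity("Respond in JSON.");
        System.out.println("model calls: " + modelCalls.get());
    }
}
```

If the real behavior matches this model, calling both methods on one spec would always cost two round trips, which is consistent with the timings in the example above.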
Current workarounds are suboptimal:
1. Using only `chatResponse()` and manually parsing JSON (loses structured output convenience)
2. Making two separate calls with different prompts (doubles API costs)
3. Using lower-level APIs (loses ChatClient benefits)
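For reference, the shape of workaround 1 looks roughly like the sketch below: make one call, parse the raw text yourself, and read usage from the same response. Everything here is a hypothetical stand-in written for illustration (`RawResponse`, `TokenUsage`, `EntityWithUsage`, `callModel`, and the made-up token counts); it does not use Spring AI types, and the regex-based parsing is deliberately naive where a real application would use a JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCallWorkaroundSketch {

    // Hypothetical stand-ins for a provider response and its token usage.
    record TokenUsage(int promptTokens, int completionTokens) {
        int totalTokens() {
            return promptTokens + completionTokens;
        }
    }

    record RawResponse(String text, TokenUsage usage) {}

    // Hypothetical pair carrying the parsed entity and the usage of the same call.
    record EntityWithUsage<T>(T entity, TokenUsage usage) {}

    record Greeting(String message) {}

    // Fake provider call with made-up token counts; a real one would go over the network.
    static RawResponse callModel(String prompt) {
        return new RawResponse("{\"message\": \"Hello World\"}", new TokenUsage(21, 9));
    }

    // Naive extraction of a single string field from the raw JSON text.
    static Greeting parseGreeting(String json) {
        Matcher m = Pattern.compile("\"message\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no 'message' field in: " + json);
        }
        return new Greeting(m.group(1));
    }

    // One model call; the entity and the usage both come from that single response.
    static EntityWithUsage<Greeting> greetingWithUsage(String prompt) {
        RawResponse response = callModel(prompt);
        return new EntityWithUsage<>(parseGreeting(response.text()), response.usage());
    }

    public static void main(String[] args) {
        EntityWithUsage<Greeting> result = greetingWithUsage("Return a JSON object with a 'message' field");
        System.out.println(result.entity().message() + " / total tokens: " + result.usage().totalTokens());
    }
}
```

This gets both values from one call, but the parsing and format instructions become my responsibility, which is exactly the convenience `.entity()` is meant to provide.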
Is there a recommended pattern or planned feature to access metadata when using `.entity()` for structured output? This would be valuable for cost tracking and observability in production applications.