Bug description
In RetrievalAugmentationAdvisor, configuring a CompressionQueryTransformer or RewriteQueryTransformer with a different model and temperature via ChatOptions has no effect. The ChatClient in these modules seems to use the default options set in the properties file.
For example, when I configured these pre-retrieval modules with the 'gpt-4o' model and a temperature of 0.2, the ChatClient still used a temperature of 0.7 (the default) and the model 'gpt-4.1' (which is set in the application.properties file). I verified this in the traces and logs using Zipkin.
Sample code -
- Constructor of my Class -
```java
public DocsAssistantService(ChatClient.Builder builder, VectorStore vectorStore,
        ChatMemory chatMemory) {
    this.builder = builder;
    this.vectorStore = vectorStore;
    this.chatMemory = chatMemory;
    chatClient = builder.defaultAdvisors(
            MessageChatMemoryAdvisor.builder(chatMemory).build(),
            new SimpleLoggerAdvisor()
    ).build();
}
```
- Method in the Class -
```java
public Flux<String> stream(String chatId, String userMessage) {
    var chatClientResponse = chatClient.prompt()
            .system(SYSTEM_MESSAGE)
            .user(userMessage)
            .advisors(a -> a.param(CONVERSATION_ID, chatId))
            .advisors(RetrievalAugmentationAdvisor.builder()
                    .queryTransformers(
                            CompressionQueryTransformer.builder()
                                    .chatClientBuilder(builder.build()
                                            .mutate()
                                            .defaultOptions(OpenAiChatOptions.builder()
                                                    .model("gpt-4o")
                                                    .temperature(0.2)
                                                    .build()))
                                    .build(),
                            RewriteQueryTransformer.builder()
                                    .chatClientBuilder(builder.build()
                                            .mutate()
                                            .defaultOptions(OpenAiChatOptions.builder()
                                                    .model("gpt-4o")
                                                    .temperature(0.2)
                                                    .build()))
                                    .build())
                    .documentRetriever(VectorStoreDocumentRetriever.builder()
                            .vectorStore(vectorStore)
                            .similarityThreshold(0.5)
                            .topK(4)
                            .build())
                    .queryAugmenter(ContextualQueryAugmenter.builder()
                            .allowEmptyContext(false)
                            .promptTemplate(new PromptTemplate(CONTEXT_PROMPT))
                            .documentFormatter(DOCUMENT_FORMATTER)
                            .build())
                    .build())
            .options(OpenAiChatOptions.builder().streamUsage(true).build())
            .stream();
    return chatClientResponse.content();
}
```
I tried a different way of building the ChatClient and observed the same behaviour with this approach too:
I disabled the ChatClient.Builder auto-configuration by setting the property spring.ai.chat.client.enabled=false, and used the ChatModel to configure the ChatClient as follows -
- Constructor of my Class -
```java
public DocsAssistantService(VectorStore vectorStore, ChatModel chatModel,
        ChatMemory chatMemory) {
    this.vectorStore = vectorStore;
    this.chatMemory = chatMemory;
    this.chatModel = chatModel;
    this.chatClient = ChatClient.builder(chatModel)
            .defaultAdvisors(
                    MessageChatMemoryAdvisor.builder(chatMemory).build(),
                    new SimpleLoggerAdvisor())
            .build();
}
```
- Method in the Class -
```java
public Flux<String> stream(String chatId, String userMessage) {
    var chatClientResponse = chatClient.prompt()
            .system(SYSTEM_MESSAGE)
            .user(userMessage)
            .advisors(a -> a.param(CONVERSATION_ID, chatId))
            .advisors(RetrievalAugmentationAdvisor.builder()
                    .queryTransformers(
                            CompressionQueryTransformer.builder()
                                    .chatClientBuilder(ChatClient.builder(chatModel)
                                            .defaultOptions(OpenAiChatOptions.builder()
                                                    .model("gpt-4o")
                                                    .temperature(0.2)
                                                    .build()))
                                    .build(),
                            RewriteQueryTransformer.builder()
                                    .chatClientBuilder(ChatClient.builder(chatModel)
                                            .defaultOptions(OpenAiChatOptions.builder()
                                                    .model("gpt-4o")
                                                    .temperature(0.2)
                                                    .build()))
                                    .build())
                    .documentRetriever(VectorStoreDocumentRetriever.builder()
                            .vectorStore(vectorStore)
                            .similarityThreshold(0.5)
                            .topK(4)
                            .build())
                    .queryAugmenter(ContextualQueryAugmenter.builder()
                            .allowEmptyContext(false)
                            .promptTemplate(new PromptTemplate(CONTEXT_PROMPT))
                            .documentFormatter(DOCUMENT_FORMATTER)
                            .build())
                    .build())
            .options(OpenAiChatOptions.builder().streamUsage(true).build())
            .stream();
    return chatClientResponse.content();
}
```
Even with this approach, the lower temperature setting and the different chat model are ignored.
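To help narrow down where the defaults are being picked up, a minimal isolation check (hypothetical code, not part of my application above) could invoke the transformer directly, outside the advisor chain, and compare the model/temperature reported in the Zipkin traces. If the direct call also reports gpt-4.1 / 0.7, the defaults are being applied inside the transformer's internal ChatClient rather than by the advisor:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.rag.Query;
import org.springframework.ai.rag.preretrieval.query.transformation.RewriteQueryTransformer;

class TransformerIsolationCheck {

    // Calls RewriteQueryTransformer directly, bypassing RetrievalAugmentationAdvisor,
    // so the options on its ChatClient.Builder can be observed in isolation.
    static Query rewriteDirectly(ChatModel chatModel, String userText) {
        var transformer = RewriteQueryTransformer.builder()
                .chatClientBuilder(ChatClient.builder(chatModel)
                        .defaultOptions(OpenAiChatOptions.builder()
                                .model("gpt-4o")    // expected to override application.properties
                                .temperature(0.2)
                                .build()))
                .build();
        // Check the resulting trace: if this call still runs with the
        // application-wide defaults, the options are lost inside the
        // transformer's own ChatClient, not in the advisor chain.
        return transformer.transform(new Query(userText));
    }
}
```

This requires a live ChatModel bean and API credentials, so it is only a sketch of the check, not a self-contained test.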
Environment
<spring-ai.version>1.0.0-RC1</spring-ai.version>