I’m implementing a RAG system with Spring AI that combines document-based context with dynamic, real-time data fetched via tools. However, I’ve encountered an issue where combining `RetrievalAugmentationAdvisor` with `ToolCallback`s leads to significantly reduced tool usage by the model.
**Use Case:**
My application includes a knowledge base of static documents. These documents contain valuable information (e.g., a description of an employee role's responsibilities and the name of the person holding it at the time of writing), but some data (e.g., the name of the current employee in that role) must be retrieved at runtime via tools.
When I run `ChatClient` using only `RetrievalAugmentationAdvisor` or only `ToolCallback`s, each works correctly. However, when both are enabled together, the LLM strongly prefers the retrieval context and rarely calls tools (though it sometimes does, so it is aware of them).
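For reference, the setup looks roughly like this (a minimal sketch, assuming injected `ChatClient.Builder` and `VectorStore` beans; `EmployeeDirectoryTools` is an illustrative `@Tool`-annotated class, not my actual code):

```java
var ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .build())
        .build();

String answer = chatClientBuilder.build()
        .prompt()
        .advisors(ragAdvisor)                 // retrieved documents are injected into the prompt
        .tools(new EmployeeDirectoryTools())  // real-time lookups the model may call
        .user("I need a new PC. What shall I do?")
        .call()
        .content();
```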
I attempted to mitigate this by adjusting the system prompt and modifying the `ContextualQueryAugmenter`’s prompt template to explicitly encourage tool usage over static context when appropriate, but this had no noticeable effect.
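The template change looked roughly like this (a sketch; the wording is illustrative, and the `{context}`/`{query}` placeholders follow the default `ContextualQueryAugmenter` template):

```java
var queryAugmenter = ContextualQueryAugmenter.builder()
        .promptTemplate(new PromptTemplate("""
                Answer the query using the context below, but whenever the context
                mentions a person or contact, call the available tools to fetch the
                current value instead of relying on the documents.

                Context:
                {context}

                Query:
                {query}
                """))
        .build();

var ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(documentRetriever)
        .queryAugmenter(queryAugmenter)
        .build();
```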
**Models Tested:**
- `gpt-4.1`
- `gemini-2.5-flash-preview-04-17`
- `gpt-4.1-mini` (which seemed slightly more inclined to use tools)
**Question/Issue:**
I am not sure whether this is a bug in the `RetrievalAugmentationAdvisor` implementation in Spring AI or a limitation of the LLMs used.
Is there any guidance on how `RetrievalAugmentationAdvisor` can be effectively complemented by tools providing fresher data?
Comment From: ThomasVitale
If I understand the use case correctly, you would like a workflow where, based on the question, the model uses a combination of RAG and tool calling. Is that right?
If it is, then I'd recommend adopting a routing approach so that the model decides dynamically where to fetch the context from among the available options.
One way to implement such an architecture is Agentic RAG, where RAG flows are provided as tools alongside other "regular" tools, leaving to the model the task of calling the ones that make sense based on the question.
You can find an example here: https://github.com/ThomasVitale/llm-apps-java-spring-ai/blob/main/rag/rag-conditional/src/main/java/com/thomasvitale/ai/spring/RagControllerQueryRouting.java
I defined three tools: two for retrieving context from a vector store, and one for retrieving context from a web search. The model decides which tools to call based on the question.
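In spirit, each RAG tool is just a similarity search exposed as a tool, roughly like this (a sketch, not the exact code from the linked example; names are illustrative):

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

class DocumentSearchTools {

    private final VectorStore vectorStore;

    DocumentSearchTools(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Tool(description = "Search the internal knowledge base for documents relevant to the question")
    List<String> searchDocuments(String question) {
        // Plain similarity search; the model decides when document context is needed.
        return vectorStore.similaritySearch(SearchRequest.builder().query(question).topK(5).build())
                .stream()
                .map(Document::getText)
                .toList();
    }
}
```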
When using the `RetrievalAugmentationAdvisor`, the RAG flow is always executed. The LLM doesn't have any "decision power" about it, since it's something you enable explicitly from the application. That's by design. If you need to introduce conditional routes, then we need to adopt a pattern like the one I shared above.
Comment From: amagnolo
You're right: the use case I'm considering needs a combination of RAG and tool calling, but they often need to be used together, not alternatively. In the example I mentioned above (employee roles and contacts), a possible user query could be: "I need a new PC. What shall I do?"
The ideal LLM behavior would be:
- use RAG to gather documents relating to the buying request procedure
- those documents mention that the person to contact is the head of HR, who was Mr. X at the time of writing
- use a real-time tool to retrieve the current HR head: it's now Ms. Y
- respond to the user: "You have to do this and that, then contact Ms. Y"
Sometimes it does exactly that. Unfortunately, most of the time the model skips the tool call, resulting in stale contact names. Although my prompt and tool description explicitly instruct it to prioritize real-time data, the LLM only follows that guidance once it has already decided to call the tool, rather than calling the tool whenever updated information is needed.
If I understand your proposed solution correctly, it uses either the tool or the RAG flow, but in this case neither alone could give a complete answer (i.e. both the procedure and the current contact name).
As a workaround, I am considering a post-processing step: after generating the RAG-based answer, run a second LLM pass to detect any contact references and, if found, invoke the tool to replace stale names with current ones in the response.
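In code, the idea would be roughly this (a sketch; the second-pass system prompt and `EmployeeDirectoryTools` are illustrative):

```java
// First pass: RAG-only draft answer.
String draft = chatClient.prompt()
        .advisors(ragAdvisor)
        .user(question)
        .call()
        .content();

// Second pass: let the model patch stale contact names via the real-time tool.
String answer = chatClient.prompt()
        .tools(new EmployeeDirectoryTools())
        .system("""
                You are given a draft answer. If it mentions any employee or contact
                person, look up the current holder of that role with the available
                tool and replace stale names. Otherwise return the draft unchanged.
                """)
        .user("Question: " + question + "\n\nDraft answer:\n" + draft)
        .call()
        .content();
```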
Do you think this approach is feasible? Or is there a better pattern to ensure that both document-based context and fresh tool data are combined effectively?
Ideally, I'd prefer to improve the `RetrievalAugmentationAdvisor` behavior rather than implement a multi-step pipeline. But perhaps LLMs aren't quite that intelligent yet. ;-)
Comment From: amagnolo
I managed to improve my implementation by registering the document retriever itself as a tool alongside the others (similar to your example, but without `returnDirect=true`).
It seems that LLMs (at least the ones I tested) are more willing to call multiple tools when the documents aren't pre-injected into the prompt (as `RetrievalAugmentationAdvisor` does), but instead have to be fetched via an explicit tool invocation.
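Concretely, the wiring ends up roughly like this (a sketch; `DocumentSearchTools` is a retrieval tool like the one sketched above, and `EmployeeDirectoryTools` is the illustrative real-time lookup):

```java
String answer = chatClientBuilder.build()
        .prompt()
        // No RetrievalAugmentationAdvisor: document retrieval is just another tool.
        .tools(new DocumentSearchTools(vectorStore), new EmployeeDirectoryTools())
        .user("I need a new PC. What shall I do?")
        .call()
        .content();
```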
Ultimately, the recommendation seems to be not to use `RetrievalAugmentationAdvisor` for this use case.