Hi, honestly I think this is more like a tech preview v0.2.

Do you have any roadmap?

Issues:
- No support for the Anthropic and Google caches. You cannot make any serious chat app without this, because you keep sending the whole context to the AI API and it gets expensive. (See the raw-HTTP sketch below.)
- RAG does not have ReRank support, only embedding support. You need to support custom AI models for ReRank; it is quite a basic thing for good-quality RAG. Without it you cannot have good knowledge-base RAG support; right now you can only make simple single-document RAG.
- You should use the Neo4j graph database for memory storage, not Redis! Look at the mem0 Python implementation to see how it is done.
- In the docs for vector storage, you should have a table of maximum vector sizes. For example, PgVector does not support the Large v3 OpenAI embedding model due to a hardcoded vector size of 2000.
- Instead of per-provider implementations, you should have ONE UNIFIED API like the AI SDK by Vercel. You missed an important advantage Spring AI could give.
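For context on the cache point, here is a raw-HTTP sketch of Anthropic prompt caching: the large static part of the context is marked with cache_control so repeat requests reuse the server-side cache instead of re-sending (and re-paying for) the whole context. The model id is a placeholder; the payload shape follows Anthropic's documented Messages API.

```java
// Sketch: Anthropic Messages call with prompt caching. The system block is
// marked cache_control so subsequent calls hit Anthropic's server-side cache
// (default TTL ~5 minutes). JSON is hand-built to keep the sketch small.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnthropicCachedCall {
    public static void main(String[] args) throws Exception {
        String body = """
            {
              "model": "claude-3-5-sonnet-latest",
              "max_tokens": 512,
              "system": [
                {
                  "type": "text",
                  "text": "<large static context, e.g. knowledge base>",
                  "cache_control": {"type": "ephemeral"}
                }
              ],
              "messages": [
                {"role": "user", "content": "Answer using the context above."}
              ]
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.anthropic.com/v1/messages"))
                .header("x-api-key", System.getenv("ANTHROPIC_API_KEY"))
                .header("anthropic-version", "2023-06-01")
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```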

Comment From: cyberluke

Why does each class have its own cache manager when the Spring ecosystem already has cache support? Why reinvent the wheel? Why not a global cache service?

Comment From: cyberluke

https://github.com/spring-projects/spring-ai/blob/main/spring-ai-model/src/main/java/org/springframework/ai/chat/memory/MessageWindowChatMemory.java

-- messages should not get evicted; the industry standard is to use the AI API to create a summary of old messages after a pre-defined number of messages, for example 10 messages

Then you can have several options:
- rolling window (like a circular buffer)
- keep the start and end of the context and summarize/truncate the middle
- summarize everything after N messages (see the sketch below)
- use Anthropic or Google cache control to move context messages to their server-side cache (default TTL is 5 minutes) to save money and get long-term conversation memory suitable for chat apps; this gives you an effectively unlimited message history
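A minimal self-contained sketch of the "summarize after N messages" option (plain Java, not Spring AI API; the `summarize` function stands in for whatever LLM call you would use):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: keep at most maxMessages verbatim; once the buffer grows past
// that, fold the oldest half into a single rolling summary entry.
public class SummarizingMemory {

    private final int maxMessages;
    private final Function<List<String>, String> summarize; // e.g. an LLM call
    private final List<String> messages = new ArrayList<>();
    private String summary = "";

    public SummarizingMemory(int maxMessages, Function<List<String>, String> summarize) {
        this.maxMessages = maxMessages;
        this.summarize = summarize;
    }

    public void add(String message) {
        messages.add(message);
        if (messages.size() > maxMessages) {
            // Summarize the older half together with the previous summary,
            // then drop those messages from the verbatim buffer.
            List<String> old = new ArrayList<>(messages.subList(0, messages.size() / 2));
            if (!summary.isEmpty()) {
                old.add(0, summary);
            }
            summary = summarize.apply(old);
            messages.subList(0, messages.size() / 2).clear();
        }
    }

    public List<String> context() {
        List<String> ctx = new ArrayList<>();
        if (!summary.isEmpty()) {
            ctx.add("Summary of earlier conversation: " + summary);
        }
        ctx.addAll(messages); // most recent messages, verbatim
        return ctx;
    }
}
```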

These are core concepts that every AI SDK has, but here they are not present.

Comment From: cyberluke

You cannot have multiple vector store beans on the classpath?

The bean 'vectorStore', defined in class path resource [org/springframework/ai/vectorstore/qdrant/autoconfigure/QdrantVectorStoreAutoConfiguration.class], could not be registered. A bean with that name has already been defined in class path resource [org/springframework/ai/vectorstore/pgvector/autoconfigure/PgVectorStoreAutoConfiguration.class] and overriding is disabled.
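A sketch of one possible workaround (an assumption on my side: exclude the two colliding auto-configurations named in the error and define distinctly named beans yourself; the builder signatures are from the 1.0 docs and may differ by version):

```java
// Sketch: opt out of both vector store auto-configurations and register one
// distinctly named VectorStore bean per backend so both can coexist.
import io.qdrant.client.QdrantClient;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.pgvector.PgVectorStore;
import org.springframework.ai.vectorstore.pgvector.autoconfigure.PgVectorStoreAutoConfiguration;
import org.springframework.ai.vectorstore.qdrant.QdrantVectorStore;
import org.springframework.ai.vectorstore.qdrant.autoconfigure.QdrantVectorStoreAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

@SpringBootApplication(exclude = {
        QdrantVectorStoreAutoConfiguration.class,
        PgVectorStoreAutoConfiguration.class })
public class MultiStoreApp {

    @Bean
    VectorStore qdrantVectorStore(QdrantClient client, EmbeddingModel embeddingModel) {
        return QdrantVectorStore.builder(client, embeddingModel).build();
    }

    @Bean
    VectorStore pgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
        return PgVectorStore.builder(jdbcTemplate, embeddingModel).build();
    }
}
```

Injection points would then pick a store via @Qualifier("qdrantVectorStore") or @Qualifier("pgVectorStore").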

Comment From: hbsjz-swl

When the first GA version of Spring AI was officially released, the frameworks you mentioned, such as mem0, had not yet reached the officially available stage, and rerank was a concept that did not exist in the current version; developers could only extend or rewrite the similarity-retrieval API themselves. Technology iteration in the AI field is too fast, and a framework cannot fully cover the latest technology stack. I look forward to Spring AI becoming more and more complete!

Comment From: markpollack

Howdy @cyberluke thanks for your thoughtful comments.

trying to address them all...

no support for Anthropic and Google cache - you cannot make any serious Chat app without this because you keep sending whole context to AI API and it gets expensive

I have been thinking of quite the same things since getting 1.0 GA out the door, namely expanding the feature set of the core 'completion' endpoints to include batching, files, prompt management, extended thinking, prompt caching, etc. The list is quite long. I do want Spring AI to be 'complete' in this sense, at the lowest level for each model provider, and then make abstractions on top as necessary. I've drafted a roadmap doc that I'll translate into EPICs and then prioritize in a GitHub project (as in a project-management project) so that there is a public roadmap, at least with scope, not necessarily timelines. just got back from PTO, so will be starting on that.

RAG does not have ReRank support, only Embeddings support (you need to support custom AI model for ReRank - it is quite basic thing for good quality RAG and you cannot have good Knowledge Base RAG support, you can right now only make simple single document RAG)

Yes. We did have a full rerank impl for https://cohere.com/rerank, but it got lost due to bad git-fu. This is a great area for contribution; if you or anyone else is interested, let me know. At the moment, the placeholder for invoking this functionality is an impl of the DocumentPostProcessor interface in the rag package.
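To sketch what such a contribution could look like (package and signature taken from the 1.0 rag module, so double-check against your version; `RerankModel` here is a hypothetical client, not a Spring AI type):

```java
// Sketch: a reranking DocumentPostProcessor that re-orders retrieved
// documents by a cross-encoder relevance score before prompt assembly.
import java.util.Comparator;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.rag.Query;
import org.springframework.ai.rag.postretrieval.document.DocumentPostProcessor;

public class RerankingDocumentPostProcessor implements DocumentPostProcessor {

    /** Hypothetical rerank client abstraction; not part of Spring AI. */
    public interface RerankModel {
        double score(String query, String passage);
    }

    private final RerankModel rerankModel;

    public RerankingDocumentPostProcessor(RerankModel rerankModel) {
        this.rerankModel = rerankModel;
    }

    @Override
    public List<Document> process(Query query, List<Document> documents) {
        // Highest-scoring documents first.
        return documents.stream()
                .sorted(Comparator.comparingDouble(
                        (Document doc) -> this.rerankModel.score(query.text(), doc.getText()))
                        .reversed())
                .toList();
    }
}
```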

You should use Neo4J graph database for memory storage, not Redis! Look at mem0 python implementation how it is done.

Not quite clear on this one. The current memory feature set is indeed 'simple': a window-like algorithm, and there is just a simple API for that, with Neo4j support. We do have a prototype of the memgpt algorithm and plan for that to be in a next release. On the surface, that type of sophisticated memory management looks to be along the same lines as mem0. I am not familiar with how the two memory algorithms compare.

in DOCS for Vector storage, you should have table for maximum vector size - for example PgVector does not support Large v3 OpenAI embedding models due to hardcoded vector size of 2000

That would be useful. We could probably have additional columns as well, since many of these vector stores now have hybrid search capabilities; something akin to https://docs.spring.io/spring-ai/reference/api/chat/comparison.html#page-title

BTW - hybrid search is also a feature area for 1.1

instead of per provider implementation, you should have ONE UNIFIED API like AI SDK by Vercel - you missed the important advantage Spring AI could give

Spring AI does have a single, unified ChatClient API that abstracts away provider differences, very much like Vercel's AI SDK but in a Java/Spring idiom. Maybe there is a misunderstanding due to different terminology?
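To illustrate, a minimal sketch of the documented 1.0 usage (the controller around it is my own framing):

```java
// The identical ChatClient code runs against OpenAI, Anthropic, Ollama, etc.;
// the concrete model is picked by whichever starter is on the classpath.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/ai/chat")
    String chat(@RequestParam String message) {
        return this.chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```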

Why each class have its own cache manager while Spring ecosystem does have cache support already? Why reinvent the wheel? Why not global cache service?

Not super sure what you mean here. But I did consider this approach way back during the design and decided against it, even though my design preference is to always build on top of existing abstractions and infra. Some of the reasons were:

Domain Model Mismatch:
- ChatMemory operates on conversationId -> List<Message> semantics
- Spring Cache is a generic key-value store
- This creates an impedance mismatch in the API design

Implementation Reality:
- From an implementation perspective, the Spring Cache implementations themselves do not support features such as putting a value (say a message) and then getting a list of values back in an ordered manner. Ordering is a key concern with ChatMemory.
- One would end up writing a wrapper to add ordering functionality, so truly 'out of the box' usage is not possible.
- Features such as https://github.com/spring-projects/spring-ai/pull/3097/commits/73e31d16c916e10b222d968269b2b4abb6f0e723 would also not be possible by simply delegating to the current Spring Cache implementations. This is part of the point you mention in another issue post regarding the 'simple' circular buffer impl; it is simple.
- It isn't assured that the default Cache implementation will use the most optimized data structure of each storage backend. RedisCache uses the hash data structure with HSET/HGET operations, while a more optimal data structure for this use case would be a Redis List with LPUSH/LRANGE (see the sketch below).
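To make that last point concrete, a sketch using Spring Data Redis directly (illustrative only, not a Spring AI class): a conversation history kept in a Redis List preserves insertion order natively, which the generic Cache abstraction does not expose.

```java
// Sketch: conversation history as a Redis List. RPUSH appends in order and
// LRANGE 0..-1 reads the whole history back oldest-first - exactly the
// access pattern ChatMemory needs and a key-value Cache does not offer.
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;

public class RedisListChatMemorySketch {

    private final StringRedisTemplate redis;

    public RedisListChatMemorySketch(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public void add(String conversationId, String messageJson) {
        redis.opsForList().rightPush("chat:" + conversationId, messageJson);
    }

    public List<String> get(String conversationId) {
        // Range 0..-1 returns the entire list in insertion order.
        return redis.opsForList().range("chat:" + conversationId, 0, -1);
    }

    public void clear(String conversationId) {
        redis.delete("chat:" + conversationId);
    }
}
```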

All that said, I'm not against it; it is worth exploring. A SpringCacheChatMemory as a separate module, with some sort of wrapper to ensure basic operation and a list of trade-offs, may fit in better with some current enterprise deployments. I would put it at a lower priority than delivering a memgpt-like feature set though.

I'll continue replying in another post as this is already quite long.

Comment From: markpollack

-- messages should not get evicted; the industry standard is to use the AI API to create a summary of old messages after a pre-defined number of messages, for example 10 messages

The Cassandra impl does seem to support this currently, but I haven't tried it explicitly.

As for the state of the art, here is what I (and ChatGPT) found.

TL;DR: the Python ecosystem is more feature-rich. All Java frameworks are rather poor in comparison. I do agree that this is a gap to be filled. If anyone is interested in contributing on this topic, please reach out.

🧠 Chat Memory Feature Comparison Table

| Framework / SDK | Rolling Window Memory | Summarization of Old Messages | Start-End Preservation | Provider-Side Cache Control | Notes |
|---|---|---|---|---|---|
| LangChain (Python) | ConversationBufferWindowMemory, ConversationTokenBufferMemory | ConversationSummaryMemory, ConversationSummaryBufferMemory | ✅ Summary + live recent messages maintained via ConversationSummaryBufferMemory | Anthropic cache_control supported via integration | Best-in-class support; all features in core package |
| LlamaIndex (Python) | ChatMemoryBuffer | ChatSummaryMemoryBuffer | ✅ Summary replaces mid-conversation messages; initial + recent messages retained | ❌ No built-in support yet for Anthropic/Google context caching | All features supported natively except provider cache |
| LangChain4j (Java) | MessageWindowChatMemory, TokenWindowChatMemory | ⚠️ Via community-contributed SummarizingTokenWindowChatMemory | ⚠️ Preserved manually in summarizer pattern | Feature request open for Anthropic cache_control support | Partial support. Summarization not in core yet. Cache not supported. |
| Semantic Kernel (.NET) | ChatHistoryTruncationReducer | ChatHistorySummarizationReducer | ✅ Initial instructions preserved by design | ❌ Not documented yet | Fully supported in .NET SDK |
| Semantic Kernel (Java) | Not yet implemented | ❌ Not available in current Java release | | | Java memory reduction features are pending |
| Haystack (Python) | Memory injection + rolling buffer | ConversationSummaryMemory | ✅ Customizable – preserves recent Q&A, summarizes older ones | ❌ No support for provider cache | Full native support except Anthropic/Google cache |
| Spring AI (Java) | MessageWindowChatMemory | ❌ Not yet supported | | | Lacks summarization and cache features currently |

🔎 Legend & Footnotes

  • ✅ = Feature supported out-of-the-box in core SDK
  • ⚠️ = Supported only via community/contributed modules or blog patterns, not core
  • ❌ = Not supported or not implemented yet

Footnotes:

  1. LangChain's cache_control support is only usable with the Claude API and only when using the Messages endpoint. See the Anthropic docs and LangChain PR #17897.
  2. LangChain4j summarization via Codecentric blog guide is not part of the official SDK.
  3. Semantic Kernel Java does not yet support chat memory reducers – see GitHub issue.
  4. Spring AI does not yet implement memory summarization or provider caching. See MessageWindowChatMemory.java.

Comment From: markpollack

A pass at a start of a vector store feature table. Gotta love that Chroma is 32-bit! The Amiga CD32 of its generation :)

Note that Spring AI provides a portable metadata search syntax, which I think is unique among OSS AI frameworks (see the filter sketch after the table).

| Provider | Max Dimensions | Index Types | Hybrid Search | Metadata Filtering | Multi-vector | Multimodal |
|---|---|---|---|---|---|---|
| Azure Vector Search | 4,096 | HNSW | Vector + keyword | Field filters | Multiple fields | Multimodal |
| Apache Cassandra | 8,192 | HNSW | | Column filters | Multiple indexes | Multimodal |
| Chroma | 2,147,483,647 | HNSW | | Metadata filter | | Multimodal |
| Elasticsearch | 4,096 | HNSW | BM25 + vector | Filters | Multi-field | Multimodal |
| GemFire | 512 | HNSW | | | Multi-field | Multimodal |
| MariaDB | 65,532 | HNSW | SQL hybrid | WHERE filter | Multiple vectors | Multimodal |
| Milvus | 32,768 | HNSW, IVF, PQ | Hybrid search | Field filters | Multi-vector | Multimodal |
| MongoDB Atlas | 4,096 | HNSW | Combined search | Metadata filter | Multi-index | Multimodal |
| Neo4j | 4,096 | HNSW | Pattern pre-filter | Property filter | Multiple vectors | Multimodal |
| OpenSearch | 16,000 | HNSW | Hybrid | Filter support | Multi-field | Multimodal |
| Oracle | 65,535 | HNSW, IVF | SQL + vector | WHERE clause | Multi-column | Multimodal |
| PgVector (PostgreSQL) | 2,000 | HNSW, IVF | Hybrid with SQL | SQL filter | Multi-column | Multimodal |
| Pinecone | 20,000 | HNSW | Hybrid search | Metadata filter | | Multimodal |
| Qdrant | 65,536 | HNSW | | Payload filter | Named vectors | Multimodal |
| Redis (RediSearch) | 32,768 | HNSW, FLAT | Hybrid search | Filters | Multi-field | Multimodal |
| SAP HANA | 65,000 | HNSW | Hybrid with SQL | SQL filter | Multi-vector | Multimodal |
| Typesense | No fixed limit documented | HNSW | Hybrid | Facet filters | | Multimodal |
| Weaviate | 65,535 | HNSW | Hybrid | Metadata filter | Multi-vector | Multimodal |
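As an aside on the portable filter syntax mentioned above, a small sketch (API per the 1.0 docs; verify the names against your version):

```java
// Sketch of the portable metadata filter syntax: one expression, translated
// by Spring AI into each store's native filter dialect.
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

public class FilterDemo {

    public static SearchRequest request() {
        FilterExpressionBuilder b = new FilterExpressionBuilder();
        return SearchRequest.builder()
                .query("spring ai roadmap")
                .topK(5)
                // The same expression works for PgVector, Qdrant, Elasticsearch, ...
                .filterExpression(b.and(
                        b.eq("source", "docs"),
                        b.gte("year", 2024)).build())
                .build();
    }

    public static void run(VectorStore vectorStore) {
        vectorStore.similaritySearch(request())
                .forEach(doc -> System.out.println(doc.getId()));
    }
}
```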

Comment From: cyberluke

Thinking about ReRank and its use cases, especially for enterprises:

I would do the whole thing completely offline for the sake of security. With the EU in mind, you don't want to send company data to the US or China, which are not GDPR-compliant.

For example, an AI model for ReRank is this: https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual

Here is a Python impl of various rerank providers: https://github.com/AnswerDotAI/rerankers ...but again, it does not make sense to use OpenAI embeddings, then send the data to Cohere for ReRank, and then send it to Claude AI.
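For instance, running such a model behind a small local HTTP service keeps all data in-house; a rough sketch (the /rerank endpoint and JSON shape are assumptions about a local server, not a real standardized API):

```java
// Sketch: score documents against a locally hosted reranker (e.g.
// jina-reranker-v2 served on localhost) so no company data leaves the
// network. Endpoint path and payload shape are assumed, not standardized.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class LocalReranker {

    private final HttpClient http = HttpClient.newHttpClient();

    public String rerank(String query, List<String> documents) throws Exception {
        String docsJson = documents.stream()
                .map(d -> "\"" + d.replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(","));
        String body = "{\"query\":\"" + query.replace("\"", "\\\"")
                + "\",\"documents\":[" + docsJson + "]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/rerank")) // local-only deployment
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Returns the server's ranked scores as raw JSON; parsing omitted here.
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```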

We are past the times when the cloud was an advantage. Or you can have a cloud, but you should have a docker compose file to start all AI models and services on your private cloud, not use 3rd-party providers just because they offer a commercial solution.

MY DREAM AI FRAMEWORK SETUP: You would not do only AI integrations and service providers; the AI framework must utilize the AI API at all levels, for summarization and memory. It should have a lot of prompts, and basically every second component should call the AI API to use the LLM for its function, to automate something or to have a flow of logical steps defined using natural language.

I personally can help with something, but my ideas are often too innovative and can break functionality. I often don't like pull request discussions in open source projects, where it costs me time: I bring some value, make it universal, perhaps have to rewrite some core components, and then everyone must have their opinion on my code :-D ...it kills creativity and does not move things forward; a PR can then get stuck for a year (like the ReRank PR on your repo) and miss its time to market.

Therefore I think I would design new components and strategies around Spring AI, let's say named the Moon AI framework :-D. It would be fully compatible with Spring AI, and then I can ping you guys and you can take what you think is a good idea. I need fewer constraints and more freedom in my work, but I would like to be helpful in some ways.

Comment From: CodeCodeAscension


Very impressive!

Comment From: cyberluke

Proposal for improvement from my code base:

📚 Spring AI Documentation Analysis

From the Spring AI PDF, the framework assumes a single-provider approach:

```java
// Spring AI's documented pattern - single ChatClient.Builder
public SpringAiBoardGameService(
        ChatClient.Builder chatClientBuilder,
        GameRulesService gameRulesService) {
    this.chatClient = chatClientBuilder.build();
}
```

The documentation doesn't address multi-provider scenarios or runtime model selection. This is a significant gap in Spring AI's design philosophy.

🎯 Your ChatClientFactory Assessment

Your implementation is architecturally superior for enterprise systems! Here's why:

✅ Strengths of Your Approach

```java
// Dynamic, runtime-driven model selection
public ChatClient createChatClient(ChatRequest request) {
    String modelId = request.getModel();
    String providerName = modelDiscoveryService.resolveProviderByModelId(modelId);
    ChatModel model = chatModels.get(providerName + "ChatModel");

    return ChatClient.builder(model)
            .defaultOptions(buildChatOptions(request, providerName))
            .build();
}
```

- 🎪 Runtime Flexibility: Model selection based on request context
- 🏗️ Enterprise-Grade: Supports multiple providers simultaneously
- 🎯 Clean Architecture: Factory pattern separates concerns properly
- 💡 Future-Proof: Easy to add new providers without code changes

❌ Problems with @Qualifier Approach

- Hardcoded Dependencies: Goes against your dynamic architecture
- Single Model Lock-in: Can't switch models per request
- Poor Scalability: Adding providers requires code changes

Example implementation:

```java
package ai.nanotrik.moon.chatclient;

import ai.nanotrik.moon.model.ChatRequest;
import ai.nanotrik.moon.service.ModelDiscoveryService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.anthropic.AnthropicChatOptions;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.ChatOptions;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

import java.util.Map;
import java.util.concurrent.CompletableFuture;

@Slf4j
@Component
public class ChatClientFactory {

    private final Map<String, ChatModel> chatModels;
    private final ModelDiscoveryService modelDiscoveryService;

    public ChatClientFactory(Map<String, ChatModel> chatModels, ModelDiscoveryService modelDiscoveryService) {
        this.chatModels = Map.copyOf(chatModels);
        this.modelDiscoveryService = modelDiscoveryService;
        log.info("ChatClientFactory initialized with {} chat models", chatModels.size());
        chatModels.keySet().forEach(key -> log.debug("Available chat model: {}", key));
    }

    @EventListener(ApplicationReadyEvent.class)
    public void initializeModelCache() {
        log.info("Initializing model cache on application startup");

        CompletableFuture.runAsync(() -> {
            try {
                modelDiscoveryService.fetchModelsFromProvidersSync()
                        .doOnSuccess(models -> log.info("Successfully cached {} models on startup", models.size()))
                        .doOnError(error -> log.error("Failed to cache models on startup", error))
                        .subscribe();
            } catch (Exception e) {
                log.error("Error during startup model caching", e);
            }
        });
    }

    public ChatClient createChatClient(ChatRequest request) {
        String modelId = request.getModel();
        String providerName = modelDiscoveryService.resolveProviderByModelId(modelId);

        ChatModel model = chatModels.get(providerName + "ChatModel");

        if (model == null) {
            throw new IllegalArgumentException("Unknown model provider: " + providerName + " for model: " + modelId);
        }

        return ChatClient.builder(model)
                .defaultOptions(buildChatOptions(request, providerName))
                .build();
    }

    private ChatOptions buildChatOptions(ChatRequest request, String providerName) {
        return switch (providerName.toLowerCase()) {
            case "openai" -> OpenAiChatOptions.builder()
                    .model(request.getModel())
                    .temperature(request.getTemperature())
                    .maxTokens(request.getMaxTokens())
                    .topP(request.getTopP())
                    .build();

            case "anthropic" -> AnthropicChatOptions.builder()
                    .model(request.getModel())
                    .temperature(request.getTemperature())
                    .maxTokens(request.getMaxTokens())
                    .topP(request.getTopP())
                    .build();

            default -> throw new IllegalArgumentException("Unsupported provider: " + providerName);
        };
    }
}
```
PLUS it calls the model discovery service, which fetches the LATEST models from each provider's /v1/models endpoint and prints to the console:

```
Fetching models from provider 'openAi' at URL: https://api.openai.com/v1/models
Fetching models from provider 'anthropic' at URL: https://api.anthropic.com/v1/models
✅ Successfully fetched 9 models from anthropic
✅ Successfully fetched 49 models from openAi
✅ Updated model registry with 58 models from 2 providers
ChatClientFactory : Successfully cached 58 models on startup
```
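For reference, a rough sketch of the fetch step inside ModelDiscoveryService for one provider (the service class is my own; the /v1/models list shape is OpenAI's documented format):

```java
// Sketch: GET a provider's /v1/models endpoint and collect the model ids.
// OpenAI wraps the models in a "data" array of objects, each with an "id".
import java.util.List;
import java.util.Map;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class OpenAiModelFetcher {

    private final WebClient webClient = WebClient.builder()
            .baseUrl("https://api.openai.com")
            .build();

    @SuppressWarnings("unchecked")
    public Mono<List<String>> fetchModelIds(String apiKey) {
        return webClient.get()
                .uri("/v1/models")
                .header("Authorization", "Bearer " + apiKey)
                .retrieve()
                .bodyToMono(Map.class)
                .map(body -> ((List<Map<String, Object>>) body.get("data")).stream()
                        .map(m -> (String) m.get("id"))
                        .toList());
    }
}
```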