The VectorStore interface covers two things: similarity search and DocumentWriter/add/accept/delete.
These are different concepts, and a SimilaritySearchService or similarly named interface should be split out.
I currently have something I would like to expose as a VectorStore that is read-only and does not support the ingestion operations.
Comment From: rjrudin
I agree with this - I think it'll be common to implement similaritySearch without making use of the add/delete operations - i.e. there could be a completely separate system for getting data into the data store that implements similaritySearch.
Comment From: markpollack
Yes, I agree as well that we need some redesign here, and also the ability to pass options to add/delete operations. There are also other operations besides similarity search that need to be exposed.
Comment From: rjrudin
@markpollack Let me know if this is worth a separate ticket, happy to open one - I'd also like to provide additional context to a SearchRequest, such as additional query context that isn't part of the user's query. For example, a user may wish to engage a chatbot about crimes in a particular area defined by a bounding box. Under the hood, I want to use that bounding box to select a subset of records in a database and then do a similarity search on those records. That additional query context doesn't fit into a Filter.Expression - perhaps a metadata map on SearchRequest would be a reasonable extension point?
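To make the bounding-box idea concrete, here is a rough sketch of what a context map on a search request could look like. Everything in it is hypothetical: ContextualSearchRequest, BoundingBox, CrimeRecord, BoundingBoxSearch and the "boundingBox" key are made-up stand-ins, not Spring AI types, and the keyword count only stands in for a real embedding similarity score.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical geographic context; not part of Spring AI.
record BoundingBox(double minLat, double minLon, double maxLat, double maxLon) {
    boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }
}

// Hypothetical record type standing in for a stored document.
record CrimeRecord(String id, String text, double lat, double lon) {}

// Hypothetical request: the usual query and topK plus a free-form context map.
record ContextualSearchRequest(String query, int topK, Map<String, Object> context) {}

final class BoundingBoxSearch {
    static final String BBOX_KEY = "boundingBox"; // assumed key name

    private final List<CrimeRecord> records;

    BoundingBoxSearch(List<CrimeRecord> records) {
        this.records = List.copyOf(records);
    }

    List<CrimeRecord> search(ContextualSearchRequest request) {
        // Step 1: read the optional bounding box out of the context map.
        BoundingBox box = request.context().get(BBOX_KEY) instanceof BoundingBox b ? b : null;
        String[] words = request.query().toLowerCase().split("\\s+");
        return records.stream()
                // Step 2: narrow the candidate set to records inside the box, if one was given.
                .filter(r -> box == null || box.contains(r.lat(), r.lon()))
                // Step 3: rank the remaining records (toy keyword score, highest first).
                .sorted(Comparator.comparingInt((CrimeRecord r) -> keywordHits(r, words)).reversed())
                .limit(request.topK())
                .toList();
    }

    // Toy stand-in for similarity: number of query words found in the record text.
    private static int keywordHits(CrimeRecord r, String[] words) {
        String text = r.text().toLowerCase();
        int hits = 0;
        for (String w : words) {
            if (text.contains(w)) {
                hits++;
            }
        }
        return hits;
    }
}
```

The point of the map is that the store implementation decides how to interpret the extra context, so a request without a "boundingBox" entry simply searches all records.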
Comment From: ThomasVitale
As part of the new (experimental) Modular RAG features, a VectorStoreDocumentRetriever has been introduced to support search operations decoupled from the generic CRUD operations in the VectorStore API.
@johnsonr does that help with your use case?
The new DocumentRetriever API will support searching data not only from a vector store, but also from other types of sources, such as web search engines or knowledge graphs.
Docs: https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html#_retrieval
Comment From: markpollack
@sobychacko thoughts on this?
Comment From: markpollack
Yep. We can retrofit a SimilaritySearchOperations interface (name TBD) so that it can be passed around to code without fear of that code adding anything.
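As a minimal sketch of that split, assuming nothing about the final design: the snippet below uses hypothetical stand-in types (Doc, DocumentIngestOperations, FullVectorStore, InMemoryStore, Answerer), not the actual Spring AI interfaces, and a toy keyword score in place of embedding similarity. SimilaritySearchOperations is the placeholder name from the comment above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a stored document; not the Spring AI Document class.
record Doc(String id, String text) {}

// Read-only half: the only capability that search-side code needs.
interface SimilaritySearchOperations {
    List<Doc> similaritySearch(String query, int topK);
}

// Write half, kept separate so read-only stores do not have to implement it.
interface DocumentIngestOperations {
    void add(List<Doc> docs);
    void delete(List<String> ids);
}

// A full store is simply both halves combined.
interface FullVectorStore extends SimilaritySearchOperations, DocumentIngestOperations {}

final class InMemoryStore implements FullVectorStore {
    private final List<Doc> docs = new ArrayList<>();

    @Override
    public void add(List<Doc> newDocs) {
        docs.addAll(newDocs);
    }

    @Override
    public void delete(List<String> ids) {
        docs.removeIf(d -> ids.contains(d.id()));
    }

    @Override
    public List<Doc> similaritySearch(String query, int topK) {
        String[] words = query.toLowerCase().split("\\s+");
        // Toy ranking by keyword hits; a real store would compare embeddings.
        return docs.stream()
                .sorted(Comparator.comparingInt((Doc d) -> keywordHits(d, words)).reversed())
                .limit(topK)
                .toList();
    }

    private static int keywordHits(Doc d, String[] words) {
        String text = d.text().toLowerCase();
        int hits = 0;
        for (String w : words) {
            if (text.contains(w)) {
                hits++;
            }
        }
        return hits;
    }
}

// Consumer that is handed only the read-only view, so it cannot add or delete.
final class Answerer {
    private final SimilaritySearchOperations search;

    Answerer(SimilaritySearchOperations search) {
        this.search = search;
    }

    String bestMatchId(String question) {
        List<Doc> hits = search.similaritySearch(question, 1);
        return hits.isEmpty() ? "" : hits.get(0).id();
    }
}
```

With this shape, a read-only store like the one in the original post would implement only SimilaritySearchOperations, and code such as Answerer has no compile-time access to add or delete.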
Comment From: markpollack
@rjrudin The SearchRequest is specifically meant to feed in the three things that are required for a similarity search - the query string, the top-k number of results to return, and the filter expression - as a passthrough to execution via a vector db that supports similarity search. The query you mention seems orthogonal to that execution path/use case. It sounds more like a geo-query (great with MongoDB) in a non-vector-store database and then some sort of custom ranking of those records (perhaps by closest to a point).