Expected Behavior

The documentation of FilterExpressionBuilder says "This builder DSL mimics the common https://www.baeldung.com/hibernate-criteria-queries syntax." I'd expect all common Hibernate operators to be supported, but the 'like' operator is not supported here.

Current Behavior

There is no 'like' operator, so you can't filter by a metadata field that contains a certain string.

Context

Often the similarity search can't return the desired results, because the algorithm does not work like a traditional deterministic comparison. In that case, we could combine it with a traditional search against metadata to improve the results. An important and very useful operator for this kind of traditional search is 'like'; please implement it in FilterExpressionBuilder and related classes.

Comment From: tzolov

The Spring AI VectorStore filter-expression syntax is based on the common SQL filter-expression grammar.

Our aim is to support filter expressions that can be run on, and ported across, the various Vector Store implementations.

The different stores, though, support different subsets of filter operators. For some gaps, such as missing NOT or IN/NIN, we have provided logical transformations that convert such expressions into semantically equivalent expressions using other operators. For example, A IN [x, y, z] can be transformed into A == x || A == y || A == z. Hope you get the idea.
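The IN-to-OR rewrite described above can be sketched as a small tree transformation. This is a minimal sketch assuming Java 17; the classes `InExpansion`, `Expr`, `Eq`, `In` and `Or` are hypothetical and are not Spring AI's filter-expression classes.

```java
import java.util.List;

// Hypothetical mini filter AST for illustration only; not Spring AI's classes.
final class InExpansion {

    sealed interface Expr permits Eq, In, Or {}
    record Eq(String key, Object value) implements Expr {}
    record In(String key, List<Object> values) implements Expr {}
    record Or(Expr left, Expr right) implements Expr {}

    /** Rewrites every IN node into a left-nested chain of OR-ed equality checks. */
    static Expr expandIn(Expr expr) {
        if (expr instanceof In in) {
            if (in.values().isEmpty()) {
                throw new IllegalArgumentException("IN with an empty list has no OR equivalent");
            }
            Expr result = new Eq(in.key(), in.values().get(0));
            for (int i = 1; i < in.values().size(); i++) {
                result = new Or(result, new Eq(in.key(), in.values().get(i)));
            }
            return result;
        }
        if (expr instanceof Or or) {
            // Recurse so IN nodes nested under OR are expanded too.
            return new Or(expandIn(or.left()), expandIn(or.right()));
        }
        return expr;
    }

    /** Renders an expression in the A == x || A == y notation used in the comment. */
    static String render(Expr expr) {
        if (expr instanceof Eq eq) return eq.key() + " == " + eq.value();
        if (expr instanceof Or or) return render(or.left()) + " || " + render(or.right());
        In in = (In) expr;
        return in.key() + " IN " + in.values();
    }
}
```

For example, `render(expandIn(new In("A", List.of("x", "y", "z"))))` yields `A == x || A == y || A == z`. No comparable rewrite exists for LIKE using only equality and range operators, which is the gap discussed in this thread.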

Unfortunately, the LIKE operator seems to be supported by only a limited set of Vector Stores, and there is no obvious workaround to compensate for this deficiency.

If you have ideas or want to contribute in this space you are more than welcome.

Comment From: rsandx

@tzolov, thanks for looking into this issue quickly. I understand that the LIKE operator is more complicated than the other simple comparison operators, especially if your goal is to map directly to the syntax supported by the target vector stores. But since you have provided logical transformations for some operators such as NOT or IN/NIN, I wonder if you could do the same for LIKE. Here is a post that gives some ideas for implementing a SQL-style 'LIKE' operator in Java. I particularly like the following comment:

"You could turn '%string%' into contains(), 'string%' into startsWith() and '%string' into endsWith().

You should also run toLowerCase() on both the string and pattern as LIKE is case-insensitive.

Not sure how you'd handle '%string%other%' except with a Regular Expression though."
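The quoted suggestion can be sketched as a small matcher: simple shapes use contains/startsWith/endsWith, and anything else (such as '%string%other%') falls back to a regular expression. `LikeMatcher` is a hypothetical helper, not part of Spring AI. It assumes '%' and '_' wildcards with no escape character, and it makes case-insensitivity a flag, because LIKE's case sensitivity differs between databases (PostgreSQL's LIKE, for example, is case-sensitive, while ILIKE is not).

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spring AI.
final class LikeMatcher {

    /**
     * Compiles a SQL-style LIKE pattern ('%' = any run of characters, '_' = exactly one
     * character) into a predicate. Escape characters are not supported in this sketch.
     */
    static Predicate<String> compile(String likePattern, boolean ignoreCase) {
        // Lowercase both pattern and values when ignoring case, as the quoted comment suggests.
        String p = ignoreCase ? likePattern.toLowerCase(Locale.ROOT) : likePattern;
        Predicate<String> fast = fastPath(p);
        Predicate<String> matcher = fast != null ? fast : toRegex(p).asMatchPredicate();
        return value -> value != null
                && matcher.test(ignoreCase ? value.toLowerCase(Locale.ROOT) : value);
    }

    // '%x%' -> contains, 'x%' -> startsWith, '%x' -> endsWith, 'x' -> equals; null otherwise.
    private static Predicate<String> fastPath(String p) {
        if (p.indexOf('_') >= 0) {
            return null; // '_' needs the regex path
        }
        boolean leading = p.startsWith("%");
        boolean trailing = p.length() > 1 && p.endsWith("%");
        String core = p.substring(leading ? 1 : 0, p.length() - (trailing ? 1 : 0));
        if (core.indexOf('%') >= 0) {
            return null; // e.g. '%string%other%' needs a regular expression
        }
        if (leading && trailing) return s -> s.contains(core);
        if (leading) return s -> s.endsWith(core);
        if (trailing) return s -> s.startsWith(core);
        return s -> s.equals(core);
    }

    // Quotes literal runs, maps '%' to ".*" and '_' to "."; DOTALL lets wildcards span newlines.
    private static Pattern toRegex(String p) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : p.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

For example, `LikeMatcher.compile("%spring%", true).test("Intro to Spring AI")` returns true, while `LikeMatcher.compile("%string%other%", false)` accepts "a string and other" but rejects "other string".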

So, in the worst case, the LIKE logic could be implemented with some post-processing. That would be inefficient, but you may find a more efficient way to do it. Hope this helps.
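The post-processing fallback could look roughly like this: run the similarity search as usual, then drop the hits whose metadata value does not match, keeping the store's ranking order. This is a sketch under stated assumptions: `PostFilter` and `Hit` are hypothetical stand-ins (not Spring AI's `Document` or `VectorStore` types), and the matcher is passed in as a plain `Predicate<String>` so any LIKE-style check can be plugged in.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical post-filtering helper; not part of Spring AI.
final class PostFilter {

    // Hypothetical stand-in for one similarity-search hit.
    record Hit(String id, Map<String, Object> metadata, double score) {}

    /**
     * Keeps hits whose metadata value under `key` is a String accepted by `valueMatcher`,
     * preserving the input order (assumed to be ranked by similarity), then keeps at most topK.
     */
    static List<Hit> filterByMetadata(List<Hit> candidates, String key,
                                      Predicate<String> valueMatcher, int topK) {
        return candidates.stream()
                .filter(hit -> hit.metadata().get(key) instanceof String s && valueMatcher.test(s))
                .limit(topK)
                .toList();
    }
}
```

Because the filter runs after the store's top-K cut, a caller would typically over-fetch (request several times topK candidates) so that enough hits survive; that extra fetching is part of why this approach is inefficient.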

Comment From: sjivan

@tzolov, any updates on this? The lack of a "contains" metadata filter is a severe limitation, and it's blocking my use case from using Spring AI.

Some vector stores, like Milvus, Postgres, and Weaviate, support contains/like metadata filters, and libraries like LangChain, Haystack, and LangChain4J support the contains filter for these select vector stores. The FilterExpressionBuilder API should support a ContainsString filter and throw an UnsupportedOperationException if the underlying vector store does not support it, with documentation on which vector stores support it. See LangChain4J's documentation on Metadata Filter and the ContainsString operator.
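The proposed behaviour, where the builder offers ContainsString but a store that cannot evaluate it throws UnsupportedOperationException, could be modelled with a per-store capability set that is checked before a query is translated. Everything here (`ContainsStringSupport`, `Op`, `StoreCapabilities`, and the store names) is a hypothetical illustration, not Spring AI's or LangChain4J's API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration of per-store operator support; not a real library API.
final class ContainsStringSupport {

    // Hypothetical operator set; names are illustrative only.
    enum Op { EQ, NE, GT, GTE, LT, LTE, IN, NIN, AND, OR, NOT, CONTAINS_STRING }

    // Hypothetical per-store capability declaration.
    record StoreCapabilities(String storeName, Set<Op> supported) {

        /** Fails fast, before any query is sent, when the store cannot evaluate the operator. */
        void require(Op op) {
            if (!supported.contains(op)) {
                throw new UnsupportedOperationException(
                        storeName + " does not support the " + op + " filter operator");
            }
        }
    }

    // A store that supports every operator except CONTAINS_STRING.
    static StoreCapabilities basicStore(String name) {
        return new StoreCapabilities(name, EnumSet.complementOf(EnumSet.of(Op.CONTAINS_STRING)));
    }

    // A store that supports all operators, including CONTAINS_STRING.
    static StoreCapabilities storeWithContains(String name) {
        return new StoreCapabilities(name, EnumSet.allOf(Op.class));
    }
}
```

Failing fast at translation time, rather than silently dropping the filter, keeps the portability trade-off visible to the caller.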


https://docs.langchain4j.dev/tutorials/rag#metadata

Filter

The Filter allows filtering by Metadata entries when performing a vector search.

Currently, the following Filter types/operations are supported:

IsEqualTo, IsNotEqualTo, IsGreaterThan, IsGreaterThanOrEqualTo, IsLessThan, IsLessThanOrEqualTo, IsIn, IsNotIn, ContainsString, And, Not, Or

NOTE: Not all embedding stores support filtering by Metadata; please see the "Filtering by Metadata" column here.

Some stores that support filtering by Metadata do not support all possible Filter types/operations. For example, ContainsString is currently supported only by Milvus, PgVector and Qdrant.

Comment From: markpollack

Hi, I'm afraid this has slipped down the priority list for 1.0 GA. I've moved the issue so the discussion can continue, since creating a portable abstraction that doesn't work with every store also creates problems.