Bug description
var criteria = SearchRequest.builder()
        .query(text)
        .topK(properties.getTopK())
        .similarityThreshold(properties.getSimilarityThreshold())
        .build();
log.info("Searching criteria: {}", criteria);
var similarities = vectorStore.similaritySearch(criteria);
log.info("similarities: {}", similarities);
return filter(payloadDocument, similarities);
The executed SQL does not appear to use the threshold:
[postgres-embedding-similarity-processor] [ery-processor-1] o.s.jdbc.core.JdbcTemplate : Executing prepared SQL statement [SELECT *, embedding <=> ? AS distance FROM public.vector_store WHERE embedding <=> ? < ? ORDER BY distance LIMIT ? ]
Environment
Expected behavior
Expected the similarity threshold to be added to the SQL condition.
Comment From: sunyuhan1998
I didn't quite understand your question. In PgVectorStore, similaritySearch() already appears to use similarityThreshold: in the "embedding <=> ? < ?" part of the SQL you posted, the second "?" is the distance calculated from similarityThreshold (1 - similarityThreshold).
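To make the threshold-to-distance conversion concrete, here is a small standalone Java sketch. It assumes the cosine-distance configuration, where pgvector's <=> operator returns 1 minus the cosine similarity; the class and method names are hypothetical and not part of Spring AI.

```java
/**
 * Hypothetical sketch (not Spring AI code) of how a similarity threshold becomes
 * the distance bound in "embedding <=> ? < ?", assuming cosine distance.
 */
public class ThresholdDemo {

    // Mirrors "double distance = 1 - request.getSimilarityThreshold();" in PgVectorStore.
    static double maxDistance(double similarityThreshold) {
        return 1 - similarityThreshold;
    }

    // Cosine distance = 1 - cosine similarity, the quantity pgvector's <=> computes.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // A row survives the WHERE clause only if its distance is strictly below the bound.
    static boolean passesThreshold(double[] query, double[] row, double similarityThreshold) {
        return cosineDistance(query, row) < maxDistance(similarityThreshold);
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        double[] close = {1, 0.1}; // nearly the same direction as the query
        double[] far = {0, 1};     // orthogonal to the query: cosine distance 1
        System.out.println(maxDistance(0.75));
        System.out.println(passesThreshold(query, close, 0.75));
        System.out.println(passesThreshold(query, far, 0.75));
    }
}
```

With a threshold of 0.75 the distance bound is 0.25, so only rows whose cosine similarity to the query exceeds 0.75 pass. Because the SQL comparison is a strict <, a row whose similarity is exactly equal to the threshold is excluded.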
Here is the relevant source code:
@Override
public List<Document> doSimilaritySearch(SearchRequest request) {
    String nativeFilterExpression = (request.getFilterExpression() != null)
            ? this.filterExpressionConverter.convertExpression(request.getFilterExpression()) : "";
    String jsonPathFilter = "";
    if (StringUtils.hasText(nativeFilterExpression)) {
        jsonPathFilter = " AND metadata::jsonb @@ '" + nativeFilterExpression + "'::jsonpath ";
    }
    double distance = 1 - request.getSimilarityThreshold();
    PGvector queryEmbedding = getQueryEmbedding(request.getQuery());
    return this.jdbcTemplate.query(
            String.format(this.getDistanceType().similaritySearchSqlTemplate, getFullyQualifiedTableName(),
                    jsonPathFilter),
            new DocumentRowMapper(this.objectMapper), queryEmbedding, queryEmbedding, distance, request.getTopK());
}
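The query this method issues can be emulated in memory to show how similarityThreshold and topK interact. This is a hedged sketch, not Spring AI code: Row and its precomputed distances are hypothetical stand-ins for vector_store rows, and the distances are assumed to be cosine distances.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical in-memory sketch of the statement doSimilaritySearch() issues:
 *   WHERE distance < (1 - similarityThreshold) ORDER BY distance LIMIT topK
 */
public class SimilaritySearchSketch {

    // Hypothetical row: a document id plus its precomputed distance to the query embedding.
    record Row(String id, double distance) {}

    static List<Row> search(List<Row> rows, double similarityThreshold, int topK) {
        double maxDistance = 1 - similarityThreshold; // same conversion as the quoted source
        return rows.stream()
                .filter(r -> r.distance() < maxDistance)           // WHERE embedding <=> ? < ?
                .sorted(Comparator.comparingDouble(Row::distance)) // ORDER BY distance
                .limit(topK)                                        // LIMIT ?
                .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", 0.40), new Row("b", 0.05),
                new Row("c", 0.20), new Row("d", 0.10));
        System.out.println(search(rows, 0.75, 2));
    }
}
```

With a threshold of 0.75 and topK of 2, row a (distance 0.40) is removed by the threshold, and of the remaining rows only the two closest, b and d, are returned. The threshold and topK act independently: the threshold can make the search return fewer than topK rows, never more.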
Comment From: ggreen
I AM SO SORRY, I APOLOGIZE. There is no issue with the threshold; this was my error. I was having an issue with the filter expression, and that may also be my own mistake.
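Since the remaining problem was the filter expression, it may help to print the exact statement that gets built when a filter is present. The sketch below mirrors the jsonPathFilter logic quoted above. SQL_TEMPLATE is an assumption reconstructed from the statement logged in the bug report (with %s slots for the table name and the filter), not the actual Spring AI template, and the filter string is hypothetical.

```java
/**
 * Hypothetical sketch of how doSimilaritySearch() splices a native filter
 * expression into its SQL. SQL_TEMPLATE is ASSUMED, reconstructed from the
 * logged statement; it is not copied from Spring AI.
 */
public class FilterSqlSketch {

    // Assumed template: first %s = table name, second %s = JSONPath filter (or "").
    static final String SQL_TEMPLATE =
            "SELECT *, embedding <=> ? AS distance FROM %s WHERE embedding <=> ? < ? %s ORDER BY distance LIMIT ? ";

    // Mirrors the jsonPathFilter construction in the quoted source
    // (!isBlank() plays the role of StringUtils.hasText for a non-null string).
    static String jsonPathFilter(String nativeFilterExpression) {
        if (nativeFilterExpression == null || nativeFilterExpression.isBlank()) {
            return "";
        }
        return " AND metadata::jsonb @@ '" + nativeFilterExpression + "'::jsonpath ";
    }

    static String buildSql(String table, String nativeFilterExpression) {
        return String.format(SQL_TEMPLATE, table, jsonPathFilter(nativeFilterExpression));
    }

    public static void main(String[] args) {
        // No filter: same shape as the statement in the bug report.
        System.out.println(buildSql("public.vector_store", ""));
        // Hypothetical filter: print the full statement so the JSONPath part can be checked.
        System.out.println(buildSql("public.vector_store", "$.country == \"UK\""));
    }
}
```

Because the expression is concatenated inside single quotes, a single quote within the filter would end the SQL literal early; printing the assembled statement makes problems like that easy to spot.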