Expected Behavior
Different Weaviate instances may use different properties, such as CONTENT_FIELD_NAME, OBJECT_CLASS, and COLLECTION_NAME. It would be great if more of these properties could be customized to fit existing Weaviate instances, as was done for the Neo4j vector store, which is more customizable.
Current Behavior
Currently, only a few properties, such as OBJECT_CLASS and CONSISTENCY_LEVEL, can be customized. Many others are hard-coded in the implementation of the Weaviate vector store, which makes it difficult to reuse.
Context
Because many properties are hard-coded in the current implementation of the Weaviate vector store, it is very difficult to use it with an existing Weaviate instance without rewriting the implementation.
Comment From: dev-jonghoonpark
@sp-yang
I'd like to try implementing this, but first I have a few questions about the implementation.
Question 1.
Could you list the specific properties that are needed? Should all of the following variables be configurable?
https://github.com/spring-projects/spring-ai/blob/384478edbf99e28c0a89cd5740d13a4e3f8483e7/vector-stores/spring-ai-weaviate-store/src/main/java/org/springframework/ai/vectorstore/weaviate/WeaviateVectorStore.java#L98-L110
Question 2.
Should COLLECTION_NAME be configurable?
Currently, the implementation of WeaviateVectorStore uses objectClass as the value for collectionName.
https://github.com/spring-projects/spring-ai/blob/384478edbf99e28c0a89cd5740d13a4e3f8483e7/vector-stores/spring-ai-weaviate-store/src/main/java/org/springframework/ai/vectorstore/weaviate/WeaviateVectorStore.java#L432-L438
Comment From: sp-yang
@dev-jonghoonpark: Many thanks for your prompt response. Regarding the first question, only CONTENT_FIELD_NAME needs to be configurable at the moment. Regarding the second question, using objectClass as the value for collectionName would be OK for us, too.
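To make the request concrete, here is a minimal sketch of a builder-style option for the content field. It is only an illustration under stated assumptions: `Options`, `contentFieldName(...)`, and the default values are hypothetical names invented for this example and are not the Spring AI API.

```java
import java.util.Objects;

public class ContentFieldConfigSketch {

    /** Hypothetical options holder; NOT part of Spring AI. */
    static final class Options {
        private final String objectClass;
        private final String contentFieldName;

        private Options(Builder b) {
            this.objectClass = b.objectClass;
            this.contentFieldName = b.contentFieldName;
        }

        String objectClass() { return objectClass; }

        String contentFieldName() { return contentFieldName; }

        // As agreed in the thread, the collection name simply reuses the object class.
        String collectionName() { return objectClass; }

        static Builder builder() { return new Builder(); }

        static final class Builder {
            // Assumed defaults, chosen only for this sketch.
            private String objectClass = "ExampleClass";
            private String contentFieldName = "content";

            Builder objectClass(String value) {
                this.objectClass = Objects.requireNonNull(value, "objectClass");
                return this;
            }

            Builder contentFieldName(String value) {
                this.contentFieldName = Objects.requireNonNull(value, "contentFieldName");
                return this;
            }

            Options build() { return new Options(this); }
        }
    }

    public static void main(String[] args) {
        // An existing instance that stores its text in a field called "body".
        Options options = Options.builder()
            .objectClass("Article")
            .contentFieldName("body")
            .build();
        System.out.println(options.collectionName() + " / " + options.contentFieldName());
    }
}
```

With something along these lines, an existing instance would only need to pass its own field name, and everything else keeps its default.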
Comment From: jPhy
I think it would be useful to make buildWeaviateSimilaritySearchFields protected instead of private.
Then anyone could override it to customize which fields the search requests, which would make it possible to bind to pre-existing databases.
Currently, however, we only need to customize the CONTENT_FIELD_NAME.
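A rough sketch of the protected-hook idea, using stand-in classes rather than the real store. `BaseStore`, `LegacySchemaStore`, `describeQuery`, and the field names are all hypothetical and exist only for this illustration; the constant values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ProtectedHookSketch {

    /** Stand-in for the vector store; NOT the real WeaviateVectorStore. */
    static class BaseStore {
        static final String CONTENT_FIELD_NAME = "content";   // assumed value
        static final String METADATA_FIELD_NAME = "metadata"; // assumed value

        // protected rather than private, so a subclass can change the requested fields
        protected List<String> buildSimilaritySearchFields() {
            List<String> fields = new ArrayList<>();
            fields.add(CONTENT_FIELD_NAME);
            fields.add(METADATA_FIELD_NAME);
            return fields;
        }

        // The search path always goes through the hook, so overrides take effect here.
        final String describeQuery() {
            return String.join(",", buildSimilaritySearchFields());
        }
    }

    /** Binds to a pre-existing schema by overriding only the hook. */
    static class LegacySchemaStore extends BaseStore {
        @Override
        protected List<String> buildSimilaritySearchFields() {
            return List.of("text", "author", "year");
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseStore().describeQuery());
        System.out.println(new LegacySchemaStore().describeQuery());
    }
}
```

The subclass replaces a single method instead of copying the whole class, which is the main benefit of widening the visibility.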
Comment From: jPhy
I'm a bit confused about the handling of metadata. The filterMetadataFields and the Field named METADATA_FIELD_NAME look like data duplication to me. Please correct me if I misunderstand something:
- When writing to the Weaviate database, the metadata is written twice: once as a JSON String to the Field METADATA_FIELD_NAME, and again for each of the fields defined as filterMetadataFields.
- When reading Documents from the database, only the Field METADATA_FIELD_NAME is read and parsed; the filterMetadataFields are not part of the resulting Document.
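The write/read asymmetry described above can be sketched roughly as follows. This is a simplified model and not the store's actual code: the constant values are assumptions, the metadata blob is kept as a nested map instead of a JSON string to avoid a JSON dependency, and `toStoredProperties` and `readMetadata` are hypothetical helper names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataDuplicationSketch {

    static final String METADATA_FIELD_NAME = "metadata"; // assumed value
    static final String METADATA_FIELD_PREFIX = "meta_";  // assumed value

    // Write path: the whole metadata goes into one blob field, and every
    // filterable key is ALSO copied into its own prefixed field.
    static Map<String, Object> toStoredProperties(Map<String, Object> metadata,
                                                  List<String> filterMetadataFields) {
        Map<String, Object> properties = new LinkedHashMap<>();
        // In the description above this blob is a JSON string; a map copy stands in for it here.
        properties.put(METADATA_FIELD_NAME, new LinkedHashMap<>(metadata));
        for (String field : filterMetadataFields) {
            if (metadata.containsKey(field)) {
                properties.put(METADATA_FIELD_PREFIX + field, metadata.get(field));
            }
        }
        return properties;
    }

    // Read path: only the blob is consulted; the prefixed copies are ignored.
    static Map<String, Object> readMetadata(Map<String, Object> properties) {
        Map<String, Object> result = new LinkedHashMap<>();
        Object blob = properties.get(METADATA_FIELD_NAME);
        if (blob instanceof Map<?, ?> map) {
            map.forEach((key, value) -> result.put(String.valueOf(key), value));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("author", "jane");
        metadata.put("year", 2024);

        Map<String, Object> stored = toStoredProperties(metadata, List.of("author"));
        System.out.println(stored.keySet());
        System.out.println(readMetadata(stored));
    }
}
```

In this model the `author` value is stored twice, while a database that only has plain metadata fields and no blob would yield empty metadata on read.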
Our pre-existing database has some metadata fields that are neither prefixed nor written to the database a second time in a field named METADATA_FIELD_NAME. Therefore, we currently override two methods (since these methods are private, we actually copy the entire class and modify the copy):
- buildWeaviateSimilaritySearchFields, to add metadata fields without the METADATA_FIELD_PREFIX prefix.
    private Field[] buildWeaviateSimilaritySearchFields() {
        List<Field> searchWeaviateFieldList = new ArrayList<>();
        // ************************************************
        // * CHANGED: We don't need the other fields      *
        // ************************************************
        searchWeaviateFieldList.add(Field.builder().name(CONTENT_FIELD_NAME).build());
        searchWeaviateFieldList.add(Field.builder().name(METADATA_FIELD_NAME).build());
        searchWeaviateFieldList.addAll(this.filterMetadataFields.stream()
            // ***********************************************************
            // * CHANGED: no prefix (METADATA_FIELD_PREFIX is dropped)   *
            // ***********************************************************
            .map(mf -> Field.builder().name(mf.name()).build())
            .toList());
        return searchWeaviateFieldList.toArray(new Field[0]);
    }
- toDocument, such that it reads the fields registered in buildWeaviateSimilaritySearchFields rather than METADATA_FIELD_NAME. We don't have a field like METADATA_FIELD_NAME with the entire metadata as JSON in our database.
    private Document toDocument(Map<String, ?> item) {
        // The map values are untyped, so the content must be cast explicitly.
        String content = (String) item.get(CONTENT_FIELD_NAME);
        // ****************************************************************************************************
        // The main point here is that we consider all the weaviateSimilaritySearchFields
        // that don't have a special other meaning (like the content) as metadata.
        // Since we removed all other fields with special meaning in buildWeaviateSimilaritySearchFields,
        // the only remaining special field for us is CONTENT_FIELD_NAME.
        // ****************************************************************************************************
        Map<String, Object> metadata = new HashMap<>(item);
        metadata.remove(CONTENT_FIELD_NAME);
        return Document.builder()
            .text(content)
            .metadata(metadata)
            .build();
    }
We consider our solution very hacky and would prefer to use your implementation instead. I hope this clarifies the challenges we are currently facing. @dev-jonghoonpark, thank you for your support.
Comment From: dev-jonghoonpark
I have submitted a PR related to this issue. I would appreciate it if you could review it.
https://github.com/spring-projects/spring-ai/pull/3555
@sp-yang