Fast and compact vector format

Would it be possible to add a bit representation for vectors that uses Hamming distance for similarity scoring? It is 32x smaller than 32-bit float vectors, very fast to compare (XOR + popcnt), and doesn’t sacrifice too much quality. Other engines such as Elasticsearch and Vespa support it for this reason.

Given the Redis focus on speed and in-memory data I’d have thought this would be an ideal feature.
Users could pass bytes as hex or base64 strings. Queries could have a min similarity (aka max distance) to control match quality.
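
As a rough illustration of the scoring I have in mind (plain Python, not an existing Redis API), the sketch below compares two bit vectors passed as hex strings and applies a max-distance cutoff. The names `hamming_distance` and `within_max_distance` are made up for this example.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the inputs differ; counting the 1s gives the distance.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


def within_max_distance(a: bytes, b: bytes, max_distance: int) -> bool:
    """Hypothetical match filter: keep only vectors close enough to the query."""
    return hamming_distance(a, b) <= max_distance


query = bytes.fromhex("ff00aa55")
doc = bytes.fromhex("ff00aa54")
print(hamming_distance(query, doc))        # 1
print(within_max_distance(query, doc, 1))  # True
```

A native implementation would presumably XOR machine words and use a hardware popcount rather than big integers, but the result is the same.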

Comment From: minchopaskal

Do you mean bitfields, which are basically bit-vectors? You can BITOP XOR + BITCOUNT. The latter is implemented via popcnt. Or, given you mention Elasticsearch, are you referring to the RediSearch module?

Comment From: markharwood

Yes, this is in the context of search, which could also mean using HNSW indices to speed up queries and avoid brute-force scans of the data. Apologies if this is the wrong repo for this request (new to Redis).

Comment From: minchopaskal

No worries. For questions related to RediSearch you can refer to its repo. As for your question, take a look here, as it seems it was already discussed: https://github.com/RediSearch/RediSearch/issues/1133

Comment From: alrz

I don't think inverted indexes as discussed in https://github.com/RediSearch/RediSearch/issues/1133 cover this. The distance function returns N rows while an inverted index is only for exact match, not "close".
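
To illustrate the difference with a toy example (plain Python, not RediSearch code; `exact_lookup` and `hamming_search` are hypothetical names): an exact-match index only finds a value identical to the query, while a distance scan returns every vector within a threshold.

```python
# Toy 8-bit vectors keyed by document id.
vectors = {
    "doc1": 0b10110010,
    "doc2": 0b10110011,  # differs from doc1 in 1 bit
    "doc3": 0b01001101,  # bitwise complement of doc1
}

# Exact-match index: bit pattern -> list of document ids.
exact_index = {}
for doc_id, bits in vectors.items():
    exact_index.setdefault(bits, []).append(doc_id)


def exact_lookup(query):
    """Return ids whose vector is identical to the query."""
    return exact_index.get(query, [])


def hamming_search(query, max_distance):
    """Brute-force scan: return (distance, id) pairs within max_distance, closest first."""
    hits = []
    for doc_id, bits in vectors.items():
        distance = bin(query ^ bits).count("1")
        if distance <= max_distance:
            hits.append((distance, doc_id))
    return sorted(hits)


query = 0b10110000
print(exact_lookup(query))       # []
print(hamming_search(query, 2))  # [(1, 'doc1'), (2, 'doc2')]
```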

Can you consider reopening?