For mutations, the vector set API is unusually granular: one element per command. It feels like a number of the mutate APIs would benefit from allowing variadic usage (on a single vector set); not least, from an outsider's perspective (i.e. I have absolutely no relevant domain knowledge), it feels like this should allow potential optimizations in terms of adjacency maintenance.
Suggestions:
- VADD (probably the highest usage mutate API) - I believe that most of the args don't make sense other than for the first element, and even if they do, perhaps we can assume that a single VADD should take identical values (if not: eat the multiple commands); that leaves, potentially:
  VADD key [REDUCE dim] (FP32 | VALUES num) vector element [CAS] [NOQUANT | Q8 | BIN] [EF build-exploration-factor] [SETATTR attributes] [M numlinks] [(FP32 | VALUES num) vector element [SETATTR attributes] ...]
  i.e. the only variadic parts are the vector (where FP32 or VALUES implicitly says "new element incoming"), the element, and optionally the attributes; all the other values are assumed to match the first element. A minimal bulk load (without attributes) would then be (existing VADD command for blob1/member1) FP32 blob2 member2 FP32 blob3 member3 FP32 blob4 member4 ..., but SETATTR attrN could optionally be used after each, before the next FP32/VALUES.
- VREM - potentially VREM key element [element ...]
- VSETATTR - potentially VSETATTR key element "{ JSON obj }" [element "{ JSON obj }" ...] - returns the number of successfully added elements (hopefully N)
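To make the proposed grammar concrete, here is a minimal Python sketch that builds the argument vector a client would send for the suggested variadic VADD, VREM and VSETATTR. None of these variadic forms exist in Redis today; the helper names (fp32_blob, build_variadic_vadd, build_variadic_vrem, build_variadic_vsetattr) are hypothetical, and packing FP32 blobs as little-endian 32-bit floats is an assumption.

```python
import json
import struct
from typing import Any, Dict, List, Optional, Sequence, Tuple

# (vector, element name, optional JSON attribute string)
Item = Tuple[Sequence[float], str, Optional[str]]


def fp32_blob(vector: Sequence[float]) -> bytes:
    # Assumption: FP32 blobs are little-endian 32-bit IEEE floats.
    return struct.pack(f"<{len(vector)}f", *vector)


def build_variadic_vadd(key: str, items: Sequence[Item],
                        quant: Optional[str] = None,
                        ef: Optional[int] = None,
                        m: Optional[int] = None) -> List[Any]:
    """Argv for the PROPOSED (hypothetical) variadic VADD.

    Shared options are written once, after the first element; every later
    element is introduced by FP32 and may carry its own SETATTR.
    """
    if not items:
        raise ValueError("at least one element is required")
    dim = len(items[0][0])
    if any(len(vec) != dim for vec, _, _ in items):
        # All elements are assumed to share the first element's settings.
        raise ValueError("all vectors must have the same dimension")

    vec, elem, attr = items[0]
    args: List[Any] = ["VADD", key, "FP32", fp32_blob(vec), elem]
    if quant is not None:
        args.append(quant)  # one of NOQUANT | Q8 | BIN
    if ef is not None:
        args += ["EF", str(ef)]
    if attr is not None:
        args += ["SETATTR", attr]
    if m is not None:
        args += ["M", str(m)]

    for vec, elem, attr in items[1:]:
        # FP32 acts as the "new element incoming" marker.
        args += ["FP32", fp32_blob(vec), elem]
        if attr is not None:
            args += ["SETATTR", attr]
    return args


def build_variadic_vrem(key: str, elements: Sequence[str]) -> List[str]:
    """Argv for the proposed VREM key element [element ...]."""
    return ["VREM", key, *elements]


def build_variadic_vsetattr(key: str,
                            pairs: Sequence[Tuple[str, Dict[str, Any]]]) -> List[str]:
    """Argv for the proposed VSETATTR key element json [element json ...]."""
    args = ["VSETATTR", key]
    for elem, attrs in pairs:
        args += [elem, json.dumps(attrs)]
    return args
```

The design choice the proposal relies on is visible in the loop: because FP32 (or VALUES) can only start a new element, a parser never has to guess where one element ends and the next begins, and an optional SETATTR can only belong to the element just before it.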
I'm coming at this mostly from a "what do I expect the API to look like?" perspective, thinking of things like SADD / ZADD, SREM / ZREM, and HSET respectively, and also (in the case of ZADD) thinking of bulk load throughput.
Thoughts?
Comment From: kevin-montrose
I believe variadic options for the following would also fit nicely:
- VGETATTR - VGETATTR key element [element ...]
- VISMEMBER - VISMEMBER key element [element ...]
It's a pity, but I don't think there's a nice way to extend VEMB and VLINKS. Those are relatively niche though.
Comment From: mgravell
Indeed, because of the [RAW] option I couldn't think of an unambiguous way of implementing such. But if we did, I'd wager that we'd want to optimize for FP32, so maybe there could be a "VEMBFP32 key member [member ...]" that returns either a string (for one member) or an array of strings... or something.
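One reason FP32 is an attractive reply format for such a command: each returned string can be turned back into floats with a single fixed-width unpack. A minimal Python sketch, assuming the blob is little-endian 32-bit floats and that the hypothetical VEMBFP32 replies with one string for one member and an array of strings (with None standing for a missing member, also an assumption) otherwise; decode_fp32 and decode_vembfp32_reply are illustrative names only.

```python
import struct
from typing import List, Optional, Sequence, Union

Reply = Union[bytes, Sequence[Optional[bytes]]]


def decode_fp32(blob: bytes) -> List[float]:
    """Decode a blob of little-endian 32-bit floats (assumed layout)."""
    if len(blob) % 4:
        raise ValueError("FP32 blob length must be a multiple of 4")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def decode_vembfp32_reply(reply: Reply) -> Union[List[float], List[Optional[List[float]]]]:
    """Decode a reply from the hypothetical VEMBFP32 command.

    One member -> a single blob; several members -> an array of blobs,
    where None stands for a member that does not exist (assumption).
    """
    if isinstance(reply, (bytes, bytearray)):
        return decode_fp32(bytes(reply))
    return [None if blob is None else decode_fp32(blob) for blob in reply]
```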
Comment From: kevin-montrose
I suppose for both it could just be a separate option:
- VEMB key element [RAW] [COUNT count] [element] ...
  * COUNT count must be the last option; future options could be added before COUNT
  * Alternatively, something like MULTI, with the count left implicit
- VLINKS key element [WITHSCORES] [COUNT count] [element] ...
* Same deal
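The reason "COUNT must be the last option" resolves the [RAW] ambiguity can be shown with a small parser for the suggested VEMB form. This is a minimal Python sketch under one assumption the comment leaves open: that count is the number of extra elements after COUNT count (not the total). parse_multi_vemb and VEMB_OPTIONS are hypothetical names.

```python
from typing import List, Set, Tuple

# Options allowed between the first element and COUNT (hypothetical list;
# future options would be added here, always parsed before COUNT).
VEMB_OPTIONS = {"RAW"}


def parse_multi_vemb(argv: List[str]) -> Tuple[str, List[str], Set[str]]:
    """Parse the proposed VEMB key element [RAW] [COUNT count] [element] ...

    Assumption: `count` is the number of extra elements after COUNT count.
    Because COUNT is the last option, everything after it is an element,
    even a member literally named "RAW".
    """
    if len(argv) < 3 or argv[0].upper() != "VEMB":
        raise ValueError("usage: VEMB key element [RAW] [COUNT count] [element] ...")
    key, elements = argv[1], [argv[2]]
    options: Set[str] = set()
    i = 3
    while i < len(argv):
        token = argv[i].upper()
        if token == "COUNT":
            if i + 1 >= len(argv):
                raise ValueError("COUNT requires a value")
            count = int(argv[i + 1])  # raises ValueError if not an integer
            rest = argv[i + 2:]
            if count < 0 or len(rest) != count:
                raise ValueError(f"COUNT {count} does not match {len(rest)} elements")
            return key, elements + rest, options
        if token in VEMB_OPTIONS:
            options.add(token)
            i += 1
            continue
        raise ValueError(f"unexpected argument {argv[i]!r}")
    return key, elements, options
```

The same shape would apply to the VLINKS variant, with WITHSCORES in the option set.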
Comment From: antirez
Thank you for the suggestion. I'll return to this issue when I'm back from vacation in September, but to add some context: the idea was to simplify the API with single items only, since the latency and CPU cost of VADD and similar commands is so high compared to other Redis commands that it is almost always a better idea to make the accesses more granular, and there is not much to win by doing multiple operations at once. Also, the threaded nature of vector sets allows for a simpler implementation if only a single action is performed each time.
Comment From: kevin-montrose
I'll push back on the ergonomics a bit. It seems reasonable to want to rebuild a Vector Set (either during development, after a new deployment, or in a recovery scenario, for example), and batching in those cases feels nicer from a user perspective even if it isn't much more efficient. I'd also expect it just for symmetry with the other built-in datatypes, all(?) of which seem to support variadic adds/inserts/updates.
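Until something variadic exists, a rebuild like this can be approximated client-side by pipelining the existing single-element VADD. A minimal sketch, assuming a redis-py style client (pipeline(transaction=False), execute_command, execute), little-endian FP32 blobs, and that VADD replies 1 for a newly added element; bulk_vadd and batch_size are hypothetical names. It saves round trips, not server-side work: each element still costs one full VADD.

```python
import struct
from typing import Any, Iterable, Sequence, Tuple


def fp32_blob(vector: Sequence[float]) -> bytes:
    # Assumption: FP32 blobs are little-endian 32-bit IEEE floats.
    return struct.pack(f"<{len(vector)}f", *vector)


def bulk_vadd(client: Any, key: str,
              items: Iterable[Tuple[Sequence[float], str]],
              batch_size: int = 500) -> int:
    """Load elements with the existing single-element VADD, pipelined in batches.

    `client` is assumed to be redis-py style: pipeline(transaction=False)
    returns an object with execute_command() and execute().
    Returns how many replies were truthy (assumed: 1 = newly added).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    pipe = client.pipeline(transaction=False)
    added = pending = 0
    for vector, element in items:
        pipe.execute_command("VADD", key, "FP32", fp32_blob(vector), element)
        pending += 1
        if pending == batch_size:
            # One round trip per batch; the server still runs one VADD per element.
            added += sum(1 for reply in pipe.execute() if reply)
            pending = 0
    if pending:
        added += sum(1 for reply in pipe.execute() if reply)
    return added
```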
There are also some potential efficiency gains with batching, such as those discussed in this paper (see the "New Technique in ANNS: Batch Insertion and Pruning" section), which basically amounts to "perform work on batches in parallel before adding them to the existing graph". Whether or not those are practical to pursue, I don't know.