I've been trying to figure out an effective way of trimming streams so that they don't grow unbounded, while taking care to only delete messages that have been acknowledged by all consumer groups. I wound up writing a Lua script to handle it (see https://gist.github.com/chanks/c2e7e0efbd3d038775208047abb68524), but I worry about its efficiency, and think that a built-in option to XTRIM could do this more effectively.
Here's a bit of discussion on the redis-db google group: https://groups.google.com/forum/#!topic/redis-db/99HusgMM7QU
In the meantime, if anyone else needs the above Lua script or has any suggestions on how it could be improved, let me know!
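For anyone who wants the core rule without reading the gist, here is a minimal sketch of how a "safe" trim threshold could be computed. It is not the linked Lua script. It assumes you have already collected, for each consumer group, its last-delivered ID (reported by `XINFO GROUPS`) and its smallest pending ID (reported by the summary form of `XPENDING`). The dictionary keys and function names are made up for illustration.

```python
def parse_id(stream_id):
    """Split a stream ID such as '1526985054069-3' into an (ms, seq) integer tuple."""
    ms, seq = stream_id.split("-")
    return int(ms), int(seq)


def format_id(pair):
    """Turn an (ms, seq) tuple back into the 'ms-seq' string form."""
    return f"{pair[0]}-{pair[1]}"


def safe_minid(groups):
    """Return the lowest ID that must be kept, or None if there are no groups.

    `groups` is a list of dicts with two hypothetical keys:
      'last_delivered_id' - the group's last-delivered ID
      'lowest_pending_id' - smallest ID in the group's PEL, or None if the PEL is empty
    Every entry with an ID below the returned value has been delivered to and
    acknowledged by every group, so it could be passed to XTRIM ... MINID.
    """
    bounds = []
    for group in groups:
        if group["lowest_pending_id"] is not None:
            # The oldest unacknowledged entry must survive.
            bounds.append(parse_id(group["lowest_pending_id"]))
        else:
            # Nothing pending: everything up to last-delivered is done,
            # so the first ID to keep is the one just after it.
            # (Sequence-number overflow is ignored in this sketch.)
            ms, seq = parse_id(group["last_delivered_id"])
            bounds.append((ms, seq + 1))
    return format_id(min(bounds)) if bounds else None
```

With no consumer groups the function returns `None` and leaves the decision to the caller, since there is nothing to protect.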
Comment From: tsutsu
As an alternative to this approach (defining semantics for an `XTRIM` method that preserves PEL-referenced keys, or simulating such semantics with a Lua script), what if keys that were in a PEL just... stuck around, even after they were `XTRIM`med or `XDEL`ed away, until the PEL let go of them?
Or, in implementation terms: what if `streamReplyWithRange`, when passed a group:
- "extracted" the value fields from the rax nodes it walked, out into a standalone rObj;
- rewrote the rax node to have, as a value, a reference to the rObj;
- shared the rObj with the PEL (such that the PEL is also holding a reference to it), giving the field-set rObj a refcount > 1.
Then, a given stream field-set wouldn't go away until it was both `XTRIM`med/`XDEL`ed (= no more stream rax ref) and `XACK`ed or `XGROUP DESTROY`ed by any cGroups it had been read into (= no more PEL ref(s)).
(There are probably other implementations that are less costly, maybe just copying rather than rewriting; I'm just using this one to get across the semantics that this design would imply.)
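To make the intended semantics concrete, here is a toy model of the refcount idea. It is only a sketch and has nothing to do with the actual Redis internals: the `FieldSet` and `Stream` classes are hypothetical, and the method names merely mirror the commands they stand in for.

```python
class FieldSet:
    """An entry's field/value pairs, shared by reference between owners."""

    def __init__(self, fields):
        self.fields = fields
        self.refcount = 0
        self.freed = False  # stands in for actual deallocation

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class Stream:
    def __init__(self):
        self.entries = {}  # entry ID -> FieldSet (the stream's own reference)
        self.pels = {}     # group name -> {entry ID -> FieldSet}

    def xadd(self, entry_id, fields):
        field_set = FieldSet(fields)
        field_set.incref()  # reference held by the stream
        self.entries[entry_id] = field_set
        return field_set

    def xreadgroup(self, group, entry_id):
        field_set = self.entries[entry_id]
        pel = self.pels.setdefault(group, {})
        if entry_id not in pel:
            field_set.incref()  # reference held by this group's PEL
            pel[entry_id] = field_set
        return field_set.fields

    def xdel(self, entry_id):
        # Drops only the stream's reference; PELs keep theirs.
        field_set = self.entries.pop(entry_id, None)
        if field_set is not None:
            field_set.decref()

    def xack(self, group, entry_id):
        # Drops only this group's PEL reference.
        field_set = self.pels.get(group, {}).pop(entry_id, None)
        if field_set is not None:
            field_set.decref()

    def pending_fields(self, group, entry_id):
        """Data a consumer could still fetch for a pending entry."""
        return self.pels[group][entry_id].fields
```

In this model, deleting an entry that a group has read but not acknowledged leaves the data reachable through `pending_fields`, and it is only marked freed once the last PEL reference is acknowledged.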
This is basically the semantics I expected from `XTRIM`: I'm deleting from the stream, but not deleting from the consumer groups that are fed by the stream. Those cGroups are supposed to have the semantics (if not the implementation) of standalone FIFO queues, with values copied in from the parent stream. No matter what deletion algorithm `XTRIM` uses, or even if I use `XDEL` explicitly, I would expect data that was "in" at least one cGroup to stick around in that cGroup.
Comment From: tsutsu
Or, thinking about this more, there's also another alternative semantics, implemented by Google Cloud Pub/Sub's "topic snapshots."
In this design, each cGroup holds a cursor into the stream, from which new reads into the cGroup (and through it to a consumer) will occur. (So far, this is the same as a cGroup's `last_id`.) However, unlike `last_id`, this cursor semantically "co-owns" everything above it in the stream it references. As such, anything in a stream that at least one attached cGroup hasn't gotten around to reading yet is implicitly "shared" between the stream and said cGroups. Deleting values from the stream when a cGroup "hasn't read them yet" doesn't really delete them, and the cGroup will still be able to make progress and read them eventually. Only when all cGroups have progressed their read cursors past a given point will the stream be able to deallocate the data for that point.
Before then, reads directly on the stream (rather than through a cGroup) will see the elements as missing from the stream, but they aren't actually gone: they're just visibility-tracked for direct readers (using e.g. a combination of a calculated minimum of cGroup cursors, and a visibility bitmap for random `XDEL`s). New cGroups created off the stream would also use this same "direct read" semantics to translate their configured initial position into an effective initial cursor position (so a new cGroup starting "from 0" would just start with the first non-trimmed event).
This design can be extended by making this referential constraint optional; some cGroups could have cursors which are "weak references" (equivalent to today's behavior), while others could be "strong references." The min of the set of strong-reference cGroup cursors, then, would be the "checkpoint" below which `XTRIM`med/`XDEL`ed nodes could be fully purged.
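A rough sketch of the cursor/checkpoint idea, to show how the pieces could fit together. It is a toy model, not a proposal for the real data structures: IDs are plain integers, "deleting" only records a tombstone, the `SnapshotStream` class and its methods are hypothetical, and creating a group after entries have already been trimmed is not modelled.

```python
class SnapshotStream:
    def __init__(self):
        self.entries = {}        # ID -> payload that is still physically stored
        self.tombstones = set()  # IDs hidden from direct readers but not yet freed
        self.groups = {}         # name -> {"cursor": last ID read, "strong": bool}

    def add(self, entry_id, payload):
        self.entries[entry_id] = payload

    def create_group(self, name, strong=True):
        # Simplification: every group starts before the first entry.
        self.groups[name] = {"cursor": 0, "strong": strong}

    def delete(self, entry_id):
        # Stand-in for XDEL/XTRIM: hide the entry now, free it when allowed.
        if entry_id in self.entries:
            self.tombstones.add(entry_id)
        self.purge()

    def read_direct(self):
        # Direct readers never see tombstoned entries.
        return [i for i in sorted(self.entries) if i not in self.tombstones]

    def read_group(self, name, count=1):
        group = self.groups[name]
        ids = [i for i in sorted(self.entries) if i > group["cursor"]]
        if not group["strong"]:
            # Weak groups behave like direct readers.
            ids = [i for i in ids if i not in self.tombstones]
        batch = ids[:count]
        if batch:
            group["cursor"] = batch[-1]
        self.purge()
        return batch

    def checkpoint(self):
        # Minimum cursor over strong groups; None if there are none.
        strong = [g["cursor"] for g in self.groups.values() if g["strong"]]
        return min(strong) if strong else None

    def purge(self):
        # Free tombstoned entries that every strong group has read past.
        cp = self.checkpoint()
        for entry_id in list(self.tombstones):
            if cp is None or entry_id <= cp:
                del self.entries[entry_id]
                self.tombstones.discard(entry_id)
```

Here a strong group can still read entries that were deleted from the stream before it reached them, while direct readers and weak groups skip them. The storage is released as soon as the slowest strong cursor moves past the deleted entry.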
Comment From: parikls
+1 for this
Comment From: pibesk
+1
Comment From: JamesRamm
Believe this is also the same request: https://github.com/redis/redis/issues/6403
Comment From: Yohe-Am
Such a feature would be great. Thanks for the script @chanks
Comment From: mgagliardo91
+1
Comment From: Renrhaf
+1
Comment From: forgotPassword
Redis 6.2 (Feb 2021) added support for the MINID argument to XTRIM. You can update the above script, or remember that the IDs are timestamps. E.g. you can probably trim messages older than 1 month, regardless of whether they were acked or not: `xtrim <key> minid ~ <timestamp>`.
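As a small illustration of the timestamp approach, the threshold can be derived from the current time. This sketch assumes the stream uses auto-generated IDs (whose first part is a Unix time in milliseconds). The helper names are made up, and the code only builds the command arguments rather than talking to a server.

```python
import time


def minid_for_max_age(max_age_seconds, now=None):
    """Return a stream ID marking the oldest entry to keep.

    `now` is a Unix time in seconds and defaults to the current time;
    it is a parameter mainly so the function is easy to test.
    """
    now = time.time() if now is None else now
    cutoff_ms = int((now - max_age_seconds) * 1000)
    # Auto-generated IDs look like '<milliseconds>-<sequence>'.
    return f"{max(cutoff_ms, 0)}-0"


def xtrim_minid_args(key, max_age_seconds, now=None):
    """Build the argument list for an approximate MINID trim."""
    return ["XTRIM", key, "MINID", "~", minid_for_max_age(max_age_seconds, now)]


# Example: trim entries older than 30 days from a stream named 'mystream'.
THIRTY_DAYS = 30 * 24 * 3600
args = xtrim_minid_args("mystream", THIRTY_DAYS)
```

The resulting list can be handed to whatever client you use to send raw commands. Note that this trims by age only and does not check whether the entries were acknowledged.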
Comment From: ricoisme
+1
Comment From: jerviscui
+1
Comment From: sundb
Implemented by https://github.com/redis/redis/pull/14130