After the changes related to this issue, we are experiencing race conditions when JMS listeners are scaling down, and some messages are dropped.

I've seen that the default for idleReceivesPerTaskLimit was changed: previously it was always < 0, whereas now it will, by default, eventually cause the consumers to be stopped, which leads to this condition.

Should this condition, which is now always false by default but previously was always true, be changed to something like this:

if (messageLimit < 0 && (!surplus || idleTaskExecutionCount < idleLimit))

cc @jhoeller

Comment From: jhoeller

@fsgonz could you elaborate a bit on your scenario: how are messages getting dropped when scaling down? We have two code paths there, and this change in 6.2 was simply meant to use the idle-receives code path by default for surplus consumers between the core and max concurrency. The latter code path should not result in a message getting dropped; it'd be great to find out more about your case there.

You could try to explicitly set idleReceivesPerTaskLimit to -1 to restore the previous behavior. We should document that option, actually, since the javadoc does not explicitly mention it.
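To make that concrete, here is a minimal sketch of that workaround on a plain DefaultMessageListenerContainer; the destination name, concurrency range, and wiring are illustrative assumptions, not taken from this thread:

import jakarta.jms.ConnectionFactory;
import jakarta.jms.MessageListener;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class ListenerContainerSetup {

    public static DefaultMessageListenerContainer createContainer(ConnectionFactory connectionFactory) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("orders");      // illustrative destination name
        container.setConcurrency("2-10");            // core-max consumers, so scale-down applies
        // Restore the pre-6.2 default: never stop a consumer just because a number of
        // receive attempts in a row came back empty.
        container.setIdleReceivesPerTaskLimit(-1);
        container.setMessageListener((MessageListener) message -> {
            // handle the message
        });
        return container;
    }
}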

Comment From: fsgonz

@jhoeller yes, actually we've set idleReceivesPerTaskLimit as a workaround. The drop happens when one of the consumers has already received a message and is stopped right at that point (because the container considers it idle). I'm not quite sure the change itself is responsible for the drop; the possibility was probably always there, but since the default used to be negative it never showed up until the change started to effectively scale down in those cases. I will try to create a test and capture snapshots with a debugger to make the race condition explicit.

Comment From: jhoeller

@fsgonz picking this up again: Could you clarify what kind of "dropped" you are actually experiencing? Are those messages already loaded into the given Session but then not consumed from there? Or are those actually lost in the sense of not getting redelivered in another Session either? What's the acknowledge mode you are using, or are you running in transacted mode?

Comment From: fsgonz

Hi! The messages that are lost are the ones already loaded, and they are not redelivered for consumption in another session. Sorry, I haven't managed to replicate this in isolation yet. I can reproduce it when Artemis manages the JMS consumers but not with ActiveMQ, so maybe it is a problem with Artemis stopping the consumers. I'll come back once I manage to replicate it, in case it turns out to be Spring Boot. For the moment the concern is only the change in the default behavior, but if it is not a concern for you, please feel free to close this and I'll come back after debugging a bit more how the Artemis consumers behave when they are stopped with messages already fetched.

Comment From: jhoeller

@fsgonz thanks for the details, that helps a lot.

Artemis seems to be a bit special there: it uses a default prefetch buffer (defaultConsumerWindowSize) of 1MB in every MessageConsumer, which it can preload at any time in the background. With idleReceivesPerTaskLimit set, on the other hand, Spring assumes that the MessageConsumer and its Session can be considered empty after a certain number of idle receive attempts, so it simply closes them when it means to reduce the number of active consumers. If I understand this correctly, there is an inherent race condition here when the consumer is being preloaded right in between Spring's last receive attempt and the actual close call. This may happen on scaling down but also on eventual shutdown of the application.

We have no way to find out whether the buffer is empty on close, so I'm not sure how we could defensively address this on Spring's side. The only way to reliably avoid this is to set the Artemis defaultConsumerWindowSize to 0. Setting Spring's idleReceivesPerTaskLimit to -1 helps for the scaling-down scenario, but there is still room for the same race condition on shutdown.
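For completeness, a sketch of that Artemis-side remedy on the client connection factory; the broker URL and the standalone class are illustrative assumptions, not from this thread:

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ArtemisClientConfig {

    public static ActiveMQConnectionFactory connectionFactory() {
        // consumerWindowSize=0 disables client-side prefetch, so a consumer that gets
        // closed on scale-down or shutdown cannot be holding preloaded messages.
        ActiveMQConnectionFactory cf =
                new ActiveMQConnectionFactory("tcp://localhost:61616?consumerWindowSize=0");
        // Equivalent programmatic setting on the factory itself:
        cf.setConsumerWindowSize(0);
        return cf;
    }
}

With prefetch disabled, messages are only dispatched to a consumer when it actually calls receive, at the cost of an extra round trip per message.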