How did this happen; that is, is this expected behavior? My application was not terminated, but it started behaving strangely. It seems it could not recover from this java.lang.OutOfMemoryError, yet Spring did not terminate the app; it continued to run.

2020-10-16 16:57:46 (ERROR): ForgivingExceptionHandler Consumer SimpleConsumer [queue=my.partitioned.32.8.user, consumerTag=phoenix.lcoo.clientodds.7b892e20-480a-420a-
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
        at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:120)
        at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
        at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
        at org.springframework.util.StreamUtils.copy(StreamUtils.java:143)
        at org.springframework.util.FileCopyUtils.copy(FileCopyUtils.java:110)
        at org.springframework.amqp.support.postprocessor.AbstractDecompressingPostProcessor.postProcessMessage(AbstractDecompressingPostProcessor.java:91)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:1423)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:1400)
        at org.springframework.amqp.rabbit.listener.DirectMessageListenerContainer$SimpleConsumer.callExecuteListener(DirectMessageListenerContainer.java:1036)
        at org.springframework.amqp.rabbit.listener.DirectMessageListenerContainer$SimpleConsumer.handleDelivery(DirectMessageListenerContainer.java:996)
        at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
        at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:104)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2020-10-16 16:57:46 (ERROR): CachingConnectionFactory Channel shutdown: connection error

Comment From: garyrussell

It's not clear what you expect the framework to do in this case.

Comment From: bojanv55

I guess it cannot recover from this exception? I would just expect my app to exit. Is this something missing from my (mis)configuration of the Spring framework, or is this expected?

Comment From: garyrussell

OOM errors are hard to deal with; generally the JVM needs to be restarted. With the SimpleMessageListenerContainer, we generally create and send a listener failed event if some unexpected exception occurs. With the Direct container, the listener runs on the client thread; I suppose we could add a try/catch there and publish the event (then your event listener code could decide what to do). The problem with OOM errors, though, is that the error itself might prevent us from creating the event in the first place.
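
For illustration, a rough sketch (hypothetical, not from the original comment) of the "your event listener code could decide" idea; it assumes the container publishes a ListenerContainerConsumerFailedEvent carrying the original Throwable, and the ConsumerFailureWatcher name is made up:

    import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
    import org.springframework.context.event.EventListener;
    import org.springframework.stereotype.Component;

    @Component
    class ConsumerFailureWatcher {

        // Called whenever a listener container publishes a consumer-failed event.
        @EventListener
        public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
            if (event.getThrowable() instanceof OutOfMemoryError) {
                // The heap is exhausted; the only safe recovery is a JVM restart,
                // so exit and let the service manager / orchestrator restart us.
                System.exit(1);
            }
        }
    }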

Comment From: bojanv55

This is from javadoc:

public class Error
extends Throwable
An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch. Most such errors are abnormal conditions. 

The problem I had is that I did not know my application was in a partially working state: it did not stop, but it did not work as it should, either.

Comment From: bojanv55

Thanks. So by default (I do not need to configure anything?) - Direct and Simple listeners will exit the app on OOM?

Comment From: bojanv55

btw. this fixes my problem, so thanks for that, but shouldn't you implement a handler for all Error cases? Basically, use java.lang.Error in the catching logic. All of these seem like pretty serious conditions: AnnotationFormatError, AssertionError, AWTError, CoderMalfunctionError, FactoryConfigurationError, FactoryConfigurationError, IOError, LinkageError, ServiceConfigurationError, ThreadDeath, TransformerFactoryConfigurationError, VirtualMachineError.

Comment From: garyrussell

So by default (I do not need to configure anything?)

Correct, in the upcoming 2.3 release; with the 2.2.12 backport, the default behavior is to do nothing (for backwards compatibility); you have to set your own OOMHandler.
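
For illustration only (hypothetical, not from the original comment), opting in on the backport could look roughly like this, reusing the createListenerContainer() call and the javaLangErrorHandler hook that appear later in this thread (the exact hook name in 2.2.12 may differ):

    SimpleMessageListenerContainer container = containerFactory.createListenerContainer();
    // Hypothetical opt-in handler: restart-by-exit only for OutOfMemoryError,
    // leaving other Errors to the normal logging path.
    container.setjavaLangErrorHandler(new JavaLangErrorHandler() {

        @Override
        public void handle(Error error) {
            if (error instanceof OutOfMemoryError) {
                System.exit(99); // illustrative exit code
            }
        }
    });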

I suppose we could open it up to all errors, but, TBH, I don't think I have ever seen any of those errors in over 20 years working with Java.

@artembilan WDYT?

Comment From: artembilan

I don't think we need to care about those errors at the moment. Let it be for a while! We can always come back and extend it when more requests come in.

Comment From: bojanv55

I just pasted those from the javadocs, but I guess there are other common errors, like StackOverflowError, and instead of hardcoding all possible subclasses, I guess the easiest would be to just match all java.lang.Error errors.

Comment From: garyrussell

Stack overflow is not fatal; you can recover from it (unlike OOM). It would be wrong to kill the JVM.
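
For illustration (a hypothetical demo, not from the original comment), the difference shows in a few lines: the StackOverflowError unwinds the thread's stack as it propagates, so the JVM is healthy again once it is caught, while an exhausted heap stays exhausted:

    public class StackOverflowRecoveryDemo {

        // Unbounded recursion that will eventually blow the thread stack.
        static long recurse(long depth) {
            return recurse(depth + 1);
        }

        public static void main(String[] args) {
            try {
                recurse(0);
            }
            catch (StackOverflowError e) {
                // The stack has already unwound by the time we get here,
                // so the JVM can simply carry on.
                System.out.println("Recovered from StackOverflowError; continuing normally");
            }
        }
    }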

Comment From: bojanv55

I guess this SO post has a good explanation about that: https://stackoverflow.com/a/52115695/4882336

Comment From: garyrussell

OK; convinced...

Comment From: bojanv55

Thanks. Seems like a more robust solution.

Comment From: imperiouslol

@garyrussell @artembilan I apologize for commenting directly in a closed issue, but I experienced an unfortunate situation related to this change and I'm looking for some guidance if possible. Please let me know if you want me to open up another issue and I'd be happy to.

We are currently deploying all of our applications to a traditional application server which is built on top of Java. As a result of this, invoking System.exit ends up stopping the entire server, shutting down all of our applications rather than just the one which encountered the Error.

This occurred when a LinkageError was thrown due to a jar clash in my listener. I agree that it's unrecoverable; however, the current implementation brings down our entire application server.

I'm attempting to override the implementation to revert to a no-op in my SimpleMessageListenerContainer, but I haven't been successful so far:

    ...
    SimpleMessageListenerContainer listenerContainer = containerFactory.createListenerContainer();
    listenerContainer.setjavaLangErrorHandler(new NoopJavaLangErrorHandler());
    ...

    class NoopJavaLangErrorHandler implements JavaLangErrorHandler {

        @Override
        public void handle(Error error) {
            // intentionally do nothing: swallow the Error rather than exiting the JVM
        }
    }

I'm wondering if you have any guidance on reverting this to a no-op in my implementation. Thank you.

Comment From: artembilan

@imperiouslol ,

What you are doing is correct. That handler is used like this:

            catch (Error e) { //NOSONAR
                logger.error("Consumer thread error, thread abort.", e);
                publishConsumerFailedEvent("Consumer threw an Error", true, e);
                getJavaLangErrorHandler().handle(e);
                aborted = true;
            }

Perhaps the container from your containerFactory.createListenerContainer() is not actually used in the application, and the listener container that is running comes from somewhere else, probably the regular @EnableRabbit or Spring Boot setup.

Please, raise a new Discussion with more info about your application.

Comment From: imperiouslol

@artembilan Thank you for your response.

Yes, this appears to be the case; my @RabbitListener methods are not using this message listener container. I must have something misconfigured.

I will continue to troubleshoot and open a new discussion with configuration details if needed. Thank you again.

Comment From: artembilan

See the ContainerCustomizer<SimpleMessageListenerContainer> bean for configuring that built-in container: https://docs.spring.io/spring-amqp/reference/amqp/receiving-messages/async-annotation-driven/enable.html
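
A minimal sketch of that approach (hypothetical bean and class names; it reuses the NoopJavaLangErrorHandler from the earlier comment and assumes the auto-configured factory applies ContainerCustomizer beans to the containers it creates):

    import org.springframework.amqp.config.ContainerCustomizer; // package as documented; verify against your Spring AMQP version
    import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    class RabbitContainerConfig {

        // Applied to every SimpleMessageListenerContainer the factory creates,
        // so the @RabbitListener containers get the no-op handler as well.
        @Bean
        ContainerCustomizer<SimpleMessageListenerContainer> noOpErrorHandlerCustomizer() {
            return container -> container.setjavaLangErrorHandler(new NoopJavaLangErrorHandler());
        }
    }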

Comment From: imperiouslol

@artembilan Thank you for your suggestion - using ContainerCustomizer worked and I'm now able to provide a no-op error handler to prevent this.

I do think it's worth reiterating that the current default implementation has the potential to bring down an entire application server, rather than just the single application that encountered the Error; and in this case, a LinkageError is not detrimental to our entire app server (unlike an OOM error).

I know there was a lot of debate above about whether or not to do this for any Error, as opposed to just an OOM error, and this could be one argument against killing the JVM for any Error.

Thank you again!