Context
It seems like there were a lot of changes to the concurrency of singleton creations, here's my understanding of it:
- We have a deadlock issue https://github.com/spring-projects/spring-framework/issues/23501. To fix it we removed the mutex in https://github.com/spring-projects/spring-framework/commit/902e5707a88522a158985138e4ec8b1061184d1c: https://github.com/spring-projects/spring-framework/blob/99a366baf6640b275d08dde60f05da719139bb6a/spring-beans/src/main/java/org/springframework/beans/factory/support/FactoryBeanRegistrySupport.java#L119-L121
- We have no locks at all and that caused https://github.com/spring-projects/spring-framework/issues/33972. So, we added a lock in https://github.com/spring-projects/spring-framework/commit/384dc2a9b812b19f5801e6b10480064efd0be1b5: https://github.com/spring-projects/spring-framework/blob/384dc2a9b812b19f5801e6b10480064efd0be1b5/spring-beans/src/main/java/org/springframework/beans/factory/support/FactoryBeanRegistrySupport.java#L119-L122
- We then add locking depending on threads in: https://github.com/spring-projects/spring-framework/commit/fa168ca78ae134e82db8eacc109bb29266b36fb1 : https://github.com/spring-projects/spring-framework/blob/fa168ca78ae134e82db8eacc109bb29266b36fb1/spring-beans/src/main/java/org/springframework/beans/factory/support/FactoryBeanRegistrySupport.java#L119-L133
Problem
Now we get an issue when multiple threads start calling a @Lazy bean during startup, which then calls its FactoryBean in parallel, and if the factory is not thread safe there are issues.
In the case of #35242 it was actually the DiscoveryClient and InstanceInfoReplicator calling HealthIndicator instances in a background thread, which was in turn calling the feign client.
In our case we had something similar where a bean started a cron job with no delay.
Solution?
In my opinion strict locking should be the enabled by default, and if I understand correctly we would (broadly) arrive at state №1. But for those experiencing deadlocks they can disable strict locking, which was impossible before. From the original issue's description this is quite rare, and I would expect it to definitely be rarer than @Lazy FeignClients and other non thread safe factories:
It's tricky and quite complex. In some situations, listening to the GC notification won't cause any bean creation. It will cause bean creation if you're using Prometheus, have Exemplars enabled, and the lazily created SpanContextSupplier implementation hasn't already been created
What do you think about changing the default to strict? If you want to change it for optimization purposes, maybe we can have a grace period where spring requires thread-safe FactoryBean-s and then use non strict as default?
Here's a "reproduction"
// factory class
public class UnsafeFactoryBean implements FactoryBean<UnsafeFactoryBean.UnsafeBean> {
private final Phaser phaser = new Phaser(2);
@Override
public UnsafeBean getObject() {
System.out.println("Arrived");
// when two threads hit this, we proceed. With strict=true this never happens, so we hang
phaser.arriveAndAwaitAdvance();
return new UnsafeBean();
}
@Override
public Class<?> getObjectType() {
return UnsafeBean.class;
}
public static class UnsafeBean {
public void doSomething() {
System.out.println("doSomething");
}
}
}
// caller class
@Slf4j
public class UnsafeBeanCaller {
private final UnsafeFactoryBean.UnsafeBean unsafeBean;
private final ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();
public UnsafeBeanCaller(UnsafeFactoryBean.UnsafeBean unsafeBean) {
this.unsafeBean = unsafeBean;
check();
}
public void check() {
executorService.scheduleWithFixedDelay(() -> {
log.info("Checking unsafe bean...");
unsafeBean.doSomething();
}, 0, 5000, TimeUnit.MILLISECONDS);
}
}
// config
public class MyConfiguration {
@Bean
public UnsafeFactoryBean unsafeFactoryBean() {
return new UnsafeFactoryBean();
}
@Bean
public UnsafeBeanCaller lazyUnsafeBean(@Lazy UnsafeFactoryBean.UnsafeBean unsafeBean) {
return new UnsafeBeanCaller(unsafeBean);
}
@Bean
public UnsafeBeanCaller unsafeBean(@Lazy UnsafeFactoryBean.UnsafeBean unsafeBean) {
return new UnsafeBeanCaller(unsafeBean);
}
}
Comment From: diukarev
Have the same issue
Comment From: jhoeller
If it is just for a defensive measure against non-thread-safe FactoryBean implementations, we could simply fall back to local synchronization on the given FactoryBean instance when not executing within the general singleton lock. This should cover your scenario where multiple background threads try to obtain the object from the same FactoryBean, even in lenient locking mode.
Thanks for bringing this up again, in particular since we unfortunately missed to reopen #35242. A fallback toward local synchronization will be available in the upcoming 6.2.12 snapshot. Feel free to give it an early try!
Comment From: github-actions[bot]
Fixed via 74dc61b8c4010ad5a96b8a0bf8cbeb7bbc4e39a7
Comment From: dpozinen
Thanks! Will definitely try it once the snapshot is available
Comment From: jhoeller
@dpozinen this should be in the latest 6.2.12 snapshot already (see https://repo.spring.io/snapshot) - let me know whether you can still reproduce the problem there. Assuming that I understood your scenario right, there shouldn't be any concurrent FactoryBean.getObject() invocations between multiple background threads anymore.
Comment From: dpozinen
@jhoeller Well, it seems that thread-1 enters the first if branch, and thread-2 enters the second one, and unfortunately the synchronized block does not save us from this.
Take a look at the reproduction in the issue description, its just a couple of classes
Comment From: jhoeller
@dpozinen good point, I somehow missed the repro completely. There it is actually a race between the main thread and one background thread, not between multiple background threads. And our coordination of specific beans being in creation between the main thread and background threads apparently does not cover the getObject() call. Which means we'd have to extend this defensive synchronization around the FactoryBean instance to the main thread (within the full singleton lock) as well.
Comment From: github-actions[bot]
Fixed via ecd3dd88836f63a437d4065043062a69d0376299
Comment From: jhoeller
@dpozinen a fresh 6.2.12 snapshot with another iteration of this fix is available for testing. This is now literally equivalent to marking the getObject() implementation of a FactoryBean as synchronized. Also, all of this is closely related to our container-internal caching of FactoryBean-produced objects, bypassing further getObject invocations and therefore bypassing further synchronization.
In 7.0 where SmartFactoryBean can produce different types of objects, we are not internally caching getObject-returned instances in the container in case of a SmartFactoryBean anymore. As a consequence, we are not synchronizing around those invocations either; SmartFactoryBean implementations need to be thread-safe in any case.
As a side note for 7.0: Even a simple FactoryBean variant (which just exposes a single type of object) can implement SmartFactoryBean just to signal a thread-safe implementation now. This is straightforward enough since all of the additional methods in SmartFactoryBean come with a default implementation anyway.