Spring Batch 6 introduces support for a "resourceless" infrastructure. In this mode, no database is required, which helps users who do not need one and are currently forced to add an in-memory database just for that purpose.
In that regard, Spring Batch 6 is similar to Spring Session: it provides an @EnableJdbcJobRepository annotation that users can add to opt in to database storage.
The biggest problem I can see right now is that Spring Boot extends the base configuration class to customize features, whereas for Spring Session we "just" import the configuration class. With JdbcDefaultBatchConfiguration extending DefaultBatchConfiguration, such extension gets complicated, as we'd have to replicate the overrides in two classes.
cc @fmbenhassine
Comment From: fmbenhassine
Thank you for considering this! I believe Spring Boot should auto-configure a resourceless infrastructure by default, and switch to JDBC or MongoDB based on classpath scanning (spring-jdbc / spring-boot-starter-data-mongodb) or a property like spring.batch.job.repository=[resourceless,jdbc,mongodb]. This means that a new project generated from start.spring.io with only Batch as a dependency would work out of the box, whereas it currently does not because it requires an embedded database.
The biggest problem that I can see now is that Spring Boot extends from the base configuration to customize features and for Spring Session we "just" import the configuration class. With JdbcDefaultBatchConfiguration extending from DefaultBatchConfiguration, it makes such extension complicated as we'd have to replicate the overrides in two classes.
What about making SpringBootBatchConfiguration extend DefaultBatchConfiguration for the default resourceless infrastructure, and adding two new conditional classes, SpringBootJdbcBatchConfiguration and SpringBootMongoBatchConfiguration?
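A toy, stdlib-only sketch of the hierarchy proposed above may make the duplication concern concrete. The Toy* base classes are hypothetical stand-ins for DefaultBatchConfiguration and JdbcDefaultBatchConfiguration, not the real Spring Batch types, and the property key is illustrative. It assumes Boot customizes one hook (modelled on getIsolationLevelForCreate) from its properties: because a Java class cannot extend two parents, each Boot subclass has to repeat the override, even when the logic itself lives in a shared helper.

```java
import java.util.Map;

// Hypothetical stand-in for DefaultBatchConfiguration (NOT the real Spring Batch class).
class ToyDefaultBatchConfiguration {

    // Overridable hook, modelled on getIsolationLevelForCreate().
    protected String getIsolationLevelForCreate() {
        return "ISOLATION_SERIALIZABLE";
    }
}

// Hypothetical stand-in for JdbcDefaultBatchConfiguration, which extends the base class.
class ToyJdbcDefaultBatchConfiguration extends ToyDefaultBatchConfiguration {
}

// The Boot-specific customization logic can live in one shared place...
final class BootBatchCustomizations {

    private BootBatchCustomizations() {
    }

    static String isolationLevel(Map<String, String> properties, String fallback) {
        // "isolation-level-for-create" is an illustrative key for this sketch
        return properties.getOrDefault("isolation-level-for-create", fallback);
    }
}

// ...but each Boot subclass still has to declare the override itself.
class ToySpringBootBatchConfiguration extends ToyDefaultBatchConfiguration {

    private final Map<String, String> properties;

    ToySpringBootBatchConfiguration(Map<String, String> properties) {
        this.properties = properties;
    }

    @Override
    protected String getIsolationLevelForCreate() {
        return BootBatchCustomizations.isolationLevel(this.properties, super.getIsolationLevelForCreate());
    }
}

// Java has no multiple inheritance of classes, so this override is duplicated.
class ToySpringBootJdbcBatchConfiguration extends ToyJdbcDefaultBatchConfiguration {

    private final Map<String, String> properties;

    ToySpringBootJdbcBatchConfiguration(Map<String, String> properties) {
        this.properties = properties;
    }

    @Override
    protected String getIsolationLevelForCreate() {
        return BootBatchCustomizations.isolationLevel(this.properties, super.getIsolationLevelForCreate());
    }
}
```

With a shared helper, the copy/paste shrinks to one-line delegating overrides per subclass, but it does not disappear.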
Comment From: wilkinsona
I believe Spring Boot should auto-configure a resource less infrastructure by default, and switches to jdbc or mongodb based on classpath scanning (spring-jdbc / spring-boot-starter-data-mongodb)
I'm not sure that the classpath will provide a strong enough signal. For example, you may be happy for Batch to use its resource-less infrastructure but you want to use JDBC elsewhere in your app. Similarly, you may be using both JDBC and MongoDB in your app but you want Batch to use one in particular.
or a property like spring.batch.job.repository=[resourceless,jdbc,mongodb]
A property would provide a much stronger signal.
Another option is multiple spring-boot-batch-* modules, but this may be overkill. spring-boot-batch could be for the resource-less case, with new spring-boot-batch-jdbc and spring-boot-batch-mongodb modules handling the two different repositories.
Comment From: fmbenhassine
I agree, classpath scanning won't work. Multiple modules is a clean option, but might indeed be overkill. So I believe a property is a good middle ground.
What about using spring.batch.job.repository.type=[resourceless,jdbc,mongodb], and moving the JDBC properties and the new MongoDB properties (to be discussed in a separate issue) under spring.batch.job.repository.jdbc.* and spring.batch.job.repository.mongodb.*, respectively?
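As a rough sketch of how the proposed spring.batch.job.repository.type property could be resolved, assuming resourceless is the default when the property is unset. This is a plain-Java model over a flat Map of properties, not Spring Boot's configuration binding; JobRepositoryType, resolve, storeProperties and the table-prefix key are hypothetical names used only for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the proposed spring.batch.job.repository.type property.
enum JobRepositoryType {
    RESOURCELESS, JDBC, MONGODB;

    static final String TYPE_KEY = "spring.batch.job.repository.type";

    // Resolve the store from flat properties; resourceless is the default when unset.
    static JobRepositoryType resolve(Map<String, String> properties) {
        String raw = properties.get(TYPE_KEY);
        if (raw == null || raw.isBlank()) {
            return RESOURCELESS;
        }
        // valueOf throws IllegalArgumentException for unknown values such as "cassandra"
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Collect store-specific settings (e.g. spring.batch.job.repository.jdbc.*) with the prefix stripped.
    static Map<String, String> storeProperties(Map<String, String> properties, JobRepositoryType type) {
        String prefix = "spring.batch.job.repository." + type.name().toLowerCase(Locale.ROOT) + ".";
        Map<String, String> result = new TreeMap<>();
        properties.forEach((key, value) -> {
            if (key.startsWith(prefix)) {
                result.put(key.substring(prefix.length()), value);
            }
        });
        return result;
    }
}
```

Note that the set of stores is a closed enum here: supporting another repository implementation means editing JobRepositoryType itself.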
Comment From: fmbenhassine
Any chance to get this in 4.0.0-M3?
The biggest problem that I can see now is that Spring Boot extends from the base configuration to customize features and for Spring Session we "just" import the configuration class. With JdbcDefaultBatchConfiguration extending from DefaultBatchConfiguration, it makes such extension complicated as we'd have to replicate the overrides in two classes.
I was thinking about an improvement that will make this easier (for batch users as well as boot): https://github.com/spring-projects/spring-batch/issues/4962. What do you think?
Another option is multiple spring-boot-batch-* modules, but this may be overkill. spring-boot-batch could be for the resource-less case with new spring-boot-batch-jdbc and spring-boot-batch-mongodb handling the two different repositories.
In hindsight, I believe this is the best option. A single starter with an enum property means we need to change that enum every time a new repository implementation is added, in addition to having different dependencies in the same starter, etc. Please consider this multiple modules option, which is clean and consistent with other starters.
Comment From: fmbenhassine
I was thinking about an improvement that will make this easier (for batch users as well as boot): spring-projects/spring-batch#4962. What do you think?
Not sure if you had a chance to look at this, but I found some challenges while working on it (see https://github.com/spring-projects/spring-batch/issues/4962#issuecomment-3269477509). So I will hold on for now as we might need that for the moment.
As mentioned in my previous comment, in my opinion the best way to resolve this ticket is to create separate modules, where each module extends one of the store-specific classes that Batch 6 introduced (JdbcDefaultBatchConfiguration, MongoDefaultBatchConfiguration, etc.). Please let me know if I can help with this.
Comment From: snicoll
Not sure if you had a chance to look at this
I was on PTO the last two weeks. I don't think I'll have time for this milestone, but I'll try to put it higher in the stack.
Comment From: snicoll
This is blocked on https://github.com/spring-projects/spring-batch/issues/4962
Comment From: snicoll
I am hacking on this in https://github.com/snicoll/spring-boot/tree/gh-46307
There are spring-boot-batch and spring-boot-batch-jdbc modules there, the former containing only the auto-configuration for the in-memory store and the latter containing the JDBC-specific bits. The auto-configurations are ordered so that the JDBC store takes precedence if it is present and its condition matches.
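The precedence described above can be modelled, very loosely, as an ordered list of candidates where the first one whose condition matches wins and the rest back off. This is a hypothetical plain-Java model, not Spring Boot's auto-configuration machinery; StoreCandidate, StoreSelector and the "DataSource" marker in the context set are assumptions for illustration. Each module contributes its own candidate, so adding, say, a MongoDB module would not require touching a shared enum.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical contribution from one module: a store name, a precedence, and a condition.
// Lower order means higher precedence.
record StoreCandidate(String name, int order, Predicate<Set<String>> condition) {
}

final class StoreSelector {

    private StoreSelector() {
    }

    // Evaluate candidates by precedence; the first whose condition matches the context wins
    // and every remaining candidate, including the resourceless fallback, backs off.
    static String select(List<StoreCandidate> candidates, Set<String> context) {
        return candidates.stream()
                .sorted(Comparator.comparingInt(StoreCandidate::order))
                .filter(candidate -> candidate.condition().test(context))
                .map(StoreCandidate::name)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("No job repository store matched"));
    }
}
```

For example, spring-boot-batch could contribute new StoreCandidate("resourceless", Integer.MAX_VALUE, ctx -> true) and spring-boot-batch-jdbc could contribute new StoreCandidate("jdbc", 0, ctx -> ctx.contains("DataSource")); with a DataSource present the JDBC candidate wins, otherwise the resourceless one does.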
I initially made the auto-configurations extend the base configuration, but this broke @ConditionalOnMissingBean(value = DefaultBatchConfiguration.class, annotation = EnableBatchProcessing.class). Now that BatchAutoConfiguration may back off if another store is present, the other beans had to move to a separate auto-configuration; see BatchJobLauncherAutoConfiguration.
At this point, the following needs to be clarified:
- Due to https://github.com/spring-projects/spring-batch/issues/4962, BatchAutoConfiguration and BatchJdbcAutoConfiguration extend from different configuration classes and override similar methods. The copy/paste isn't so bad, so we might just live with it.
- Working on the in-memory store, I've noticed that the base configuration exposes getValidateTransactionState and getIsolationLevelForCreate. While we used to do something for the JDBC store, I am not sure what we could do for the in-memory store, or even why they're available at that level.
- The tests for the JobLauncher are failing, see this build scan. It looks like the in-memory (so-called Resourceless) implementation does not implement all the methods the JDBC variant does, in particular getLastJobInstance. We also need to have a PlatformTransactionManager because tasklet forces us to provide one, but it makes no sense in such a scenario AFAIU.
Comment From: fmbenhassine
Great to see the module split, LGTM so far 👍
Due to https://github.com/spring-projects/spring-batch/issues/4962, BatchAutoConfiguration and BatchJdbcAutoConfiguration extend from different configuration classes and override similar methods. The copy/paste isn't so bad so we might just live with it
https://github.com/spring-projects/spring-batch/issues/4962 was an experiment that led me to the conclusion that it complicates things rather than simplifying them (the reason is explained here). So we might not need it for now. I made a typo in my previous comment: "So I will hold on for now as we might need that for the moment." should have said "might not need". Sorry about that.
Working on the in-memory store, I've noticed that the base configuration exposes getValidateTransactionState and getIsolationLevelForCreate. While we used to do something for the JDBC store, I am not sure what we could do for the in-memory store, or even why they're available at that level
Those two configuration properties are at that level because they are used in MongoDefaultBatchConfiguration as well.
The tests for the JobLauncher are failing, see this build scan. It looks like the in-memory (so called Resourceless) implementation does not implement all the methods the JDBC variant does, in particular getLastJobInstance.
I will fix that.
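For illustration only, here is a hypothetical in-memory store that can answer a getLastJobInstance-style query without a database by remembering the most recent instance per job name. ToyJobInstance and ToyInMemoryJobStore are invented for this sketch; they are not ResourcelessJobRepository, and the Optional return type is a choice made here, not Spring Batch's signature.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified job instance (NOT org.springframework.batch.core.JobInstance).
record ToyJobInstance(long id, String jobName) {
}

// Toy in-memory store that remembers only the latest instance per job name,
// which is enough to answer a "last job instance" query without a database.
final class ToyInMemoryJobStore {

    private final AtomicLong ids = new AtomicLong();

    private final Map<String, ToyJobInstance> lastByJobName = new ConcurrentHashMap<>();

    ToyJobInstance createJobInstance(String jobName) {
        ToyJobInstance instance = new ToyJobInstance(this.ids.incrementAndGet(), jobName);
        this.lastByJobName.put(jobName, instance); // newer instances replace older ones
        return instance;
    }

    // Empty when the job has never run in this JVM; nothing survives a restart.
    Optional<ToyJobInstance> getLastJobInstance(String jobName) {
        return Optional.ofNullable(this.lastByJobName.get(jobName));
    }
}
```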
We also need to have a PlatformTransactionManager because tasklet forces us to provide one, but it makes no sense in such a scenario AFAIU.
This needs to be fixed in Spring Batch as well. The transaction manager should be optional. It should not be required to provide a resourceless transaction manager if there is no need for transactions. I think the same applies to the job repository as well (but I need to validate that).
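A sketch of what an optional transaction manager could look like for a tasklet: when none is supplied, the tasklet simply runs without a transaction boundary; otherwise it is wrapped in begin/commit/rollback. ToyTransactionManager and ToyTaskletStep are hypothetical types, not Spring's PlatformTransactionManager or Spring Batch's TaskletStep, and the sketch makes no claim about how Spring Batch itself implements this.

```java
// Hypothetical minimal transaction abstraction (NOT Spring's PlatformTransactionManager).
interface ToyTransactionManager {

    void begin();

    void commit();

    void rollback();
}

// Hypothetical tasklet step where the transaction manager is optional (may be null).
final class ToyTaskletStep {

    private final Runnable tasklet;

    private final ToyTransactionManager transactionManager; // null means "no transactions needed"

    ToyTaskletStep(Runnable tasklet, ToyTransactionManager transactionManager) {
        this.tasklet = tasklet;
        this.transactionManager = transactionManager;
    }

    void execute() {
        if (this.transactionManager == null) {
            // Nothing to commit or roll back: just run the work.
            this.tasklet.run();
            return;
        }
        this.transactionManager.begin();
        try {
            this.tasklet.run();
            this.transactionManager.commit();
        }
        catch (RuntimeException ex) {
            this.transactionManager.rollback();
            throw ex;
        }
    }
}
```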
Comment From: snicoll
Thanks for the feedback @fmbenhassine! I've flagged this as blocked again as we need some changes in Spring Batch to pursue the integration.
Comment From: fmbenhassine
@snicoll I pushed two commits (https://github.com/spring-projects/spring-batch/commit/724874b41560da2a29da87447f7580464d0f561e and https://github.com/spring-projects/spring-batch/commit/76a65501f75fd0115d05d3dd6c90d085a8d8d110) to address the two blocking points; they should be available in the latest snapshots. Please let me know if we need anything else from the Batch side.
Comment From: snicoll
We're still struggling, as running a Job doesn't behave the same way with the JDBC store as it does with the in-memory store. Mahmoud and I are chatting to figure out what to do.