Pre-check
- [x] I am sure that all the content I provide is in English.
Search before asking
- [x] I had searched in the issues and found no similar feature requirement.
Apache Dubbo Component
Java SDK (apache/dubbo)
Descriptions
- When the Registry is wrapped and created by ListenerRegistryWrapper, the underlying Registry may be null (for example, after a connection failure), which can cause a NullPointerException (NPE) in subsequent operations.
Should we handle this error during initialization? If the Registry is null, an error could be thrown, allowing the upper-layer caller to handle it accordingly.
- When obtaining the ProviderUrl with the registration mode set to ALL (the default), a ServiceDiscoveryRegistry URL is generated synchronously (org.apache.dubbo.config.utils.ConfigValidationUtils#genCompatibleRegistries).
However, if multiple registries exist and one of them goes down, the ServiceDiscoveryRegistry cannot be created, and the startup process fails.
Should we consider:
  1. Implementing functionality in ServiceDiscoveryRegistry similar to MultipleRegistry, i.e. supporting check.
  2. Lowering the dependency level of ServiceDiscoveryRegistry, making it an attribute of Registry, and allowing other registries to handle its operations.
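The following is a minimal sketch of the first suggestion. It assumes a simplified, hypothetical SimpleRegistry interface and CheckedRegistryWrapper class, neither of which is Dubbo's actual ListenerRegistryWrapper. The wrapper rejects a null delegate when it is constructed, so the upper-layer caller gets an explicit error instead of a later NPE.

```java
// Hypothetical, simplified stand-in for Dubbo's Registry interface.
interface SimpleRegistry {
    void register(String url);
}

// Hypothetical wrapper that fails fast when the underlying registry could not be created.
final class CheckedRegistryWrapper implements SimpleRegistry {
    private final SimpleRegistry delegate;

    CheckedRegistryWrapper(SimpleRegistry delegate, String registryUrl) {
        if (delegate == null) {
            // Report the failure at initialization instead of a later NullPointerException.
            throw new IllegalStateException("Failed to create registry for " + registryUrl);
        }
        this.delegate = delegate;
    }

    @Override
    public void register(String url) {
        delegate.register(url);
    }
}
```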
Related issues
#15003
Are you willing to submit a pull request to fix on your own?
- [ ] Yes I am willing to submit a pull request on my own!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Comment From: fantiq
@Stellar1999 Could you assign this to me? I will solve it.
Comment From: RainYuY
Thank you for your interest! You can describe how you plan to work on this issue by leaving a comment below, and then submit a PR when you're ready.
Comment From: fantiq
In AbstractRegistryFactory#getRegistry, if NacosConnectionManager#createNamingService throws an IllegalStateException and registry.check = false is set, the variable registry is returned as null. Maybe it could throw an exception such as RegistryCreateFailException instead of returning null.
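The proposal to throw instead of returning null could look roughly like the sketch below. RegistryCreateFailException is only the name suggested here and does not exist in Dubbo. RegistryFactorySketch#getRegistry is a hypothetical, simplified stand-in for AbstractRegistryFactory#getRegistry, and it takes a Supplier in place of the real registry creation.

```java
import java.util.function.Supplier;

final class RegistryFactorySketch {
    // Hypothetical exception proposed in this thread; it does not exist in Dubbo.
    static final class RegistryCreateFailException extends RuntimeException {
        RegistryCreateFailException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Simplified stand-in for AbstractRegistryFactory#getRegistry: 'creator' plays the
    // role of creating the registry client, and 'check' mirrors registry.check.
    static <R> R getRegistry(String url, boolean check, Supplier<R> creator) {
        try {
            return creator.get();
        } catch (IllegalStateException e) {
            if (check) {
                throw e; // check=true: propagate the original failure
            }
            // check=false: instead of logging and returning null, raise a typed
            // exception that callers such as doExportUrl can catch per registry.
            throw new RegistryCreateFailException("Failed to create registry " + url, e);
        }
    }
}
```

Because the failure is now a distinct exception type, a caller can skip only the unavailable registry instead of dereferencing null later.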
In ServiceConfig#doExportUrl, we can catch RegistryCreateFailException and handle it in two steps:
1. unexport the returned exporter;
2. record the failed registry.
In ServiceConfig#doExportUrls, calculate the failure rate; if the rate exceeds the threshold, the service export fails. We can add a config option service.checkThreshold to define the allowed failure rate.
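Here is a rough sketch of the two steps and the threshold check. It assumes hypothetical Exporter and RegistryExport interfaces in place of Dubbo's real export path. checkThreshold stands in for the suggested service.checkThreshold option, which does not exist today.

```java
import java.util.ArrayList;
import java.util.List;

final class ExportSketch {
    // Hypothetical stand-in for Dubbo's Exporter.
    interface Exporter {
        void unexport();
    }

    // Hypothetical exception proposed in this thread.
    static final class RegistryCreateFailException extends RuntimeException {
        RegistryCreateFailException(String message) {
            super(message);
        }
    }

    // Hypothetical per-registry operations: local export, then registration.
    interface RegistryExport {
        Exporter export(String registryUrl);

        void register(String registryUrl); // may throw RegistryCreateFailException
    }

    // Proposed doExportUrl: on RegistryCreateFailException,
    // (1) unexport the exporter just created and (2) record the failed registry.
    static void doExportUrl(String url, RegistryExport step,
                            List<Exporter> exporters, List<String> failed) {
        Exporter exporter = step.export(url);
        try {
            step.register(url);
            exporters.add(exporter);
        } catch (RegistryCreateFailException e) {
            exporter.unexport();
            failed.add(url);
        }
    }

    // Proposed doExportUrls: fail the export only when the failure rate exceeds
    // checkThreshold (standing in for the suggested service.checkThreshold option).
    static List<Exporter> doExportUrls(List<String> urls, RegistryExport step, double checkThreshold) {
        List<Exporter> exporters = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String url : urls) {
            doExportUrl(url, step, exporters, failed);
        }
        double failureRate = urls.isEmpty() ? 0.0 : (double) failed.size() / urls.size();
        if (failureRate > checkThreshold) {
            exporters.forEach(Exporter::unexport); // roll back the registries that succeeded
            throw new IllegalStateException("Export failed; failed registries: " + failed);
        }
        return exporters;
    }
}
```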
In MultipleRegistry, the registry URLs come from the query-string parameters reference-registry and service-registry, so register and subscribe use different connections. In MultipleServiceDiscovery, the registry URLs come from the query-string parameters child.*, so register and subscribe share these connections.
Should we define the failure behavior for multiple registries here, such as the failure rate across multiple connections?
Comment From: RainYuY
Good design. However, for this issue, I don’t think we need a counter or similar mechanism to track the failure rate in order to determine success or failure.
In a multi-registry scenario where check=false, the user has already indicated that the application is intended to support multiple registries. The core of this issue is more about ensuring that when multi-registry is enabled, the application can still operate correctly even if one of the registries becomes unavailable.
Therefore, rather than introducing failure rate tracking logic, we should focus on improving the application's resilience to partial registry failures, ensuring that service export or reference can still proceed as long as at least one registry is functional.
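The "at least one registry is functional" policy can be illustrated with a hypothetical helper. PartialFailureSketch is not a Dubbo class. It registers with every registry, logs individual failures, and fails only when no registry succeeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PartialFailureSketch {
    // Registers with every registry and tolerates individual failures;
    // the call fails only when no registry accepted the registration.
    static List<String> registerAll(List<String> registryUrls, Consumer<String> register) {
        List<String> succeeded = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String url : registryUrls) {
            try {
                register.accept(url);
                succeeded.add(url);
            } catch (RuntimeException e) {
                // With check=false, one unavailable registry is logged, not fatal.
                failed.add(url);
                System.err.println("Registry unavailable, skipping: " + url + " (" + e.getMessage() + ")");
            }
        }
        if (succeeded.isEmpty() && !registryUrls.isEmpty()) {
            throw new IllegalStateException("No registry is available: " + failed);
        }
        return succeeded;
    }
}
```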
Comment From: fantiq
That makes sense. When check = false, it means the developers don't care about the registration status, and the failure rate can be observed through metrics.
I have reviewed ZookeeperRegistry; its connect, register and subscribe operations are asynchronous. Perhaps I just need to change the default value of nacos.check to false.
However, the option combination registry.check = false and nacos.check = true still returns null. Should we return a Registry such as DEFAULT_NOP_REGISTRY, or retry the connection asynchronously?
Looking forward to your guidance.
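Below is a sketch of the null-object option. It assumes a simplified, hypothetical Registry interface. NOP_REGISTRY only illustrates the idea behind DEFAULT_NOP_REGISTRY and is not an existing Dubbo constant in this form.

```java
import java.util.List;
import java.util.function.Supplier;

final class NopRegistrySketch {
    // Simplified, hypothetical registry contract.
    interface Registry {
        boolean isAvailable();

        void register(String url);

        List<String> lookup(String service);
    }

    // Null object: accepts calls, does nothing, and reports itself unavailable,
    // so callers never receive null.
    static final Registry NOP_REGISTRY = new Registry() {
        @Override public boolean isAvailable() { return false; }
        @Override public void register(String url) { /* intentionally a no-op */ }
        @Override public List<String> lookup(String service) { return List.of(); }
    };

    // registry.check=false: fall back to the NOP registry instead of returning null.
    static Registry getRegistry(boolean check, Supplier<Registry> creator) {
        try {
            return creator.get();
        } catch (IllegalStateException e) {
            if (check) {
                throw e;
            }
            return NOP_REGISTRY;
        }
    }
}
```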
Comment From: RainYuY
Yes, the core issue is that returning null here prevents the ServiceDiscoveryRegistry from continuing with subsequent logic. While retrying can help with recovery, the priority here should be ensuring the process can proceed.
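A background retry lets startup proceed while still recovering later. RetryingRegistryHolder below is hypothetical and is built on the JDK's ScheduledExecutorService, not on Dubbo code. The Supplier stands in for registry creation.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder: startup never blocks on the registry; connection attempts
// run on a background thread until one succeeds.
final class RetryingRegistryHolder<R> {
    private final AtomicReference<R> registry = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread t = new Thread(task, "registry-reconnect");
                t.setDaemon(true); // do not keep the JVM alive just for retries
                return t;
            });

    RetryingRegistryHolder(Supplier<R> creator, long retryDelayMillis) {
        // The first attempt runs immediately; failed attempts are retried after retryDelayMillis.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                registry.set(creator.get());
                scheduler.shutdown(); // connected: no further attempts are scheduled
            } catch (RuntimeException e) {
                // Still unavailable; swallow so the periodic task keeps running.
            }
        }, 0, retryDelayMillis, TimeUnit.MILLISECONDS);
    }

    // Empty until an attempt succeeds; callers treat empty as "not available yet".
    Optional<R> current() {
        return Optional.ofNullable(registry.get());
    }
}
```

A real implementation would also need a way to stop retrying on shutdown and to replay pending registrations once the registry becomes available.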
Comment From: RainYuY
How's it going? Are you still making progress?
Comment From: fantiq
@RainYuY I will submit the PR next week.
Comment From: fantiq
Hi @RainYuY, I'd appreciate some suggestions.
The cause of issue #15003 is as follows:
When the application starts, the registry client is created and, by default, checks the connection status. If the connection is not available, it throws java.lang.IllegalStateException.
Normally this exception is caught in org.apache.dubbo.registry.support.AbstractRegistryFactory#getRegistry(URL url), but the option registry.check affects its behavior: if registry.check=false, it only logs the message, and the returned variable registry is null.
I found the following comment in org.apache.dubbo.registry.RegistryFactory#getRegistry:
Params: url – Registry address, is not allowed to be empty
Returns: Registry reference, never return empty value
Perhaps we should fix RegistryFactory#getRegistry returning an empty value, and make whether the registry client checks the connection status depend on the option registry.check.
The nacos client has the option nacos.check, which decides whether to check the connection status.
The zookeeper client does not have this option; it always checks the connection status.
This is my plan, if the option registry.check is false:
- NacosRegistry: in org.apache.dubbo.registry.nacos.util.NacosNamingServiceUtils#createNamingService, if new NacosConnectionManager() throws an exception, create a new NacosConnectionManager with nacos.check=false.
- ZookeeperRegistry: in org.apache.dubbo.remoting.zookeeper.curator5.Curator5ZookeeperClient#Curator5ZookeeperClient, execute the connection-status check only when the option registry.check is true. Or should we introduce a new option zk.check, with processing logic similar to NacosRegistry?
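Here is the decision logic of this plan, sketched with a hypothetical CheckOptionSketch class. It assumes that:
- options are a plain Map keyed by the option names used in this thread;
- both checks default to true;
- the Function stands in for constructing a NacosConnectionManager with a given check flag.

The zk.check alternative is left out.

```java
import java.util.Map;
import java.util.function.Function;

final class CheckOptionSketch {
    // Nacos: with registry.check=false, a failed checked creation is retried with the
    // connection check disabled (the nacos.check=false fallback described above).
    static <C> C createNamingService(Map<String, String> options, Function<Boolean, C> connectionManager) {
        boolean registryCheck = Boolean.parseBoolean(options.getOrDefault("registry.check", "true"));
        boolean nacosCheck = Boolean.parseBoolean(options.getOrDefault("nacos.check", "true"));
        try {
            return connectionManager.apply(nacosCheck);
        } catch (IllegalStateException e) {
            if (registryCheck || !nacosCheck) {
                throw e; // a hard check was requested, or the check was already off
            }
            return connectionManager.apply(false); // rebuild without the connection check
        }
    }

    // ZooKeeper: only run the blocking connection check when registry.check is true.
    static boolean shouldCheckZookeeperConnection(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault("registry.check", "true"));
    }
}
```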
Comment From: RainYuY
You can go ahead — I think it could be a viable solution.