Pre-check

  • [x] I am sure that all the content I provide is in English.

Search before asking

  • [x] I have searched the issues and found no similar feature requirement.

Apache Dubbo Component

Java SDK (apache/dubbo)

Descriptions

  1. When a Registry is created and wrapped by ListenerRegistryWrapper, the wrapped Registry may be null (for example, because the connection failed), which can lead to a NullPointerException (NPE) in subsequent operations.
    Should we handle this case during initialization? If the Registry is null, an exception could be thrown so that the upper-layer caller can handle it accordingly.

  2. When obtaining the ProviderUrl with the registration mode set to ALL (the default), a ServiceDiscoveryRegistry URL is generated synchronously (org.apache.dubbo.config.utils.ConfigValidationUtils#genCompatibleRegistries).
    However, if multiple registries are configured and one of them is down, the ServiceDiscoveryRegistry cannot be created, which causes application startup to fail.
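For point 1, a minimal sketch of failing fast at wrapping time. It assumes a simplified, hypothetical `SimpleRegistry` interface instead of Dubbo's real `Registry`, and `SafeListenerRegistryWrapper` is an illustrative name, not an existing class: the wrapper rejects a null delegate in its constructor, so the caller sees an explicit exception instead of a later NPE.

```java
// Hypothetical, simplified stand-in for Dubbo's Registry interface.
interface SimpleRegistry {
    void register(String url);
}

// Illustrative wrapper: validates the delegate eagerly instead of failing later with an NPE.
final class SafeListenerRegistryWrapper implements SimpleRegistry {
    private final SimpleRegistry delegate;

    SafeListenerRegistryWrapper(SimpleRegistry delegate, String registryAddress) {
        if (delegate == null) {
            // Surface the creation failure (e.g. a connection failure) to the caller immediately.
            throw new IllegalStateException(
                    "Registry for " + registryAddress + " could not be created");
        }
        this.delegate = delegate;
    }

    @Override
    public void register(String url) {
        // Safe: the delegate was checked for null in the constructor.
        delegate.register(url);
    }
}
```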

Should we consider:
- 1. Implementing functionality in ServiceDiscoveryRegistry similar to MultipleRegistry, including support for the check option.
- 2. Lowering the dependency level of ServiceDiscoveryRegistry by making it an attribute of a Registry, so that the other registries can handle its operations.
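Option 1 could look roughly like the following sketch. `CheckAwareCompositeRegistry` and the `Predicate`-based connector are hypothetical simplifications, not the real `MultipleRegistry` or `ServiceDiscoveryRegistry` APIs: with `check=true` an unavailable child registry aborts startup, and with `check=false` it is recorded and skipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical composite registry: one logical registry backed by several
// child registry addresses, honoring a `check` flag (the MultipleRegistry idea).
final class CheckAwareCompositeRegistry {
    private final List<String> healthy = new ArrayList<>();
    private final List<String> failed = new ArrayList<>();

    // `connector` returns true if a connection to the address could be established.
    CheckAwareCompositeRegistry(List<String> addresses, boolean check, Predicate<String> connector) {
        for (String address : addresses) {
            boolean connected;
            try {
                connected = connector.test(address);
            } catch (RuntimeException e) {
                connected = false; // treat a thrown error like a failed connection
            }
            if (connected) {
                healthy.add(address);
            } else if (check) {
                // check=true: one unavailable child registry aborts startup.
                throw new IllegalStateException("Registry unavailable: " + address);
            } else {
                // check=false: record the failure and continue with the remaining children.
                failed.add(address);
            }
        }
    }

    List<String> healthy() {
        return Collections.unmodifiableList(healthy);
    }

    List<String> failed() {
        return Collections.unmodifiableList(failed);
    }
}
```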

Related issues

15003

Are you willing to submit a pull request to fix on your own?

  • [x] Yes I am willing to submit a pull request on my own!

Code of Conduct

Comment From: fantiq

@Stellar1999 Can you assign this to me? I will solve it.

Comment From: Stellar1999

Thank you for your interest! You can describe how you plan to work on this issue by leaving a comment below, and then submit a PR when you're ready.

Comment From: fantiq

In AbstractRegistryFactory#getRegistry, if NacosConnectionManager#createNamingService throws an IllegalStateException and registry.check is set to false, getRegistry returns null. Instead of returning null, it could throw an exception such as RegistryCreateFailException.
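A minimal sketch of a getRegistry that never returns null. `SimpleRegistryFactory` is hypothetical and only loosely mirrors AbstractRegistryFactory; `RegistryCreateFailException` is the name proposed above, not an existing Dubbo class:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Proposed exception name from this discussion; not part of Dubbo today.
class RegistryCreateFailException extends RuntimeException {
    RegistryCreateFailException(String address, Throwable cause) {
        super("Failed to create registry for " + address, cause);
    }
}

// Hypothetical, simplified factory: caches registries by address and never returns null.
final class SimpleRegistryFactory<R> {
    private final ConcurrentMap<String, R> cache = new ConcurrentHashMap<>();
    private final Function<String, R> creator;

    SimpleRegistryFactory(Function<String, R> creator) {
        this.creator = creator;
    }

    R getRegistry(String address) {
        R existing = cache.get(address);
        if (existing != null) {
            return existing;
        }
        R created;
        try {
            created = creator.apply(address);
        } catch (IllegalStateException e) {
            // e.g. the naming service could not be created: wrap instead of returning null
            throw new RegistryCreateFailException(address, e);
        }
        if (created == null) {
            throw new RegistryCreateFailException(address, null);
        }
        // Keep the first instance if another thread created one concurrently.
        R previous = cache.putIfAbsent(address, created);
        return previous != null ? previous : created;
    }
}
```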

In ServiceConfig#doExportUrl, we can catch RegistryCreateFailException and handle it in two steps:

  1. unexport the returned exporter
  2. record the failed registry
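The two handling steps could be sketched as follows. `ExportSketch`, `SimpleExporter`, and the export/register callbacks are hypothetical stand-ins for the real ServiceConfig flow, and `RegistryCreateFailException` is the exception proposed above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical exporter handle (Dubbo's Exporter also exposes unexport()).
interface SimpleExporter {
    void unexport();
}

// The exception name proposed in this thread; not an existing Dubbo class.
class RegistryCreateFailException extends RuntimeException {
    RegistryCreateFailException(String address) {
        super("Failed to create registry for " + address);
    }
}

// Hypothetical per-registry export step that tolerates a registry creation failure.
final class ExportSketch {
    private final List<String> failedRegistries = new ArrayList<>();
    private final List<SimpleExporter> liveExporters = new ArrayList<>();

    void doExportUrl(String registryAddress, Supplier<SimpleExporter> export, Consumer<String> register) {
        SimpleExporter exporter = export.get();
        try {
            register.accept(registryAddress);
            liveExporters.add(exporter);
        } catch (RegistryCreateFailException e) {
            exporter.unexport();                   // 1. unexport the returned exporter
            failedRegistries.add(registryAddress); // 2. record the failed registry
        }
    }

    List<String> failedRegistries() {
        return Collections.unmodifiableList(failedRegistries);
    }

    int liveExporterCount() {
        return liveExporters.size();
    }
}
```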

In ServiceConfig#doExportUrls, we can calculate the failure rate; if it exceeds the allowed bound, the service export fails. We can add a config option, service.checkThreshold, to define the allowed failure rate.
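The threshold check itself is simple arithmetic. Here is a sketch assuming the proposed service.checkThreshold is a fraction between 0 and 1 (the option does not exist yet, and `FailureRateCheck` is a hypothetical helper):

```java
// Hypothetical helper for the proposed service.checkThreshold option.
final class FailureRateCheck {
    private FailureRateCheck() {
    }

    // Returns true when the share of failed registries does not exceed the threshold.
    static boolean exportSucceeded(int totalRegistries, int failedRegistries, double checkThreshold) {
        if (totalRegistries <= 0 || failedRegistries < 0 || failedRegistries > totalRegistries) {
            throw new IllegalArgumentException("invalid registry counts");
        }
        if (checkThreshold < 0.0 || checkThreshold > 1.0) {
            throw new IllegalArgumentException("checkThreshold must be within [0, 1]");
        }
        double failureRate = (double) failedRegistries / totalRegistries;
        return failureRate <= checkThreshold;
    }
}
```

With 3 registries and a threshold of 0.5, one failure (rate about 0.33) still counts as a successful export, while two failures (about 0.67) do not.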

In MultipleRegistry, the registry URLs come from the reference-registry and service-registry query-string parameters, so register and subscribe use different connections. In MultipleServiceDiscovery, the registry URLs come from the child.* query-string parameters, and register and subscribe share these connections.

Should we define the failure behavior for multiple registries here, such as a failure rate across multiple connections?

Comment From: Stellar1999

Good design. However, for this issue, I don’t think we need a counter or similar mechanism to track the failure rate in order to determine success or failure.

In a multi-registry scenario where check=false, the user has already indicated that the application is intended to support multiple registries. The core of this issue is more about ensuring that when multi-registry is enabled, the application can still operate correctly even if one of the registries becomes unavailable.

Therefore, rather than introducing failure rate tracking logic, we should focus on improving the application's resilience to partial registry failures, ensuring that service export or reference can still proceed as long as at least one registry is functional.
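A minimal sketch of this "at least one registry" rule, assuming hypothetical names (`ResilientExporter`, `PartialExportResult`) and a `Predicate` that stands in for registering with one registry: failures are collected, and export fails only when every registry failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical result holder: which registries accepted the service and which did not.
final class PartialExportResult {
    final List<String> succeeded = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
}

final class ResilientExporter {
    private ResilientExporter() {
    }

    // `tryRegister` stands in for registering with one registry; true means success.
    static PartialExportResult exportToAll(List<String> registries, Predicate<String> tryRegister) {
        PartialExportResult result = new PartialExportResult();
        for (String registry : registries) {
            boolean ok;
            try {
                ok = tryRegister.test(registry);
            } catch (RuntimeException e) {
                ok = false; // a crashing registry is just another failure
            }
            if (ok) {
                result.succeeded.add(registry);
            } else {
                result.failed.add(registry);
            }
        }
        // No counter or threshold: export fails only if no registry is usable.
        if (result.succeeded.isEmpty()) {
            throw new IllegalStateException("No registry available among " + registries);
        }
        return result;
    }
}
```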

Comment From: fantiq

That makes sense: when check = false, it means the developers don't care about the registration status, and the failure rate can be observed through metrics.

I have reviewed ZookeeperRegistry: its connect, register, and subscribe operations are asynchronous. Perhaps I just need to change the default value of nacos.check to false.

However, the combination of registry.check = false and nacos.check = true still returns null. Should we return a Registry such as DEFAULT_NOP_REGISTRY, or retry the connection asynchronously?
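One way to combine both options, sketched under assumptions: `BasicRegistry` and `RetryingPlaceholderRegistry` are hypothetical, not DEFAULT_NOP_REGISTRY or any existing Dubbo class. The placeholder is never null, behaves like a no-op registry while the connection is down (but buffers registrations), and retries the connection on a background thread, replaying the buffered URLs once it succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical minimal registry interface.
interface BasicRegistry {
    void register(String url);
}

// Hypothetical placeholder returned instead of null: a no-op registry while the real one
// is unreachable, which buffers registrations and retries the connection in the background.
final class RetryingPlaceholderRegistry implements BasicRegistry {
    private final Supplier<BasicRegistry> connector; // returns null or throws while unavailable
    private final List<String> pending = new ArrayList<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "registry-retry");
        t.setDaemon(true); // do not keep the JVM alive just for retries
        return t;
    });
    private BasicRegistry delegate; // guarded by "this"

    RetryingPlaceholderRegistry(Supplier<BasicRegistry> connector) {
        this.connector = connector;
    }

    // Starts periodic connection attempts; the first one runs immediately.
    void start(long retryMillis) {
        scheduler.scheduleWithFixedDelay(this::tryConnect, 0, retryMillis, TimeUnit.MILLISECONDS);
    }

    private void tryConnect() {
        BasicRegistry candidate;
        try {
            candidate = connector.get();
        } catch (RuntimeException e) {
            return; // still unavailable: stay in no-op mode and retry later
        }
        if (candidate == null) {
            return;
        }
        synchronized (this) {
            delegate = candidate;
            pending.forEach(candidate::register); // replay buffered registrations
            pending.clear();
        }
        scheduler.shutdown(); // connected: stop further retries
    }

    @Override
    public synchronized void register(String url) {
        if (delegate != null) {
            delegate.register(url);
        } else {
            pending.add(url); // no-op for now, but remembered for the replay
        }
    }

    synchronized boolean isConnected() {
        return delegate != null;
    }
}
```

A real implementation would also need to replay subscriptions and unregistrations and to cap the retry interval; the sketch leaves those out.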

Looking forward to your guidance.