Pre-check

  • [x] I am sure that all the content I provide is in English.

Search before asking

  • [x] I have searched the issues and found no similar feature request.

Apache Dubbo Component

Java SDK (apache/dubbo)

Descriptions

The issues with the first approach (https://github.com/apache/dubbo/commit/c8a8946f89ef7f1d3bb9725f754465c6bb675834) are as follows:

  1. When a large number of read-only operations flood in, wrapping the read-write combination into a single atomic operation, as the first approach does, is not appropriate. It forces those read-only operations to execute serially, which is very time-consuming. Imagine 10K threads running the write path of the compute method one at a time because of the synchronized lock that ConcurrentHashMap.compute holds on the hash bin while the remapping function runs.
  2. In this load-balancing algorithm, the first approach can guarantee consistency, but I believe it cannot fundamentally prevent errors. For example, in the window between the execution of the compute method and the return of doSelect, another thread may modify the service list and rebuild the corresponding values on the hash ring. The node returned by select may then be stale, which still triggers the subsequent fault-tolerance and retry mechanisms. Because a TreeMap lookup is O(log n), this window is short, but it still exists.
  3. The compute method itself is not truly atomic; it is closer to being "approximately atomic." The source code shows a case like this: when the first element of a hash bucket is being added, if thread 1 is preempted by thread 2 before it executes casTabAt (essentially a CAS, i.e., an optimistic lock), thread 1's CAS fails and the method breaks out directly, returning null. The selector obtained is then null, and return selector.select(invocation) throws a NullPointerException.
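The Java sketch below illustrates one possible alternative that addresses points 1 and 3. Reads go through a lock-free get, a selector is rebuilt only when it is missing or was built from a different node list (detected via the list's hashCode), and the method returns the locally built selector instead of reading it back from the map, so the caller never receives null. RingSelector, SelectorCache, and the toy String.hashCode-based ring are hypothetical simplifications, not Dubbo's actual ConsistentHashLoadBalance classes; nodes are plain strings and the node list is assumed to be non-empty.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a consistent-hash selector (not Dubbo's class).
final class RingSelector {
    final int listHash; // hash of the node list this ring was built from
    private final TreeMap<Long, String> ring = new TreeMap<>();

    RingSelector(List<String> nodes, int listHash) {
        this.listHash = listHash;
        for (String node : nodes) {
            for (int i = 0; i < 4; i++) { // 4 virtual nodes per node, chosen arbitrarily
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // Picks the first virtual node clockwise from the key's hash, wrapping to the start.
    // Assumes the ring is non-empty (at least one node).
    String select(String requestKey) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(requestKey)); // O(log n)
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Toy hash for illustration only; a production ring would use a better-distributed hash.
    private static long hash(String s) {
        return s.hashCode() & 0xffffffffL;
    }
}

// Hypothetical cache of selectors keyed by method name (not Dubbo's API).
final class SelectorCache {
    private final ConcurrentHashMap<String, RingSelector> selectors = new ConcurrentHashMap<>();

    RingSelector getSelector(String methodKey, List<String> nodes) {
        int listHash = nodes.hashCode();
        RingSelector selector = selectors.get(methodKey); // lock-free read fast path
        if (selector == null || selector.listHash != listHash) {
            // Build the new ring outside any map lock, publish it, and return the
            // local reference rather than re-reading the map, so this never returns null.
            selector = new RingSelector(nodes, listHash);
            selectors.put(methodKey, selector);
        }
        return selector;
    }
}
```

The trade-off of this design is that two threads detecting a stale selector at the same moment may both rebuild the ring, and the last put wins; each still returns a valid, non-null selector. As point 2 notes, a selection made from a snapshot can still be stale if the node list changes right afterward, so the existing fault-tolerance and retry path remains necessary.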

Related issues

No response

Are you willing to submit a pull request to fix on your own?

  • [x] Yes, I am willing to submit a pull request on my own!

Code of Conduct