clientsCron:

    if (clientsCronHandleTimeout(c,now)) continue;
    if (clientsCronResizeQueryBuffer(c)) continue;
    if (clientsCronFreeArgvIfIdle(c)) continue;
    if (clientsCronResizeOutputBuffer(c,now)) continue;

    if (clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot)) continue;

Currently, Redis handles client-related tasks such as timeout checks, buffer resizing, and memory management within the main thread. While this design ensures efficiency and simplicity, it may lead to the main thread waiting for IO threads to pause, potentially impacting performance in certain scenarios.
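
For context, the pattern in question looks roughly like this (a minimal sketch of the flow around the checks above, using the pause/resume helpers discussed later in this thread, not the exact Redis source):

```
/* Simplified sketch: with IO threads enabled, clientsCron() on the main thread
 * pauses a batch of IO threads so their clients can be touched safely, runs the
 * per-client checks, and then resumes the batch. The main thread blocks inside
 * pauseIOThreadsRange() until those IO threads have actually stopped. */
void clientsCronSketch(void) {
    int start = 1, end = 2;               /* one small batch of IO threads */
    pauseIOThreadsRange(start, end);      /* main thread waits here */
    /* ... run the per-client checks shown above on the paused clients ... */
    resumeIOThreadsRange(start, end);     /* IO threads continue */
}
```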

I suggest moving these client handling tasks to IO threads. This approach could offer several advantages:

  1. Simplified Main Thread: Reducing the load on the main thread by offloading non-critical, non-real-time tasks to IO threads could streamline its operations.
  2. Enhanced Performance: By distributing tasks across IO threads, we might achieve better utilization of system resources and reduce potential bottlenecks.
  3. Easier Implementation: Handling client-related tasks in IO threads could simplify the implementation by avoiding complex synchronization between the main thread and IO operations.

Comment From: sundb

The main reason is that these methods involve a lot of data subject to races; unless we can completely separate the data owned by the IO threads from the data owned by the main thread, this seems difficult for now.

Comment From: wclmxxs

The main reason is that these methods involve a lot of data subject to races; unless we can completely separate the data owned by the IO threads from the data owned by the main thread, this seems difficult for now.

Thank you for your answer.

I noticed that most data races seem to occur in cluster operations and statistical variables. For the cluster-related parts, could we have the IO threads wait for the main thread to handle those operations? For statistical tracking, perhaps we could tolerate some inaccuracies? This way, the main thread wouldn't need to wait for IO threads, potentially improving performance.

This approach might help optimize the current IO thread pausing/resuming mechanism.

Comment From: sundb

I noticed that most data races seem to occur in cluster operations and statistical variables. For the cluster-related parts, could we have the IO threads wait for the main thread to handle those operations? For statistical tracking, perhaps we could tolerate some inaccuracies? This way, the main thread wouldn't need to wait for IO threads, potentially improving performance.

Statistics can be delayed, but inaccuracies are not allowed; otherwise the statistics would be meaningless.
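
For example, statistics could stay exact while being delayed if each IO thread accumulated into its own counter and the main thread drained the counters periodically; a minimal sketch with hypothetical names (not Redis's actual stats code):

```
#include <stdatomic.h>

/* Hypothetical sketch: IO threads add to per-thread atomic counters; the main
 * thread drains them from serverCron. Totals lag by up to one cron cycle but
 * are never lost or double-counted. */
typedef struct { _Atomic long long net_input_bytes; } IOThreadStats;
static IOThreadStats io_stats[128];                 /* one slot per IO thread */

void ioThreadAddInputBytes(int tid, long long n) {  /* runs on IO thread tid */
    atomic_fetch_add_explicit(&io_stats[tid].net_input_bytes, n,
                              memory_order_relaxed);
}

long long mainThreadDrainInputBytes(int nthreads) { /* runs on the main thread */
    long long total = 0;
    for (int j = 1; j < nthreads; j++)
        total += atomic_exchange_explicit(&io_stats[j].net_input_bytes, 0,
                                          memory_order_relaxed);
    return total;
}
```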

This approach might help optimize the current IO thread pausing/resuming mechanism.

This may be the right approach, and we may be able to try it, but it needs to be done carefully.

Comment From: wclmxxs

I noticed that most data races seem to occur in cluster operations and statistical variables. For the cluster-related parts, could we have the IO threads wait for the main thread to handle those operations? For statistical tracking, perhaps we could tolerate some inaccuracies? This way, the main thread wouldn't need to wait for IO threads, potentially improving performance.

Statistics can be delayed, but inaccuracies are not allowed; otherwise the statistics would be meaningless.

This approach might help optimize the current IO thread pausing/resuming mechanism.

This may be the right approach, and we may be able to try it, but it needs to be done carefully.

Okay, can I participate in this modification, or can I try it first and ask you to review it?

Comment From: sundb

Okay, can I participate in this modification, or can I try it first and ask you to review it?

welcome.

Comment From: wclmxxs

Okay, can I participate in this modification, or can I try it first and ask you to review it?

welcome.

All threads check the last check time of each of their clients. If the last check is older than N milliseconds, a check is performed:

  1. If lastinteraction exceeds maxidletime, or the client is blocked, the client is handed to the main thread for processing.
  2. The current thread checks the query buffer, argv, and output buffer for changes; if there are changes, the main thread processes them.

After the main thread processes all clients handed over by the IO thread, the IO thread continues its work.
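
A rough sketch of that flow (hypothetical helper names; enqueuePendingClientsToMainThread is the hand-off queue that appears later in this thread) might look like:

```
/* Sketch only: each IO thread periodically scans its own clients and hands
 * anything that needs main-thread work (timeouts, buffer changes) over to the
 * main thread. CLIENTS_CHECK_PERIOD_MS, clientIsIdleOrBlocked() and
 * clientBuffersChanged() are illustrative names, not existing Redis APIs. */
void ioThreadCheckClients(IOThread *t, mstime_t now) {
    listIter li;
    listRewind(t->clients, &li);
    for (listNode *ln = listNext(&li); ln != NULL; ln = listNext(&li)) {
        client *c = listNodeValue(ln);
        if (now - c->last_check_time < CLIENTS_CHECK_PERIOD_MS) continue;
        c->last_check_time = now;
        if (clientIsIdleOrBlocked(c, now) || clientBuffersChanged(c))
            enqueuePendingClientsToMainThread(c, 0); /* main thread finishes it */
    }
}
```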

Can you help assess the feasibility?

Comment From: sundb

@wclmxxs In this case, most of the work is still done by the main thread, which increases the communication overhead between the io thread and the main thread.

Comment From: wclmxxs

@wclmxxs In this case, most of the work is still done by the main thread, which increases the communication overhead between the io thread and the main thread.

@sundb My goal is to avoid the main thread having to pause the IO threads on every clientsCron run, and I need to analyze whether this part can be moved to the IO threads completely.

Comment From: ShooterIT

The first thing I think you need to confirm is whether the main thread handling clientsCron actually brings performance degradation in benchmarks, and by how much.
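
One low-effort way to quantify it before restructuring anything could be to wrap the clientsCron() call in serverCron with the latency monitor macros (a sketch assuming the latencyStartMonitor/latencyEndMonitor/latencyAddSampleIfNeeded macros from latency.h and a latency-monitor-threshold low enough to record samples; the "clients-cron" event name is made up here):

```
/* Sketch: measure how long clientsCron() takes per serverCron tick and read it
 * back with LATENCY LATEST / LATENCY HISTORY clients-cron. */
mstime_t latency;
latencyStartMonitor(latency);
clientsCron();
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("clients-cron", latency);
```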

Comment From: wclmxxs

@sundb This is my benchmark test. I feel like pauses and resumes are a big part of the time.

I set the io-threads count to 8.

src/redis-benchmark -c 1000 -n 1000000 -t set,get -P 10 --threads 4

pausetime: the time spent in pauseIOThreadsRange(start, end); resumetime: the time spent in resumeIOThreadsRange(start, end).

results:

                pausetime   processtime   resumetime
average time         1002             4          312
max time            45185           221        34520

others:

====== SET ======
  1000000 requests completed in 0.76 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1322751.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        7.238     1.352     6.855    12.591    17.935    22.335

====== GET ======
  1000000 requests completed in 0.75 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1326259.88 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.875     0.400     5.199    12.575    14.615    18.191

Comment From: wclmxxs

@sundb This is my benchmark test. I feel like pauses and resumes are a big part of the time.

I set the io-threads count to 8.

src/redis-benchmark -c 1000 -n 1000000 -t set,get -P 10 --threads 4

pausetime: the time spent in pauseIOThreadsRange(start, end); resumetime: the time spent in resumeIOThreadsRange(start, end).

results:

                pausetime   processtime   resumetime
average time         1002             4          312
max time            45185           221        34520

others:

====== SET ======
  1000000 requests completed in 0.76 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1322751.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        7.238     1.352     6.855    12.591    17.935    22.335

====== GET ======
  1000000 requests completed in 0.75 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1326259.88 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.875     0.400     5.199    12.575    14.615    18.191

@ShooterIT

Comment From: sundb

@wclmxxs can you share your test code?

Comment From: wclmxxs

@wclmxxs can you share your test code?

okay

void clientsManager(int iterations, int curr_peak_mem_usage_slot) {
    mstime_t now = mstime();
    mstime_t startt = ustime();
    mstime_t pause = 0;
    mstime_t resume = 0;
    mstime_t endt = 0;

/* Pause the IO threads that are processing clients, to let us access clients
 * safely. In order to avoid increasing CPU usage by pausing all threads when
 * there are too many io threads, we pause io threads in multiple batches. */
static int start = 1, end = 0;
if (server.io_threads_num >= 1 && listLength(server.clients) > 0) {
    end = start + CLIENTS_CRON_PAUSE_IOTHREAD - 1;
    if (end >= server.io_threads_num) end = server.io_threads_num - 1;
    pauseIOThreadsRange(start, end);
    pause = ustime();
}

while(listLength(server.clients) && iterations--) {
    client *c;
    listNode *head;

    /* Take the current head, process, and then rotate the head to tail.
     * This way we can fairly iterate all clients step by step. */
    head = listFirst(server.clients);
    c = listNodeValue(head);
    listRotateHeadToTail(server.clients);

    if (c->running_tid != IOTHREAD_MAIN_THREAD_ID &&
        !(c->running_tid >= start && c->running_tid <= end))
    {
        /* Skip clients that are being processed by the IO threads that
         * are not paused. */
        continue;
    }

    /* The following functions do different service checks on the client.
     * The protocol is that they return non-zero if the client was
     * terminated. */
    if (clientsCronHandleTimeout(c,now)) continue; // check the interaction(read and write) time of client
    if (clientsCronResizeQueryBuffer(c)) continue; // Resize the query buffer
    if (clientsCronFreeArgvIfIdle(c)) continue; // free max argv
    if (clientsCronResizeOutputBuffer(c,now)) continue; // Resize the output buffer

    if (clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot)) continue;

    /* Iterating all the clients in getMemoryOverheadData() is too slow and
     * in turn would make the INFO command too slow. So we perform this
     * computation incrementally and track the (not instantaneous but updated
     * to the second) total memory used by clients using clientsCron() in
     * a more incremental way (depending on server.hz).
     * If client eviction is enabled, update the bucket as well. */
    if (!updateClientMemUsageAndBucket(c))
        updateClientMemoryUsage(c);

    if (closeClientOnOutputBufferLimitReached(c, 0)) continue;
}

/* Resume the IO threads that were paused */
if (end) {
    resume = ustime();
    resumeIOThreadsRange(start, end);
    start = end + 1;
    if (start >= server.io_threads_num) start = 1;
    end = 0;
}
endt = ustime();
static int max1 = 0;
static int max2 = 0;
static int max3 = 0;
static int total1 = 0;
static int total2 = 0;
static int total3 = 0;
static int cnt = 0;
int firstPhase = pause - startt;
int secondPhase = resume - pause;
int lastPhase = endt - resume;
if (resume != 0) {
    cnt++;
    total1 += firstPhase;
    total2 += secondPhase;
    total3 += lastPhase;
    if (cnt % 100 == 0) {
        printf("average time is %d %d %d\n", (total1 / cnt), (total2 / cnt), (total3 / cnt));
        printf("max time is %d %d %d\n", max1, max2, max3);
    }
    if (firstPhase > max1) {
        max1 = firstPhase;
    }
    if (secondPhase > max2) {
        max2 = secondPhase;
    }
    if (lastPhase > max3) {
        max3 = lastPhase;
    }
}

}

Comment From: wclmxxs

@sundb @ShooterIT I think the real processing does not take much time, while the pause time fluctuates greatly. Would you consider avoiding the pause on every clientsCron run?

Comment From: ShooterIT

@wclmxxs thank you, could you please try to skip clientsCron (maybe just return at the beginning) and benchmark? So we can check the performance diff.
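
For clarity, the suggested experiment is just short-circuiting the function, something like:

```
void clientsCron(void) {
    return; /* temporarily skip all client cron work for the benchmark */
    /* ... original body left in place but unreachable while measuring ... */
}
```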

Comment From: wclmxxs

@wclmxxs thank you, could you please try to skip clientsCron (maybe just return at the beginning) and benchmark? So we can check the performance diff.

of course:

just return at the beginning:

====== SET ======
  10000000 requests completed in 6.50 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes

Summary:
  throughput summary: 1537515.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        6.433     1.296     5.247    10.327    11.375    19.279

====== GET ======
  10000000 requests completed in 5.27 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1897173.25 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        4.707     0.584     4.055     7.839     9.423    14.063

no return:

====== SET ======
  10000000 requests completed in 8.03 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1245950.62 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        7.827     1.272     6.535    12.983    15.599    23.711

====== GET ======
  10000000 requests completed in 6.02 seconds
  1000 parallel clients
  3 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 4

Summary:
  throughput summary: 1660853.75 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.684     0.936     4.935     9.607    11.735    17.247

Comment From: sundb

@wclmxxs I saw that the benchmark in https://github.com/redis/redis/issues/13885#issuecomment-2756951700 completed in less than 1 second. Can you compare them using the same environment?

Comment From: wclmxxs

@wclmxxs I saw that the benchmark in #13885 (comment) completed in less than 1 second. Can you compare them using the same environment?

The number of requests varies. One is 1000000 and the other is 10000000.

Comment From: wclmxxs

@sundb @ShooterIT If a large amount of data is requested at a time, the impact is greater.

The IO thread may spend a long time reading large payloads, which blocks the processing of other, smaller requests.

src/redis-benchmark -c 1000 -n 100000 -t set,get -P 10 --threads 10 -d 51200

just return at the beginning:

====== SET ======
  100000 requests completed in 0.78 seconds
  1000 parallel clients
  51200 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 10

Summary:
  throughput summary: 128534.70 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        7.164     1.248     6.271    14.111    23.135    30.911

====== GET ======
  100000 requests completed in 1.00 seconds
  1000 parallel clients
  51200 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 10

Summary:
  throughput summary: 99800.40 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       57.555     1.640    63.423    87.487   122.303   127.871

**not return**

====== SET ======
  100000 requests completed in 0.78 seconds
  1000 parallel clients
  51200 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 10

Summary:
  throughput summary: 127551.02 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        7.864     1.072     6.039    23.327    33.727    39.679

====== GET ======
  100000 requests completed in 1.25 seconds
  1000 parallel clients
  51200 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 10

Summary:
  throughput summary: 79872.20 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       76.839     1.616    84.287   123.199   141.823   145.151

Comment From: wclmxxs

@sundb @ShooterIT

Excuse me. What do you think of this phenomenon?

Comment From: sundb

@wclmxxs thanks, dealing with these overheads should be complicated. As you mentioned in https://github.com/redis/redis/issues/13885#issuecomment-2754011421, I don't know whether it will really work; maybe we reduce the overhead here but increase it elsewhere. Maybe you can implement a POC to see if it works.

Comment From: wclmxxs

@wclmxxs thanks, dealing with these overheads should be complicated. As you mentioned in #13885 (comment), I don't know whether it will really work; maybe we reduce the overhead here but increase it elsewhere. Maybe you can implement a POC to see if it works.

Thank you, I'll try to do some experiments later, and if it works well I'll get back to you.

Comment From: wclmxxs

@sundb @ShooterIT I have modified a version and run it with the previous benchmark; the performance is improved. Which benchmark test cases do you think I need to use for verification?

Comment From: sundb

@wclmxxs please note that clientsCronTrackExpansiveClients(), updateClientMemoryUsage(), and closeClientOnOutputBufferLimitReached() have data races; how do we avoid multiple IO threads touching them?

while (listLength(mainThreadProcessingClients[t->id])) {
    .......................

    /* Update the client in the mem usage */
    int ret = updateClientMemUsageAndBucket(c);
    if (c->flags & CLIENT_BUFFER_CHANGE) {
        int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
        clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot);
        c->flags &= ~CLIENT_BUFFER_CHANGE;
        if (!ret) {
            updateClientMemoryUsage(c);
        }
        if (closeClientOnOutputBufferLimitReached(c, 0)) {
            continue;
        }
    }

    ...............................

Comment From: wclmxxs

@wclmxxs please note that clientsCronTrackExpansiveClients(), updateClientMemoryUsage(), and closeClientOnOutputBufferLimitReached() have data races; how do we avoid multiple IO threads touching them?

```
while (listLength(mainThreadProcessingClients[t->id])) {
    .......................

/* Update the client in the mem usage */
int ret = updateClientMemUsageAndBucket(c);
if (c->flags & CLIENT_BUFFER_CHANGE) {
    int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot);
    c->flags &= ~CLIENT_BUFFER_CHANGE;
    if (!ret) {
        updateClientMemoryUsage(c);
    }
    if (closeClientOnOutputBufferLimitReached(c, 0)) {
        continue;
    }
}

...............................

```

They are only called in processClientsFromIOThread, and my understanding is that this function is called only by the main thread?
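
If that assumption holds, it might be worth making it explicit so regressions are caught early; a minimal sketch (the main_thread_id field and the helper name are illustrative, assuming the main thread records its pthread id at startup):

```
#include <assert.h>
#include <pthread.h>

/* Illustrative only: record the main thread's id once during startup ... */
static pthread_t main_thread_id; /* set to pthread_self() in initServer() */

/* ... then assert in the race-prone helpers that they run on the main thread. */
static inline void assertInMainThread(void) {
    assert(pthread_equal(pthread_self(), main_thread_id));
}
```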

Comment From: sundb

@wclmxxs yeah, you're right, I feel like the approach you're taking might work. Why did you delete the comment?

Comment From: wclmxxs

@wclmxxs yeah, you're right, I feel like the approach you're taking might work. Why did you delete the comment?

I'll re-send it

Here is the new test result.

src/redis-benchmark -c 5000 -n 1000000 -t set,get -P 10 --threads 100 -d 512

before:

====== SET ======
  1000000 requests completed in 1.57 seconds
  5000 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 100

Summary:
  throughput summary: 638977.62 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       50.847     0.408    41.855    83.967   320.767   409.087

====== GET ======
  1000000 requests completed in 1.02 seconds
  5000 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 100

Summary:
  throughput summary: 979431.88 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       41.543     0.408    42.655    67.839   122.175   215.935

after:

====== SET ======
  1000000 requests completed in 1.30 seconds
  5000 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 100

Summary:
  throughput summary: 768049.12 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       46.552     0.536    44.383    69.823   238.463   358.655

====== GET ======
  1000000 requests completed in 0.77 seconds
  5000 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 100

Summary:
  throughput summary: 1302083.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       31.973     0.328    32.431    47.551    71.295   149.759

The general modification is as follows:

add a checkClients function:

void handleClientsFromMainThread(struct aeEventLoop *ae, int fd, void *ptr, int mask) {
    ........
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        ..................
        if (now - c->resize_buffer_time > 300) {
            c->resize_buffer_time = now;
            if (clientsResizeBuffer(c, now)) {
                c->flags |= CLIENT_BUFFER_CHANGE;
            }
        }
        ...........
    }
    ........
    if (listLength(t->clients) != 0 && t->check_time - now < 100) {
        t->check_time = now;
        checkClients(t, now);
    }
}

void checkClients(IOThread *t, mstime_t now) {
    listIter li;
    listRewind(t->clients, &li);
    listNode *ln = listNext(&li);
    while(ln != NULL) {
        client *c = listNodeValue(ln);
        ln = listNext(&li);
        if (server.maxidletime && (now / 1000) - c->lastinteraction > server.maxidletime) {
            if (checkClientsHandleTimeout(c, now)) continue;
        } else if (now - c->resize_buffer_time > 200) {
            c->resize_buffer_time = now;
            if (!clientsResizeBuffer(c, now) && !(c->flags & CLIENT_BUFFER_CHANGE)) continue;
            c->flags |= CLIENT_BUFFER_CHANGE;
            enqueuePendingClientsToMainThread(c, 0);
        } else {
            break;
        }
    }
}

**change processClientsFromIOThread**

void processClientsFromIOThread(IOThread *t) {
    listNode *node = NULL;

while (listLength(mainThreadProcessingClients[t->id])) { .......................

/* Update the client in the mem usage */
int ret = updateClientMemUsageAndBucket(c);
if (c->flags & CLIENT_BUFFER_CHANGE) {
    int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot);
    c->flags &= ~CLIENT_BUFFER_CHANGE;
    if (!ret) {
        updateClientMemoryUsage(c);
    }
    if (closeClientOnOutputBufferLimitReached(c, 0)) {
        continue;
    }
}

...............................

}

and **clientsCron**:

void clientsCron(void) {
    /* Try to process at least numclients/server.hz of clients
     * per call. Since normally (if there are no big latency events) this
     * function is called server.hz times per second, in the average case we
     * process all the clients in 1 second. */
    if (server.io_threads_num > 1) {
        return;
    }
    time_t now = mstime() / 1000;
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;

    /* Process at least a few clients while we are at it, even if we need
     * to process less than CLIENTS_CRON_MIN_ITERATIONS to meet our contract
     * of processing each client once per second. */
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    /* Always zero the next sample, so that when we switch to that second, we'll
     * only register samples that are greater in that second without considering
     * the history of such slot.
     *
     * Note: our index may jump to any random position if serverCron() is not
     * called for some reason with the normal frequency, for instance because
     * some slow command is called taking multiple seconds to execute. In that
     * case our array may end containing data which is potentially older
     * than CLIENTS_PEAK_MEM_USAGE_SLOTS seconds: however this is not a problem
     * since here we want just to track if "recently" there were very expansive
     * clients from the POV of memory usage. */
    int zeroidx = (curr_peak_mem_usage_slot+1) % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    ClientsPeakMemInput[zeroidx] = 0;
    ClientsPeakMemOutput[zeroidx] = 0;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;

/* Take the current head, process, and then rotate the head to tail.
 * This way we can fairly iterate all clients step by step. */
head = listFirst(server.clients);
c = listNodeValue(head);
listRotateHeadToTail(server.clients);

/* The following functions do different service checks on the client.
 * The protocol is that they return non-zero if the client was
 * terminated. */
if (clientsCronHandleTimeout(c,now)) continue;
if (!clientsResizeBuffer(c, now)) continue;

if (clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot)) continue;

/* Iterating all the clients in getMemoryOverheadData() is too slow and
 * in turn would make the INFO command too slow. So we perform this
 * computation incrementally and track the (not instantaneous but updated
 * to the second) total memory used by clients using clientsCron() in
 * a more incremental way (depending on server.hz).
 * If client eviction is enabled, update the bucket as well. */
if (!updateClientMemUsageAndBucket(c))
    updateClientMemoryUsage(c);

if (closeClientOnOutputBufferLimitReached(c, 0)) continue;

}

Comment From: sundb

@sundb @ShooterIT I have modified a version and run it with the previous benchmark; the performance is improved. Which benchmark test cases do you think I need to use for verification?

Sorry for the late reply, I usually test with the following:

start server

./src/redis-server --save "" --io-threads 10

benchmark

memtier_benchmark --data-size 521 --ratio 1:1 --key-pattern R:R --key-minimum=1 --key-maximum 3000000 --test-time 60 -c 50 -t 13 --hide-histogram -x 3

and use SANITIZER to check the race condition

make SANITIZER=thread 
./runtest --config io-threads 4 --config io-threads-do-reads yes --accurate --verbose --tags network --dump-logs

Comment From: wclmxxs

@sundb @ShooterIT I have modified a version and run it with the previous benchmark; the performance is improved. Which benchmark test cases do you think I need to use for verification?

Sorry for the late reply, I usually test with the following:

start server

./src/redis-server --save "" --io-threads 10

benchmark

memtier_benchmark --data-size 521 --ratio 1:1 --key-pattern R:R --key-minimum=1 --key-maximum 3000000 --test-time 60 -c 50 -t 13 --hide-histogram -x 3

and use SANITIZER to check the race condition

make SANITIZER=thread
./runtest --config io-threads 4 --config io-threads-do-reads yes --accurate --verbose --tags network --dump-logs

Sorry, it took me some time to compile memtier_benchmark; here are the results.

before:

Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 60 secs] 0 threads: 44245301 ops, 664999 (avg: 737329) ops/sec, 194.38MB/sec (avg: 215.77MB/sec), 0.98 (avg: 0.88) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 60 secs] 0 threads: 40952421 ops, 405461 (avg: 682453) ops/sec, 118.73MB/sec (avg: 199.96MB/sec), 1.60 (avg: 0.95) msec latency

[RUN #3] Preparing benchmark client...
[RUN #3] Launching threads now...
[RUN #3 100%, 60 secs] 0 threads: 42043932 ops, 681683 (avg: 700682) ops/sec, 199.41MB/sec (avg: 205.30MB/sec), 0.95 (avg: 0.93) msec latency

13 Threads
50 Connections per thread
60 Seconds

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       368658.70          ---          ---         0.88094         0.83100         1.80700         4.89500    204717.34
Gets       368653.32      3819.58    364833.74         0.88061         0.83100         1.80700         4.86300     16221.76
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     737312.02      3819.58    364833.74         0.88077         0.83100         1.80700         4.86300    220939.10

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       341237.72          ---          ---         0.95195         0.83900         3.90300         8.25500    189490.89
Gets       341232.20      4034.68    337197.52         0.95096         0.83900         3.90300         8.25500     15270.29
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     682469.92      4034.68    337197.52         0.95145         0.83900         3.90300         8.25500    204761.18

AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       353413.73          ---          ---         0.91922         0.83100         3.10300         7.77500    196252.01
Gets       353408.39      4000.35    349408.04         0.91826         0.83100         3.10300         7.74300     15724.11
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     706822.12      4000.35    349408.04         0.91874         0.83100         3.10300         7.74300    211976.12

after:

Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 60 secs] 0 threads: 48278828 ops, 854994 (avg: 804585) ops/sec, 250.70MB/sec (avg: 234.71MB/sec), 0.76 (avg: 0.81) msec latency

[RUN #2] Preparing benchmark client...
[RUN #2] Launching threads now...
[RUN #2 100%, 60 secs] 0 threads: 47416286 ops, 774536 (avg: 790184) ops/sec, 227.60MB/sec (avg: 231.66MB/sec), 0.84 (avg: 0.82) msec latency

[RUN #3] Preparing benchmark client...
[RUN #3] Launching threads now...
[RUN #3 100%, 60 secs] 0 threads: 46825194 ops, 791277 (avg: 780329) ops/sec, 231.82MB/sec (avg: 228.77MB/sec), 0.82 (avg: 0.83) msec latency

13 Threads
50 Connections per thread
60 Seconds

BEST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       402297.11          ---          ---         0.80747         0.75900         1.66300         3.87100    223397.11
Gets       402291.64      2693.70    399597.94         0.80691         0.75100         1.66300         3.90300     16946.96
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     804588.75      2693.70    399597.94         0.80719         0.75100         1.66300         3.88700    240344.07

WORST RUN RESULTS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       390181.56          ---          ---         0.83242         0.78300         1.68700         3.53500    216669.34
Gets       390176.53      4881.80    385294.73         0.83221         0.78300         1.68700         3.53500     17598.00
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     780358.10      4881.80    385294.73         0.83232         0.78300         1.68700         3.53500    234267.34

AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       395861.38          ---          ---         0.82071         0.77500         1.67100         3.67900    219823.36
Gets       395856.09      4173.32    391682.77         0.81999         0.77500         1.67100         3.69500     17455.14
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     791717.47      4173.32    391682.77         0.82035         0.77500         1.67100         3.69500    237278.50

Comment From: wclmxxs

@sundb And here is the SANITIZER result:

               The End

Execution time of different units: 0 seconds - unit/bitops 0 seconds - unit/bitfield 0 seconds - unit/auth 0 seconds - unit/acl-v2 0 seconds - unit/aofrw 0 seconds - unit/acl 0 seconds - unit/dump 0 seconds - unit/client-eviction 0 seconds - unit/expire 0 seconds - unit/memefficiency 0 seconds - unit/info-command 0 seconds - unit/geo 0 seconds - unit/hyperloglog 0 seconds - unit/functions 0 seconds - unit/introspection-2 0 seconds - unit/info-keysizes 0 seconds - unit/info 0 seconds - unit/keyspace 0 seconds - unit/latency-monitor 0 seconds - unit/introspection 0 seconds - unit/lazyfree 0 seconds - unit/maxmemory 0 seconds - unit/multi 0 seconds - unit/networking 0 seconds - unit/printver 0 seconds - unit/other 0 seconds - unit/obuf-limits 0 seconds - unit/pubsubshard 0 seconds - unit/oom-score-adj 0 seconds - unit/replybufsize 0 seconds - unit/quit 0 seconds - unit/querybuf 0 seconds - unit/tls 0 seconds - unit/slowlog 0 seconds - unit/shutdown 0 seconds - unit/sort 0 seconds - unit/scripting 0 seconds - unit/violations 0 seconds - unit/type/hash 0 seconds - unit/type/incr 0 seconds - unit/type/hash-field-expire 0 seconds - unit/type/list-3 0 seconds - unit/type/list-2 0 seconds - unit/cluster/cluster-response-tls 0 seconds - unit/type/set 0 seconds - unit/type/list 0 seconds - unit/type/stream-cgroups 0 seconds - unit/type/stream 0 seconds - unit/type/string 0 seconds - unit/cluster/announced-endpoints 0 seconds - unit/type/zset 0 seconds - unit/cluster/cli 0 seconds - unit/cluster/failure-marking 0 seconds - unit/cluster/human-announced-nodename 0 seconds - unit/cluster/hostnames 0 seconds - unit/cluster/internal-secret 0 seconds - unit/cluster/links 0 seconds - unit/cluster/misc 0 seconds - unit/cluster/multi-slot-operations 0 seconds - unit/cluster/scripting 0 seconds - unit/cluster/sharded-pubsub 0 seconds - unit/cluster/slot-ownership 0 seconds - integration/aof-race 0 seconds - integration/aof 0 seconds - integration/aof-multi-part 0 seconds - integration/block-repl 0 seconds - integration/convert-ziplist-zset-on-load 0 seconds - integration/convert-ziplist-hash-on-load 0 seconds - integration/convert-zipmap-hash-on-load 0 seconds - integration/dismiss-mem 0 seconds - integration/corrupt-dump 0 seconds - integration/corrupt-dump-fuzzer 0 seconds - integration/logging 0 seconds - integration/failover 0 seconds - integration/psync2-master-restart 0 seconds - integration/psync2-reg 0 seconds - integration/psync2-pingoff 0 seconds - integration/rdb 0 seconds - integration/psync2 0 seconds - integration/replication-3 0 seconds - integration/replication-2 0 seconds - integration/redis-cli 0 seconds - integration/replication-psync 0 seconds - integration/replication-buffer 0 seconds - integration/replication-rdbchannel 0 seconds - integration/shutdown 0 seconds - unit/protocol 1 seconds - integration/redis-benchmark 1 seconds - unit/limits 1 seconds - unit/pause 1 seconds - unit/pubsub 2 seconds - unit/tracking 6 seconds - integration/replication-4 7 seconds - unit/scan 11 seconds - integration/replication 31 seconds - unit/wait 0 seconds - bitops-large-memory 0 seconds - defrag 0 seconds - violations 0 seconds - set-large-memory 0 seconds - list-large-memory

\o/ All tests passed without errors!

Cleanup: may take some time... OK

Comment From: wclmxxs

@sundb Hello, please help me check the test results, thank you. (My English is not good; please understand if anything sounds impolite.)

Comment From: sundb

@wclmxxs there are some false positives in the SANITIZER thread report, you'd better run make distclean before make SANITIZER=thread

Comment From: wclmxxs

@wclmxxs there are some false positives in the SANITIZER thread report, you'd better run make distclean before make SANITIZER=thread

I will run some tests. If everything looks good, can I submit a PR?

Comment From: sundb

@wclmxxs yes, feel free to create a POC PR.

Comment From: wclmxxs

yes, feel free to create a POC PR.

Alright, thanks!

Comment From: wclmxxs

@wclmxxs yes, feel free to create a POC PR.

@sundb How can I confirm that SANITIZER ran successfully?

The End

Execution time of different units: 0 seconds - unit/auth 0 seconds - unit/aofrw 0 seconds - unit/client-eviction 0 seconds - unit/bitops 0 seconds - unit/bitfield 0 seconds - unit/dump 0 seconds - unit/info-command 0 seconds - unit/hyperloglog 0 seconds - unit/geo 0 seconds - unit/expire 0 seconds - unit/introspection-2 0 seconds - unit/info 0 seconds - unit/lazyfree 0 seconds - unit/latency-monitor 0 seconds - unit/keyspace 0 seconds - unit/info-keysizes 0 seconds - unit/functions 0 seconds - unit/memefficiency 0 seconds - unit/obuf-limits 0 seconds - unit/printver 0 seconds - unit/multi 0 seconds - unit/networking 0 seconds - unit/introspection 0 seconds - unit/pubsubshard 0 seconds - unit/other 0 seconds - unit/querybuf 0 seconds - unit/replybufsize 0 seconds - unit/quit 0 seconds - unit/maxmemory 0 seconds - unit/slowlog 0 seconds - unit/tls 0 seconds - unit/shutdown 0 seconds - unit/sort 0 seconds - unit/violations 0 seconds - unit/type/hash 0 seconds - unit/type/hash-field-expire 0 seconds - unit/type/incr 0 seconds - unit/type/list-2 0 seconds - unit/oom-score-adj 0 seconds - unit/type/list-3 0 seconds - unit/acl-v2 0 seconds - unit/type/set 0 seconds - unit/type/string 0 seconds - unit/type/stream-cgroups 0 seconds - unit/cluster/announced-endpoints 0 seconds - unit/type/list 0 seconds - unit/cluster/failure-marking 0 seconds - unit/cluster/hostnames 0 seconds - unit/cluster/cli 0 seconds - unit/type/zset 0 seconds - unit/cluster/human-announced-nodename 0 seconds - unit/scripting 0 seconds - unit/cluster/misc 0 seconds - unit/cluster/internal-secret 0 seconds - unit/cluster/cluster-response-tls 0 seconds - unit/cluster/links 0 seconds - unit/cluster/scripting 0 seconds - unit/cluster/multi-slot-operations 0 seconds - unit/cluster/slot-ownership 0 seconds - unit/cluster/sharded-pubsub 0 seconds - unit/type/stream 0 seconds - integration/convert-ziplist-hash-on-load 0 seconds - integration/aof-race 0 seconds - integration/block-repl 0 seconds - integration/convert-ziplist-zset-on-load 0 seconds - integration/convert-zipmap-hash-on-load 0 seconds - integration/dismiss-mem 0 seconds - integration/aof 0 seconds - integration/logging 0 seconds - integration/corrupt-dump 0 seconds - integration/aof-multi-part 0 seconds - integration/corrupt-dump-fuzzer 0 seconds - integration/failover 0 seconds - integration/psync2-master-restart 0 seconds - integration/psync2 0 seconds - integration/redis-cli 0 seconds - integration/rdb 0 seconds - integration/psync2-reg 0 seconds - integration/replication-2 0 seconds - integration/replication-3 0 seconds - integration/psync2-pingoff 0 seconds - integration/replication-psync 0 seconds - integration/replication-buffer 0 seconds - integration/replication-rdbchannel 0 seconds - integration/shutdown 0 seconds - unit/acl 4 seconds - unit/protocol 4 seconds - unit/limits 4 seconds - unit/pubsub 7 seconds - unit/tracking 10 seconds - integration/redis-benchmark 10 seconds - unit/scan 11 seconds - unit/pause 14 seconds - integration/replication-4 19 seconds - integration/replication 64 seconds - unit/wait 0 seconds - bitops-large-memory 0 seconds - defrag 0 seconds - violations 0 seconds - set-large-memory 0 seconds - list-large-memory

\o/ All tests passed without errors!

Cleanup: may take some time... OK

Comment From: wclmxxs

@sundb @ShooterIT

Could you please review my modifications first? I will supplement the comments and consider whether this approach has any issues. https://github.com/redis/redis/pull/13900

Comment From: wclmxxs

@sundb @ShooterIT I have locally passed all test cases and added code comments. Would you please help review the changes? Thanks in advance. https://github.com/redis/redis/pull/13900

Comment From: wclmxxs

@sundb Hello, please help me check the test results, thank you.

@sundb @ShooterIT I have locally passed all test cases and added code comments. Would you please help review the changes? Thanks in advance. #13900

@sundb @ShooterIT

Comment From: sundb

@wclmxxs it's in my list, thanks.

Comment From: egarevoc999

@wclmxxs it's in my list, thanks.

Okay, thanks.

Comment From: ShooterIT

I did a test with 7 IO threads (since my machine's CPU cores are limited); the benchmark command is as below

memtier_benchmark --data-size 521 --ratio 1:1 --key-pattern R:R --key-minimum=1 --key-maximum 3000000 --test-time 60 -c 50 -t 8 --hide-histogram -x 3

unstable

AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       470569.60          ---          ---         0.42555         0.44700         0.64700         1.43100    261309.67
Gets       470566.10      9295.05    461271.05         0.42480         0.44700         0.64700         1.43100     22966.87
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     941135.70      9295.05    461271.05         0.42517         0.44700         0.64700         1.43100    284276.54

skip clientsCron in serverCron

AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       473220.11          ---          ---         0.42315         0.43900         0.59900         1.12700    262781.45
Gets       473216.73      9279.47    463937.26         0.42244         0.43900         0.59900         1.11900     23061.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     946436.84      9279.47    463937.26         0.42280         0.43900         0.59900         1.11900    285842.92

I did not find a big difference, and from the flame graph, I also didn't find that clientsCron costs much CPU.

And in https://github.com/redis/redis/pull/13665, I worried that pausing/resuming IO threads might be a bottleneck, so I did some tests

Testing has shown that pauseIOThread is highly efficient, allowing the main thread to execute nearly 200,000 operations per second during stress tests. Similarly, pauseAllIOThreads with 8 IO threads can handle up to nearly 56,000 operations per second. But operations performed between pausing and resuming IO threads must be quick; otherwise, they could cause the IO threads to reach full CPU utilization.

I don't know why your test shows such a difference.

Comment From: egarevoc999

I did a test with 7 IO threads (since my machine's CPU cores are limited); the benchmark command is as below

memtier_benchmark --data-size 521 --ratio 1:1 --key-pattern R:R --key-minimum=1 --key-maximum 3000000 --test-time 60 -c 50 -t 8 --hide-histogram -x 3

unstable

```
AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       470569.60          ---          ---         0.42555         0.44700         0.64700         1.43100    261309.67
Gets       470566.10      9295.05    461271.05         0.42480         0.44700         0.64700         1.43100     22966.87
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     941135.70      9295.05    461271.05         0.42517         0.44700         0.64700         1.43100    284276.54
```

skip clientsCron in serverCron

```
AGGREGATED AVERAGE RESULTS (3 runs)
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       473220.11          ---          ---         0.42315         0.43900         0.59900         1.12700    262781.45
Gets       473216.73      9279.47    463937.26         0.42244         0.43900         0.59900         1.11900     23061.47
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     946436.84      9279.47    463937.26         0.42280         0.43900         0.59900         1.11900    285842.92
```

I did not find a big difference, and from the flame graph, I also didn't find that clientsCron costs much CPU.

And in #13665, I worried that pausing/resuming IO threads might be a bottleneck, so I did some tests

Testing has shown that pauseIOThread is highly efficient, allowing the main thread to execute nearly 200,000 operations per second during stress tests. Similarly, pauseAllIOThreads with 8 IO threads can handle up to nearly 56,000 operations per second. But operations performed between pausing and resuming IO threads must be quick; otherwise, they could cause the IO threads to reach full CPU utilization.

I don't know why your test shows such a difference.

@ShooterIT Is it my environment? My results don't seem much different now either. But you can try changing the number of clients from 50 to 500: memtier_benchmark --data-size 521 --ratio 1:1 --key-pattern R:R --key-minimum=1 --key-maximum 3000000 --test-time 60 -c 500 -t 13 --hide-histogram -x 3 --port 6378

or set the hz config to 100 and the data-size to 52100

Comment From: wclmxxs

@sundb @ShooterIT I simplified the modification by moving the processing from clientsCron into processClientsFromIOThread and gating it with a per-client check_time. Could you take a look and see whether this is appropriate? Thanks.

The performance result is the same as that in the previous test.

bool cronHandleClients(client *c, int curr_peak_mem_usage_slot) {
    mstime_t now = mstime();
    if (clientsCronHandleTimeout(c,now)) return false;
    if (clientsCronResizeQueryBuffer(c)) return true;
    if (clientsCronFreeArgvIfIdle(c)) return true;
    if (clientsCronResizeOutputBuffer(c,now)) return true;

    if (clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot)) return true;

    if (!updateClientMemUsageAndBucket(c))
        updateClientMemoryUsage(c);

    if (closeClientOnOutputBufferLimitReached(c, 0)) return false;
    return true;
}
void clientsCron(void) {
    int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    int zeroidx = (curr_peak_mem_usage_slot+1) % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    ClientsPeakMemInput[zeroidx] = 0;
    ClientsPeakMemOutput[zeroidx] = 0;
    /* Only handle scenarios without iothread.  */
    if (server.io_threads_num > 1) {
        return;
    }
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;

    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;
        head = listFirst(server.clients);
        c = listNodeValue(head);
        listRotateHeadToTail(server.clients);

        cronHandleClients(c, curr_peak_mem_usage_slot);
    }
}
void processClientsFromIOThread(IOThread *t) {
    listNode *node = NULL;

    mstime_t now = mstime();
    int curr_peak_mem_usage_slot = (now / 1000) % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    while (listLength(mainThreadProcessingClients[t->id])) {
....................
        if (c->check_time + 200 < now) {
            c->check_time = now;
            if (!cronHandleClients(c, curr_peak_mem_usage_slot)) {
                continue;
            }
        }
...............
    }
.......
}
void handleClientsFromMainThread(struct aeEventLoop *ae, int fd, void *ptr, int mask) {
    UNUSED(ae);
    UNUSED(mask);
    .........................

    if (listLength(t->clients) != 0) 
    {
        checkClients(t);
    }
}
/* Check whether threads in the iothread need to be checked by the main thread. */
void checkClients(IOThread *t) {
    mstime_t now = mstime();
    listIter li;
    listRewind(t->clients, &li);
    listNode *ln = listNext(&li);
    while(ln != NULL) {
        client *c = listNodeValue(ln);
        ln = listNext(&li);
        /* Check for idle timeout first */
        if (c->check_time + 200 < now) {
            enqueuePendingClientsToMainThread(c, 0);
        } else {
            /* Optimization: early exit */
            break;
        }
    }
}

Comment From: wclmxxs

@sundb @ShooterIT I simplified the modification by moving the processing from clientsCron into processClientsFromIOThread and gating it with a per-client check_time. Could you take a look and see whether this is appropriate? Thanks.

The performance result is the same as that in the previous test.

```
bool cronHandleClients(client *c, int curr_peak_mem_usage_slot) {
    mstime_t now = mstime();
    if (clientsCronHandleTimeout(c,now)) return false;
    if (clientsCronResizeQueryBuffer(c)) return true;
    if (clientsCronFreeArgvIfIdle(c)) return true;
    if (clientsCronResizeOutputBuffer(c,now)) return true;

    if (clientsCronTrackExpansiveClients(c, curr_peak_mem_usage_slot)) return true;

    if (!updateClientMemUsageAndBucket(c))
        updateClientMemoryUsage(c);

    if (closeClientOnOutputBufferLimitReached(c, 0)) return false;
    return true;
}
```

```
void clientsCron(void) {
    int curr_peak_mem_usage_slot = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    int zeroidx = (curr_peak_mem_usage_slot+1) % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    ClientsPeakMemInput[zeroidx] = 0;
    ClientsPeakMemOutput[zeroidx] = 0;
    /* Only handle scenarios without iothread.  */
    if (server.io_threads_num > 1) {
        return;
    }
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;

    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;
        head = listFirst(server.clients);
        c = listNodeValue(head);
        listRotateHeadToTail(server.clients);

        cronHandleClients(c, curr_peak_mem_usage_slot);
    }
}
```

```
void processClientsFromIOThread(IOThread *t) {
    listNode *node = NULL;

    mstime_t now = mstime();
    int curr_peak_mem_usage_slot = (now / 1000) % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    while (listLength(mainThreadProcessingClients[t->id])) {
....................
        if (c->check_time + 200 < now) {
            c->check_time = now;
            if (!cronHandleClients(c, curr_peak_mem_usage_slot)) {
                continue;
            }
        }
...............
    }
.......
}
```

```
void handleClientsFromMainThread(struct aeEventLoop *ae, int fd, void *ptr, int mask) {
    UNUSED(ae);
    UNUSED(mask);
    .........................

    if (listLength(t->clients) != 0)
    {
        checkClients(t);
    }
}
```

```
/* Check whether threads in the iothread need to be checked by the main thread. */
void checkClients(IOThread *t) {
    mstime_t now = mstime();
    listIter li;
    listRewind(t->clients, &li);
    listNode *ln = listNext(&li);
    while(ln != NULL) {
        client *c = listNodeValue(ln);
        ln = listNext(&li);
        /* Check for idle timeout first */
        if (c->check_time + 200 < now) {
            enqueuePendingClientsToMainThread(c, 0);
        } else {
            /* Optimization: early exit */
            break;
        }
    }
}
```

@sundb Sorry to interrupt, can you take a look? Thank you very much.