Hey!
I'm making one issue due to fact, that I was using exactly the same test scenario, but with different configuration. I believe, that this will make it easier to understand what is going on.
I've already spoken about this with @marcingrzejszczak as a part of his free consultations, thank you once again @marcingrzejszczak for this opportunity!
Bug in visual form
Configuration
- http.client.requests:
- web app
- server with virtual threads
- RestClient with ReactorClientHttpRequestFactory
- http.server.requests:
- WebFlux app
- Undertow server
- WebClient with ReactorClientHttpConnector
Version:
- Spring Boot 3.5.4
How to reproduce
Everything is here, please check readme file https://github.com/mateusz-nalepa/spring-http-client-server-seconds-bug-MCVE
Potential places how to fix it
- http.client.requests
DefaultWebClient.DefaultRequestBodyUriSpec#exchange
- do not start observation for
http.client
here, instead starthttp.client.pending
- make sure
http.client.pending
metric/observation starts on server thread
- do not start observation for
ExchangeFunctions.DefaultExchangeFunction#exchange
- end observation for
http.client.pending
- start observation for
http.client
- make sure, both of this happens on client thread
- end observation for
- http.server.requests
TomcatVirtualThreadsWebServerFactoryCustomizer
- instead of
new VirtualThreadExecutor("tomcat-handler-")
- do something like
new MonitoredExecutor(
new VirtualThreadExecutor("tomcat-handler-")
)
class MonitoredExecutor internal constructor(private val delegate: Executor) : Executor {
override fun execute(runnable: Runnable) {
delegate.execute(MonitoredRunnable(runnable))
}
}
class MonitoredRunnable internal constructor(private val delegate: Runnable) : Runnable {
private val startTime = System.nanoTime()
override fun run() {
val duration = Duration.ofNanos(System.nanoTime() - startTime)
VirtualLogger.log(this, "Http server pending request took: $duration")
delegate.run()
}
}
Very important note
I've provided only two configurations of server + client. But probably the same applies for other servers/clients.
Why is this important?
In classical old approach test scenario takes 19s as expected. In webflux/virtual in takes 27s and we are blind.