Expected Behavior
Is it currently possible to tweak the underlying server? In other words, I need to ensure that I can support 100 concurrent users of the MCP server. Will this work out of the box?
Current Behavior
It seems that the server starts at /mcp, but beyond configuring that base endpoint it doesn't look like anything else is configurable.
For example, adding a /healthcheck endpoint at the root, or tweaking some of the default settings to improve scalability with predictable latency.
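For the health check part, I assume a plain WebMVC controller registered next to the auto-configured /mcp endpoint would work, since the MCP server runs on the regular embedded Tomcat. A rough sketch of what I have in mind (the class name and the /healthcheck path are just my own, and spring-boot-starter-actuator's /actuator/health would be an alternative):

```java
import java.util.Map;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Illustrative only: a hand-rolled health endpoint served by the same
// embedded Tomcat instance that hosts the auto-configured /mcp endpoint.
@RestController
public class HealthCheckController {

    @GetMapping("/healthcheck")
    public ResponseEntity<Map<String, String>> healthcheck() {
        return ResponseEntity.ok(Map.of("status", "UP"));
    }
}
```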
Context
This stems from performance issues I ran into with Python, so I'm now exploring the latest advancements in Spring AI, and I'm wondering about some details that weren't immediately obvious to me when reading the docs.
As always, absolutely mega work and this is great!
Comment From: parsa735
Hi, I'm not sure, but you could think about using load balancers or proxy servers for scaling MCP servers, something like HAProxy or nginx.
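For example, something roughly like this nginx config (just a sketch: the ports and the /mcp path are examples, and since MCP sessions are stateful I'm assuming you'd want some kind of session affinity):

```nginx
# Rough sketch: two MCP server instances behind nginx.
upstream mcp_backend {
    ip_hash;                       # keep a given client on the same instance
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    listen 80;

    location /mcp {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;       # needed for streaming/SSE responses
        proxy_read_timeout 3600s;  # allow long-lived MCP connections
    }
}
```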
Comment From: bruno-oliveira
Hey @parsa735, thanks for the reply. My question was more at the code level itself, i.e. can we leverage the WebMVC server configuration to scale a "single node" setup? In Python, the FastMCP library used uvicorn, but I want to use something I'm more familiar with, so Spring is awesome. So I assume that this will take effect?
# Using spring-ai-starter-mcp-server-webmvc
server:
  port: 8000
  # Tomcat configuration optimized for Java 21 virtual threads
  # Virtual threads auto-scale, no need for explicit thread pool configuration
  tomcat:
    max-connections: 100          # Max simultaneous connections
    accept-count: 100             # Queue length when all connections busy
    connection-timeout: 60s       # Connection timeout for MCP protocol
    keep-alive-timeout: 60s
    max-keep-alive-requests: 100
With the above config, if I make simultaneous calls to the MCP server (via different clients at the same time, or a load test, for example), will this take effect and will the service respond accordingly?
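One assumption behind the virtual-thread comments in my config above: as far as I can tell, virtual threads are not enabled by default and the MCP starter doesn't switch them on for you, so I'd also set the Spring Boot 3.2+ property below (again, just my understanding, not something I found in the Spring AI docs):

```yaml
spring:
  threads:
    virtual:
      enabled: true   # let embedded Tomcat handle requests on Java 21 virtual threads
```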
CC @tzolov :)