**Bug description**

Unable to use the embedding functionality when integrating Spring AI with VLLM via the OpenAI API format. The issue appears to be related to HTTP protocol version compatibility.

**Environment**

- Spring AI: 1.0.0-M4
- JDK: 22
- VLLM: 0.6.6.post1

**Steps to reproduce**

1. Set up the VLLM server:

   ```shell
   vllm serve Alibaba-NLP/gte-Qwen2-1.5B-instruct --task embed --tokenizer Alibaba-NLP/gte-Qwen2-1.5B-instruct
   ```

2. Configure Spring AI to use the VLLM endpoint.
3. Attempt to use the embedding functionality:

   ```java
   @RestController
   @RequestMapping("/ai")
   public class EmbeddingController {
       private final EmbeddingModel embeddingModel;

       @Autowired
       public EmbeddingController(EmbeddingModel embeddingModel) {
           this.embeddingModel = embeddingModel;
       }

       @GetMapping("/embedding")
       public Map embed(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
           EmbeddingResponse embeddingResponse = this.embeddingModel.embedForResponse(List.of(message));
           return Map.of("embedding", embeddingResponse);
       }
   }
   ```
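Step 2 above (pointing Spring AI's OpenAI client at the VLLM server) is typically done through application properties. A minimal sketch, assuming VLLM's default port 8000 and the same model name; the property names come from Spring AI's OpenAI starter, so verify them against your milestone version:

```properties
# Point the OpenAI-compatible client at the local VLLM server (assumed port)
spring.ai.openai.base-url=http://localhost:8000
# VLLM accepts any token unless --api-key is set
spring.ai.openai.api-key=test
# Use the same model that was served above
spring.ai.openai.embedding.options.model=Alibaba-NLP/gte-Qwen2-1.5B-instruct
```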

**Expected behavior**

Return embedding results.

**Current behavior**

When sending requests to VLLM through Spring AI's OpenAI API implementation:

1. VLLM logs:

   ```
   WARNING: Unsupported upgrade request.
   INFO: - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
   ```

2. Spring AI throws:

   ```
   org.springframework.ai.retry.NonTransientAiException: 400 - {
       "object": "error",
       "message": "[{'type': 'missing', 'loc': ('body',), 'msg': 'Field required', 'input': None}]",
       "type": "BadRequestError",
       "param": null,
       "code": 400
   }
   ```

The request being sent contains:

```json
{
    "method": "POST",
    "headers": {
        "Connection": "Upgrade, HTTP2-Settings",
        "Host": "127.0.0.1:8080",
        "Http2-Settings": "AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA",
        "Transfer-Encoding": "chunked",
        "Upgrade": "h2c",
        "User-Agent": "Java-http-client/22.0.1",
        "Authorization": "Bearer test",
        "Content-Type": "application/json"
    }
}
```

**Current workaround**

I've implemented a temporary fix by forcing HTTP/1.1:

```java
import java.net.http.HttpClient;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;
import org.springframework.http.client.JdkClientHttpRequestFactory;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClient;

@Component
public class RestClientCustomizer {
    @Bean
    @Primary
    public RestClient.Builder customRestClientBuilder() {
        // Force HTTP/1.1 so the JDK client does not send the h2c upgrade headers
        HttpClient httpClient = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        return RestClient.builder()
                .requestFactory(new JdkClientHttpRequestFactory(httpClient));
    }
}
```

🤔 Perhaps a better solution would be for VLLM to support HTTP/2 (or at least handle the h2c upgrade request gracefully).

Comment From: dev-jonghoonpark

@icyclv Which vector databases are you experiencing this issue with? In my case, I am hitting it with Chroma.

Comment From: icyclv

@dev-jonghoonpark I am using Elasticsearch as the vector database, but the issue I'm encountering isn't occurring during the storage-to-database phase, but rather during the stage of embedding text into vectors.

Comment From: dev-jonghoonpark


ChromaVectorStoreAutoConfiguration.java

When a ChromaApi instance is created, RestClient.builder() is used as the fallback:

```java
var chromaApi = new ChromaApi(chromaUrl, restClientBuilderProvider.getIfAvailable(RestClient::builder), objectMapper);
```

RestClient.builder() returns a DefaultRestClientBuilder:

```java
static Builder builder() {
    return new DefaultRestClientBuilder();
}
```

DefaultRestClientBuilder.java

When the build() method of DefaultRestClientBuilder is invoked, it calls the initRequestFactory() method:

```java
private ClientHttpRequestFactory initRequestFactory() {
    if (this.requestFactory != null) {
        return this.requestFactory;
    } else if (httpComponentsClientPresent) {
        return new HttpComponentsClientHttpRequestFactory();
    } else if (jettyClientPresent) {
        return new JettyClientHttpRequestFactory();
    } else if (reactorNettyClientPresent) {
        return new ReactorClientHttpRequestFactory();
    } else {
        return (ClientHttpRequestFactory)(jdkClientPresent ? new JdkClientHttpRequestFactory() : new SimpleClientHttpRequestFactory());
    }
}
```

Finally, a JdkClientHttpRequestFactory instance is selected.
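The fallback order above suggests an alternative workaround: if Apache HttpComponents is on the classpath, HttpComponentsClientHttpRequestFactory is chosen ahead of the JDK client, and the Apache classic client speaks HTTP/1.1, so no h2c upgrade is sent. A sketch of the Maven dependency (let the Spring Boot BOM manage the version; verify the coordinates against your build):

```xml
<dependency>
    <groupId>org.apache.httpcomponents.client5</groupId>
    <artifactId>httpclient5</artifactId>
</dependency>
```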

JdkClientHttpRequestFactory.java

The JdkClientHttpRequestFactory uses an HttpClient created by calling HttpClient.newHttpClient(), which by default prefers HTTP/2.

Refer to HttpClient.html#newHttpClient() for more details:

> The default settings include: the "GET" request method, a preference of HTTP/2, a redirection policy of NEVER, the default proxy selector, and the default SSL context.
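The HTTP/2 preference can be observed directly on the client object. A minimal sketch (class name is illustrative):

```java
import java.net.http.HttpClient;

public class HttpVersionDemo {
    public static void main(String[] args) {
        // newHttpClient() prefers HTTP/2, which is what triggers the
        // Upgrade/Http2-Settings headers on a plain-HTTP request
        HttpClient defaultClient = HttpClient.newHttpClient();
        System.out.println(defaultClient.version()); // HTTP_2

        // Forcing HTTP/1.1 suppresses the h2c upgrade attempt
        HttpClient http11Client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        System.out.println(http11Client.version()); // HTTP_1_1
    }
}
```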

Comment From: reneleonhardt

@icyclv vllm decided to use uvicorn as its ASGI server (HTTP/1 only; the roadmap lists HTTP/2 and 3). If they switched to a more modern ASGI server like hypercorn or granian, they could serve both HTTP/1 and HTTP/2 (hypercorn even supports HTTP/3): https://fastapi.tiangolo.com/deployment/manually/#asgi-servers

I can't find an issue or pull request regarding uvicorn, ASGI, or HTTP/2 (they will never tackle their backlog 🙈😅). @dev-jonghoonpark maybe you want to open a feature request to migrate to granian? hypercorn seems too unstable judging by its issue backlog.

Comment From: zachary-zhaoqi

A temporary workaround: proxy through nginx, converting HTTP/2 requests to HTTP/1.1 before forwarding them to Uvicorn:

```nginx
server {
    listen 443 ssl;
    http2 on;

    location / {
        proxy_pass http://localhost:8000;  # forward to Uvicorn
        proxy_http_version 1.1;           # force HTTP/1.1
    }
}
```

I deployed qwen2.5-32B with vllm and connected successfully using Spring AI M6; both call and stream work.

Comment From: kangnn

> A temporary workaround: proxy through nginx, converting HTTP/2 requests to HTTP/1.1 before forwarding them to Uvicorn:
>
> ```nginx
> server {
>     listen 443 ssl;
>     http2 on;
>
>     location / {
>         proxy_pass http://localhost:8000;  # forward to Uvicorn
>         proxy_http_version 1.1;           # force HTTP/1.1
>     }
> }
> ```
>
> I deployed qwen2.5-32B with vllm and connected successfully using Spring AI M6; both call and stream work.

Did you only change the nginx proxy, and then use the OpenAI interface directly in Java?

```java
@Bean
public ChatClient chatClient(OpenAiChatModel model) {
    return ChatClient
            .builder(model)
            .build();
}
```

Invocation:

```java
@RequestMapping(value = "/chat", produces = "text/html;charset=utf-8")
public Flux<String> chat(String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .stream()
            .content();
}
```

Comment From: Dudu0831

I found a new solution that avoids the hassle of an nginx proxy; just configure it as follows:

(screenshot of the configuration)

Then inject the new client where it is used:

(screenshot of the injection)

Comment From: xyombo

My solution: customize both RestClient and WebClient directly. This is the safer configuration, since I haven't yet figured out under which circumstances RestClient vs. WebClient is used, but at least it runs :)

```java
import java.net.http.HttpClient;

import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.JdkClientHttpRequestFactory;
import org.springframework.http.client.reactive.JdkClientHttpConnector;
import org.springframework.web.client.RestClient;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
@Slf4j
public class HttpClientConfiguration {

    @Bean
    public RestClient.Builder customRestClientBuilder() {
        // Force HTTP/1.1 for the blocking RestClient
        HttpClient httpClient = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        log.debug("Using custom RestClient.Builder with HTTP/1.1");
        return RestClient.builder().requestFactory(new JdkClientHttpRequestFactory(httpClient));
    }

    @Bean
    public WebClient.Builder customWebClientBuilder() {
        // Force HTTP/1.1 for the reactive WebClient (used for streaming)
        HttpClient httpClient = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        log.debug("Using custom WebClient.Builder with HTTP/1.1");
        return WebClient.builder().clientConnector(new JdkClientHttpConnector(httpClient));
    }
}
```