OpenAI supports returning embedding responses base64-encoded, which can save a lot of resources. See the excellent blog post "Speed Up OpenAI Embedding By 4x With This Simple Trick!".

Currently, Spring AI assumes that the response from OpenAI comes back as an array of floats; see https://github.com/spring-projects/spring-ai/blob/6a1f982203a1de3f1021a7b2279d366cf45afa93/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/OpenAiEmbeddingModel.java#L181

The Embedding class constructor also assumes a float[]; see https://github.com/spring-projects/spring-ai/blob/6a1f982203a1de3f1021a7b2279d366cf45afa93/spring-ai-model/src/main/java/org/springframework/ai/embedding/Embedding.java#L40

You can see how the official Python SDK decodes the response at https://github.com/openai/openai-python/blob/db5c35049accb05f5fb03791ef9c12547fd309a7/src/openai/resources/embeddings.py#L204

The official Python SDK uses base64 encoding as of version 1.62, and other SDKs are turning on base64 as the default.

Please add support for base64 embeddings to Spring AI.

Comment From: sunyuhan1998

Hi @ilayaperumalg, could you assign this issue to me? Since we now support the encodingFormat parameter, I'd like to implement what's described in this issue. It doesn't need many changes, and I want to submit a PR for it.

The main idea is to add a custom JsonDeserializer for the embedding field in org.springframework.ai.openai.api.OpenAiApi.Embedding. If the returned embedding is in base64 format, we can convert it to a float[] within the JsonDeserializer.
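
For illustration, here is a minimal sketch of the conversion step such a deserializer would perform. It is not the actual Spring AI implementation: the class name Base64EmbeddingDecoder and its decode method are hypothetical, and it assumes the base64 payload is a packed sequence of little-endian 32-bit floats. A real JsonDeserializer would call something like this when the embedding field arrives as a JSON string, and fall back to reading a plain float array otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical helper, not part of Spring AI.
public final class Base64EmbeddingDecoder {

    private Base64EmbeddingDecoder() {
    }

    /**
     * Decodes a base64 string into a float[].
     * Assumption: the payload is packed little-endian float32 values.
     */
    public static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException(
                    "Decoded length " + bytes.length + " is not a multiple of " + Float.BYTES);
        }
        float[] result = new float[bytes.length / Float.BYTES];
        // View the raw bytes as little-endian floats and copy them out in bulk.
        ByteBuffer.wrap(bytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asFloatBuffer()
                .get(result);
        return result;
    }
}
```

For example, Base64EmbeddingDecoder.decode("AACAPw==") yields a one-element array containing 1.0f, since 1.0f is the bytes 00 00 80 3F in little-endian order.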

I believe this approach can fulfill the current requirement with minimal impact on the existing code.

Comment From: ilayaperumalg

@sunyuhan1998 Thank you for your interest in contributing. Looking forward to your PR!