Expected Behavior

When creating a UserMessage with an image URL using Media, I should be able to specify the image detail option ("low", "high", "auto"), which is supported by the OpenAI API for vision models like GPT-4o.

For example, I expect something like:

```java
UserMessage message = UserMessage.builder()
    .text("What do you see?")
    .media(List.of(Media.builder()
        .mimeType(MimeTypeUtils.IMAGE_PNG)
        .data(URI.create("https://example.com/image.png"))
        .detail("low")  // <== proposed: this method does not exist on Media.Builder today
        .build()))
    .build();
```

The resulting request payload should include:

```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.png",
    "detail": "low"
  }
}
```
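To make the expected payload concrete, here is a minimal sketch in plain Java (no Spring AI dependency) of an `image_url` content part whose `detail` is optional and restricted to the three documented values. `ImageDetail` and `ImageUrlPart` are hypothetical names used only for illustration, and the hand-built JSON assumes the URL needs no escaping.

```java
import java.util.Locale;

// Hypothetical enum for the three detail levels the OpenAI API accepts.
enum ImageDetail {
    LOW, HIGH, AUTO;

    // Wire value as it appears in the JSON payload ("low", "high", "auto").
    String wireValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    // Parses a user-supplied string; valueOf throws IllegalArgumentException for unknown values.
    static ImageDetail fromString(String value) {
        return ImageDetail.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

// Hypothetical content part; a null detail means "let the API apply its default".
record ImageUrlPart(String url, ImageDetail detail) {

    // Hand-rolled JSON for illustration only; assumes the URL contains no characters needing escapes.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"image_url\",\"image_url\":{\"url\":\"").append(url).append('"');
        if (detail != null) {
            // Emit "detail" only when the caller actually set it.
            sb.append(",\"detail\":\"").append(detail.wireValue()).append('"');
        }
        sb.append("}}");
        return sb.toString();
    }
}
```

Leaving `detail` nullable instead of defaulting it to `AUTO` keeps the payload unchanged for callers who never set it.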

Current Behavior

Currently, the Media abstraction does not support setting a detail value. Even though the internal MediaContent.ImageUrl class accepts a detail parameter, the mapToMediaContent(...) method in OpenAiChatModel uses the constructor that leaves detail as null.
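The change this paragraph points to is small: pass a caller-supplied value through instead of hard-coding null. The sketch below assumes, hypothetically, that the value travels in a metadata map on the media object under a `"detail"` key; `SimpleMedia`, `ImageUrl` and `MediaMapper.mapToImageUrl` are stand-ins, not Spring AI's actual classes or method signatures.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical stand-in for a media object that can carry extra options.
record SimpleMedia(String mimeType, URI data, Map<String, Object> metadata) {}

// Hypothetical stand-in for the request-side image_url payload.
record ImageUrl(String url, String detail) {}

final class MediaMapper {
    // Assumed metadata key under which callers would put "low", "high" or "auto".
    static final String DETAIL_KEY = "detail";

    private MediaMapper() {}

    static ImageUrl mapToImageUrl(SimpleMedia media) {
        Object detail = media.metadata() == null ? null : media.metadata().get(DETAIL_KEY);
        // Current behavior is equivalent to new ImageUrl(url, null), which drops the caller's preference.
        // Here the value is passed through when present; null still means "use the API default".
        return new ImageUrl(media.data().toString(), detail == null ? null : detail.toString());
    }
}
```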

As a result, there is no way to control image detail when passing images by URL. This becomes a problem when optimizing for latency or when handling large or multiple images.

Context

I'm building a system using GPT-4o's multimodal capabilities and leveraging Spring AI for easier integration. When sending multiple or large images via URL, being able to reduce the image detail to "low" would improve response times.

However, without this feature:

  • The detail option is never sent, so the API always falls back to its default ("auto") and may process images at high detail
  • We experience longer response times from the LLM
  • We have no control over performance trade-offs

I am considering customizing OpenAiChatModel and overriding mapToMediaContent to manually inject the detail value, but this workaround adds unnecessary complexity.

If this feature could be added, either by extending the Media class or by offering a more flexible mapping hook, I'd be very happy to contribute.
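The second option, a mapping hook, could take roughly this shape: the model holds a pluggable function that builds each image part, and the default reproduces today's behavior. `MediaRef`, `ImagePart`, `ImageMapping` and the `lowDetailAbove` policy are all hypothetical, chosen to illustrate the multiple-images case described above rather than to propose a concrete Spring AI API.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical minimal media reference: just the image URL.
record MediaRef(String url) {}

// Hypothetical request-side image part with an optional detail value.
record ImagePart(String url, String detail) {}

final class ImageMapping {
    // Default hook: mirrors the current behavior, detail is left unset.
    static final BiFunction<MediaRef, Integer, ImagePart> DEFAULT_HOOK =
            (media, imageCount) -> new ImagePart(media.url(), null);

    private final BiFunction<MediaRef, Integer, ImagePart> hook;

    ImageMapping(BiFunction<MediaRef, Integer, ImagePart> hook) {
        this.hook = hook;
    }

    // Example policy (an assumption, not a recommendation):
    // request "low" once a message carries more than `threshold` images, else "auto".
    static BiFunction<MediaRef, Integer, ImagePart> lowDetailAbove(int threshold) {
        return (media, imageCount) ->
                new ImagePart(media.url(), imageCount > threshold ? "low" : "auto");
    }

    // Maps every image in one message, giving the hook the total image count as context.
    List<ImagePart> mapAll(List<MediaRef> media) {
        int count = media.size();
        return media.stream().map(m -> hook.apply(m, count)).toList();
    }
}
```

A hook like this lets an application encode its own latency trade-offs without subclassing the chat model; the cost is a wider public API surface than a single detail field on Media.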

Thanks again for your great work!

Comment From: sunyuhan1998

Indeed, it appears that this issue also exists in other model integrations (e.g., Mistral, MiniMax, ZhiPu).