Expected Behavior

When creating a UserMessage with an image URL using Media, I should be able to specify the image detail option ("low", "high", "auto"), which is supported by the OpenAI API for vision models like GPT-4o.

For example, I expect something like:

UserMessage.builder()
  .text("What do you see?")
  .media(List.of(Media.builder()
      .mimeType(MimeTypeUtils.IMAGE_PNG)
      .data(URI.create("https://example.com/image.png"))
      .detail("low")  // <== This field doesn't currently exist
      .build()))
  .build();

The resulting request payload should include:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/image.png",
    "detail": "low"
  }
}

Current Behavior

Currently, the Media abstraction does not support setting a detail value. Even though the internal MediaContent.ImageUrl class accepts a detail parameter, the mapToMediaContent(...) function in OpenAiChatModel uses a constructor that sets it to null.

As a result, it is not possible to control the image detail level when using image URLs. This is a problem when optimizing for latency or when handling large or multiple images.
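
To make the behavior above concrete, here is a minimal, self-contained sketch. ImageUrlPart and ImageUrlJson are hypothetical stand-ins, not Spring AI's actual MediaContent.ImageUrl. The sketch assumes only that a one-argument constructor defaults detail to null and that a null detail is left out of the serialized request.

```java
// Hypothetical stand-in for an image_url content part; not Spring AI's real class.
record ImageUrlPart(String url, String detail) {

    // Convenience constructor: the caller has no way to pass a detail value.
    ImageUrlPart(String url) {
        this(url, null);
    }
}

final class ImageUrlJson {

    private ImageUrlJson() {
    }

    // Renders the part as JSON, omitting "detail" when it is null
    // (an assumption of this sketch). No escaping is performed, so this
    // is only suitable for simple demo values.
    static String toJson(ImageUrlPart part) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"image_url\",\"image_url\":{\"url\":\"")
            .append(part.url())
            .append('"');
        if (part.detail() != null) {
            json.append(",\"detail\":\"").append(part.detail()).append('"');
        }
        return json.append("}}").toString();
    }
}
```

With the one-argument constructor the detail key never reaches the payload, which matches what I observe today. With the two-argument constructor the payload has the shape shown under Expected Behavior.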

Context

I am building a system on GPT-4o's multimodal capabilities, using Spring AI for easier integration. When sending multiple or large images via URL, being able to reduce the image detail to "low" would improve performance.

However, without this feature:

  • The full-size image is always sent
  • We experience longer response times from the LLM
  • We have no control over performance trade-offs

I am considering customizing OpenAiChatModel and overriding mapToMediaContent to manually inject the detail value, but this workaround adds unnecessary complexity.

If this feature could be added, either by extending the Media class or by offering a more flexible mapping hook, I'd be very happy to contribute.

Thanks again for your great work!

Comment From: sunyuhan1998

Indeed, it appears that this issue also exists in other models (e.g., Mistral, MiniMax, ZhiPu).

Comment From: dev-jonghoonpark

How about this solution?

I resolved the issue by creating an ImageWithDetail class that extends Media, allowing us to add detail data without significantly changing the existing structure.

If the maintainers find this approach acceptable, I will submit a PR implemented in this direction.


Test code:

@Test
void imageWithDetail() throws IOException {

    var userMessage = UserMessage.builder()
        .text("Explain what do you see on this picture?")
        .media(List.of(ImageWithDetail.low(Media.builder()
            .mimeType(MimeTypeUtils.IMAGE_PNG)
            .data(URI.create("https://docs.spring.io/spring-ai/reference/_images/multimodal.test.png"))
            .build())))
        .build();

    ChatResponse response = this.chatModel
        .call(new Prompt(List.of(userMessage), OpenAiChatOptions.builder().model("gpt-4o").build()));

    logger.info(response.getResult().getOutput().getText());
    assertThat(response.getResult().getOutput().getText()).containsAnyOf("bananas", "apple", "bowl", "basket",
            "fruit stand");
}

The test results confirm that the intended detail value is included in the request.

{"type":"image_url","image_url":{"url":"https://docs.spring.io/spring-ai/reference/_images/multimodal.test.png","detail":"low"}}

ImageWithDetail.java:

public class ImageWithDetail extends Media {

    private final String detail;

    private ImageWithDetail(Media media, String detail) {
        super(media.getMimeType(), media.getData(), media.getId(), media.getName());
        this.detail = detail;
    }

    public static Media low(Media media) {
        return new ImageWithDetail(media, "low");
    }

    public static Media high(Media media) {
        return new ImageWithDetail(media, "high");
    }

    public static Media auto(Media media) {
        return new ImageWithDetail(media, "auto");
    }

    public String getDetail() {
        return detail;
    }

}
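
The three factories above pass raw strings. One possible variation, offered only as a sketch and not part of the proposal itself, is to model the allowed values as an enum so that an unsupported value cannot be constructed. ImageDetail, wireValue, and fromWireValue are hypothetical names.

```java
// Hypothetical enum for the three detail values mentioned in this issue.
enum ImageDetail {
    LOW, HIGH, AUTO;

    // Value as it would appear in the request payload, e.g. "low".
    String wireValue() {
        return name().toLowerCase(java.util.Locale.ROOT);
    }

    // Parses a payload value; throws IllegalArgumentException for anything
    // other than "low", "high", or "auto".
    static ImageDetail fromWireValue(String value) {
        for (ImageDetail detail : values()) {
            if (detail.wireValue().equals(value)) {
                return detail;
            }
        }
        throw new IllegalArgumentException("Unsupported image detail: " + value);
    }
}
```

An ImageWithDetail built on such an enum could still expose getDetail() as a String by returning wireValue(), so the mapping code would not need to change.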

OpenAiChatModel.java:

private MediaContent mapToMediaContent(Media media) {
    var mimeType = media.getMimeType();
    if (MimeTypeUtils.parseMimeType("audio/mp3").equals(mimeType)) {
        return new MediaContent(
                new MediaContent.InputAudio(fromAudioData(media.getData()), MediaContent.InputAudio.Format.MP3));
    }
    if (MimeTypeUtils.parseMimeType("audio/wav").equals(mimeType)) {
        return new MediaContent(
                new MediaContent.InputAudio(fromAudioData(media.getData()), MediaContent.InputAudio.Format.WAV));
    }
    if (MimeTypeUtils.parseMimeType("application/pdf").equals(mimeType)) {
        return new MediaContent(new MediaContent.InputFile(media.getName(),
                this.fromMediaData(media.getMimeType(), media.getData())));
    }
    else if (media instanceof ImageWithDetail imageWithDetail) {
        return new MediaContent(new MediaContent.ImageUrl(this.fromMediaData(media.getMimeType(), media.getData()),
                imageWithDetail.getDetail()));
    }
    else {
        return new MediaContent(
                new MediaContent.ImageUrl(this.fromMediaData(media.getMimeType(), media.getData())));
    }
}
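
The key part of the change is the instanceof branch. Isolated from Spring AI, the dispatch can be sketched as follows. SketchMedia, SketchImageWithDetail, and DetailDispatch are hypothetical stand-ins used only so the example compiles on its own (Java 16+ for pattern matching).

```java
// Hypothetical stand-in for Media; the real class lives in Spring AI.
class SketchMedia {

    private final String url;

    SketchMedia(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

// Hypothetical stand-in for the proposed ImageWithDetail subclass.
final class SketchImageWithDetail extends SketchMedia {

    private final String detail;

    SketchImageWithDetail(String url, String detail) {
        super(url);
        this.detail = detail;
    }

    String getDetail() {
        return detail;
    }
}

final class DetailDispatch {

    private DetailDispatch() {
    }

    // Returns the detail to send, or null when the media carries none.
    static String detailFor(SketchMedia media) {
        if (media instanceof SketchImageWithDetail withDetail) {
            return withDetail.getDetail();
        }
        return null;
    }
}
```

Callers that keep passing a plain media object fall through to the null case, so in this sketch existing call sites behave as before and only the wrapped instances carry a detail value.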

Comment From: sunyuhan1998

> How about this solution?
>
> I resolved the issue by creating an ImageWithDetail class that extends Media, allowing us to add detail data without significantly changing the existing structure.

I think it looks good. From the perspective of the Media class's original intent, directly modifying Media is not appropriate. Implementing a subclass seems to be a better approach. I really like your solution.

Comment From: weonest

> How about this solution?
>
> I resolved the issue by creating an ImageWithDetail class that extends Media, allowing us to add detail data without significantly changing the existing structure.

I think this is a great approach. Modifying the Media class directly wouldn't be appropriate given its original intent and design, so extending it via a subclass makes much more sense. Thanks so much for taking care of this. Really appreciate your help! 🙌