**Bug description**
QdrantVectorStore throws a NullPointerException when adding a Document that contains media only (no text).

**Environment**
- spring-ai-bom: 1.0.0
- spring-ai-starter-vector-store-qdrant
- Java: 24
- Spring Boot: 3.5.0
- Spring dependency management: 1.1.7

**Steps to reproduce**
1. Run Qdrant locally (docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant).
2. Create a Spring Boot app with spring-ai-starter-vector-store-qdrant.
3. Inside a test or service method:

```java
Media media = new Media(
        MimeType.valueOf("image/png"),
        new byte[] { 0x00 });                  // 1×1 transparent pixel

Document imgDoc = Document.builder()
        .media(media)                          // ⚠️ no text
        .metadata(Map.of("fileName", "pixel.png"))
        .build();

vectorStore.add(List.of(imgDoc));             // ← NPE here
```

<img width="500" alt="Image" src="https://github.com/user-attachments/assets/f367e2e6-a943-49bf-9691-081adbcf52eb" />

4. Observe the stack trace:

```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.toPayload(QdrantVectorStore.java:304)
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.lambda$doAdd$0(QdrantVectorStore.java:186)
```

**Expected behavior**
QdrantVectorStore should accept media-only documents and store their vector plus metadata without requiring doc_content to be present.

**Minimal Complete Reproducible example**
When a Document instance contains only media (no text) and is passed to QdrantVectorStore.add(...), the call fails with a NullPointerException.
The failure originates in QdrantVectorStore.toPayload(Document):


```java
var payload = QdrantValueFactory.toValueMap(document.getMetadata());
payload.put(CONTENT_FIELD_NAME,
            io.qdrant.client.ValueFactory.value(document.getText())); // ← getText() is null
```

Because Document was refactored to be either text or media (never both), getText() legitimately returns null for image-only documents. The method still unconditionally adds the "doc_content" field, so a media document triggers the NPE during ValueFactory.value(null).

Typical flow that exposes the bug:

1. My custom EmbeddingModel converts each incoming image to a float-array vector via a FastAPI CLIP endpoint.
2. I build a list of media-only Document objects, one per uploaded file.
3. I inject VectorStore (backed by QdrantVectorStore) and call vectorStore.add(documents).
4. doAdd(...) calls toPayload(document) for every item; the very first image document crashes with the NPE, aborting the entire batch.

The problem is independent of the embedding model: it happens before the vectors are sent to Qdrant, purely because the payload builder assumes text content is always present.

Comment From: dev-jonghoonpark

The Media class has two public constructors:

  • public Media(MimeType mimeType, URI uri)
  • public Media(MimeType mimeType, Resource resource)

How were you able to use a byte[] as the second argument in your example?

Comment From: dev-jonghoonpark

I think your use case falls outside the scope of usage that Spring AI currently supports. If you use a custom embedding model to convert images to vectors, wouldn't it be better to use the QdrantClient directly to add the vector data? What do you think?

Comment From: devMtn30

> The Media class has two public constructors:
>
>   • public Media(MimeType mimeType, URI uri)
>   • public Media(MimeType mimeType, Resource resource)
>
> How were you able to use a byte[] as the second argument in your example?

@dev-jonghoonpark

Below is a snippet taken directly from my production codebase for context:

```java
@Service
@RequiredArgsConstructor
public class ImageService {

    private final VectorStore vectorStore;

    public void store(List<MultipartFile> files) {
        try {
            List<Document> documents = toDocumentList(files);
            vectorStore.add(documents);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public List<Document> search(String query, int k) {
        return vectorStore.similaritySearch(SearchRequest.builder()
                .query(query)
                .topK(k)
                .build());
    }

    private List<Document> toDocumentList(List<MultipartFile> files) throws IOException {
        return files.stream().map(
                file -> Document.builder()
                        .media(new Media(MediaType.IMAGE_PNG, file.getResource()))
                        .metadata(Map.of("filename", Objects.requireNonNull(file.getOriginalFilename())))
                        //.text(file.getOriginalFilename())
                        //.idGenerator(new RandomIdGenerator())
                        .build()
        ).collect(Collectors.toList());
    }
}
```

Comment From: devMtn30

Hi Spring AI team,

While integrating QdrantVectorStore I noticed that toPayload(Document document) always inserts document.getText() into the payload, even when the incoming Document is media-only and its text is null:

```java
private Map<String, Value> toPayload(Document document) {
    try {
        var payload = QdrantValueFactory.toValueMap(document.getMetadata());
        payload.put(CONTENT_FIELD_NAME,
                io.qdrant.client.ValueFactory.value(document.getText()));
        return payload;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```

This can result in a null value (or even an NPE, depending on the Qdrant client) under CONTENT_FIELD_NAME, which breaks downstream similarity-search logic.

Would it make sense to guard this with something like

```java
if (document.isText()) {
    payload.put(CONTENT_FIELD_NAME,
            io.qdrant.client.ValueFactory.value(document.getText()));
}
```

or populate it with a sensible default, so that non-text Documents are handled safely?
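As a rough illustration of the guard proposed here, the following self-contained sketch uses hypothetical stand-ins (`SketchDocument`, a plain `Map<String, Object>` payload) instead of the real Spring AI `Document` and Qdrant `Value` types, so it runs without any dependencies. It only shows the idea of skipping the content field when there is no text; it is not the actual QdrantVectorStore code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Spring AI's Document: text is null for media-only documents.
record SketchDocument(String text, Map<String, Object> metadata) {
    boolean isText() {
        return text != null;
    }
}

class PayloadGuardSketch {

    // Same field name the issue mentions ("doc_content"); assumed here for illustration.
    static final String CONTENT_FIELD_NAME = "doc_content";

    // Copies the metadata and adds the content field only when text is present.
    static Map<String, Object> toPayload(SketchDocument document) {
        Map<String, Object> payload = new HashMap<>(document.metadata());
        if (document.isText()) {
            payload.put(CONTENT_FIELD_NAME, document.text());
        }
        return payload;
    }

    public static void main(String[] args) {
        var imageDoc = new SketchDocument(null, Map.of("fileName", "pixel.png"));
        var textDoc = new SketchDocument("hello", Map.of());
        System.out.println(toPayload(imageDoc).containsKey(CONTENT_FIELD_NAME)); // false
        System.out.println(toPayload(textDoc).get(CONTENT_FIELD_NAME));          // hello
    }
}
```

With this shape, a media-only document yields a payload containing only its metadata, while text documents keep the existing behavior.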

Happy to submit a PR if that approach sounds reasonable. Let me know what you think!

Thanks!

Comment From: devMtn30

> I think your use case falls outside the scope of usage that Spring AI currently supports. If you use a custom embedding model to convert images to vectors, wouldn't it be better to use the QdrantClient directly to add the vector data? What do you think?

@dev-jonghoonpark Even with the official spring-ai-starter-model-ollama starter, a media-only Document still triggers a NullPointerException before the data ever reaches Qdrant. The problem lives in DefaultContentFormatter.format(…), which calls document.getText() unconditionally.

```
java.lang.NullPointerException: Cannot invoke "java.lang.CharSequence.toString()" because "replacement" is null
    at java.base/java.lang.String.replace(String.java:3164)
    at org.springframework.ai.document.DefaultContentFormatter.format(DefaultContentFormatter.java:116)
    at org.springframework.ai.document.Document.getFormattedContent(Document.java:229)
    at org.springframework.ai.embedding.TokenCountBatchingStrategy.batch(TokenCountBatchingStrategy.java:148)
    at org.springframework.ai.embedding.EmbeddingModel.embed(EmbeddingModel.java:87)
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.doAdd(QdrantVectorStore.java:179)
```

```java
// DefaultContentFormatter – line 111+
return this.textTemplate
        .replace(TEMPLATE_METADATA_STRING_PLACEHOLDER, metadataText)
        .replace(TEMPLATE_CONTENT_PLACEHOLDER, document.getText());  // ← null → NPE
```
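The failure mode in this trace can be reproduced with the JDK alone, because `String.replace(CharSequence, CharSequence)` throws when the replacement is null. The sketch below uses a made-up template and placeholder names (the real DefaultContentFormatter constants may differ) and shows one possible null-safe variant that substitutes an empty string. Whether an empty string is an acceptable default for a media-only document is a separate design question.

```java
import java.util.Objects;

class NullSafeFormatSketch {

    // Hypothetical template; the real DefaultContentFormatter template may differ.
    static final String TEMPLATE = "{metadata}\n\n{content}";

    // Mirrors the failing pattern: String.replace throws an NPE when the replacement is null.
    static String formatUnsafe(String metadataText, String text) {
        return TEMPLATE.replace("{metadata}", metadataText).replace("{content}", text);
    }

    // One possible guard: substitute an empty string when the document has no text.
    static String formatSafe(String metadataText, String text) {
        return TEMPLATE.replace("{metadata}", metadataText)
                .replace("{content}", Objects.requireNonNullElse(text, ""));
    }

    public static void main(String[] args) {
        try {
            formatUnsafe("fileName: pixel.png", null);
        } catch (NullPointerException e) {
            System.out.println("NPE reproduced");
        }
        System.out.println(formatSafe("fileName: pixel.png", null).trim()); // fileName: pixel.png
    }
}
```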

Comment From: dev-jonghoonpark

What I meant to say is that the scenario you’re considering doesn’t seem to fit well with the current Spring AI feature set. That’s why I suggested it might be better to just use QdrantClient directly. If I’ve misunderstood something, please let me know.


Assuming we modify it in the way you want, how would you like to perform searches after storing the data in the vector database? In the current code, the similaritySearch method in the vector database module allows you to query using a text-based input.

Comment From: devMtn30

@dev-jonghoonpark Hi 👋,

I may be misunderstanding, so I’d like to double-check.

Scenario I’m testing

  1. Pass an image into the official embedding starter
    (org.springframework.ai:spring-ai-starter-model-ollama – no custom model).
  2. The model returns a vector.
  3. Store that vector in a VectorStore (Qdrant).

In other words, the only difference from the “text” path is that the Document contains media instead of text.

What actually happens

Creating the Document itself works:

```java
Document.builder()
        .media(new Media(mime, resource))
        .metadata(Map.of("filename", filename))
        .build();
```

But as soon as I call vectorStore.add(List.of(imgDoc)) (or even embeddingModel.embed(List.of(imgDoc))), DefaultContentFormatter.format() calls document.getText() unconditionally and a NullPointerException is thrown.


My questions

  1. Is the simple “media → vector → store” workflow considered out of scope for Spring AI 1.0?
  2. If so, is the current NPE the intended behavior, or should we:
     a) guard against null and throw a clear IllegalArgumentException("media-only documents not supported"), or
     b) update the docs/Javadoc to state that media-only Documents are not yet end-to-end supported?

The Javadoc for Document explicitly shows a media constructor and an example, so it feels like this should either work or fail with a clear message, rather than a low-level NPE.

Reference doc I followed: https://docs.spring.io/spring-ai/reference/api/vectordbs/qdrant.html

Thanks!

Comment From: dev-jonghoonpark

1.

As far as I know, yes. It's probably impossible not just with Qdrant but with any vector store. I'm not sure about the plans going forward, as I'm just one of the contributors.

If you’re interested in implementing this feature, it seems that both storing a media-only document in the vector store and retrieving it would require new implementations.

2.

> guard against null and throw a clear IllegalArgumentException("media-only documents not supported")

That sounds like a good approach. It might be worth adding this to something like AbstractObservationVectorStore.
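A minimal sketch of what such a guard could look like follows. It assumes a hypothetical stand-in record (`GuardedDocument`) rather than the real Spring AI `Document`, and the method name `requireTextDocuments` is invented for illustration. Where the check would actually live (for example in AbstractObservationVectorStore) is left open.

```java
import java.util.List;

// Hypothetical stand-in for Spring AI's Document; only the fields needed for the check.
record GuardedDocument(String id, String text) {
    boolean isText() {
        return text != null;
    }
}

class MediaOnlyGuardSketch {

    // Rejects the batch up front with a descriptive message instead of a late, low-level NPE.
    static void requireTextDocuments(List<GuardedDocument> documents) {
        for (GuardedDocument document : documents) {
            if (!document.isText()) {
                throw new IllegalArgumentException(
                        "media-only documents not supported (document id: " + document.id() + ")");
            }
        }
    }

    public static void main(String[] args) {
        try {
            requireTextDocuments(List.of(new GuardedDocument("img-1", null)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check before embedding means the caller sees the unsupported-input message before any formatter or payload code touches the null text.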

If you're interested in contributing, feel free to give it a try!

Comment From: ilayaperumalg

@devMtn30 Thanks for the detailed writeup on the issue. Please submit a PR with your suggestion to fix this issue. @dev-jonghoonpark Thanks for the review/comments.

Comment From: ilayaperumalg

Adding some additional information:

@devMtn30 Please feel free to submit a PR to fix the NPE issue by throwing an exception when embedding non-text media documents.

Meanwhile, we are discussing a better plan to support non-text embeddings. We will keep you posted on the progress. Thanks everyone!