Spring AI QdrantVectorStore throws NPE when Document contains only media


**Bug description**
QdrantVectorStore throws a NullPointerException when adding a Document that contains media only (no text).

**Environment**
- spring-ai-bom: 1.0.0
- spring-ai-starter-vector-store-qdrant
- Java: 24
- Spring Boot: 3.5.0
- Spring dependency management: 1.1.7

**Steps to reproduce**
1. Run Qdrant locally (docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant).
2. Create a Spring Boot app with spring-ai-starter-vector-store-qdrant.
3. Inside a test or service method:

   ```java
   Media media = new Media(
           MimeType.valueOf("image/png"),
           new byte[] { 0x00 });                  // placeholder bytes, not a real PNG

   Document imgDoc = Document.builder()
           .media(media)                          // ⚠️ no text
           .metadata(Map.of("fileName", "pixel.png"))
           .build();

   vectorStore.add(List.of(imgDoc));             // ← NPE here
   ```

<img width="500" alt="Image" src="https://github.com/user-attachments/assets/f367e2e6-a943-49bf-9691-081adbcf52eb" />

4. Observe the stack trace:

```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.toPayload(QdrantVectorStore.java:304)
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.lambda$doAdd$0(QdrantVectorStore.java:186)
```

**Expected behavior**
QdrantVectorStore should accept media-only documents and store their vector plus metadata without requiring doc_content to be present.

**Minimal Complete Reproducible example**
When a Document instance contains only media (no text) and is passed to QdrantVectorStore.add(...), the call fails with a NullPointerException.
The failure originates in QdrantVectorStore.toPayload(Document):


```java
var payload = QdrantValueFactory.toValueMap(document.getMetadata());
payload.put(CONTENT_FIELD_NAME,
            io.qdrant.client.ValueFactory.value(document.getText())); // ← getText() is null
```

Because Document was refactored to be either text or media (never both), getText() legitimately returns null for image-only documents. The method still unconditionally adds the "doc_content" field, so a media document triggers the NPE during ValueFactory.value(null).
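
The failure mode can be illustrated without Qdrant. The following is a minimal sketch in plain Java: `StubValueFactory` and `NullContentRepro` are hypothetical stand-ins, not Spring AI or Qdrant classes, and the stub assumes only that the real value factory rejects a null string, as the stack trace suggests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a value factory that rejects null input.
final class StubValueFactory {
    static String value(String s) {
        return Objects.requireNonNull(s, "string value must not be null");
    }
}

final class NullContentRepro {
    static final String CONTENT_FIELD_NAME = "doc_content";

    // Mirrors the unconditional put described above: the text is always added.
    static Map<String, String> toPayloadUnguarded(Map<String, String> metadata, String text) {
        Map<String, String> payload = new HashMap<>(metadata);
        payload.put(CONTENT_FIELD_NAME, StubValueFactory.value(text));
        return payload;
    }

    public static void main(String[] args) {
        try {
            // A media-only document has no text, so null reaches the factory.
            toPayloadUnguarded(Map.of("fileName", "pixel.png"), null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

The metadata itself is copied without problems; only the content entry fails, which matches the observation that the crash occurs before anything is sent to Qdrant.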

Typical flow that exposes the bug:

1. My custom EmbeddingModel converts each incoming image to a float-array vector via a FastAPI CLIP endpoint.
2. I build a list of media-only Document objects, one per uploaded file.
3. I inject VectorStore (backed by QdrantVectorStore) and call vectorStore.add(documents).
4. doAdd(...) calls toPayload(document) for every item; the very first image document crashes with the NPE, aborting the entire batch.

The problem is independent of the embedding model: it happens before the vectors are sent to Qdrant, purely because the payload builder assumes text content is always present.

Comment From: dev-jonghoonpark

The Media class has two public constructors:

```java
public Media(MimeType mimeType, URI uri)
public Media(MimeType mimeType, Resource resource)
```

How were you able to use a byte[] as the second argument in your example?

Comment From: dev-jonghoonpark

I think your use case falls outside the scope of usage that Spring AI currently supports. If you use a custom embedding model to convert images to vectors, wouldn't it be better to use the QdrantClient directly to add the vector data? What do you think?

Comment From: devMtn30

> The Media class has two public constructors:
>
> ```java
> public Media(MimeType mimeType, URI uri)
> public Media(MimeType mimeType, Resource resource)
> ```
>
> How were you able to use a byte[] as the second argument in your example?

@dev-jonghoonpark

Below is a snippet taken directly from my production codebase for context:

```java
@Service
@RequiredArgsConstructor
public class ImageService {

    private final VectorStore vectorStore;

    public void store(List<MultipartFile> files) {
        try {
            List<Document> documents = toDocumentList(files);
            vectorStore.add(documents);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public List<Document> search(String query, int k) {
        return vectorStore.similaritySearch(SearchRequest.builder()
                .query(query)
                .topK(k)
                .build());
    }

    private List<Document> toDocumentList(List<MultipartFile> files) throws IOException {
        return files.stream().map(
                file -> Document.builder()
                        .media(new Media(MediaType.IMAGE_PNG, file.getResource()))
                        .metadata(Map.of("filename", Objects.requireNonNull(file.getOriginalFilename())))
                        //.text(file.getOriginalFilename())
                        //.idGenerator(new RandomIdGenerator())
                        .build()
        ).collect(Collectors.toList());
    }
}
```

Comment From: devMtn30

Hi Spring AI team,

While integrating QdrantVectorStore I noticed that toPayload(Document document) always inserts document.getText() into the payload, even when the incoming Document is media-only and its text is null:

```java
private Map<String, Value> toPayload(Document document) {
    try {
        var payload = QdrantValueFactory.toValueMap(document.getMetadata());
        payload.put(CONTENT_FIELD_NAME,
                io.qdrant.client.ValueFactory.value(document.getText()));
        return payload;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```

This can result in a null value (or even an NPE, depending on the Qdrant client) under CONTENT_FIELD_NAME, which breaks downstream similarity-search logic.

Would it make sense to guard this with something like

```java
if (document.isText()) {
    payload.put(CONTENT_FIELD_NAME,
            io.qdrant.client.ValueFactory.value(document.getText()));
}
```

or to populate the field with a sensible default, so that non-text Documents are handled safely?
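
To make the two options concrete, here is a rough sketch in plain Java. `GuardedPayloadSketch` and both of its methods are hypothetical, and a plain `Map<String, Object>` stands in for Qdrant's `Value` type, so this shows only the shape of the guard and is not a patch for QdrantVectorStore.

```java
import java.util.HashMap;
import java.util.Map;

final class GuardedPayloadSketch {
    static final String CONTENT_FIELD_NAME = "doc_content";

    // Option 1: omit the content field entirely when the document has no text.
    static Map<String, Object> skipWhenMissing(Map<String, Object> metadata, String text) {
        Map<String, Object> payload = new HashMap<>(metadata);
        if (text != null) {
            payload.put(CONTENT_FIELD_NAME, text);
        }
        return payload;
    }

    // Option 2: always write the field, substituting a caller-supplied default.
    static Map<String, Object> defaultWhenMissing(Map<String, Object> metadata, String text, String fallback) {
        Map<String, Object> payload = new HashMap<>(metadata);
        payload.put(CONTENT_FIELD_NAME, text != null ? text : fallback);
        return payload;
    }

    public static void main(String[] args) {
        Map<String, Object> meta = Map.of("fileName", "pixel.png");
        // Media-only document: the field is absent under option 1 ...
        System.out.println(skipWhenMissing(meta, null).containsKey(CONTENT_FIELD_NAME));
        // ... and present with the fallback value under option 2.
        System.out.println(defaultWhenMissing(meta, null, "").get(CONTENT_FIELD_NAME));
    }
}
```

With option 1, whatever code reads search results back would have to tolerate a missing doc_content entry. Option 2 keeps the payload shape uniform, but a media-only document can then no longer be told apart from one with empty text.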

Happy to submit a PR if that approach sounds reasonable. Let me know what you think!

Thanks!