**Bug description**
`QdrantVectorStore` throws a `NullPointerException` when adding a `Document` that contains media only (no text).
**Environment**
- spring-ai-bom: 1.0.0
- spring-ai-starter-vector-store-qdrant
- Java: 24
- Spring Boot: 3.5.0
- Spring dependency management: 1.1.7
**Steps to reproduce**
1. Run Qdrant locally (`docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant`).
2. Create a Spring Boot app with `spring-ai-starter-vector-store-qdrant`.
3. Inside a test or service method:

```java
Media media = new Media(
        MimeType.valueOf("image/png"),
        new byte[] { 0x00 }); // 1×1 transparent pixel

Document imgDoc = Document.builder()
        .media(media) // ⚠️ no text
        .metadata(Map.of("fileName", "pixel.png"))
        .build();

vectorStore.add(List.of(imgDoc)); // ← NPE here
```

<img width="500" alt="Image" src="https://github.com/user-attachments/assets/f367e2e6-a943-49bf-9691-081adbcf52eb" />

4. Observe the stack trace:

```
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.toPayload(QdrantVectorStore.java:304)
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.lambda$doAdd$0(QdrantVectorStore.java:186)
```
**Expected behavior**
QdrantVectorStore should accept media-only documents and store their vector plus metadata without requiring doc_content to be present.
**Minimal Complete Reproducible example**
When a Document instance contains only media (no text) and is passed to QdrantVectorStore.add(...), the call fails with a NullPointerException.
The failure originates in QdrantVectorStore.toPayload(Document):
```java
var payload = QdrantValueFactory.toValueMap(document.getMetadata());
payload.put(CONTENT_FIELD_NAME,
        io.qdrant.client.ValueFactory.value(document.getText())); // ← getText() is null
```
Because `Document` was refactored to be either text or media (never both), `getText()` legitimately returns `null` for image-only documents. The method still unconditionally adds the `"doc_content"` field, so a media document triggers the NPE during `ValueFactory.value(null)`.
Typical flow that exposes the bug:
1. My custom `EmbeddingModel` converts each incoming image to a float-array vector via a FastAPI CLIP endpoint.
2. I build a list of media-only `Document` objects, one per uploaded file.
3. I inject `VectorStore` (backed by `QdrantVectorStore`) and call `vectorStore.add(documents)`.
4. `doAdd(...)` calls `toPayload(document)` for every item; the very first image document crashes with the NPE, aborting the entire batch.
The problem is independent of the embedding model: it happens before the vectors are sent to Qdrant, purely because the payload builder assumes text content is always present.
Comment From: dev-jonghoonpark
The `Media` class has two public constructors:
- `public Media(MimeType mimeType, URI uri)`
- `public Media(MimeType mimeType, Resource resource)`

How were you able to use a `byte[]` as the second argument in your example?
Comment From: dev-jonghoonpark
I think your use case falls outside the scope of usage currently supported by Spring AI.
If you use a custom embedding model to convert images to vectors, wouldn't it be better to use the `QdrantClient` directly to add the vector data?
What do you think?
Comment From: devMtn30
> The `Media` class has two public constructors:
> - `public Media(MimeType mimeType, URI uri)`
> - `public Media(MimeType mimeType, Resource resource)`
>
> How were you able to use a `byte[]` as the second argument in your example?

@dev-jonghoonpark Below is a snippet taken directly from my production codebase for context:
```java
@Service
@RequiredArgsConstructor
public class ImageService {

    private final VectorStore vectorStore;

    public void store(List<MultipartFile> files) {
        try {
            List<Document> documents = toDocumentList(files);
            vectorStore.add(documents);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public List<Document> search(String query, int k) {
        return vectorStore.similaritySearch(SearchRequest.builder()
                .query(query)
                .topK(k)
                .build());
    }

    private List<Document> toDocumentList(List<MultipartFile> files) throws IOException {
        return files.stream().map(
                file -> Document.builder()
                        .media(new Media(MediaType.IMAGE_PNG, file.getResource()))
                        .metadata(Map.of("filename", Objects.requireNonNull(file.getOriginalFilename())))
                        //.text(file.getOriginalFilename())
                        //.idGenerator(new RandomIdGenerator())
                        .build()
        ).collect(Collectors.toList());
    }
}
```
Comment From: devMtn30
Hi Spring AI team,
While integrating `QdrantVectorStore`, I noticed that `toPayload(Document document)` always inserts `document.getText()` into the payload, even when the incoming `Document` carries only `Media` and its text is `null`:
```java
private Map<String, Value> toPayload(Document document) {
    try {
        var payload = QdrantValueFactory.toValueMap(document.getMetadata());
        payload.put(CONTENT_FIELD_NAME,
                io.qdrant.client.ValueFactory.value(document.getText()));
        return payload;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```
This can result in a `null` value (or even an NPE, depending on the Qdrant client) under `CONTENT_FIELD_NAME`, which breaks downstream similarity-search logic.
Would it make sense to guard this with something like

```java
if (document.isText()) {
    payload.put(CONTENT_FIELD_NAME,
            io.qdrant.client.ValueFactory.value(document.getText()));
}
```

or populate it with a sensible default, so that non-text `Document`s are handled safely?
Happy to submit a PR if that approach sounds reasonable. Let me know what you think!
Thanks!
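
The guard proposed above can be tried out without a Qdrant client. The following is a minimal, self-contained sketch, not the actual `QdrantVectorStore` code: `SketchDocument` is a hypothetical stand-in for `Document`, a plain `Map<String, Object>` stands in for the Qdrant payload map, and a `null` check on the text replaces the `isText()` call. The `"doc_content"` key is the field name quoted in the issue.

```java
import java.util.HashMap;
import java.util.Map;

final class PayloadSketch {

    // Field name quoted in the issue for the text content.
    static final String CONTENT_FIELD_NAME = "doc_content";

    // Hypothetical stand-in for a Document: text is null for media-only documents.
    record SketchDocument(String text, Map<String, Object> metadata) {}

    // Copies the metadata and adds the content field only when text is present.
    static Map<String, Object> toPayload(SketchDocument document) {
        Map<String, Object> payload = new HashMap<>(document.metadata());
        if (document.text() != null) {
            payload.put(CONTENT_FIELD_NAME, document.text());
        }
        return payload;
    }

    public static void main(String[] args) {
        SketchDocument mediaOnly = new SketchDocument(null, Map.of("fileName", "pixel.png"));
        SketchDocument withText = new SketchDocument("hello", Map.of("fileName", "note.txt"));
        System.out.println(toPayload(mediaOnly)); // metadata only, no content field
        System.out.println(toPayload(withText));  // metadata plus content field
    }
}
```

Skipping the field keeps media-only payloads free of a placeholder value; whether downstream search code tolerates a missing content field is a separate question that this sketch does not answer.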
Comment From: devMtn30
> I think your use case falls outside the expected scope of usage currently supported by Spring AI. If you use a custom embedding model to convert images to vectors, wouldn't it be better to use the `QdrantClient` directly to add the vector data? What do you think?

@dev-jonghoonpark Even with the official `spring-ai-starter-model-ollama` starter, a media-only `Document` still triggers a `NullPointerException` before the data ever reaches Qdrant. The problem lives in `DefaultContentFormatter.format(…)`, which calls `document.getText()` unconditionally.
```
java.lang.NullPointerException: Cannot invoke "java.lang.CharSequence.toString()" because "replacement" is null
    at java.base/java.lang.String.replace(String.java:3164)
    at org.springframework.ai.document.DefaultContentFormatter.format(DefaultContentFormatter.java:116)
    at org.springframework.ai.document.Document.getFormattedContent(Document.java:229)
    at org.springframework.ai.embedding.TokenCountBatchingStrategy.batch(TokenCountBatchingStrategy.java:148)
    at org.springframework.ai.embedding.EmbeddingModel.embed(EmbeddingModel.java:87)
    at org.springframework.ai.vectorstore.qdrant.QdrantVectorStore.doAdd(QdrantVectorStore.java:179)
```

```java
// DefaultContentFormatter – line 111+
return this.textTemplate
        .replace(TEMPLATE_METADATA_STRING_PLACEHOLDER, metadataText)
        .replace(TEMPLATE_CONTENT_PLACEHOLDER, document.getText()); // ← null → NPE
```
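
The failure mode in the trace above is plain JDK behavior: `String.replace(CharSequence, CharSequence)` throws a `NullPointerException` when the replacement is `null`. The sketch below reproduces it in isolation and shows one possible null-safe variant. The class, the placeholder strings, and the template are assumptions chosen for illustration; they are not copied from `DefaultContentFormatter`.

```java
import java.util.Objects;

final class FormatterSketch {

    // Assumed placeholder names and template, for illustration only.
    static final String METADATA_PLACEHOLDER = "{metadata_string}";
    static final String CONTENT_PLACEHOLDER = "{content}";
    static final String TEMPLATE = METADATA_PLACEHOLDER + "\n\n" + CONTENT_PLACEHOLDER;

    // Mirrors the unguarded pattern: throws an NPE when text is null.
    static String formatUnguarded(String metadataText, String text) {
        return TEMPLATE
                .replace(METADATA_PLACEHOLDER, metadataText)
                .replace(CONTENT_PLACEHOLDER, text);
    }

    // One possible guard: substitute an empty string when there is no text.
    static String formatGuarded(String metadataText, String text) {
        return TEMPLATE
                .replace(METADATA_PLACEHOLDER, metadataText)
                .replace(CONTENT_PLACEHOLDER, Objects.requireNonNullElse(text, ""));
    }

    public static void main(String[] args) {
        System.out.println(formatGuarded("filename: pixel.png", null));
        try {
            formatUnguarded("filename: pixel.png", null);
        } catch (NullPointerException e) {
            System.out.println("unguarded format failed with a NullPointerException");
        }
    }
}
```

Substituting an empty string hides the missing text from the embedding step, so it only makes sense if embedding the metadata alone is acceptable; failing fast with a descriptive exception is the alternative.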
Comment From: dev-jonghoonpark
What I meant to say is that the scenario you're considering doesn't seem to fit well with the current Spring AI feature set.
That's why I suggested it might be better to just use `QdrantClient` directly.
If I've misunderstood something, please let me know.
Assuming we modify it in the way you want, how would you like to perform searches after storing the data in the vector database?
In the current code, the `similaritySearch` method in the vector database module only lets you query with a text-based input.
Comment From: devMtn30
@dev-jonghoonpark Hi 👋,
I may be misunderstanding, so I’d like to double-check.
**Scenario I'm testing**
- Pass an image into the official embedding starter (`org.springframework.ai:spring-ai-starter-model-ollama`, no custom model).
- The model returns a vector.
- Store that vector in a `VectorStore` (Qdrant).

In other words, the only difference from the "text" path is that the `Document` contains `media` instead of `text`.
**What actually happens**
Creating the `Document` itself works:

```java
Document.builder()
        .media(new Media(mime, resource))
        .metadata(Map.of("filename", filename))
        .build();
```

But as soon as I call `vectorStore.add(List.of(imgDoc))` (or even `embeddingModel.embed(List.of(imgDoc))`), `DefaultContentFormatter.format()` calls `document.getText()` unconditionally and a `NullPointerException` is thrown.
**My questions**
1. Is the simple "media → vector → store" workflow considered out of scope for Spring AI 1.0?
2. If so, is the current NPE the intended behavior, or should we:
   - a) guard against `null` and throw a clear `IllegalArgumentException("media-only documents not supported")`, or
   - b) update the docs/Javadoc to state that media-only `Document`s are not yet end-to-end supported?
The Javadoc for `Document` explicitly shows a media constructor and an example, so it feels like this should either work or fail with a clear message, rather than a low-level NPE.
Reference doc I followed: https://docs.spring.io/spring-ai/reference/api/vectordbs/qdrant.html
Thanks!
Comment From: dev-jonghoonpark
1. As far as I know, yes. It's probably impossible not just with Qdrant but with any vector store.
   I'm not sure about the plans going forward, as I'm just one of the contributors.
   If you're interested in implementing this feature, it seems that both storing media-only documents in the vector store and retrieving them would require new implementations.
2. > guard against `null` and throw a clear `IllegalArgumentException("media-only documents not supported")`

   That sounds like a good approach.
   It might be worth adding this to something like `AbstractObservationVectorStore`.
If you're interested in contributing, feel free to give it a try!
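
A fail-fast check along the lines discussed here could look roughly like the sketch below. It is self-contained and purely illustrative: `CandidateDocument` and `requireText` are hypothetical names, not Spring AI APIs, and where such a check would actually live (for example in `AbstractObservationVectorStore`) is left open.

```java
import java.util.List;

final class TextOnlyValidationSketch {

    // Hypothetical stand-in for a Document: text is null for media-only documents.
    record CandidateDocument(String id, String text) {}

    // Rejects the whole batch up front instead of failing later with a low-level NPE.
    static void requireText(List<CandidateDocument> documents) {
        for (CandidateDocument document : documents) {
            if (document.text() == null) {
                throw new IllegalArgumentException(
                        "media-only documents not supported: " + document.id());
            }
        }
    }

    public static void main(String[] args) {
        requireText(List.of(new CandidateDocument("doc-1", "some text"))); // passes
        try {
            requireText(List.of(new CandidateDocument("img-1", null)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Validating before any embedding call means a mixed batch is rejected as a unit, which matches the batch-abort behavior described in the issue but with a message that names the offending document.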
Comment From: ilayaperumalg
@devMtn30 Thanks for the detailed writeup on the issue. Please submit a PR with your suggestion to fix this issue. @dev-jonghoonpark Thanks for the review/comments.
Comment From: ilayaperumalg
Adding some additional information:
@devMtn30 Please feel free to submit a PR to fix the NPE issue by throwing an exception when embedding non-text media documents.
Meanwhile, we are discussing a better plan to support non-text embeddings. We will keep you posted on the progress. Thanks everyone!