🔍 Motivation

Current VectorStore implementations (e.g., ChromaVectorStore, PgVectorStore) automatically compute embeddings from Document.content via the configured EmbeddingModel. This rigid behavior is limiting in real-world applications where:

  1. Embeddings are precomputed externally using fine-tuned or specialized models (offline pipelines).
  2. Embeddings may represent a prompt, summary, or condensed form, not the entire content.
  3. Structured data (e.g., JSON) may be stored as content, but embedding the full structure reduces semantic quality.

✅ What This Proposal Adds

This feature introduces support for user-provided embeddings at ingestion time, improving flexibility and performance. Highlights include:

  • Overloaded add(List, List) method in the VectorStore interface.
  • AbstractObservationVectorStore refactored to call a centralized doAdd with validation.
  • Embedding generation logic removed from the VectorStore doAdd() implementations; doAdd() instead receives the embeddings explicitly.
  • No need to modify the Document model.
  • No extra user config required for backward-compatible usage (existing add(List) continues to auto-embed).
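The shape of the proposed overload can be sketched roughly as follows. This is a minimal illustration, not the actual Spring AI code: `Doc`, `Embedder`, `SketchVectorStore`, and `InMemorySketchStore` are hypothetical, simplified stand-ins for `Document`, `EmbeddingModel`, `AbstractObservationVectorStore`, and a concrete store, and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; not the real Spring AI Document or EmbeddingModel.
record Doc(String id, String content) {}

interface Embedder {
    float[] embed(String text);
}

abstract class SketchVectorStore {
    private final Embedder embedder;

    SketchVectorStore(Embedder embedder) {
        this.embedder = embedder;
    }

    // Existing entry point: embeddings are still derived from the document content.
    public final void add(List<Doc> documents) {
        List<float[]> embeddings = new ArrayList<>();
        for (Doc doc : documents) {
            embeddings.add(embedder.embed(doc.content()));
        }
        add(documents, embeddings);
    }

    // Proposed overload: the caller supplies one embedding per document.
    public final void add(List<Doc> documents, List<float[]> embeddings) {
        if (documents.size() != embeddings.size()) {
            throw new IllegalArgumentException("Expected one embedding per document");
        }
        doAdd(documents, embeddings);
    }

    // Store-specific persistence; no embedding generation happens here.
    protected abstract void doAdd(List<Doc> documents, List<float[]> embeddings);
}

// Toy in-memory store, used only to make the sketch runnable.
class InMemorySketchStore extends SketchVectorStore {
    final Map<String, float[]> vectors = new LinkedHashMap<>();

    InMemorySketchStore(Embedder embedder) {
        super(embedder);
    }

    @Override
    protected void doAdd(List<Doc> documents, List<float[]> embeddings) {
        for (int i = 0; i < documents.size(); i++) {
            vectors.put(documents.get(i).id(), embeddings.get(i));
        }
    }
}
```

With this shape, a caller who has a structured document plus a precomputed summary embedding would pass both to `add(documents, embeddings)`, while existing callers of `add(documents)` keep the auto-embedding behavior.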

⚙️ Implementation Benefits

  • Clean separation of embedding generation from storage logic.
  • Maintains full backward compatibility.
  • Enables efficient batch ingestion using external embedding workflows.

📎 Related Work

  • #1600 – Discusses the need for prompt-based or user-controlled embedding logic.
  • #1239 – Adds prompt-based embedding, but doesn't support full injection of embeddings per document.

✅ Acceptance Criteria

  • Overloaded add(documents, embeddings) method available in all VectorStore implementations.
  • Embedding validation (dimension, NaN/Inf check) is done before ingestion.
  • If add(documents) is called, embeddings are generated as before.
  • Supports batching where applicable (no batching enforced by user; store decides).
  • Works out-of-the-box for existing stores (e.g., Pinecone, PGVector, Milvus).
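The validation criterion (dimension and NaN/Inf checks before ingestion) could look something like the sketch below. `EmbeddingValidator` and its `validate` method are hypothetical names introduced for illustration, and the expected dimension is assumed to be known to the store (for example from its index configuration).

```java
import java.util.List;

// Hypothetical helper; the name and signature are assumptions, not part of Spring AI.
final class EmbeddingValidator {

    private EmbeddingValidator() {
    }

    // Rejects the batch before anything is written to the store.
    static void validate(int documentCount, List<float[]> embeddings, int expectedDimension) {
        if (embeddings.size() != documentCount) {
            throw new IllegalArgumentException(
                    "Got " + embeddings.size() + " embeddings for " + documentCount + " documents");
        }
        for (int i = 0; i < embeddings.size(); i++) {
            float[] vector = embeddings.get(i);
            if (vector == null || vector.length != expectedDimension) {
                throw new IllegalArgumentException(
                        "Embedding " + i + " does not have dimension " + expectedDimension);
            }
            for (float value : vector) {
                if (Float.isNaN(value) || Float.isInfinite(value)) {
                    throw new IllegalArgumentException(
                            "Embedding " + i + " contains NaN or an infinite value");
                }
            }
        }
    }
}
```

Running the check once in the shared add path, before the store-specific doAdd, would keep every store from duplicating it.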

Comment From: dev-jonghoonpark

Instead of modifying the vector store, it seems more appropriate to implement a custom class that extends the AbstractEmbeddingModel.

What do you think?

Comment From: aniketg-21

I think the approach above works well with no additional setup and supports both user-provided and auto-generated embeddings. The main issue with the existing behavior is that the doAdd method generates embeddings from the document content. Say a user has a JSON/XML document along with its summary: it would make more sense to create the embedding from the summary rather than from the document content itself. As you said, we can implement a custom class that extends AbstractEmbeddingModel, but even then the embeddings are still generated from the document content and not from its summary.

Comment From: aniketg-21

Currently the embedding model class has two methods: I can embed either a String or a Document. Now let's say I create a Document object to insert into the store, and its content holds structured data for further use after retrieval. When I call the add method, it generates the embeddings from the document content, because the doAdd method only takes Document objects. If I need to store the summary's embedding while keeping the Document as is, how would modifying the embedding model help?