🔍 Motivation

Current VectorStore implementations (e.g., ChromaVectorStore, PgVectorStore) automatically compute embeddings from Document.content via the configured EmbeddingModel. This rigid behavior is limiting in real-world applications where:

  1. Embeddings are precomputed externally using fine-tuned or specialized models (offline pipelines).
  2. Embeddings may represent a prompt, summary, or condensed form, not the entire content.
  3. Structured data (e.g., JSON) may be stored as content, but embedding the full structure reduces semantic quality.

✅ What This Proposal Adds

This feature introduces support for user-provided embeddings at ingestion time, improving flexibility and performance. Highlights include:

  • Overloaded add(List, List) method in the VectorStore interface.
  • AbstractObservationVectorStore refactored to call a centralized doAdd with validation.
  • Embedding generation logic removed from VectorStore doAdd() implementations; doAdd() no longer computes embeddings itself and must receive them explicitly.
  • No need to modify the Document model.
  • No extra user config required for backward-compatible usage (existing add(List) continues to auto-embed).
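A minimal sketch of how the pieces listed above could fit together. Everything here is an assumption for illustration: Document and EmbeddingModel are simplified stand-ins rather than the framework's real types, float[] is an assumed element type for the embeddings list, and InMemoryVectorStore is a hypothetical toy store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the framework types; shapes are assumed, not the real API.
record Document(String id, String content) {}

interface EmbeddingModel {
    float[] embed(String text);
}

interface VectorStore {
    // Existing entry point: embeddings are computed from Document.content.
    void add(List<Document> documents);

    // Proposed overload: the caller supplies one embedding per document.
    void add(List<Document> documents, List<float[]> embeddings);
}

abstract class AbstractObservationVectorStore implements VectorStore {
    private final EmbeddingModel embeddingModel;

    protected AbstractObservationVectorStore(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void add(List<Document> documents) {
        // Backward-compatible path: embed here, then reuse the explicit overload.
        List<float[]> embeddings = new ArrayList<>();
        for (Document doc : documents) {
            embeddings.add(embeddingModel.embed(doc.content()));
        }
        add(documents, embeddings);
    }

    @Override
    public void add(List<Document> documents, List<float[]> embeddings) {
        // Centralized check before any store-specific code runs.
        if (documents.size() != embeddings.size()) {
            throw new IllegalArgumentException("Expected one embedding per document");
        }
        doAdd(documents, embeddings);
    }

    // Store-specific persistence only; no embedding generation happens here.
    protected abstract void doAdd(List<Document> documents, List<float[]> embeddings);
}

// Toy in-memory store, used only to exercise the sketch.
class InMemoryVectorStore extends AbstractObservationVectorStore {
    final List<String> ids = new ArrayList<>();
    final List<float[]> vectors = new ArrayList<>();

    InMemoryVectorStore(EmbeddingModel embeddingModel) {
        super(embeddingModel);
    }

    @Override
    protected void doAdd(List<Document> documents, List<float[]> embeddings) {
        for (int i = 0; i < documents.size(); i++) {
            ids.add(documents.get(i).id());
            vectors.add(embeddings.get(i));
        }
    }
}
```

In this sketch both add variants funnel into the same doAdd, so a store implementation only ever sees documents paired with ready-made vectors, whether they were computed by the configured model or supplied by the caller.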

⚙️ Implementation Benefits

  • Clean separation of embedding generation from storage logic.
  • Maintains full backward compatibility.
  • Enables efficient batch ingestion using external embedding workflows.

📎 Related Work

  • #1600 – Discusses the need for prompt-based or user-controlled embedding logic.
  • #1239 – Adds prompt-based embedding, but doesn't support full injection of embeddings per document.

✅ Acceptance Criteria

  • Overloaded add(documents, embeddings) method available in all VectorStore implementations.
  • Embedding validation (dimension, NaN/Inf check) is done before ingestion.
  • If add(documents) is called, embeddings are generated as before.
  • Supports batching where applicable (no batching enforced by user; store decides).
  • Works out-of-the-box for existing stores (e.g., Pinecone, PGVector, Milvus).
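The validation criterion (dimension and NaN/Inf checks before ingestion) could look roughly like the sketch below. EmbeddingValidator and its parameters are hypothetical names, float[] is an assumed embedding type, and expectedDimension is assumed to come from the store or model configuration.

```java
import java.util.List;

// Hypothetical helper; not part of any existing API.
final class EmbeddingValidator {
    private EmbeddingValidator() {}

    static void validate(List<float[]> embeddings, int documentCount, int expectedDimension) {
        if (embeddings.size() != documentCount) {
            throw new IllegalArgumentException(
                "Expected " + documentCount + " embeddings but got " + embeddings.size());
        }
        for (int i = 0; i < embeddings.size(); i++) {
            float[] vector = embeddings.get(i);
            if (vector == null || vector.length != expectedDimension) {
                throw new IllegalArgumentException(
                    "Embedding " + i + " does not have dimension " + expectedDimension);
            }
            for (float value : vector) {
                // Float.isFinite is false for NaN and for positive or negative infinity.
                if (!Float.isFinite(value)) {
                    throw new IllegalArgumentException(
                        "Embedding " + i + " contains a NaN or infinite value");
                }
            }
        }
    }
}
```

Failing fast with IllegalArgumentException is one possible design; it rejects the whole batch before anything is written, which avoids partially ingested data.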