Spring AI TextSplitter loses Document score and parent tracking information during splitting

Expected Behavior

When splitting documents using TextSplitter, all original document properties should be preserved in the resulting chunks, and users should be able to track the relationship between chunks and their parent documents. This is essential for RAG (Retrieval-Augmented Generation) systems that need to:

Maintain document relevance scores across chunks
Reconstruct original documents from chunks
Group search results by source document
Provide proper attribution and traceability

Users should be able to:

// Split a document with score and metadata
Document originalDoc = Document.builder()
    .text("Long document content...")
    .score(0.95)
    .metadata(Map.of("source", "report.pdf", "author", "John Doe"))
    .build();

List<Document> chunks = textSplitter.split(originalDoc);

// Access preserved score
chunks.get(0).getScore(); // Should return 0.95

// Track parent document
String parentId = (String) chunks.get(0).getMetadata().get("parent_document_id");
int chunkIndex = (Integer) chunks.get(0).getMetadata().get("chunk_index");
int totalChunks = (Integer) chunks.get(0).getMetadata().get("total_chunks");

// Reconstruct document order
List<Document> orderedChunks = chunks.stream()
    .filter(chunk -> parentId.equals(chunk.getMetadata().get("parent_document_id")))
    .sorted((a, b) -> Integer.compare(
        (Integer) a.getMetadata().get("chunk_index"),
        (Integer) b.getMetadata().get("chunk_index")
    ))
    .toList();
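
With the proposed metadata keys in place, grouping search results by source document and rebuilding each document's text could look roughly like the sketch below. It is self-contained on purpose: `Chunk`, `ChunkGrouping`, `groupByParent`, and `joinText` are hypothetical names used only for illustration, and `Chunk` is a stand-in record rather than Spring AI's `Document`. The keys `parent_document_id` and `chunk_index` are the ones proposed in this issue, not existing library behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChunkGrouping {

    // Hypothetical stand-in for a split document: text plus metadata.
    record Chunk(String text, Map<String, Object> metadata) {}

    // Groups chunks by "parent_document_id" and sorts each group by "chunk_index".
    static Map<String, List<Chunk>> groupByParent(List<Chunk> chunks) {
        Map<String, List<Chunk>> byParent = new LinkedHashMap<>();
        for (Chunk chunk : chunks) {
            String parentId = (String) chunk.metadata().get("parent_document_id");
            byParent.computeIfAbsent(parentId, key -> new ArrayList<>()).add(chunk);
        }
        for (List<Chunk> group : byParent.values()) {
            group.sort(Comparator.comparingInt(c -> (Integer) c.metadata().get("chunk_index")));
        }
        return byParent;
    }

    // Concatenates the text of an already ordered group of chunks.
    static String joinText(List<Chunk> orderedGroup) {
        StringBuilder text = new StringBuilder();
        for (Chunk chunk : orderedGroup) {
            text.append(chunk.text());
        }
        return text.toString();
    }
}
```

Plain concatenation only reproduces the original text if the splitter emits non-overlapping chunks and drops no characters at the boundaries; a splitter that overlaps or trims chunks would need extra offset metadata.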

Current Behavior

The current TextSplitter implementation has significant limitations that impact RAG system functionality:

Property Loss: Document score values are completely lost during splitting, making it impossible to maintain relevance rankings
Missing Traceability: There's no way to determine which original document a chunk came from, breaking document attribution
No Chunk Context: Users cannot determine chunk position or total count, making document reconstruction impossible
Incomplete Implementation: The TODO comment "copy over other properties" indicates known missing functionality

Current behavior results in:

Document originalDoc = Document.builder()
    .text("Content...")
    .score(0.95)  // This score is lost
    .build();

List<Document> chunks = textSplitter.split(originalDoc);
chunks.get(0).getScore(); // Returns null instead of 0.95
chunks.get(0).getMetadata(); // Missing parent tracking information

This forces developers to implement workarounds like:

- Manually tracking document relationships in external data structures
- Re-implementing scoring logic after splitting
- Using complex metadata schemes to maintain document context
- Building custom chunk management systems
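
For illustration, a workaround of this kind might look like the following sketch, which wraps a splitting function and copies the parent's score and tracking metadata onto every chunk. Everything here is an assumption made for the example: `Doc` is a stand-in record and not Spring AI's `Document`, the splitter is modeled as a plain `Function<String, List<String>>`, and `TrackingSplit`, `splitWithTracking`, and the `#index` chunk id scheme are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TrackingSplit {

    // Hypothetical stand-in for a document; not Spring AI's Document class.
    record Doc(String id, String text, Double score, Map<String, Object> metadata) {}

    // Splits the parent's text and copies score plus tracking metadata onto each chunk.
    static List<Doc> splitWithTracking(Doc parent, Function<String, List<String>> splitter) {
        List<String> pieces = splitter.apply(parent.text());
        List<Doc> chunks = new ArrayList<>();
        for (int i = 0; i < pieces.size(); i++) {
            // Start from a copy of the parent's metadata, then add the tracking keys.
            Map<String, Object> metadata = new HashMap<>(parent.metadata());
            metadata.put("parent_document_id", parent.id());
            metadata.put("chunk_index", i);
            metadata.put("total_chunks", pieces.size());
            // Assumed id scheme for the example: "<parentId>#<index>".
            chunks.add(new Doc(parent.id() + "#" + i, pieces.get(i), parent.score(), metadata));
        }
        return chunks;
    }
}
```

Every project that needs this currently has to write and maintain some variant of it, which is the duplication this issue asks the library to remove.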

Impact

This limitation significantly reduces the effectiveness of RAG systems because:

- Search Quality: Lost relevance scores mean chunks from high-quality documents aren't prioritized
- User Experience: Cannot provide proper source attribution or document context
- System Complexity: Forces developers to build custom tracking mechanisms
- Data Integrity: Risk of losing important document relationships and metadata

The current implementation essentially treats each chunk as an isolated document, breaking the semantic and contextual relationships that are crucial for effective information retrieval and generation.