Summary

Spring Web's UriComponents and UriComponentsBuilder have fundamental design issues that make them non-compliant with RFC 3986 and cause information loss. I'm willing to work on a PR to address these issues, but would like maintainer feedback on:

  1. Whether you agree these are issues worth fixing
  2. Whether breaking changes are acceptable
  3. The preferred approach for fixes

Issue 1: Inappropriate Opaque vs Hierarchical URI Distinction

The Problem

Spring separates URIs internally into OpaqueUriComponents and HierarchicalUriComponents, claiming to follow RFC 3986 (see javadoc references at OpaqueUriComponents.java:38 and HierarchicalUriComponents.java:52). However, this distinction does not exist in RFC 3986.

What the Specifications Say

RFC 2396 (Obsolete - August 1998):

  - Explicitly defined "opaque" and "hierarchical" as mutually exclusive URI types
  - Hierarchical URIs: use "/" to separate components
  - Opaque URIs: don't use "/" for hierarchical separation

RFC 3986 (Current - January 2005):

  - Does not define opaque and hierarchical as distinct URI types (this is also true in WHATWG URL)
  - Defines a single generic syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
  - Only mentions in passing: "for some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing" (RFC 3986 Section 1.2.3)
  - This is descriptive commentary, not a prescription for two URI classes

Current Implementation Issues

The RfcUriParser at line 394 implements:

private static final Set<String> hierarchicalSchemes = Set.of("ftp", "file", "http", "https", "ws", "wss");

// Later:
this.isOpaque = (this.uri.charAt(this.index) != '/' && !hierarchicalSchemes.contains(this.scheme));

This conflates concepts from different specifications:

  1. RFC 3986 is scheme-agnostic and doesn't define opaque/hierarchical distinction
  2. The "special schemes" concept (which enables some host normalization) exists in WHATWG URL, not RFC 3986

Consequences

  1. Conceptual confusion: The distinction doesn't match any single specification
  2. Functionality loss: OpaqueUriComponents returns null for query, path, host, etc. (see lines 60-92). Under RFC 3986, these formerly "opaque" URIs do have a path and may have a query, and they are parsed the same way as any other URI
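
To illustrate the point about the generic syntax, here is a minimal sketch that splits a URI with the regular expression given in RFC 3986, Appendix B. The class and method names (GenericUriSplit, split) are made up for this example and are not part of Spring; the sketch only separates components and does no validation or decoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenericUriSplit {

    // Regular expression from RFC 3986, Appendix B.
    private static final Pattern RFC_3986 = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uri) {
        Matcher m = RFC_3986.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a URI reference: " + uri);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        String[] samples = {"mailto:user@example.com?subject=Hi", "http://example.com/a/b?x=1#top"};
        for (String uri : samples) {
            String[] c = split(uri);
            System.out.printf("%s -> scheme=%s authority=%s path=%s query=%s fragment=%s%n",
                    uri, c[0], c[1], c[2], c[3], c[4]);
        }
    }
}
```

For mailto:user@example.com?subject=Hi this yields the path user@example.com and the query subject=Hi, with no authority, using exactly the same rule that splits the http URI.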

Issue 2: Query String Information Loss

The Problem

UriComponents discards the original query string and only stores parsed query parameters (HierarchicalUriComponents.java:110):

private final MultiValueMap<String, String> queryParams;

The getQuery() method reconstructs the query string from parameters (lines 209-236), which causes information loss.

What RFC 3986 Says

Query component syntax (RFC 3986 Section 3.4):

query = *( pchar / "/" / "?" )

The specification states:

"The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource."

Critically: RFC 3986 does NOT specify x-www-form-urlencoded format or key-value pairs. The query component is intentionally generic. Key-value pairs are just the most common convention, not part of the generic URI syntax.

Consequences

  1. Information loss: Cannot roundtrip URIs with non-standard query formats
  2. Order loss: Currently loses entry order (no LinkedHashMap)
  3. Not generic: Assumes x-www-form-urlencoded format, violating RFC 3986's generic nature
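
A rough sketch of why a parameters-only model cannot roundtrip every valid query. The parser below is hypothetical and deliberately naive; it is not Spring's actual parsing code, and QueryRoundTrip, parse and rebuild are names invented for this example. It groups values by name, as a multi-value map does, and then rebuilds the string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class QueryRoundTrip {

    /** Hypothetical key-value parser: splits on '&', then on the first '='. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // an empty pair ("&&") has no place in a key-value model
            }
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? null : pair.substring(eq + 1);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    /** Rebuilds a query string from the parsed parameters. */
    static String rebuild(Map<String, List<String>> params) {
        StringJoiner joined = new StringJoiner("&");
        params.forEach((name, values) -> {
            for (String value : values) {
                joined.add(value == null ? name : name + "=" + value);
            }
        });
        return joined.toString();
    }

    public static void main(String[] args) {
        for (String query : new String[] {"a=1&&b=2", "a=1&b=2&a=3"}) {
            System.out.println(query + " -> " + rebuild(parse(query)));
        }
    }
}
```

Both inputs are valid RFC 3986 queries, yet neither survives: a=1&&b=2 comes back as a=1&b=2, and a=1&b=2&a=3 comes back as a=1&a=3&b=2 because values are grouped by name even when key insertion order is kept.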

Better Approach

Similar to JavaScript's URL object:

  - Store both the query string AND parsed parameters
  - Keep them synchronized on updates
  - Query string is the source of truth
  - Parameters provide convenient access API
  - Modification via parameters API normalizes the query string automatically
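
The approach above could look roughly like the following sketch. QueryComponent and its methods are hypothetical names, not existing Spring API, and percent-encoding is left out to keep the example short. The raw string is returned untouched until a parameter edit replaces it, at which point it is re-serialized in normalized form.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical holder in which the raw query string is the source of truth. */
class QueryComponent {

    private String raw; // as it appeared in the URI, without the leading '?'; null if absent

    QueryComponent(String raw) {
        this.raw = raw;
    }

    /** Returns the original query string unless a parameter edit has replaced it. */
    String getQuery() {
        return raw;
    }

    /** Parsed view computed on demand; an ordered list keeps interleaved duplicates in place. */
    List<Map.Entry<String, String>> getParams() {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        if (raw == null) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // skipped in the parsed view only; the raw string still has it
            }
            int eq = pair.indexOf('=');
            params.add(eq < 0
                    ? entry(pair, null)
                    : entry(pair.substring(0, eq), pair.substring(eq + 1)));
        }
        return params;
    }

    /** Editing through the parameter API re-serializes, and so normalizes, the raw string. */
    void addParam(String name, String value) {
        List<Map.Entry<String, String>> params = getParams();
        params.add(entry(name, value));
        raw = params.stream()
                .map(e -> e.getValue() == null ? e.getKey() : e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    private static Map.Entry<String, String> entry(String name, String value) {
        return new SimpleImmutableEntry<>(name, value);
    }
}
```

With this shape, new QueryComponent("a=1&&b=2").getQuery() still returns a=1&&b=2, while calling addParam("c", "3") first turns it into a=1&b=2&c=3.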

Proposed Solutions

Option 1: Breaking Change

  1. Merge OpaqueUriComponents into HierarchicalUriComponents: Use a single UriComponents implementation that follows the RFC 3986 generic syntax (these classes are not part of the public API, though some externally visible behavior would change)
  2. Write conversion to java.net.URI: In toUri(), convert the RFC 3986 structure to the java.net.URI structure. This seems straightforward.
  3. Store both query string and parameters: Add dual storage with synchronization. I have almost finished implementing this; it is not straightforward but manageable.
  4. Use LinkedHashMap for query parameter order preservation
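
For step 2, one possible shape of the conversion is to recompose the components as described in RFC 3986 Section 5.3 and let java.net.URI parse the result. This is only a sketch under the assumption that every component is already percent-encoded; Rfc3986ToJavaNetUri and toJavaNetUri are hypothetical names, and URI.create throws IllegalArgumentException for strings java.net.URI rejects.

```java
import java.net.URI;

class Rfc3986ToJavaNetUri {

    /** Recomposes the five RFC 3986 components (Section 5.3); path must be non-null. */
    static URI toJavaNetUri(String scheme, String authority, String path,
                            String query, String fragment) {
        StringBuilder sb = new StringBuilder();
        if (scheme != null) {
            sb.append(scheme).append(':');
        }
        if (authority != null) {
            sb.append("//").append(authority);
        }
        sb.append(path);
        if (query != null) {
            sb.append('?').append(query);
        }
        if (fragment != null) {
            sb.append('#').append(fragment);
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI uri = toJavaNetUri("mailto", null, "user@example.com", "subject=Hi", null);
        // java.net.URI follows the older RFC 2396 model, so it reports this URI as opaque.
        System.out.println(uri.isOpaque());
        System.out.println(uri.getQuery());
        System.out.println(uri.getRawSchemeSpecificPart());
    }
}
```

The string form roundtrips, but java.net.URI exposes the path and query of such a URI only as part of the scheme-specific part, so the mapping between the two structures is lossless in text and asymmetric in accessors.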

Option 2: Non-Breaking

Add a separate, well-designed implementation? Or keep the existing behavior and document how it deviates from the spec?

Questions for Maintainers

  1. Do you agree these are issues worth addressing?
  2. Which approach would you prefer? Is a breaking change acceptable?

References