Summary
Spring Web's UriComponents and UriComponentsBuilder have fundamental design issues that make them non-compliant with RFC 3986 and cause information loss. I'm willing to work on a PR to address these issues, but would like maintainer feedback on:
1. Whether you agree these are issues worth fixing
2. Whether breaking changes are acceptable
3. The preferred approach for fixes
Issue 1: Inappropriate Opaque vs Hierarchical URI Distinction
The Problem
Spring separates URIs internally into OpaqueUriComponents and HierarchicalUriComponents, claiming to follow RFC 3986 (see javadoc references at OpaqueUriComponents.java:38 and HierarchicalUriComponents.java:52). However, this distinction does not exist in RFC 3986.
What the Specifications Say
RFC 2396 (Obsolete - August 1998):
- Explicitly defined "opaque" and "hierarchical" as mutually exclusive URI types
- Hierarchical URIs: use "/" to separate components
- Opaque URIs: don't use "/" for hierarchical separation
RFC 3986 (Current - January 2005):
- Does not define opaque and hierarchical as distinct URI types (This is also true in WHATWG URL)
- Defines a single generic syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
- Only mentions in passing: "for some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing" (RFC 3986 Section 1.2.3)
- This is descriptive commentary, not a prescription for two URI classes
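To make the single generic syntax concrete, here is a minimal sketch that splits a URI with the regular expression given in RFC 3986, Appendix B. The class name Rfc3986Split and its split method are hypothetical names used only for illustration; they are not part of Spring.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: Rfc3986Split is a hypothetical helper, not a Spring class.
final class Rfc3986Split {

    // Regular expression from RFC 3986, Appendix B.
    private static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uri) {
        Matcher m = URI_REFERENCE.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a URI reference: " + uri);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/a/b?x=1#top",
            "mailto:user@example.com?subject=Hello",
            "urn:isbn:0451450523"
        };
        for (String uri : samples) {
            System.out.println(uri + " -> " + Arrays.toString(split(uri)));
        }
    }
}
```

With this split, the mailto URI has a path (user@example.com) and a query (subject=Hello) even though it has no authority, which is the point of the generic syntax: the same five components apply to every scheme.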
Current Implementation Issues
The RfcUriParser at line 394 implements:
private static final Set<String> hierarchicalSchemes = Set.of("ftp", "file", "http", "https", "ws", "wss");
// Later:
this.isOpaque = (this.uri.charAt(this.index) != '/' && !hierarchicalSchemes.contains(this.scheme));
This conflates concepts from different specifications:
1. RFC 3986 is scheme-agnostic and doesn't define opaque/hierarchical distinction
2. The "special schemes" concept (which enables some host normalization) exists in WHATWG URL, not RFC 3986
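The effect of that condition is easier to see on sample inputs. The following is a stand-alone reproduction of the quoted check, not Spring's actual parser: the class name OpaqueCheckSketch is hypothetical, and it assumes the index points at the first character after "scheme:" and that the scheme is compared without case normalization.

```java
import java.util.Set;

// Hypothetical stand-alone reproduction of the quoted check; not Spring's parser.
final class OpaqueCheckSketch {

    private static final Set<String> HIERARCHICAL_SCHEMES =
            Set.of("ftp", "file", "http", "https", "ws", "wss");

    /** Applies the quoted condition to the first character after "scheme:". */
    static boolean isOpaque(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0 || colon + 1 >= uri.length()) {
            throw new IllegalArgumentException("Expected scheme ':' rest, got: " + uri);
        }
        String scheme = uri.substring(0, colon);
        // Assumption: no case normalization of the scheme before the lookup.
        return uri.charAt(colon + 1) != '/' && !HIERARCHICAL_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        String[] samples = {
            "mailto:user@example.com?subject=Hello",
            "custom:a/b?x=1",
            "custom:/a/b?x=1",
            "http:a/b?x=1"
        };
        for (String uri : samples) {
            System.out.println(uri + " -> opaque=" + isOpaque(uri));
        }
    }
}
```

Under these assumptions, custom:a/b?x=1 and http:a/b?x=1 have the same shape in RFC 3986 terms (a rootless path plus a query), yet only the first is classified as opaque, purely because of the scheme list.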
Consequences
- Conceptual confusion: The distinction doesn't match any single specification
- Functionality loss: OpaqueUriComponents returns null for query, path, host, etc. (see lines 60-92). Under RFC 3986, these formerly "opaque" URIs do have a path and can have a query, and they are parsed by the same generic syntax as any other URI
Issue 2: Query String Information Loss
The Problem
UriComponents discards the original query string and only stores parsed query parameters (HierarchicalUriComponents.java:110):
private final MultiValueMap<String, String> queryParams;
The getQuery() method reconstructs the query string from parameters (lines 209-236), which causes information loss.
What RFC 3986 Says
Query component syntax (RFC 3986 Section 3.4):
query = *( pchar / "/" / "?" )
The specification states:
"The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource."
Critically: RFC 3986 does NOT specify x-www-form-urlencoded format or key-value pairs. The query component is intentionally generic. Key-value pairs are just the most common convention, not part of the generic URI syntax.
Consequences
- Information loss: Cannot roundtrip URIs with non-standard query formats
- Order loss: Currently loses entry order (no LinkedListHashMap)
- Not generic: Assumes x-www-form-urlencoded format, violating RFC 3986's generic nature
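The following sketch shows, in general terms, why a "parse into a name-to-values map, then rebuild" model cannot round-trip every query. It is a simplified model written for this issue, not Spring's code: QueryRoundTripSketch, parse and rebuild are hypothetical names, and percent-encoding is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical model of "parse, then rebuild"; it is not Spring's implementation.
final class QueryRoundTripSketch {

    /** Groups values by name in first-seen name order; a pair without '=' gets a null value. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // empty pairs such as the middle of "x&&y" are dropped
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? null : pair.substring(eq + 1);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    /** Rebuilds a query string from the grouped parameters. */
    static String rebuild(Map<String, List<String>> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((name, values) -> {
            for (String value : values) {
                joiner.add(value == null ? name : name + "=" + value);
            }
        });
        return joiner.toString();
    }

    public static void main(String[] args) {
        for (String query : new String[] {"a=1&b=2", "a=1&b=2&a=3", "x&&y"}) {
            System.out.println(query + " -> " + rebuild(parse(query)));
        }
    }
}
```

In this model a=1&b=2 survives unchanged, but a=1&b=2&a=3 comes back with the two a values grouped together, and x&&y loses its empty pair. Any design that keeps only the parsed form has to pick some normalization like this, which is why the original string needs to be stored as well.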
Better Approach
Similar to JavaScript's URL object:
- Store both the query string AND parsed parameters
- Keep them synchronized on updates
- Query string is the source of truth
- Parameters provide convenient access API
- Modification via parameters API normalizes the query string automatically
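A rough sketch of what such dual storage could look like is below. QueryComponent and its methods are hypothetical names rather than an existing or proposed Spring API, the parsed view is recomputed on each call instead of cached, and percent-encoding is omitted to keep the idea visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: raw string as source of truth, parameters as a derived view.
final class QueryComponent {

    private String raw; // stored exactly as given until a parameter edit happens

    QueryComponent(String raw) {
        this.raw = raw;
    }

    /** The query string as stored; untouched unless append() was called. */
    String raw() {
        return raw;
    }

    /** Parsed {name, value} entries in their original order; value is null when '=' is absent. */
    List<String[]> params() {
        List<String[]> entries = new ArrayList<>();
        if (raw == null) {
            return entries;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            entries.add(eq < 0
                    ? new String[] {pair, null}
                    : new String[] {pair.substring(0, eq), pair.substring(eq + 1)});
        }
        return entries;
    }

    /** Editing through the parameter API rewrites, and thereby normalizes, the raw string. */
    void append(String name, String value) {
        List<String[]> entries = params();
        entries.add(new String[] {name, value});
        StringBuilder sb = new StringBuilder();
        for (String[] entry : entries) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(entry[0]);
            if (entry[1] != null) {
                sb.append('=').append(entry[1]);
            }
        }
        raw = sb.toString();
    }
}
```

Read-only use never alters the stored string, so a query such as a=1&&b=2&a=3 round-trips byte for byte. Only a call like append("c", "4") triggers normalization, after which the stored string would be a=1&b=2&a=3&c=4.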
Proposed Solutions
Option 1: Breaking Change
- Merge OpaqueUriComponents into HierarchicalUriComponents: Use a single UriComponents implementation that follows RFC 3986 generic syntax (these classes are not part of the public API, yet some externally visible behavior would change)
- Write conversion to java.net.URI: In toURI(), convert the RFC 3986 structure to the java.net.URI structure. This seems straightforward
- Store both query string and parameters: Add dual storage with synchronization. I have almost finished implementing this; it is not straightforward, but manageable
- Use LinkedListHashMap for query parameter order preservation
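On the java.net.URI conversion in Option 1: java.net.URI implements the older RFC 2396 model, so it keeps its own opaque/hierarchical split. The sketch below only inspects JDK behavior (UriInteropSketch is a hypothetical name) and is meant to show what a toURI() conversion would be mapping onto.

```java
import java.net.URI;

// Demonstrates JDK behavior only; UriInteropSketch is a hypothetical name.
final class UriInteropSketch {

    /** Describes how java.net.URI classifies the given URI string. */
    static String describe(String uri) {
        URI parsed = URI.create(uri);
        return "opaque=" + parsed.isOpaque()
                + ", path=" + parsed.getPath()
                + ", query=" + parsed.getQuery()
                + ", ssp=" + parsed.getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        System.out.println(describe("mailto:user@example.com?subject=Hello"));
        System.out.println(describe("https://example.com/a?x=1"));
    }
}
```

For the mailto input, java.net.URI reports the URI as opaque with a null path and query, and folds everything after the scheme into the scheme-specific part. A unified RFC 3986 representation can therefore still produce a java.net.URI from its string form; the path and query simply end up inside the scheme-specific part on the JDK side.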
Option 2: Non-Breaking
Add a separate, well-designed implementation alongside the existing one? Or keep the existing behavior and document how it deviates from the spec?
Questions for Maintainers
- Do you agree these are issues worth addressing?
- Which approach would you prefer? Is a breaking change acceptable?