Summary
Spring Web's UriComponents and UriComponentsBuilder have fundamental design issues that make them non-compliant with RFC 3986 and cause information loss. I'm willing to work on a PR to address these issues, but would like maintainer feedback on:
1. Whether you agree these are issues worth fixing
2. Whether breaking changes are acceptable
3. The preferred approach for fixes
Issue 1: Inappropriate Opaque vs Hierarchical URI Distinction
The Problem
Spring separates URIs internally into OpaqueUriComponents and HierarchicalUriComponents, claiming to follow RFC 3986 (see javadoc references at OpaqueUriComponents.java:38 and HierarchicalUriComponents.java:52). However, this distinction does not exist in RFC 3986.
What the Specifications Say
RFC 2396 (Obsolete - August 1998):
- Explicitly defined "opaque" and "hierarchical" as mutually exclusive URI types
- Hierarchical URIs: use "/" to separate components
- Opaque URIs: don't use "/" for hierarchical separation
RFC 3986 (Current - January 2005):
- Does not define opaque and hierarchical as distinct URI types (This is also true in WHATWG URL)
- Defines a single generic syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
- Only mentions in passing: "for some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing" (RFC 3986 Section 1.2.3)
- This is descriptive commentary, not a prescription for two URI classes
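To make the single generic syntax concrete, here is a minimal sketch that splits a URI with the non-validating regular expression given in RFC 3986, Appendix B. The class and method names are hypothetical and are not part of Spring or the JDK; the sketch only shows that the generic grammar yields a path and a query for a mailto URI just as it does for an http URI.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name), not part of Spring or the JDK.
final class Rfc3986Split {

    // Regular expression from RFC 3986, Appendix B. It splits, it does not validate.
    private static final Pattern URI_PATTERN = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot split: " + uri);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        String[] samples = {
            "mailto:user@example.com?subject=Hi",
            "http://example.com/a/b?x=1#top"
        };
        for (String uri : samples) {
            System.out.println(uri + " -> " + Arrays.toString(split(uri)));
        }
    }
}
```

For the mailto sample this yields the path user@example.com and the query subject=Hi, with no authority and no fragment.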
Current Implementation Issues
The RfcUriParser at line 394 implements:
private static final Set<String> hierarchicalSchemes = Set.of("ftp", "file", "http", "https", "ws", "wss");
// Later:
this.isOpaque = (this.uri.charAt(this.index) != '/' && !hierarchicalSchemes.contains(this.scheme));
This conflates concepts from different specifications:
1. RFC 3986 is scheme-agnostic and doesn't define an opaque/hierarchical distinction
2. The "special schemes" concept (which enables some host normalization) exists in WHATWG URL, not RFC 3986
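The effect of the check quoted above can be seen in a simplified, standalone re-creation. This is a sketch, not Spring's parser: it assumes the index in the original expression points at the first character after the scheme's ":", it does not model scheme case handling, and the class name is hypothetical.

```java
import java.util.Set;

// Simplified re-creation of the quoted predicate (hypothetical class, not Spring code).
final class OpaqueCheckSketch {

    private static final Set<String> HIERARCHICAL_SCHEMES =
            Set.of("ftp", "file", "http", "https", "ws", "wss");

    static boolean isOpaque(String uri) {
        int colon = uri.indexOf(':');
        if (colon <= 0 || colon == uri.length() - 1) {
            return false; // no scheme, or nothing after it: outside the scope of this sketch
        }
        String scheme = uri.substring(0, colon);
        // Assumption: the parser inspects the character right after the scheme delimiter.
        return uri.charAt(colon + 1) != '/' && !HIERARCHICAL_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        for (String uri : new String[] {"http:foo", "custom:foo", "custom:/foo", "mailto:a@b.example"}) {
            System.out.println(uri + " opaque? " + isOpaque(uri));
        }
    }
}
```

Under these assumptions, http:foo and custom:foo have the same syntactic shape but are classified differently, purely because of the scheme list.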
Consequences
- Conceptual confusion: The distinction doesn't match any single specification
- Functionality loss: OpaqueUriComponents returns null for query, path, host, etc. (see lines 60-92). Under RFC 3986, these formerly "opaque" URIs do have a path and can have a query, and they are parsed the same way as any other URI.
Issue 2: Query String Information Loss
The Problem
UriComponents discards the original query string and only stores parsed query parameters (HierarchicalUriComponents.java:110):
private final MultiValueMap<String, String> queryParams;
The getQuery() method reconstructs the query string from parameters (lines 209-236), which causes information loss.
What RFC 3986 Says
Query component syntax (RFC 3986 Section 3.4):
query = *( pchar / "/" / "?" )
The specification states:
"The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource."
Critically: RFC 3986 does NOT specify x-www-form-urlencoded format or key-value pairs. The query component is intentionally generic. Key-value pairs are just the most common convention, not part of the generic URI syntax.
Consequences
- Information loss: Cannot roundtrip URIs with non-standard query formats
- Order loss: Currently loses entry order (no LinkedListHashMap)
- Not generic: Assumes x-www-form-urlencoded format, violating RFC 3986's generic nature
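The kind of loss described above can be illustrated with a deliberately naive parse-then-rebuild model. This is not Spring's code and may not match its behavior for every input; the class and method names are hypothetical, and percent-encoding is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Naive model of "store only parsed parameters" (hypothetical class, not Spring code).
final class LossyQueryRoundTrip {

    /** Groups values by parameter name, keeping the order in which names first appear. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? null : pair.substring(eq + 1);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    /** Rebuilds a query string from the grouped parameters. */
    static String rebuild(Map<String, List<String>> params) {
        StringJoiner joiner = new StringJoiner("&");
        params.forEach((name, values) -> {
            for (String value : values) {
                joiner.add(value == null ? name : name + "=" + value);
            }
        });
        return joiner.toString();
    }

    public static void main(String[] args) {
        String original = "b=2&a=1&b=3";
        System.out.println(original + " -> " + rebuild(parse(original)));
    }
}
```

In this model, b=2&a=1&b=3 comes back as b=2&b=3&a=1: grouping by name keeps each name's values together, so the interleaving of the original entries cannot be recovered.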
Better Approach
Similar to JavaScript's URL object:
- Store both the query string AND parsed parameters
- Keep them synchronized on updates
- Query string is the source of truth
- Parameters provide convenient access API
- Modification via parameters API normalizes the query string automatically
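A minimal sketch of that idea follows, assuming a hypothetical class that keeps the raw query string as the only stored state and derives the parameter view from it on demand. It is not the implementation mentioned in this issue, it skips percent-encoding entirely, and all names are made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: the raw query string is the source of truth.
final class DualQuerySketch {

    private String rawQuery;

    DualQuerySketch(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    /** Returns the query exactly as given, unless it was modified through the parameter API. */
    String getQuery() {
        return rawQuery;
    }

    /** Derived view: name/value pairs in their original order; a missing "=" gives a null value. */
    List<Map.Entry<String, String>> getParams() {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&", -1)) {
            int eq = pair.indexOf('=');
            params.add(new SimpleImmutableEntry<>(
                    eq < 0 ? pair : pair.substring(0, eq),
                    eq < 0 ? null : pair.substring(eq + 1)));
        }
        return params;
    }

    /** Modifying through the parameter API re-serializes (normalizes) the raw query. */
    void appendParam(String name, String value) {
        List<Map.Entry<String, String>> params = getParams();
        params.add(new SimpleImmutableEntry<>(name, value));
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> p : params) {
            joiner.add(p.getValue() == null ? p.getKey() : p.getKey() + "=" + p.getValue());
        }
        rawQuery = joiner.toString();
    }
}
```

With this shape, a query that is never touched through the parameter API round-trips byte for byte, including ones that are not key-value pairs at all; only an explicit modification rewrites it.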
Proposed Solutions
Option 1: Breaking Change
- Merge OpaqueUriComponents into HierarchicalUriComponents: Use a single UriComponents implementation that follows RFC 3986 generic syntax (these classes are not part of the public API, but some externally visible behavior would change)
- Write conversion to java.net.URI: In toURI(), convert the RFC 3986 structure to the java.net.URI structure. This seems straightforward
- Store both query string and parameters: Add dual storage with synchronization. I have almost finished implementing this. Not straightforward, but manageable.
- Use LinkedListHashMap for query parameter order preservation
Option 2: Non-Breaking
Add a separate, well-designed implementation? Or keep the existing behavior and document how it deviates from the spec?
Questions for Maintainers
- Do you agree these are issues worth addressing?
- Which approach would you prefer? Is a breaking change acceptable?
References
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
- RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax (Obsolete)
- WHATWG URL Living Standard
- RFC 3986 Section 3.4 - Query Component
- RFC 3986 Section 1.2.3 - Hierarchical Identifiers
Comment From: rstoyanchev
You're bringing up multiple issues, and it will be challenging to discuss all in one place. We prefer more focused issues with more examples of concrete problems you face.
For some background, UriComponentsBuilder was originally aligned with java.net.URI, which treats opaque URIs similarly. From its Javadoc:
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing.
java.net.URI was indeed built for RFC 2396, and that means OpaqueUriComponents is arguably also more aligned with RFC 2396. That said the distinction is less important than the actual details.
First, it is inaccurate to claim functionality loss. The reason OpaqueUriComponents returns null for individual components is because it doesn't parse anything after the scheme. All that is instead exposed as schemeSpecificPart so there is no loss. This is similar to what java.net.URI does.
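For reference, a small JDK-only sketch of how java.net.URI treats such a URI. The subject query in the sample is added here only for illustration, and the class name is hypothetical.

```java
import java.net.URI;

// JDK-only demonstration of java.net.URI's handling of an opaque URI.
final class JdkOpaqueUriDemo {

    public static void main(String[] args) {
        URI uri = URI.create("mailto:java-net@www.example.com?subject=Hi");

        System.out.println("opaque:             " + uri.isOpaque());
        System.out.println("scheme:             " + uri.getScheme());
        // Everything after the scheme is kept, unparsed, as the scheme-specific part.
        System.out.println("schemeSpecificPart: " + uri.getSchemeSpecificPart());
        // Individual components are undefined (null) for opaque URIs.
        System.out.println("path:               " + uri.getPath());
        System.out.println("query:              " + uri.getQuery());
        System.out.println("host:               " + uri.getHost());
    }
}
```

Here isOpaque() is true, the scheme-specific part is java-net@www.example.com?subject=Hi, and getPath(), getQuery(), and getHost() all return null.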
You're right that RFC 3986 has changed the rules for opaque_part and it is now:
opaque_part: path-rootless [ "?" query ]
From the explanations in D.2 I gather this was done to simplify the rules for parsing, but I'm genuinely not sure (i.e. don't know) if it changes anything about the nature of opaque URIs, which still don't have any structure, and 1.2.3 says "the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing".
In other words, taking an opaque URI like mailto:java-net@www.example.com, we could expose java-net@www.example.com as a path component with a single path segment rather than as schemeSpecificPart, and doing so would match the ABNF rules, but it would still be helpful to understand what specific benefit that brings. Or conversely, what concrete problems do you face? Or maybe you have better example(s) we should focus on.
I would also argue the distinction between opaque and hierarchical as separate classes is helpful, and I would guess that it can be helpful for processing decisions to be able to differentiate opaque vs hierarchical.
I am not going to address the second issue about the Query string loss in detail as it's too much to mix this into the conversation. I think that should be treated as a separate topic that merits its own discussion and just the same requires concrete examples. I will point out the existence of related issue #34788 that covers part of the concerns.
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: finalchild
The main issue is that UriComponents fails to parse valid components from URIs with an opaque path. For example, a URI like mailto:user@example.com?subject=Hi has a clear path and query according to modern standards (RFC 3986), but UriComponents provides no way to access them, which is a loss of functionality.
I also find it inconsistent that getSchemeSpecificPart() is only available for these "opaque" URIs, unlike in java.net.URI where it's always accessible.
Hardcoding a list of "hierarchical schemes" (http, ftp, etc.) is a custom behavior that doesn't align with any standard or implementation. RFC 3986 is scheme-agnostic. While the WHATWG URL spec has "special schemes," the behavior is different. This custom implementation is a divergence that creates an unpredictable API.
This matters because the URL landscape is already a minefield. Developers constantly struggle with the differences between RFC 3986 and the WHATWG URL standard. In a security-critical microservices architecture, ensuring that every library parses URLs consistently is crucial for preventing vulnerabilities like SSRF. It's a huge burden.
When a core framework like Spring introduces its own parser that is non-compliant with any modern standard and lacks clear documentation on its deviations, it only makes this difficult situation worse. Predictability and adherence to a known spec are essential.
(Note that Gemini rewrote my English but all opinions are my own)