RFC 1808 defines how to resolve a relative URI against a base URI. RFC 2396 updates it, and RFC 2396 is itself being rewritten; that revision will update it further once it is final.
Section 12.4.1 of the HTML specification defines how to find the base URI of an HTML document.
I feel oddly compelled to explain this to you. It is insanely complicated.
If the HEAD element of the HTML document contains a BASE element, the base URI is given in the href attribute, which must be an absolute URI.
Section 14.14 of RFC 2616 defines the Content-Location: HTTP header. If an HTML document is served without a BASE element but with a Content-Location: HTTP header, then that is the base URI (test page). Just to make this more interesting, Content-Location: may itself be a relative URI, in which case it is resolved according to RFC 2396, with the URI of the HTML document as its base URI. The resolved URI then serves as the base URI for other relative URIs within the HTML document.
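A minimal sketch of that precedence in Python may make it easier to follow. The function name base_uri and the example.org URIs are made up for illustration, and falling back to the document URI when neither BASE nor Content-Location: is present is my reading of the rules, not a quotation from any specification.

```python
from urllib.parse import urljoin


def base_uri(document_uri, base_href=None, content_location=None):
    """Pick a base URI: BASE href, then Content-Location, then the document URI."""
    if base_href:
        # The href of BASE must already be absolute, so it is used as-is.
        return base_href
    if content_location:
        # Content-Location may be relative; resolve it against the document URI.
        return urljoin(document_uri, content_location)
    # Assumption: with neither present, the document URI is the base.
    return document_uri


doc = "http://example.org/a/b/page.html"  # hypothetical document URI
base = base_uri(doc, content_location="../c/")
print(base)                           # http://example.org/a/c/
print(urljoin(base, "img/logo.png"))  # http://example.org/a/c/img/logo.png
```

Note that urljoin follows the newer URI resolution rules rather than RFC 1808 to the letter, so a few edge cases may come out differently from what an older implementation would produce.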
Neither IE 6 SP1 nor Mozilla 1.6 Beta supports the Content-Location: header, mainly because Microsoft web servers are so buggy that respecting the Content-Location: header would cause about 10% of IIS-powered sites to break horribly.
Finally, Mozilla does support the Content-Base: header, which existed in the original HTTP 1.1 specification (RFC 2068) but was dropped from its revision (RFC 2616) due to the lack of interoperable implementations. The IETF requires at least two interoperable implementations before a draft can become a standard. Interoperating only with yourself is just a standards-compliant form of masturbation.
The following HTML attributes may be relative URIs:
<a href="...">
<applet codebase="...">
<area href="...">
<blockquote cite="...">
<body background="...">
<del cite="...">
<form action="...">
<frame longdesc="...">
<frame src="...">
<head profile="...">
<iframe longdesc="...">
<iframe src="...">
<img longdesc="...">
<img src="...">
<img usemap="...">
<input src="...">
<input usemap="...">
<ins cite="...">
<link href="...">
<object classid="...">
<object codebase="...">
<object data="...">
<object usemap="...">
<q cite="...">
<script src="...">
Section 12.4 of the HTML specification states that “When present, the BASE element must appear in the HEAD section of an HTML document, before any element that refers to an external source”. What if you have, say, a LINK element with a relative URI before the BASE element? In this situation, Mozilla resolves the URI relative to the document URI (there was no Content-Location: HTTP header), but once it sees the BASE element, it resolves all further URIs relative to the URI given in the href of the BASE element. I am not entirely convinced that this behavior is correct, but it seems reasonable, and I have codified this interpretation in my autodiscovery tests.
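Here is a rough sketch of that interpretation, using Python's standard html.parser. The class name LinkResolver, the sample markup, and the example URIs are hypothetical, and URI_ATTRS covers only a handful of the attributes listed above. It models the Mozilla behavior as I described it, not anything the HTML specification requires, and it assumes there is no Content-Location: header.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# A small subset of the (tag, attribute) pairs that may hold relative URIs.
URI_ATTRS = {
    ("a", "href"), ("img", "src"), ("link", "href"),
    ("script", "src"), ("form", "action"), ("blockquote", "cite"),
}


class LinkResolver(HTMLParser):
    """Resolve URI attributes in document order, switching base at BASE."""

    def __init__(self, document_uri):
        super().__init__()
        self.base = document_uri  # assumes no Content-Location header
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # From here on, everything resolves against the BASE href.
            self.base = attrs["href"]
            return
        for name, value in attrs.items():
            if (tag, name) in URI_ATTRS and value is not None:
                self.resolved.append(urljoin(self.base, value))


markup = """<html><head>
<link rel="alternate" href="feed.xml">
<base href="http://example.com/other/">
<link rel="stylesheet" href="style.css">
</head><body><a href="next.html">next</a></body></html>"""

resolver = LinkResolver("http://example.org/dir/page.html")
resolver.feed(markup)
for uri in resolver.resolved:
    print(uri)
```

The first LINK comes before BASE, so it resolves against the document URI; the second LINK and the A element come after it and resolve against http://example.com/other/.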
While recently discussing the XHTML Friends Network with Tantek, I learned that the profile attribute of the HEAD element may actually contain multiple URIs, separated by spaces. Section 7.4.1 of the HTML specification confirms this. Presumably all of the profile URIs should be considered potentially relative, and resolved according to the Content-Location: HTTP header, or failing that, the document URI. They can’t be resolved relative to the href attribute of the BASE element, since by definition, the profile attribute of the HEAD element always precedes the BASE element within the HEAD element. Since I know of no software that does anything at all with the profile attribute, I can’t test how real-world implementations actually deal with this.
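If I had to guess at an implementation, it might look something like the sketch below. The function name resolve_profiles and the URIs are invented for illustration, and since I know of no software that processes profile, this is a presumption about sensible behavior rather than a description of any real implementation.

```python
from urllib.parse import urljoin


def resolve_profiles(profile_attr, document_uri, content_location=None):
    """Split a HEAD profile attribute on whitespace and resolve each URI."""
    # BASE is deliberately ignored: profile always precedes it in the document.
    if content_location:
        base = urljoin(document_uri, content_location)
    else:
        base = document_uri
    return [urljoin(base, uri) for uri in profile_attr.split()]


profiles = resolve_profiles(
    "http://example.net/profile-a profiles/b",  # hypothetical profile URIs
    "http://example.org/blog/post.html",
)
print(profiles)
```

The absolute URI passes through untouched, and the relative one resolves to http://example.org/blog/profiles/b.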
Stuff like this drives me nuts. People ask me why my markup category is named “those that tremble as if they were mad”. This is why.
Given enough good code, I should always be able to Do The Right Thing with your markup.
§
Well, I don’t want to be pedantic (he lied) but I think your rules #1 and #4 are in conflict with each other. I don’t see why the base element shouldn’t work just because you read the HTML from a file. For example, look at http://www.tbray.org/testing/t.html – download it to the disk and see what your browser does with the second link.
Secondly, no piece on this subject is complete without some sort of nod to RFC2396 or the (much, much better) redraft now in progress at http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
— Tim Bray
You’re right. HTML in an email has no defined base URI, but HTML in a file does ( file://local/path/to/document.html ).
— Mark
Relative URIs can be resolved in an HTML document sent in an email. See RFC 2110 and RFC 2557 for more information.
Thanks Tim and Gary, I’ve updated the article with those references.
— Mark
Hey, I just realized you’re the beagle guy. Small world.
http://diveintomark.org/archives/2002/04/08/where_the_beagles_are
— Mark
> Section 12.4 of the HTML specification states that “When present, the BASE element must appear in the HEAD section of an HTML document, before any element that refers to an external source”. What if you have, say, a LINK element with a relative URI before the BASE element?
Well, since the HTML specification clearly states a requirement, I’d say that documents that don’t fulfil that requirement are invalid. Since HTML doesn’t define error handling, you can’t expect any particular behaviour from user-agents.
For extra points, compare the definition of what to do with a relative link starting with a question mark (href="?whatever") in RFC 1808 and RFC 2396.
“Since HTML doesn’t define error handling, you can’t expect any particular behaviour from user-agents.”
This is not necessarily true. If an unspecified feature is used widely on the www, then page authors and browser developers will tend to converge on the same expected behavior.
There’s a cycle. Many page authors code to the behavior of the dominant browser, not the specification. Browser developers tend to match the behavior expected by page authors in the interest of providing the best user experience.
Gary,
Okay, bad choice of words :). You may /expect/ user-agents to handle certain errors in certain ways, but it’s not coded into the specification, and so it isn’t a good idea to rely on it.
For specific examples of relying on certain error handling and the consequences, take a look at http://www.htmlhelp.com/tools/validator/reasons.html
Yep, it was insanely complicated, a good read, but I am still lost.
— Mike
My 0.02 to add to the mess: don’t forget to mention XML Base ( http://www.w3.org/TR/xmlbase/ ) and the way it interacts with that ever-recurring problem of whether XHTML gets parsed as XML or as (SGML-based) HTML. Oh yeah, and the pleasures of XLink, too.
With all its hacks, incompatibilities, and broken links, it is still a beautiful web. Be accessible. Strive to be conformant. [Apologies to Max Ehrmann...]
— David
David, XML Base is not specified to interact with XHTML 1.0 explicitly anywhere as far as I can see. The linking mechanism of XHTML is not based upon XLink and the XHTML 1.0 specification doesn’t mention any use of xml:base for determining relative URIs.
From the xml:base specification we can read that the behavior of xml:base attributes “in applications based on specifications that do not have direct or indirect normative reference to XML Base is undefined”. IMO, applications processing XHTML (at least version 1.0) shouldn’t make use of xml:base, as it has no defined behaviour; in fact, I don’t even believe it can be included in a valid XHTML document (the xml:base attribute would be identified as invalid). There may of course be differences with XHTML 2.0 (or even 1.1), but 2.0 is still a WIP.
Excuse me, but can only the tags listed above have relative URIs? How about ?
Oops… =( I was asking about tag . But it didn’t show up
Damn! I meant the tag “Style”
(maybe this way it will be shown)
§
© 2001–9 Mark Pilgrim