Smart people have said nice things about Speed Up Your Site. I have not read the book (although the author did offer me a free review copy, thanks, wish I had the time), but the markup on the book’s companion site is impressively optimized. The home page all but validates (looks like a typo), uses better-than-average semantic markup, and looks good in Netscape 4. That’s more than I can say for my own sites.
Any sort of optimization is a dangerous numbers game; there’s always the danger of over-optimizing for one thing (like download speed) and accidentally degrading the experience for some number of your users. The markup on the Web Site Optimization site seems to strike a balance; it even uses a DOCTYPE (on the home page, although inner pages do not), and alt attributes for its images.
In the spirit of continuing the discussion, here are a few further suggestions for optimizing the Web Site Optimization home page. These are in no way a reflection on the book itself, which as I mentioned, I have not read. Some of the more advanced ones require server-level access, or at least .htaccess access. Some of these I could apply to my own sites, but then, I’m not the one hawking a book.
Use > instead of &gt;. Only < needs to be escaped.
Use <p id="foot"> instead of <div id="foot"><p></div>
You’re using HTML 4, so you don’t need to close your <p> or <li> tags.
On links to external sites, eliminate the www. from the URL if the remote server supports it (most do).
Get rid of your meta name="keywords". Search engines don’t care.
Get rid of your meta http-equiv="Content-Type" tag. Put this line in your Apache .htaccess file to send the Content-type in the HTTP headers instead:
AddType text/html;charset=iso-8859-1 .html
And finally, the tip that trumps all other tips: use mod_gzip (Apache 1.3.x) or mod_deflate (Apache 2.x). Most modern browsers support gzip compression, and most pages compress 60% or more. The 7K home page would compress to 3K. There are tutorials on webcompression.org, and a free analyzer on leknor.com.
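To get a feel for the compression claim, here is a minimal sketch that measures how much gzip saves on a page. It uses Python's standard `gzip` module rather than mod_gzip itself, and `compression_savings` and `sample_page` are hypothetical names invented for this illustration. The sample markup is far more repetitive than a real page, so it will compress better than the 60% quoted above.

```python
import gzip

def compression_savings(html: str, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing the page."""
    raw = html.encode("iso-8859-1")
    packed = gzip.compress(raw, compresslevel=level)
    return 1 - len(packed) / len(raw)

# Hypothetical sample page: highly repetitive markup, so savings are optimistic.
sample_page = "<html><body>" + "<p>Speed up your site.</p>\n" * 200 + "</body></html>"
savings = compression_savings(sample_page)
print(f"saved {savings:.0%} of the bytes")
```

Running the same function over a saved copy of a real home page gives a more honest number than the synthetic sample.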
§
Could I get a reason why the ‘www.’ should be removed from the URL? Is there a reason beyond saving four keystrokes?
— RobbyB
I mean remove it from the HREF of the A tag, to save 4 bytes.
— Mark
> On links to external sites, eliminate the www. from the URL if the remote server supports it (most do).
The problem is not the server support, but whether there’s a DNS entry for the domain name without the “www.” that resolves to an A (or soon AAAA ;-) record
You can’t just remove www. from URLs. You have to use the URLs the way they are advertised. For example, if you go to w3.org, the server will redirect you to http://www.w3.org. So they want you to use that URL. Why? Because a lot of web tools rely on consistent URLs.
Good optimisation suggestions, especially the trump card gzip. If IIS 5.0 users want to get in on the action they can enable it on the “Services” tab of the “WWW Service Master Properties” dialog.
— BenM
A really-well-configured site accepts [nothing], w., ww., http://www., and wwww. as hostnames for the homepage.
Mine are partway there.
Does anyone know if the AddType command mentioned in Mark’s article works on the Zeus web server? I’d love to use it to cut out all those meta elements.
3. You already mentioned that it works well with Netscape 4, but as I recall, there are a few issues with omitting closing p and li tags with that browser, however legal it may be.
6. I thought that CERT put out a warning advising people to explicitly specify the charset inside the document? What happens when somebody saves the document to disk and does not have the ability to read http headers?
7. The use of mod_gzip is not universally good. Apart from the cpu time involved, you need to buffer dynamic pages completely, and there are issues with older browsers and external objects like css and javascript files. It also reduces cache effectiveness, as separate copies have to be cached for each accept-encoding header (could vary from nothing, to gzip alone, to gzip and compress, deflate, and so on…)
>> Does anyone know if the AddType command mentioned in Mark’s article works on the Zeus web server?
Apparently, it doesn’t, darn it. Thanks to Richard Page-Wood at http://echoweb.org for testing it.
Re: Netscape 4 and end tags. This is what scientists call a verifiable hypothesis. I saved a copy of his home page locally, stripped all ending P and LI tags, did not see any difference in any browser, including Netscape 4.
Re: Content-type. Yes, there’s a small possibility of a user who saves the page locally and whose browser is not set to auto-detect character encoding, but so what? As far as I can tell (having read the site and the sample chapter that Zeldman linked to), the entire site/book is about optimizing for interactive use, and the only metric is download time. That’s certainly the only metric his interactive site analyzer measures.
http://www.websiteoptimization.com/services/analyze/
Re: mod_gzip. Again, so what? This site puts its CSS inline (BTW, an interesting technique that trades perceived initial speed for overall speed, since in the long run the user will end up downloading and re-downloading the same CSS rules for every page). This home page is static HTML, and it’s 7K, and it could be 3K, but it’s not, because the server doesn’t use mod_gzip. That’s one hell of an optimization, wouldn’t you say?
(Also note that the analyzer tool doesn’t take gzip compression into account when calculating download time. I would classify this as a bug.)
— Mark
you can’t ignore the www in the URL’s since
is not the same as
*sigh* As I said, you should strip the “www.” prefix if and only if the remote server supports it. Obviously if http://www.foo.com is the only address that works, you’re stuck with it, but if foo.com and http://www.foo.com end up at the same place, you’re wasting 4 bytes by linking to http://www.foo.com. I know that sounds trivial, but that’s the point of the book (as far as I can tell — the sample chapter on Javascript advocated using shorter variable names).
As for servers that visibly redirect to their www. cousin, I say still point to the shorter version. You save the bytes, and users will blame the remote server for the speed difference (as they should, frankly, because visibly redirecting from foo.com to http://www.foo.com is braindead).
— Mark
As far as dropping the leading ‘www.’ goes, what about things like pingback? You link to the article (which, if things are set up right, will redirect to the ‘www.’ version), send a ping, the pingback server looks at your document for the permalink (to avoid spamming), and discards the pingback because it isn’t found.
Why discard meta name="keywords" tags? Do no search engines use that tag when indexing a site?
My suggestions, based on a quick look at the front page:
1. There are plenty of linebreaks and other whitespace that can be eliminated. No functional difference, yet will save more than the four bytes saved by dropping the ‘www.’s. Preprocessing can do this.
2. The front page looks fairly static, but no last-modified or etag headers are sent, and no expires headers are sent either. Efficient caching would do a hell of a lot more than saving a few bytes here and there.
3. No content-length header. This removes any ability to use http 1.1 pipelining.
4. I would say that, for a single user, yes, the perceived load time would be reduced by embedded css. However, no websites have a single user. If the styling were separated out into an external stylesheet, and efficient caching applied to it, the overall load on the server would reduce, giving better response times for everyone (and faster subsequent page accesses for everybody). Even the initial page load might be faster – a copy of a stylesheet cached nearby may very well be faster than an embedded stylesheet that has to come all the way from the origin server.
Re: #15. Agreed on all counts. As I touched on (tangentially), optimization is a dangerous numbers game. Optimize a single page and you miss the site-wide optimizations. Optimize for a single client and you miss the audience-wide optimizations. Over-optimize your markup and you miss out on important accessibility features. (Adrian’s GetContentSize tool brought up some of these issues a while ago.)
— Mark
Just to nitpick, comment 15.3 isn’t true. The server uses chunking if you try connection: keep-alive
I.e. it sends a count, then a chunk of that length, then another count and chunk until the file is done. Then it sends a chunk count of 0. You can still pipeline the next response after that.
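The count-then-chunk framing described above can be sketched in a few lines. This is an illustration of HTTP/1.1 chunked transfer coding only, not the server's actual code; `encode_chunked` and `decode_chunked` are hypothetical helper names, and the decoder assumes there are no trailer headers after the final zero-length chunk.

```python
from typing import Tuple

def encode_chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Frame a body as: hex count, CRLF, chunk, CRLF ... then a zero count."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # chunk count of 0 marks the end of this response
    return bytes(out)

def decode_chunked(stream: bytes) -> Tuple[bytes, bytes]:
    """Return (body, leftover); leftover is where the next response begins."""
    body = bytearray()
    pos = 0
    while True:
        line_end = stream.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';' in the size line.
        size = int(stream[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            pos += 2  # skip the final CRLF (assumes no trailers)
            return bytes(body), stream[pos:]
        body += stream[pos:pos + size]
        pos += size + 2  # skip the chunk and its trailing CRLF
```

Because the zero-length chunk marks the end of the body without a Content-Length header, whatever follows on the same connection can be read as the start of the next response.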
Optimizing is cool. I don’t mind it in the least. But doesn’t some of this sometimes get to the point of being a tad anal?
How much of a speed difference is 4k?
— Marty
Mark,
Thanks for the critical review of my new site. It is great to see other speed buffs scrutinize my site devoted to speed. Many of the suggestions you offer are on the TTD list; I figured at 7K or so with 3 graphics it loads pretty fast as it is, so I’ve been devoting my time to creating free tools, etc.
I went ahead and did all but suggestions #5 and #7 and saved a few bytes off the home page and interior pages. #5 – I feel that meta keywords and description tags are still important, although less so with the likes of Google.
#6 – Yes, I noticed this with my new host, am planning on adding this server-based char encoding header in the future. After this tweak the pages should validate with few mods.
#7 – Mod_gzip content encoding. Well, I’d have to disagree with you on this one. I think at around 10K for HTML you should consider mod_gzip (we devote an entire chapter in the book to compression and mod_gzip, mod_hs, ISAPI filters etc., Chapter 18). At 7K I think it is a wash for sending it as straight text as opposed to sending it compressed and having to decompress it. Mod_gzip and their ilk are wonderful for larger pages, but at some point I think you need to set a minimum page size (which you can do with mod_gzip).
However, since I tout mod_gzip and other modules so much in the book I think I should have it installed, and put a floor on page size to compress.
Mod_gzip is a great, highly configurable open source tool for compressing your text content automatically. However, it is 30% slower than some other compression modules that are available, like mod_deflate-ru and mod_hs. I definitely recommend these tools for saving bandwidth and speeding up web sites.
Netscape 4: I sniff for NS4 and conditionally load NS4-friendly CSS or CSS for everything else. The site is designed to look largely the same for NS4 and modern v5+ browsers that support float. Getting the float to work well with the right padding and margins was a challenge :)
Line breaks: Even though I advocate eliminating most line breaks in the book, I found that to get the site to work reliably with NS4.X I had to use a markup style with line breaks. Otherwise, my CSS went wacko on me.
Embedded CSS: Yes, for maximum speed I like to embed my CSS at least on the front page to help eliminate HTTP requests (easy in HTML, harder in XHTML). I plan on conditionally loading an external style sheet for the interior pages, which would be cached and save some bandwidth.
As you’ll discover I discuss these techniques (no www, mod_gzip, etc.) and a number of others in the book. There are some other optimizations I have planned for the front page, and the caching aspect I plan to address with my new host. Feel free to let ‘er rip if you find any other possible optimizations.
- Andy
From comment 21: “for maximum speed I like to embed my CSS at least on the front page to help eliminate HTTP requests” — Considering HTTP/1.1 supports the Connection Keep Alive, where’s the “bloat” in an external css file. You are gaining on cacheability of the stylesheet throughout the site.
Is this more of a “one-pager” rather than a web-site optimisation?
— Isofarro
“visibly redirecting from foo.com to http://www.foo.com is braindead”
If http://www.foo.com is the correct URL for the site then redirecting’s the right thing to do – if you don’t do that then you’re effectively making multiple copies of the site available, with multiple versions of each page’s URL.
I’m sure most search engines are smart enough to cope, but it’s still potentially confusing for both robots and humans if each page doesn’t have a unique URL.
correction: “..doesn’t have a single URL”
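The "accept every variant, but redirect to one hostname" approach from this part of the thread might look roughly like the sketch below. The hostnames reuse the foo.com placeholder from the discussion; `CANONICAL_HOST`, `ALIASES`, and `redirect_target` are hypothetical names, and a real site would more likely do this in server configuration than in application code.

```python
from typing import Optional

# Placeholder hostnames taken from the discussion, not a real configuration.
CANONICAL_HOST = "www.foo.com"
ALIASES = {"foo.com", "w.foo.com", "ww.foo.com", "wwww.foo.com"}

def redirect_target(host_header: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the request is already canonical."""
    host = host_header.lower().split(":")[0]  # drop any port number
    if host in ALIASES:
        return f"http://{CANONICAL_HOST}{path}"
    return None  # canonical host (or an unknown one): serve as-is
```

Every alias ends up at one URL per page, which is the consistency argument made above; the cost is the extra round trip that Mark objects to.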
I am a freshman student in web development, and I was under the impression that closing your tags early on could ease the transition into XML. How much optimisation am I actually doing by following #3?
Andy, thanks for your comments and clarifications. Sounds like a great book. It would still be great if you could find the time to add gzip support to your web site analyzer tool and include it in the calculation of download time. Without it, the tool is somewhat misleading, especially since, as you say, you talk so much about the benefits of compression in your book.
— Mark
Mark and Andy:
I use Content Compression for all pages I serve, regardless of size, because I have found that the benefits of fewer packets on the wire outweighs the cost in CPU time — and most Web servers should NOT be doing a great deal of intense processing anyway.
I am also in the position where my sites (http://www.webcompression.org, etc.) are served from a very low speed connection, so I take the performance improvements anywhere I can.
I also would like to point out to the folks who mention pipelining in HTTP connections that the only two browsers that implement pipelining are Opera (on all the time) and Mozilla (an option you have to set). MSIE does not use persistent connections, apparently due to limitations in the networking libraries they use.
smp
Mark, thanks. Yes, I’ve received a lot of feature requests (including compression support) and bug reports for the new page analyzer tool I released on Monday. I’m making a list and plan on rolling many of these into the next update. Can’t guarantee they’ll all make it into this one however.
I went ahead and did the CSS optimization I planned and saved 383 bytes (which includes some judicious trimming) off the home page. Thanks for the inspiration. Let me know if you think of anything else.
The CSS caching external versus internal debate (page optimization versus site optimization). I hear you all, and have planned on making the CSS files conditionally load externally, which will save some bandwidth and speed things up a bit. This will be pretty easy with the XSSI scheme I use. For the front page I will consider it, but I like to minimize HTTP requests on the front page if at all possible.
Speaking of conditional XSSI I’ll need to take that into account when I look at compression, statically compressing my files won’t work in my case.
- Andy
RE: #23
> … confusing for both robots and humans if each page doesn’t have a single URL.
So, in fact, you’re saying a resource available through http://x.com/hi shouldn’t be available through http://x.com/hi?dummy ?
RE: #25
If you are planning a transition to XML, or if you’re generating your HTML from XML with a transformation, I’d say you’d better use XHTML.
The proposed optimization only applies to HTML 4 (or any other not-XML application), because it indeed breaks XML well-formedness.
If you’re already using XHTML, stick with it. If you’re using HTML 4 and not planning to migrate your system to an XML application, the suggested optimization makes sense.
— Martijn
Re: server-side includes. I use them myself, with mod_gzip, and I don’t have a problem. The compression happens later, after the page has been pieced together by the server but before it gets returned to the client. So they play well together, no setup required.
Also on the topic of includes, someone mentioned that your pages don’t include Last-Modified or ETag headers. You should look into “XBitHack full” which allows the server to send a Last-Modified date on XSSI-parsed pages (you control it by setting an otherwise unused permissions bit on the parsed files).
http://httpd.apache.org/docs/mod/mod_include.html#xbithack
The result is that your server will send Last-Modified dates and respond properly to conditional GETs, which speeds up repeat access to pages that haven’t changed (the browser doesn’t have to redownload the page).
Andy, I know you know all this, I’m just explaining it for those who don’t. Also, I’m putting together a collection of Apache tips/tricks based on my experiences on this site, and this is the one I wanted to write today. Guess I have to go write it now…
— Mark
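The conditional GET behavior described above (send Last-Modified, then answer a matching If-Modified-Since with 304) can be sketched as follows. This is only the decision logic, not Apache's or XBitHack's implementation; `respond` is a hypothetical function name, and the sketch assumes the file's modification time is available as a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple

def respond(last_modified: datetime,
            if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 304 if the client's copy is current, else 200."""
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # no body needed; browser reuses its copy
    return 200, headers
```

On a repeat visit the browser sends back the date it was given, the comparison succeeds, and only headers cross the wire.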
>The use of mod_gzip is not universally good. Apart from the cpu time involved, you need to buffer dynamic pages completely
Only the very first mod_gzip. Now they can stream the compressed content.
> and there are issues with older browsers and external objects like css and javascript files.
In a very few versions of Netscape. You can turn compression Off for these files or fine-tune the module you use.
>It also reduces cache effectiveness, as separate copies have to be cached for each accept-encoding header (could vary from nothing, to gzip alone, to gzip and compress, deflate, and so on…)
So what? Two compressed copies in a cache are still smaller than one uncompressed copy. It saves room on a hard drive and provides a very fast response, faster than for uncompressed pages. By the way, modern compression products like Pipeboost are caching rather than compressing solutions. That is, the trick is not how they compress, but how they cache compressed pages.
About the smallest files to be compressed. Right, it doesn’t make sense to compress HTML files that fit into one TCP packet. Do you have a lot of them? :)
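One way to keep the number of cached copies down, sketched under the assumption that the cache only ever stores a gzip variant and a plain one: collapse whatever Accept-Encoding header arrives into one of two cache keys. `cache_variant` and `cache_key` are hypothetical names, and this is not how Pipeboost or any particular product works.

```python
from typing import Tuple

def cache_variant(accept_encoding: str) -> str:
    """Collapse an Accept-Encoding header into 'gzip' or 'identity'."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() not in ("gzip", "x-gzip"):
            continue
        qvalue = 1.0
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            try:
                qvalue = float(params[2:])
            except ValueError:
                qvalue = 0.0  # malformed q-value: treat as a refusal
        if qvalue > 0:
            return "gzip"
    return "identity"

def cache_key(url: str, accept_encoding: str) -> Tuple[str, str]:
    """At most two cache entries per URL, however varied the headers are."""
    return (url, cache_variant(accept_encoding))
```

With this normalization, "gzip", "gzip, deflate", and "gzip, compress" all share one compressed copy instead of each getting its own.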
Mark/all,
#3 – omitting closing tags in HTML. Yes, I talk about this extensively in the book. However, I plan on transitioning to XHTML, and when I switch to strict, including these closing tags makes the browser work a little bit less. You can remove closing tags for HTML pages, but make sure you test everything you do in your target browsers (NS4 in particular can be quirky). Perhaps I’ll go to HTML on the front page, and XHTML on the interior pages, still deciding.
XBitHack full: Yes, thanks I talk about this on p. 377 of the book, along with mod_expires. The caching aspect is something I need to address.
XSSI and compression: Yes, mod_gzip can dynamically compress files. I was referring to statically gzipping things and delivering them, which I can’t do (this is faster than dynamic compression). So I would need a module that dynamically compresses files. There are a few that do this for Apache, I’ll let you know which one I choose.
- Andy
All,
Konstantin is the author of Chapter 18: “Compressing the Web” in my book, “Speed Up Your Site.” Great to hear from you Konstantin.
- Andy
“So, in fact, you’re saying a resource available through http://x.com/hi shouldn’t be available through http://x.com/hi?dummy ?”
The site itself shouldn’t unnecessarily produce inconsistent internal links (e.g. a lot of sites daftly add filenames to links back to the home page rather than using the default document), but few users are likely to add stuff on the end of the URL, whereas many will omit www and it’s easy to remedy that to force consistency.
All:
Also, start looking for good things out of mod_deflate for Apache 2.0.x. I have been discussing optimization issues with the authors of this module and they will be adding a directive that will allow you to control the compression level of the output. Currently, the module is set for the equivalent of “gzip -1”; I have hacked mod_deflate.c to allow this to be “gzip -X”, where X == [1-9].
And, despite what is said in the book — ;-) — the mod_deflate filters have come a long way and do allow a great deal of control over what is and is not compressed. Still a ways to go before it has the granular control that mod_gzip has, but it’s improving.
smp
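To see what the compression level setting trades off, here is a small sketch that compresses the same bytes at each gzip level from 1 to 9 and reports the sizes. It uses Python's `gzip` module, not mod_deflate, and `sizes_by_level` and the sample markup are invented for the illustration.

```python
import gzip
from typing import Dict

def sizes_by_level(data: bytes) -> Dict[int, int]:
    """Map each gzip level (1 = fastest, 9 = smallest) to the compressed size."""
    return {level: len(gzip.compress(data, compresslevel=level))
            for level in range(1, 10)}

# Hypothetical sample: a long navigation list, loosely like real markup.
sample = "".join(
    f'<li><a href="/page{i}.html">Page number {i}</a></li>\n' for i in range(500)
).encode("ascii")

sizes = sizes_by_level(sample)
for level, size in sizes.items():
    print(f"gzip -{level}: {size} bytes (original {len(sample)})")
```

Higher levels spend more CPU time per request, which is why a configurable level matters on a busy server.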
Regarding compressing small files. The cut off should be around 10K which downloads pretty fast on a 28K modem. Anything smaller than that especially if they are generated dynamically will slow the server down unnecessarily.
Caching compressed content is still a crap shoot. Pipeboost has managed it, but I haven’t seen any public domain caching servers that know how to support it. It has to happen but the issues are many; i.e. if modified since and then uncompressing the data in response to an older browser.
There’s an opportunity for someone – hack mod_gzip to incorporate it into Squid – allowing for dynamic compression and caching of compressed data.
Peter
Using <p id="foot"> instead of <div id="foot"><p></div> seems like a good idea anyway: <div> adds no semantic or structural value in this type of case, it’s just cruft.
Re: mod_gzip, you could always have static compressed .gz files on the server and offer them using content negotiation. This was actually suggested as the “right way” to do it when there was a great debate around including mod_gzip in the Apache core. AFAIK content negotiation with Apache is quite fast.
— Matt
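The idea of keeping precompressed .gz files next to the originals and choosing between them per request might look like this in outline. It is a sketch of the selection logic only, not Apache's content negotiation; `accepts_gzip` and `pick_variant` are hypothetical names, the file names are made up, and q-values in Accept-Encoding are ignored for brevity.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Dict, Tuple

def accepts_gzip(accept_encoding: str) -> bool:
    """True if gzip is listed; q-values are ignored to keep the sketch short."""
    tokens = (part.split(";")[0].strip().lower()
              for part in accept_encoding.split(","))
    return "gzip" in tokens

def pick_variant(path: Path, accept_encoding: str) -> Tuple[Path, Dict[str, str]]:
    """Choose the precompressed sibling (index.html.gz) when the client allows it."""
    headers = {"Vary": "Accept-Encoding"}
    gz_path = path.with_name(path.name + ".gz")
    if accepts_gzip(accept_encoding) and gz_path.is_file():
        headers["Content-Encoding"] = "gzip"
        return gz_path, headers
    return path, headers

# Demo in a throwaway directory with a hypothetical index.html.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes(b"<p>hello</p>" * 100)
    before, _ = pick_variant(page, "gzip, deflate")  # no .gz file yet
    Path(str(page) + ".gz").write_bytes(gzip.compress(page.read_bytes()))
    modern, modern_headers = pick_variant(page, "gzip, deflate")
    legacy, legacy_headers = pick_variant(page, "")
    names = (before.name, modern.name, legacy.name)
```

The compression cost is paid once, when the .gz file is written, which is the speed advantage over compressing on every request.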
#37 – This has been changed to … on the front page.
#38 – Yes, but I dynamically include (using XSSI) different CSS based on what browser requests the page, so I think the best solution would be to use a module like mod_gzip or a reverse proxy like from VIGOS to dynamically compress my dynamic content.
They look static, but the pages are customized on the fly based on the browser. This was the only way I could get NS4 to behave and not resort to multiple link/@import stylesheets (the conventional way).
- Andy
In #37 I was talking about *all* these cases. Another example from the current source code would be <div id="l"><table class="d" …> … </table></div>. Why not <table class="d" id="l" …> … </table>?
Then there is <div id="n"> … </div>. Is there no other HTML element that would give a more meaningful description?
(Btw, there’s a stray </dd> in the source.)
#36 “…10K which downloads pretty fast on a 28K modem.”
Let’s do the math.
8 bits = 1 Byte
28.8 kbps / 8 = 3.6 KBps
10KB / 3.6KBps = 2.8 seconds
with 60% compression
4KB / 3.6KBps = 1.1 seconds
A difference of 1.7 seconds. It adds up over the course of a day. Compound that by the fact that 80% of users are still connecting with dial-up.
Compression may not get you a parade, but you might earn the respect of your peers.
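The arithmetic in this comment can be written as a small function so other page sizes and line speeds are easy to try. It follows the comment's own convention (kilobits per second divided by 8 gives kilobytes per second) and ignores latency, protocol overhead, and modem-level compression; `download_seconds` is a hypothetical name.

```python
def download_seconds(size_kb: float, line_kbps: float) -> float:
    """Transfer time for size_kb kilobytes over a line_kbps link (8 bits per byte)."""
    kilobytes_per_second = line_kbps / 8
    return size_kb / kilobytes_per_second

page_kb = 10.0
compression = 0.60  # fraction of bytes removed, as assumed in the comment
plain = download_seconds(page_kb, 28.8)
packed = download_seconds(page_kb * (1 - compression), 28.8)
print(f"uncompressed: {plain:.1f}s, compressed: {packed:.1f}s, "
      f"saved: {plain - packed:.1f}s")
```

Swapping in 56 or 9.6 for the line speed shows how the saving scales with the connection.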
All,
What a wonderful discussion we have going here :)
#40 – Stray dd eliminated, tx Arien.
div and table vs. table? I suppose I could get rid of this “l” div, this is an artifact of the old design, the table allowed me to align things reliably. I’ll put this on my TTD list for next rev.
Minimum file size to compress? – I did some research on this for you all, in light of all the interest in this issue. It appears that there is some disagreement on what is the minimum file size to compress, and my initial floor may have been too high.
Kevin Kiley, the author of mod_gzip, says it defaults to 300 bytes for the minimum size to compress. mod_gzip also has some check logic that prevents sending compressed files larger than the original, mod_deflate for Apache 2.0 does not.
Konstantin Balashov says use a floor of 1 TCP/IP packet (adjusts with connection speed, 576 bytes for slower connections, 1500 MTU for faster, which drops down to 1160 for first packet in some cases w/ overhead) for minimum file size to compress. He also says that a 5-7 KB floor is reasonable.
Igor Sysoev, author of mod_deflate ru, says that the minimum file size should be around 4K.
Peter Cranstone says 10K is a reasonable floor.
How low you go also depends on how heavily loaded your server is. Is the compression software or hardware based? On the same server? Total number of small files? (these are common on the Net), etc. You can compress tiny files, but the additional load, especially for dynamic files, may slow server response time.
So it appears that I may have to rethink my statement in #21-7 re not compressing my 7K home page. For slower modems (not 80% but 67.5% of home users connect at 56Kbps or less), especially cell (9.6Kbps), 14.4Kbps, and 28.8Kbps download time still dominates the delay these users experience. So as long as it doesn’t unduly slow down the server, and speeds up the user’s experience, compression makes sense.
Installing mod_gzip etc. will also save me some bandwidth site-wide, which is a good thing.
- Andy
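The two safeguards discussed in this comment, a minimum size floor and a check that the compressed output is actually smaller, can be sketched like this. It is not mod_gzip's code; `maybe_compress` is a hypothetical function, and the default floor of 300 bytes simply reuses the figure quoted above.

```python
import gzip
import os
from typing import Optional, Tuple

def maybe_compress(body: bytes, floor: int = 300) -> Tuple[bytes, Optional[str]]:
    """Return (payload, content_encoding); encoding is None when sent as-is."""
    if len(body) < floor:
        return body, None  # too small to be worth the CPU time
    packed = gzip.compress(body)
    if len(packed) >= len(body):
        return body, None  # compression made it bigger, so send the original
    return packed, "gzip"

tiny = b"<p>hi</p>"                  # below the floor
page = b"<li>item</li>\n" * 500      # 7,000 bytes of repetitive markup
noise = os.urandom(2000)             # random bytes do not compress

for name, body in (("tiny", tiny), ("page", page), ("noise", noise)):
    payload, encoding = maybe_compress(body)
    print(f"{name}: {len(body)} -> {len(payload)} bytes, encoding={encoding}")
```

Raising `floor` to 4,000 or 10,000 reproduces the more conservative thresholds suggested by the other authors quoted above.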
Does the optimizer take into account background images? On a site I’m developing, it claims the number of images is zero, which is true of the HTML, but not of the CSS.
— Richard
Does the analyzer take into account background images? On a site I’m developing, it claims the number of images is zero, which is true of the HTML, but not of the CSS.
— Richard
#43 – Background images? No, not in CSS or in table cell backgrounds, yet. We’ve received this and many other requests and bugs, and plan to roll as many of these into the next version that we can.
Yeesh! Peter Cranstone commenting on mod_gzip! Everyone has ended up here!
Ok, I compress EVERYTHING using mod_gzip: PHP, HTML, plaintext, Word Docs, anything that can be compressed properly.
My caveat is that I serve my pages across a very low speed connection. However, for those of you who have visited my sites over the last few days, my download speed is not greatly affected. I have also focused on text-heavy, graphics light content, all of which is highly compressible.
I do not have a lower-bound on mod_gzip, and I agree that compression on all files is not a bad thing.
Fewer Packets are ALWAYS a good thing.
smp
Gzipping an entire page is nice, but when everything on your site is dynamic, it means gzipping practically every request. Is there some way to gzip just sections of a page, like the toolbar, blogrolls, metatags, and inline css? I can see gzipping these as external files that are included via javascript or css include, but what if you want to keep the page as one file?
Dylan:
At this point, and somebody correct me if I am wrong, it is an all or nothing proposition for compressed content. The alternative would be to send a transfer-encoding header, which specifies the order in which the encodings were applied; this is especially good for multi-part documents.
However, Transfer-Encoding causes most browsers to explode on contact. So, we are stuck…for right now.
smp
§
© 2001–present Mark Pilgrim