I don’t think the designers of the HTML5 Application Cache (appcache) actually build or deploy web sites. The standard is fundamentally broken - actually a step backwards for deployment reliability - if used in the "official" way. Luckily, we can still use the appcache in a small, limited way to do something very useful.

AppCache Background

If you search what calls itself the specification for the terms "deploy", "consistent", "release", "versioned", or "synchronized", you find no practical examples. Instead, you get their toy example app, a clock, which just shoves unversioned HTML, CSS, and JS into a manifest and calls it a day. What could go wrong?

Today we have "eventually consistent static file hosting", "waves" of server deployments, and frequent deployments. That means things are always changing on the server side, and explicitly versioning everything is a must. Otherwise clients get inconsistent results, because they are constantly and concurrently requesting resources while the server's state is changing.
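One common way to "explicitly version everything" is to put a hash of each file's contents into its name, so a URL always refers to the same bytes. Below is a minimal Node sketch of a build step that does this. SHA-256, the 8-character prefix, and the `name.hash.ext` scheme are my own assumptions, not anything appcache prescribes.

```javascript
const crypto = require("crypto");

// Return a content-addressed name such as "app.3f2a9c1b.js".
// The hash covers the file's bytes, so any change yields a new URL,
// and an old URL keeps pointing at exactly the bytes it always did.
function versionedName(fileName, contents, hashLength = 8) {
  const hash = crypto
    .createHash("sha256")
    .update(contents)
    .digest("hex")
    .slice(0, hashLength);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return `${fileName}.${hash}`; // no extension (or a dotfile)
  return `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

console.log(versionedName("app.js", "console.log('v1');"));
console.log(versionedName("app.js", "console.log('v2');"));
```

Old names are never overwritten, so a client holding an older HTML file still finds the JS and CSS it references. The remaining problem is the HTML file itself, which has to keep a stable URL.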

But the designers of appcache seem to think we live in some bizarre 1980s world where an HTTP client can download a resource, look at its content, download it a second time for "safety", and come to any meaningful decision about the validity of other resources it downloaded a few seconds ago. I guess they’ve never heard of livelocking, either.

[Appcache spec section 7.7.4, step 25: "… or if second manifest and manifest are not byte-for-byte identical, then schedule a rerun of the entire algorithm with the same parameters after a short delay, and run the cache failure steps." And yes, that is literally the only place in the document they talk about this "second manifest" algorithm.]
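To make the livelock concrete, here is a deliberately simplified model of that check. It is not the real algorithm. The model fetches the manifest, spends some time downloading, fetches the manifest again, and retries after a short delay if the two copies differ. Time is counted in abstract ticks. The attempt limit is my own addition; the quoted step simply schedules another run.

```javascript
// Simplified model of the "second manifest" check quoted above; NOT the
// real appcache algorithm. Time is measured in abstract ticks.
function runUpdate(server, downloadTicks, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const first = server.manifestAt(server.now);
    server.now += downloadTicks; // time spent downloading the listed files
    const second = server.manifestAt(server.now);
    if (first === second) return { ok: true, attempt, manifest: second };
    server.now += 1; // "after a short delay", run the whole thing again
  }
  return { ok: false, attempt: maxAttempts };
}

// A hypothetical server whose manifest changes every `period` ticks,
// standing in for frequent deployments.
function makeServer(period) {
  return { now: 0, manifestAt: (t) => `# version ${Math.floor(t / period)}` };
}

console.log(runUpdate(makeServer(100), 5, 10)); // settles on the first try
console.log(runUpdate(makeServer(3), 5, 10)); // never settles: livelock
```

Suppose the manifest changes faster than a client can finish downloading. Then every attempt sees two different manifests, and the update never completes.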

Too bad it’s completely racy. The server has to update the "master document" HTML and the manifest file in some order, one before the other, and there is no guarantee on how long the gap between them lasts. Either order causes races for clients: they can get an HTML file paired with the wrong version of the manifest, because nothing links the HTML version to the manifest version. I want to update multiple files on a static web site and have it just work reliably.
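Here is a toy model of that race. Two files flip from version 1 to version 2 one after the other, and a client fetches index.html and then the manifest shortly afterwards. The tick-based timing, the file names, and the "one version number per file" simplification are illustrative assumptions. This is not a real browser test.

```javascript
// Toy model of a two-file deploy on a static host: both files go from
// version 1 to version 2, uploaded one at a time in `order`, `gap` ticks apart.
function deploy(order, gap) {
  const writes = order.map((file, i) => ({ file, at: (i + 1) * gap }));
  // Version of `file` that a request at time `t` would see.
  return (file, t) => (t >= writes.find((w) => w.file === file).at ? 2 : 1);
}

// A client fetches index.html at time t and the manifest `fetchLag` ticks
// later. Return every t in [0, horizon] where the two versions disagree.
function inconsistentFetchTimes(order, gap, fetchLag, horizon) {
  const versionAt = deploy(order, gap);
  const bad = [];
  for (let t = 0; t <= horizon; t++) {
    if (versionAt("index.html", t) !== versionAt("manifest.appcache", t + fetchLag)) {
      bad.push(t);
    }
  }
  return bad;
}

// index.html first: mismatches for t = 10 through 16
console.log(inconsistentFetchTimes(["index.html", "manifest.appcache"], 10, 3, 30));
// manifest first: mismatches for t = 7 through 19
console.log(inconsistentFetchTimes(["manifest.appcache", "index.html"], 10, 3, 30));
```

If index.html goes first, clients fetching between the two writes get the new HTML with the old manifest. If the manifest goes first, they get the old HTML with the new manifest. Reordering the upload only changes which mismatch you get.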

UPDATE: I have a script that proves it. When I ran it against Chrome, the JS version that loaded didn’t match the one the HTML tried to include.

If you’re familiar with Git, you know that a single commit hash pins down a fully versioned tree, and that tree is always consistent. Why can’t we deploy to browsers the same way? Have the appcache designers never heard of Git?
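Here is a sketch of what Git-style deployment could look like for a web app. Each file gets hashed, then the sorted list of (path, hash) pairs is hashed into one root hash, so a single identifier pins the whole release. This is Git-like in spirit only, since Git's real tree objects use a different format. The function names are hypothetical.

```javascript
const crypto = require("crypto");

const sha256 = (data) => crypto.createHash("sha256").update(data).digest("hex");

// Hash a sorted list of { path, hash } entries into one root hash.
function treeHash(entries) {
  return sha256(entries.map((e) => `${e.hash} ${e.path}\n`).join(""));
}

// Build a Git-like release: every file is identified by the hash of its
// contents, and the root hash covers all (path, hash) pairs, so one root
// hash pins every file in the release.
function buildRelease(files) {
  const entries = Object.keys(files)
    .sort()
    .map((path) => ({ path, hash: sha256(files[path]) }));
  return { root: treeHash(entries), entries };
}

// A client that already knows the expected root hash (obtained separately)
// can check that a downloaded listing is exactly that release.
function verifyRelease(entries, expectedRoot) {
  return treeHash(entries) === expectedRoot;
}
```

A loader told only the root hash could fetch the listing and check it with verifyRelease. It would then fetch each file and compare that file's hash against its entry. Any mix of old and new files would be detected instead of silently run.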

More pragmatically, is there some reason we can’t instead just have a dead simple JS API (no manifest file involved) consisting of set_persistent_resource(path, blob), remove_persistent_resource(path), and list_persistent_resources(), and (optionally) build libraries on top of that???
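Nothing like this exists in browsers. To show how small such an API could be, here is an in-memory stand-in using the function names above. It adds one tiny "library" function on top. The Map-backed store and install_release are hypothetical; a real version would persist to disk.

```javascript
// Hypothetical: no browser offers these functions. A Map stands in for
// persistent storage just to show how small the proposed API could be.
const persistentStore = new Map();

function set_persistent_resource(path, blob) {
  persistentStore.set(path, blob);
}

function remove_persistent_resource(path) {
  return persistentStore.delete(path); // true if something was removed
}

function list_persistent_resources() {
  return [...persistentStore.keys()].sort();
}

// Example "library" layered on top: install a complete release, then
// delete anything the release no longer contains, leaving no garbage behind.
function install_release(files) {
  for (const [path, blob] of Object.entries(files)) {
    set_persistent_resource(path, blob);
  }
  for (const path of list_persistent_resources()) {
    if (!Object.prototype.hasOwnProperty.call(files, path)) {
      remove_persistent_resource(path);
    }
  }
}
```

With an API like this, the page decides when a new set of files becomes current, instead of the browser deciding.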

The AppCache is the new Bootloader

Let’s talk about what we can do with appcache. We can use it to persistently store one html file, with embedded JavaScript, and even upgrade it later safely. That sounds a bit like a bootloader.
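The "upgrade it later safely" part needs only a little of the AppCache API. In a browser you would call watchForUpgrade(window.applicationCache, () => location.reload()). The function name and the injected reload callback are my own choices, made so the logic can run outside a browser. applicationCache, its status and UPDATEREADY constant, swapCache(), and the updateready event are the real (now deprecated) AppCache API.

```javascript
// Hypothetical helper: upgrade the one cached index.html when a new copy
// has been downloaded. `appCache` is window.applicationCache in a browser;
// `reload` is normally () => location.reload().
function watchForUpgrade(appCache, reload) {
  const activate = () => {
    // swapCache() makes the newly downloaded cache the active one, but the
    // page already running is still the old copy, so it must be reloaded.
    // This is the "two loads of the page" cost of an upgrade.
    appCache.swapCache();
    reload();
  };
  if (appCache.status === appCache.UPDATEREADY) {
    activate(); // the update finished before this script ran
  } else {
    appCache.addEventListener("updateready", activate);
  }
}
```

A real app might prefer to ask the user before reloading, for example if they are in the middle of something.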

Pre-web world:

  1. Bootloader (BIOS/Disk)

  2. OS Kernel (from disk or network via PXE)

  3. Filesystem (from disk or network via NFS, etc.)

Web world with appcache:

  1. Bootloader (Browser AppCache)

  2. Web App Code (localStorage or AJAX)

  3. Web App Large Blobs (IndexedDB or AJAX)

As you can see, HTML5 lets us write totally offline apps, or hybrids that tolerate intermittent internet access and choose what to do when it’s unavailable.
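Here is a sketch of stage 2, the app code, as the bootloader might load it. When online, the loader asks the server which version is current, using a cachebusted request. If that version's code isn't already in localStorage, it fetches and stores it. When offline, it falls back to the last version it ran. The URLs "/current-version.txt" and "/app.<version>.js", the storage keys, and loadAppCode itself are hypothetical. storage and fetchText are injected: in a browser they would be window.localStorage and a small wrapper around fetch().

```javascript
// Hypothetical stage-2 loader run by the bootloader in index.html.
// storage:   localStorage-like object (getItem, setItem, removeItem)
// fetchText: (url) => Promise<string>, e.g. a wrapper around fetch()
async function loadAppCode(storage, fetchText) {
  let version;
  try {
    // Cachebusted, so the answer never comes from a stale HTTP cache.
    version = (await fetchText(`/current-version.txt?t=${Date.now()}`)).trim();
  } catch (err) {
    // Offline (or the server is down): fall back to the last version we ran.
    version = storage.getItem("app:version");
    if (!version) throw new Error("offline and no app code cached yet");
  }

  const key = `app:code:${version}`;
  let code = storage.getItem(key);
  if (code === null) {
    // Versioned URLs never change content, so this copy is safe to keep.
    code = await fetchText(`/app.${version}.js`);
    storage.setItem(key, code);
  }

  // Switch the pointer, then delete the previous version's code so old
  // copies don't pile up in storage.
  const previous = storage.getItem("app:version");
  storage.setItem("app:version", version);
  if (previous && previous !== version) storage.removeItem(`app:code:${previous}`);
  return { version, code };
}
```

The bootloader would then run the returned code, for instance by injecting it into a script element. Large blobs (stage 3) could follow the same pattern with IndexedDB instead of localStorage.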

Tips on Using AppCache

  • Only put one file in the appcache manifest - the index.html file (with embedded <script>). Appcache is totally broken and can’t deliver consistent updates involving more than one file.

    I’m going to make that change in the next version of my digital signage client: put the bootloader.js code right inside index.html. There’s no reason not to. That may also work around the Safari 7.0.3 bug (UPDATE: Yes, it did!).

  • If you have a small app, or it isn’t updated frequently, you might be able to put the app code totally inside the bootloader - essentially the app is the bootloader. However, be advised that upgrading the app may require two loads of the page; you are forced to live with appcache semantics.

  • All other files are downloaded via AJAX, which the NETWORK: * appcache directive allows, and you may or may not hit the regular browser cache by default (at least in my early testing on different browsers; I haven’t bothered to dive too deeply into the "spec" here). I recommend explicitly cachebusting resources you don’t want cached, and using localStorage or IndexedDB to explicitly cache the ones you do. It’s not convenient, but it should at least work. Just remember to delete files you no longer need. I worry that browsers are going to become a huge garbage dump of useless files left behind by tired developers.

  • Chrome apparently has just committed Blob support for IndexedDB. That will be nice, and will eliminate the need for annoying base64-encoding workarounds.
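The one-file manifest from the first tip could be generated by a deploy script along these lines. This minimal Node sketch assumes the manifest sits next to index.html. The only cached entry is index.html, NETWORK: * lets every other request go to the network, and a "# build" comment carries a hash of index.html. That comment matters because browsers compare the manifest byte-for-byte, so a new index.html behind an unchanged manifest is never picked up. buildManifest and the comment format are my own choices.

```javascript
const crypto = require("crypto");

// Build a manifest that caches only index.html and sends everything else
// (AJAX-loaded code and data) to the network. The "# build" comment changes
// whenever index.html changes, which is what makes browsers re-download it.
function buildManifest(indexHtml) {
  const hash = crypto.createHash("sha256").update(indexHtml).digest("hex");
  return [
    "CACHE MANIFEST",
    `# build ${hash.slice(0, 12)}`,
    "",
    "CACHE:",
    "index.html",
    "",
    "NETWORK:",
    "*",
    "",
  ].join("\n");
}

console.log(buildManifest("<!doctype html><script>/* bootloader */</script>"));
```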

Update: And Yet Another Gotcha I Forgot to Mention

There’s also a second gotcha with appcache, and it involves eventually consistent file hosting like S3. Even with the "one file only" bootloader, clients can get stuck on an old version: they got the new manifest but the old HTML, and since their cached manifest now matches the server’s byte-for-byte, nothing triggers another update. Your options are:

  • Wait a long time, like an hour, between updating the .html and the manifest, and cross your fingers it makes the ordering "strongly consistent".

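  • Set a cron job to touch the manifest every hour or day so that stuck clients see a changed manifest and re-download the HTML. The touch has to change the manifest’s bytes (a timestamp comment, say), since browsers compare it byte-for-byte. (This has the big drawback of wasting clients' bandwidth.)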
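The first option can at least be automated. The sketch below assumes your deploy tool can upload a file, fetch what the host currently serves, and sleep. Those three functions are hypothetical placeholders passed in by the caller, not a real library API. Polling only proves that one vantage point sees the new HTML, which is why the script still waits afterwards. The one-hour default mirrors the suggestion above.

```javascript
// Hypothetical deploy step: publish index.html, wait, then publish the manifest.
// upload(path, body), fetchServed(path) and sleep(ms) are placeholders for
// your hosting client (an S3 upload, say), a plain HTTP GET and a timer.
async function deployBootloader(
  { upload, fetchServed, sleep },
  indexHtml,
  manifest,
  { settleMs = 60 * 60 * 1000, pollMs = 60 * 1000, maxPolls = 60 } = {}
) {
  // 1. Publish the new HTML first; the manifest is unchanged, so browsers
  //    are not yet told to update.
  await upload("index.html", indexHtml);

  // 2. Wait until at least this vantage point serves the new HTML...
  let polls = 0;
  while ((await fetchServed("index.html")) !== indexHtml) {
    if (++polls > maxPolls) throw new Error("new index.html never became visible");
    await sleep(pollMs);
  }

  // 3. ...then keep waiting, because other replicas may still lag behind.
  await sleep(settleMs);

  // 4. Only now change the manifest, which triggers browsers to re-download.
  await upload("manifest.appcache", manifest);
}
```

Even this gives no guarantee, only a long head start, so you really don’t want to run it often.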

So as you can see, you really don’t want to have to update the bootloader.