ETag Everything, Everything ETagg’ed

    The beneficial effects of web caching are often underestimated, yet adding even just basic support to an existing application is a guaranteed win for everybody involved.

    Embrace the web!

    Web caching is both simple and complex. As we don’t want to put the cart before the horse, we start simple. This article assumes that you have a basic understanding of HTTP and applies several simplifications. A full list of these simplifications is presented at the end and gives pointers for further reading.

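    Validation is one aspect of web caching, and with it come conditional requests: the server labels a response with an ETag header, the client sends that tag back in an If-None-Match header on its next request, and the server answers with 304 Not Modified and no body if the tag still matches. A typical request/response flow involving web caching and entity tags might look as simple as in the following chart.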
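
    The flow can also be sketched in a few lines of code. This is a minimal, framework-agnostic illustration in Python rather than li3 code; the names make_etag and respond are made up for this sketch, and the tag is simply an MD5 hash over the body.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return a quoted entity tag derived from the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(request_headers: dict, body: bytes) -> tuple:
    """Answer a GET request, honouring If-None-Match.

    Returns a (status, headers, body) tuple.
    """
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still current: send the headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First request: the client has nothing cached yet.
status, headers, body = respond({}, b"hello")

# Second request: the client replays the tag it received earlier.
status2, headers2, body2 = respond({"If-None-Match": headers["ETag"]}, b"hello")
```

    The first call yields a 200 with the full body, the second a 304 with an empty body. As soon as the content changes, the tag no longer matches and the full response is sent again.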

    Entity tags are like fingerprints of the underlying resource of a URL and change when its content changes. They only need to be unique within the scope of that URL. The tag itself can be practically any kind of string - most often this is a hash over some part of the resource.

    Now it matters which parts of the resource we choose to generate the hash over. As a last resort we can stay close to the response and compute the hash over the actual body to be sent. It is much better, though, to generate the hash over the data that is used in rendering the body, as this lets us move the decision not to return the full response further up. Sometimes this work has already been done for you: reuse existing checksums, unique IDs or other modification signals.
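
    The three options can be compared side by side. The following is a sketch only, in Python rather than PHP; the function names and the record fields id and updated are hypothetical and stand in for whatever your application provides.

```python
import hashlib
import json


def etag_from_body(body: str) -> str:
    """Last resort: hash the fully rendered body."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def etag_from_data(data: dict) -> str:
    """Better: hash the data the body is rendered from.

    sort_keys gives a stable serialization, so equal data always
    produces the same tag regardless of key order.
    """
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def etag_from_signal(record: dict) -> str:
    """Cheapest: reuse an existing modification signal.

    Assumes the record carries a unique id and a timestamp that
    changes on every update (hypothetical field names).
    """
    return f"{record['id']}-{record['updated']}"
```

    Only the first variant requires the body to exist before the tag can be computed; the other two can run before any rendering happens.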

    In the following three sections we’ll visit three different parts of a li3 application and apply basic web caching to each: serving files, serving dynamic content, and retrieving arbitrary data from a web service.

    One of the benefits of using MongoDB is that you also get a nice place to store your files. A good example of how and why to do this is Nate’s photoblog tutorial. The code presented here actually builds on the ideas presented in that tutorial.

    While there are many good things that come with storing your files this way, one downside is that responding to file requests now involves PHP and the database in addition to the web server. To compensate for this overhead we’ll modify the route handler used for serving the files to use web caching.

    Now this is where we actually serve the file. We’ll be reusing the field as the entity tag. MongoDB automatically adds that field as the checksum over the contents of the file. In the headers the entity tag is wrapped in double quotes, so be sure to strip them off the incoming value before comparing it with the checksum. Also, we must include the entity tag on both 200 and 304 responses.
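
    The steps just described can be sketched as follows. This is not the li3 route handler itself but a framework-agnostic Python illustration; serve_file, checksum and read_bytes are hypothetical names, with checksum standing for the stored checksum field and read_bytes for whatever loads the file contents.

```python
def serve_file(checksum: str, read_bytes, request_headers: dict) -> tuple:
    """Serve a stored file, using its stored checksum as the entity tag.

    checksum:   checksum string kept alongside the file (assumption)
    read_bytes: callable that loads the file contents on demand
    Returns a (status, headers, body) tuple.
    """
    # The tag goes out wrapped in double quotes, on 200 and 304 alike.
    headers = {"ETag": '"' + checksum + '"'}

    # Strip the quotes off the incoming value before comparing.
    candidate = request_headers.get("If-None-Match", "").strip().strip('"')

    if candidate == checksum:
        # Match: the file contents are never read.
        return 304, headers, b""
    return 200, headers, read_bytes()
```

    Passing the loader as a callable keeps the expensive part, reading the file, out of the 304 path entirely.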

    As you will see, the pattern of generating, adding and comparing the entity tag stays basically the same in the next example.

    This example will enable caching for any, possibly dynamically generated, content by adding a filter to . While this is pretty bulletproof, it is also not the most performant solution, as we are generating the entity tag over the full body, which in turn always needs to be rendered.
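
    The idea of such a filter can be illustrated without the framework. The sketch below is Python, not li3 code; with_etag and render are hypothetical names, and render stands for whatever produces the response body.

```python
import hashlib


def with_etag(render):
    """Wrap a render callable so its responses are validated by entity tag."""
    def filtered(request_headers: dict) -> tuple:
        # The body is always rendered first: this is the cost noted above.
        body = render()
        tag = hashlib.md5(body.encode("utf-8")).hexdigest()
        headers = {"ETag": '"' + tag + '"'}

        if request_headers.get("If-None-Match", "").strip('"') == tag:
            # Rendering was paid for, but the transfer is saved.
            return 304, headers, ""
        return 200, headers, body
    return filtered


# Usage: wrap any page without touching how it is rendered.
page = with_etag(lambda: "<h1>Hello</h1>")
```

    Because the wrapper only looks at the finished body, it works for any content, which is what makes the approach so robust and also why it cannot skip the rendering step.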

    A better solution would be to generate the tag over the data used in the body (see above) and intercept the cycle right before the template is fully rendered. Feel free to post your solution in the comments.
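
    One possible shape of that better solution, sketched in Python under the assumption that the view data is JSON-serializable; respond, data and render are hypothetical names and this is not tied to the li3 rendering cycle.

```python
import hashlib
import json


def respond(data: dict, render, request_headers: dict) -> tuple:
    """Validate against a tag built from the view data, then render.

    data:   the values the template would be rendered with
    render: callable turning that data into the response body
    Returns a (status, headers, body) tuple.
    """
    canonical = json.dumps(data, sort_keys=True)
    tag = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    headers = {"ETag": '"' + tag + '"'}

    if request_headers.get("If-None-Match", "").strip('"') == tag:
        # Match: the template is never rendered.
        return 304, headers, ""
    return 200, headers, render(data)
```

    One caveat of this design: the tag only reflects the data, so a change to the template alone would not produce a new tag unless something like a template version is mixed into the hash.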

    In this case we are retrieving the latest commits from the li3 GitHub repository. As GitHub has a pretty tight rate limit in place, caching these responses seems to be a good idea.
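
    On the consuming side the roles are reversed: we store the tag and send it back. A minimal sketch in Python, assuming a hypothetical fetch(url, headers) transport that returns a (status, headers, body) tuple and a plain dict as the cache; neither is part of li3 or of any GitHub client.

```python
def cached_get(url: str, cache: dict, fetch):
    """Fetch a URL, revalidating a cached copy by entity tag.

    cache: dict mapping URL to {"etag": ..., "body": ...}
    fetch: hypothetical transport, fetch(url, headers) -> (status, headers, body)
    """
    entry = cache.get(url)
    request_headers = {}
    if entry is not None:
        # Send the tag exactly as received, quotes included.
        request_headers["If-None-Match"] = entry["etag"]

    status, headers, body = fetch(url, request_headers)

    if status == 304 and entry is not None:
        # Nothing changed: reuse the stored body.
        return entry["body"]
    if status == 200 and "ETag" in headers:
        cache[url] = {"etag": headers["ETag"], "body": body}
    return body
```

    In a real application the cache would have to outlive a single request, for example in a file or a database collection; the dict here only keeps the sketch short.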