Control Caching Of Dynamic Web Pages

Updated about 1 yr, 4 mths ago (July 27, 2023). Know a better answer? Let me know!

See the function ned_conditional_get in the Drop In PHP Caching Function page for a more up-to-date example of how to do the below.

Most browsers and many intermediate servers cache web content to speed up its delivery and display. Having your content cached will usually make your site appear faster and more responsive, and will lower your server’s bandwidth requirements. Based on this, you would think that everyone would set up their site to allow caching, but there’s a downside: dynamic content is not usually suited to caching. Consider a login system. You probably don’t want a browser to cache the site in its logged-in state; you want the page to be refreshed every time someone accesses it, so you can check that they’re still logged in.

For this reason, PHP will, by default, disable caching, especially if you use sessions. Unfortunately, this means that even if your site is suitable for caching, the chances are it won’t be cached if it’s built with PHP. Every time a user visits a page, that page is reloaded, with all the extra time and overhead that requires, making your site seem slower than it needs to be. Fortunately, it’s possible to use PHP to enable caching of your pages in an intelligent manner.

Caching is controlled by the HTTP headers sent with every HTTP response. The basic logic is quite simple: if permitted to cache, a cache will store the page for a specified time. This is controlled by the “Cache-Control” and “Expires” headers.

In addition to this, when checking for new versions of the page, most browsers will send an “If-Modified-Since” and/or “If-None-Match” header if a “Last-Modified” and/or “ETag” header was present in the HTTP headers received from the web server. If these headers indicate that the page hasn’t been modified, the server can return a “304 Not Modified” response, and the browser will continue to use and display its current copy, saving server bandwidth and making the site appear more responsive.

The first step to getting your pages cached is to send the appropriate headers:

ETag

An “ETag” header contains a “strong” identifier: one that is unique not only to a particular page or resource, but to the current state of that page or resource. In other words, if the identifier has changed, then the associated page or resource has also changed in some way. I generate it by taking an MD5 hash of the filename and its last modified date. This way, if either the filename or the last modified date is different, the ETag will also be different.

// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);

// send a unique 'strong' identifier. This is always the same for this
// particular file while the file itself remains the same.
header('ETag: "'.md5($mtime.$file).'"');
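
The same calculation can be wrapped in a small helper so that the header and the later comparison always use the same value. This is only a minimal sketch: the function name make_etag and its parameters are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (the name make_etag is an assumption, not from the
// article). Builds a quoted ETag value from a file's last modified time
// and its name, using the same md5($mtime.$file) recipe as the snippet above.
function make_etag(string $file, int $mtime): string
{
    return '"' . md5($mtime . $file) . '"';
}

// Usage: header('ETag: ' . make_etag($file, filemtime($file)));
```

Because the modification time is part of the hashed string, any change to the file produces a different value.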

Last-Modified

The “Last-Modified” header simply contains the time and date the resource in question was last modified.

// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);

// Create a HTTP conformant date, example 'Mon, 22 Dec 2003 14:16:16 GMT'
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';

// output last modified header using the last modified date of the file.
header('Last-Modified: '.$gmt_mtime);
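
To check the date format in isolation, the formatting step can be pulled into a function and run against a known timestamp. A small sketch, assuming a hypothetical helper name http_date; 1072102576 is the Unix timestamp of the example date in the comment above.

```php
<?php
// Hypothetical helper (http_date is an assumed name). Formats a Unix
// timestamp as an HTTP date, always expressed in GMT.
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// 1072102576 corresponds to the example date used in the comment above.
echo http_date(1072102576), "\n"; // Mon, 22 Dec 2003 14:16:16 GMT
```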

Cache-Control

The “Cache-Control” header instructs modern caches on how they should behave, although older caches may not obey it. “Cache-Control” can take a variety of values, such as “private” and “no-cache”, but the one we are interested in is “public”. A “public” directive indicates that the resource may be cached by any cache, which is what we want. “private” indicates that the response should only be cached by non-shared caches (such as your local browser). “no-cache”, despite its name, does not forbid storing the page: it tells caches to check with the server before reusing a stored copy. The directive that forbids storing a copy anywhere is “no-store”.

// tell all caches that this resource is publicly cacheable.
header('Cache-Control: public');
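
For a site that mixes public pages with logged-in pages, the value could be chosen per request. The sketch below is an illustration under assumptions: the function name and parameters are hypothetical, and it adds the standard max-age directive (a lifetime in seconds), which the code above does not use.

```php
<?php
// Hypothetical helper (name and parameters are assumptions). Picks a
// Cache-Control value: "public" for pages any cache may keep, "private"
// for pages that only the visitor's own browser should keep.
function cache_control_value(bool $shared, int $maxAgeSeconds): string
{
    $scope = $shared ? 'public' : 'private';

    // A negative lifetime makes no sense, so clamp it to zero.
    return $scope . ', max-age=' . max(0, $maxAgeSeconds);
}

// Usage: header('Cache-Control: ' . cache_control_value(true, 3600));
```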

Expires

The Expires header gives the date and time after which a response is considered stale, that is, after which a cached copy of a page should no longer be considered valid. In other words, the Expires header indicates how long caches should store a cached copy of a page. Here we indicate that pages can be cached for one month from the current date, by specifying their expiry as a date one month in the future.

// this resource expires one month from now.
header('Expires: '.gmdate('D, d M Y H:i:s', strtotime('+1 month')).' GMT');
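
If you want to verify the expiry calculation without depending on the current time, one option is to pass the clock in. This is a sketch only: expires_value is a hypothetical name, and it takes the lifetime in seconds rather than a relative string such as '+1 month'.

```php
<?php
// Hypothetical helper (expires_value is an assumed name). Returns the value
// for an Expires header: the moment $lifetimeSeconds after $now, formatted
// as an HTTP date in GMT.
function expires_value(int $now, int $lifetimeSeconds): string
{
    return gmdate('D, d M Y H:i:s', $now + $lifetimeSeconds) . ' GMT';
}

// Usage: header('Expires: ' . expires_value(time(), 30 * 24 * 60 * 60));
```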

Not Modified

The next step is to check if the page has been modified when a request is made to the server, and if not, return a “304 Not Modified” status and stop any further processing. This is simply done with two PHP if statements:

// check if the last modified date sent by the client is the same as
// the last modified date of the requested file. If so, return 304 header
// and exit.
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
	if($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
	{
		header('HTTP/1.1 304 Not Modified');
		exit();
	}
}
// check if the Etag sent by the client is the same as the Etag of the
// requested file. If so, return 304 header and exit.
if(isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
	if(str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])) == md5($mtime.$file))
	{
		header("HTTP/1.1 304 Not Modified");

		// abort processing and exit
		exit();
	}
}
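
The two checks above compare the header values for exact equality. A somewhat more tolerant variant is sketched below, with the decision separated from the sending of headers so it can be tested on its own. The function name and signature are assumptions; it expects the ETag as the bare hash without quotes, accepts a comma-separated list of ETags, and treats the cached copy as current when the file is no newer than the date the client sent.

```php
<?php
// Hypothetical helper (name and signature are assumptions). Returns true
// when the request headers in $server show that the client's cached copy
// is still current, so a 304 response would be appropriate.
function client_copy_is_current(array $server, string $etag, int $mtime): bool
{
    // If-None-Match takes precedence over If-Modified-Since when both are sent.
    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        foreach (explode(',', $server['HTTP_IF_NONE_MATCH']) as $candidate) {
            $candidate = trim($candidate);
            if ($candidate === '*') {
                return true;
            }
            // Drop an optional weak-validator prefix, then the quotes.
            if (strncmp($candidate, 'W/', 2) === 0) {
                $candidate = substr($candidate, 2);
            }
            if (trim($candidate, '"') === $etag) {
                return true;
            }
        }
        return false;
    }

    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        // Unparseable dates are ignored; otherwise the copy is current if
        // the file has not changed after the date the client holds.
        return $since !== false && $mtime <= $since;
    }

    return false;
}
```

With this in place, the check could be written as: if (client_copy_is_current($_SERVER, md5($mtime.$file), $mtime)) { header('HTTP/1.1 304 Not Modified'); exit(); }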

There’s one further caveat: headers must be sent before any other output. This generally means that this code must go at the very top of your PHP code, before anything else. One way around this is to buffer your page on the server before outputting it, by calling ob_start(). I also use ob_start() to provide gzip compression, which further increases the responsiveness of plain text files by transmitting them compressed, and decreases the server bandwidth used even more.
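
As a rough illustration of the buffering idea, the sketch below captures some output in a buffer and only sends it at the end. The markup is a placeholder, and the headers_sent() guard is an assumption added so the example also runs safely outside a web request.

```php
<?php
// Start buffering: nothing echoed from here on is sent to the client yet.
ob_start();

echo '<p>Hello, world</p>';

// The output is still held in the buffer, so it can be captured (and the
// buffer discarded) before anything has been sent.
$body = ob_get_clean();

// Headers can still be set at this point, as long as nothing was output
// before ob_start() was called.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Finally send the captured page.
echo $body;
```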

All Together Now

Putting all this together gives us:

// $file contains the file name of the page being displayed (the actual
// content, not any templates you may be using). We take the last modified
// date of this file.
$mtime = filemtime($file);

// Create a HTTP conformant date, example 'Mon, 22 Dec 2003 14:16:16 GMT'
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';

// send a unique 'strong' identifier. This is always the same for this
// particular file while the file itself remains the same.
header('ETag: "'.md5($mtime.$file).'"');

// check if the last modified date sent by the client is the same as
// the last modified date of the requested file. If so, return 304 header
// and exit.
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
	if($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
	{
		header('HTTP/1.1 304 Not Modified');
		exit();
	}
}
// check if the Etag sent by the client is the same as the Etag of the
// requested file. If so, return 304 header and exit.
if(isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
	if(str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])) == md5($mtime.$file))
	{
		header("HTTP/1.1 304 Not Modified");

		// abort processing and exit
		exit();
	}
}

// output last modified header using the last modified date of the file.
header('Last-Modified: '.$gmt_mtime);
// tell all caches that this resource is publicly cacheable.
header('Cache-Control: public');
// this resource expires one month from now.
header('Expires: '.gmdate('D, d M Y H:i:s', strtotime('+1 month')).' GMT');
// set the content-type
header('Content-Type: text/html; charset=utf-8');

// start output.
// Note that no output can precede the headers unless you call ob_start().
// You don't have to use gzip, but it greatly saves on bandwidth (for text)
// at the cost of a little more processing.
ob_start("ob_gzhandler");


User submitted comments:

Göran, about 8 yrs, 11 mths ago
Thursday January 14, 2016 1:24 AM

Thanks for your article for implementing caching in PHP.
I noticed one typo that you might want to correct. In the beginning, when you say:
"if a “Modified-Since” and/or"
I suspect you really mean:
"if a “Last-Modified” and/or"
because Last-Modified is the name of the header that goes in the direction from server to client.

In the code comments of code example "All together now" ... you briefly mention that the date and time stamp for a file represents the date and time of the actual *content* -- not the file generating the content (e.g. a PHP file). I find it very good that you address this fact, but I think it also raises a question: How to compute the date and time in case the content is generated by more than one PHP-file -- or by one single PHP-file that creates dynamic content (e.g. writing a date and/or time, writing data that come from a database etc)? Perhaps the simple answer is to not use the Last-Modified header at all for content generated by PHP-file(s), but to use only the ETag header? What do you think?

Cheers!
