Tag Archive for 'caching'

Zenphoto plugin: Static Cache Control

Stealing a page from WP Super Cache’s book, I’ve written a plugin for Zenphoto that caches (and optionally gzips) Zenphoto pages on the file system to minimize PHP execution, making your dynamic image gallery run almost as fast and light on resources as a static website. Read more about Zenphoto staticCacheControl here.

Nagging question:
Using PHP, is it impossible to redirect a browser to a different URL without changing the browser’s URL visibly? In other words, can transparent URL rewriting Apache-style be accomplished with PHP?


Zenphoto plugin: Static Cache Control

staticCacheControl is a plugin that caches (and optionally gzips) Zenphoto pages on the file system to minimize PHP execution, making your dynamic image gallery run as fast and light on resources as a static website.

This is a beta release. staticCacheControl has been tested on my private Zenphoto installation but not on a “live” site. Please send me any feedback, suggestions, improvements, criticisms.

No matter how well you optimize PHP processing through code optimizations, opcode and database query caching, a dynamic PHP application still can’t beat the performance (speed/time, CPU/memory usage) of a website with the same content displayed using static HTML files. However, good luck updating thousands of static HTML pages by hand. :P With staticCacheControl, you can have the best of both worlds.

staticCacheControl automates the process of generating static HTML copies of your Zenphoto pages, redirecting visitors to these cached pages when possible, and also the validating/refreshing of the static cache when your Zenphoto pages change. Thus, you can enjoy the performance of static pages without the hassle of updating them manually.

This plugin is designed for use in the album.php, albumarchive.php, image.php, and index.php files of your theme. It can be used in conjunction with (after) httpCacheControl for optimal results.

Example usage:
Place static_cache_control.php in your theme directory, like “/themes/default”. Insert one of the following chunks of code at the top of your theme’s index.php, like “/themes/default/index.php” (but after httpCacheControl() if present).

include_once('static_cache_control.php');
staticCacheControl(__FILE__);

Or, enable gzip compression

staticCacheControl(__FILE__, 0, true);

If you got $mtime from elsewhere, like

$mtime = httpCacheControl(__FILE__);

then pass $mtime

staticCacheControl(__FILE__, $mtime, true);

The first required parameter is the path to the file that should be used to calculate Last Modified date, usually “__FILE__”.

The second optional parameter is the Last Modified date of this page. If another function, like httpCacheControl(), already calculated the Last Modified date, be sure to pass the value using this second parameter to avoid redundant calculations. If omitted or set to 0, staticCacheControl will attempt to calculate the Last Modified date, so httpCacheControl is not required.

The third optional parameter specifies whether staticCacheControl should gzip compress cached pages. If omitted, gzip files will not be created.

Read on for .htaccess code to minimize PHP execution further.

How it works:
staticCacheControl generates static HTML caches of Zenphoto pages, optionally applies gzip compression, and stores them on the file system for future retrieval with minimal PHP processing. Static cached pages are stored in the “/ZENPHOTO/cache/static” directory where “ZENPHOTO” is your installation directory. The directory structure in the static cache mirrors the URLs used to access Zenphoto pages. For example, the page

/ZENPHOTO/album/img.jpg/suffix

is cached to

/ZENPHOTO/cache/static/album/img.jpg/suffix/index.html(.gz)
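
The mapping above can be sketched in a few lines of PHP. This is only an illustration of the directory layout, not the plugin's actual code: the function name staticCachePathFor() and its parameters are made up for this example, and it assumes the request path carries no query string.

```php
<?php
// Hypothetical helper (not part of the plugin): maps a request path to the
// static cache file that mirrors it.
// $installDir is the Zenphoto directory as it appears in URLs, e.g. "/ZENPHOTO".
function staticCachePathFor($requestPath, $installDir, $gzip = false)
{
    $path = $requestPath;
    // Strip the installation directory prefix if present.
    if (strpos($path, $installDir) === 0) {
        $path = substr($path, strlen($installDir));
    }
    // Drop leading and trailing slashes so "album/" and "album" map to the same file.
    $path = trim((string) $path, '/');

    $file = $installDir . '/cache/static/' . ($path === '' ? '' : $path . '/') . 'index.html';
    return $gzip ? $file . '.gz' : $file;
}

echo staticCachePathFor('/ZENPHOTO/album/img.jpg/suffix', '/ZENPHOTO', true), "\n";
// prints /ZENPHOTO/cache/static/album/img.jpg/suffix/index.html.gz
```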

When a Zenphoto page is requested, staticCacheControl automatically redirects the client to the static cache if the cache is fresh; the gzipped cache file is used if gzip is enabled and the client accepts gzip. If the cache does not exist or is stale, staticCacheControl creates/updates the static cache file. This plugin uses the same logic as httpCacheControl to determine if a cache is stale. Please see this page for details.
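
As a rough sketch of that decision, assuming "fresh" simply means the cache file is at least as new as the page's Last Modified time: the helper name pickFreshCacheFile() and its parameters are hypothetical, and the real plugin may decide differently.

```php
<?php
// Hypothetical helper, not the plugin's actual code.
// Returns the cache file to serve, or null if the cache has to be (re)built.
function pickFreshCacheFile($cacheFile, $pageMtime, $gzipEnabled, $acceptEncoding)
{
    $candidates = array();
    // Prefer the gzipped copy when gzip is enabled and the client accepts it.
    if ($gzipEnabled && stripos((string) $acceptEncoding, 'gzip') !== false) {
        $candidates[] = $cacheFile . '.gz';
    }
    $candidates[] = $cacheFile;

    foreach ($candidates as $file) {
        // Assumed freshness rule: the file exists and is not older than the page.
        if (is_file($file) && filemtime($file) >= $pageMtime) {
            return $file;
        }
    }
    return null; // missing or stale: the caller regenerates the cache
}
```

A caller would pass the path of the index.html cache file, the page's Last Modified timestamp, and the value of the Accept-Encoding request header.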

PHP processing can be further minimized by using .htaccess directives such that the web server handles the redirection automatically and transparently. Insert the following code after the “RewriteBase” line and before other “RewriteCond” and “RewriteRule” lines in Zenphoto’s .htaccess.

# BEGIN staticCacheControl redirection
  # Redirect if current second is greater than 10
  # Visits during seconds 0-10 (11 of every 60 seconds, i.e. roughly 1 in 6 visits on average) will execute PHP to validate cache
  # 0-58: the larger the number, the more likely PHP will be executed to validate cache
  # Comment out if you have some other method to periodically validate/freshen your static cache
  RewriteCond %{TIME_SEC} >10
  # Redirect if user isn't an admin or hasn't left a comment, signified by lack of a zenphoto cookie
  RewriteCond %{HTTP_COOKIE} !^.*zenphoto.*$
  RewriteCond %{HTTP:Accept-Encoding} gzip
  # Check if the cache file exists
  # !! Replace 4 instances of ZENPHOTO below with your installation directory !!
  RewriteCond %{DOCUMENT_ROOT}/ZENPHOTO/cache/static/$1/index.html.gz -f
  # Redirect any URL starting with "RewriteBase" and ending with a "/" or not
  # Don't save the trailing slash (the lazy ".*?" keeps it out of $1; requires PCRE, i.e. Apache 2.x)
  RewriteRule ^(.*?)/?$ /ZENPHOTO/cache/static/$1/index.html.gz [L]
 
  RewriteCond %{TIME_SEC} >10
  RewriteCond %{HTTP_COOKIE} !^.*zenphoto.*$
  RewriteCond %{DOCUMENT_ROOT}/ZENPHOTO/cache/static/$1/index.html -f
  RewriteRule ^(.*?)/?$ /ZENPHOTO/cache/static/$1/index.html [L]
# END staticCacheControl redirection

The problem with bypassing PHP execution is that you still must somehow validate or refresh your cache periodically. Thus, the above directives do not bypass PHP execution all the time. The code

RewriteCond %{TIME_SEC} >10

forces ~1 in 6 visitors, on average, to execute the PHP code to validate the cache. (Visitors from the 0th to 10th second of each minute execute PHP code and trigger cache validation; visitors from the 11th to 59th second of each minute are transparently redirected to the static cache.) Admins and visitors who leave comments also execute PHP to validate the cache. If you have a more elegant method of validation, please share!
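
To see how the threshold translates into a share of visits, here is a small worked calculation. It assumes requests arrive evenly across the 60 seconds of each minute, which real traffic only approximates; the function name is made up for this example.

```php
<?php
// Share of requests that fall through to PHP for a given TIME_SEC threshold,
// assuming requests are spread uniformly over seconds 0-59.
function phpValidationShare($threshold)
{
    // "RewriteCond %{TIME_SEC} >N" rewrites requests in seconds N+1..59,
    // so the N + 1 seconds from 0 to N still execute PHP.
    return ($threshold + 1) / 60;
}

printf("%.3f\n", phpValidationShare(10)); // prints 0.183 (11 of 60, a little over 1 in 6)
```

Lowering the threshold to 4, for instance, would send 5 of every 60 seconds (about 1 in 12 visits) through PHP under the same assumption.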

Ideas:

  • Use a cron job to periodically delete the entire “/ZENPHOTO/cache/static” directory and redirect ALL visitors to static cache 100% of the time?
  • Invalidate more than just the currently viewed page (ex. the entire album) if a cached page is stale?
  • Execute PHP to validate cache for a longer chunk of time during off-peak hours (ex. 1AM-6AM) instead of doing that for a few seconds every minute?
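
The first idea could look roughly like the script below, run from cron with the PHP command-line binary. This is an untested sketch, not something shipped with the plugin: purgeStaticCache() is a hypothetical name, it requires PHP 5.3 or later, and it empties the directory rather than removing the directory itself.

```php
<?php
// Hypothetical cron helper: deletes every file and subdirectory below $cacheDir
// and returns the number of files removed. The directory itself is kept.
function purgeStaticCache($cacheDir)
{
    if (!is_dir($cacheDir)) {
        return 0;
    }
    $deleted = 0;
    // CHILD_FIRST visits a directory's contents before the directory itself,
    // so each directory is already empty by the time rmdir() is called on it.
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
            $deleted++;
        }
    }
    return $deleted;
}
```

The cron script would call it with the full file system path of the “/ZENPHOTO/cache/static” directory; pages are then re-cached the next time PHP runs for them.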

Performance:
From my rudimentary profiling, execution of staticCacheControl takes from less than 10 milliseconds to a few dozen milliseconds on my shared hosting service. Thus, in the case where PHP is executed and the static cache is fresh, this plugin cuts your page’s execution time down from hundreds of milliseconds to ~10-50ms. In the case where the web server transparently redirects you to the static cache, 0ms is spent on PHP execution. In the case where the static cache is stale or doesn’t exist, your page takes an extra ~10-50ms to load. Also remember that time is not the only resource saved.

Note that benchmarks were performed in a PHP5 environment. PHP4-compatible functions are included but not tested (yet).

Download:

zenphoto_static_cache_control.zip

FAQ:
Q: Why aren’t the .htaccess directives working? I am visibly redirected to the cache file URL every time.
A: Are you an admin, or have you left a comment on the site? If so, you have a *zenphoto* cookie, and you will execute PHP code instead of getting redirected by the web server. Try clearing your cookies or using a different browser or computer.

Known issues:

  • Redirection by PHP (rather than web server URL rewriting) is visible to the user as the URL of the page changes to the URL of the static cached file. Page navigation is unaffected; it’s just inelegant. I don’t believe it is possible to redirect in PHP transparently, is it?
  • Static cache validation/refreshing is inelegant. PHP execution is required to validate a cache file; the more often we bypass PHP execution, the higher the likelihood of stale cache files not getting invalidated. To-Do: actions performed in Admin pages should trigger cache validations, but this would require modifying zp-core files.
  • /etc/mime.types on some servers, like Red Hat and CentOS, have (erroneously?) an entry for gzip, causing *.html.gz files to be sent with a “Content-Type: application/x-gzip” header. Browsers will barf unless gzip compressed files are sent with a “Content-Type: text/html” header. Work around this by creating a .htaccess file in “/ZENPHOTO/cache” with the following directives:
    AddEncoding x-gzip .gz
    AddType text/html .gz
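
On the first known issue above: one approach that may avoid the visible URL change is for PHP to output the cached file itself instead of sending a redirect. The sketch below is not what the plugin does, and serveCacheFile() is a hypothetical name; it still costs a PHP execution, but the browser's address bar stays on the original URL.

```php
<?php
// Hypothetical alternative to a Location redirect; not part of the plugin.
// Streams the cached file in the current response, so the URL never changes.
function serveCacheFile($cacheFile)
{
    if (!is_file($cacheFile)) {
        return false; // no cache file: the caller generates the page as usual
    }
    if (!headers_sent()) {
        header('Content-Type: text/html');
        if (substr($cacheFile, -3) === '.gz') {
            // Tell the client that the body is gzip-compressed HTML.
            header('Content-Encoding: gzip');
        }
        header('Content-Length: ' . filesize($cacheFile));
    }
    readfile($cacheFile);
    return true;
}
```

A theme file would call this before producing any output and stop the script (exit) when it returns true. The gzipped file should only be passed in when the client's Accept-Encoding header includes gzip.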

Changelog:
December 02, 2007

  • added support for comments; last modified time calculations include the date of the most recent comment
  • comments can also be submitted from a static cache page

November 30, 2007

  • first public release

Credits:
Inspired by Donncha O Caoimh’s WP Super Cache. Workaround for gzip Content-Type by Dennis.

Zenphoto plugin: HTTP Cache Control

I’ve released a plugin for Zenphoto, httpCacheControl, which makes your Zenphoto pages cacheable in order to increase site performance and minimize resource usage. It uses the doConditionalGet() function I released yesterday.

Again, feedback, suggestions, improvements, criticisms very much welcome!

Zenphoto plugin: HTTP Cache Control

httpCacheControl is a plugin that makes your Zenphoto pages cacheable in order to increase site performance and minimize resource usage.

This is a beta release. httpCacheControl has been tested on my private Zenphoto installation but not on a “live” site. Please send me any feedback, suggestions, improvements, criticisms.

Making pages cacheable by browsers, proxies, and HTTP accelerators can significantly increase your site’s performance and eliminate wastage of resources (CPU cycles, memory, bandwidth, time). While caching might be a complicated matter for sites with time-critical or user-sensitive information, it is a great performance-booster with hardly any side-effects for relatively static sites. Zenphoto is just one such application as it primarily serves static images.

Even if you cannot allow the possibility of visitors ever seeing stale content, this plugin can improve your site’s performance. httpCacheControl can make your pages cacheable but require that visitors revalidate their cache on every single visit. This plugin can determine if the requested page changed since the last visit using a fraction of the resources it would take to process, output, and transmit the entire page. Thus, within a few milliseconds, httpCacheControl can tell the visitor, “this page hasn’t changed since your last visit; use your cached copy,” or “this page has changed; I will now send you a fresh copy.”

httpCacheControl is designed to be used in the album.php, albumarchive.php, image.php, and index.php files of your theme; in other words, the ZP_ALBUM, ZP_IMAGE, and ZP_INDEX contexts. The function doConditionalGet() is required and included in http_cache_control.php. (You can read about doConditionalGet() here and download it separately.)

Example Usage:
Insert the following at the top of index.php.

<?php
include_once('http_cache_control.php');
// get mtime of this file, cacheable by all, fresh for 1 day
httpCacheControl(__FILE__, 'public,max-age=86400');
?>

This will make your index.php cacheable by all and considered fresh for 1 day. After 1 day, the cache will revalidate with Zenphoto and get a fresh copy if needed.

The first required parameter is the path to the file that should be used to calculate Last Modified date, usually “__FILE__”.

The second optional parameter is the string of valid cache directives for the “Cache-Control” header; the value of max-age, if present, will automatically be used to set the “Expires:” header. The parameter defaults to “public,must-revalidate”, which makes the page cacheable by anyone w/ no possibility of stale content. See RFC2616#14.9 for a list of possible cache directives.
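
The link between max-age and the “Expires:” header can be illustrated as follows. This is a sketch of the general technique under the assumption that Expires is simply the current time plus max-age; expiresFromCacheControl() is a hypothetical name, not a function exported by the plugin.

```php
<?php
// Hypothetical helper: derives an Expires header value from a Cache-Control string.
// Returns null when the directives contain no max-age.
function expiresFromCacheControl($cacheControl, $now)
{
    // Look for a "max-age=<seconds>" directive at the start or after a comma.
    if (!preg_match('/(?:^|,)\s*max-age\s*=\s*(\d+)/i', $cacheControl, $m)) {
        return null;
    }
    // HTTP dates are always expressed in GMT.
    return gmdate('D, d M Y H:i:s', $now + (int) $m[1]) . ' GMT';
}

echo expiresFromCacheControl('public,max-age=86400', 0), "\n";
// prints Fri, 02 Jan 1970 00:00:00 GMT (one day after timestamp 0)
```

In practice the second argument would be time(), and a null result means no Expires header is derived, as with the default “public,must-revalidate”.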

How it works:
This plugin is based on the idea that only a finite number of easily accessible objects can cause Zenphoto pages to change, namely the ones listed below. If none of these objects changed, chances are that the output of the current PHP page did not change either.

  • [1] this file, http_cache_control.php (changes in cache control behavior)
  • [2] the file indicated in the first parameter of the function call httpCacheControl($file), generally the theme file that called this function (theme changes)
  • [3] the “options” table in the database (configuration changes)
  • [4] the Gallery directory, Album directory, or Parent directory of the current image depending on the current context (new image or album uploaded)
  • [5] the comments of the current page

Note that checking if these objects have changed is much faster than executing the entire PHP page and checking if the output of the page has changed. Thus, even if you instructed caches to revalidate on every request (Cache-Control: no-cache), this plugin will improve your site’s performance because execution of the entire script is avoided if the given page did not change.

This plugin generates Last Modified dates and ETags of pages using the factors listed above, and compares with the If-Modified-Since and If-None-Match request headers sent by the client to determine if Zenphoto should return a “304 Not Modified” header or serve up a full page.
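
The comparison could be sketched like this. It is a simplified illustration rather than the code of doConditionalGet(): clientCopyIsCurrent() is a hypothetical name, weak ETags (W/"...") are not handled, and when both request headers are present it only answers "not modified" if both agree.

```php
<?php
// Hypothetical sketch of the comparison, not the plugin's actual code.
// $ifModifiedSince and $ifNoneMatch are the raw request header values,
// or null when the client did not send them.
function clientCopyIsCurrent($mtime, $etag, $ifModifiedSince, $ifNoneMatch)
{
    if ($ifModifiedSince === null && $ifNoneMatch === null) {
        return false; // unconditional request: send the full page
    }
    if ($ifNoneMatch !== null) {
        // The header may list several ETags, or "*" to match anything.
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        if (!in_array('*', $tags, true) && !in_array($etag, $tags, true)) {
            return false;
        }
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        // An unparseable date, or a page newer than the client's copy, means "send it".
        if ($since === false || $mtime > $since) {
            return false;
        }
    }
    return true; // every validator the client sent still matches
}
```

When the function returns true, the caller would send “304 Not Modified” and stop; otherwise the full page is generated.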

The Last Modified date of a Zenphoto page is determined by taking the most recent of the Last Modified dates of [1], [2], [3], [4], [5].

The ETag of a Zenphoto page is set to the MD5 hash of the string concatenation of

  • a) the URL used to request this page ($_SERVER['REQUEST_URI'])
  • b) the version number of this Zenphoto installation
  • c) the Last Modified date of [1]
  • d) the Last Modified date of [2]
  • e) the Last Modified date of [3]
  • f) the Last Modified date of [4]
  • g) the Last Modified date of [5]
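
Put together, the two rules above might be sketched as follows. The function name pageValidators() and the exact concatenation format (no separators, timestamps as integers) are assumptions made for this example; the plugin may build the string differently.

```php
<?php
// Hypothetical sketch: $mtimes holds the UNIX timestamps of objects [1] to [5], in order.
function pageValidators($requestUri, $zenphotoVersion, array $mtimes)
{
    return array(
        // Last Modified: the most recent of the five timestamps.
        'last_modified' => max($mtimes),
        // ETag: MD5 of the URL, the version number, and each timestamp, concatenated.
        'etag' => '"' . md5($requestUri . $zenphotoVersion . implode('', $mtimes)) . '"',
    );
}
```

Because every timestamp feeds into the hash, a change to any one of the five objects yields a different ETag even when the most recent Last Modified date stays the same.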

Performance:
According to my rudimentary profiling, execution of httpCacheControl generally takes less than 10 milliseconds on my shared hosting service. Thus, in the case where your client has a good cached copy, this plugin cuts your page’s execution time down from hundreds of milliseconds to <10ms. In the case where your client has a stale cache and needs a fresh copy, your page takes an extra 10ms to load. Also remember that time is not the only resource saved.

It should be most effective if you deploy a reverse proxy or HTTP accelerator like Squid or Varnish in front of your web server.

Download:

zenphoto_http_cache_control.zip

Known issues:

  • Although the plugin can determine the creation date of the most recent comment of a page, it cannot tell if the comment is edited subsequently because the edit date doesn’t appear to be stored in the database. Thus, a page might be considered fresh even if one of the comments has been updated since the last visit.

Changelog:
December 01, 2007

  • added support for comments; last modified time calculations include the date of the most recent comment

November 30, 2007

  • return mtime in UNIX timestamp instead of RFC1123 format for more flexibility

November 28, 2007

  • first public release

My first code release: Conditional GET PHP function doConditionalGet()

I’ve started a section on this blog called “My Code” that features code I’ve written. My first release is doConditionalGet(), a generic PHP function that implements HTTP’s Conditional GET mechanism. This function facilitates making dynamic PHP pages cacheable by HTTP accelerators, browsers, or other caches. Read more about it here.

Suggestions, comments, criticisms are very much welcome!

Conditional GET PHP function

doConditionalGet() is a generic PHP function that implements HTTP’s Conditional GET mechanism.

Sending the full contents of a webpage over the Internet whenever a client requests that page is a waste of resources (CPU, memory, bandwidth, time) if the client has retrieved this page before and the content has not changed since. The inventors of HTTP came up with an idea to prevent such wastages. In a single query, the client (browser, proxy, HTTP accelerator, etc.) can say, “I have a copy of this page from a previous visit. If the page has changed since my last visit, give me a fresh copy. Otherwise, tell me that the page is unchanged and give me nothing.”

The HTTP Conditional GET mechanism also allows web servers to instruct clients to ask if the content of the requested page has changed on every visit (Cache-Control: must-revalidate), or to not ask and simply assume that the content is unchanged for a certain amount of time (Cache-Control: max-age=X).

Web servers like Apache automatically handle the Conditional GET mechanism for static objects like HTML, CSS, JavaScript, image files. This is not the case with dynamically generated content like PHP pages. Sometimes, this is a good thing because we do not want browsers or proxies to cache dynamic content that changes on every visit, or content that is sensitive or personal and should not be shared by more than one client. However, PHP-generated content is often no different from normal HTML pages that seldom change, or can be allowed to grow “stale” for a certain amount of time. Furthermore, if revalidation of a PHP page can be performed much more quickly than executing the entire page, even revalidation on every visit might make sense, improving site performance while ensuring that visitors never see stale content.

Usage
doConditionalGet() sends the “304 Not Modified” header and aborts script execution if the client has a good cached copy of the given document; otherwise, it returns control to your main script to output content as normal. You must provide the function with a Last Modified date, ETag, and optionally a freshness duration for the given document. doConditionalGet() must be called before any content has been output. You can either call this function at the top of your script or use the ob_start() function to buffer and delay the output of your script till the end.

Usage Example

// get mtime of some file
$mtime = filemtime($some_file);
// set ETag to MD5 hash of filename+mtime
$etag = '"'.md5($some_file.$mtime).'"';
// make cacheable by anyone, fresh for 1 minute
$cache_control = 'public,max-age=60';
doConditionalGet($mtime, $etag, $cache_control);

Please see the script file for more in-depth documentation. Also see the Zenphoto plugin httpCacheControl for a “real-life” example.

Download

do_conditional_get.zip

Changelog
November 30, 2007

  • return mtime in UNIX timestamp instead of RFC1123 format for more flexibility

November 28, 2007

  • rearranged logic of function to improve performance
  • can now set any Cache-Control header with the third parameter
  • automatically set Expires header with max-age value if available

November 27, 2007

  • use GMT instead of local time
  • removed ETag generation to make function more generic
  • comply with the updated RFC2616 #13.3.4 when both IMS and INM headers exist (see discussion)
  • code cleanup and documentation

Credit
Based on the work of
