Tag Archive for 'performance'



Zenphoto plugin: HTTP Cache Control

I’ve released a plugin for Zenphoto, httpCacheControl, which makes your Zenphoto pages cacheable in order to increase site performance and minimize resource usage. It uses the doConditionalGet() function I released yesterday.

Again, feedback, suggestions, improvements, and criticisms are very much welcome!


Zenphoto plugin: HTTP Cache Control

httpCacheControl is a plugin that makes your Zenphoto pages cacheable in order to increase site performance and minimize resource usage.

This is a beta release. httpCacheControl has been tested on my private Zenphoto installation but not on a “live” site. Please send me any feedback, suggestions, improvements, or criticisms.

Making pages cacheable by browsers, proxies, and HTTP accelerators can significantly increase your site’s performance and eliminate wasted resources (CPU cycles, memory, bandwidth, time). While caching can be complicated for sites with time-critical or user-specific information, it is a great performance booster with hardly any side effects for relatively static sites. Zenphoto is one such application, as it primarily serves static images.

Even if you cannot allow visitors to ever see stale content, this plugin can improve your site’s performance. httpCacheControl can make your pages cacheable while requiring visitors to revalidate their cached copy on every single visit. The plugin can determine whether the requested page has changed since the last visit using a fraction of the resources it would take to process, output, and transmit the entire page. Thus, within a few milliseconds, httpCacheControl can tell the visitor, “this page hasn’t changed since your last visit; use your cached copy,” or “this page has changed; I will now send you a fresh copy.”

httpCacheControl is designed to be used in the album.php, albumarchive.php, image.php, and index.php files of your theme; in other words, in the ZP_ALBUM, ZP_IMAGE, and ZP_INDEX contexts. The function doConditionalGet() is required and is included in http_cache_control.php. (You can read about doConditionalGet() here and download it separately.)

Example Usage:
Insert the following at the top of index.php.

<?php
include_once('http_cache_control.php');
// get mtime of this file, cacheable by all, fresh for 1 day
httpCacheControl(__FILE__, 'public,max-age=86400');
?>

This will make your index.php cacheable by all and considered fresh for 1 day. After 1 day, the cache will revalidate with Zenphoto and get a fresh copy if needed.

The first, required parameter is the path to the file that should be used to calculate the Last Modified date, usually “__FILE__”.

The second, optional parameter is a string of valid cache directives for the “Cache-Control” header; the value of max-age, if present, is automatically used to set the “Expires” header. The parameter defaults to “public,must-revalidate”, which makes the page cacheable by anyone with no possibility of stale content. See RFC 2616, section 14.9, for a list of possible cache directives.
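For illustration, here is a minimal sketch of how a max-age directive could be turned into an Expires header. The function name expiresFromCacheControl() is hypothetical; this is not the plugin’s actual code, just one way to do it, assuming PHP 7.1 or later for the type declarations.

```php
<?php
// Hypothetical helper (not httpCacheControl's real code): derive an Expires
// header value from the max-age directive of a Cache-Control string.
// Returns null when no max-age directive is present.
function expiresFromCacheControl(string $cacheControl, int $now): ?string
{
    foreach (explode(',', $cacheControl) as $directive) {
        $directive = strtolower(trim($directive));
        if (preg_match('/^max-age=(\d+)$/', $directive, $m)) {
            // HTTP dates are always expressed in GMT (RFC 1123 format).
            return gmdate('D, d M Y H:i:s', $now + (int) $m[1]) . ' GMT';
        }
    }
    return null;
}

// Possible usage, before any output is sent:
// $cc = 'public,max-age=86400';
// header('Cache-Control: ' . $cc);
// if (($expires = expiresFromCacheControl($cc, time())) !== null) {
//     header('Expires: ' . $expires);
// }
```

With the default “public,must-revalidate” there is no max-age, so no Expires header would be sent.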

How it works:
This plugin is based on the idea that a finite number of easily accessible objects can cause a Zenphoto page to change. If none of the following objects has changed, chances are that the output of the current PHP page has not changed either.

  • [1] this file, http_cache_control.php (changes in cache control behavior)
  • [2] the file indicated in the first parameter of the function call httpCacheControl($file), generally the theme file that called this function (theme changes)
  • [3] the “options” table in the database (configuration changes)
  • [4] the Gallery directory, Album directory, or Parent directory of the current image depending on the current context (new image or album uploaded)
  • [5] the comments of the current page

Note that checking if these objects have changed is much faster than executing the entire PHP page and checking if the output of the page has changed. Thus, even if you instructed caches to revalidate on every request (Cache-Control: no-cache), this plugin will improve your site’s performance because execution of the entire script is avoided if the given page did not change.

This plugin generates Last Modified dates and ETags for pages using the factors listed above, and compares them with the If-Modified-Since and If-None-Match request headers sent by the client to determine whether Zenphoto should return a “304 Not Modified” response or serve the full page.

The Last Modified date of a Zenphoto page is determined by taking the most recent of the Last Modified dates of [1], [2], [3], [4], [5].

The ETag of a Zenphoto page is set to the MD5 hash of the string concatenation of

  • a) the URL used to request this page ($_SERVER['REQUEST_URI'])
  • b) the version number of this Zenphoto installation
  • c) the Last Modified date of [1]
  • d) the Last Modified date of [2]
  • e) the Last Modified date of [3]
  • f) the Last Modified date of [4]
  • g) the Last Modified date of [5]
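A minimal sketch of the two calculations above, assuming the five timestamps have already been collected as UNIX timestamps (for example with filemtime() for the files and directories, and database queries for the options table and comments). The function pageValidators() and its parameters are hypothetical, not the plugin’s API.

```php
<?php
// Hypothetical sketch of the Last-Modified / ETag scheme described above.
// $mtimes holds the timestamps of [1]..[5], in that order.
function pageValidators(string $requestUri, string $zpVersion, array $mtimes): array
{
    // Last-Modified: the most recent of the component timestamps.
    $lastModified = max($mtimes);

    // ETag: MD5 of a) the request URI, b) the Zenphoto version, and
    // c)-g) each component timestamp, concatenated; wrapped in double
    // quotes because HTTP entity tags are quoted strings.
    $etag = '"' . md5($requestUri . $zpVersion . implode('', $mtimes)) . '"';

    return [$lastModified, $etag];
}

// Possible usage ($zpVersion and $t1..$t5 are placeholders):
// [$lastModified, $etag] = pageValidators($_SERVER['REQUEST_URI'], $zpVersion,
//                                         [$t1, $t2, $t3, $t4, $t5]);
```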

Performance:
According to my rudimentary profiling, httpCacheControl generally executes in less than 10 milliseconds on my shared hosting service. Thus, when your client has a good cached copy, this plugin cuts your page’s execution time from hundreds of milliseconds to under 10 ms. When your client has a stale copy and needs a fresh one, your page takes an extra 10 ms or so to load. Also remember that time is not the only resource saved.

The plugin should be most effective if you deploy a reverse proxy or HTTP accelerator such as Squid or Varnish in front of your web server.

Download:

zenphoto_http_cache_control.zip

Known issues:

  • Although the plugin can determine the creation date of the most recent comment of a page, it cannot tell if the comment is edited subsequently because the edit date doesn’t appear to be stored in the database. Thus, a page might be considered fresh even if one of the comments has been updated since the last visit.

Changelog:
December 01, 2007

  • added support for comments; last modified time calculations include the date of the most recent comment

November 30, 2007

  • return mtime in UNIX timestamp instead of RFC1123 format for more flexibility

November 28, 2007

  • first public release

PHP: speed of string comparison functions

Here is a recent benchmark of various PHP string comparison functions. Not surprisingly, the fastest is strpos. But did you know that preg_match is faster than strstr and not much slower than strpos? This contradicts the following note in the official PHP manual:

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

ereg is by far the slowest.

So the best function for simply checking whether a string contains a particular substring is strpos. In all other cases, preg_match might be superior to strstr. YMMV.
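One caveat when using strpos for this check: it returns the position of the match or false, and a match at position 0 is falsy, so the result has to be compared strictly against false. A minimal sketch; the helper name contains() is my own.

```php
<?php
// strpos() returns the offset of the first match, or false if there is none.
// A match at offset 0 is falsy, so "if (strpos(...))" would miss it;
// the strict !== false comparison avoids that pitfall.
function contains(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}
```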

Conditional GET PHP function

doConditionalGet() is a generic PHP function that implements HTTP’s Conditional GET mechanism.

Sending the full contents of a webpage over the Internet whenever a client requests that page is a waste of resources (CPU, memory, bandwidth, time) if the client has retrieved this page before and the content has not changed since. The inventors of HTTP came up with a way to prevent such waste. In a single request, the client (browser, proxy, HTTP accelerator, etc.) can say, “I have a copy of this page from a previous visit. If the page has changed since my last visit, give me a fresh copy. Otherwise, tell me that the page is unchanged and give me nothing.”

HTTP also allows web servers to instruct clients either to ask whether the content of the requested page has changed on every visit (Cache-Control: no-cache, or max-age=0,must-revalidate), or not to ask and simply assume that the content is unchanged for a certain amount of time (Cache-Control: max-age=X).

Web servers like Apache automatically handle the Conditional GET mechanism for static content such as HTML, CSS, JavaScript, and image files. This is not the case with dynamically generated content like PHP pages. Sometimes this is a good thing, because we do not want browsers or proxies to cache dynamic content that changes on every visit, or content that is sensitive or personal and should not be shared between clients. However, PHP-generated content is often no different from normal HTML pages that seldom change, or it can be allowed to grow “stale” for a certain amount of time. Furthermore, if revalidating a PHP page is much faster than executing the entire page, even revalidation on every visit might make sense, improving site performance while ensuring that visitors never see stale content.

Usage
doConditionalGet() sends the “304 Not Modified” header and aborts script execution if the client has a good cached copy of the given document; otherwise, it returns control to your main script, which outputs content as normal. You must provide the function with a Last Modified date, an ETag, and optionally a Cache-Control value (such as a max-age freshness duration) for the given document. doConditionalGet() must be called before any content has been output. You can either call this function at the top of your script or use the ob_start() function to buffer the output of your script until the end.
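The core revalidation decision can be sketched as a pure function, assuming the raw If-Modified-Since and If-None-Match header values are passed in (null when absent). This is a hypothetical illustration, not the code inside doConditionalGet(): it uses exact (strong) ETag comparison only, does not handle weak W/ tags, and, following RFC 2616 section 13.3.4, reports a fresh copy only when every validator the client sent matches.

```php
<?php
// Hypothetical sketch of the 304 decision; no headers are sent here.
function clientHasFreshCopy(int $mtime, string $etag, ?string $ifModifiedSince, ?string $ifNoneMatch): bool
{
    if ($ifModifiedSince === null && $ifNoneMatch === null) {
        return false; // unconditional request: send the full page
    }
    if ($ifNoneMatch !== null) {
        // The header may list several comma-separated tags, or "*".
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        if (!in_array($etag, $tags, true) && !in_array('*', $tags, true)) {
            return false;
        }
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        if ($since === false || $since < $mtime) {
            return false; // unparseable date, or page changed since then
        }
    }
    return true; // every validator the client sent matches
}

// Possible usage at the top of a page:
// if (clientHasFreshCopy($mtime, $etag,
//         $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null,
//         $_SERVER['HTTP_IF_NONE_MATCH'] ?? null)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit;
// }
```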

Usage Example

// get mtime of some file
$mtime = filemtime($some_file);
// set ETag to MD5 hash of filename+mtime
$etag = '"'.md5($some_file.$mtime).'"';
// make cacheable by anyone, fresh for 1 minute
$cache_control = 'public,max-age=60';
doConditionalGet($mtime, $etag, $cache_control);

Please see the script file for more in-depth documentation. Also see the Zenphoto plugin httpCacheControl for a “real-life” example.

Download

do_conditional_get.zip

Changelog
November 30, 2007

  • return mtime in UNIX timestamp instead of RFC1123 format for more flexibility

November 28, 2007

  • rearranged logic of function to improve performance
  • can now set any Cache-Control header with the third parameter
  • automatically set Expires header with max-age value if available

November 27, 2007

  • use GMT instead of local time
  • removed ETag generation to make function more generic
  • comply with the updated RFC2616 #13.3.4 when both IMS and INM headers exist (see discussion)
  • code cleanup and documentation

Credit
Based on the work of

References

Tutorials: Make your PHP pages cacheable

The following tutorials taught me how to make PHP pages cacheable. If you do not manually set your PHP page’s HTTP headers, that page is generally uncacheable by browsers, reverse proxies, HTTP accelerators, etc. That might make sense for highly dynamic pages or personal/sensitive information, but there are plenty of situations where PHP generated content can and should be cached.

There is a typo in Ned Martin’s guide: “Last-Modified” is hyphenated! (Took me forever to figure out why I couldn’t set that header.)
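To make the header names concrete, here is a small sketch that builds the caching headers for a page. The helper cacheHeaders() is hypothetical; it simply returns the header lines so they can be inspected before being passed to header().

```php
<?php
// Hypothetical helper: build caching headers for a page last changed at $mtime.
// Note the hyphen in "Last-Modified"; HTTP dates must be given in GMT.
function cacheHeaders(int $mtime, string $etag, string $cacheControl): array
{
    return [
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        'ETag: ' . $etag,
        'Cache-Control: ' . $cacheControl,
    ];
}

// Possible usage, before any output:
// foreach (cacheHeaders(filemtime(__FILE__), $etag, 'public,max-age=3600') as $h) {
//     header($h);
// }
```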

Use the Cacheability Engine and the Live HTTP Headers extension for Firefox to debug as you go. Remember that shift-clicking reload/refresh forces the browser to fetch a fresh copy of the page.

Apologies for the short and sporadic posting. I’m busy hacking Zenphoto for another project. :)

P.S. Another plug for my host NearlyFreeSpeech.NET; Squid HTTP acceleration on shared hosting for mere pennies rocks! The availability of Squid made me look into HTTP caching for PHP pages.

WordPress internal object cache killed this site’s performance

As you may have noticed, this blog became very sluggish recently. Fortunately, I’ve narrowed down the cause of the lag to the WordPress internal object cache, which is supposed to increase site performance, but in my case it slowed down page load times by more than 10x, sometimes much more.

My blog’s symptoms:

  • Disabling all plugins, reverting to the default theme, and displaying only 1 blog post on the homepage did not really affect my page load time.
  • The load delay was not an issue with PHP/MySQL at all. My page was completely executed and output by PHP in about 0.5-1 second (according to various plugins that showed the script execution time), but it took 10x that to cache the entire page to disk or to load 100% of the page in a browser.
  • The page load lag had nothing to do with the browser on the client side. I tried Firefox, IE, Safari, and even external tools like Pingdom Full Page Tester.
  • The browser loaded 95% of the page and for some reason stalled for a long time on the footer section; making the footer empty didn’t help, and profiling the footer itself showed that PHP executed it in 0.06 seconds.

On a whim, I removed the following line, which I had inserted in wp-config.php, to get back to a vanilla WP setup.

define('ENABLE_CACHE', true);

This line enables WP’s built-in object cache and is supposed to improve performance slightly. Problem solved!

This object cache supposedly caches database queries to disk to reduce CPU and database overhead. This should increase performance in most cases unless your disk I/O performance is very poor, like Dreamhost’s NFS storage system. This feature seems to be undocumented and had security issues in the past, but is commonly recommended as a safe way of increasing performance. I had heard that the object cache hardly improves performance, but I never imagined that it would slow my site down by 10x or more.
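To illustrate why slow disk I/O matters, here is a hypothetical disk-backed query cache. This is not WordPress’s object cache implementation, only a sketch of the general idea: every cache hit becomes a file read plus unserialize(), so on slow storage a “hit” can cost more than re-running a fast query. The function cachedQuery() and its parameters are made up for this example.

```php
<?php
// Hypothetical disk-backed query cache (NOT WordPress's object cache code).
// $runQuery is any callable that executes the SQL and returns the result.
function cachedQuery(string $cacheDir, string $sql, callable $runQuery)
{
    $file = $cacheDir . '/' . md5($sql) . '.cache';
    if (is_file($file)) {
        // Cache hit: costs a disk read, which is slow on network storage.
        return unserialize(file_get_contents($file));
    }
    $result = $runQuery($sql);                               // database round trip
    file_put_contents($file, serialize($result), LOCK_EX);   // disk write on a miss
    return $result;
}
```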

Disabling the object cache brought my page load times back to normal, with all my plugins enabled. I suppose the feature is disabled by default for a reason. If your blog is suffering from performance issues and your plugins or themes are not the cause, try disabling the object cache (if you enabled it). Comment out

define('ENABLE_CACHE', true);

in wp-config.php if needed.

However, some of the performance improvements in the upcoming WordPress 2.4 apparently have to do with the object cache. /me confused. :( I will try to find out more about what exactly the object cache does and what it did to kill my site’s performance. Thanks again, Jeff from NFSN, for helping me troubleshoot my site’s performance issues.

Redirect visitors from high-traffic sites like Digg to CoralCDN mirror using .htaccess

Regardless of whether you have the most sophisticated caching mechanisms in place (like WP Super Cache), the simplest and most effective way of surviving flash traffic from sites like Digg is to redirect that traffic to a mirror, such as the free CoralCDN.

I personally think that the Coral Content Distribution Network is one of the most amazing yet underrated web inventions of recent years. It’s so effective, yet so simple to use, that it seems like magic to end users. When you load a URL like “http://www.tummblr.com.nyud.net/archivepage/”, that page is fetched and cached on CoralCDN, a peer-to-peer network of servers all over the globe. So Digg can send thousands of users to my CoralCDN mirror and use no more of my host’s resources and bandwidth than if a lone visitor hit my site. Read more about Coral on their homepage.
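The mirror URL is just the original host name with .nyud.net appended. A minimal sketch, assuming an absolute http:// URL without an explicit port; coralUrl() is a hypothetical helper, not part of CoralCDN.

```php
<?php
// Hypothetical helper: map a URL to its CoralCDN mirror by appending
// ".nyud.net" to the host name. Assumes an absolute http:// URL, no port.
function coralUrl(string $url): string
{
    $parts = parse_url($url);
    $path  = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return 'http://' . $parts['host'] . '.nyud.net' . $path . $query;
}
```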

Without further ado, here is the .htaccess code that will redirect visitors from high traffic sites like Digg to your CoralCDN mirror.

# Heavy Site Redirect to Coral Cache
# Links incoming from heavy sites are redirected to the Coral Cache
# Exception: Coral Cache Proxy Servers
# Exception: Googlebot crawler
#
# CONFIG: Replace “yourdomain.com” with your target domain name.
# CONFIG: Follow the HTTP_REFERER RewriteCond examples to add or remove
# domains to the list of redirected sites.
#
<ifmodule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^Googlebot
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?digg\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?slashdot\.org [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?slashdot\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?fark\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?somethingawful\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?kuro5hin\.org [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?engadget\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?boingboing\.net [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?del\.icio\.us
RewriteRule ^(.*)?$ http://www.yourdomain.com.nyud.net/$1 [R,L]
</ifmodule>

Credit: Benjamin Yu
Note: Removed “:8080” from “http://yourdomain.com.nyud.net:8080/” since CoralCDN now runs on port 80, the standard HTTP port. Yay! :)
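For readers less familiar with mod_rewrite, here is a sketch in PHP of how the conditions above combine: consecutive RewriteCond lines are ANDed, while a condition flagged [OR] is ORed with the next one. So the rule fires only when the visitor is not Googlebot, not a Coral proxy, has no trailing coral-no-serve in the query string, and arrives from one of the listed sites. The function shouldRedirectToCoral() is hypothetical and only mirrors the logic; it is not a replacement for the .htaccess rules.

```php
<?php
// Hypothetical mirror of the RewriteCond logic above (not used by Apache).
function shouldRedirectToCoral(string $userAgent, string $queryString, string $referer): bool
{
    // The [OR]-chained referrer conditions.
    $sites = ['digg\.com', 'slashdot\.org', 'slashdot\.com', 'fark\.com',
              'somethingawful\.com', 'kuro5hin\.org', 'engadget\.com',
              'boingboing\.net', 'del\.icio\.us'];

    // ANDed exceptions: Googlebot, Coral proxies, explicit opt-out.
    if (preg_match('/^Googlebot/', $userAgent) || preg_match('/^CoralWebPrx/', $userAgent)) {
        return false;
    }
    if (preg_match('/(^|&)coral-no-serve$/', $queryString)) {
        return false;
    }
    // Any one matching referrer is enough.
    foreach ($sites as $site) {
        if (preg_match('#^http://([^/]+\.)?' . $site . '#', $referer)) {
            return true;
        }
    }
    return false;
}
```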