My first code release: Conditional GET PHP function doConditionalGet()

I’ve started a section on this blog called “My Code” that features code I’ve written. My first release is doConditionalGet(), a generic PHP function that implements HTTP’s Conditional GET mechanism. This function facilitates making dynamic PHP pages cacheable by HTTP accelerators, browsers, or other caches. Read more about it here.

Suggestions, comments, criticisms are very much welcome!

Conditional GET PHP function

doConditionalGet() is a generic PHP function that implements HTTP’s Conditional GET mechanism.

Sending the full contents of a webpage over the Internet whenever a client requests that page is a waste of resources (CPU, memory, bandwidth, time) if the client has retrieved this page before and the content has not changed since. The inventors of HTTP came up with an idea to prevent such waste. In a single query, the client (browser, proxy, HTTP accelerator, etc.) can say, “I have a copy of this page from a previous visit. If the page has changed since my last visit, give me a fresh copy. Otherwise, tell me that the page is unchanged and give me nothing.”
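The exchange described above boils down to comparing the validators the client sends (If-None-Match and If-Modified-Since) with the document's current ETag and modification time. Here is a minimal sketch of that comparison. The function name clientCopyIsFresh() and its parameters are made up for illustration, and this is not the actual doConditionalGet() source. It assumes the newer RFC 7232 rule that If-None-Match takes precedence when both headers are present.

```php
<?php
// Hypothetical helper for illustration only (not the doConditionalGet() source).
// Returns true when the client's cached copy is still good, i.e. when the
// server could answer "304 Not Modified" instead of sending the page again.
function clientCopyIsFresh(?string $ifNoneMatch, ?string $ifModifiedSince, string $etag, int $mtime): bool
{
    // Compare ETags weakly: ignore an optional W/ prefix on either side.
    $normalize = function (string $tag): string {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };

    // Assumption: follow RFC 7232, where If-None-Match wins if it is present.
    if ($ifNoneMatch !== null) {
        if (trim($ifNoneMatch) === '*') {
            return true;
        }
        // Simplification: assumes the ETags themselves contain no commas.
        foreach (explode(',', $ifNoneMatch) as $candidate) {
            if ($normalize($candidate) === $normalize($etag)) {
                return true;
            }
        }
        return false;
    }

    // Fall back to the date check: unchanged if not modified after that date.
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $mtime <= $since;
    }

    // No validators sent: the client has no copy, so send the full page.
    return false;
}
```

In a real script the two header values would come from $_SERVER['HTTP_IF_NONE_MATCH'] and $_SERVER['HTTP_IF_MODIFIED_SINCE'], with null passed when a header is absent.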

The HTTP Conditional GET mechanism also allows web servers to instruct clients either to ask on every visit whether the requested page has changed (for example, Cache-Control: max-age=0, must-revalidate; on its own, must-revalidate only forces a check once the cached copy has gone stale), or to not ask and simply assume that the content is unchanged for a certain amount of time (Cache-Control: max-age=X).
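As a rough sketch of how these instructions end up in a response, the helper below builds the caching headers for a document and mirrors a max-age value into an Expires header. The function name buildCacheHeaders() and its $now parameter are assumptions made for this example (passing the current time in keeps the function easy to test). It is not taken from the doConditionalGet() source.

```php
<?php
// Hypothetical helper for illustration: returns the caching headers as an
// array of name => value pairs instead of sending them directly.
function buildCacheHeaders(int $mtime, string $etag, string $cacheControl, int $now): array
{
    $headers = [
        // HTTP dates are always expressed in GMT.
        'Last-Modified' => gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        'ETag'          => $etag,
        'Cache-Control' => $cacheControl,
    ];

    // If a max-age directive is present, also set a matching Expires header
    // for older caches that only understand Expires.
    if (preg_match('/(?:^|[\s,])max-age=(\d+)/i', $cacheControl, $m)) {
        $headers['Expires'] = gmdate('D, d M Y H:i:s', $now + (int) $m[1]) . ' GMT';
    }

    return $headers;
}

// Usage sketch, before any output has been sent:
// foreach (buildCacheHeaders($mtime, $etag, 'public,max-age=60', time()) as $name => $value) {
//     header("$name: $value");
// }
```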

Web servers like Apache automatically handle the Conditional GET mechanism for static objects such as HTML, CSS, JavaScript, and image files. This is not the case with dynamically generated content like PHP pages. Sometimes, this is a good thing because we do not want browsers or proxies to cache dynamic content that changes on every visit, or content that is sensitive or personal and should not be shared by more than one client. However, PHP-generated content is often no different from normal HTML pages that seldom change, or can be allowed to grow “stale” for a certain amount of time. Furthermore, if revalidation of a PHP page can be performed much more quickly than executing the entire page, even revalidation on every visit might make sense, improving site performance while ensuring that visitors never see stale content.

Usage
doConditionalGet() sends the “304 Not Modified” header and aborts script execution if the client has a good cached copy of the given document. Otherwise, it returns control to your main script to output content as normal. You must provide the function with a Last-Modified date, an ETag, and optionally a Cache-Control value (for example, a freshness duration) for the given document. doConditionalGet() must be called before any content has been output. You can either call this function at the top of your script or use the ob_start() function to buffer and delay the output of your script till the end.
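The ob_start() approach might look something like the sketch below. The helper renderWithEtag() is hypothetical and only shows the buffering step: it runs the page code, captures the output instead of sending it, and derives an ETag from the captured body.

```php
<?php
// Hypothetical helper for illustration: run a page-rendering callback with
// output buffering and return the body together with an ETag based on it.
function renderWithEtag(callable $render): array
{
    ob_start();
    try {
        $render();               // anything echoed here goes into the buffer
    } finally {
        $body = ob_get_clean();  // fetch the buffer and stop buffering
    }

    return ['body' => $body, 'etag' => '"' . md5($body) . '"'];
}

// Usage sketch ($mtime is whatever modification time applies to your page):
// $page = renderWithEtag(function () { echo '<p>Hello</p>'; });
// doConditionalGet($mtime, $page['etag'], 'public,max-age=60');
// echo $page['body'];
```

Note the trade-off: hashing the finished output means the whole page still gets executed on every request, so this variant saves bandwidth but not CPU time. Calling the function at the top of the script with a cheap validator (like a file's mtime) avoids that.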

Usage Example

// get mtime of some file
$mtime = filemtime($some_file);
// set ETag to MD5 hash of filename+mtime
$etag = '"'.md5($some_file.$mtime).'"';
// make cacheable by anyone, fresh for 1 minute
$cache_control = 'public,max-age=60';
doConditionalGet($mtime, $etag, $cache_control);

Please see the script file for more in-depth documentation. Also see the Zenphoto plugin httpCacheControl for a “real-life” example.

Download

do_conditional_get.zip

Changelog
November 30, 2007

  • return mtime in UNIX timestamp instead of RFC1123 format for more flexibility

November 28, 2007

  • rearranged logic of function to improve performance
  • can now set any Cache-Control header with the third parameter
  • automatically set Expires header with max-age value if available

November 27, 2007

  • use GMT instead of local time
  • removed ETag generation to make function more generic
  • comply with the updated RFC2616 #13.3.4 when both IMS and INM headers exist (see discussion)
  • code cleanup and documentation

Credit
Based on the work of

References

Tutorials: Make your PHP pages cacheable

The following tutorials taught me how to make PHP pages cacheable. If you do not manually set your PHP page’s HTTP headers, that page is generally uncacheable by browsers, reverse proxies, HTTP accelerators, etc. That might make sense for highly dynamic pages or personal/sensitive information, but there are plenty of situations where PHP-generated content can and should be cached.

There is a typo in Ned Martin’s guide: “Last-Modified” is hyphenated! (Took me forever to figure out why I couldn’t set that header.)

Use the Cacheability Engine and the Live HTTP Headers extension for Firefox to debug as you go. Remember that shift-clicking reload/refresh forces the browser to fetch a fresh copy of the page.

Apologies for the short and sporadic posting. I’m busy hacking Zenphoto for another project. :)

P.S. Another plug for my host NearlyFreeSpeech.NET; Squid HTTP acceleration on shared hosting for mere pennies rocks! The availability of Squid made me look into HTTP caching for PHP pages.

WordPress internal object cache killed this site’s performance

As you may have noticed, this blog became very sluggish recently. Fortunately, I’ve narrowed down the cause of the lag to the WordPress internal object cache, which is supposed to increase site performance, but in my case it slowed down page load times by more than 10x, sometimes much more.

My blog’s symptoms:

  • Disabling all plugins, reverting to the default theme, and displaying only 1 blog post on the homepage did not really affect my page load time.
  • The load delay was not an issue with PHP/MySQL at all. My page was completely executed and output by PHP in about 0.5-1 second (according to various plugins that showed the script execution time), but it took 10x that to cache the entire page to disk or load 100% of the page in a browser.
  • The page load lag has nothing to do with the browser on the client side. I tried Firefox, IE, Safari, and even external tools like Pingdom Full Page Tester.
  • The browser loaded 95% of the page and for some reason stalled for a long time on the footer section; making the footer empty didn’t help, and profiling the footer itself showed that PHP executed it in 0.06 seconds.

On a whim, I removed the following line that I inserted in wp-config.php to get back to a vanilla WP setup.

define('ENABLE_CACHE', true);

This line enables WP’s built-in object cache and is supposed to improve performance slightly. Removing it solved the problem!

This object cache supposedly caches database queries to disk to reduce CPU and database overhead. This should increase performance in most cases unless your disk I/O performance is very poor, as with Dreamhost’s NFS storage system. This feature seems to be undocumented and had security issues in the past, but is commonly recommended as a safe way of increasing performance. I had heard that the object cache hardly improves performance, but I never imagined that it would slow my site down by 10x or more.

Disabling the object cache brought my page load times back to normal, with all my plugins enabled. I suppose the feature is disabled by default for a reason. If your blog is suffering from performance issues and your plugins or themes are not the cause, try disabling the object cache (if you enabled it). Comment out

define('ENABLE_CACHE', true);

in wp-config.php if needed.

However, some of the performance improvements in the upcoming WordPress 2.4 apparently have to do with the object cache. /me confused. :( I will try to find out more about what exactly the object cache does and what it did to kill my site’s performance. Thanks again, Jeff from NFSN, for helping me troubleshoot my site’s performance issues.

Bad Behavior and Squid’s default caching behavior don’t play nice

I wrote earlier about Bad Behavior sporadically blocking me from my own site. It turns out the problem was Squid’s default behavior of caching error messages for 5 minutes (my host deploys Squid in front of its server clusters for load balancing and other purposes). Thus, if a spambot or other undesirable gets blocked by Bad Behavior, and I or anyone happens to visit the site within 5 minutes, Squid will serve up the 403 access forbidden message.

Bad Behavior’s developer Michael Hampton and Jeff from my host NearlyFreeSpeech were both extremely patient and helpful in solving this problem. They both independently provided me with this very simple solution: add

header("Vary: *");

after line 25 in banned.inc.php in the Bad Behavior plugin.

The “Vary: *” header tells the cache (like Squid) that the content of this particular page changes based on unknown factors. Since the criteria for whether the cache should serve the same version of this page to future requests is unknown, the cache shouldn’t cache the page. Contrast the “Vary: *” header with the “Vary: accept-encoding” header where the cache will serve up the same version of the page to requesters with the same “accept-encoding” value, and get a fresh copy if the value is different.
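To make the difference concrete, here is a simplified sketch of how a cache could use the Vary header to decide whether a stored response may be reused. The function variantKey() is made up for this example and is not how Squid is actually implemented. It only illustrates the rule: “Vary: *” means never reuse, and any other value means reuse only when the listed request headers match.

```php
<?php
// Hypothetical illustration (not Squid's real logic): build a cache key from
// the URL plus the request headers named in Vary. Returns null when the
// response must not be served from cache at all.
function variantKey(string $url, string $varyHeader, array $requestHeaders): ?string
{
    // Header names are case-insensitive, so normalize everything to lowercase.
    $fields = array_filter(
        array_map('trim', explode(',', strtolower($varyHeader))),
        function (string $field): bool { return $field !== ''; }
    );

    // "Vary: *" means the response depends on unknown factors: do not reuse it.
    if (in_array('*', $fields, true)) {
        return null;
    }

    $normalized = array_change_key_case($requestHeaders, CASE_LOWER);
    $parts = [$url];
    foreach ($fields as $field) {
        $parts[] = $field . '=' . ($normalized[$field] ?? '');
    }

    // Two requests with the same key can share the same cached response.
    return implode('|', $parts);
}
```

With “Vary: accept-encoding”, two requests that both send “Accept-Encoding: gzip” get the same key and can share a cached copy, while a request without that header gets a different key and triggers a fresh fetch.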

I am told that, in the ideal world, Bad Behavior should not have to send such a header with its error messages because the official HTTP standards (RFC 2616) state that content should not be cached unless the Cache-Control headers explicitly allow it. However, Squid is “non-compliant” in this particular case as it caches error messages for 5 minutes unless the default negative_ttl setting is changed. Luckily, choosing between being compliant with HTTP standards and being compliant with default Squid settings (which I assume are widely used) is not painful: sending an extra “Vary: *” header doesn’t seem to have any downsides.

Thanks again, Michael and Jeff, for helping me debug this problem and humoring my newbish questions.

Redirect visitors from high-traffic sites like Digg to CoralCDN mirror using .htaccess

Even if you have the most sophisticated caching mechanisms in place (like WP Super Cache), the simplest and most effective way of surviving flash traffic from sites like Digg is to redirect the traffic to a mirror, like the free CoralCDN.

I personally think that the Coral Content Distribution Network is one of the most amazing yet underrated web inventions in recent years. It’s so effective yet simple to use that it’s like magic to end-users. By loading a URL like “http://www.tummblr.com.nyud.net/archivepage/”, that page is fetched and cached on the CoralCDN, which is a peer-to-peer network comprising servers all over the globe. So Digg can send thousands of users to my CoralCDN mirror and use no more of my host’s resources and bandwidth than if a lone visitor hit my site. Read more about Coral on their homepage.
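The URL rewriting is simple enough to sketch in a few lines of PHP. The helper coralizeUrl() below is hypothetical and only shows the idea of appending “.nyud.net” to the hostname. It assumes plain HTTP on the standard port and ignores any port or fragment in the original URL.

```php
<?php
// Hypothetical helper for illustration: turn a normal URL into its CoralCDN
// mirror URL by appending ".nyud.net" to the hostname.
function coralizeUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // not an absolute URL, nothing to rewrite
    }

    $host = $parts['host'];
    // Avoid appending the suffix twice to an already coralized URL.
    if (substr($host, -9) !== '.nyud.net') {
        $host .= '.nyud.net';
    }

    $mirror = 'http://' . $host . ($parts['path'] ?? '/');
    if (isset($parts['query'])) {
        $mirror .= '?' . $parts['query'];
    }

    return $mirror;
}
```

For example, passing in “http://www.example.com/archive/” (a placeholder domain) gives back “http://www.example.com.nyud.net/archive/”.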

Without further ado, here is the .htaccess code that will redirect visitors from high traffic sites like Digg to your CoralCDN mirror.

# Heavy Site Redirect to Coral Cache
# Links incoming from heavy sites are redirected to the Coral Cache
# Exception: Coral Cache Proxy Servers
# Exception: Googlebot crawler
#
# CONFIG: Replace "yourdomain.com" with your target domain name.
# CONFIG: Follow the HTTP_REFERER RewriteCond examples to add or remove
# domains to the list of redirected sites.
#
<ifmodule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^Googlebot
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?digg\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?slashdot\.org [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?slashdot\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?fark\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?somethingawful\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?kuro5hin\.org [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?engadget\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?boingboing\.net [OR]
RewriteCond %{HTTP_REFERER} ^http://([^/]+\.)?del\.icio\.us
RewriteRule ^(.*)?$ http://www.yourdomain.com.nyud.net/$1 [R,L]
</ifmodule>

Credit: Benjamin Yu
Note: Removed “:8080” from “http://yourdomain.com.nyud.net:8080/” since CoralCDN now runs on port 80, the standard HTTP port. Yay! :)
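If you want to check which referrers the HTTP_REFERER patterns above would catch before putting them live, the same regular expression can be tried out in PHP. The function isHeavyReferer() is a made-up helper for this purpose, not part of the .htaccess solution. It mirrors the RewriteCond pattern, including its quirks: matching is case-sensitive, only http:// referrers match, and the pattern is not anchored at the end of the hostname.

```php
<?php
// Hypothetical test helper: checks a referrer against the same pattern the
// RewriteCond lines use, i.e. ^http://([^/]+\.)?domain for each listed domain.
function isHeavyReferer(string $referer, array $domains): bool
{
    foreach ($domains as $domain) {
        // preg_quote() escapes the dots, like "digg\.com" in the .htaccess file.
        $pattern = '#^http://([^/]+\.)?' . preg_quote($domain, '#') . '#';
        if (preg_match($pattern, $referer) === 1) {
            return true;
        }
    }
    return false;
}

// Usage sketch:
// $domains = ['digg.com', 'slashdot.org', 'del.icio.us'];
// var_dump(isHeavyReferer('http://www.digg.com/story/1', $domains));
```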

CSS and JavaScript consolidation: fetching one big file is faster than ten small files

So you’ve enabled caching and gzipped your files, but your site still loads slowly? The culprit might be the large number of separate CSS and JavaScript files that the browser must load when first visiting your site.