nginx: FastCGI Caching Basics

I’m back!

Today, I am going to cover how to set up caching in nginx, with a bit of background and history first.

The LAMP Stack

An older acronym these days, LAMP stands for:

  • (L)inux,
  • (A)pache,
  • (M)ySQL, and
  • (P)HP.

I am not sure exactly when the term was coined, but it was a long time ago, probably about 15 years. The age shows: there are plenty of alternatives to this stack now, giving rise to several other acronyms which I am not going to attempt to catalog. The only technology that has remained constant in this evolution has been the operating system: Linux.

MySQL (and Postgres, for that matter) have lost ground to alternatives as people found that not everything belongs in a relational database. PHP has plenty of alternatives too, be it Node, Ruby, Python, or others, all of which have their own middleware for working with the web.

Apache can act as the gateway to that middleware, but it is not necessarily well suited to the task. That’s not to say it can’t be done. Apache is extremely well featured, a product of it being one of the oldest actively developed HTTP servers currently available, and can definitely act as a gateway for several of the software systems mentioned above. It is still probably the most popular web server on the internet, serving a little over 50% of the web’s content.

Apache’s Dated Performance

However, as far as performance goes, Apache has not been a contender for a while now. More minimal alternatives, such as the subject of this article, nginx, offer fewer features but much better performance. Some numbers put nginx at around twice the speed of some Apache MPMs, or faster, even on current versions. Out of the box, I recently clocked the memory footprint of an nginx and PHP-FPM stack at roughly half that of an Apache and mod_php5 server, a configuration that is still in popular use, mainly due to the issues the PHP project has historically had with threading.

Gateway vs. Middleware

PHP running as a CGI has always had some advantages: from a hosting perspective, it allows hosters to ensure that scripts and software get executed with a segregated set of privileges, usually those of the site’s owner. The big benefit is that any security problems with that particular site don’t leak over to other sites.

Due to the lack of threading, this is where PHP has gotten most of the love. Aside from plain FastCGI, there are a couple of popular, high-performance options for running PHP as middleware:

  • PHP-FPM, which is mainline in PHP
  • HHVM (the HipHop Virtual Machine), Facebook’s next generation PHP JIT VM, which supports both PHP and Facebook’s own Hack derivative.

These, of course, connect to a web server, and when all the web server is doing is serving static content and directing connections, the best course of action is to pick a lightweight server such as nginx.

Dynamic Language for Static Content?

Regardless of the option chosen, one annoying fact may remain: there is a very good chance that the content being served by PHP is effectively static for the vast majority of its lifetime. A great example of this is a CMS, such as WordPress, running a site that sees little to no regular updates. In this case, constantly regenerating the content places unnecessary load on the system, wasting resources and increasing page load times.

The CMS in use may have caching options, which may be useful in varying capacities. Depending on how they run their cache, however, this could still mean unnecessary CPU used to run PHP logic, or database traffic if the cache is stored in the database.

Enter nginx’s Caching

nginx has some very powerful options for serving as a proxy server, and is perfectly capable of running as a layer 7 load balancer, complete with caching. The latter is what I am covering in this article.

nginx has two caching facilities that matter here: the cache options in ngx_http_proxy_module and those in ngx_http_fastcgi_module. Each controls its own area: proxy_cache_* options are used together with proxy_pass and the other proxy options for standard HTTP back ends, and fastcgi_cache_* options are used together with the FastCGI options (generally in locations that hand requests off with fastcgi_pass).
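
To make the split concrete, here is a rough sketch of the two families side by side. The zone names, the cache paths, the /app/ location, and the back end at 127.0.0.1:8080 are placeholders made up for illustration; they are not part of the setup described in the rest of this article.

```nginx
http {
  # Two independent caches, one per module (hypothetical paths and zone names)
  proxy_cache_path   /var/cache/nginx/proxy levels=1:2 keys_zone=proxyzone:10m;
  fastcgi_cache_path /var/cache/nginx/fcgi  levels=1:2 keys_zone=fcgizone:10m;

  server {
    listen 80;

    # proxy_cache_* pairs with proxy_pass (an HTTP back end)
    location /app/ {
      proxy_pass        http://127.0.0.1:8080;
      proxy_cache       proxyzone;
      proxy_cache_valid 200 1m;
    }

    # fastcgi_cache_* pairs with fastcgi_pass (a FastCGI back end such as PHP-FPM)
    location ~ \.php$ {
      include             fastcgi_params;
      fastcgi_pass        unix:/var/run/php5-fpm.sock;
      fastcgi_param       SCRIPT_FILENAME $document_root$fastcgi_script_name;
      fastcgi_cache       fcgizone;
      fastcgi_cache_key   $http_host$request_uri;
      fastcgi_cache_valid 200 1m;
    }
  }
}
```

The two sets of directives mirror each other almost name for name, so most of what follows for fastcgi_cache_* should carry over to proxy_cache_* with the prefix swapped.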

Setting up the Middleware

I am not covering setting up the middleware in this article, but it is very easy to get started with PHP-FPM. Usually, it is as simple as installing the distro’s package (e.g. apt-get install php5-fpm in modern versions of Debian or Ubuntu).

Ubuntu 14.04 sets up PHP-FPM to listen on /var/run/php5-fpm.sock, but it can, of course, be configured to listen on TCP as well.
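
On the nginx side, the only thing that changes between the two is the address given to fastcgi_pass. A small sketch follows; the TCP address 127.0.0.1:9000 is an assumption (a common convention, not something Ubuntu sets up by default), and it only works if the PHP-FPM pool has been reconfigured to listen there.

```nginx
# Inside the PHP location block, use one of the two forms.

# Unix socket (the Ubuntu 14.04 default mentioned above)
fastcgi_pass unix:/var/run/php5-fpm.sock;

# TCP, assuming the PHP-FPM pool was changed to "listen = 127.0.0.1:9000"
# fastcgi_pass 127.0.0.1:9000;
```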

Setting up nginx for FastCGI

Before jumping into the config below, keep in mind that the FastCGI cache needs to be defined in the core nginx http config, like so:

http {
  # Several omitted options here...
  fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=fcgizone:10m max_size=200m;
}

This option dictates several things:

  • The path of the cache, in this case /var/cache/nginx
  • The structure of the path hierarchy. levels=1:2 builds two directory levels from the end of the MD5 hash that is used as the file name: the last character names the first level and the two characters before it name the second, giving a structure like /var/cache/nginx/c/ab/ if the hash ended in abc.
  • keys_zone is the name and the size of the key cache. This is not the actual cache size, but memory used for cache key entries and metadata. One megabyte can hold approximately 8000 keys or cache entries.
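  • max_size sets the upper limit on the on-disk size of the cache. When it is exceeded, nginx’s cache manager removes the least recently used entries.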

Instead of editing nginx.conf directly, this directive can also be placed in its own file under /etc/nginx/conf.d/, which most default setups include from within the http block.
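
As a sketch, such a drop-in file might look like the following. The file name is made up, and the inactive value is an assumption added for illustration (it controls how long an entry may go unrequested before it is removed, and defaults to 10 minutes). This file would replace the definition in nginx.conf rather than sit alongside it, since a zone name can only be defined once.

```nginx
# /etc/nginx/conf.d/fastcgi_cache.conf (hypothetical file name)
#
# Files in conf.d are normally included from inside http { },
# so no surrounding http block is needed here.
#
# keys_zone=fcgizone:10m  -> roughly 80,000 keys at about 8,000 per megabyte
# inactive=60m            -> assumed value; entries untouched for an hour are dropped
# max_size=200m           -> on-disk ceiling, as in the example above
fastcgi_cache_path /var/cache/nginx
                   levels=1:2
                   keys_zone=fcgizone:10m
                   inactive=60m
                   max_size=200m;
```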

The following details a basic server section in nginx. This will live somewhere like /etc/nginx/conf.d/vancluevertech.com.conf, or /etc/nginx/sites-available/vancluevertech.com if the Debian or Ubuntu convention is being followed.

server {
  listen 80;

  root /var/www/vancluevertech.com;
  index index.php index.html;

  server_name _;

  # Logs
  access_log /var/log/nginx/vancluevertech.com.log;
  error_log /var/log/nginx/vancluevertech.com.err;

  location / {
    try_files $uri $uri/ /index.php$is_args$args;
  }

  # FastCGI stuff
  location ~ \.php$ {
    include fastcgi_params;

    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

    fastcgi_cache fcgizone;
    fastcgi_cache_valid 200 1m;
    fastcgi_cache_key $http_host$fastcgi_script_name$request_uri;
    fastcgi_cache_lock on;
    fastcgi_cache_use_stale error timeout invalid_header updating http_500 http_503;
  }

  # deny access to .htaccess files, if Apache's document root
  # concurs with nginx's one
  location ~ /\.ht {
    deny all;
  }
}

A note before I move on to the FastCGI stuff: the location / block is set up to try a few options before 404ing: the direct URI, the URI as a subdirectory (i.e. to see if there is a default file there), and then, as a fallback, /index.php itself with the original query string. This is mainly designed for sites running a CMS that uses permalinks (again, like WordPress).

Now, on to the FastCGI bits in the location ~ \.php$ block (which, if it wasn’t evident, handles all requests for PHP scripts):

  • First off, /etc/nginx/fastcgi_params is included (shorthanded to a relative path). This file sets a number of environment variables that are essential to a functional FastCGI environment, and is included with most bundled versions of nginx.
  • fastcgi_pass is the option that passes the request to – in this case – PHP-FPM.
  • fastcgi_param as it’s shown here sets a parameter on top of those in fastcgi_params (overriding it where the bundled file already defines one), and serves as an example of how to pass a value into the FastCGI environment. Basically I am building the SCRIPT_FILENAME environment variable for FastCGI by combining the document root and the path to the running script (evaluating to something like /var/www/vancluevertech.com/index.php). This is needed by some CMSes.

Now, on to the main attraction:

  • fastcgi_cache references the cache zone that was defined earlier. This turns on the cache, although it is not quite enough on its own: fastcgi_cache_key has no default value, so a key has to be defined as well.
  • fastcgi_cache_valid sets how long responses are considered fresh, in this instance 1 minute for responses with a 200 (OK) code.
  • fastcgi_cache_key builds the cache key. The object is to generate a key that is unique enough that there are no cache conflicts that could lead to broken content. The one listed here gives a key such as vancluevertech.com/index.php/stub, which should be plenty. Of course, keys can be built from a number of things, including any of the many variables that nginx provides.
  • fastcgi_cache_lock ensures that only one request at a time is passed to the back end to populate a new cache entry. Other requests for the same entry wait, by default for up to 5 seconds (controlled with fastcgi_cache_lock_timeout), after which they are passed through to avoid errors. However:
  • fastcgi_cache_use_stale, the last option, has an option named updating that allows a stale entry to be served to any other requests while an entry is being updated. This enables a simple yet effective throttling mechanism for back end resources. In this configuration, approximately one request per URL would come through every minute. There are also other flags here that allow stale entries to be used in the event of several kinds of errors. Depending on how the application is set up, your mileage may vary.
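
To see these options doing their job, it helps to expose the cache status on each response. The following is a sketch of the same PHP location with two changes that are my own assumptions rather than part of the configuration above: a cache key that also includes the scheme and request method, and a response header (the name X-Cache-Status is only a convention) carrying nginx's $upstream_cache_status variable.

```nginx
location ~ \.php$ {
  include fastcgi_params;

  fastcgi_pass unix:/var/run/php5-fpm.sock;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

  fastcgi_cache fcgizone;
  fastcgi_cache_valid 200 1m;

  # Variant key: scheme and method added so that http/https and GET/HEAD
  # responses are stored separately (an assumption about what the site needs)
  fastcgi_cache_key $scheme$request_method$http_host$request_uri;

  fastcgi_cache_lock on;
  # Same as the default, spelled out for visibility
  fastcgi_cache_lock_timeout 5s;
  fastcgi_cache_use_stale error timeout invalid_header updating http_500 http_503;

  # Reports values such as MISS, HIT, EXPIRED, UPDATING, STALE or BYPASS
  add_header X-Cache-Status $upstream_cache_status;
}
```

Requesting the same URL twice within a minute should then show MISS followed by HIT in that header, provided the application does not send headers that prevent caching.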

Lastly, not a caching option, but I block .htaccess files just in case there are any left over in the content since moving from Apache, if that change was made.

The above is really just the tip of the iceberg when it comes to nginx caching. There are several other cache manipulation options allowing finer-grained cache control, such as fastcgi_cache_bypass to bypass the cache (e.g. honouring inbound Cache-Control headers or manually exempting admin areas), or even more sophisticated scenarios such as setting up the cache to be purged or revalidated via special requests. Definitely take a look at the nginx documentation for ngx_http_fastcgi_module if you are interested. Keep in mind that some options require later versions of nginx (the one bundled in Ubuntu 14.04, for example, is only 1.4.6), and cache purging actually requires NGINX Plus, nginx’s commercial server.
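
As an illustration of the bypass idea, here is a minimal sketch that exempts an admin area and honours an inbound Pragma header. The $skip_cache variable name and the WordPress-style paths are assumptions chosen for the example; adjust them to whatever the application actually uses.

```nginx
server {
  # ... listen, root, and the other options from the server block above ...

  # $skip_cache is a made-up variable name; 1 means "do not use the cache"
  set $skip_cache 0;

  # Assumed WordPress-style admin and login paths
  if ($request_uri ~* "^/wp-(admin|login)") {
    set $skip_cache 1;
  }

  location ~ \.php$ {
    # ... fastcgi_pass, fastcgi_cache and friends as before ...

    # Do not answer from the cache when any value is non-empty and not "0"
    fastcgi_cache_bypass $skip_cache $http_pragma;
    # Do not store the response in those cases either
    fastcgi_no_cache     $skip_cache $http_pragma;
  }
}
```

Pairing fastcgi_cache_bypass with fastcgi_no_cache is deliberate: the first only stops nginx from answering out of the cache, while the second stops the fresh response from being saved back into it.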

One last thing about the cache that should be noted: Cache-Control response headers from FastCGI are honoured. This means that if the application in question has an admin area that passes these headers, it is not necessary to set up any exceptions using fastcgi_cache_bypass.
