nginx: More FastCGI Stuff

nginx’s a pretty powerful server, no doubt. But depending on your situation, a simple caching solution as discussed in my FastCGI Caching Basics article might not be enough, or may outright break your setup. You may also want nginx to co-exist with any existing application cache as well.

In this article, I’ll explain a few techniques that can be used to help alter nginx’s caching and proxy behaviour to get the configuration you need. 

Advanced use of fastcgi_param

fastcgi_param is an important part of setting up the environment for proper operation. It can also be used to manipulate the HTTP request as it is being passed to the backend, performing an ad-hoc mangling of the request.

Example: clearing the Accept-Encoding request header

fastcgi_param HTTP_ACCEPT_ENCODING "";

This is useful when an application is configured to send a response deflated, ie: with gzip compression. This is something that should be handled in nginx, not the backend, and caching a gzipped response with a key that may cause it to be sent to a client that does not support compression will obviously cause problems. Of course, this should probably be turned off in the application, but there may be situations where that is not possible. This effectively clears whatever was set here in the first place, possibly by something earlier in the config. 

Ignoring and Excluding Headers

Via fastcgi_ignore_headers and fastcgi_hide_header, one can manipulate the headers passed to the client and even control nginx’s behaviour itself. 

Example: Ignoring Cache-Control

Imagine a scenario where the application you are using has a caching feature that can be used to compliment nginx’s own cache by speeding up updates, further reducing load. However, this feature sends Cache-Control and Expires headers. nginx honours these headers, which may manipulate nginx’s cache in a way you do not want. 

This can be corrected via:

fastcgi_ignore_headers Cache-Control Expires;
fastcgi_hide_header Cache-Control;
fastcgi_hide_header Expires;

This does the following:

  • The fastcgi_ignore_headers line will ensure that nginx will not honor any data in the Cache-Control and Expires header. Hence, if caching is enabled, nginx will cache the page even if there is a Cache-Control: no-cache header. Additionally, Expires from the response is ignored as well.
  • However, with this on, the response will still be passed to the client with the bad headers. That is probably not what is desired, so fastcgi_hide_header is used to ensure that those headers are not passed to the client as well.

Gotcha #1: Ensuring an Admin Area is not Cached

If the above techniques are employed to further optimize the cache, then it is probably a good idea to ensure that the admin area is not cached, explicitly, if Cache-Control was relied on to do that job before.

Below is an example on how to structure that in a config file, employing all of the other items that I have discussed already. Note that static entry of the SCRIPT_FILENAME and SCRIPT_NAME is probably necessary for the admin area, as the dynamic location block for general PHP files will have the caching credentials in it.

# fastcgi for admin area (no cache)
location ~ ^/admin.* {
  include fastcgi_params;

  # Clear the Accept-Encoding header to ensure response is not zipped
  fastcgi_param HTTP_ACCEPT_ENCODING "";

  fastcgi_pass unix:/var/run/php5-fpm.sock;
  fastcgi_index index.php;
  fastcgi_param SCRIPT_FILENAME $document_root/index.php;
  fastcgi_param SCRIPT_NAME /index.php;
}

location / {
  try_files $uri $uri/ /index.php$is_args$args;
}

# Standard fastcgi (caching)
location ~ \.php$ {
  include fastcgi_params;

  # Clear the Accept-Encoding header to ensure response is not zipped
  fastcgi_param HTTP_ACCEPT_ENCODING "";

  fastcgi_ignore_headers Cache-Control Expires;
  fastcgi_hide_header Cache-Control;
  fastcgi_hide_header Expires;

  fastcgi_pass unix:/var/run/php5-fpm.sock;
  fastcgi_index index.php;
  fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;

  fastcgi_cache fcgi_one;
  fastcgi_cache_valid 200 1m;
  fastcgi_cache_key $request_method-$http_host$fastcgi_script_name$request_uri;
  fastcgi_cache_lock on;
  fastcgi_cache_use_stale error timeout invalid_header updating http_500 http_503;
}

This setup will route all requests for /admin to FastCGI with a statically defined page and no cache options. This will ensure that no caching and header manipulation happens, while regular FastCGI requests will be sent to the optimized block with the caching and header bypass features.

Note the fastcgi_cache_key. This is the last thing I want to talk about:

Gotcha #2: Caching and HEAD Requests

nginx’s fastcgi_cache_key option does not have a default value, or at least the docs do not give one. The example may also be insufficient for everyday needs, or at least I thought it was. However, the proxy_cache_key has the following default value:

scheme$proxy_host$uri$is_args$args

This was pretty close to what I needed initially, and I just altered it to $http_host$fastcgi_script_name$request_uri. However, when I put this config into production, we started seeing the occasional blank page, and our monitors were reporting timeouts. It took quite a bit of digging to find out that some of our cache entires were blank, and a little googling later ended up revealing the obvious, which was confirmed in logs: HEAD requests were coming thru when there was no cache data, and nginx was caching this to a key that could not discern between a HEAD and a GET request.

There are a couple of options here: use fastcgi_cache_methods to restrict caching to only GET requests, or add the request method into the cache key. I chose the latter, as HEAD requests in my instance caused just as much load as GET requests, and I would rather not run into a situation where we encounter load issues because the servers are hammered with HEAD requests.

The new cache key is below:

fastcgi_cache_key $request_method-$http_host$fastcgi_script_name$request_uri;

nginx: FastCGI Caching Basics

I’m back!

Today, I am going to share some things regarding how to do caching in nginx, with a bit of a write up and history first.

The LAMP Stack

An older acronym these days, LAMP stands for:

  • (L)inux,
  • (A)pache,
  • (M)ySQL,
    and:
  • (P)HP.

I really am not too sure when the term was coined, but it definitely was a long time, probably about 15 years ago. The age definitely shows: there are definitely plenty of alternatives to this stack now, giving way to several other acronyms which I am not going to attempt to catalog. The only technology that has remained constant in this evolution has been the operating system: Linux.

MySQL (and Postgres, for that matter) have seen less use in favour of alternatives after people found out that not everything is best suited to go into a relational database. PHP has plenty of other alternatives, be it Node, Ruby, Python, or others, all of which have their own middleware to facilitate working with the web.

Apache can serve the aforementioned purpose, but really is not necessarily well-fated for the task. That’s not to say it can’t be. Apache is extremely well featured, a product of it being one of the oldest actively developed HTTP servers currently available, and can definitely act as a gateway for several of the software systems mentioned above. It is still probably the most popular web server on the internet, serving a little over 50% of the web’s content.

Apache’s Dated Performance

However, as far as performance goes, Apache has not been a contender for a while now. More minimal alternatives, such as the subject of this article, nginx, offer fewer features, but much better performance. Some numbers put nginx at around twice the speed – or faster – of some Apache MPMs, even on current versions. Out of the box, I recently clocked the memory footprint of a nginx and PHP-FPM stack at roughly half of the memory footprint of an Apache and mod_php5 server, a configuration that is still in popular use, mainly due to the issues the PHP project has historically had with threading.

Gateway vs. Middleware

PHP running as a CGI has always had some advantages: from a hosting background, it allows hosters to ensure that scripts and software get executed with a segregated set of privileges, usually the owner of the site. The big benefit to this was that any security problems with that in particular site didn’t leak over to other sites.

Due to the lack of threading, this is where PHP has gotten most of the love. Aside from FastCGI, there are a couple of other popular, high-performance options to use for running PHP middleware:

  • PHP-FPM, which is mainline in PHP
  • HipHopVM, Facebook’s next generation PHP JIT VM, that supports both PHP and Facebook’s own Hack derivative.

These of course, connect to a webserver, and when all the webserver is doing now is serving static content and directing connections, the best course of action is to pick a lightweight server, such as nginx.

Dynamic Language for Static Content?

Regardless of the option chosen, one annoying fact may always remain – the fact that there is a very good chance that the content being served by PHP is ultimately static during a very large majority of its lifetime. A great example of this is a CMS system, such as WordPress, running a site that may see little to no regular updates. In this event, the constant re-generation of content will place unnecessary load on a system, wasting system resources and increasing page load times.

The CMS in use may have caching options, which may be useful in varying capacities. Depending on how they run their cache, however, this could still mean unnecessary CPU used to run PHP logic, or database traffic if the cache is stored in the database.

Enter nginx’s Caching

nginx has some very powerful options for serving as a proxy server, and is perfectly capable of running as a layer 7 load balancer, complete with caching. The latter is what I am covering in this article.

nginx has 2 specific caching modules: the cache options stored in ngx_http_proxy_module and ngx_http_fastcgi_module. These control their respective areas: proxy_cache_* options are used in conjunction with standard requests and proxy options, and fastcgi_cache_* options are used with the FastCGI options (locations generally used with fastcgi_pass proxied namespace).

Setting up the Middleware

I am not covering setting up the middleware in this article, but it is very easy to get started with PHP-FPM. Usually, installing it is as easy as installing it through the respective distro (ie: apt-get install php5-fpm in modern versions of Debian or Ubuntu).

Ubuntu 14.04 sets up PHP-FPM to listen on /var/run/php5-fpm.sock, but it can, of course, be configured to listen on TCP as well.

Setting up nginx for FastCGI

Before jumping into the config below, keep in mind that the FastCGI cache needs to be defined in the core nginx http config, like so:

http {
  # Several omitted options here...
  fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=fcgizone:10m max_size=200m;
}

This option dictates several things:

  • The path of the cache, in this case /var/cache/nginx
  • The structure of the path hierarchy. 1:2 constructs a directory structure that takes the last three characters of the MD5 hash computed file name and creates a structure like /var/cache/nginx/c/ab/ if your last 3 characters were abc.
  • keys_zone is the name and the size of the key cache. This is not the actual cache size, but memory used for cache key entries and metadata. One megabyte can hold approximately 8000 keys or cache entries.
  • max_size is what defines the size of the cache.

This can also be included and dropped into /etc/nginx/conf.d/ in most default setups.

The following details a basic server section in nginx. This will lie in somewhere like /etc/nginx/conf.d/vancluevertech.com.conf, or /etc/nginx/sites-available/vancluevertech.com if Debian or Ubuntu convention is being followed.

server {
  listen 80;

  root /var/www/vancluevertech.com;
  index index.php index.html;

  server_name _;

  # Logs
  access_log /var/log/nginx/vancluevertech.com.log;
  error_log /var/log/nginx/vancluevertech.com.err;

  location / {
          try_files $uri $uri/ /index.php$is_args$args;
  }

  # FastCGI stuff
  location ~ \.php$ {
          include fastcgi_params;

          fastcgi_pass unix:/var/run/php5-fpm.sock;
          fastcgi_index index.php;
          fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;

          fastcgi_cache fcgizone;
          fastcgi_cache_valid 200 1m;
          fastcgi_cache_key $http_host$fastcgi_script_name$request_uri;
          fastcgi_cache_lock on;
          fastcgi_cache_use_stale error timeout invalid_header updating http_500 http_503;
  }
  # deny access to .htaccess files, if Apache's document root
  # concurs with nginx's one
  #
  location ~ /\.ht {
          deny all;
  }
}

A note before I move on to the FastCGI stuff: the location block is set up to try a few options before 404ing: the direct URI, the URI as a sub directory (ie: to see if there is a default file here), and then, as a fallback, to request /index.php itself. This is mainly designed for sites that have a CMS system that uses permalinks (again, like WordPress).

Now, on to the FastCGI bits in the location block (which, if it wasn’t evident, passes all PHP content):

  • First off, /etc/nginx/fastcgi_params is included (shorthanded to a relative path). This file sets a number of environment variables that are essential to a functional FastCGI environment, and is included with most bundled versions of nginx.
  • fastcgi_pass is the option that passes the request to – in this case – PHP-FPM.
  • fastcgi_param as it’s shown here overrides an option that was set in fastcgi_params, and serves as an example of how to set environment. Basically I am building the SCRIPT_FILENAME environment variable for FastCGI by combining the document root and the path to the running script (evaluating to something like /var/www/vancluevertech.com/index.php). This is needed by some CMS systems.

Now, on to the main attraction:

  • fastcgi_cache references the cache zone that was defined earlier. This effectively turns on the cache, and is in theory the only option needed.
  • fastcgi_cache_valid sets cache entries for 200 (OK) code responses for 1 minute, in this instance.
  • fastcgi_cache_key is building a cache key. The object is to get a unique enough key generated here so that there are no cache conflicts that could lead to broken content. The one listed here gives one such as vancluevertech.com/index.php/stub, which should be plenty. Of course, cache entries can be built off a number of things, including the many list of variables that nginx has.
  • fastcgi_cache_lock ensures that only one request for a new cache entry is sent at the same time. This has a default timeout of 5 seconds by default (controlled with fastcgi_cache_lock_timeout), after which the request will be passed through to avoid errors. However:
  • fastcgi_cache_use_stale, the last option, has a option named updating that allows a stale entry to be passed to any other requests in the case that there is currently a lock. This enables a simple yet effective throttling mechanism to back end resources. In this configuration, approximately one request per URL would come through every minute. There are also other flags here that allow stale entries to be used in the event of several kinds of errors. Depending on how the application is set up, your mileage may vary.

Lastly, not a caching option, but I block .htaccess files just in case there are any left over in the content since moving from Apache, if that change was made.

This above is really the tip of the iceberg when it comes to nginx caching. There are several other cache manipulation options, allowing for finer grained cache control, such as fastcgi_cache_bypass to bypass the cache (ie: honouring Cache-Control headers inbound or manually exempting admin areas), or even more sophisticated scenarios such as setting up the cache to be purged or revalidated via special requests. Definitely take a look at the documentation mentioned above if you are interested. Keep in mind that some require later versions of nginx (the one bundled in Ubuntu 14.04 for example is only 1.4.6), and cache purging actually requires nginx+, nginx’s premium server.

One last thing about the cache that should be noted: Cache-Control response headers from FastCGI are honoured. This means that if the application in question has an admin area that passes these headers, it is not necessary to set up any exceptions using fastcgi_cache_bypass.