
Saturday, 1 December 2018

NGINX as a SPDY load balancer for Node.js

Background

Recently we wanted to integrate SPDY into our stack at SocialRadar to make requests to our API a bit more speedy (hurr hurr). Particularly for multiple subsequent requests in rapid succession, avoiding that TCP handshake on every request would be quite nice.
Android has supported SPDY in its networking library for a little while and iOS added SPDY support in iOS 8 so we could get some nice performance boosts on our two most used platforms.
Previously, we had clients connecting via normal HTTPS on port 443 to an Elastic Load Balancer which would handle the SSL negotiation and proxy requests into our backend running Node.js over standard HTTP. This was working nicely for us and we didn’t have to handle any SSL certs in our Node.js codebase which was beneficial both for cleanliness and for performance.
However, when we wanted to enable SPDY, we discovered that
  1. AWS Elastic Load Balancers don’t support SPDY
  2. In order for SPDY to work optimally, it would need an end-to-end channel[1]
So, I set out to find alternatives.
Ultimately I settled on using NGINX since it has SPDY proxy support, it's fast, and it's relatively easy to configure. My configuration details follow; hopefully they'll be helpful to others endeavoring to do the same.
For the purposes of this article, we'll assume you have a server, which we'll refer to as the Load Balancer since that will be its primary purpose.

 Step 1: Configuring NGINX

So our first step is to configure NGINX. We want it to act like a load balancer, proxying requests to the Node.js instances in our cluster.
In order to do this, we’ll need a few core things, explained in greater detail below:
  • A recent version of NGINX that supports NPN (Next Protocol Negotiation), which SPDY needs
  • The SSL cert on our NGINX servers so they can handle the incoming data
  • Configuration for acting as a proxy to our Node.js servers

 Grab the latest version of NGINX

First, ensure you have a recent version of NGINX installed on your Load Balancer server. The default version installed via apt-get on the latest version of Ubuntu (Trusty Tahr) is sadly too out of date to support NPN, the special sauce NGINX needs for SPDY. (I believe it is 1.4.x.)
We use Chef to manage our deployments, so we added the following to our Chef recipe to ensure we grabbed and installed the latest:
apt_repository "nginx" do
  uri "http://nginx.org/packages/ubuntu/"
  distribution node['lsb']['codename']
  components ['nginx']
  deb_src false
  key 'http://nginx.org/keys/nginx_signing.key'
  keyserver nil
  action 'add'
end

package "nginx"
As of this writing (October 2014), this installed version 1.6.2 of NGINX.
If you're not using Chef or Ubuntu, I suggest looking on the NGINX website for information on installing the latest stable version.

 Get your SSL certs on your Load Balancer server

SSL certs tend to be rather confusing, and it would be too much for us to cover in-depth for the purposes of this article, so we will assume you have gotten your SSL certs somehow. In our case, we have a wildcard cert for our domain which we used for this load balancer.
What worked for us was getting 2 files: the SSL certificate (file ending in .crt) and the SSL certificate key (file ending in .pem). We had some other files (such as one ending in .key) that we didn't need.[2]
So anyway, get those 2 files (.crt and .pem) onto your server.

 Build your nginx.conf file

Now that you've got NGINX installed on the Load Balancer, we need to configure it.
We used the following configuration for ours (filled in with some dummy data):
error_log /srv/logs/loadbalancer/error.log;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    upstream nodejsservers {
        ip_hash;
        server server1.yourdomain.com:9081;
        server server2.yourdomain.com:9081;
        server server3.yourdomain.com:9081;
        server server4.yourdomain.com:9081;
    }
    server {
        access_log      /srv/logs/loadbalancer/access.log;

        # Only requests to our Host are allowed
        if ($host !~ ^(api\.yourdomain\.com)$ ) {
           return 444;
        }

        listen 443 ssl spdy;

        ssl on;
        ssl_certificate     ssl/yourdomain.com.crt;
        ssl_certificate_key ssl/yourdomain.com.private.pem;

        location / {
            # Let the client know about spdy
            add_header        Alternate-Protocol  443:npn-spdy/2;

            proxy_pass https://nodejsservers;

            # Set headers
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
Below is a breakdown of some of the chunks to help explain what the different sections are doing:
upstream nodejsservers {
    ip_hash;
    server server1.yourdomain.com:9081;
    server server2.yourdomain.com:9081;
    server server3.yourdomain.com:9081;
    server server4.yourdomain.com:9081;
}
Above we specify the list of servers to which our load balancer will route requests. In our case we have 4 different servers. In this example the list is hard-coded, but in our real-world setup we fill it in with Chef, so we can add a server in one place and have it updated here.
These are our Node.js servers; each one runs a SPDY server on port 9081. The port selection was rather arbitrary: high enough in the range that we don't need to run Node.js as root, but otherwise mostly a historical choice internally.
We specify ip_hash because we want to use an ip hash for affinity so a given client will reliably connect to the same terminal server, thereby maximizing the benefit of using SPDY.[3]
# Only requests to our Host are allowed
if ($host !~ ^(api\.yourdomain\.com)$ ) {
    return 444;
}
This adds a tiny bit of security, ensuring that only requests to api.yourdomain.com are actually routed through this load balancer and returning HTTP response code 444 (connection closed without response) otherwise.
listen 443 ssl spdy;

ssl on;
ssl_certificate     ssl/yourdomain.com.crt;
ssl_certificate_key ssl/yourdomain.com.private.pem;
We want our Load Balancer to listen on port 443 for SSL and SPDY traffic which of course it will forward along to our servers. It needs to hold our certs in order to pass on that traffic.
# Let the client know about spdy
add_header        Alternate-Protocol  443:npn-spdy/2;
As the comment suggests, we want to announce that we accept SPDY.
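As an aside, if you want to sanity-check that the load balancer is actually sending this header, a minimal sketch using Node's built-in https module might look like the following (the host name is a placeholder):
// Sketch: confirm the Alternate-Protocol header is being sent by the load balancer.
var https = require("https");

https.get({ host: "api.yourdomain.com", path: "/" }, function (res) {
    // Node lowercases header names
    console.log("Alternate-Protocol:", res.headers["alternate-protocol"]);
    res.resume(); // discard the response body
}).on("error", console.error);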
proxy_pass https://nodejsservers;
A key line, instructing NGINX to proxy requests to the cluster defined above. Note that we called it nodejsservers in our upstream definition; that is what allows us to refer to it here and have NGINX understand it.
# Set headers
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
We want to set some headers before forwarding the request on so that our terminal applications have the client's real IP. Without them, all requests will look as though they are coming from our load balancer.
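On the Node.js side, Express can be told to trust these headers so that req.ip reflects the original client rather than the load balancer. A minimal sketch (the /whoami route is just an illustration, not part of our API):
var express = require("express");
var app = express();

// Trust the X-Forwarded-* headers set by the load balancer so that
// req.ip resolves to the original client address, not the proxy's.
app.set("trust proxy", true);

app.get("/whoami", function (req, res) {
    res.json({
        ip: req.ip,                               // derived from X-Forwarded-For
        realIp: req.headers["x-real-ip"] || null  // header set explicitly in the NGINX config
    });
});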

 Step 2: Configuring our Node.js servers

Now that we have NGINX operating as a proxy to spread load among our Node.js servers, we need to update them to handle SPDY requests.

 Get your SSL certs on your Node.js servers

Same as above, we want the .crt and .pem files. In our case we used the exact same two files on our Node servers as on the load balancer.

 Update Node.js code to accept SPDY

Now that we have the certs, let’s use ‘em!
Below is a simplified version of an app running Express, but the same technique would work for Koa or anything else that builds upon Node's built-in HTTP server, as Express and Koa both do.
var express = require("express");
var spdy = require("spdy");
var fs = require("fs");

var app = module.exports = express();

// routes, middleware, etc. omitted

// Read the same cert and key files used by the NGINX load balancer
spdy.createServer({
    key: fs.readFileSync("yourdomain.com.private.pem"),
    cert: fs.readFileSync("yourdomain.com.crt")
}, app).listen(9081);
We used port 9081 as specified in our NGINX config above. Again, any port works, as long as NGINX is routing the SPDY traffic to a port open on your Node.js servers.
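If you want to confirm requests are actually arriving over SPDY end to end, node-spdy sets a flag on the requests it handles (at least in the versions we used); a small sketch of a logging middleware, registered before your routes, under that assumption:
// Sketch: log whether each request arrived over SPDY.
// Assumes the spdy module sets isSpdy on requests it handles.
app.use(function (req, res, next) {
    console.log(req.method + " " + req.url + " spdy=" + Boolean(req.isSpdy));
    next();
});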

 Epilogue: Load balancing the load balancer

Doing things exactly as described above is awesome for allowing multiple Node.js servers to handle SPDY requests; however, it introduces a single point of failure: the load balancer.
Of course this is not ideal! The AWS Elastic Load Balancer came with some fault tolerance baked in, which we give up by rolling our own. How can we get around this?
Well, in our case, we deployed multiple load balancers like this across multiple availability zones.
Using Route53, we can set up DNS with routing policies such as Latency, Geolocation, Failover, and more. Depending on your application, use the one that fits the job.
For example, if you just have multiple load balancers all in us-east-1, you may want to use simple failover. But if you have multiple data centers, say one in us-east-1 and one in eu-west-1, you can have requests route to the load balancer closest to the appropriate data center with the Latency or Geolocation routing policies. Not only will this speed up performance (so a request from, say, Russia doesn't have to travel around the world to Virginia only to be load balanced back to a server in Europe), it also provides failover, so if one of the load balancers goes down, the other can continue servicing requests until a new one is brought online.
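If you manage DNS programmatically, a latency-based record per load balancer might look roughly like the sketch below using the aws-sdk module for Node.js; the zone ID, set identifier, and IP address are all placeholders, not values from our setup.
// Sketch: upsert a latency-based A record pointing at one regional load balancer.
// All identifiers below are placeholders.
var AWS = require("aws-sdk");
var route53 = new AWS.Route53();

route53.changeResourceRecordSets({
    HostedZoneId: "ZXXXXXXXXXXXXX",
    ChangeBatch: {
        Changes: [{
            Action: "UPSERT",
            ResourceRecordSet: {
                Name: "api.yourdomain.com",
                Type: "A",
                SetIdentifier: "lb-us-east-1",  // unique per load balancer
                Region: "us-east-1",            // makes this a latency-based record
                TTL: 60,
                ResourceRecords: [{ Value: "203.0.113.10" }]
            }
        }]
    }
}, function (err) {
    if (err) console.error(err);
});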
You should follow me on twitter here.
[1]: Why not terminate SPDY at the load balancer and just communicate via plain HTTP to the Node.js servers? Because then we are not eliminating the TCP handshake, just pushing it further into our system: the load balancer would still be doing a TCP handshake with our Node.js servers, so we're just moving the problem down the line. Only keeping the SPDY pipe end-to-end, terminating at the application server itself, truly eliminates that.
[2]: YMMV, honestly I don’t feel comfortable enough with SSL certs to explain the difference, I did some trial and error to get it working. Searching the web for “which freaking SSL files do I need for this” or “I have a .crt, .pem and .key, which file is which and which goes where” didn’t yield much in the way of useful results.
[3]: I have read that ip_hash can be suboptimal in many circumstances because it doesn't evenly spread load, but we need some kind of affinity so that a single client connecting to our API keeps connecting to the same server, or a lot of the benefit of SPDY (avoiding the TCP handshake and the pipelining) goes out the window. Most of the other options for affinity (such as sticky) use some sort of cookie-based mechanism for ensuring a client hits the same server on a subsequent request, but since basically all of our requests are API calls made from mobile devices to our backend, cookie-based affinity didn't make much sense. For browser-based traffic this makes more sense, but our clients don't use cookies for anything.
from http://blog.victorquinn.com/nginx-as-a-spdy-reverse-proxy-passthrough
