Wednesday 16 April 2014

A Node.js-based reverse HTTP proxy

 from http://www.steve.org.uk/Software/node-reverse-proxy/
(https://github.com/nodejitsu/node-http-proxy)

The use of node.js allows for hundreds of active connections at any time, without suffering from high system overhead or load.
The key feature list for node-reverse-proxy:
  • Simple to deploy, configure & understand.
  • Capable of proxying for an arbitrary number of virtual hosts.
  • Capable of performing simple URL rewrites prior to proxying, via regular-expressions.
  • Capable of performing arbitrarily complex rewrites via JavaScript functions.
  • Able to execute hooks for each received request, and prior to returning the remote HTTP server's response to the originating client.

Downloading & Getting Started

Once you have node.js installed you may get the node-reverse-proxy via one of the following links, depending upon how dangerously you wish to live:
Once downloaded you'll find a directory tree like this:
|-- node-reverse-proxy.js
|-- README
|-- examples
    |-- cgi-vhost.js
    |-- filters.js
    |-- one-host-rewrites.js
    |-- one-vhost.js
    |-- two-vhosts.js
    `-- wildcard-hosts.js
Within the download the script node-reverse-proxy.js is the program itself, and it may be launched like so:
$ node ./node-reverse-proxy.js --config ./path/to/config.file.js
For further options please consult the output of "node ./node-reverse-proxy.js --help"

The Configuration File

The configuration file is the core of the reverse-proxy, and it will generally contain three things:
  • The global options, such as the address and port to bind upon.
  • The list of virtual hosts for which requests will be proxied.
    • Requests for virtual hosts which aren't recognised will be ignored, unless you define a "wildcard virtual host", as demonstrated in the filter example.
  • The destination host & port to forward each request to.
Optionally you may also define & use:
  • A list of rewrite rules to apply to incoming requests - based upon regular expression matches against incoming URLs.
  • A list of functions to call for requests matching a particular regular expression.
  • Pre/Post-request filters.
The simplest possible configuration file would define a single virtual host, and give the destination of a "back-end" proxy to route the incoming requests to. That configuration file would look something like this:
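The original listing is missing at this point, so what follows is only a minimal sketch of what such a file might contain. It assumes the per-host 'host' and 'port' keys that appear in the rule examples later in this post. The variable name options and the back-end port 8080 are placeholders, and how the file hands this object to the proxy is not shown in this text; the bundled examples/one-vhost.js is the place to check.

```javascript
// Minimal sketch: one virtual host forwarded to one back-end.
// ASSUMPTION: `options` is a placeholder name, and '8080' is an
// arbitrary back-end port chosen for illustration.
const options = {
    // Key: the virtual host (treated by the proxy as a regular expression).
    'example.com': {
        // Value: the destination that requests for this host are forwarded to.
        'host': 'localhost',
        'port': '8080'
    }
};

console.log(Object.keys(options)); // [ 'example.com' ]
```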
Note that the hosts which are listed in the examples are actually regular expressions, so you can match multiple hosts with ease. This is best demonstrated via:
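The demonstration itself is missing here, so the idea can be sketched roughly as follows. The helper findVhost is hypothetical and is not taken from node-reverse-proxy. Anchoring each pattern and matching case-insensitively are assumptions made for the illustration, as are the two host patterns and ports.

```javascript
// Hypothetical helper: return the first configured virtual-host key
// which, treated as a regular expression, matches the Host: header.
function findVhost(options, hostHeader) {
    for (const pattern of Object.keys(options)) {
        // ASSUMPTION: patterns are anchored and case-insensitive.
        const re = new RegExp('^' + pattern + '$', 'i');
        if (re.test(hostHeader)) {
            return pattern;
        }
    }
    return null; // unrecognised host
}

const options = {
    // Matches both example.com and www.example.com.
    '(www\\.)?example\\.com': { host: 'localhost', port: '8080' },
    // Matches any sub-domain of example.org.
    '.*\\.example\\.org': { host: 'localhost', port: '8081' }
};

console.log(findVhost(options, 'www.example.com')); // (www\.)?example\.com
console.log(findVhost(options, 'hg.example.org'));  // .*\.example\.org
console.log(findVhost(options, 'other.test'));      // null
```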
The next most complex file might have a number of virtual hosts, including some simple rewrites of incoming requests. The simplest possible rewrite would be to redirect each incoming request against a particular host to another server. For example, redirecting all visitors of "http://www.example.com/" to the preferred domain "http://example.com/":
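The listing for this case is also missing. A sketch under stated assumptions: it uses the rules key described in the Rewrite Rules section, relies on the behaviour that a replacement beginning with "http" produces a redirect, and uses $1 to carry the requested path across. The name options, the catch-all pattern '/(.*)' and the back-end port are placeholders; whether 'host' and 'port' are still required for a redirect-only host is not stated in this text.

```javascript
// Sketch: redirect every request for www.example.com to example.com.
// ASSUMPTION: `options`, the '/(.*)' pattern and port '8080' are
// illustrative placeholders, not values from the original post.
const options = {
    'www.example.com': {
        'host': 'localhost',
        'port': '8080',
        rules: {
            // The replacement starts with "http", so a redirect is returned.
            '/(.*)': 'http://example.com/$1'
        }
    }
};

// What the rule does to a sample path, using plain String.replace:
const pattern = Object.keys(options['www.example.com'].rules)[0];
const target = options['www.example.com'].rules[pattern];
console.log('/about'.replace(new RegExp(pattern), target)); // http://example.com/about
```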

Rewrite Rules

Rewrite rules are simple in nature, and are applied to incoming requests prior to any proxy action. There are two outcomes:
  • The incoming request is transformed and passed to the back-end server.
  • The incoming request is transformed and an HTTP redirect is returned to the caller.
Rules are defined inside the virtual-host section of the configuration file, like this:
    'example.com':
    {
        'port': '1020',
        'host': 'localhost',
        rules: {
            '/random': '/cgi-bin/random.cgi',
            '/people/([^/]+)/*$': '/people/$1.html',
        }
    },
The left hand side, or hash-key, is a regular expression to match against the incoming request. The right hand side, or hash value, is the rewrite to apply.
If the rewrite begins with "http" then a redirect is returned, otherwise the (updated) request is handed to the proxy as it would have been if no match were made at all.
  rules: {
   '/gone':  'http://example.com/moved/',  // this is a redirect
   '/ok':    '/good/'                      // this is not - no leading "http" prefix.
  }
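The decision described above can be sketched as a small function. This is an illustration of the stated behaviour, not the proxy's actual code: applyRules is a hypothetical name, and the choices that the first matching rule wins and that patterns are matched unanchored are assumptions.

```javascript
// Hypothetical sketch of rule application; not node-reverse-proxy's code.
function applyRules(rules, url) {
    for (const [pattern, replacement] of Object.entries(rules)) {
        const re = new RegExp(pattern);
        if (!re.test(url)) {
            continue;
        }
        // ASSUMPTION: the first matching rule wins.
        const rewritten = url.replace(re, replacement);
        // A replacement beginning with "http" means "send a redirect".
        if (replacement.startsWith('http')) {
            return { action: 'redirect', location: rewritten };
        }
        return { action: 'proxy', url: rewritten };
    }
    // No rule matched: the request is proxied unchanged.
    return { action: 'proxy', url: url };
}

const rules = {
    '/gone': 'http://example.com/moved/',
    '/people/([^/]+)/*$': '/people/$1.html'
};

console.log(applyRules(rules, '/gone'));          // redirect to http://example.com/moved/
console.log(applyRules(rules, '/people/steve/')); // proxied as /people/steve.html
console.log(applyRules(rules, '/index.html'));    // proxied unchanged
```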

More flexible updates

In addition to the regular-expression based rewrites we've demonstrated above it is also possible to invoke your own javascript routine to handle any incoming request - based upon the URL-path requested.
Consider a hostname "static.example.com", which has some files located beneath /private (corresponding to the public URL http://static.example.com/private/). We can cause a routine to be executed for all accesses to /private by writing our configuration file like this:
    'static.example.com':
    {
        host: 'localhost',
        port: '1008',
        'functions': {
            '/private': (function(orig_host, vhost, req, res) {
                //
                // Your code here.
                //
            })
        }
    },
The routine has access to several arguments:
Variable    Meaning
orig_host   The original Host: header the client submitted.
vhost       The Host: header, as it appears in the configuration file as a regular expression.
req         The node.js request object.
res         The node.js response object.
Using this knowledge we can write a simple IP-based ACL for the /private path like this:
    'static.example.com':
    {
        host: 'localhost',
        port: '1008',
        'functions': {
            '/private': (function(orig_host, vhost, req, res) {
                var remote = req.connection.remoteAddress;

                if ( ( remote != "1.2.3.4" ) &&
                     ( remote != "127.0.0.1" ) )
                {
                    res.writeHead(403);
                    res.write( "Denied access to " + req.url  + " from " + remote );
                    res.end();
                    return true;
                }

                return false;
            }),
        }
    },

This implements simple IP-based restrictions to the given path, and allows for pretty much arbitrary flexibility. This flexibility is demonstrated in my own personal configuration file, which is used for my mercurial repositories - which uses wildcard DNS:
You'll notice that this function returns either "true" or "false". A true result will mean the proxy server will not continue processing the connection - which means it will not proxy the (possibly updated) request. A false return value ensures that the request will continue onwards to be proxied.
In short, consider the return value to mean "this request is over". If it is "true" then no further action is taken, and if it is "false" the request continues onwards and is proxied.
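To see both return values in action, the ACL handler can be exercised outside the proxy with stand-in objects. This is only a sketch: privateAcl is the handler from the example above given a name, while fakeRequest and fakeResponse are hypothetical test doubles that mimic just the few members the handler touches.

```javascript
// The ACL handler from the example, extracted so it can be called directly.
function privateAcl(orig_host, vhost, req, res) {
    var remote = req.connection.remoteAddress;
    if (remote !== '1.2.3.4' && remote !== '127.0.0.1') {
        res.writeHead(403);
        res.write('Denied access to ' + req.url + ' from ' + remote);
        res.end();
        return true;   // the request is over; nothing is proxied
    }
    return false;      // carry on and proxy the request
}

// Hypothetical stand-ins for the node.js request and response objects.
function fakeRequest(ip) {
    return { url: '/private/file.txt', connection: { remoteAddress: ip } };
}
function fakeResponse() {
    return {
        status: null,
        body: '',
        ended: false,
        writeHead: function (code) { this.status = code; },
        write: function (text) { this.body += text; },
        end: function () { this.ended = true; }
    };
}

var denied = fakeResponse();
console.log(privateAcl('static.example.com', 'static.example.com',
                       fakeRequest('10.0.0.9'), denied));   // true
console.log(denied.status);                                 // 403

var allowed = fakeResponse();
console.log(privateAcl('static.example.com', 'static.example.com',
                       fakeRequest('127.0.0.1'), allowed)); // false
console.log(allowed.status);                                // null
```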

Pre & Post-execution Hooks

As of version 0.4 of the node-reverse-proxy it is possible to invoke hooks before a request is processed, and before the response from the back-end proxy is returned to the client.
These "pre" and "post" functions are global to the server, rather than per-vhost, for added flexibility. You can see a simple demonstration in the included example:
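The included example is not reproduced in this text, so the following only illustrates the order in which such hooks would run. Every name here (handle, hooks.pre, hooks.post, fakeBackend) is hypothetical, the real configuration keys and hook signatures may differ, and the flow is shown synchronously for brevity.

```javascript
// Hypothetical illustration of hook ordering; not the proxy's real API.
function handle(request, hooks, backend) {
    // "pre" hook: runs before the request is processed.
    if (hooks.pre) {
        hooks.pre(request);
    }
    const response = backend(request);
    // "post" hook: runs before the back-end's response reaches the client.
    if (hooks.post) {
        hooks.post(request, response);
    }
    return response;
}

const log = [];
const hooks = {
    pre: function (req) { log.push('pre ' + req.url); },
    post: function (req, res) { log.push('post ' + res.status); }
};
// Stand-in for the proxied back-end server.
const fakeBackend = function (req) {
    log.push('proxy ' + req.url);
    return { status: 200 };
};

handle({ url: '/index.html' }, hooks, fakeBackend);
console.log(log); // [ 'pre /index.html', 'proxy /index.html', 'post 200' ]
```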

Configuration In Brief

The main part of the configuration file is a hash. The keys are regular expressions matched against incoming "Host:" headers.
Each hash value is a further hash containing the following known keys:
host & port      Destination to which this host's requests are proxied.
rules            Regular expression rules & rewrites applied to incoming URL requests.
functions        JavaScript functions applied to particular paths, again based upon a regular expression.
pre/postfilters  JavaScript functions executed before each incoming request is processed, and after the remote HTTP proxy server returns a result.