Wednesday 31 May 2017

node-unblocker: a Node.js-based web proxy

Web proxy for evading internet censorship, and general-purpose Node.js library for proxying and rewriting remote webpages
 node-Unblocker was originally a web proxy for evading internet censorship, similar to CGIproxy / PHProxy / Glype but written in node.js. It's since morphed into a general-purpose library for proxying and rewriting remote webpages.
All data is processed and relayed to the client on the fly without unnecessary buffering, making unblocker one of the fastest web proxies available.

The magic part

The script uses "pretty" urls which, besides looking pretty, allow links with relative paths to just work without modification. (E.g. <a href="path/to/file2.html"></a>)
In addition to this, links that are relative to the root (E.g. <a href="/path/to/file2.html"></a>) can be handled without modification by checking the referrer and 307 redirecting them to the proper location in the referring site. (Although the proxy does attempt to rewrite these links to avoid the redirect.)
Cookies are proxied by adjusting their path to include the proxy's URL, and a bit of extra work is done to ensure they remain intact when switching protocols or subdomains.
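The path handling described above can be illustrated with a short sketch. It is not node-unblocker's actual code: the helper names, the /proxy/ prefix, and the localhost:8080 host are assumptions chosen for illustration. The first helper shows why a relative link needs no rewriting under a "pretty" URL; the second shows how a root-relative request could be mapped back into the proxy from the referrer, which is where a 307 redirect would point.

```javascript
// Sketch only: helper names, prefix, and hosts are illustrative assumptions.
var URL = require('url').URL;

// How a browser resolves a link found on a proxied page.
function resolveLikeBrowser(href, proxiedPageUrl) {
    return new URL(href, proxiedPageUrl).href;
}

// For a request that escaped the proxy prefix (for example /path/to/file2.html),
// work out where to redirect it, based on the Referer header.
// Returns null when the referrer is not a proxied page.
function redirectTargetFromReferer(requestPath, referer, prefix) {
    var ref;
    try {
        ref = new URL(referer);
    } catch (e) {
        return null; // missing or malformed Referer
    }
    if (ref.pathname.indexOf(prefix) !== 0) {
        return null;
    }
    var remoteOrigin;
    try {
        // The remote page URL is embedded in the path after the prefix.
        remoteOrigin = new URL(ref.pathname.slice(prefix.length)).origin;
    } catch (e) {
        return null;
    }
    return prefix + remoteOrigin + requestPath;
}

var page = 'http://localhost:8080/proxy/http://example.com/dir/page.html';
console.log(resolveLikeBrowser('path/to/file2.html', page));
console.log(redirectTargetFromReferer('/path/to/file2.html', page, '/proxy/'));
```

The relative link stays inside the proxied site's directory on its own, while the root-relative one has to be recovered from the referrer because the browser drops the /proxy/http://example.com part.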

Limitations

Although the proxy works well for standard login forms and even most AJAX content, OAuth login forms and anything that uses postMessage (Google, Facebook, etc.) are not likely to work out of the box. This is not an insurmountable issue, but it's not one that I expect to have fixed in the near term. Patches are welcome, including both a general-purpose fix to go into the main library, and site-specific fixes to go in the examples folder.

Running the website on your computer

Requires Node.js >= 4.3. Download node-unblocker, cd into the examples/nodeunblocker.com/ directory, and run npm install to set things up. Then run npm start to start the server. It should spawn a new instance for each CPU core you have.
(Note: running node app.js will not work. The server code is in the Gatling package, which the npm start command calls automatically.)

Running the website on heroku/bluemix/modulous/etc

This project should be runnable on a free Heroku instance without modification - just copy the examples/nodeunblocker.com/ folder to a new git repo and push it.

Using unblocker as a library in your software

npm install --save unblocker
Unblocker exports an Express-compatible API, so using it in an Express application is trivial:
var express = require('express');
var Unblocker = require('unblocker');
var app = express();

// this must be one of the first app.use() calls and must not be on a subdirectory to work properly
app.use(new Unblocker({prefix: '/proxy/'}));

app.get('/', function(req, res) {
    //...
});
Usage without Express is similarly easy; see examples/simple/server.js for an example.
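As a rough sketch of what usage without Express might look like: because the exported API is Express-compatible, this assumes the Unblocker instance can be called as a standard (req, res, next) middleware function. The createHandler and startServer helpers and the 404 fallback are hypothetical and are not taken from examples/simple/server.js; treat that file as the authoritative version.

```javascript
// Sketch only: assumes the Unblocker instance is a (req, res, next) function.
var http = require('http');

// Wrap an Express-style middleware in a plain http request handler.
function createHandler(middleware) {
    return function (req, res) {
        middleware(req, res, function (err) {
            // Reached only when the middleware did not handle the request.
            res.writeHead(err ? 500 : 404, { 'Content-Type': 'text/plain' });
            res.end(err ? 'Proxy error' : 'Not found');
        });
    };
}

// Start a plain Node.js server around the middleware.
function startServer(middleware, port) {
    return http.createServer(createHandler(middleware)).listen(port);
}

// Wiring it up (requires `npm install unblocker`):
//   var Unblocker = require('unblocker');
//   startServer(new Unblocker({ prefix: '/proxy/' }), 8080);
```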

Configuration

Unblocker supports the following configuration options; defaults are shown:
{
    prefix: '/proxy/',  // Path that the proxied URLs begin with. '/' is not recommended due to a few edge cases.
    host: null, // Host used in redirects (e.g. `example.com` or `localhost:8080`). Default behavior is to determine this from the request headers.
    requestMiddleware: [], // Array of functions that perform extra processing on client requests before they are sent to the remote server. API is detailed below.
    responseMiddleware: [], // Array of functions that perform extra processing on remote responses before they are sent back to the client. API is detailed below.
    standardMiddleware: true, // Allows you to disable all built-in middleware if you need to perform advanced customization of requests or responses.
    processContentTypes: [ // All built-in middleware that modifies the content of responses limits itself to these content-types.
        'text/html',
        'application/xml+xhtml',
        'application/xhtml+xml',
        'text/css'
    ],
    httpAgent: null, //override agent used to request http response from server. see https://nodejs.org/api/http.html#http_class_http_agent
    httpsAgent: null //override agent used to request https response from server. see https://nodejs.org/api/https.html#https_class_https_agent
}
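For instance, a configuration that overrides only a few options might look like the sketch below. The keep-alive agents are an illustrative choice, not a recommendation from the project; keepAlive is a standard option of Node's http.Agent and https.Agent. The sketch assumes that options left out fall back to the defaults listed above.

```javascript
var http = require('http');
var https = require('https');

// Assumption: omitted options fall back to the defaults listed above.
var config = {
    prefix: '/proxy/',
    // Reuse sockets to remote servers instead of opening one per request.
    httpAgent: new http.Agent({ keepAlive: true }),
    httpsAgent: new https.Agent({ keepAlive: true })
};

// Pass it to the library as in the Express example:
//   app.use(new Unblocker(config));
```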

Custom Middleware

Unblocker "middleware" are small functions that allow you to inspect and modify requests and responses. The majority of Unblocker's internal logic is implemented as middleware, and it's possible to write custom middleware to augment or replace the built-in middleware.
Custom middleware should be a function that accepts a single data argument and runs synchronously.
To process request and response data, create a Transform Stream to perform the processing in chunks and pipe through this stream. (Example below.)
To respond directly to a request, add a function to config.requestMiddleware that handles the clientResponse (a standard http.ServerResponse when used directly, or an Express Response when used with Express). Once a response is sent, no further middleware will be executed for that request. (Example below.)
requestMiddleware
Data example:
{
    url: 'http://example.com/',
    clientRequest: {request},
    clientResponse: {response},
    headers: {
        //...
    },
    stream: {ReadableStream of data for PUT/POST requests, empty stream for other types}
}
requestMiddleware may inspect the headers, url, etc. It can modify headers, pipe PUT/POST data through a transform stream, or respond to the request directly. If you're using express, the request and response objects will have all of the usual express goodies. For example:
function validateRequest(data) {
    if (!data.url.match(/^https?:\/\/en\.wikipedia\.org\//)) {
        data.clientResponse.status(403).send('Wikipedia only.');
    }
}
var config = {
    requestMiddleware: [
        validateRequest
    ]
}
If any piece of middleware sends a response, no further middleware is run.
After all requestMiddleware has run, the request is forwarded to the remote server with the (potentially modified) url/headers/stream/etc.
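Modifying headers follows the same pattern. The sketch below assumes header names in data.headers are lower-cased, as they are on Node's incoming requests; the function names and the header value are made up for illustration.

```javascript
// Hypothetical middleware: overrides the User-Agent sent to the remote server.
function setUserAgent(data) {
    data.headers['user-agent'] = 'my-proxy/1.0';
}

// Hypothetical middleware: drops a header before the request is forwarded.
function stripForwardedFor(data) {
    delete data.headers['x-forwarded-for'];
}

var config = {
    requestMiddleware: [
        setUserAgent,
        stripForwardedFor
    ]
};
```

Because each function only mutates the shared data object and returns nothing, the order of the array is the order in which the changes are applied.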
responseMiddleware
responseMiddleware receives the same data object as the requestMiddleware, but the headers and stream fields are replaced with those of the remote server's response, and several new fields are added for the remote request and response:
Data example:
{
    url: 'http://example.com/',
    clientRequest: {request},
    clientResponse: {response},
    remoteRequest: {request},
    remoteResponse: {response},
    contentType: 'text/html',
    headers: {
        //...
    },
    stream: {ReadableStream of response data}
}
For modifying content, create a new stream and then pipe data.stream to it and replace data.stream with it:
var Transform = require('stream').Transform;

function injectScript(data) {
    if (data.contentType == 'text/html') {

        // https://nodejs.org/api/stream.html#stream_transform
        var myStream = new Transform({
            decodeStrings: false,
            transform: function(chunk, encoding, next) {
                chunk = chunk.toString().replace('</body>', '<script src="/my/script.js"></script></body>');
                this.push(chunk);
                next();
            }
        });

        data.stream = data.stream.pipe(myStream);
    }
}

var config = {
    responseMiddleware: [
        injectScript
    ]
}
See examples/nodeunblocker.com/app.js for another example of adding a bit of middleware. Also, see any of the built-in middleware in the lib/ folder.
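One caveat with replacing text chunk by chunk is that a marker such as </body> can be split across two chunks and then never match. A possible way around that, sketched below, is to hold back the last few characters of each chunk until the next one arrives. This is an illustration, not code from the library; createInjector and injectScriptSafely are hypothetical names, and the sketch assumes chunks are already UTF-8 text.

```javascript
var Transform = require('stream').Transform;

// Returns { feed, end }: feed(text) yields the text that is safe to emit so far,
// end() yields whatever was still being held back.
function createInjector(marker, injected) {
    var carry = '';   // tail of the previous chunk that might start a marker
    var done = false; // inject only once

    return {
        feed: function (text) {
            if (done) {
                return text;
            }
            text = carry + text;
            var index = text.indexOf(marker);
            if (index !== -1) {
                done = true;
                carry = '';
                return text.slice(0, index) + injected + text.slice(index);
            }
            // Hold back enough characters to complete a marker split across chunks.
            var cut = Math.max(0, text.length - (marker.length - 1));
            carry = text.slice(cut);
            return text.slice(0, cut);
        },
        end: function () {
            var rest = carry;
            carry = '';
            return rest;
        }
    };
}

function injectScriptSafely(data) {
    if (data.contentType == 'text/html') {
        var injector = createInjector('</body>', '<script src="/my/script.js"></script>');
        var myStream = new Transform({
            decodeStrings: false,
            transform: function (chunk, encoding, next) {
                var out = injector.feed(chunk.toString());
                if (out) { this.push(out); }
                next();
            },
            flush: function (next) {
                var rest = injector.end();
                if (rest) { this.push(rest); }
                next();
            }
        });
        data.stream = data.stream.pipe(myStream);
    }
}
```

Keeping the matching logic in a plain object separate from the stream makes it easy to check without setting up a pipeline, at the cost of delaying the last few characters of each chunk.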
Built-in Middleware
Most of the internal functionality of the proxy is also implemented as middleware:
  • host: Corrects the host header in outgoing requests
  • referer: Corrects the referer header in outgoing requests
  • cookies: Fixes the Path attribute of set-cookie headers to limit cookies to their "path" on the proxy (e.g. Path=/proxy/http://example.com/). Also injects redirects to copy cookies between protocols and subdomains on a given domain.
  • hsts: Removes Strict-Transport-Security headers because they can leak to other sites and can break the proxy.
  • hpkp: Removes Public-Key-Pins headers because they can leak to other sites and can break the proxy.
  • csp: Removes Content-Security-Policy headers because they can leak to other sites and can break the proxy.
  • redirects: Rewrites urls in 3xx redirects to ensure they go through the proxy
  • decompress: Decompresses Content-Encoding: gzip responses and also tweaks request headers to ask for either gzip-only or no compression at all. (It will attempt to decompress deflate content, but there are some issues, so it does not advertise support for deflate.)
  • charsets: Converts the charset of responses to UTF-8 for safe string processing in node.js. Determines charset from headers or meta tags and rewrites all headers and meta tags in outgoing response.
  • urlPrefixer: Rewrites URLs of links/images/css/etc. to ensure they go through the proxy
  • metaRobots: Injects a ROBOTS: NOINDEX, NOFOLLOW meta tag to prevent search engines from crawling the entire web through the proxy.
  • contentLength: Deletes the content-length header on responses if the body was modified.
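To make the cookies entry more concrete, here is a simplified sketch of the kind of Path rewriting it describes. It is not the library's implementation: rewriteCookiePath is a hypothetical helper, and it ignores the Domain attribute and the cross-protocol redirects mentioned in the list.

```javascript
// Hypothetical helper: scope a Set-Cookie header to the proxied site's path.
// prefix is the proxy prefix (e.g. '/proxy/'), remoteOrigin the proxied site
// (e.g. 'http://example.com').
function rewriteCookiePath(setCookie, prefix, remoteOrigin) {
    var sawPath = false;
    var parts = setCookie.split(';').map(function (part, i) {
        part = part.trim();
        // The first part is the name=value pair; attributes follow it.
        if (i > 0 && /^path=/i.test(part)) {
            sawPath = true;
            return 'Path=' + prefix + remoteOrigin + part.slice('path='.length);
        }
        return part;
    });
    if (!sawPath) {
        // No Path given: confine the cookie to the proxied site's root.
        parts.push('Path=' + prefix + remoteOrigin + '/');
    }
    return parts.join('; ');
}

console.log(rewriteCookiePath('sid=abc; Path=/account; HttpOnly', '/proxy/', 'http://example.com'));
```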
Setting the standardMiddleware configuration option to false disables all built-in middleware, allowing you to selectively enable, configure, and re-order the built-in middleware.
This configuration would mimic the defaults:
var Unblocker = require('unblocker');

var config = {
    prefix: '/proxy/',
    host: null,
    requestMiddleware: [],
    responseMiddleware: [],
    standardMiddleware: false,  // disables all built-in middleware
    processContentTypes: [
        'text/html',
        'application/xml+xhtml',
        'application/xhtml+xml',
        'text/css'
    ]
}

var host = Unblocker.host(config);
var referer = Unblocker.referer(config);
var cookies = Unblocker.cookies(config);
var hsts = Unblocker.hsts(config);
var hpkp = Unblocker.hpkp(config);
var csp = Unblocker.csp(config);
var redirects = Unblocker.redirects(config);
var decompress = Unblocker.decompress(config);
var charsets = Unblocker.charsets(config);
var urlPrefixer = Unblocker.urlPrefixer(config);
var metaRobots = Unblocker.metaRobots(config);
var contentLength = Unblocker.contentLength(config);

config.requestMiddleware = [
    host,
    referer,
    decompress.handleRequest,
    cookies.handleRequest
    // custom requestMiddleware here
];

config.responseMiddleware = [
    hsts,
    hpkp,
    csp,
    redirects,
    decompress.handleResponse,
    charsets,
    urlPrefixer,
    cookies.handleResponse,
    metaRobots,
    // custom responseMiddleware here
    contentLength
];

app.use(new Unblocker(config));

Debugging

Unblocker is fully instrumented with debug. Enable debugging via environment variables:
DEBUG=unblocker:* node mycoolapp.js
There is also a middleware debugger that adds extra debugging middleware before and after each existing middleware function to report on changes. It's included with the default DEBUG activation and may also be selectively enabled:
DEBUG=unblocker:middleware node mycoolapp.js
... or disabled:
DEBUG=*,-unblocker:middleware node mycoolapp.js

Troubleshooting

If you're using Nginx as a reverse proxy, you probably need to disable merge_slashes to avoid endless redirects and/or other issues:
merge_slashes off; 
 
from https://github.com/nfriedly/node-unblocker 
------

One-click deployment of the node-unblocker web proxy on Heroku.com: simple, easy to use, and free

A general-purpose web proxy based on node-unblocker

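This project is a copy of node-unblocker with only the project structure changed, so that it can be deployed directly to Heroku. In the original author's version, all proxy requests go over http; in that situation, a sensitive domain name in the URL causes the GFW to reset the connection. This copy therefore changes the code so that all proxy requests use https.
node-unblocker performs better than PHP-based web proxies and is easy to deploy on Heroku. Heroku offers free accounts with no credit card required, and deployment is simple. If you are worried that using a server provided by someone else could leak your personal data, you can set up your own web proxy with one-click deployment. Running your own proxy is decentralized, and a proxy used by a small group is less likely to be blocked by the GFW.
If you find this project useful, please recommend it to your friends and family so that more people can benefit.
For one-click deployment, click the "Deploy to Heroku" button below. See the WIKI for detailed steps.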
Deploy
Note: this proxy works for most websites but cannot play YouTube videos. To watch YouTube videos, use you2php-heroku.
from https://github.com/gfw-breaker/heroku-node-proxy
------------

Building a free web proxy on Heroku


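Heroku is one of the earliest PaaS providers and has been operating since 2007. The platform is highly flexible, supports many programming languages, and offers free accounts!
In a few simple steps you can set up your own web proxy for free. It works for the vast majority of websites and is adapted for both desktop and mobile.
Live demo: https://web-proxy-6.herokuapp.com
Visiting https://web-proxy-6.herokuapp.com/proxy/https://briteming.blogspot.com takes you to my blog.
To watch a YouTube video, start from https://web-proxy-6.herokuapp.com/proxy/https://nogfw.the-youtube.win/watch.php?v=nU28bvoX0BQ and replace the value after "v=" with the value after "v=" at the end of another YouTube video URL. For example, for https://www.youtube.com/watch?v=WOsX-C1OkKs use WOsX-C1OkKs.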
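The URL pattern above can be expressed as a small helper. This is only a sketch: the function names are hypothetical, and the /proxy/ prefix and the nogfw.the-youtube.win front end are taken from the example links above.

```javascript
var URL = require('url').URL;

// Prefix a target URL with the proxy's address, as in the blog link above.
function toProxiedUrl(proxyOrigin, targetUrl) {
    return proxyOrigin + '/proxy/' + targetUrl;
}

// Build the proxied watch link for any YouTube video URL.
function toProxiedYouTubeUrl(proxyOrigin, youtubeUrl) {
    var videoId = new URL(youtubeUrl).searchParams.get('v');
    if (!videoId) {
        throw new Error('No v= parameter in ' + youtubeUrl);
    }
    return toProxiedUrl(proxyOrigin, 'https://nogfw.the-youtube.win/watch.php?v=' + videoId);
}

console.log(toProxiedYouTubeUrl('https://web-proxy-6.herokuapp.com',
    'https://www.youtube.com/watch?v=WOsX-C1OkKs'));
```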
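Setup tutorial:
1. Register an account: go to the sign-up page https://signup.heroku.com/login (requires circumventing the firewall) and register with a secure email address (such as Gmail).
2. Go to my GitHub project https://github.com/gfw-breaker/heroku-node-proxy and click the "Deploy to Heroku" button to open the deployment wizard.
3. In the "Create New App" wizard, fill in the app name and Region, then click the "Deploy app" button. Note: the Europe region is currently faster.
4. After a successful deployment, the message "Your app was successfully deployed" appears. Click "View" to open the web proxy home page.
If the proxy home page shows "no such app", delete the Dyno and rebuild it following the steps above, specifying a new app name.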
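About the Heroku free plan
1. 500 MB of memory
2. 550 hours of use per month (roughly 22 days)
3. The application goes to sleep after 30 consecutive minutes without any visits
4. The application wakes up automatically when a request arrives; the first visit will be slow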
from https://gfw-breaker.win/heroku%E4%B8%8A%E6%90%AD%E5%BB%BAweb%E4%BB%A3%E7%90%86/
--------------------------------------

https://github.com/gfw-breaker/open-proxy
