Thursday 2 November 2017

Nginx Module for Google

Nginx Module for Google Mirror

Description

ngx_http_google_filter_module is a filter module that makes a Google mirror much easier to deploy.
Regular expressions, URI locations and other complex configuration are already built in.
Because it is a native nginx module, cookies, gstatic sources and redirections are handled efficiently.
Let's see how easy it is to set up a Google mirror.
location / {
  google on;
}
What? Are you kidding me?
Yes, it's just that simple!

Demo site https://g2.wen.lu

Dependency

  1. pcre regular expression support
  2. ngx_http_proxy_module backend proxy support
  3. ngx_http_substitutions_filter_module multiple substitutions support

Installation

Download sources first
#
# download the nginx source (1.7.8 is shown here; pick the current release from the page below)
# @see http://nginx.org/en/download.html
#
wget http://nginx.org/download/nginx-1.7.8.tar.gz

#
# clone ngx_http_google_filter_module
# @see https://github.com/cuber/ngx_http_google_filter_module
#
git clone https://github.com/cuber/ngx_http_google_filter_module

#
# clone ngx_http_substitutions_filter_module
# @see https://github.com/yaoweibin/ngx_http_substitutions_filter_module
#
git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module
Brand new installation
#
# configure nginx with your own options
# replace </path/to/> with your real path
#
./configure \
  <your configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module
Migrate from an existing distribution
#
# get the configuration of the existing nginx
# replace </path/to/> with your real path
#
</path/to/>nginx -V
> nginx version: nginx/ <version>
> built by gcc 4.x.x
> configure arguments: <configuration>

#
# download the same version of nginx source
# @see http://nginx.org/en/download.html
# replace <version> with your nginx version
#
wget http://nginx.org/download/nginx-<version>.tar.gz
  
#
# configure nginx
# replace <configuration> with your nginx configuration
# replace </path/to/> with your real path
#
./configure \
  <configuration> \
  --add-module=</path/to/>ngx_http_google_filter_module \
  --add-module=</path/to/>ngx_http_substitutions_filter_module
#
# if some libraries are missing, install them with your package manager
#   e.g. apt-get, pacman, yum ...
#
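
The migration steps above can be sketched as a small shell helper that pulls the configure arguments out of the `nginx -V` output, so they do not have to be copied by hand. This is a minimal sketch, not part of the module: the function name `nginx_configure_args` and the variables `NGINX_BIN` and `MODULE_DIR` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Print only the configure arguments from `nginx -V` output read on stdin.
# nginx writes its version banner to stderr, hence the 2>&1 in the usage below.
nginx_configure_args() {
  sed -n 's/^configure arguments: //p'
}

# Usage, run from inside the unpacked nginx source directory
# (NGINX_BIN and MODULE_DIR are hypothetical; adjust to your system):
#   NGINX_BIN=/usr/sbin/nginx
#   MODULE_DIR=/usr/local/src
#   args=$("$NGINX_BIN" -V 2>&1 | nginx_configure_args)
#   eval "./configure $args \
#     --add-module=$MODULE_DIR/ngx_http_google_filter_module \
#     --add-module=$MODULE_DIR/ngx_http_substitutions_filter_module"
```

`eval` is used in the usage example because the recorded arguments may contain quoted values with spaces, which would otherwise be split incorrectly. After configuring, the usual `make` and `make install` still have to be run.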

Usage

Basic Configuration
A resolver is required so that nginx can resolve the upstream domain names.
server {
  # ... part of server configuration
  resolver 8.8.8.8;
  location / {
    google on;
  }
  # ...
}
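
Before adding `google on;` it can help to confirm that the running binary was actually built with both modules, since nginx rejects unknown directives. The following is a minimal sketch that inspects the `nginx -V` output; the function name `nginx_has_module` is hypothetical, and the check assumes the modules were added statically with `--add-module` as in the installation steps.

```shell
#!/bin/sh
# Succeed if the `nginx -V` output on stdin lists the given module
# among its --add-module configure arguments.
#   $1: module directory name, e.g. ngx_http_google_filter_module
nginx_has_module() {
  # `--` ends option parsing because the pattern itself starts with dashes.
  grep -q -- "--add-module=[^ ]*$1"
}

# Usage:
#   nginx -V 2>&1 | nginx_has_module ngx_http_google_filter_module \
#     && echo "google filter module is compiled in"
```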
Google Scholar
google_scholar depends on google, so it cannot be used on its own.
Google Scholar has since migrated from http to https and supports ncr, so its tld no longer needs to be specified.
location / {
  google on;
  google_scholar on;
}
Google Language
The default language can be set with google_language; if it is not set, zh-CN is used.
location / {
  google on;
  google_scholar on;
  # set language to German
  google_language de; 
}
Supported languages are listed below.
ar    -> Arabic
bg    -> Bulgarian
ca    -> Catalan
zh-CN -> Chinese (Simplified)
zh-TW -> Chinese (Traditional)
hr    -> Croatian
cs    -> Czech
da    -> Danish
nl    -> Dutch
en    -> English
tl    -> Filipino
fi    -> Finnish
fr    -> French
de    -> German
el    -> Greek
iw    -> Hebrew
hi    -> Hindi
hu    -> Hungarian
id    -> Indonesian
it    -> Italian
ja    -> Japanese
ko    -> Korean
lv    -> Latvian
lt    -> Lithuanian
no    -> Norwegian
fa    -> Persian
pl    -> Polish
pt-BR -> Portuguese (Brazil)
pt-PT -> Portuguese (Portugal)
ro    -> Romanian
ru    -> Russian
sr    -> Serbian
sk    -> Slovak
sl    -> Slovenian
es    -> Spanish
sv    -> Swedish
th    -> Thai
tr    -> Turkish
uk    -> Ukrainian
vi    -> Vietnamese
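
When the language is filled in by a deployment script, a typo can be caught before the configuration is written. The sketch below checks a value against the codes listed above; the function name `is_supported_google_language` is hypothetical, and the exact-case comparison is an assumption, since the list does not say whether the module matches codes case-sensitively.

```shell
#!/bin/sh
# Succeed if $1 is one of the language codes listed above (exact match).
is_supported_google_language() {
  case "$1" in
    ar|bg|ca|zh-CN|zh-TW|hr|cs|da|nl|en|tl|fi|fr|de|el|iw|hi|hu|id|it|ja|ko|lv|lt|no|fa|pl|pt-BR|pt-PT|ro|ru|sr|sk|sl|es|sv|th|tr|uk|vi)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage:
#   is_supported_google_language "$lang" || echo "unsupported language: $lang" >&2
```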
Spider Exclusion
Search engine spiders are not allowed to crawl the Google mirror.
The default robots.txt listed below is already built in.
User-agent: *
Disallow: /
If google_robots_allow is set to on, the robots.txt is replaced with Google's own version.
  #...
  location / {
    google on;
    google_robots_allow on;
  }
  #...
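
To see which robots.txt a mirror is serving, the response body can be checked for the disallow-all rule. This is a minimal sketch: the function name `robots_blocks_all` is hypothetical, and it only recognises the simple `User-agent: *` / `Disallow: /` form shown above rather than implementing a full robots.txt parser.

```shell
#!/bin/sh
# Succeed if the robots.txt body on stdin contains "Disallow: /"
# inside a "User-agent: *" group.
robots_blocks_all() {
  tr -d '\r' | awk '
    { line = tolower($0) }
    # Each User-agent line starts a group; remember whether it is the wildcard.
    line ~ /^user-agent:/ { star = (line ~ /^user-agent:[ \t]*\*[ \t]*$/) }
    star && line ~ /^disallow:[ \t]*\/[ \t]*$/ { found = 1 }
    END { exit !found }'
}

# Usage (replace <your domain> with the mirror's host name):
#   curl -s https://<your domain>/robots.txt | robots_blocks_all \
#     && echo "crawlers are blocked"
```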
Upstreaming
An upstream block helps you avoid the cost of name resolution, reduce the likelihood of Google's robot detection, and proxy through specific servers.
upstream www.google.com {
  server 173.194.38.1:443;
  server 173.194.38.2:443;
  server 173.194.38.3:443;
  server 173.194.38.4:443;
}
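
The addresses in such a block go stale over time, so it can be convenient to generate the block from a list of IPs. The sketch below prints an upstream block in the same shape as the one above; the function name `google_upstream_block` and the output path in the usage example are hypothetical.

```shell
#!/bin/sh
# Print an nginx upstream block for www.google.com,
# with one https (port 443) server entry per IP address argument.
google_upstream_block() {
  printf 'upstream www.google.com {\n'
  for ip in "$@"; do
    printf '  server %s:443;\n' "$ip"
  done
  printf '}\n'
}

# Usage (the output path is an assumption; use a file your nginx.conf includes):
#   google_upstream_block 173.194.38.1 173.194.38.2 > /etc/nginx/conf.d/google_upstream.conf
```

The list could also be fed from a DNS lookup made on the server itself, but the output of such a lookup should be reviewed first, since it may contain entries that are not plain IPv4 addresses.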
Proxy Protocol
By default, the proxy uses https to communicate with backend servers.
Use google_ssl_off to force specific domains to fall back to http.
This is useful if you want to proxy some domains through another gateway that has no ssl certificate.
#
# e.g.
# I want to proxy the domain 'www.google.com' like this
# vps(hk) -> vps(us) -> google
#

#
# configuration of vps(hk)
#
server {
  # ...
  location / {
    google on;
    google_ssl_off "www.google.com";
  }
  # ...
}

upstream www.google.com {
  server < ip of vps(us) >:80;
}

#
# configuration of vps(us)
#
server {
  listen 80;
  server_name www.google.com;
  # ...
  location / {
    proxy_pass https://www.google.com;
  }
  # ...
}
from https://github.com/cuber/ngx_http_google_filter_module
-----------
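Installing the nginx module ngx_http_google_filter_module to build a mirror of google.com

Preface

As we all know, for certain reasons Google cannot be browsed normally from within mainland China. Many people respond by setting up or buying a special service (SS, VPN), but for everyday browsing of Google there is no need to go to that much trouble. This post introduces a more direct method that makes it easy to look things up on Google Scholar and Google.

Prerequisites

  • Nginx
  • An overseas VPS
  • A heart that dares to take on a challenge

Tutorial

First log in over SSH and install Nginx. There are many ways to do this; Junge's LNMP install package and Oneinstack are both up to the job (AMH is strongly recommended). Once installation finishes, run
nginx -V
Copy everything after configure arguments: in the output, from --prefix= all the way to ../ngx_cache_purge-2.3.
Next, change to the parent directory of the Nginx source tree. For example, if your source tree is at /root/oneinstack/src/nginx-1.10.2
then the parent directory is /root/oneinstack/src
cd into it.
The mirror depends on the following Nginx modules, which are downloaded next:
  • pcre regular expressions
  • ngx_http_proxy_module reverse proxy
  • ngx_http_substitutions_filter_module multiple substitutions
Just copy and run:

git clone https://github.com/cuber/ngx_http_google_filter_module
git clone https://github.com/yaoweibin/ngx_http_substitutions_filter_module
wget http://mirrors.linuxeye.com/oneinstack/src/pcre-8.38.tar.gz
tar xzf pcre-8.38.tar.gz
Now recompile Nginx. First enter the Nginx source directory (in my case, cd nginx-1.10.2), then run:
./configure \
  <the configure arguments you just copied> \
  --add-module=../ngx_http_google_filter_module \
  --add-module=../ngx_http_substitutions_filter_module
If this fails, re-read the steps above carefully. If it succeeds, continue with make and install
make
make install
At this point the Nginx modules are in place.
Next comes the virtual host configuration. You will need:
  • A trusted SSL certificate
The configuration is given below. Oneinstack users can create a new conf file directly in the /usr/local/nginx/conf/vhost/ directory: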
server {
  listen 443 ssl http2;
  server_name <your domain>;
  ssl_certificate <absolute path to the SSL certificate CRT file>;
  ssl_certificate_key <absolute path to the SSL private KEY file>;
  ssl_session_timeout 10m;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_prefer_server_ciphers on;
  ssl_ciphers CHACHA20:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-RC4-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:RC4-SHA:!aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5:!DSS:!PKS;
  ssl_session_cache builtin:1000 shared:SSL:10m;
  resolver 8.8.8.8 8.8.4.4 valid=300s;
  resolver_timeout 5s;
  location / {
    google on;
    google_scholar on;
    google_language zh-CN;
  }
}
server {
  listen 80;
  server_name <your domain>;
  rewrite ^(.*)$ https://$host$1 permanent; # redirect http requests to https
}
Finally, run service nginx reload and point your domain's DNS record at your server, and you are done.
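
A syntax error in the new virtual host file would make the reload fail, so it is worth running nginx's built-in configuration test first. The sketch below wraps the two steps; the function name `reload_nginx_if_valid` is hypothetical, and it assumes `nginx` is on the PATH and that the system manages nginx through the `service` command, as in the tutorial.

```shell
#!/bin/sh
# Reload nginx only if the configuration passes the built-in syntax check.
reload_nginx_if_valid() {
  if nginx -t; then
    service nginx reload
  else
    echo "nginx -t failed; not reloading" >&2
    return 1
  fi
}

# Usage (typically as root):
#   reload_nginx_if_valid
```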
-----------------------
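While looking for an approach I found a better one: simply install an Nginx module.

Module Introduction

ngx_http_google_filter_module is a filter module that makes a Google mirror easier to deploy. Regular expressions, URI locations and other complex configuration are built in. As a native nginx module, it handles cookies, gstatic sources and redirections more efficiently.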
