
Tuesday, 19 January 2016

Foreign InfoSec Expert Experiences the Great Firewall of China

Marc Bevand's blog: http://www.zorinaq.com/



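An information security professional (Marc Bevand) visited China for the first time and got first-hand experience of the Great Firewall, finding that it can use machine learning to recognize traffic patterns. He described his experience on his blog. He had preloaded Google Maps, only to find the maps essentially useless because of China's GPS shift problem. To get around the Great Firewall, he first set up an SSH tunnel at his hotel, connecting his laptop to a server in a US datacenter; it worked for the first few minutes, then severe packet loss set in. He then ran a web proxy on the server and reached it through SSH port forwarding, with the same result as the SSH tunnel: packet loss after a few minutes. Next he reached the proxy directly over TLS, which worked for a long time, until he browsed HTTPS sites through the proxy: the Great Firewall then detected the proxy and the packet loss returned. After some experiments he confirmed that the Great Firewall can identify proxy servers by observing packet characteristics leaked through TLS side channels, so he used random padding to change the sizes of the packets sent during requests and handshakes. With padding he could browse censored HTTP and HTTPS sites normally; turning it off brought the packet loss back immediately. The author says the Great Firewall now uses machine learning algorithms to automatically learn, discover, and block VPNs and proxies. On his phone he used ExpressVPN, a VPN service that works normally in China, and found that its root CA certificate uses a 1024-bit RSA key, a key length the Chinese government has the capability to break. He suspects the Chinese government has already done so and is using the broken key to monitor some users. He wonders why China has not blocked ExpressVPN.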
---------
English original:

My Experience With the Great Firewall of China


When I recently visited China for the first time, as an InfoSec professional I was very curious to finally be able to poke at the Great Firewall of China with my own hands to see how it works and how easy it is to evade. In short, I was surprised by:
  • Its high level of sophistication, such as its ability to exploit side-channel leaks in TLS (I have evidence it can detect the "TLS within TLS" characteristic of secure web proxies)
  • How poorly simple Unix computer security tools fared at evading it
  • The fact that 1 of the top 3 commercial VPN providers uses RSA keys so short (1024 bits!) that the Chinese government could factor them

Why evade the GFW?

Most westerners who visit China have a perfectly legitimate reason for evading the GFW: it blocks all Google services. That means no Gmail to access your airline e-ticket, no Hangouts to stay in touch with your family, no Maps to find your hotel, no Drive to access your itinerary document. This was my primary need for evading it.
Before visiting China I prepared myself a bit. On my phone I pinned documents in Drive to access them offline. In Maps I preloaded the locations I was going to visit by zooming in on them to load all the streets and points of interest nearby, since the new offline Google Maps feature did not exist at the time. But Maps turned out to be almost unusable anyway: my GPS position was always offset by hundreds of meters from its true location due to the China GPS shift problem. (Google could fix it by using WGS-84 coordinates for their Chinese maps; why have they not done it already?)

Idea 1

So I arrived at my hotel in Beijing, tried to load google.com, and it errored out due to TCP RSTs sent by the GFW to block the connection. My first idea was to set up an SSH SOCKS tunnel (ssh -D) from my laptop to a server colocated in a datacenter in the USA, and I configured Chrome to use it:
$ chrome --proxy-server=socks://127.0.0.1:1080
$ ssh -D 1080 my-server
This worked fine for a few minutes. Then severe packet loss, around 70-80%, started occurring. Restarting the tunnel fixed it for a few minutes, but the packet loss eventually returned, affecting all traffic to my server regardless of type: SSH connections and even simple pings. It is not clear why the GFW drops packets. Some say it is to intentionally disrupt VPNs without outright blocking them. Or perhaps the GFW selectively redirects some suspicious packets to a subsystem for deeper inspection, and this subsystem is overloaded and unable to cope with all the traffic.
Whatever the reason is, this packet loss made the SOCKS tunnel too slow and unreliable to be usable.

Idea 2

I tried a slightly different approach: running a web proxy (polipo) on my server listening on 127.0.0.1:$port and using SSH port redirection (ssh -L) to access it:
$ google-chrome --proxy-server=127.0.0.1:1234
$ ssh -L 1234:127.0.0.1:$port my-server
Again, this worked fine for a few minutes, but the packet loss returned. The GFW is clearly able to detect and interfere with SSH carrying bulk traffic.

Idea 3

Instead of SSH, why not access the proxy over a TLS connection? This should make it harder for the GFW to detect it since the traffic patterns of a user accessing a proxy over TLS are close to the traffic patterns of a user accessing an HTTPS site.
Making a web proxy available over TLS gives what we call a secure web proxy, which is uncommon enough that most browsers do not support it. So I used stunnel to wrap the proxy connection in TLS and to expose an unencrypted proxy endpoint locally on my laptop.
Of course I had to protect the setup with authentication. But I could not use standard proxy authentication, because if the GFW actively connected to it, the "407 Proxy Authentication Required" error would expose it. And I did not want to use TLS client authentication, because that might raise a small red flag that this could be some sort of TLS-based VPN. Again, I needed to make my secure web proxy endpoint look and act like a regular HTTPS endpoint as much as possible.
So I wrote a small relay script in Python which listens on $port_a and forwards all connections to another endpoint $host_b:$port_b. The relay can run in 2 modes. In "client mode" (on my laptop) it inserts a 128-bit secret key as the first 16 bytes sent through the connection. In "server mode" (on my server) it verifies this key and only forwards the connection if the key is valid; otherwise it silently discards the data, which makes it look like an unresponsive web server.
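The post does not publish the relay script. A minimal sketch of the same idea with Python's asyncio follows; everything in it (the function names, the hard-coded SECRET_KEY, the 64 KiB read size) is a hypothetical choice for illustration, not the author's code. In client mode it sends the 16-byte key before any proxy traffic; in server mode it reads exactly 16 bytes, compares them with hmac.compare_digest to avoid a timing side channel, and on a mismatch reads and discards data without ever answering.

```python
import asyncio
import hmac

# Hypothetical 128-bit shared secret; a real deployment would load it from a file.
SECRET_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")


async def pipe(reader, writer):
    """Copy bytes one way until EOF, then half-close the destination."""
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    if writer.can_write_eof():
        writer.write_eof()


async def handle(reader, writer, mode, host_b, port_b):
    if mode == "server":
        try:
            key = await reader.readexactly(len(SECRET_KEY))
        except asyncio.IncompleteReadError:
            writer.close()
            return
        if not hmac.compare_digest(key, SECRET_KEY):
            # Wrong key: swallow everything and never reply, like a stuck web server.
            while await reader.read(65536):
                pass
            writer.close()
            return
    up_reader, up_writer = await asyncio.open_connection(host_b, port_b)
    if mode == "client":
        up_writer.write(SECRET_KEY)  # the key precedes all proxy traffic
    try:
        await asyncio.gather(pipe(reader, up_writer), pipe(up_reader, writer))
    finally:
        up_writer.close()
        writer.close()


async def relay(mode, port_a, host_b, port_b):
    """Listen on 127.0.0.1:port_a and forward each connection to host_b:port_b."""
    server = await asyncio.start_server(
        lambda r, w: handle(r, w, mode, host_b, port_b), "127.0.0.1", port_a)
    async with server:
        await server.serve_forever()

# Laptop: asyncio.run(relay("client", 5000, "127.0.0.1", 5001))
# Server: asyncio.run(relay("server", 5003, "127.0.0.1", 5004))
```

Each direction is copied by its own pipe() coroutine, and end-of-stream is passed on with write_eof() so that one side finishing its request does not cut off the other side's response.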
The setup looked like this on my laptop:
  • Browser configured to use proxy on 127.0.0.1:5000
  • Relay listens on 127.0.0.1:5000, inserts the key, and forwards to 127.0.0.1:5001
  • stunnel client listens on 127.0.0.1:5001, wraps the connection in TLS, and forwards to my-server:5002
And on the server:
  • stunnel server listens on my-server:5002, unwraps the connection, and forwards to 127.0.0.1:5003
  • Relay listens on 127.0.0.1:5003, verifies the key (removes it), and forwards to 127.0.0.1:5004
  • Web proxy listens on 127.0.0.1:5004
Result? This worked well! No packet loss, no problems whatsoever.
What does the GFW see on the wire when browsing an HTTP site through the proxy? A packet capture of "curl --head http://www.google.com" shows this on my system (size of TLS records shown in parentheses):
  1. C: TCP SYN to proxy
  2. S: TCP SYN+ACK reply from proxy
  3. C: TCP ACK
  4. C: ClientHello (86 bytes)
  5. S: ServerHello, Certificate, ServerHelloDone (67+858+9 bytes)
  6. C: ClientKeyExchange, ChangeCipherSpec, encrypted Finished (267+6+53 bytes)
  7. S: NewSessionTicket, ChangeCipherSpec, encrypted Finished (207+6+53 bytes)
  8. C: encrypted ApplicationData #1 (37+197 bytes)
  9. S: encrypted ApplicationData #2 (37+693 bytes)
(Side note: ApplicationData records are split in 2 records, the first one of 37 bytes, because of the 1/n-1 record splitting workaround for BEAST.)
There is a TCP handshake, a TLS handshake, an encrypted ApplicationData record of about 200 bytes sent by the client (the HTTP request), and an encrypted ApplicationData record of about 700 bytes sent by the server (the HTTP response). In fact, this TLS exchange and traffic pattern is similar to that of a non-proxied HTTPS connection, which is why the GFW fails to detect it as an evasion technique.
Unfortunately, as soon as I started browsing HTTPS sites through my proxy, the GFW detected it and hit it with high packet loss. How could that be?

Idea 4

When browsing an HTTPS site through a secure proxy there are 2 layers of TLS: the outer TLS connection to the proxy and the inner TLS connection to the site. I theorized that the GFW is able to guess that the encrypted ApplicationData records hide a proxy CONNECT request and another TLS handshake. Here is what a packet capture looks like for "curl --head https://www.google.com" through the proxy:
  1. C: TCP SYN to proxy
  2. S: TCP SYN+ACK reply from proxy
  3. C: TCP ACK
  4. C: ClientHello (86 bytes)
  5. S: ServerHello, Certificate, ServerHelloDone (67+858+9 bytes)
  6. C: ClientKeyExchange, ChangeCipherSpec, encrypted Finished (267+6+53 bytes)
  7. S: NewSessionTicket, ChangeCipherSpec, encrypted Finished (207+6+53 bytes)
  8. C: encrypted ApplicationData #1 (37+197 bytes)
  9. S: encrypted ApplicationData #2 (37+69 bytes)
  10. C: encrypted ApplicationData #3 (37+325 bytes)
  11. S: encrypted ApplicationData #4 (37+3557 bytes)
  12. C: encrypted ApplicationData #5 (37+165 bytes)
  13. S: encrypted ApplicationData #6 (37+85 bytes)
  14. C: encrypted ApplicationData #7 (37+149 bytes)
  15. S: encrypted ApplicationData #8 (37+853 bytes)
To the GFW, these 8 ApplicationData records could look like 4 pairs of HTTP requests and responses in a keep-alive connection. However, as research has shown [5] [6], side-channel leaks in TLS can be exploited, for example by looking at packet sizes. Doing so, we can see that they indeed match the expected sizes of the messages exchanged during a CONNECT request and a TLS handshake:
  1. C: encrypted ApplicationData #1 (37+197 bytes):
    "CONNECT www.google.com:443 HTTP/1.1\r\nHost:... \r\nUser-Agent:... \r\n\r\n" which is typically 200-300 bytes
  2. S: encrypted ApplicationData #2 (37+69 bytes):
    A 35-byte "HTTP/1.1 200 Tunnel established\r\n\r\n" proxy response. But with 1/n-1 record splitting, a 20-byte SHA-1 MAC per record (my stunnel was using the AES128-SHA cipher suite), padding to align with a 16-byte AES block, and 5 bytes of TLS record header, this translates exactly to a 37-byte and a 69-byte record
  3. C: encrypted ApplicationData #3 (37+325 bytes):
    ClientHello which is typically 200-300 bytes if it advertises dozens of cipher suites (you may notice the ClientHello in the outer TLS connection is only 86 bytes but that is because my stunnel instances were configured to only allow 1 cipher suite)
  4. S: encrypted ApplicationData #4 (37+3557 bytes):
    ServerHello, Certificate, optional ServerKeyExchange, ServerHelloDone, which are typically 1000-4000 bytes combined (space mostly used by the certificate and optional certificate chains)
  5. C: encrypted ApplicationData #5 (37+165 bytes):
    ClientKeyExchange, ChangeCipherSpec, encrypted Finished, which are typically 200-300 bytes combined
  6. S: encrypted ApplicationData #6 (37+85 bytes):
    optional NewSessionTicket, ChangeCipherSpec, encrypted Finished, which are typically 100-300 bytes combined
  7. C: encrypted ApplicationData #7 (37+149 bytes):
    HTTP request
  8. S: encrypted ApplicationData #8 (37+853 bytes):
    HTTP response
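The 37- and 69-byte figures can be reproduced with a little arithmetic. This is a sketch that assumes TLS 1.0 (where CBC records carry no explicit IV, which is consistent with 1/n-1 splitting being active) and the AES128-SHA suite mentioned above: each record is a 5-byte header plus the plaintext, a 20-byte HMAC-SHA1 tag and CBC padding (at least the 1-byte padding-length field), rounded up to whole 16-byte blocks. The function names are illustrative.

```python
import math

HEADER = 5   # TLS record header bytes
MAC = 20     # HMAC-SHA1 tag of the AES128-SHA suite
BLOCK = 16   # AES block size


def cbc_record_len(plaintext_len):
    """Wire size of one TLS 1.0 AES-CBC record (no explicit IV)."""
    # Padding always includes at least the 1-byte padding_length field,
    # then everything is rounded up to full AES blocks.
    return HEADER + BLOCK * math.ceil((plaintext_len + MAC + 1) / BLOCK)


def split_sizes(message_len):
    """Record sizes after 1/n-1 splitting: a 1-byte record, then the rest."""
    return cbc_record_len(1), cbc_record_len(message_len - 1)


def plaintext_range(record_len):
    """Plaintext lengths an observer can infer from one record's wire size."""
    ciphertext = record_len - HEADER
    return ciphertext - MAC - BLOCK, ciphertext - MAC - 1


reply = b"HTTP/1.1 200 Tunnel established\r\n\r\n"
print(len(reply), split_sizes(len(reply)))  # 35 (37, 69)
print(plaintext_range(197))                 # (156, 171)
```

Run in reverse, the same arithmetic tells a passive observer that a 197-byte record carries between 156 and 171 bytes of plaintext: encryption hides content, but record sizes still reveal message sizes to within one AES block.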
Specifically, if ApplicationData #2 is very short (it is extremely rare to see an HTTP reply shorter than "HTTP/1.1 200 Tunnel established"), and if ApplicationData #4 is around 1-4kB (certificates + certificate chain), and if ApplicationData #6 is less than 300 bytes (HTTP responses this small are less rare but still uncommon), then the probability of that exchange hiding a CONNECT request and TLS handshake is high.
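The rule of thumb above can be written as a tiny classifier. This only illustrates the author's reasoning, not the GFW's actual logic: the thresholds (at most 100 bytes for #2, 1000-4000 bytes for #4, under 300 bytes for #6) are assumptions chosen to match the description, sizes are TLS record sizes including the 5-byte header, and the 37-byte 1/n-1 split records are assumed to have been dropped already.

```python
def looks_like_tls_in_tls(records):
    """records: ApplicationData records as (direction, size) pairs, direction "C"
    (client to proxy) or "S" (proxy to client), 1/n-1 split records removed.
    Returns True when the first six records fit a CONNECT + inner TLS handshake."""
    if len(records) < 6:
        return False
    if [d for d, _ in records[:6]] != ["C", "S"] * 3:
        return False
    sizes = [s for _, s in records[:6]]
    return (sizes[1] <= 100               # #2: tiny "200 Tunnel established" reply
            and 1000 <= sizes[3] <= 4000  # #4: ServerHello + certificate chain
            and sizes[5] < 300)           # #6: NewSessionTicket/CCS/Finished


https_via_proxy = [("C", 197), ("S", 69), ("C", 325), ("S", 3557),
                   ("C", 165), ("S", 85), ("C", 149), ("S", 853)]
http_via_proxy = [("C", 197), ("S", 693)]
print(looks_like_tls_in_tls(https_via_proxy))  # True
print(looks_like_tls_in_tls(http_via_proxy))   # False
```

Once every short record is inflated to 1000-1500 bytes, the #2 and #6 conditions can no longer hold, so a rule like this stops firing.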
To verify my theory that the GFW exploits these side-channel leaks, I modified the relay script to pad each relayed data block smaller than 1500 bytes to a random length between 1000 and 1500 bytes:
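from random import randint

def pad_length(len_pkt):  # padding bytes to add to a relayed block of len_pkt bytes
    if len_pkt < 1000:  # short blocks grow to a random 1000-1500 bytes
        return randint(1000 - len_pkt, 1500 - len_pkt)
    if len_pkt <= 1500:  # 1000-1500 byte blocks grow to at most 1500 bytes
        return randint(0, 1500 - len_pkt)
    return 0  # larger blocks stay unpadded (randint would raise on them)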
Result? This worked very well! With random padding I was able to browse normally censored HTTP and HTTPS sites for multiple hours without slowdown, without packet loss caused by the GFW.
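The padding snippet only decides how many bytes to add; the relay on the other end also has to strip them again, and the post does not say how the author's relay does that. One simple option, shown here as a hypothetical sketch, is to prefix every block with its payload length and padding length inside the TLS tunnel and fill the padding with random bytes. The 4-byte header layout and the 65535-byte block limit are assumptions of this sketch.

```python
import os
import struct
from random import randint

# Hypothetical frame header: 2-byte payload length, 2-byte padding length.
HDR = struct.Struct("!HH")


def frame(payload):
    """Wrap one block so payload plus padding of short blocks totals 1000-1500 bytes."""
    n = len(payload)
    if n > 0xFFFF:
        raise ValueError("split blocks larger than 65535 bytes before framing")
    if n < 1000:
        pad = randint(1000 - n, 1500 - n)
    elif n <= 1500:
        pad = randint(0, 1500 - n)
    else:
        pad = 0
    return HDR.pack(n, pad) + payload + os.urandom(pad)


def unframe(buf):
    """Extract complete payloads from buf; return (payloads, unconsumed bytes)."""
    payloads = []
    while len(buf) >= HDR.size:
        n, pad = HDR.unpack_from(buf)
        end = HDR.size + n + pad
        if len(buf) < end:
            break  # partial frame: wait for more bytes from the socket
        payloads.append(buf[HDR.size:HDR.size + n])
        buf = buf[end:]
    return payloads, buf
```

Because the header travels inside stunnel's TLS layer like everything else, an observer only sees the padded record sizes.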
It was fascinating to test how reliably enabling or disabling the random padding changed the outcome. I would disable it and the packet loss would return within minutes. I would re-enable it and I could browse for hours. I would disable it again, and the loss would reappear instantly.
I learned through this experience that the GFW is unmistakably able to exploit side-channel leaks in TLS, such as packet sizes, in order to detect the "TLS within TLS" characteristic of secure web proxies. This really surprised me; I had no idea the GFW had reached this level of sophistication.
The next day, the packet loss returned. But if I simply used a different port number for the proxy, everything would continue to work fine for another day or so. I think this time the GFW was not blocking me based on side-channel leaks, but based on network metrics: all of my server's traffic crossing the Chinese border went to or came from my single public IP in China, so the GFW probably learned that my TCP endpoint was likely used as a private VPN, as opposed to being a public HTTPS site accessed by many client IPs.
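The author's guess about network metrics can be illustrated with a toy aggregate. This is a hypothetical sketch, not a known GFW mechanism: flows are (client IP, server IP, server port, bytes) tuples observed at the border, the addresses are documentation examples, and the 10 MB threshold is arbitrary. It also hints at why switching ports helped for a while: a new port is a new endpoint that starts with no accumulated traffic.

```python
from collections import defaultdict


def single_client_endpoints(flows, min_bytes=10_000_000):
    """Return (server_ip, server_port) endpoints whose cross-border traffic all
    comes from one client IP and adds up to at least min_bytes."""
    clients = defaultdict(set)
    volume = defaultdict(int)
    for client_ip, server_ip, server_port, nbytes in flows:
        endpoint = (server_ip, server_port)
        clients[endpoint].add(client_ip)
        volume[endpoint] += nbytes
    return sorted(ep for ep in clients
                  if len(clients[ep]) == 1 and volume[ep] >= min_bytes)


flows = [
    ("203.0.113.5", "198.51.100.7", 443, 12_000_000),  # one client, lots of data
    ("203.0.113.5", "198.51.100.9", 443, 3_000_000),   # site used by several clients
    ("203.0.113.6", "198.51.100.9", 443, 4_000_000),
]
print(single_client_endpoints(flows))  # [('198.51.100.7', 443)]
```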

GFW uses machine learning

None of the information above is new to those familiar with the GFW. It was only after I reached this point in my tests that I did some deeper reading and learned that the GFW uses machine learning algorithms to learn, discover, and block VPNs and proxies.
It all makes sense now: the GFW engineers do not even have to define explicit rules like the ones I described above (if ApplicationData #2 is short, if ApplicationData #4 is around 1-4kB, etc.). They train their models using various VPN and proxy setups, and the algorithms learn the characteristics of those connections to identify them automatically.
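To make the idea concrete: a classifier of this kind typically consumes a fixed-length feature vector per connection. The GFW's real features are not described in the post, so the encoding below is purely hypothetical: it signs each ApplicationData record size by direction, drops the 37-byte 1/n-1 split records, and pads or truncates to n entries. A model trained on many labeled vectors like these could learn rules similar to the hand-written ones above without anyone spelling them out.

```python
def size_features(records, n=8, split_len=37):
    """Encode (direction, size) records as a signed, fixed-length vector.
    Client-to-server sizes are positive, server-to-client sizes negative.
    Records of exactly split_len bytes (1/n-1 split records) are dropped."""
    vec = [s if d == "C" else -s for d, s in records if s != split_len]
    return (vec + [0] * n)[:n]


# The HTTPS-through-proxy capture above, each record preceded by its 37-byte split.
capture = []
for d, s in [("C", 197), ("S", 69), ("C", 325), ("S", 3557),
             ("C", 165), ("S", 85), ("C", 149), ("S", 853)]:
    capture += [(d, 37), (d, s)]
print(size_features(capture))  # [197, -69, 325, -3557, 165, -85, 149, -853]
```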

ExpressVPN

My proxy setup and custom relay script injecting random padding ran on my laptop, which I could use at the hotel, and it worked very well. But I also needed a solution for my phone when out on the streets.
I used the commercial service ExpressVPN, which seems to be 1 of the top 3 VPN services used to evade the GFW. It is simple and easy to configure: I installed their Android app and I was up and running in no time. ExpressVPN built their service on OpenVPN and have dozens of VPN servers located in many countries.
However, I was not pleased when I saw that their OpenVPN root CA certificate's RSA key size is only 1024 bits! Why, why, why? The Chinese government is one of the archetypal "state-level adversaries" that crypto is supposed to protect us from. This ExpressVPN weakness has been reported and noted multiple times [1] [2].
It is believed that $10 million of specialized hardware can factor 1024-bit RSA keys [3] [4]. There is a high computing cost per key, but if I were China and could factor at least a few RSA keys, surely the root CA key of 1 of the top 3 VPN providers in the country would be one of my targets. Doing so would give them the ability to actively man-in-the-middle ExpressVPN connections and decrypt the traffic. It is possible that China is already doing so and spying on some (all?) ExpressVPN users.
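To put "high computing cost per key" in perspective, here is the standard heuristic running-time estimate for the general number field sieve, L_n[1/3, (64/9)^(1/3)], expressed in bits. Dropping the o(1) term makes the absolute numbers rough (NIST SP 800-57 rates 1024-bit RSA at about 80 bits of security and 2048-bit RSA at 112 bits), but the gap between key sizes is what matters here.

```python
import math


def gnfs_work_bits(modulus_bits):
    """log2 of the heuristic GNFS running time exp(c * (ln n)^(1/3) * (ln ln n)^(2/3)),
    with c = (64/9)^(1/3) and the o(1) term ignored."""
    ln_n = modulus_bits * math.log(2)          # ln n for an n of this many bits
    c = (64 / 9) ** (1 / 3)
    ln_work = c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)
    return ln_work / math.log(2)               # convert from nats to bits


for bits in (1024, 2048, 4096):
    print(bits, round(gnfs_work_bits(bits)))
```

The estimate comes out at roughly 87 bits for a 1024-bit modulus and 117 bits for a 2048-bit one, so moving to 2048 bits multiplies the factoring effort by something on the order of 2^30.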
Below is the current ExpressVPN root CA certificate with a 1024-bit RSA key, extracted from the OpenVPN configuration files they distribute to users. Its serial number is 14845239355711109861 (0xce04e28a62cf3ae5) and it is valid from Jul 19 09:36:31 2009 GMT to Jul 17 09:36:31 2019 GMT:
-----BEGIN CERTIFICATE-----
MIIDeDCCAuGgAwIBAgIJAM4E4opizzrlMA0GCSqGSIb3DQEBBQUAMIGFMQswCQYD
VQQGEwJVUzELMAkGA1UECBMCQ0ExFTATBgNVBAcTDFNhbkZyYW5jaXNjbzEVMBMG
A1UEChMMRm9ydC1GdW5zdG9uMRgwFgYDVQQDEw9Gb3J0LUZ1bnN0b24gQ0ExITAf
BgkqhkiG9w0BCQEWEm1lQG15aG9zdC5teWRvbWFpbjAeFw0wOTA3MTkwOTM2MzFa
Fw0xOTA3MTcwOTM2MzFaMIGFMQswCQYDVQQGEwJVUzELMAkGA1UECBMCQ0ExFTAT
BgNVBAcTDFNhbkZyYW5jaXNjbzEVMBMGA1UEChMMRm9ydC1GdW5zdG9uMRgwFgYD
VQQDEw9Gb3J0LUZ1bnN0b24gQ0ExITAfBgkqhkiG9w0BCQEWEm1lQG15aG9zdC5t
eWRvbWFpbjCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAyN2QZ9DRRyGsM2/4
lrf/2/6MQ7RQkD34HeNm73/PiyCg8KM5pmZONfZvlKYPjn5GQVb7AdkgxGCkTtRa
KGflBwWlPVS716jD+G92McGXjrjVCNdqOADMZdGG69nryX15IAqOqsfeR4vouEra
UoW9zTibd0rKO6cGbKcfkjoICzkCAwEAAaOB7TCB6jAdBgNVHQ4EFgQU0I63Uy/Y
ejRdgNARuAef2r07VDEwgboGA1UdIwSBsjCBr4AU0I63Uy/YejRdgNARuAef2r07
VDGhgYukgYgwgYUxCzAJBgNVBAYTAlVTMQswCQYDVQQIEwJDQTEVMBMGA1UEBxMM
U2FuRnJhbmNpc2NvMRUwEwYDVQQKEwxGb3J0LUZ1bnN0b24xGDAWBgNVBAMTD0Zv
cnQtRnVuc3RvbiBDQTEhMB8GCSqGSIb3DQEJARYSbWVAbXlob3N0Lm15ZG9tYWlu
ggkAzgTiimLPOuUwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQUFAAOBgQBTRzCa
WuEimYpjcTSCp8NawUGWetPCeibdOfDinpcIGrmjorxC5RETSAVhQD0i4CaHP7Fu
vQmBYAIqgSByLAIz+oSj0Vw820pNwA3EGQB8aT/L6QCSuA5NqG6NZS0No8HlICzZ
KGa+SZvptdmGjhnD1czi+21knEg17ZtktvcQ0w==
-----END CERTIFICATE-----
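Anyone can confirm the key size from the PEM text above; openssl x509 -noout -text (with the certificate file passed via -in) prints it. For a dependency-free check, here is a minimal sketch that walks just enough of the DER encoding (Certificate, tbsCertificate, subjectPublicKeyInfo, RSAPublicKey) to read the serial number and the modulus length. It assumes a well-formed X.509 v3 certificate with an RSA key and performs no validation at all; the function names are hypothetical.

```python
import base64


def _tlv(der, pos):
    """Parse one DER tag-length-value at pos; return (tag, value_start, value_end)."""
    tag, length = der[pos], der[pos + 1]
    pos += 2
    if length & 0x80:                      # long form: next N bytes hold the length
        n = length & 0x7F
        length = int.from_bytes(der[pos:pos + n], "big")
        pos += n
    return tag, pos, pos + length


def _children(der, start, end):
    """List the TLVs directly contained in der[start:end]."""
    out = []
    while start < end:
        tlv = _tlv(der, start)
        out.append(tlv)
        start = tlv[2]
    return out


def rsa_cert_info(pem):
    """Return (serial, modulus_bits) for a PEM X.509 certificate with an RSA key."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    der = base64.b64decode("".join(l for l in lines if l and not l.startswith("-----")))
    _, s, e = _tlv(der, 0)                 # Certificate SEQUENCE
    _, s, e = _children(der, s, e)[0]      # tbsCertificate SEQUENCE
    fields = _children(der, s, e)
    if fields[0][0] == 0xA0:               # skip the optional [0] version field
        fields = fields[1:]
    _, s, e = fields[0]                    # serialNumber INTEGER
    serial = int.from_bytes(der[s:e], "big")
    _, s, e = fields[5]                    # subjectPublicKeyInfo SEQUENCE
    _, s, e = _children(der, s, e)[1]      # BIT STRING holding the key
    _, s, e = _tlv(der, s + 1)             # skip unused-bits byte -> RSAPublicKey
    _, s, e = _children(der, s, e)[0]      # modulus INTEGER
    return serial, int.from_bytes(der[s:e], "big").bit_length()

# serial, bits = rsa_cert_info(open("ca.crt").read())  # "ca.crt" is a placeholder path
```

Applied to the certificate above, it should report serial 0xce04e28a62cf3ae5 and a 1024-bit modulus, matching the figures quoted in the text.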
Also, I am confused by the fact that the Chinese government allows this well-known VPN provider (and others) to operate freely in the country. They could very easily deploy low-tech ways to block access to the ExpressVPN service, for example by filtering or redirecting the DNS records of their VPN hosts, which is something they do to block certain website hosts. But they do not do it to block ExpressVPN, why? One possible explanation could be that the Chinese government did factor the ExpressVPN root CA key and does spy on the network traffic of their users, but they prefer to not interfere with ExpressVPN in order to give their users a false sense of privacy. If China blocked the service, users would migrate to other more secure VPN services, and China would lose a SIGINT ability.
Many countries other than China have internet censorship capabilities that rival or surpass the capabilities of the GFW. I would be curious to poke at them too.
[Edit: I am well aware of some open source VPN tools that work quite well in China: ShadowVPN / ShadowSocks (whose developer was recently pressured by Chinese authorities to empty the GitHub repository), Obfsproxy (wiki), Softether, etc. My goal was to find out by trial and error the minimum number of tricks needed to evade the GFW. And I found that a secure web proxy with packet size randomization (idea 4) worked perfectly well to evade it.]

from http://blog.zorinaq.com/?e=81 
------------------
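[Update 2016-01-22: I contacted ExpressVPN about their weak RSA key. They replied that they agreed the issue was important, that it had been on their list for some time, and that they had now decided to raise its priority and address it next month. I have also heard that Astrill, another VPN provider commonly used in China, also uses weak keys.]
[Update 2016-01-23: Astrill also uses OpenVPN. They define 2 root CAs (CN=ASCA and CN=ASCA2). The second one is 2048-bit, but the first one is only 1024-bit. This means that an active man-in-the-middle attacker who maliciously impersonates an OpenVPN server authenticated by the CN=ASCA certificate could intercept and decrypt all Astrill VPN traffic. This certificate's serial number is 10853689667623641679 (0x96a00d3f5508e24f) and it is valid from Oct 6 16:58:51 2010 GMT to Oct 3 16:58:51 2020 GMT: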
-----BEGIN CERTIFICATE-----
MIIDDTCCAnagAwIBAgIJAJagDT9VCOJPMA0GCSqGSIb3DQEBBQUAMGMxCzAJBgNVBAYTAi4uMQswCQYDVQQIEwIuLjELMAkGA1UEBxMCLi4xCzAJBgNVBAoTAi4uMQswCQYDVQQLEwIuLjENMAsGA1UEAxMEQVNDQTERMA8GCSqGSIb3DQEJARYCLi4wHhcNMTAxMDA2MTY1ODUxWhcNMjAxMDAzMTY1ODUxWjBjMQswCQYDVQQGEwIuLjELMAkGA1UECBMCLi4xCzAJBgNVBAcTAi4uMQswCQYDVQQKEwIuLjELMAkGA1UECxMCLi4xDTALBgNVBAMTBEFTQ0ExETAPBgkqhkiG9w0BCQEWAi4uMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDH+Q9xZyUp0eI8dFilbISDQtACxkoxtFk8xS8dmYafI8kjvdcn6ow7Joey8n2G87dVgTOKhCGfVE8UNnJLze7TxifWk0ycEztzBjy0T7MsO8DuSz8NscQXIrSlXRNfCnhWECqFK0/ZhwJ1tZdDPedEXbqokbKnHCZVZa7lk0orbwIDAQABo4HIMIHFMB0GA1UdDgQWBBSCRD2bPGLS7EAqz+xZLyndXMa1nDCBlQYDVR0jBIGNMIGKgBSCRD2bPGLS7EAqz+xZLyndXMa1nKFnpGUwYzELMAkGA1UEBhMCLi4xCzAJBgNVBAgTAi4uMQswCQYDVQQHEwIuLjELMAkGA1UEChMCLi4xCzAJBgNVBAsTAi4uMQ0wCwYDVQQDEwRBU0NBMREwDwYJKoZIhvcNAQkBFgIuLoIJAJagDT9VCOJPMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADgYEAhPkzNaNtiIK9EFfkfohiRiF82MoXChzj5E0XRV6j+CJoFN36zRVTZuvWwphMo0C+Dnq4G01IJ8fdX71UlbhCTmZQy3snIV4WbA82DueluQ0QQwFJ251tU/dXQaQm7ZDd3waBI8ot1eyKePiAye8E8H72FQE3diFQWYPHrBq7unM=
-----END CERTIFICATE-----
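I contacted Astrill's technical support and am waiting to see what they say.]
[Update 2016-01-25: Astrill's Chief Security Officer personally emailed me to thank me for the report, saying that the 1024-bit certificate (ASCA) had been removed from their PKI that day and that all clients now have to use the 2048-bit certificate. Woohoo!]
[Update 2016-01-26: ExpressVPN and Astrill have posted official announcements.]
[Update 2016-02-15: ExpressVPN told me they have completed the upgrade: their CA key went from 1024 bits to 4096 bits. Yay!]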