Sunday, 17 February 2013

UC Berkeley study: An assessment of the GFW's censorship load

http://ignum.dl.sourceforge.net/project/gfwpaper/pearce_kantola_widmer_censorship_volume_cs294_79.pdf

http://sourceforge.net/projects/gfwpaper/files/pearce_kantola_widmer_censorship_volume_cs294_79.pdf/download?use_mirror=ignum&r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fgfwpaper%2Ffiles%2F%3Fsource%3Dnavbar&use_mirror=ignum

http://sourceforge.net/projects/gfwpaper/files/?source=navbar
--------------------------------------
Inferring the GFW's Censorship Volume Through Data Leakage (partial translation)
Authors: Paul Pearce, David Kantola and Krsna Widmer

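Abstract:
The Great Firewall of China (GFW) has been recognized as one of the most sophisticated state-level Internet censorship systems in the world [1]. Much research has gone into identifying its mechanisms and triggers and measuring its effectiveness [1-5], but none has attempted to estimate the volume of censorship the GFW performs or to determine how that load is distributed. In this paper we make this estimate by analyzing information leaked by the GFW from thousands of censored connections across hundreds of IP addresses. Our analysis shows that more than 49 million forged SYN-ACK packets may be generated per hour across an international route into China.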

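1. Introduction:
The Great Firewall of China (GFW) is a collection of sophisticated, state-sponsored censorship mechanisms intended to prevent Chinese citizens from accessing specific content [1,4,6,7]. These mechanisms are deployed at various locations throughout the country [6] and use a variety of techniques to censor Internet traffic flowing into and out of China [6,7,1,4].

The tools in the GFW's "arsenal" include IP blacklisting, DNS hijacking, deep packet inspection (DPI), TCP reset (RST) injection, and TCP session hijacking through forged SYN+ACK packets [4,7,8]. In this paper we focus on censorship based on TCP packet injection. Interestingly, while carrying out its censorship the GFW also leaks some of its internal operating state. Specifically, the injected RST+ACK and SYN+ACK packets contain fields with observable structure beyond the meaning of the fields themselves.

This work has two goals. First, we want to enumerate the information leaked by these injected packets. Second, using the kinds of information leaked, we begin to estimate the volume of connections censored by the GFW.

To achieve these goals, we conduct a measurement study using software that triggers the censorship system from IP addresses both inside and outside mainland China. As part of this study we collect day-long traces from several IP addresses, as well as hour-long traces from 508 different neighboring IP addresses. We then analyze the collected network data in detail to determine what information has been leaked.

From the information leaked by the GFW we show diurnal patterns in censorship volume and provide a preliminary estimate of the actual volume. We also learn some properties of what is presumed to be the GFW's internal load-balancing structure. Preliminary results show that the number of censored connections flowing through one international gateway may have been as high as 49,000,000 connections per hour.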

..... (middle sections omitted)

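7. Conclusion:
Using the information that the GFW leaks as part of its censorship mechanisms, we are able to learn a great deal about the GFW's own behavior and how it operates. From this leaked information we can observe the GFW's daily load patterns and embedded counters. Using basic counting combined with multiple vantage points, linear fitting, and entropy approximation, we have begun the work needed to calculate the number of connections censored by the GFW. This work marks a first step in a larger effort to estimate censorship at the national level.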
------------------------------------------

Abstract
The Great Firewall of China (GFW) has been recognized as one of the most sophisticated state-level Internet censorship systems in the world [1]. There has been much work on identifying its mechanisms and triggers and on measuring its effectiveness [1, 2, 3, 4, 5], but no work attempting to estimate the volume of censorship performed by the GFW or to identify how that load is distributed. In this work we begin to formulate an estimate of the volume of connections censored by the GFW. We perform this estimation through the analysis of information leaked by the GFW from thousands of censored connections across hundreds of IP addresses. Our analysis shows that over 49,000,000 forged SYN-ACK packets may be generated per hour across an international route into China.

1 Introduction
The Great Firewall of China (GFW) is the name given to a collection of sophisticated state-sponsored censorship mechanisms deployed throughout China for the purpose of preventing Chinese citizens from accessing specific content [1, 4, 6, 7]. These mechanisms are deployed at various locations throughout the country [6], and use a variety of techniques to censor Internet traffic flowing within and through China [6, 7, 1, 4].

Among the arsenal of tools used by the GFW are IP blacklisting, DNS hijacking, deep packet inspection (DPI), TCP Reset (RST) injection, and TCP session hijacking through forged SYN+ACKs [4, 7, 8]. In this work we focus on censorship based on TCP packet injection. Interestingly, the GFW leaks some amount of internal state as part of its censorship activities. Specifically, the injected RST+ACK and SYN+ACK packets contain fields with observable structure beyond the meaning of the fields themselves.

The goal of this work is two-fold. First, we wish to enumerate the information leaked by these injected packets. Second, leveraging the kind of information leaked, we want to begin to estimate the volume of connections censored by the GFW.

To achieve these goals we perform a measurement study using software designed to trigger censorship from IP addresses based both within and outside of China. As part of this measurement study we collect day-long traces from several IP addresses, as well as hour-long traces from 508 different neighboring IP addresses. We then perform detailed analysis on the network traces collected to quantify what information has been leaked.

Using the information leaked from the GFW, we show diurnal patterns in the volume of censorship and provide preliminary estimates of the actual volume itself. We also learn some properties of what is presumed to be the internal load-balancing structure of the GFW. Our preliminary results show that the number of censored connections flowing through one international gateway may have been as high as 49,000,000 connections an hour.

....

7 Conclusion
Utilizing information leaked by the Great Firewall of China (GFW) as part of its censorship mechanisms, we are able to learn a great deal about the behavior of the GFW itself. From this leaked information we are able to observe diurnal load patterns and embedded counters. Using basic counting combined with multiple vantage points, linear fitting, and entropy approximations, we have begun the work necessary to calculate the number of connections censored by the GFW. This work marks the first steps in a larger effort to estimate censorship at a country-wide scale.
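
To make the "linear fitting" idea in the conclusion more concrete, here is a minimal sketch of how a rate could be estimated from an observed counter. It is an illustration only, not the authors' method, which sits in the omitted sections. The sample format (timestamp in seconds, counter value), the 16-bit counter width, and the names unwrap and fit_rate are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed counter width for illustration only; not taken from the paper.
COUNTER_MODULUS = 2 ** 16


def unwrap(values: List[int], modulus: int = COUNTER_MODULUS) -> List[int]:
    """Undo wraparound, assuming the counter advances by less than
    `modulus` between consecutive samples."""
    if not values:
        return []
    out = [values[0]]
    offset = 0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:  # the counter wrapped past the modulus
            offset += modulus
        out.append(cur + offset)
    return out


def fit_rate(samples: List[Tuple[float, int]]) -> float:
    """Ordinary least-squares slope of counter value against time,
    i.e. estimated counter increments per second."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    times = [t for t, _ in samples]
    counts = unwrap([c for _, c in samples])
    n = len(samples)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    return cov / var_t


# Hypothetical samples: the counter wraps between t=1 and t=2.
samples = [(0.0, 65000), (1.0, 65500), (2.0, 464), (3.0, 964)]
rate_per_second = fit_rate(samples)
rate_per_hour = rate_per_second * 3600
```

Multiplying the fitted slope by 3600 converts it to a per-hour figure, the unit the paper reports. Whether one counter step corresponds to one censored connection is a separate question that this sketch does not address.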

References
[1] John-Paul Verkamp and Minaxi Gupta. Inferring Mechanics of Web Censorship Around the World. In Free and Open Communications on the Internet, Bellevue, WA, USA, 2012. USENIX Association.
[2] J. R. Crandall, D. Zinn, M. Byrd, E. Barr, and R. East. ConceptDoppler: a weather tracker for Internet censorship. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS), pages 352–365, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-703-2. doi: 10.1145/1315245.1315290. URL http://www.csd.uoc.gr/~hy558/papers/conceptdoppler.pdf.
[3] A. Sfakianakis, E. Athanasopoulos, and S. Ioannidis. CensMon: A web censorship monitor. 2011. URL http://www.usenix.org/event/foci11/tech/final_files/Sfakianakis.pdf.
[4] J. C. Park and J. R. Crandall. Empirical study of a national-scale distributed intrusion detection system: Backbone-level filtering of HTML responses in China. In Proceedings of the 30th IEEE International Conference on Distributed Computing Systems (ICDCS), pages 315–326. IEEE, 2010. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.191.206&rep=rep1&type=pdf.
[5] Antonio M. Espinoza and Jedidiah R. Crandall. Automated Named Entity Extraction for Tracking Censorship of Current Events. In USENIX Workshop on Free and Open Communications on the Internet, San Francisco, CA, USA, 2011. USENIX Association.
[6] X. Xu, Z. Mao, and J. Halderman. Internet censorship in China: Where does the filtering occur? In Neil Spring and George Riley, editors, Passive and Active Measurement, volume 6579 of Lecture Notes in Computer Science, pages 133–142. Springer Berlin / Heidelberg, 2011. ISBN 978-3-642-19259-3. URL http://www.cs.umich.edu/~zmao/Papers/china-censorship-pam11.pdf.
[7] Introduction to project west-chamber. URL https://scholarzhang.googlecode.com/svn/trunk/west-chamber/README.html.
[8] Graham Lowe, Patrick Winters, and Michael L. Marcus. The Great DNS Wall of China. Technical report, New York University, 2007.
[9] Intrusion defense system of evaluation and problem. URL http://www.chinagfw.org/2009/09/gfw_21.html.
[10] V. I. Levenshtein. On bounds for packings in n-dimensional Euclidean space. In Soviet Math. Dokl., volume 20, pages 417–421, 1979.
[11] Anne Chao and Tsung-Jen Shen. Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10:429–443, 2003. ISSN 1352-8505. doi: 10.1023/A:1026096204727. URL http://dx.doi.org/10.1023/A%3A1026096204727.
[12] Jean Hausser and Korbinian Strimmer. Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. J. Mach. Learn. Res., 10:1469–1484, December 2009. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1577069.1755833.
[13] Ken Bogart and Cliff Stein. Discrete Math in Computer Science, pages 246–247.

Original paper
GFW Paper, http://sourceforge.net/projects/gfwpaper/files/?source=navbar
http://ftp2.jp.debian.org/pub/sourceforge/g/gf/gfwpaper/pearce_kantola_widmer_censorship_volume_cs294_79.pdf

A few additional papers:

1 Alberto Dainotti, et al., Analysis of Country-wide Internet Outages Caused by Censorship, IMC’11, November 2–4, 2011, Berlin, Germany. http://conferences.sigcomm.org/imc/2011/docs/p1.pdf

2 Richard Clayton, Steven J. Murdoch, and Robert N. M. Watson, Ignoring the Great Firewall of China, http://www.cl.cam.ac.uk/~rnc1/ignoring.pdf

3 Xueyang Xu, Z. Morley Mao, and J. Alex Halderman, Internet Censorship in China: Where Does the Filtering Occur? PAM'11: Proceedings of the 12th International Conference on Passive and Active Measurement, pages 133-142, Springer-Verlag Berlin, Heidelberg, ©2011, http://web.eecs.umich.edu/~zmao/Papers/china-censorship-pam11.pdf