Abstract
The authors are collecting
data on the methods, scope, and depth of selective barriers to Internet
access through Chinese networks. Tests from May 2002 through November
2002 indicate at least four distinct and independently operable methods
of Internet filtering, with a documentable leap in filtering sophistication
beginning in September 2002. The authors document thousands of sites rendered
inaccessible using the most common and longstanding filtering practice.
These sites were found through connections to the Internet by telephone
dial-up link and through proxy servers in China. Once so connected, the
authors attempted to access approximately two hundred thousand web sites.
The authors tracked 19,032 web sites that were inaccessible from China
on multiple occasions while remaining accessible from the United States.
Such sites contained information about news, politics, health, commerce,
and entertainment. See highlights of blocked
pages. The authors conclude (1) that the Chinese government maintains
an active interest in preventing users from viewing certain web content,
both sexually explicit and non-sexually explicit; (2) that it has managed
to configure overlapping nationwide systems to effectively -- if
at times irregularly -- block such content from users who do not
regularly seek to circumvent such blocking; and (3) that such blocking
systems are becoming more refined even as they are likely more labor-
and technology-intensive to maintain than cruder predecessors.
As with most filtering regimes, whether implemented at the client, ISP, or government level, no list is made available of the sites blocked or of the methodologies used to block them. Further, while the government-connected Internet Society of China (not a chapter of the international Internet Society) has asked Internet service providers and content creators to sign a pledge including self-filtering, few official statements document the existence of government-maintained web filtering, much less the criteria employed and thresholds necessary to elicit a block. We therefore sought to investigate the growing set of methods by which Internet filtering is accomplished, and to collect and distribute a list of blocked sites and pages -- a list that is large in absolute terms even if small relative to the size of the Internet and to the total amount of blocked content, and a list that is diverse even if not perfectly representative of all blocked content. Such a list allows us and others to begin to assess the nature and scope of filtering within China, with particular attention to non-sexually explicit web sites rendered inaccessible there.
Having requested some 204,012 distinct web sites, we found more than 50,000 to be inaccessible from at least one point in China on at least one occasion. Adopting a more conservative standard for determining which inaccessible sites were intentionally blocked and which were unreachable solely due to temporary glitches, we find that 18,931 sites were inaccessible from at least two distinct proxy servers within China on at least two distinct days. We conclude that China does indeed block a range of web content beyond that which is sexually explicit. For example, we found blocking of thousands of sites offering information about news, health, education, and entertainment, as well as some 3,284 sites from Taiwan. A look at the list beyond sexually explicit content yields insight into the particular areas the Chinese government appears to find most sensitive.
This report is intended as a milepost, part of an ongoing empirical investigation documenting filtering levels and methods over time. As we continue to collect data on the evolving accessibility of a diversified "basket" of web sites, we will seek to say more about overall trends in Chinese web filtering. We will further examine whether such trends are credibly linked to government statements of Internet policy and, for particular categories of sensitive sites, whether shifts in the Chinese government's substantive policy (for example, a noted change in tension levels with Taiwan) are reflected in levels of web filtering. This, in turn, can shed light on how important a priority web filtering is to the government.
In other work, the authors will expand analysis to Internet filtering systems in other countries and will generate additional URLs to test based on queries invoked in the local language. Sign up to receive updates. The authors are also developing a distributed application for use by Internet users worldwide in testing, analyzing, and documenting respective Internet filtering regimes. Get more information and sign up to get involved. The authors previously provided access to a web-based system to test web filtering in China which remains available. Finally, the authors prepared screenshots documenting the September 2002 redirection of requests for google.com to other search engines.
Testing Methodology & Technical Notes on Chinese Filtering Systems
Our testing relied on two separate methods of data collection. From March 20 to May 6, 2002, we connected by modem with an international telephone call to dialup accounts with several Chinese ISPs. After May 6 our modems were unable to negotiate a "handshake" with modems answering at any Chinese ISPs, a failure consistent across multiple phone lines and locations, and multiple ISPs and points of presence in China. From August 14 to November 12, 2002, we connected to open proxy servers in China. We selected open proxies with assistance from Ronald F. Guilmette, and we determined their respective listed locations for tabulation purposes using IP-WHOIS.
We conducted testing of only one URL per web host based on our background knowledge, confirmed in subsequent testing, that when the default page of a site was filtered, the entirety of that site was typically filtered. Our appendix contains more about this hypothesis, its support, and our level of confidence in it. As a result, when we report a site as inaccessible, it is typically the case that the entirety of that site was inaccessible -- not just the site's default page or "front page."
On the basis of our testing, both automated and manual, we have reached an increased understanding of the design of filtering systems used to restrict Internet access in China. Our appendix discusses the details of this filtering, including the details we have inferred as to the implementation of filtering systems and the prospects for circumventing them, as well as possible regional variations in filtering and their impact on concluding that a given site is "blocked in China."
Specific Sites Found to be Blocked
During testing, we requested 204,012 distinct sites drawn from various web indices (such as sites listed within Yahoo! Taiwan's directory categories) and search results (such as Google's top 100 results for a search on "China freedom"). Most sites were accessible from China just as from our standard Internet connection in the United States, but we found that certain URLs were consistently unavailable. By attempting to retrieve these sites repeatedly over time, from multiple locations within China, we drew inferences on which specific sites among them were intentionally blocked by Chinese network staff. Our subsequent analysis considers a site to be blocked if it was found to be inaccessible by our testing system on at least two distinct occasions from at least two distinct testing locations in China, and if at those times it was simultaneously reachable from our main testing location in the United States.
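The criterion above can be expressed in a few lines of Python. This is a minimal sketch and not the authors' testing system: the `Observation` record, its field names, and the sample host names are hypothetical, and an "occasion" is assumed here to mean a distinct calendar day.

```python
from collections import namedtuple

# One probe of one site: where and when it was requested, whether the
# vantage point in China could fetch it, and whether the US control could.
# (Hypothetical record layout, chosen for illustration only.)
Observation = namedtuple("Observation", "site proxy day ok_china ok_us")

def is_blocked(observations, min_occasions=2, min_locations=2):
    """Apply the report's threshold to the observations for a single site."""
    # Count only failures from China that coincide with a US success.
    failures = [o for o in observations if not o.ok_china and o.ok_us]
    occasions = {o.day for o in failures}
    locations = {o.proxy for o in failures}
    return len(occasions) >= min_occasions and len(locations) >= min_locations

sample = [
    Observation("example.org", "proxy-a", "2002-09-01", False, True),
    Observation("example.org", "proxy-b", "2002-09-03", False, True),
    Observation("example.net", "proxy-a", "2002-09-01", False, True),
    Observation("example.net", "proxy-a", "2002-09-03", True, True),
]

by_site = {}
for obs in sample:
    by_site.setdefault(obs.site, []).append(obs)

verdicts = {site: is_blocked(rows) for site, rows in by_site.items()}
print(verdicts)  # {'example.org': True, 'example.net': False}
```

In this sample, example.net failed only once and from a single proxy, so it is treated as a possible transient glitch rather than a block.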
Filtering of Sexually Explicit Content
A preliminary round of testing examined 795 distinct URLs containing sexually explicit images. These URLs had been used as the basis for a portion of one author's expert testimony in Multnomah County Public Library, et al. v. United States, 201 F.Supp.2d 401 (E.D.Pa., 2002). He generated this list by collecting all 797 results from Google in response to an October 2001 web search using the search criteria "free adult sex," less two pages removed because they were found not to include sexually explicit images. Of the 752 of these pages still providing content at the time of this testing, 101 (13.4%) were blocked in China. In contrast, the authors previously found 695 (86.2%) of these same sites to be blocked in Saudi Arabia, and one author previously found that leading commercial filtering applications blocked 70% to 90% of these sites.
Filtering of Other Content
Our main testing examined 203,217 web sites drawn from categories other than sexually explicit content. We seeded this list of sites from multiple sources. For example, we extracted from Yahoo all web sites in certain categories (including those specifically about education, entertainment, news, major world governments, and politics) as well as all sites in the non-English regional versions of Yahoo that specifically concern China and Taiwan. We conducted searches on terms likely to yield sensitive results and thus candidates for blocking, both in English and in Chinese, using the Google search engine, and placed the top results into our list of URLs to test. We tracked approximately 5,000 additional sites submitted to our Real-Time Testing System through September 2002, and we received email suggestions of further sites to test. The result of these data sources was a list of 203,217 distinct host names.
Using the definition of "blocked" specified above, we found that a total of 18,931 of these sites (9.3%) were blocked in China. Given the large number of sites blocked, we have organized our listing of specific blocked pages into highlights -- some blocked pages that are well known or otherwise of possible interest -- followed by the full list. Where available, each page's listing includes its HTML title, its META keywords and description, and its Yahoo Directory and Google Directory classifications. These details are as retrieved in November 2002.
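The totals and percentages quoted in this section can be cross-checked with simple arithmetic. The short sketch below uses only figures stated in this report; the variable names are ours.

```python
# Figures quoted in the report.
explicit_urls = 795        # preliminary sample of sexually explicit URLs
explicit_live = 752        # of those, still providing content when tested
explicit_blocked = 101
other_sites = 203_217      # main sample of other content
other_blocked = 18_931

total_requested = explicit_urls + other_sites
total_blocked = explicit_blocked + other_blocked

def pct(part, whole):
    """Percentage rounded to one decimal place, as in the report."""
    return round(100 * part / whole, 1)

print(total_requested)                       # 204012
print(total_blocked)                         # 19032
print(pct(explicit_blocked, explicit_live))  # 13.4
print(pct(other_blocked, other_sites))       # 9.3
```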
Specific web sites blocked in China
Highlights of blocked sites - sites that are well known or otherwise of particular interest
Complete list of 18,931 blocked sites, sorted alphabetically by URL
Content Not Filtered
Within the context of the large number of sites found to be restricted in China, many other sites are not blocked in China, whether because they have yet to be passed upon by the authorities that determine blocks or because they have been affirmatively found to be non-sensitive. Sites not blocked may assist in drawing inferences about what content among the blocked sites is responsible for the differential treatment. For example, filtering of the United States Federal Courts (uscourts.gov and all subdomains) might indicate a desire to prevent access to information about the American judicial system, its processes, and its rulings -- but Findlaw, LexisNexis, and Westlaw all remain accessible. Similarly, blocking of well-known sexually explicit sites such as Playboy and Penthouse suggests a purposeful decision to restrict sexually-explicit material -- yet the well-known sites of Hustler Magazine and whitehouse.com were consistently accessible in the authors' testing.
Additional hosts tested but not found to be inaccessible (.ZIP file)
Analysis & Summary Statistics
Among the specific blocked pages are the following categories of content:
- Dissident/democracy sites. Blocked sites included sites about democracy and human rights generally and sites specific to China. Of the top 100 sites returned by Google in response to a search for "democracy china," 40 were found to be blocked; the corresponding counts were 37 for "dissident china," 32 for "freedom china," and 30 for "justice china." Specific blocked sites included Amnesty International, Human Rights Watch, the Hong Kong Voice of Democracy, the Direct Democracy Center, and dozens of Falun Gong and Falun Dafa sites.
- Health. Blocked sites included sites about health generally and about health in China specifically. Of the top 100 Google results for "hunger china," 24 were blocked; for "famine china" 23; for "AIDS china" 21; for "sex china" 19; for "disease china" 14. Specific blocked sites included the AIDS Healthcare Foundation, the Internet Mental Health reference, and the Health in China research project. We found blocking of a total of 139 sites listed in Yahoo's Health directory categories and subcategories.
- Education. Blocked sites included a number of well-known institutions of higher education, including the primary web servers operated by Caltech, Columbia, MIT, and the University of Virginia. Blocked non-university sites included the Learning Channel, the Islamic Virtual School, the Music Academy of Zheng, and the web sites of dozens of public and private primary and secondary schools. We further found evidence of blocking of 696 sites listed in Yahoo's Education directory categories and subcategories.
- News. The BBC News was consistently unreachable, while CNN, Time Magazine, PBS, the Miami Herald, and the Philadelphia Inquirer were also often unavailable. Of Google's top 100 results for news, 42 were blocked. We further found evidence of blocking of 923 sites listed in Yahoo's News and Media directory categories and subcategories. Nonetheless, some news sites that were previously blocked became accessible during the course of our testing; for example, Reuters was blocked through April 29, but was subsequently accessible, while the Washington Post was blocked through May 6 and was subsequently accessible. This reduction in blocking of entire news sites may reflect that certain new filtering technologies (discussed in greater detail in the appendix) allow blocking only of the particular sections and articles that are particularly controversial in China. As a result, our results should not be taken to suggest that every Washington Post article is now accessible in China.
- Government sites. Blocked sites included a variety of sites operated by governments in Asia and beyond. As discussed below, government sites of Taiwan and Tibet were targeted specifically. Also blocked was the entirety of uscourts.gov, including the many federal district and appellate courts in the United States, as well as the United Kingdom's Court Service and Israel's Judicial Authority. The communication sites of various governments were blocked, including the United States' Voice of America, as well as travel sites from Australia, Israel, Korea, Switzerland, and Wales. Government military department sites were also blocked, including the US Department of Defense, though others (such as the CIA) remained reachable. A variety of additional government sites were blocked, without manifest pattern, both in the United States and beyond; examples include the site of Seattle's King County, the main Australian Federal Government index site, the Philippines Bureau of Customs, the British Insolvency Service, the Office of the Governor of Makkah in Saudi Arabia, and the Legislative Assembly of British Columbia. Blocked sites included 516 sites in Yahoo's categories and subcategories pertaining to governments.
- Taiwanese and Tibetan sites generally. Blocked sites included business sites (like the A&D Company of Taiwan), non-commercial sites (the Taiwan Health Clinic and a total of 709 .edu.tw sites, as well as the Voice of Tibet), and government sites (the Office of the President of Taiwan and the Taiwanese Parliamentary Library among 936 other Taiwanese government sites, and the Official Website of the Tibetan Government in Exile). More than 60% of Google's top 100 "Tibet" sites were found to be blocked, and more than 47% of the top "Taiwan" sites were blocked. Taiwanese content was also blocked disproportionately, relative to its representation in our testing sample; fully 3,284 .TW sites (13.4% of .TW sites tested) were blocked, while our overall block rate was approximately 9.3%. (Of course, comparisons of block rates must be performed with care given the subjective formation of the list of sites tested. For lack of a domain name specifically associated with Tibetan sites, it is more difficult to perform such a comparison on the block rate of Tibetan content.)
- Entertainment. Blocked sites included the movie Deep Impact, the Canadian Music Centre, the Taiwanese site of MTV (mtv.com.tw) and multiple sites providing off-color jokes. We also found blocking of a total of 451 sites in Yahoo's categories and subcategories pertaining to Entertainment.
- Religion. Blocked sites included the Asian American Baptist Church, the Atheist Network, the Catholic Civil Rights League, Feng Shui at Geomancy.net, the Canberra Islamic Centre, the Jewish Federation of Winnipeg, and the Denver Zen Center. We found blocking of a total of 1,763 sites in Yahoo's categories and subcategories pertaining to religion.
Blocking of search results by Google Keyword
Blocking of search results by Google Keyword - with blocked site details
Blocking of search results by Google Keyword - chart
Blocking was found to vary across locations in China. However, the authors lack sufficient data to draw conclusions about systematic variations in blocking across geographic locations; current data is consistent both with intentional variations in blocking and with delays in updating block lists in certain regions.
The authors previously made available to the public a real-time testing site whereby interested Internet users could submit URLs for immediate testing through Chinese filtering systems. Between August 28 and November 21, 2002, this system received a total of 100,563 requests to test 13,569 distinct URLs on 12,335 distinct host names. More than 5,000 of these hosts had not previously been selected by the authors for testing.
Having previously examined Internet filtering in Saudi Arabia, the authors tested through Chinese filtering systems all sites previously tested in Saudi Arabia. The authors had previously tested 49,586 distinct hosts through filtering systems in Saudi Arabia and had found that country to restrict access to at least certain content on a total of 582 of these hosts (1.2% of sampled hosts). China also filtered 101 of these hosts (0.2%), while China filtered 5,903 additional hosts (11.9% of the sample) not filtered in Saudi Arabia. The chart below depicts the extent of overlap between filtering in China and in Saudi Arabia. (Note that the representation of hosts not blocked is not to scale, relative to the rest of the chart.)
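The overlap figures can be broken into disjoint regions with set arithmetic. The sketch below assumes one reading of the paragraph above: that the 101 hosts are those filtered in both countries, and that the 5,903 hosts are those filtered in China only. The variable names are ours.

```python
hosts_tested = 49_586
saudi_blocked = 582     # filtered in Saudi Arabia
both_blocked = 101      # assumed: filtered in both countries
china_only = 5_903      # filtered in China but not in Saudi Arabia

saudi_only = saudi_blocked - both_blocked
china_blocked = both_blocked + china_only
neither = hosts_tested - saudi_only - both_blocked - china_only

def share(n):
    """Share of the tested hosts, as a percentage to one decimal place."""
    return round(100 * n / hosts_tested, 1)

print(saudi_only, share(saudi_only))        # 481 1.0
print(both_blocked, share(both_blocked))    # 101 0.2
print(china_only, share(china_only))        # 5903 11.9
print(china_blocked, share(china_blocked))  # 6004 12.1
print(neither, share(neither))              # 43101 86.9
```

On this reading, only about one in six of the hosts filtered in Saudi Arabia was also filtered in China.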
Conclusions
From our data, it appears that the set of sites blocked in China is by no means static: whoever maintains the lists is actively updating them, and certain general-interest high-profile sites whose content changes frequently appear to be blocked and unblocked as those changes are evaluated. (This is particularly noticeable with news sites such as CNN and Slashdot.) Some new sites with sensitive content do not appear to take long to be blocked. However, even some longstanding sites of apparent sensitivity remain unblocked. This is most easily noticed in our data with respect to sexually-explicit sites -- we found blocking of only 13.4% of our sample of well-known sexually-explicit sites -- but is also anecdotally apparent elsewhere in our data: for example, some US intelligence sites were blocked while others were not. Further data collection will be geared toward determining the extent to which the basket of sites blocked reflects shifting substantive government policies -- for example, whether a sea change in relations with Taiwan, positive or negative, is reflected in blocking, and if so, how quickly.
China's Internet filtering efforts remain opaque, and in the absence of government cooperation or admission of filtering methods, data probing of the sort used in our study remains a useful tool in determining the scope of filtering. The authors have previously studied filtering in Saudi Arabia and in American public libraries; in these locations, blockage of a web page leads to an error message clearly explaining that the requested page is unavailable due to intentional blockage. In contrast, China's systems make it difficult for a user to distinguish between an intentional block and a temporary network or server glitch. This may be intentional or may reflect technical happenstance -- that this implementation was easier or cheaper, given the size and design of China's network infrastructure. But some newer forms of Chinese filtering -- namely, redirection of a request for a sensitive web site to another web site -- can be either more or less obvious to the user than an apparent network glitch, depending on whether the substitution is noticed.
The primary and most longstanding means of blocking is at the router level, and on the basis of IP address -- the crudity of which means that those implementing filtering must choose between blocking an entire site on the basis of a small portion of its content, or tolerating such content. This would explain why, for example, the www.mit.edu server is sometimes wholly inaccessible even though Chinese officials likely have no objection to most content on that server. To the extent that the entirety of that server is nonetheless inaccessible, China's filtering system is properly considered to be overblocking, and we believe our data indicates extensive overblocking of this form. This may account for the rise of still-rare forms of blocking that allow more refined content filtering -- such as blocking by keywords or phrases in any particular HTML page requested by a user, whether or not the site hosting the page is present on an ex ante block list. Such blocking is likely far more technology-intensive, in principle even slowing overall network response time as packets are analyzed by sniffers and the results passed to filters. Aside from allowing more refined content filtering, such newer forms of blocking appear to be linked to disabling Internet access for an arbitrary amount of time for a user who requested a page with forbidden content -- enabling a penalty for attempting access to sensitive material beyond simply denying the very material requested. Other nascent but growing forms of filtering appear to be targeted to limit the information that can be gleaned from search engines -- enabling the automated blocking of search results that may not (yet) have been filtered through human placement on a "forbidden" list.
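The contrast between the two layers described above can be made concrete with a toy model. This is purely an illustration of the logic, under our own assumptions; the `ToyFilter` class, its parameters, and the sample addresses are hypothetical and do not describe any actual equipment or configuration.

```python
import time

class ToyFilter:
    """Toy model of two filtering layers; not a description of real equipment."""

    def __init__(self, blocked_ips, keywords, penalty_seconds):
        self.blocked_ips = set(blocked_ips)
        self.keywords = [k.lower() for k in keywords]
        self.penalty_seconds = penalty_seconds
        self.penalized_until = {}  # user -> time at which the penalty ends

    def allow(self, user, server_ip, page_text, now=None):
        now = time.time() if now is None else now
        if self.penalized_until.get(user, 0) > now:
            return False  # user is still inside the penalty window
        if server_ip in self.blocked_ips:
            return False  # coarse layer: the whole server, whatever the page
        if any(k in page_text.lower() for k in self.keywords):
            self.penalized_until[user] = now + self.penalty_seconds
            return False  # fine-grained layer: this page, plus a penalty
        return True

f = ToyFilter({"192.0.2.10"}, ["forbidden phrase"], penalty_seconds=60)
print(f.allow("u1", "192.0.2.10", "harmless page", now=0))            # False
print(f.allow("u1", "198.51.100.7", "harmless page", now=0))          # True
print(f.allow("u1", "198.51.100.7", "a Forbidden Phrase", now=10))    # False
print(f.allow("u1", "198.51.100.7", "harmless page", now=30))         # False
print(f.allow("u1", "198.51.100.7", "harmless page", now=100))        # True
```

The first call shows overblocking: a harmless page is refused only because of the server it lives on. The fourth call shows the penalty: a harmless page is refused because of an earlier request by the same user.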
The Chinese government and associated network authorities are clearly continuing to experiment with different forms of blocking, indicating that -- unlike Saudi Arabia, which appears to have a single, declared method of blocking and a much more constant (and apparently smaller) list of non-sexually-explicit blocked sites -- Chinese network filtering is an important instrument of state Internet policy, and one to which significant technical and human resources continue to be devoted.
Additional details on data collection and interpretation are available in the technical appendix. The authors have also indexed related work by others.
The authors are grateful to Ronald F. Guilmette for assistance with locating proxy servers in China, to Joshua Rosenzweig of the Dui Hua Foundation for assistance in locating routing glitches, and to Nongji Zhang of the Harvard Law School Library for assistance with Chinese translations.
A version of this document was included in the March/April 2003 edition of IEEE Internet Computing.
* Jack N. and Lillian R. Berkman Assistant Professor of Entrepreneurial Legal Studies, Harvard Law School.
** J.D. Candidate, Harvard Law School, 2005.
Support for this project was provided by the Berkman Center for Internet & Society at Harvard Law School.
from http://cyber.law.harvard.edu/filtering/china/
------------------------
The appendix sections below offer technical details beyond those of the main report. The appendix contains the following sections:
Blocking of Entire Web Sites and Entire Servers
Reporting Criteria and the "Blocking Quotient" of Reported Sites
DNS Filtering/Redirection and Its Implications
Independent Filtering Implementations and Corresponding Circumvention Techniques
Other Effects of Chinese Filtering: Routing and Email
Blocking of Entire Web Sites and Entire Servers
We conducted testing of only one URL per web host based on our background knowledge, reinforced by subsequent testing, that when the default page of a site was filtered, the entirety of that site was typically filtered.
To test the hypothesis of entire-site blocking, we formed a sample of web hosts found to be inaccessible, and we checked whether an arbitrary subdirectory on each such site was also inaccessible. The arbitrary directory name we chose was intended not to exist on the servers, but a reachable web server still returns a "not found" error message in response to a request for a non-existent page, so even that error page indicates that the host can be reached. We confirmed that these error pages themselves were inaccessible in a total of 99.8% of tests. We attribute the other 0.2% of results to anomalies such as transient network errors that may have wrongly rendered the web host inaccessible in the first instance when the host was not intended to be blocked.
At the moment, then, it seems that when the default page ("front page") of a host is blocked, all other pages on that host are also blocked. (Of course, the reverse need not be the case, and the authors have separately confirmed multiple instances in which it is not the case.)
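The subdirectory test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the random directory name, the function names, and the sample results are hypothetical, and the actual fetching of each probe URL through a proxy is left out.

```python
import uuid
from urllib.parse import urljoin

def probe_url(host):
    """Build a URL for a directory that almost certainly does not exist."""
    return urljoin(f"http://{host}/", f"{uuid.uuid4().hex}/")

def classify(status):
    """Interpret a probe result.

    status is the HTTP status code received, or None if no response
    arrived at all (timeout, connection reset, and so on).
    """
    if status is None:
        return "unreachable"  # consistent with blocking of the whole host
    return "reachable"        # even a 404 proves that the server answered

def whole_host_rate(results):
    """Fraction of probes to already-inaccessible hosts that also failed."""
    outcomes = [classify(status) for status in results.values()]
    return outcomes.count("unreachable") / len(outcomes)

# Hypothetical probe results for four hosts whose front pages were inaccessible.
results = {"a.example": None, "b.example": None, "c.example": None, "d.example": 404}
print(whole_host_rate(results))  # 0.75
```

The key design point is that any HTTP response, including an error page, counts as evidence that the host is reachable.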
When an entire host is filtered, our data show that this filtering typically operates on the basis of the host's IP addresses rather than on the basis of its one or several domain names. To make this confirmation, we observed that when many web sites are hosted on a single web server (as is typical in commercial "shared hosting" at the lowest monthly rates), blocking by China of one web site on a given server (with a given IP address) typically entails blocking of all other web sites on that server. For example, we found a total of 308 distinct (by domain name and differing page content) blocked sites all hosted on the server at IP address 216.34.94.186, a parking/redirection server used by domain name registrar Dotster. To the extent that this server in fact hosts additional sites beyond those we tested, it is highly likely that they too were blocked. Indeed, a representative of domain name registrar enom reported to the authors that its primary domain name forwarding service had been blocked by China -- rendering unreachable literally hundreds of thousands of domain names that rely on that server.
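The shared-hosting observation suggests a simple tabulation: group tested sites by the IP address that serves them and compare how many sites on each address were blocked. The sketch below is illustrative only; the record layout and the sample domains and addresses are hypothetical.

```python
from collections import defaultdict

def blocked_share_by_ip(records):
    """records: iterable of (domain, ip, blocked) tuples.

    Returns {ip: (blocked_count, total_count)} for addresses that host
    at least two of the tested domains.
    """
    per_ip = defaultdict(list)
    for domain, ip, blocked in records:
        per_ip[ip].append(blocked)
    return {
        ip: (sum(flags), len(flags))
        for ip, flags in per_ip.items()
        if len(flags) > 1
    }

records = [
    ("one.example", "192.0.2.10", True),
    ("two.example", "192.0.2.10", True),
    ("three.example", "192.0.2.10", True),
    ("four.example", "198.51.100.7", False),
    ("five.example", "198.51.100.7", False),
    ("six.example", "203.0.113.5", True),
]
print(blocked_share_by_ip(records))
# {'192.0.2.10': (3, 3), '198.51.100.7': (0, 2)}
```

If blocking operates on IP addresses, shared addresses should show all-or-nothing counts like these; blocking by domain name would instead produce mixed counts on a single address.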
While filtering of a host's top-level page predicts the filtering of all other pages on that site, such filtering is not technically mandated. Indeed, midway through our testing, the authors learned of and confirmed the blocking of certain pages on otherwise-accessible sites. At least some of this blocking appears to be triggered by one of relatively few keywords in page URLs or contents; this therefore represents a technical layer of blocking wholly distinct from (and seemingly rarer than) that which results in an entire site being made unavailable.
Since blocking typically affects an entire web server, our reporting includes all Yahoo and Google/DMOZ categories that reference any pages on affected web servers.
In order to distinguish intentional blocks from unintentional network outages or other variation, we tested candidate URLs multiple times and through multiple proxies. In many cases, sites were unavailable only on one occasion, or unavailable from one proxy in China while available from another. While such phenomena might represent intentional blocking that is simply limited in time or regional scope, we operationalize the notion that a URL is blocked "in China" only when it has been found to be unavailable on at least two occasions, and from at least two distinct proxies, all while still accessible from the United States. Variations in blocking across proxies, if not due to transient network failures, could reflect a distribution of authority to make and implement blocking decisions from one region to the next, or a technical burden or delay in programming key routers across China to block an undesirable URL.
To the extent that blocking varies across networks and across geographic locations, to describe a URL or entire Web site as "blocked in China" may be inexact -- a site can be found accessible in some places and simultaneously inaccessible in others. In the absence of further data about political decision-making and technical implementation, we can be only as precise as the data is accurate -- and we therefore apply a threshold of overall inaccessibility to determine that a site is "blocked in China."
We have received reports indicating that certain locations -- for example, hotels predominantly frequented by western visitors -- have significantly less stringent filtering policies. Our reporting of sites "blocked in China" should not be taken to describe Internet access from these locations.
Having tested all sites on multiple occasions from multiple distinct locations within China, the authors have found some sites that were blocked consistently -- on all occasions, from all locations -- while other sites were blocked less often. The "blocking quotient" slider in our reporting seeks to characterize this observation: a wide red bar signifies a site blocked more frequently, while a narrower bar denotes intermittent blocking or blocking observed from relatively fewer locations within China. We report this measurement with a slider rather than a number to reflect the uncertainty necessarily associated with these measurements and the resulting analysis.
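The report does not give a formula for the blocking quotient. One plausible reading, assumed here purely for illustration, is the fraction of tests in which a site was found blocked, drawn as a bar rather than printed as a number. The function names and bar characters are ours.

```python
def blocking_quotient(outcomes):
    """outcomes: list of booleans, True where a test found the site blocked."""
    if not outcomes:
        raise ValueError("no tests recorded for this site")
    return sum(outcomes) / len(outcomes)

def bar(quotient, width=20):
    """Render the quotient as a text bar instead of a number."""
    filled = round(quotient * width)
    return "#" * filled + "." * (width - filled)

print(bar(blocking_quotient([True] * 8)))                  # ####################
print(bar(blocking_quotient([True, False, True, False])))  # ##########..........
```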
DNS Filtering/Redirection and Its Implications
For some 1,043 of the sites tested, we confirmed that DNS servers in China report a web server other than the official web server actually designated via each site's authoritative name servers. We call this phenomenon "DNS redirection," though others sometimes refer to the situation as "DNS hijacking." Consistent with prior reporting by Dynamic Internet Technology, our data show that such sites were consistently unreachable in their entirety.
Currently, when a user in China requests a site affected by DNS redirection, the user's computer is told that the site's domain name is associated with the IP address 64.33.88.161. That IP address is associated with the host www.falundafa.ca, the site of a Canadian organization that promotes the practice of Falun Gong. However, that address is itself blocked by Chinese border routers, preventing such requests from reaching either the falundafa server or any other. As a result, Chinese users are unable to reach the entirety of these many sites, including their respective default pages as well as their subsidiary pages.
While the authors cannot know for sure the specific rationale for implementing this additional method of filtering by Chinese network staff, we suggest two possible understandings. First, this method of filtering might be intended to supplement border router filtering; depending on the specific method of implementation, it might be in some way more efficient or easily updated by Chinese network staff, and compliance of ISPs can be more easily monitored remotely via ordinary DNS tools such as dig. Second, this method of filtering is a likely precursor to efforts both to monitor accesses to specific sites and to revise or replace content on those sites with other content specifically provided by Chinese network staff; either approach would rely on proxy servers to be placed at specified IP addresses and would require that requests for designated sites in some way be redirected to those addresses. While this second theory is largely speculative, it rings true given related efforts to replace Google (see the authors' prior Replacement of Google with Alternative Search Systems in China) and subsequent filtering of certain Google search terms (including the names of key political figures and the terms required to use the Google cache).
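Detecting DNS redirection amounts to comparing two answers for the same name: the one given by the site's authoritative name servers and the one given by a DNS server in China. The sketch below shows only that comparison and assumes the two address lists were gathered separately, for example with dig. The function name and the sample addresses other than 64.33.88.161 are hypothetical.

```python
SINK_IP = "64.33.88.161"  # address the report observed in redirected answers

def dns_verdict(authoritative_ips, observed_ips):
    """Compare the authoritative answer with the answer seen inside China."""
    authoritative, observed = set(authoritative_ips), set(observed_ips)
    if not observed:
        return "no answer"
    if observed & authoritative:
        return "consistent"
    if SINK_IP in observed:
        return "redirected to known sink"
    return "mismatch"

print(dns_verdict(["192.0.2.1"], ["192.0.2.1"]))     # consistent
print(dns_verdict(["192.0.2.1"], [SINK_IP]))         # redirected to known sink
print(dns_verdict(["192.0.2.1"], ["198.51.100.9"]))  # mismatch
```

A bare "mismatch" is not proof of redirection, since sites may legitimately return different addresses to different regions; such cases would need manual review.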
Independent Filtering Implementations and Corresponding Circumvention Techniques
We have observed certain idiosyncrasies in Chinese methods of Internet filtering, and in some instances we have found methods to circumvent particular aspects of filtering. Based on this data, we can draw inferences about particular methods of filtering. In this section, we detail these anomalies as well as their implications.
- Filtering on the basis of web server IP address. As described above, we were able to confirm that filtering was on the basis of IP address by observing that when China blocked access to one web site on a given physical server, all other sites on that physical server (i.e. on that IP address) were also typically blocked.
- Implementation method: This method of filtering likely relies on block lists loaded into border routers that connect China's internal networks with international networks. ISPs reportedly share block lists, perhaps with additional centralized coordination of updates. Variation across networks and over time is to be expected based on delays in propagation of list revisions. Our data suggest that when Chinese network staff deem a site to contain undesirable content, their most common method of filtering it is simply to drop IP packets destined for it.
- Circumvention methods: This method of blocking, the most widely-used in our experience, is difficult to circumvent. The typical circumvention method relies on channeling Web page requests and viewing associated results through proxy servers which are themselves outside China. However, monitoring and proxy-blocking efforts reportedly provide a check on the use of proxies. See details in Bennett Haselton's List of possible weaknesses in systems to circumvent Internet censorship and Seth Finkelstein's discussion of filtering "loopholes." When Google's cache feature was available in China, it allowed circumvention of this method of filtering, but this feature has since become unavailable, as described below.
- Filtering on the basis of domain name server IP address. Like filtering on the basis of web server IP address, this method likely relies on block lists loaded into border routers. Even if the desired web server is itself reachable, a user's computer cannot reach the web server if it cannot first convert the site's domain name into a numeric IP address -- and when the site's DNS server is blocked, no such conversion is possible.
- Apparent unintentionality of blocking: We have observed that many of the filtered DNS servers are also themselves web servers, or are located on networks that are filtered in totality (as distinguished from networks filtered only in part, i.e. for which certain specific IP addresses are filtered while others remain accessible). This lends some support to the inference that filtering at the level of DNS may be unintentional -- an accidental consequence of filtering a web server or network that also happens to offer domain name services.
- Circumvention methods: When filtering operates on the basis of domain name server IP address, filtering can sometimes be circumvented via direct entry of the desired web server's IP address. In particular, an interested user may simply enter the IP address of the desired web server directly into a browser's Location bar (into the same location where the site's domain name would ordinarily be placed). Of course, this method requires that the user know the server's IP address (which the user cannot obtain directly through the ordinary domain name system since the domain's DNS server is, by hypothesis, blocked), and it further requires that the server provide only this single site (rather than hosting many sites on one IP address, distinguished by the HTTP Host header). Nonetheless, in some situations entering an IP address directly may circumvent Chinese filtering efforts. An additional possible method of circumvention is the use of non-Chinese DNS servers, with such servers performing a subset of the role that an overseas proxy would serve to circumvent web host IP blocking. If such an approach became widespread, border routers could be reconfigured to refuse outbound DNS requests except when received from authorized DNS servers.
- DNS redirection. As described above, DNS servers in China have been found to offer incorrect answers as to the IP addresses of certain domain names.
- Circumvention methods: Use of non-Chinese DNS servers bypasses this method of filtering, though such use might in the future be blocked by border routers.
- Filtering on the basis of keywords in URL. Beginning in September 2002, our data reflect that when a subscriber to a Chinese ISP submitted a URL request that itself contained certain words or phrases -- this typically happens for search engine searches, like http://www.google.com/search?q=jiang+zemin -- no response would be received. This effect was particularly notable at Google, where names of key political figures apparently came to be off-limits, as did certain other words used to invoke controversial Google features (among them the caching feature that can allow Google to be a method of circumventing the filtering implementations described above). In some instances, the authors have also observed that these keyword blocks may apply equally to requests to other sites; from at least certain locations in China, an attempt to retrieve any URL containing the character string "jiang+zemin" triggers filtering (even if the result of that request would only be a 404 Not Found error page).
- Additional symptoms noted: Subsequent to a request for a URL with a prohibited term, the authors have received reports of (and have confirmed) "timeout" periods of 5 to 30 minutes during which either the target site or even all sites (including otherwise-permissible sites) became inaccessible. The authors have received further reports that some timeout periods may last until a user's computer is rebooted and/or until a user's DSL modem is powercycled. If intentional, as seems likely, this represents a type of filtering that tries to "train" the end user to avoid using prohibited terms, imposing a penalty beyond inaccessibility of the requested URL should the terms be used.
- Implementation method: This method of filtering is likely implemented via packet-filtering systems integrated into border routers or placed adjacent to them. See additional discussion below.
- Circumvention methods: We have observed that keyword-based filtering systems tend to search for plaintext in URL strings -- searching for the word "cache," for example, and blocking any request to google.com that contains this word in its URL. However, the URI specification describes additional techniques for encoding ("escaping") characters in a URL (RFC 2396 section 2.4.1). For example, ASCII characters can be encoded in hexadecimal via escape sequences of the form %4A, where 4A is the hexadecimal code of the ASCII character at issue (here, "J"). The authors have confirmed that in at least some instances, Chinese filtering systems of the sort described in this section are not currently triggered by escaped forms of keywords that, when expressed in plain text, consistently prevent access to the requested pages. (This flaw reflects a failure to properly implement the URI comparison specified in RFC 2616 section 3.2.3.)
- Filtering on the basis of keywords or phrases in HTML response. Beginning in September 2002, the authors observed that certain keywords in HTML response pages seemed to be blocked by Chinese network infrastructure. In particular, even when a page came from a server not otherwise filtered, and even when the page featured a URL without controversial search terms, it might nonetheless be inaccessible if the page itself contained particular controversial terms. Such pages were often truncated, i.e. interrupted midway through their display. On certain browsers, including recent versions of Microsoft Internet Explorer, pages truncated in this way may flash briefly on screen, then disappear. This phenomenon represents an augmentation of "compiled" filtering with "interpreted" filtering -- the former representing specific sites deemed ex ante to be off-limits, with routers configured accordingly, and the latter representing data deemed on-the-fly, mechanically, to be off-limits, with corresponding temporary loss of access to the source of that data.
- Level of accuracy: The authors have observed that filtering on the basis of keywords in HTML responses sometimes seemed to malfunction, i.e. to allow passage and viewing of a page that contained words that were otherwise prohibited. This occurrence seemed to be random, but in some instances seemed to take place as often as not.
- Implementation method: The observed results are precisely what would be expected if Chinese border routers (or associated hardware) implemented a packet-filtering system triggered by particular controversial keywords. To reduce memory and processor requirements, such systems promptly pass on all packets found to be acceptable. However, upon the receipt of the first packet containing a prohibited term, a packet-filtering system would be configured to discard all further packets from the same source and/or destination for some designated period -- causing the page truncation consistently observed under these circumstances. The randomness in successful filtering might reflect that packet filtering operates at less than line speed, i.e. is able to inspect only a portion of content passing through a given router. It might also reflect that packet filtering fails to take account of borders between packets, such that a page is permitted to be viewed if a part of a prohibited word is received in one packet and the remainder in a subsequent packet.
- Additional symptoms noted: Timeout periods, as described above.
- Circumvention methods: Based on our understanding of the likely implementation method of such filtering, the authors note two possible means of circumventing this filtering. First, content providers can escape their text, using HTML markup that is equivalent to the characters at issue or adding HTML whitespace (comment tags, etc.) in the middle of controversial words or phrases. (These techniques are documented in the HTML specifications for character entity references and comments.) Second, Chinese users can reduce their TCP/IP stack's specified maximum transmission unit (MTU) -- reducing the amount of text contained in a given packet and thereby reducing the effectiveness of packet-inspection systems; however, this approach typically reduces performance and also increases network overhead.
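The two keyword-based methods above, and the weaknesses noted for each, can be modeled with a short Python sketch. It is only a toy model under stated assumptions: the actual filtering systems are not documented, all function names are hypothetical, the keyword "forbidden" used in the note below is a placeholder rather than an observed blocked term, and the `size` parameter is a rough stand-in for the amount of payload that fits in one packet at a given MTU.

```python
from urllib.parse import unquote

# URL keywords named in the text; matching here is plain substring search.
URL_KEYWORDS = ("jiang+zemin", "cache")


def naive_url_match(url, keywords=URL_KEYWORDS):
    """Plaintext search of the raw URL, as the observed filters appear to do."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in keywords)


def escape_all(text):
    """Percent-encode every ASCII character as %XX (RFC 2396 section 2.4.1)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


def normalized_url_match(url, keywords=URL_KEYWORDS):
    """Decode escapes before searching, so escaped keywords still match."""
    return naive_url_match(unquote(url), keywords)


def split_into_packets(data, size):
    """Cut a response body into fixed-size chunks (a stand-in for packets)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def per_packet_match(packets, keywords):
    """Inspect each packet in isolation.

    Returns the index of the first packet containing a keyword, or None.
    Packets before that index would already have been passed on, which is
    consistent with the page truncation described above.
    """
    for index, payload in enumerate(packets):
        if any(keyword in payload for keyword in keywords):
            return index
    return None
```

In this model, `escape_all("cache")` yields `%63%61%63%68%65`; a URL carrying that escaped form is missed by `naive_url_match` but caught by `normalized_url_match`. Likewise, the 32-byte body `b"<html>some forbidden text</html>"` triggers `per_packet_match` when sent as one packet, but not when split into 16-byte packets, because the keyword then straddles a packet boundary.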
Other Effects of Chinese Filtering: Routing and Email
Routing. The authors have observed that some American ISPs route packets through China towards destinations beyond China (in particular, to Hong Kong). When the desired web servers are blocked from China, such routing typically subjects an American user's request to filtering by network equipment in China. Affected American ISPs can address the situation by manually altering the routes used to reach hosts in Hong Kong and elsewhere. However, affected ISPs are often unaware of the situation, and an effective response entails delay and/or additional expense as an affected ISP finds the necessary partner ISPs and establishes peering relationships with them.
Email. When border routers in China discard packets destined to or received from certain hosts, we understand that they typically do so without regard for the specified protocol of communications. As a result, email messages are typically filtered when sent to or received from blocked sites. The authors understand that additional filtering efforts may specifically target certain controversial emails, and the authors plan to document this situation in detail in future work.
Other Protocols. Filtering on the basis of server IP address can restrict additional protocols of Internet communications. For example, FTP is as affected as the web by blocking of a requested server's IP address. The authors have also received reports of failures of instant messaging software, likely reflecting difficulty in passing packets to and from designated servers.
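The protocol-independence of IP-based blocking can be illustrated with a minimal Python sketch. It assumes, as the text infers, that the drop decision consults only an address block list; the function names and the block list contents are hypothetical (64.33.88.161 is the address discussed above, and 203.0.113.0/24 is a reserved documentation range used here as a placeholder for a fully filtered network).

```python
import ipaddress


def build_block_list(entries):
    """Parse single addresses and CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry) for entry in entries]


def is_dropped(destination_ip, block_list):
    """Decide whether a packet is discarded.

    Only the destination address is examined; protocol and port never
    enter the decision, so web, email, and FTP traffic to a blocked
    host all fail alike.
    """
    address = ipaddress.ip_address(destination_ip)
    return any(address in network for network in block_list)


def delivered(packets, block_list):
    """Filter (destination_ip, protocol, port) tuples down to those passed."""
    return [packet for packet in packets
            if not is_dropped(packet[0], block_list)]
```

Given a block list built from `["64.33.88.161", "203.0.113.0/24"]`, packets to 64.33.88.161 on port 80 (HTTP), port 25 (SMTP), and port 21 (FTP) are all dropped, while traffic to an unlisted address passes regardless of protocol.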
from http://cyber.law.harvard.edu/filtering/china/appendix-tech.html