“Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

2016-02-23 | Academic papers, Internet censorship | Anonymous users, Differential Treatment, Publisher side blocking, Tor | Sheharbano Khattak

If you use an anonymity network such as Tor on a regular basis, you are probably familiar with various annoyances in your web browsing experience, ranging from pages saying “Access denied” to having to solve CAPTCHAs before continuing. Interestingly, these hurdles disappear if the same website is accessed without Tor. The growing trend of websites extending this kind of “differential treatment” to anonymous users undermines Tor’s overall utility, and adds a new dimension to the traditional threats to Tor (attacks on user privacy, or governments blocking access to Tor). There is plenty of anecdotal evidence about Tor users experiencing difficulties in browsing the web, for example the user-reported catalog of services blocking Tor. However, we don’t have sufficient detail about the problem to answer deeper questions like: how prevalent is differential treatment of Tor on the web; are there any centralized players with Tor-unfriendly policies that have a magnified effect on the browsing experience of Tor users; can we identify patterns in where these Tor-unfriendly websites are hosted (or located), and so forth.

Today we present our paper on this topic: “Do You See What I See? Differential Treatment of Anonymous Users” at the Network and Distributed System Security Symposium (NDSS). Together with researchers from the University of Cambridge, University College London, University of California, Berkeley and International Computer Science Institute (Berkeley), we conducted comprehensive network measurements to shed light on websites that block Tor. At the network layer, we scanned the entire IPv4 address space on port 80 from Tor exit nodes. At the application layer, we fetched the homepage from the most popular 1,000 websites (according to Alexa) from all Tor exit nodes. We compared these measurements with a baseline from non-Tor control measurements, and uncovered significant evidence of Tor blocking. We estimate that at least 1.3 million IP addresses that would otherwise allow a TCP handshake on port 80 block the handshake if it originates from a Tor exit node. We also show that at least 3.67% of the most popular 1,000 websites block Tor users at the application layer.
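
To make the comparison against the control baseline concrete, here is a minimal Python sketch. It is not the code used in the paper: the `Fetch` record, the `CONTROL` label, and the rule that any 2xx/3xx response counts as success are all assumptions made for illustration. In particular, a block page or CAPTCHA served with a 200 status would need content inspection, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional

CONTROL = "control"  # hypothetical label for the non-Tor vantage point


class Fetch(NamedTuple):
    site: str
    vantage: str            # CONTROL or a (hypothetical) exit-node identifier
    status: Optional[int]   # HTTP status code, or None if no response arrived


def succeeded(status: Optional[int]) -> bool:
    # Simplifying assumption: any 2xx/3xx response counts as success.
    return status is not None and 200 <= status < 400


def blocked_fraction(fetches: Iterable[Fetch]) -> Dict[str, float]:
    """Map each control-reachable site to the fraction of exits that failed."""
    control_ok = set()
    exit_total = defaultdict(int)
    exit_failed = defaultdict(int)
    for f in fetches:
        if f.vantage == CONTROL:
            if succeeded(f.status):
                control_ok.add(f.site)
        else:
            exit_total[f.site] += 1
            if not succeeded(f.status):
                exit_failed[f.site] += 1
    # Sites that also fail from the control are excluded: they are simply
    # down or unreachable, not treating Tor differently.
    return {
        site: exit_failed[site] / exit_total[site]
        for site in control_ok
        if exit_total[site] > 0
    }


sample = [
    Fetch("a.example", CONTROL, 200),
    Fetch("a.example", "exit1", 403),
    Fetch("a.example", "exit2", 200),
    Fetch("b.example", CONTROL, None),
    Fetch("b.example", "exit1", None),
]
result = blocked_fraction(sample)
```

Excluding sites that fail from the control is what separates differential treatment from ordinary outages; the same idea applies to the port-80 handshake scan at the network layer.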

We find that the websites that block Tor mostly belong to Autonomous Systems (ASes) corresponding to mobile and access ISPs, and hosting services. Some of these ASes perform wholesale blocking of Tor, that is, all the IP addresses in the AS block Tor. We also wrote classifiers to map websites to their web hosting services. Our results bring out CloudFlare and Akamai as dominant Tor blockers, highlighting the amplified blocking effect such centralized web services may create when their Tor-unfriendly policy trickles down to thousands of their client websites. The figure below shows the top 20 websites by how many Tor nodes they block, from the Alexa top-1,000 list. Each row in this figure represents a website, and each column represents a Tor exit node (of about 900 total). So a blue bar means that the website blocks a Tor exit node. Clearly, these websites (mostly hosted by Akamai and Amazon Web Services) block a large fraction of Tor exit nodes. We think that some of this blocking is caused by blacklists that include Tor exit nodes, yet other instances likely arise when abuse generated from Tor exit nodes triggers automated blocking mechanisms on websites.

Websites from the Alexa top 1,000 sites that block most Tor exit nodes
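
A ranking like the one in this figure can be derived from per-site sets of blocked exit nodes. The following Python sketch is illustrative only: the `top_blockers` helper and the shape of its input are assumptions, not the paper's tooling.

```python
from typing import Dict, List, Set, Tuple


def top_blockers(blocked_exits: Dict[str, Set[str]], n: int = 20) -> List[Tuple[str, int]]:
    """Rank sites by how many distinct exit nodes they block, highest first."""
    counts = [(site, len(exits)) for site, exits in blocked_exits.items()]
    # Sort by descending count; break ties alphabetically for stable output.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


# Hypothetical input: site -> identifiers of the exit nodes it blocked.
observations = {
    "a.example": {"exit1", "exit2"},
    "b.example": {"exit1"},
    "c.example": {"exit1", "exit2", "exit3"},
}
ranking = top_blockers(observations, n=2)
```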

Our work provides a first step towards addressing the problems faced by Tor users by characterizing websites that treat traffic from the Tor network differently from other sources. The next steps, as described by Tor developer Roger Dingledine, involve social activism to engage with major players on the web such as CloudFlare and get their perspective on this problem and discuss possible solutions. There is not much we can do in the case of entities such as ISPs and countries that preemptively block all Tor exit nodes as a matter of policy, beyond some alleviation in the form of awareness campaigns to highlight the problem (such as Tor’s “Don’t Block Me” initiative). With abuse-based blocking, we need solutions to enable precise filtering beyond IP address blocking of Tor exit nodes, so that benign Tor users don’t have to suffer from the abusive actions of other Tor users sharing the same exit node.

In a broader context, our work calls attention to a new kind of blocking that is mandated by publishers. In the classical censorship scenario, blocking takes place near the user, for example an intermediate device dropping a user’s request for a blacklisted website. In publisher-side blocking, the user’s request arrives at the publisher, but the publisher (or something working on its behalf) refuses to respond based on some property of the user. Who else on the Internet besides Tor users is subject to publisher-side blocking?

“Do You See What I See? Differential Treatment of Anonymous Users” by Sheharbano Khattak, David Fifield, Sadia Afroz, Mobin Javed, Srikanth Sundaresan, Vern Paxson, Steven J. Murdoch, and Damon McCoy will be presented at the Network and Distributed System Security Symposium, San Diego, US, 21–24 February 2016.

This post also appears on the UCL Information Security group blog, Bentham’s Gaze.

13 thoughts on ““Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users”

Roland Dobbins says:

2016-02-24 at 00:46 UTC

The reason Tor ends up being blocked by a non-trivial proportion of operators is that Tor is, unfortunately, often misused by threat actors who launch DDoS attacks and perform other types of hostile activity via Tor. See ‘Torshammer’ for one example of a Tor-specific DDoS attack tool.

My advice has always been that if a given operator needs to block Tor exit-node access to a given site/property during a DDoS attack in order to maintain availability, that’s perfectly acceptable; but that blocking Tor access as a normative policy isn’t desirable. Unfortunately, operators get tired of abuse of the Tor network to launch attacks, and end up simply blocking it as a matter of policy.

1. t84 says:
  
  2016-06-07 at 09:38 UTC
  
  > Tor is, unfortunately, often misused by threat actors who launch DDoS attacks and perform other types of hostile activity
  
  A recent study by Akamai indicates that 99% of malicious traffic does NOT originate from the Tor network. But many website operators have been misled by dishonest hosting & CDN providers which are trying to scapegoat Tor in order to profit from bogus claims of security. Just be aware that blocking Tor is not an effective substitute for a good firewall + IDS. You can mitigate DDOS attacks without blocking Tor, and blocking Tor does not prevent attacks.
  
April MacDonald says:

2016-02-25 at 19:35 UTC

There is a growing moment among publishers to block worthless traffic.
That includes, but isn’t limited to; using anonymity tools, blocking JavaScript and blocking adverts.

I think web users should maybe get rid of their sense of entitlement and realize if they don’t provide any benefits to website owners, then why should website owners provide anything to them?

1. Angus Lee says:
  
  2016-02-29 at 15:15 UTC
  
  Or maybe website owners need to realize they don’t necessarily have a right to know their user’s identity
  
2. Echelon says:
  
  2016-03-10 at 16:30 UTC
  
  “There is a growing moment among publishers to block worthless traffic.” A great many people use ToR for reasons including, but by no means limited to, (1) they live under oppressive regimes that attempt to censor news/information and block online services enjoyed in free nations, or (2) they don’t appreciate being the targets of unprovoked (no articulable probable cause or reasonable suspicion) constant government and corporate spying and profiling, nor do they subscribe to the doctrine of “nothing to hide, nothing to fear” set forth by Joseph Goebbels (Hitler’s “Reich Minister of Propaganda” and later “Reich Plenipotentiary for Total War”). The fact that you view traffic generated by such people as “worthless” reveals much about you. Goebbels would have been proud of you.
  
  “That includes, but isn’t limited to; using anonymity tools, blocking JavaScript and blocking adverts.” The “anonymity tools” part of your gibberish is addressed above. JavaScript has been repeatedly proved to open the door to a myriad of security risks; thus, you advocate that people lower their defenses and expose themselves to risks. Blocking advertisements does not generate traffic; blocking advertisements reduces traffic. I, like most people, am sick and tired of having information about products and services relentlessly thrown in my face and crammed down my throat by people/companies seeking to extract money from me and profile me to learn how to “get through” to me. My tolerance for it has been forced into exhaustion. If I want a product or service, I will go looking for it. If you put it in my face when I’m not looking for it and did not ask you about it then I will never buy it from you. Even if I don’t block ads online, I still consciously ignore them, as do many others.
  
  “I think web users should maybe get rid of their sense of entitlement and realize if they don’t provide any benefits to website owners, then why should website owners provide anything to them?” Anything put in the public with no restrictions is public domain. If a web site is publicly accessible with no up-front disclosure of terms of use/service (e.g., you cannot use this web site unless you activate JavaScript and view advertisements), then I have been given entitlement by the web site owner/operator. There are numerous court decisions to this effect. If a web site owner/operator seeks to benefit from my use of a web site, then he/she can require sign-up or registration, and login. Also, it’s further failed logic on your part to imply that a web site owner/operator cannot benefit from my usage unless he/she knows certain personal information about me. Such web site owner/operators must (and eventually will, whether they like it or not) accept the fact that people are becoming increasingly aware of the rampant and abusive practice of collecting peoples’ personal information and then, with no disclosure to the victim, reaping huge profits by indiscriminately sharing that information with numerous third parties all over the world, which practice, in many cases, subjects innocent people to identity theft and personal/physical risk. As this awareness increases, more and more people will begin resisting giving up their personal information online. From my own personal standpoint, there is not a single web site anywhere on the Internet that is important enough to me to make me willing to give up any personal information without full and complete accounting, disclosure and guarantees as to how my information will be used, who will end up with it, and how it is secured (which accounting, disclosure and guarantees very few web sites offer and can enforce).
  
  It’s really a shame that you weren’t born in the 20’s in Germany. By the age of 23-25 you could have, and most likely would have, been brought into a position of governmental power and made your name in the history books.
  
3. Kakurady says:
  
  2016-03-11 at 13:06 UTC
  
  Some people block ads because of malvertising, that is, advertisements that serve malware.
  
  Because of how ad spots are resold, this can be hard to trace.
  
  I agree that users shouldn’t expect to be entitled to web content. But websites also shouldn’t be surprised by a fleeting audience if they provide a substandard experience, or depend on the browser to side with them against the user. They’re called “User Agent” for a reason.
  
4. JJMac says:
  
  2016-04-24 at 10:55 UTC
  
  So your opinion is that if users are not prepared to allow your site’s third-party advertisers to grab their private data, the user should not use your site? Superficially a fair statement, however, the user has little control over _what_ is being shared. Whether by ignorance on the user’s end or lack of transparency on the advertiser’s end, the fact is undeniable and a severe problem made worse by the lack of _apparent_ choice given to the user. If advertisers had not tried to track and content-target users, instead opting to simply keep content relevant to the site, this problem would not exist.
  As to your comment about JavaScript, you clearly have no idea of the potential threat that JavaScript poses to even an experienced user. Ironically, JavaScript can easily become a vector, turning the users’ machines into the “bots” that later generate much of the “undesired traffic” that creates problems for site owners. The great circle of life.
  A final note on JavaScript is that, aside from security issues, there are ethics to consider. Some people who do not wish to run non-free (non-libre) software are not given the option not to execute those JavaScript objects at page load. See Richard Stallman’s paper on the JavaScript trap.
  
  The long and short: give users the choice before executing code or collecting data, and be absolutely transparent about it. Failing that, do not gripe about anonymity tools that defeat advertising revenue or security plugins that prevent unwanted/malicious code execution.
  
5. t84 says:
  
  2016-06-07 at 10:01 UTC
  
  > There is a growing moment among publishers to block worthless traffic.
  
  1. If you expect to succeed in the publishing business, you had better know how to spell words like “movement.”
  
  2. A recent study by Akamai revealed that the conversion rate of traffic into sales is the same on Tor as everywhere else. Surprise, surprise: people who value their privacy still make online purchases!
  
  3. For hundreds of years, newspapers have been financed by advertising, WITHOUT tracking and profiling readers. There is now a growing movement among internet publishers to emulate the traditional print media. But many others will only try a proven solution as a last resort. Always remember, the internet is an EXTENSION of the brain — not a SUBSTITUTE for it.
  
  > I think web users should maybe get rid of their sense of entitlement
  
  The internet was designed with public funds, for non-commercial purposes. There is nothing wrong with conducting business online — but there are none so entitled as the rich buffoon who thinks that everyone else should be forced to adapt to his dysfunctional business model. So please bear in mind,
  
  IT IS NOT THE PUBLIC’S FAULT IF YOU NEGLECTED TO LEARN HOW THE INTERNET WORKS BEFORE INVESTING YOUR MONEY IN AN ONLINE PUBLISHING BUSINESS.
  
  Corrupt governments and ISP’s are now recording everything we read or write online for the purpose of political profiling and industrial espionage. Do not expect people to switch off their security just to satisfy some entitled publisher who thinks the world owes her a living. If people truly appreciate what the publisher has to offer, they will support the sponsors. If you want CREDIT for that, you must implement a feedback mechanism which does not require the reader to disable their security on a GLOBAL basis just to satisfy YOUR personal profit goals.
  
  > if they don’t provide any benefits to website owners, then why should website owners provide anything to them?
  
  The internet was founded on the concept of freely sharing information. We are now developing distributed publishing systems that will replace the centralised server/hosting model, so we can bypass investors who think the internet was designed for their benefit. If your goal is to get rich and your methods are not working, you should get out of the online publishing business.
  ___
  
  References
  
  https://falkvinge.net/2015/05/05/a-year-ago-the-european-supreme-court-appears-to-have-ruled-the-whole-web-is-in-the-public-domain-and-nobody-noticed/
  
  https://p2pfoundation.net/Category:P2P_Infrastructure
  https://gist.github.com/moshest/aea88f152fac89e1c526
  https://www.gnunet.org/links/
  http://ipfs.io/
  http://p2peducation.pbworks.com
  http://www.swirl-project.org/
  http://commonstransition.org/
  
Ross Anderson says:

2016-02-25 at 20:59 UTC

There is some nice coverage of this research in The Register.

Elliot Ness says:

2016-03-01 at 22:05 UTC

Why would anyone want to leave a comment with their email address, giving it away to a non-Tor site? I understand the concept of having multiple anonymous email accounts but they aren’t something I wish to have an army of. But giving you a website address?
No way.

Kakurady says:

2016-03-11 at 13:19 UTC

Some websites block visitors from China, as policy or as automated abuse-based blocking.

One website basically said “I’ll unblock China if you can pressure the government to change their policy of Internet censorship”. (Personally, I think that sort of action is rather unlikely to have any effect.)

Also, a side effect of China’s blockage of Google is that any website using Google’s CDNs to serve common resources, such as the jQuery library or web fonts, no longer functions or loads very slowly (as it waits for resources to time out). In addition, any website that uses ReCaptcha, a Google service, is also affected. This includes any website served by CloudFlare, which requires suspicious users to solve ReCaptcha.

Even though the Chinese government is not targeting these websites for blocking, and the websites are not blocking Chinese users as a policy, Chinese users are not able to access these websites without circumvention tools.

1. Kakurady says:
  
  2016-03-11 at 21:13 UTC
  
  Of course, if they use an anonymity network as their circumvention tool, they’re still going to be treated differently.
  
DoSwatcher says:

2016-03-23 at 02:44 UTC

Any DoS attempt using Tor is also an attack against Tor. That’s partially related to why Tor is so slow. Also, the number of abusive requests from exits is overestimated due to misunderstanding of the nature of proxies in general. Traffic from proxies sometimes “spikes” even without any real attacks.
