Sampled Traffic Analysis by Internet-Exchange-Level Adversaries

Users of the Tor anonymous communication system are at risk of being tracked by an adversary who can monitor both the traffic entering and the traffic leaving the network. This weakness is well known to the designers, and there is currently no known practical way to resist such attacks while maintaining the low latency demanded by applications such as web browsing. For this reason, it seems intuitively clear that when selecting a path through the Tor network, it would be beneficial to select nodes in different countries. Hopefully, government-level adversaries will find it problematic to track cross-border connections, as mutual legal assistance is slow, if it works at all. Non-government adversaries might also find that their influence drops off at national boundaries.
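As a rough illustration of the intuitive rule above, here is a minimal Python sketch of country-diverse path selection. It is not Tor's path-selection algorithm: the `Relay` type, relay names and country codes are hypothetical, country data is assumed to come from some IP-geolocation source, and real Tor also weights relays by bandwidth and applies other constraints.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    nickname: str  # hypothetical relay name
    country: str   # country code, assumed to come from an IP-geolocation source


def pick_country_diverse_path(relays, length=3, rng=random):
    """Choose `length` relays, in random order, such that no two share a country.

    Simplification: real Tor path selection also weights relays by bandwidth
    and applies other constraints; this only shows the country-diversity rule.
    """
    candidates = list(relays)
    rng.shuffle(candidates)
    path, used = [], set()
    for relay in candidates:
        if relay.country in used:
            continue  # skip relays in a country already on the path
        path.append(relay)
        used.add(relay.country)
        if len(path) == length:
            return path
    raise ValueError("not enough distinct countries for the requested path length")


relays = [
    Relay("alpha", "GB"), Relay("bravo", "GB"), Relay("charlie", "DE"),
    Relay("delta", "NL"), Relay("echo", "US"), Relay("foxtrot", "DE"),
]
path = pick_country_diverse_path(relays, rng=random.Random(7))
print([(r.nickname, r.country) for r in path])
```

The greedy scan over a shuffled list simply skips any relay whose country is already on the path. Whether such a rule actually improves security is exactly what the rest of this post questions.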

Implementing secure IP-based geolocation is hard, but even if it were possible, the technique might not help and could perhaps even harm security. The PET Award-nominated paper, “Location Diversity in Anonymity Networks”, by Nick Feamster and Roger Dingledine, showed that international Internet connections cross a comparatively small number of tier-1 ISPs. Thus, by forcing one or more of these companies to co-operate, an adversary could trace a large proportion of connections through an anonymity network.

The results of Feamster and Dingledine’s paper suggest that it may be better to bounce anonymity traffic around within a country, because it is less likely that a single ISP will be monitoring incoming and outgoing traffic at several nodes. However, this only appears to be the case because they used BGP data to build a map of Autonomous Systems (ASes), which roughly correspond to ISPs. In fact, inter-ISP traffic (especially in Europe) may travel through an Internet eXchange (IX), a fact not apparent from BGP data. Our paper, “Sampled Traffic Analysis by Internet-Exchange-Level Adversaries”, by Steven J. Murdoch and Piotr Zieliński, examines the consequences of this observation.
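The effect of IXes being invisible in BGP data can be shown with a toy calculation. The Python sketch below uses entirely hypothetical AS names, paths and IX peerings (`IX-1` merely stands in for an exchange such as LINX, and none of the numbers come from the paper). For each AS or IX it counts the fraction of paths that it sits on, first for the AS-level paths as BGP would show them, and then after inserting an IX hop wherever two adjacent ASes peer across an exchange.

```python
from collections import Counter

# Hypothetical AS-level paths between local Tor nodes and the rest of the network,
# as they would appear in BGP data.
AS_PATHS = [
    ["AS-A", "AS-L3", "AS-B"],
    ["AS-A", "AS-C"],
    ["AS-D", "AS-E"],
    ["AS-D", "AS-L3", "AS-F"],
    ["AS-G", "AS-H"],
]

# Hypothetical: AS adjacencies that are actually realised by peering across an IX.
IX_LINKS = {
    frozenset({"AS-A", "AS-C"}): "IX-1",
    frozenset({"AS-D", "AS-E"}): "IX-1",
    frozenset({"AS-G", "AS-H"}): "IX-1",
}


def expand_with_ixes(path, ix_links):
    """Insert an IX hop between adjacent ASes whose link is an IX peering."""
    hops = [path[0]]
    for a, b in zip(path, path[1:]):
        ix = ix_links.get(frozenset({a, b}))
        if ix is not None:
            hops.append(ix)
        hops.append(b)
    return hops


def coverage(paths):
    """Fraction of paths on which each hop (AS or IX) appears at least once."""
    counts = Counter()
    for p in paths:
        counts.update(set(p))  # count each hop once per path
    return {hop: n / len(paths) for hop, n in counts.items()}


bgp_view = coverage(AS_PATHS)
ix_view = coverage([expand_with_ixes(p, IX_LINKS) for p in AS_PATHS])
print(bgp_view["AS-L3"])  # 0.4
print(ix_view["IX-1"])    # 0.6
```

In the AS-only view no single observer sees more than 40% of these toy paths, but once the IX hops are made explicit, `IX-1` sees 60% of them, even though it never appears in the BGP-derived paths.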

We discuss what happens when IXes are considered in location diversity models and show that, at least in the UK, IXes are better locations for launching traffic analysis attacks than any individual ISP. What is more, some IXes (including LINX and AMS-IX) even record the header of around one in every few thousand packets, through the sFlow capabilities of their switches. Hence, an attacker would not have to install monitoring equipment of their own; they would just need access to data that is already being collected. Our paper also shows that traffic analysis attacks on sampled data still work, even at such a low sampling rate.
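Why correlation can survive such sparse sampling can be illustrated with a toy model. This Python sketch is not the method used in the paper; it assumes that each packet is sampled independently with probability 1/1000 (a simple model of random 1-in-N packet sampling), that flows are bursty on/off sources sending 5,000 packets in each active time bin, and that entry-side and exit-side bins are perfectly aligned, with no latency and no cross traffic. All parameter values are hypothetical.

```python
import math
import random

SAMPLING_RATE = 1 / 1000  # assumed: each packet sampled independently with p = 1/1000
BURST = 5000              # assumed: packets sent in an "on" time bin
BINS = 100                # assumed: number of aligned time bins observed


def on_off_flow(rng, bins=BINS, burst=BURST):
    """Per-bin packet counts of a bursty flow: each bin is either idle or a full burst."""
    return [burst if rng.random() < 0.5 else 0 for _ in range(bins)]


def sample_flow(counts, rng, rate=SAMPLING_RATE):
    """Keep each packet independently with probability `rate`; return sampled counts per bin."""
    return [sum(1 for _ in range(c) if rng.random() < rate) for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_match(entry_samples, exit_samples):
    """Index of the exit-side flow whose sampled counts best correlate with the entry side."""
    scores = [pearson(entry_samples, s) for s in exit_samples]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


rng = random.Random(2007)
flows = [on_off_flow(rng) for _ in range(5)]  # five concurrent flows
target = 2                                    # the flow the attacker wants to trace
entry = sample_flow(flows[target], rng)       # sampled near the entry node
exits = [sample_flow(f, rng) for f in flows]  # independently sampled near the exits
guess, scores = best_match(entry, exits)
print(guess, [round(s, 2) for s in scores])
```

With these parameters an active bin yields only about five sampled packets, yet in this model the target flow's correlation is expected to stay well above those of the unrelated flows. Real traffic adds latency, cross traffic and less regular bursts, so this only shows why sparse sampling need not destroy the signal, not how well a real attack performs.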

This figure shows the top 10 ASes between the UK Tor nodes in our sample and the rest of the Tor network. Connections going through LINX are shown in red, AMS-IX in blue and DE-CIX (the German IX) in green. While 22% of connections pass through the tier-1 ISP Level 3, even more (27%) pass through LINX. The full details can be found in Section 3.1 of our paper.

AS connectivity via IX graph

So what does this mean for Tor users? Right now there is no particular need to worry: this paper introduces a new class of adversary and reduces the cost estimate of the attack, but end-to-end traffic analysis is fundamentally not new. There remains much work to be done before implementation of defences can begin, such as verifying the hypothesis on a larger scale and establishing how to perform secure traceroute-based network mapping on Tor. I think this paper shows that this is a promising area of research, and I hope it will spur further development.

Our paper will be presented at the 7th Workshop on Privacy Enhancing Technologies (PET 2007) held in Ottawa, Canada, June 20–22 and the final version will be published in Springer LNCS.

7 thoughts on “Sampled Traffic Analysis by Internet-Exchange-Level Adversaries”

  1. Isn’t what we need something like – splitting the connection across multiple nodes, with each user also being a router node? It could be faster, and more secure. You could call it “TorRent” 😉

  2. Alex,

    No, it would not be foolproof, as the adversary (being effectively omnipotent 😉) could still cross-correlate against time, and they can always timestamp-enumerate the TOR network and then build a latency map.

    The only solution to the problem is to completely decouple time- and size-related information, not just between the input and output of the network but between nodes as well, which, although it can be done, has issues such as latency.

    Oh, and you also need a reasonable level of realistic/dummy traffic into, out of and through the TOR network for the real messages to be able to hide in.

  3. @Alex,

    Sorry, my earlier reply was a bit brief, so:

    Let us assume a simple model of the system in terms of the points the adversary (hostile organisation) can monitor but not read, as the traffic is encrypted:

    1, User TOR
    2, TOR Destination
    3, TOR TOR

    The user wants to communicate with the destination, and the TOR network is used to (hopefully) make the communication anonymous.

    If you assume that the hostile organisation can see two or more of those points, then, due to the low latency of the TOR network, they can determine whether the traffic on those paths is correlated.

    In TEMPEST work, the first couple of things you get taught are Bandwidth and Energy. Both have to be present in sufficient quantities for communication to take place; if either is insufficient, communication cannot take place.

    Once you understand that, they start teaching you about cross-modulation, where one communications signal carries another on top of it (an inadvertent or covert channel).

    All of these have their analogues in the network domain: Energy can be related to the number of packets sent, Bandwidth to channel capacity, and cross-modulation to several things, one of which is timing jitter on packets.

    Let us assume that the hostile organisation can control the network between the user and the TOR (User TOR). They can then put timing delays on the network packets to modulate the data stream (think of spread-spectrum communications or digital watermarking).

    If they suspect you are talking to a particular destination and they can monitor that link (TOR Destination), then your traffic can easily be detected by cross-correlating to find the effective digital watermark they superimposed onto your network packet timing.

    Worse, they can do the opposite, which is to modulate all traffic from the destination into the TOR. If they use the equivalent of the JPL ranging codes, then they can find everyone using the destination site simply by monitoring all the TOR outputs. Unfortunately, this attack would work simultaneously for all data streams on the link, which is just plain nasty.

    This rather nasty attack can be stopped fairly easily. In TEMPEST they tell you repeatedly to “clock not just your inputs but your outputs”; this prevents most timing attacks by removing the cross-modulation at the input or the output.

    So if all the TOR nodes were time-synchronised and only sent data out at agreed time intervals, the hostile organisation would find it difficult (but not impossible) to make this kind of attack.

    You could then think of the TOR network as a pipelined CPU: the latency increases, but so does the throughput. Alternatively, think of it as a digital delay line for the covert channel: the more delays, the lower the bandwidth available to the channel.

    The downside is obviously the fine dividing line between latency and the size of the message transmitted. At some point a large enough message will allow the hostile organisation to watermark it at a given latency. So the bigger the message, the greater the latency required, which is a double whammy…

    Another technique is for the TOR to use a low latency between nodes but randomly delay a network packet for n periods. Let us say it does this randomly with one in ten packets, and the TOR depth is a minimum of ten nodes; the resulting spread of latencies is then going to prove somewhat unhelpful to the hostile organisation. However, it will average out for the user, which gives another area for trade-off.

    However, that does not help if the amount of traffic on the TOR is low: no matter how much timing jiggling the TOR network does, if yours is the only traffic through it, then the hostile organisation has an easy time finding both ends of the communication path.

    This is where fixed-rate communication is usually used, with dummy traffic insertion as the usual method of ensuring the fixed rate. If all nodes send exactly ten packets in a given time period, then the hostile organisation will find it difficult to tell the dummy traffic from the real traffic. Obviously the current web browser clients would not like this, but it would be quite easy to arrange with Ajax-type techniques (one for the programmers to get their teeth into).

    Obviously, dummy traffic takes up quite a bit of bandwidth, which would appear to be a waste, especially on the paths within the TOR network (TOR TOR). But does it need to be dummy traffic?

    No. If the TOR network were used to carry two types of traffic, one for which latency was important (web) and one for which it was not (email), then the dummy traffic could be replaced with the traffic that did not require low latency, and dummy traffic would only be inserted when that was scarce.

    This then leaves another problem, which is demand. What happens when demand exceeds a certain threshold, so that the web traffic becomes sufficiently dominant that the cross-modulation attack becomes possible again at the given packet rate? Well, the simple solution is to make the packet rate variable in fixed-size steps and have the whole TOR network switch up and down as appropriate.

    This should give you a flavour of what is possible for both the TOR network to help anonymous communications and what the hostile organisation has to do to fingerprint traffic through the TOR network.

    Clive’s suggestion of synchronised data sending is essentially Wei Dai’s PipeNet proposal, which dates back to 1995.

    As to dummy traffic … well the anonymity literature is full of hand-waving proposals for such traffic; and attacks on the schemes that are fleshed out enough to permit others to study them. The usual rule of traffic analysis “It’s hard to make one thing look like another” continues to apply in spades!

  5. Richard,

    I agree that all the suggestions on their own will have deficiencies, and I certainly do not believe there is or will be a “one shot silver bullet solution”.

    As you say, the usual rule of traffic analysis applies… and you will note I qualified dummy traffic with “realistic”.

    All the open literature proposals I have looked at in the past have made the mistake of trying to optimise in some way to preserve performance, which experience has shown will always leave a hook or two (the classic example is secure crypto algorithms such as AES and RSA that, when optimised for performance, leak key information through timing differences due to cache hits etc.).

    Essentially, the problem is that “if” you allow the traffic to look “different” in some respect for the sake of “efficiency” or “low latency” or a host of other reasons, then you leave it open to attack.

    However, in military networks the usual premise is that dummy traffic always looks the same from all sides. That is, it is always (link) encrypted, packets are always the same size, and packet times and rates are fixed. This does not give the hostile organisation a hook to hang their hat on and get down to work (or at least that is the theory 8).

    Therefore, if a proposal only fixes one aspect of the traffic differentiation, then it obviously will not be effective.

    The trade-off is like being a passenger on a motorbike or on a train. The motorbike gives freedom in time and place, high rates of acceleration, good fuel economy etc., but how safe are you compared to a modern train on a well-organised and well-run network (France, Germany etc.)?

    If all the individual suggestions I gave (and one or two others) are combined in a thoughtful manner, then, although not perfect, the system will go a long way towards solving the “known” problems.

    As you then say, you build and test a real-world system and revise as required until you remove the hooks that you and others can currently find. Then you allow time for people to attack the solution, which usually moves the body of knowledge on as new methods of attack become apparent (FEAL being an example).

    The current body of open knowledge on traffic analysis can be compared to the state of the open knowledge of cryptography ten years after DES came into use.

    If people are negative about traffic analysis prevention techniques then the body of open knowledge will not move forward because nobody will bother (think number factoring prior to RSA).

    However, apart from moving the “body of knowledge” forward, the real question is: are “real users” going to accept the limitations in efficiency etc. to gain the increase in security that this method gives?

    I suspect that the razor about security-v-usability also applies to security-v-performance, i.e. you can have security or performance but not both. This would support the argument that “security always has costs”, and the question is “are people prepared to pay the cost”?

    However, you will also note that I very specifically avoided the hostile entity being a part of the network, as this would certainly allow any and all traffic analysis protection to be stripped away using one of many, many techniques.

  6. There is a paper from Washington Uni presented at USENIX07 on how consumer devices leak information about you.

    One such device they highlight is the Sling Media Slingbox Pro, which streams video across a home network.

    They showed how simple traffic analysis can be used to work out which movie is being streamed, even when encryption is enabled, because of the use of “more efficient” transmission methods (variable bitrate encoding in this instance).
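The constant-rate scheme proposed in the third comment (clocked outputs, a fixed number of cells per tick, spare slots filled first with latency-tolerant traffic and only then with dummies, and rate changes in fixed steps) can be sketched in a few lines of Python. This illustrates that proposal under stated assumptions and is not something Tor does: the queue names, the `DUMMY` marker and the `RATE_STEPS` values are hypothetical, all cells are assumed to be fixed-size and encrypted so that dummies are indistinguishable on the wire, and the rate step is chosen here from a local backlog for brevity, whereas the proposal has the whole network switch together.

```python
from collections import deque

DUMMY = "DUMMY"                # hypothetical placeholder for a padding cell
RATE_STEPS = (10, 20, 40, 80)  # hypothetical network-wide cells-per-tick levels


def choose_rate(interactive_backlog, steps=RATE_STEPS):
    """Smallest fixed step that carries the latency-sensitive backlog (capped at the top step)."""
    for step in steps:
        if interactive_backlog <= step:
            return step
    return steps[-1]


def schedule_tick(interactive, bulk, cells_per_tick):
    """Emit exactly `cells_per_tick` cells on one clock tick.

    Latency-sensitive cells (e.g. web) go first, latency-tolerant cells
    (e.g. email) fill the remaining slots, and dummy cells pad the rest.
    """
    out = []
    while interactive and len(out) < cells_per_tick:
        out.append(interactive.popleft())
    while bulk and len(out) < cells_per_tick:
        out.append(bulk.popleft())
    out.extend([DUMMY] * (cells_per_tick - len(out)))
    return out


web = deque(f"w{i}" for i in range(1, 4))    # 3 interactive cells queued
mail = deque(f"m{i}" for i in range(1, 11))  # 10 latency-tolerant cells queued
rate = choose_rate(len(web))                 # smallest step >= 3, i.e. 10
tick1 = schedule_tick(web, mail, rate)
tick2 = schedule_tick(web, mail, rate)
print(tick1)  # ['w1', 'w2', 'w3', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7']
print(tick2)  # ['m8', 'm9', 'm10', 'DUMMY', 'DUMMY', 'DUMMY', 'DUMMY', 'DUMMY', 'DUMMY', 'DUMMY']
```

Every tick emits the same number of cells regardless of how much real traffic is queued, which is what removes the volume and timing signal; the price is the bandwidth spent on padding and the latency added by waiting for the next tick.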
