Currently the performance of the Tor anonymity network is quite poor. This problem is frequently stated as a reason for people not using anonymizing proxies, so improving performance is a high priority for their developers. There are only about 1 000 Tor nodes, and many are on slow Internet connections, so in aggregate there is about 1 Gbit/s shared between 100 000 or so users. One way to improve the experience of Tor users is to increase the number of Tor nodes (especially high-bandwidth ones). Some means to achieve this goal are discussed in "Challenges in Deploying Low-Latency Anonymity", but here I want to explore what will happen when Tor’s total bandwidth increases.
If Tor’s bandwidth doubled tomorrow, the naïve hypothesis is that users would experience twice the throughput. Unfortunately this is not true, because it assumes that the number of users does not vary with bandwidth available. In fact, as the supply of the Tor network’s bandwidth increases, there will be a corresponding increase in the demand for bandwidth from Tor users. This fact will apply just as well for other networks, but for the purposes of this post, I’ll use Tor as an example. Simple economics shows that performance of Tor is controlled by how the number of users scales with available bandwidth, which can be represented by a demand curve.
I don’t claim this is a new insight; in fact between me starting this draft and now, Andreas Pfitzmann made a very similar observation while answering a question following the presentation of Performance Comparison of Low-Latency Anonymisation Services from a User Perspective at the PET Symposium. He said, as I recall, that the performance of the anonymity network is the slowest tolerable speed for people who care about their privacy. Despite this, I couldn’t find anyone who had written a succinct description anywhere, perhaps because it is too obvious. Equally, I have heard the naïve version stated occasionally, so I think it’s helpful to publish something people can point at. The rest of this post will discuss the consequences of modelling Tor user behaviour in this way, and the limitations of the technique.
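[Figure: supply and demand graph of long-term throughput per user against number of users, showing three alternative demand curves (A, B and C) and the supply curve before and after a 50% increase in bandwidth. R source code]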
The figure above is the typical supply and demand graph from economics textbooks, except with long-term throughput per user substituted for price and number of users substituted for quantity of goods sold. Also, it is inverted, because users prefer higher throughput, whereas consumers prefer lower prices. Similarly, as the number of users increases, the bandwidth the network supplies to each user falls, whereas suppliers will produce more goods if the price is higher. In drawing the supply curve, I’ve assumed the network’s bandwidth is constant and shared equally over as many users as needed. The shape of the demand curve is much harder to even approximate, but for the sake of discussion, I’ve drawn three alternatives. We will return to these assumptions later. The number of Tor users and the throughput they each get are given by the intersection of the supply and demand curves: the equilibrium. If the number of users is below this point, more users will join and the throughput per user will fall to the lowest tolerable level. Similarly, if the number of users is too high, some will be getting lower throughput than their minimum, so will give up, improving the network for the rest of the users.
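To make the equilibrium concrete, here is a minimal Python sketch that finds the intersection numerically. The supply side follows the equal-sharing assumption stated above; the demand curve (`linear_demand`, 10 000 willing users per kbit/s of throughput) is purely hypothetical and chosen only so that the answer lands near the rough figures quoted at the start of this post. The function names and the bisection bounds are likewise my own illustration.

```python
from typing import Callable


def supply_throughput(bandwidth: float, users: float) -> float:
    """Per-user throughput when a fixed bandwidth is shared equally."""
    return bandwidth / users


def equilibrium_users(
    bandwidth: float,
    demand: Callable[[float], float],
    lo: float = 1.0,
    hi: float = 1e9,
    iterations: int = 200,
) -> float:
    """Find N such that the number of users willing to tolerate
    bandwidth / N is N itself.

    Assumes demand(throughput) never decreases as throughput rises, so
    demand(bandwidth / n) - n crosses zero once between lo and hi.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if demand(supply_throughput(bandwidth, mid)) > mid:
            lo = mid  # below equilibrium: more users would join
        else:
            hi = mid  # above equilibrium: some users would give up
    return (lo + hi) / 2


def linear_demand(throughput_kbps: float) -> float:
    """Hypothetical demand curve: 10 000 users per kbit/s offered."""
    return 10_000.0 * throughput_kbps


bandwidth_kbps = 1_000_000.0  # about 1 Gbit/s in aggregate
users = equilibrium_users(bandwidth_kbps, linear_demand)
throughput = supply_throughput(bandwidth_kbps, users)
print(f"{users:.0f} users at {throughput:.1f} kbit/s each")
```

With these made-up numbers the curves cross at about 100 000 users getting 10 kbit/s each. Any other non-decreasing demand function can be passed in instead; the true curve is unknown.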
Now let’s assume Tor’s bandwidth grows by 50%, so the supply curve shifts, as shown in the figure. By comparing how the equilibrium moves, we can see how the shape of the demand curve affects the performance improvement that Tor users see. If the number of users is independent of performance, shown in curve A, then everyone gets a 50% improvement, which matches the naïve hypothesis. More realistically, the number of users increases, so the performance gain is smaller; the shallower the demand curve, the smaller the performance increase will be. For demand curve B, there is an 18% increase in the number of Tor users and a 27% increase in throughput, whereas with curve C there are 33% more users and so only a 13% increase in throughput for each user.
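The arithmetic behind these figures can be checked with a short Python sketch. Because bandwidth is shared equally, the growth in users multiplied by the growth in per-user throughput must equal the growth in bandwidth. To show how the split depends on the demand curve, I assume here a constant-elasticity curve, N = k × T^e; that functional form and the name `split_gain` are my own assumptions for illustration, not the curves drawn in the figure.

```python
from typing import Tuple


def split_gain(bandwidth_factor: float, elasticity: float) -> Tuple[float, float]:
    """Return (users_factor, throughput_factor) for a bandwidth increase.

    Assumes demand N = k * T**elasticity and equal sharing T = B / N,
    which together give T proportional to B**(1 / (1 + elasticity)).
    """
    throughput_factor = bandwidth_factor ** (1.0 / (1.0 + elasticity))
    users_factor = bandwidth_factor / throughput_factor
    return users_factor, throughput_factor


# Elasticity 0 corresponds to curve A: no new users, full 50% gain.
users_a, throughput_a = split_gain(1.5, 0.0)  # (1.0, 1.5)

# A hypothetical elasticity of 1 splits the gain evenly between
# more users and more throughput per user.
users_even, throughput_even = split_gain(1.5, 1.0)

# The figures quoted for curves B and C multiply back to roughly 1.5,
# up to rounding of the percentages.
product_b = 1.18 * 1.27
product_c = 1.33 * 1.13
```

The larger the elasticity, the more of any new bandwidth is absorbed by new users rather than passed on to existing ones.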
In an extreme case where the demand curve points down (not shown), as the network bandwidth increases, performance for users will fall. Products exhibiting this type of demand curve, such as designer clothes, are known as Veblen goods. As the price increases, their value as status symbols grows, so more people want to buy them. I don’t think it is likely to be the case with Tor, but there could be a few users who might think that the slower the network is, the better it is for anonymity.
To keep the explanation simple, I’ve made quite a few assumptions, some more reasonable than others. For the supply curve, I assume that all Tor’s bandwidth goes into servicing user requests, it is shared fairly between users, there is no overhead when the number of Tor clients grows, and the performance bottleneck is the network, not clients. I don’t think any of these are true, but the difference between the ideal case and reality might not be significant enough to nullify the analysis. The demand curves are basically guesswork; it’s unlikely that the true one is as nicely behaved as the ideal ones shown. It is more likely to be a combination of the different classes, as different user communities become relevant.
I glossed over the aspect of reaching equilibrium — in fact it could take some time between the network bandwidth changing and the user population reaching stability. If this period is sufficiently long and network bandwidth is sufficiently volatile it might never reach equilibrium. I’ve also ignored effects which shift the demand curve. In normal economics, marketing makes people buy a product even though they considered it too expensive. Similarly, a Slashdot article or news of a privacy scandal could make Tor users more tolerant of the poor performance. Finally, the user perception of performance is an interesting and complex topic, which I’ve not covered here. I’ve assumed that performance is equivalent to throughput, but actually latency, packet loss, predictability, and their interaction with TCP/IP congestion control are important components too.
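The point about reaching equilibrium can be illustrated with a toy simulation in Python. It assumes, purely for illustration, that in each time step the user population closes a fixed fraction (`adjust_rate`) of the gap between the number of people willing to use the network at the current throughput and the number actually using it. The adjustment rule, the rate, and the linear demand curve are all hypothetical.

```python
from typing import Callable, Iterable, List


def simulate_population(
    bandwidth_schedule: Iterable[float],
    demand: Callable[[float], float],
    start_users: float,
    adjust_rate: float = 0.1,
) -> List[float]:
    """Track the user count as it drifts towards the demand curve.

    bandwidth_schedule gives the network bandwidth at each time step,
    so a volatile network can be modelled by varying its values.
    """
    users = start_users
    history = []
    for bandwidth in bandwidth_schedule:
        throughput = bandwidth / users   # equal sharing, as assumed above
        willing = demand(throughput)     # users who would tolerate this
        users += adjust_rate * (willing - users)
        history.append(users)
    return history


def linear_demand(throughput_kbps: float) -> float:
    """Hypothetical demand curve: 10 000 users per kbit/s offered."""
    return 10_000.0 * throughput_kbps


# Bandwidth jumps from 1 Gbit/s to 1.5 Gbit/s and then stays constant.
history = simulate_population(
    bandwidth_schedule=[1_500_000.0] * 100,
    demand=linear_demand,
    start_users=100_000.0,
)
```

In this toy model the population climbs gradually from 100 000 towards the new equilibrium of about 122 000 users, so early users briefly enjoy better throughput than they will in the long run. Passing a fluctuating `bandwidth_schedule` shows the other case mentioned above, where the population keeps chasing a moving target.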
In summary, I’ve shown how the relationship between network bandwidth and user-perceived performance is more subtle than it might at first seem. The dominant factor behind Tor’s performance is the number of potential users who are willing to tolerate a certain throughput. Until this relationship is better understood, it remains unclear how much faster Tor will become as the network grows. It would be an interesting research project to establish the shape of the supply and demand curves, through modelling Tor’s scalability and predicting user tolerance. For the latter quantity, it might be more helpful to consider tolerance in terms of latency rather than throughput, which would lead to a non-inverted supply and demand chart. However, the relationship between number of network users and latency is even less clear than that of throughput. Finally, the application of more advanced economic techniques could give more insight than that of the rudimentary approach discussed here.