The Workshop on Privacy in the Electronic Society takes place at the beginning of November. We (George Danezis, Marek Kumpost, Vashek Matyas, and I) will present there the results of A Study on the Value of Location Privacy, which we conducted half a year ago.
We questioned a sample of over 1200 people from five EU countries, and used tools from experimental psychology and economics to extract from them the value they attach to their location data. We compare this value across national groups, gender and technical awareness, but also the perceived difference between academic use and commercial exploitation. We provide some analysis of the self-selection bias of such a study, and look further at the valuation of location data over time using data from another experiment.
The countries we gathered the data from were Germany, Belgium, Greece, the Czech Republic, and the Slovak Republic. As some of the countries have local currencies, we have re-calculated the values of bids in different countries by using a “value of money” coefficient computed as a ratio of average salaries and price levels in particular countries — this data was taken from Eurostat statistics.
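To make the re-calculation concrete, here is a minimal sketch of such a coefficient. It assumes that purchasing power is the average salary divided by the price level, and that bids are scaled relative to one reference country. The function names, the choice of reference, and all numbers are illustrative assumptions, not the Eurostat figures used in the study.

```python
def value_of_money(avg_salary, price_level, ref_salary, ref_price_level):
    """Purchasing power of a country relative to a reference country.

    Assumed form: (salary / price level) of the country divided by the
    same quantity for the reference country.
    """
    return (avg_salary / price_level) / (ref_salary / ref_price_level)


def normalise_bid(bid_eur, coefficient):
    """Express a bid in reference-country terms by dividing by the coefficient."""
    return bid_eur / coefficient


# Hypothetical inputs, NOT Eurostat data: monthly salary in EUR, price level index.
coef = value_of_money(avg_salary=800.0, price_level=60.0,
                      ref_salary=2400.0, ref_price_level=100.0)
adjusted = normalise_bid(20.0, coef)
print(round(coef, 3), round(adjusted, 2))
```

With a coefficient below 1, a bid from the lower-income country counts for more after normalisation, which is the intended effect of comparing bids by what the money is worth locally.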
We gathered bids for three auctions, or scenarios. The first and second bids were for one month of tracking: in the first scenario the data were to be used for academic purposes only, and in the second for commercial purposes. The third bids were for a scenario in which participants agreed to a year of tracking, with the data free for commercial exploitation. Let us start with the first bids.
Differences among Countries
The distributions of the first bids are shown in the following plot. Although there are differences among all nations, the Greek bids went beyond our expectations.
Distributions of bids in the first auction round.
Distributions of bids in the third auction round.
Note: We use box plots with outliers. You can find a description of box plots in many places, e.g. on Wikipedia.
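For readers who prefer code to a description, the sketch below computes the numbers a box plot displays: the three quartiles and the points drawn as outliers. It assumes the common convention that anything farther than 1.5 times the interquartile range from the box is an outlier; the convention used for our plots may differ, and the sample bids are made up.

```python
import statistics


def box_plot_summary(bids):
    """Quartiles and outliers under the 1.5 * IQR convention (an assumption)."""
    q1, median, q3 = statistics.quantiles(bids, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [b for b in bids if b < low_fence or b > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical bids in EUR, not data from the study.
sample_bids = [5, 10, 10, 20, 20, 30, 40, 50, 400]
summary = box_plot_summary(sample_bids)
print(summary)
```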
The second figure shows the same division of participants, but the values of bids come from the third auction. The Czechs again bid lowest, while the median of the Greek bids went through the roof. The problem with the Greek results, however, is the low number of respondents. We believe there is still some significance in the results because of the immense difference from bids in other countries (the probability of this happening by chance is low even for such a small sample set), but this needs to be confirmed with more representative data.
Men and Women
Our forms had few questions, but they still allow for quite interesting analysis, especially of the first scenario bids. One of the questions we were particularly interested in was whether the sexes differ. When participants were offered the possibility to increase the bid for a year-long study, men behaved differently from women. Medians of the second bids are in the ratio 1.4:1, and medians of the third bids 1.8:1, with women bidding higher (the numbers of women and men stayed at a roughly constant ratio). These results indicate that women are possibly more sensitive to what the collected data may be used for.
While we did not expect substantially different behaviour between men and women, we definitely expected it with respect to the frequency of movements: we anticipated a correlation between the bid values and the frequency of movements. In terms of quartiles, however, there is no difference beyond possible statistical error. Contrary to general expectations, we found no correlation between the bids and the frequency of movements. Even though there might have been some misunderstanding of this question, it seems to us that the participants did not perceive their unusual movements as more sensitive than their everyday behaviour.
Impact of Scenarios
Another interesting view of the first bids comes from relating them to the behaviour of bidders in the second and third auctions. The next figure shows the differences in the first bids when the sample set is divided according to whether the participants kept bidding in the second and third rounds.
First bids according to how many rounds the participants took part in.
Second bids, split by whether participants declined to take part in the third auction.
We believe that the results come from a combination of initial curiosity and privacy cautiousness. Although the differences are not conclusive, they are similar for all three quartiles, so we may entertain a bit of speculation here. The bidders who really considered the difference between the scenarios bid higher and did not accept the conditions of the year-long study; the raw data confirm this, as the median of these more “privacy aware” bidders is about 20% higher. There may also be a reason why the first bids of those who kept bidding until the last auction are lower than those of the former group. The people who placed only the first bid did so merely out of curiosity and simply did not want their data to be used for commercial purposes (the median of participants’ bids does not change with the decision in the second auction round).
We have already presented one plot related to the third bid. It is, however, very interesting how the bids changed between the auctions. The graphs in the next figure show the bid distributions of people who entered values for all three possible uses of data; they depict the same data in two different forms. In the left-hand graph, the x-axis shows the value of bids in € and the y-axis shows the fraction of bidders who entered bids lower than or equal to a given amount.
Distributions of bids in auction rounds.
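The left-hand graph is an empirical cumulative distribution: for each amount, the fraction of bidders whose bid was lower than or equal to it. A minimal sketch of how such a curve can be computed follows; the function names and the sample bids are illustrative assumptions, not our plotting code or data.

```python
def fraction_at_or_below(bids, amount):
    """Fraction of bidders whose bid is lower than or equal to `amount`."""
    return sum(1 for b in bids if b <= amount) / len(bids)


def empirical_cdf(bids):
    """(amount, fraction) pairs for every distinct bid, in increasing order."""
    return [(x, fraction_at_or_below(bids, x)) for x in sorted(set(bids))]


# Hypothetical bids in EUR, not data from the study.
sample = [10, 20, 20, 50, 100]
cdf = empirical_cdf(sample)
print(cdf)
```

The median can be read off such a curve as the amount at which the fraction first reaches one half.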
One can see that the median of the bids increased about twofold when the data was to be used not only for academic purposes but also for commercial purposes. The extension of the study period from one month to a year yielded another twofold increase in the median bids. This is a clear indication that participants were more sensitive to the purpose of the data collection than to the duration and quantity of data collected (the period was increased from one to twelve months). Interestingly, there are huge differences among countries in the sensitivity to the time extension: medians of the bids increased by 20% for Czechs, by 250% for Belgians, and fivefold for Germans and Slovaks.
Exploring the Non-linearity in Time
As explained in the previous section, participants were presented with an option to take part in a commercial study. They were asked to provide valuations for studies lasting one month and twelve months. Perhaps surprisingly, the valuations were not linear in time: the amount sought in the second case was not twelve times the amount in the first case. Instead we only observed a modest increase by a factor of two on average (five in two countries).
We see two possible explanations for this phenomenon. Firstly, we could explain it through hyperbolic discounting: people tend to value most the things that are close in time, and often irrationally undervalue options that lie farther in the future. Studies of this phenomenon in the context of privacy have already been presented. Yet the non-linearity is substantial, and only an extreme form of hyperbolic discounting behaviour could explain it.
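To get a feel for how extreme the discounting would have to be, one can use the standard hyperbolic form, in which a cost incurred d months ahead is weighted by 1/(1 + k·d). The sketch below assumes that every month of tracking carries the same undiscounted cost and searches for the discount rate k at which twelve months are worth a given multiple of the first month. This per-month model and all names are illustrative assumptions, not something measured in the study.

```python
def discounted_total(months, k):
    """Total weight of `months` equal monthly costs; month 0 is undiscounted."""
    return sum(1.0 / (1.0 + k * d) for d in range(months))


def solve_k(target_ratio, months=12, lo=0.0, hi=100.0, tol=1e-9):
    """Bisection for k: discounted_total decreases as k grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if discounted_total(months, mid) > target_ratio:
            lo = mid  # not discounted enough yet, k must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


k_for_twofold = solve_k(2.0)   # twelve months valued at twice one month
k_for_fivefold = solve_k(5.0)  # twelve months valued at five times one month
print(k_for_twofold, k_for_fivefold)
```

Under these assumptions the twofold case needs a k between 2 and 3 per month, so the second month would already be weighted at no more than a third of the first.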
Correlation between time spent in the same cell for two consecutive months.
A second hypothesis is that the data extracted in the months after the first one is indeed of less value to the participant. We chose to test this hypothesis using data from a real mobile phone usage study performed at MIT in the context of the Reality Mining project. In this study, about a hundred participants were provided with mobile phones that recorded all their interactions with the phone and the mobile cell they were in. This is exactly the type of data that one would expect to extract from our fictitious cover study.
Our thesis is that data after the first month is of less value to the participant, and also to third parties collecting it. The only reason we can see for this is that additional data provides very little new information to an observer about a participant’s location, and therefore little additional privacy infringement for the observed party. To test this, we analysed the Reality Mining data from two arbitrary months (months 9 and 10). We extract and plot in the last figure the amount of time spent by each person in a particular cell in both months. The cells are aggregated by time spent in them, and the intensity of the graph indicates the time spent in the cell in the second month.
We found that the correlation between the location of users in one month and the next is striking. When a participant spends more than an hour in a cell in month n, they are very likely to spend a similar amount of time in the same cell in month n+1. Someone who spends more than one day in a cell is almost certain to spend a similar amount of time in the same cell the month after. These locations are likely to correspond to the accommodation used on a daily basis, and the place of work or study of the participants. These are indeed not likely to change for most people from one month to another. On the other hand, there is only a very weak correlation for cells in which the user spends less than one hour per month.
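The kind of comparison described above can be sketched for a single participant as follows. Note that this uses a plain Pearson correlation over hours per cell, which is a simpler stand-in for the aggregated plot in the figure, and that the cell names and hours are hypothetical rather than taken from the Reality Mining data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def monthly_correlation(month_a, month_b):
    """Correlate hours per cell across two months; unseen cells count as 0."""
    cells = sorted(set(month_a) | set(month_b))
    xs = [month_a.get(c, 0.0) for c in cells]
    ys = [month_b.get(c, 0.0) for c in cells]
    return pearson(xs, ys)


# Hypothetical hours per cell for one participant, NOT Reality Mining data.
month_9 = {"home": 400.0, "work": 160.0, "cafe": 0.5, "station": 0.2}
month_10 = {"home": 390.0, "work": 170.0, "gym": 0.7, "station": 0.1}
r = monthly_correlation(month_9, month_10)
print(r)
```

In this toy example the coefficient is dominated by the home and work cells, mirroring the observation that long stays repeat from month to month while short visits do not.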
The analysis of this data seems to provide evidence in support of the second hypothesis: participants are not irrational in placing a lower value on the months of surveillance that follow the first one. An observer gets a lot of information at the start of the observation period, such as the participant's usual movement pattern. Subsequent months add very little information, and can therefore be seen as less valuable from the point of view of both the observer and the person observed.
The highlight of the study could have been the evidence of Greek sensitivity to possible privacy breaches, had a larger sample set been collected. Still, the findings are worth mentioning and they deserve a follow-up study to be confirmed. The reason why we see so much importance in these results is the possible impact of an eavesdropping scandal. Top Greek politicians were wiretapped for a period of eleven months during and after the 2004 Olympic Games, as the Greek government confirmed at the beginning of February 2006, just two months before this study took place.
Our basic results confirm the Cambridge study that inspired us in the overall value of bids: for non-commercial use of data, the medians of bids are £20 in the Cambridge study and €43 (i.e. about £28 at the August 2006 exchange rates) in ours.