12 thoughts on “Liveblogging SOUPS 2013”

  1. Robert Biddle kicked off the Workshop on Risk Perception in IT Security and Privacy. He used Crowdflower to assess US residents’ risk perceptions for 30 activities such as using an ATM, having your computer fixed and using a search engine, for risks such as loss and embarrassment. He then did hierarchical clustering to get a dendrogram of risk perception, and also graphed risks by perceived severity and likelihood. Financial risks like ATM and online banking are seen as severe but rare; risks from social networks and adult content as more frequent and less concerning; while using a credit card over the phone has the highest combination of likelihood and impact. Subjects seem unaware of linkages, such as that installing bad software can endanger everything else, including online banking.

    The next talk was by Janice Tsai, a Microsoft privacy manager, on how the company has been improving its just-in-time browser warnings. A few years ago, these were “Too much too late” (the title of her talk); recently they’ve been trying to optimise the flow and the text so as to educate people in time and help them learn.

    Ester Moher is a behavioural economist in the electronic health information lab (EHIL) at the University of Ottawa, interested in how people perceive disclosure risks in a health setting. EHIL develops the PARAT software for anonymising health records, which reduces the ethics process for medical research from 6–12 months to 2 weeks. Question: does privacy assurance induce people to disclose more data? She looked at administrators first, soliciting rates of antibiotic-resistant infection in old age homes. Home operators who responded early disclosed low infection rates, while the late responders had more infections. She concluded that the privacy assurance letters made privacy more salient for guilty home operators. Then she recruited 400 participants from North America and asked a slew of health questions of varying sensitivity about the subjects and their families, giving non-response options too. The manipulations were in the consent form; when subjects were assured that everything was confidential, they were more likely not to disclose, confirming Leslie John’s results. Subjects were also more dishonest when they did disclose; and the study was replicated twice with variants. She was surprised the results were so robust given that people often don’t seem to read consent forms before signing them!

    Larry Koved’s topic was perceived risk in mobile authentication. Mobile devices are now security tokens and mobile wallets up the stakes further; how do people view this? He studied IT workers, mturkers, doctors and IT security experts, asking them their views on doing a range of tasks (with company apps, bank apps, browsers) in a range of environments. A Beijing hotel room, a busy street and an Internet cafe are seen as comparably unsafe; your home, your office and a quiet train are comparably safe. Security experts paid most attention to being directly observed (shoulder-surfing risks), then to man-in-the-middle attacks. IT workers prioritised these too, but were also concerned about theft of their mobiles. Doctors were less concerned with direct observation, but more with account takeover or device theft, both of which raise compliance issues for them. Individuals were not concerned about network-borne attacks. About half the subjects locked their phones; most IT workers did, and most individuals didn’t.

  2. People reason analogically about security, and Jean Camp wants to use their existing mental models to get them to make effective mitigating choices rather than educating them to understand what’s going on. The biggest target group is US English speakers, and the most rapidly growing segment is seniors, who will control ever more assets. Most warnings, however, are not clear, actionable or personalised. Jean gave examples of effective and unusable warnings.

    Two researchers from Leibniz University whose names were not audible created a mockup Android market with four apps in each of four categories (office, finance, weather and games). Some required unreasonable permissions (e.g. a weather app wanting your contacts as well as network and GPS). Half the time, apps gave visual warnings of permission effects. Each subject installed an app from each category under each warning condition. Visual warnings decreased installations of high-requesting apps. However this was just a pilot study with 11 subjects.

  3. Lutfi bin Othman wants better estimates of which attackers have both the capability and the motivation, and surveyed ten experts’ opinions about attacks on a videoconferencing system. He concludes that knowledge of attacker capabilities helps reduce uncertainty. In questions, people argued about whether attacker capabilities could be modelled usefully.

    Mary Ellen Zurko’s position was that different organisational stakeholders, such as operators, administrators, security officers, IT management and executives, all have quite different definitions of risk. Most researchers, however, don’t have much experience of “compliance” as it happens in the corporate world. System designers ought to think carefully about what sort of feedback people in different corporate roles will get when things go wrong; to understand their risk perception you must consider whether what they have (or plan) is in compliance, out of compliance, or not covered by compliance at all.

    Matt Bishop’s argument is that security guidance documents are often defective as they make implicit but unreasonable environmental assumptions. For example, the US federal government has money and experts not available to states and counties; a directive to implement RBAC for elections may seem sensible in Fort Meade, but cuts little ice in a county whose IT department consists of one or two people who don’t even understand the standard. He told of a system for filing real-estate transfers that was claimed to be secure as it had SSL, without any thought as to endpoint security. He wants to gather data on standards failures, and has a particular interest in online and electronic elections.

  4. Marian Harbach is interested in why people won’t use privacy-preserving authentication; his previous work established that many users are a bit fatalistic, feeling overwhelmed by the number of things they have to do to be “secure” and the sense that “they’ll get you one way or another”. In his latest work he looks at what threats people worry about in what context. In general Internet use, these are “malware” followed by “losing personal data”, with everything else much lower. More specific scenarios will follow once he has the data and analysis.

    Alexander Mimig’s topic is security apathy; many writers have commented on users’ willingness to compromise security to get work done, and the general lack of interest in protection. Unwillingness is different from inability, and tools seem to ignore it. Users may have different responsibilities, experience, knowledge and risk perception; Dourish et al wrote on frustration, pragmatism and a sense of futility in 2004 in the context of embedded systems. In questioning, one point was that security apathy is reasonable in a world of pervasive ubiquitous computing; surely it’s up to engineers to make more reasonable assumptions about users’ security maintenance budget.

    Zinaida Benenson has been looking at the differences in security and privacy awareness between Android and iPhone users. By choosing one of these phones you choose a risk communication strategy, whether Apple’s “Don’t worry, be happy” or Google’s expectation that users be technically literate and convinced by rational security arguments. She ran a survey and found that iPhone users, like Android users, claim to consider privacy and permissions when deciding whether to use an app – despite the fact that (to a first approximation) they don’t really know. More work is needed on users’ technical literacy, the efficacy of runtime versus install-time warnings, the optimal strategy for non-savvy users and the ethical aspects of not informing them at all.

    Serge Egelman argues that security usability must be optimised for human behaviour in the aggregate rather than for the “average user”. How can we personalise security warnings (for example) for individual traits and behaviours? He’s using the Ten-Item Personality Inventory (TIPI) and looking for correlations with various standard privacy metrics, and also the Crowne-Marlowe social desirability scale, which may be one of the oldest metrics of willingness to disclose; the Moon verbosity index; and the John-Loewenstein-Acquisti set of invasive questions. What other metrics should they be using in such surveys? We don’t seem to have much in the way of security behaviour metrics.

  5. The SOUPS workshop proper opened with “When it’s better to ask forgiveness than get permission” by Chris Thompson. His goal is to enable users to understand what’s causing what on their mobile phones: too many interactions make warnings likely to be overlooked. The idea is to grant low-risk permissions automatically (things like changing the volume and the wallpaper) but enable the user to easily attribute any unwanted actions. They surveyed 189 users; 73% of them could find the data usage panel to attribute a data overuse. However only 22% were aware that a backgrounded app could still use the Internet. Provenance of changes to settings, and identification of annoyances, need to be properly engineered. They built a notification drawer mechanism, which they tested on 76 Android users from Craigslist. Their goal was to identify which app was causing misbehaviour (unwanted vibration, or Justin Bieber wallpaper), and the new mechanisms significantly helped.
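
    A minimal sketch of the attribution idea (in Python rather than on Android, and not the paper’s implementation): low-risk permissions are granted without prompting, but every change is logged with the app that made it, so the user can later attribute the Justin Bieber wallpaper. The permission names and apps below are made up.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative low-risk permissions that would be granted without prompting;
# the actual set studied in the paper may differ.
AUTO_GRANT = {"SET_WALLPAPER", "CHANGE_VOLUME", "VIBRATE"}

class ProvenanceLog:
    """Record which app changed which setting, so users can attribute an
    unwanted change after the fact instead of being warned up front."""
    def __init__(self):
        self.events = defaultdict(list)          # setting -> [(time, app)]

    def apply(self, app, permission, setting):
        if permission not in AUTO_GRANT:
            raise PermissionError(f"{permission} needs an explicit prompt")
        self.events[setting].append((datetime.now(), app))

    def who_changed(self, setting):
        history = self.events[setting]
        return history[-1][1] if history else None

log = ProvenanceLog()
log.apply("CutePuppies", "SET_WALLPAPER", "wallpaper")
log.apply("NoisyGame", "CHANGE_VOLUME", "volume")
print(log.who_changed("wallpaper"))              # -> 'CutePuppies'
```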

    The second talk was by Mathias Beckerle, joint winner of the best paper award. His goal is to make access control rule sets readable and understandable. Rules should have no gaps, no redundancy, no contradictions and no overfitting; they should be minimal and not deny any authorised access. These goals were formalised, leading to quantifiable measurements based on access decision sets and problem sets. The methodology was validated with an expert user study of IT support staff. The takeaway is that metric-based optimisation can really help usability.
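
    The paper’s formal metrics are richer than this, but a toy sketch conveys the flavour: given a flat list of (subject, resource, decision) rules, contradictions and redundant duplicates can be detected and counted mechanically. The rule format below is an assumption for illustration, not the authors’ formalism.

```python
from itertools import combinations

# Toy rules: (subject, resource, decision); purely illustrative.
rules = [
    ("alice", "db", "allow"),
    ("alice", "db", "deny"),     # contradicts the rule above
    ("bob",   "db", "allow"),
    ("bob",   "db", "allow"),    # redundant duplicate
]

def contradictions(rules):
    """Pairs of rules that give different decisions for the same request."""
    return [(a, b) for a, b in combinations(rules, 2)
            if a[:2] == b[:2] and a[2] != b[2]]

def redundancies(rules):
    """Pairs of identical rules."""
    return [(a, b) for a, b in combinations(rules, 2) if a == b]

print("contradictions:", contradictions(rules))
print("redundancies:  ", redundancies(rules))
```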

    Third up was Eiji Hayashi on context-aware scalable authentication; we authenticate users based on both active and passive factors, and big firms increasingly select the former (password or 2FA) based on the latter (IP address). The paper works out the probability theory for optimal decisions. Case studies followed: if you don’t require a phone PIN at home but do elsewhere, then do you require it at work or not? In the former case you’re signalling that the workplace is not safe. His resolution is requiring a PIN at work only if the subject isn’t using a computer at the time; if she is, the computer screen shows if the phone is active and its location. Users liked this.
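
    A hedged sketch of the decision logic described (not Hayashi’s actual model): let the passive factors yield an estimate that the current user is legitimate, and demand an active factor only when the expected cost of skipping it exceeds the annoyance of prompting. The cost numbers are arbitrary.

```python
def require_pin(p_legitimate, cost_prompt=1.0, cost_breach=100.0):
    """Decide whether to demand an active factor (a PIN) given a passive
    estimate that the current user is legitimate.  Prompt only if the
    expected cost of letting an impostor in outweighs the prompt's cost."""
    expected_breach_cost = (1.0 - p_legitimate) * cost_breach
    return expected_breach_cost > cost_prompt

print(require_pin(0.999))  # at home / near the paired computer: False
print(require_pin(0.90))   # unfamiliar location: True
```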

  6. Oshrat Ayalon did her master’s thesis study on managing longitudinal privacy in social networks. Her theme is that digital persistence shifts informational power away from individuals, and she studied how sharing preferences change with the time passed since publishing information on a social network. A within-subjects study with 193 mturkers looked at whether they’d like to share, or alter, old posts as a function of the post’s age; and why. Willingness to share decreases with time but grows in variability. People mostly wanted to delete stuff because it was irrelevant, then because their circumstances had changed; concerns over inappropriateness or offence followed after that. But there was no significant correlation between willingness to delete and elapsed time. She suggests the privacy paradox is at work here; it’s easier to stick with the default of leaving stale content up there. This leads her to suggest a default expiration date, and her figures suggest that two years would be about right. She is now doing a between-subjects study to validate this.

    Scott Ruoti’s talk was on “Confused Johnny”. He wants to bring e-mail encryption to the masses; he first developed a transparent system, which left people confused, and then a more manual system, which proved more usable. His system Pwm (private webmail) works with gmail, hotmail and yahoo using javascript and is aimed at “good-enough” security. There’s a secure overlay on the compose window, using iframes and the same-origin policy to stop either an attacker or even gmail from reading it. Installation uses bookmarklets, and identity-based encryption is used to allocate escrowed private keys to email addresses. Subject lines are left in clear so that subjects can do some searching (and webmail providers can still serve some ads). No user interaction is required in the simplest case. However, participants wanted more details, were unsure who could read their emails, and some forgot to enable encryption when sending email. A more manual version was then mocked up where people had to log into a key escrow server which let them set up access lists, and cut-and-paste ciphertext: this improved comprehension while getting almost the same usability score. Users trusted the system more when they saw what looked like ciphertext. He suggests a hybrid approach which is easy to use but where you can see what’s going on.

    The morning’s last talk, by Cristian Bravo-Lillo, seeks to make genuine risks harder to ignore. Most dialogs can be eliminated but not those whose criticality depends on information the user has but the system doesn’t. Their approach is to get people to pay attention to the salient information such as the name of a software publisher; how can you do this in the face of habituation? They tried mechanisms including the ANSI standard contrasting font; an animated connector between “publisher” and its name; causing the name to flash; or even forcing the user to type the publisher name in order to select the risky option. Their experiment got 872 mturkers to “download” a Silverlight update from Miicr0s0ft.com; many of the attractors were significantly better than the controls. As for habituation, even after 2.5min/22 repetitions they were still better.
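
    The last of those attractors is easy to sketch outside a browser (purely illustrative; the study used Silverlight-style dialogs, not a terminal):

```python
def risky_install_dialog(publisher, get_input=input):
    """Sketch of the 'type the publisher name' attractor as a console prompt:
    to take the risky option, the user must re-type the publisher name,
    which forces attention onto the salient field."""
    print(f"This program is published by: {publisher}")
    typed = get_input("Type the publisher name to install anyway, "
                      "or press Enter to cancel: ").strip()
    return typed == publisher   # install only if the name was copied exactly

# e.g. risky_install_dialog("Miicr0s0ft.com") makes the odd spelling hard to miss.
```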

  7. Harold Thimbleby’s invited talk was on security and safety overlaps. His initial example of poor safety usability was a coastguard phone in Wales with a sign saying “999” but only buttons for “123”, the alternate UK emergency number. Why couldn’t the notice be changed? His second was an infusion pump with a “do not switch off or alter rate” message sellotaped to it; why couldn’t it be locked? Medics argue that computers are good for safety as their reliability can approach 100%; yet a hammer is 100% reliable. One make of infusion pump had 4% of operator time spent on trying to change batteries, even though it’s mains-operated. In short, there’s a huge amount of sloppy, non-systems thinking. In one pediatric ward, computerisation increased the mortality rate from 3% to 6%; the extra time taken to feed the machinery effectively reduced the number of doctors by one.

    Measuring design-induced death is hard, but on the best available figures some 4,000 people get killed each year in Britain, about twice the road traffic fatality rate. (Accidents involving computers are about 10% of the overall toll from medical errors.) By comparison, deaths from WHO “never events” such as wrong site surgery are about 55 a year (but there’s significant underreporting). US data are more reliable: 2,500 wrong-site surgery events are reported a year. Even there, confidentiality and “clinical judgement” cover much; and surgeons can’t read the manual for every device as there’s no time. A surgeon who kills a patient after not reading it will be fired, leading to loss of the experience that could lead to better manuals. Instead people cling to the dogma that such errors don’t happen.

    Harold showed two Bodyguard 545 infusion pumps with quite different user interfaces, with known design errors in one of them. Next was the Multidata therapeutic accelerator which killed 18 patients in Panama (if you moved the mouse anticlockwise the dose calculation was wrong) – the manufacturer got out of that by pointing out that under their contract the hospital “held them harmless” and arguing that clinical judgment should have been used; several radiotherapists are now in prison in Panama for manslaughter. Next was the report into the Varis 7 in Scotland which also held the staff member largely responsible; the investigators from the Scottish government did not talk to the suppliers or look into the software. The Hospira aimplus pump uses the same key for the up button and the decimal point to save money, and places it next to the zero, against FDA guidance; after a fatal accident the root cause analysis concluded that nurses should be trained better. Yet a two-hour study with five nurses showed that most were confused by most functions.

    Another fatal accident occurred when two nurses mistook the daily dose of the chemotherapy drug fluorouracil for the hourly dose; it would be safer for pharmacists to calculate the dose than nurses, but pharmacists earn more and are also in a position to dump the liability on the nurses. Unlike the military, healthcare does not seem to do the teamwork aspects of safety.

    Safety is about stopping good people doing bad things while security is about stopping bad people doing bad things. Yet in a culture of blame avoidance, safety becomes like security; anyone who did a bad thing must be a bad person. It can even be worse; if security fails people try to fix it in case they’re attacked again, while in the hospital culture ignoring a fatal error is seen as doing no harm, and may even help if the patient’s family give up the lawsuit and go away. Reporting errors is discouraged: the nurse Kimberley Hiatt accidentally gave an infant a fatal dose of calcium chloride after a blameless 25-year career and reported it immediately, but she was fired, fined $3000, and killed herself; the nursing commission closed the investigation, so we don’t know what went wrong.

    The regulatory side is completely broken; vendors get pumps approved that don’t comply with ISO standards even for UI symbols. The vendors’ submissions to the FDA have tens of thousands of pages, which staffers don’t have time to read, and there is no product evaluation. Why for example does a ventilator have an “off” button that a nurse can press accidentally, when it’s even got a backup power supply to keep going in the event of mains power failure?

    In an ideal world we’d not just fix this, but also train designers to compensate for errors (such as attribute substitution) that are likely to arise from biases in our perceptual system (Kahneman’s System 1). This means working to make errors and hazards perceptually salient. Harold gave the example of wheel-nut indicators which make loose wheel nuts on buses obvious. Can this be applied to medicine? He did a trial with 190 nurses entering real clinical data into new UI designs. Traditional keypads had a 3% unnoticed error rate; eye-tracking showed that nurses looked at complex keypads, while with a simple up-down arrow pair to enter doses, they looked mostly at the display, leading to half the error rate.

    A second point is that errors should lock the display until the error is cleared; entering 1/0+3 into a standard calculator outputs 3 following the error at the fourth keypress. Similar considerations apply to wrap-around. To investigate this, he built a keystroke error model, ran simulations, and studied the rates of unnoticed bad results. It turns out that most existing software fails to deal adequately with predictable number entry errors. Doing the error-handling right could halve the death rate associated with infusion pumps. Yet we don’t test the safety usability of infusion pumps, although we do test them for electrical safety, even though only 22 people a year die of electric shocks (only two of them in institutions)! More at http://www.chi-med.ac.uk.
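
    A sketch of the lock-on-error behaviour, assuming a toy expression-entry widget rather than any real pump or calculator firmware: once an error occurs, further keystrokes are rejected until the user explicitly clears it, so 1/0+3 cannot silently become 3.

```python
class LockingEntry:
    """Expression entry that locks after an error until the user clears it."""
    def __init__(self):
        self.buffer = ""
        self.locked = False

    def key(self, ch):
        if self.locked and ch != "C":
            return "ERROR - press C to clear"    # refuse to absorb keystrokes
        if ch == "C":
            self.buffer, self.locked = "", False
            return ""
        self.buffer += ch
        try:
            eval(self.buffer)                    # eval() stands in for a real parser
        except ZeroDivisionError:
            self.locked = True
            return "ERROR - press C to clear"
        except SyntaxError:
            pass                                 # incomplete expression, keep typing
        return self.buffer

entry = LockingEntry()
for ch in "1/0+3":
    print(ch, "->", entry.key(ch))
# Unlike the calculator described above, '+' and '3' are rejected
# rather than silently producing 3.
```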

  8. The lightning talks session started with Arne Renkema-Padmos talking of a design approach to deception. Design patterns need to be contextualised; how do we deal with new threats? His idea is to compress our worldview into a “design lens”. Our trust is constructed by observing signals, which can be faked or mimicked; Diego Gambetta’s work on trust signalling among taxi drivers and the Mafia gives many insights into how the game is conditioned by the ease of constructing or faking signals. Arne is trying to apply these ideas to phishing, IT supply chains, and food supply chains.

    Janice Tsai next told us why we’re too smart for our own good. She spent a year working for the California State Senate on privacy for smart grids. The state’s public utilities commission put out privacy rules for energy usage data in 2011. She described what the regulations require.

    Patrick Walshe of the GSM Association talked of the many policy initiatives around smartphones; but there’s not much idea how laws translate into code and usable choices. They’ve done privacy studies in Europe, Malaysia and Indonesia, and it seems people withhold data for lack of trust. The association has published privacy design guidelines.

    Aristea Zafeiropoulou investigates the privacy paradox, and previously established that it applies to location data. She’s now working with focus groups to explore this further. Preliminary findings suggest that trust isn’t an issue, so much as not sharing location with some people (such as parents) and untagging embarrassing photos once they start looking for jobs. Basically they care who can see their data, not which apps can.

    Luke Hutton noted that location sharing has been around for five or six years, long enough for social norms to start to emerge. The tensions are between social uses such as showing off, and commercial ones. He wondered whether introducing incentives might disrupt this, and built an app that paid people for checking in. Some subjects were told the data would be shared with advertisers. Afterwards, more people felt “businesses are entitled to my PII in exchange for money”; on the other hand, feedback about secondary data uses cut checkin rates. In effect people were more selective about location disclosure.

    Anna Ferreira investigates socio-technical attacks that meld technical and deceptive elements. Phishing is just the start; she has a framework for how we act differently when alone, with friends, at work and so on. She is testing this on TLS warning messages, with an attack that primed users to accept “self-signed certificates” as a brand.

    Akira Kanaoka asks whether perceived risks are the real risks for the user. He’s developed a hierarchical classification of risks to the user’s money, reputation and so on, and is doing a preliminary user study with students.

    Martin Emms works on contactless payment, door-entry and travel cards. He’s been looking at the data that can be got from bumping against someone; it includes name, date of birth, who you work for, where you live, your credit card numbers and roughly when you travel. The more cards people have on them, the richer the picture; he can get your last couple of transactions to impersonate you to online banking, and phish for your PIN by trying a guess. Repeated PIN tries can be done using a bad door lock. Bump attacks in crowds could harvest data on a fairly large scale using a standard mobile phone. Amazon makes it easy to set up accounts as there’s no CVV or address check; this loophole should be closed.

    Stuart Schechter talked of ethical compliance as a service. The Alien poster said “In space, no-one can hear you scream”; yet we cannot hear the screams of the mturkers we hire for experiments (if they read consent forms it lowers their hourly wage). Ethics is almost always a secondary task for security researchers, so: can we make the ethical path the easiest? He proposes ethics as a service; people who complete deception studies would be redirected to http://www.ethicalresearch.org for three random questions. The idea is that real-time data can replace hunches and speculation; harmful experiments might even be detected in real time and stopped.

    Jeremy Wood proposes that people should be able to share fine-grained location information in public places but much fuzzier in private ones. He has a company http://www.locationanonymization.com.

    Robert Biddle suggested it’s time we had a grand challenge prize in security usability, along the lines of the Sikorsky prize for a human-powered helicopter that was won recently. Possible topics include usable encrypted email, usable encrypted data storage, usable identification of website provenance, useful software signatures, usable strong passwords and usable 1-time passwords. Perhaps we could call these the “Johnnies”. The Sikorsky prize required a 60-second flight to 3 metres within a 10 metre square; what would be the more precise testable target?

    Finally, Eva Vuksani talked of Device Dash, a game in which a system administrator tries to keep a corporate network free of compromise. There are users attaching bad devices, and administrator compromises that spread to all nearby users and devices; for defence there are scanners and network access control. The game is more like Space Invaders or Tetris than a strategy role-play, though.

  9. Thursday’s final session started with Pedro Leon on factors that affect user willingness to share information with advertisers. People can find targeted ads creepy if they search for “scalp conditions” on WebMD and then get ads for medicated shampoo on Expedia. Per-company opt-outs are inadequate; users just don’t know enough of the likely consequences of their actions. They did a survey of sharing with WebMD/WebDR alone, with Facebook, and with other visited sites; retention for one day or indefinitely; and whether the subjects could access and edit the retained information. Factor analysis distinguished several types of personal information: browsing history, computer information, location, demographics, and finally name/email address. Retention period impacted browsing, location and demographic data sensitivity; sharing scope impacted browsing, location and PII. Willingness to share increases with limited retention, but only if the limit is a week or (better) a day.

    Saurabh Panjwani’s topic is “Do not embarrass”. User attitudes to online behavioural advertising are complex but largely negative. However users mostly don’t know how it works, and the paper by Ur et al at SOUPS last year used a WSJ article, which is privacy priming. Also, most subjects are American. He did a pilot sample of 53 Indian users, mostly IT workers in Bangalore; and built a plugin to scrape and cluster the subject’s last 1000 URLs; over half of them volunteered to use this. Consistent with other studies there was low initial awareness (only 3 of 53 could explain what cookies do). Attitudes were more neutral than in previous studies (25% +ve and 28% -ve, compared with 17/40 in Ur et al). Subjects marked about a quarter of their browsing as sensitive; banking was top, then email, SNS and adult sites. Travel ads were most liked and sex ads most hated; females liked ads much more. 74% of subjects reported getting embarrassing ads: sex, dating, lingerie and matrimonial being the main categories. He concludes that better treatment of embarrassing ad categories might be a way forward.

    Thursday’s last talk was from Idris Adjerid on “sleight of privacy”: the ability of minor, non-normative changes in framing to influence users’ propensity to disclose. Framing and reference dependence are well known, along with anchoring and contextual effects. Their hypothesis was that framing a privacy policy as increasing protection would result in increased disclosure. 386 mturkers had privacy propensity measured with the Acquisti list of creepy questions. Indeed, users with an apparently decreasing protection model disclosed less, while an increasing-protection frame caused them to disclose more. Privacy misdirection can also be based on the fact that attention’s a limited resource, so they ran a survey of 280 CMU students; they were told their answers would be available to students, or to faculty and students; the treatment groups saw one of four misdirections, any of which (even a privacy-irrelevant one) eliminated the effect of sharing with faculty. They conclude that bounded attention limits the impact of privacy notices anyway, and that we should therefore either design privacy notices to compensate, or establish baseline privacy protections that are not dependent on notice and choice.

  10. Stuart Schechter kicked off Friday’s proceedings with the first of the morning’s lightning talks. He examined the RockYou dataset to look for traces of password strength rules, and found that 1% of passwords with upper/lower, numerical and special characters were “P@ssw0rd”. How can we design better feedback for users? There are too many things they can do wrong to give a proscriptive ruleset; his philosophy is WWSD, “What would Shannon do?”, and the approach is to give users feedback on how much entropy each new character adds, with warnings against particularly likely next characters.
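
    A minimal sketch of the per-character feedback idea, using a toy character-bigram model with add-one smoothing; a real tool would train a much better model on a corpus such as RockYou. Everything below is illustrative.

```python
import math
from collections import Counter, defaultdict

# Toy character-bigram model trained on a few leaked-style passwords.
corpus = ["password", "password1", "p@ssw0rd", "letmein", "iloveyou"]
bigrams = defaultdict(Counter)
for pw in corpus:
    for a, b in zip("^" + pw, pw):   # '^' marks the start of a password
        bigrams[a][b] += 1

def next_char_feedback(prefix, ch):
    """Return the surprisal (bits) this character adds given the prefix,
    and the most predictable next characters to warn against."""
    prev = prefix[-1] if prefix else "^"
    counts = bigrams[prev]
    total = sum(counts.values()) + len(counts) + 1   # add-one smoothing
    p = (counts[ch] + 1) / total
    likely = [c for c, _ in counts.most_common(3)]
    return -math.log2(p), likely

bits, likely = next_char_feedback("pass", "w")
print(f"'w' adds {bits:.1f} bits; most predictable next characters: {likely}")
```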

    Ivan Chardin described his company’s products. He complained of the lack of consistent transnational regulation of web-page publishers, and their lack of incentive to do anything to limit the trackers; he talked of future software products to screen personal information sharing for both browsers and apps.

    Tammy Denning has designed a deck of threat discovery cards to help people develop a security mindset. She showed some examples, describing various assets, attacker resources, motivations and interesting threats. She’s at securitycards@cs.washington.edu.

    Jeremy Epstein talked of NSF’s solicitation, due October 15th, for research projects into a secure and trustworthy cyberspace; the call number is 13-578 and usable security is definitely in scope. Last year he funded 5 usable-security grants making about 7% of the $80m budget.

    Friday’s first refereed talk was by Dirk Van Bruggen. He’s supplied 200 undergraduates with free smartphones in return for monitoring usage; they collect meta-data and configuration history, but not content. They have been experimenting to see whether either targeted security warnings or social influences can affect security behaviours. Initially 51% used a pattern lock, 14% a text lock and 35% nothing; in the deterrence treatment they sent a message saying that losing an unlocked phone could violate university policy on sensitive information (about other students), in the reward treatment they offered a $10 Amazon voucher, and in the morality treatment they sent out an exhortation. Deterrence provoked the most interest, but the morality appeal resulted in the most permanent change. Females were more likely to change, as were people with an agreeable personality trait. People who used more data were less likely to change; and students who changed their behaviour were more likely to have a best friend who also did. Future studies will range from mobile AV through credit card use to risk perception.

    Florian Schaub is exploring the design space of graphical passwords on smartphones. Phones restrict designers in some ways and empower them in others. The main discriminant in such schemes is cued recall versus recall versus recognition; the spatial and temporal arrangements interact with the password space, observation resistance and memorability. The nature of cues and decoys also matters, as does the use of sophisticated interaction methods such as multitouch. Constraints include not just screen size and the available sensors, but also the vision and other impairments of some users. Their paper sets out to map all these interactions; it also looks at five existing proposals (Pass-Go, UYI, TAPI, CCP and MIBA). They reimplemented all these schemes on Android and tested them on a Galaxy Nexus, using a PIN system as a baseline and 60 CS students (half of whom didn’t use any phone lock, as in the previous paper). Pass-Go did as well as PIN for short passwords (14 bits); none did better for long passwords (42 bits). Live and video shoulder-surfing showed that CCP, MIBA and TAPI were significantly harder to surf than PIN/password entry, perhaps because CCP and TAPI in particular have very small touch targets. This empirical work helped validate the framework; for example, cued-recall schemes are preferable, as are small touch targets, while randomised positioning impairs both usability and security (slowing the user and giving the observer more time).

    Jaeyeon Jung’s talk was on “Little Brothers Watching You”. A Pew survey of 714 people found 30% had uninstalled a cellphone app when they found it was collecting data in ways they didn’t like. They did a role-playing lab study with 19 Android users to explore the gap between expectations and reality. The subjects were graduates, and had all played Angry Birds, but initially only six of them had a realistic idea of what information apps collect and where they sell it. They then got users to play games on a phone with privacy leak tracking enabled: TaintDroid shows that Angry Birds shares phone ID data with flurry.com, jumptap.com and appads.com, while Toss It shares location with adwhirl.com and myyearbook.com despite not needing location at all. They implemented privacy-leak notification with a drip-drip-drip sound, vibration and a white/pink/red graphic. All users, even the marketing-savvy ones, were surprised by the frequent and wide sharing to unrecognised destinations. One of them said every time you use the phone or an app it’s not big brother watching you but a lot of little brothers.
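
    A sketch of the escalating feedback: the white/pink/red indicator is from the talk, but the thresholds and the mapping from data type to colour below are invented for illustration.

```python
def leak_notification(destination, data_type, leaks_so_far):
    """Map a detected privacy leak to an escalating indicator colour.
    Only the white/pink/red idea comes from the talk; the rules here
    are made-up placeholders."""
    if data_type in ("location", "contacts") or leaks_so_far > 10:
        colour = "red"
    elif leaks_so_far > 3:
        colour = "pink"
    else:
        colour = "white"
    return f"drip: {data_type} sent to {destination} [{colour}]"

print(leak_notification("flurry.com", "phone ID", 1))
print(leak_notification("adwhirl.com", "location", 5))
```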

  11. Researchers doing password studies often worry about whether they are realistic enough to be valid. Matthew Smith decided to run a study of ecological validity. This was done both online and in the lab, testing the effect of priming on password strength and composition. He looked for subjects using systems to generate passwords in order to classify their behaviour accurately. The roughly one-half of users that did comply with advice showed strong correlations, not just between treatments but between lab and online. He concludes that much of the effort that password researchers put into obfuscating their experimental goals might not be as necessary as they think.

    Julie Thorpe presented GeoPass, where users authenticate by clicking on a map location. 97% of people could remember their chosen location a week later; but could an attacker guess the spot, given some knowledge of the target user? Higher zoom levels give more security; the optimum seems to be Google Maps level 16, which shows buildings, while higher levels often lack detail. Setting a 10-pixel error tolerance seemed best. One issue is that users often don’t remember or even notice their zoom level; setting a location and thus an error tolerance at Z16 and then logging in at Z15 can make it fiddly. It might be improved if users who take a long time to log in, or who drag around a lot, are advised to change the scale. Login times were much longer than for regular passwords; 25–30s mean across three sessions. As for attacks, an opponent might try points of interest, as defined by Tripadvisor; a local-knowledge adversary might do still better. Julie claimed such an attacker would still have to try tens of thousands of likely locations in Toronto alone. A questioner pointed out that a middleperson could observe which map files had been downloaded, greatly decreasing the guessing effort.
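
    The core check in a GeoPass-style login is simple; a sketch assuming enrolled and attempted clicks are compared as map-pixel coordinates at the same zoom level (the coordinates and the zoom handling below are illustrative, not the paper’s code).

```python
import math

def geopass_check(stored, attempt, tolerance_px=10):
    """Accept a login if the clicked pixel is within the error tolerance of
    the enrolled location; both points are (x, y, zoom) in map pixels."""
    (x0, y0, z0), (x1, y1, z1) = stored, attempt
    if z0 != z1:
        return False          # mismatched zoom levels make comparison unreliable
    return math.hypot(x1 - x0, y1 - y0) <= tolerance_px

enrolled = (10342, 25267, 16)     # hypothetical pixel coordinates at zoom 16
print(geopass_check(enrolled, (10348, 25263, 16)))   # True  - within 10 px
print(geopass_check(enrolled, (10400, 25263, 16)))   # False - too far away
```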

    Elizabeth Stobert is interested in how the graphical-password design choice between recall, cued recall or recognition affects security and usability. She did 5 between-subjects studies, all online although one had a lab component. Two were passwords (assigned and chosen) and three used PassTiles (which can be configured in each of the three conditions). Memory time (the longest anyone remembered the login) was no better for any graphical password than for assigned text, though chosen text lasted longer; assigned text led to more password resets and chosen text to fewer, but there were no differences in resets between PassTiles conditions; text was faster for logging in; and more people wrote down assigned passwords. Replicating the experiment with stronger (28 bit) passwords gave no significant differences except increasing login times. In a further replication, they discouraged subjects from writing passwords down by not telling them there’d be a followup session, and found that object PassTiles were remembered significantly longer than assigned PassTiles. Finally, they changed the rules so people didn’t have to shuffle PassTiles, and this reduced login times from over 30s to under 20s. Overall, recognition-based PassTiles are more memorable than the recall-based variant. Recognition involves a binary decision on each image; recall involves fewer but more complex tasks.

  12. Thanks for the blogging!

    ‘with a sign saying “999” but only buttons for “123”, the alternate UK emergency number’

    Do you perhaps mean 112?
    (It’s only a small point, but might just be important to someone, sometime!)
