Towards greater ecological validity in security usability

When you are a medical doctor, friends and family invariably ask you about their aches and pains. When you are a computer specialist, they ask you to fix their computer. About ten years ago, most of the questions I was getting from friends and family as a security techie had to do with frustration over passwords. I observed that what techies had done to the rest of humanity was not just wrong but fundamentally unethical: asking people to do something impossible and then, if they got hacked, blaming them for not doing it.

So in 2011, years before the FIDO Alliance was formed (2013) and Apple announced its smartwatch (2014), I published my detailed design for a clean-slate password replacement I called Pico, an alternative system intended to be easier to use and more secure than passwords. The European Research Council was generous enough to fund my vision with a grant that allowed me to recruit and lead a team of brilliant researchers over a period of five years. We built a number of prototypes, wrote a bunch of papers, offered projects to a number of students and even launched a start-up and thereby learnt a few first-hand lessons about business, venture capital, markets, sales and the difficult process of transitioning from academic research to a profitable commercial product. During all those years we changed our minds a few times about what ought to be done and we came to understand a lot better both the problem space and the mindset of the users.

Members of the public were invariably thrilled when hearing about the topic of our research: “At last!” was their feeling, immediately followed by personal tales of password-induced frustration. And yet, when we offered our prototypes for testing, not merely at no charge but at negative cost (with a small monetary reward for the participants’ time, as is common in user trials), the uptake was minimal. It turns out that, while people hate passwords with a passion, they hate changing their habits even more. If you develop a solution that works on a certain browser, or on a certain brand of smartphone, nobody except the keenest early adopters will even bother trying it, regardless of how good it might be, if they’re not already using that browser or that brand of phone. If you are an academic researcher trying to come up with a novel solution, it is reasonable to develop just for one platform (usually the most open one, the one that gives you greatest control) and leave the porting to other more commercial platforms as an intellectually unrewarding “exercise for the reader”. If however you care about your solution being used in practice (for example because you are an entrepreneur trying to make money from your invention, or because you want it to solve a problem rather than just make an intellectual point) then you must support the platforms that real people actually use, even if it’s a lot more work.

In a 2012 paper called “The Quest to Replace Passwords”, which grew out of the “related work” section of my 2011 Pico paper and then became one of the most cited reference works in the field with over 800 citations on Google Scholar, we examined and compared in detail several dozen web authentication schemes. The web is largely responsible for the proliferation of passwords that we experienced over the past quarter century and most of the current password usability literature still focuses on web passwords. However, the problem with web passwords is sufficiently annoying and sufficiently pervasive that most users already have some coping strategy that allows them to survive, such as having the same basic password for many sites, sometimes with minor variations, or, for the more sophisticated users, a password manager, perhaps the one already embedded in the browser. Users still grumble, but they get along, and they prefer continuing to use their admittedly imperfect workaround rather than changing their habits. Commercial websites, for their part, as originally noted by Florêncio and Herley in 2010, have evolved their own coping strategies, after realising that annoying their users too frequently will cause those users to take their custom elsewhere; hence the long-lived login cookies, thanks to which users no longer have to type their passwords every time. As a consequence of these two trends (ad-hoc coping strategies from both users and websites) we witness the security usability paradox that, while users continue to be very vocal about their hatred of passwords, they are generally reluctant to try new solutions to get rid of them.

In a new paper recently accepted by the Journal of Cybersecurity, “Deploying authentication in the wild: towards greater ecological validity in security usability studies” by Seb Aebischer, Claudio Dettoni, Graeme Jenkinson, Kat Krol, David Llewellyn-Jones, Toshiyuki Masui and Frank Stajano, we report on a user study on web authentication we performed in collaboration with Gyazo, an Alexa Top 1000 website, and how it led our Pico team to pivot from the web to the desktop in our quest to alleviate password problems. We tell the edifying story of our ill-fated trial with a friendly government organisation, Innovate UK. We distil a number of instructive lessons about usability of authentication and about the path travelled by the Pico team over the years, some of which might now seem obvious in hindsight even though we had to learn them the hard way. We also offer our open source code (as is, with no maintenance or support) for others who might wish to improve on it.

At a higher level, we note how the publishing incentives in academia are stacked against validating security usability research with realistic experiments. Early-career researchers get to publish more papers, and with more respectable-looking statistical results, if they validate their hypotheses with simulated experiments on Mechanical Turk that reach hundreds of experimental subjects, as opposed to actually building working prototypes, finding a few people who will agree to use them as part of their daily tasks, deploying the imperfect prototypes to them, fixing the inevitable problems as they come up, and observing and debriefing those users after extensive practice in their normal environment. For a given investment of time, the latter strategy demands substantially greater engineering effort and yields fewer data points, less convincing statistical evidence and many fewer papers overall. It is easy to see how researchers seeking academic promotion would be deterred from following this approach. But our thesis is that, while there is scope for mTurk surveys during the initial exploratory stages, the true validation of a security usability design is only offered by the more laborious iterated methodology of design – build – deploy – measure – fix – repeat that is successfully adopted in other academic areas such as Systems research. Security is almost never the user’s goal, but rather something that gets in the way of honest users who were trying to achieve their goal; this is why simulated experiments (which focus on using the security mechanism itself), as opposed to live deployments (which focus on getting on with one’s daily routine despite the security mechanism), only tell a small part of the story. We hope the security usability community will recognise the value of failure and of realistic deployments and will therefore change its incentives to reward this higher standard of ecological validity.

author = {Seb Aebischer and Claudio Dettoni and Graeme Jenkinson and Kat Krol and David Llewellyn-Jones and Toshiyuki Masui and Frank Stajano},
title = {Deploying authentication in the wild: Towards greater ecological validity in security usability studies},
year = {2020},
journal = {Journal of Cybersecurity},
doi = {},
publisher = {Oxford University Press},
url = {},
