Hardware Scrambling – No More Password Leaks

The main reason passwords appear in headlines is large password breaches. What if you could fearlessly publish scrambled passwords exactly as they are stored on servers and still keep the passwords hidden, even if they were “123456”, “password”, or “iloveyou”?

We have developed a system that uses a trusted hardware component to “scramble” user passwords. This trusted hardware holds the secret keys used to scramble passwords (using SHA1-HMAC), and one needs this hardware to mount any password attack. If we use the system for, let’s say, WordPress, then instead of calling the MD5 function we call the REST API of our system, e.g.:
pwdValue = readfile("https://scrambler.s-crib.com:4242/SCRAMBLE/george/$receivedPwd/4/salt");

This small difference in the WordPress code (arguably we will need a plugin initially) means that you can publish all scrambled user passwords for anyone to download and no one will be able to find passwords even if they were “password”, “12345678” or “letmein”.
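As a rough illustration of the client side, the sketch below builds a request URL in the same shape as the example above. It is only a sketch under assumptions: the helper name build_scramble_url is hypothetical, and the meaning of the last two path segments (salt length, then an optional existing salt) is inferred from the SCRAMBLE description later in this post rather than from the API documentation. The point it makes is that a password must be percent-encoded before it is placed in a URL path.

```python
from urllib.parse import quote


def build_scramble_url(base_url, api_key, password, salt_length, salt=None):
    """Build a SCRAMBLE request URL shaped like the example in the post.

    Hypothetical helper: the segment order (command, API key, password,
    salt length, optional salt) is an assumption based on that example.
    """
    parts = ["SCRAMBLE", api_key, password, str(salt_length)]
    if salt is not None:
        parts.append(salt)
    # Percent-encode every segment so that characters such as '/', '?' or '#'
    # inside a password cannot change the structure of the URL.
    encoded = "/".join(quote(part, safe="") for part in parts)
    return base_url.rstrip("/") + "/" + encoded


url = build_scramble_url("https://scrambler.s-crib.com:4242", "george",
                         "p@ss/word", 4, "ab12")
print(url)
```

The resulting URL could then be fetched with any HTTP client; the one-line text response is described in the Scrambler-WS section.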

Figure 1: Two S-CRIB Scramblers plugged into a Raspberry Pi.

The current implementation uses a Raspberry Pi as an “untrusted” host for the web service. It is an inexpensive but sufficiently powerful platform for our password scrambling system.

Security

Our way of password scrambling is to compute a message authentication code with SHA1-HMAC. This is a keyed one-way cryptographic function. The key is only available inside the trusted hardware device (the Scrambler). It is not stored on the server that uses S-CRIB Scramblers, or anywhere else.

As long as the key is kept secret, all passwords are secure regardless of their own strength. Even if passwords were just one letter long, the attacker would not be able to recover them from their scrambled values.
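The property can be illustrated in software with Python's standard hmac module. This is a minimal sketch, not the dongle firmware: the key below is a random stand-in for the one held inside the Scrambler, and the way the salt is combined with the password (simple concatenation) is an assumption, since the post does not specify it.

```python
import hashlib
import hmac
import os


def scramble(key: bytes, password: str, salt: str = "") -> str:
    """Return HMAC-SHA1 of salt+password as 40 upper-case hex characters.

    Concatenating salt and password is an assumption made for this sketch.
    """
    mac = hmac.new(key, (salt + password).encode("utf-8"), hashlib.sha1)
    return mac.hexdigest().upper()


device_key = os.urandom(32)  # stand-in for the key kept inside the Scrambler
other_key = os.urandom(32)   # whatever key an attacker might try instead

tag = scramble(device_key, "123456")
print(tag)                                    # 40 hex characters
print(tag == scramble(device_key, "123456"))  # same key: same value
print(tag == scramble(other_key, "123456"))   # different key: unrelated value
```

Without the device key, an attacker holding only the published values cannot check a guess, even for a password as weak as “123456”; every guess has to go through the hardware.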

Our first implementation used encryption – the idea was to provide more flexibility for administrators. On second thought, we realised that it might be better not to put so much responsibility on administrators and so the product now uses SHA1-HMAC.

The dongle (Scrambler) uses 4 keys / passwords.

  • 1 – 10 characters long; used to identify clusters (when more than one dongle is used to boost throughput).
  • 2 – the actual key for SHA1-HMAC.
  • 3 – used for initialisation vectors.
  • 4 – encryption key for remote commands ENSCRAMBLE and ENGETID. This key is shared with the client (WordPress in our case) to provide end-to-end encryption of passwords sent for scrambling.

We tried to make key generation as secure as possible. We use a microsecond timer in the dongle that measures communication delays between the dongle and its host computer, and we take fifty samples. If you don’t trust it, you can supply your own random initialisation key and set it with the SETINITKEY command. It is our belief that the passwords contain a sufficient amount of entropy, which comes from two sources:

  1. A unique hardware key set when a dongle is programmed.
  2. A random initialisation key derived from a microsecond timer measuring delays in communication between the dongle and the host PC.

Architecture

The system comprises two elements:

  • Hardware dongle – It is attached to a USB port of a host PC. It is a serial device (UART) over USB with a relatively simple management and operational protocol.
  • Web service – This simplifies the use of hardware dongles and provides a unified RESTful interface that can be used locally or remotely (e.g., when the client system runs in a virtualised environment), as the whole solution offers end-to-end encryption of sensitive data.

Performance

We expect the following figures to improve, as we currently use a vanilla implementation of SHA1.

One hardware dongle (S-CRIB Scrambler) can scramble around 330 passwords per minute remotely, with end-to-end encryption (updated 8 March; previously 220). Although this does not sound like a big number, it is sufficient throughput for more than 10,000 users. However, the throughput can be multiplied by creating clusters of Scramblers that share the load.

We have tested the system with an S-CRIB Scrambler connected to a Raspberry Pi in the role of untrusted host. We used the Scrambler(s) via the web service to include the latency of the whole application stack. You can see detailed results below.

S-CRIB Scrambler Design

Basics

We use the same hardware as for our Password S-CRIB and only re-implemented the firmware to add the required functionality. The keys / passwords now have 32 characters so they can be used directly with AES-256. Each password can provide up to 199 bits of entropy as we use 76 different characters. The source of passwords is a combination of a “dongle key” (unique for each Scrambler) and a random SHA1 key generated using a microsecond timer applied to communication between the Scrambler and the host PC.
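The 199-bit figure follows from the alphabet size and the key length. The short calculation below reproduces it, assuming every character is chosen uniformly and independently (an idealisation; the actual entropy depends on the quality of the random source).

```python
import math

ALPHABET_SIZE = 76   # distinct characters used in keys / passwords
KEY_LENGTH = 32      # characters per key

bits_per_char = math.log2(ALPHABET_SIZE)
total_bits = KEY_LENGTH * bits_per_char

print(f"{bits_per_char:.3f} bits per character, {total_bits:.1f} bits in total")
# -> 6.248 bits per character, 199.9 bits in total
```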

Multiplying Throughput – Clusters

One can create scrambling clusters by cloning Scramblers. Scramblers can be cloned by replacing the random SHA1 key with a value derived from the cloned dongle. This is only possible during the initialisation phase when management functions can be used. The initialisation phase is closed automatically after the first call of an operational function (e.g. SCRAMBLE).

Scrambler-WS Design

Basics

The Scrambler-WS implements a RESTful API. Each call returns one text line for easy processing. The SCRAMBLE command would return the following data:
IN: SCRAMBLE mypassword 10
OUT: 49FD12DB38C3278BEAA95C0CCE491BBFD41D8DC9 10 11le4Ir2LO 0000003F

The first item is a SHA1-HMAC of the password – 40 hexadecimal characters encoding the 160 bits of the SHA1-HMAC value. It is followed by the salt length and the actual salt value. The last value is an operation counter from the dongle.
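A client could split such a response line along these lines. This is a sketch based only on the example output above: the names ScrambleResult and parse_scramble_line are hypothetical, and treating the counter as a hexadecimal number is an assumption drawn from its appearance (0000003F).

```python
from typing import NamedTuple


class ScrambleResult(NamedTuple):
    hmac_hex: str   # 40 hex characters, i.e. 160 bits
    salt: str       # salt used for this password
    counter: int    # operation counter reported by the dongle


def parse_scramble_line(line: str) -> ScrambleResult:
    """Parse one SCRAMBLE response line: '<hmac> <salt length> <salt> <counter>'."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    hmac_hex, salt_length, salt, counter = fields
    if len(hmac_hex) != 40:
        raise ValueError("HMAC field must be 40 hex characters")
    bytes.fromhex(hmac_hex)  # raises ValueError if the field is not valid hex
    if int(salt_length) != len(salt):
        raise ValueError("salt length field does not match the salt")
    # Assumption: the counter is printed in hexadecimal.
    return ScrambleResult(hmac_hex, salt, int(counter, 16))


result = parse_scramble_line(
    "49FD12DB38C3278BEAA95C0CCE491BBFD41D8DC9 10 11le4Ir2LO 0000003F")
print(result.salt, result.counter)
```

The server would store the HMAC value together with the salt, and pass the same salt back on later verification calls.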

The call may include an existing salt value; if it is not present, a random value is generated. Password salt generated by S-CRIB Scramblers contains lower and upper-case letters and digits. The maximum length is 16 characters.

Remote Use

If you want to use Scrambler(s) remotely – e.g., from an application running in a virtual machine – you can use the encrypted version of the SCRAMBLE command, ENSCRAMBLE. The parameters are then encrypted into one 64-byte block that is sent directly to a dongle. Your S-CRIB Scrambler will decrypt it, compute the scrambled value of the password and create another 64-byte block with the result. Encryption is done with one of the keys/passwords generated by the dongle. Request and response are linked by a counter or timestamp sent by the application in the request – it can be up to 10 characters long.

Testing Site

You can test the service at:

  • https://scrambler.s-crib.com:4242
  • http://scrambler.s-crib.com:4243

A detailed API description is here but here are basic commands you can directly test by copying them to your browser’s address box.

We have removed some internal checks so that you can test all commands remotely. As you can see, there is an API key “george” and the dongle’s ID is 0.

You need to encrypt data to call ENSCRAMBLE. Python code for testing is on GitHub – here.

Configuration

The testing site uses default parameters. It cancels requests after 5 seconds in the processing queue and there is also a limit on the number of WSGI threads. The site also has a limit of 20 TCP connections.

Current Implementation

S-CRIB Scramblers use the form factor of small hardware dongles that we manufacture.

Scrambler-WS is implemented with internal queues to maximise flexibility and throughput of the system.

We can provide an SD-card image for the Raspberry Pi, which we suggest using as an untrusted host if one is required. This means that you can simply copy the image to a new SD-card and start using the system once you plug it into a network.

Details of Load Testing

We did load testing using a Scrambler connected to a Raspberry Pi. The test client was connecting from a residential broadband connection (20 Mbps). The test client started a number of threads to simulate concurrent requests.

Note: results have been updated on 8th March after removing unnecessary delays in communication between the web service and scramblers.

TEST 1 – 5 client threads

  • average latency 0.9s per request (previously 1.38s)
  • throughput – 5.5 TPS (previously 3.6 TPS)

TEST 2 – 10 client threads

  • average latency 1.76s per request (previously 2.7s)
  • throughput – 5.6 TPS (previously 3.7 TPS)

TEST 3 – 20 client threads

  • average latency 3.54s per request (previously 5.4s)
  • throughput – 5.6 TPS (previously 3.7 TPS)

TEST 4 – 30 client threads

Before the update there were 260 TIMEOUTs (in blocks of 10, pretty much…): these requests stayed in processing queues for more than 5 seconds (a configuration parameter). There have been no timeouts since the communication delays were removed.

  • average latency 5.32s per request (previously 6.4s)
  • throughput – 5.6 TPS (previously 3.8 TPS)
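A quick consistency check on the updated figures uses Little's law: the average number of requests in flight equals throughput multiplied by average latency. With a closed-loop test client that number should be close to the number of client threads. The sketch below is our own check on the four results, not part of the test harness; the function name is hypothetical.

```python
# (client threads, average latency in seconds, throughput in requests/second)
RESULTS = [
    (5, 0.90, 5.5),
    (10, 1.76, 5.6),
    (20, 3.54, 5.6),
    (30, 5.32, 5.6),
]


def requests_in_flight(latency_s: float, throughput_tps: float) -> float:
    """Little's law: average requests in the system = throughput x latency."""
    return latency_s * throughput_tps


for threads, latency, tps in RESULTS:
    in_flight = requests_in_flight(latency, tps)
    print(f"{threads:2d} threads: {in_flight:.2f} requests in flight")

# 5.5 requests per second corresponds to the per-minute figure quoted earlier.
print(round(5.5 * 60), "passwords per minute")
```

The products stay within about 2% of the thread counts, which suggests that the single Scrambler is the bottleneck at roughly 5.6 TPS: adding client threads only lengthens the queue and therefore the latency.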

We also looked at memory and processor load on the Raspberry Pi:
NO TRAFFIC
3.5 MiB + 654.0 KiB = 4.2 MiB scribTCP.py -- queuing and hardware management
4.9 MiB + 786.5 KiB = 5.6 MiB scribREST.py -- RESTful API - very light server
6.3 MiB + 831.5 KiB = 7.1 MiB python2.7

30 THREADS RUNNING AGAINST THE BOX
3.9 MiB + 682.0 KiB = 4.6 MiB scribTCP.py -- queuing and hardware management
4.9 MiB + 791.5 KiB = 5.6 MiB scribREST.py -- RESTful API - very light server
6.3 MiB + 830.5 KiB = 7.1 MiB python2.7

Under load the CPU was utilised at about 30-40%, while my processes took about 30% of that (in user space). Having said that, the overall load peaked around 1.0.

The Python code for Scrambler-WS is on GitHub.

We are looking for feedback, so if you’re interested in testing the system or deploying it, get in touch (dan@s-crib.com).

About Dan Cvrcek

I got my PhD and associate professorship from Brno University of Technology. I was a post-doctoral researcher at the Computer Lab in 2003-2004 and 2007-2008 (almost 3 years combined). I then thought it might be worth having a look at the real world and joined Deloitte. I analysed payment systems, card issuance systems, and key management in Barclays, Barclaycard, and some more banks. Petr Svenda, David Gudjonsson and I founded Enigma Bridge in 2015 - we built a cloud encryption service based on secure hardware.

13 thoughts on “Hardware Scrambling – No More Password Leaks”

  1. Back in the 1990s we addressed the first round of password cracking with a similar approach on a system at MIT. We changed the system library crypt() routine so that our crypt used a different number of rounds than the Unix standard. As a result, a stolen password file could only be cracked on our system, and not using stolen cycles on some supercomputer. We then further gimmicked the crypt() routine so that after 25 uses in the same process it would generate incorrect answers and send an alert to the system manager. This effectively ended our password cracking problem.

  2. The basic idea is quite a good one, but the devil is in the details. Some thoughts:

    * Please be aware that the basic concept is not unique by any means. Within the industry, devices like this are called HSMs, and are widely commercially available. However, what you seem to be offering is a streamlined, low cost HSM that just does password verification — something that might be bought by lots of web businesses that would never consider an HSM. OK, but …

    * 5.5 verifications per second really isn’t acceptable for most sites that are likely to be targeted by crackers. I’m not sure how you calculated that it will suffice for 10,000 users, but I can’t agree. Based on the Erlang queuing formula, even if each user logs on only once per day, with these parameters a user’s logon attempts will be blocked roughly 1 time in 25. Most businesses would consider this too high. And 10,000 logons per day is quite small for many types of business, e.g. even a larger community forum. Many social media businesses are multiple orders of magnitude larger, and they don’t want to buy thousands of dongles. Further, even if one dongle can handle your normal traffic, it would be very vulnerable to DOS. In short, I think you need to increase throughput *a lot*.

    * This should be easily possible: an HMAC-SHA verification of a typical length password takes only two SHA computations, and even quite low-end devices can do that orders of magnitude faster than 10/second. A moderately powerful device like the Raspberry Pi should be able to exceed 100,000 per second.

    * I am concerned by your description of cryptographic random number generation. It is a subtle and dangerous art, and errors in this area have ruined many an otherwise sound protocol. For your application, it really sounds as if you should have a built-in hardware TRNG.

    * I am really doubtful about communicating from host to dongle by using a high-level networking protocol like REST over HTTP. Yes, it’s conceptually simpler for web developers; but on every other count it’s a dubious idea. For one thing, it adds a lot of overhead that may be holding down your (not very good) throughput. For another, it means that despite having a trusted hardware platform, a lot of its security properties are pushed back onto your network security. This is *not* the way to design an HSM; instead, start by assuming that everyone else screws up (because eventually, they will.)

    * You might want to think about the subtle play-off between backups and security. As it stands, if your dongle forgets keys 2 or 4, or lets the smoke out or something, you would seem to be totally hosed.

    Don’t mean to rag on you — I like your idea, and hope my comments help.

  3. @Roger

    You pretty much got the idea so I will just comment on particular items:
    – DDoS – yes, I agree. If you’re under attack then you’re in trouble. Use of our system will make attacks visible, but that is probably not very interesting for users. Logons are only one problem of DDoS; much more important is memory use by the web server. We have had this experience and ended up setting up a firewall to limit connections from IP address ranges.
    – DDoS – upside – when you want to protect against DDoS, you need some throttling mechanisms. A firewall is best as it protects the web server. Limiting logon throughput is a much worse approach as it will annoy users, but it will protect your users by limiting online dictionary attacks. What I am trying to say is that it could be the last line of defence, not the first one.

    – Queuing – I think the system behaves much better than you suggest. If you use one of the online calculators (e.g., http://www.supositorio.com/rcalc/rcalclite.htm), you get:

    1. The system utilisation is 2.3% and time requests spent in the system is close to the processing time for 10,000 users.

    2. It becomes interesting for let’s say 230,000 logons per day (9,600 per hour). Here you get 2% of users waiting more than 1 second – the whole system is utilised at 53%. You get back to nice behaviour with one additional dongle.

    3. Using the calculator above, you can get interesting results for 4 dongles (=servers) and 1,000,000 users per day. (57% utilisation, average users in system 2.7, 99.2% users served within 0.5s). Being an admin, I would like to keep utilisation somewhere around 33% – 7 dongles for 1,000,000 logons per day.
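The utilisation figures in this list can be reproduced with a few lines of Python. This is a sketch under assumptions, not the calculator linked above: it uses a textbook M/M/c model with logons spread evenly over 24 hours, and a service rate of 5 requests per second per dongle, which is inferred from the quoted utilisations rather than stated anywhere. The waiting-time percentages depend on the exact model the calculator uses, so they are not reproduced here.

```python
import math

SECONDS_PER_DAY = 86400.0
SERVICE_RATE = 5.0  # requests/second per dongle; inferred assumption


def utilisation(logons_per_day, dongles, service_rate=SERVICE_RATE):
    """Fraction of time each dongle is busy, for evenly spread arrivals."""
    arrival_rate = logons_per_day / SECONDS_PER_DAY
    return arrival_rate / (dongles * service_rate)


def probability_of_waiting(logons_per_day, dongles, service_rate=SERVICE_RATE):
    """Erlang C: probability that an arriving request has to queue (M/M/c)."""
    offered_load = logons_per_day / SECONDS_PER_DAY / service_rate
    rho = offered_load / dongles
    if rho >= 1.0:
        return 1.0  # overloaded: the queue grows without bound
    top = offered_load ** dongles / math.factorial(dongles) / (1.0 - rho)
    bottom = sum(offered_load ** k / math.factorial(k)
                 for k in range(dongles)) + top
    return top / bottom


for logons, dongles in [(10_000, 1), (230_000, 1), (1_000_000, 4), (1_000_000, 7)]:
    print(f"{logons:>9} logons/day, {dongles} dongle(s): "
          f"utilisation {utilisation(logons, dongles):.1%}, "
          f"P(wait) {probability_of_waiting(logons, dongles):.1%}")
```

Real traffic is burstier than an even daily spread, so peak-hour rates should be substituted for the daily average when sizing a cluster.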

    I studied a similar problem in large banking systems and there were no significant “request” peaks from genuine users there – something that surprised me at the time.

  4. HTTPS is a configuration option of the web server. The API has end-to-end encryption of sensitive data. If you want to have HTTPS, just get a suitable server certificate and run your web server with SSL switched on.

  5. For more entropy, take a lesson from the lava lamps. The Raspberry Pi has a video input. You don’t actually need a video of lava lamps; the noise in the CCD provides plenty of entropy. So hook up a cheap camera and run a cryptographic hash over the video for more bits. (This works nicely on smartphones as well.)

  6. @elethan – if it’s lost, you can create a clone from, e.g., a hard-copy of the initialisation key. If 0wned then a) we know it as the box is likely to have been stolen b) unfortunately we are back to square one and users will have to reset their passwords.

    @ronW – I don’t think it’s quite true. SHA1 is still FIPS140-2 approved (http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf). There are theoretical attacks with complexity of about 2^60 to find collisions – never demonstrated despite considerable effort. We do not mind collisions so these attacks have limited impact. Having said all that, we may swap for SHA256 eventually.

  7. Many web servers are moving to HTTPS to address concerns about people seeing data along the way. An actual deployment would require HTTPS to be turned on to be viable. When is the team going to turn on HTTPS for the REST interface?

  8. @Doughty – we have now switched one of the test Scramblers to HTTPS. Well, you must trust its certificate first.

    In terms of Python code changes:
    – one new class MySSLCherryPy that loads X509 certificate and private key.
    – setting the new class as the web server to use.

  9. From your blog post it is hard to figure out what is going on. Could you provide a protocol description?

    Also of interest is what parties need to change their code. You mention the hardware dongle that is attached to the user’s end device. Could you briefly describe this?

  10. @Hannes Tschofenig – We have built a web service to make the use of Scramblers easy. This WS is written in Python and we are extending the documentation for setting it up on Raspberry Pi / Ubuntu. The guide should be portable though – with differences in how to install dependencies.

    A protocol run for password verification goes like this:
    1. User enters her/his password.
    2. Password (in some form) is received by the server where the user tries to connect.
    3. The server will call the SCRAMBLE or ENSCRAMBLE command, which will return one line of text. (This is the main difference; usually you would only compute MD5, SHA1 or bcrypt of the password.)
    4. The result of 3 will provide HMAC-SHA1 of the password that will be compared with the copy in the server’s database. If it matches, user is in.
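The four steps above can be sketched on the server side as follows. Everything here is illustrative: fake_scramble is a hypothetical local stand-in for the SCRAMBLE / ENSCRAMBLE call (a real deployment would call the web service, which holds no key itself), the dictionary stands in for the user database, and the salt is passed in explicitly although the Scrambler can also generate one.

```python
import hashlib
import hmac

_TEST_KEY = b"local stand-in for the Scrambler key"  # illustration only


def fake_scramble(password: str, salt: str) -> str:
    """Hypothetical stand-in for step 3; returns 40 hex characters."""
    data = (salt + password).encode("utf-8")
    return hmac.new(_TEST_KEY, data, hashlib.sha1).hexdigest().upper()


def store_password(db: dict, user: str, password: str, salt: str,
                   scramble=fake_scramble) -> None:
    """Password set/change: keep only the salt and the scrambled value."""
    db[user] = (salt, scramble(password, salt))


def verify_password(db: dict, user: str, password: str,
                    scramble=fake_scramble) -> bool:
    """Steps 2-4: scramble the received password and compare with the stored copy."""
    record = db.get(user)
    if record is None:
        return False
    salt, stored = record
    candidate = scramble(password, salt)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored)


users = {}
store_password(users, "alice", "letmein", "11le4Ir2LO")
print(verify_password(users, "alice", "letmein"))   # True
print(verify_password(users, "alice", "password"))  # False
```

Swapping fake_scramble for a function that calls the web service is the only change the server needs; the database layout stays the same as with a salted hash.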

    Step 3 in more detail (let’s say you use the web service)
    a) The SCRAMBLE command can be sent directly to the WS. You need to create an encrypted structure for the ENSCRAMBLE command – we have produced example code in Python and PHP; .NET and other versions will be available soon.
    b) The WS receives the command (it can be over HTTPS), puts it into its processing queue – selecting the right one by an API ID (george in the example in the post).
    c) Eventually the request is sent to a free Scrambler device over its UART/USB interface.
    d) The Scrambler will decrypt the password and other data.
    e) The Scrambler will compute HMAC of the password.
    f) The Scrambler will create an (encrypted for ENSCRAMBLE) data structure.
    g) The result from f) is sent by the WS back to the user’s server.

    You need to install the WS and libraries for the Scrambler – one of the reasons for our choice of Raspberry Pi as the host for Scramblers was its low price and easy installation. You don’t need an RPI if you can plug Scramblers directly into your server though.

    Then you need to change the password verification and password change algorithm in your (web) server.
