GDPR: Application Password Security in 2018

Security Primer

As we hurtle towards GDPR, more and more of my clients are becoming concerned with data security. This is great news - I've always recommended best practices but come up against resistance occasionally due to the extra amount of time some implementations take - after all, time is money, and to some clients money is more important than security. Thankfully we're all now being forced to consider it more seriously.

I'm no security or cryptographic expert but as part of my job I have to provide secure authentication and storage systems to protect my clients' data. As I'm a Microsoft stack guy this is often dealt with by Active Directory or a federated identity server - but on occasion I need to "roll my own" authentication system or heavily modify an existing one.

One client recently asked me about the password hashing I'd put in place for them - I'd implemented SHA-256 with salting, using the same piece of code for about a decade. It's fast, relatively lightweight and portable - but, unbeknownst to me until now, it's no longer recommended or appropriate for hashing passwords. The client wanted me to implement PBKDF2 as they'd heard it was more secure.

Hashing vs Encryption

Security novices often confuse hashing and encryption. They are both cryptographic concepts but that's where the similarities end; in simple terms, hashing is a one-way operation that takes some data, makes a right old mess of it ("a hash"), and stores it somewhere for future reference. The aim of hashing is to make that data unrecoverable by humans and computers, even when they know exactly which algorithm produced it. The output hash aims to be infeasible to reverse, but deterministic: the same input always produces exactly the same hash.
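As a minimal sketch of those two properties (same input, same hash; slightly different input, completely different hash), here's a quick example using Python's standard hashlib module. The choice of SHA-256 and the sample strings are purely illustrative.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("password")
second = sha256_hex("password")
changed = sha256_hex("Password")

print(first == second)   # True: the same input always gives the same hash
print(first == changed)  # False: one changed character gives a different hash
print(len(first))        # 64: SHA-256 output is always 64 hex characters
```

Note that nothing here lets you get from the digest back to "password" - all you can do is hash a guess and compare.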

Encryption, on the other hand, is a two-way operation. Data can be encrypted, then decrypted back to its original form at a later date using a key - anyone who knows which encryption algorithm was used and has access to the key can read the data. It's usually very fast and can be used for securing large amounts of data (such as the data on your hard disk). It's also useful for sending sensitive data across boundaries - for example, sending the contents of a web page to a client web browser via TLS/SSL, or sending a secure email message.

There are multiple types of hashing and encryption algorithms - you've probably heard of MD5 and SHA, two of the most widely used hashing algorithms. AES is a popular encryption algorithm and PGP a popular encryption scheme, but there are dozens that have been developed by different bodies and individuals over the years. It's worth noting that the vast majority of these algorithms are published as RFCs or other open standards, so anyone with a bit of maths knowledge could theoretically figure out how they work and use them in their software. Most languages and frameworks have their own implementations that are well tested - therefore rolling your own algorithm implementation is not advised; someone more experienced has already done that work for you, so you should use their libraries and focus on your business logic.

Security Never Changes

There are people out there that want your data for one reason or another. No, you're not special or unique - you're just part of a numbers game. Malicious users might gain access to a database where your password is stored in plain text, then use that same password to legitimately log on to your bank account - if you used the same password of course. They might simply be after a list of email addresses that they can sell to spammers - either way, that line of data in a database is worth something to someone, somewhere.

Computers are getting faster and attackers are always finding new ways to break cryptographic algorithms. MD5 used to be widely trusted as a hashing function, but practical collisions were demonstrated in 2004, and by 2013 a collision could be found in less than one second on a standard home computer. The same is true for SHA-0 and SHA-1 - both compromised. Even SHA-2 (a family that includes SHA-256 and SHA-512) has had theoretical attacks published against reduced-round variants, although the full functions remain unbroken. Tools are getting easier to use - even amateurs are able to get access to cryptographic toolkits that do all the complex maths for them.

Security experts and developers are therefore constantly trying to stay ahead of threats. Let's look at a few security breach scenarios and discuss which method is best to secure the data.

Scenario: Brute force attack against your api / application / login form

An attacker has set up an automated script or botnet that generates passwords and tries to log in to your online system. Depending on how the attack script is throttled, it may appear to be a DDoS attack.

Mitigation: No amount of encryption or hashing will help here. You should implement hardware traffic management / monitoring and some kind of application level account lockout procedure - i.e. 3 incorrect passwords and the user can no longer log in.
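A rough sketch of the application-level lockout idea might look something like this. The LoginThrottle class, the in-memory dictionary and the 15-minute unlock window are all my own assumptions for illustration - a real system would keep this state in the database or a shared cache, and might require an admin to unlock the account instead.

```python
import time

MAX_FAILED_ATTEMPTS = 3     # matches the "3 incorrect passwords" example
LOCKOUT_SECONDS = 15 * 60   # assumed unlock window, not a recommendation


class LoginThrottle:
    """Tracks failed logins per username (hypothetical helper, in memory only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = {}  # username -> (failure count, time of last failure)

    def is_locked(self, username):
        count, last_failure = self._failures.get(username, (0, 0.0))
        if count < MAX_FAILED_ATTEMPTS:
            return False
        if self._clock() - last_failure >= LOCKOUT_SECONDS:
            del self._failures[username]  # lockout has expired
            return False
        return True

    def record_failure(self, username):
        count, _ = self._failures.get(username, (0, 0.0))
        self._failures[username] = (count + 1, self._clock())

    def record_success(self, username):
        # A successful login clears the failure history.
        self._failures.pop(username, None)
```

The login handler would call is_locked() before even checking the password, then record_failure() or record_success() depending on the outcome.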

Scenario: Attacker gains raw access to your database

An attacker has breached your server / network and has taken a copy of the database files stored there. This usually indicates that there are other major security flaws in your system that should be sorted out as soon as possible. However, it's happened - so how can you make sure that the hacker can't read the database?

Mitigation: Most modern databases can be encrypted using an integrated feature / tool; it's usually a relatively simple task, and should be a priority. It requires a little bit of overhead and a little bit more development work, but it's always worth it.

If the database isn't encrypted, or the attacker also has access to the encryption key, then they can read all plain text data in the database. This is where password hashing comes in - although the attacker now has access to all of the data in your database, you can still make it difficult for them to figure out what each user's password is by using a hashing algorithm such as MD5, SHA, bcrypt, scrypt or PBKDF2.

Sub-Scenario: The Dictionary / Rainbow Table Attack

So, you've hashed the heck out of your users' passwords - now what? Well, there's still a vulnerability you may not be aware of; attackers who gain raw access to your database can use a dictionary attack or rainbow table attack in order to crack a password or ascertain your global hash key or passphrase.

In simple terms, a dictionary attack takes a pre-compiled list of common words / phrases (a dictionary), hashes each one and looks for matching hashes in your database; a rainbow table attack does the same using hashes that were computed in advance. The larger your database, the greater the likelihood of more than one user using the same password. If two users use the password "password", it will be hashed into exactly the same string - so a single matching hash gives the attacker the password for every account that shares it.
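To make that concrete, here's a small sketch of the attack against unsalted hashes. The "stolen" table, the usernames and the four-word dictionary are all made up for illustration, and a real rainbow table is a far more space-efficient structure than the plain lookup dictionary used here - but the principle is the same.

```python
import hashlib


def unsalted_hash(password):
    """Hash a password with plain, unsalted SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stolen table: username -> unsalted password hash
stolen = {
    "alice": unsalted_hash("password"),
    "bob": unsalted_hash("password"),
    "carol": unsalted_hash("kX9#vT2!qLm8"),
}

# The attacker's word list, hashed once up front: hash -> original word
dictionary = ["123456", "password", "qwerty", "letmein"]
lookup = {unsalted_hash(word): word for word in dictionary}

# Every stolen hash that appears in the lookup is instantly "cracked"
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)  # {'alice': 'password', 'bob': 'password'}
```

The lookup only had to be built once, yet it cracked two accounts at the same time - and it would work just as well against any other database hashed the same way.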

Mitigation: You can control the passwords your users use to some extent using application level rules, though less security conscious people will try to use the simplest password possible, or simply increment a number at the end of their existing password when the system asks them for a new one. Whilst you can protect against people using "12345" and "password" with pattern matching, you still can't protect against people using "P@ssword1" (which would satisfy even the most annoying of password rules).

So what do we do? We salt the hash. A salt is a cryptographically random string of a certain length, unique to each user. This string is concatenated with the plain text password before hashing takes place - it's randomly generated, then stored alongside the hashed password. It's visible to anyone with access to the raw data - even attackers that manage to get their hands on your database.

It sounds mad - why does a salt help if the attacker can see it? Well, because the added salt generates a unique hash for each user's password - even when two users use the exact same password. Every password essentially now has its own key, making the pre-compiled dictionary or rainbow table attack infeasible as a new table will need to be generated for every user. Consider the following pseudocode:

var user1Password = "password";
var user2Password = "password";
var user1Hash = Hash(user1Password);
var user2Hash = Hash(user2Password);


user1Hash and user2Hash will hold exactly the same value - which creates a predictable pattern that can be exploited.

Now consider this:

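var user1Password = "password";
var user2Password = "password";
var user1Salt = GenerateSalt(); // random per user, stored alongside the hash
var user2Salt = GenerateSalt();
var user1Hash = Hash(user1Password + user1Salt);
var user2Hash = Hash(user2Password + user2Salt);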


GenerateSalt() returns a random string. user1Hash and user2Hash are unique, even though the passwords are the same, reducing the overall risk. As long as the output of GenerateSalt() is stored alongside the hash, you can validate that any future input matches the original hash value.
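For anyone who wants to try this out, here's a runnable sketch of the same idea in Python. The function names and the 16-byte salt length are my own choices for illustration, and it uses a single round of SHA-256 purely to mirror the pseudocode - the PBKDF2 section later in this post explains why that's no longer enough for real passwords.

```python
import hashlib
import hmac
import secrets


def generate_salt(num_bytes=16):
    """Return a cryptographically random salt (length is an assumption)."""
    return secrets.token_bytes(num_bytes)


def salted_hash(password, salt):
    """Hash password + salt, mirroring Hash(password + salt) in the pseudocode."""
    return hashlib.sha256(password.encode("utf-8") + salt).digest()


def verify(password, salt, expected_hash):
    """Re-hash the supplied password with the stored salt and compare."""
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(salted_hash(password, salt), expected_hash)


user1_salt = generate_salt()
user2_salt = generate_salt()
user1_hash = salted_hash("password", user1_salt)
user2_hash = salted_hash("password", user2_salt)

print(user1_hash != user2_hash)                     # True: same password, different hashes
print(verify("password", user1_salt, user1_hash))   # True
print(verify("Password1", user1_salt, user1_hash))  # False
```

Both the salt and the hash get stored against the user; at login you look up the user's salt, re-hash whatever they typed and compare.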

There are some recommendations that a salt should be globally unique, but I don't subscribe to this thinking. Yes, a user might use the same password on your website as another website, but the likelihood that you're going to generate exactly the same salt as the other application is extremely slim. I believe this advice exists to discourage developers from using predictable data such as usernames, email addresses or other unique identifiers that the user has control over. My advice is to make sure that you use any framework-provided salt generation methods, and ensure you use the recommended minimum salt length.

Sub-Scenario: Using PBKDF2 instead of SHA for password hashing

Why did my client ask me to strip out SHA-256 and replace it with PBKDF2? SHA-256 is secure enough, right? Well, yes - generally SHA-2 is still considered secure, and the salting I put in place makes it even more so. However, attackers are now using the extreme processing power of GPUs (graphics cards with thousands of cores, processing hundreds of millions - possibly billions - of operations per second) rather than CPUs (standard computer processors with dozens of cores, processing millions of operations per second), which makes password cracking a much quicker affair.

This post on StackOverflow gives some 2009 figures for older CPUs / GPUs - 48 million vs 160 million ops per second respectively, so - applying Moore's Law - you can imagine how much more powerful GPUs are in 2018.
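To put those 2009 figures in perspective, here's a rough back-of-the-envelope calculation. The keyspace (8-character passwords using only lowercase letters) is my own assumption, as is treating one "op" as one password guess against a fast hash such as SHA - so take the results as an order of magnitude, nothing more.

```python
CPU_OPS_PER_SECOND = 48_000_000    # 2009 CPU figure quoted above
GPU_OPS_PER_SECOND = 160_000_000   # 2009 GPU figure quoted above

ALPHABET_SIZE = 26     # assumed: lowercase letters only
PASSWORD_LENGTH = 8    # assumed: 8-character passwords

# Total number of candidate passwords an attacker would have to try
keyspace = ALPHABET_SIZE ** PASSWORD_LENGTH


def minutes_to_exhaust(ops_per_second):
    """Worst-case time to try every candidate, in minutes."""
    return keyspace / ops_per_second / 60


print(keyspace)                                        # 208827064576
print(round(minutes_to_exhaust(CPU_OPS_PER_SECOND)))   # roughly 73 minutes
print(round(minutes_to_exhaust(GPU_OPS_PER_SECOND)))   # roughly 22 minutes
```

Even on 2009 hardware the whole keyspace falls in about an hour, and salting doesn't help here - it stops pre-computed tables, but not an attacker guessing against one user's hash.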

The basic decision points come down to this:
  • SHA is designed to be as fast as possible, and will only get faster as processors increase in speed.
  • PBKDF2 stands for Password-Based Key Derivation Function 2, and as the name suggests, it was specifically designed for deriving keys from passwords, which makes it well suited to password hashing. It is slow by design and iterates X times over a pseudo-random function such as HMAC-SHA-1 or HMAC-SHA-256 to create a derived key.
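As a minimal sketch of the PBKDF2 option, Python's standard library exposes it as hashlib.pbkdf2_hmac. The function names, the 16-byte salt and the iteration count below are my own illustrative choices - check current guidance for your platform before settling on a number.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed for illustration only, not a recommendation


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, derived_key) for a new password using PBKDF2-HMAC-SHA-256."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, derived


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)


salt, derived = hash_password("P@ssword1")
print(verify_password("P@ssword1", salt, derived))  # True
print(verify_password("P@ssword2", salt, derived))  # False
```

The only real difference from a salted SHA-256 hash is the iteration count: every guess the attacker makes now costs them 100,000 HMAC operations instead of one.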

The one drawback to using PBKDF2 is that you have to review the number of iterations used to hash passwords often. As hardware improves you'll have to increase these iterations to continue to keep up with the attackers - I expect that high-end password cracking rigs exist with similar specs to the popular high-end bitcoin mining rigs, as they perform very similar functions. The more GPU power available, the quicker an attacker can succeed in performing a brute force attack.
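One way to make that review less painful is to store the iteration count next to each hash, so old hashes can still be verified and then upgraded the next time the user logs in. The "iterations$salt$hash" record format and the function names below are my own invention for illustration, not a standard.

```python
import base64
import hashlib
import hmac
import secrets

CURRENT_ITERATIONS = 200_000  # assumed; raise this as hardware improves


def _pbkdf2(password, salt, iterations):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def encode_record(password, iterations=CURRENT_ITERATIONS):
    """Build a storable 'iterations$salt$hash' string (hypothetical format)."""
    salt = secrets.token_bytes(16)
    derived = _pbkdf2(password, salt, iterations)
    return "$".join([
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(derived).decode("ascii"),
    ])


def check_record(password, record):
    """Return (is_valid, needs_rehash) for a login attempt."""
    iterations_text, salt_b64, derived_b64 = record.split("$")
    iterations = int(iterations_text)
    candidate = _pbkdf2(password, base64.b64decode(salt_b64), iterations)
    is_valid = hmac.compare_digest(candidate, base64.b64decode(derived_b64))
    # Only flag an upgrade when the password was correct and the count is stale
    return is_valid, is_valid and iterations < CURRENT_ITERATIONS


old_record = encode_record("P@ssword1", iterations=1_000)
print(check_record("P@ssword1", old_record))  # (True, True)
```

When needs_rehash comes back True you still have the plain text password in hand from the login request, so you can call encode_record() again and overwrite the stored value.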

Summary

It seems that brute force is still the preferred vector for password cracking - attackers are leveraging better hardware to perform more operations per second, decreasing the time it takes for a password to be cracked.

There are now more suitable password hashing algorithms available - Argon2 deserves a special mention, though it's still relatively new (2015) and not quite as trusted as PBKDF2.

Of course this blog post is not exhaustive. There are many more steps you can take towards application and network security. Drop me a line if you require any advice on securing your applications or network.

Disclaimer

This blog post aims to provide advice only - I am not a security expert, though I have much more experience in application and network security than I do with implementing and understanding the minutiae surrounding cryptographic functions. Please let me know if you notice any mistakes and I'll update this post ASAP.
