Not mentioned in the article is why salting is important even if rainbow tables aren't part of your threat model:
There is more than one reason, but they all boil down to one fact: two accounts with the same password will hash to the same value if you do not use a salt.
As an example, if an attacker gets a dump of all the hashed passwords, the weak passwords will be immediately apparent just from an analysis of duplicates.
I almost feel as though this is at least half of the point of using salts. We're trying to make things slow, so forcing an attacker to crack the same password twice (or more) in cases where it's reused seems beneficial and very cheap. Weak passwords will still be cracked first but it will still require at least SOME work.
That was the reason salts were used in /etc/passwd. You couldn't tell if John and James had the same password, and you couldn't tell if John used the same password on multiple machines. Or more critically, if the root password was the same on the entire cluster.
As you two have already discussed, it mostly provides herd safety. You can't target the 'weak ones' because all the passwords look roughly the same.
That's kind of the point; if you're worrying at all about salts in your login code, you're using a crappy algorithm. If you're using a decent algorithm, it takes care of the salts.
Sort of a catch-22 though. On the one hand you want all of the important details about your password "encryption" system to be handled by more competent people (read: cryptographic experts). On the other hand, without an understanding of the details you will be too ignorant to tell a good algorithm from a bad one, or to know whom you can trust with such implementation specifics.
Additionally, one might say "well, just follow industry best practices", but that assumes "use bcrypt/scrypt" actually is the industry's best practice, and it's not. The "best practice" as defined by the industry at large is mostly just to use MD5, maybe with a salt, because that's what "everybody else is doing".
People need to read this article very carefully before forming any sort of take-aways.
The title and the start of the article suggest that salts are irrelevant. Then only a few paragraphs later, the author states that salts are indeed crucial.
Later in the article, we learn that this article isn't really about salts at all, but the importance of a high work factor / rounds.
The problem with the way this article is presented is that a casual reader will be left with the idea that salts are no longer important, which is false.
A better title would be something like, "Salts are important, but so is a high work factor"
Careful. Salt quality is almost totally unimportant. And there are no well-known, well-reviewed password hashes that aren't randomized. The lesson I'd take away from an article like this is: "don't DIY your password hash".
Historically, every discussion I've seen that tried to acknowledge the "importance" of salts devolved into people defending their DIY "secret-salt" schemes. "Important salts" seem like an invitation to reject bcrypt, scrypt, and PBKDF2.
I think part of the problem is the term. I think we should stop talking about "salts", and instead just say whether a given password scheme is randomized or not.
Google AppEngine is part of the Google Cloud Platform. It's a pretty big deal.
If you want to use it (at least with Python), you need to roll your own. You can't upload scrypt, bcrypt or PBKDF2 with your app, since native C extensions aren't allowed there.
Or to phrase it another way: the article is extremely poorly written and attempts to be edgy by suggesting something unconventional, only to then roll it back entirely just a few paragraphs later.
The whole thing can be distilled down to: "Make sure you're using a modern hashing algorithm, and pay attention to the work factor." It actually has nothing interesting to say regarding salts and/or peppers that hasn't been said a million times before.
Salt is a requirement but not a panacea; AshleyMadison's salted MD5 proves that well enough. The point of a salt is that the same password does not result in the same hash for two different users. The key point is that attack time then scales linearly with user count, which makes a real difference in the crack rate when the breach is large.
When cracking passwords, to crack the most hashes, your inner loop is iterating the salts from the database while the outer loop moves down the dictionary. Crucially, that means the salt should always be the first bits fed to your hashing function, or else the cracker can optimize away some hashing cycles across users. HMAC will do the right thing here if salt is passed as the 'key' (although the way it expands the key to the block size is somewhat flawed). Feeding the salt to the hashing function first approximately halves the crack rate versus appending salt at the end, by preventing any reuse of hash function state in the inner loop. [1]
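To make the loop-ordering point concrete, here is a rough sketch (my own example using Python's hashlib and a naive sha256(password || salt) construction, not anything from the article): when the salt comes last, the attacker can absorb each dictionary word into the hash state once and reuse that state for every salt in the inner loop.

import hashlib

words = ["password", "letmein", "123456"]   # outer loop: dictionary
salts = [b"d41f", b"9a3c", b"77be"]         # inner loop: per-user salts from the dump

for word in words:
    # With sha256(password || salt), the password-prefix state is computed
    # once per dictionary word...
    prefix = hashlib.sha256(word.encode())
    for salt in salts:
        h = prefix.copy()        # ...and reused for every salt;
        h.update(salt)           # only the salt has to be absorbed here
        candidate = h.hexdigest()
        # compare `candidate` against the dumped hashes

Putting the salt first means that shared prefix doesn't exist, so the full hash has to be recomputed for every (word, salt) pair.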
Serious question: Why isn't there a "best practice" for this? A library everybody uses to store passwords in the most secure way possible, and when more secure ways are found, the library is updated and not your code?
Everything is so low level, manual and error prone for such a basic part of app security.
1) Language fragmentation. Java developers aren't going to use a C# library or a PHP library. So instead we're stuck with, "If you're on Python, use X; if you're on Ruby, use Y..."
2) Password storage is a problem that's easy to get wrong without knowing it. If you have a DIY password storage system that accepts valid passwords, rejects invalid passwords, and looks scrambled when you view them in the database, it's hard for non-experts to notice that it's broken in any other way.
3) The combination of those two problems makes it difficult for the security community to reach consensus on good language-specific libraries. Competent Java security experts can't easily evaluate a PHP scrypt library.
So all they can say is, "allow long passwords, use per-password salts, and use PBKDF2/scrypt/bcrypt with a pretty big work factor, but don't do it yourself."
With HaXe though you might be able to swing something like this. In fact something like passlib (Python) would be beautiful but dangerous - imagine cross-language bugs (i.e. the same bug in Java, C++, Python, etc.).
I wonder if in large organisations it'd be better to have a single service handling password management (with split databases still). That way you could lock it down to one set of IPs, all access monitored and more importantly one password storage implementation.
Such a thing exists, it's called LDAP. Bonus effect, many LDAP implementations (Windows AD, OpenLDAP, Lotus) give you replication and failover for free.
PBKDF2, bcrypt and scrypt each have their own advantages and disadvantages. For embedded devices PBKDF2 is preferable and more performant, and given a different threat model, you may choose one over the other.
Django provides this and you don't have to worry about it. However, they did not for a bit, even as the discussions raged on about how salt + SHA1 is bad. The reason was that Django is 100% pure Python and at the time the only available implementations of any of this were C Python extensions. Ideally, I'd love to see bcrypt, scrypt, and PBKDF2 included directly in Python's stdlib so that people wouldn't have to re-invent the wheel every time they use the language.
pbkdf2 is included in Python's stdlib since 2.7.8[0] and 3.4[1], though sadly the interface isn't great: it outputs only the raw hash, not a modular-crypt-format string[2], and it requires a salt instead of generating one. Passlib[3] provides more algorithms, a much better interface, and handles hash migrations[4] ridiculously cleanly.
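For reference, a minimal sketch of that stdlib interface (you have to generate and persist the salt and iteration count yourself):

import hashlib, os

password = b"correct horse battery staple"
salt = os.urandom(16)        # caller must generate and store the salt
iterations = 500_000         # caller picks the work factor

dk = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
# You then have to store (salt, iterations, dk) yourself; there is no
# self-describing output string as with passlib or bcrypt.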
The library can't be updated automatically, because suddenly all your stored passwords wouldn't work.
If it's a big monolithic library like Spring, they just deprecate APIs and add new ones (which is in fact what they do); if you're in a microframework language like Node, you're on your own to rip out your md5 module and switch it to bcrypt.
> because suddenly all your stored passwords wouldn't work.
Passlib handles that with a concept of "deprecated algorithms"[0]: a CryptContext has a list of algorithms it can accept as inputs to validate against, and a subset of these can be marked as deprecated[1]. Passwords hashed with deprecated algorithms will still validate, but they'll be flagged as needing an upgrade. The `verify_and_update` method returns both the validity of the password and a new hash if it should be upgraded, so the normal pattern is:
valid, new_hash = pass_ctx.verify_and_update(password, old_hash)
if valid:
    # password was valid
    if new_hash:
        # store new hash for user
        ...
else:
    # password wasn't valid
    ...
[1] in recent versions you can even set the deprecated list to "auto", which considers every algorithm but the default (the one used if you encrypt with the CryptContext without requesting anything specific) to be deprecated. It's a thing of beauty.
Django's auth system handles updates safely and cleanly. Don't know about others, but the mechanism is simple:
* The value stored in the DB is actually of the form "algorithm$iterations$salt$hash".
* The current preferred algorithm and iterations are configurable in the settings.
* When logging a user in, the algorithm/iterations in the stored value are used to check the password.
* If the stored value's algorithm/iterations are out of sync with the preferred algorithm/iterations in the settings, then the last step of the login process is to re-hash the (verified correct) submitted password with the new preferred algorithm/iterations, and push that value back to the DB.
This creates a rolling refresh of the stored values as users log in, which in turn lets you gracefully change the algorithm or number of iterations.
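Not Django's actual code, but a rough sketch of that mechanism (hypothetical names, PBKDF2-SHA256 as the example hasher):

import hashlib, hmac, os

# Hypothetical "settings": the preferred algorithm and iteration count.
PREFERRED_ALG = "pbkdf2_sha256"
PREFERRED_ITERATIONS = 600_000

def make_stored(password: str, iterations: int = PREFERRED_ITERATIONS) -> str:
    salt = os.urandom(16).hex()
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), iterations)
    return f"{PREFERRED_ALG}${iterations}${salt}${digest.hex()}"

def check_password(password: str, stored: str, save_new_hash) -> bool:
    alg, iterations, salt, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), int(iterations))
    if not hmac.compare_digest(digest.hex(), digest_hex):
        return False
    # Rolling refresh: re-hash with the preferred parameters if they changed.
    if alg != PREFERRED_ALG or int(iterations) != PREFERRED_ITERATIONS:
        save_new_hash(make_stored(password))
    return True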
Because of all the developers that consume open source code to provide value to their organizations, the proportion who are willing, able and whose organizations allow them (not just permission but resources) to contribute back is effectively zero. Also all the tools required to do so still suck, for the same reason.
For just one example, why - when there's billions of humans that each have hundreds of data points which are relevant to their lives - do we only have maybe hundreds of datastores, reporting tools or visualization tools? Hint: it's not because of the universal applicability and quality of existing products.
Humans aren't very good at considering others or planning very far ahead. On top of that we're not very good yet at training people who can code despite the acute demand for them as everyone transitions paper processes to digital or old digital processes to new digital processes.
Everyone is too busy building yesterday to worry about tomorrow, even on small scales. It seems only outlier organizations have the discipline and foresight to invest in even private internal tools or processes, let alone public ones.
Interesting side note: one of the most successful altcoins (a few years ago, back when people cared about this stuff) was called litecoin. The major difference is that it used scrypt instead of SHA because it was hard to parallelize. The idea was this would keep rich groups from taking over the network by throwing money at hashing clusters.
I built a machine in a milk crate that had 5xR9s to mine it. I broke even and sold it after I got bored :) It was fun for a month or two.
The client's hash is the password but the password file contains salted hashes of the client's hash. An attacker that has the client's password hash can replay it, but they don't know the user's password. An attacker with the password file doesn't know any clients' password hashes or whether any of the password hashes are reused by different users.
Can't vouch that this is the case with Keybase, but where I've seen this done before, the hash is calculated in combination with some kind of time-based shared secret at both the server and the client and then compared, the theory being that it would protect against the TLS connection being MITM'd; all the attacker would get is the hash, which is then useless since it is tied to the time element.
IMHO, MITM would be better defended against using HSTS, certificate pinning, and perhaps DNSSEC.
Technically yes, but I would imagine they'd subsequently hash the hash on the server side as well.
I assume the extra client-side hashing is done to keep the plaintext passwords out of the application memory, not to protect them in transit.
To clarify, this is just an assumption. I have not read up on the topic nor do I claim to be a security expert. This is just what came to mind when I had the same thought as you.
As someone else mentioned, Litecoin. Not in passwords but in "mining" (proof of work). It resisted GPU for a while until it didn't. Although the difference between CPU and GPU was not big (much smaller than with SHA256). The work factor was too small and it couldn't be changed easily. After 4 years, though, nobody "cracked" it.
The work factor of a password database can be changed easily, either by re-hashing every time someone with the old scheme logs in, or by hashing all the hashes more.
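The second option is nice because it can be done offline, without knowing any passwords. Roughly (a sketch with hypothetical names, assuming the old scheme was plain PBKDF2-HMAC-SHA256):

import hashlib

# Strengthen every stored hash by feeding it through more PBKDF2 rounds;
# verification then applies both stages in sequence.
EXTRA_ITERATIONS = 100_000

def strengthen(old_hash: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", old_hash, salt, EXTRA_ITERATIONS)

def verify(password: bytes, salt: bytes, old_iterations: int, strengthened: bytes) -> bool:
    stage1 = hashlib.pbkdf2_hmac("sha256", password, salt, old_iterations)
    return strengthen(stage1, salt) == strengthened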
Chaining hashes seems like a great way to get the benefits of both, and to have an extra layer in case one falls. Why isn't that done more commonly in practice?
I've actually seen it cause numerous issues. For example, consider this pseudocode:
// Returns raw binary data (may contain null bytes)
shaPass = crypto.sha256(userPassword)
// Returns a bcrypt hash of that binary digest
crypto.bcrypt(shaPass)
I've seen many people pass binary data into functions that terminate reading the string at a null byte. This obviously limits the effective input to however many bytes appear before the first null byte in the binary data (this mostly concerns PHP and C).
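One common way around the null-byte issue is to encode the intermediate digest before handing it to bcrypt. A sketch (using the PyPI bcrypt package, not anyone's production code):

import base64, hashlib
import bcrypt  # PyPI "bcrypt" package

def prehash(password: bytes) -> bytes:
    # Base64-encode the SHA-256 digest so bcrypt never sees a null byte
    # (and the input stays well under bcrypt's 72-byte limit).
    return base64.b64encode(hashlib.sha256(password).digest())

stored = bcrypt.hashpw(prehash(b"hunter2"), bcrypt.gensalt())
assert bcrypt.checkpw(prehash(b"hunter2"), stored)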
Just noticed someone else posted the ircmaxwell blog, which is the best writing on this topic.
I never understand why PBKDF2 gets so little love. It's easy to use, it's easy to implement correctly, it's standardised. scrypt is cooler, but PBKDF2 is just fine.
- The work factor depends on the length of the derived key you store. For example, if you compute and store 64 bytes of PBKDF2-HMAC-SHA256, an attacker only needs to compute the first 32 bytes to verify a guess, and with PBKDF2 that means doing half the work.
- In most common use (with HMAC), it inherits HMAC's "vulnerability": a password longer than the hash function's block size is equivalent to its hash (see this Twitter thread: https://twitter.com/dchest/status/421595430539894784). For example, `plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd` and `eBkXQTfuBqp'cTcar&g*` have the same PBKDF2-HMAC-SHA1 hash; there's a short demo after this list. (Scrypt is also vulnerable, BTW, as it uses PBKDF2-HMAC-SHA256.)
- PBKDF2 is commonly used with hashes which are fast on GPU or custom hardware and require tiny memory, thus making defense/attack cost ratio very bad. See scrypt paper for cost estimates (https://www.tarsnap.com/scrypt/scrypt.pdf).
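The demo mentioned in the second bullet, as a self-contained sketch: HMAC-SHA1 replaces any key longer than its 64-byte block with SHA-1(key), so a 65-byte password and its 20-byte SHA-1 digest (which the linked thread reports is the printable string above) collide under PBKDF2-HMAC-SHA1 for every salt.

import hashlib

long_pw = b"plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd"  # 65 bytes > block size
short_pw = hashlib.sha1(long_pw).digest()   # the 20-byte digest of the long password

salt = b"whatever"
assert hashlib.pbkdf2_hmac("sha1", long_pw, salt, 1000) == \
       hashlib.pbkdf2_hmac("sha1", short_pw, salt, 1000)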
BTW, we had Password Hashing Competition (https://password-hashing.net), the (tweaked) winner of which will hopefully be widely used in the future.
* If you use the underlying hash as a black-box, each HMAC invocation compresses 4 blocks. An optimized implementation will only require 2. Many implementations (e.g. the .NET one) don't include that optimization and thus suffer from a 2x slowdown compared to the attacker's implementation.
It xors together the outputs of different iteration numbers. This can interfere with the feed-forward of certain hash constructions. As far as I can tell this does not cause security issues, but it's still very dirty.
---
Personally I'd go so far as to say that it's worse than PBKDF1 with a sufficiently wide underlying hash.
> - The work factor depends on the length of hash. For example, if you compute and store 64 bytes of PBKDF2-HMAC-SHA256, an attacker needs to only compute the first 32 bytes to verify the guess, and with PBKDF2 it means doing half the work.
That's a fair criticism. In that respect, a better design might have been to generate a seed S using the PBKDF2 core, then just spit out HMAC(S, 1), HMAC(S, 2) for the derived key. I don't know if that would have had its own issues though; I'm not a cryptologist.
> In most common use (with HMAC), it inherits HMAC "vulnerability": a password longer than the hash function block is equal to the its hash
True, but as you imply it's really not a vulnerability. And it's certainly better than limiting passwords to 56 bytes, as bcrypt does (with human-memorable passwords, 56 bytes really aren't enough).
> - PBKDF2 is commonly used with hashes which are fast on GPU or custom hardware and require tiny memory, thus making defense/attack cost ratio very bad.
Sure, scrypt is better. But scrypt is not what 'everyone' uses: bcrypt is.
PBKDF2 is okay, but the underlying algorithms (usually SHA256) are a little too easy to accelerate with GPUs and ASICs. scrypt can also be accelerated with GPUs, but the difference is much smaller due to its design.
PBKDF2 is still widely used in applications that need to be small and simple, such as embedded devices and disk encryption tools. For hashing website passwords, though, it is showing its age.
Disclaimer: I'm not a cryptographer, so I might be talking out of my ass.
That depends on whether someone else has already implemented it for you.
In PHP, for example, bcrypt is just `password_hash()`. To use PBKDF2, on the other hand, you have to generate your own salt and pass it to `hash_pbkdf2`.
scrypt has the advantage that it can be memory-tuned and is designed to be both CPU-hard and memory-hard. PBKDF2 and bcrypt are only CPU-hard.
Though IME it's bcrypt which is considered cooler, scrypt is regularly mentioned in KDF discussions but not widely available let alone used.
The big advantage of bcrypt is that its cost factor is exponential rather than linear, so changes to the cost factor are much smaller and less frightening: to double the cost you have to double the PBKDF2 iteration count, whereas with bcrypt you just increase the cost factor by 1, and to track Moore's Law you only need to increment it by 1 every couple of years. And that's ignoring that PBKDF2 can also be tuned by choosing the hash function.
So is PBKDF2 just fine? Yeah, that's what I generally use. But it's easier to create a simple and correct interface for bcrypt; it has fewer moving parts, and the parts which do move move slower.
Reading this title I was wondering what news they have over PBKDF2/scrypt/bcrypt and whether it would really be better. Read through it quickly and was disappointed. Contrary to what they claim, this advice was popular even in 2010: https://security.stackexchange.com/questions/211/how-to-secu...
"The multiple salts don’t really do much since they’re all likely known to an attacker or can be quickly calculated given knowledge of the other two component"
I've never heard before that you can calculate the sitewide salt (i.e. in memory and unknown to the attacker) from having both the user password and the per-user salt.
How is this done?
The second half of that sentence is false, assuming the site salt has enough entropy, i.e. 32 bytes from a CSPRNG.
However, if your server is compromised, pulling the site-wide salt (aka 'pepper') from memory is presumably trivial, so you typically assume it is known to the attacker.
I'm not really familiar with the field, is it really trivial to compromise both the database and the application server? And is it really that trivial to gain data out of memory?
Is there a good resource that reviews attacks that actually occurred and analyzes what and how it could have been prevented?
The same way you would calculate the password if you knew the site salt and the per-user salt.
Or the password in case there's just a user_pw+salt :)
In that case it's simply that the "site salt" is the "password" you're after: since you know the 2 other variables, you just randomly generate the 3rd one, hash it and see if it matches.
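A hypothetical sketch of that point (assuming a toy sha256(password + salt + pepper) scheme and a deliberately tiny pepper, just to show the loop): if the attacker already knows one user's password, per-user salt and stored hash, a low-entropy site-wide value is just one more brute-force target.

import hashlib, itertools, string

def h(password, salt, pepper):
    return hashlib.sha256(password + salt + pepper).digest()

known_password, known_salt = b"hunter2", b"a1b2c3"
stored = h(known_password, known_salt, b"zq")   # pretend dump entry; secret pepper is "zq"

for cand in itertools.product(string.ascii_lowercase, repeat=2):
    pepper = "".join(cand).encode()
    if h(known_password, known_salt, pepper) == stored:
        print("recovered pepper:", pepper)
        break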
Except that salts do not need to be remembered and as such should and usually are randomly generated with high entropy and are unlikely to be brute forced like normal passwords.
These days, what is wrong with rolling your own encryption of passwords using key strengthening and salting?
Like this:
hash = sha1 applied recursively 4071 times ( password . salt ) . salt
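In code, roughly (assuming "." means concatenation and the salt is stored alongside the digest):

import hashlib

def diy_hash(password: bytes, salt: bytes, rounds: int = 4071) -> bytes:
    # sha1 applied recursively `rounds` times to (password . salt),
    # with the salt appended at the end for storage
    digest = password + salt
    for _ in range(rounds):
        digest = hashlib.sha1(digest).digest()
    return digest + salt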
Serious question. Don't just say religiously, "just use bcrypt". Tell me what is really wrong. What attacks will succeed today? Any crypto enthusiasts in the audience?
The non-expert answer is: Because crypto math is difficult for randos to analyze, and it's not obviously the case that running an algorithm 4,000 times is 4,000 times harder to break than running it once.
And from a purely pragmatic perspective, why even consider it? You have a bcrypt/scrypt library, writing your own code is harder and almost certainly worse, so there is literally no reason to spend even a second thinking about it.
Yes, there are reasons to think about it. For example if I want to understand the principles that make the usual algorithm strong.
There is a reason why Apple makes every app implement its own receipt validation scheme. So a dedicated attacker can't crack all apps at once. Same here.
Modern password hashes are designed to use a large amount of RAM in addition to CPU time in order to make password cracking using ASICs and GPUs more difficult. The paper on Argon2[1], the winner of the recent password hashing competition, would be a good read if you're interested in learning more about how password hashes are designed.
So you essentially end up with PBKDF2; as long as you're using an HMAC configuration with your hashing function you should be OK. That said, just go ahead and use a PBKDF2 library so that you match what others do and have an easier time with interoperability if you have to use a different language or environment somewhere else with the same set of users.
What others said, but also for very practical reasons. Why roll your own at all when this is an area where lots of FOSS code is already available? I see two broad categories here: your environment already has an scrypt/bcrypt/PBKDF2 implementation, so just use that, as you would a regular-expressions library or a URL parser; or your environment does not, in which case you should grab the spec for scrypt/bcrypt/PBKDF2 (whichever you think is easiest to implement) and implement it. Inventing your own risks errors, while creating a system that's incompatible with what other systems are doing and doesn't follow any kind of researched security protocol.
Just out of principle. You are mis-using crypto primitives, which can lead to bugs so subtle you would never guess they're there. And thanks to the random looking hashes, you will never immediately see that i492tr7gf89que092378gt827 is actually a broken hash and not a strong one.
Besides getting the scheme right, the biggest problem is the implementation. There are lots of little things to consider; I can't list them because honestly I've only briefly studied the many gotchas of previous holes in other implementations. Can you build your own suspension bridge? Sure. Would I drive over it? Nope.
That being said, if you wanted to build your own _for fun_, do it! You will learn a lot in the process. Just don't expect your project to be over any time soon.
In practice, rolling your own strengthened/salted hash is probably fine if you use a decent hash function, but why bother? The code already exists to do it for you, and using a standard function allows for better interop and portability.
It will also (likely) have been designed and/or checked by people with domain knowledge, and so is more likely to avoid subtle attacks - just repeating a hash function n times might not offer the security you expect (e.g. 2DES is barely any stronger than a single DES pass - you need 3DES to make much of a difference).
How so? Currently known symmetric crypto algorithms are secure against attacks by quantum computers. More than that, hash functions, which are building primitives of password hashing functions, are used in "post-quantum crypto", e.g. for signatures (https://en.wikipedia.org/wiki/Post-quantum_cryptography#Hash..., http://sphincs.cr.yp.to/). (Although more research should be done on memory-hard functions.)