Password storage can be tricky because of the possibility of them being stolen. To make matters worse, users frequently repeat passwords across several services, which increases the importance of securely storing them. The idea of securely storing passwords is to prevent hackers from being able to figure out any user’s actual password, even if the database hosting them is stolen. This disallows the use of plain-text password storage. Given that the attacker would not be aware of the real passwords, using encryption may seem like a wise decision (because they are encrypted). The encryption keys would, however, most likely1 be compromised if the database were compromised. This method of storage is insecure since the attacker might decrypt the encrypted passwords using these keys. Here, hashing or hash functions are utilized.
What Are Hash Functions?
They are functions that have these properties:
- Preimage resistance: Given the output of a hash function OutIt should be hard to find any input In, which when hashed, results in the same output (hash(In) = Out). For example, if I take a random SHA256 hash output (string data type) like “401357cf18542b4117ca59800657b64cce2a36d8ad4c56b6102a1e0b03049e97”, it should be very hard to know what the input to the hash function was that resulted in this output. Try googling your way into finding it!
- Second preimage resistance: If the input to a hash function is known, it should be hard to find another input that has the same hashed output.
- Collision resistance: This says that it is hard to find any two inputs such that their hashed output is the same. This is slightly different from (2) since in (2), you are given one input, and in this case, you can cook up any input.
- Predictability: The hash function should always return the same output given the same input.
- Fixed Length Output: The output of the hash function always has the same length (number of chars), regardless of the input’s length.
- Input sensitivity: A small change in the input (even just one character), should have a large change in the output string. For example, the SHA256 hash of “hello” is “2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”. But the hash of “hella” is “70de66401b1399d79b843521ee726dcec1e9a8cb5708ec1520f1f3bb4b1dd984”. As you can see, the outputs are verified differently.
All in all, hash functions are “one-way functions.” If you only know the output, it’s impossible/difficult to know its input. This is unlike encryption in which given the output and the encryption key, you can know the input.
Due to this one-way property, storing a hashed value of a password is a good idea since if their hash is compromised (via a database leak), the attacker would not know the original password (which is the input to the hash function). In fact, the only “entity” which would know the input to the hash function would be the end-user who generated the password in the first place. This is exactly what we want from a security point of view.
What Is Salting and Why Hashing Alone Is Not Good Enough – Problems With Humans
Many people tend to use common passwords like “password”, “12345” etc. Since the hash of the same input never changes (see property (4) above), we can precompute the hash of common passwords and then check the leaked database data against those precomputed hashes.
For example, the SHA256 hash of “12345” is “5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5”. If this hash is seen in a database, we know that the user’s password is “12345”. In fact, there is an entire database of precomputed hashes that can be checked against. These are called rainbow tables.
The way to solve this problem is to add some random string, known as “salt,” to a password before hashing it (during the signup process), and then we append that random string to the computed hash before storing it in the database. Let’s take an example:
- Alice’s password: “12345”
- Bob’s password: “12345”
- Alice’s random string (salt): “ab$45”
- Bob’s random string (salt): “ih&g3”
- Alice’s modified password: “12345ab$45”
- Bob’s modified password: “12345ih&g3”
- Alice’s SHA256 hash: “2bb12bb768eb669f0e4b9df29e22a00467eb513c275ccfff1013288facac7889”
- Bob’s SHA256 hash: “b63400702c6f012aeaa57b5dc7eefaaaf3207cc6b68917911c410015ac0659b2”
As you can see, their computed hashes are completely different, even though their passwords are the same. Most importantly, these hashes won’t be in any rainbow table since they are generated from fairly random strings (“12345ab$45” and “12345ih&g3”).
Before we store these hashes in the database, we must append the salt to them:
- Alice’s hash + salt: “2bb12bb768eb669f0e4b9df29e22a00467eb513c275ccfff1013288facac7889.ab$45”
- Bob’s hash + salt: “b63400702c6f012aeaa57b5dc7eefaaaf3207cc6b68917911c410015ac0659b2.ih&g3”
The reason we append the salt to the hash is so that during the verification process, we have to use the same salt as we did originally. So we must store this somewhere. Even if the salt is compromised, it’s not a security issue since the attacker would still need to know/guess the user’s password to be able to generate the same hash.
Let’s have a look at how the verification process happens:
- Alice’s password: “abcdef” (Incorrect password)
- Alice’s salt: “ab$45” (fetched from the db)
- Alice’s modified password: “abcdefab$45”
- Alice’s SHA256 hash: “c5110931a3ae4762c1c0334d8eeba8c9c555962cf7d2750fdd732936319a058c”
- Alice’s hash + salt: “c5110931a3ae4762c1c0334d8eeba8c9c555962cf7d2750fdd732936319a058c.ab$45”
Since the computed hash + salt doesn’t match the one in the database, we reject this password. If Alice was to enter her correct password (“12345”), it would indeed generate the same hash + salt as the one in the database, verifying her identity.
Which Hash Function to Choose?
Even after salting, the issue of brute force attacks still remains. An attacker could repeatedly guess different passwords (very quickly) to see which one matches the leaked hash. There are two dimensions that determine how quickly an attacker can find a match:
- The randomness and length of the user’s password.
- The time it takes for the hash function to compute the hash
If a user uses a random and long enough password, the chances of the attacker guessing that exact string reduces. This means they have to crunch through more guesses which will take more time. Here is a cool tool that estimates how much time it would take to guess a password.
The slower and more computationally expensive the hashing function, the more time it would take to validate each guess. As of this writing (2nd March 2022), the recommended hashing technique uses Argon2id with a minimum configuration of 15 MiB of memory, an iteration count of 2, and 1 degree of parallelism.
As computational power increases, recommended hashing techniques change as well. Even if the algorithm remains the same, the recommended number of “rounds” / amount of “work” that should be done per password hash may increase.
Example Code
NodeJS
import * as argon2 from "argon2"; import * as crypto from "crypto"; const hashingConfig = { // based on OWASP cheat sheet recommendations (as of March, 2022)parallelism: 1, memoryCost: 64000, // 64 mbtimeCost: 3 // number of itetations } async function hashPassword(password: string) { let salt = crypto.randomBytes(16); return await argon2.hash(password, { ...hashingConfig, salt, }) } async function verifyPasswordWithHash(password: string, hash: string) { return await argon2.verify(hash, password, hashingConfig); } hashPassword("somePassword").then(async (hash) => { console.log("Hash + salt of the password:", hash) console.log("Password verification success:", await verifyPasswordWithHash("somePassword", hash)); });
The above produces the following output:
Hash + salt of the password: $argon2i$v=19$m=15000,t=3,p=1$tgSmiYOCjQ0im5U6NXEvPg$xKC4V31JqIK2XO91fnMCfevATq1rVDjIRX0cf/dnbKY Password verification success: true
If you run the above program, it will produce a different hash each time since the salt is regenerated each time.
GoLang
package main import ( "crypto/rand""crypto/subtle""encoding/base64""errors""fmt""log""strings" "golang.org/x/crypto/argon2" ) type params struct { memory uint32 iterations uint32 parallelism uint8 saltLength uint32 keyLength uint32 } func main() { p := ¶ms{ memory: 64 * 1024, // 64 MB iterations: 3, parallelism: 1, saltLength: 16, keyLength: 32, } encodedHash, err := generateHashFromPassword("somePassword", p) if err != nil { log.Fatal(err) } fmt.Println("Hash + salt of the password:") fmt.Println(encodedHash) match, err := verifyPassword("somePassword", encodedHash) if err != nil { log.Fatal(err) } fmt.Printf("\nPassword verification success: %v\n", match) } func generateHashFromPassword(password string, p *params) (encodedHash string, err error) { salt, err := generateRandomBytes(p.saltLength) if err != nil { return "", err } hash := argon2.IDKey([]byte(password), salt, p.iterations, p.memory, p.parallelism, p.keyLength) // Base64 encode the salt and hashed password. b64Salt := base64.RawStdEncoding.EncodeToString(salt) b64Hash := base64.RawStdEncoding.EncodeToString(hash) // Return a string using the standard encoded hash representation. encodedHash = fmt.Sprintf("$argon2id$v=%d$m=%d,t=%d,p=%d$%s$%s", argon2.Version, p.memory, p.iterations, p.parallelism, b64Salt, b64Hash) return encodedHash, nil } func generateRandomBytes(n uint32) ([]byte, error) { b := make([]byte, n) _, err := rand.Read(b) if err != nil { return nil, err } return b, nil } func verifyPassword(password, encodedHash string) (match bool, err error) { // Extract the parameters, salt and derived key from the encoded password// hash. p, salt, hash, err := decodeHash(encodedHash) if err != nil { return false, err } // Derive the key from the other password using the same parameters. otherHash := argon2.IDKey([]byte(password), salt, p.iterations, p.memory, p.parallelism, p.keyLength) // Check that the contents of the hashed passwords are identical. Note// that we are using the subtle.ConstantTimeCompare() function for this// to help prevent timing attacks.if subtle.ConstantTimeCompare(hash, otherHash) == 1 { return true, nil } return false, nil } func decodeHash(encodedHash string) (p *params, salt, hash []byte, err error) { vals := strings.Split(encodedHash, "$") if len(vals) != 6 { return nil, nil, nil, errors.New("the encoded hash is not in the correct format") } var version int _, err = fmt.Sscanf(vals[2], "v=%d", &version) if err != nil { return nil, nil, nil, err } if version != argon2.Version { return nil, nil, nil, errors.New("incompatible version of argon2") } p = ¶ms{} _, err = fmt.Sscanf(vals[3], "m=%d,t=%d,p=%d", &p.memory, &p.iterations, &p.parallelism) if err != nil { return nil, nil, nil, err } salt, err = base64.RawStdEncoding.Strict().DecodeString(vals[4]) if err != nil { return nil, nil, nil, err } p.saltLength = uint32(len(salt)) hash, err = base64.RawStdEncoding.Strict().DecodeString(vals[5]) if err != nil { return nil, nil, nil, err } p.keyLength = uint32(len(hash)) return p, salt, hash, nil }
Python
import argon2 argon2Hasher = argon2.PasswordHasher( time_cost=3, # number of iterations memory_cost=64 * 1024, # 64mb parallelism=1, # how many parallel threads to use hash_len=32, # the size of the derived key salt_len=16 # the size of the random generated salt in bytes ) password = "somePassword" hash = argon2Hasher.hash(password) print("Hash + salt of password", hash) verifyValid = argon2Hasher.verify(hash, password) print("Password verification success:", verifyValid)
Java
import de.mkammerer.argon2.Argon2; import de.mkammerer.argon2.Argon2Factory; public class PasswordHashing {public static void main(String[] args) { // salt 32 bytes// Hash length 64 bytes Argon2 argon2 = Argon2Factory.create( Argon2Factory.Argon2Types.ARGON2id, 16, 32); char[] password = "somePassword".toCharArray(); String hash = argon2.hash(3, // Number of iterations64 * 1024, // 64mb1, // how many parallel threads to use password); System.out.println("Hash + salt of the password: "+hash); System.out.println("Password verification success: "+ argon2.verify(hash, password)); } }
Happy Learning …