Why It Matters Whether Hashed Passwords Are Personal Information Under U.S. Law

On January 22, 2021, Bleeping Computer reported on yet another data dump by the hacker group Shiny Hunters, this time involving a clothing retailer. Shiny Hunters is known for exfiltrating large databases of customer information, often through misconfigured or otherwise compromised databases. These databases typically contain credential information for customers, as was the case here. What made this report somewhat unusual was that Bleeping Computer also reported that: “The passwords stored in the database are hashed using SHA-256 or SHA-512 according to threat actors who have started to analyze the database. One threat actor claims to have already cracked the passwords for 158,000 SHA-256 passwords but has been unable to crack the SHA-512 passwords.” This revelation explicitly highlights what is increasingly becoming an important legal question: Are hashed passwords secure? Or, perhaps more importantly from a legal perspective, does an unauthorized person having access to a username/email address and an accompanying hashed password “permit access to an online account?”

We’ve known for some time that the EU considers hashed passwords to be personal information under GDPR and has specifically advised against using well-known hashing algorithms such as MD5 and SHA-1. Similarly, NIST has recommended that federal agencies stop using SHA-1 for generating digital signatures, generating time stamps, and other applications. Yet hashed values have continued to be touted in the U.S. as secure, deidentified data points.

The classification of hashed values is critically important because whether the information permits access to an online account can determine whether it is personal information for the purposes of some breach notification statutes, as well as the private right of action in CPRA. This argument has already been advanced in California, in Atkinson v. Minted, Inc., 3:20-cv-03869-JS (N.D. Cal. June 2020) (see First Amended Complaint at Par. 13 “Because passwords that are merely ‘hashed’ and ‘salted’ are not encrypted, they ‘can be accessed and used even while […] redacted with different levels of utility based on how much manipulating of the data is done to protect privacy.’ [Citation omitted]. Therefore, at a minimum, the PII disclosed in the Data Breach included user passwords that would permit sophisticated hackers like the Shiny Hunters to access to an online account.”) Understanding the implications of these hacks and their potential impact on litigation requires some technical background on hashing and why it is used.

A “hash” of a password is the result of a hashing function applied to the password, and it is used to avoid storing a password in plain text while also allowing a quick and easy evaluation of credentials for a site. The hashing function takes the password and scrambles it with a large number of simple rote operations, with the intent of making it impossible to determine the password from the hash even if the hashing function used is known. As an example, consider a simple computer model of a pool table with a perfectly uniform friction surface, the balls racked precisely at one spot on one end and the cue ball placed precisely on the spot at the other. Only a few inputs, such as the angle and force of the cue stick hitting the cue ball, and a few hard-coded laws of physics will determine the final position of all the balls once they come to rest after a break. However, even if the laws of physics are simple and mechanical to apply, the complexity of the interaction of all the balls means that it would be impossible to discern the input values by looking at the result, in this case the final position of all the balls. For non-trivial inputs where the balls moved significantly, the resulting position of the balls would give absolutely no information about the inputs, even for someone well versed in the laws of physics. Another feature of this example is that, given a precise set of final positions and the inputs that purportedly generate them, it would be trivial to confirm the match by plugging in the inputs and running the model.

These features — being effectively impossible to determine the input from the outputs alone even knowing the rules to generate the output and being relatively easy to confirm if an input and output match — are the characteristics of what are referred to in mathematics as one-way functions. As the name suggests this is because they are easy to compute one way and effectively impossible to compute the other. Hashing is a one-way function with the password being the input and the “hashed password” being the output, which means that possession of a password and hashed password pair makes it trivial to check if they match but possession of just the hashed password makes it effectively impossible to determine the plain text password. Unfortunately, however, even if a hashing algorithm is effective in not allowing reverse calculation of the password, it does not mean it is entirely secure as there are other mechanisms that can effectively reveal the plain text password.
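The two properties described above can be illustrated with a short Python sketch. It uses SHA-256 from the standard library only because that algorithm is named in the reported breach; the helper names `sha256_hex` and `matches` are hypothetical and chosen for illustration, and a bare, unsalted hash like this is not presented as a recommended way to store passwords.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def matches(candidate: str, stored_hash: str) -> bool:
    """The easy direction: hash the candidate and compare to the stored hash."""
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(sha256_hex(candidate), stored_hash)


# What a site would store instead of the plain text password.
stored = sha256_hex("correct horse battery staple")

print(len(stored))                                        # length of the hex digest
print(matches("correct horse battery staple", stored))    # right password
print(matches("correct horse battery stapler", stored))   # one character off
```

Checking a guess takes one call to the hash function. Nothing in the library offers the reverse operation, so an attacker holding only `stored` is left to guess inputs and test them one at a time.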

While all industry standard hashing algorithms may make it effectively impossible to work backward from a hashed password to determine a password under current computational limits, there are other techniques that can make hashed passwords insecure. If a hacker gets a set of hashed passwords, one simple attack vector is to generate a table of possible passwords, run the hashing algorithm to produce their corresponding hash values and then compare those values against the stolen hashes for matches. Given the processing power available, hackers have generated, and continue to generate, enormous tables that contain anything from every possible combination of values for shorter passwords to lists including variants of common and known passwords. These are commonly referred to as rainbow tables, and if the password used is included in one of these tables, then the cleartext password is known to the hacker. There are two factors that impact the effectiveness of these attacks. The first is password complexity, because hackers continue to generate complete dictionaries of all combinations of shorter and simpler passwords, such as those containing only letters or numbers. The longer and more complex a password is, the less likely it is to be part of a complete dictionary of possible values, because the sheer number of combinations is too great to compute.
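The table attack described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a plain dictionary mapping each hash to its password, whereas true rainbow tables use a more compact chained structure to save storage. The candidate list, the function names and the sample passwords are all hypothetical.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_lookup_table(candidates):
    """Precompute hash -> plain text for every candidate password."""
    return {sha256_hex(p): p for p in candidates}


def crack(leaked_hashes, table):
    """Return {hash: password} for every leaked hash found in the table."""
    return {h: table[h] for h in leaked_hashes if h in table}


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


# A tiny stand-in for the enormous lists attackers actually compile.
common_passwords = ["123456", "password", "qwerty", "letmein"]
table = build_lookup_table(common_passwords)

# Two unsalted hashes as they might appear in a stolen database.
leaked = [sha256_hex("password"), sha256_hex("T7#vq9!zLm2$Xw")]
recovered = crack(leaked, table)

print(list(recovered.values()))   # only the common password is recovered
print(keyspace(26, 8))            # 8 lowercase letters
print(keyspace(94, 14))           # 14 printable ASCII characters
```

The common password is recovered with a single dictionary lookup, while the long random one is simply absent from the table. The `keyspace` figures show why: every extra character multiplies the number of entries a complete table would need.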

Another defense against these rainbow table attacks is called “salting,” which involves combining an additional value with the password before running the hash. This means the resulting hash will not match the rainbow tables. For example, if the user uses “password” as their password, the resultant hash is undoubtedly already in a rainbow table, as it is arguably the most common password used. But if the site adds “2@3” to the front and back prior to hashing, the resultant hash is for 2@3password2@3. This value is highly unlikely to be in a rainbow table even though the user chose one of the most common passwords. The effect of adding a salt, even if the value is known to an attacker, is that pre-existing rainbow tables are ineffective and a hacker would have to generate a new one, which is time consuming and costly. An even more secure use of salting is applying a salt value that is kept secret. This is recommended in NIST Special Publication 800-63B “Digital Identity Guidelines” (June 2017). With a secret salt, the site can always check a password entered by adding the secret value, running the hash and comparing the result to its stored hash. As long as the salt value remains secret, this is a very effective defense against rainbow table attacks and other current methods of attack. Unfortunately, many sites do not use secret salt values or require complex passwords, and therefore the relative security of hashed passwords varies tremendously.
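A minimal Python sketch of the salting ideas above follows. The first function reproduces the “2@3” example from the text. The per-user random salt and the HMAC-based keyed hash are assumptions added for illustration: they are common ways to implement a salt and a secret value, not a description of any particular site's practice, and all function and variable names are hypothetical.

```python
import hashlib
import hmac
import secrets


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a string as a hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def fixed_salt_hash(password: str, salt: str = "2@3") -> str:
    """The example from the text: salt added to the front and back."""
    return sha256_hex(salt + password + salt)


def random_salt_hash(password: str):
    """Assumed variant: a fresh random salt per user, stored beside the hash."""
    salt = secrets.token_hex(16)
    return salt, sha256_hex(salt + password)


def keyed_hash(password: str, salt: str, secret_key: bytes) -> str:
    """Assumed variant of a secret value: HMAC keyed with a server-side secret."""
    message = (salt + password).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(password: str, salt: str, secret_key: bytes, stored: str) -> bool:
    """Recompute the keyed hash for a login attempt and compare."""
    return hmac.compare_digest(keyed_hash(password, salt, secret_key), stored)


# The salted hash of "password" no longer matches the well-known unsalted hash.
print(fixed_salt_hash("password") == sha256_hex("password"))

# Two users with the same password get different salts and different hashes.
salt_a, hash_a = random_salt_hash("password")
salt_b, hash_b = random_salt_hash("password")
print(hash_a == hash_b)

# The secret key would be held by the site, separately from the database.
secret_key = secrets.token_bytes(32)
stored = keyed_hash("password", salt_a, secret_key)
print(verify("password", salt_a, secret_key, stored))
```

With a known salt, an attacker must rebuild a table for that salt. With a secret key that was not stolen along with the database, candidate hashes cannot be computed at all. In practice, sites often also use deliberately slow key derivation functions such as PBKDF2 so that each guess costs more; that is outside the scope of this sketch.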

The question of whether a hashed password “permits access” to an online account is a complex one that has not been fully addressed from a legal standpoint. On one end of the spectrum, a hashed password could be considered to never permit access to an online account because, even if the actual password might be determined from the hashed value, the hash itself does not permit access, as it always requires some additional hacking effort to determine the plain text password. On the other end of the spectrum is the argument that if a password could be determined for one or more hashed passwords, the whole set “permits” access. Most likely, if this issue is fully litigated, courts will end up somewhere in the middle of this spectrum, but that remains to be seen. Another complicating factor is that technical advances in computing power will continue to move the needle on the effectiveness of existing attacks, perhaps establishing practices such as hashing without a salt value as per se insecure.

As cyberattacks continue to grow at a dramatic rate, coupled with what appears to be renewed interest among cybercriminals in “cracking” hashed passwords, it seems likely that hashing will come under increased scrutiny both in the courts and in the minds of the public at large. It will be important for companies not only to review their hashing techniques and employ best practices, but also to pay attention to technical advances available to both the company and the hackers.