Is it possible for data that has undergone hashing to still be considered “personal information”?


Hashing refers to the process of using an algorithm to transform data of any size into a fixed-size output (e.g., a combination of numbers and letters) that is, for practical purposes, unique to the input.  To put it in layman’s terms, some piece of information (e.g., a name) is run through an equation that creates a string of characters.  Anytime the exact same name is run through the equation, the same string of characters will be created.  If a different name (or even the same name spelled differently) is run through the equation, an entirely different string of characters will emerge.
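
The behavior described above can be illustrated with a minimal sketch.  It assumes SHA-256 as the hashing algorithm (no particular algorithm is named here), and the names and the helper function `hash_value` are made up for illustration.

```python
import hashlib

def hash_value(text: str) -> str:
    """Return the SHA-256 digest of text as a 64-character hex string."""
    # SHA-256 is an assumption; other hashing algorithms behave similarly.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = hash_value("Jane Smith")    # made-up name
second = hash_value("Jane Smith")   # exact same input
variant = hash_value("Jane Smyth")  # same name, spelled differently

print(first == second)   # identical inputs give identical outputs
print(first == variant)  # a one-letter change gives a different output
print(len(first))        # the output length does not depend on the input
```

In this sketch the output is always 64 hexadecimal characters, whether the input is a short name or a long document.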

While the output of a hash cannot be immediately reversed to “decode” the information, if the range of input values that could have been submitted to the hash algorithm is known, those values can be replayed through the hash algorithm until there is a matching output.  The matching output would then confirm, or indicate, what the initial input had been.  For instance, if a Social Security Number was hashed, the number might be reverse engineered by hashing all possible Social Security Numbers and comparing the resulting values.  When a match is found, the person running the comparison would know which Social Security Number created the hash string.  The net result is that while hash functions are designed to mask personal data, they can be subject to brute force attacks.
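
The replay technique described above might look like the following sketch.  To keep it quick to run, it assumes the first five digits of the number are already known, so only 10,000 candidates need to be tried.  The nine-digit value, the SHA-256 algorithm, and the function names are all assumptions made for illustration.

```python
import hashlib
from typing import Optional

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def recover_number(target_hash: str, known_prefix: str) -> Optional[str]:
    """Hash every nine-digit candidate that starts with known_prefix
    (assumed to be five digits) and return the one whose hash matches."""
    for n in range(10_000):
        candidate = f"{known_prefix}{n:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # no candidate in this range produced the target hash

# Made-up nine-digit number standing in for a hashed Social Security Number.
leaked_hash = sha256_hex("123456789")

recovered = recover_number(leaked_hash, "12345")
print(recovered)
```

Without a known prefix the same loop would simply run over all nine-digit combinations, which is at most one billion candidates.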

Whether a hash value in and of itself is considered “personal information” depends upon the particular law or regulation at issue.

In the context of the CCPA, information is not “personal information” if it has been “deidentified.”1  Deidentification means that the data “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.”2  An argument could be made that data, once hashed, cannot reasonably be associated with an individual.  That argument is strengthened under the CCPA if a business takes the following four steps to help ensure that the hashed data will not be reidentified:3

  1. Implement technical safeguards that prohibit reidentification.  Technical safeguards may include the process or techniques by which data has been deidentified.  For example, this might include the hashing algorithm being used, or combining the hashing algorithm with other techniques that are designed to further obfuscate information (e.g., salting).4
  2. Implement business processes that specifically prohibit reidentification.  This might include an internal policy or procedure that prevents employees or vendors from attempting to reidentify data or reverse hashed values.
  3. Implement business processes to prevent inadvertent release of deidentified information.  This might include a policy against disclosing hashed values to the public.
  4. Make no attempt to reidentify the information. As a functional matter, this entails taking steps to prohibit reidentification by the business’s employees.
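
As an illustration of the salting mentioned in the first step, the sketch below adds a random value (the “salt”) to the input before hashing.  It assumes SHA-256, a 16-byte salt, and that the salt is stored apart from the hashed values.  The function name `salted_hash` and the nine-digit value are made up for illustration.

```python
import hashlib
import secrets

def salted_hash(value: str, salt: bytes) -> str:
    """Return the SHA-256 hex digest of the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

# Random 16-byte salt; assumed here to be kept separate from the hashed data.
salt = secrets.token_bytes(16)

stored = salted_hash("123456789", salt)      # made-up nine-digit number
same_again = salted_hash("123456789", salt)  # same value, same salt

# Hashing the correct value without the salt does not reproduce the stored hash.
unsalted_guess = hashlib.sha256(b"123456789").hexdigest()

print(stored == same_again)
print(stored == unsalted_guess)
```

Under these assumptions, someone who replays every possible input through the plain algorithm would not find a match.  If the salt were disclosed together with the hashed values, the replay approach would work again, which is one reason technical safeguards are paired with the business processes listed above.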

In comparison, in the context of the European GDPR, the Article 29 Working Party5 considered hashing to be a technique for pseudonymization that “reduces the linkability of a dataset with the original identity of a data subject” and thus “is a useful security measure,” but is “not a method of anonymisation.”6  In other words, from the perspective of the Article 29 Working Party, while hashing might be a useful security technique, it was not sufficient to convert “personal data” into deidentified data.