- 1798.100 – Consumers’ right to receive information on privacy practices and to access information
- 1798.105 – Consumers’ right to deletion
- 1798.110 – Information required to be provided as part of an access request
- 1798.115 – Consumers’ right to receive information about onward disclosures
- 1798.120 – Consumers’ right to prohibit the sale of their information
- 1798.125 – Price discrimination based upon the exercise of the opt-out right
Is it possible for data that has undergone hashing to still be considered “personal information”?
Hashing refers to the process of using an algorithm to transform data of any size into a unique, fixed-size output (e.g., a string of letters and numbers). To put it in layman’s terms, a piece of information (e.g., a name) is run through an equation that creates a unique string of characters. Any time the exact same name is run through the equation, the same unique string of characters is created. If a different name (or even the same name spelled differently) is run through the equation, an entirely different string of characters emerges.
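To illustrate, here is a minimal Python sketch of that determinism using SHA-256 from the standard library (the names hashed are, of course, hypothetical):

```python
import hashlib

def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# The same input always produces the same output...
print(hash_value("Jane Smith") == hash_value("Jane Smith"))  # True

# ...while even a one-letter spelling change produces an
# entirely different string of characters.
print(hash_value("Jane Smith") == hash_value("Jane Smyth"))  # False
```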
While the output of a hash cannot be directly reversed to “decode” the information, if the range of input values that was submitted to the hash algorithm is known, those values can be replayed through the algorithm until there is a matching output. The matching output then confirms, or indicates, what the initial input was. For instance, if a Social Security Number was hashed, the number might be reverse engineered by hashing all possible Social Security Numbers and comparing the resulting values. When a match is found, the attacker knows the initial Social Security Number that created the hash string. The net result is that while hash functions are designed to mask personal data, they can be subject to brute force attacks.
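A sketch of that brute-force replay, again using SHA-256 and a hypothetical Social Security Number; the candidate range is narrowed here so the example runs in seconds, but the full nine-digit space holds only 10^9 values:

```python
import hashlib

def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# A hash value observed in a dataset believed to be deidentified.
# (The underlying SSN here is hypothetical.)
observed = sha256_hex("078051120")

# Replay candidate inputs through the same algorithm until the
# output matches; the matching input is thereby recovered.
for candidate in range(78_000_000, 79_000_000):
    ssn = f"{candidate:09d}"
    if sha256_hex(ssn) == observed:
        print("Recovered input:", ssn)
        break
```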
Whether a hash value in and of itself is considered “personal information” depends upon the particular law or regulation at issue.
In the context of the CCPA, information is not “personal information” if it has been “deidentified.”1 Deidentification means that the data “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.”2 An argument could be made that data, once hashed, cannot reasonably be associated with an individual. That argument is strengthened under the CCPA if a business takes the following four steps to help ensure that the hashed data will not be re-identified:3
- Implement technical safeguards that prohibit reidentification. Technical safeguards may include the process or techniques by which data has been deidentified. For example, this might include the hashing algorithm being used, or combining the hash algorithm with other techniques that are designed to further obfuscate information (e.g., salting, sketched in code after this list).4
- Implement business processes that specifically prohibit reidentification. This might include an internal policy or procedure that prevents employees or vendors from attempting to reidentify data or reverse hashed values.
- Implement business processes to prevent inadvertent release of deidentified information. This might include a policy against disclosing hashed values to the public.
- Make no attempt to reidentify the information. As a functional matter, this entails taking steps to prohibit reidentification by the business’s employees.
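As referenced in the first bullet above, the following is a minimal sketch of salting, assuming a secret per-dataset salt held separately from the data. An attacker who does not hold the salt can no longer replay candidate inputs (e.g., all possible Social Security Numbers) through the function to find a matching output:

```python
import hashlib
import secrets

# A secret, random salt is combined with each input before hashing.
SALT = secrets.token_bytes(16)  # stored separately from the data

def salted_hash(value: str, salt: bytes = SALT) -> str:
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

print(salted_hash("078051120"))  # differs for every distinct salt
```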
In comparison, in the context of the European GDPR, the Article 29 Working Party5 considered hashing to be a technique for pseudonymization that “reduces the linkability of a dataset with the original identity of a data subject” and thus “is a useful security measure,” but is “not a method of anonymisation.”6 In other words, from the perspective of the Article 29 Working Party, while hashing might be a useful security technique, it was not sufficient to convert “personal data” into deidentified data.
Does the CCPA apply to information that has been de-identified?
The CCPA governs the collection, use, and disclosure of the “personal information” of California residents. The term “personal information” is defined broadly to include any information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.”1 The Act defines the term “de-identified” as a near inverse of personal information – i.e., information that “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.”2 The one disparity between the definition of “personal information” and the definition of “de-identification” is that the former purports to apply to information relating to a “household” whereas the latter refers only to “consumers.” Given the history of the CCPA, this likely was a drafting oversight, although it remains to be seen whether courts will attempt to ascribe meaning to it.3
While de-identified information is, by definition, not “personal information” and, therefore, not subject to the CCPA, there is a great deal of uncertainty as to what level of obfuscation is required in order for information to not “reasonably” identify an individual. That confusion is due, in part, to the fact that de-identification is not a single technique, but rather a collection of approaches, tools, and algorithms that can be applied to different kinds of data with differing levels of effectiveness. In 2010, the National Institute of Standards and Technology (NIST) published the Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), which provides a set of instructions and de-identification techniques for federal agencies and can also be used by non-governmental organizations on a voluntary basis. The guide defines “de-identified information” as “records that have had enough PII removed or obscured, also referred to as masked or obfuscated, such that the remaining information does not identify an individual and there is no reasonable basis to believe that the information can be used to identify an individual.”4 NIST identified the following five techniques that can be used to de-identify records of information (a few of them are sketched in code after the list):
- Suppression: The personal identifiers can be suppressed, removed, or replaced with completely random values.
- Averaging: The personal identifiers in a selected field of data can be replaced with the average value for the entire group of data.
- Generalization: The personal identifiers can be reported as being within a given range or as a member of a set (e.g., names can be replaced with “PERSON NAME”).
- Perturbation: The personal identifiers can be exchanged with other information within a defined level of variation (e.g., a date of birth may be randomly adjusted −5 or +5 years).
- Swapping: The personal identifiers can be replaced between records (e.g., swapping the ZIP codes of two unrelated records).
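A toy sketch of three of these techniques applied to a single hypothetical record; the field names and values are invented purely for illustration:

```python
import random

# A single hypothetical record about one individual.
record = {"name": "Jane Smith", "zip": "94105", "birth_year": 1980}

# Suppression: remove the direct identifier outright.
suppressed = {k: v for k, v in record.items() if k != "name"}

# Generalization: report a value as a range rather than a point.
generalized = {**suppressed, "birth_year": "1975-1984"}

# Perturbation: adjust a value randomly within a defined level of
# variation (here, +/- 5 years, mirroring the example above).
perturbed = {**suppressed,
             "birth_year": record["birth_year"] + random.randint(-5, 5)}

print(generalized, perturbed)
```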
The European Union’s Article 29 Working Party identified the following additional de-identification techniques (tokenization is sketched in code after the list):5
- Noise Addition: The personal identifiers are expressed imprecisely (e.g., weight is expressed inaccurately, +/− 10 lb).
- Differential Privacy: The personal identifiers of one data set are compared against an anonymized data set held by a third party, with instructions on the noise function and the acceptable amount of data leakage.
- L-Diversity: The personal identifiers are first generalized, then each attribute within an equivalence class is made to occur at least “l” times (i.e., properties are assigned to personal identifiers, and each property is made to occur within a dataset, or partition, a minimum number of times).
- Pseudonymization – Hash Functions: The personal identifiers of any size are replaced with artificial codes of a fixed size (e.g., Paris is replaced with “01,” London is replaced with “02,” and Rome is replaced with “03”).
- Pseudonymization – Tokenization: The personal identifiers are replaced with a non-sensitive identifier that traces back to the original data but is not mathematically derived from the original data (e.g., a credit card number is exchanged in a token vault for a randomly generated token such as “958392038”).
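A toy sketch of the tokenization approach described in the last bullet; the vault structure is illustrative only, and the card number is a standard test number:

```python
import secrets

# A toy "token vault": tokens are randomly generated, so they are not
# mathematically derived from the values they replace; only the vault
# can map a token back to the original data.
vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    token = str(secrets.randbelow(10**9)).zfill(9)
    vault[token] = value
    return token

token = tokenize("4111 1111 1111 1111")
print(token)          # a random nine-digit token, e.g. "958392038"
print(vault[token])   # the vault resolves the token back to the data
```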
The uncertainty as to what counts as “de-identified” data is further complicated by the fact that different regulatory agencies and legal systems have historically applied different standards when assessing whether information is, or is not, capable of being re-associated with an individual. For example, the Federal Trade Commission indicated in its 2012 report Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers that the FTC’s privacy framework only applies to data that is “reasonably linkable” to a consumer.6 The report explains that “data is not ‘reasonably linkable’ to the extent that a company: (1) takes reasonable measures to ensure that the data is de-identified; (2) publicly commits not to try to re-identify the data; and (3) contractually prohibits downstream recipients from trying to re-identify the data.”7 With respect to the first prong of the test, the FTC clarified that this “means that a company must achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device.”8 Thus, the FTC recognizes that while it may not be possible to remove the disclosure risk completely, de-identification is considered successful when there is a reasonable basis to believe that the remaining information in a particular record cannot be used to identify an individual. In its Broadband Privacy Order, the FCC adopted the FTC’s three-part de-identification test.9
The CCPA does not directly adopt the FTC’s recommended framework of requiring public commitments not to re-identify data, or explicitly mandating that contracts prohibit re-identification attempts, but it does require that a company that believes data is de-identified take the following four steps to proactively prevent re-identification:
- Implement technical safeguards that prohibit re-identification,
- Implement business processes that specifically prohibit re-identification,
- Implement business processes that prevent inadvertent release of de-identified information, and
- Make no attempt to re-identify the information.10
1. CCPA, § 1798.140(o)(1).
2. CCPA, § 1798.140(h).
3. The CCPA was put together quickly (in approximately one week) as a political compromise to address a proposed privacy ballot initiative that contained a number of problematic provisions. (For more on the history of the CCPA, you can find a timeline that illustrates its history and development on page 2 of BCLP’s Practical Guide to the CCPA.) Given its hasty drafting, there are a number of areas in which the Act is, at best, ambiguous and, at worst, leads to unintended results.
4. National Institute of Standards and Technology, Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), (April 2010), available at http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf.
5. Article 29 Working Party, Opinion 05/2014 on Anonymisation Techniques, WP216 (adopted 10 April 2014), available at http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf.
6. Federal Trade Commission, Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers, (March 2012), available at https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-report-protecting-consumer-privacy-era-rapid-change-recommendations/120326privacyreport.pdf.
7. Id. at iv.
8. Id. at 21.
9. Protecting the Privacy of Customers of Broadband and other Telecommunications Services, Notice of Proposed Rulemaking, WC Docket No. 16-106, 30 FCC Rcd ___ (2016), para. 106, available at http://transition.fcc.gov/Daily_
10. CCPA, § 1798.140(h)(1)-(4).
Can a service provider use and transfer personal information if it anonymizes or aggregates that information?
Section 1798.140(v) of the CCPA states that a service provider must be contractually prohibited from “retaining, using, or disclosing the personal information [provided to it by a business] for any purpose other than for the specific purpose of performing the services specified in the contract for the business.”1 The CCPA also states, however, that nothing within it restricts the ability of a business to “collect, use, retain, sell, or disclose consumer information that is *deidentified or in the aggregate consumer information*.”2 The net result is that if a service provider has an interest in retaining, using, or disclosing the information that it receives from a client, the service provider can anonymize or aggregate the information in order to convert it from “personal information” (for which there are retention, use, and disclosure restrictions) to non-personal information (for which the CCPA imposes no such restrictions).
Anonymized data, sometimes referred to as “de-identified” data, refers to data that “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer.”3 While there are a number of strategies for converting a file that contains personal information into one that does not, the CCPA requires that a business that uses de-identified information take the following four steps to help ensure that the data will not be re-identified:4
- Implement technical safeguards that prohibit reidentification. Technical safeguards may include the process, or techniques, by which data has been de-identified. For example, this might include some combination of hashing, salting, or tokenization.
- Implement business processes that specifically prohibit reidentification. This might include an internal policy or procedure that prevents employees or vendors from attempting to reidentify data.
- Implement business processes to prevent inadvertent release of deidentified information. Among other things, this might include safeguards to help prevent de-identified information from being accessed or acquired by unauthorized parties.
- Make no attempt to reidentify the information. As a functional matter, this entails that a business follow the policies it enacts prohibiting reidentification.
It should be noted that the standard for “anonymization” or “de-identification” under the CCPA arguably differs from the standard for anonymization under the European GDPR. While the CCPA treats information that cannot “reasonably” identify an individual as anonymous, the Article 29 Working Party interpreted European privacy law as requiring that data be “irreversibly prevent[ed]” from being used to identify an individual.5
The CCPA defines “aggregate consumer information” as information that “relates to a group or category of consumers, from which individual consumer identities have been removed, that is not linked or reasonably linkable to any consumer or household, including via a device.”6 In common parlance, aggregation refers to the situation where multiple consumer data points are combined so as to prevent the extrapolation of data as it relates to any particular consumer. For example, if Mary lives 5 miles from Company A, and Peter lives 10 miles from Company A, an aggregate value (e.g., consumers live, on average, 7.5 miles from Company A) cannot be used to extrapolate the distance of Mary or Peter.
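The same worked example, expressed as a short sketch:

```python
# Individual distances are collapsed into a group-level average from
# which neither Mary's nor Peter's distance can be extrapolated.
distances = {"Mary": 5, "Peter": 10}  # miles from Company A

average = sum(distances.values()) / len(distances)
print(f"Consumers live, on average, {average} miles from Company A")
# -> Consumers live, on average, 7.5 miles from Company A
```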
From a practical standpoint, if a service provider intends to retain, use, or share anonymized or aggregated information, the parties should consider including within the service provider agreement a definition of “anonymization” and “aggregation” that matches the definitions of those terms used within the CCPA.
1. CCPA, § 1798.140(v).
2. CCPA, § 1798.145(a)(5) (emphasis added).
3. CCPA, § 1798.140(h).
4. CCPA, § 1798.140(v).
5. Article 29 Working Party, Opinion 05/2014 on Anonymisation Techniques, WP216, at 7, 20 (adopted 10 April 2014).
6. CCPA, § 1798.140(a).