This article is part of a VB special issue. Read the full series here: How Data Privacy Is Transforming Marketing.

Personal data doesn’t have to identify you to be personal. With the right technologies and artificial intelligence (AI), even a string of random numbers can be combined with other information to discover your identity and become personally identifiable information (PII).

This poses significant challenges for organizations that need to collect data to generate insights and optimize the customer experience without leaving PII exposed to mismanagement or unauthorized third parties.

At the same time, regulations like the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) expect enterprises to be much more proactive and transparent about how they process PII. 

Any mistake in collecting or processing this data can result in costly regulatory penalties and legal action, as Facebook parent company Meta found out last month after it received a $400 million fine for exposing children’s personal data on Instagram. But what is PII exactly? 


What data counts as PII? 

Before an organization can protect PII, it needs to identify what type of data falls under this classification. This is difficult because there is no universal definition of PII. According to Gartner VP analyst Bart Willemsen, what counts as PII "depends on who you ask."

Regulators in the U.S. and EU, for example, have different opinions on what constitutes PII. 

“In the U.S., with their fragmented and mostly absent privacy legislation, PII historically refers to two or three dozen identifiers like name, address, SSN, driver’s license or credit card number and such,” Willemsen said. 

However, in regions like the EU, and jurisdictions including China, Brazil, and states like California and Virginia, PII can be “anything that directly or indirectly identifies or assists in the identifiability of an individual,” Willemsen said. 

These core differences result in a different perception of PII under each regulator. For instance, categories of data that are PII under the GDPR but not under the CCPA include racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, sexual orientation and others. 

Willemsen also highlights that personal data is anything that says something about a person, that can be used to single out an individual even if they “remain nameless at first sight.” 

As a result, any information that carries a privacy risk, when processed outside of its original purpose, can be considered personal data. Generally, if this personal data can be combined with other information to reidentify an individual, then it becomes PII.

However, organizations need to remember that developments in AI and machine learning are constantly changing how much data is needed to re-identify an individual. As AI solutions become more sophisticated, it will take less data to tie a user to their online identity. In short, user anonymity could become a myth.
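The re-identification risk described above is usually a linkage attack: an "anonymous" dataset is joined against a public one on shared quasi-identifiers such as ZIP code, birth date and gender. The sketch below illustrates the mechanic with entirely invented records and names; it is not drawn from any real dataset.

```python
# Hypothetical linkage attack: re-identifying "anonymous" records by
# joining quasi-identifiers against a public record. All data invented.

QUASI_IDENTIFIERS = ("zip", "dob", "gender")

anonymous_records = [
    {"zip": "02138", "dob": "1965-07-31", "gender": "F", "diagnosis": "condition A"},
    {"zip": "90210", "dob": "1980-01-15", "gender": "M", "diagnosis": "condition B"},
]

public_roll = [
    {"name": "Jane Doe", "zip": "02138", "dob": "1965-07-31", "gender": "F"},
]

def reidentify(anon_rows, public_rows):
    """Link rows that share the same quasi-identifier values."""
    index = {
        tuple(r[q] for q in QUASI_IDENTIFIERS): r["name"] for r in public_rows
    }
    matches = []
    for row in anon_rows:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        if key in index:
            matches.append((index[key], row))
    return matches

for name, record in reidentify(anonymous_records, public_roll):
    print(f"{name} re-identified via quasi-identifiers: {record['diagnosis']}")
```

The point of the sketch is that no field in `anonymous_records` is a name or ID number, yet the combination of three innocuous attributes is enough to single a person out once a second dataset is available.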

Case study: Is GPS data personal data? 

There’s an argument to be made that even GPS data can be considered PII.

“Location data has long been considered personal information, when tied to personal identifiers, like a name or phone number or device ID,” said Cobun Zweifel-Keegan, managing director of the International Association of Privacy Professionals (IAPP). 

So if GPS data can be tied to a specific individual or device, it can be considered PII. However, Zweifel-Keegan notes that it's not classified as personal data when it has been de-identified (stripped of identifying information that can be tied back to an individual) or when the location data gathered isn't precise enough to single anyone out.
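One common way to make location data "not precise enough" is coarsening: rounding coordinates so they no longer resolve to a single building. The sketch below assumes a simple decimal-rounding policy; how much precision must be removed before data stops being personal is a legal and contextual judgment, not something fixed in code.

```python
# Minimal sketch of coarsening GPS precision. Two decimal places keeps
# coordinates only to roughly a kilometer; the acceptable threshold is a
# policy decision, assumed here for illustration.

def coarsen(lat: float, lon: float, decimals: int = 2) -> tuple:
    """Round coordinates to reduce spatial precision."""
    return (round(lat, decimals), round(lon, decimals))

precise = (40.748442, -73.985657)  # resolves to a single building
coarse = coarsen(*precise)
print(coarse)  # (40.75, -73.99)
```

Coarsening alone is not a guarantee of anonymity: as the FTC case below suggests, even imprecise points become identifying when matched with a persistent device ID over time.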

In August of this year, the FTC announced it was suing Kochava for allegedly selling the personal GPS data of consumers who had visited reproductive health clinics, places of worship, homeless and domestic violence shelters and addiction recovery facilities.

It's important to note, however, that Kochava maintains it does not collect GPS data with its SDK and doesn't sell any of the data it collects on behalf of its customers to third parties.

"The FTC and this lawsuit wrongly assert that the Collective data marketplace provides real-time identifiable information about consumer activity that crosses locations that could be considered sensitive locations," said Kochava founder and CEO Charles Manning in a statement.

While this is a single case with its own nuances — where the FTC suggests that Kochava wasn’t just collecting GPS data, but also selling it to third parties — it still highlights that this is something regulators are paying close attention to. 

The FTC argues that Kochava’s data processes took information from mobile devices and packaged it into customized data feeds that matched unique mobile device identification numbers with time-stamped latitude and longitude locations, which it sold to third parties who could potentially use the information to re-identify the users. 

In this instance, according to the FTC’s press release, the agency “alleges that by selling data tracking people, Kochava is enabling others to identify individuals and exposing them to threats of stigma, stalking, discrimination, job loss, and even physical violence.” 

The regulatory crackdown on PII blunders 

Another problem surrounding the management of PII is that regulators are constantly developing new expectations for how organizations should manage and process it. 

Peter Hoff, vice president, security and risk at IT consulting company Wursta, suggests there is an ongoing crackdown on the mismanagement of PII across the U.S. 

“In 2023, five states are cracking down on what information companies gather about their customers and how that information is used and shared,” Hoff said. 

“At the same time, information security will become more of a concern at the federal level, with the U.S. government focusing on preventing American companies from knowingly or unknowingly putting customer PII and intellectual property into the hands of foreign governments,” he added. 

It does appear that regulators are being much less forgiving in how they assess data protection practices. Just last month, Morgan Stanley Smith Barney received a fine of $35 million for failing to properly dispose of the PII of approximately 15 million customers.

The financial giant received this fine for hiring a moving and storage company that failed to adequately destroy data and decommission devices before selling servers and hard drives containing PII to unauthorized third parties.

How to keep PII anonymous: Testing for reverse engineering

While managing PII in a way that’s compliant with international and domestic data protection regulations can be challenging, enterprises can mitigate the risks by periodically testing whether their users’ personal data can be re-identified. 

“Privacy should be integrated into the design of new products and services while trying to balance legitimate business interests,” said Criss Bradbury, principal and U.S. cyber data risk leader at Deloitte Advisory. 

“It’s important for organizations to select a data privacy rationalized framework that aligns well with their organizational strategic objectives — and to regularly test to confirm that data cannot be reverse-engineered to identify an individual,” Bradbury said. 

Without testing whether data can be reverse-engineered, an organization has no guarantee that PII can’t be recompiled to identify the end user. 
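One concrete form such a test can take is a k-anonymity check: count how many records share each combination of quasi-identifiers, and flag any combination held by fewer than k records, since those rows could single someone out. This is a simplified sketch of one testing approach, not the specific methodology Bradbury or Deloitte recommends, and the records are hypothetical.

```python
from collections import Counter

# Simplified k-anonymity check: a quasi-identifier combination shared by
# fewer than k records is a re-identification risk. Fields are hypothetical.

def risky_groups(rows, quasi_identifiers, k=2):
    """Return quasi-identifier combinations held by fewer than k records."""
    counts = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return [key for key, n in counts.items() if n < k]

records = [
    {"zip": "94105", "age_band": "30-39"},
    {"zip": "94105", "age_band": "30-39"},
    {"zip": "10001", "age_band": "60-69"},  # unique combination: risky
]

print(risky_groups(records, ("zip", "age_band"), k=2))
# [('10001', '60-69')]
```

Running a check like this periodically, especially after new fields are added to a dataset, gives an organization at least some evidence that released data cannot trivially be recompiled into PII.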

At the same time, Bradbury highlights that solutions offering encryption (at rest, in transit, in use), role-based access control (RBAC) and identity and access management (IAM) can mitigate some challenges surrounding managing personal data. 
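Of the controls Bradbury mentions, role-based access control is the most straightforward to illustrate: PII fields are returned only to roles explicitly granted access, and everything else is redacted. The role names, fields and records below are invented for illustration; real deployments would enforce this in the data layer or an IAM service rather than in application code.

```python
# Minimal RBAC sketch: each role sees only the PII fields it is granted.
# Roles, fields and the record are hypothetical placeholders.

ROLE_PERMISSIONS = {
    "support": {"name", "email"},
    "analyst": set(),                      # analysts see no raw PII
    "compliance": {"name", "email", "ssn"},
}

def view_record(record: dict, role: str) -> dict:
    """Return the record with fields outside the role's grant redacted."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return {k: (v if k in allowed else "[REDACTED]") for k, v in record.items()}

record = {"name": "A. Person", "email": "a@example.com", "ssn": "000-00-0000"}
print(view_record(record, "analyst"))
print(view_record(record, "support"))
```

Defaulting unknown roles to an empty permission set means a misconfigured caller fails closed, seeing nothing rather than everything.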

Although de-identifying personal data is an option, it’s easier said than done, particularly when considering that some privacy regulations mandate that de-identified data be confirmed as such by an “expert determination” and a statistical analysis that confirms it can’t be traced back to an individual. 

Organizations that do attempt to de-identify PII should consider the risk that anonymous data can be combined with other information to discover a user’s identity.

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.