Data privacy concerns among Americans are on the rise. Lacking any sweeping national legislation, such as Europe's GDPR, Americans feel wary of and vulnerable to data collection by both companies and the government. 

According to Pew Research, 81% say the risks of data collection by companies outweigh the benefits, and 61% feel the same way about government data collection. And it's not just talk: 52% say they've decided not to use a good or service specifically because of data collection and privacy concerns. 

State legislators are working to address this. In 2021, states passed 27 privacy bills aimed at reining in the tech industry's loose handling and sale of personal data. So far in 2022, Utah and Connecticut have joined the likes of California, Colorado and Virginia in passing their own state data privacy laws, which go into effect in 2023.

“One of the important things about data privacy is that privacy is contextual,” said Os Keyes, a Ph.D. candidate at the University of Washington’s department of human-centered design and engineering who researches data ethics, medical AI, facial recognition, gender and sexuality.


Data, Keyes explained, can become deanonymized quickly when put into context with other data about you. One dataset combined with another from a different source can reveal a lot, rather quickly, and sometimes that can become dangerous. 

“All you need to do is be able to stitch existing datasets together,” said Keyes.
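The linkage attack Keyes describes can be sketched with a toy example. All of the data, field names and quasi-identifiers below are hypothetical; the point is only that two datasets which each look anonymous on their own can, when joined on shared attributes like ZIP code, age and sex, isolate a single named person.

```python
# Toy illustration of a linkage attack: neither dataset alone names anyone
# in connection with sensitive data, but joining them on shared
# quasi-identifiers can re-identify a record. All data here is hypothetical.

health_records = [  # "anonymized" survey: no names, but quasi-identifiers remain
    {"zip": "98105", "age": 16, "sex": "M", "condition": "asthma"},
    {"zip": "98105", "age": 16, "sex": "F", "condition": "gender dysphoria"},
    {"zip": "98107", "age": 17, "sex": "F", "condition": "diabetes"},
]

public_list = [  # a public, identified record with the same quasi-identifiers
    {"name": "A. Smith", "zip": "98105", "age": 16, "sex": "F"},
    {"name": "B. Jones", "zip": "98107", "age": 17, "sex": "F"},
]

def link(records, identified):
    """Join on (zip, age, sex); a unique match re-identifies the record."""
    matches = []
    for r in records:
        hits = [p for p in identified
                if (p["zip"], p["age"], p["sex"]) == (r["zip"], r["age"], r["sex"])]
        if len(hits) == 1:  # exactly one candidate: the record is re-identified
            matches.append((hits[0]["name"], r["condition"]))
    return matches

print(link(health_records, public_list))
```

In this sketch, two of the three "anonymous" health records are re-identified by name, which is the mechanism Keyes warns about: the protection fails not because either dataset leaks on its own, but because they can be stitched together.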

Government agencies, like the U.S. Census Bureau, are taking a closer look at their data privacy practices and responsibilities. Looking ahead to the 2030 census, the Bureau this year opened up a comment period for experts like Keyes to weigh in on its data anonymization efforts and inform the agency about how to improve before gathering the next decade of data.

Testing datasets to find what doesn't work

Keyes and colleague Abraham (Abie) Flaxman, associate professor of health metrics science and global health at the University of Washington, set out to test a major hypothesis for the Census Bureau: Could transgender teenagers be outed and identified using simulated datasets? 

The unfortunate answer, the two found, was yes. Using the Census Bureau's data-anonymization approach from the 2010 census, Keyes and Flaxman were able to identify 605 transgender teenagers. Although the data was simulated specifically to test this scenario, the result shows how easily personally identifiable information (PII) can be de-anonymized. Depending on where they live, transgender teens could be put at risk of hate crimes, and their parents at risk of child abuse charges for seeking gender-affirming medical care for their child.   

“We took simulated data designed to mimic the data sets that the Census Bureau releases publicly and tried to re-identify trans teenagers, or at least narrow down where they might live, and unfortunately, we succeeded,” they wrote in a piece for Scientific American.

Alarming as they are, the simulation's results are exactly what the Census Bureau's comment period is designed to surface: what may not be working and where the agency can improve, so that this doesn't actually happen in the future. 

“We found it encouraging that Os and Abie’s work helps validate our concerns and decisions for 2020 and beyond,” said Daniel Kifer, senior advisor for formal privacy to the Census Bureau’s 2020 decennial census disclosure avoidance system development team. “Specifically, privacy is about protecting how you differ from everyone else; perceptions about what information is private can change over time; data can be misused and attacked in many different ways that are difficult to anticipate.”  

The limits of protecting privacy

Kifer pointed out that although the attack succeeded against the Bureau's 2010-era approach, Keyes and Flaxman's simulation “can do no better than random guessing when the attacker uses the Census Bureau’s demonstration data products based on the 2020 Census disclosure avoidance system, but is much more successful against legacy techniques that the agency used prior to the 2020 decennial product releases.”

The 2020 product release was a new differential-privacy approach specifically aimed at improving privacy protections for census data.
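The article doesn't detail how differential privacy works, but its core mechanism can be sketched: calibrated random noise is added to each published statistic, so that any one person's presence or absence in the data changes the output only within the noise. A minimal sketch of the classic Laplace mechanism follows; the epsilon value and the example count are illustrative, not the Bureau's actual parameters.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when any one person is added to
    or removed from the data (sensitivity 1), so Laplace noise with scale
    1/epsilon is enough to mask each individual's contribution.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Example: publish a small-area count without exposing any individual.
# Smaller epsilon means more noise and stronger privacy.
print(dp_count(605, epsilon=1.0))
```

The design tension Flaxman describes later in the piece is visible in the single `epsilon` parameter: turning it down protects individuals better but makes published counts less accurate, which is precisely the "knob" the Bureau must set for 2030.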

Keyes and Flaxman confirmed Kifer's assertion and said that when they used the Census Bureau's new approach to data privacy, the identification rate of transgender teens fell by 70%. All three underscored the importance of the agency continuing its work and becoming even better before it embarks on the 2030 census undertaking.

“The Census Bureau has come back to say it’s not possible to have a 100% decrease. They believe that there’s always some sort of chance disclosure — and I think they’re right about it,” said Flaxman. “So we’ve had this back and forth with the Bureau, where we’ve been trying to figure out what is the limit of protecting privacy and have they reached it? I think what’s quite clear to me at this point is that their machine is capable of achieving that kind of optimal privacy. They’re now at the stages of making their final decision about where they’re going to set the knobs on their machine to improve it for 2030.”

Designing better data privacy

The Census Bureau, founded in 1902, is probably not the first organization that comes to mind as being at the forefront of data innovation, with a system capable of optimizing privacy to its fullest extent, but the agency actually has a long history of doing just that. 

“Part of this innovation is driven every 10 years by the decennial census and the significant scrutiny that it receives,” Kifer told VentureBeat. “As the largest federal statistical agency, the Census Bureau runs other surveys and also collects statistical data on behalf of other agencies. Necessity and access to data has given the Census Bureau a tremendous advantage in innovating collection, analysis, and dissemination, as well as finding new applications for the data.”

Much of the Bureau’s innovation around data privacy and collection, Kifer explained, has come from research communities that have worked to turn privacy into “a mathematical science that is compatible with policy and regulations.”

Continuing to find ways to innovate data gathering and privacy practices is not just important for the Census Bureau, he explained, but for the entire U.S. federal statistical system.

“High-quality data are needed to support policy making decisions,” said Kifer. “The population is changing, the important policy questions are changing, and the data needs are changing.” 

When data needs change, the Census Bureau aims to adapt; its access to data and to the latest research drives its innovation even further.

The fact that a 120-year-old government agency can be swift, proactive and agile in adapting to changing data and population needs says a lot about players in other industries that claim privacy is too challenging to adapt to, Keyes and Flaxman pointed out.

“It tells us that there is a tension in privacy, which we sort of abstractly know,” Keyes said. “This tension is really worth paying attention to. This idea, as some big data hype people say ‘privacy is dead,’ — really it’s not. What we’re seeing here is not only proof that we should not just throw privacy away, but also that there are techniques for thoughtfully, sensibly protecting people… There are all the stereotypes of the government being the problem rather than the solution. I think it’s nice to see an instance where, actually, the U.S. Census — they are ahead of the curve on this.” 

No excuse to not prioritize data privacy

What this really highlights, Keyes and Flaxman agreed, is that private companies have no excuse for not prioritizing data privacy — or claiming they can’t be perfect in the face of regulations compelling them to do so. 

Because the Census Bureau is required to consider privacy as part of its function, it has found a way to do this while optimizing privacy to derive policy-impacting insights from data without sacrificing innovation, Keyes explained.

“I think it is a really interesting example to hear people say, ‘Oh, you can’t regulate private industry around privacy because it’ll banish innovation, and it won’t work.’ Well, here we have an example of both of those things being false,” said Keyes.

“Not only will it work,” said Keyes, “but the Census Bureau is actually responsible for a lot of really interesting and intricate privacy protection mechanisms, and also answers to questions like, ‘OK, how do we link records across datasets in a way that is robust when we have these privacy protections in place?’ They are under heavy regulation, and still innovating. A big part of the lesson is that there is no contradiction between regulation and doing things better. If anything, it’s the other way around.”
