
The White House has released a major report on the ability of computers to exploit all the public and private data being collected on American citizens.

In short: the most powerful institution in the land is in love with the opportunities of so-called “Big Data” and thinks data can revolutionize medicine, education, and security.

The report is the output of a 90-day review by a working group President Barack Obama created in January.

To address privacy concerns, the White House recommended a broad set of preemptive protections, including a consumer bill of rights and reforms to warrantless email search (the Electronic Communications Privacy Act reform that more than 100,000 petitioners asked the White House to review).

The thorough report was decidedly bullish on the power of big data. Two examples stand out:

  • “The Centers for Medicare and Medicaid Services have begun using predictive analytics software to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest risk health care providers for fraud, waste and abuse in real time, and has already stopped, prevented or identified $115 million in fraudulent payments–saving $3 for every $1 spent in the program’s first year.”
  • “The Department of Defense, for instance, recently conducted a pilot of what it calls the ‘Automated Continuous Evaluation System.’ The pilot examined a sample of 3,370 Army service members, civilian employees, and contractor personnel, and identified that 21.7 percent of the tested population had previously unreported derogatory information that had developed since the last investigation. For 99 individuals, the pilot surfaced serious financial, domestic abuse, drug abuse, or allegations of prostitution that resulted in the revocation or suspension of their clearances.”

Privacy? Maybe

The White House also buried a bit of honesty about the limits of anonymity. Powerful computers are increasingly able to identify individuals who thought they had shared their data anonymously, a practice known as “re-identification.”

“Moreover, it is difficult to predict how technologies to re-identify seemingly anonymized data may evolve,” the report states. “This creates substantial uncertainty about how an individual controls his or her own information and identity, and how he or she disputes decision-making based on data derived from multiple datasets.”

In one famous case, published last year in Science, researchers were able to identify individuals in a set of genetic data with information gleaned from public resources and a separate dataset of their relatives.

“We have been pretending that by removing enough information from databases that we can make people anonymous. We have been promising privacy, and this paper demonstrates that for a certain percent of a population, those promises are empty,” said John Wilbanks of Sage Bionetworks, at the time.

So, while we can take precautions to protect people’s privacy, the ability of researchers — or corporations — to uncover personal information may eclipse our ability to protect anonymity.
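To make the idea of re-identification concrete, here is a minimal sketch of a generic "linkage" approach: join a dataset with names stripped out against a public dataset that still has names, using attributes both share. This is an illustration under stated assumptions, not the method used in the Science study, which worked with genetic data. All records, names, and field names below (`zip`, `birth_date`, `sex`, `diagnosis`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical "anonymized" records: names removed, but other attributes kept.
anonymized = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public dataset (for example, a voter roll) that still includes names.
public = [
    {"name": "Alice Smith", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Carl Brown", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

# Attributes present in both datasets (assumed for this illustration).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Build a lookup key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, public):
    """Return (name, diagnosis) pairs for records that link to exactly one person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])

    anon_counts = Counter(key(record) for record in anonymized)

    matches = []
    for record in anonymized:
        k = key(record)
        candidates = names_by_key.get(k, [])
        # Only a key that is unique on both sides pins a record to one person.
        if len(candidates) == 1 and anon_counts[k] == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


matches = reidentify(anonymized, public)
print(matches)
```

In this toy data, only the first record is re-identified; the other two share their attributes with two people, so they remain ambiguous. Each additional dataset an attacker can link in tends to shrink those ambiguous groups, which is why anonymity promises become harder to keep as more data is collected.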

Six recommendations

The report made six recommendations:

  1. Advance a consumer privacy bill of rights to provide better information and standards for protecting privacy
  2. Pass legislation for protecting data from breaches
  3. Protect non-U.S. citizens from privacy invasions too
  4. Prevent discrimination based on automated profiling of race or other sensitive characteristics
  5. Ensure student data is used for educational purposes
  6. Change the Electronic Communications Privacy Act (ECPA), which currently provides law enforcement broad authority to search emails and other communications

Whatever people think of the report and the future of big data, many federal, state, and local government agencies, like a growing number of companies, already collect and analyze data to meet strategic goals. At our DataBeat conference in San Francisco May 19-20, we’ll be talking about ways in which companies can use data to increase their revenue or lower their costs.