As AI grows, users deserve tools to limit its access to personal data

My name is John.
My name is John Connor, and I live at 19828 Almond Avenue, Los Angeles.
My name is John Connor, I live at 19828 Almond Avenue, Los Angeles, and my California police record J-66455705 lists my supposedly expunged juvenile convictions for vandalism, trespassing, and shoplifting.

Which of these levels of personal detail do you feel comfortable sharing with your smartphone? And should every app on that device have the same level of knowledge about your personal details?

Welcome to the concept of siloed sharing. If you want to keep relying on your favorite device to store and automatically sort through your data, it's time to start considering whether you want to trust device-, app-, and cloud-level AI services to share access to all of your information, or whether there should be sharply differentiated access levels with silo-class safeguards in place.

The siloing concept

Your phone already contains far more information about you than you realize. Depending on who makes the phone's operating system and chips, that information might be spread across storage silos -- separate folders and/or "secure enclaves" -- that aren't easily accessible to the network, the operating system, or other apps. So if you take an ephemeral photo for Snapchat, you might not have to worry that the same image will be sucked up by Facebook without your express permission.

Or all of your data could be sitting in one big pocket- or cloud-sized pool, ready for inspection. If you've passively tapped "I agree" buttons or given developers bits of personal data, you can be certain that there's plenty of information about you on multiple cloud servers across the world. Depending on the photos, business documents, and calendar information you store online, Amazon, Facebook, Google, and other technology companies may already have more info about your personal life than a Russian kompromat dossier.

A silo isn't there to stop you from using these services. It's designed to help you keep each service's data from being widely accessible to others -- a particular challenge when tech companies, even "privacy-focused" ones such as Apple, keep growing data-dependent AI services to become bigger parts of their devices and operating systems.

Until recently, there was a practical limit to massive data gathering and mining operations: Someone had to actually sift through the data, typically at such considerable time and expense that only governments and large corporations could allocate the resources. These days, affordable AI chips handle the sifting, most often with at least implicit user consent, for every cloud and personal computer promising "smart" services. As you bring more AI-assisted doorbells, cameras, and speakers into your home, the breadth and depth of sifted, shareable knowledge about you keeps growing by the day.

It's easy to become comfortable with the conveniences AI solutions bring to the table, assuming you can trust their makers not to misuse or share the data. But as AI becomes responsible for processing more and more of your personal information, who knows how that data will be used, shared, or sold to others? Will Google leverage your health data to help someone sell you insurance? Or could Apple deny you access to a credit card based on private, possibly inaccurate financial data?

Of course, the tech giants will say no. But absent hard legal or user-imposed limits on what can be done with private data, the prospect of making (or saving) money by using AI-mined personal information is already too tempting for some companies to ignore.

Sounding the AI alarm

Artificial intelligence's potential dangers landed fully in the public consciousness when James Cameron's 1984 film The Terminator debuted, imagining that in 1997 a "Skynet" military computer would achieve self-awareness and steer armies of killer robots to purge the world of a singular threat: humanity. Cameron seemed genuinely concerned about AI's emerging threats, but as time passed, society and the Terminator sequels realized that an AI-controlled world wasn't happening anytime soon. In the subsequent films, Skynet's self-awareness date was pushed off to 2004, and later 2017, when the danger was reimagined as an evil iCloud that connected everyone's iPhones and iPads.

Putting aside specific dates, The Terminator helped spread the idea that human-made robots won't necessarily be friendly and that giving computers access to personal information could come back to haunt us all. Cameron originally posited that Sarah Connor's name alone would be enough information for a killer robot to find her at home using a physical phone book. By the 1991 sequel, a next-generation robot located John Connor using a police car's online database. Today, while the latest Terminator film is in theaters, your cell phone is constantly sharing your current location with a vast network infrastructure, whether you realize it or not.

If you're a bad actor, this means a bounty hunter can bring you in on an outstanding warrant. If you're really bad, that's enough accuracy for a government to pinpoint you for a drone strike. Even if you're a good citizen, you can be located pretty much anytime, anywhere, as long as you're "on the grid." And unlike John Connor in Terminator 3, most of us have no meaningful way of getting off that grid.

Location tracking may be the least of your concerns. Anyone in the U.S. with a federally mandated Real ID driver's license already has a photo, address, Social Security number, and other personal details flowing through one or more departments of motor vehicles, to say nothing of systems police can access on an as-needed basis. U.S. residents who have applied for professional licenses most likely have fingerprints, prior addresses, and possibly prior employers circulating in some semi-private databases, too. Top that off with cross-referenced public records, and it's likely that your court appearances, convictions, and home purchases are all part of someone's file on you.

Add more recent innovations -- such as facial recognition cameras and DNA testing -- to the mix, and you'll have the perfect cocktail for paranoia. Armed with all of that data, computer systems with modern AI could instantly identify your face whenever you appear in public while also understanding your strengths and weaknesses on a genetic level.

The further case for siloing data

As 2019 draws to a close, the idea that a futuristic computer would need to locate you using a phone book seems amusingly quaint. Fully self-aware AIs aren't yet here, but partially autonomous AI systems are closer to science fact than science fiction.

There's probably no undoing any of what's already been shared with companies; the data points about you are already out there, and heavily backed up. Databases of personal information on millions of people might once have lived largely on gigantic servers in remote locations, but they now fit on flash drives and can be shared over the internet in seconds. Hackers trade them for sport.

Absent national or international laws to protect personal data from being aggregated and warehoused -- Europe's GDPR is a noteworthy exception, with state-level alternatives such as California's CCPA -- our solutions may wind up being personal, practical, and technological. Those of us living through this shift will need to start clamping down on the data we share, and to teach successive generations, starting with our kids, to be more cautious than we were.

Based on what's been happening with social networks over the past decade, that's clearly going to be difficult. Apart from its behind-the-scenes data mining, Facebook hosts innocuous-looking "fun" surveys designed to get people to cough up bits of information about themselves, historically while gathering additional information about a user's profile, friends, and photos. We've been trained to become so numb to these information-gathering activities that there's a temptation to just stop caring and keep sharing.

To wit, phones now automatically upload personal photos en masse to cloud servers where they're sorted by date and location at a minimum, and perhaps by facial, object, and text recognition as well. We may not even know that we're sharing some of the information; AI may glean details from an image's background and make inferences missed by the original photographer.

Cloud-based, AI-sorted storage has a huge upside: convenience. But if we're going to keep relying on someone else's computers and AI for assistance with our personal files, we need rules that limit their initial and continued access to our data. Although it might be acceptable for your photo app to know that you were next to a restaurant when a fire broke out, you might not want that detail -- or your photos at the restaurant -- automatically shared with investigators. Or perhaps you do. That's why sharing silos are so important.

Building the right sharing silos

Right now, consumer AI solutions such as Amazon's Alexa, Apple's Siri, Google Assistant, and Microsoft's Cortana feel to users like largely discrete tools. You call upon them as needed and otherwise forget they're constantly running in the background, just waiting to instantly respond to your voice.

We're already at the point where these "assistants" are becoming fully integrated into the operating systems we rely on every day. Apart from occasional complaints about Siri's internet connectivity, assistants draw upon data from the cloud and our devices so quickly that we generally don't even realize it's happening.

This raises three questions. The first is how much your most trusted Android, iOS, macOS, or Windows device actually "knows" about you, with or without your permission. A critical second question is how much of that data is being shared with specific apps. And a related third question is how much app-specific data is being shared back to the device's operating system and/or the cloud.

Users deserve transparent answers to all of these questions. We should also be able to cut the data at any time, anywhere it's being stored, without delay or permission from a gatekeeper.

For instance, you might have heard about Siri's early ability to reply to the joke query, "Where do I bury a body?" That's the sort of question (almost) no one would ask, jokingly or otherwise, if they thought their digital assistant might contact the police. What have you asked your digital assistant -- anything potentially embarrassing, incriminating, or capable of being misconstrued? Now imagine that there's a database out there with all of your requests, and perhaps some accidentally submitted recordings or transcripts, as well.

In a world where smartphones are our primary computers, linked both to the cloud and to our laptops, desktops, tablets, and wearable devices, there must be impenetrable data-sharing firewalls both at the edges of devices and within them. And users need to have clear, meaningful control over which apps have access to specific types, and perhaps levels, of personal data.

There should be multiple silos at both the cloud and device OS levels, paired with individual silos for each app. Users shouldn't just have the ability to know that "what's on your iPhone stays on your iPhone" -- they should be able to know that what's in each app stays in that app, and enjoy continuous access (with add/edit/delete rights) to each personal data repository on a device, and in the cloud.
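
To make the idea concrete, here is a minimal sketch of what a per-app silo with user-controlled sharing could look like. It is an illustration only, not a description of how any real phone or operating system works: the Silo class, the Level tiers (loosely modeled on the three "John Connor" examples at the top of this piece), and the app names are all hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity tiers; higher means more private."""
    NAME = 1      # e.g. "My name is John."
    ADDRESS = 2   # e.g. full name plus home address
    RECORDS = 3   # e.g. legal, health, or financial records


class Silo:
    """One app's private data store, controlled by the user (hypothetical)."""

    def __init__(self, app):
        self.app = app
        self._records = {}  # key -> (level, value)
        self._grants = {}   # other app name -> highest Level it may read

    # The user keeps add/edit/delete rights over everything in the silo.
    def put(self, key, value, level):
        self._records[key] = (level, value)

    def delete(self, key):
        self._records.pop(key, None)

    # Sharing is opt-in, capped at a level, and revocable at any time.
    def grant(self, other_app, max_level):
        self._grants[other_app] = max_level

    def revoke(self, other_app):
        self._grants.pop(other_app, None)

    def read(self, requester, key):
        level, value = self._records[key]
        if requester == self.app:
            return value  # the owning app always sees its own data
        allowed = self._grants.get(requester)
        if allowed is None or level > allowed:
            raise PermissionError(f"{requester} may not read {key!r}")
        return value


# Usage: the health app holds sensitive data; the insurer sees only a name.
health = Silo("health_app")
health.put("name", "John", Level.NAME)
health.put("heart_rate", 72, Level.RECORDS)
health.grant("insurance_app", Level.NAME)

print(health.read("insurance_app", "name"))
try:
    health.read("insurance_app", "heart_rate")
except PermissionError as err:
    print("blocked:", err)
```

The design choice worth noticing is that nothing is shared by default: another app can read only what the user has explicitly granted, only up to a chosen level, and only until the grant is revoked.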

AI is power, but AI is powerless without data

That's not to say that every person or even most people will dive into these databases. But we should have the ability, at will, to control which personal details are being shared. Trusting your phone to know your weight or heart rate shouldn't mean granting the same granular data access to your doctor or life insurance provider -- unless you wish to do so.

As operating systems grow to subsume more previously non-core functions, such as monitoring health and sorting photos, internal silos within operating systems may be necessary to keep specific types of data (or summaries of that data) from being shared. There are also times when people will want to use certain device features and services anonymously. That should be an option.

AI wields the power to sift through untold quantities of data and make smart decisions with minimal to no human involvement, ideally for the benefit of individuals and humanity as a whole. AI will transform more facets of our society than we realize, and it is already impacting plenty of things, for better and for worse.

Data is the fuel for AI's power. As beneficial as AI can be, planning now to limit its access to personal data using silos is the best way to stop or reduce problems down the road.