With the advent of cloud computing, e-commerce, and social media, it’s difficult to keep tabs on who has access to our data, and harder still to know how much care they’re taking with it. Barely a day goes by without some form of data breach, lapse, or privacy scandal coming to the fore. But “data-misuse” spans a broad gamut of scenarios that reach beyond poor security hygiene.

Online tracking and profiling are rife, and it turns out there is a heap of money to be made from knowing where you are, what you do, and what you like. It all comes down to personalization: selling things, be it products, playlists, or a political ideology, based on who you are. The Facebook and Cambridge Analytica scandal, which highlighted how social networks armed with vast banks of personal data could be leveraged to profile voters and micro-target them with personalized political ads, was something of a watershed moment in terms of elevating the issue of data-privacy and abuse into the public consciousness.

Personalization permeates the digital sphere though, from video- and music-streaming services to web-searches and e-commerce apps, and this usually comes at a cost. Companies vacuum up hoards of behavioral data, such as pages viewed, links clicked, videos watched, and songs listened to, and often combine it with other data such as location, email address, and demographic information to create a sort of virtual model of a person in the cloud. It’s all about getting to “know” you and predicting what else you might like. And this may not be to everyone’s tastes.

It’s against that backdrop that Canopy is setting out to carve its niche in a market ripe for privacy-focused tools. The New York-based company has created what it calls a “private content recommendation system” that doesn’t send personal data back-and-forth between a remote server and the user’s device — all the data-processing and machine learning takes place on the device itself.

Canopy was founded in 2017 by CEO Brian Whitman, formerly cofounder and chief technology officer (CTO) at music intelligence company The Echo Nest, which he sold to Spotify in 2014 to power its music recommendation algorithms through a combination of data mining, machine learning, and signal processing, among other mechanisms. Whitman left Spotify in 2016 to work on “personal projects and start something new in 2017,” he announced at the time.

To showcase its new technology, Canopy today launched its first product — Tonic — an iPhone-only app that uses on-device machine learning and differential privacy to suggest longer-form reads from across the internet.

Just the tonic

From the get-go, Tonic is all about letting the user know that it doesn’t want any of their information. “Our editors work with custom-made machine learning tools to tailor the internet’s hidden gems to your vibe,” it proclaims, before noting that the user doesn’t even have to log in or create an account.

“We’re trying to build a more private and ethical internet,” it continues.

As with other similar content-recommendation systems, Tonic first asks users to select from a bunch of feature articles from across the web, recommending that around five should be enough to get the ball rolling for future recommendations.

Inside the app, users can read all the articles they’ve just added. The personalized recommendations kick off in subsequent days via a daily dose of fresh articles delivered based on what Tonic thinks the user likes. Users can also guide this process by long-pressing on an article in their activity stream, and then sliding a little controller to indicate how much they like a particular article.

When the user clicks to read an article, they’re taken to the original publication’s website which is accessed via an in-app browser that’s effectively an incognito window that rejects all cookies.

At its core, Canopy is most interested in what types of content users like reading rather than really getting to know who they are as people. Under the hood, its recommendation engine finds similarities in what people are reading across the board, and makes further recommendations where the connection might not be immediately obvious. This isn’t a million miles away from how Spotify’s popular algorithmic personalized Discover Weekly playlist works.
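To make the idea of "finding similarities in what people are reading" concrete, here is a minimal Python sketch of item-to-item similarity based on shared readers. It is an illustration only, not Canopy's engine: the function names `item_similarities` and `recommend`, the cosine-style scoring, and the toy reading lists are all assumptions made for this example.

```python
import math
from collections import defaultdict


def item_similarities(reading_lists):
    """Score pairs of articles by how many readers they have in common.

    reading_lists: one list of article names per (anonymous) reader.
    Returns {(article_a, article_b): similarity} with article_a < article_b.
    Hypothetical helper for illustration, not Canopy's actual method.
    """
    readers = defaultdict(set)
    for reader_id, articles in enumerate(reading_lists):
        for article in articles:
            readers[article].add(reader_id)

    sims = {}
    items = sorted(readers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(readers[a] & readers[b])
            if shared:
                # Cosine similarity between the two articles' reader sets.
                sims[(a, b)] = shared / math.sqrt(len(readers[a]) * len(readers[b]))
    return sims


def recommend(liked, sims, top_n=3):
    """Rank articles the user has not read by similarity to ones they liked."""
    scores = defaultdict(float)
    for (a, b), score in sims.items():
        if a in liked and b not in liked:
            scores[b] += score
        elif b in liked and a not in liked:
            scores[a] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]


if __name__ == "__main__":
    lists = [["jazz", "physics"], ["jazz", "physics", "baking"], ["baking", "chess"]]
    print(recommend({"jazz"}, item_similarities(lists)))
```

In this toy data, someone who liked the "jazz" article is pointed first toward "physics", a connection that comes purely from overlapping readers rather than from anything the system knows about the individual.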

Canopy uses what is known as differential privacy, which limits access to only aggregate information from a broader group of users. Differential privacy essentially enables organizations to learn from group patterns without distinguishing between the individuals within them. So Tonic won’t store IP addresses or device IDs, for example. Other content recommendation systems often aggregate all their data only after it arrives on their servers, something that Canopy is keen to avoid. It just doesn’t want all that data to begin with.
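One classic way to learn group patterns without trusting any single report is randomized response, a simple form of differential privacy applied on the device before anything is sent. The Python sketch below is a generic illustration under that assumption; the article does not say which mechanism Canopy uses, and the names `randomized_response` and `estimate_rate` and the `epsilon` value are hypothetical.

```python
import math
import random


def randomized_response(truth, epsilon, rng):
    """Run on the device: report the true bit with probability
    e^epsilon / (1 + e^epsilon), otherwise report its opposite.
    Smaller epsilon means more noise and stronger privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else not truth


def estimate_rate(reports, epsilon):
    """Run on the server: recover an unbiased estimate of the true
    fraction of 'yes' answers from the noisy reports alone.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + true_rate * (2 * p_truth - 1); solve for true_rate.
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy population: 30% of 20,000 readers liked a given article.
    truths = [True] * 6000 + [False] * 14000
    reports = [randomized_response(t, 1.0, rng) for t in truths]
    print(round(estimate_rate(reports, 1.0), 3))
```

Any individual report may be a deliberate lie, so it says little about that person, yet the aggregate estimate lands close to the true 30 percent. That is the trade the paragraph above describes: useful group-level signal, with no reliable per-user record on the server.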

“It’s a crucial difference for our approach,” Whitman wrote in a co-authored blog post explaining the technology earlier this year. “Even in the worst case of the encryption failing, or our servers being hacked, no one could ever do anything with the private models because they do not represent any individual.”

There is one downside to this approach though. If a user buys a new phone, or wants to access Tonic on a different device, they will need to start completely from scratch — there is no way to port their recommendations over.

Echo chambers

Heading up the broader Canopy product development is Matthew Ogle, who recently joined the company from Instagram where he worked as a product manager. Before that, he was head of web product at early music-streaming pioneers Last.fm, before going on to join the Echo Nest and later Spotify, where he guided product development for Discover Weekly. So he knows a thing or two about tailored recommendations.

According to Ogle, Canopy uses both machine learning and humans as part of its content recommendation engine. It has an editorial team, led by former New York Times community editor Bassey Etim, which works in concert with the engineers to curate the automated recommendations; it’s all about making sure that humans have given all the content a once-over before passing it into Tonic. “There’s a constant feedback loop between them [editors] and what the machine learning, the clustering, the logic, is doing,” Ogle told VentureBeat in an interview.

The main reason for working this way, rather than giving over entirely to the machines, relates to echo chambers.

“We didn’t want to take the entire internet and filter it using machine learning,” Ogle continued. “We think we know the downsides of that and where that leads. One thing that we heard a lot during our research [was that] everyone loves personalization, but not to the point where it pigeonholes them.”

Ultimately, it’s all about ensuring quality and diversity.

“The best personalization products should reserve the right to surprise you, as well as give you something that’s really in your wheelhouse,” Ogle added.

Canopy, which raised a small $4.5 million seed round of funding last year, hasn’t given any indication as to whether it plans to monetize Tonic, or any subsequent similar apps it launches. The longer-term goal, it seems, is to license the underlying architecture to third-party developers who can engineer privacy into their own apps and market themselves as the antithesis of Facebook and its ilk.

“This [Tonic] is a playground for experimenting with the tech,” explained Annika Goldman, Canopy’s head of product strategy and former Spotify exec, to VentureBeat. “We really have a vision of what a future internet ecosystem could look like. We’re also currently in conversation with other companies about licensing the underlying private personalization technology to them.”

While Tonic is very much a proof of concept in many ways, designed to highlight Canopy’s technology, the company was keen to point out that it sees a world with both “Canopy-powered tech and Canopy-owned experiences,” as Ogle put it.

In other words, we can expect to see Tonic get more bells and whistles in the future even if Canopy signs deals with third parties.

Awareness

Despite some misguided claims that there isn’t any meaningful consumer backlash against big technology companies, high-profile scandals such as Facebook and Cambridge Analytica, and countless others, can only bolster a growing sentiment that companies’ tracking our every digital move is not a good thing.

“Consumers are waking up to the idea that you don’t have to trade your digital identity for a great experience online,” Whitman told VentureBeat. “Until recently, there weren’t any alternatives to big platforms, but now there is a movement of smaller tech companies giving people amazing experiences without exploiting their data.”

For the best part of the last two decades, Whitman has been involved in some form of music recommendation work, but it was at the Echo Nest and latterly Spotify where it seems he became aware of how algorithms and personalization were shaping user-privacy. “I’m especially reflective these days on the role of prediction, privacy, information retrieval, and machine learning on our culture,” he said at the time.

Spotify hasn’t exactly been engulfed in the kind of data-privacy scandals that have hit its technology peers, such as Google, Facebook, or Amazon, but the company holds a wealth of personal and emotional data that is tied to its users’ listening habits, and it can even target ads based on someone’s mood. More recently, Spotify also started to demand users’ GPS data to verify subscribers on its family plan. So while Spotify is seen as a relatively good actor in the privacy space, the ingredients are definitely there for things to boil over.

Speaking at a VentureBeat conference several years ago, Spotify executive Brian Benedik agreed that the company collects an “enormous amount of data on what people are listening to, where, and in what context.” He added: “It really gives us insight into what these people are doing.”

For many, though, it was the Facebook and Cambridge Analytica episode that truly brought to light the full extent of how big tech platforms can be used to profile users. Most people were probably already aware that data they voluntarily gave to Facebook, such as when they “like” a company’s page or update their relationship status, can be used for advertising. But most people were probably not aware that Facebook could also track their off-Facebook activities, given that any website that has integrated Facebook’s technology (such as the Like button) feeds into Facebook’s gargantuan web-tracking operation. After Cambridge Analytica, people really started to pay attention to what was going on, even if they didn’t make any immediate changes to how they use the web.

More broadly, a growing awareness of how the web works has made the topic of data-privacy top-of-mind for governments, companies, and at least some consumers. The European General Data Protection Regulation (GDPR), which took effect last year, forced firms to rethink their data-harvesting practices or face mammoth penalties. In turn, this has opened up a whole host of opportunities for well-financed startups to capitalize on a fledgling industry built around privacy tools, while tech companies such as Firefox-creator Mozilla have pushed to align themselves with “privacy” rather than “privacy breaches.”

Demand for software that promises privacy as a core selling point will likely explode in the years ahead.

“We’re seeing increasing regulation — GDPR is just the start — and we’re seeing increasing scrutiny of technology,” Goldman said. “And so a lot of companies are saying, ‘I want to get ahead of that, I want to invest in technology that enables me to do all the amazing things I want to do, but keep data private and secured’.”