Canopy provides a blueprint for privacy-focused content recommendations

With the advent of cloud computing, ecommerce, and social media, it's difficult to keep tabs on who has access to our data, and harder still to know how much care they're taking with it. Indeed, barely a day goes by without some form of data breach, lapse, or privacy scandal coming to the fore. But "data misuse" covers a broad gamut of scenarios that reach far beyond poor security hygiene.

Online tracking and profiling are rife -- it turns out a heap of money can be made from knowing where you are, what you do, and what you like. It all comes down to selling things -- from products to playlists to a political ideology -- based on who you are. The Facebook and Cambridge Analytica scandal, which highlighted how social networks armed with vast banks of personal data could be leveraged to profile voters and micro-target them with personalized political ads, was something of a watershed moment for issues of data privacy and abuse.

But personalization still permeates the digital sphere, from video- and music-streaming services to web searches and ecommerce apps. Companies vacuum up troves of behavioral data, such as pages viewed, links clicked, videos watched, and songs listened to, and often combine this with other data, such as location, email address, and demographic information, to create a sort of virtual model of a person in the cloud. It's all about getting to "know" you and predicting what else you might like, whether you want these companies in your head or not.

Against this backdrop, Canopy is carving out a niche in a market ripe for privacy-focused tools. The New York-based company has created what it calls a "private content recommendation system" that doesn't send personal data back and forth between a remote server and the user's device -- all the data-processing and machine learning takes place on the device itself.

Canopy was founded in 2017 by CEO Brian Whitman, formerly cofounder and chief technology officer (CTO) at music intelligence company The Echo Nest, which he sold to Spotify in 2014 to power the latter's music recommendation algorithms through a combination of data mining, machine learning, and signal processing, among other mechanisms. Whitman left Spotify in 2016 to work on "personal projects and start something new in 2017," as he announced at the time.

To showcase its new technology, Canopy today launched its first product -- Tonic -- an iPhone-only app that uses on-device machine learning and differential privacy to suggest longer-form reads from across the internet.

Just the tonic

From the get-go, Tonic is all about letting users know it doesn't want any of their information. "Our editors work with custom-made machine learning tools to tailor the internet's hidden gems to your vibe," it proclaims, before noting that the user doesn't even have to log in or create an account.

"We're trying to build a more private and ethical internet," it continues.

As with similar content-recommendation systems, Tonic first asks users to select from a bunch of feature articles from across the web, recommending they choose at least five or so to get the ball rolling.

Inside the app, users can read all the articles they've just added. The personalized recommendations kick off in subsequent days via a daily dose of articles based on what Tonic thinks the user likes. Users can also guide this process by long-pressing on an article in their activity stream and then sliding a little controller to indicate how much they like a particular article.

When a user clicks to read an article, they're taken to the original publication's website, which is accessed via an in-app browser that's effectively an incognito window that rejects all cookies.

At its core, Canopy is more interested in finding out what type of content you like reading than in getting to know you as a person. Under the hood, its recommendation engine finds similarities in what people are reading across the board and makes other recommendations that might not be immediately obvious. This isn't a million miles away from how Spotify's personalized Discover Weekly playlist works.
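
To make the idea of "finding similarities in what people are reading" concrete, here is a minimal Python sketch of item-to-item similarity built from co-reading counts. It is a textbook collaborative-filtering approach, not Canopy's actual engine, which the company has not detailed here. The function names, the article slugs, and the sample reading lists are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(reading_lists):
    """Cosine similarity between articles, based on how often they are read together."""
    readers = defaultdict(int)   # article -> number of reading lists that contain it
    together = defaultdict(int)  # (a, b) -> number of reading lists that contain both
    for articles in reading_lists:
        unique = sorted(set(articles))
        for a in unique:
            readers[a] += 1
        for a, b in combinations(unique, 2):
            together[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), n in together.items():
        score = n / sqrt(readers[a] * readers[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(sims, liked, k=3):
    """Rank unseen articles by their summed similarity to the liked ones."""
    scores = defaultdict(float)
    for a in liked:
        for b, s in sims.get(a, {}).items():
            if b not in liked:
                scores[b] += s
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]


# Hypothetical, anonymous reading lists (no user identifiers attached).
example_lists = [
    ["jazz-history", "vinyl-revival", "deep-sea"],
    ["jazz-history", "vinyl-revival"],
    ["deep-sea", "mars-rovers"],
    ["vinyl-revival", "synth-makers", "jazz-history"],
]
sims = item_similarities(example_lists)
picks = recommend(sims, {"jazz-history"}, k=2)
print(picks)
```

In this toy data, someone who liked "jazz-history" is pointed first to "vinyl-revival", which always appears alongside it, and then to "synth-makers", a less obvious pick that shares only one reading list with it.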

Canopy uses what is known as differential privacy, which limits access to only aggregate information from a broader group of users. Differential privacy essentially enables organizations to learn from a group's patterns without distinguishing between individuals within it. So Tonic won't store IP addresses or device IDs, for example. While content recommendation systems often aggregate data only after it arrives on their servers, Canopy doesn't want all that data to begin with.
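
One common way to learn a group's patterns without learning anything reliable about an individual is randomized response, a classic local differential privacy technique. The Python sketch below is an illustration under that assumption only: the source does not say which mechanism Tonic uses, and `randomize`, `estimate_rate`, and the `p_truth` parameter are hypothetical names. Each device adds noise before anything leaves it, and the server can still recover the overall rate.

```python
import math
import random


def randomize(liked, p_truth=0.75, rng=random):
    """On-device step: send the true bit with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return liked
    return rng.random() < 0.5


def estimate_rate(reports, p_truth=0.75):
    """Server step: undo the known noise to estimate the true share of 'liked' bits."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5, solved for true_rate.
    return (observed - (1 - p_truth) * 0.5) / p_truth


# With p_truth = 0.75, a "yes" report is 7x likelier from a true "yes" than from
# a true "no", which corresponds to a privacy parameter of epsilon = ln(7).
epsilon = math.log((1 + 0.75) / (1 - 0.75))

# Simulated population (assumption): 30% of 20,000 users liked a given article.
rng = random.Random(0)
true_bits = [i < 6000 for i in range(20000)]
reports = [randomize(bit, rng=rng) for bit in true_bits]
estimate = estimate_rate(reports)
print(round(estimate, 2))
```

No single report can be trusted, since any "yes" may be a coin flip, yet the aggregate estimate lands close to the true 30%. That is the trade at the heart of differential privacy: useful group statistics, deniable individual data.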

"It's a crucial difference for our approach," Whitman wrote in a coauthored blog post explaining the technology earlier this year. "Even in the worst case of the encryption failing, or our servers being hacked, no one could ever do anything with the private models because they do not represent any individual."

There is one downside to this approach, however. If a user buys a new phone or wants to access Tonic on a different device, they will need to start completely from scratch -- there is no way to port their recommendations over.

Echo chambers

Heading up the broader Canopy product development is Matthew Ogle, who recently joined the company from Instagram, where he worked as a product manager. He was previously head of web product at early music-streaming pioneer Last.fm, before going on to join The Echo Nest and later Spotify, where he guided product development for Discover Weekly. So he knows a thing or two about tailored recommendations.

According to Ogle, Canopy uses both machine learning and humans to tailor its content recommendation engine. An editorial team, led by former New York Times community editor Bassey Etim, works in concert with the engineers to curate the automated recommendations -- it's all about making sure humans have given all the content a look before passing it to Tonic. "There's a constant feedback loop between [the editors] and what the machine learning -- the clustering, the logic -- is doing," Ogle told VentureBeat in an interview.

The main reason for working this way, rather than giving everything over to the machines, has to do with echo chambers.

"We didn't want to take the entire internet and filter it using machine learning," Ogle continued. "We think we know the downsides of that and where that leads. One thing that we heard a lot during our research [was that] everyone loves personalization, but not to the point where it pigeonholes them."

Ultimately, it's all about ensuring quality and diversity.

"The best personalization products should reserve the right to surprise you, as well as give you something that's really in your wheelhouse," Ogle added.

Canopy, which raised a small $4.5 million seed round of funding last year, hasn't indicated whether it plans to monetize Tonic -- or any subsequent similar apps it launches. The longer-term goal, it seems, is to license the underlying architecture to third-party developers who can engineer privacy into their own apps and market themselves as the antithesis of Facebook and its ilk.

"[Tonic] is a playground for experimenting with the tech," Annika Goldman, Canopy's head of product strategy and former Spotify exec, explained to VentureBeat. "We really have a vision of what a future internet ecosystem could look like. We're also currently in conversation with other companies about licensing the underlying private personalization technology to them."

While Tonic is in many ways a proof of concept designed to highlight its underlying technology, the company was quick to point out that it sees a future for both "Canopy-powered tech and Canopy-owned experiences," as Ogle put it.

In other words, we can expect to see Tonic get more bells and whistles in the future even if it signs deals with third parties.

Awareness

Despite some misguided claims that there hasn't been any meaningful consumer backlash against big technology companies, high-profile scandals like that of Facebook and Cambridge Analytica -- and countless others -- only bolster a growing sense that having companies track our every digital move is not a good thing.

"Consumers are waking up to the idea that you don't have to trade your digital identity for a great experience online," Whitman told VentureBeat. "Until recently, there weren't any alternatives to big platforms, but now there is a movement of smaller tech companies giving people amazing experiences without exploiting their data."

For the best part of the last two decades, Whitman has been involved in some form of music recommendation work, but it was at The Echo Nest and latterly Spotify that he became acutely aware of how algorithms and personalization were shaping user privacy. "I'm especially reflective these days on the role of prediction, privacy, information retrieval, and machine learning on our culture," he said at the time of his departure from Spotify.

Spotify hasn't exactly been engulfed in the kind of data privacy scandals that have hit peers like Google, Facebook, and Amazon, but the company holds a wealth of personal and emotional data that is tied to its users' listening habits, and it can even target ads based on someone's mood. More recently, Spotify also started to demand users' GPS data to verify subscribers on its family plan. Speaking at a VentureBeat conference several years ago, Spotify executive Brian Benedik agreed that the company collects an "enormous amount of data on what people are listening to, where, and in what context." "It really gives us insight into what these people are doing," he added.

So while Spotify is seen as a relatively good actor in the privacy space, the ingredients for a privacy debacle are arguably in place.

Before the Facebook and Cambridge Analytica incident, most people were probably aware that data they voluntarily gave to Facebook, such as when they "like" a company's page or update their relationship status, could be used for advertising. But most users were probably not aware that Facebook could also track their off-Facebook activities, given that any website that integrates Facebook's technology (such as the Like button) feeds into the social media giant's gargantuan web-tracking operation. After Cambridge Analytica, people really started paying attention to what was going on, even if they didn't immediately change how they use the web.

More broadly, a growing awareness of how the web works has made the topic of data privacy top of mind for governments, companies, and at least some consumers. The European Union's General Data Protection Regulation (GDPR), which took effect last year, forced firms to rethink their data-harvesting practices or face mammoth penalties. This has, in turn, opened up a host of opportunities for well-financed startups capitalizing on the fledgling privacy tool industry. Meanwhile, tech companies like Firefox creator Mozilla have pushed to align themselves with "privacy" rather than "privacy breaches."

All of which is to say that demand for privacy-conscious software will likely explode in the years ahead.

"We're seeing increasing regulation -- GDPR is just the start -- and we're seeing increasing scrutiny of technology," Goldman said. "And so a lot of companies are saying, 'I want to get ahead of that, I want to invest in technology that enables me to do all the amazing things I want to do, but keep data private and secured.'"