Streamlytics aims to reduce AI bias by helping users sell their data

The data tides are changing. Between the influx of regulations, Apple's new privacy controls, and greater concern around privacy issues, it's clear enterprises won't be able to collect and leverage data as they have been for much longer.

Streamlytics, a Miami-based data provider founded in 2018, believes letting users sell their data could be part of the solution. The company has already collected more than 100 million data points this way and says it aims to "democratize" data by giving users more control and then selling the user-supplied data to enterprises -- from media conglomerates to consumer goods companies. Streamlytics is particularly focused on working with Black users in the U.S. and on getting underrepresented data into AI training models.

To begin, people download their data from whatever platforms they choose and then upload it to Streamlytics' own consumer application, Clture. Users sign a data license that says they own their data (rather than Streamlytics or the companies that will later pay for it), request a payout, and get paid. The company's patent-pending data standard then unifies the various file types and data sources, packaging it up nicely for enterprise customers. Companies can only buy aggregate data, either via a data feed, a specific number of data points, or another offering customized for their use cases or specific needs. Users currently do not know who is purchasing their data and cannot opt out of specific companies, but Streamlytics is thinking about adding this capability in the future.

Streamlytics founder and CEO Angela Benton believes this approach is necessary not just to begin compensating users more equitably, but to make sure companies are using better, less biased data to build their technologies. She spoke with VentureBeat about the changing data privacy landscape and how Streamlytics wants to shift the status quo.

This interview has been edited for brevity and clarity.

VentureBeat: What does it mean to democratize data?

Angela Benton: For me, it's less about democratizing for companies because that pretty much exists right now. But consumers are just finally starting to really understand that they're creating data, how much of it they're creating, and that they don't really own it or have any agency over it. I personally tend to think about it in terms of how much money is being made (both in terms of revenue and companies leveraging data internally), and I feel like consumers should get some kind of compensation. People think of it as a negative -- that these companies are leveraging data. But to me, the real issue is that there's no equality in the relationship between the people who are creating the data and the people who are actually leveraging it. This is about leveling the playing field.

VentureBeat: And so why do we need to do this, especially in terms of enterprises? With people becoming more aware, there's of course the issue of customer trust. But how might this approach to data be good for enterprises, too?

Benton: When I talk about the importance of consumers having agency over their data, people tend to think of advertising and marketing. I actually think about it in terms of AI. Everything is going to be powered by AI. You can see its growth over not even the last 10 or five years, but the past two years. And so if the algorithms within the AI ecosystem aren't trained on data that is diverse, that correctly represents the gender and ethnic makeup of the world we live in today, that for me is the bigger problem. That's how you end up with bias. To me, that's the bigger implication of why data is so important.

VentureBeat: So it's not even just a question of if this current system is fair, but also getting better data. How can this change how enterprises acquire and use data while also improving the data itself?

Benton: I'll give you an example, and this is actually interesting because it applies to what happened in 2020 with the Black Lives Matter movement and everything that happened with George Floyd. You have a lot of advertisers who are interested in, I don't necessarily want to say "targeting" the African American community, but they want to target them in a way where they're doing some kind of social good. And with the changes to third-party tracking, they don't really have a way to understand who is, for instance, African American, and who is not.

We're working with a large media company that has brands coming to it to do that. We're the largest first-party data provider for African American data that is sourced in this ethical way, and we can make recommendations based on activity that falls kind of within a demographic. So think about how you typically stream -- you're probably also looking at what you're going to order on UberEats, shopping on Amazon, and more. So if they want to understand African American women ages 18 to 24, what our data uniquely does is say they watch Bridgerton on Netflix, but also maps to other things they're doing and buying. It will say they buy wellness products more, and specifically what kinds.

These details allow the companies to make better decisions, and we're seeing some brands, for instance, interested in using this for product innovation. Another company we're talking to, for example, and this actually applies to the AI and bias we were talking about, is a top five technology company. And they want to allow our user base to provide images for training algorithms.

VentureBeat: I assume you're referring to Apple's recent iOS update, which allows users to turn off third-party app tracking in favor of more privacy and is specifically designed to limit the data advertisers can access. And so you're saying that because your users are opting in to offer their data while so many people are opting out, you're able to get more and better data that's nicely packaged together, right when the traditional route is starting to become limited?

Benton: Exactly. We're actually at a critical moment for our business and the ecosystem, which is exciting. I don't think there's ever been a point in time where there's been a shift in how businesses are interacting with consumer data.

VentureBeat: But how do you make sure the training data is truly diverse, especially if it's a self-selected set of data? Eliminating bias has been one of the biggest challenges around AI and machine learning, so that's a big goal and claim. Are you saying your data can currently reduce or eliminate bias or that's what you're working toward?

Benton: What we're saying is that data partners currently do not focus on providing data from a specific group of people, and that's part of why training data isn't necessarily correct. A good example would be consumer banking. Maybe you live in a neighborhood that's at the beginning stages of being gentrified, but it was a house that was passed down to you, you're African American, and you might get declined for a loan based on your location information. The algorithm doesn't know. So what we can do, because we have data for this specific community, is a financial services company can come to us saying it wants to diversify its training dataset. Saying it wants 30% of the training dataset to include data from African American communities. That's probably the best way to think about how we're actually helping algorithms. And where we're really focused right now is getting the data into people's hands and then building a relationship and working with our partners to actually measure the success, particularly when it comes to artificial intelligence.
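As a rough illustration of the 30% example Benton gives, the arithmetic behind topping up a training set can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `records_needed` and its parameters are hypothetical, and nothing here reflects how Streamlytics or its customers actually assemble datasets. It answers one narrow question: how many additional records from a group must be added so that the group makes up at least a target share of the enlarged dataset.

```python
def records_needed(total: int, group_count: int, target_percent: int) -> int:
    """Return how many extra group records to add so the group reaches
    at least target_percent of the enlarged dataset.

    Hypothetical helper for illustration only. Solves
    (group_count + x) / (total + x) >= target_percent / 100 for the
    smallest non-negative integer x, using integer arithmetic.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in the range 0..99")
    if not 0 <= group_count <= total:
        raise ValueError("group_count must be between 0 and total")

    shortfall = target_percent * total - 100 * group_count
    if shortfall <= 0:
        return 0  # the group already meets or exceeds the target share

    divisor = 100 - target_percent
    return -(-shortfall // divisor)  # ceiling division on integers


# Example: 1,000 records, 100 of them from the group, 30% target.
extra = records_needed(total=1000, group_count=100, target_percent=30)
print(extra)
```

Note that adding records also grows the total, which is why the answer is larger than the naive gap of 200 records in this example.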

VentureBeat: How do you make sure the companies buying the data aren't using it to discriminate? How do you safeguard against abuse?

Benton: The customers we work with are generally all on the same page in terms of wanting to use data in an ethical fashion. It's not a typical sale where it's like, "Oh we need data, here's money and give us the data." There are a lot of things asked from us by our partners -- for instance, they want to know that we're CCPA-compliant and if someone wants to delete their data, how they get it out of their ecosystem. It's very intentional, and we've been collaborating with enterprise data privacy groups and brand-safe interest groups, as well.

VentureBeat: And so what exactly does this "ethically sourced" data look like?

Benton: For us, after users upload their data to Clture, we use a proprietary algorithm that prices the data in a more fair fashion. The pricing is dynamic, like the stock market. We consider the data source and determine the value of the company, looking at its market cap and how it uses data. So if the user uploads their Netflix data, we look at Netflix. We also look at how much data is in that file and multiply that by the data point valuation to determine how to actually value that specific package of data. And so as a result, people can make a significant amount of money. I think the largest amount we paid someone was like $1,100. So we have some super users who are really motivated to upload their data, as well as more average and less-active users. For us, that's ethical because we're not saying to hand over your data for free. We're also not saying "Here are a few pennies for your data." And we're not saying you don't continue to own your data. And as you can with any CCPA-compliant platform, you can request to have your data or account deleted. But this hasn't happened much -- maybe less than 1%. And that's interesting in itself, and I think it's because consumers are incentivized.
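The pricing Benton describes is proprietary, but the structure she outlines (a per-data-point valuation tied to the source company, multiplied by the amount of data in the file) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the `DataSource` fields, the `data_reliance` score, the base rate, and the reference market cap are hypothetical and are not Streamlytics' actual inputs or formula.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of the platform a file came from."""
    name: str
    market_cap_usd: float
    data_reliance: float  # assumed 0..1 score for how heavily the company uses data


def point_valuation(source: DataSource,
                    base_rate_usd: float = 0.001,
                    reference_cap_usd: float = 1e11) -> float:
    """Assumed per-data-point value: a base rate scaled by the source's
    market cap (relative to a reference) and by its data-reliance score."""
    scale = source.market_cap_usd / reference_cap_usd
    return base_rate_usd * scale * source.data_reliance


def package_value(source: DataSource, n_points: int) -> float:
    """Value of one uploaded file: data points times per-point valuation."""
    return n_points * point_valuation(source)


# Illustrative numbers only, not real figures for any company.
example_source = DataSource(name="ExampleStream",
                            market_cap_usd=2e11,
                            data_reliance=0.5)
print(package_value(example_source, n_points=1000))
```

Because the valuation depends on market cap, recomputing it as that figure changes would give the dynamic, market-like pricing mentioned in the interview.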

VentureBeat: There's certainly momentum, but can you lay out the challenges for this kind of shift? Are there technical and operational hurdles enterprises face, or is it just needing to change minds and challenge the (very profitable) status quo?

Benton: I think the bigger challenges are changing minds within an organization. Luckily, people are looking for solutions, just because they were used to doing business and leveraging data a certain way. Now they're like "What do we do?" And I really think it's exciting because people want to do the right thing. But I do also think the need for a data standard is going to be a huge challenge and important for enterprises. Without any kind of data standard, it's going to be very hard and it's going to be very messy.

Update at 6:30am Pacific: We updated this post to reflect that Streamlytics has collected 100 million data points to date, not 75 million.
