Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success.
“The Terminator,” “The Matrix,” “I, Robot.”
All of these are films where machines become sentient and attempt to take over the world (or at least kill all humans). It’s a popular plot line because it speaks to our deep-seated fears about technology. Will our devices and the data they collect be used against us as we move toward Web3?
It’s not just Hollywood paranoia. In recent years, we’ve seen increasing evidence that our data is being used in ways we never intended or anticipated. The Cambridge Analytica scandal showed how Facebook data was harvested and used to manipulate voters in the 2016 U.S. presidential election.
Google has been fined for collecting data from children without their parents’ consent. And facial recognition technology is being used by law enforcement and corporations with little regulation or oversight.
In this article, we will delve into the dangers of unfettered data pipelines and how blockchain technology, particularly as we move toward Web3, could make black-box algorithms less opaque.
The world runs on algorithms
We are living in an age where algorithms are increasingly making decisions for us. They decide what we see on social media, which ads we are shown, and who gets a loan and who doesn’t.
Algorithms can be simple, like the one that decides what order to show results in a search engine. Or they can be more complex, like the ones used by social media companies to decide which posts to show us in our newsfeeds.
Some of these algorithms are designed to be relatively transparent. Google, for example, publishes general guidance on how its search ranking works. But many others are opaque, meaning we don’t know how they work or what data they use to make decisions.
This lack of transparency is concerning for a number of reasons. For one, it can lead to biased decisions. If an algorithm is using race or gender as a factor in its decision-making, that bias will be reflected in the results.
Second, opacity is often defended as protection against gaming. If people don’t know how an algorithm works, they can’t easily figure out how to manipulate it, which is why many companies keep their algorithms secret. But that same secrecy keeps outsiders from verifying that the system is working as intended.
Finally, opaque algorithms are difficult to hold accountable. If an algorithm makes a mistake, it can be hard to figure out why or how to fix it. This lack of accountability is particularly problematic when algorithms are used for important decisions, like whether or not someone gets a loan or a job.
The dangers of data pipelines
The problem with algorithms is that they are only as good as the data they are using. If the data is biased, the algorithm will be biased. If the data is incomplete, the algorithm will make inaccurate predictions.
And often, the data that algorithms use is far from perfect. It comes from a variety of sources, including social media, sensors, and government databases. This data is then collected and processed by a variety of companies before it ever reaches the algorithm.
Each step in this process introduces potential errors and biases. Social media data, for example, is often unrepresentative of the population as a whole. And sensors can be inaccurate. The result is a data pipeline that is often opaque, biased, and difficult to hold accountable.
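To make the point about unrepresentative data concrete, here is a minimal sketch in Python. The population, the group names and the approval numbers are all invented for illustration; the only claim is that a pipeline which sees one slice of the population can report a very different answer from the true one.

```python
# Hypothetical population: each record is (group, approves_policy).
# Group sizes and approval counts are made-up numbers for illustration.
population = (
    [("online", True)] * 300 + [("online", False)] * 100
    + [("offline", True)] * 200 + [("offline", False)] * 400
)

def approval_rate(records):
    """Share of records whose second field is True."""
    return sum(1 for _, approves in records if approves) / len(records)

# A pipeline that only ingests social media posts sees the "online" group only.
social_media_sample = [r for r in population if r[0] == "online"]

true_rate = approval_rate(population)               # 500 of 1000 records
sampled_rate = approval_rate(social_media_sample)   # 300 of 400 records

print(f"whole population: {true_rate:.2f}, social media only: {sampled_rate:.2f}")
```

Nothing in the second number signals that it is skewed, which is why the origin of the data matters as much as the algorithm that consumes it.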
Admittedly, killer robots lean toward the fantastical, but there are more discreet ways for your data to be exploited by the powers that be. So what is your data being used for? Here are some possibilities:
How your data is used
- To score your political views and manipulate you during election season
- To sell you products you don’t need
- To track your location and movements
- To target you with ads
- To prevent you from getting a job or loan
These are just a few examples — the list goes on. And it’s not just corporations that are doing this. Government agencies are using data to track citizens, predict crime, and even fight wars.
In short, data is being used to control and manipulate people in a variety of ways. And often, these uses are hidden from the people who are being affected.
Web3 and the potential of blockchain-based data markets
One potential solution to the problems with data pipelines is a blockchain-based data market. In this type of market, data would be collected and stored on a decentralized network.
This would have a number of advantages. For one, it would make the data pipeline more transparent. We would know where the data came from and how it was collected. Also, it would make the data more trustworthy. If the data is stored on a decentralized network, it would be much harder to manipulate. This type of data storage could become an even more important concept as we move toward Web3.
Finally, it would make the data more accessible. Anyone would be able to access the data and use it to build algorithms. Privacy risks could be reduced by anonymizing the data and by building in mechanisms to prevent misuse.
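The idea that data on a shared ledger is harder to manipulate rests on hash-linking: each record commits to the one before it, so a later edit is detectable. The sketch below illustrates that idea in plain Python using only the standard library. It is not a blockchain (there is no network or consensus), and the record fields and function names are assumptions made up for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that marks the start of the chain

def record_hash(payload, prev_hash):
    """Hash a record together with the hash of the record before it."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def build_chain(payloads):
    """Return a list of records, each linked to its predecessor by hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = record_hash(payload, prev)
        chain.append({"payload": payload, "prev": prev, "hash": digest})
        prev = digest
    return chain

def verify_chain(chain):
    """Recompute every hash; an edited record no longer matches its hash."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev:
            return False
        if record_hash(record["payload"], prev) != record["hash"]:
            return False
        prev = record["hash"]
    return True

chain = build_chain([
    {"source": "sensor-A", "reading": 21.5},
    {"source": "sensor-B", "reading": 19.0},
])
intact = verify_chain(chain)             # chain as originally written
chain[0]["payload"]["reading"] = 99.9    # simulate tampering after the fact
tampered_ok = verify_chain(chain)        # the altered record fails the check
```

In a real decentralized network, many independent parties hold copies of the chain, so quietly rewriting history would mean recomputing every later hash and convincing the other participants to accept the result.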
For instance, the Ocean Protocol is a decentralized data exchange protocol that enables data sharing while maintaining data privacy. It is built on the Ethereum blockchain and uses smart contracts to ensure that data is only shared with parties who have permission to use it.
The Ocean Protocol could be used to create a data market where data is collected, stored, and distributed in a transparent and trustless way. This would allow data to be used more efficiently and could help to solve some of the problems with the current data pipeline.
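The permission logic described above can be sketched as a toy model. This is plain Python, not Ocean Protocol's API and not smart contract code; the class, its methods, the party names and the `ipfs://example-dataset` address are all hypothetical. It only shows the rule a contract of this kind would enforce: the owner grants access, and everyone else is refused.

```python
class DatasetListing:
    """Toy stand-in for a contract guarding one dataset (hypothetical)."""

    def __init__(self, owner, data_uri):
        self.owner = owner
        self._data_uri = data_uri
        self._allowed = set()
        self.access_log = []  # visible trail of grants and fetches

    def grant(self, caller, consumer):
        """Only the dataset owner may add a consumer to the allow list."""
        if caller != self.owner:
            raise PermissionError("only the owner can grant access")
        self._allowed.add(consumer)
        self.access_log.append(("grant", consumer))

    def fetch(self, consumer):
        """Return the dataset location only to consumers with permission."""
        if consumer not in self._allowed:
            raise PermissionError(f"{consumer} has no permission")
        self.access_log.append(("fetch", consumer))
        return self._data_uri

listing = DatasetListing(owner="alice", data_uri="ipfs://example-dataset")
listing.grant("alice", "research-lab")
uri = listing.fetch("research-lab")

try:
    listing.fetch("data-broker")
    broker_denied = False
except PermissionError:
    broker_denied = True
```

The access log is the part that speaks to transparency: every grant and every fetch leaves a record that the data owner can inspect.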
This is why data markets are one of the greatest frontiers of Web3: data is the lifeblood of the new internet.
Overcoming the challenges of blockchain-based data markets
It’s important to note that a blockchain-based data market is not a perfect solution. There are still some challenges that need to be addressed. For instance, it’s not clear how data would be priced in such a market.
The blockchain community would have to be proactive in participating in and developing data markets. Otherwise, it runs the risk of being left behind as the centralized platforms continue to dominate.
People will be able to treat their data as an asset — but the right infrastructure needs to be in place first. One way to do this is to develop data wallets that would allow people to control their data and receive compensation for sharing it.
The uPort platform — now split into Serto and Veramo — is one example of a data wallet that is being developed on the Ethereum blockchain. uPort allows users to control their identity, personal information, and data. It also enables them to share this information with others in a secure and decentralized way.
Data quality is imperative
Another challenge is data quality. In a centralized system, data is controlled by a single entity. This means that one party is accountable for validating the data, which makes it easier to enforce consistent quality standards.
In a decentralized system, however, there is no single source of truth. This means that the quality of data can vary greatly. Data quality is an important issue that needs to be addressed for Web3 and blockchain-based data markets to be successful.
A potential fix for the data quality issue is to use data curation markets. In these markets, people would be incentivized to provide accurate and high-quality data. The iGrant data wallet platform is one example of a data curation market that is being developed on the Ethereum blockchain.
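One way such an incentive could work is staking: curators put tokens behind datasets they believe are accurate, and stakes on datasets later judged inaccurate are redistributed to those who backed good data. The sketch below is a simplified illustration of that assumption, not a description of iGrant or any other real platform. The curator names, amounts and the `settle` function are hypothetical, and the accuracy verdicts are assumed to come from some external review.

```python
def settle(stakes, accurate):
    """Split slashed stakes among curators who backed accurate datasets.

    stakes:   list of (curator, dataset, amount) tuples
    accurate: dict mapping dataset name to True/False (external verdict)
    """
    winners = [(c, a) for c, d, a in stakes if accurate[d]]
    slashed = sum(a for _, d, a in stakes if not accurate[d])
    winning_total = sum(a for _, a in winners)

    payouts = {c: 0.0 for c, _, _ in stakes}
    for curator, amount in winners:
        # Stake returned, plus a share of the slashed pool in proportion
        # to how much this curator staked.
        payouts[curator] += amount + slashed * amount / winning_total
    return payouts

stakes = [
    ("ana", "weather", 60),
    ("ben", "weather", 40),
    ("cy", "traffic", 50),
]
accurate = {"weather": True, "traffic": False}
payouts = settle(stakes, accurate)
# ana and ben recover their stakes plus a share of cy's 50 tokens
```

The design choice here is that rewards are funded entirely by slashed stakes, so no new tokens are created. Who issues the verdicts, and how they are kept honest, is the harder open question.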
These are just some of the challenges that need to be addressed. If they can be overcome, blockchain-based data markets have the potential to revolutionize the way that data is collected, stored and distributed.
The next article will cap off our Web3 series, putting everything we have talked about together to see the big picture of how data, crypto, blockchain, and Web3 will shape the internet — and the world — in the years to come. Stay tuned!
Daniel Saito is CEO and cofounder of StrongNode.