Data platform Fluree pursues validated data management across the web

Fluree is a relatively new data storage company promising a “data-centric” approach. Database companies have been focusing on data forever, so it seems fair to ask how Fluree can improve on the old tactics and ensure that data management software is more robust and ready for the next set of data.

Fluree is taking a novel approach: under the hood sits a blockchain-backed semantic graph database. The Fluree crew has also taken cues from GitHub development methods as it builds out a platform intended to secure data sets in a distributed manner.

VentureBeat sat down with Brian Platz, one of the founders and co-CEOs, to understand what Fluree's approach brings to market.

This interview has been edited for brevity and clarity.

VentureBeat: What is Fluree’s plan for data-centric computing?

Brian Platz: Fluree is a data platform, sort of a database plus plus, if you will. It's really focused around some of the modern needs we have specifically to manage data. But ultimately what we end up helping to deliver is what we call data ecosystems, where data needs to connect to other data and other parties.

VentureBeat: So you have a data management ecosystem. How is that different from a storage place?

Platz: Once you have an ecosystem, you're in the realm of collaborating. You can't really collaborate around data with a shared storage place. You might be able to have everyone pull the data, but not much more. There are a lot of capabilities that need to come into this kind of world, and Fluree implements those beyond a typical database. One capability is that data needs to be able to defend itself.

We need to put permissions into the data tier, as opposed to having them sit inside of the application. We have to do things like lock in time for the same reason that software developers couldn't contribute around source code without Git and GitHub.

Similar capabilities are needed to collaborate around data sets. We need semantics, we need a common language so that data can be transferred and understood without developers having to write one-off ETL processes. We need trust. We need anyone receiving the data to have the equivalent of what we get in a web browser. You know the little lock icon that tells us we're talking to Google and the data hasn't been tampered with? Machines need that same thing. Those are some of the capabilities that we layer on top that go well beyond just a common storage place.

VentureBeat: That’s quite a list. Let's unpack some of those. So you talked about 'permissioning,' does this mean role-based access? Perhaps row-level locking?

Platz: Even more granular. In fact, even more granular than cell level. Our locking mechanism can implement logic. We call these smart functions, and they’re stored with the data itself. What makes them more powerful than a typical kind of 'permissioning' paradigm is that, because it's in the data and it's part of the data, it can leverage the data in the rules. So you can dynamically have the ability to update this invoice in the supply chain system because you work for the company and have this role in the company and the company is the one that issued the invoice. But I can't update anyone else's invoices. The data can actually express those permissions in a very powerful way.

VentureBeat: Could you even do things like grant access if the invoice is less than $100,000? Then it might need two signatures. If it's more than $100,000, it needs three signatures?

Platz: Absolutely, even at cell-level. If it's less than $100,000, I can see the data. But if it's more than $100,000, the data doesn't even exist for me. I can try and do whatever I want, but I will never see that information.
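The permission rules Platz describes in the last two answers can be pictured with a small Python sketch. It is not Fluree's smart-function syntax, which the interview does not show. The `User` class, the invoice fields, the "billing" and "controller" roles, and the function names are all hypothetical. The point is only that the rules read the data itself (who issued the invoice, how large it is) rather than a separate access list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    employer: str
    role: str


# Hypothetical records: plain dicts standing in for stored data.
INVOICES = [
    {"id": "inv-1", "issuer": "Acme", "amount": 40_000},
    {"id": "inv-2", "issuer": "Acme", "amount": 250_000},
    {"id": "inv-3", "issuer": "Globex", "amount": 10_000},
]

VISIBILITY_LIMIT = 100_000  # threshold borrowed from the interview example


def can_update(user: User, invoice: dict) -> bool:
    """Allow updates only for a billing user employed by the invoice's issuer."""
    return user.role == "billing" and user.employer == invoice["issuer"]


def visible_invoices(user: User, invoices: list) -> list:
    """Hide invoices at or above the limit from everyone except controllers."""
    return [
        inv for inv in invoices
        if inv["amount"] < VISIBILITY_LIMIT or user.role == "controller"
    ]


alice = User("alice", "Acme", "billing")
print(can_update(alice, INVOICES[0]))  # Acme invoice, Acme employee
print(can_update(alice, INVOICES[2]))  # Globex invoice, Acme employee
print([inv["id"] for inv in visible_invoices(alice, INVOICES)])
```

In this sketch the large invoice is never returned to `alice` at all, which mirrors the idea that the data "doesn't even exist" for a user without the right to see it.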

VentureBeat: You talked about Git-level version control. Does that involve committing and creating different branches and things like that?

Platz: Absolutely. In our language, a database never changes. Every time you update a database, you have an entirely new database. Every historical database exists and you can query them at any time. That's exactly like Git. You have the full history, a full provable history of every change. You can get to any version of history if you want and you can actually branch from any point and have branches come up and be able to merge. We can do the same thing with data.
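A toy Python model makes the "a database never changes" idea concrete. It is a sketch under simplifying assumptions, not a description of Fluree's storage engine: `VersionedStore` and its methods are hypothetical names, each version is a full read-only copy (a real system would presumably share structure between versions), and merging branches is left out.

```python
from types import MappingProxyType


class VersionedStore:
    """Every transaction yields a new read-only snapshot; old ones stay queryable."""

    def __init__(self):
        # Version 0 is the empty database.
        self._versions = [MappingProxyType({})]

    def transact(self, changes: dict) -> int:
        """Apply changes to a copy of the latest snapshot; return the new version number."""
        new_state = dict(self._versions[-1])
        new_state.update(changes)
        self._versions.append(MappingProxyType(new_state))
        return len(self._versions) - 1

    def as_of(self, version: int):
        """Return the database exactly as it was at the given version."""
        return self._versions[version]

    def branch(self, version: int) -> "VersionedStore":
        """Start an independent line of history from any earlier version."""
        fork = VersionedStore()
        fork._versions = self._versions[: version + 1]
        return fork


store = VersionedStore()
v1 = store.transact({"temp_c": 21})
v2 = store.transact({"temp_c": 23})

fork = store.branch(v1)           # branch from the older version
fork.transact({"temp_c": 19})     # does not affect the original history

print(store.as_of(v1)["temp_c"], store.as_of(v2)["temp_c"])
```

Because snapshots are never modified in place, a query against an old version always returns the same answer, which is the property that makes the history auditable.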

VentureBeat: Do you end up just using a lot of the Git algorithms and the Git code? Or have you implemented and even gone beyond some of their algorithms?

Platz: We’ve implemented the idea that you can embed granular security rules in it. So if you have access to write to the repository, you can write to the repository. Now, other people can view it or approve it. But in Fluree, you can express the granular data that you have the ability to change and how you are able to make those changes. So it can really defend itself.

And actually it's leading into a very exciting product that will have a data hub. But not like a data hub where people are just publishing public data sets. Instead, the data hub might have a group who controls the meteorologists that are trusted on an annual basis to update weather records that anyone around the globe can use. Anyone can validate where it came from. They can validate who put that data in there. So you have these global data sets, but you can control the entities that have the right permissions. The cryptographic proof is there for anyone to validate.

VentureBeat: You’ve mentioned the tables are ledgers, which I guess is a kind of standard chain of signatures, right?

Platz: If you're going to compare it to blockchains, the difference is that blockchains are usually storing the current application state, like a currency balance. We're storing basically any sort of data you want. We actually use the W3C RDF format. Our triple store is a very granular way of representing information. Every block, or every new entry in the ledger, not only has the cryptography wrapped around it, but the data inside of it is just simple expressions of deltas of data changes. So you get this log of provable data changes.
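The "log of provable data changes" can be illustrated with a minimal hash-chained ledger in Python. This is a sketch, not Fluree's block format: the field names (`prev`, `assert`, `retract`, `hash`), the `ex:` identifiers, and the helper functions are assumptions made for illustration, and it uses only SHA-256 chaining, with no digital signatures. Each block carries a delta of subject-predicate-object triples, and each block's hash covers the previous block's hash, so altering any earlier entry breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, asserts: list, retracts: list) -> str:
    """Hash the previous block's hash together with this block's delta."""
    payload = json.dumps(
        {"prev": prev_hash, "assert": asserts, "retract": retracts},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(ledger: list, asserts: list, retracts: list) -> None:
    """Append a block holding triples to add and triples to remove."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev_hash,
        "assert": asserts,
        "retract": retracts,
        "hash": block_hash(prev_hash, asserts, retracts),
    })


def verify(ledger: list) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS
    for block in ledger:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["prev"], block["assert"], block["retract"]):
            return False
        prev_hash = block["hash"]
    return True


def replay(ledger: list) -> set:
    """Rebuild the current set of triples by applying each delta in order."""
    state = set()
    for block in ledger:
        state -= {tuple(t) for t in block["retract"]}
        state |= {tuple(t) for t in block["assert"]}
    return state


ledger = []
append_block(ledger, [["ex:inv-1", "ex:amount", 40000]], [])
append_block(ledger, [["ex:inv-1", "ex:amount", 45000]],
             [["ex:inv-1", "ex:amount", 40000]])
print(verify(ledger), replay(ledger))
```

Replaying only the first block reproduces the earlier state of the data, which ties this log back to the point about being able to query any historical version.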

VentureBeat: You can walk the changes back? I've been on these projects in the past and so many problems emerged from the version of the database filled with test data. The code was easy to track, but not the data. It was very easy to use the blame function to figure out just what was going wrong with the code and who did these terrible things. But the database, which actually contained a fair amount of logic, caused a lot of trouble. We had no way of tracking any of that down. Figuring who messed it up. The test clusters just booted up with some random version of the data.

Platz: Part of the philosophy here is that companies need to be far more strategic with their data than they are today to remain competitive. And part of that is going to require them to think about their data more independently from any one application. Right now people have 500 software-as-a-service apps inside a company. They're duplicating data. They’re investing in technologies to try and combine them to get insights. Well, if you can actually start with a well-structured and governed set of data, apps can wrap around that.

VentureBeat: And the data lives on its own, independently?

Platz: We see a world, hopefully not too far away and hopefully in my lifetime, that sort of flips the script. Right now, we take our data and we send it to the app. You might send your sales prospect to Salesforce.com. We should be able to build and manage and organize and store our secure data sets anywhere we want.

VentureBeat: Like the push to split apps into microservices and put those in containers?

Platz: Exactly. We have the apps containerized. Now we can move our data because our data is expressed with these common schemas. Any app that understands these schemas should be able to read and write to them and we shouldn't have to move our data anywhere. We shouldn't have to build data lakes and we shouldn't have to build APIs. That's the world we see us gradually moving towards.

When you express data in these globally unique ways, you can combine different data sources. So yeah, you get away from the need to even have all your data in one place to query it. You can actually express a query that joins multiple data sources together as though they were one. There are data sets like Wikidata that already exist and adhere to all these standards. So you can dynamically combine them all.
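The joining idea can be sketched in a few lines of Python. The two "sources" below are plain in-memory sets of triples with made-up values and hypothetical `example.org` identifiers. A real deployment would more likely use a federated query language such as SPARQL against live endpoints. What the sketch shows is that when both sources name the same thing with the same globally unique identifier, a join needs no translation step.

```python
NAME = "https://schema.org/name"
POPULATION = "https://example.org/vocab/population"  # hypothetical predicate

# Source A knows names; source B knows populations (illustrative values only).
SOURCE_A = {
    ("https://example.org/city/paris", NAME, "Paris"),
    ("https://example.org/city/oslo", NAME, "Oslo"),
}
SOURCE_B = {
    ("https://example.org/city/paris", POPULATION, 2_100_000),
}


def match(sources, predicate):
    """Yield (subject, object) pairs for a predicate across every source."""
    for source in sources:
        for subject, pred, obj in source:
            if pred == predicate:
                yield subject, obj


def join(sources, pred_left, pred_right):
    """Join two predicates on their shared subject identifier."""
    right = dict(match(sources, pred_right))
    return sorted(
        (left_value, right[subject])
        for subject, left_value in match(sources, pred_left)
        if subject in right
    )


print(join([SOURCE_A, SOURCE_B], NAME, POPULATION))
```

The query never asks which source holds which fact. The shared identifier for the city is what lets the two sets behave as though they were one.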

VentureBeat: And there can and will be marketplaces?

Platz: Fluree is doing work with the Department of Education and some universities on this right now. They set these semantic standards. Of course, we've seen Google and Microsoft pushing these for people to embed semantic data into web pages so they can get better search results typically in the form of schema.org. Once you have that ability, you can create your own sort of schema or ontology or language. Fluree will just go right in and start transacting schema.org data. Anyone who understands that vocabulary can look at your data and integrate with it. No one needs to translate it into different views or anything like that.

VentureBeat: It’s not just the format, right? You’ve started to integrate this with the ledger and the other cryptographic proofs.

Platz: Some of the stuff we're doing works with verifiable credentials. It really takes this idea of these decentralized data nodes to an extreme. To the point where some of the decentralized data in your database is sitting on somebody's phone, in the form of a credential. This is the work we're doing with the Department of Education. They want cryptographically verifiable credentials for transcripts and degrees.

VentureBeat: And there must be other applications for credentials.

Platz: Some of the states are now starting to issue digital driver's licenses. In fact, Apple at its Worldwide Developers Conference started to introduce support for representing things like driver's licenses in its Wallet app. So we're going to start having people with a great deal of control over their own data, which might sit in an app like a credit card would, while being able to selectively disclose it. If I'm applying for a job, or applying for a mortgage, or just want to share it with somebody, it's all interconnected.
