Imagine a functional, live, operational database living on separate servers scattered around the globe in, say, San Francisco, London, Dubai, Moscow, and Johannesburg, simultaneously. Not synced, replicated, or cloned, but a single database without a single location.
Now tie up your jaw.
Actually, no imagination is required, because today TransLattice announced the world’s first geographically distributed relational database. The company calls it TED, or the TransLattice Elastic Database.
“For the last 15 years, people have been trying to solve the distributed database problem,” TransLattice vice president Louise Funke told VentureBeat. “This is unique … it’s truly one database with data in multiple locations.”
It’s the perfect solution for companies that absolutely must remain live and functional — no matter what happens to electrical power generation in northern Virginia. More on that later.
The Santa Clara company’s goal was to create a database that can live in multiple locations simultaneously but does not require any one of them.
It’s the ultimate hydra, with many heads. Or, if you prefer, it is a database modeled after the internet itself, designed for ultimate survivability with multiple nodes, self-healing capacity, and no single vulnerable center.
How does Elastic Database work?
TransLattice, founded in 2007, released its first product only last summer; the first years were spent solely on working out the math and building the technology. TED is built on a foundation of PostgreSQL, but as you would expect, the company has made major modifications to how the system maintains consistency.
“At a high level, we’re breaking up the data — sharding it — and breaking it up according to three things,” Funke told me last week.
The first is policy — in some cases an enterprise cannot permit data to leave a particular country or region. The second is history — how often a particular set of data is accessed, and where it is accessed from. And the third is randomness — the data is never just confined to a single disk or node.
Data in TED is always on multiple nodes for both reliability and robustness as well as accessibility. But it is not on all nodes.
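To make those three criteria concrete, here is a minimal sketch of how a placement decision along these lines could look. It is an illustration only, not TransLattice's actual algorithm: the function name `place_replicas`, the `copies` parameter, and the node names are all assumptions made for the example.

```python
import random
from collections import Counter


def place_replicas(allowed_nodes, access_log, copies=3, rng=None):
    """Pick `copies` nodes to hold one shard (hypothetical sketch).

    allowed_nodes: nodes that policy permits (e.g. data must stay in-region)
    access_log:    node names from which the shard was recently accessed
    """
    rng = rng or random.Random()
    allowed = list(allowed_nodes)
    if len(allowed) <= copies:
        return sorted(allowed)

    # Policy: accesses from nodes outside the allowed set are ignored.
    counts = Counter(n for n in access_log if n in allowed)

    # History: reserve all but one slot for the busiest allowed nodes.
    chosen = [n for n, _ in counts.most_common(copies - 1)]

    # Randomness: fill whatever slots remain with random allowed nodes.
    rest = [n for n in allowed if n not in chosen]
    chosen += rng.sample(rest, copies - len(chosen))
    return sorted(chosen)


allowed = ["london", "dubai", "moscow", "johannesburg"]
log = ["london"] * 5 + ["dubai"] * 2 + ["san_francisco"] * 9
placement = place_replicas(allowed, log, copies=3, rng=random.Random(0))
```

In this example San Francisco generates the most traffic, but policy keeps the shard off that node; London and Dubai are picked on access history, and the third copy lands on a randomly chosen allowed node.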
At its core, the Elastic Database relies on a set of algorithms called Global Consensus Protocol, which commits thousands of transactions simultaneously and ensures that data is consistent and accurate across the entire set of locations (currently up to 12 nodes, and soon to be more).
“People have said you have to give up consistency in order to scale geographically, moving to solutions such as noSQL,” says Funke. “But we don’t give up consistency.”
In fact, Elastic Database is fully ACID-compliant, just like a traditional relational database management system. (ACID describes the properties of a reliable database: atomicity, consistency, isolation, and durability.)
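TransLattice has not published the details of its consensus protocol here, so the following is only a generic illustration of the idea behind all-or-nothing writes across replicas: a write is applied only if a strict majority of the replicas acknowledge it, and otherwise nothing changes anywhere. The `Replica` class and `write` function are hypothetical names, and real systems handle many failure cases this toy version ignores.

```python
class Replica:
    """Toy stand-in for one node holding a copy of the data (hypothetical)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def prepare(self, key, value):
        # A node that is down cannot acknowledge the proposed write.
        return self.up

    def commit(self, key, value):
        if self.up:
            self.data[key] = value


def write(replicas, key, value):
    """Apply the write only if a strict majority of replicas acknowledge it."""
    acks = [r for r in replicas if r.prepare(key, value)]
    if len(acks) * 2 <= len(replicas):
        return False  # no majority: the write is applied nowhere
    for r in acks:
        r.commit(key, value)
    return True


healthy = [Replica("london"), Replica("dubai"), Replica("moscow", up=False)]
ok = write(healthy, "balance", 100)

degraded = [Replica("london"), Replica("dubai", up=False), Replica("moscow", up=False)]
rejected = write(degraded, "balance", 100)
```

With one of three replicas down, the write still succeeds on the two that are up. With two down, it is refused outright rather than applied to a minority, which is the property that keeps copies from drifting apart.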
But what about latency?
The immediate concern with a geographically diverse database is, of course, latency: How big can the database get, how active can it be, and how many nodes can you have before the communication between nodes that is necessary to ensure consistency overwhelms the communication between application and data that the database is intended to serve?
In other words, when will internal communication destroy usefulness?
“No single node has a full copy of the data,” Funke says, “but we maintain copies of the data between the nodes.”
That helps on the one hand: Pre-positioning data when and where it’s most likely to be needed speeds database response. And it hurts on the other: When distributed data is updated in one location, it must be updated in all locations.
TransLattice says it has solved those issues.
“Our system is very unique — we monitor communications between nodes to understand bandwidth and usage,” Funke told me, “so that when we make data placement decisions, we place it where it’s being used. We can place it for optimal performance.”
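As a rough illustration of what placing data "where it's being used" can mean, the sketch below scores each candidate node by the total round-trip time its users would pay and picks the cheapest. The function name `best_node`, the request counts, and the latency figures are all invented for the example; the company's real placement logic is not described in this level of detail.

```python
def best_node(requests_by_site, latency_ms):
    """Return the node that minimizes total request latency (hypothetical sketch).

    requests_by_site: {site: number of requests coming from that site}
    latency_ms:       {node: {site: measured round trip in milliseconds}}
    """
    def cost(node):
        return sum(
            count * latency_ms[node][site]
            for site, count in requests_by_site.items()
        )

    return min(latency_ms, key=cost)


# Made-up traffic and latency numbers, for illustration only.
requests = {"london": 900, "dubai": 100}
latency = {
    "london": {"london": 5, "dubai": 120},
    "dubai": {"london": 120, "dubai": 5},
    "san_francisco": {"london": 140, "dubai": 230},
}
choice = best_node(requests, latency)
```

Because most of the requests in this example come from London, the London node wins even though Dubai users pay a longer round trip.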
Actual multi-continent installs … today
The announcement today is not just a product in principle. TransLattice has real revenues and real customers using the technology today, Funke says. Getting names and contact information is a challenge, however, because the types of customers using this are not very forthcoming — about anything.
“We have a lot of public-sector clients in the defense and intelligence industries,” Funke said.
The scenario she outlined for me is classic: Imagine you are the U.S. military. You need certain datasets as close to the battlefields as possible. So you plunk down a server in Afghanistan.
It needs to move from time to time, and maybe it gets blown up occasionally. But that doesn’t matter, because you can reconstitute local nodes easily.
New nodes from bare metal in 20 minutes
If that server in Afghanistan does, in fact, get blown up, local techs can resume operations incredibly quickly.
“Customers can very simply add new nodes to their clusters,” Funke says. “If one fails or you suddenly need to set up a temporary operation, you simply can take a node and plug it into your network.”
The other nodes will recognize it immediately, and within minutes, TED is populating it with data and users are accessing information from all across the database.
“You can get a new node running in 20 minutes from bare metal, which is unheard of,” Funke told me.
Cloud yes, high costs no
Given that TransLattice’s database seems to run much like a geographically distributed cloud, I wanted to know whether it can be run on other companies’ clouds.
TransLattice says it can.
TED can be run on virtual machines, it can be run on Amazon EC2, or it can be run on a TransLattice appliance. Or it can be run on all of them simultaneously.
“You can use whatever mix of hardware you want,” Funke says. “The nodes do need to be relatively similar in size and I/O capacity and compute capacity.”
And the costs, according to the company, are low — as much as 60-80 percent less than a comparable IBM DB2 solution or an Oracle RAC solution. Except of course that those do not come with the ability to run your database on both sides of the Atlantic at the same time.
In addition to military and security customers, TransLattice is seeing considerable uptake in the financial sector. Businesses are more geographically diverse today, Funke told me, and many are dealing with regulatory environments that do not allow the transfer of private user data out of country.
TransLattice handles that with policies governing where data can live.
In addition, however, many financial organizations have datasets that are too large to replicate, making it harder for the organization to give all parts of the company access. Instead they silo the data, limiting the company’s ability to effectively use the data for business intelligence and strategy.
Elastic Database, according to TransLattice, solves that problem as well.
The question will be, now that TransLattice is publicly available: Do the advantages of regionalization outweigh any of the advantages of traditional databases from Oracle, IBM, and others?
Image credit: Toria/ShutterStock