This is not a rhetorical question: How do you move 2 petabytes of data to a cloud that can analyze it? Moving it over the internet isn’t practical: Even with an obscenely fast 10 Gbps connection, it would take 19 days. Over a more common 45 Mbps T3 connection, it’s effectively impossible, because it would take 12 years, according to Microsoft’s calculations.
This is the problem the company faced when it was developing some of its intelligent edge offerings. Its solution is the Azure Data Box family, a series of appliances that offer high storage capacities. Most of the Data Boxes require customers to physically ship them to Microsoft so the data can be moved to Azure, which in a sense flips the paradigm of edge computing.
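As a rough check on those numbers, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not Microsoft's method: it assumes decimal units (1PB = 10^15 bytes) and a link that runs at its full rated speed with no protocol overhead, and the function name `transfer_seconds` is purely illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def transfer_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by the link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return num_bytes * 8 / link_bits_per_second


# Assumption: decimal petabytes (10**15 bytes), full link utilization.
two_petabytes = 2 * 10**15

fast_days = transfer_seconds(two_petabytes, 10 * 10**9) / SECONDS_PER_DAY
t3_years = (
    transfer_seconds(two_petabytes, 45 * 10**6) / SECONDS_PER_DAY / DAYS_PER_YEAR
)

print(f"10 Gbps: {fast_days:.1f} days")   # 10 Gbps: 18.5 days
print(f"45 Mbps: {t3_years:.1f} years")  # 45 Mbps: 11.3 years
```

The idealized results land slightly under the 19 days and 12 years quoted above, which is what you would expect if the quoted figures allow for real-world overhead.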
Part of the idea of edge computing is, as Microsoft head of Azure IoT Sam George told VentureBeat in an interview at Build 2019, “It’s often easier to move compute to where data is than to move data where compute is.” With most of the Data Boxes, though, it’s just not feasible to do so — and thus this model is moving data to compute, not the other way around.
The amount of data a company generates varies significantly. In a conversation with VentureBeat at Microsoft Build, Kundana Palagiri, principal product manager for Microsoft Azure, said that over the course of 15 days, a company working with autonomous technologies could potentially generate up to 2PB of data. But the sweet spot for most operations is probably close to 10TB, 35TB, or 100TB over that same time period.
Some companies need to have a Data Box onsite at all times and thus would need to have a rotation of Data Boxes constantly trucking between Microsoft and the company premises. In other cases, a company will ship off their Data Box to Microsoft and be without one for a time. “Sometimes [a company] will be without it,” Palagiri said, in the case of things like “archiving, testing — various scenarios.”
In other cases, the data is processed and transmitted on the edge itself. The Data Box family covers both scenarios.
From the Build 2019 stage, Microsoft showcased how grocery chain Kroger is piloting the use of Microsoft Azure services and Data Boxes in two of its stores. Announced initially in January, the program relies in part on Azure-powered video analytics and AI to track inventory based on what’s physically on store shelves. Kroger points video camera arrays at its shelves and uses machine learning to track in real time how many units of a given product are left. It’s a tremendous amount of data, with intelligence layered on top.
The Data Box family
There are four members of the Azure Data Box family, each with unique traits designed to suit different business needs.
The most endearing, perhaps, is the Data Box Heavy. Its name is precisely descriptive: It’s a box the size of a small refrigerator, and although it’s unclear how much it weighs, it is on wheels, which tells you all you need to know about that. Inside is a bunch of hard disks totaling 1PB of capacity. For context, that’s 1,000 terabytes (TB), or 1 million gigabytes (GB). It has 4 x 40 GbE QSFP+ networking capabilities, which is extraordinarily speedy.
The Azure Data Box is physically smaller than its big brother Heavy, and although it offers a tenth of the storage capacity, that works out to a still quite robust 100TB. The Data Box is also designed such that the customer fills up the storage capacity and physically sends the box back to Microsoft for uploading and return. Though not as extreme as the Data Box Heavy, the networking prowess of the Data Box is significant, at 2 x 10 GbE SFP+.
Both of the big boxes offer 256-bit AES encryption, along with SMB/CIFS and NFS connectivity.
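To put those networking figures in perspective, here is a rough sketch of how long it would take to fill each of the two big boxes if every port ran at full line rate. That is an optimistic assumption, since real copies are limited by source disks, protocols, and file sizes, and the helper name `fill_hours` is made up for illustration.

```python
def fill_hours(capacity_tb: float, ports: int, gbps_per_port: float) -> float:
    """Idealized hours to fill a device using all ports at line rate."""
    total_bits = capacity_tb * 10**12 * 8        # decimal terabytes to bits
    bits_per_second = ports * gbps_per_port * 10**9
    return total_bits / bits_per_second / 3600


# Capacities and port counts as quoted in the article.
heavy_hours = fill_hours(1000, ports=4, gbps_per_port=40)  # Data Box Heavy
box_hours = fill_hours(100, ports=2, gbps_per_port=10)     # Data Box

print(f"Data Box Heavy: {heavy_hours:.1f} hours")  # Data Box Heavy: 13.9 hours
print(f"Data Box: {box_hours:.1f} hours")          # Data Box: 11.1 hours
```

Even under these best-case assumptions, loading either box is a matter of hours rather than minutes, so copy time is worth factoring into any shipping schedule.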
For a little more flexibility, customers can opt for the Azure Data Box Disk, which is actually five disks. They’re white-label drives bearing Microsoft branding. Each promises 8TB of capacity, so the total capacity per order is 40TB. Unlike the other two Data Boxes, these drives have no networking capabilities, so data has to move over a USB 3.1, SATA II, or SATA III connection. Instead of 256-bit AES encryption, they use 128-bit AES.
With all of the above options, you literally mail the boxes back to Microsoft, where the company uploads or transfers the data for you. Microsoft then wipes the data from the storage drives and sends the box back.
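For a sense of scale, the sketch below estimates how many of each ship-it-back option a given data set would require. It uses the nominal capacities quoted above (usable capacity on real devices is typically lower), and the dictionary and function names are illustrative only.

```python
import math

# Nominal capacities in TB, as quoted in the article.
NOMINAL_CAPACITY_TB = {
    "Data Box Heavy": 1000,
    "Data Box": 100,
    "Data Box Disk order": 40,  # five 8TB drives per order
}


def units_needed(data_tb: float, capacity_tb: float) -> int:
    """Smallest whole number of devices whose combined capacity holds the data."""
    return math.ceil(data_tb / capacity_tb)


data_tb = 2000  # the 2PB scenario from the top of the article
for name, capacity in NOMINAL_CAPACITY_TB.items():
    print(f"{name}: {units_needed(data_tb, capacity)}")
# Data Box Heavy: 2
# Data Box: 20
# Data Box Disk order: 50
```

For the smaller 35TB workload mentioned earlier, the same calculation returns one unit for every option, which helps explain why the family spans such a wide range of sizes.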
The fourth member of the family, the Azure Data Box Edge, is a departure from the rest of the Data Boxes. Not only is it designed to transfer data over an internet connection to and from the cloud, it computes on the edge. This is the manifestation of the FPGA-powered sort of appliance that Microsoft announced this week at Build. Data flows from the Data Box Edge to the cloud and back, and the machine performs preprocessing, ML inferencing, and cloud-managed compute. It runs on two 10-core Intel Xeon CPUs paired with an Intel Arria 10 FPGA and has 12TB of onboard storage capacity.
Like the two larger Data Boxes, it has 256-bit AES encryption, as well as SMB/CIFS and NFS connectivity. For networking, Microsoft equipped it with 4 x 25 GbE SFP+ capabilities.
The other three types of Data Box, though, serve as a (partial) physical metaphor for how edge computing works with and without the internet: They collect data on the edge, and that data is sent to Azure, where it is used for training and inferencing in the cloud, and then Microsoft turns that into business insights. The difference is that instead of traveling from the edge to the cloud over a local internet connection, the data first has to move from the edge to a new physical location, and only then to the cloud over that location’s connection. The Data Box Disk adds yet another step, because it has no networking capabilities of its own: Once the drives arrive at a Microsoft facility, the data has to be transferred from drive to device, and then to the cloud over the internet.
These paradigms speak to the breadth of Azure services and edge computing, as well as to the practical challenges companies still face with using them — and the creative ways Microsoft is approaching potential solutions.