This article is part of the Technology Insight series, made possible with funding from Intel.
For the last two years, we’ve been writing about Intel Optane persistent memory, knowing that scores of evaluations and pilot projects are underway in enterprises around the world. But quotable case studies have been rare. That changed recently when Intel publicly offered examples of Optane’s real-world suitability and value.
During a high-profile event, Boeing revealed its production data center deployment of Intel Optane persistent memory via Oracle’s Exadata X8M database servers. Not long after, VentureBeat snagged an exclusive meeting with Maruti Sharma, Boeing’s chief architect for digital common services and an associate technical fellow within the aviation giant. Sharma manages Boeing’s enterprise databases and other data management services. He’s the ideal person to take us inside this project and reveal the hands-on details of an enterprise-scale Optane deployment.
- Boeing’s thousands of Oracle databases were foundering on outdated commodity servers, while data demands kept increasing.
- The company decided to consolidate nearly 100 database and storage servers onto only three racks of Oracle Exadata X8M systems supporting Intel Optane persistent memory.
- The move helped drive performance improvements of 2x to 10x in Boeing’s database operations.
Boeing manufactures on both U.S. coasts, which helps drive the company’s need for cloud-based data management. All told, Boeing runs over 21,000 databases, approximately 5,000 of which are Oracle. Given how tempting it is to deploy databases with dedicated server stacks, it’s easy to imagine sprawl and redundancy accumulating over time. So, as in many enterprises, reducing the data center server footprint is one of Sharma’s top objectives.
Not surprisingly, given the huge number of databases, Boeing had an ocean of information it could barely sip. An example:
“Every commercial flight is spitting out tons of data, most of which Boeing hadn’t even started processing,” Sharma explains. “We need a lot of infrastructure to process that data, so we can mine useful information to grow the business. So if a part is going bad, the plane can relay that information to the airport where the flight is going to land. Engineers and mechanics would be ready with that part. So time-to-service is lower. The mechanics spend less time maintaining that plane. And the airline wants the plane to be in the air as much as possible. Better data management means better value for our customers.”
Whether addressing data from the air, factory, or supply chain, Boeing saw a growing need to handle database processing in real time. The company’s workloads often resembled online transaction processing (OLTP) datasets, similar to those used by financial institutions, and decision support system (DSS) datasets, which are typically read-only. Boeing often handled both types concurrently in mixed-workload scenarios, especially when combining them with other datasets coming in from supply chain partners.
One of Boeing’s database projects involved an operational data store (ODS) running Oracle Real Application Clusters (RAC) on commodity systems. Including dev and test systems and the production environment, the ODS infrastructure consumed roughly 100 servers. Each cluster server ran OLTP and DSS workloads with 1TB of memory and more than 1PB of aggregate storage. As Sharma describes it, cluster performance was “not at par,” and the hardware had reached its end of life.
New tech for new possibilities
Boeing sent out several RFPs and ultimately examined three options: an HPE solution, commodity hardware based on the latest-generation Intel processors, and Oracle’s Exadata Database Machine X8M. One of the latter’s primary advantages, Sharma says, was its incorporation of 1.5TB of Optane persistent memory for each server. Optane’s DRAM-class latency and higher-than-DRAM capacity points made it an ideal fit for Oracle’s OLTP-type workloads, which benefit from very fast access to small chunks of data from cached storage. (In the X8M’s architecture, persistent memory occupies the first of three automatically managed storage tiers, followed by NVMe-attached NAND and finally hard disk storage.)
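To make the tiering idea concrete, here is a minimal sketch of a three-tier read path in Python. It is a toy model, not Oracle's implementation: the `TieredStore` class, its slot counts, and the least-recently-used promotion and eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy three-tier read path: persistent memory, then flash, then disk.

    Hypothetical model for illustration only. The LRU policy and slot
    counts are assumptions, not how Exadata manages its tiers.
    """

    def __init__(self, pmem_slots, flash_slots, disk):
        self.pmem = OrderedDict()   # fastest, smallest tier
        self.flash = OrderedDict()  # middle tier
        self.disk = dict(disk)      # slowest tier, holds every block
        self.pmem_slots = pmem_slots
        self.flash_slots = flash_slots

    @staticmethod
    def _put(tier, slots, key, value):
        """Insert into a tier and return the evicted (key, value), if any."""
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > slots:
            return tier.popitem(last=False)  # least recently used entry
        return None

    def read(self, key):
        """Return (value, tier_name) and promote the block to the top tier."""
        if key in self.pmem:
            self.pmem.move_to_end(key)
            return self.pmem[key], "pmem"
        if key in self.flash:
            value, source = self.flash.pop(key), "flash"
        else:
            value, source = self.disk[key], "disk"
        evicted = self._put(self.pmem, self.pmem_slots, key, value)
        if evicted is not None:
            # Demote the displaced block one tier down; disk still has a copy.
            self._put(self.flash, self.flash_slots, *evicted)
        return value, source
```

In this model, the first read of a block is served from disk and later reads of the same block come from the top tier, which is the behavior that favors OLTP-style access to small, frequently used chunks of data.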
Boeing’s commercial operations gather all of the company’s manufacturing and supply chain data, which is then aggregated into a data store where factory floor transactions are integrated with supply chain data. Boeing uses the results of those integrations to make informed decisions on what inventory is required for each specific plane.
“We need to get data from storage faster and keep it closer to compute,” explains Sharma. “How much needed data can reside in memory? When data is closer to compute, how fast can I process it? With the introduction of persistent memory and 100 gigabit-per-second RDMA over converged Ethernet (RoCE) network fabric — which lets nodes request data directly from PMem rather than going through the entire stack — we saw an opportunity to eliminate most of the latency.”
To be clear, Optane “PMem” does not replace DRAM. Rather, it can serve as high-capacity volatile system memory while bumping DRAM into a high-speed caching role, or it can act as a super-fast, non-volatile (persistent) cache for storage. Boeing mainly employs the latter, via Oracle’s data accelerator functionality.
However, Boeing has a second use in “persisting” (saving into non-volatile storage) Oracle redo logs, a necessary step before a transaction can be committed. Redo logs normally get persisted to conventional SAN storage, which introduces substantial latency. That step previously accounted for a lot of Boeing’s lag, especially since the group’s redo log sizes average around 24GB. Persisting that much data so often adds considerable processing delays.
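The persist-before-commit rule can be sketched in a few lines of Python. This is a simplified illustration, not Oracle's redo mechanism: the `RedoLog` class, the `commit` and `replay` helpers, and the length-prefixed record format are hypothetical, and an ordinary file flushed with `os.fsync` stands in for whatever durable medium holds the log.

```python
import os


class RedoLog:
    """Append-only log whose records are durable before a commit returns."""

    def __init__(self, path):
        self._file = open(path, "ab")

    def append_and_persist(self, record: bytes):
        # Length-prefixed record (hypothetical format for this sketch).
        self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()
        os.fsync(self._file.fileno())  # wait until the medium has the data

    def close(self):
        self._file.close()


def commit(log, table, key, value):
    """Apply a change only after its redo record has been persisted."""
    log.append_and_persist(f"{key}={value}".encode())
    table[key] = value
    return True


def replay(path):
    """Rebuild the table from the log, as recovery after a crash would."""
    table = {}
    with open(path, "rb") as f:
        while header := f.read(4):
            size = int.from_bytes(header, "big")
            key, _, value = f.read(size).decode().partition("=")
            table[key] = value
    return table
```

Every commit waits on `append_and_persist`, so the latency of the medium behind the log sets a floor on commit time. That is why moving the log from a SAN to a faster persistent tier can shorten transactions without any change to the commit logic itself.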
Thanks to Intel’s App Direct Mode for Optane persistent memory, and Oracle’s existing support for App Direct in its platform, Boeing could address both volatile and persistent models simultaneously, says Sharma.
From deployment to results
After months of extensive testing, Boeing deployed its new Exadata servers into production in June 2020. Teams consolidated nearly 100 commodity servers down to only two Exadata racks with eight servers each, plus a pair of half racks divided across two data centers.
“Overall, based on how much value we could get from any of the options, the Exadata with persistent memory stood out most,” says Sharma. “It was integrated with other Oracle internals, like Oracle Linux and GoldenGate, that we use heavily to bring in data from our OLTP environment.”
According to Sharma, Boeing encountered no issues during deployment and did not need to adapt its software to accommodate persistent memory, as Oracle had already done this work with Intel. The only extra labor arose from Boeing’s policy against bringing third-party racks into its data centers. As a result, Oracle had to re-rack its Exadata systems into Boeing’s own racks over the course of several days.
In the months since initial deployment, Boeing reports 2x to 10x productivity gains by switching to the Exadata X8M platform. The biggest database operations improvements over the previous commodity infrastructure came from bolstered redo log performance, Sharma says, adding that other examples abound.
“When we run our batch processing, multiple jobs run overnight. Workloads that consistently used to take 14 hours now take about two hours. This really matters because of various work shifts coordinating across time zones. It becomes very challenging when a shift starts and needs results from another group that hasn’t finished. With jobs finishing faster, our shifts can make better decisions.”
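As a quick check on how the batch example relates to the reported 2x to 10x range, the arithmetic can be written out in Python. The `speedup` helper is a hypothetical name, and the inputs are the rounded figures from the quote, so the result is approximate.

```python
def speedup(before_hours, after_hours):
    """Ratio of old runtime to new runtime."""
    return before_hours / after_hours


# Figures from the quote above: roughly 14 hours before, about 2 hours after.
batch = speedup(14, 2)
print(f"{batch:.0f}x")  # 14 / 2 = 7, prints "7x"
```

A roughly 7x gain sits toward the upper end of the 2x to 10x range Boeing reports.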
Going forward: Doing more with fewer resources
Despite its considerable server consolidation, Boeing says it still has ample capacity left in its new Exadata solution. This opens the door to taking on more workloads from other tasks or groups. Sharma expects container technology to play a role in further consolidation, allowing engineers to cleanly separate, say, manufacturing workloads from the supply chain, engineering, or analytics. Containers could also help with compliance in Boeing’s government operations, he adds.
Beyond consolidation and data isolation, the company says it can now maintain and manage workloads with fewer resources. For example, rather than needing separate administrators to manage different layers of the solution (storage, networking, etc.), one consolidated team can now manage the entire stack. Sharma says this becomes doubly important because Boeing’s Oracle databases run on a mix of Linux, IBM AIX, HP-UX, and other operating systems. Having one standard platform reduces spending on resources and infrastructure. Again, it’s about the efficiencies of consolidation.
“There has always been a race between the different components of the infrastructure stack,” Sharma notes. “These advances in compute and persistent memory allow customers like us to process more data in a timely fashion. Data is exploding, and so is the demand for storing, retrieving, and processing the data set. These innovations will give us more leverage in consolidating workloads and doing more data analytics locally. Especially with container technology onboard, we can bring in petabytes of data for processing in one location and help drive the business.”