Database encryption protects sensitive information by scrambling the data when it’s stored, or, as it has become popular to say, “is at rest.”
There are several methods to generate and apply secret codes, but the end result is to make the data unusable in case an attacker manages to evade the standard defenses and gain direct access to the raw bits inside.
While the basic motivation remains rendering the data unreadable to those without authorized access, the process of encoding has evolved to support a number of different use cases:
- Complete secrecy: The database and all of its contents are locked up to prevent access.
- Partial secrecy: Some of the columns are scrambled to prevent disclosure, but others are left open. All regular operations on the open columns or fields work quickly without impediment, and only the queries accessing the scrambled columns are limited.
- Audit trails: Digital signatures or hash functions can be used to track changes and connect them to the users who authorized them.
- Client-side secrecy: The data is scrambled on the user’s computer before it is given to the database for storage. Often neither the database nor any other code running on the server can get access to the information.
- Homomorphic secrecy: Sophisticated mathematical transformations make it possible to analyze the data without unscrambling it.
- Hardware-level secrecy: Some applications rely on encryption built into underlying hardware like the disk drives.
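To make the audit-trail use case in the list above more concrete, here is a minimal sketch in Python of a hash-chained change log. It is an illustration under assumptions, not how any particular database implements auditing: the function names, the record layout, and the use of HMAC-SHA256 with a shared secret key are all hypothetical choices made for this example. Each entry's digest covers the previous entry's digest, so altering an earlier change invalidates everything after it.

```python
import hashlib
import hmac
import json


def entry_digest(prev_digest: str, user: str, change: str, key: bytes) -> str:
    """HMAC over the previous digest plus this change, chaining entries in order."""
    payload = json.dumps(
        {"prev": prev_digest, "user": user, "change": change},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, user: str, change: str, key: bytes) -> None:
    """Add a change to the log, linking it to the digest of the last entry."""
    prev = log[-1]["digest"] if log else ""
    log.append({
        "user": user,
        "change": change,
        "digest": entry_digest(prev, user, change, key),
    })


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every digest; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in log:
        expected = entry_digest(prev, entry["user"], entry["change"], key)
        if not hmac.compare_digest(expected, entry["digest"]):
            return False
        prev = entry["digest"]
    return True
```

A caller would append one entry per authorized change and run `verify_log` during an audit. A production system would more likely use per-user digital signatures than a single shared key, so that each change can be attributed to the person who signed it.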
The encryption process is a close cousin to the mathematical assurance that forms the foundation of ledger or blockchain databases. The digital signature algorithms used to authorize and guarantee changes to the ledgers are often developed and supported by the same libraries. While blockchain databases do not necessarily offer privacy (indeed, on public chains all transactions are visible), they are often categorized similarly.
How are the legacy players approaching it?
Oracle has been shipping tools to enable database encryption for decades, with a feature it calls “Transparent Data Encryption” (TDE) that’s designed to minimize the difficulty of use. Database administrators can protect entire databases, particular tables, or just individual columns. The keys are stored separately, for instance in Oracle Key Vault, and they’re managed so that authorized database users never have to enter them. This is because, the documentation explains, the data is “transparently decrypted for database users and applications.” This automated encryption is a good defense against stolen storage media or attackers who manage to gain access to the raw data stored on disks (that is, at rest).
Microsoft’s SQL Server also supports automatically encrypting data before it’s stored to a hard disk drive (HDD) or solid-state drive (SSD), a feature it also calls “Transparent Data Encryption.” Versions running locally or in the Azure cloud can turn it on. SQL Server also has a separate layer designed to ensure that all connections to the database from other servers are encrypted.
Many companies also rely on encryption added by the file system or by the hardware of the disk drive itself. Operating systems like macOS, Linux, and Windows can encrypt all files as they’re stored, which also covers the indices and data columns written by the database software. Adding encryption to the file system affects the overall load on the server by increasing the time it takes to record the data.
Some drives can now handle the encryption using special chips added to the disk drive. Some are designed to be easily removable, so they might be locked up in a physical safe or moved to a different location for backup.
What are the upstarts doing?
Many popular open source databases like MySQL or PostgreSQL include cryptographic functions that simplify adding encryption. Most of them rely on established cryptographic libraries instead of trying to create their own. PostgreSQL’s pgcrypto module, for instance, offers encryption functions that can be applied within SQL queries, and its crypt() function is often used to hash passwords before they’re stored.
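The idea behind hashing passwords before storage, as crypt() is used for above, can be sketched in a few lines of Python. This is an analogy rather than pgcrypto's implementation: it uses PBKDF2 from Python's standard library as a stand-in algorithm, and the function names and the iteration count are assumptions chosen for illustration. The database would store only the salt and the digest, never the password itself.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because each password gets its own random salt, two users with the same password end up with different stored digests, which makes precomputed lookup tables far less useful to an attacker who steals the table.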
MongoDB added encryption at rest to its Enterprise edition. The default relies on AES with 256-bit keys. In December 2019, MongoDB added field-level encryption across all its offerings to secure selected parts of the data stored in the database.
IBM isn’t an upstart in the industry, but it is one of the leaders exploring some of the more sophisticated algorithms for homomorphic encryption. The company has released a toolkit for adding fully homomorphic encryption to iOS and macOS. Microsoft Research is also sharing SEAL, a homomorphic encryption library that supports basic arithmetic. It’s released under the MIT license and is built for linking with .NET and C++ code.
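To show what "arithmetic on scrambled data" means in practice, the following Python sketch implements a toy version of the Paillier cryptosystem, a classic scheme that supports addition on ciphertexts. It is not the algorithm used by IBM's toolkit or by SEAL, which rely on different and more capable schemes, and the tiny hard-coded primes and function names are assumptions made purely for readability. Real keys use primes that are hundreds of digits long.

```python
import math
import secrets

# Toy key material: two small, well-known primes (assumed for illustration).
P, Q = 7919, 104729
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ
```

A server holding only `encrypt(20)` and `encrypt(22)` could call `add_encrypted` on them and return the result; only the key holder can decrypt it and see the sum, as long as the sum stays below `N`.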
What about governance?
The central challenge in managing encryption is keeping all of the keys safe and secure. Access to the data is controlled by the keys, and they should be kept separate from the data when the database is not being used. Extra care must also be taken with the backups, because a lost key can mean that an entire database is rendered unreadable.
Cloud companies are supporting key management by setting up separate services that isolate the keys from the regular computation. Microsoft Azure calls its service Key Vault, and it keeps the keys in hardware security modules (HSMs), which store them with an extra layer of encryption. IBM calls its service “Key Protect,” and it also uses HSMs to protect the local keys the database uses.
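One common way to keep keys separate from data is to encrypt each record with its own data key and then have the key service encrypt, or "wrap," that data key with a master key that never leaves the service. The Python sketch below illustrates the pattern under loud assumptions: `ToyKeyVault` is a hypothetical in-memory stand-in for a real key service, and `toy_cipher` is a simple XOR keystream built from SHAKE-256 that stands in for a real cipher such as AES. Neither should be used to protect real data.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream. Illustrative stand-in for AES only."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyVault:
    """Hypothetical key service: the master key is never exposed to callers."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)

    def wrap(self, data_key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        return nonce + toy_cipher(self._master_key, nonce, data_key)

    def unwrap(self, wrapped: bytes) -> bytes:
        nonce, body = wrapped[:16], wrapped[16:]
        return toy_cipher(self._master_key, nonce, body)


def encrypt_record(vault: ToyKeyVault, plaintext: bytes) -> dict:
    """Encrypt with a fresh data key; store only the wrapped form of that key."""
    data_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    return {
        "wrapped_key": vault.wrap(data_key),
        "nonce": nonce,
        "ciphertext": toy_cipher(data_key, nonce, plaintext),
    }


def decrypt_record(vault: ToyKeyVault, record: dict) -> bytes:
    """Ask the vault to unwrap the data key, then decrypt the record."""
    data_key = vault.unwrap(record["wrapped_key"])
    return toy_cipher(data_key, record["nonce"], record["ciphertext"])
```

The stored record contains nothing usable without the vault, which is also why losing the master key, or the backup of it, leaves every record unreadable.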
Is there anything an encrypted database can’t do?
Adding encryption requires a significant amount of computation, and this increases the cost of storing and retrieving the information. In some cases, the CPUs are idle, and the extra cost is negligible. Many desktops and cell phones, for instance, rarely use more than a small fraction of their available CPU cycles. If these devices encrypt the data before sending it to the database, they bear the computational burden, which may be negligible for them, and the central database is spared the extra load.
But in other cases, adding the encryption can require stronger database servers and larger clusters to handle the load. Much depends on how the encryption is applied and how the data will be used afterwards. Bulk encryption is built into some hard disks and operating systems, and it’s possible to turn on these features without significantly slowing down the hardware.
The most sophisticated algorithms, like homomorphic encryption, require a significantly larger computational infrastructure. The field remains an area of very active research, and new algorithms can be several orders of magnitude faster than their predecessors, but the performance is still not practical for many applications.
This article is part of a series on enterprise database technology trends.