Where in the software stack should we encrypt data?
My answer is to do encryption as low in the storage stack as it works well.
In particular, encrypted data does not compress well. Compression works by looking for and removing redundancy from the data. Effective encryption hides patterns and redundancies in the data, rendering compression ineffective. However, compressed data can still be effectively encrypted.
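You can see this by running the two orders of operations on highly redundant data. The Python sketch below rests on a few assumptions. `keystream_xor` is a hypothetical toy cipher that XORs data with an HMAC-SHA256 counter-mode keystream, standing in for a real cipher such as AES. The repeated log line is made-up sample data. Do not use the toy cipher to protect real data.

```python
import hashlib
import hmac
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption): XOR with an HMAC-SHA256 counter-mode keystream.

    Its output looks random, like real ciphertext. Not for real use.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Highly redundant sample contents: the same log line repeated.
plaintext = b"2024-01-01 INFO request served in 12 ms\n" * 2000
key = os.urandom(32)
nonce_a, nonce_b = os.urandom(16), os.urandom(16)  # never reuse a key/nonce pair

# Encrypt first, then compress: the compressor finds no redundancy left.
encrypt_then_compress = zlib.compress(keystream_xor(key, nonce_a, plaintext))

# Compress first, then encrypt: redundancy is removed before it is hidden.
compress_then_encrypt = keystream_xor(key, nonce_b, zlib.compress(plaintext))

print("plaintext bytes:             ", len(plaintext))
print("encrypt-then-compress bytes: ", len(encrypt_then_compress))
print("compress-then-encrypt bytes: ", len(compress_then_encrypt))
```

When you encrypt first, the compressor has essentially nothing to remove, so the output stays at roughly the plaintext size. When you compress first, the data shrinks to a small fraction of its size before encryption, and the XOR keystream does not change its length.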
So if the physical layer can do encryption, that is usually ideal. Self-encrypting drives (SEDs) generate a Disk Encryption Key (DEK) when they are first used, and circuitry on the drive itself encrypts all data with the DEK. Upper layers never have access to the DEK, but they can request that the SED generate a new one. In addition, the SED may have a second key, the Authentication Key (AK), which is used to encrypt the DEK. The drive cannot access its data until the DEK is unlocked with the correct AK. This provides strong security. The disk cannot be read without the proper AK, and it can be securely erased by having it generate a new DEK, which leaves the existing ciphertext unreadable. This solves the problem of how to dispose of drives safely. Hard drives can be degaussed, but it is very difficult to reliably remove all data from solid state drives by overwriting it. Wear leveling leaves stale copies in flash blocks that the host cannot address.
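The DEK/AK relationship can be modeled in a few lines. This is a hypothetical sketch: `ToySED` and its methods are invented for illustration and are not a real drive interface such as TCG Opal commands. In the model, the AK is stretched with PBKDF2 and used to wrap the DEK. Sector data is encrypted under the DEK, and `crypto_erase` simply generates a new DEK. Real SEDs use hardware AES, often in XTS mode. `keystream_xor` here is a toy stand-in and, unlike XTS, it reuses the same keystream whenever a sector is rewritten.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption): XOR with an HMAC-SHA256 counter-mode keystream.

    Stands in for the drive's hardware AES engine. Not for real use.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToySED:
    """Hypothetical model of a self-encrypting drive; not a real drive API."""

    def __init__(self, ak: bytes):
        self._sectors = {}  # LBA -> ciphertext, i.e. what is on the media
        self._rekey(ak)

    def _derive_kek(self, ak: bytes) -> bytes:
        # Stretch the AK into a key-encryption key used to wrap the DEK.
        return hashlib.pbkdf2_hmac("sha256", ak, self._salt, 100_000)

    def _rekey(self, ak: bytes) -> None:
        dek = os.urandom(32)  # generated inside the drive, never exported
        self._salt = os.urandom(16)
        self._wrapped_dek = keystream_xor(self._derive_kek(ak), b"wrap", dek)
        self._dek_check = hmac.new(dek, b"check", hashlib.sha256).digest()
        self._dek = dek  # unwrapped copy exists only while unlocked

    def lock(self) -> None:
        self._dek = None  # e.g. at power-off

    def unlock(self, ak: bytes) -> bool:
        dek = keystream_xor(self._derive_kek(ak), b"wrap", self._wrapped_dek)
        expected = hmac.new(dek, b"check", hashlib.sha256).digest()
        if not hmac.compare_digest(expected, self._dek_check):
            return False  # wrong AK: the DEK stays locked
        self._dek = dek
        return True

    def _require_unlocked(self) -> bytes:
        if self._dek is None:
            raise PermissionError("drive is locked")
        return self._dek

    def write(self, lba: int, data: bytes) -> None:
        dek = self._require_unlocked()
        self._sectors[lba] = keystream_xor(dek, lba.to_bytes(8, "big"), data)

    def read(self, lba: int) -> bytes:
        dek = self._require_unlocked()
        return keystream_xor(dek, lba.to_bytes(8, "big"), self._sectors[lba])

    def crypto_erase(self, ak: bytes) -> None:
        # Replacing the DEK makes every sector already on the media unreadable.
        self._rekey(ak)
```

In this model, a locked drive refuses reads, and a wrong AK leaves it locked. After `crypto_erase`, the old sectors still occupy the media but decrypt only to noise, which is why a key change counts as a secure erase.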
Similarly, the block storage system can do the encryption. It might manage the authentication keys of underlying SEDs, but more likely it will do its own encryption, which allows it to work with non-SED devices. If the block storage system also does compression, it must compress before encrypting.
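A block layer that both compresses and encrypts might structure its write path as follows. This is a hypothetical sketch. `CompressingEncryptingVolume`, its 4 KiB block size, and the per-block metadata tuple are illustrative assumptions. `keystream_xor` is again a toy stand-in for a real cipher. Each block is compressed, kept uncompressed if compression does not help, and only then encrypted with a fresh nonce.

```python
import hashlib
import hmac
import os
import zlib

BLOCK_SIZE = 4096  # assumed logical block size


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption): XOR with an HMAC-SHA256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class CompressingEncryptingVolume:
    """Hypothetical block layer: compress each block, then encrypt it."""

    def __init__(self, key: bytes):
        self._key = key
        self._media = {}  # LBA -> (nonce, compressed_flag, ciphertext)

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        packed = zlib.compress(data)
        compressed = len(packed) < len(data)
        payload = packed if compressed else data  # skip compression if it does not help
        nonce = os.urandom(16)  # fresh nonce per write avoids keystream reuse
        self._media[lba] = (nonce, compressed, keystream_xor(self._key, nonce, payload))

    def read_block(self, lba: int) -> bytes:
        nonce, compressed, ciphertext = self._media[lba]
        payload = keystream_xor(self._key, nonce, ciphertext)
        return zlib.decompress(payload) if compressed else payload

    def bytes_on_media(self) -> int:
        return sum(len(ct) for _, _, ct in self._media.values())
```

Compression has to happen before encryption in this design. Once a block is encrypted, the compressor sees only random-looking bytes and would fall back to storing them uncompressed.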
File systems may also do encryption. This generally incurs some performance penalty on the system running the file system. Even with CPU instructions that accelerate encryption (such as AES-NI), encryption still consumes some CPU time. If the file system encrypts data, compression in the lower layers will be ineffective, so it should not be used there. Also, some file systems, such as Spectrum Scale, do not encrypt metadata.
However, encrypting in the file system layer has a benefit: all data flowing over the interconnects to the block storage system, and from there to the physical drives, is already encrypted. Moreover, Spectrum Scale’s encryption is controlled by the policy engine. Different files can therefore be encrypted under different Master Encryption Keys (MEKs), and different systems mounting the same Spectrum Scale file system may have access to different sets of MEKs. In this way, encryption can help create a secure multi-tenant solution. Encrypting in the file system layer does not preclude also encrypting in the lower layers of the storage stack.
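The multi-tenant idea can be sketched with envelope encryption. Each file gets its own random key, which is wrapped (encrypted) under the MEK chosen by a policy rule. A node can open a file only if its keyring holds that MEK. This is a hypothetical Python model, not Spectrum Scale's policy language or key-management interface. The rule patterns, MEK names, `SharedFileSystem`, and `Mount` are all invented for illustration, and `keystream_xor` is a toy cipher.

```python
import fnmatch
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption): XOR with an HMAC-SHA256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Hypothetical policy: the first matching rule picks the MEK for a new file.
POLICY = [
    ("/gpfs/tenantA/*", "mek-tenantA"),
    ("/gpfs/tenantB/*", "mek-tenantB"),
]

# Stand-in for a key server: MEK id -> key material.
MEKS = {"mek-tenantA": os.urandom(32), "mek-tenantB": os.urandom(32)}


def choose_mek(path: str):
    for pattern, mek_id in POLICY:
        if fnmatch.fnmatchcase(path, pattern):
            return mek_id
    return None  # no rule matched: the file is stored unencrypted


class SharedFileSystem:
    """Stands in for one file system that several nodes mount."""

    def __init__(self):
        self.inodes = {}  # path -> (mek_id, nonce, wrapped_file_key, stored_bytes)


class Mount:
    """One node's view of the file system, holding only its own MEKs."""

    def __init__(self, fs: SharedFileSystem, keyring: dict):
        self.fs = fs
        self.keyring = keyring  # MEK id -> key bytes available to this node

    def _mek(self, mek_id: str) -> bytes:
        if mek_id not in self.keyring:
            raise PermissionError(f"this node does not hold {mek_id}")
        return self.keyring[mek_id]

    def create(self, path: str, data: bytes) -> None:
        mek_id = choose_mek(path)
        if mek_id is None:
            self.fs.inodes[path] = (None, b"", b"", data)
            return
        file_key, nonce = os.urandom(32), os.urandom(16)
        wrapped = keystream_xor(self._mek(mek_id), nonce + b"wrap", file_key)
        self.fs.inodes[path] = (mek_id, nonce, wrapped, keystream_xor(file_key, nonce, data))

    def read(self, path: str) -> bytes:
        mek_id, nonce, wrapped, stored = self.fs.inodes[path]
        if mek_id is None:
            return stored
        file_key = keystream_xor(self._mek(mek_id), nonce + b"wrap", wrapped)
        return keystream_xor(file_key, nonce, stored)
```

For example, a node mounted with only `mek-tenantA` in its keyring can read files under `/gpfs/tenantA/`. Reading a tenant B file from that node raises `PermissionError`, even though both files live in the same shared file system.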
Finally, an application could encrypt its own data, in which case it is fully responsible for that data's security. This might be desirable if different portions of a file should be encrypted under different keys. For example, the application itself might provide a multi-user feature, as a database might. In general, however, writing encryption software that is actually secure is difficult to get right.
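As an illustration of application-level encryption with different keys for different portions of the data, the sketch below encrypts each row under a key derived from its owner's passphrase, so the application never stores the keys. It is a hypothetical model (`PerUserTable` is invented here). It also shows why this is hard to do well: even this small example needs per-user salts, a slow key-derivation function, separate encryption and MAC keys, and a constant-time tag comparison. `keystream_xor` remains a toy. A real application should use a vetted authenticated cipher, such as AES-GCM from a maintained library.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption): XOR with an HMAC-SHA256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class PerUserTable:
    """Hypothetical app-level store: each row is encrypted under its owner's key."""

    def __init__(self):
        self._salts = {}  # user -> random salt for that user's key derivation
        self._rows = []   # (owner, nonce, ciphertext, tag); keys are never stored

    def _keys(self, user: str, passphrase: str):
        salt = self._salts.setdefault(user, os.urandom(16))
        root = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)
        # Derive separate keys for encryption and for the integrity tag.
        enc_key = hmac.new(root, b"enc", hashlib.sha256).digest()
        mac_key = hmac.new(root, b"mac", hashlib.sha256).digest()
        return enc_key, mac_key

    def insert(self, user: str, passphrase: str, value: bytes) -> int:
        enc_key, mac_key = self._keys(user, passphrase)
        nonce = os.urandom(16)
        ciphertext = keystream_xor(enc_key, nonce, value)
        tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
        self._rows.append((user, nonce, ciphertext, tag))
        return len(self._rows) - 1

    def select(self, row_id: int, passphrase: str) -> bytes:
        owner, nonce, ciphertext, tag = self._rows[row_id]
        enc_key, mac_key = self._keys(owner, passphrase)
        expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("passphrase does not unlock this row")
        return keystream_xor(enc_key, nonce, ciphertext)
```

Alice's passphrase decrypts only Alice's rows. Presenting it for Bob's row fails the integrity check instead of returning garbage.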