Storage operations, such as compression, tiering, and replication, may happen throughout the “storage stack”, and in some cases the same general operation could happen in more than one layer of the stack. When there is a choice, choosing correctly can make a big difference in performance and scalability. But before I talk about these different operations and choosing where to do them, I need to present the main layers of the storage stack.
At the “bottom” of the storage stack is the physical media. This might be traditional hard disk drives (HDDs), solid-state drives (SSDs), or newer technologies such as NVMe-attached flash or storage-class memory (SCM). Some properties of the storage system will be determined by the choice of storage media. We will usually call the individual components of the physical media “drives”.
Next up is the “block storage” layer. Physical storage media already present raw block storage, but most modern storage systems layer some higher-level block storage abstraction over the physical media. The simplest is to partition a drive, effectively getting several block devices from a single physical storage device (perhaps a single HDD). Another common operation is to use some form of RAID to combine several physical drives into what appears to be a single drive, with better resilience or performance than the component drives. RAID volumes can also be partitioned, and block storage systems can perform advanced storage virtualization operations such as tiering, compression, replication, and encryption. We will usually call the components of block storage “volumes” or “LUNs”.
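To make the resilience claim concrete, here is a minimal Python sketch of XOR parity, the idea behind RAID 5-style layouts. The three in-memory “drives” and the helper name `xor_blocks` are illustrative assumptions, not any real RAID implementation; a real array also handles striping, parity placement, and rebuilds across many blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "drives", each holding one 8-byte block.
data_drives = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]

# The parity block would be stored on a fourth drive.
parity = xor_blocks(data_drives)

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_drives[1])
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks and the parity, which is why the combined volume can survive the loss of one component drive.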
The next layer up is the file system. This might be a simple unshared file system such as XFS or NTFS, formatted onto a single volume. Or it might be a shared clustered file system such as GFS2, also formatted onto a single volume. Parallel file systems, such as Spectrum Scale, can stripe a shared clustered file system over many volumes. File systems can frequently also perform advanced storage operations, such as snapshots, compression, encryption, and maybe even tiering. File systems are presented to applications as trees of directories and files.
At the top of the stack is the application itself. Usually we consider the application the consumer of the storage – but applications may themselves perform operations on data that could have been done by one of the lower layers. For instance, an application might do its own data compression or encryption. Database systems are often capable of doing replication themselves.
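As a small illustration of an operation done at the application layer, the following Python sketch compresses data with the standard `zlib` module before handing it to the file system. The payload and the file name `records.z` are made up for the example; this is a sketch of the idea, not a recommendation for any particular application.

```python
import os
import tempfile
import zlib

# Hypothetical application payload: repetitive text compresses well.
payload = b"sensor=42;status=ok\n" * 1000

# The application compresses before handing bytes to the file system.
compressed = zlib.compress(payload, level=6)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.z")
    with open(path, "wb") as f:
        f.write(compressed)
    on_disk = os.path.getsize(path)

    # Reading the data back requires the application to decompress it.
    with open(path, "rb") as f:
        restored = zlib.decompress(f.read())

print(f"logical bytes: {len(payload)}, bytes written: {on_disk}")
```

The file system and block layers below only ever see the already-compressed bytes, so enabling compression again at one of those layers would likely gain little for this data.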