The term block storage describes a technology that controls how data is stored on storage devices. The name is derived from the fact that block storage splits any kind of information, such as files or raw disk content, into chunks (or blocks) of equal size.
Block storage is the most versatile type of storage, as it is the underlying structure of other storage options, such as file or object storage. It is also the best-known type of storage, since most typical storage media (HDD, SSD, NVMe, …) are exposed to the system as block storage devices.
How Does Block Storage Work?
Block storage devices are split into a number of independent blocks. Each block has a logical block address (LBA) which uniquely identifies it. Furthermore, the blocks are all the same size for the same block device, and typically only one piece of information can be stored within a single block (as it is the smallest addressable unit).
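To make the addressing concrete, here is a minimal Python sketch. It assumes a block size of 4096 bytes (real devices commonly expose 512 or 4096 byte blocks), and it uses an in-memory buffer as a stand-in for the device. The names BLOCK_SIZE, lba_to_byte_offset, blocks_needed, and read_block are made up for this illustration.

```python
import io

BLOCK_SIZE = 4096  # assumed block size in bytes, chosen for illustration


def lba_to_byte_offset(lba: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the byte offset at which the block with the given LBA starts."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return lba * block_size


def blocks_needed(num_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Return how many whole blocks are required to hold num_bytes of data."""
    return -(-num_bytes // block_size)  # ceiling division


def read_block(device, lba: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read exactly one block from a seekable binary file-like object."""
    device.seek(lba_to_byte_offset(lba, block_size))
    return device.read(block_size)


# An in-memory "device" with two blocks: block 0 holds A's, block 1 holds B's.
device = io.BytesIO(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
second_block = read_block(device, 1)
```

Because every block has the same size, finding a block is plain multiplication. A 10,000 byte file needs three 4096 byte blocks, and the last block is only partially filled.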
When an application writes a file, the first question is whether the file fits into a single block. If it does, the operation is simple: find a free (unused) block and write the file to it.
If the file is larger than a single block, it is split into multiple parts, with each part being written to a separate free block. The order or consecutive positioning of these blocks is not guaranteed.
After the file is written to one or more blocks, the block addresses are recorded in a lookup table. The lookup table is provided by the filesystem that was installed onto the block device, and its layout varies depending on the filesystem in use. If you’ve ever heard the term inode in Linux, that’s part of the lookup mechanism.
When the file is read, its blocks and their read order are looked up in the lookup table based on the filename. The block storage then reads the requested blocks back into memory, where the file is pieced together in the right order.
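The write and read path described above can be sketched with a toy, in-memory block device. This is a simplified illustration, not how any real filesystem is implemented: the class ToyBlockDevice, its methods, and the tiny block size are all hypothetical, and the lookup table is a plain dictionary instead of inodes or extents.

```python
class ToyBlockDevice:
    """In-memory stand-in for a block device plus a minimal lookup table."""

    def __init__(self, num_blocks: int, block_size: int = 8):
        self.block_size = block_size
        self.blocks = [None] * num_blocks  # None marks a free block
        self.lookup = {}  # filename -> (ordered list of LBAs, file length)

    def write_file(self, name: str, data: bytes) -> list:
        """Split data into blocks, store them, and record their addresses."""
        if name in self.lookup:
            raise FileExistsError(name)
        size = self.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        free = [lba for lba, blk in enumerate(self.blocks) if blk is None]
        if len(chunks) > len(free):
            raise OSError("not enough free blocks")
        lbas = free[:len(chunks)]
        for lba, chunk in zip(lbas, chunks):
            # Pad the last chunk so every stored block has the same size.
            self.blocks[lba] = chunk.ljust(size, b"\0")
        self.lookup[name] = (lbas, len(data))
        return lbas

    def read_file(self, name: str) -> bytes:
        """Look up the block addresses and reassemble the file in order."""
        lbas, length = self.lookup[name]
        return b"".join(self.blocks[lba] for lba in lbas)[:length]

    def delete_file(self, name: str) -> None:
        """Remove the lookup entry and mark the file's blocks as free."""
        lbas, _ = self.lookup.pop(name)
        for lba in lbas:
            self.blocks[lba] = None


dev = ToyBlockDevice(num_blocks=4, block_size=4)
dev.write_file("a", b"AAAA")                  # stored in block 0
dev.write_file("b", b"BBBB")                  # stored in block 1
dev.delete_file("a")                          # block 0 becomes free again
placement = dev.write_file("c", b"CCCCCCCCCC")  # lands in blocks 0, 2, 3
```

The last write shows why consecutive positioning is not guaranteed: the file "c" ends up in blocks 0, 2, and 3, and only the lookup table knows how to put it back together.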
Block Storage Use Cases
Due to its characteristics, block storage can be used for any kind of use case. Typical simple use cases include general computer storage, such as virtual hard drives for virtual machines, which are used to store and boot the operating system.
Where block storage really shines, though, is when high performance is required. This includes IO-intensive, latency-sensitive, and mission-critical workloads, such as relational or transactional databases, time-series databases, and container storage. In these cases, the common rule is: the faster, the better.
Database Workloads
A transactional workload is a series of changes from different users. That means that the database receives reads and writes from various users over time. Each group of related modifications needs to be atomic (meaning it happens completely or not at all), and such a group is known as a transaction. A common example of a transactional workload is a banking system, where multiple (money) transactions happen in parallel.
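As a small illustration of atomicity, the following sketch uses SQLite from the Python standard library as a stand-in for a transactional database. The table layout, the account names, and the helper functions open_demo_db, transfer, and balances are assumptions made for this example.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst as one transaction; True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a name -> balance mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_demo_db()
ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
```

In the failing transfer, the credit to the receiving account has already been applied when the debit is rejected. Rolling back the transaction undoes that credit as well, so no money appears out of nowhere.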
Due to the nature of block storage, where each block is an independent unit, databases can optimally read and write data, either with a filesystem in between, or taking on the role of managing the block assignment themselves. With a growing data set, the underlying physical storage can be split into multiple devices, or even multiple storage nodes. The logical view of a block storage device stays intact.
Cloud and Container Workloads
Virtual machines and containers are designed to be a flexible way to place workloads on machines, isolated from each other. This flexibility requires storage which is just as flexible and can easily be grown in size and migrated to other locations (servers, data centers, or operating environments). While alternative storage technologies are available, none of them is as flexible as pure block storage devices.
Other High Velocity Data Workloads
Workloads with high data velocity, meaning rapidly changing data, oftentimes within seconds, need storage solutions that can keep up with the speed of writes and reads. Typical use cases of such workloads include Big Data analytics, but also real-time use cases, such as GPS tracking data (Uber, DHL, etc.). In these cases, directly addressable block access improves read and write performance by removing additional, non-standard access layers.
Block Storage vs File Storage
File-level storage, or file storage, refers to storage options that work purely on a file level. File storage is commonly associated with local file systems such as NTFS or ext4, or with network file systems such as SMB (the Windows file sharing protocol) or NFS.
From a user’s perspective, file storages are easy to use and to navigate, since their design replicates how we operate with local file systems. They present directories and files and mimic their hierarchical nesting. File storages often provide access control and permissions on a per-file basis.
While file storages are easy to use, the way they are implemented introduces a single access path. Hence, performance can suffer compared to block storage, especially in situations with many concurrent accesses. It also means that interoperability may be lower than with a pure block storage device, since not every file system implementation is available on every operating system.
Typically, a file storage is backed by a block storage device in combination with a file system. This file system is either used locally, or made available remotely through one of the available network file systems.
Block Storage vs Object Storage
Object storage, sometimes also known as blob storage, is a storage approach which stores information in blobs or objects (which explains the origin of its name). Each object has a variable amount of metadata attached to it, and is globally uniquely identifiable.
The object identities are commonly collected and managed by the application that stores or reads the object. These identities are commonly represented by URIs, since most object storages (these days) are based on HTTP services, such as AWS S3 or Azure Blob Storage. This means that typical file access patterns aren’t available, and that object storages most often require application changes. The S3 protocol is currently a de facto standard across many object storage implementations. Yet not all of them (especially other cloud providers) implement it, which means that implementations aren’t always compatible or interchangeable.
Object storages, while versatile, impact the performance and accessibility of files. Their additional protocol overhead and access patterns are acceptable for unstructured, static files, such as images, video data, backup files, and similar, but they aren’t a good fit for frequently accessed or updated data.
Block Storage as a Service
In summary, block storage, file storage, and object storage each offer distinct advantages and are suited to different use cases. While block storage excels in performance-critical applications, file storage is ideal for shared file access, and object storage provides scalable storage for unstructured data.
Each of those storage types is available via one or more storage-as-a-service offerings. While some may be compatible inside their own category, others are not. Being able to interchange implementations, or to change cloud providers when necessary, may be a requirement. Incompatible protocols, especially in the case of object storages, paired with the performance impact compared to block storage, make basic block storage still the tool of choice for most use cases.
Simplyblock offers a highly distributed, NVMe-optimized block storage solution. It combines the performance and capacity of many storage devices throughout the attached cluster nodes and enables the creation of logical block devices of various sizes and performance characteristics. Imagine virtualization, but for your storage. To build your own Amazon EBS-like storage solution today, you can get started right away. An overview of all the features, such as snapshots, copy-on-write clones, online scalability, and much more, can be found on our feature page.