What is Erasure Coding: A Shield Against Data Loss

Mar 06th, 2024 | 9 min read

What is Erasure Coding?
The Core Principles of Erasure Coding
Advantages of Erasure Coding
Use Cases of Erasure Coding
Challenges and Considerations of Erasure Coding
Erasure Coding and Simplyblock
Questions and Answers

Erasure Coding (erasure code) is a data protection mechanism that protects against data loss by breaking data items, such as files, into fragments, calculating additional data pieces (parity information), and storing them across a set of independent locations or storage media.

For decades, traditional methods like replication have been the go-to solution for protecting against data loss or corruption. In recent years, however, a more efficient and resource-friendly technique has become more prevalent—erasure coding. This innovative approach ensures data integrity, optimizes storage capacity, and reduces the chances of catastrophic data loss. Let us delve into the elements of erasure codes, exploring what they are and how they revolutionize how we protect our digital assets.

What is Erasure Coding?

Like many more commonly known technologies, such as RAID or replication/mirroring, erasure coding is a data protection method. It is a class of high-performance Forward Error Correction (FEC). A simplified explanation would say that it breaks down data into smaller pieces, does some mathematical magic, and writes the pieces to different disks. It doesn’t sound too complicated.

What that really means is slightly more involved, though. Erasure code schemes break down pieces of information, such as files, into fragments (sometimes called chunks), which are enriched with redundancy information (meaning fragments are extended with results of multiple mathematical equations) and eventually distributed across multiple storage nodes and disks.

Unlike traditional replication, which duplicates the entire data, erasure coding allows for more efficient storage utilization. This method employs advanced mathematical algorithms to create parity fragments, which can later be used to reconstruct the original data even if some fragments are lost or corrupted.

The Core Principles of Erasure Coding

While it may sound like erasure coding is the new kid on the block, it was actually invented in 1960 by Irving Reed and Gustave Solomon. Together, they created a new encoding mechanism known as the Reed-Solomon code. Today, this algorithm is widely used in a wide variety of systems, including distributed storage solutions, communication services, and aerospace systems.

These days, while there are many more erasure coding schemes, the three most common ones are:

The Reed-Solomon code is simple and efficient, can be applied to a wide range of applications, and is very common for simple data storage solutions, such as DVD and Blu-Ray disks.
The low-density parity check (LDPC or Gallager code), which is more complex but shows better performance in certain use cases, such as 10GBASE-T (10 Gbit/s Ethernet).
The turbo codes, originally invented by Claude Berrou in 1991, are more complex than LDPC but provide the best performance of data protection to efficiency ratio, and are widely used in mobile communications technologies such as UMTS and LTE.

Anyhow, all the different implementations combine a set of particular features. Storage solutions that utilize erasure coding for data protection most commonly use either a Reed-Solomon or LDPC algorithm.

Data Fragmentation

Erasure coding begins by breaking down the original data into smaller fragments. These fragments are the building blocks that will be distributed across the storage nodes. The size and number of these fragments depend on the specific erasure coding scheme being used.

Parity Creation

Parity fragments (sometimes called coding chunks) are generated using mathematical functions that operate on the original data fragments. These parity fragments are calculated so that any combination of original fragments and parity fragments can be used to reconstruct the original data. This redundancy is the key to the ability of erasure coding to tolerate the loss of information pieces without actual data loss.

Distribution across Nodes

Once the data and parity fragments are created, they are distributed across different storage nodes and disks. This distribution ensures that a failure in one node does not result in the loss of the entire dataset. Each node stores a unique combination of data and parity fragments.

Reconstruction Mechanism

In the event of a node failure or data loss, the erasure coding system can reconstruct the missing or corrupted fragments using the available fragments stored on other nodes. The mathematical relationships established during the parity creation phase facilitate this reconstruction process.

Erasure Coding Profile

Common to all erasure coding algorithms are two specific numbers, called K and M . K defines the amount of fragments the original piece of information is split into, meaning that a K=3 says to split the original object, say a file, into three fragments. M, on the other hand, defines how many parity fragments are distributed. A M=2 means that the parity information is stored on two different systems. In a configuration of K=3, M=2 a storage cluster would need five servers to store the data fragments and parity fragments.

Advantages of Erasure Coding

Erasure coding provides several advantages over more traditional data protection mechanisms, such as RAID or replication.

Optimized Storage Utilization

Erasure coding significantly reduces the amount of storage space required compared to traditional replication methods. While replication duplicates data in its entirety, erasure coding introduces redundancy at the fragment level, allowing for more efficient use of storage resources.

Editor’s Note: Our Erasure Coding Calculator can help you determine the erasure coding overhead.

Fault Tolerance

The distributed nature of erasure coding ensures that the failure of a single storage node does not result in data loss. The original data can be reconstructed as long as the required number of fragments is available across the surviving nodes. This fault tolerance is crucial for systems requiring high availability and reliability.

Cost-Effective Scalability

Traditional replication can become prohibitively expensive as data volumes grow. Erasure coding provides a cost-effective alternative, allowing organizations to scale their storage infrastructure without a linear cost increase.

Reduced Bandwidth Requirements

Transmitting and storing parity fragments instead of full data copies reduces the bandwidth and storage requirements. This is particularly advantageous in scenarios where network bandwidth or storage capacity is a limiting factor.

Use Cases of Erasure Coding

Erasure coding has many use cases, not only in the storage ecosystem but also in communication. UMTS, LTE, and certain satellite communication systems use different erasure code schemes to implement forward error correction.

Anyhow, in terms of storage solutions, next to consumer storage media such as DVD, there are three main storage alternatives that heavily benefit from erasure codes, both in terms of reliability or durability, as well as storage efficiency.

Cloud Storage

Erasure coding is widely adopted in cloud storage environments, where cost efficiency and fault tolerance are paramount. These solutions leverage erasure codes to ensure data durability and availability across many storage nodes or data centers.

Distributed File Systems

Systems like Hadoop Distributed File System (HDFS) and Ceph rely on erasure coding for fault tolerance and efficient storage utilization. It enables these systems to handle large-scale data processing and storage requirements.

Object Storage

Erasure coding optimizes storage space without compromising data integrity, making it an ideal choice for long-term data retention. Object storage platforms, commonly used for archival and backup purposes, benefit from this storage space savings.

Challenges and Considerations of Erasure Coding

While erasure coding offers numerous advantages, it’s important to consider that there is always good and bad news. That said, it has some characteristics that need to be understood.

Computational Overhead

The encoding and decoding processes involve more or less complex mathematical calculations, which can introduce computational overhead. However, advancements in hardware and algorithm optimization have mitigated this concern to a great extent.

Latency

The additional steps involved in the encoding and decoding processes can introduce latency. Organizations must carefully evaluate their performance requirements and select an erasure code scheme that aligns with their needs. Anyhow, the typically distributed nature of the storage using erasure coding commonly mitigates this issue by parallelizing storage requests.

Algorithm and Scheme Selection

Different erasure coding algorithms and schemes offer varying levels of efficiency, fault tolerance, and complexity. Choosing the right scheme requires thoroughly understanding the specific use case and performance considerations.

Erasure Coding and Simplyblock

Erasure coding is a powerful and efficient way to protect data, offering a great balance between storage efficiency, fault tolerance, and cost-effectiveness. As organizations grapple with the ever-growing volumes of data, adopting erasure coding becomes not just a choice but a strategic imperative. That said, it reshapes the data storage landscape, ensuring that our digital assets remain secure, resilient, and accessible in the face of evolving challenges.

That’s why simplyblock utilizes erasure coding for data protection and fault tolerance in our clustered storage solution, enabling “more bang for the buck” in terms of storage efficiency combined with the industry-standard NVMe over TCP protocol. Simplyblock enables logical devices that are high-performance, predictable, low-latency, and cost-effective, available as Kubernetes Persistent Volumes, for the easiest access possible. Not to mention all the other cool features such as compression, deduplication, encryption, thin provisioning, and more; learn more now.

Questions and Answers

What is erasure coding in storage?

Erasure coding is a data protection method that splits data into fragments, adds redundancy, and distributes it across multiple locations. Even if parts of the data are lost or corrupted, the original can be reconstructed. It’s a modern alternative to traditional RAID for distributed storage systems.

How does erasure coding compare to replication?

Unlike replication, which stores full copies of data, erasure coding uses mathematical algorithms to reduce storage overhead. While both improve fault tolerance, erasure coding offers better storage efficiency, especially in cloud-native environments where scale and cost matter.

Why is erasure coding important for modern data infrastructures?

As data volumes grow, erasure coding provides scalable, fault-tolerant storage with less capacity waste. It enables systems like software-defined storage to protect data against disk or node failures while maintaining high availability and performance.

Is erasure coding suitable for Kubernetes workloads?

Yes, erasure coding works well with containerized environments. With platforms like simplyblock, it’s possible to combine erasure coding with Kubernetes persistent volumes to ensure durability, especially for stateful apps handling critical data.

What are the advantages of erasure coding over RAID?

Erasure coding offers greater flexibility, supports higher fault tolerance, and is more efficient at scale than traditional RAID. It’s particularly valuable for modern storage backends using NVMe over TCP and cloud storage architectures where resiliency and efficiency are key.

Topics

All Posts Cloud Knowledge

Simplyblock

Supported Environments

Use Cases