January 27, 2022

Erasure Code: RAID As It Should Be

Erasure Code (EC) offers many advantages over traditional RAID technology. The main difference is that EC is abstracted from the underlying hardware platform, which eliminates many of the limitations of traditional RAID systems.

EC is a form of forward error correction and comes from the field of communications. The aim is to correct data-transfer errors on the recipient side. This avoids frequent retransmission, reduces latency and, in turn, increases effective bandwidth. Put simply, and without getting too bogged down in the math, data packets are split up and encoded using mathematical functions. The encoded blocks are then transferred with the redundancy specified by the algorithm. If individual blocks fail to reach the recipient, the data can be recovered from the blocks that did arrive.
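To make the principle concrete, here is a minimal Python sketch of the simplest possible erasure code: k data blocks plus one XOR parity block, which can repair a single lost block. This is a teaching example only, not the algorithm any vendor uses; production codes (typically Reed-Solomon variants) tolerate several losses. The function names `encode` and `recover` are made up for this illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal blocks (zero-padded) and append one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(received):
    """Rebuild the single missing block (marked None) from the surviving ones."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    received = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the missing block.
        received[missing[0]] = xor_blocks(survivors)
    return received


message = b"erasure coding demo"
coded = encode(message, k=4)   # 4 data blocks + 1 parity block
damaged = coded.copy()
damaged[2] = None              # one block never reaches the recipient
restored = recover(damaged)
# Strip the zero padding (a real format would store the original length instead).
decoded = b"".join(restored[:4]).rstrip(b"\0")
```

The recipient never asks for a retransmission: the lost block is recomputed from the four blocks that did arrive.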

Applied to storage, this means you encode the data as described above and distribute the resulting blocks across several physical disks. You can see straightaway that it’s the efficiency of the algorithms that counts.

And it is precisely here that Huawei’s expertise lies. After all, Huawei’s roots are in the communications sector. Huawei’s elastic EC achieves 91.6% utilization of physical storage. By comparison, Ceph with three-way replication achieves just 33%, and a virtual SAN implementation with a 4+2 layout only 66%.
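The utilization figures follow from simple arithmetic: with k data blocks and m redundancy blocks, the usable share of raw capacity is k / (k + m). The sketch below checks this in Python. Note that the 22+2 layout is an assumption chosen because 22/24 is about 91.67%, which is close to the quoted 91.6%; the article itself does not say which layout is behind that number.

```python
def utilization(data_blocks: int, redundancy_blocks: int) -> float:
    """Fraction of raw capacity that holds user data: k / (k + m)."""
    return data_blocks / (data_blocks + redundancy_blocks)


schemes = {
    "3 copies (1 data + 2 replicas)": (1, 2),
    "EC 4+2": (4, 2),
    "EC 22+2 (assumed layout, not stated in the article)": (22, 2),
}

for name, (k, m) in schemes.items():
    print(f"{name}: {utilization(k, m):.1%}")
```

The wider the stripe for a given number of redundancy blocks, the higher the utilization, which is why the efficiency of the coding scheme matters so much.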

RAID dates back to the late 1980s. The aim was to augment hard disk storage with more affordable small disk drives, increasing availability and performance at the same time. The principle involves combining multiple hard disks or SSDs and typically storing data redundantly on these media.

The various implementations were defined by means of RAID levels, which differ in performance and redundancy. RAID levels 5 and 6 are often used. RAID 5 provides single redundancy, i.e., one drive in the disk array can fail without any loss of data. RAID 6 tolerates the failure of two disks; in other words, it provides double parity. The disadvantages of RAID systems relate to hardware affinity. All disks should be the same size. RAID groups are usually combined in storage systems to create storage pools. All RAID groups in a pool should, in turn, deliver similar performance, which means they should use the same number of disks.
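The "same size" constraint can be illustrated with a short calculation. In a conventional RAID 5 or RAID 6 group, every member is treated as if it were as small as the smallest disk, so larger disks contribute no extra capacity. The Python sketch below assumes that behavior; the function name `raid_usable_tb` and the disk sizes are hypothetical.

```python
def raid_usable_tb(disk_sizes_tb, parity_disks):
    """Usable capacity of one RAID group: every member counts as the smallest disk."""
    if len(disk_sizes_tb) <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (len(disk_sizes_tb) - parity_disks) * min(disk_sizes_tb)


uniform = [8, 8, 8, 8, 8, 8]    # six 8 TB disks
mixed = [8, 8, 8, 8, 16, 16]    # two disks replaced by 16 TB models

raid5_uniform = raid_usable_tb(uniform, parity_disks=1)  # single parity
raid6_uniform = raid_usable_tb(uniform, parity_disks=2)  # double parity
raid6_mixed = raid_usable_tb(mixed, parity_disks=2)      # larger disks add nothing
```

In the mixed group, the additional 16 TB of raw capacity on the two larger disks stays unused, which is why RAID groups are normally built from identical drives.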

Systems should always be extended with entire RAID groups to prevent hotspots. If only individual disks are added to the storage pool, performance drops dramatically. This poses major challenges for storage management. Additional problems arise when the disk models in use can no longer be procured. You then have to reorganize entire storage pools or storage systems and copy terabytes of data. The increasing size of hard disks and SSDs on offer is another challenge. If one of these disks fails, you have to restore the data from the remaining media in the RAID array. With sizes of 16 or 32 TB, this can take days, during which redundancy and performance are reduced.

Erasure code can address the disadvantages of RAID. Systems can be extended with individual disks, and data is distributed so as to prevent hotspots. If disks fail, the data is restored at a record-breaking 15 minutes per terabyte. Compute capacity is available in smart enclosures, and numerous disks take part in the recovery. This means virtually negligible performance losses for the live workloads.
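A quick back-of-the-envelope calculation shows what the rebuild rate means for the disk sizes mentioned earlier. The Python sketch below uses the 15 minutes per terabyte quoted above; the 100 MB/s sustained rebuild rate for a traditional RAID array is purely an assumption for comparison, not a measured value.

```python
def ec_rebuild_hours(capacity_tb, minutes_per_tb=15):
    """Rebuild time at the quoted erasure-code rate of 15 minutes per terabyte."""
    return capacity_tb * minutes_per_tb / 60


def raid_rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Rebuild time when one replacement disk is written at a sustained rate."""
    megabytes = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / rebuild_mb_per_s / 3600


for size_tb in (16, 32):
    ec = ec_rebuild_hours(size_tb)
    raid = raid_rebuild_hours(size_tb, rebuild_mb_per_s=100)  # assumed rate
    print(f"{size_tb} TB: EC about {ec:.0f} h, RAID about {raid:.0f} h")
```

Under these assumptions a 16 TB disk is rebuilt in about 4 hours with EC, compared with roughly 44 hours for the RAID case, and the gap doubles for 32 TB.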

EC is often said to have a high compute requirement. This becomes apparent in certain applications, such as virtualization environments. The Huawei elastic erasure code was adapted specifically to address this issue: certain functions were implemented in hardware and the algorithms optimized accordingly. A latency of 0.05 ms and 21 million SPC-1 IOPS in the Huawei OceanStor Dorado storage systems reflect the resulting performance.

Erasure code therefore overcomes the disadvantages of traditional RAID systems. It offers far higher capacity utilization and much more flexibility. This makes it easier to run the storage systems and minimizes the administrative effort.

Until next time!
