Explosive data growth is driving change across industries and, by extension, the high-performance computing (HPC) sector. As organizations seek deeper insights from their data, many have realized the urgent need for faster, more reliable IT infrastructure. This has led to a surge in demand for HPC solutions, pushing HPC beyond science and academia and into the private sector.
Today, businesses in sectors ranging from manufacturing to finance use HPC to solve complex data-intensive challenges, sparking a shift from compute- to data-intensive HPC.
While this unlocks game-changing opportunities for organizations, it brings a whole new set of challenges to HPC storage. Data-intensive workloads demand far more from storage than their compute-intensive counterparts. They need high-performing, stable systems that can handle extremely diverse workloads and store massive volumes of data — far more than traditional solutions are capable of. So, how can we address these challenges and accelerate storage in data-intensive HPC scenarios?
Supporting Diverse Workloads
As HPC has evolved, so has the definition of HPC workloads. No longer limited to modelling and simulation workloads, supercomputers now run a range of data-intensive applications, including those used for high-frequency financial trading, pharmaceutical research, and large-scale customer data analysis. Each of these workloads places different requirements on storage — sometimes even within the same application.
Take oil and gas exploration as an example. The first stage of this multi-stage process, seismic data processing, works with large files and requires high bandwidth. In contrast, the second stage, seismic data interpretation, relies on smaller files and needs a high rate of input/output operations per second (IOPS).
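To make the difference between the two stages concrete, the relationship between IOPS, I/O size, and bandwidth can be sketched in a few lines of Python. The request sizes and IOPS figures below are hypothetical and chosen only for illustration; they are not measurements of any seismic workload or storage product.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Sustained throughput in MiB/s for a given IOPS rate and request size in KiB."""
    return iops * io_size_kib / 1024


# Hypothetical large-file stage: few, large (1 MiB) sequential requests.
large_file_stage = throughput_mib_s(iops=2_000, io_size_kib=1024)

# Hypothetical small-file stage: many, small (4 KiB) requests.
small_file_stage = throughput_mib_s(iops=100_000, io_size_kib=4)

print(f"large-file stage: {large_file_stage:.1f} MiB/s at 2,000 IOPS")
print(f"small-file stage: {small_file_stage:.1f} MiB/s at 100,000 IOPS")
```

Under these assumed numbers, the large-file stage moves roughly five times more data per second while issuing fifty times fewer operations, which is why a system tuned for only one of the two profiles tends to struggle with the other.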
A storage system that can only handle the requirements of one stage must be supplemented by a second solution to support the other. Such complex, siloed infrastructure reduces an organization’s ability to quickly process and analyze data across multiple workloads, highlighting the need for a reliable storage system that supports various applications.
Huawei designed its distributed storage solution for high-performance data analytics (HPDA) — the OceanStor Pacific — to meet this need. Featuring a file system with metadata distribution, targeted processing of large and small I/O, and disk indexing, the solution supports both high bandwidth and high IOPS requirements and meets the needs of new-generation HPC workloads.
Tapping into the Value of Data
While storing data is crucial, the real value lies in analyzing it. Companies that quickly turn their data into insights can gain a significant competitive advantage. HPC plays a crucial role in this: by providing faster, more powerful data processing, it slashes the time it takes to perform complex operations on large data sets and accelerates decision-making and innovation.
But that’s assuming data is accessible as quickly as it is processed. For HPC users to extract value from their data without delay, storage performance must match that of the compute and networking components. And that’s where it gets challenging. Processor performance has historically increased much faster than storage performance, creating a gap that slows the entire system as the central processing unit (CPU) waits for data to become available. Not only does this waste CPU cycles, it also limits overall analytics efficiency.
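The cost of this gap can be illustrated with a simple back-of-the-envelope model. The sketch below assumes each analysis step first waits for its data and then computes, with no overlap between the two (real systems often prefetch, so this is a pessimistic simplification), and the timings are hypothetical.

```python
def cpu_utilization(compute_s: float, io_wait_s: float) -> float:
    """Fraction of wall-clock time the CPU spends computing, assuming
    compute and I/O wait do not overlap within a step."""
    return compute_s / (compute_s + io_wait_s)


# Hypothetical step: 2 s of compute behind slow storage (6 s of waiting).
slow_storage = cpu_utilization(compute_s=2.0, io_wait_s=6.0)

# The same step when storage delivers the data in 0.5 s.
fast_storage = cpu_utilization(compute_s=2.0, io_wait_s=0.5)

print(f"slow storage: CPU busy {slow_storage:.0%} of the time")
print(f"fast storage: CPU busy {fast_storage:.0%} of the time")
```

In this simplified model, the processor is identical in both cases; only the time spent waiting on storage changes how much of it is actually used.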
This is particularly significant in emerging HPDA scenarios that depend on iterative analysis of large data sets. One example is autonomous driving training. To reach the highly automated phase (known as level 4), thousands of PBs of data must be continuously processed and analyzed in real time to train and refine the algorithms that vehicles depend on. The success of this hinges on the performance of the HPC infrastructure; if storage performance lags behind, the entire process will suffer.
To overcome this challenge, next-gen storage solutions for HPC must leverage cutting-edge technologies that increase storage performance to the same level as the CPU. The OceanStor Pacific achieves this with its fully distributed large resource pool architecture that doesn’t centralize access components or modules. Special algorithms ensure service data is stored evenly across the resource pools, eliminating performance bottlenecks caused by a single component or module and levelling storage and CPU performance.
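The article does not describe the algorithms the OceanStor Pacific uses, so the following is only a generic sketch of one common way to spread data evenly without a central lookup component: hashing each data chunk's identifier to choose a node. The function names, chunk naming scheme, and node count are all hypothetical.

```python
import hashlib
from collections import Counter


def place(key: str, node_count: int) -> int:
    """Map a chunk identifier to a node index using a stable hash.
    Any client can compute this locally, so no central directory is needed."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribution(keys, node_count: int) -> Counter:
    """Count how many chunks land on each node."""
    return Counter(place(k, node_count) for k in keys)


# Hypothetical workload: 10,000 chunks spread over 8 storage nodes.
counts = distribution((f"chunk-{i}" for i in range(10_000)), node_count=8)

for node in sorted(counts):
    print(f"node {node}: {counts[node]} chunks")
```

Because the hash output is close to uniform, each node receives a similar share of the chunks, so no single node becomes a hotspot. Plain modulo placement like this remaps most chunks when the node count changes, which is why production systems typically rely on more elaborate schemes such as consistent hashing.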
Increasing Efficiency across Multiple Workloads
Complex processes like autonomous driving algorithm training consist of multiple phases, each of which involves different types of data, workloads, and access protocols. Many organizations currently use separate storage systems to provide the file, object, and HDFS services required by these phases. As a result, data must be copied between devices each time a different phase needs it. This takes time, increases complexity, and wastes storage space, which in turn drives up costs.
Autonomous driving isn’t the only industry facing this challenge. Precision medicine and intelligent manufacturing are similarly complex and also require HPC storage solutions that can handle diverse access requirements in various data-intensive scenarios.
Designed to meet these challenges, the OceanStor Pacific supports lossless interworking of multiple protocols, allowing one copy of data to be shared using multiple protocols and eliminating the need to migrate data. This allows for increased analytical efficiency and consistently high performance with zero loss.
Maximizing Storage Density
One of the biggest challenges organizations today face is keeping IT costs to a minimum as data grows. More data requires more memory and physical storage space, meaning that expenses can quickly mount up.
To confront these challenges, storage vendors have long sought to maximize the density of their products. By packing more memory and hardware into a smaller area, we can increase performance and lower power consumption, whilst saving on physical space. Over time, this can contribute to a lower total cost of ownership (TCO) for many organizations, particularly those renting rack space.
Yet, despite its advantages, high-density storage has proven to be an industry-wide challenge. Fitting more disks into a smaller space generates more heat and, without efficient cooling, devices risk overheating. If devices get too hot, their life expectancy is reduced and they may fail prematurely.
Huawei’s solution to this lies in storage design. The OceanStor Pacific series consists of two storage models: high-density capacity and high-density performance. The former houses 120 HDDs, while the latter can accommodate up to 80 half-palm SSDs at a height of 5 U, increasing the capacity and performance densities per unit. Meanwhile, both models benefit from advanced heat dissipation materials and aerospace-grade fans that improve cooling efficiency by 30%. Together, these features contribute to TCO savings of up to 60%.
Accelerating Innovation with HPC
The shift from compute- to data-intensive HPC brings with it huge potential for scientific, technological, and societal progress. But as HPC is used to manage increasingly diverse workloads and ever larger data volumes, the pressure on storage will only grow.
For organizations that rely on HPC to unleash the full potential of their data, next-generation storage solutions are key. Investing in them is an important step toward accelerating innovation and getting ahead in an increasingly competitive world.