High Bandwidth Flash: A New Memory for AI Data Centers and Edge Computing

By Alper Ilkbahar, CTO at Sandisk.

Thursday, 21st May 2026. Posted by Phil Alsop.

Artificial intelligence is on a relentless march across the computing landscape. While about one in seven data centers today is equipped to host AI workloads, that share is expected to approach 70 percent by 2030 [1]. AI is migrating from hyperscale to enterprise data centers and out to the network perimeter, where edge AI applications are projected to generate nearly $66.5 billion by the end of the decade [2]. The fuel for the new computing era is data: staggeringly large volumes that must be fed at high speed to demanding and rapidly scaling AI computing infrastructure.

These vast content repositories are overwhelming conventional storage structures and bring an inherent architectural weakness into sharp relief. Data center memory (DRAM and specialized high bandwidth memory known as HBM) is increasingly struggling to keep pace with the growing demands of large AI models in terms of density, storage capacity and scalability. At the same time, hyperscale computing manufacturers are contending with rising DRAM and HBM production costs, design complexity and energy consumption. The challenge is even more daunting in enterprise data centers and edge AI applications where a proportionately smaller physical footprint renders them ill-equipped to absorb rising memory costs and power use.

And there is another pressing issue introduced by AI inference, which is now the dominant AI workload and has different data management requirements than AI training. Inference stores large – and growing – AI models, and HBM and DRAM-based memory have shown they lack the capacity and cost scalability to keep up with these new demands. Given these distinctly different memory characteristics, an opportunity exists for a memory technology optimized specifically for AI inference.

Why DRAM and HBM Underserve AI Inference Workloads

To understand why DRAM and HBM alone are suboptimal for long-term AI deployment, consider the following drawbacks [3]. These began as small fissures but, if uncorrected, will expand over time to undermine the foundation of next-generation AI-centric storage.

Density Penalties: DRAM capacity scaling has stalled while the need for higher capacity to address AI inference is growing [3].

Mismatched for AI Inference: DRAM's low-latency and random-access advantages are not relevant for AI inference, where access patterns are deterministic and more latency tolerant thanks to techniques like data prefetching [3].
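The latency-tolerance point can be illustrated with a toy timing model. The sketch below is only an illustration under stated assumptions: the function names and all timing figures (in microseconds) are hypothetical, it assumes the access order is known in advance and that one fetch can be in flight while the previous chunk is being processed, and it does not describe any particular memory device.

```python
def serial_time(n_chunks, fetch_latency, transfer, compute):
    """Total time when each fetch starts only after the previous compute ends."""
    return n_chunks * (fetch_latency + transfer + compute)


def prefetched_time(n_chunks, fetch_latency, transfer, compute):
    """Total time when the next chunk is fetched while the current one is computed.

    Assumes a deterministic access order and a single outstanding prefetch.
    """
    if n_chunks == 0:
        return 0
    fetch = fetch_latency + transfer
    # The first fetch is fully exposed. After that, each step costs the slower
    # of the two overlapped activities. The last compute has nothing to overlap.
    return fetch + (n_chunks - 1) * max(fetch, compute) + compute


if __name__ == "__main__":
    # Hypothetical figures in microseconds, chosen only for illustration.
    for latency in (10, 20):
        s = serial_time(100, latency, transfer=40, compute=60)
        p = prefetched_time(100, latency, transfer=40, compute=60)
        print(f"latency={latency} us  serial={s} us  prefetched={p} us")
```

With these assumed figures, doubling the access latency from 10 to 20 microseconds adds 1,000 microseconds to the serial total but only 10 microseconds to the prefetched total, because the extra latency stays hidden as long as a fetch takes no longer than the compute step it overlaps.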

Attributes of an Optimized AI Inference Memory Architecture

These fault lines run beneath a $120 billion DRAM industry [4] that is eager to retain its hold on the data center given that spending by hyperscale providers on AI infrastructure could reach $6.7 trillion by the end of the decade [5].

What if it’s time to make a clean break and design a new memory from the ground up that meets the needs of the application, rather than the other way around? An AI-tuned storage-class memory would have the following attributes:

● Larger and scalable memory capacity provisioned for inference workloads

● Higher memory density (GB/mm²)

● High bandwidth to meet the requirements of AI inference

● Lower system-level power consumption

● Better cost efficiency ($/TB)

High Bandwidth Flash Takes Aim at the AI Data Center

High Bandwidth Flash (HBF™) is a disruptive new memory architecture, purpose-built to drive the next generation of AI computing. HBF meets the capacity, energy, throughput and scalability requirements of advanced computing and data-intensive applications. Compared to HBM, HBF provides higher capacity and memory density with comparable bandwidth that better aligns with AI inference trends. As a persistent storage medium, HBF also retains data when power is lost and is thermally stable, supporting high operating temperatures [6].

To realize these advantages, HBF leverages Sandisk’s BiCS design and manufacturing technology and die architecture, which effectively redesigns NAND flash by optimizing for high bandwidth and inference memory characteristics. The use of BiCS CMOS bonded array (CBA) wafer technology further enhances energy efficiency and bandwidth.

HBF Reimagines NAND Flash for AI Applications

Compared to conventional NAND flash, HBF’s use of parallelism, advanced logic scaling and custom stacking techniques helps deliver lower latency and significantly higher read bandwidth, enabling large language models to stream data at near-DRAM speeds [6].

HBF also includes support for large KV caches to efficiently handle long, complex user prompts and customer- and domain-specific data that helps improve AI inference accuracy.
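To give a sense of why KV caches put pressure on memory capacity, the following sketch estimates cache size using the standard accounting for transformer attention (one key and one value vector per layer, per KV head, per token). The function name and the model dimensions are hypothetical and chosen only for illustration; they are not taken from any HBF specification or from a specific model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes for a transformer decoder.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit (FP16/BF16) cache entries.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads of dimension 128,
    # serving a single 128K-token (131,072-token) context in 16-bit precision.
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                          seq_len=128 * 1024)
    print(f"KV cache: {size / 2**30:.1f} GiB")
```

Under these assumptions each token costs 128 KiB of cache, so one 128K-token context needs 16 GiB before any model weights are counted, and the total grows linearly with both context length and the number of concurrent requests.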

Extending Memory-Centric AI to the Enterprise and Network Edge

HBM is not generally available in edge and mobile environments because of its density, cost and power penalties, so it is HBF that delivers the value of larger memory capacity for handling more complex AI inference problems there. This opens the door for edge devices like smartphones that are capable of making real-time decisions to manage a variety of sophisticated tasks. Because its contents persist, HBF can also retrieve old context from previous queries to help solve new problems.

The advantages of HBF extend to enterprise-level computing, where the user base is much smaller than that of hyperscale data centers and the large GPU clusters supported by HBM are too costly. By adopting HBF-enabled accelerators, smaller enterprises can potentially fine-tune large, pre-trained models for domain-specific uses.

Optimized Memory Removes Obstacles to AI Computing Growth

All around us, data centers and edge AI devices operate autonomously, supporting tasks that range from tonight’s dinner recipe to groundbreaking scientific discoveries. Routine tasks like website hosting and enterprise data management are giving way to intelligent workloads that generate actionable insights using machine learning, deep learning and data analytics.

It’s time to reconsider how data center and edge memory are provisioned to manage large-scale inference models that make predictions and generate outputs. Compared to HBM, HBF has a clear capacity advantage while delivering the high throughput required by AI inference applications [6]. As a scalable new system memory technology, HBF helps reduce performance bottlenecks and accelerates time-to-insight for AI applications in modern data centers and edge networks alike.

References

[1] B. Srivathsan, M. Sorel, P. Sachdeva, with A. Bhan, H. Batra, R. Sharma, R. Gupta, and S. Choudhary, McKinsey & Company, “AI power: Expanding data center capacity to meet growing demand” (Oct. 2024)

[2] Grand View Research, “Edge AI Market Size, Share & Trends Analysis Report By Component (Hardware, Software, Services), By End-use Industry (Consumer Electronics, Smart Cities, Automotive), By Region, And Segment Forecasts, 2025-2030”

[3] S. Legtchenko, I. Stefanovici, R. Black, A. Rowstron, J. Liu, P. Costa, B. Canakci, D. Narayanan, X. Wu, Microsoft Research, “Managed-Retention Memory: A New Class of Memory for the AI Era,” Cornell University (Jan. 2025)

[4] Fortune Business Insights, “DRAM Market Size, Share & Industry Analysis…” (Feb. 2026)

[5] J. Noffsinger, M. Patel, P. Sachdeva, with A. Bhan, H. Chang, and M. Goodpaster, McKinsey & Company, “The cost of compute: A $7 trillion race to scale data centers” (Apr. 2025)

[6] HBF Fact Sheet, Sandisk, “Sandisk Unveils The Future Of Memory Architecture For AI Introducing: High Bandwidth Flash” (July 2025)