APPLICATIONS OF TECHNOLOGY:
- High performance computing
- IoT
BENEFITS:
- Reduced latency and high throughput
- Improved performance
- High accuracy computations
- Compatibility with various memory devices supporting in-memory computation
BACKGROUND:
Resistive random-access memory (RRAM) devices have garnered significant interest because they can perform many simultaneous operations across a large number of processing units through in-memory computation. RRAM devices are attractive for field-programmable crossbar array (FPCA) architectures because of their small footprint, low power consumption, excellent stability, and re-programmability; an RRAM-based FPCA is therefore both flexible and easy to program.
However, challenges such as long-distance routing, device/crossbar irregularities, and high-power analog-digital interfaces have been identified as performance bottlenecks in this architecture. An FPCA architecture that overcomes these bottlenecks, leading to superior computational performance, is presented.
TECHNOLOGY OVERVIEW:
Scientists at Berkeley Lab have developed a memory-centric, reconfigurable, general-purpose computing platform that performs analog, digital, and memory functions while efficiently handling rapidly growing data volumes and mitigating the performance bottlenecks noted above. The architecture uses a reprogrammable FPCA with crossbar layouts designed for efficient, massively parallel computing and data storage.
RRAMs serve as the primary memory element in this architecture, but the design is flexible enough to accommodate other memory technologies. Furthermore, the number of RRAM programming cycles required was reduced, yielding a high-throughput, low-latency system: crossbar programming time was cut by 17× for in-memory addition and by 8.5× for in-memory high-accuracy multiplication compared to state-of-the-art approaches. The in-memory high-accuracy multiplication method also significantly reduces latency and the number of addition operations compared to implementations on state-of-the-art field-programmable gate arrays (FPGAs). The developed architecture improves peak neural network training throughput by 81.3× over state-of-the-art crossbar-based accelerators.
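To illustrate the general principle of crossbar in-memory computation (not the patented method itself), the sketch below models an idealized RRAM crossbar: each cell stores a conductance, row voltages encode one operand, and Kirchhoff's current law sums the per-cell currents along each column, producing an analog matrix-vector multiply inside the memory array. All values and the noise-free assumption are illustrative.

```python
import numpy as np

def crossbar_mvm(conductances, voltages):
    """Ideal (noise-free) crossbar matrix-vector multiply.

    Each cell (i, j) stores conductance G[i][j] in siemens. Applying
    voltage V[i] to row i produces, by Ohm's law, a current V[i]*G[i][j]
    through that cell; Kirchhoff's current law sums these along column j:
        I[j] = sum_i V[i] * G[i][j]
    so the array computes V @ G in a single analog step.
    """
    G = np.asarray(conductances, dtype=float)  # rows x columns
    V = np.asarray(voltages, dtype=float)      # one voltage per row
    return V @ G                               # column currents (amps)

# Example: a hypothetical 3x2 crossbar with microsiemens-scale cells
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]
print(crossbar_mvm(G, V))  # column currents [2.2e-6, 2.8e-6] A
```

In a physical device the same read happens in parallel across all columns, which is what makes crossbar arrays attractive for the massively parallel workloads (e.g., neural network training) described above.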
MORE INFORMATION:
DEVELOPMENT STAGE:
Technology concept and/or application formulated
PRINCIPAL INVESTIGATORS:
Hasita Veluri
Dilip Vasudevan
IP STATUS:
Patent pending
OPPORTUNITIES:
Available for licensing or collaborative research