APPLICATIONS OF TECHNOLOGY:
Applications whose computational workloads rely on sparse graph or matrix algorithms, such as:
- Scientific computing
- Machine learning
- Social network analysis
- Image and signal processing
BENEFITS:
- Improved sparse accumulation (SPA) throughput
- Improved core utilization
- Reduced energy consumption
- Minimal hardware overhead
- Flexible software interface
- Support for a broader class of sparse operations
BACKGROUND:
Sparse linear algebra is a core operation in many computing applications. One popular algorithm for sparse general matrix-matrix multiplication (SpGEMM) is Gustavson's column-wise SpGEMM, which has good locality and is easily parallelized. However, its sparse accumulation (SPA) step is a performance bottleneck: the hash-table lookups used to search for partial sums account for a significant share of the algorithm's overall execution time.
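For reference, a minimal C++ sketch of Gustavson's column-wise SpGEMM appears below. The CSC layout, the spgemm name, and the use of std::unordered_map as the SPA are illustrative choices, not taken from the work described here; the point is that every partial sum update pays for a hash lookup.

    #include <unordered_map>
    #include <vector>

    // Minimal CSC (compressed sparse column) matrix; layout is illustrative.
    struct CSC {
        int rows, cols;
        std::vector<int> colptr;   // size cols + 1
        std::vector<int> rowidx;   // row index of each nonzero
        std::vector<double> val;   // value of each nonzero
    };

    // Gustavson's column-wise SpGEMM, C = A * B. The unordered_map plays the
    // role of the SPA: each partial sum update requires a hash-table search,
    // which is the cost that makes SPA the bottleneck.
    CSC spgemm(const CSC& A, const CSC& B) {
        CSC C{A.rows, B.cols, {0}, {}, {}};
        for (int j = 0; j < B.cols; ++j) {
            std::unordered_map<int, double> spa;         // row -> partial sum
            for (int p = B.colptr[j]; p < B.colptr[j + 1]; ++p) {
                int k = B.rowidx[p];
                double bkj = B.val[p];
                // Scatter column k of A, scaled by B(k,j), into the SPA.
                for (int q = A.colptr[k]; q < A.colptr[k + 1]; ++q)
                    spa[A.rowidx[q]] += A.val[q] * bkj;  // search + accumulate
            }
            // Gather column j of C (rows are left unsorted in this sketch).
            for (const auto& [row, v] : spa) {
                C.rowidx.push_back(row);
                C.val.push_back(v);
            }
            C.colptr.push_back((int)C.rowidx.size());
        }
        return C;
    }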
TECHNOLOGY OVERVIEW:
Scientists at Berkeley Lab have developed Accelerated Sparse Accumulation (ASA), an architecture that removes the SPA bottleneck in column-wise SpGEMM. ASA executes the partial sum search and accumulation with a single instruction, eliminating data-dependent branches and streamlining the computation. It adds a small, dedicated on-chip cache with an integrated accumulator to store and update partial sums efficiently, which raises SPA throughput and lowers the energy cost of cache lookups. ASA further reduces search latency with a parallel search of the set-associative cache, and it delays the merging of partial sum entries evicted from the cache on conflicts, further improving efficiency.
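A rough software model of this accumulation path might look like the following. The cache geometry (64 sets, 4 ways), the eviction policy, and all names (AsaCacheModel, accumulate, flush) are hypothetical; the sketch only illustrates the fused search-and-accumulate step, the set-associative storage, and the delayed merging of evicted partial sums described above.

    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct AsaCacheModel {
        static constexpr int kSets = 64, kWays = 4;   // assumed geometry
        struct Entry { int row = -1; double sum = 0.0; };
        Entry sets[kSets][kWays];
        std::vector<std::pair<int, double>> spill;    // evicted partial sums

        // Models the fused ASA instruction: all ways of the set are searched
        // (in parallel in hardware) and the sum is updated in place, with no
        // data-dependent branch in the software hot loop.
        void accumulate(int row, double v) {
            Entry* set = sets[row % kSets];
            for (int w = 0; w < kWays; ++w)           // search for a hit
                if (set[w].row == row) { set[w].sum += v; return; }
            for (int w = 0; w < kWays; ++w)           // allocate an empty way
                if (set[w].row < 0) { set[w] = {row, v}; return; }
            // Conflict: evict a way to the spill buffer; merging is delayed.
            spill.emplace_back(set[0].row, set[0].sum);
            set[0] = {row, v};
        }

        // Drain the cache at the end of a column and merge delayed spills.
        std::unordered_map<int, double> flush() {
            std::unordered_map<int, double> col;
            for (auto& set : sets)
                for (auto& e : set)
                    if (e.row >= 0) { col[e.row] += e.sum; e = {}; }
            for (auto& [row, v] : spill) col[row] += v;
            spill.clear();
            return col;
        }
    };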
Compared to a state-of-the-art software implementation, ASA achieves average speedups of about 2x on a Markov clustering application and about 5x on its SpGEMM kernel. Compared to a state-of-the-art hashing accelerator design, ASA achieves an average speedup of about 2x on the SpGEMM kernel.
ASA provides a simple, flexible software interface that allows for integration with other software optimizations. Additionally, ASA can be extended to support conditional accumulate operations, enabling efficient handling of various sparse graph and matrix kernels, such as triangle counting, breadth-first search, and betweenness centrality.
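As one illustration of a conditional accumulate, the sketch below gates each SPA update on a visited bitmap, as a masked breadth-first search frontier expansion would. The expand_frontier name and the adjacency-list representation are assumptions, and in ASA the predicate would be evaluated by the fused search-and-accumulate instruction rather than by a software branch.

    #include <unordered_map>
    #include <vector>

    // Expand one BFS frontier level: a partial sum is accumulated only when
    // the target vertex has not been visited yet (the condition).
    std::vector<int> expand_frontier(
            const std::vector<std::vector<int>>& adj,  // adjacency lists
            const std::vector<int>& frontier,
            std::vector<bool>& visited) {
        std::unordered_map<int, int> spa;              // vertex -> path count
        for (int u : frontier)
            for (int v : adj[u])
                if (!visited[v]) ++spa[v];             // conditional accumulate
        std::vector<int> next;
        for (const auto& [v, count] : spa) {
            visited[v] = true;
            next.push_back(v);
        }
        return next;
    }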
DEVELOPMENT STAGE:
Proof of concept
PRINCIPAL INVESTIGATORS:
Chao Zhang, Lehigh University
Xiaochen Guo, Lehigh University
Maximilian Bremer, Berkeley Lab
Cy Chan, Berkeley Lab
John Shalf, Berkeley Lab
IP STATUS:
Patent pending
OPPORTUNITIES:
Available for licensing or collaborative research