Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures
Published in IPDPS-2021, 2020
Recommended citation: Weiling Yang, Jianbin Fang, Dezun Dong. "Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures." IPDPS. 2021. http://jianbinfang.github.io/files/2020-12-11-ipdps.pdf
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. There is a large body of work on evaluating and optimizing large-scale matrix multiplication, but how well small-scale matrix multiplication (SMM) performs is largely unknown, especially on ARMv8-based many-core architectures. In this work, we evaluate and characterize the performance of SMM subroutines on Phytium 2000+, an ARMv8-based 64-core architecture. The evaluation is performed extensively with mainstream open-source libraries, including OpenBLAS, BLIS, BLASFEO, and Eigen. Across various experimental settings, we observe how well the small-scale GEMM routines perform on Phytium 2000+, and then discuss the factors behind the performance behaviours of SMM.