Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures

Published in Cluster, 2021

Recommended citation: Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang. "Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures." Cluster. 2021. http://jianbinfang.github.io/files/2021-07-06-cluster.pdf

This paper presents the first comprehensive performance study of OpenMP barrier implementations on emerging ARMv8-based many-cores. We evaluate seven representative barrier algorithms on three distinct ARMv8 architectures: Phytium 2000+, ThunderX2, and Kunpeng920. We empirically show that the existing synchronization implementations scale poorly on ARMv8 architectures compared with their x86 counterpart. We then propose optimization strategies for improving these widely used synchronization algorithms on each platform.
