Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

publications

Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters

Published in TPDS, 2019

This paper proposes GENIE, a QoS-aware dynamic scheduling framework for shared GPU clusters, which achieves users' QoS guarantees and high system utilization.

Recommended citation: Zhaoyun Chen, Wei Quan, Mei Wen, Jianbin Fang, Jie Yu, Chunyuan Zhang, Lei Luo. "Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters." TPDS. 2019. http://jianbinfang.github.io/files/2019-07-29-tpds.pdf

Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+

Published in IJPP, 2019

This paper presents a quantitative study for characterizing the scalability of sparse matrix-vector multiplications (SpMV) on Phytium FT-2000+, an ARM-based HPC many-core architecture.

Recommended citation: Donglin Chen, Jianbin Fang, Chuanfu Xu, Shizhao Chen, Zheng Wang. "Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+." IJPP. 2019. http://jianbinfang.github.io/files/2019-11-03-ijpp.pdf
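For readers unfamiliar with the kernel being characterized, here is a minimal pure-Python sketch of sparse matrix-vector multiplication over the Compressed Sparse Row (CSR) format. This is only an illustration of the SpMV computation itself, not the optimized implementations evaluated on Phytium FT-2000+ in the paper.

```python
# Minimal CSR SpMV sketch: y = A @ x with A stored as (values, col_idx, row_ptr).

def spmv_csr(values, col_idx, row_ptr, x):
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregular, input-dependent accesses to x via col_idx are what make SpMV scalability sensitive to the memory subsystem, which is the focus of the study.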

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures

Published in TPDS, 2020

This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures.

Recommended citation: Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang. "Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures." TPDS. 2020. http://jianbinfang.github.io/files/2020-02-27-tpds.pdf

Deep Program Structure Modeling Through Multi-Relational Graph-based Learning

Published in PACT, 2020

This paper presents POEM, a novel framework that automatically learns useful code representations from graph-based program structures. At the core of POEM is a new graph neural network (GNN), which is specially designed for capturing the syntax and semantic information from the program abstract syntax tree and the control and data flow graph.

Recommended citation: Guixin Ye, Zhanyong Tang, Huanting Wang, Jianbin Fang, Songfang Huang, Zheng Wang. "Deep Program Structure Modeling Through Multi-Relational Graph-based Learning." PACT. 2020. http://jianbinfang.github.io/files/2020-07-16-pact.pdf

FlowGAN: A Conditional Generative Adversarial Network for Flow Prediction in Various Conditions

Published in ICTAI, 2020

Existing DL-based models have to be re-trained whenever the flow condition changes, which incurs significant training overhead for real-life scenarios with a wide range of flow conditions. This paper presents FLOWGAN, a novel conditional generative adversarial network for accurate prediction of flow fields in various conditions. FLOWGAN is designed to directly obtain the generation of solutions to flow fields in various conditions based on observations rather than re-training.

Recommended citation: Donglin Chen, Xiang Gao, Chuanfu Xu, Shizhao Chen, Jianbin Fang, Zhenghua Wang, Zheng Wang. "FlowGAN: A Conditional Generative Adversarial Network for Flow Prediction in Various Conditions." ICTAI. 2020. http://jianbinfang.github.io/files/2020-09-03-ictai.pdf

More Bang for Your Buck: Boosting Performance with Capped Power Consumption

Published in TST, 2020

This article develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints and total energy consumption.

Recommended citation: Juan Chen, Xinxin Qi, Feihao Wu, Jianbin Fang, Yong Dong, Yuan Yuan, Zheng Wang, and Keqin Li. "More Bang for Your Buck: Boosting Performance with Capped Power Consumption." TST. 2020. http://jianbinfang.github.io/files/2020-11-01-tst.pdf

Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+

Published in JCST, 2020

This article presents a comprehensive performance evaluation of Phytium 2000+, an ARMv8-based 64-core architecture. We focus on the cache and memory subsystems, analyzing the characteristics that impact the high-performance computing applications.

Recommended citation: Jianbin Fang, Xiangke Liao, Chun Huang, Dezun Dong. "Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+." JCST. 2020. http://jianbinfang.github.io/files/2020-12-02-jcst.pdf

Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures

Published in IPDPS, 2021

General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. There is a large body of work on evaluating and optimizing large-scale matrix multiplication, but how well small-scale matrix multiplication (SMM) performs is largely unknown, especially on ARMv8-based many-core architectures. In this work, we evaluate and characterize the performance of SMM subroutines on Phytium 2000+, an ARMv8-based 64-core architecture. The evaluation is performed extensively with the mainstream open-source libraries, including OpenBLAS, BLIS, BLASFEO, and Eigen. Given various experimental settings, we observe how well the small-scale GEMM routines perform on Phytium 2000+, and then discuss the factors behind the performance behaviours of SMM.

Recommended citation: Weiling Yang, Jianbin Fang, Dezun Dong. "Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures." IPDPS. 2021. http://jianbinfang.github.io/files/2020-12-11-ipdps.pdf
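To fix notation, here is a textbook triple-loop GEMM (C = A @ B) in pure Python. The small-scale GEMMs (SMM) studied in the paper are calls like this with tiny m, n, k, where per-call library overheads can dominate; the tuned kernels in OpenBLAS, BLIS, BLASFEO, and Eigen are of course far more elaborate.

```python
# Naive dense GEMM sketch: C[m][n] = A[m][k] @ B[k][n].

def gemm(A, B):
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a = A[i][p]  # hoist A[i][p]; inner loop streams over a row of B
            for j in range(n):
                C[i][j] += a * B[p][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(gemm(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```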

BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs

Published in TPDS, 2021

This article presents an efficient implementation of the alternating least squares (ALS) algorithm, called BALS, built on top of a new sparse matrix format for parallel matrix factorization. Note that the reviewing process took around three years, spanning from April 2, 2018 to March 1, 2021, the longest I have ever seen.

Recommended citation: Jing Chen, Jianbin Fang, Weifeng Liu, Canqun Yang. "BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs." TPDS. 2021. http://jianbinfang.github.io/files/2021-03-01-tpds.pdf
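As a reminder of the underlying algorithm, here is a rank-1 alternating least squares sketch in pure Python: fix one factor vector and solve the other in closed form, then alternate. This is only the serial textbook idea at rank 1; BALS handles sparse ratings matrices at higher rank on GPUs with a custom sparse format.

```python
# Rank-1 ALS sketch: approximate R[i][j] by u[i] * v[j].

def als_rank1(R, iters=50):
    m, n = len(R), len(R[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Fix v; each u[i] = argmin_u sum_j (R[i][j] - u * v[j])^2, closed form.
        vv = sum(x * x for x in v)
        u = [sum(R[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        # Fix u; solve each v[j] symmetrically.
        uu = sum(x * x for x in u)
        v = [sum(R[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v

# R is exactly rank-1 (outer product of [1, 2] and [3, 4]), so ALS recovers it.
R = [[3.0, 4.0], [6.0, 8.0]]
u, v = als_rank1(R)
approx = [[u[i] * v[j] for j in range(2)] for i in range(2)]
print(all(abs(approx[i][j] - R[i][j]) < 1e-6 for i in range(2) for j in range(2)))  # True
```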

FlowDNN: a physics-informed deep neural network for fast and accurate flow prediction

Published in FITEE, 2021

In this paper, we propose FlowDNN, a novel deep neural network (DNN) to efficiently learn flow representations from CFD results. FlowDNN saves computational time by directly predicting the expected flow fields based on given flow conditions and geometry shapes. FlowDNN is the first DNN that incorporates the underlying physical conservation laws of fluid dynamics with a carefully designed attention mechanism for steady flow prediction. This approach not only improves the prediction accuracy but also preserves the physical consistency of the predicted flow fields, which is essential for CFD.

Recommended citation: Donglin Chen, Xiang Gao, Chuanfu Xu, Siqi Wang, Shizhao Chen, Jianbin Fang, Zheng Wang. "FlowDNN: a physics-informed deep neural network for fast and accurate flow prediction." FITEE. 2021. http://jianbinfang.github.io/files/2021-05-04-fitee.pdf

LibShalom: Optimizing Small and Irregular-shaped Matrix Multiplications on ARMv8 Multi-Core

Published in SC, 2021

This article presents LibShalom, an open-source library for optimizing small and irregular-shaped GEMMs, explicitly targeting the ARMv8 architecture. LibShalom builds upon the classical Goto algorithm but tailors it to minimize the expensive memory-access overhead of data packing and of processing small matrices. It uses analytic methods to determine GEMM kernel optimization parameters, enhancing the computation and parallelization efficiency of the GEMM kernels.

Recommended citation: Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang. "LibShalom: Optimizing Small and Irregular-shaped Matrix Multiplications on ARMv8 Multi-Core." SC. 2021. http://jianbinfang.github.io/files/2021-06-22-sc.pdf

Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures

Published in Cluster, 2021

This paper presents the first comprehensive performance study of OpenMP barrier implementations on emerging ARMv8-based many-cores. We evaluate seven representative barrier algorithms on three distinct ARMv8 architectures: Phytium 2000+, ThunderX2, and Kunpeng920. We empirically show that the existing synchronization implementations exhibit poor scalability on ARMv8 architectures compared to their x86 counterparts. We then propose various optimization strategies for improving these widely used synchronization algorithms on each platform.

Recommended citation: Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang. "Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures." Cluster. 2021. http://jianbinfang.github.io/files/2021-07-06-cluster.pdf
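For context, here is a minimal sense-reversing centralized barrier in Python, sketching one of the classic algorithm families such studies evaluate. The implementations examined in the paper live inside OpenMP runtimes and use architecture-specific atomics and spin-wait primitives, not Python threads.

```python
# Sense-reversing centralized barrier sketch (reusable across phases).
import threading
import time

class SenseBarrier:
    def __init__(self, n):
        self.n = n
        self.count = n
        self.sense = False
        self.lock = threading.Lock()

    def wait(self):
        local_sense = not self.sense  # each arrival expects the flipped sense
        with self.lock:
            self.count -= 1
            last = (self.count == 0)
            if last:
                self.count = self.n        # reset for the next phase
                self.sense = local_sense   # flip the sense: release everyone
        if not last:
            while self.sense != local_sense:
                time.sleep(0)  # yield; real spins back off or use ARM WFE

results = []
barrier = SenseBarrier(4)

def worker(i):
    barrier.wait()     # phase 1: all threads rendezvous before appending
    results.append(i)
    barrier.wait()     # phase 2: the barrier is safely reusable

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```

The shared counter makes every arrival contend on one cache line, which is exactly the kind of coherence traffic that scales differently on ARMv8 parts than on x86.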

wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems

Published in JCST, 2021

This paper presents a comprehensive study to evaluate cache architecture design on three representative ARMv8 multi-cores, Phytium 2000+, ThunderX2, and Kunpeng 920 (KP920). To this end, we develop the wrBench, a micro-benchmark suite to measure the realized latency and bandwidth of caches at different memory hierarchies when performing core-to-core communications.

Recommended citation: Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang. "wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems." JCST. 2021. http://jianbinfang.github.io/files/2021-09-02-jcst.pdf

Characterizing OpenMP Synchronization Implementations on ARMv8 Multi-Cores

Published in HPCC, 2021

This paper presents a study of OpenMP synchronization implementations on two representative ARMv8 multi-core architectures, Phytium 2000+ and ThunderX2, considering the various OpenMP synchronization mechanisms offered by two mainstream OpenMP compilers, GCC and LLVM.

Recommended citation: Pengyu Wang, Wanrong Gao, Jianbin Fang, Chun Huang, Zheng Wang. "Characterizing OpenMP Synchronization Implementations on ARMv8 Multi-Cores." HPCC. 2021. http://jianbinfang.github.io/files/2021-10-24-hpcc.pdf

talks

teaching

Advanced Compiler Technology

Graduate course, National University of Defense Technology, College of Computer Science, 2017

We aim to teach graduate students advanced compiler technologies.

Advanced Compiler Technology

Graduate course, National University of Defense Technology, College of Computer Science, 2018

We aim to teach graduate students advanced compiler technologies.