Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU
Jianbin Fang, Peng Zhang, Tao Tang, Chun Huang, Canqun Yang. "Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU." ISPA. 2017.
Minquan Fang, Jianbin Fang, Weimin Zhang, Haifang Zhou, Jianxing Liao, Yuangang Wang. "Benchmarking the GPU memory at the warp level." Parallel Computing, 71:23-41. 2018.
Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang. "clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization." FGCS. 2018.
Peng Zhang, Jianbin Fang, Canqun Yang, Tao Tang, Chun Huang, Zheng Wang. "MOCL: An Efficient OpenCL Implementation for the Matrix-2000 Architecture." CF. 2018.
Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang. "Auto-tuning Streamed Applications on Intel Xeon Phi." IPDPS. 2018.
Shizhao Chen, Jianbin Fang, Donglin Chen, Chuanfu Xu, Zheng Wang. "Adaptive Optimization of Sparse Matrix-Vector Multiplication on Emerging Many-Core Architectures." HPCC. 2018.
Donglin Chen, Jianbin Fang, Shizhao Chen, Chuanfu Xu, Zheng Wang. "Optimizing Sparse Matrix-Vector Multiplications on An ARMv8-based Many-Core Architecture." IJPP. 2018.
Wenxu Zheng, Jianbin Fang, Juan Chen, et al. "Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems." HPCC. 2019.
Zhaoyun Chen, Wei Quan, Mei Wen, Jianbin Fang, Jie Yu, Chunyuan Zhang, Lei Luo. "Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters." TPDS. 2019.
Donglin Chen, Jianbin Fang, Chuanfu Xu, Shizhao Chen, Zheng Wang. "Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+." IJPP. 2019.
Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang. "Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures." TPDS. 2020.
Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang. "Parallel Programming Models for Heterogeneous Many-Cores: A Comprehensive Survey." CCF THPC. 2020.
Wanrong Gao, Jianbin Fang, Chuanfu Xu, Chun Huang. "Dissecting the Phytium 2000+ Memory Hierarchy via Microbenchmarking." CCF ACA. 2020.
Guixin Ye, Zhanyong Tang, Huanting Wang, Jianbin Fang, Songfang Huang, Zheng Wang. "Deep Program Structure Modeling Through Multi-Relational Graph-based Learning." PACT. 2020.
Xiaosong Yu, Huihui Ma, Zhengyu Qu, Jianbin Fang, Weifeng Liu. "NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-based Many-Core Architectures." NPC. 2020.
Donglin Chen, Xiang Gao, Chuanfu Xu, Shizhao Chen, Jianbin Fang, Zhenghua Wang, Zheng Wang. "FlowGAN: A Conditional Generative Adversarial Network for Flow Prediction in Various Conditions." ICTAI. 2020.
Juan Chen, Xinxin Qi, Feihao Wu, Jianbin Fang, Yong Dong, Yuan Yuan, Zheng Wang, Keqin Li. "More Bang for Your Buck: Boosting Performance with Capped Power Consumption." TST. 2020.
Jianbin Fang, Xiangke Liao, Chun Huang, Dezun Dong. "Performance Evaluation of Memory-Centric ARMv8 Many-Core Architectures: A Case Study with Phytium 2000+." JCST. 2020.
Weiling Yang, Jianbin Fang, Dezun Dong. "Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures." IPDPS. 2021.
Jing Chen, Jianbin Fang, Weifeng Liu, Canqun Yang. "BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs." TPDS. 2021.
Donglin Chen, Xiang Gao, Chuanfu Xu, Siqi Wang, Shizhao Chen, Jianbin Fang, Zheng Wang. "FlowDNN: a physics-informed deep neural network for fast and accurate flow prediction." FITEE. 2021.
Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang. "LibShalom: Optimizing Small and Irregular-shaped Matrix Multiplications on ARMv8 Multi-Core." SC. 2021.
Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang. "Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures." Cluster. 2021.
Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang. "wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems." JCST. 2021.
Pengyu Wang, Wanrong Gao, Jianbin Fang, Chun Huang, Zheng Wang. "Characterizing OpenMP Synchronization Implementations on ARMv8 Multi-Cores." HPCC. 2021.