Published in SC, 2006
Recommended citation: Guangming Tan, Shengzhong Feng, Ninghui Sun. "Locality and parallelism optimization for dynamic programming algorithm in bioinformatics." SC 2006
Published in SPAA, 2007
Recommended citation: Guangming Tan, Ninghui Sun, Guang R. Gao. "A parallel dynamic programming algorithm on a multi-core architecture." the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 135-144, 2007
Published in LCPC, 2008
Recommended citation: Guangming Tan, Vugranam C. Sreedhar, Guang R. Gao. "Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture." the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2008: 331-342.
Published in ICS, 2009
Recommended citation: Guangming Tan, Ziyu Guo, Dan Meng. "Single-particle 3D Reconstruction from Cryo-Electron Microscopy Images on GPU." The 23rd ACM International Conference on Supercomputing (ICS), pp. 380-389, 2009.
Published in SC, 2011
Recommended citation: Guangming Tan, Linchuan Li, Sean Triechler, Everett Phillips, Yungang Bao, Ninghui Sun. "Fast Implementation of DGEMM on Fermi GPU." ACM/IEEE Supercomputing (SC), 2011.
Published in ICS, 2012
Recommended citation: Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun. "An Optimized Large-Scale Hybrid DGEMM Design for CPUs and ATI GPUs." The 26th ACM International Conference on Supercomputing (ICS), pp. 377-386, 2012.
Published in PLDI, 2013
Recommended citation: Jiajia Li, Guangming Tan, Mingyu Chen, Ninghui Sun. "SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication." the 34th annual ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), 117-126, 2013.
Published in CGO, 2013
Recommended citation: Jie Yan, Guangming Tan, Xiuxia Zhang, Erlin Yao, Ninghui Sun. "Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms." 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1-10, 2013.
Published in ICS, 2015
Recommended citation: Yulong Luo, Guangming Tan, Zeyao Mo, Ninghui Sun. "FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model." Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), 2015
Published in PPoPP, 2017
Recommended citation: Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Keren Zhou, Mingyu Chen. "Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2017: 31-43.
Published in ICS, 2017
Recommended citation: Keren Zhou, Guangming Tan, Xiuxia Zhang, Chaowei Wang, Ninghui Sun. "A Performance Analysis Framework for Exploiting GPU Microarchitectural Capability." ACM International Conference on Supercomputing (ICS), 2017
Published in PPoPP, 2018
Recommended citation: Xueqi Li, Guangming Tan, Bingchen Wang, Ninghui Sun. "High-performance genomic analysis framework with in-memory computing." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2018: 317-328.
Published in PPoPP, 2019
Recommended citation: Ke Meng, Jiajia Li, Guangming Tan, Ninghui Sun. "A pattern based algorithmic autotuner for graph processing on GPUs." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019
Published in ICS, 2019
Recommended citation: Zhen Xie, Guangming Tan, Weifeng Liu, Ninghui Sun. "IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication." In Proceedings of 2019 International Conference on Supercomputing, Phoenix, AZ, USA, June 26–28, 2019 (ICS ’19)
Published in PPoPP, 2021
Recommended citation: Xiaoyang Zhang, Junmin Xiao, Guangming Tan. "I/O Lower Bounds for Auto-tuning of Convolutions in CNNs." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2021
Published in PPoPP, 2022
Recommended citation: Junmin Xiao, Qing Xue, Hui Ma, Xiaoyang Zhang, Guangming Tan. "A W-cycle algorithm for efficient batched SVD on GPUs." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2022
Published in PPoPP, 2022
Recommended citation: Zhuoqiang Guo, Denghui Lu, Yujin Yan, Siyu Hu, Rongrong Liu, Guangming Tan, Ninghui Sun, Wanrun Jiang, Lijun Liu, Yixiao Chen, Linfeng Zhang, Mohan Chen, Han Wang, Weile Jia. "Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2022
Published in DAC, 2022
Recommended citation: Ruihao Gao, Xueqi Li, Yewen Li, Xun Wang, Guangming Tan. "MetaZip: a high-throughput and efficient accelerator for DEFLATE." DAC 2022: 319-324
Published in ICS, 2022
Recommended citation: Zhongzhe Hu, Junmin Xiao, Zheye Deng, Mingyi Li, Kewei Zhang, Xiaoyang Zhang, Ke Meng, Ninghui Sun, Guangming Tan. "MegTaiChi: dynamic tensor-based memory management optimization for DNN training." ICS 2022: 25:1-25:13
Published in SC, 2022
Recommended citation: Junmin Xiao, Yunfei Pang, Qing Xue, Chaoyang Shui, Ke Meng, Hui Ma, Mingyi Li, Xiaoyang Zhang, Guangming Tan. "W-Cycle SVD: A Multilevel Algorithm for Batched SVD on GPUs." SC 2022
Published in SC, 2022
Recommended citation: Wei Hu, Hong An, Zhuoqiang Guo, Qingcai Jiang, Xinming Qin, Junshi Chen, Weile Jia, Chao Yang, Zhaolong Luo, Jielan Li, Wentiao Wu, Guangming Tan, Dongning Jia, Qinglin Lu, Fangfang Liu, Min Tian, Fang Li, Yeqi Huang, Liyi Wang, Sha Liu, Jinlong Yang. "2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT." SC 2022 (GB Finalist)
Published in SC, 2022
Recommended citation: Zhen Du, Jiajia Li, Yinshan Wang, Xueqi Li, Guangming Tan, Ninghui Sun. "AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices." SC 2022
Published in HPCA, 2023
Recommended citation: Yewen Li, Xueqi Li, Ruihao Gao, Wanqi Liu, Guangming Tan. "NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling." HPCA 2023
Published in AAAI, 2023
Recommended citation: Siyu Hu, Wentao Zhang, Qiuchen Sha, Feng Pan, Lin-Wang Wang, Weile Jia, Guangming Tan, Tong Zhao. "RLEKF: An Optimizer for Deep Potential with Ab Initio Accuracy." AAAI 2023
Library, ICT, CAS, 2009
Dense matrix operations are central to scientific and engineering computing, and considerable work has gone into developing high-performance libraries for them. Basic Linear Algebra Subprograms (BLAS) is the de facto application programming interface standard for libraries that perform basic linear algebra operations such as vector and matrix multiplication. BLAS serves as a building block of LAPACK, a performance-portable library for dense linear algebra. Hardware vendors (Intel, AMD, IBM, etc.) also provide BLAS libraries tuned for their own processors, e.g. MKL and ACML. It is well known that the performance of BLAS depends on the underlying hardware.
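To make the kind of operation BLAS standardizes concrete, here is a minimal sketch of general matrix multiplication (GEMM), which computes C ← αAB + βC. It is written in plain Python for readability only. The function name `gemm` and the row-major list-of-lists layout are assumptions made for this illustration; they are not the BLAS calling convention, and this is not how a tuned library implements the routine.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists.

    Illustrative reference only: tuned BLAS libraries compute the same
    result but restructure the loops to suit the target hardware.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")

    # Start from the scaled copy of c, then accumulate alpha * a * b.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            scaled = alpha * a[i][p]
            for j in range(n):
                out[i][j] += scaled * b[p][j]
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result = gemm(2, a, b, 3, c)
```

The triple loop performs the same arithmetic regardless of platform. The hardware dependence mentioned above comes from how a library orders and blocks these loops to fit a particular processor's caches and vector units.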