Research Statement

The Parallel Processing Institute (PPI) conducts research in all aspects of computer systems and architectures, with a primary focus on parallel acceleration and optimization, system software, programming models, runtimes and computer architecture for multicore and distributed systems. Our research also involves other disciplines such as large-scale data processing and cloud computing. The research themes of PPI are improving the performance scalability, energy efficiency and dependability of mobile, centralized and distributed computer systems.

Recent News

  • [Accepted] 2018, Our paper “Harmonia: A High Throughput B+ Tree for GPUs” has been accepted by the 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019).
  • [Publication] 2018, Our paper “qSwitch: Dynamical Off-Chip Bandwidth Allocation between Local and Remote Accesses” has been accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
  • [Publication] October, 2017, Our paper “Prophet: A Parallel Instruction-Oriented Many-Core Simulator” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [Publication] April, 2017, Our paper “VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-core” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [Publication] 2017, Our paper “Eunomia: Scaling Concurrent Search Trees under Contention Using HTM” has been accepted by the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017).
  • [Publication] February, 2016, Our paper “Performance Analysis and Optimization of Full Garbage Collection in a Production JVM” has been accepted by the 12th Annual International Conference on Virtual Execution Environments (VEE 2016).
  • [Publication] February, 2016, Our paper “Performance Analysis of Multimedia Retrieval Workloads Running on Multicore” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).

Members

Faculty
· Wenyun Zhao
· Weihua Zhang
· Jinhu Jiang
· Yi Li
· Xiaotong Gao
· Yunping Lu

Master Students
· Zhaofeng Yan, Shuhua Fan
· Changheng Song, Chao Dai, Yujin Ren, Wenjie Shen, Xi Ai
· Qiang Liu, Zining Zang, Ziduan Geng, Bin Su, Yuzhe Lin, Xiaoyu Tan, Ruquan Zhao
· Gaodi Zhang, Faling Wang

Undergraduate Students
· Rongchao Dong, Chuanlei Zhao, Jiale Guan, Aier Panjiang, Jinhao Ran

Alumni and their first positions

· Tingjie Sun
· Yuchen Huo, Guanshi Zheng, Xuyu Yang
· Haojun Wang, Tianju Li, Jiayuan Yue
· Jiaxin Li, Keyong Zhou, Qinghao Min, Zhuofang Dai
· Chao Lv, Chen Dai, Donglei Yang, Peng Chen, Yi Lu
· Feiwen Zhu, Software Engineer, Nvidia
· Yibin Hu, Software Engineer, National Instruments
· Xun Li, Ph.D Student, University of California, Santa Barbara
· Tao Bao, Ph.D Student, Purdue University
· Junpu Chen, Software Engineer, Microsoft
· Yao Zhang, Software Engineer, Morgan
· Qiang Yan, Ph.D Student, Singapore Management University
· Qin Wang, Software Engineer, Synopsys
· Jie Yan, Software Engineer, Synopsys
· Lili Liu, Software Engineer, Huawei
· Xiaoxi Yang, Assistant Professor, Changzhou College of Information Technology
· Ying Yuan, Master Student, Carnegie Mellon University

Publications

2019

PPoPP                     
Harmonia: A High Throughput B+ Tree for GPUs
Zhaofeng Yan, Yuzhe Lin, Lu Peng, Weihua Zhang
The 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019)

2018

FCS                
Computer comparisons in the presence of performance variation
Samuel Irving, Bin Li, Shaoming Chen, Lu Peng, Weihua Zhang, Lide Duan
Frontiers of Computer Science (FCS)
TPDS                       
Scaling Concurrent Index Structures under Contention Using HTM
Weihua Zhang, Xin Wang, Shiyu Ji, Ziyun Wei, Zhaoguo Wang, Haibo Chen
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 29, Issue: 8, Aug 1 2018
TCAD                     
qSwitch: Dynamical Off-Chip Bandwidth Allocation between Local and Remote Accesses
Shaoming Chen, Lu Peng, Samuel Irving, Zhou Zhao, Weihua Zhang and Ashok Srivastava
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Volume: 37, Issue: 1, Jan. 2018

2017

TPDS                    
Prophet: A Parallel Instruction-Oriented Many-Core Simulator
Weihua Zhang, Xiaofeng Ji, Yunping Lu, Haojun Wang, Haibo Chen, Pen-Chung Yew
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 28, Issue: 10, Oct 1 2017
TPDS                   
VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-core
Weihua Zhang, Xiaofeng Ji, Bo Song, Shiqiang Yu, Haibo Chen, Pen-Chung Yew, Tao Li, Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 28, Issue: 4, April 1 2017
PPoPP                 
Eunomia: Scaling Concurrent Search Trees under Contention Using HTM
Xin Wang, Weihua Zhang, Zhaoguo Wang, Ziyun Wei, Haibo Chen, Wenyun Zhao
The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017)

2016

TPDS               
Performance Analysis of Multimedia Retrieval Workloads Running on Multicore
Yunping Lu, Xin Wang, Weihua Zhang, Haibo Chen, Lu Peng, Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 27, Nov 2016
TC                          
Hardware Support for Concurrent Detection of Multiple Concurrency Bugs on Fused CPU-GPU Architectures
Weihua Zhang, Shiqiang Yu, Haojun Wang, Zhuofang Dai, Haibo Chen
IEEE Transactions on Computers (TC) Volume: 65, No. 10, October 2016
TPDS                 
A Loosely-Coupled Full-System Multicore Simulation Framework
Weihua Zhang, Haojun Wang, Yunping Lu, Haibo Chen and Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 27, Issue: 6, June 1 2016
ICPP                
Understanding the Architectural Characteristics of EDA Algorithms
Xin Wang, Xiaofeng Ji, Yunping Lu, Yi Li, Weijia Zhou, Weihua Zhang, Wenyun Zhao
The 45th International Conference on Parallel Processing (ICPP)
JPDC             
Parallelizing Image Feature Extraction Algorithms on Multi-core Platforms
Yunping Lu, Yi Li, Bo Song, Weihua Zhang, Haibo Chen, Lu Peng
Journal of Parallel and Distributed Computing (JPDC) Volume: 92, May 2016
VEE                    
Performance Analysis and Optimization of Full Garbage Collection in a Production JVM
Yang Yu, Tianyang Lei, Weihua Zhang, Haibo Chen, Binyu Zang
The 12th Annual International Conference on Virtual Execution Environments (VEE 2016)

2015

ICPP                 
Characterizing MultiMedia Retrieval Applications
Yunping Lu, Xin Wang, Weihua Zhang, Yi Li and Wenyun Zhao
The 44th International Conference on Parallel Processing (ICPP, Best Paper Award)
TECS                  
Multi-level Phase Analysis
Weihua Zhang, Jiaxin Li, Yi Li, Haibo Chen
ACM Transactions on Embedded Computing Systems (TECS) Volume: 14, Issue: 2, March 2015

2014

ACA                       
Parallelized Race Detection Based on GPU Architecture
Zhuofang Dai, Zheng Zhang, Haojun Wang, Yi Li and Weihua Zhang
2014 Annual Conference of Advanced Computer Architecture (ACA 2014, Best Paper Award)
ICPP                      
Hydra: Efficient Detection of Multiple Concurrency Bugs on Fused CPU-GPU Architecture
Zhuofang Dai, Haojun Wang, Weihua Zhang, Haibo Chen and Binyu Zang
The 43rd International Conference on Parallel Processing (ICPP)
NAS                       
RPSim: A Rapid Prototyping Full-system Simulator for SoC Software Development
Haojun Wang, Qinghao Min, Weihua Zhang
The 9th IEEE International Conference on Networking, Architecture and Storage (NAS)
DAC                                        
DAPs: Dynamic Adjustment and Partial Sampling for Multithreaded/Multicore Simulation
Chien-Chih Chen, Yin-Chi Peng, Cheng-Fen Chen, Wei-Shan Wu, Qinghao Min, Pen-Chung Yew, Weihua Zhang, Tien-Fu Chen
Design Automation Conference (DAC), San Francisco, June 1 – 5, 2014

2013

SIGMETRICS        
Understanding Architectural Characteristics of Multimedia Retrieval Workloads
Chen Dai, Chao Lv, Jiaxin Li, Weihua Zhang
The ACM SIGMETRICS 2013 (POSTER), PA, June 17 – 21, 2013
DATE                                
Multi-level Phase Analysis for Sampling Simulation
Jiaxin Li, Weihua Zhang, Haibo Chen and Binyu Zang
Design, Automation & Test in Europe Conference & Exhibition (DATE 2013). Grenoble, France, March, 2013

2012

ICPP                               
Adaptive Pipeline Parallelism for Image Feature Extraction Algorithms
Peng Chen, Donglei Yang, Weihua Zhang, Yi Li, Haibo Chen and Binyu Zang
In the 41st International Conference on Parallel Processing (ICPP 2012). PA, USA, September, 2012
LCTES                                    
Improving Dynamic Prediction Accuracy Through Multi-level Phase Analysis
Zhenman Fang, Jiaxin Li, Weihua Zhang, Yi Li, Haibo Chen, Binyu Zang
In proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2012)
DAC                                    
Transformer: A Functional-Driven Cycle-Accurate Multicore Simulator
Zhenman Fang, Qinghao Min, Keyong Zhou, Yi Lu, Yibin Hu, Weihua Zhang, Haibo Chen, Jian Li, Binyu Zang
The 49th Design Automation Conference (DAC 2012) San Francisco, USA, June, 2012
GPGPU                              
A GPU-based High-throughput Image Retrieval Algorithm
Feiwen Zhu, Peng Chen, Donglei Yang, Weihua Zhang, Haibo Chen, Binyu Zang
The Fifth Workshop on General Purpose Processing on Graphics Processing Units (GPGPU 5) collocated with ASPLOS 2012
VEE                                    
Swift: A Register-based JIT Compiler for Embedded JVMs
Yuan Zhang, Min Yang, Bo Zhou, Zhemin Yang, Weihua Zhang, Binyu Zang
The 8th Annual International Conference on Virtual Execution Environments (VEE 2012). London, United Kingdom

2011

PPoPP
COREMU: a Scalable and Portable Parallel Full-system Emulator
Zhaoguo Wang, Ran Liu, Yufei Chen, Xi Wu, Haibo Chen, Weihua Zhang, Binyu Zang
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2011). San Antonio, USA, February, 2011
APPT                                  
A parallel analysis on scale invariant feature transform (SIFT) algorithm
Donglei Yang, Lili Liu, Feiwen Zhu, and Weihua Zhang
The 9th International Symposium on Advanced Parallel Processing Technologies (APPT 2011). Shanghai, China
ISPASS                                
A Comprehensive Analysis and Parallelization of an Image Retrieval Algorithm
Zhenman Fang, Donglei Yang, Weihua Zhang, Haibo Chen, Binyu Zang
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2011). Austin TX, USA, April, 2011

2009

PACT                                   
Hierarchical Phase Analysis for Sampling Simulations
Weihua Zhang, Qiang Yan, Binyu Zang, Pen-Chung Yew
The 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), POSTER
SAC                                         
Optimizing Techniques for Saturated Arithmetic with First-Order Linear Recurrence
Weihua Zhang, Lili Liu, Chen Zhang, Hongjiang Zhang, Binyu Zang and Chuanqi Zhu
The 24th Annual ACM Symposium on Applied Computing (SAC 2009) Programming Language Track. Honolulu, Hawaii, USA
APPT                                       
Evaluating SPLASH-2 benchmarks using Hadoop MapReduce
Shengkai Zhu, Zhiwei Xiao, Haibo Chen, Rong Chen, Weihua Zhang and Binyu Zang
The 8th International Conference on Advanced Parallel Processing Technologies (APPT 2009). Rapperswil, Switzerland. August, 2009

2007

LCTES                                      
Optimizing Software Cache Performance of Packet Processing Applications
Qin Wang, Junpu Chen, Weihua Zhang and Binyu Zang
In proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2007)
PACT                                        
Optimizing Bandwidth Constraint through Register Interconnection for Stream Processors
Weihua Zhang, Tao Bao, Binyu Zang and Chuanqi Zhu
The 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), Poster, Brasov, Romania

2006

LCTES
Optimizing Compiler for Shared-Memory Multiple SIMD Architecture
Weihua Zhang, Xinglong Qian, Ye Wang, Binyu Zang and Chuanqi Zhu
In proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2006)
LCPC                                    
Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture
Weihua Zhang, Tao Bao, Binyu Zang and Chuanqi Zhu
The 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006)

Projects

Transformer

  • Project description: To ease the process of extending full-system multi-core simulators with novel functional models (FM) and timing models (TM), we investigate a loosely-coupled, functional-driven framework with an architecture-independent interface between the FM and the TM. To guarantee cycle accuracy in the loosely-coupled design, we present a comprehensive analysis of FM/TM divergence and provide lightweight solutions to resolve it. Because the interleaving is known before the TM runs, parallelization can be applied efficiently to accelerate simulation.
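
The functional-driven split can be illustrated with a toy sketch. This is not the Transformer code base: the record format, the class and function names, the latency numbers and the round-robin interleaving are all assumptions made for illustration, and FM/TM divergence handling is left out entirely. The FM decides the instruction interleaving and emits architecture-independent trace records; the TM only consumes those records and charges cycles.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical architecture-independent record passed from FM to TM."""
    core: int
    op: str        # "alu", "load", or "store"
    addr: int = 0  # memory address, meaningful only for loads and stores


def functional_model(programs: Dict[int, List[Tuple[str, int]]]) -> Iterator[TraceRecord]:
    """Toy FM: steps each core's instruction list round-robin and emits a trace.

    The round-robin order stands in for the interleaving chosen by the FM.
    """
    pcs = {core: 0 for core in programs}
    remaining = sum(len(p) for p in programs.values())
    while remaining:
        for core in sorted(programs):
            pc = pcs[core]
            if pc < len(programs[core]):
                op, addr = programs[core][pc]
                pcs[core] = pc + 1
                remaining -= 1
                yield TraceRecord(core, op, addr)


class TimingModel:
    """Toy TM: charges assumed latencies and keeps a per-core cycle count."""

    # Assumed latencies in cycles, chosen only for the example.
    LATENCY = {"alu": 1, "load_hit": 2, "load_miss": 20, "store": 2}
    LINE_SIZE = 64

    def __init__(self) -> None:
        self.cycles: Dict[int, int] = {}
        self.cached_lines: set = set()  # one shared cache, no eviction

    def consume(self, rec: TraceRecord) -> None:
        line = rec.addr // self.LINE_SIZE
        if rec.op == "load":
            key = "load_hit" if line in self.cached_lines else "load_miss"
            self.cached_lines.add(line)
        else:
            key = rec.op
            if rec.op == "store":
                self.cached_lines.add(line)
        self.cycles[rec.core] = self.cycles.get(rec.core, 0) + self.LATENCY[key]


def simulate(programs: Dict[int, List[Tuple[str, int]]]) -> Dict[int, int]:
    """Drive the TM from the FM's trace; the TM never executes instructions."""
    tm = TimingModel()
    for rec in functional_model(programs):
        tm.consume(rec)
    return tm.cycles
```

Because the TM sees only trace records, a different timing model could be swapped in without touching the FM, which is the point of keeping the interface narrow in this sketch.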

Architecture Research for Multimedia Retrieval Algorithms

  • Project description: Multimedia data, such as images and video, has become one of the most commonly processed data types. Because multimedia retrieval applications are prevalent and their execution times are prohibitive, it is necessary to understand their performance characteristics for evaluation and optimization. To this end, we built MMRBench, a publicly available benchmark suite containing representative state-of-the-art multimedia retrieval applications in three versions: the original version, our POSIX version (thread-level parallelized) and a map-reduce version (task-level parallelized). We also study the architectural characteristics of these applications, along with other performance-related properties such as input sensitivity, memory/computation intensity, floating-point operation sensitivity and potential thread-level parallelism. With MMRBench and the automatic tools we provide, it is easy to construct a real multimedia retrieval system or to conduct related architecture research, such as system evaluation, architecture design and accelerators.

Links

Institute

Parallel Processing Institute

Simulators

Uni-processor

SimpleScalar – A system software infrastructure used to build modeling applications for program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification.

Multi-processor

  • FeS2 – A timing-first, multiprocessor, x86 simulator, implemented as a module for Virtutech Simics
  • GEMS – General Execution-driven Multiprocessor Simulator, based on Simics
  • M5 – A modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. Supports Alpha, SPARC, MIPS, and ARM ISAs, with x86 support in progress.
  • PTLsim – A cycle-accurate out-of-order microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets. PTLsim models a modern speculative out-of-order x86-64 compatible processor core, cache hierarchy and supporting hardware.

Virtual Machines

  • LLVM – Low Level Virtual Machine
  • QEMU – A full system and user-mode simulator, with accelerators for simulating and executing on the same ISA.

Compilers

Technique