Research Statement

The Parallel Processing Institute (PPI) conducts research in all aspects of computer systems and architectures, with a primary focus on parallel acceleration and optimization, system software, programming models, runtimes and computer architecture for multicore and distributed systems. Our research also extends to related areas such as large-scale data processing and cloud computing. The research themes of PPI are improving the performance scalability, energy efficiency and dependability of mobile, centralized and distributed computer systems.

Recent News

  • [Accepted] 2019, Our paper “A High Throughput B+tree for SIMD architectures” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [News] May 17th, 2019, Dr. Lide Duan from Alibaba DAMO Academy visited our laboratory and gave a talk titled “Processing Near or In Memory for Deep Learning”.
  • [Accepted] 2019, Our paper “Unleashing the Power of Learning: An Enhanced Learning-based Approach for Dynamic Binary Translation” has been accepted by The 2019 USENIX Annual Technical Conference (ATC 19).
  • [Accepted] 2018, Our paper “Harmonia: A High Throughput B+ Tree for GPUs” has been accepted by The 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019).
  • [Publication] 2018, Our paper “qSwitch: Dynamical Off-Chip Bandwidth Allocation between Local and Remote Accesses” has been accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
  • [Publication] October 2017, Our paper “Prophet: A Parallel Instruction-Oriented Many-Core Simulator” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [Publication] April 2017, Our paper “VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-core” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [Publication] 2017, Our paper “Eunomia: Scaling Concurrent Search Trees under Contention Using HTM” has been accepted by The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017).
  • [Publication] February 2016, Our paper “Performance Analysis and Optimization of Full Garbage Collection in a Production JVM” has been accepted by The 12th Annual International Conference on Virtual Execution Environments (VEE 2016).
  • [Publication] February 2016, Our paper “Performance Analysis of Multimedia Retrieval Workloads Running on Multicore” has been accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS).


· Weihua Zhang
· Jinhu Jiang
· Xiaotong Gao
· Yunping Lu

Master Students
· Changheng Song, Chao Dai, Yujin Ren, Wenjie Shen, Xi Ai
· Qiang Liu, Zining Zang, Ziduan Geng, Bin Su, Yuzhe Lin, Xiaoyu Tan, Ruquan Zhao, Gaodi Zhang, Faling Wang
· Yuan Xue, Zhongjun Zhou, Jiacheng Tang, Zhuihui Wang, Zetong Pan

Undergraduate Students
· Rongchao Dong, Chuanlei Zhao, Jiale Guan, Jinhao Ran

Alumni and their first positions

· Tingjie Sun, Zhaofeng Yan
· Yuchen Huo, Guanshi Zheng, Xuyu Yang
· Haojun Wang, Tianju Li, Jiayuan Yue
· Jiaxin Li, Keyong Zhou, Qinghao Min, Zhuofang Dai
· Chao Lv, Chen Dai, Donglei Yang, Peng Chen, Yi Lu
· Feiwen Zhu, Software Engineer, Nvidia
· Yibin Hu, Software Engineer, National Instruments
· Xun Li, Ph.D Student, University of California, Santa Barbara
· Tao Bao, Ph.D Student, Purdue University
· Junpu Chen, Software Engineer, Microsoft
· Yao Zhang, Software Engineer, Morgan
· Qiang Yan, Ph.D Student, Singapore Management University
· Qin Wang, Software Engineer, Synopsys
· Jie Yan, Software Engineer, Synopsys
· Lili Liu, Software Engineer, Huawei
· Xiaoxi Yang, Assistant Professor, Changzhou College of Information Technology
· Ying Yuan, Master Student, Carnegie Mellon University



  • Project description: To make it easier to extend full-system multi-core simulators with novel functional models (FMs) and timing models (TMs), we investigate a loosely coupled, functional-driven framework with an architecture-independent interface between the FM and the TM. To guarantee cycle accuracy in this loosely coupled design, we present a comprehensive analysis of FM/TM divergence and provide lightweight solutions to resolve it. Because the interleaving is known before the TM runs, parallelization can be applied efficiently to accelerate simulation.

Architecture Research for Multimedia Retrieval Algorithms

  • Project description: Multimedia data, such as images and video, has become one of the most popular data types processed every day. Given the prevalence of multimedia retrieval applications and their prohibitive execution times, it is necessary to understand their performance characteristics for evaluation and optimization. To this end, we built MMRBench, a publicly available benchmark suite of representative state-of-the-art multimedia retrieval applications, each provided in its original version, our POSIX version (thread-level parallelized) and a map-reduce version (task-level parallelized). We also study the architectural characteristics of these applications and carry out other performance-related analyses, such as input sensitivity, memory/computation intensity, floating-point operation sensitivity and potential thread-level parallelism. With MMRBench and the automatic tools we provide, it is easy to construct a real multimedia retrieval system or to do research on related architecture topics, such as system evaluation, architecture design and accelerators.



Parallel Processing Institute



SimpleScalar – A system software infrastructure used to build modeling applications for program performance analysis, detailed microarchitectural modeling, and hardware-software co-



  • FeS2 – A timing-first, multiprocessor, x86 simulator, implemented as a module for Virtutech Simics
  • GEMS – General Execution-driven Multiprocessor Simulator, based on Simics
  • M5 – A modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. Supports Alpha, SPARC, MIPS, and ARM ISAs, with x86 support in progress.
  • PTLsim – A cycle-accurate, out-of-order microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets. PTLsim models a modern speculative out-of-order x86-64 compatible processor core, cache hierarchy and supporting hardware.

Virtual Machines

  • LLVM – Low Level Virtual Machine
  • QEMU – A full system and user-mode simulator, with accelerators for simulating and executing on the same ISA.