Research Directions

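The Parallel Processing Institute (PPI) conducts research across computer systems and architecture. Our main focus is accelerating and optimizing parallel applications on multi-core, many-core, and distributed platforms, together with system software, programming models, computer architecture, big data, and AI. Our work also extends to related areas such as cloud computing. PPI aims to improve the performance, energy efficiency, and reliability of a wide range of systems.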

Recent News

  • [Publication] November 2018: our paper “Harmonia: A High Throughput B+ Tree for GPUs” was accepted to the 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019).
  • [Publication] December 2016: our paper “Eunomia: Scaling Concurrent Search Trees under Contention Using HTM” was accepted to the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017).
  • [Publication] September 2016: our paper “VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-core” was accepted by the journal IEEE Transactions on Parallel and Distributed Systems (TPDS).
  • [Publication] May 2016: our paper “Understanding the Architectural Characteristics of EDA Algorithms” was accepted to the International Conference on Parallel Processing (ICPP 2016).
  • [Publication] March 2016: our paper “Parallelizing Image Feature Extraction Algorithms on Multi-core Platforms” was accepted by the Journal of Parallel and Distributed Computing (JPDC).
  • [Publication] February 2016: our paper “Performance Analysis and Optimization of Full Garbage Collection in a Production JVM” was accepted to the 12th Annual International Conference on Virtual Execution Environments (VEE 2016).
  • [Publication] February 2016: our paper “Performance Analysis of Multimedia Retrieval Workloads Running on Multicore” was accepted by the journal IEEE Transactions on Parallel and Distributed Systems (TPDS).

Team Members

Faculty
· 赵文耘
· 张为华
· 蒋金虎
· 李弋
· 高晓桐
· 鲁云萍
Master's Students
· 严赵峰 樊书华
· 宋昶衡 代超 任俞锦 沈文杰 艾燨
· 刘强 臧子凝 耿子端 苏斌 林玉哲 谭啸宇 赵如全 张高迪 王发令

Undergraduate Students
· 董荣朝 赵传磊 管佳乐 艾尔盼江 冉津豪

Alumni and First Positions

· 孙廷杰
· 火雨辰 杨旭瑜 郑冠仕
· 王浩骏 李天驹 岳佳圆
· 黎嘉欣 周克勇 闵庆豪 戴卓方
· 吕超 戴晨 杨冬蕾 陈鹏 陆祎
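· 朱斐文, Software Engineer, NVIDIA
· 胡益斌, Software Engineer, National Instruments
· 李寻, Ph.D. student, University of California, Santa Barbara
· 鲍韬, Ph.D. student, Purdue University
· 陈俊朴, Software Engineer, Microsoft
· 张垚, Software Engineer, Morgan Stanley
· 严强, Ph.D. student, Singapore Management University
· 汪钦, Software Engineer, Synopsys
· 严婕, Software Engineer, Synopsys
· 刘力力, Software Engineer, Huawei
· 杨小溪, Lecturer, Changzhou College of Information Technology
· 袁颖, Master's student, Carnegie Mellon University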

Publications

2019

PPoPP                
Harmonia: A High Throughput B+ Tree for GPUs
Zhaofeng Yan, Yuzhe Lin, Lu Peng, Weihua Zhang
The 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019)

2018

FCS              
Computer comparisons in the presence of performance variation
Samuel Irving, Bin Li, Shaoming Chen, Lu Peng, Weihua Zhang, Lide Duan
Frontiers of Computer Science (FCS)
TPDS                      
Scaling Concurrent Index Structures under Contention Using HTM
Weihua Zhang, Xin Wang, Shiyu Ji, Ziyun Wei, Zhaoguo Wang, Haibo Chen
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 29, Issue: 8, Aug. 1, 2018
TCAD                       
qSwitch: Dynamical Off-Chip Bandwidth Allocation between Local and Remote Accesses
Shaoming Chen, Lu Peng, Samuel Irving, Zhou Zhao, Weihua Zhang and Ashok Srivastava
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Volume: 37, Issue: 1, Jan. 2018

2017

TPDS                        
Prophet: A Parallel Instruction-Oriented Many-Core Simulator
Weihua Zhang, Xiaofeng Ji, Yunping Lu, Haojun Wang, Haibo Chen, Pen-Chung Yew
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 28, Issue: 10, Oct. 1, 2017
TPDS                         
VarCatcher: A Framework for Tackling Performance Variability of Parallel Workloads on Multi-core
Weihua Zhang, Xiaofeng Ji, Bo Song, Shiqiang Yu, Haibo Chen, Pen-Chung Yew, Tao Li, Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 28, Issue: 4, April 1 2017
PPoPP                       
Eunomia: Scaling Concurrent Search Trees under Contention Using HTM
Xin Wang, Weihua Zhang, Zhaoguo Wang, Ziyun Wei, Haibo Chen, Wenyun Zhao
The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2017)

2016

TPDS                        
Performance Analysis of Multimedia Retrieval Workloads Running on Multicore
Yunping Lu, Xin Wang, Weihua Zhang, Haibo Chen, Lu Peng, Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 27, Nov 2016
TC                             
Hardware Support for Concurrent Detection of Multiple Concurrency Bugs on Fused CPU-GPU Architectures
Weihua Zhang, Shiqiang Yu, Haojun Wang, Zhuofang Dai, Haibo Chen
IEEE Transactions on Computers (TC) Volume: 65, No. 10, October 2016
TPDS                        
A Loosely-Coupled Full-System Multicore Simulation Framework
Weihua Zhang, Haojun Wang, Yunping Lu, Haibo Chen and Wenyun Zhao
IEEE Transactions on Parallel and Distributed Systems (TPDS) Volume: 27, Issue: 6, June 1 2016
ICPP                        
Understanding the Architectural Characteristics of EDA Algorithms
Xin Wang, Xiaofeng Ji, Yunping Lu, Yi Li, Weijia Zhou, Weihua Zhang, Wenyun Zhao
The 45th International Conference on Parallel Processing (ICPP)
JPDC                        
Parallelizing Image Feature Extraction Algorithms on Multi-core Platforms
Yunping Lu, Yi Li, Bo Song, Weihua Zhang, Haibo Chen, Lu Peng
Journal of Parallel and Distributed Computing (JPDC) Volume: 92, May 2016
VEE                          
Performance Analysis and Optimization of Full Garbage Collection in a Production JVM
Yang Yu, Tianyang Lei, Weihua Zhang, Haibo Chen, Binyu Zang
The 12th Annual International Conference on Virtual Execution Environments (VEE 2016)

2015

ICPP                        
Characterizing MultiMedia Retrieval Applications
Yunping Lu, Xin Wang, Weihua Zhang, Yi Li and Wenyun Zhao
The 44th International Conference on Parallel Processing (ICPP, Best Paper Award)
TECS                      
Multi-level Phase Analysis
Weihua Zhang, Jiaxin Li, Yi Li, Haibo Chen
ACM Transactions on Embedded Computing Systems (TECS) Volume: 14, Issue: 2, March 2015

2014

ACA                        
Parallelized Race Detection Based on GPU Architecture
Zhuofang Dai, Zheng Zhang, Haojun Wang, Yi Li and Weihua Zhang
2014 Annual Conference of Advanced Computer Architecture (ACA 2014, Best Paper Award)
ICPP                        
Hydra: Efficient Detection of Multiple Concurrency Bugs on Fused CPU-GPU Architecture
Zhuofang Dai, Haojun Wang, Weihua Zhang, Haibo Chen and Binyu Zang
The 43rd International Conference on Parallel Processing (ICPP)
NAS                        
RPSim: A Rapid Prototyping Full-system Simulator for SoC Software Development
Haojun Wang, Qinghao Min, Weihua Zhang
The 9th IEEE International Conference on Networking, Architecture and Storage (NAS)
DAC                       
DAPs: Dynamic Adjustment and Partial Sampling for Multithreaded/Multicore Simulation
Chien-Chih Chen, Yin-Chi Peng, Cheng-Fen Chen, Wei-Shan Wu, Qinghao Min, Pen-Chung Yew, Weihua Zhang, Tien-Fu Chen
Design Automation Conference (DAC), San Francisco, June 1 – 5, 2014

2013

SIGMETRICS        
Understanding Architectural Characteristics of Multimedia Retrieval Workloads
Chen Dai, Chao Lv, Jiaxin Li, Weihua Zhang
The ACM SIGMETRICS 2013 (POSTER), PA, June 17 – 21, 2013
DATE                     
Multi-level Phase Analysis for Sampling Simulation
Jiaxin Li, Weihua Zhang, Haibo Chen and Binyu Zang
Design, Automation & Test in Europe Conference & Exhibition (DATE 2013). Grenoble, France, March, 2013

2012

ICPP                     
Adaptive Pipeline Parallelism for Image Feature Extraction Algorithms
Peng Chen, Donglei Yang, Weihua Zhang, Yi Li, Haibo Chen and Binyu Zang
In the 41st International Conference on Parallel Processing (ICPP 2012). PA, USA, September, 2012
LCTES                   
Improving Dynamic Prediction Accuracy Through Multi-level Phase Analysis
Zhenman Fang, Jiaxin Li, Weihua Zhang, Yi Li, Haibo Chen, Binyu Zang
In proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2012)
DAC                       
Transformer: A Functional-Driven Cycle-Accurate Multicore Simulator
Zhenman Fang, Qinghao Min, Keyong Zhou, Yi Lu, Yibin Hu, Weihua Zhang, Haibo Chen, Jian Li, Binyu Zang
The 49th Design Automation Conference (DAC 2012) San Francisco, USA, June, 2012
GPGPU                 
A GPU-based High-throughput Image Retrieval Algorithm
Feiwen Zhu, Peng Chen, Donglei Yang, Weihua Zhang, Haibo Chen, Binyu Zang
The Fifth Workshop on General Purpose Processing on Graphics Processing Units (GPGPU 5) collocated with ASPLOS 2012
VEE                       
Swift: A Register-based JIT Compiler for Embedded JVMs
Yuan Zhang, Min Yang, Bo Zhou, Zhemin Yang, Weihua Zhang, Binyu Zang
The 8th Annual International Conference on Virtual Execution Environments (VEE 2012). London, United Kingdom

2011

PPoPP                   
COREMU: a Scalable and Portable Parallel Full-system Emulator
Zhaoguo Wang, Ran Liu, Yufei Chen, Xi Wu, Haibo Chen, Weihua Zhang, Binyu Zang
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2011). San Antonio, USA, February, 2011
APPT                     
A parallel analysis on scale invariant feature transform (SIFT) algorithm
Donglei Yang, Lili Liu, Feiwen Zhu, and Weihua Zhang
The 9th International Symposium on Advanced Parallel Processing Technologies (APPT 2011). Shanghai, China
ISPASS                 
A Comprehensive Analysis and Parallelization of an Image Retrieval Algorithm
Zhenman Fang, Donglei Yang, Weihua Zhang, Haibo Chen, Binyu Zang
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2011). Austin TX, USA, April, 2011

2009

PACT                   
Hierarchical Phase Analysis for Sampling Simulations
Weihua Zhang, Qiang Yan, Binyu Zang, Pen-Chung Yew
The 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), POSTER
SAC                     
Optimizing Techniques for Saturated Arithmetic with First-Order Linear Recurrence
Weihua Zhang, Lili Liu, Chen Zhang, Hongjiang Zhang, Binyu Zang and Chuanqi Zhu
The 24th Annual ACM Symposium on Applied Computing (SAC 2009) Programming Language Track. Honolulu, Hawaii, USA
APPT                   
Evaluating SPLASH-2 benchmarks using Hadoop MapReduce
Shengkai Zhu, Zhiwei Xiao, Haibo Chen, Rong Chen, Weihua Zhang and Binyu Zang
The 8th International Conference on Advanced Parallel Processing Technologies (APPT 2009). Rapperswil, Switzerland. August, 2009

2007

LCTES                 
Optimizing Software Cache Performance of Packet Processing Applications
Qin Wang, Junpu Chen, Weihua Zhang and Binyu Zang
In proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2007)
PACT                   
Optimizing Bandwidth Constraint through Register Interconnection for Stream Processors
Weihua Zhang, Tao Bao, Binyu Zang and Chuanqi Zhu
The 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), Poster, Brasov, Romania

2006

LCTES                 
Optimizing Compiler for Shared-Memory Multiple SIMD Architecture
Weihua Zhang, Xinglong Qian, Ye Wang, Binyu Zang and Chuanqi Zhu
In proceedings of the 2006 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2006)
LCPC                   
Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture
Weihua Zhang, Tao Bao, Binyu Zang and Chuanqi Zhu
The 19th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2006)

Projects

Transformer

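  • Project description: To simplify extending a full-system multicore simulator with new functional and timing modules, we built a loosely coupled, functional-driven platform that provides an architecture-independent interface between the functional modules and the timing modules. To keep the loosely coupled design accurate, we analyzed why the execution flows of the functional and timing modules diverge and implemented lightweight solutions for those causes. Because the interactions between threads are already known during timing-module simulation, the simulation can be parallelized more efficiently.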

Architectural Study of Multimedia Retrieval Algorithms

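  • Project description: Multimedia data such as images and video has become one of the most frequently processed kinds of data. Multimedia retrieval applications are now widespread, but their execution times are often too long, so it is necessary to evaluate and optimize their performance based on their characteristics. To this end, we built MMRBench, an open-source benchmark suite. It collects representative, state-of-the-art multimedia retrieval applications, each in its original version, our POSIX version (thread-level parallelism), and a Map-Reduce version (process-level parallelism). We also studied the architectural characteristics of these applications and analyzed the factors that affect their performance, including sensitivity to input-set size, memory/compute intensity, sensitivity to floating-point computation, and potential thread-level parallelism. With MMRBench and the automation tools we provide, building a multimedia retrieval system or doing related architecture research, such as system evaluation, architecture design, and accelerators, becomes much simpler.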

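Links

Institutions

Simulators

Single-core Processors

  • SimpleScalar – system software for building models for program performance analysis, detailed microarchitectural modeling, and hardware-software co-verification.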

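Multi-core Processors

  • FeS2 – a fast multicore simulator with x86 support, implemented as a module for Virtutech Simics.
  • GEMS – a general-purpose, execution-driven multicore simulator based on Simics.
  • M5 – a modular platform for computer system architecture research, covering system-level architecture and processor microarchitecture. It supports Alpha, SPARC, and MIPS; x86 support is in progress.
  • PTLsim – a cycle-accurate out-of-order multiprocessor simulator for x86 and x86-64. PTLsim models an out-of-order, speculative processor that implements the x86-64 instruction set, along with caches and other supporting hardware.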

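Virtual Machines

  • LLVM – Low Level Virtual Machine
  • QEMU – a full-system and user-mode emulator, including an accelerator for emulating and executing code of the same ISA.

Compilers

Other Technologies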