On May 17th, 2019, Dr. Lide Duan from Alibaba DAMO Academy visited our laboratory and gave a talk titled “Processing Near or In Memory for Deep Learning”.

Emerging applications such as deep learning are highly memory-intensive, making main memory a bottleneck for both performance and energy efficiency in current computer systems. Conventional DRAM-based main memory faces critical challenges in improving latency and scalability. Multiple computer architecture techniques have therefore been proposed to conquer this “memory wall”. First, non-volatile memories (NVM) are being used to replace DRAM to achieve low idle power and long data retention time. Second, processing-near-memory (PNM) or processing-in-memory (PIM) places compute logic near or within main memory to reduce costly data movement. Third, 3D stacking vertically stacks multiple memory and logic dies to improve memory capacity, offering a new dimension of scalability in chip design. In this talk, we will examine state-of-the-art architecture designs that combine several of these innovations, e.g., NVM + PIM (processing neural networks (NN) in NVM) and PNM + 3D (HBM/HMC-based designs for deep learning). In particular, we will introduce a new 3D-stacked processing-in-NVM framework that greatly improves NN processing throughput and provides 3D-aware model mapping and data flow management.