Preliminary Program

DAY 1 - August 1, Monday

Starting 8:00

Registration in PHO 2nd floor

08:00 – 08:30

Breakfast (PHO906)

08:30 – 09:00

Welcome by General and Program Co-Chairs (PHO906)

09:00 – 10:00

Keynote Talk 1: Dr. Kaushik Roy, Purdue University, Enabling Energy-efficient Learning through Co-design of Algorithms and Hardware (PHO906)

10:00 – 10:30

Coffee Break (PHO 2nd floor)

10:30 – 11:45

Session 1 Energy-efficient and robust neural networks (PHO203)

Session 2 Novel computing models (PHO205)

12:00 – 13:30

Lunch (PHO906)

13:30 – 14:45

Session 3 Efficient and intelligent memories (PHO203)

Session 4 Circuit design and methodology for IoT applications (PHO205)

14:45 – 15:15

Coffee Break (PHO 2nd floor)

15:15 – 16:30

Special Session 1: Efficient and Automated Design of Future Intelligent Systems -- Speakers: Dr. Song Han and Dr. Jason Cong (PHO906)


18:00 – 20:30

Reception at BU Castle
225 Bay State Road
Boston, MA 02215




Starting 8:00

Registration in PHO 2nd floor

08:00 – 08:30

Breakfast (PHO906)

08:30 – 09:00

Welcome by General and Program Co-Chairs

09:00 – 10:00: Keynote Talk 1

Dr. Kaushik Roy, Purdue University

Enabling Energy-efficient Learning through Co-design of Algorithms and Hardware

Speaker Bio: Kaushik Roy is the Edward G. Tiedemann, Jr., Distinguished Professor of Electrical and Computer Engineering at Purdue University and Director of the Center for Brain-Inspired Computing (C-BRIC). He received his PhD from the University of Illinois at Urbana-Champaign in 1990 and joined the Semiconductor Process and Design Center of Texas Instruments, Dallas, where he worked for three years on FPGA architecture development and low-power circuit design. His current research focuses on algorithms, circuits and architecture for energy-efficient cognitive computing, computing models and neuromorphic devices. Roy has supervised more than 85 PhD dissertations, and his students are well-placed in universities and industry. He is the co-author of “Low Power CMOS VLSI Design,” both the first and second editions, published by John Wiley & McGraw Hill. Roy has received a National Science Foundation Career Development Award, IBM Faculty Partnership Award, ATT/Lucent Foundation Award, Semiconductor Research Corporation Technical Excellence Award, SRC Inventors Award, Purdue College of Engineering Research Excellence Award, Humboldt Research Award, IEEE Circuits and Systems Society Technical Achievement Award (Charles Desoer Award), Distinguished Alumnus Award from the Indian Institute of Technology, and the Semiconductor Research Corporation Aristotle Award in 2015. He has also served as a Department of Defense Vannevar Bush Faculty Fellow, Global Foundries Visiting Chair at National University of Singapore, and Fulbright-Nehru Distinguished Chair.

Talk Abstract: Advances in machine learning, notably deep learning, have led computers to match or surpass human performance in several cognitive tasks, including vision, speech, and natural language processing. However, implementations of neural algorithms in conventional "von-Neumann" architectures are several orders of magnitude more area- and power-expensive than the biological brain. Hence, we need fundamentally new approaches to sustain the exponential growth in performance at high energy efficiency. Exploring the new paradigm of computing necessitates a multi-disciplinary approach: exploring new learning algorithms inspired by neuroscientific principles, developing network architectures best suited for such algorithms, devising new hardware techniques to achieve orders-of-magnitude improvement in energy consumption, and building nanoscale devices that can closely mimic neuronal and synaptic operations. In this talk, I will present recent developments on spike-based learning to achieve high energy efficiency with accuracy comparable to that of standard analog deep-learning techniques. Input coding from DVS cameras has been used to develop energy-efficient hybrid SNN/ANN networks for optical flow, gesture recognition, and language translation. Complementary to the above device efforts, we are exploring different local/global learning algorithms, including stochastic learning with one-bit synapses that greatly reduces the storage/bandwidth requirement while maintaining competitive accuracy, and adaptive online learning that efficiently uses limited memory and resources to learn new information without catastrophically forgetting already learnt data.

10:00 – 10:30

Coffee Break

10:30 – 11:45: Session 1 Energy-efficient and robust neural networks

Chair: Donghwa Shin, Soongsil Univ.

Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars (Best paper)
Abhiroop Bhattacharjee, Youngeun Kim, Abhishek Moitra and Priyadarshini Panda

Identifying Efficient Dataflows for Spiking Neural Networks
Deepika Sharma, Aayush Ankit and Kaushik Roy

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators
Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili and Massoud Pedram




10:30 – 11:45: Session 2 Novel computing models

Chair: Priyadarshini Panda, Yale

QMLP: An Error-Tolerant Nonlinear Quantum MLP Architecture Using Parameterized Two-Qubit Gates
Cheng Chu, Nai-Hui Chia, Lei Jiang and Fan Chen

Design and Logic Synthesis of a Scalable, Efficient Quantum Number Theoretic Transform
Chao Lu, Shamik Kundu, Abraham Kuruvila, Supriya Margabandhu Ravichandran and Kanad Basu

A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC for 4-Bit Input Processing
Joonhyung Kim, Kyeongho Lee and Jongsun Park

12:00 – 13:30

Lunch


13:30 – 14:45: Session 3 Efficient and intelligent memories

Chair: Kshitij Bhardwaj, LLNL

FlexiDRAM: A Flexible in-DRAM Framework to Enable Parallel General-Purpose Computation
Ranyang Zhou, Arman Roohi, Durga Misra and Shaahin Angizi

Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices
Ya-Hui Yang, Shuo-Han Chen and Yuan-Hao Chang

Exploiting Successive Identical Words and Differences with Dynamic Bases for Effective Compression in Non-Volatile Memories
Swati Upadhyay, Arijit Nath and Hemangee Kapoor

13:30 – 14:45: Session 4 Circuit design and methodology for IoT applications

Chair: Hun-Seok Kim, UMich

HOGEye: Neural Approximation of HOG Feature Extraction in RRAM-Based 3D-Stacked Image Sensors (Best paper)
Tianrui Ma, Weidong Cao, Fei Qiao, Ayan Chakrabarti and Xuan Zhang

A Bit-level Sparsity-aware SAR ADC with Activity-scaling for AIoT Applications
Ruicong Chen, H. T. Kung, Anantha Chandrakasan and Hae-Seung Lee

Analysis of the Effect of Hot Carrier Injection in an Integrated Inductive Voltage Regulator
Shida Zhang, Nael Mizanur Rahman, Venkata Chaitanya Krishna Chekuri, Carlos Tokunaga and Saibal Mukhopadhyay

14:45 – 15:15

Coffee Break

15:15 – 16:30: Special Session 1: Efficient and Automated Design of Future Intelligent Systems

Chair: Amit Agarwal, Intel

Efficient Deep Learning Computing with Sparsity
Dr. Song Han, MIT

Talk Abstract: Modern deep learning requires a massive amount of computational resources, carbon footprint, and engineering effort, making on-device machine learning challenging; retraining the model on-device is even more difficult. We make machine learning efficient by utilizing sparsity. I’ll first present neural architecture search techniques that sparsely sample different paths (ProxylessNAS) and search sparsely activated subnetworks from the once-for-all network (OFA), as well as MCUNet, which brings AI to micro-controllers. Then I’ll describe TinyTL and on-device transfer learning with sparse layer and sparse tensor updates, fitting in 256KB of memory. Next I’ll talk about improving efficiency by utilizing temporal sparsity for videos, spatial sparsity for point clouds, and token-level sparsity for NLP. I’ll conclude with hardware and system support for sparsity (TorchSparse, SpAtten, SpArch, PointAcc). The presentation will highlight full-stack optimizations, including the neural network topology, inference library, and hardware architecture, which allow a larger design space to unearth the underlying principles for sparsity.

Can We Automate Accelerator Designs?
Dr. Jason Cong, UCLA

Talk Abstract: Domain-specific accelerators (DSAs) have demonstrated significant performance and energy-efficiency gains over general-purpose CPUs. Ideally, every programmer should offload the compute-intensive portion of his/her program to one or a set of DSAs, either pre-implemented in ASICs or synthesized on demand on programmable fabrics, such as FPGAs. So, a natural question is whether we can automate accelerator design from the software specification. High-level synthesis (HLS) has made important progress in this direction, but it still requires the programmer to provide various pragmas, such as loop unrolling, pipelining, and tiling, to define the microarchitecture of the accelerator, which is a challenging task for most software programmers. In this talk, we present our latest research on automated accelerator synthesis on FPGAs, including microarchitecture-guided optimization, such as automated systolic array generation, and more general source-to-source transformation based on graph-based neural networks and meta learning.

18:00 – 20:30

Reception at BU Castle


DAY 2 - August 2, Tuesday

Starting 8:15

Registration in PHO 2nd floor

08:15 – 09:00

Breakfast (PHO906)

09:00 – 10:00

Keynote Talk 2: Dr. Vijay Janapa Reddi, Harvard University, Tiny Machine Learning: A System-level Perspective (PHO906)

10:00 – 10:30

Coffee Break (PHO 2nd floor)

10:30 – 11:45

Session 5 Advances in hardware security (PHO203)

Session 6 Novel physical design methodologies (PHO205)

12:00 – 13:30

Lunch and Poster Session (PHO906)

13:30 – 14:45

Session 7 Enablers for energy-efficient platforms (PHO203)

Session 8 System design for energy-efficiency and resiliency (PHO205)

14:45 – 15:15

Coffee Break (PHO 2nd floor)

15:15 – 16:30

Special Session 2: What's Next Beyond CMOS? -- Speakers: Dr. Massoud Pedram and Dr. Jing Li (PHO906)

16:30 – 17:00

Awards and Closing Remarks (PHO906)

Starting 8:15

Registration in PHO 2nd floor

08:15 – 09:00

Breakfast (PHO906)

09:00 – 10:00: Keynote Talk 2

Dr. Vijay Janapa Reddi, Harvard University

Tiny Machine Learning: A System-level Perspective

Speaker Bio: Vijay Janapa Reddi is an Associate Professor at Harvard University, Inference Co-chair for MLPerf, and a founding member of MLCommons, a nonprofit ML (Machine Learning) organization aiming to accelerate ML innovation. He also serves on the MLCommons board of directors. Before joining Harvard, he was an Associate Professor at The University of Texas at Austin in the Department of Electrical and Computer Engineering. His research interests include computer architecture and runtime systems, specifically in the context of autonomous machines and mobile and edge computing systems. Dr. Janapa Reddi is a recipient of multiple honors and awards, including the National Academy of Engineering (NAE) Gilbreth Lecturer Honor (2016), IEEE TCCA Young Computer Architect Award (2016), Intel Early Career Award (2013), Google Faculty Research Awards (2012, 2013, 2015, 2017, 2020), Best Paper at the 2020 Design Automation Conference (DAC), Best Paper at the 2005 International Symposium on Microarchitecture (MICRO), Best Paper at the 2009 International Symposium on High Performance Computer Architecture (HPCA), and IEEE’s Top Picks in Computer Architecture awards (2006, 2010, 2011, 2016, 2017). He has been inducted into the MICRO and HPCA Hall of Fame (in 2018 and 2019, respectively). He received a Ph.D. in computer science from Harvard University, an M.S. from the University of Colorado at Boulder, and a B.S. from Santa Clara University.

Talk Abstract: Tiny machine learning (TinyML) is a fast-growing discipline that blends machine learning techniques with low-cost embedded technology. TinyML allows for on-device sensor data analysis (vision, audio, IMU, and so on) while utilizing minimal power. Processing data close to the sensor enables a variety of unique, always-on ML use-cases that save bandwidth, latency, and energy while improving responsiveness and privacy. This session introduces the TinyML vision and illustrates some of the amazing applications made possible by TinyML. Despite the excitement, we must overcome various hardware and software challenges, as well as data privacy concerns. On-device ML constraints such as limited memory and storage, communication barriers, extreme hardware heterogeneity, software fragmentation, and a lack of relevant and commercially viable large-scale TinyML datasets pose a significant barrier to realizing TinyML's full potential for a more innovative and sustainable low-power ecosystem. Furthermore, the lack of secure protocols at the lowest level of hardware raises concerning questions such as, "Are TinyML devices spying on us?" The talk discusses the potential for addressing many of these challenges and ushering in a new era for TinyML-based "Machine Learning Sensors (ML Sensors)." The talk finishes with reasons why the future of machine learning is tiny and bright.

10:00 – 10:30

Coffee Break

10:30 – 11:45: Session 5 Advances in hardware security

Chair: Aporva Amarnath, IBM

Security Implications of Energy Management in System-on-Chips
Dr. Saibal Mukhopadhyay, Georgia Tech

RACE: RISCV-based Unified Homomorphic Encryption/decryption ACcelerator on the Edge
Zahra Azad, Guowei Yang, Rashmi Agrawal, Daniel Petrisko, Michael Taylor and Ajay Joshi

Sealer: In-SRAM AES for High-Performance and Low-Overhead Memory Encryption
Jingyao Zhang, Hoda Naghibijouybari and Elaheh Sadredini

10:30 – 11:45: Session 6 Novel physical design methodologies

Chair: Aatmesh Shrivastava, Northeastern

Hier-3D: A Hierarchical Physical Design Methodology for Face-to-Face Bonded 3D ICs (Best paper)
Anthony Agnesina, Moritz Brunion, Alberto Garcia-Ortiz, Francky Catthoor, Dragomir Milojevic, Manu Komalan, Matheus Cavalcante, Samuel Riedel, Luca Benini and Sung Kyu Lim

A Study On Optimizing Pin Accessibility of Standard Cells in the Post-3 nm Node
Jae Hoon Jeong, Jonghyun Ko and Taigon Song

Improving Performance and Power by Co-Optimizing Middle-of-Line Routing, Pin Pattern Generation, and Contact over Active Gates in Standard Cell Layout Synthesis
Sehyeon Chung, Jooyeon Jeong and Taewhan Kim



12:00 – 13:30: Lunch and Poster Session


CANOPY: A CNFET-based Process Variation Aware Systolic DNN Accelerator

Cheng Chu, Dawen Xu, Ying Wang and Fan Chen


Layerwise Disaggregated Evaluation of Spiking Neural Networks

Abinand Nallathambi, Sanchari Sen, Anand Raghunathan and Nitin Chandrachoodan


Tightly Linking 3D via Allocation towards Routing Optimization for Monolithic 3D ICs

Suwan Kim, Sehyeon Chung, Taewhan Kim and Heechun Park


Enabling Capsule Networks at the Edge through Approximate Softmax and Squash Operations

Alberto Marchisio, Beatrice Bussolino, Edoardo Salvati, Maurizio Martina, Guido Masera and Muhammad Shafique


Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks

Matteo Risso, Alessio Burrello, Luca Benini, Enrico Macii, Massimo Poncino and Daniele Jahier Pagliari


Visible Light Synchronization for Time-Slotted Energy-Aware Transiently-Powered Communication

Alessandro Torrisi, Maria Doglioni, Kasim Sinan Yildirim and Davide Brunelli


Directed Acyclic Graph-based Neural Networks for Tunable Low-Power Computer Vision

Abhinav Goel, Caleb Tung, Nick Eliopoulos, Xiao Hu, George K. Thiruvathukal, James C. Davis and Yung-Hsiang Lu


Energy Efficient Cache Design with Piezoelectric FETs

Reena Elangovan, Ashish Ranjan, Niharika Thakuria, Sumeet Gupta and Anand Raghunathan


Predictive Model Attack for Embedded FPGA Logic Locking

Prattay Chowdhury, Chaitali Sathe and Benjamin Carrion Schaefer


(Design Contest Poster) A Low-Power Deep Learning-Based Dense RGB-D Data Acquisition with Sensor Fusion and 3-D Perception SoC

Dongseok Im, Gwangtae Park, Zhiyong Li, Junha Ryu, Sanghoon Kang, Donghyeon Han, Jinsu Lee, Wonhoon Park, Hankyul Kwon, Hoi-Jun Yoo


(Design Contest Poster) Making Lane Detection Efficient for Autonomous Model Cars

Anthony Song, Riley Francis, Kanishk Tihaiya, Jiangwei Wang, Shanglin Zhou, Fei Miao, Caiwen Ding

13:30 – 14:45: Session 7 Enablers for energy-efficient platforms

Chair: Xue Lin, Northeastern

Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks
Berken Utku Demirel, Luke Chen and Mohammad Al Faruque

3D IC Tier Partitioning of Memory Macros: PPA vs. Thermal Tradeoffs
Lingjun Zhu, Nesara Eranna Bethur, Yi-Chen Lu, Youngsang Cho, Yunhyeok Im and Sung Kyu Lim

A Domain-Specific System-On-Chip Design for Energy Efficient Wearable Edge AI Applications
Yigit Tuncel, Anish Krishnakumar, Aishwarya Lekshmi Chithra, Younghyun Kim and Umit Ogras

13:30 – 14:45: Session 8 System design for energy-efficiency and resiliency

Chair: Marisa Lopez Vallejo, UPM

SACS: A Self-Adaptive Checkpointing Strategy for Microkernel-Based Intermittent Systems
Yen-Ting Chen, Han-Xiang Liu, Yuan-Hao Chang, Yu-Pei Liang and Wei-Kuan Shih

Drift-tolerant Coding to Enhance the Energy Efficiency of Multi-Level-Cell Phase-Change Memory
Yi-Shen Chen, Yuan-Hao Chang and Tei-Wei Kuo

A Unified Forward Error Correction Accelerator for Multi-Mode Turbo, LDPC, and Polar Decoding
Yufan Yue, Tutu Ajayi, Xueyang Liu, Peiwen Xing, Zihan Wang, David Theodore Blaauw, Ronald G. Dreslinski and Hun-Seok Kim

14:45 – 15:15

Coffee Break

15:15 – 16:30: Special Session 2: What's Next Beyond CMOS?

Chair: Marco Donato, Tufts University

Design Methodologies, Circuits, and Architectures for Superconductor Electronic Systems
Dr. Massoud Pedram, USC

Talk Abstract: The success of CMOS has overshadowed nearly all other solid-state device innovations over recent decades. With fundamental CMOS scaling limits close in sight, the time is now ripe to explore disruptive computing technologies. As a viable post-CMOS computing technology, superconductor electronics can deliver ultra-high performance and energy efficiency at scale, thereby paving the way for seminal innovations in integrated electronics, sustainable exascale computing, and acceleration of machine learning. This talk will start with a review of superconducting devices and single flux quantum (SFQ) logic circuit technology and will continue with solutions for compact modeling of superconducting devices and interconnect, design of cell libraries, design automation tools, and finally circuits and architectures utilizing the SFQ technology. Experimental results will be presented to demonstrate the efficacy of the state-of-the-art design methodology and tools targeting superconductor electronics.

Liquid Silicon: A Decade Journey on Memory Centric Computing
Dr. Jing Li, University of Pennsylvania

Talk Abstract: Data movement is proven to be the root cause of the inefficiency in today’s computer systems. As such, numerous PIM architectures have been proposed to reduce data movement cost by performing computation in proximity to memory (subarray, bank, rank, channel) or inside memory (bit-line computing), using digital, mixed-signal, or analog approaches. However, the fundamental principle behind most PIMs is still the physical separation of compute units and memory storage elements, and their programming models still require explicit data movement in order to perform computation, despite the reduced physical distance. In this talk, I will present a new class of PIM, named Liquid Silicon, that 1) is enabled by emerging memory technologies, 2) can expand the taxonomy of PIM architectures to fundamentally eliminate (a majority of or all) data motion through a data-flow compute model, and 3) can realize general-purpose computing via primitive memory operations rather than explicit arithmetic compute units, without sacrificing efficiency as seen in existing domain-specific accelerator designs. I will present the technology challenges, architecture design, and compiler support needed to enable this new class of PIM.

16:30 – 17:00

Awards and Closing Remarks