This tutorial involves hands-on exercises and labs, as well as some baseline designs for those who would like a deeper dive into the Timeloop/Accelergy system. To follow our interactive live session, please follow the instructions for installing the necessary infrastructure and exercises. Even better, work through the exercises before the tutorial session and come ready with questions.
Deep neural networks have emerged as the key approach for solving a wide range of complex problems. To provide high performance and energy efficiency for this class of computation- and memory-intensive applications, many DNN accelerators have been proposed in recent years. To systematically evaluate arbitrary DNN accelerator designs, we need an infrastructure that is able to:
Flexibly describe a wide range of architectures. Traditional processors share a broadly similar architecture and differ mainly in microarchitecture, so they can be described with a fixed set of architectural components. DNN accelerator architectures, by contrast, vary significantly from one design to another, which makes the fixed-component approach infeasible. Since describing the architecture is the first step of any architecture evaluation, the infrastructure must be flexible enough to describe a wide range of DNN accelerator designs.
Find optimal mappings for a wide range of workloads onto the architecture. Unlike traditional architectures, whose ISA allows a workload to be represented as a single compiled program, each DNN accelerator exposes its own set of configurable hardware settings and requires the designer to decide how to schedule operations and move data for each workload, i.e., to find a mapping. Because different mappings result in widely varying performance and energy efficiency, and different workloads have different optimal mappings, finding optimal mappings is essential for evaluating a DNN accelerator architecture.
Accurately predict energy for a range of accelerator designs. Because accelerators target different applications (e.g., sparse DNNs vs. dense DNNs), different accelerator designs consist of different hardware components. Furthermore, different designs implement different hardware optimizations, which result in drastically different energy consumption for those components. Therefore, to evaluate a DNN accelerator architecture, the infrastructure must accurately model the energy consumption of all components across the accelerator design space.
Handle a wide range of technologies. Recently, many new technologies have emerged to help improve the performance and energy efficiency of accelerator designs, such as CMOS scaling down to 7 nm, RRAM-based in-memory computing, and optical computing. Accelerator designs built on different technologies have different performance and energy efficiency even if they have similar architectures and run the same workload under the same mapping. Therefore, to perform fair evaluations of accelerator designs, the infrastructure must be flexible enough to accurately reflect technology-dependent costs.
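The mapping and energy-modeling requirements above can be made concrete with a toy sketch. The plain-Python example below (the per-access energy is a made-up illustrative number; this is not how Timeloop or Accelergy are configured or invoked) computes the same matrix multiply under two loop orders, i.e., two mappings, and counts accesses to the memory holding the output. Both mappings produce identical results, yet under a simple component-level model (energy = access count × per-access energy) they cost very different amounts of energy.

```python
# Toy illustration of mappings and component-level energy modeling.
# All names and energy numbers are hypothetical, for illustration only.

def matmul_k_innermost(A, B):
    """Mapping 1: k is the innermost loop, so the partial sum for
    C[i][j] accumulates in a register and C is written only once."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    c_accesses = 0
    for i in range(M):
        for j in range(N):
            acc = 0  # partial sum held in a register
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
            c_accesses += 1  # one final write to C's memory
    return C, c_accesses

def matmul_k_outermost(A, B):
    """Mapping 2: k is the outermost loop, so every multiply-accumulate
    performs a read-modify-write of C in its backing memory."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    c_accesses = 0
    for k in range(K):
        for i in range(M):
            for j in range(N):
                C[i][j] += A[i][k] * B[k][j]
                c_accesses += 2  # read + write of C's memory
    return C, c_accesses

PJ_PER_C_ACCESS = 10.0  # hypothetical per-access energy (pJ)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C1, n1 = matmul_k_innermost(A, B)
C2, n2 = matmul_k_outermost(A, B)
assert C1 == C2  # both mappings compute the same result
print(n1 * PJ_PER_C_ACCESS, n2 * PJ_PER_C_ACCESS)  # prints 40.0 160.0
```

Even in this tiny 2×2×2 example the two mappings differ by 4× in energy at one memory level; for real DNN layers and multi-level memory hierarchies the gap between mappings can be far larger, which is why automated mapping search is central to tools like Timeloop.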
In this tutorial, we will present two integrated tools that enable rapid evaluation of DNN accelerators: