This tutorial involves hands-on exercises and labs, as well as some baseline designs for those who would like a deeper dive into the Timeloop/Accelergy system. To follow our interactive live session, please follow the instructions for installing the necessary infrastructure and exercises. Even better, work through the exercises before the tutorial session and come ready with questions.
Deep neural networks have emerged as the key approach for solving a wide range of complex problems. To provide high performance and energy efficiency for this class of computation- and memory-intensive applications, many DNN accelerators have been proposed in recent years. To systematically evaluate arbitrary DNN accelerator designs, we need an infrastructure that is able to:
Flexibly describe a wide range of architectures. Traditional processors share a broadly similar architecture and differ mainly in microarchitecture, so they can be described with a fixed set of architectural components. DNN accelerator architectures, by contrast, vary significantly from one design to another, which makes the fixed-component approach infeasible. Since describing the architecture is the first step of any architecture evaluation, the infrastructure must be flexible enough to describe a wide range of DNN accelerator designs.
Find optimal mappings for a wide range of workloads onto the architecture. Unlike traditional architectures, whose ISA allows a workload to be represented as a single compiled program, each DNN accelerator exposes its own set of configurable hardware settings and requires the designer to decide how to schedule operations and move data for each workload, i.e., to find a mapping. Because different mappings result in widely varying performance and energy efficiency, and different workloads have different optimal mappings, finding optimal mappings is essential for evaluating a DNN accelerator architecture.
Accurately predict energy for a range of accelerator designs. Because accelerators target different applications (e.g., sparse DNNs vs. dense DNNs), different accelerator designs consist of different hardware components. Furthermore, different designs implement different hardware optimizations, which result in drastically different energy consumption for those components. Therefore, to evaluate a DNN accelerator architecture, the infrastructure must accurately model the energy consumption of all components across the accelerator design space.
Handle a wide range of technologies. Recently, many new technologies have emerged to help improve the performance and energy efficiency of accelerator designs, such as CMOS scaling down to 7 nm, RRAM-based in-memory computing, and optical computing. Accelerator designs built on different technologies have different performance and energy efficiency even if they have similar architectures and run the same workload under the same mapping. Therefore, to perform fair evaluations of accelerator designs, the infrastructure must be flexible enough to accurately reflect technology-dependent costs.
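The mapping and energy-modeling requirements above can be made concrete with a toy sketch. The plain-Python example below (the per-access energy is a made-up illustrative number; this is not how Timeloop or Accelergy are configured or invoked) computes the same matrix multiply under two loop orders, i.e., two mappings, and counts accesses to the memory holding the output. Both mappings produce identical results, yet under a simple component-level model (energy = access count × per-access energy) they cost very different amounts of energy.

```python
# Toy illustration of mappings and component-level energy modeling.
# All names and energy numbers are hypothetical, for illustration only.

def matmul_k_innermost(A, B):
    """Mapping 1: k is the innermost loop, so the partial sum for
    C[i][j] accumulates in a register and C is written only once."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    c_accesses = 0
    for i in range(M):
        for j in range(N):
            acc = 0  # partial sum held in a register
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
            c_accesses += 1  # one final write to C's memory
    return C, c_accesses

def matmul_k_outermost(A, B):
    """Mapping 2: k is the outermost loop, so every multiply-accumulate
    performs a read-modify-write of C in its backing memory."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    c_accesses = 0
    for k in range(K):
        for i in range(M):
            for j in range(N):
                C[i][j] += A[i][k] * B[k][j]
                c_accesses += 2  # read + write of C's memory
    return C, c_accesses

PJ_PER_C_ACCESS = 10.0  # hypothetical per-access energy (pJ)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C1, n1 = matmul_k_innermost(A, B)
C2, n2 = matmul_k_outermost(A, B)
assert C1 == C2  # both mappings compute the same result
print(n1 * PJ_PER_C_ACCESS, n2 * PJ_PER_C_ACCESS)  # prints 40.0 160.0
```

Even in this tiny 2×2×2 example the two mappings differ by 4× in energy at one memory level; for real DNN layers and multi-level memory hierarchies the gap between mappings can be far larger, which is why automated mapping search is central to tools like Timeloop.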
In this tutorial, we will present two integrated tools that enable rapid evaluation of DNN accelerators: