Timeloop/Accelergy Tutorial @ MICRO2019

Timeloop/Accelergy Tutorial:
Tools for Evaluating Deep Neural Network Accelerator Designs


Organizers


Register for the tutorial here


Overview

Deep neural networks (DNNs) have emerged as a key approach for solving a wide range of complex problems. To deliver high performance and energy efficiency for this class of computation- and memory-intensive applications, many DNN accelerators have been proposed in recent years. Systematically evaluating arbitrary DNN accelerator designs requires an infrastructure that can:

Flexibly describe a wide range of architectures. Unlike traditional architectures, which share a similar architecture and differ mainly in microarchitecture, DNN accelerator architectures vary significantly from one design to another. Describing a design with a fixed set of architecture components is therefore infeasible for DNN accelerators. Because describing the architecture is the first step of any architecture evaluation, the infrastructure must be flexible enough to describe a wide range of DNN accelerator designs.

Find optimal mappings of a wide range of workloads onto the architecture. Traditional architectures have an ISA that allows a workload to be represented by a single compiled program. Each DNN accelerator instead exposes its own set of configurable hardware settings and requires the designer to decide how to schedule operations and move data for each workload, i.e., to find a mapping for each workload. Because different mappings result in widely varying performance and energy efficiency, and different workloads have different optimal mappings, finding optimal mappings is essential for evaluating a DNN architecture.
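
To make the idea of a mapping search concrete, here is a toy sketch in Python. It is not Timeloop's mapper or cost model. It assumes a hypothetical two-level machine (one small buffer in front of DRAM), a 1D convolution workload, and made-up per-access energies. The only mapping choice is the output tile size; names such as Conv1D, evaluate_mapping, and search_mappings are illustrative and do not come from any tool.

```python
from dataclasses import dataclass

# Hypothetical per-access energies in picojoules (illustrative values only).
ENERGY_PJ = {"dram": 200.0, "buffer": 6.0, "mac": 1.0}


@dataclass(frozen=True)
class Conv1D:
    outputs: int      # number of output elements
    filter_size: int  # number of weights


def divisors(n):
    """All tile sizes that split n outputs into equal tiles."""
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate_mapping(workload, tile, buffer_words):
    """Energy in pJ for one tile size, or None if the tile does not fit."""
    input_tile = tile + workload.filter_size - 1
    footprint = tile + input_tile + workload.filter_size
    if footprint > buffer_words:
        return None  # invalid mapping: working set exceeds the buffer
    num_tiles = workload.outputs // tile
    # Assumption: inputs and weights are reloaded and outputs written per tile.
    dram_accesses = num_tiles * (input_tile + workload.filter_size + tile)
    macs = workload.outputs * workload.filter_size
    # Assumption: each MAC reads an input, reads a weight, updates a partial sum.
    buffer_accesses = 3 * macs
    return (dram_accesses * ENERGY_PJ["dram"]
            + buffer_accesses * ENERGY_PJ["buffer"]
            + macs * ENERGY_PJ["mac"])


def search_mappings(workload, buffer_words):
    """Exhaustively evaluate all tile sizes and return (best_tile, energies)."""
    energies = {}
    for tile in divisors(workload.outputs):
        energy = evaluate_mapping(workload, tile, buffer_words)
        if energy is not None:
            energies[tile] = energy
    if not energies:
        raise ValueError("no valid mapping fits in the buffer")
    best_tile = min(energies, key=energies.get)
    return best_tile, energies


best_tile, energies = search_mappings(Conv1D(outputs=16, filter_size=3), buffer_words=16)
print("best tile:", best_tile)
for tile, energy in sorted(energies.items()):
    print(f"tile={tile}: {energy:.0f} pJ")
```

Even in this tiny space, the valid mappings differ by about a factor of two in energy, and the best tile size changes with the buffer capacity and the workload shape. A real mapping space also includes loop orders, spatial partitioning, and multiple storage levels, so it is far larger than anything an exhaustive loop like this could cover.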

Accurately predict energy for a range of accelerator designs. Because accelerators are designed for different applications (e.g., sparse DNNs vs. dense DNNs), different accelerator designs consist of different hardware components. Different designs also implement different hardware optimizations, which can result in drastically different energy consumption for those components. To evaluate a DNN architecture, the infrastructure must therefore accurately model the energy consumption of all the components involved in the accelerator design space.

Handle a wide range of technologies. Many new technologies have recently emerged to help improve the performance and energy efficiency of accelerator designs, such as CMOS scaling down to 7nm, RRAM-based in-memory computation, and optical computation. Accelerator designs built in different technologies have different performance and energy efficiency even if they have similar architectures and run the same workload under the same mapping. To evaluate accelerator designs fairly, the infrastructure must be flexible enough to accurately reflect technology-dependent costs.

In this tutorial, we will present two integrated tools that enable rapid evaluation of DNN accelerators:

  • Mapping exploration with Timeloop [paper] Timeloop uses a concise and unified representation of the key architecture and implementation attributes of DNN accelerators to describe a broad space of hardware architectures. With the aid of accurate energy estimators, Timeloop characterizes the processing speed and energy efficiency of any given workload through a mapper that finds the best way to schedule operations and stage data on the specified architecture.
  • Energy estimation with Accelergy [paper] [website] Accelergy serves as the energy estimator behind Timeloop's energy characterization. Accelergy allows the specification of arbitrary accelerator architecture designs composed of user-defined, design-specific, high-level compound components and user-defined, low-level primitive components. The primitive components can be characterized by third-party energy estimation plug-ins to reflect the technology-dependent characteristics of the design.
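
The relationship between compound components, primitive components, and technology-dependent costs can be sketched in a few lines of Python. This is a toy illustration, not Accelergy's input format or API. The component names, action names, technology labels, and energy values are all hypothetical, and the per-technology lookup tables merely stand in for what an estimation plug-in would provide.

```python
# Hypothetical energy per primitive action in picojoules, one table per
# technology. In a real flow these numbers would come from estimation plug-ins.
PRIMITIVE_ENERGY_PJ = {
    "tech_a": {("sram", "read"): 5.0, ("sram", "write"): 6.0, ("mac", "compute"): 2.0},
    "tech_b": {("sram", "read"): 2.0, ("sram", "write"): 3.0, ("mac", "compute"): 0.5},
}

# Each compound-component action expands into (primitive, action, count) triples.
COMPOUND_ACTIONS = {
    ("pe", "mac_op"): [("sram", "read", 2), ("mac", "compute", 1), ("sram", "write", 1)],
    ("buffer", "fill"): [("sram", "write", 1)],
}


def action_energy(technology, component, action):
    """Energy of one compound action: sum of its primitive actions."""
    table = PRIMITIVE_ENERGY_PJ[technology]
    return sum(count * table[(primitive, primitive_action)]
               for primitive, primitive_action, count
               in COMPOUND_ACTIONS[(component, action)])


def total_energy(technology, action_counts):
    """Total energy: action counts weighted by energy per action."""
    return sum(count * action_energy(technology, component, action)
               for (component, action), count in action_counts.items())


# Hypothetical action counts, as a performance model might report them.
counts = {("pe", "mac_op"): 100, ("buffer", "fill"): 10}
for technology in PRIMITIVE_ENERGY_PJ:
    print(technology, total_energy(technology, counts), "pJ")
```

The same architecture description and the same action counts yield different totals once the primitive table is swapped, which is the sense in which the energy estimate is technology-dependent.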

Docker Installation

Since this tutorial involves hands-on exercises and labs, please pre-install Docker and bring your laptop to the session. To install Docker and set up the tutorial environment:
  1. If you do not already have Docker installed, please install Docker (community edition).
  2. (Windows users: please manually turn on virtualization in your BIOS settings.)
  3. Go to our tutorial repo and get a copy of the single file docker-compose.yaml
  4. Make an empty directory and place docker-compose.yaml inside it.
  5. To pull the newest Docker image, run the command docker-compose pull from that directory.
  6. Run the command docker-compose run --rm exercises
  7. Once the container has started, run cd ./exercises/timeloop/00-model-conv1d-1level and then timeloop-model */*.yaml to run a test example.

Tutorial Schedule

Time Agenda
1:00 - 2:30PM Timeloop Lecture and Exercise
2:30 - 3:00PM Timeloop Free Lab Time
3:00 - 3:30PM Coffee Break
3:30 - 4:30PM Accelergy Lecture and Exercise
4:30 - 5:00PM Accelergy + Timeloop Free Lab Time