# Timeloop



Angshuman Parashar (NVIDIA), Yannan Nellie Wu (MIT), Po-An Tsai (NVIDIA), Vivienne Sze (MIT), Joel S. Emer (NVIDIA, MIT)

## **ISCA** Tutorial

May 2020



Massachusetts Institute of Technology



#### Resources

- Tutorial Related
  - Tutorial Website: <u>http://accelergy.mit.edu/isca20\_tutorial.html</u>
  - Tutorial Docker: <u>https://github.com/Accelergy-Project/timeloop-accelergy-tutorial</u>
    - Various exercises and example designs <u>and</u> environment setup for the tools
- Other
  - Infrastructure Docker: <u>https://github.com/Accelergy-Project/accelergy-timeloop-infrastructure</u>
    - Pure environment setup for the tools without exercises and example designs
  - Open Source Tools
    - Accelergy: <u>http://accelergy.mit.edu/</u>
    - Timeloop: <u>https://github.com/NVlabs/timeloop</u>
  - Papers:
    - A. Parashar, et al. "Timeloop: A systematic approach to DNN accelerator evaluation," ISPASS, 2019.
    - Y. N. Wu, V. Sze, J. S. Emer, "An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs," ISPASS, 2020.
    - Y. N. Wu, J. S. Emer, V. Sze, "Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs," ICCAD, 2019.



## **Domain-Specific Accelerators Improve Energy Efficiency**

Data and computation-intensive applications are power hungry



We must quickly evaluate energy efficiency of arbitrary potential designs in the large design space



## **From Architecture Blueprints to Physical Systems**



- How many levels in the memory hierarchy?
- How large are the memories at each level?
- How many PEs are there?
- What are the X and Y dimensions of the PE array?








## **Physical-Level Energy Estimation and Design Exploration**






## **Accelergy Overview**

- Accelergy Infrastructure
  - Performs architecture-level estimations to enable rapid design space exploration
  - Supports modeling of diverse architectures with various underlying technologies
  - Improves estimation accuracy by allowing fine-grained classification of components' runtime behaviors
  - Supports succinct modeling of complicated architectures
- Validation on various accelerator designs
  - 95% accurate on a conventional digital accelerator design
  - Modeling of processing in memory (PIM) based DNN accelerator designs



### **Architecture-Level Energy Estimation and Design Exploration**



#### Fast design space exploration

- Short simulations on architecture-level components
- Short turn-around time for each potential design



#### **Connect Back to Timeloop**

Timeloop requires energy reference tables (ERTs) to evaluate the energy efficiency of a potential mapping
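As a rough illustration of how an ERT could feed an energy estimate, the sketch below multiplies per-action energies by action counts and sums the result. The table layout, the component names, and every number are assumptions made up for this example; this is not the ERT file format that Accelergy or Timeloop actually uses.

```python
# Hypothetical ERT: energy per action in pJ, keyed by (component, action).
# Component names and values are illustrative only.
ert = {
    ("global_buffer", "read"): 100.0,
    ("global_buffer", "write"): 100.0,
    ("pe_scratchpad", "read"): 10.0,
    ("mac", "compute"): 5.0,
}

# Hypothetical action counts, as a performance model might report for one mapping.
action_counts = {
    ("global_buffer", "read"): 2_000,
    ("global_buffer", "write"): 500,
    ("pe_scratchpad", "read"): 30_000,
    ("mac", "compute"): 10_000,
}


def total_energy_pj(ert, action_counts):
    """Sum count * energy-per-action over every counted action."""
    return sum(count * ert[key] for key, count in action_counts.items())


energy = total_energy_pj(ert, action_counts)
print(f"total energy: {energy:.0f} pJ")
```

Two mappings of the same workload would differ only in `action_counts`, so comparing them amounts to calling `total_energy_pj` once per mapping with the same table.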



- Accelerator-Specific Estimators: Aladdin[Shao, ISCA2014], fixed-cost[Yang, Asilomar2017]

**Energy Estimator**

Action counts come from a performance model (e.g., a cycle-accurate simulator)







Available at <a href="http://accelergy.mit.edu/">http://accelergy.mit.edu/</a>








Simple Example Estimation Plug-in

| class | tech. | width | depth | action  | energy (pJ) | area ($\mu m^2$) |
|-------|-------|-------|-------|---------|-------------|-----------------|
| MAC   | 45nm  | 16b   | N/A   | compute | 5           | 0.4             |
| SRAM  | 45nm  | 64b   | 1024  | access  | 100         | 20              |
| SRAM  | 45nm  | 16b   | 256   | access  | 10          | 2               |
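A table-based plug-in like this one can be pictured as a keyed lookup. The sketch below transcribes the three rows above into a dictionary and returns the energy for a matching request. The function name `estimate_energy_pj` and its signature are hypothetical and do not reflect the actual Accelergy plug-in interface.

```python
from typing import Optional

# Rows transcribed from the example table, keyed by
# (class, technology, width in bits, depth, action).
# depth is None where the table says N/A.
PLUGIN_TABLE = {
    ("MAC", "45nm", 16, None, "compute"): {"energy_pj": 5.0, "area_um2": 0.4},
    ("SRAM", "45nm", 64, 1024, "access"): {"energy_pj": 100.0, "area_um2": 20.0},
    ("SRAM", "45nm", 16, 256, "access"): {"energy_pj": 10.0, "area_um2": 2.0},
}


def estimate_energy_pj(cls, tech, width_bits, depth, action) -> Optional[float]:
    """Return the tabulated energy, or None if the plug-in has no matching row."""
    entry = PLUGIN_TABLE.get((cls, tech, width_bits, depth, action))
    return None if entry is None else entry["energy_pj"]


print(estimate_energy_pj("SRAM", "45nm", 64, 1024, "access"))
print(estimate_energy_pj("SRAM", "45nm", 32, 512, "access"))
```

Returning `None` for an unsupported request is one possible design choice here; it lets a caller fall back to another plug-in instead of failing.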









## **Plug-ins for Fine-Grain Action Energy Estimation**

- External energy/area models that accurately reflect the properties of a macro
  - e.g., multiplier with zero-gating
- Energy characterizations of the zero-gated multiplier (normalized to idle)



| name       | tech. | width | action             | energy (normalized to idle) |
|------------|-------|-------|--------------------|--------|
| multiplier | 65nm  | 16b   | random<br>multiply | 23.0   |
| multiplier | 65nm  | 16b   | reused<br>multiply | 16.8   |
| multiplier | 65nm  | 16b   | gated<br>multiply  | 1.3    |




With the characterization provided in the plug-in, we can see significant energy savings for sparse workloads
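To make the savings concrete, the sketch below applies the per-action energies from the zero-gated multiplier table (normalized to idle) to two workloads. The action counts are invented for illustration: a dense case where every multiply sees random operands, and a sparse case where 60% of multiplies are gated.

```python
# Per-action energies of the zero-gated multiplier, normalized to idle,
# taken from the characterization table above.
ENERGY = {
    "random_multiply": 23.0,
    "reused_multiply": 16.8,
    "gated_multiply": 1.3,
}


def multiplier_energy(counts):
    """Total normalized energy for a dict of {action name: count}."""
    return sum(ENERGY[action] * n for action, n in counts.items())


# Hypothetical workloads with 1000 multiplies each.
dense = {"random_multiply": 1000, "reused_multiply": 0, "gated_multiply": 0}
sparse = {"random_multiply": 400, "reused_multiply": 0, "gated_multiply": 600}

dense_energy = multiplier_energy(dense)
sparse_energy = multiplier_energy(sparse)
savings = 1 - sparse_energy / dense_energy

print(f"dense:  {dense_energy:.1f}")
print(f"sparse: {sparse_energy:.1f}")
print(f"savings: {savings:.1%}")
```

An estimator that charged every multiply the same energy would report no difference between these two workloads, which is the motivation for classifying actions at this finer grain.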


## **Plug-ins for Fine-Grain Action Energy Estimation**

- External energy/area models that accurately reflect the properties of a macro
  - e.g., register file with various access types



With the characterization provided in the plug-in, memories with different access patterns are characterized accurately

## **Flexibly Model Various Primitive Components**

Use energy estimation plug-ins to characterize primitive components







- Practical designs involve many more primitive components
  - Example: smartbuffer, a storage unit with preprogrammed address generators (AGs)
    - Domain-specific applications, e.g., general matrix multiply, have predictable storage access patterns, which allows access streams to be generated offline
    - The buffer belongs to the *SRAM* class
    - The AGs belong to the *adder* class

Simple Architecture Design

Let's construct a more practical design!

Practical Architecture Design





## **Existing Architecture-Level Energy Estimators**

- Architecture-level energy modeling for general purpose processors
  - Wattch[Brooks, ISCA2000], McPAT[Li, MICRO2009], GPUWattch[Leng, ISCA2013], PowerTrain[Lee, ISLPED2015]

CPU/GPU-Centric Architecture Model





Use a fixed set of compound components (components that can be decomposed into lower-level components) to represent the architecture






The fixed set of compound components is not sufficient to describe various optimizations in the diverse accelerator design space




## **Compound Component Description**

- Allow succinct architecture description with user-defined compound component classes
- Allow user-defined compound component hardware structure using primitive components
- Allow user-defined compound component actions using primitive component actions

Flexible and succinct architecture representations using user-defined compound components



Flexible and succinct action counts using compound actions
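The idea of compound components and compound actions can be sketched in a few lines. Below, a smartbuffer is modeled as one SRAM plus one address-generator adder, and each compound action expands into primitive actions. The data layout and function names are hypothetical, the SRAM energy is the 16b x 256 entry from the earlier example table, and the adder energy is a made-up placeholder; none of this is Accelergy's actual description format.

```python
# Per-action energies (pJ) for primitive component classes.
# SRAM value: 16b x 256 entry of the earlier example table.
# Adder value: placeholder chosen only for illustration.
PRIMITIVE_ENERGY_PJ = {
    ("SRAM", "access"): 10.0,
    ("adder", "add"): 0.5,
}

# Each compound action is a list of (primitive class, primitive action, repeats).
SMARTBUFFER_ACTIONS = {
    "read": [("adder", "add", 1), ("SRAM", "access", 1)],
    "write": [("adder", "add", 1), ("SRAM", "access", 1)],
    "idle": [],
}


def compound_action_energy(actions, name):
    """Energy of one compound action as the sum of its primitive actions."""
    return sum(PRIMITIVE_ENERGY_PJ[(cls, act)] * n for cls, act, n in actions[name])


def compound_total_energy(actions, counts):
    """Total energy given counts expressed in compound actions only."""
    return sum(compound_action_energy(actions, name) * n for name, n in counts.items())


# Hypothetical compound action counts for one run.
counts = {"read": 100, "write": 20, "idle": 50}
total = compound_total_energy(SMARTBUFFER_ACTIONS, counts)
print(f"smartbuffer energy: {total:.1f} pJ")
```

The performance model only needs to report three counters for the smartbuffer here, rather than separate counts for every adder and SRAM inside it.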



## **Accelergy High-Level Infrastructure**






## **Energy Validation on Eyeriss [Chen, ISSCC 2016]**

- Experimental Setup:
  - Workload: AlexNet weights & ImageNet input feature maps
  - Ground Truth: Energy obtained from post-layout simulations




- The total energy estimate is 95% accurate relative to the post-layout energy.
- The estimated relative energy breakdown across the important units in the design is <u>within 8%</u> of the post-layout energy breakdown.



## **PE Array Energy Breakdown**

Comparisons with existing work: Aladdin[Shao, ISCA2014]

**Energy Breakdown of PEs across the Array** 



Energy impact of sparsity is accurately captured with sparsity-aware estimation plug-ins




## **Accelergy Modeling of PIM Architectures**





### **Estimation for PIM Accelerators**



Compound Component Description


## **Accelergy Modeling of PIM Architectures**

- Parameterizable templates
  - Architecture template allows architecture parameter sweeping, e.g.,
    - number of PE rows
    - number of PE columns
    - size of global buffer, etc.
  - Component design template allows implementation optimization, e.g.,
    - optimize DAC-based D2A conversion system
    - optimize the design of the flash ADC in the A2D conversion system, etc.
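A parameter sweep over an architecture template might look like the sketch below, which enumerates every combination of PE rows, PE columns, and global buffer size. The parameter values and the `placeholder_energy_pj` cost function are invented for illustration; in a real flow that function would be replaced by a call into the estimation tools.

```python
from itertools import product


def placeholder_energy_pj(pe_rows, pe_cols, glb_kib):
    """Stand-in cost model with arbitrary coefficients, not a real estimate."""
    num_pes = pe_rows * pe_cols
    return 5.0 * num_pes + 0.1 * glb_kib * 1024 / num_pes


# Hypothetical template parameters to sweep.
design_space = {
    "pe_rows": [8, 16],
    "pe_cols": [8, 16],
    "glb_kib": [64, 128],
}

results = []
for pe_rows, pe_cols, glb_kib in product(*design_space.values()):
    results.append({
        "pe_rows": pe_rows,
        "pe_cols": pe_cols,
        "glb_kib": glb_kib,
        "energy_pj": placeholder_energy_pj(pe_rows, pe_cols, glb_kib),
    })

best = min(results, key=lambda r: r["energy_pj"])
print(f"evaluated {len(results)} design points")
print(f"lowest placeholder energy: {best}")
```

Because each design point is independent, the loop body could be dispatched in parallel when the per-point evaluation is expensive.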



## **Energy Modeling Validation on PIM Design**

- Validation on the ADC-based design proposed in CASCADE [Chou, MICRO2019]
- Design Specs
  - 80 64x64 1-bit Memristor Arrays
  - 1-bit DACs
  - 6-bit ADCs
  - 16-bit data representations
- Workload: VGG Net convolutional layers
- Energy estimation tables: extracted numbers from the paper/cited sources




#### **Total Energy Estimation and Breakdown Validation**



The architecture is correctly modeled:

- 95% accurate total energy estimation
- tracks the breakdown across different components

Published at [Wu, ISPASS 2020]




**Energy Breakdown Across VGG Convolutional Layers** 



Captures the energy breakdown of each convolutional layer

Published at [Wu, ISPASS 2020]

### **Summary**

- Accelergy is an architecture-level energy estimator that
  - Accelerates accelerator design space exploration
  - Provides flexibility to
    - Describe and evaluate a wide range of accelerator designs
    - Support different technologies, e.g., CMOS, RRAM, etc., with user-defined plug-ins
  - Achieves high-accuracy energy estimations
    - 95% accurate for the Eyeriss accelerator and the CASCADE PIM accelerator
- The Timeloop-Accelergy system allows fast exploration of
  - High-level architecture properties, e.g., PE array size
  - Lower-level implementation optimizations of the components in the design, e.g., storage designs with local address generation

Acknowledgement: DARPA, Facebook, MIT Presidential Fellowship


