# An Adaptive Memory Management Strategy Towards Energy Efficient Machine Inference in Event-Driven Neuromorphic Accelerators

#### Saunak Saha, Henry Duwe, and Joseph Zambreno

Iowa State University

Ames, IA, United States

International Conference on Application-specific Systems, Architectures and Processors (ASAP) 2019

Iowa State University Reconfigurable Computing Laboratory

07/16/2019

## Why SNNs over ANNs?



## **Representative SNNs**





(Spiking Convolutional Neural Network)





## The CyNAPSE microarchitecture



## Baseline power consumption



## Energy-efficient memory management techniques

## General purpose computing:

- Memory hierarchy and caches
- LRU and Random
- Belady's OPT: Infeasible [3]
- DIP [4], RRIP [5], LIRS [6]: Speculative

## CyNAPSE:

Input queue

- Event-driven simulation: inherent forward visibility of upcoming memory accesses
- Extent of visibility depends on route latency, queue length, and steady-state memory bandwidth
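One way the queue's forward visibility could be used is sketched below: among the neurons currently held on chip, evict the one whose next use in the visible queue is farthest away, in the spirit of OPT [3] but restricted to the events already queued. The function name `pick_victim`, the neuron IDs, and the queue contents are hypothetical, and the talk does not confirm that CyNAPSE selects victims exactly this way.

```python
from collections import deque


def pick_victim(resident, pending):
    """Return the resident neuron ID whose next use in `pending` is farthest away.

    Residents that never appear in the visible window are preferred victims.
    """
    next_use = {}
    for position, neuron_id in enumerate(pending):
        # Record only the first (nearest) upcoming use of each resident neuron.
        if neuron_id in resident and neuron_id not in next_use:
            next_use[neuron_id] = position
    # Neurons absent from the window get an infinite next-use distance.
    # Sorting makes tie-breaking deterministic.
    return max(sorted(resident), key=lambda n: next_use.get(n, float("inf")))


pending_events = deque([3, 7, 3, 9, 7])  # hypothetical queued spike sources
resident_neurons = {3, 7, 5}             # hypothetical on-chip contents
victim = pick_victim(resident_neurons, pending_events)
print(victim)
```

A longer queue widens the window and brings the decision closer to OPT; a short queue degrades it toward a guess, which is why the visible extent matters.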



## Proposed management scheme





## Network-adaptive enhancements



The scheme needs network-adaptive enhancements.



Disallowing allocation for low-activity neurons prevents them from thrashing high-reuse input neurons.



Arming high-activity neurons with a *probable reuse score*, based on their reuse distances, prevents them from being thrashed by low-reuse input neurons.
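The two enhancements can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the spike-count threshold `min_spikes`, the `1 / (1 + mean distance)` form of the score, and all function names are hypothetical and are not the formulas used in the talk.

```python
def should_allocate(spike_count, min_spikes):
    """Bypass allocation for neurons that have fired fewer than `min_spikes` times."""
    return spike_count >= min_spikes


def probable_reuse_score(reuse_distances):
    """Turn observed reuse distances into a score; shorter distances score higher.

    Returns 0.0 when no reuse has been observed yet. The 1 / (1 + mean) form
    is an illustrative choice only.
    """
    if not reuse_distances:
        return 0.0
    mean_distance = sum(reuse_distances) / len(reuse_distances)
    return 1.0 / (1.0 + mean_distance)


def lowest_score_victim(scores):
    """Evict the resident neuron with the lowest probable reuse score."""
    return min(sorted(scores), key=lambda n: scores[n])


scores = {
    "hidden_12": probable_reuse_score([1, 1]),  # reused quickly
    "input_40": probable_reuse_score([9, 9]),   # reused rarely
}
victim = lowest_score_victim(scores)
```

With these hypothetical inputs the rarely reused neuron is evicted first, and a neuron below the activity threshold would never have been allocated at all.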

## Experimental infrastructure





## Results




- SNNs: high efficiency, inherently temporal, hybridized for better accuracy
- CyNAPSE: reconfigurable neural dynamics, reconfigurable topology
- Event-driven framework: forward visibility of memory accesses is exploited
- Power consumption: reduced by up to 44% over the baseline and 23% over conventional policies

## References

- [1] E. M. Izhikevich, "Which model to use for cortical spiking neurons?" IEEE transactions on neural networks, vol. 15, no. 5, pp. 1063–1070, 2004.
- [2] F. Jug, "On competition and learning in cortical structures," Ph.D. dissertation, ETH Zurich, 2012.
- [3] L. A. Belady, "A study of replacement algorithms for a virtual-storage computer," IBM Systems journal, vol. 5, no. 2, pp. 78–101, 1966.
- [4] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, "Adaptive insertion policies for high performance caching," ACM SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 381–391, 2007.
- [5] S. M. Khan, Y. Tian, and D. A. Jimenez, "Sampling dead block prediction for last-level caches," in Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2010, pp. 175–186.
- [6] S. Jiang and X. Zhang, "Lirs: an efficient low inter-reference recency set replacement policy to improve buffer cache performance," ACM SIGMETRICS Performance Evaluation Review, vol. 30, no. 1, pp. 31–42, 2002.
- [7] D. F. Goodman and R. Brette, "The brian simulator," Frontiers in neuroscience, vol. 3, p. 26, 2009.
- [8] Y. Kim, W. Yang, and O. Mutlu, "Ramulator: A fast and extensible dram simulator," IEEE Computer architecture letters, vol. 15, no. 1, pp. 45–49, 2015.
- [9] K. Chandrasekar, C. Weis, Y. Li, B. Akesson, N. Wehn, and K. Goossens, "Drampower: Open-source dram power & energy estimation tool," URL: http://www.drampower.info, vol. 22, 2012.
- [10] S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi, "Cacti-p: Architecture level modeling for sram-based structures with advanced leakage reduction techniques," in Proceedings of the International Conference on Computer-Aided Design. IEEE Press, 2011, pp. 694–701.

# Thank you!

Questions?

# Backup Slides

## Spiking Neuron model



#### *Generalized* Leaky Integrate and Fire (LIF) model:

- Only **7 parameters** need fitting ($T_m$, $T_{Na}$, $T_K$, $g_l$, $V_{rest}$, $V_{reset}$, and $t_{ref}$) instead of **20** for the Hodgkin-Huxley (HH) model.
- Biologically plausible parameters available in-vitro or in-vivo [2]
- **Reconfigurable:**

  - For conversion to *direct current-integration LIF*: use very small $T_{Na}$ and $T_K$, and skip the voltage-gated ion channels
  - For conversion to *perfect IF*: use the above, plus an arbitrarily large $T_m$ and/or zero $g_l$
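The reduction from leaky to perfect integrate-and-fire can be seen in a minimal textbook LIF sketch. This is not the generalized model from the talk: it keeps only a membrane time constant `tau_m`, drops the ion-channel terms and $g_l$, and adds a firing threshold `v_thresh` that is an assumption of this example. All parameter values are hypothetical.

```python
def simulate_lif(input_current, tau_m, v_rest=0.0, v_reset=0.0,
                 v_thresh=1.0, t_ref=0, dt=1.0):
    """Return the time steps at which a simplified LIF neuron spikes."""
    v = v_rest
    refractory_left = 0
    spike_times = []
    for step, current in enumerate(input_current):
        if refractory_left > 0:
            # Ignore input while the neuron is refractory.
            refractory_left -= 1
            continue
        # Leak toward v_rest with time constant tau_m, then integrate the input.
        v += dt * (-(v - v_rest) / tau_m + current)
        if v >= v_thresh:
            spike_times.append(step)
            v = v_reset
            refractory_left = t_ref
    return spike_times


constant_input = [0.3] * 12
leaky = simulate_lif(constant_input, tau_m=2.0)         # strong leak
near_perfect = simulate_lif(constant_input, tau_m=1e9)  # leak effectively off
print(leaky, near_perfect)
```

With the strong leak the membrane potential settles below threshold and the neuron stays silent, while the very large `tau_m` makes the same input accumulate without loss, which is the perfect IF behavior.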

## **Generalized LIF Neuron**




## **Full-custom Silicon Neuron**

$V_m[t+1] = V_m[t] + g_m[t] + g_m[t] \times (v_m[t]) + \dots$



## High-level system operation



## Proposed management scheme

## Cache replacement scenarios:

- Compulsory miss at warm-up and event read-time
- Hit at read-time
- Capacity/Conflict miss at read-time (read-time replacements)
  - -> Conservative Approach
  - -> Aggressive Approach
  - -> Intelligent Approach (reuse threshold)
- Compulsory miss at route-time
- Hit at route-time
- Policy miss at route-time



## Dynamic kernel statistics



#### Layer-wise spike fractions of (from Top) SCWN, SDBN and SCNN



#### Layer-wise mean reuse distance of neurons in the benchmarks
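A mean reuse distance can be computed from an access trace as sketched below. The definition used here (the number of intervening accesses between two consecutive accesses to the same neuron) is an assumption, since the slides do not spell out the exact metric; the trace and the function name are hypothetical.

```python
from collections import defaultdict


def mean_reuse_distances(access_trace):
    """Map each reused neuron ID to the mean gap between its consecutive accesses."""
    last_seen = {}
    distances = defaultdict(list)
    for position, neuron_id in enumerate(access_trace):
        if neuron_id in last_seen:
            # Count the accesses that fell between this use and the previous one.
            distances[neuron_id].append(position - last_seen[neuron_id] - 1)
        last_seen[neuron_id] = position
    # Neurons accessed only once have no reuse distance and are omitted.
    return {n: sum(d) / len(d) for n, d in distances.items()}


trace = ["a", "b", "a", "c", "b", "a"]  # hypothetical neuron access order
result = mean_reuse_distances(trace)
print(result)
```

A layer-wise figure would then average these per-neuron values over the neurons belonging to each layer.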

## Read-time replacements



- System power consumption: Intelligent < Conservative < Aggressive (for all three benchmarks)
- System power consumption loss: SCWN > SDBN > SCNN

Verdict: use the Intelligent approach for all benchmarks

#### SCWN





Low input activity (3.99/Δt), lower-reuse network, high internal activity (esp. Layer 3)






### Summary

| Benchmark | LRU vs. baseline | Random vs. baseline | Proposed Policy (static adaptive) vs. baseline | Proposed Policy (dynamic adaptive) vs. baseline | Proposed Policy vs. LRU |
|-----------|------------------|---------------------|------------------------------------------------|-------------------------------------------------|-------------------------|
| SCWN      | 28.13%                 | 25.99%                    | 44.13%                                                  | 44.45%                                                   | 22.71%                        |
| SDBN      | 5.46%                  | 2.88%                     | 7.65%                                                   | 15.55%                                                   | 10.67%                        |
| SCNN      | 5.12%                  | 4.59%                     | 7.4%                                                    | 12.61%                                                   | 7.9%                          |
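The last column appears to follow from the others if each percentage is read as a power reduction relative to the same baseline, and if the proposed policy in that column is the dynamic adaptive variant. Both readings are assumptions; the short check below shows that they reproduce the tabulated values to within rounding.

```python
def relative_reduction(policy_vs_baseline_pct, reference_vs_baseline_pct):
    """Power reduction of a policy relative to a reference policy, in percent."""
    policy_power = 1.0 - policy_vs_baseline_pct / 100.0
    reference_power = 1.0 - reference_vs_baseline_pct / 100.0
    return 100.0 * (1.0 - policy_power / reference_power)


# (proposed dynamic adaptive vs. baseline, LRU vs. baseline), from the table
table = {
    "SCWN": (44.45, 28.13),
    "SDBN": (15.55, 5.46),
    "SCNN": (12.61, 5.12),
}
vs_lru = {name: relative_reduction(p, r) for name, (p, r) in table.items()}
for name, value in vs_lru.items():
    print(f"{name}: {value:.2f}%")
```

For example, for SCWN the remaining power is 55.55% of baseline under the proposed policy and 71.87% under LRU, and the ratio of the two gives a reduction of roughly 22.7% relative to LRU.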

## Scope of future work

#### **Architectural enhancements:**

- Multi-core CyNAPSE : interconnects and multi-level memory hierarchy
- Core leakage control techniques
- Compiler driven optimizations/Better dataflow for SNNs
- Is the proposed policy applicable to any event-driven simulation framework?

### Learning:

- We are interested in spike-driven STDP hardware using memristive devices
- Evolving SNNs: the benefits of hardware acceleration are still not clear
- Extending the CyNAPSE stack up to parsers and down to motor control or BCI