

## Application Specific Architecture for Hardware Accelerating HOG-SVM to Achieve High Throughput on HD Frames

Asian Institute of Technology, University of Minho, University of Moratuwa

Piyumal Ranawaka, Mongkol Ekapanyapong, Adriano Tavares, Jorge Cabral, Krit Athikulwongse, Vitor Silva













Universidade do Minho Escola de Engenharia



## Topics

- Introduction
- Background
- Architecture and Methodology
- Results
- Conclusion









## Introduction

• Computer vision is an emerging field with diverse applications, encompassing many computationally intensive algorithms.








## Introduction

• Histogram of Oriented Gradients with a Support Vector Machine classifier (HOG-SVM) is one such versatile algorithm, used for object detection and image classification despite its heavy computational load.








## Introduction

• Processing such an algorithm in real time with adequate throughput is a challenging task for a general-purpose processor.








## Introduction

• Moreover, an embedded CPU with very limited processing power can hardly cope with such heavy processing.








## Introduction

• Therefore, our research focuses on developing application-specific architectures for the hardware acceleration of computer vision algorithms.








## Introduction

• This paper continues a series of research efforts on hardware acceleration of the HOG-SVM algorithm on FPGAs.










## Introduction

• In this paper, we mainly present a high-performance application-specific architecture for hardware acceleration of HOG-SVM that achieves a throughput of 240 fps on HD frames of 1920x1080 pixels, a significant performance improvement over previous research.









## Introduction

• At the same time, both hardware utilization and power consumption are minimized.









## Introduction

• A mechanism built around Block RAM (BRAM) structures and deep pipelining are the key architectural techniques used to achieve high performance.









## Introduction

• The proposed design was deployed on the Zynq-7000 FPGA platform, which contains a hardwired ARM CPU alongside the programmable FPGA fabric.








## Introduction

• The accelerator is implemented in the FPGA fabric and integrated with the ARM CPU through AXI memory interfaces.








## Introduction

• A hardware thread model and bare-metal device drivers were developed that present the accelerator as a hardware thread to applications running on the ARM CPU.





## Architecture and Methodology











## Architecture and Methodology: Deep Pipelined Datapath
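The datapath figure from this slide is not reproduced here. As a generic illustration of why deep pipelining sustains high throughput (a sketch, not the authors' RTL), the Python model below clocks a chain of single-register stages; the stage functions are hypothetical placeholders. Once the pipeline is full, one result leaves per clock regardless of depth, so n inputs through k stages take n + k cycles instead of the roughly n * k cycles a non-pipelined datapath with the same stages would need.

```python
from typing import Callable, List, Optional, Tuple

def simulate_pipeline(stages: List[Callable[[int], int]],
                      inputs: List[int]) -> Tuple[List[int], int]:
    """Cycle-level model of a linear pipeline with one register per stage.

    Every stage fires on every clock, so n inputs through k stages finish
    after n + k cycles (k cycles to fill, then one result per cycle).
    """
    regs: List[Optional[int]] = [None] * len(stages)  # pipeline registers
    outputs: List[int] = []
    feed = list(inputs)
    cycle = 0
    while feed or any(r is not None for r in regs):
        cycle += 1
        # The last register drives the output on this clock edge.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # Update back to front so each stage reads its predecessor's old value.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not None else None
        regs[0] = stages[0](feed.pop(0)) if feed else None
    return outputs, cycle

# Hypothetical three-stage datapath: each stage just adds one.
outs, cycles = simulate_pipeline([lambda v: v + 1] * 3, [0, 1, 2, 3, 4])
print(outs, cycles)  # [3, 4, 5, 6, 7] 8
```

Increasing the stage count only adds to the one-time fill latency, which is why a deep pipeline can shorten each stage (raising the clock) without lowering steady-state throughput.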







## Architecture and Methodology: Front-End Pixel Buffer and Gradient Computation Stage
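The slide's figure is not reproduced here. A minimal software model of the idea, assuming the centered [-1, 0, 1] derivative kernels of the original HOG formulation (the slides do not state the kernel, pixel width, or buffer depth): only three image rows are held at once, as two line buffers (the role BRAMs typically play in such designs) plus the incoming row, and gradients for the middle row are produced as each new row arrives. The function name and the y-down sign convention are assumptions of this sketch.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Row = List[int]

def stream_gradients(rows: Iterable[Row]) -> List[List[Tuple[int, int]]]:
    """Centered [-1, 0, 1] gradients (gx, gy) from a raster-order row stream.

    At most three rows are held at a time: two line buffers plus the incoming
    row. Gradients are emitted for the interior pixels of the middle row;
    border rows and columns are skipped.
    """
    window: Deque[Row] = deque(maxlen=3)  # oldest row is evicted automatically
    out: List[List[Tuple[int, int]]] = []
    for row in rows:
        window.append(row)
        if len(window) < 3:
            continue  # line buffers still filling
        above, mid, below = window
        grads = []
        for x in range(1, len(mid) - 1):
            gx = mid[x + 1] - mid[x - 1]   # horizontal neighbours, same row
            gy = below[x] - above[x]       # vertical neighbours, y grows downward
            grads.append((gx, gy))
        out.append(grads)
    return out

print(stream_gradients([[0, 10, 20, 30]] * 3))  # [[(20, 0), (20, 0)]]
```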









## Orientation Binning Pipeline
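A common way to bin orientations in hardware without evaluating atan2 is to compare the gradient against precomputed bin-boundary directions. The sketch below assumes 9 unsigned bins of 20° over [0°, 180°), as in the original HOG descriptor; whether this design uses the same bin count or comparison scheme is not stated in the slides. Because both the folded gradient angle and each boundary b lie in [0°, 180°), the sign of cos(b)·gy - sin(b)·gx tells whether the gradient lies past boundary b, so the bin index is simply the number of boundaries passed.

```python
import math

NUM_BINS = 9                      # unsigned orientation, 20 degrees per bin
BIN_WIDTH = math.pi / NUM_BINS

# Directions of the 8 inner bin boundaries (20°, 40°, ..., 160°).
# In hardware these would be fixed-point constants; floats keep the sketch short.
BOUNDARIES = [(math.cos(j * BIN_WIDTH), math.sin(j * BIN_WIDTH))
              for j in range(1, NUM_BINS)]

def orientation_bin(gx: int, gy: int) -> int:
    """Unsigned-orientation bin (0..8) of gradient (gx, gy), computed without atan2."""
    if gx == 0 and gy == 0:
        return 0  # zero magnitude: the bin does not affect the histogram
    # Unsigned orientation treats v and -v alike, so fold v into [0°, 180°).
    if gy < 0 or (gy == 0 and gx < 0):
        gx, gy = -gx, -gy
    # For angles in [0°, 180°), theta >= boundary exactly when the cross
    # product cos(b)*gy - sin(b)*gx is non-negative.
    return sum(1 for c, s in BOUNDARIES if c * gy - s * gx >= 0)

print(orientation_bin(3, 4), orientation_bin(-3, -4), orientation_bin(0, 5))  # 2 2 4
```

Each comparison needs only two multiplications and a subtraction, which maps naturally onto DSP slices and can be evaluated for all boundaries in parallel.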





## Architecture and Methodology: Histogram Creation Pipeline
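Once each pixel has a magnitude and an orientation bin, histogram creation sums the magnitudes per cell. This sketch assumes 8x8-pixel cells and hard (nearest-bin) voting with the exact magnitude sqrt(gx² + gy²); it uses atan2 for brevity, and the design's actual cell size, voting scheme, and magnitude approximation are not given in the slides. Pixels beyond the last complete cell are ignored.

```python
import math
from typing import List, Sequence, Tuple

CELL = 8       # assumed cell size in pixels
NUM_BINS = 9   # assumed unsigned bins over [0°, 180°)

def cell_histograms(grads: Sequence[Sequence[Tuple[int, int]]]) -> List[List[List[float]]]:
    """Per-cell orientation histograms with hard voting by gradient magnitude.

    grads[y][x] holds (gx, gy). Returns hist[cell_row][cell_col][bin].
    """
    rows = len(grads) // CELL
    cols = len(grads[0]) // CELL if grads else 0
    hist = [[[0.0] * NUM_BINS for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * CELL):
        for x in range(cols * CELL):
            gx, gy = grads[y][x]
            theta = math.degrees(math.atan2(gy, gx)) % 180.0    # unsigned orientation
            b = min(int(theta // (180.0 / NUM_BINS)), NUM_BINS - 1)
            hist[y // CELL][x // CELL][b] += math.hypot(gx, gy)  # vote = magnitude
    return hist

h = cell_histograms([[(3, 4)] * 8 for _ in range(8)])
print(h[0][0])  # all 64 votes of magnitude 5 land in bin 2 (about 53.1°)
```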







## Architecture and Methodology: Normalization Pipeline
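Normalization groups neighboring cell histograms into blocks and rescales each block to unit length, which makes the descriptor robust to illumination and contrast changes. A minimal sketch, assuming the 2x2-cell blocks with a one-cell stride and plain L2 normalization v / sqrt(||v||² + ε²) from the original HOG formulation (Dalal and Triggs' preferred L2-Hys variant additionally clips values at 0.2 and renormalizes); ε = 1e-3 is an arbitrary choice here, and the slides do not state which norm the hardware uses.

```python
import math
from typing import List

def normalize_blocks(hist: List[List[List[float]]], eps: float = 1e-3) -> List[List[float]]:
    """Form 2x2-cell blocks (one-cell stride) and L2-normalize each 36-value block."""
    rows = len(hist)
    cols = len(hist[0]) if hist else 0
    blocks: List[List[float]] = []
    for cy in range(rows - 1):
        for cx in range(cols - 1):
            # Concatenate the four cell histograms in row-major order.
            v = hist[cy][cx] + hist[cy][cx + 1] + hist[cy + 1][cx] + hist[cy + 1][cx + 1]
            norm = math.sqrt(sum(e * e for e in v) + eps * eps)
            blocks.append([e / norm for e in v])
    return blocks

# A 64x128 window has 8x16 cells, hence 7x15 blocks of 36 values each.
window = [[[1.0] * 9 for _ in range(8)] for _ in range(16)]
print(sum(len(b) for b in normalize_blocks(window)))  # 3780
```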







## SVM Classifier

HOG Descriptor
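The SVM classifier scores each detection window with a linear decision function w · x + b over the window's HOG descriptor. The sketch below assumes the classic 64x128 detection window with an 8-pixel stride (neither is stated in the slides) and counts how many windows a 1920x1080 frame contains under that assumption: 233 x 120 = 27,960 windows per frame, or about 6.7 million window scores per second at 240 fps. The weight vector and bias are hypothetical inputs; this heavy per-window arithmetic is consistent with the resource table, where the SVM classifier accounts for 106 of the 117 DSPs and 60.5 of the 74.5 BRAMs.

```python
from typing import Sequence

# Assumed detection-window geometry (classic HOG defaults, not given in the slides).
WIN_W, WIN_H = 64, 128
CELL, BINS = 8, 9
BLOCKS_X = WIN_W // CELL - 1          # 2x2-cell blocks with one-cell stride
BLOCKS_Y = WIN_H // CELL - 1
DESCRIPTOR_LEN = BLOCKS_X * BLOCKS_Y * 4 * BINS   # 7 * 15 * 36 = 3780

def svm_score(descriptor: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear SVM decision value w . x + b; here a positive value means 'object present'."""
    if len(descriptor) != len(weights):
        raise ValueError("descriptor and weight vector lengths differ")
    return sum(d * w for d, w in zip(descriptor, weights)) + bias

def windows_per_frame(frame_w: int, frame_h: int, stride: int = CELL) -> int:
    """Number of detection windows when sliding the window with the given stride."""
    return ((frame_w - WIN_W) // stride + 1) * ((frame_h - WIN_H) // stride + 1)

n = windows_per_frame(1920, 1080)
print(DESCRIPTOR_LEN, n, n * 240)  # 3780 27960 6710400
```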












## Results: Resource Utilization

| Module                     | LUTs   | Registers | BRAMs (36 Kb) | DSPs   |
|----------------------------|--------|-----------|---------------|--------|
| Front-End Pixel Buffer     | 36     | 96        | 1.5           | 0      |
| Gradient Computation       | 232    | 326       | 0             | 0      |
| Histogram Creation         | 374    | 242       | 1.5           | 7      |
| Normalization              | 472    | 359       | 5             | 4      |
| SVM Classifier             | 4615   | 7287      | 60.5          | 106    |
| Write Unit                 | 91     | 213       | 4             | 0      |
| Control Unit               | 20     | 11        | 0             | 0      |
| AXI Components             | 82     | 201       | 0             | 0      |
| Video Source               | 116    | 88        | 0             | 0      |
| Overall design             | 6069   | 8841      | 74.5          | 117    |
| Utilization of XC7Z020 (%) | 11.40% | 8.30%     | 53.21%        | 53.18% |
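The utilization row can be cross-checked against the published XC7Z020 programmable-logic capacities (53,200 LUTs, 106,400 flip-flops, 140 36 Kb block RAMs, 220 DSP slices). Fractional BRAM counts such as 1.5 arise because a 36 Kb block RAM can be used as two independent 18 Kb halves, each counted as 0.5. The short script below recomputes the percentages; its results agree with the table to within 0.01 percentage points.

```python
# Published XC7Z020 programmable-logic capacities (BRAM counted in 36 Kb blocks).
CAPACITY = {"LUTs": 53_200, "Registers": 106_400, "BRAMs": 140, "DSPs": 220}
# "Overall design" row of the resource utilization table.
USED = {"LUTs": 6069, "Registers": 8841, "BRAMs": 74.5, "DSPs": 117}

def utilization(used: dict, capacity: dict) -> dict:
    """Percentage of each resource type consumed by the design."""
    return {k: 100.0 * used[k] / capacity[k] for k in used}

for name, pct in utilization(USED, CAPACITY).items():
    print(f"{name}: {pct:.2f}%")
```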










## Results: Resource Utilization







## Results: Comparison with Previous Work

|                           | [7]               | [8]             | [9]              | [10]            | [11]            | [12]              | [15]              | This Research    |
|---------------------------|-------------------|-----------------|------------------|-----------------|-----------------|-------------------|-------------------|------------------|
| Resolution / Frame Size   | 640x480           | 320x240         | various          | 640x480         | 320x240         | 800x600           | 800x600           | 1920x1080        |
| Frame Rate (fps)          | 30                | 62              | about 15         | 60              | 38              | 72                | 162               | 240              |
| Operating Frequency (MHz) | 127               | 44              | 192              | 25              | 167             | 40                | 150               | 148.5            |
| Platform                  | Altera Stratix II | Xilinx Virtex-5 | Xilinx Spartan-6 | Xilinx Virtex-6 | Xilinx Virtex-5 | Altera Cyclone IV | Altera Cyclone IV | Xilinx Zynq-7000 |
| Number of LUTs            | 37940             | 17383           | 4169             | 113359          | 28495           | 34403             | 16060             | 6069             |
| Number of Registers       | 66990             | 2181            | 3533             | 75071           | 5980            | 23247             | 7220              | 8841             |
| Number of DSP blocks      | 120               | no data         | 10               | 72              | 2               | 68                | 69                | 117              |
| Number of BRAMs           | no data           | no data         | no data          | no data         | no data         | no data           | no data           | 74.5             |
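Because the compared works use different frame sizes, pixel rate (width x height x fps) is a fairer throughput measure than frame rate alone, and dividing it by the operating frequency gives the average number of pixels processed per clock cycle. The script below recomputes both from the comparison table; [9] is omitted because its resolution ("various") and frame rate ("about 15") are not exact figures. By this measure, this research processes 497.66 Mpixel/s, 6.4 times the pixel rate of [15]; a value above 1 pixel/cycle means that, on average, more than one pixel is consumed per clock.

```python
# (width, height, frames per second, clock in MHz) from the comparison table.
# [9] is left out because its resolution and frame rate are not exact figures.
WORKS = {
    "[7]": (640, 480, 30, 127),
    "[8]": (320, 240, 62, 44),
    "[10]": (640, 480, 60, 25),
    "[11]": (320, 240, 38, 167),
    "[12]": (800, 600, 72, 40),
    "[15]": (800, 600, 162, 150),
    "This research": (1920, 1080, 240, 148.5),
}

def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels processed per second."""
    return width * height * fps

for name, (w, h, fps, f_mhz) in WORKS.items():
    rate = pixel_rate(w, h, fps)
    print(f"{name:>13}: {rate / 1e6:7.2f} Mpixel/s, {rate / (f_mhz * 1e6):.3f} pixel/cycle")
```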









## Results



## References



[2] C. Wojek, G. Dorko, A. Schulz, B. Schiele, Sliding-Windows for Rapid Object Class Localization: A Parallel Technique. In Proceedings of the DAGM Symposium, 2008.

[3] P. Sudowe, B. Leibe, Efficient use of geometric constraints for sliding-window object detection in video. In Proceedings of the International Conference on Computer Vision Systems (ICVS), Sophia Antipolis, France, 20-22 September 2011; pp. 11-20.




[4] T. Machida, T. Naito, GPU and CPU cooperative accelerated pedestrian and vehicle detection. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6-13 November 2011; pp. 506-513.

[5] Y. P. Chen, S. Z. Li, X. M. Lin, Fast HOG feature computation based on CUDA. In Proceedings of the IEEE International Conference on Computer Science and Automation Engineering (CSAE), Shanghai, China, 10-12 June 2011; pp. 748-751.

[6] B. Bilgic, B. K. P. Horn, I. Masaki, Fast human detection with cascaded ensembles on the GPU. In Proceedings of the IEEE Intelligent Vehicles Symposium, San Diego, CA, USA, 21-24 June 2010; pp. 325-332.







[7] R. Kadota, H. Sugano, M. Hiromoto, H. Ochi, R. Miyamoto, Y. Nakamura, Hardware architecture for HOG feature extraction. In Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kyoto, Japan, 12-14 September 2009; pp. 1330-1333.

[8] K. Negi, K. Dohi, Y. Shibata, K. Oguri, Deep pipelined one-chip FPGA implementation of a real-time image-based human detection algorithm. In Proceedings of the International Conference on Field-Programmable Technology (FPT), New Delhi, India, 12-14 December 2011; pp. 1-8.

[9] P. Y. Hsiao, S. Y. Lin, C. Y. Chen, A real-time FPGA based human detector. In Proceedings of the 2016 International Symposium on Computer, Consumer and Control (IS3C), Xi'an, China, 4-6 July 2016; pp. 1014-1017.




[10] M. Komorkiewicz, M. Kluczewski, M. Gorgon, Floating point HOG implementation for real-time multiple object detection. In Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), Oslo, Norway, 29-31 August 2012; pp. 711-714.

[11] M. Hiromoto, R. Miyamoto, Hardware architecture for high-accuracy real-time pedestrian detection with CoHOG features. In Proceedings of the IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), Kyoto, Japan, 27 September-4 October 2009; pp. 894-899.

[12] K. Mizuno, Y. Terachi, K. Takagi, S. Izumi, H. Kawaguchi, M. Yoshimoto, Architectural study of HOG feature extraction processor for real-time object detection. In Proceedings of the 2012 IEEE Workshop on Signal Processing Systems (SiPS), Quebec City, QC, Canada, 17-19 October 2012; pp. 197-202.







[13] M. Hahnle, F. Saxen, M. Hisung, U. Brunsmann, K. Doll, FPGA-based real-time pedestrian detection on high-resolution images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Portland, OR, USA, 23-28 June 2013; pp. 629-635.

[14] S. Bauer, S. Köhler, K. Doll, U. Brunsmann, FPGA-GPU architecture for kernel SVM pedestrian detection. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, 13-18 June 2010; pp. 61-68.

[15] J. H. Luo, C. H. Lin, Pure FPGA Implementation of an HOG Based Real-Time Pedestrian Detection System. Sensors, vol. 18, April 2018. Available online: https://www.mdpi.com/1424-8220/18/4/1174 (accessed on 2 May 2018).

[16] INRIA Person Dataset. Available online: http://pascal.inrialpes.fr/data/human/ (accessed on 3 April 2018).










## Thank You!