Exploiting FPGA from data science programming languages

Luca Stornaiuolo

Published in
Politecnico di Milano online archive of theses

In recent years, the huge amount of available data has led data scientists to look for increasingly powerful systems to process it. In this context, Field Programmable Gate Arrays (FPGAs) are a promising solution for improving system performance while keeping energy consumption low. Nevertheless, exploiting FPGAs is very challenging because of the high level of expertise required to program them. Many High Level Synthesis tools have been produced to help programmers accelerate their algorithms on the hardware architecture. However, these tools often use languages that data scientists consider low level, and they remain too difficult for software developers to use. This complexity limits FPGA usage in a number of fields, from Data Science to Signal Processing. One way to overcome this problem is to build Hardware Libraries of widely used algorithms that transparently offload the computation to the FPGA device from the high-level languages commonly used by data scientists. This work presents different methodologies for creating Hardware Libraries for Desktop and Embedded systems. We have chosen to focus on the R, MATLAB and Python languages. For MATLAB and R, the Hardware Libraries are developed for Desktop systems using the Reusable Integration Framework for FPGA Accelerators (RIFFA) to send data to and receive data from an FPGA connected via PCI-Express. We have implemented and tested an optimized hardware implementation of the Autocorrelation Function on a Xilinx VC707 board, reaching a speedup of 7x with respect to execution on an Intel i7-4710HQ. Python, instead, is exploited by using the recently released Xilinx PYNQ platform to create Hardware Libraries for Embedded systems. We have implemented different optimized versions of some NumPy library functions for the PYNQ-Z1 board, which supports the PYNQ platform. We achieve a speedup of 3.95x for the Integer Matrices Dot Product algorithm implementation and a speedup of 10x for the Correlation function.
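To make concrete what kind of functions such Hardware Libraries stand in for, the following is a minimal software-only sketch in NumPy of two of the kernels named above: a sample autocorrelation function and an integer matrix dot product. It is not the thesis implementation and involves no FPGA offload. The function names `autocorrelation` and `int_matrix_dot`, the normalization by the lag-0 term, and the 32-bit integer width are assumptions chosen for illustration.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (software reference only).

    Assumption: values are normalized by the lag-0 term, so the result
    at lag 0 is 1.0. The thesis may use a different normalization.
    """
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    n = centered.size
    # For each lag k, correlate the series with itself shifted by k samples.
    return np.array(
        [np.dot(centered[: n - k], centered[k:]) / denom for k in range(max_lag + 1)]
    )


def int_matrix_dot(a, b):
    """Integer matrix product computed on the CPU with NumPy.

    Assumption: 32-bit integers, picked here only as an example width.
    """
    a = np.asarray(a, dtype=np.int32)
    b = np.asarray(b, dtype=np.int32)
    return np.dot(a, b)


if __name__ == "__main__":
    print(autocorrelation([1, 2, 3, 4, 5], max_lag=2))
    print(int_matrix_dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A hardware library in the sense described above would keep call signatures like these unchanged from the user's point of view and replace only the function bodies with a transfer to and from the accelerator.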
