Publications

A Case Study for an Accelerated DCNN on FPGA-Based Embedded Distributed System

Author(s): Anna Maria Nestorov, Alberto Scolari, Enrico Reggiani, Luca Stornaiuolo, Marco D Santambrogio
Conference: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Abstract: Face Detection (FD) has recently become the basis of multiple applications that require low latency while operating under limited resource and energy budgets. Deep Convolutional Neural Networks (DCNNs) are especially accurate in FD, but latency requirements and energy budgets call for solutions based on Field Programmable Gate Arrays (FPGAs), which offer a trade-off between flexibility and efficiency. Nonetheless, the range of available FPGA solutions is limited, and different chips often require expensive re-design phases, while developers want solutions whose resources can scale in proportion to demand. Therefore, this work presents an FD solution based on a DCNN running on a distributed, embedded system with FPGAs, proposing a general approach to reduce the DCNN size and…

Pareto Optimal Design Space Exploration for Accelerated CNN on FPGA

Author(s): Enrico Reggiani, Marco Rabozzi, Anna Maria Nestorov, Alberto Scolari, Luca Stornaiuolo, Marco Santambrogio
Conference: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Abstract: Convolutional Neural Networks (CNNs) are at the base of many applications, in both embedded and server-class contexts. While Graphics Processing Units (GPUs) are predominantly used for training, inference solutions often rely on Field Programmable Gate Arrays (FPGAs), since these are more flexible and cost-efficient in many scenarios. However, existing approaches fall short of achieving several conflicting goals, such as using resources efficiently on multiple platforms while retaining deep configurability and allowing a quick Design Space Exploration (DSE) toward the best solution. This paper proposes a solution composed of highly configurable kernels designed for resource time-sharing, together with an analytical model of their resource/performance characteristics.…

FPGA-Based Embedded System Implementation of Audio Signal Alignment

Author(s): Luca Stornaiuolo, Massimo Perini, Marco D Santambrogio, Donatella Sciuto
Conference: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Abstract: FPGAs are considered a valuable solution for embedded system applications thanks to their performance, energy efficiency, and ability to cope with system failures. However, the number of available applications is limited by the learning curve required to customize FPGA-based accelerators. As evidence of this, Xilinx recently released PYNQ, a platform for Zynq SoCs that relies on Python and overlays to ease the integration of programmable-logic functionality into applications. In this work, we build upon this framework to implement an optimized embedded design for audio alignment and integrate it into the Python application workflow. In particular, we provide a custom accelerator designed for PYNQ and…

Diversity and inclusion: buzzword or real value?

Author(s): Letizia Clementi, Riccardo Cavadini, Fabiola Casasopra, Marco Rabozzi, Sara Notargiacomo, Marco D Santambrogio
Conference: 2019 IEEE Global Engineering Education Conference (EDUCON)
Abstract: The STEM field is characterized by a strong gender gap, in both business and academia. Previous studies showed that this gap has some peculiarities: women publish less than men across all disciplines, which is why this publication gap is often referred to as the “productivity puzzle”. Grounded in the belief that gender should not influence career choice, recent organizational literature has paid greater attention to gender-related issues, analyzing the role that team heterogeneity plays in performance. Such studies have often reached contradictory outcomes, suggesting that the relationship between group heterogeneity and performance is a complex phenomenon. The…

HLS support for polymorphic parallel memories

Author(s): Luca Stornaiuolo, Marco Rabozzi, Donatella Sciuto, Marco D Santambrogio, Giulio Stramondo, C Ciobanu, Ana Lucia Varbanescu
Conference: 2018 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC)
Abstract: The importance of High-Level Languages in abstracting machine language to enhance productivity has been proven in many sectors, and has recently encouraged the spread of reconfigurable hardware for general-purpose computing. At the same time, Field Programmable Gate Arrays (FPGAs) have become popular for data-intensive applications because they promise customized hardware accelerators that achieve high performance with low power consumption. However, taking advantage of parallel accesses to the local memories of FPGAs remains difficult, as it currently requires application re-engineering. A solution to this challenge is PolyMem, an easy-to-use parallel memory. In this work, we investigate the implementation, integration, and performance of PolyMem…

Building High-Performance, Easy-to-Use Polymorphic Parallel Memories with HLS

Author(s): L Stornaiuolo, M Rabozzi, MD Santambrogio, D Sciuto, CB Ciobanu, G Stramondo, AL Varbanescu
Conference: IFIP/IEEE International Conference on Very Large Scale Integration-System on a Chip
Abstract: With the increased interest in energy efficiency, many application domains are experimenting with Field Programmable Gate Arrays (FPGAs), which promise customized hardware accelerators with high performance and low power consumption. These experiments are possible thanks to the development of High-Level Languages (HLLs) for FPGAs, which allow non-experts in hardware description languages (HDLs) to program reconfigurable hardware for general-purpose computing. However, some expert knowledge remains difficult to integrate into HLLs, which can lead to performance loss for HLL-based applications. One example of such a missing feature is the efficient exploitation of the local memories on FPGAs. A solution to address this challenge…

EXTRA: an open platform for reconfigurable architectures

Author(s): Cătălin Bogdan Ciobanu, Giulio Stramondo, Ana Lucia Varbanescu, Andreas Brokalakis, Antonis Nikitakis, Lorenzo Di Tucci, Marco Rabozzi, Luca Stornaiuolo, Marco Santambrogio, Grigorios Chrysos, Charalampos Vatsolakis, Charitopoulos Georgios, Dionisios Pnevmatikatos
Conference: Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation
Abstract: Reconfigurable hardware is becoming increasingly mainstream, evolving into a valid alternative to hardware accelerators based on Graphics Processing Units (GPUs). However, several major challenges remain in migrating existing software to heterogeneous reconfigurable architectures. The EXTRA project aims to develop an integrated environment for developing and programming reconfigurable architectures. The EXTRA platform enables the joint optimization of architecture, tools, and reconfiguration technology, and targets future High Performance Computing hardware nodes. In this paper, we present four innovative EXTRA technologies: (1) a hardware-software co-design framework; (2) a parallel…

On how to efficiently implement deep learning algorithms on PYNQ platform

Author(s): Luca Stornaiuolo, Marco Santambrogio, Donatella Sciuto
Conference: 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Abstract: Deep Learning algorithms are gaining momentum as main components in a large number of fields, from computer vision and robotics to finance and biotechnology. At the same time, the use of Field Programmable Gate Arrays (FPGAs) for data-intensive applications is increasingly widespread thanks to the possibility of customizing hardware accelerators and achieving high-performance implementations with low energy consumption. Moreover, FPGAs have proven to be a viable alternative to GPUs in embedded system applications, where the benefits of reconfigurability make the system more robust, able to cope with system failures and to meet the constraints of embedded devices. In this work, we present a framework to efficiently implement Deep Learning algorithms by…

FIDA: a framework to automatically integrate FPGA kernels within Data-Science applications

Author(s): Luca Stornaiuolo, Alberto Parravicini, Donatella Sciuto, Marco D Santambrogio
Conference: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Abstract: Hardware accelerators are an effective solution to increase the performance of algorithms in a wide array of disciplines, from data science to computational finance. However, data scientists and mathematicians often lack the knowledge or time required to fully exploit these accelerators, and perceive them as difficult and frustrating to use. OpenCL was created to simplify the creation of computational pipelines on heterogeneous hardware, but as of today, its integration with the high-level languages commonly used in data science is limited. In this paper, we propose a framework to integrate OpenCL kernels running on Field Programmable Gate Arrays (FPGAs) with Python, R, and MATLAB, the most common…

The role of CAD frameworks in heterogeneous FPGA-based cloud systems

Author(s): Lorenzo Di Tucci, Marco Rabozzi, Luca Stornaiuolo, Marco D Santambrogio
Conference: 2017 IEEE International Conference on Computer Design (ICCD)
Abstract: In the context of heterogeneous computing, while GPUs are the components of choice due to both their intrinsically parallel nature and their flexibility, FPGAs are being investigated and experimented with due to their superior power efficiency on selected workloads. However, the most relevant factors limiting the mainstream adoption of these devices are the lack of adequate languages, runtimes, and programming flexibility and, broadly speaking, of proven system-level approaches for FPGA-accelerated applications. In this regard, Amazon recently released…