Enrico Reggiani, Marco Rabozzi, Anna Maria Nestorov, Alberto Scolari, Luca Stornaiuolo, Marco Santambrogio
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Convolutional Neural Networks (CNNs) underpin many applications in both embedded and server-class contexts. While Graphics Processing Units (GPUs) are predominantly used for training, inference solutions often rely on Field Programmable Gate Arrays (FPGAs), since they are more flexible and cost-efficient in many scenarios. However, existing approaches fall short of reconciling several conflicting goals, such as using resources efficiently on multiple platforms while retaining deep configurability and allowing a quick Design Space Exploration (DSE) towards the best solution. This paper proposes a solution composed of highly configurable kernels designed for resource time-sharing, together with an analytical model of their resource and performance characteristics. Building on these models, we propose an Integer Linear Programming (ILP)-based approach to effectively identify Pareto-optimal kernel configurations in terms of throughput and resource consumption. We evaluate our DSE on two state-of-the-art CNNs, showing that it identifies hundreds of Pareto-optimal solutions in less than a minute. Guided by the DSE configurations for the AlexNet network, we quickly identified a candidate design for a Xilinx Virtex-7 XC7VX485T FPGA and achieved a peak throughput of one image every 4.05 ms, while we measured a maximum estimation error of 6.69% with respect to the proposed analytical models.
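
To make the notion of Pareto optimality over throughput and resource consumption concrete, the following is a minimal sketch that filters a handful of candidate configurations by brute force. It is not the ILP formulation proposed in the paper. The `Config` type, the choice of latency per image and DSP count as the two metrics, and all numbers are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str          # hypothetical label for a kernel configuration
    latency_ms: float  # time per image, lower is better
    dsp_used: int      # assumed resource metric, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.dsp_used <= b.dsp_used
    strictly_better = a.latency_ms < b.latency_ms or a.dsp_used < b.dsp_used
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up numbers for illustration only; not taken from the paper.
candidates = [
    Config("A", 10.0, 500),
    Config("B", 6.0, 900),
    Config("C", 6.5, 1200),  # dominated by B (slower and uses more DSPs)
    Config("D", 4.5, 2000),
    Config("E", 12.0, 500),  # dominated by A (slower, same DSP count)
]

front = pareto_front(candidates)
print([c.name for c in front])  # prints ['A', 'B', 'D']
```

This pairwise filter is quadratic in the number of candidates and needs every configuration enumerated up front, so it only works for small design spaces. An ILP-based search, as described in the abstract, avoids that exhaustive enumeration.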