GIANLUCA BRILLI - UniMoRe staff

GIANLUCA BRILLI

Publications

2023 - Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-based Heterogeneous SoCs [Conference Proceedings Paper]
Brilli, G.; Valente, G.; Capotondi, A.; Burgio, P.; Di Mascio, T.; Valente, P.; Marongiu, A.

2022 - An FPGA Overlay for Efficient Real-Time Localization in 1/10th Scale Autonomous Vehicles [Conference Proceedings Paper]
Bernardi, Andrea; Brilli, Gianluca; Capotondi, Alessandro; Marongiu, Andrea; Burgio, Paolo
abstract

Heterogeneous systems-on-chip (HeSoC) based on reconfigurable accelerators, such as Field-Programmable Gate Arrays (FPGA), represent an appealing option to deliver the performance/Watt required by the advanced perception and localization tasks employed in the design of Autonomous Vehicles. Different from software-programmed GPUs, FPGA development involves significant hardware design effort, which in the context of HeSoCs is further complicated by the system-level integration of HW and SW blocks. High-Level Synthesis is increasingly being adopted to ease hardware IP design, allowing engineers to quickly prototype their solutions. However, automated tools still lack the required maturity to efficiently build the complex hardware/software interaction between the host CPU and the FPGA accelerator(s). In this paper we present a fully integrated system design where a particle filter for LiDAR-based localization is efficiently deployed as FPGA logic, while the rest of the compute pipeline executes on programmable cores. This design constitutes the heart of a fully-functional 1/10th-scale racing autonomous car. In our design, accelerated IPs are controlled locally to the FPGA via a proxy core. Communication between the two and with the host CPU happens via shared memory banks also implemented as FPGA IPs. This allows for a scalable and easy-to-deploy solution both from the hardware and software viewpoint, while providing better performance and energy efficiency compared to state-of-the-art solutions.

2022 - Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs [Journal Article]
Brilli, Gianluca; Cavicchioli, Roberto; Solieri, Marco; Valente, Paolo; Marongiu, Andrea
abstract

High-performance embedded platforms are increasingly adopting heterogeneous systems-on-chip (HeSoC) that couple multi-core CPUs with accelerators such as GPU, FPGA, or AI engines. Adopting HeSoCs in the context of real-time workloads is not immediately possible, though, as contention on shared resources like the memory hierarchy—and in particular the main memory (DRAM)—causes unpredictable latency increase. To tackle this problem, both the research community and certification authorities mandate (i) that accesses from parallel threads to the shared system resources (typically, main memory) happen in a mutually exclusive manner by design, or (ii) that per-thread bandwidth regulation is enforced. Such arbitration schemes provide timing guarantees, but make poor use of the memory bandwidth available in a modern HeSoC. Controlled Memory Request Injection (CMRI) is a recently-proposed bandwidth limitation concept that builds on top of a mutually-exclusive schedule but still allows the threads currently not entitled to access memory to use as much of the unused bandwidth as possible without losing the timing guarantee. CMRI has been discussed in the context of a multi-core CPU, but the same principle applies also to a more complex system such as an HeSoC. In this article, we introduce two CMRI schemes suitable for HeSoCs: Voluntary Throttling via code refactoring and Bandwidth Regulation via dynamic throttling. We extensively characterize a proof-of-concept incarnation of both schemes on two HeSoCs: an NVIDIA Tegra TX2 and a Xilinx UltraScale+, highlighting the benefits and the costs of CMRI for synthetic workloads that model worst-case DRAM access. We also test the effectiveness of CMRI with real benchmarks, studying the effect of interference among the host CPU and the accelerators.

2022 - Understanding and Mitigating Memory Interference in FPGA-based HeSoCs [Conference Proceedings Paper]
Brilli, G.; Capotondi, A.; Burgio, P.; Marongiu, A.
abstract

Like most high-end embedded systems, FPGA-based systems-on-chip (SoC) are increasingly adopting heterogeneous designs, where CPU cores, the configurable logic and other ICs all share the interconnect and the main memory (DRAM) controller. This paradigm is scalable and reduces production costs and time-to-market, but creates resource contention issues, which ultimately affect the programs' timing. This problem has been widely studied on CPU- and GPU-based systems, along with strategies to mitigate such effects, but little has been done so far to systematically study the problem on FPGA-based SoCs. This work provides an in-depth analysis of memory interference on such systems, targeting two state-of-the-art commercial FPGA SoCs. We also discuss architectural support for Controlled Memory Request Injection (CMRI), a technique that has proven effective at reducing the bandwidth under-utilization implied by naive schemes that solve the interference problem by only allowing mutually exclusive access to the shared resources. Our experimental results show that: i) memory interference can slow down CPU tasks by up to 16× in the tested FPGA-based SoCs; ii) CMRI makes it possible to exploit more than 40% of the memory bandwidth available to FPGA accelerators (normally completely unused in PREM-like schemes), while keeping the slowdown due to interference below 10%.

2020 - A Systematic Assessment of Embedded Neural Networks for Object Detection [Conference Proceedings Paper]
Verucchi, M.; Brilli, G.; Sapienza, D.; Verasani, M.; Arena, M.; Gatti, F.; Capotondi, A.; Cavicchioli, R.; Bertogna, M.; Solieri, M.
abstract

Object detection is arguably one of the most important and complex tasks to enable the advent of next-generation autonomous systems. Recent advancements in deep learning techniques allowed a significant improvement in detection accuracy and latency of modern neural networks, allowing their adoption in automotive, avionics and industrial embedded systems, where performance must be delivered within size, weight and power constraints. Multiple benchmarks and surveys exist to compare state-of-the-art detection networks, profiling important metrics, like precision, latency and power efficiency on Commercial-off-the-Shelf (COTS) embedded platforms. However, we observed a fundamental lack of fairness in the existing comparisons, with a number of implicit assumptions that may significantly bias the metrics of interest. This includes using heterogeneous settings for the input size, training dataset, threshold confidences, and, most importantly, platform-specific optimizations, which are especially important when assessing latency and energy-related values. The lack of uniform comparisons is mainly due to the significant effort required to re-implement network models, whenever openly available, on the specific platforms, to properly configure the available acceleration engines for optimizing performance, and to re-train the model using a homogeneous dataset. This paper aims at filling this gap, providing a comprehensive and fair comparison of the best-in-class Convolutional Neural Networks (CNNs) for real-time embedded systems, detailing the effort made to achieve an unbiased characterization on cutting-edge systems-on-chip. Multi-dimensional trade-offs are explored for achieving a proper configuration of the available programmable accelerators for neural inference, adopting the best available software libraries. To stimulate the adoption of fair benchmarking assessments, the framework is released to the public in an open source repository.

2019 - An open source research framework for IoT-capable smart traffic lights [Conference Proceedings Paper]
Brilli, G.; Burgio, P.
abstract

Recent technological advances are completely reshaping the way we build our cities and the way we enjoy them. Future smart cities will employ a number of smart sensors that work cooperatively to deliver advanced services that improve security and quality of life. The capability of deploying and testing such technologies directly in the field is paramount to research, but it comes with significant effort in terms of time and cost. For this reason, we introduce an open-source design framework for highly connected smart sensors, and we implement it in an advanced traffic light controller, providing a single component that supports researchers and engineers from the earliest stages of laboratory development through to research and testing in the field.

2018 - Convolutional Neural Networks on Embedded Automotive Platforms: A Qualitative Comparison [Conference Proceedings Paper]
Brilli, Gianluca; Burgio, Paolo; Bertogna, Marko
abstract

In the last decade, the rise of power-efficient, heterogeneous embedded platforms paved the way to the effective adoption of neural networks in several application domains. In particular, many-core accelerators (e.g., GPUs and FPGAs) are used to run Convolutional Neural Networks, e.g., in autonomous vehicles and Industry 4.0. At the same time, advanced research on neural networks is producing interesting results in computer vision applications, and NN packages for computer vision object detection and categorization such as YOLO, GoogleNet and AlexNet reached an unprecedented level of accuracy and performance. With this work, we aim at validating the effectiveness and efficiency of the most recent networks on state-of-the-art embedded platforms, with commercial-off-the-shelf Systems-on-Chip such as the NVIDIA Tegra X2 and Xilinx UltraScale+. In our vision, this work will support the choice of the most appropriate CNN package and computing system, and at the same time try to “make some order” in the field.

Università degli studi di Modena e Reggio Emilia
