Nuova ricerca

Marko BERTOGNA

Professore Ordinario
Dipartimento di Scienze Fisiche, Informatiche e Matematiche sede ex-Matematica


Home | Curriculum(pdf) | Didattica |


Pubblicazioni

2022 - Message from the Chairs [Relazione in Atti di Convegno]
Bertogna, M.; Maggio, M.
abstract


2022 - Preface [Relazione in Atti di Convegno]
Bertogna, M.; Terraneo, F.; Reghenzani, F.
abstract


2022 - Real-Time Requirements for ADAS Platforms Featuring Shared Memory Hierarchies [Articolo su rivista]
Capodieci, Nicola; Burgio, Paolo; Cavicchioli, Roberto; Olmedo, Ignacio Sanudo; Solieri, Marco; Bertogna, Marko
abstract


2021 - All you can embed: Natural language based vehicle retrieval with spatio-temporal transformers [Relazione in Atti di Convegno]
Scribano, C.; Sapienza, D.; Franchini, G.; Verucchi, M.; Bertogna, M.
abstract

Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.


2021 - Performance modeling of heterogeneous HW platforms [Articolo su rivista]
Rehm, F.; Dasari, D.; Hamann, A.; Pressler, M.; Ziegenbein, D.; Seitter, J.; Sanudo, I.; Capodieci, N.; Burgio, P.; Bertogna, M.
abstract

The push towards automated and connected driving functionalities mandates the use of heterogeneous hardware platforms in order to provide the required computational resources. For these platforms, established methods for performance modeling in industry are no longer effective or adequate. In this paper, we explore the detailed problem of mapping a prototypical autonomous driving application on a Nvidia Tegra X2 platform while considering different constraints of the application, including end-to-end latencies of event chains spanning CPU and GPU boundaries. With the given use-case and platform, we propose modeling concepts in Amalthea, capturing the architectural aspects of heterogeneous platforms and also the execution structure of the application. These models can be fed into appropriate tools to predict performance properties. We proposed the above problem in the Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS) Industrial Challenge 2019 and in response, academicians came up with different solutions. In this paper, we evaluate these different solutions and summarize all approaches. The lesson learned from this challenge is then used to improve on the simplifying assumptions we made in our original formulation and discuss future modeling extensions.


2021 - Preface [Relazione in Atti di Convegno]
Bertogna, M.; Terraneo, F.
abstract


2021 - SPHERE: A Multi-SoC Architecture for Next-Generation Cyber-Physical Systems Based on Heterogeneous Platforms [Articolo su rivista]
Biondi, A.; Casini, D.; Cicero, G.; Borgioli, N.; Buttazzo, G.; Patti, G.; Leonardi, L.; Bello, L. L.; Solieri, M.; Burgio, P.; Olmedo, I. S.; Ruocco, A.; Palazzi, L.; Bertogna, M.; Cilardo, A.; Mazzocca, N.; Mazzeo, A.
abstract

This paper presents SPHERE, a project aimed at the realization of an integrated framework to abstract the hardware complexity of interconnected, modern system-on-chips (SoC) and simplify the management of their heterogeneous computational resources. The SPHERE framework leverages hypervisor technology to virtualize computational resources and isolate the behavior of different subsystems running on the same platform, while providing safety, security, and real-time communication mechanisms. The main challenges addressed by SPHERE are discussed in the paper along with a set of new technologies developed in the context of the project. They include isolation mechanisms for mixed-criticality applications, predictable I/O virtualization, the management of time-sensitive networks with heterogeneous traffic flows, and the management of field-programmable gate arrays (FPGA) to provide efficient implementations for cryptography modules, as well as hardware acceleration for deep neural networks. The SPHERE architecture is validated through an autonomous driving use-case.


2021 - The HPC-DAG Task Model for Heterogeneous Real-Time Systems [Articolo su rivista]
Houssam-Eddine, Z.; Capodieci, N.; Cavicchioli, R.; Lipari, G.; Bertogna, M.
abstract

Recent commercial hardware platforms for embedded real-time systems feature heterogeneous processing units and computing accelerators on the same System-on-Chip. When designing complex real-time applications for such architectures, the designer is exposed to a number of difficult choices, like deciding on which compute engine to execute a certain task, or what degree of parallelism to adopt for a given function. To help the designer exploring the wide space of design choices and tune the scheduling parameters, we propose a novel real-time application model, called HPC-DAG (Heterogeneous Parallel Condition Directed Acyclic Graph Model), specifically conceived for heterogeneous platforms. An HPC-DAG allows the system designer to specify alternative implementations of a software component for different processing engines, as well as conditional branches to model if-then-else statements. We also propose a schedulability analysis for the HPC-DAG model and a set of heuristic allocation algorithms aimed at improving schedulability for latency sensitive applications. Our analysis takes into account the cost of preempting a task, which can be non-negligible on certain processors. We show the use of our approach on a realistic case study, and we demonstrate its effectiveness by comparing it with state-of-the-art algorithms previously proposed in literature.


2021 - The Predictable Execution Model in Practice: Compiling Real Applications for COTS Hardware [Articolo su rivista]
Forsberg, B.; Solieri, M.; Bertogna, M.; Benini, L.; Marongiu, A.
abstract

Adoption of multi- and many-core processors in real-time systems has so far been slowed down, if not totally barred, due do the difficulty in providing analytical real-time guarantees on worst-case execution times. The Predictable Execution Model (PREM) has been proposed to solve this problem, but its practical support requires significant code refactoring, a task better suited for a compilation tool chain than human programmers. Implementing a PREM compiler presents significant challenges to conform to PREM requirements, such as guaranteed upper bounds on memory footprint and the generation of efficient schedulable non-preemptive regions. This article presents a comprehensive description on how a PREM compiler can be implemented, based on several years of experience from the community. We provide accumulated insights on how to best balance conformance to real-time requirements and performance and present novel techniques that extend the applicability from simple benchmark suites to real-world applications. We show that code transformed by the PREM compiler enables timing predictable execution on modern commercial off-the-shelf hardware, providing novel insights on how PREM can protect 99.4% of memory accesses on random replacement policy caches at only 16% performance loss on benchmarks from the PolyBench benchmark suite. Finally, we show that the requirements imposed on the programming model are well-aligned with current coding guidelines for timing critical software, promoting easy adoption.


2020 - A Systematic Assessment of Embedded Neural Networks for Object Detection [Relazione in Atti di Convegno]
Verucchi, M.; Brilli, G.; Sapienza, D.; Verasani, M.; Arena, M.; Gatti, F.; Capotondi, A.; Cavicchioli, R.; Bertogna, M.; Solieri, M.
abstract

Object detection is arguably one of the most important and complex tasks to enable the advent of next-generation autonomous systems. Recent advancements in deep learning techniques allowed a significant improvement in detection accuracy and latency of modern neural networks, allowing their adoption in automotive, avionics and industrial embedded systems, where performances are required to meet size, weight and power constraints.Multiple benchmarks and surveys exist to compare state-of-the-art detection networks, profiling important metrics, like precision, latency and power efficiency on Commercial-off-the-Shelf (COTS) embedded platforms. However, we observed a fundamental lack of fairness in the existing comparisons, with a number of implicit assumptions that may significantly bias the metrics of interest. This includes using heterogeneous settings for the input size, training dataset, threshold confidences, and, most importantly, platform-specific optimizations, that are especially important when assessing latency and energy-related values. The lack of uniform comparisons is mainly due to the significant effort required to re-implement network models, whenever openly available, on the specific platforms, to properly configure the available acceleration engines for optimizing performance, and to re-train the model using a homogeneous dataset.This paper aims at filling this gap, providing a comprehensive and fair comparison of the best-in-class Convolution Neural Networks (CNNs) for real-time embedded systems, detailing the effort made to achieve an unbiased characterization on cutting-edge system-on-chips. Multi-dimensional trade-offs are explored for achieving a proper configuration of the available programmable accelerators for neural inference, adopting the best available software libraries. To stimulate the adoption of fair benchmarking assessments, the framework is released to the public in an open source repository.


2020 - BAO: A lightweight static partitioning hypervisor for modern multi-core embedded systems [Relazione in Atti di Convegno]
Martins, J.; Tavares, A.; Solieri, M.; Bertogna, M.; Pinto, S.
abstract

Given the increasingly complex and mixed-criticality nature of modern embedded systems, virtualization emerges as a natural solution to achieve strong spatial and temporal isolation. Widely used hypervisors such as KVM and Xen were not designed having embedded constraints and requirements in mind. The static partitioning architecture pioneered by Jailhouse seems to address embedded concerns. However, Jailhouse still depends on Linux to boot and manage its VMs. In this paper, we present the Bao hypervisor, a minimal, standalone and clean-slate implementation of the static partitioning architecture for Armv8 and RISC-V platforms. Preliminary results regarding size, boot, performance, and interrupt latency, show this approach incurs only minimal virtualization overhead. Bao will soon be publicly available, in hopes of engaging both industry and academia on improving Bao's safety, security, and real-time guarantees.


2020 - Contending memory in heterogeneous SoCs: Evolution in NVIDIA Tegra embedded platforms [Relazione in Atti di Convegno]
Capodieci, N.; Cavicchioli, R.; Olmedo, I. S.; Solieri, M.; Bertogna, M.
abstract

Modern embedded platforms are known to be constrained by size, weight and power (SWaP) requirements. In such contexts, achieving the desired performance-per-watt target calls for increasing the number of processors rather than ramping up their voltage and frequency. Hence, generation after generation, modern heterogeneous System on Chips (SoC) present a higher number of cores within their CPU complexes as well as a wider variety of accelerators that leverages massively parallel compute architectures. Previous literature demonstrated that while increasing parallelism is theoretically optimal for improving on average performance, shared memory hierarchies (i.e. caches and system DRAM) act as a bottleneck by exposing the platform processors to severe contention on memory accesses, hence dramatically impacting performance and timing predictability. In this work we characterize how subsequent generations of embedded platforms from the NVIDIA Tegra family balanced the increasing parallelism of each platform's processors with the consequent higher potential on memory interference. We also present an open-source software for generating test scenarios aimed at measuring memory contention in highly heterogeneous SoCs.


2020 - Dissecting the CUDA scheduling hierarchy: A Performance and Predictability Perspective [Relazione in Atti di Convegno]
Olmedo, I. S.; Capodieci, N.; Martinez, J. L.; Marongiu, A.; Bertogna, M.
abstract

Over the last few years, the ever-increasing use of Graphic Processing Units (GPUs) in safety-related domains has opened up many research problems in the real-time community. The closed and proprietary nature of the scheduling mechanisms deployed in NVIDIA GPUs, for instance, represents a major obstacle in deriving a proper schedulability analysis for latency-sensitive applications. Existing literature addresses these issues by either (i) providing simplified models for heterogeneous CPUGPU systems and their associated scheduling policies, or (ii) providing insights about these arbitration mechanisms obtained through reverse engineering. In this paper, we take one step further by correcting and consolidating previously published assumptions about the hierarchical scheduling policies of NVIDIA GPUs and their proprietary CUDA application programming interface. We also discuss how such mechanisms evolved with recently released GPU micro-architectures, and how such changes influence the scheduling models to be exploited by real-time system engineers.


2020 - End-to-end latency characterization of task communication models for automotive systems [Articolo su rivista]
Martinez, J.; Sanudo, I.; Bertogna, M.
abstract

Different communication models have been historically adopted in the automotive domain for allowing concurrent tasks to coordinate, interact and synchronize on a shared memory system for implementing complex functions, while ensuring data consistency and/or time determinism. To this extent, most automotive OSs provide inter-task communication and synchronization mechanisms based upon memory-sharing paradigms, where variables modified by one task may be concurrently accessed also by other tasks. A so-called “effect chain” is created when the effect of an initial event is propagated to an actuation signal through sequences of tasks writing/reading shared variables. The responsiveness, performance and stability of the control algorithms of an automotive application typically depend on the propagation delays of selected effect chains. Depending on the communication model adopted, the propagation delay of an effect chain may significantly vary, as may be the resulting overhead and memory footprint. In this paper, we explore the trade-offs between three communication models that are widely adopted for industrial automotive systems, namely, Explicit, Implicit, and Logical Execution Time (LET). A timing and schedulability analysis is provided for tasks scheduled following a mixed preemptive configuration, as specified in the AUTOSAR model. An end-to-end latency characterization is then proposed, deriving different latency metrics for effect chains under each one of the considered models. The results are compared against an industrial case study consisting of an automotive engine control system.


2020 - Evaluating Controlled Memory Request Injection to Counter PREM Memory Underutilization [Relazione in Atti di Convegno]
Cavicchioli, R.; Capodieci, N.; Solieri, M.; Bertogna, M.; Valente, P.; Marongiu, A.
abstract

Modern heterogeneous systems-on-chip (HeSoC) feature high-performance multi-core CPUs tightly integrated with data-parallel accelerators. Such HeSoCS heavily rely on shared resources, which hinder their adoption in the context of Real-Time systems. The predictable execution model (PREM) has proven effective at preventing uncontrolled execution time lengthening due to memory interference in HeSoC sharing main memory (DRAM). However, PREM only allows one task at a time to access memory, which inherently under-utilizes the available memory bandwidth in modern HeSoCs. In this paper, we conduct a thorough experimental study aimed at assessing the potential benefits of extending PREM so as to inject controlled amounts of memory requests coming from other tasks than the one currently granted exclusive DRAM access. Focusing on a state-of-the-art HeSoC, the NVIDIA TX2, we extensively characterize the relation between the injected bandwidth and the latency experienced by the task under test. The results confirm that for various types of workload it is possible to exploit the available bandwidth much more efficiently than standard PREM arbitration, often close to its maximum, while keeping latency inflation below 10%. We discuss possible practical implementation directions, highlighting the expected benefits and technical challenges.


2020 - Exact response time analysis of fixed priority systems based on sporadic servers [Articolo su rivista]
Martinez, J.; Dasari, D.; Hamann, A.; Sanudo, I.; Bertogna, M.
abstract

Real-time servers have been widely explored in the scheduling literature to predictably execute aperiodic activities, as well as to allow hierarchical scheduling settings. As they facilitate timing isolation between different software components, there is a renewed interest for the adoption of fixed priority real-time servers in the automotive domain, as a way to implement more efficient reservation mechanisms than TDMA-based methods. In this paper, we focus on the Sporadic Server, which is the only fixed priority server supported by the POSIX standard. Despite its popularity, we realized that only sufficient schedulability conditions exist for real-time systems scheduled with a Sporadic Server. Thus, we develop a formal characterization of an exact response time analysis for fixed priority systems based on Sporadic Servers in a multi-level scheduling setting under preemptive scheduling. We then provide an experimental characterization of the schedulabilty improvement that can be obtained with respect to existing sufficient schedulability tests, proving the effectiveness of the proposed exact analysis.


2020 - Fixed-priority memory-centric scheduler for COTS-based multiprocessors [Relazione in Atti di Convegno]
Schwaricke, G.; Kloda, T.; Gracioli, G.; Bertogna, M.; Caccamo, M.
abstract

Memory-centric scheduling attempts to guarantee temporal predictability on commercial-off-the-shelf (COTS) multiprocessor systems to exploit their high performance for real-time applications. Several solutions proposed in the real-time literature have hardware requirements that are not easily satisfied by modern COTS platforms, like hardware support for strict memory partitioning or the presence of scratchpads. However, even without said hardware support, it is possible to design an efficient memory-centric scheduler. In this article, we design, implement, and analyze a memory-centric scheduler for deterministic memory management on COTS multiprocessor platforms without any hardware support. Our approach uses fixed-priority scheduling and proposes a global “memory preemption” scheme to boost real-time schedulability. The proposed scheduling protocol is implemented in the Jailhouse hypervisor and Erika real-time kernel. Measurements of the scheduler overhead demonstrate the applicability of the proposed approach, and schedulability experiments show a 20% gain in terms of schedulability when compared to contention-based and static fair-share approaches.


2020 - Latency-Aware Generation of Single-Rate DAGs from Multi-Rate Task Sets [Relazione in Atti di Convegno]
Verucchi, M.; Theile, M.; Caccamo, M.; Bertogna, M.
abstract

Modern automotive and avionics embedded systems integrate several functionalities that are subject to complex timing requirements. A typical application in these fields is composed of sensing, computation, and actuation. The ever increasing complexity of heterogeneous sensors implies the adoption of multi-rate task models scheduled onto parallel platforms. Aspects like freshness of data or first reaction to an event are crucial for the performance of the system. The Directed Acyclic Graph (DAG) is a suitable model to express the complexity and the parallelism of these tasks. However, deriving age and reaction timing bounds is not trivial when DAG tasks have multiple rates. In this paper, a method is proposed to convert a multi-rate DAG task-set with timing constraints into a single-rate DAG that optimizes schedulability, age and reaction latency, by inserting suitable synchronization constructs. An experimental evaluation is presented for an autonomous driving benchmark, validating the proposed approach against state-of-the-art solutions.


2020 - Preface [Relazione in Atti di Convegno]
Bertogna, M.; Terraneo, F.
abstract


2020 - Preface [Relazione in Atti di Convegno]
Bertogna, M.; Volp, M.
abstract


2020 - Real-Time Virtualization for Industrial Automation [Relazione in Atti di Convegno]
Scordino, C.; Savino, I. M.; Cuomo, L.; Miccio, L.; Tagliavini, A.; Bertogna, M.; Solieri, M.
abstract

The industry has recently shown a growing interest in running non-critical activities (e.g., design tools) together with real-time control tasks on open-source COTS platforms.In this paper, we present a modular platform for industrial automation based on open-source components. Thanks to the isolation provided by a hypervisor, the same platform can run both the real-time control code and the design tools, reducing the overall hardware costs.Besides illustrating the software architecture and describing the various open-source components, we illustrate extensions needed for convenient applicability, and we address the problem of interference on hardware resources shared by multiple operating systems, which undermines the performance of the real-time control. A set of experimental results show the effectiveness and the performance of the proposed solution.


2020 - Real-Time clustering and LiDAR-camera fusion on embedded platforms for self-driving cars [Relazione in Atti di Convegno]
Verucchi, M.; Bartoli, L.; Bagni, F.; Gatti, F.; Burgio, P.; Bertogna, M.
abstract

3D object detection and classification are crucial tasks for perception in Autonomous Driving (AD). To promptly and correctly react to environment changes and avoid hazards, it is of paramount importance to perform those operations with high accuracy and in real-time. One of the most widely adopted strategies to improve the detection precision is to fuse information from different sensors, like e.g. cameras and LiDAR. However, sensor fusion is a computationally intensive task, that may be difficult to execute in real-time on an embedded platforms. In this paper, we present a new approach for LiDAR and camera fusion, that can be suitable to execute within the tight timing requirements of an autonomous driving system. The proposed method is based on a new clustering algorithm developed for the LiDAR point cloud, a new technique for the alignment of the sensors, and an optimization of the Yolo-v3 neural network. The efficiency of the proposed method is validated comparing it against state-of-the-art solutions on commercial embedded platforms.


2020 - The Key Role of Memory in Next-Generation Embedded Systems for Military Applications [Relazione in Atti di Convegno]
Sañudo, Ignacio; Cortimiglia, Paolo; Miccio, Luca; Solieri, Marco; Burgio, Paolo; Di Biagio, Christian; Felici, Franco; Nuzzo, Giovanni; Bertogna, Marko
abstract

With the increasing use of multi-core platforms in safety-related domains, aircraft system integrators and authorities exhibit a concern about the impact of concurrent access to shared-resources in the Worst-Case Execution Time (WCET). This paper highlights the need for accurate memory-centric scheduling mechanisms for guaranteeing prioritized memory accesses to Real-Time safety-related components of the system. We implemented a software technique called cache coloring that demonstrates that isolation at timing and spatial level can be achieved by managing the lines that can be evicted in the cache. In order to show the effectiveness of this technique, the timing properties of a real application are considered as a use case, this application is made of parallel tasks that show different trade-offs between computation and memory loads.


2019 - Deadline-Based Scheduling for GPU with Preemption Support [Relazione in Atti di Convegno]
Capodieci, N.; Cavicchioli, R.; Bertogna, M.; Paramakuru, A.
abstract

Modern automotive-grade embedded computing platforms feature high-performance Graphics Processing Units (GPUs) to support the massively parallel processing power needed for next-generation autonomous driving applications (e.g., Deep Neural Network (DNN) inference, sensor fusion, path planning, etc). As these workload-intensive activities are pushed to higher criticality levels, there is a stronger need for more predictable scheduling algorithms that are able to guarantee predictability without overly sacrificing GPU utilization. Unfortunately, the real-rime literature on GPU scheduling mostly considered limited (or null) preemption capabilities, while previous efforts in broader domains were often based on programming models and APIs that were not designed to support the real-rime requirements of recurring workloads. In this paper, we present the design of a prototype real-time scheduler for GPU activities on an embedded System on a Chip (SoC) featuring a cutting edge GPU architecture by NVIDIA adopted in the autonomous driving domain. The scheduler runs as a software partition on top of the NVIDIA hypervisor, and it leverages latest generation architectural features, such as pixel-level preemption and thread level preemption. Such a design allowed us to implement and test a preemptive Earliest Deadline First (EDF) scheduler for GPU tasks providing bandwidth isolations by means of a Constant Bandwidth Server (CBS). Our work involved investigating alternative programming models for compute APIs, allowing us to characterize CPU-to-GPU command submission with more detailed scheduling information. A detailed experimental characterization is presented to show the significant schedulability improvement of recurring real-time GPU tasks.


2019 - Deterministic memory hierarchy and virtualization for modern multi-core embedded systems [Relazione in Atti di Convegno]
Kloda, Tomasz; Solieri, M.; Mancuso, R.; Capodieci, N.; Valente, P.; Bertogna, M.
abstract

One of the main predictability bottlenecks of modern multi-core embedded systems is contention for access to shared memory resources. Partitioning and software-driven allocation of memory resources is an effective strategy to mitigate contention in the memory hierarchy. Unfortunately, however, many of the strategies adopted so far can have unforeseen side-effects when practically implemented latest-generation, high-performance embedded platforms. Predictability is further jeopardized by cache eviction policies based on random replacement, targeting average performance instead of timing determinism.In this paper, we present a framework of software-based techniques to restore memory access determinism in high-performance embedded systems. Our approach leverages OS-transparent and DMA-friendly cache coloring, in combination with an invalidation-driven allocation (IDA) technique. The proposed method allows protecting important cache blocks from (i) external eviction by tasks concurrently executing on different cores, and (ii) internal eviction by tasks running on the same core. A working implementation obtained by extending the Jailhouse partitioning hypervisor is presented and evaluated with a combination of synthetic and real benchmarks.


2019 - Novel methodologies for predictable CPU-to-GPU command offloading [Relazione in Atti di Convegno]
Cavicchioli, R.; Capodieci, N.; Solieri, Marco; Bertogna, M.
abstract

There is an increasing industrial and academic interest towards a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU for submitting GP-GPU operations. We will show that the impact of CPU-to-GPU kernel submissions may be indeed relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows an improved control of GPU command submissions. We will show that this added control may significantly improve the application performance and predictability due to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous Real-Time systems. Our findings are evaluated on a latest generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity.


2019 - System Performance Modelling of Heterogeneous HW Platforms: An Automated Driving Case Study [Relazione in Atti di Convegno]
Wurst, F.; Dasari, D.; Hamann, A.; Ziegenbein, D.; Sanudo, I.; Capodieci, N.; Bertogna, M.; Burgio, P.
abstract

The push towards automated and connected driving functionalities mandates the use of heterogeneous HW platforms in order to provide the required computational resources. For these platforms, the established methods for performance modelling in industry are no longer effective. In this paper, we propose an initial modelling concept for heterogeneous platforms which can then be fed into appropriate tools to derive effective performance predictions. The approach is demonstrated for a prototypical automated driving application on the Nvidia Tegra X2 platform.


2019 - The Parallel Multi-Mode Digraph Task Model for Energy-Aware Real-Time Heterogeneous Multi-Core Systems [Articolo su rivista]
Zahaf, Houssam-Eddine; Lipari, Giuseppe; Bertogna, Marko; Boulet, Pierre
abstract

Many task models have been proposed to express and analyze the behavior of real-time applications at different levels of precision. Most of them target sequential applications with no support for parallelism. The digraph task model is one of the most general ones, as it allows modeling arbitrary directed graphs (digraphs) for sequential job releases. In this paper, we extend the digraph task model to support intra-task parallelism. For the proposed parallel multi-mode digraph model, we derive sufficient schedulability tests and a dichotomic search to improve the test pessimism for a set of n tasks onto a heterogeneous single-ISA multi-core platform. To reduce the computational complexity of the schedulability test, we also propose heuristics for (i) partitioning parallel digraph tasks onto the heterogeneous cores, and (ii) assigning core operating frequencies to reduce the overall energy consumption, while meeting real-time constraints. The effectiveness of the proposed approach is validated with an exhaustive set of simulations.


2018 - A design flow for supporting component-based software development in multiprocessor real-time systems [Articolo su rivista]
Biondi, A.; Buttazzo, G.; Bertogna, M.
abstract

Component-based software development established as an effective technique to cope with the increasing complexity of modern computing systems. In the context of real-time systems, the M-BROE framework has been recently proposed to efficiently support component-based development of real-time applications on multiprocessor platforms in the presence of shared resources. The framework relies on a two-stage approach where software components are first partitioned upon a virtual multiprocessor platform and are later integrated upon the physical platform by means of component interfaces that abstract from the internal details of the applications. This work presents a complete design flow for the M-BROE framework. Starting from a model of software components, a first method is proposed to partition applications to virtual processors and perform a synthesis of multiple component interfaces. Then, a second method is proposed to support the integration of the components by allocating virtual processors to physical processors. Both methods take resource sharing into account. Experimental results are also presented to evaluate the proposed methodology.


2018 - A survey on shared disk I/O management in virtualized environments under real time constraints [Articolo su rivista]
Sanudo Olmedo, Ignacio; Cavicchioli, Roberto; Capodieci, Nicola; Valente, Paolo; Bertogna, Marko
abstract

In the embedded systems domain, hypervisors are increasingly being adopted to guarantee timing isolation and appropriate hardware resource sharing among different software components. However, managing concurrent and parallel requests to shared hardware resources in a predictable way still represents an open issue. We argue that hypervisors can be an effective means to achieve an efficient and predictable arbitration of competing requests to shared devices in order to satisfy real-time requirements. As a representative example, we consider the case for mass storage (I/O) devices like Hard Disk Drives (HDD) and Solid State Disks (SSD), whose access times are orders of magnitude higher than those of central memory and CPU caches, therefore having a greater impact on overall task delays. We provide a comprehensive and up-to-date survey of the literature on I/O management within virtualized environments, focusing on software solutions proposed in the open source community, and discussing their main limitations in terms of realtime performance. Then, we discuss how the research in this subject may evolve in the future, highlighting the importance of techniques that are focused on scheduling not uniquely the processing bandwidth, but also the access to other important shared resources, like I/O devices.


2018 - Analytical Characterization of End-to-End Communication Delays With Logical Execution Time [Articolo su rivista]
Martinez, J.; Sanudo, I.; Bertogna, M.
abstract


2018 - Big data analytics for smart cities: The H2020 CLASS project [Relazione in Atti di Convegno]
Quinones, E.; Bertogna, M.; Hadad, E.; Ferrer, A. J.; Chiantore, L.; Reboa, A.
abstract


2018 - Convolutional Neural Networks on Embedded Automotive Platforms: A Qualitative Comparison [Relazione in Atti di Convegno]
Brilli, Gianluca; Burgio, Paolo; Bertogna, Marko
abstract

In the last decade, the rise of power-efficient, het- erogeneous embedded platforms paved the way to the effective adoption of neural networks in several application domains. Especially, many-core accelerators (e.g., GPUs and FPGAs) are used to run Convolutional Neural Networks, e.g., in autonomous vehicles, and industry 4.0. At the same time, advanced research on neural networks is producing interesting results in computer vision applications, and NN packages for computer vision object detection and categorization such as YOLO, GoogleNet and AlexNet reached an unprecedented level of accuracy and perfor- mance. With this work, we aim at validating the effectiveness and efficiency of most recent networks on state-of-the-art embedded platforms, with commercial-off-the-shelf System-on-Chips such as the NVIDIA Tegra X2 and Xilinx Ultrascale+. In our vision, this work will support the choice of the most appropriate CNN package and computing system, and at the same time tries to “make some order” in the field.


2018 - High-performance and time-predictable embedded computing [Curatela]
Pinho, L. M.; Quinones, E.; Bertogna, M.; Marongiu, A.; Nelis, V.; Gai, P.; Sancho, J.
abstract

Nowadays, the prevalence of computing systems in our lives is so ubiquitous that we live in a cyber-physical world dominated by computer systems, from pacemakers to cars and airplanes. These systems demand for more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside of the required timing bounds may cause the failure of the system, being vital for systems like planes, cars, business monitoring, e-trading, etc. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices which are able to deliver high-performance whilst guaranteeing the application required timing bounds. Technical topics discussed in the book include: Parallel embedded platforms, Programming models, Mapping and scheduling of parallel computations, Timing and schedulability analysis, Runtimes and operating systems. The work reflected in this book was done in the scope of the European project P SOCRATES, funded under the FP7 framework program of the European Commission. High-performance and time-predictable embedded computing is ideal for personnel in computer/communication/embedded industries as well as academic staff and master/research students in computer science, embedded systems, cyber-physical systems and internet-of-things.


2018 - Introduction [Capitolo/Saggio]
Pinho, L. M.; Quinones, E.; Bertogna, M.; Marongiu, A.; Nelis, V.; Gai, P.; Sancho, J.
abstract

This chapter provides an overview of the book theme, motivating the need for high-performance and time-predictable embedded computing. It describes the challenges introduced by the need for time-predictability on the one hand, and high-performance on the other, discussing on a high level how these contradictory requirements can be simultaneously supported.


2018 - Mapping, scheduling, and schedulability analysis [Capitolo/Saggio]
Burgio, P.; Bertogna, M.; Melani, A.; Quinones, E.; Serrano, M. A.
abstract

This chapter presents how the P-SOCRATES framework addresses the issue of scheduling multiple real-time tasks (RT tasks), made of multiple and concurrent non-preemptable task parts. In its most generic form, the scheduling problem in the architectural framework is a dual problem: scheduling task-to-threads, and scheduling thread-to-core replication.


2018 - Preface [Capitolo/Saggio]
Pinho, L. M.; Quinones, E.; Bertogna, M.; Marongiu, A.; Nelis, V.; Gai, P.; Sancho, J.
abstract


2018 - Work-in-Progress: NVIDIA GPU Scheduling Details in Virtualized Environments [Relazione in Atti di Convegno]
Capodieci, Nicola; Cavicchioli, Roberto; Bertogna, Marko
abstract

Modern automotive grade embedded platforms feature high performance Graphics Processing Units (GPUs) to support the massively parallel processing power needed for next-generation autonomous driving applications. Hence, a GPU scheduling approach with strong Real-Time guarantees is needed. While previous research efforts focused on reverse engineering the GPU ecosystem in order to understand and control GPU scheduling on NVIDIA platforms, we provide an in depth explanation of the NVIDIA standard approach to GPU application scheduling on a Drive PX platform. Then, we discuss how a privileged scheduling server can be used to enforce arbitrary scheduling policies in a virtualized environment.


2017 - A software stack for next-generation automotive systems on many-core heterogeneous platforms [Articolo su rivista]
Burgio, Paolo; Bertogna, Marko; Capodieci, Nicola; Cavicchioli, Roberto; Sojka, Michal; Houdek, Přemysl; Marongiu, Andrea; Gai, Paolo; Scordino, Claudio; Morelli, Bruno
abstract

The next-generation of partially and fully autonomous cars will be powered by embedded many-core platforms. Technologies for Advanced Driver Assistance Systems (ADAS) need to process an unprecedented amount of data within tight power budgets, making those platform the ideal candidate architecture. Integrating tens-to-hundreds of computing elements that run at lower frequencies allows obtaining impressive performance capabilities at a reduced power consumption, that meets the size, weight and power (SWaP) budget of automotive systems. Unfortunately, the inherent architectural complexity of many-core platforms makes it almost impossible to derive real-time guarantees using “traditional” state-of-the-art techniques, ultimately preventing their adoption in real industrial settings. Having impressive average performances with no guaranteed bounds on the response times of the critical computing activities is of little if no use in safety-critical applications. Project Hercules will address this issue, and provide the required technological infrastructure to exploit the tremendous potential of embedded many-cores for the next generation of automotive systems. This work gives an overview of the integrated Hercules software framework, which allows achieving an order-of-magnitude of predictable performance on top of cutting-edge Commercial-Off-The-Shelf components (COTS). The proposed software stack will let both real-time and non real-time application coexist on next-generation, power-efficient embedded platforms, with preserved timing guarantees.


2017 - A static scheduling approach to enable safety-critical OpenMP applications [Relazione in Atti di Convegno]
Melani, A.; Serrano, M. A.; Bertogna, M.; Cerutti, I.; Quinones, E.; Buttazzo, G.
abstract

Parallel computation is fundamental to satisfy the performance requirements of advanced safety-critical systems. OpenMP is a good candidate to exploit the performance opportunities of parallel platforms. However, safety-critical systems are often based on static allocation strategies, whereas current OpenMP implementations are based on dynamic schedulers. This paper proposes two OpenMP-compliant static allocation approaches: an optimal but costly approach based on an ILP formulation, and a sub-optimal but tractable approach that computes a worst-case makespan bound close to the optimal one.


2017 - Adaptive coordination in autonomous driving: Motivations and perspectives [Relazione in Atti di Convegno]
Bertogna, Marko; Burgio, Paolo; Cabri, Giacomo; Capodieci, Nicola
abstract

As autonomous cars are entering mainstream, new research directions are opening involving several domains, from hardware design to control systems, from energy efficiency to computer vision. An exciting direction of research is represented by the coordination of the different vehicles, moving the focus from the single one to a collective system. In this paper we propose some challenging examples thatshow the motivations for a coordination approach in autonomous driving. Moreover, we present some techniques borrowed from distributed artificial intelligence that can be exploited to tackle the previously mentioned challenges.


2017 - An analysis of lazy and eager limited preemption approaches under DAG-based global fixed priority scheduling [Relazione in Atti di Convegno]
Serrano, M. A.; Melani, A.; Kehr, S.; Bertogna, M.; Quinones, E.
abstract

DAG-based scheduling models have been shown to effectively express the parallel execution of current many-core heterogeneous architectures. However, their applicability to real-time settings is limited by the difficulties to find tight estimations of the worst-case timing parameters of tasks that may arbitrarily be preempted/migrated at any instruction. An efficient approach to increase the system predictability is to limit task preemptions to a set of pre-defined points. This limited preemption model supports two different preemption approaches, eager and lazy, which have been analyzed only for sequential task-sets. This paper proposes a new response time analysis that computes an upper bound on the lower priority blocking that each task may incur with eager and lazy preemptions. We evaluate our analysis with both, synthetic DAG-based task-sets and a real case-study from the automotive domain. Results from the analysis demonstrate that, despite the eager approach generates a higher number of priority inversions, the blocking impact is generally smaller than in the lazy approach, leading to a better schedulability performance.


2017 - Exact Response Time Analysis for Fixed Priority Memory-Processor Co-scheduling [Articolo su rivista]
Melani, Alessandra; Bertogna, Marko; Davis, Robert; Bonifaci, Vincenzo; Marchetti Spaccamela, Alberto; Buttazzo, Giorgio
abstract

Recent technological advances have led to an increasing gap between memory and processor performance, since memory bandwidth is progressing at a much slower pace than processor bandwidth. Pre-fetching techniques are traditionally used to bridge this gap and achieve high processor utilization while tolerating high memory latencies. Following this trend, new computational models have been proposed to split task execution in two consecutive phases: a memory phase in which the required instructions and data are pre-fetched to local memory (M-phase), and an execution phase in which the task is executed with no memory contention (C-phase). Decoupling memory and execution phases not only simplifies the timing analysis, but also allows a more efficient (and predictable) pipelining of memory and execution phases through proper co-scheduling algorithms. This paper takes a further step towards the design of smart co-scheduling algorithms for sporadic real-time tasks complying with the memory-computation (M/C) model, by proposing a theoretical framework aimed at tightly characterizing the schedulability improvement obtainable with the adopted M/C task model on single-core systems. In particular, a critical instant is identified for M/C tasks scheduled with fixed priority and an exact response time analysis with pseudo-polynomial complexity is provided. Then, we investigate the problem of priority assignment for M/C tasks, showing that a necessary condition to achieve optimality is to allow different priorities for the two phases. Our experiments show that the proposed techniques provide a significant schedulability improvement with respect to classic execution models, placing an important building block towards the design of more efficient partitioned multi-core systems.


2017 - Memory interference characterization between CPU cores and integrated GPUs in mixed-criticality platforms [Relazione in Atti di Convegno]
Cavicchioli, R.; Capodieci, N.; Bertogna, M.
abstract

Most of today’s mixed criticality platforms feature Systems on Chip (SoC) where a multi-core CPU complex (the host) competes with an integrated Graphic Processor Unit (iGPU, the device) for accessing central memory. The multi-core host and the iGPU share the same memory controller, which has to arbitrate data access to both clients through often undisclosed or non-priority driven mechanisms. Such aspect becomes critical when the iGPU is a high performance massively parallel computing complex potentially able to saturate the available DRAM bandwidth of the considered SoC. The contribution of this paper is to qualitatively analyze and characterize the conflicts due to parallel accesses to main memory by both CPU cores and iGPU, so to motivate the need of novel paradigms for memory centric scheduling mechanisms. We analyzed different well known and commercially available platforms in order to estimate variations in throughput and latencies within various memory access patterns, both at host and device side.


2017 - Preface [Relazione in Atti di Convegno]
Bertogna, M.; Maggio, M.
abstract


2017 - Schedulability Analysis of Conditional Parallel Task Graphs in Multicore Systems [Articolo su rivista]
Melani, Alessandra; Bertogna, Marko; Bonifaci, Vincenzo; Marchetti Spaccamela, Alberto; Buttazzo, Giorgio
abstract

Several task models have been introduced in the literature to describe the intrinsic parallelism of real-time activities, including fork/join, synchronous parallel, DAG-based, etc. Although schedulability tests and resource augmentation bounds have been derived for these task models in the context of multicore systems, they are still too pessimistic to describe the execution flow of parallel tasks characterized by multiple (and nested) conditional statements, where it is hard to decide which execution path to select for modeling the worst-case scenario. To overcome this problem, this paper proposes a task model that integrates control flow information by considering conditional parallel tasks (cp-tasks) represented by DAGs with both precedence and conditional edges. For this task model, a set of meaningful parameters are identified and computed by efficient algorithms and a response-time analysis is presented for different scheduling policies. Experimental results are finally reported to evaluate the efficiency of the proposed schedulability tests and their performance with respect to classic tests based on both conditional and non-conditional existing approaches.


2017 - SiGAMMA: Server based integrated GPU arbitration mechanism for memory accesses [Relazione in Atti di Convegno]
Capodieci, Nicola; Cavicchioli, Roberto; Valente, Paolo; Bertogna, Marko
abstract

In embedded systems, CPUs and GPUs typically share main memory. The resulting memory contention may significantly inflate the duration of CPU tasks in a hard-to-predict way. Despite initial solutions have been devised to control this undesired inflation, these approaches do not consider the interference due to memoryintensive components in COTS embedded systems like integrated Graphical Processing Units. Dealing with this kind of interference might require custom-made hardware components that are not integrated in off-the-shelf platforms. We address these important issues by proposing a memory-arbitration mechanism, SiGAMMA (Siσ), for eliminating the interference on CPU tasks caused by conflicting memory requests from the GPU. Tasks on the CPU are assumed to comply with a prefetch-based execution model (PREM) proposed in the real-time literature, while memory accesses from the GPU are arbitrated through a predictable mechanism that avoids contention. Our experiments show that Siσ proves to be very effective in guaranteeing almost null inflation to memory phases of CPU tasks, while at the same time avoiding excessive starvation of GPU tasks.


2016 - A Software Stack for Next-Generation Automotive Systems on Many-Core Heterogeneous Platforms [Relazione in Atti di Convegno]
Burgio, Paolo; Bertogna, Marko; Olmedo, Ignacio Sanudo; Gai, Paolo; Marongiu, Andrea; Sojka, Michal
abstract

The advent of commercial-of-the-shelf (COTS) heterogeneous many-core platforms is opening up a series of opportunities in the embedded computing market. Integrating multiple computing elements running at smaller frequencies allows obtaining impressive performance capabilities at a reduced power consumption. These platforms can be successfully adopted to build the next-generation of self-driving vehicles, where Advanced Driver Assistance Systems (ADAS) need to process unprecedently higher computing workloads at low power budgets. Unfortunately, the current methodologies for providing real-time guarantees are uneffective when applied to the complex architectures of modern many-cores. Having impressive average performances with no guaranteed bounds on the response times of the critical computing activities is of little if no use to these applications. Project HERCULES will provide the required technological infrastructure to obtain an order-of-magnitude improvement in the cost and power consumption of next generation automotive systems. This paper presents the integrated software framework of the project, which allows achieving predictable performance on top of cutting-edge heterogeneous COTS platforms. The proposed software stack will let both real-time and non real-time application coexist on next-generation, power-efficient embedded platform, with preserved timing guarantees.


2016 - A review of priority assignment in real-time systems [Articolo su rivista]
Davis, Robert I.; Cucu Grosjean, Liliana; Bertogna, Marko; Burns, Alan
abstract

It is over 40 years since the first seminal work on priority assignment for real-time systems using fixed priority scheduling. Since then, huge progress has been made in the field of real-time scheduling with more complex models and schedulability analysis techniques developed to better represent and analyse real systems. This tutorial style review provides an in-depth assessment of priority assignment techniques for hard real-time systems scheduled using fixed priorities. It examines the role and importance of priority in fixed priority scheduling in all of its guises, including: pre-emptive and non-pre-emptive scheduling; covering single- and multi-processor systems, and networks. A categorisation of optimal priority assignment techniques is given, along with the conditions on their applicability. We examine the extension of these techniques via sensitivity analysis to form robust priority assignment policies that can be used even when there is only partial information available about the system. The review covers priority assignment in a wide variety of settings including: mixed-criticality systems, systems with deferred pre-emption, and probabilistic real-time systems with worst-case execution times described by random variables. It concludes with a discussion of open problems in the area of priority assignment.


2016 - A survey on shared disk I/O management in virtualized environments under real time constraints [Relazione in Atti di Convegno]
Sanudo, I.; Cavicchioli, R.; Capodieci, N.; Valente, P.; Bertogna, M.
abstract

In the embedded systems domain, hypervisors are increasingly being adopted to guarantee timing isolation and appropriate hardware resource sharing among different software components. However, managing concurrent and parallel requests to shared hardware resources in a predictable way still represents an open issue. We argue that hypervisors can be an effective means to achieve an efficient and predictable arbitration of competing requests to shared devices in order to satisfy real-time requirements. As a representative example, we consider the case for mass storage (I/O) devices like Hard Disk Drives (HDD) and Solid State Disks (SSD), whose access times are orders of magnitude higher than those of central memory and CPU caches, therefore having a greater impact on overall task delays. We provide a comprehensive and up-to-date survey of the literature on I/O management within virtualized environments, focusing on software solutions proposed in the open source community, and discussing their main limitations in terms of realtime performance. Then, we discuss how the research in this subject may evolve in the future, highlighting the importance of techniques that are focused on scheduling not uniquely the processing bandwidth, but also the access to other important shared resources, like I/O devices.


2016 - Guest editorial: special issue on multicore systems [Articolo su rivista]
Caccamo, M.; Bertogna, M.
abstract


2016 - On the compatibility of exact schedulability tests for global fixed priority pre-emptive scheduling with Audsley’s optimal priority assignment algorithm [Articolo su rivista]
Davis, Robert I.; Bertogna, Marko; Bonifaci, Vincenzo
abstract

Audsley's optimal priority assignment (OPA) algorithm can be applied to multiprocessor scheduling provided that three conditions hold with respect to the schedulability tests used. In this short paper, we prove that no exact test for global fixed priority pre-emptive scheduling of sporadic tasks can be compatible with Audsley's algorithm, and hence the OPA algorithm cannot be used to obtain an optimal priority assignment for such systems.


2016 - Partitioning and Interface Synthesis in Hierarchical Multiprocessor Real-Time Systems [Relazione in Atti di Convegno]
Biondi, Alessandro; Buttazzo, Giorgio; Bertogna, Marko
abstract

Hierarchical scheduling is an effeective approach developed to support the integration of independently developed ap- plications on the same computing platform. In particular, the M-BROE framework has been recently proposed and analyzed to efficiently support component-based development on multiprocessor platforms through the virtual multiprocessor abstraction implemented by reservation servers, in the presence of shared resources. However, the problems of partitioning applications to virtual processors and defining reservation parameters were not addressed. This paper fills this gap by proposing a design methodology as an optimization problem for partitioning applications to virtual processors, performing a synthesis of the component interface and allocating virtual processors to physical processors. Experimental results are also presented to evaluate the proposed methodology.


2016 - Preface [Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)] [Relazione in Atti di Convegno]
Bertogna, M.; Pinho, L. M.; Quinones, E.
abstract


2016 - Response-Time Analysis of DAG Tasks under Fixed Priority Scheduling with Limited Preemption [Relazione in Atti di Convegno]
Serrano, Maria; Melani, Alessandra; Bertogna, Marko; Quiñones, Eduardo
abstract

Limited preemptive (LP) scheduling has been demonstrated to effectively improve the schedulability of fully preemptive (FP) and fully non-preemptive (FNP) paradigms. On one side, LP reduces the preemption related overheads of FP; on the other side, it restricts the blocking effects of FNP. However, LP has been applied to multi-core scenarios only when completely sequential task systems are considered. This paper extends the current state-of-the-art response time analysis for global fixed priority scheduling with fixed preemption points by deriving a new response time analysis for DAG-based task-sets.


2016 - Schedulability Analysis of Hierarchical Real-Time Systems under Shared Resources [Articolo su rivista]
Biondi, Alessandro; Buttazzo, Giorgio C.; Bertogna, Marko
abstract

Sharing resources in hierarchical real-time systems implemented with reservation servers requires the adoption of special budget management protocols that preserve the bandwidth allocated to a specific component. In addition, blocking times must be accurately estimated to guarantee both the global feasibility of all the servers and the local schedulability of applications running on each component. This paper presents two new local schedulability tests to verify the schedulability of real-time applications running on reservation servers under fixed priority and EDF local schedulers. Reservation servers are implemented with the BROE algorithm. A simple extension to the SRP protocol is also proposed to reduce the blocking time of the server when accessing global resources shared among components. The performance of the new schedulability tests are compared with other solutions proposed in the literature, showing the effectiveness of the proposed improvements. Finally, an implementation of the main protocols on a lightweight RTOS is described, highlighting the main practical issues that have been encountered.


2015 - A memory-centric approach to enable timing-predictability within embedded many-core accelerators [Relazione in Atti di Convegno]
Burgio, Paolo; Marongiu, Andrea; Valente, Paolo; Bertogna, Marko
abstract

There is an increasing interest among real-time systems architects for multi- and many-core accelerated platforms. The main obstacle towards the adoption of such devices within industrial settings is related to the difficulties in tightly estimating the multiple interferences that may arise among the parallel components of the system. This in particular concerns concurrent accesses to shared memory and communication resources. Existing worst-case execution time analyses are extremely pessimistic, especially when adopted for systems composed of hundreds-tothousands of cores. This significantly limits the potential for the adoption of these platforms in real-time systems. In this paper, we study how the predictable execution model (PREM), a memory-aware approach to enable timing-predictability in realtime systems, can be successfully adopted on multi- and manycore heterogeneous platforms. Using a state-of-the-art multi-core platform as a testbed, we validate that it is possible to obtain an order-of-magnitude improvement in the WCET bounds of parallel applications, if data movements are adequately orchestrated in accordance with PREM. We identify which system parameters mostly affect the tremendous performance opportunities offered by this approach, both on average and in the worst case, moving the first step towards predictable many-core systems.


2015 - Global and partitioned multiprocessor fixed priority scheduling with deferred preemption [Articolo su rivista]
Davis, Robert I.; Burns, Alan; Marinho, Jose; Nelis, Vincent; Petters, Stefan M.; Bertogna, Marko
abstract

This article introduces schedulability analysis for Global Fixed Priority Scheduling with Deferred Preemption (gFPDS) for homogeneous multiprocessor systems. gFPDS is a superset of Global Fixed Priority Preemptive Scheduling (gFPPS) and Global Fixed Priority Nonpreemptive Scheduling (gFPNS). We show how schedulability can be improved using gFPDS via appropriate choice of priority assignment and final nonpreemptive region lengths, and provide algorithms that optimize schedulability in this way. Via an experimental evaluation we compare the performance of multiprocessor scheduling using global approaches: gFPDS, gFPPS, and gFPNS, and also partitioned approaches employing FPDS, FPPS, and FPNS on each processor.


2015 - Memory-processor co-scheduling in fixed priority systems [Relazione in Atti di Convegno]
Melani, Alessandra; Bertogna, Marko; Bonifaci, Vincenzo; Marchetti Spaccamela, Alberto; Buttazzo, Giorgio
abstract

A major obstacle towards the adoption of multi-core platforms for real-time systems is given by the difficulties in characterizing the interference due to memory contention. The simple fact that multiple cores may simultaneously access shared memory and communication resources introduces a significant pessimism in the timing and schedulability analysis. To counter this problem, predictable execution models have been proposed splitting task executions into two consecutive phases: a memory phase in which the required instruction and data are pre-fetched to local memory (M-phase), and an execution phase in which the task is executed with no memory contention (C-phase). Decoupling memory and execution phases not only simplifies the timing analysis, but it also allows a more efficient (and predictable) pipelining of memory and execution phases through proper co-scheduling algorithms. In this paper, we take a further step towards the design of smart co-scheduling algorithms for sporadic real-time tasks complying with the M/C (memory-computation) model. We provide a theoretical framework that aims at tightly characterizing the schedulability improvement obtainable with the adopted M/C task model on a single-core systems. We identify a tight critical instant for M/C tasks scheduled with fixed priority, providing an exact response-time analysis with pseudo-polynomial complexity. We show in our experiments that a significant schedulability improvement may be obtained with respect to classic execution models, placing an important building block towards the design of more efficient partitioned multi-core systems.


2015 - Multiprocessor fixed priority scheduling with limited preemptions [Relazione in Atti di Convegno]
Thekkilakattil, Abhilash; Davis, Robert I.; Dobrin, Radu; Punnekkat, Sasikumar; Bertogna, Marko
abstract

Challenges associated with allowing preemptions and migrations are compounded in multicore systems, particularly under global scheduling policies, because of the potentially high overheads. For example, multiple levels of cache greatly increase preemption and migration related overheads as well as the difficulty involved in accurately accounting for them, leading to substantially inflated worst-case execution times (WCETs). Preemption and migration related overheads can be significantly reduced, both in number and in size, by using fixed preemption points in the tasks' code; thus dividing each task into a series of non-preemptive regions (NPRs). This leads to an additional consideration in the scheduling policy. When a high priority task is released and all of the processors are executing non-preemptive regions of lower priority tasks, then there is a choice to be made in terms of how to manage the next preemption. With an eager approach the first lower priority task to reach a preemption point is preempted even if it is not the lowest priority running task. Alternatively, with a lazy approach, preemption is delayed until the lowest priority currently running task reaches its next preemption point. In this paper, we show that under global fixed priority scheduling with eager preemptions each task suffers from at most a single priority inversion each time it resumes execution. Building on this observation, we derive a new response time based schedulability test for tasks with fixed preemption points. Experimental evaluations show that global fixed priority scheduling with eager preemptions is significantly more effective than with lazy preemption using link based scheduling in terms of task set schedulability.


2015 - P-SOCRATES: A parallel software framework for time-critical many-core systems [Articolo su rivista]
Pinho, Luís Miguel; Nélis, Vincent; Yomsi, Patrick Meumeu; Quiñones, Eduardo; Bertogna, Marko; Burgio, Paolo; Marongiu, Andrea; Scordino, Claudio; Gai, Paolo; Ramponi, Michele; Mardiak, Michal
abstract

Current generation of computing platforms is embracing multi-core and many-core processors to improve the overall performance of the system, meeting at the same time the stringent energy budgets requested by the market. Parallel programming languages are nowadays paramount to extracting the tremendous potential offered by these platforms: parallel computing is no longer a niche in the high performance computing (HPC) field, but an essential ingredient in all domains of computer science. The advent of next-generation many-core embedded platforms has the chance of intercepting a converging need for predictable high-performance coming from both the High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. This converging demand raises the problem about how to guarantee timing requirements in presence of parallel execution. The paper presents how the time-criticality and parallelisation challenges are addressed by merging techniques coming from both HPC and EC domains, and provides an overview of the proposed framework to achieve these objectives.


2015 - Response-Time Analysis of Conditional DAG Tasks in Multiprocessor Systems [Relazione in Atti di Convegno]
Melani, Alessandra; Bertogna, Marko; Bonifaci, Vincenzo; Marchetti Spaccamela, Alberto; Buttazzo, Giorgio C.
abstract

Different task models have been proposed to represent the parallel structure of real-time tasks executing on manycore platforms: fork/join, synchronous parallel, DAG-based, etc. Despite different schedulability tests and resource augmentation bounds are available for these task systems, we experience difficulties in applying such results to real application scenarios, where the execution flow of parallel tasks is characterized by multiple (and nested) conditional structures. When a conditional branch drives the number and size of sub-jobs to spawn, it is hard to decide which execution path to select for modeling the worst-case scenario. To circumvent this problem, we integrate control flow information in the task model, considering conditional parallel tasks (cp-tasks) represented by DAGs composed of both precedence and conditional edges. For this task model, we identify meaningful parameters that characterize the schedulability of the system, and derive efficient algorithms to compute them. A response time analysis based on these parameters is then presented for different scheduling policies. A set of simulations shows that the proposed approach allows efficiently checking the schedulability of the addressed systems, and that it significantly tightens the schedulability analysis of non-conditional (e.g., Classic DAG) tasks over existing approaches.


2015 - Supporting Component-Based Development in Partitioned Multiprocessor Real-Time Systems [Relazione in Atti di Convegno]
Biondi, Alessandro; Buttazzo, Giorgio C.; Bertogna, Marko
abstract

The fast evolution of multicore systems, combined with the need of sharing the same platform for independently developed software, demands for new methodologies and algorithms that allow resource partitioning, while guaranteeing the isolation of concurrent applications. Unfortunately, a major problem that can break the isolation property of concurrent partitions is resource sharing. Although a number of resource access protocols exist for hierarchical uniprocessor systems, no protocols are available today for managing hierarchical partitions implemented on top a multiporcessor platform under partitioned scheduling. This paper presents a framework to support component based design on a multiprocessor platform and proposes a novel reservation server mechanism, called M-BROE, to handle shared resources in multiprocessor systems in the presence of resource reservation scheduling mechanisms.


2015 - Timing characterization of OpenMP4 tasking model [Relazione in Atti di Convegno]
Serrano, Maria A.; Melani, Alessandra; Vargas, Roberto; Marongiu, Andrea; Bertogna, Marko; Quiñones, Eduardo
abstract

OpenMP is increasingly being supported by the newest high-end embedded many-core processors. Despite the lack of any notion of real-time execution, the latest specification of OpenMP (v4.0) introduces a tasking model that resembles the way real-time embedded applications are modeled and designed, i.e., as a set of periodic task graphs. This makes OpenMP4 a convenient candidate to be adopted in future real-time systems. However, OpenMP4 incorporates as well features to guarantee backward compatibility with previous versions that limit its practical usability in real-time systems. The most notable example is the distinction between tied and untied tasks. Tied tasks force all parts of a task to be executed on the same thread that started the execution, whereas a suspended untied task is allowed to resume execution on a different thread. Moreover, tied tasks are forbidden to be scheduled in threads in which other non-descendant tied tasks are suspended. As a result, the execution model of tied tasks, which is the default model in OpenMP to simplify the coexistence with legacy constructs, clearly restricts the performance and has serious implications on the response time analysis of OpenMP4 applications, making difficult to adopt it in real-time environments. In this paper, we revisit OpenMP design choices, introducing timing predictability as a new and key metric of interest. Our first results confirm that even if tied tasks can be timing analyzed, the quality of the analysis is much worse than with untied tasks. We thus reason about the benefits of using untied tasks, deriving a response time analysis for this model, and so allowing OpenMP4 untied model to be applied to real-time systems.


2014 - Explicit Preemption Placement for Real-Time Conditional Code [Relazione in Atti di Convegno]
Peng, Bo; Fisher, Nathan; Bertogna, Marko
abstract

In the limited-preemption scheduling model, tasks cooperate to offer suitable preemption points for reducing the overall preemption overhead. In the fixed preemption-point model, tasks are allowed to preempt only at statically defined preemption points, reducing the variability of the preemption delay and making the system more predictable. Different works have been proposed to determine the optimal selection of preemption points for minimizing the preemption overhead without affecting the system schedulability due to increased non-preemptivity. However, all works are based on very restrictive task models, without being able to deal with common coding structures like branches, conditional statements and loops. In this work, we overcome this limitation, by proposing a pseudo-polynomial-time algorithm that is capable of determining the optimal set of preemption points to minimize the worst-case execution time of jobs represented by control flow graphs with arbitrarily-nested conditional structures, while preserving system schedulability. Exhaustive experiments are included to show that the proposed approach is able to significantly improve the bounds on the worst-case execution times of limited preemptive tasks.


2014 - Hard Constant Bandwidth Server: Comprehensive formulation and critical scenarios [Relazione in Atti di Convegno]
Biondi, Alessandro; Melani, Alessandra; Bertogna, Marko
abstract

The Constant Bandwidth Server (CBS) is one of the most used algorithms for implementing resource reservation upon deadline-based schedulers. Although many CBS variants are available in the literature, no proper formalization has been proposed for the CBS in the context of hard reservations, where it is essential to guarantee a bounded-delay service across applications. Existing formulations are affected by a problem that can expose the system to dangerous deadline misses in the presence of blocking. This paper analyzes such a problem and presents a comprehensive and consistent formulation of the CBS for hard reservation scenarios. An overview of the contexts in which a hard CBS can be applied is also provided, focusing on the impact that previous formulations can have on schedulability, when used in conjunction with specific resource sharing protocols or other scheduling mechanisms that may cause a server to block.


2014 - On the effectiveness of energy-aware real-time scheduling algorithms on single-core platforms [Relazione in Atti di Convegno]
Bambagini, Mario; Bertogna, Marko; Buttazzo, Giorgio
abstract

Energy-aware scheduling is a challenging problem that has been studied for decades, investigating the trade-off between performance and energy consumption. In early CMOS circuits, Dynamic Voltage and Frequency Scaling (DVFS) techniques allowed drastically reducing the power consumption. Recent technological advancements have decreased the portion of dissipation which is affected by speed scaling, making Dynamic Power Management (DPM) algorithms more effective. However, the adoption of simplistic power models often biased the decision on which technique to adopt, decreasing the effectiveness of the selected implementation. This paper discusses the factors to consider when deciding which technique to implement on a given single-core architecture, highlight the limitations of the current mainstream.


2014 - Optimal Design for Reservation Servers under Shared Resources [Relazione in Atti di Convegno]
Biondi, Alessandro; Melani, Alessandra; Bertogna, Marko; Buttazzo, Giorgio
abstract

Modularity and hierarchical-based design are crucial features that need to be supported in complex embedded systems characterized by multiple applications with timing requirements.Resource reservation is a powerful scheduling mechanism for achieving such goals and providing temporal isolation among different real-time applications. When different applications share mutually exclusive resources, a precise feasibility analysis can still be performed in isolation, using specific resource access protocols, taking into account only the application features and the reservation parameters. This paper presents a methodology for selecting the parameters of each reservation in order to guarantee the feasibility of the served applications and minimize the required bandwidth.


2014 - P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems [Relazione in Atti di Convegno]
Pinho, L M; Quinones, E; Bertogna, M; Marongiu, A; Carlos, J P; Scordino, C; Ramponi, M
abstract

The advent of next-generation many-core embedded platforms has the chance of intercepting a converging need for predictable high-performance coming from both the High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. This converging demand, however, raises the problem about how to guarantee timing requirements in presence of parallel execution. This paper presents the approach of project P-SOCRATES for the design of an integrated framework for the execution of workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The time-criticality and parallelisation challenges are addressed by merging techniques coming from both HPC and EC domains, identifying the main sources of indeterminism and proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.


2014 - Response-Time Analysis of Synchronous Parallel Tasks in Multiprocessor Systems [Relazione in Atti di Convegno]
Maia, Cláudio; Bertogna, Marko; Nogueira, Luís; Pinho, Luis Miguel
abstract

Programmers resort to user-level parallel frameworks in order to exploit the parallelism provided by multiprocessor platforms. While such general frameworks do not support the stringent timing requirements of real-time systems, they offer a useful model of computation based on the standard fork/join, for which the analysis of timing properties makes sense. Very few works analyse the schedulability of synchronous parallel real-time tasks, which is a generalisation of the standard fork/join model. This paper proposes to narrow the gap by presenting a model that analyses the response-time of synchronous parallel real-time tasks. The model under consideration targets tasks with fixed priorities, composed of several segments with an arbitrary number of parallel and independent units of execution. We contribute to the state-of-the-art by analysing the response-time behaviour of synchronous parallel tasks. To accomplish this, we take into account concepts previously proposed in the literature and define new concepts such as carry-out decomposition and sliding window technique in order to compute the worst-case workload in a window of interest. Results show that the proposed approach is significantly better than current approaches, improving the state-of-the-art analysis of parallel real-time tasks.


2014 - The Challenge of Time-Predictability in Modern Many-Core Architectures [Relazione in Atti di Convegno]
Nã©lis, Vincent; Yomsi, Patrick Meumeu; Pinho, Luís Miguel; Fonseca, José Carlos; Bertogna, Marko; Quiã±ones, Eduardo; Vargas, Roberto; Marongiu, Andrea
abstract

The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time while EC systems are increasingly concerned with providing higher performance in real-time. The convergence of these two domains towards systems requiring both high performance and a predictable time-behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictability and high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way. © Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca, Marko Bertogna, Eduardo Quiñones, Roberto Vargas, and Andrea Marongiu.


2013 - An energy-aware algorithm exploiting limited preemptive scheduling under fixed priorities [Relazione in Atti di Convegno]
Bambagini, Mario; Bertogna, Marko; Marinoni, Mauro; Buttazzo, Giorgio
abstract

This paper presents a new energy-aware algorithm that integrates Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM) techniques to further reduce energy consumption in embedded systems. It consists of an off-line DVFS-stage, for computing the speed that minimizes energy consumption during active intervals while guaranteeing timing constraints, and an online DPM-stage, for prolonging sleep intervals by postponing task execution. Moreover, limited preemptive scheduling is exploited to reduce preemption costs and further extend sleep intervals under fixed-priority systems, with respect to fully preemptive schedulers. The online algorithm has a constant complexity and preemption costs are taken into account in the analysis. A set of simulation experiments are reported to illustrate the behavior of the proposed approach as a function of different parameters and compare its performance with the state-of-art methods available in the literature.


2013 - Global fixed priority scheduling with deferred pre-emption [Relazione in Atti di Convegno]
Davis, R. I.; Burns, A.; Marinho, J.; Nelis, V.; Petters, S. M.; Bertogna, Marko
abstract

This paper introduces schedulability analysis for global fixed priority scheduling with deferred pre-emption (gFPDS) for homogeneous multiprocessor systems. gFPDS is a superset of global fixed priority pre-emptive scheduling (gFPPS) and global fixed priority non-pre-emptive scheduling (gFPNS). We show how schedulability can be improved via appropriate choice of priority assignment and final non-pre-emptive region lengths, and we provide algorithms which optimize schedulability in this way. An experimental evaluation shows that gFPDS significantly outperforms both gFPPS and gFPNS.


2013 - Limited Pre-emptive Global Fixed Task Priority [Relazione in Atti di Convegno]
Marinho, Jose; Nelis, Vincent; Petters, Stefan M.; Bertogna, Marko; Davis, Robert I.
abstract

In this paper a limited pre-emptive global fixed task priority scheduling policy for multiprocessors is presented. This scheduling policy is a generalization of global fully pre-emptive and non-pre-emptive fixed task priority policies for platforms with at least two homogeneous processors. The scheduling protocol devised is such that a job can only be blocked at most once by a body of lower priority non-pre-emptive workload. The presented policy dominates both fully pre-emptive and fully non-pre-emptive with respect to schedulability. A sufficient schedulability test is presented for this policy. Several approaches to estimate the blocking generated by lower priority non-pre-emptive regions are presented. As a last contribution it is experimentally shown that, on the average case, the number of pre-emptions observed in a schedule are drastically reduced in comparison to global fully pre-emptive scheduling.


2013 - Limited Preemptive Scheduling for Real-Time Systems. A Survey [Articolo su rivista]
Buttazzo, Giorgio C.; Bertogna, Marko; Yao, Gang
abstract

The question whether preemptive algorithms are better than nonpreemptive ones for scheduling a set of real-time tasks has been debated for a long time in the research community. In fact, especially under fixed priority systems, each approach has advantages and disadvantages, and no one dominates the other when both predictability and efficiency have to be taken into account in the system design. Recently, limited preemption models have been proposed as a viable alternative between the two extreme cases of fully preemptive and nonpreemptive scheduling. This paper presents a survey of the existing approaches for reducing preemptions and compares them under different metrics, providing both qualitative and quantitative performance evaluations.


2012 - Energy Saving Exploiting the Limited Preemption Task Model [Relazione in Atti di Convegno]
Mario, Bambagini; Giorgio, Buttazzo; Bertogna, Marko
abstract

Limited preemptive scheduling has been shown to dominate both non-preemptive and fully preemptive scheduling under fixed priority systems, as far as schedulability is concerned. This paper suggests the use of DVS and DMP techniques under limited preemptive scheduling to further reduce energy consumption with respect to a fully preemptive or non-preemptive approach.


2012 - Execution and Resource Management in QoS-Aware Virtualized Infrastructures [Capitolo/Saggio]
D., Lamp; S., Berger; M., Stein; T., Voith; T., Cucinotta; Bertogna, Marko
abstract

Both real-time systems and virtualization have been important research topics for quite some time now. Having competing goals, research on the correlation of these topics has started only recently. This chapter overviews recent results in the research literature on virtualized large-scale systems and soft real-time systems. These concepts constitute the fundamental background over which the execution environment of any large-scale service-oriented real-time architecture for highly interactive, distributed, and virtualized applications will be built in the future. While many aspects covered in this chapter have already been adopted in commercial products, others are still under intensive investigation in research labs all over the world.


2012 - Extending Fixed Task-Priority Schedulability by Interference Limitation [Relazione in Atti di Convegno]
José, Marinho; Stefan M., Petters; Bertogna, Marko
abstract

While the earliest deadline first algorithm is known to be optimal as a uniprocessor scheduling policy, the implementation comes at a cost in terms of complexity. Fixed task-priority algorithms on the other hand have lower complexity but higher likelihood of task sets being declared unschedulable, when compared to earliest deadline first (EDF). Various attempts have been undertaken to increase the chances of proving a task set schedulable with similar low complexity. In some cases, this was achieved, by modifying applications to limit preemptions, at the cost of flexibility. In this work we explore several variants of a concept to limit interference by locking down the ready queue at certain instances. The aim is to increase the prospects of schedulability of a given task system, without compromising on complexity or flexibility, when compared to the regular fixed task-priority algorithm. As a final contribution a new preemption threshold assignment algorithm is provided which is less complex and more straightforward than the previous method available in the literature.


2012 - Optimal Fixed Priority Scheduling with Deferred Preemptions [Relazione in Atti di Convegno]
Davis, Rob; Bertogna, Marko
abstract

The schedulability of systems using fixed priority pre-emptive scheduling can be improved by the use of non-pre-emptive regions at the end of each task's execution, an approach referred to as deferred pre-emption. Choosing the appropriate length for the final non-pre-emptive region of each task is a trade-off between improving the worst-case response time of the task itself and increasing the amount of blocking imposed on higher priority tasks. In this paper we present an optimal algorithm for determining both the priority ordering of tasks and the lengths of their final non-pre-emptive regions. This algorithm is optimal for fixed priority scheduling with deferred pre-emption, in the sense that it is guaranteed to find a schedulable combination of priority ordering and final non-pre-emptive region lengths if such a schedulable combination exists.


2012 - Optimal fixed priority scheduling with deferred pre-emption [Relazione in Atti di Convegno]
Davis, R. I.; Bertogna, M.
abstract

The schedulability of systems using fixed priority pre-emptive scheduling can be improved by the use of non-pre-emptive regions at the end of each task's execution, an approach referred to as deferred pre-emption. Choosing the appropriate length for the final non-pre-emptive region of each task is a trade-off between improving the worst-case response time of the task itself and increasing the amount of blocking imposed on higher priority tasks. In this paper we present an optimal algorithm for determining both the priority ordering of tasks and the lengths of their final non-pre-emptive regions. This algorithm is optimal for fixed priority scheduling with deferred pre-emption, in the sense that it is guaranteed to find a schedulable combination of priority ordering and final non-pre-emptive region lengths if such a schedulable combination exists. © 2012 IEEE.


2011 - Feasibility Analysis under Fixed Priority Scheduling with Limited Preemptions [Articolo su rivista]
Gang, Yao; Giorgio, Buttazzo; Bertogna, Marko
abstract

Preemptive scheduling often generates a significant runtime overhead that may increase task worst-case execution times up to 40%, with respect to a fully non preemptive execution. In small embedded systems, such an extra cost results in longer and more variable response times that can significantly affect the overall energy consumption, as well as the system predictability. Limiting preemptions is often possible without jeopardizing schedulability. Although several authors addressed schedulability analysis under different forms of limited preemptive scheduling, current results exhibit two major deficiencies: (i) The maximum lengths of the non-preemptive regions for each task are still unknown under fixed priorities; (i) The exact response time analysis for tasks with fixed preemption points is too complex.This paper presents the schedulability analysis of real-time tasks with non preemptive regions, under fixed priority assignments. In particular, two different preemption models are considered: the floating and the fixed preemption point model. Under each model, the feasibility analysis is addressed by deriving simple and effective schedulability tests, as well as an algorithm for computing the maximum length of the non-preemptive regions for each task. Finally, simulation experiments are presented to compare the two models in terms of schedulability.


2011 - Improving Feasibility of Fixed Priority Tasks using Non-Preemptive Regions [Relazione in Atti di Convegno]
Bertogna, Marko; G., Buttazzo; G., Yao
abstract

Preemptive schedulers have been widely adopted in single processor real-time systems to avoid the blocking associated with the non-preemptive execution of lower priority tasks and achieve a high processor utilization. However, under fixed priority assignments, there are cases in which limiting preemptions can improve schedulability with respect to a fully preemptive solution. This is true even neglecting preemption overhead, as it will be shown in the paper. In previous works, limited-preemption schedulers have been mainly considered to reduce the preemption overhead, and make the estimation of worst-case execution times more predictable. In this work, we instead show how to improve the feasibility of fixed-priority task systems by executing the last portion of each task in a non-preemptive fashion. A proper dimensioning of such a region of code allows increasing the number of task sets that are schedulable with a fixed priority algorithm. Simulation experiments are also presented to validate the effectiveness of the proposed approach.


2011 - Optimal Selection of Preemption Points to Minimize Preemption Overhead [Relazione in Atti di Convegno]
Bertogna, Marko; O., Xhani; M., Marinoni; F., Esposito; G., Buttazzo
abstract

A central issue for verifying the schedulability of hard real-time systems is the correct evaluation of task execution times. These values are significantly influenced by the preemption overhead, which mainly includes the cache related delays and the context switch times introduced by each preemption. Since such an overhead significantly depends on the particular point in the code where preemption takes place, this paper proposes a method for placing suitable preemption points in each task in order to maximize the chances of finding a schedulable solution. In a previous work, we presented a method for the optimal selection of preemption points under the restrictive assumption of a fixed preemption cost, identical for each preemption point. In this paper, we remove such an assumption, exploring a more realistic and complex scenario where the preemption cost varies throughout the task code. Instead of modeling the problem with an integer programming formulation, with exponential worst-case complexity, we derive an optimal algorithm that has a linear time and space complexity. This somewhat surprising result allows selecting the best preemption points even in complex scenarios with a large number of potential preemption locations. Experimental results are also presented to show the effectiveness of the proposed approach in increasing the system schedulability.


2011 - Tests for global EDF schedulability analysis [Articolo su rivista]
Bertogna, Marko; S., Baruah
abstract

Several schedulability tests have been proposed for global EDF scheduling on identical multiprocessors. All these tests are sufficient, rather than exact. These different tests were, for the most part, independently developed. The relationships among such tests have not been adequately investigated, so that it is difficult to understand which test is most appropriate in a particular given scenario. This paper represents an attempt to remedy this, by means of three major contributions. First, we summarize the main existing results for the schedulability analysis of multiprocessor systems scheduled with global edf, showing, when possible, existing dominance relations. We compare these algorithms taking into consideration different aspects, namely, run-time complexity, average performances over randomly generated workloads, sustainability properties and speedup factors. Second, based on this comparative evaluation we propose a recommended approach to schedulability analysis, that suggests a particular order in which to apply preexisting tests, thereby accomplishing both good provable performance and good behavior in practice. And finally, we propose a further improvement to one of these preexisting tests to improve its run-time performance by an order of magnitude, while completely retaining its ability to correctly identify schedulable systems.


2011 - The Explicit Preemption Placement Problem for Real-Time Conditional Code [Relazione in Atti di Convegno]
Bertogna, Marko; Nathan, Fisher
abstract

In this abstract, we propose specific open problems towardsextending the EPP approach of Bertogna et al. for handlinggeneral conditional code. Furthermore, our objective is toobtain a solution that retains the efficient running time of thenon-branching version of the problem. We believe that suchextensions are absolutely necessary for the EPP approach tobe widely applicable and useful to a real-time system designer.


2010 - Comparative evaluation of limited preemptive methods [Relazione in Atti di Convegno]
G., Yao; G., Buttazzo; Bertogna, Marko
abstract

Schedulability analysis of real-time systems requires the knowledge of the worst-case execution time (WCET) of each computational activity. A precise estimation of such a task parameter is quite difficult to achieve, because execution times depend on many factors, including the task structure, the system architecture details, operating system features and so on. While some of these features are not under our control, selecting a proper scheduling algorithm can reduce the runtime overhead and make the WCETs smaller and more predictable. In particular, since task execution times can be significantly affected by preemptions, a number of scheduling methods have been proposed in the real-time literature to limit preemption during task execution. In this paper, we provide a comprehensive overview of the possible scheduling approaches that can be used to contain preemptions and present a comparative study aimed at evaluating their impact on task execution times.


2010 - Feasibility Analysis under Fixed Priority Scheduling with Fixed Preemption Points [Relazione in Atti di Convegno]
G., Yao; G., Buttazzo; Bertogna, Marko
abstract

Limited preemption models have been proposed as a viable alternative between the two extreme cases of fully preemptive and non-preemptive scheduling. In particular, allowing preemption to occur only at predefined preemption points reduces context switch costs, simplifies the access to shared resources, and allows more predictable estimations of worst-case execution times. Current results related to such a model, however, exhibit two major deficiencies: (i) The exact response time analysis has a high computational complexity; (ii) The maximum lengths of then on-preemptive regions was not completely investigated in all possible scenarios. In this paper, we address the problem of scheduling a set of real-time tasks having fixed priorities and fixed preemption points. In particular, under specific but not restrictive assumptions we simplified the feasibility analysis and proposed an efficient feasibility test. Finally, an algorithm for computing the maximum length of fixed non-preemptive regions for each task is described, and some simulation experiments are presented to validate the proposed approach.


2010 - Limited preemption EDF scheduling of sporadic task systems [Articolo su rivista]
Bertogna, Marko; S., Baruah
abstract

The optimality of the Earliest Deadline First scheduler for uniprocessor systems is one of the main reasons behind the popularity of this algorithm among real-time systems. The ability of fully utilizing the computational power of a processing unit however requires the possibility of preempting a task before its completion. When preemptions are disabled, the schedulability overhead could be significant, leading to deadline misses even at system utilizations close to zero. On the other hand, each preemption causes an increase in the runtime overhead due to the operations executed during a context switch and the negative cache effects resulting from interleaving tasks' executions. These factors have been often neglected in previous theoretical works, ignoring the cost of preemption in real applications. A hybrid limited-preemption real-time scheduling algorithm is derived here, that aims to have low runtime overhead while scheduling all systems that can be scheduled by fully preemptive algorithms. This hybrid algorithm permits preemption where necessary for maintaining feasibility, but attempts to avoid unnecessary preemptions during runtime. The positive effects of this approach are not limited to a reduced runtime overhead, but will be extended as well to a simplified handling of shared resources.


2010 - Preemption points placement for sporadic task sets [Relazione in Atti di Convegno]
Bertogna, Marko; G., Buttazzo; M., Marinoni; G., Yao; F., Esposito; M., Caccamo
abstract

Limited preemption scheduling has been introduced as a viable alternative to non-preemptive and fully preemptive scheduling when reduced blocking times need to coexist with an acceptable context switch overhead. To achieve this goal, preemptions are allowed only at selected points of the code of each task, decreasing the preemption overhead and simplifying the estimation of worst-case execution parameters. Unfortunately, the problem of how to place these preemption points is rather complex and has not been solved. In this paper, a method is presented for the optimal placement of preemption points under simplifying conditions, namely, a fixed preemption overhead at each point. We will prove that if our method is not able to produce a feasible schedule, then no other possible preemption point placement (including non-preemptive and fully preemptive scheduling) can find a schedulable solution. The presented method is general enough to be applicable to both EDF and Fixed Priority scheduling, with limited modifications.


2010 - Reducing the Peak Power through Real-Time Scheduling Techniques in Cyber-Physical Energy Systems [Relazione in Atti di Convegno]
T., Facchinetti; E., Bini; Bertogna, Marko
abstract

This paper presents a method for applying realtime scheduling techniques to balance the power usage of electric loads in cyber-physical energy systems. The aim of the proposed approach is to achieve predictability of the activation of electric loads to guarantee an upper bound on the peak electric power consumption.The contribution of this paper encompasses several aspects. The relevance of balancing electric loads is discussed, motivating the use of real-time scheduling techniques to achieve predictability on electric-load management. We introduce the innovation of modeling the physical system as a set of periodically activated loads, that can be effectively managed by adequately adapting traditional real-time system models and scheduling algorithms, to guarantee an upper bound on the peak power consumption. For this purpose, we present a problem formulation based on linear programming, while a low-complexity heuristic is proposed to limit the complexity of the optimization process. Simulation results are presented to assess the performance of proposed methods.


2010 - Sporadic Server Revisited [Relazione in Atti di Convegno]
D., Faggioli; Bertogna, Marko; F., Checconi
abstract

The Sporadic Server (SS) overcomes the major limitations of other Resource Reservation Fixed Priority based techniques, but it also presents some drawbacks, mainly related to an increased scheduling overhead and a not so efficient behavior during overrun situations.In this paper we introduce and prove the effectiveness of an improved SS with reduced overhead and fairer handling of server overrun situations. We also show how this can be efficiently exploited to provide temporal isolation in a multiprocessor platform, adapting already existing schedulability tests.


2010 - The Parallel Supply Function Abstraction for a Virtual Multiprocessor [Relazione in Atti di Convegno]
E., Bini; Bertogna, Marko; S., Baruah
abstract

A new abstraction — the Parallel Supply Function (PSF) — is proposed for representing the computing capabilities offered by virtual platforms implemented atop identical multiprocessors. It is shown that this abstraction is strictly more powerful than previously-proposed ones, from the perspective of more accurately representing the inherent parallelism of the provided computing capabilities. Sufficient tests are derived for determining whether a given real-time task system, represented as a collection of sporadic tasks, is guaranteed to always meet all deadlines when scheduled upon a specified virtual platform using the global EDF scheduling algorithm.


2009 - An Implementation of the Earliest Deadline First Algorithm in Linux [Relazione in Atti di Convegno]
D., Faggioli; M., Trimarchi; F., Checconi; Bertogna, Marko; A., Mancina
abstract

Recently, many projects have been started to introduce some real-time mechanisms into general purpose operating systems (GPOS) in order to make them capable of providing the users with some temporal guarantees. Many of these projects focused especially on Linux for its capillary and widespread adoption throughout many different research and industrial environments.By tracking the kernel release cycle, we propose an efficient Earliest Deadline First implementation in the form of a patch-set against the 2.6.27 version, that is the latest released one, as of now. Our implementation provides the user with the possibility to choose SCHED_EDF as one of the possible scheduling policies for a task, with an enhanced version of the standard algorithm. In fact, we propose a new approach to shared resources' access which, differently from many other previous existing works, does not require the user to specify any parameters about the critical sections every task will enter during its execution.


2009 - Bounding the Maximum Length of Non-Preemptive Regions Under Fixed Priority Scheduling [Relazione in Atti di Convegno]
G., Yao; G., Buttazzo; Bertogna, Marko
abstract

The question whether preemptive systems are better than non-preemptive systems has been debated for a long time, but only partial answers have been provided in the real-time literature and still some issues remain open. In fact, each approach has advantages and disadvantages, and no one dominates the other when both predictability and efficiency have to be taken into account in the system design. In particular, limiting preemptions allows increasing program locality, making timing analysis more predictable with respect to the fully preemptive case. In this paper, we integrate the features of both preemptive and non-preemptive scheduling by considering that each task can switch to non-preemptive mode, at any time, for a bounded interval. Three methods (with different complexity and performance) are presented to calculate the longest non-preemptive interval that can be executed by each task, under fixed priorities, without degrading the schedulability of the task set, with respect to the fully preemptive case. The methods are also compared by simulations to evaluate their effectiveness in reducing the number of preemptions.


2009 - Evaluation of existing schedulability tests for global EDF [Relazione in Atti di Convegno]
Bertogna, Marko
abstract

The increasing attention on global scheduling algorithms for identical multiprocessor platforms produced different, independently developed, schedulability tests. However, the existing relations among such tests have not been sufficiently clarified, so that it is difficult to understand which strategy provides the best performances in a particular scenario. In this paper, we will summarize the main existing results for the schedulability analysis of multiprocessor systems scheduled with global EDF, showing, when possible, existing dominance relations. We will compare these algorithms taking into consideration different aspects, namely, run-time complexity, average performances over a randomly generated workload, sustainability properties and speedup factors.


2009 - Improving Task Responsiveness with Limited Preemptions [Relazione in Atti di Convegno]
Y., Wu; Bertogna, Marko
abstract

The optimality of preemptive EDF scheduling with relation to the achievable system utilization is a clear advantage of this scheduling policy for single processor real-time systems. However, recent works suggested that the run-time behavior of EDF might be improved by limiting the preemption support only to particular time instants, dividing each task into a sequence of non-preemptive chunks of execution, without affecting the schedulability of the system. In this paper, we will take a closer look to limited preemption EDF scheduling (LP-EDF), evaluating the potential advantages offered by this policy in terms of response-time reduction and improved control performances. In particular, we will show how to increase the responsiveness of a control application by placing non-preemptive regions of maximal length at the end of the code of selected tasks. The effectiveness of the proposed method will be proved both analytically and by extensive simulations.


2009 - Resource holding times: Computation and Optimization [Articolo su rivista]
Bertogna, Marko; N., Fisher; S., Baruah
abstract

In scheduling hard-real-time systems, the primary objective is to meet all deadlines. We study the scheduling of such systems with the secondary objective of minimizing the duration of time for which the system locks each shared resource. We abstract out this objective into the resource hold time (rht)—the largest length of time that may elapse between the instant that a system locks a resource and the instant that it subsequently releases the resource, and study properties of the rht. We present an algorithm for computing resource hold times for every resource in a task system that is scheduled using Earliest Deadline First scheduling, with resource access arbitrated using the Stack Resource Policy. We also present and prove the correctness of algorithms for decreasing these rht’s without changing the semantics of the application or compromising application feasibility.


2009 - Resource-sharing servers for Open Environments [Articolo su rivista]
Bertogna, Marko; N., Fisher; S., Baruah
abstract

We study the problem of executing a collection of independently designed and validated task systems upon a common platform composed of a preemptive processor and additional shared resources. We present an abstract formulation of the problem and identify the major issues that must be addressed in order to solve this problem. We present and prove the correctness of algorithms that address these issues, and thereby obtain a design for an open real-time environment.


2009 - Schedulability Analysis of Global Scheduling Algorithms on Multiprocessor Platforms [Articolo su rivista]
Bertogna, Marko; M., Cirinei; G., Lipari
abstract

This paper addresses the schedulability problem of periodic and sporadic real-time task sets with constrained deadlines preemptively scheduled on a multiprocessor platform composed by identical processors. We assume that a global work-conserving scheduler is used and migration from one processor to another is allowed during task lifetime. First, a general method to derive schedulability conditions for multiprocessor real-time systems will be presented. The analysis will be applied to two typical scheduling algorithms: Earliest Deadline First (EDF) and Fixed Priority (FP). Then, the derived schedulability conditions will be tightened, refining the analysis with a simple and effective technique that significantly improves the percentage of accepted task sets. The effectiveness of the proposed test is shown through an extensive set of synthetic experiments.


2009 - The Multy Supply Function Abstraction for Multiprocessors [Relazione in Atti di Convegno]
E., Bini; G. C., Buttazzo; Bertogna, Marko
abstract

Multi-core platforms are becoming the dominant computing architecture for next generation embedded systems. Nevertheless, designing, programming, and analyzing such systems is not easy and a solid methodology is still missing. In this paper, we propose two powerful abstractions to model the computing power of a parallel machine, which provide a general interface for developing and analyzing real-time applications in isolation, independently of the physical platform. The proposed abstractions can be applied on top of different types of service mechanisms, such as periodic servers, static partitions, and P-fair time partitions. In addition, we developed the schedulability analysis of a set of real-time tasks on top of a parallel machine that is compliant with the proposed abstractions.


2009 - Virtual Multiprocessor Platforms: Specification and Use [Relazione in Atti di Convegno]
E., Bini; Bertogna, Marko; S., Baruah
abstract

A new abstraction - the parallel supply function (PSF) - is proposed for representing the computing capabilities offered by virtual platforms implemented atop identical multiprocessors. It is shown that this abstraction is strictly more powerful than previously-proposed ones, from the perspective of more accurately representing the inherent parallelism of the provided computing capabilities. Sufficient tests are derived for determining whether a given real-time task system, represented as a collection of sporadic tasks, is guaranteed to always meet all deadlines when scheduled upon a specified virtual platform using the global EDF scheduling algorithm.


2008 - EDZL scheduling analysis [Articolo su rivista]
T. P., Baker; M., Cirinei; Bertogna, Marko
abstract

A schedulability test is derived for the global Earliest Deadline Zero Laxity (EDZL) scheduling algorithm on a platform with multiple identical processors. The test is sufficient, but not necessary, to guarantee that a system of independent sporadic tasks with arbitrary deadlines will be successfully scheduled, with no missed deadlines, by the multiprocessor EDZL algorithm. Global EDZL is known to be at least as effective as global Earliest-Deadline-First (EDF) in scheduling task sets to meet deadlines. It is shown, by testing on large numbers of pseudo-randomly generated task sets, that the combination of EDZL and the new schedulability test is able to guarantee that far more task sets meet deadlines than the combination of EDF and known EDF schedulability tests.In the second part of the paper, an improved version of the EDZL-schedulability test is presented. This new algorithm is able to efficiently exploit information on the slack values of interfering tasks, to iteratively refine the estimation of the interference a task can be subjected to. This iterative algorithm is shown to have better performance than the initial test, in terms of schedulable task sets detected.


2008 - Non-Preemptive Access to Shared Resources in Hierarchical Real-Time Systems [Relazione in Atti di Convegno]
Bertogna, Marko; F., Checconi; D., Faggioli
abstract

This paper presents a new strategy to arbitrate the access to globally shared resources in hierarchical EDF scheduled real-time systems, without needing any information on the duration of each critical section. Previous works addressing this problem assumed each task worst-case critical section length be known in advance. However, this assumption is valid only in restricted system domains, and is definitely inadequate for general purpose real-time operating systems. To sidestep this problem, we will instead measure at run-time the amount of time for which atask keeps a resource locked, assuring that there is enough bandwidth to tolerate the interferences associated to such measured blocking times. The protocol will execute each critical section non-preemptively, exploiting a previously proposed server that performs a budget check before each locking operation. Two methods with different complexities will be derived to compute upper-bounds on the maximum time for which a critical section may be executed non-preemptively in a given hierarchical system.


2007 - Resource-Locking Durations in EDF-Scheduled Systems [Relazione in Atti di Convegno]
N., Fisher; Bertogna, Marko; S. K., Baruah
abstract

The duration of time for which each application locks each shared resource is critically important in composing multiple independently-developed applications upon a shared "open" platform. The concept of resource hold time (RHT) - the largest length of time that may elapse between the instant that an application system locks a resource and the instant that it subsequently releases the resource - is formally defined and studied in this paper. An algorithm is presented for computing resource hold times for every resource in an application that is scheduled using earliest deadline first scheduling, with resource access arbitrated using the stack resource policy. An algorithm is presented for decreasing these RHT's without changing the semantics of the application or compromising application feasibility


2007 - Response-Time Analysis for Globally Scheduled Symmetric Multiprocessor Platforms [Relazione in Atti di Convegno]
Bertogna, Marko; M., Cirinei
abstract

In the last years, a progressive migration from single processor chips to multi-core computing devices has taken place in the general-purpose and embedded system market. The development of multi-processor systems is already a core activity for the most important hardware companies. A lot of different solutions have been proposed to overcome the physical limits of single core devices and to address the increasing computational demand of modern multimedia applications. The real-time community followed this trend with an increasing number of results adapting the classical scheduling analysis to parallel computing systems. This paper will contribute to refine the schedulability analysis for symmetric multi-processor (SMP) real-time systems composed by a set of periodic and sporadic tasks. We will focus on both fixed and dynamic priority global scheduling algorithms, where tasks can migrate from one processor to another during execution. By increasing the complexity of the analysis, we will show that an improvement is possible over existing schedulability tests, significantly increasing the number of schedulable task sets detected. The added computational effort is comparable to the cost of techniques widely used in the uniprocessor case. We believe this is a reasonable cost to pay, given the intrinsically higher complexity of multi-processor devices.


2007 - Static-Priority Scheduling and Resource Hold Times [Relazione in Atti di Convegno]
Bertogna, Marko; N., Fisher; S., Baruah
abstract

The duration of time for which each application locks each shared resource is critically important in composing multiple independently-developed applications upon a shared "open" platform. In a companion paper, we formally defined and studied the concept of resource hold time (RHT) - the largest length of time that may elapse between the instant that an application system locks a resource and the instant that it subsequently releases the resource. We extend the discussion and results from to systems scheduled using static-priority scheduling algorithms, with resource access arbitrated using stack resource policy (SRP), or priority ceiling protocol (PCP). We present a method to compute resource hold times for every resource, and an algorithm to decrease them without changing the semantics of the application or compromising application feasibility.


2007 - The Design of an EDF-scheduled Resource-sharing Open Environment [Relazione in Atti di Convegno]
N., Fisher; Bertogna, Marko; S. K., Baruah
abstract

We study the problem of executing a collection of independently designed and validated task systems upon a common platform comprised of a preemptive processor and additional shared resources. We present an abstract formulation of the problem and identify the major issues that must be addressed in order to solve this problem. We present (and prove the correctness of) algorithms that address these issues, and thereby obtain a design for an open real-time environment in the presence of shared global resources.


2005 - First integrated continuously tunable AWG-based laser using electro-optical phase shifters [Relazione in Atti di Convegno]
L., Babaud; F. H., Groen; X., Leijtens; Bertogna, Marko; J., DEN BESTEN; Y., Barbarin; Y. S., Oei; F., Karouta; J. J., Binsma; M. K., Smit
abstract

In this paper, we present a novel continuously tunable arrayed-waveguide grating (AWG) based laser. Electrooptical phase shifters in the array arms allow for wavelength tuning over the complete AWG free spectral range. Preliminary measurements with a single electrode show tuning over a range of 6 nm.


2005 - Improved schedulability analysis of EDF on multiprocessor platforms [Relazione in Atti di Convegno]
Bertogna, Marko; M., Cirinei; G., Lipari
abstract

Multiprocessor hardware platforms are now being considered for embedded systems, due to their high computational power and little additional cost when compared to single processor systems. When scheduling real-time applications on multiprocessor platforms, a possibility is to use global scheduling, where a scheduling algorithm dynamically assign tasks to processors, and tasks can migrate from one processor to another during their execution. In this paper, we tackle the problem of schedulability analysis of sporadic tasks in global scheduling systems, where the scheduler is the earliest deadline first (EDF) algorithm. We provide two main contributions. First, we show that two recently proposed tests perform poorly when the task set contains heavy tasks (i.e. tasks with high utilization). We also show that neither test dominates the other. As a second contribution, we introduce a new schedulability test that improves significantly the percentage of accepted task sets, especially when considering task sets containing heavy tasks. We show the effectiveness of the proposed test through an extensive set of experiments.


2005 - New schedulability tests for real-time task sets scheduled by deadline monotonic on multiprocessors [Relazione in Atti di Convegno]
Bertogna, Marko; M., Cirinei; G., Lipari
abstract

In this paper, we address the problem of schedulability analysis of a set of real-time periodic (or sporadic) tasks on multiprocessor hardware platforms, under fixed priority global scheduling. In a multiprocessor system with M processors, a global scheduler consists of a single queue of ready tasks for all processors, and the scheduler selects the first M tasks to execute on the M processors. We allow preemption and migration of tasks between processors.This paper presents two different contributions. First, we derive a sufficient schedulability test for periodic and sporadic task system scheduled with fixed priority when priorities are assigned according to Deadline Monotonic. This test is efficient when dealing with heavy tasks (i.e. tasks with high utilization). Then, we develop an independent analysis for preperiod deadline systems. This leads to a new schedulability test with density and utilization bounds that are tighter than the existing ones.