
Paolo VALENTE

University Researcher (Ricercatore Universitario)
Dipartimento di Scienze Fisiche, Informatiche e Matematiche (ex-Matematica site)




Publications

2023 - Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-based Heterogeneous SoCs [Conference Paper]
Brilli, G.; Valente, G.; Capotondi, A.; Burgio, P.; Di Mascio, T.; Valente, P.; Marongiu, A.


2022 - Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs [Journal Article]
Brilli, Gianluca; Cavicchioli, Roberto; Solieri, Marco; Valente, Paolo; Marongiu, Andrea

High-performance embedded platforms are increasingly adopting heterogeneous systems-on-chip (HeSoC) that couple multi-core CPUs with accelerators such as GPU, FPGA, or AI engines. Adopting HeSoCs in the context of real-time workloads is not immediately possible, though, as contention on shared resources like the memory hierarchy—and in particular the main memory (DRAM)—causes unpredictable latency increase. To tackle this problem, both the research community and certification authorities mandate (i) that accesses from parallel threads to the shared system resources (typically, main memory) happen in a mutually exclusive manner by design, or (ii) that per-thread bandwidth regulation is enforced. Such arbitration schemes provide timing guarantees, but make poor use of the memory bandwidth available in a modern HeSoC. Controlled Memory Request Injection (CMRI) is a recently-proposed bandwidth limitation concept that builds on top of a mutually-exclusive schedule but still allows the threads currently not entitled to access memory to use as much of the unused bandwidth as possible without losing the timing guarantee. CMRI has been discussed in the context of a multi-core CPU, but the same principle applies also to a more complex system such as an HeSoC. In this article, we introduce two CMRI schemes suitable for HeSoCs: Voluntary Throttling via code refactoring and Bandwidth Regulation via dynamic throttling. We extensively characterize a proof-of-concept incarnation of both schemes on two HeSoCs: an NVIDIA Tegra TX2 and a Xilinx UltraScale+, highlighting the benefits and the costs of CMRI for synthetic workloads that model worst-case DRAM access. We also test the effectiveness of CMRI with real benchmarks, studying the effect of interference among the host CPU and the accelerators.
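The bandwidth-regulation idea behind CMRI can be pictured as a token-bucket throttle on non-privileged memory traffic. The sketch below is a conceptual Python model only (the class name and rates are illustrative, not the paper's mechanism): a task may issue a memory request only while it holds a token, so its injected bandwidth never exceeds the configured rate.

```python
# Conceptual token-bucket regulator for controlled memory request
# injection (illustrative model, not the mechanism from the paper).
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate      # tokens (requests) earned per time unit
        self.burst = burst    # maximum number of stored tokens
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token: request may be injected
            return True
        return False          # throttled: request must wait
```

A task regulated this way can use idle bandwidth up to `rate` without invalidating the timing guarantee of the task currently granted exclusive access.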


2020 - Evaluating Controlled Memory Request Injection to Counter PREM Memory Underutilization [Conference Paper]
Cavicchioli, R.; Capodieci, N.; Solieri, M.; Bertogna, M.; Valente, P.; Marongiu, A.

Modern heterogeneous systems-on-chip (HeSoC) feature high-performance multi-core CPUs tightly integrated with data-parallel accelerators. Such HeSoCs heavily rely on shared resources, which hinders their adoption in the context of real-time systems. The predictable execution model (PREM) has proven effective at preventing uncontrolled execution time lengthening due to memory interference in HeSoCs sharing main memory (DRAM). However, PREM only allows one task at a time to access memory, which inherently under-utilizes the available memory bandwidth in modern HeSoCs. In this paper, we conduct a thorough experimental study aimed at assessing the potential benefits of extending PREM so as to inject controlled amounts of memory requests coming from tasks other than the one currently granted exclusive DRAM access. Focusing on a state-of-the-art HeSoC, the NVIDIA TX2, we extensively characterize the relation between the injected bandwidth and the latency experienced by the task under test. The results confirm that for various types of workload it is possible to exploit the available bandwidth much more efficiently than with standard PREM arbitration, often close to its maximum, while keeping latency inflation below 10%. We discuss possible practical implementation directions, highlighting the expected benefits and technical challenges.


2020 - Managing Human-driven and Autonomous Vehicles at Smart Intersections [Conference Paper]
Cabri, G.; Montangero, M.; Muzzini, F.; Valente, P.

Auction-based crossing management approaches are used to design coordination policies for autonomous vehicles and improve smart intersections by providing differentiated latencies. In this paper we exploit auction-based mechanisms to design an intersection management system that re-uses existing traffic lights and coordinates human-driven and autonomous vehicles. We first describe this system in detail, together with the bidding policy of our auction mechanisms. We then describe our experimental scenario and the research issues that will be addressed by means of future simulations.
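As a toy illustration of the auction-based coordination idea (the function, approach names and bid values below are hypothetical, not taken from the paper), a sealed-bid round can be sketched as vehicles on each approach pooling their bids, with the highest-bidding approach winning the next green phase:

```python
# Toy sealed-bid auction for green-light assignment at a smart
# intersection (illustrative only; names and bids are hypothetical).
def run_auction(approaches):
    """approaches: dict mapping approach name -> list of vehicle bids.
    Vehicles on the same approach pool their bids; the approach with
    the highest total wins the next green phase."""
    totals = {name: sum(bids) for name, bids in approaches.items()}
    winner = max(totals, key=totals.get)
    return winner, totals[winner]

winner, total = run_auction({
    "north": [2.0, 1.5],   # two human-driven vehicles
    "east": [5.0],         # one autonomous vehicle with an urgent bid
})
```

Differentiated latencies then follow naturally: vehicles (or classes of vehicles) willing to bid more wait less on average.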


2019 - A Parallel Branch-and-Bound Algorithm to Compute a Tighter Tardiness Bound for Preemptive Global EDF [Journal Article]
Leoncini, Mauro; Montangero, Manuela; Valente, Paolo

In this paper we present a parallel exact algorithm to compute an upper bound to the tardiness of preemptive Global EDF (G-EDF) schedulers, named the harmonic bound, which has been proved to be up to 30% tighter than previously proposed bounds. Tightness is a crucial property of tardiness bounds: a too loose bound may cause a feasible soft real-time application to be mistakenly deemed unfeasible. Unfortunately, no polynomial-time algorithm is known to date to compute the harmonic bound. Although there is no proof of hardness of any sort either, the complex formula of the bound apparently provides no hints to devise algorithms with sub-exponential worst-case cost. In this paper we address this issue by proposing a parallel, exact, branch-and-bound algorithm to compute the harmonic bound, called harm-BB, which proves to be extremely fast in a large number of experiments. More specifically, we compare its execution times with those of existing polynomial-time algorithms for other known tardiness bounds on 630,000 random task sets. harm-BB outperforms, or is comparable to, the competitor algorithms in all scenarios but the ones with the highest number of processors (7 and 8) and tasks (50). In the latter scenarios harm-BB is indeed slower than the other algorithms; yet it remains feasible, taking only about 2.8 s to compute the bound on a commodity dual-core CPU. Even better, we show that harm-BB has a high parallel efficiency, thus its execution time may be largely cut down on highly-parallel platforms.
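The general shape of an exact branch-and-bound search like harm-BB can be illustrated on a simpler problem. The sketch below solves 0/1 knapsack (an illustrative stand-in, not the harmonic-bound computation itself): each node fixes a prefix of include/exclude decisions, and a fractional relaxation supplies the optimistic bound used for pruning.

```python
# Generic branch-and-bound skeleton, shown on 0/1 knapsack
# (illustrative; not the actual harm-BB algorithm).
def knapsack_bb(values, weights, capacity):
    n = len(values)
    # Consider items in decreasing value density for a tight bound.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    best = 0

    def bound(idx, value, room):
        # Optimistic bound: fill the remaining room fractionally.
        b = value
        for i in order[idx:]:
            if weights[i] <= room:
                room -= weights[i]
                b += values[i]
            else:
                return b + values[i] * room / weights[i]
        return b

    def branch(idx, value, room):
        nonlocal best
        if idx == n:
            best = max(best, value)
            return
        if bound(idx, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        i = order[idx]
        if weights[i] <= room:               # branch: take item i
            branch(idx + 1, value + values[i], room - weights[i])
        branch(idx + 1, value, room)         # branch: skip item i

    branch(0, 0, capacity)
    return best
```

As in harm-BB, the worst case is still exponential, but effective pruning makes the search fast on typical instances.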


2019 - Deterministic memory hierarchy and virtualization for modern multi-core embedded systems [Conference Paper]
Kloda, Tomasz; Solieri, M.; Mancuso, R.; Capodieci, N.; Valente, P.; Bertogna, M.

One of the main predictability bottlenecks of modern multi-core embedded systems is contention for access to shared memory resources. Partitioning and software-driven allocation of memory resources is an effective strategy to mitigate contention in the memory hierarchy. Unfortunately, however, many of the strategies adopted so far can have unforeseen side-effects when practically implemented on latest-generation, high-performance embedded platforms. Predictability is further jeopardized by cache eviction policies based on random replacement, targeting average performance instead of timing determinism. In this paper, we present a framework of software-based techniques to restore memory access determinism in high-performance embedded systems. Our approach leverages OS-transparent and DMA-friendly cache coloring, in combination with an invalidation-driven allocation (IDA) technique. The proposed method allows protecting important cache blocks from (i) external eviction by tasks concurrently executing on different cores, and (ii) internal eviction by tasks running on the same core. A working implementation obtained by extending the Jailhouse partitioning hypervisor is presented and evaluated with a combination of synthetic and real benchmarks.
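Cache coloring, the core technique above, can be summarized in a few lines: the color of a physical page is given by the cache set-index bits that lie above the page offset, so pages of different colors can never evict one another from a physically-indexed cache. The cache parameters below are generic examples, not those of the platform used in the paper.

```python
# Minimal sketch of page-color computation for cache coloring
# (illustrative parameters, not the paper's implementation).
PAGE_SIZE = 4096               # 4 KiB pages
CACHE_SIZE = 2 * 1024 * 1024   # 2 MiB last-level cache (assumed)
WAYS = 16                      # 16-way set-associative (assumed)
LINE_SIZE = 64                 # 64-byte cache lines

SETS = CACHE_SIZE // (WAYS * LINE_SIZE)        # 2048 sets
NUM_COLORS = (SETS * LINE_SIZE) // PAGE_SIZE   # 32 colors

def page_color(phys_addr):
    # The color is the page-frame number modulo the number of colors:
    # frames with the same color map to the same cache-set region.
    return (phys_addr // PAGE_SIZE) % NUM_COLORS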


2018 - A survey on shared disk I/O management in virtualized environments under real time constraints [Journal Article]
Sanudo Olmedo, Ignacio; Cavicchioli, Roberto; Capodieci, Nicola; Valente, Paolo; Bertogna, Marko

In the embedded systems domain, hypervisors are increasingly being adopted to guarantee timing isolation and appropriate hardware resource sharing among different software components. However, managing concurrent and parallel requests to shared hardware resources in a predictable way still represents an open issue. We argue that hypervisors can be an effective means to achieve an efficient and predictable arbitration of competing requests to shared devices in order to satisfy real-time requirements. As a representative example, we consider the case of mass storage (I/O) devices like Hard Disk Drives (HDD) and Solid State Disks (SSD), whose access times are orders of magnitude higher than those of central memory and CPU caches, therefore having a greater impact on overall task delays. We provide a comprehensive and up-to-date survey of the literature on I/O management within virtualized environments, focusing on software solutions proposed in the open source community, and discussing their main limitations in terms of real-time performance. Then, we discuss how the research in this subject may evolve in the future, highlighting the importance of techniques that are focused on scheduling not only the processing bandwidth, but also the access to other important shared resources, like I/O devices.


2018 - Achieving a High Throughput and a Low Latency through a Modular Packet Scheduler [Journal Article]
Casoni, M.; Grazia, C. A.; Valente, P.

Providing QoS guarantees in a wireless environment is a challenging task because of the idiosyncrasies of the wireless media. State-of-the-art solutions for QoS provisioning over wireless links are based on cross-layer packet schedulers that deal with both the QoS guarantees and the wireless link issues. Unfortunately, such an approach is not flexible and requires technology-dependent solutions. To address these issues, we present a modular architecture which permits the use of existing high-performance packet schedulers for wired links over generic wireless technologies, as they are, and at the same time allows the flexibility to adapt to different channel conditions. We validate the effectiveness of our modular solution through a formal analysis. We also present the High-throughput Twin Fair Scheduler (HFS), a novel packet scheduler based on the modular architecture. HFS has constant execution time, accurate fairness, and low latency.


2018 - PSPAT: Software packet scheduling at hardware speed [Journal Article]
Rizzo, Luigi; Valente, Paolo; Lettieri, Giuseppe; Maffione, Vincenzo

Tenants in a cloud environment run services, such as Virtual Network Function instantiations, that may legitimately generate millions of packets per second. The hosting platform, hence, needs robust packet scheduling mechanisms that support these rates and, at the same time, provide isolation and dependable service guarantees under all load conditions. Current hardware or software packet scheduling solutions fail to meet all these requirements, most commonly lacking on either performance or guarantees. In this paper we propose an architecture, called PSPAT, to build efficient and robust software packet schedulers suitable to high speed, highly concurrent environments. PSPAT decouples clients, scheduler and device driver through lock-free mailboxes, thus removing lock contention, providing opportunities to parallelize operation, and achieving high and dependable performance even under overload. We describe the operation of our system, discuss implementation and system issues, provide analytical bounds on the service guarantees of PSPAT, and validate the behavior of its Linux implementation even at high link utilization, comparing it with current hardware and software solutions. Our prototype can make over 28 million scheduling decisions per second, and keep latency low, even with tens of concurrent clients running on a multi-core, multi-socket system.
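The lock-free mailbox idea that decouples PSPAT's clients and scheduler can be pictured as a fixed-size single-producer/single-consumer ring buffer with separate head and tail indices: each index is written by only one side, so no lock is needed. The Python model below is conceptual only; real lock-freedom requires memory barriers and a systems language, as in PSPAT's C implementation.

```python
# Conceptual single-producer/single-consumer mailbox
# (illustrative sketch, not PSPAT's actual data structure).
class Mailbox:
    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.head = 0   # advanced only by the consumer (scheduler)
        self.tail = 0   # advanced only by the producer (client)

    def put(self, pkt):
        nxt = (self.tail + 1) % self.size
        if nxt == self.head:
            return False          # mailbox full: caller may retry or drop
        self.slots[self.tail] = pkt
        self.tail = nxt
        return True

    def get(self):
        if self.head == self.tail:
            return None           # mailbox empty
        pkt = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        return pkt
```

One slot is kept empty to distinguish the full and empty states, so a mailbox of size N holds at most N-1 packets.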


2017 - A branch-and-bound algorithm to compute a tighter bound to tardiness for preemptive global EDF scheduler [Conference Paper]
Leoncini, Mauro; Montangero, Manuela; Valente, Paolo

Tightness is a crucial property of tardiness bounds; in fact, a too loose bound may cause a feasible soft real-time application to be mistakenly deemed unfeasible. Recently, a new tardiness bound for preemptive global EDF (G-EDF), named the harmonic bound, has been proposed and proved to be up to 30% tighter than previous bounds. Unfortunately, no polynomial-time algorithm is known to date to compute the harmonic bound. Although there is no proof of hardness of any sort either, the complex formula of the bound apparently provides no hints to devise algorithms with sub-exponential worst-case cost. In this paper we address this issue by proposing an exact branch-and-bound algorithm, called harm-BB, which proved to be extremely fast in a large number of experiments. More specifically, we compared its execution times with those of existing polynomial-time algorithms for other known similar bounds on 630,000 random task sets. harm-BB outperformed the competitor algorithms in all scenarios but the ones with the highest number of processors and tasks (8 and ∼50, respectively). However, even in the latter cases, harm-BB was at most 1.5 times slower than the other algorithms, and, in absolute terms, took only about 4.4 ms to compute the bound on commodity hardware.


2017 - SiGAMMA: Server based integrated GPU arbitration mechanism for memory accesses [Conference Paper]
Capodieci, Nicola; Cavicchioli, Roberto; Valente, Paolo; Bertogna, Marko

In embedded systems, CPUs and GPUs typically share main memory. The resulting memory contention may significantly inflate the duration of CPU tasks in a hard-to-predict way. Although initial solutions have been devised to control this undesired inflation, these approaches do not consider the interference due to memory-intensive components in COTS embedded systems, like integrated Graphics Processing Units. Dealing with this kind of interference might require custom-made hardware components that are not integrated in off-the-shelf platforms. We address these important issues by proposing a memory-arbitration mechanism, SiGAMMA (Siσ), for eliminating the interference on CPU tasks caused by conflicting memory requests from the GPU. Tasks on the CPU are assumed to comply with a prefetch-based execution model (PREM) proposed in the real-time literature, while memory accesses from the GPU are arbitrated through a predictable mechanism that avoids contention. Our experiments show that Siσ proves to be very effective in guaranteeing almost null inflation to memory phases of CPU tasks, while at the same time avoiding excessive starvation of GPU tasks.


2016 - A survey on shared disk I/O management in virtualized environments under real time constraints [Conference Paper]
Sanudo, I.; Cavicchioli, R.; Capodieci, N.; Valente, P.; Bertogna, M.

In the embedded systems domain, hypervisors are increasingly being adopted to guarantee timing isolation and appropriate hardware resource sharing among different software components. However, managing concurrent and parallel requests to shared hardware resources in a predictable way still represents an open issue. We argue that hypervisors can be an effective means to achieve an efficient and predictable arbitration of competing requests to shared devices in order to satisfy real-time requirements. As a representative example, we consider the case of mass storage (I/O) devices like Hard Disk Drives (HDD) and Solid State Disks (SSD), whose access times are orders of magnitude higher than those of central memory and CPU caches, therefore having a greater impact on overall task delays. We provide a comprehensive and up-to-date survey of the literature on I/O management within virtualized environments, focusing on software solutions proposed in the open source community, and discussing their main limitations in terms of real-time performance. Then, we discuss how the research in this subject may evolve in the future, highlighting the importance of techniques that are focused on scheduling not only the processing bandwidth, but also the access to other important shared resources, like I/O devices.


2016 - Using a lag-balance property to tighten tardiness bounds for global EDF [Journal Article]
Valente, Paolo

Several tardiness bounds for global EDF and global-EDF-like schedulers have been proposed over the last decade. These bounds contain a component that is explicitly or implicitly proportional to how much the system may be cumulatively lagging behind, in serving tasks, with respect to an ideal schedule. This cumulative lag is in its turn upper-bounded by upper-bounding each per-task component in isolation, and then summing individual per-task bounds. Unfortunately, this approach leads to an over-pessimistic cumulative upper bound. In fact, it does not take into account a lag-balance property of any work-conserving scheduling algorithm. In this paper we show how to get a new tardiness bound for global EDF by integrating this property with the approach used to prove the first tardiness bounds proposed in the literature. In particular, we compute a new tardiness bound for implicit-deadline tasks, scheduled by preemptive global EDF on a symmetric multiprocessor. According to our experiments, as the number of processors increases, this new tardiness bound becomes increasingly tighter than the tightest bound available in the literature, with a maximum tightness improvement of 29%. A negative characteristic of this new bound is that computing its value takes exponential time with a brute-force algorithm (no faster exact or approximate algorithm is available yet). As a more general result, the property highlighted in this paper might help to improve the analysis for other scheduling algorithms, possibly on different systems and with other types of task sets. In this respect, our experimental results also point out the following negative fact: existing tardiness bounds for global EDF, including the new bound we propose, may become remarkably loose if every task has a low utilization (the ratio between the execution time and the minimum inter-arrival time of the jobs of the task), or if the sum of the utilizations of the tasks is lower than the total capacity of the system.
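The "lag" the analysis builds on has a standard minimal definition: the service a task would have received in an ideal fluid schedule (its utilization times the elapsed time) minus the service it actually received. A one-line illustration, with made-up numbers:

```python
# Per-task lag with respect to an ideal fluid schedule
# (standard definition; numbers in the example are illustrative).
def lag(utilization, t, service_received):
    """lag_i(t) = U_i * t - actual service of task i in [0, t)."""
    return utilization * t - service_received

# A task with utilization 0.5 that received only 3 time units of
# service over [0, 10) is lagging by 2 time units.
```

The lag-balance observation is that, under a work-conserving scheduler, tasks cannot all lag at once: positive lags of some tasks are offset by negative lags of others, which is what makes summing independent per-task lag bounds pessimistic.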


2015 - A memory-centric approach to enable timing-predictability within embedded many-core accelerators [Conference Paper]
Burgio, Paolo; Marongiu, Andrea; Valente, Paolo; Bertogna, Marko

There is an increasing interest among real-time systems architects for multi- and many-core accelerated platforms. The main obstacle towards the adoption of such devices within industrial settings is related to the difficulties in tightly estimating the multiple interferences that may arise among the parallel components of the system. This in particular concerns concurrent accesses to shared memory and communication resources. Existing worst-case execution time analyses are extremely pessimistic, especially when adopted for systems composed of hundreds-to-thousands of cores. This significantly limits the potential for the adoption of these platforms in real-time systems. In this paper, we study how the predictable execution model (PREM), a memory-aware approach to enable timing-predictability in real-time systems, can be successfully adopted on multi- and many-core heterogeneous platforms. Using a state-of-the-art multi-core platform as a testbed, we validate that it is possible to obtain an order-of-magnitude improvement in the WCET bounds of parallel applications, if data movements are adequately orchestrated in accordance with PREM. We identify which system parameters mostly affect the tremendous performance opportunities offered by this approach, both on average and in the worst case, moving the first step towards predictable many-core systems.


2015 - Evolution of the BFQ Storage-I/O scheduler [Conference Paper]
Valente, Paolo; Avanzini, Arianna

An accurate storage-I/O scheduler, named Budget Fair Queueing (BFQ), was integrated with a special set of heuristics a few years ago. The resulting, improved scheduler, codenamed BFQ-v1, was able to guarantee a number of desirable service properties, including high responsiveness, to applications and system services. In the intervening years, BFQ-v1 has become relatively popular on desktop and handheld systems, and has further evolved. But no official, comprehensive documentation has been provided about the improvements that have followed one another. In this paper we fill this documentation gap, by describing the current version of BFQ (v7r8). We also show the performance of BFQ-v7r8 through some experimental results, in terms of throughput and application responsiveness, on both an HDD and an SSD.


2015 - Integrating Linux and the real-time ERIKA OS through the Xen hypervisor [Conference Paper]
Avanzini, Arianna; Valente, Paolo; Faggioli, Dario; Gai, Paolo

Modern user interfaces grow more and more complex and cannot be possibly handled by the same software components in charge of the timely execution of safety-critical control tasks. Evidence Srl recently proposed a single-board dual-OS system aimed at combining the flexibility of the Linux general-purpose operating system, which is able to produce any complex user interface, and the reliability of the automotive-grade ERIKA Enterprise operating system, a small-footprint real-time OS suitable for safety-critical control tasks and able to execute commands triggered by Linux. The operating systems run on dedicated cores and, for efficiency reasons, they share memory with limited support for memory protection: although the system allows running two operating systems, from a safety certification point of view it suffers from the fact that safety-critical and non-safety-critical components should be isolated from each other. In this paper we present, as an improvement to the initial implementation, a dual-OS system running, on a dual-core platform, ERIKA Enterprise and a full-featured Linux OS, but using the Xen hypervisor to run the two operating systems in two isolated domains. In the proposed setup, each of the domains runs on a dedicated core, assigned statically by the hypervisor. Linux runs as the control domain, and is therefore able to execute any of the components of the Xen toolstack; it is also able to grant to the real-time operating system access to any I/O-memory range needed for control tasks. The described system also provides a simple, safe communication mechanism between the two operating systems, based on Xen's inter-domain event notification primitives and explicit sharing of a dedicated set of memory pages by the real-time operating system.


2015 - On service guarantees of fair-queueing schedulers in real systems [Journal Article]
Rizzo, Luigi; Valente, Paolo

In most systems, fair-queueing packet schedulers are the algorithms of choice for providing bandwidth and delay guarantees. These guarantees are computed assuming that the scheduler is directly attached to the transmit unit with no interposed buffering, and, for timestamp-based schedulers, that the exact number of bits transmitted is known when timestamps need to be updated. Unfortunately, both assumptions are unrealistic. In particular, real communication devices normally include FIFO queues (possibly very deep ones) between the scheduler and the transmit unit. And the presence of these queues does invalidate the proofs of the service guarantees of existing timestamp-based fair-queueing schedulers. In this paper we address these issues with the following two contributions. First, we show how to modify timestamp-based, worst-case optimal and quasi-optimal fair-queueing schedulers so as to comply with the presence of FIFO queues, and with uncertainty on the number of bits transmitted. Second, we provide analytical bounds of the actual guarantees provided, in these real-world conditions, both by modified timestamp-based fair-queueing schedulers and by basic round-robin schedulers. These results should help designers to make informed decisions and sound tradeoffs when building systems.
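A minimal timestamp-based fair-queueing scheduler, in the SCFQ style (illustrative only, not the modified schedulers analyzed in the paper), shows where the timestamps come from: each arriving packet gets a virtual finish tag F = max(V, previous F of its flow) + length/weight, and packets are served in increasing tag order.

```python
# Minimal SCFQ-style timestamp-based fair queueing
# (illustrative sketch; not the paper's modified schedulers).
import heapq

class FairQueue:
    def __init__(self):
        self.V = 0.0             # virtual time
        self.last_finish = {}    # per-flow last finish tag
        self.heap = []           # (finish tag, seq, flow, length)
        self.seq = 0             # tie-breaker for equal tags

    def enqueue(self, flow, length, weight):
        start = max(self.V, self.last_finish.get(flow, 0.0))
        finish = start + length / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, flow, length = heapq.heappop(self.heap)
        self.V = finish   # SCFQ: virtual time tracks the served tag
        return flow, length
```

It is exactly the update of tags like these that a deep FIFO between scheduler and transmit unit disturbs: the scheduler no longer knows how many bits have actually left when it must compute the next tag.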


2014 - A Low-Latency and High-Throughput Scheduler for Emergency and Wireless Networks [Conference Paper]
Casoni, Maurizio; Grazia, CARLO AUGUSTO; Valente, Paolo

Providing QoS guarantees, boosting throughput and saving energy over wireless links is a challenging task, especially in emergency networks, where all of these features are crucial during a disaster event. A common solution is using a single, integrated scheduler that deals both with the QoS guarantees and the wireless link issues. Unfortunately, such an approach is not flexible and does not allow any of the existing high-quality schedulers for wired links to be used without modifications. We address these issues through a modular architecture which permits the use of existing packet schedulers for wired links over wireless links, as they are, and at the same time allows the flexibility to adapt to different channel conditions. We validate the effectiveness of our modular architecture by showing, through formal analysis as well as experimental results, that this architecture enables us to get a new scheduler with the following features, by just combining existing schedulers: execution time and energy consumption close to those of a plain Deficit Round Robin; accurate fairness and low latency; and the possibility to set the desired trade-off between throughput-boosting level and granularity of service guarantees by changing one parameter. In particular, we show that this scheduler, which we named High-throughput Twin Fair Scheduler (HFS), outperforms one of the most accurate and efficient integrated schedulers available in the literature.


2014 - Reducing the execution time of fair-queueing packet schedulers [Journal Article]
Valente, Paolo

Deficit Round Robin (DRR) is probably the most scalable fair-queueing packet scheduler. Unfortunately, it suffers from high delay and jitter with respect to a perfectly fair (and smooth) service. Schedulers providing much better service guarantees exist, but have a higher computational cost. In this paper we deal with this issue by proposing a modification scheme for reducing the amortized execution time of the latter, more accurate schedulers. Modified schedulers preserve guarantees close to the original ones, and can also handle seamlessly both leaves and internal nodes in a hierarchical setting. We also present Quick Fair Queueing Plus (QFQ+), a fast fair-queueing scheduler that we defined using this scheme, and that is now in mainline Linux. On one hand, QFQ+ guarantees near-optimal worst-case deviation with respect to a perfectly fair service. On the other hand, with QFQ+, the time and energy needed to process packets are close to those needed with DRR, and may be even lower than with DRR exactly in the scenarios where the better service properties of QFQ+ make a difference.
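For reference, the Deficit Round Robin baseline mentioned above can be written in a few lines (textbook version, not the paper's QFQ+ implementation): each backlogged flow earns a quantum of byte credit per round and sends head-of-line packets while its deficit covers them.

```python
# Textbook Deficit Round Robin (illustrative baseline; not QFQ+).
from collections import deque

def drr(flows, quantum, rounds):
    """flows: list of deques of packet sizes (bytes). Returns the
    dequeue order as (flow index, packet size) pairs."""
    deficit = [0] * len(flows)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(flows):
            if not q:
                deficit[i] = 0        # idle flows keep no credit
                continue
            deficit[i] += quantum     # one quantum of credit per round
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()
                deficit[i] -= pkt
                sent.append((i, pkt))
    return sent
```

DRR's O(1) per-packet work is what makes it so scalable; the round-based credit is also exactly what causes the burstier, higher-jitter service that more accurate timestamp-based schedulers like QFQ+ avoid.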


2014 - TEMPEST: Test EnvironMent for Performance Evaluation of the Scheduling of packeTs [Conference Paper]
Casoni, Maurizio; Grazia, CARLO AUGUSTO; Valente, Paolo

The packet scheduling problem has been deeply studied for many years by researchers in the computer science and telecommunications fields, as it decides the order in which packets are sent over a link in order to provide QoS on a network. Recently, packet scheduling has again become a challenging topic due to the massive use of wireless technologies (e.g. WiFi, LTE, 4G/5G), both for public and PPDR networks, over which providing high QoS guarantees is still an open problem. Unfortunately, it is difficult to compare different solutions and actually test them in order to select the most appropriate packet scheduler for each particular environment. In this paper we present TEMPEST, a new Test EnvironMent for Performance Evaluation of the Scheduling of packeTs, a novel tool able to help research in the packet scheduling field. TEMPEST is able to measure the actual performance of a packet scheduler in several environments, both wired and wireless, reporting metrics such as execution time, QoS metrics and throughput, and giving prompt feedback about the quality of the solution under study. The goal of this paper is to present in detail the current features of TEMPEST, showing how easy it is to add, configure, test and evaluate several scheduling solutions in multiple scenarios.


2014 - TEMPEST: a new Test EnvironMent for Performance Evaluation of the Scheduling of packeTs [Journal Article]
Casoni, Maurizio; Grazia, CARLO AUGUSTO; Valente, Paolo

The packet scheduling problem has been deeply studied for many years by researchers in the computer science and telecommunications fields, as it decides the order in which packets are sent over a link in order to provide QoS on a network. Recently, packet scheduling has again become a challenging topic due to the massive use of wireless technologies (e.g. WiFi, LTE, 4G/5G), over which providing high QoS guarantees is still an open problem. Unfortunately, it is difficult to compare different solutions and actually test them in order to select the most appropriate packet scheduler for each particular environment. In this paper we present TEMPEST, a new Test EnvironMent for Performance Evaluation of the Scheduling of packeTs, a novel tool able to help research in the packet scheduling field. TEMPEST is able to measure the actual performance of a packet scheduler in several environments, both wired and wireless, reporting metrics such as execution time, QoS metrics and throughput, and giving prompt feedback about the quality of the solution under study. The goal of this paper is to present in detail the current features of TEMPEST, showing how easy it is to add, configure, test and evaluate several scheduling solutions in multiple scenarios.


2013 - A Flexible and Green Scheduler for providing QoS and High Throughput over Wireless Links [Conference Paper]
Casoni, Maurizio; Grazia, CARLO AUGUSTO; Valente, Paolo

Realizing a communication system over wireless links that simultaneously offers QoS guarantees, high throughput and low energy consumption is a difficult task. A common solution is using a single, integrated scheduler that deals both with the QoS guarantees and the wireless link issues. Unfortunately, such an approach is not flexible and does not allow any of the existing high-quality schedulers for wired links to be used without modification. To address these issues, in this paper we validate a modular architecture which permits the use, as they are, of existing packet schedulers for wired links over wireless links, and at the same time allows the flexibility to adapt to different channel conditions. We validate the effectiveness of the modular architecture by showing, through experimental results, that this architecture enables us to get a new scheduler with the following features, just by combining existing schedulers: execution time and energy consumption close to those of a plain Deficit Round Robin; accurate fairness and delay guarantees; and the possibility to set, by changing one parameter, the desired trade-off between throughput-boosting level and granularity of the service guarantees. In particular, we show that this scheduler, which we named High-throughput Twin Fair Scheduler (HFS), outperforms one of the most accurate and efficient integrated schedulers available in the literature.


2013 - A Green, Modular and Fair Packet Scheduler for Boosting Throughput over Wireless Links [Articolo su rivista]
Casoni, Maurizio; Grazia, CARLO AUGUSTO; Valente, Paolo
abstract

We study the problem of defining and implementing a packet scheduler able to deliver, over a wireless link, the performance of a wired packet scheduler: low execution time with low power consumption, strong QoS guarantees and high throughput. A common solution is to use a single, integrated scheduler that deals with both the QoS guarantees and the wireless link issues. Unfortunately, such an approach offers little flexibility and does not allow any of the existing high-quality schedulers for wired links to be used without modification. To address these issues, in this paper we validate a modular architecture that permits the use, as they are, of existing packet schedulers for wired links over wireless links, and at the same time provides the flexibility to adapt to different channel conditions. We validate the effectiveness of the modular architecture by showing, through experimental results, that it enables us to obtain a new scheduler with the following features, just by combining existing schedulers: execution time and energy consumption close to those of a plain Deficit Round Robin, accurate fairness and delay guarantees, and the possibility to set, by changing a single parameter, the desired trade-off between throughput-boosting level and granularity of the service guarantees. In particular, we show that this scheduler, which we named High-throughput Twin Fair scheduler (HFS), outperforms one of the most accurate and efficient integrated schedulers available in the literature.


2013 - A Modular Architecture for QoS Provisioning over Wireless Links [Relazione in Atti di Convegno]
Casoni, Maurizio; Paganelli, Alessandro; Valente, Paolo
abstract

We consider the problem of providing QoS guarantees, while at the same time boosting the throughput or saving energy over a wireless link. A common solution is to use a single, integrated scheduler that deals with both the QoS guarantees and the wireless link issues. Unfortunately, such an approach does not allow any of the existing high-quality schedulers for wired links to be used without modification. It also offers little flexibility, as a scheduler compliant with a given wireless technology may need to be modified to fit a different technology or a different energy-saving solution. To address these issues, in this paper we propose a modular architecture that essentially extends the network stack by adding a special middle layer on top of the MAC. On the bottom side, this middle layer deals with the idiosyncrasies of the underlying wireless link, and possibly uses channel state information to boost performance and save energy. On the top side, the middle layer exports the abstraction of a link to which the higher layers must only pass the packets to transmit. On top of this middle layer, existing packet schedulers for wired links can be used without modification. It then also becomes possible to use the same packet scheduler on heterogeneous wireless technologies, by changing only the middle layer.
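The layering just described can be sketched in a few lines (class and method names here are hypothetical, for illustration only): the middle layer exports a plain send interface upward, consults a channel-state hook downward, and hosts an unmodified wired-link scheduler.

```python
from collections import deque

class FifoWiredScheduler:
    """An unmodified wired-link scheduler (a trivial FIFO stand-in; any
    existing wired scheduler with enqueue/dequeue would fit here)."""
    def __init__(self):
        self.queue = deque()
    def enqueue(self, pkt):
        self.queue.append(pkt)
    def dequeue(self):
        return self.queue.popleft() if self.queue else None

class WirelessMiddleLayer:
    """Exports a plain-link abstraction to the layers above; internally
    it may consult channel state to decide when to actually transmit
    (simplified here to a boolean hook)."""
    def __init__(self, scheduler, channel_good=lambda: True):
        self.scheduler = scheduler
        self.channel_good = channel_good
        self.sent = []
    def send(self, pkt):
        # higher layers just pass packets down, unaware of the wireless link
        self.scheduler.enqueue(pkt)
    def pump(self):
        # transmit only while the channel-state hook says the link is usable
        while self.channel_good():
            pkt = self.scheduler.dequeue()
            if pkt is None:
                return
            self.sent.append(pkt)
```

Swapping wireless technologies then amounts to replacing only `WirelessMiddleLayer`, leaving the scheduler untouched.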


2013 - Providing Near-Optimal Fair-Queueing Guarantees at Round-Robin Amortized Cost [Relazione in Atti di Convegno]
Valente, Paolo
abstract

Round-robin schedulers are the most efficient solution for providing strong QoS guarantees on high-speed links. Yet these schedulers suffer from a high worst-case delay with respect to an ideal, perfectly fair service, and more costly schedulers are needed to provide better service guarantees. In this paper we tackle this problem by proposing a simple modification scheme for reducing the amortized execution time of fair-queueing schedulers. We prove that, by applying this scheme to existing accurate schedulers, we can define new schedulers providing near-optimal service guarantees at an amortized computational cost close to that of just Deficit Round Robin (DRR). Finally, we present Quick Fair Queueing Plus (QFQ+), a new scheduler obtained by applying our scheme to QFQ. QFQ+ replaced QFQ in the Linux kernel. According to our experimental results, precisely in the scenarios where QFQ+ provides better service guarantees than a round-robin scheduler, the time and the energy needed to process packets with QFQ+ are lower than with DRR.
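The reference point for the cost comparison, Deficit Round Robin, can be sketched as follows (a textbook-style simplification, not the QFQ+ or Linux code):

```python
from collections import deque

def drr_schedule(flows, quantum, rounds):
    """Simplified Deficit Round Robin. `flows` is a list of deques of
    packet sizes. Per round, each backlogged flow gains `quantum` credits
    and sends head-of-line packets while its deficit counter covers them."""
    deficits = [0] * len(flows)
    order = []                        # sequence of (flow index, packet size)
    for _ in range(rounds):
        for i, q in enumerate(flows):
            if not q:
                deficits[i] = 0       # idle flows accumulate no deficit
                continue
            deficits[i] += quantum
            while q and q[0] <= deficits[i]:
                pkt = q.popleft()
                deficits[i] -= pkt
                order.append((i, pkt))
    return order
```

Each enqueue/dequeue step is O(1), which is the amortized cost QFQ+ aims to approach while giving much tighter delay guarantees.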


2013 - QFQ: Efficient Packet Scheduling With Tight Guarantees [Articolo su rivista]
F., Checconi; L., Rizzo; Valente, Paolo
abstract

Packet scheduling, together with classification, is one of the most expensive processing steps in systems providing tight bandwidth and delay guarantees at high packet rates. Schedulers with near-optimal service guarantees and O(1) time complexity have been proposed in the past, using techniques such as timestamp rounding and flow grouping to keep their execution time small. However, even the two best proposals in this family have a per-packet cost component that is linear either in the number of groups or in the length of the packet being transmitted. Furthermore, no studies are available on the actual execution time of these algorithms. In this paper we make two contributions. First, we present Quick Fair Queueing (QFQ), a new O(1) scheduler that provides near-optimal guarantees and is the first to achieve that goal with a truly constant cost also with respect to the number of groups and the packet length. The QFQ algorithm has no loops and uses very simple instructions and data structures that contribute to its speed of operation. Second, we have developed production-quality implementations of QFQ and of its closest competitors, which we use to present a detailed comparative performance analysis of the various algorithms. Experiments show that QFQ fulfills our expectations, outperforming the other algorithms in the same class. In absolute terms, even on a low-end workstation, QFQ takes about 110 ns for an enqueue()/dequeue() pair (only twice the time of DRR, but with much better service guarantees).
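The flow-grouping technique mentioned above can be illustrated with a toy computation (a sketch based on our reading of the grouping criterion; the function and variable names are ours): flows whose ratio of maximum packet length to weight falls within the same power-of-two interval share a group, so the number of groups stays small regardless of the number of flows.

```python
import math

def group_index(max_pkt_len, weight):
    """Flows are grouped by their 'slot size' L/w: two flows land in the
    same group when ceil(log2(L/w)) coincides. The number of groups is
    therefore logarithmic in the range of L/w, not linear in the number
    of flows, which is what keeps per-packet work bounded."""
    return math.ceil(math.log2(max_pkt_len / weight))
```

For example, two flows with 1500-byte packets and weights 2 and 3 fall in the same group, while halving a flow's weight can move it up one group.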


2012 - Direct vs 2-stage approaches to structured motif finding [Articolo su rivista]
Federico, Maria; Leoncini, Mauro; Montangero, Manuela; Valente, Paolo
abstract

Background: The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulation processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained, number of "irrelevant" base pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs. Results: We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then it combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. Conclusions: A reflection on the results obtained led us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. We also noted that most hard instances for SISMA were easy to detect in advance; in these cases one may initially opt for a direct method, or, as a viable alternative in most laboratories, run both direct and 2-stage tools in parallel, halting the computations when the first one halts.
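The 2-stage idea can be sketched as follows (a minimal illustration, not SISMA's actual algorithm, which handles mismatches and richer constraints): stage 1 enumerates exact simple motifs common to all sequences, stage 2 combines pairs whose occurrences respect a gap constraint.

```python
from collections import defaultdict

def simple_motifs(sequences, k):
    """Stage 1: exact k-mers occurring in every sequence, with positions."""
    occ = defaultdict(lambda: defaultdict(list))
    for s, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            occ[seq[i:i + k]][s].append(i)
    return {m: pos for m, pos in occ.items() if len(pos) == len(sequences)}

def structured_motifs(sequences, k, min_gap, max_gap):
    """Stage 2: pairs (m1, m2) of simple motifs separated, in every
    sequence, by a gap within [min_gap, max_gap] base pairs."""
    simple = simple_motifs(sequences, k)
    result = []
    for m1, p1 in simple.items():
        for m2, p2 in simple.items():
            if all(any(min_gap <= j - (i + k) <= max_gap
                       for i in p1[s] for j in p2[s])
                   for s in range(len(sequences))):
                result.append((m1, m2))
    return result
```

The split makes the modularity advantage concrete: stage 1 can be swapped for any simple-motif finder, and stage 2 parallelizes over candidate pairs.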


2012 - Improving application responsiveness with the BFQ disk I/O scheduler [Relazione in Atti di Convegno]
Valente, Paolo; Andreolini, Mauro
abstract

BFQ (Budget Fair Queueing) is a production-quality, proportional-share disk scheduler with a relatively large user base. Part of its success is due to a set of simple heuristics that we added to the original algorithm about one year ago. These heuristics are the main focus of this paper. The first heuristic enriches BFQ with one of the most desirable properties for a desktop or handheld system: responsiveness. The remaining heuristics improve the robustness of BFQ across heterogeneous devices, and help BFQ to preserve a high throughput under demanding workloads. To measure the performance of these heuristics we have implemented a suite of micro and macro benchmarks mimicking several real-world tasks, and have run it on three different systems with a single rotational disk. We have also compared our results against Completely Fair Queueing (CFQ), the default Linux disk scheduler.


2010 - High Throughput Disk Scheduling with Fair Bandwidth Distribution [Articolo su rivista]
Valente, Paolo; Fabio, Checconi
abstract

Mainstream applications, such as file copy/transfer, Web, DBMS, or video streaming, typically issue synchronous disk requests. As shown in this paper, this fact may cause work-conserving schedulers to fail both to enforce guarantees and to provide a high disk throughput. A high throughput can however be recovered by just idling the disk for a short time interval after the completion of each request. In contrast, guarantees may still be violated by existing timestamp-based schedulers, because of the rules they use to tag requests. Budget Fair Queueing (BFQ), the new disk scheduler presented in this paper, is an example of how disk idling, combined with proper back-shifting of request timestamps, may allow a timestamp-based disk scheduler to preserve both guarantees and a high throughput. Under BFQ, each application is always guaranteed, over any time interval and independently of whether it issues synchronous requests, a bounded lag with respect to its reserved fraction of the total number of bytes transferred by the disk device. We show the single-disk performance of our implementation of BFQ in the Linux kernel through experiments with real and emulated mainstream applications.
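The budget idea can be sketched as follows (a simplified illustration of budget-based fair queueing, not BFQ's actual engine): the scheduling unit is a whole budget of bytes rather than a single request, and applications are picked by the virtual finish time of their current budget, weighted by their reserved share.

```python
import heapq

def budget_schedule(apps, rounds):
    """apps: list of (name, share, budget). At each step, serve the app
    whose current budget has the smallest virtual finish time
    F = S + budget/share. Serving a whole budget at a time keeps an
    application's sequential accesses together, which is what preserves
    disk throughput."""
    heap = []
    for name, share, budget in apps:
        heapq.heappush(heap, (budget / share, name, share, budget))
    served = []
    for _ in range(rounds):
        finish, name, share, budget = heapq.heappop(heap)
        served.append(name)
        # recharge: the next budget starts where the previous one finished
        heapq.heappush(heap, (finish + budget / share, name, share, budget))
    return served
```

An application with twice the share is served twice as often over any long enough interval, which is the bounded-lag property in miniature.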


2009 - An Efficient Algorithm for Planted Composite Motif Extraction [Relazione in Atti di Convegno]
Federico, M.; Valente, Paolo; Leoncini, Mauro; Montangero, Manuela; Cavicchioli, R.
abstract

In this paper we present an algorithm for the problem of planted structured motif extraction from a set of sequences. This problem is strictly related to the structured motif extraction problem, which has many important applications in molecular biology. We propose an algorithm that uses a simple two-stage approach: first it extracts simple motifs, then the simple motifs are combined in order to extract structured motifs. We compare our algorithm with existing algorithms whose code is available, and which are based on more complex approaches. Our experiments show that, even if in general the problem is NP-hard, our algorithm is able to handle complex instances of the problem in a reasonable amount of time.


2008 - An STDMA-Based Framework for QoS Provisioning in Wireless Mesh Networks [Relazione in Atti di Convegno]
Leoncini, Mauro; Santi, P; Valente, Paolo
abstract

Providing strong QoS guarantees for wireless multi-hop networks is very challenging, due to many factors such as the use of a shared communication medium, variability in wireless link quality, and so on. However, wireless mesh technology offers the opportunity to alleviate some of these problems, thanks to the lack of mobility in the wireless infrastructure and the presence of natural centralization points in the network. The main contribution of this paper is the definition of a simple framework that exploits these features to provide provable, strong QoS guarantees to network clients. In particular, admitted clients are guaranteed a certain minimum bandwidth and maximum delay on their connections. The framework is based on STDMA scheduling at the MAC layer, which is periodically executed at the network manager to adapt to changes in traffic demand. While scheduling computation is centralized, admission control is performed locally at the wireless backbone nodes, thus reducing signaling. We propose two bandwidth distribution and related admission control policies, which are at opposite ends of the network utilization/spatial fairness trade-off. Through extensive simulations, we show that the proposed framework achieves its design goals of providing strong QoS guarantees to VoIP clients without sacrificing throughput in a realistic mesh network scenario, even in the presence of highly unbalanced load at the backbone nodes. To the best of our knowledge, this is the first proposal with similar features for wireless mesh networks.


2007 - Exact GPS simulation and optimal fair scheduling with logarithmic complexity [Articolo su rivista]
Valente, Paolo
abstract

Generalized processor sharing (GPS) is a fluid scheduling policy providing perfect fairness over both constant-rate and variable-rate links. The minimum deviation (lead/lag) with respect to the GPS service achievable by a packet scheduler is one maximum packet size. To the best of our knowledge, the only packet scheduler guaranteeing the minimum deviation is Worst-case Fair Weighted Fair Queueing (WF2Q), which requires on-line GPS simulation. Existing algorithms to perform GPS simulation have O(N) worst-case computational complexity per packet transmission (N being the number of competing flows). Hence WF2Q has been charged for O(N) complexity too. However, it has been proven that the lower bound complexity to guarantee O(1) deviation is Omega(log N), yet a scheduler achieving such a result has remained elusive so far. In this paper, we present L-GPS, an algorithm that performs exact GPS simulation with O(log N) worst-case complexity and small constants. As such, it improves the complexity of all the packet schedulers based on GPS simulation. We also present L-WF2Q, an implementation of WF2Q based on L-GPS. L-WF2Q has O(log N) complexity with small constants, and, since it achieves the minimum possible deviation, it does match the aforementioned complexity lower bound. Furthermore, both L-GPS and L-WF2Q comply with constant-rate as well as variable-rate links. We assess the effectiveness of both algorithms by simulating real-world scenarios.
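The baseline that L-GPS improves upon can be sketched as a naive fluid simulation (an O(N)-per-event illustration for a static backlog, not the paper's O(log N) algorithm): between events, each backlogged flow receives service in proportion to its weight, and each event is the completion of the flow with the least normalized remaining work.

```python
def gps_completion_times(packets, rate=1.0):
    """packets: list of (size, weight), one backlogged packet per flow,
    all present at t=0. Returns the fluid GPS completion time of each
    packet. Every event costs O(N): recompute the active weight sum,
    find the next finisher, and drain all active flows proportionally."""
    remaining = [size for size, _ in packets]
    weights = [w for _, w in packets]
    active = set(range(len(packets)))
    t = 0.0
    done = [0.0] * len(packets)
    while active:
        total_w = sum(weights[i] for i in active)
        # the flow minimizing remaining/weight finishes next
        nxt = min(active, key=lambda i: remaining[i] / weights[i])
        served_per_w = remaining[nxt] / weights[nxt]
        t += served_per_w * total_w / rate
        for i in active:
            remaining[i] -= served_per_w * weights[i]
        done[nxt] = t
        active.remove(nxt)
        remaining[nxt] = 0.0
    return done
```

With two equal-weight flows holding 100 and 300 units, both share the link until t = 200, after which the second runs alone and finishes at t = 400, matching the fluid GPS intuition.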


2005 - A unified framework for managing different resources with QoS guarantees [Relazione in Atti di Convegno]
Luigi, Palopoli; Tommaso, Cucinotta; Luca, Marzario; Antonio, Mancina; Valente, Paolo
abstract

This paper describes ongoing research activities aimed at developing a unified framework by which a general-purpose operating system is able to support applications with Quality of Service requirements. The work is focused on a feedback-based dynamic management of the resources required by an application, where each resource is handled through the use of a Resource Reservation paradigm. This allows applications to share access to a resource by specifying the fraction of usage, where such fractions are dynamically adapted by the system on the basis of observations made on the hosted activities. Research in this area comprises the development of both a theoretical framework for modelling applications and analysing the impact of control-theoretic feedback strategies on the QoS experienced by applications, and a prototype implementation of an architecture able to provide the needed functionality on a Linux operating system.


2005 - An Upper Bound to the Lateness of Soft Real-time Tasks Scheduled by EDF on Multiprocessors [Relazione in Atti di Convegno]
Valente, Paolo; Giuseppe, Lipari
abstract

Multiprocessors are now commonplace for efficiently achieving high computational power, even in embedded systems. A considerable research effort is being devoted to the schedulability analysis of global scheduling on symmetric multiprocessor platforms (SMPs), where there is a global queue of ready tasks, and preemption and migration are allowed. In many soft real-time applications (e.g., multimedia and telecommunications), a bounded lateness is often tolerated. Unfortunately, when considering priority-driven scheduling of periodic/sporadic tasks, previous results only focused on guaranteeing all deadlines, and provided worst-case utilization bounds that are lower than the maximum available computational power. In particular, until now, the existence of an upper bound on the lateness of soft real-time tasks for a fully utilized SMP was still an open problem. In this paper we solve this problem by providing an upper bound on the lateness of periodic/sporadic tasks, with relative deadlines equal to periods/minimum inter-arrival times, scheduled by EDF on an SMP, under the only assumption that the total utilization is no higher than the total system capacity.
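The setting can be illustrated with a tiny discrete-time global-EDF simulation (synthetic jobs, not the paper's analysis): ready jobs sit in one global queue and, at each time unit, the m jobs with the earliest deadlines run, with preemption and migration implicit.

```python
def global_edf_max_lateness(jobs, m, horizon):
    """jobs: list of (release, wcet, deadline). Discrete-time global EDF
    on m identical processors. Returns the maximum lateness
    (finish - deadline); values <= 0 mean every deadline was met.
    `horizon` must be large enough for all jobs to finish."""
    remaining = [w for _, w, _ in jobs]
    finish = [None] * len(jobs)
    for t in range(horizon):
        ready = [i for i, (r, _, _) in enumerate(jobs)
                 if r <= t and remaining[i] > 0]
        ready.sort(key=lambda i: jobs[i][2])   # earliest deadline first
        for i in ready[:m]:                    # the m processors run in lockstep
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = t + 1
    return max(finish[i] - jobs[i][2] for i in range(len(jobs)))
```

Fully loading the processors can leave some job late but, as the paper proves, never unboundedly so when total utilization does not exceed capacity.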


2005 - Design and testing of scalable Web-based systems with performance constraints [Relazione in Atti di Convegno]
Andreolini, Mauro; Colajanni, Michele; Valente, Paolo
abstract

Modern Web sites provide multiple services that are deployed through complex infrastructures consisting of distributed components, processes and server nodes. The importance and the economic impact of consumer-oriented Web sites introduces significant requirements, mainly in terms of user perceived performance and scalability. This paper presents some methods for the design and testing of modern, dynamic Web sites and for the improvement of existing systems that must satisfy some performance constraints even in the case of unpredictable load variations. A case study describes the application of the proposed methods to a typical consumer-oriented Web site.


2005 - Extending WF2Q+ to support a dynamic traffic mix [Relazione in Atti di Convegno]
Valente, Paolo
abstract

WF2Q+ is a packet scheduler providing optimal QoS guarantees at a low computational complexity. It allows a fraction of the total link capacity to be reserved for each packet flow to transmit, and it guarantees to each flow the minimum possible deviation with respect to its reserved service over any time interval. WF2Q+ was defined under the assumption that the set of packet flows to transmit is fixed and known at system design time. Unfortunately, this assumption is widely violated in many systems, such as Web servers or Internet routers. In this paper we propose a general scheme for extending WF2Q+ to also support the case where the set of packet flows to transmit is unknown beforehand and varies over time. The scheme preserves the service guarantees provided by WF2Q+, and allows different trade-offs to be realized between computational complexity and system responsiveness to changing traffic mixes. After investigating the pros and cons of the possible solutions based on this general scheme, we present a simple and efficient algorithm for enabling WF2Q+ to support a dynamic traffic mix.
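The per-packet WF2Q+ loop for a static flow set can be sketched as follows (a simplified illustration with a crude virtual-time update; supporting flows that come and go is precisely the extension the paper develops): among the flows whose virtual start time is not in the future, serve the one with the smallest virtual finish time.

```python
def wf2q_plus(flows):
    """flows: dict name -> (weight, list of packet sizes).
    Returns the transmission order. V is the system virtual time; a head
    packet gets S = max(V, previous F) and F = S + size/weight; a flow is
    eligible when S <= V, and the eligible flow with minimum F is served."""
    state = {f: {"F": 0.0, "q": list(p)} for f, (w, p) in flows.items()}
    V = 0.0
    order = []
    while any(st["q"] for st in state.values()):
        heads = {}
        for f, (w, _) in flows.items():
            st = state[f]
            if st["q"]:
                S = max(V, st["F"])
                heads[f] = (S, S + st["q"][0] / w)
        eligible = [f for f, (S, _) in heads.items() if S <= V] or list(heads)
        nxt = min(eligible, key=lambda f: heads[f][1])
        S, F = heads[nxt]
        order.append(nxt)
        state[nxt]["F"] = F
        state[nxt]["q"].pop(0)
        V = max(V, S)  # crude virtual-time update; the real rule tracks total service
    return order
```

With flow "a" weighted 2 against flow "b" weighted 1, the eligibility test is what forces the scheduler to interleave them instead of letting "a" run ahead of its reserved service.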


2004 - Exact GPS simulation with logarithmic complexity, and its application to an optimally fair scheduler [Relazione in Atti di Convegno]
Valente, Paolo
abstract

Generalized Processor Sharing (GPS) is a fluid scheduling policy providing perfect fairness. The minimum deviation (lead/lag) with respect to the GPS service achievable by a packet scheduler is one packet size. To the best of our knowledge, the only packet scheduler guaranteeing such minimum deviation is Worst-case Fair Weighted Fair Queueing (WF2Q), which requires on-line GPS simulation. Existing algorithms to perform GPS simulation have O(N) complexity per packet transmission (N being the number of competing flows). Hence WF2Q has been charged for O(N) complexity too. Schedulers with lower complexity have been devised, but at the price of at least O(N) deviation from the GPS service, which has been shown to be detrimental for real-time adaptive applications and feedback-based applications. Furthermore, it has been proven that the lower bound complexity to guarantee O(1) deviation is Omega(log N), yet a scheduler achieving such a result has remained elusive so far. In this paper we present an algorithm that performs exact GPS simulation with O(log N) worst-case complexity and small constants. As such, it improves the complexity of all the packet schedulers based on GPS simulation. In particular, using our algorithm within WF2Q, we achieve the minimum deviation from the GPS service with O(log N) complexity, thus matching the aforementioned lower bound. Furthermore, we assess the effectiveness of the proposed solution by simulating real-world scenarios.


2004 - Hybrid: achieving deterministic fairness and high throughput in disk scheduling [Relazione in Atti di Convegno]
Rizzo, L; Valente, Paolo
abstract

Many computer applications require moving data from/to disk devices. Since the throughput of disk devices is extremely sensitive to the locality of accesses, the disk schedulers proposed in the literature typically assume the knowledge of the disks' physical parameters (e.g. seek and rotational latency) to meet the different Quality of Service requirements. Unfortunately, the unavoidable discrepancies between the estimated and actual values of such parameters affect the service guarantees, both in terms of response time and disk bandwidth distribution.