
Mauro ANDREOLINI

University Researcher (Ricercatore Universitario)
Dipartimento di Scienze Fisiche, Informatiche e Matematiche sede ex-Matematica




Publications

2023 - A Framework for Automating Security Assessments with Deductive Reasoning [Conference Paper]
Andreolini, M.; Artioli, A.; Ferretti, L.; Marchetti, M.; Colajanni, M.; Righi, C.

Proper testing of hardware and software infrastructures and applications has become mandatory. To this purpose, security researchers and software companies have released a plethora of domain-specific tools, libraries and frameworks that assist human operators (penetration testers, red teamers, bug hunters) in finding and exploiting specific vulnerabilities and in orchestrating the activities of a security assessment. Most tools also require minor reconfigurations in order to operate properly with isomorphic systems, characterized by the same exploitation path even in the presence of different configurations. In this paper we present a human-assisted framework that tries to overcome the aforementioned limitations. Our proposal is based on a Prolog-based expert system with facts and deductive rules that make it possible to infer new facts from existing ones. Rules are bound to actions whose results are fed back into the knowledge base as further facts. In this way, a security assessment is treated as a theorem that has to be proven. We have built an initial prototype and evaluated it in different security assessments of increasing complexity (jeopardy and boot-to-root machines). Our preliminary results show that the proposed approach can address the following challenges: (a) reaching non-standard goals (which would be missed by most tools and frameworks); (b) solving isomorphic systems without the need for reconfiguration; (c) identifying vulnerabilities from chained weaknesses and exposures.
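
As a rough illustration of the deductive approach described above, the following Python sketch (not the paper's Prolog implementation; all facts, rule conditions and the probe function are invented for illustration) shows facts, rules bound to actions, and action results fed back into the knowledge base until no new fact can be derived:

# Minimal forward-chaining sketch: rules derive new facts, and some rules are
# bound to actions whose results are fed back into the knowledge base.
facts = {("open_port", "10.0.0.5", 21)}

def probe_ftp_anonymous(host):
    # Stand-in for a real action (e.g., attempting an anonymous FTP login).
    return [("ftp_anonymous", host)]

rules = [
    # condition on one fact -> action producing new facts
    (lambda f: f[0] == "open_port" and f[2] == 21,
     lambda f: probe_ftp_anonymous(f[1])),
    (lambda f: f[0] == "ftp_anonymous",
     lambda f: [("foothold", f[1])]),
]

changed = True
while changed:                      # iterate until no new fact can be inferred
    changed = False
    for cond, action in rules:
        for fact in list(facts):
            if cond(fact):
                for new in action(fact):
                    if new not in facts:
                        facts.add(new)
                        changed = True

print(facts)  # the assessment "goal" is proven if a target fact appears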


2023 - How (Not) to Index Order Revealing Encrypted Databases [Conference Paper]
Ferretti, L.; Trabucco, M.; Andreolini, M.; Marchetti, M.

Order-Revealing Encryption (ORE) enables efficient range queries on encrypted databases, but may leak information that could be exploited by inference attacks. State-of-the-art ORE schemes claim different security guarantees depending on the adversary attack surface. Intuitively, online adversaries who access the database server at runtime may observe some information leakage; offline adversaries who access only a snapshot of the database should not be able to gain useful information. We focus on the offline security of the ORE scheme proposed by Lewi and Wu (LW-ORE, CCS 2016), which guarantees semantic security of the ciphertexts stored in the database, but requires that ciphertexts are kept sorted with respect to the corresponding plaintexts to support sublinear-time queries. The design of LW-ORE does not discuss how to build indexing data structures to maintain this sorting. The risk is that practitioners consider indexes a technicality whose design does not affect security. We show that indexes can affect the offline security of LW-ORE because they may leak duplicate plaintext values, and statistical information on the plaintext distribution and on the transaction history. As a real-world demonstration, we found two open source implementations related to academic research (JISA 2018, VLDB 2019), and both adopt standard search trees which may introduce such vulnerabilities. We discuss necessary conditions for indexing data structures to be secure for ORE databases, and we outline practical solutions. Our analysis could represent an insightful lesson in the context of security failures due to gaps between theoretical modeling and actual implementation, and may also apply to other cryptographic techniques for securing outsourced databases.
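
As a toy illustration of the leakage discussed above (a simplified model, not LW-ORE: the dictionary stands in for any index structure that groups entries by plaintext equality, much as a search tree ordered by ORE comparisons does):

import os
from collections import defaultdict

def encrypt(value: int) -> bytes:
    # Stand-in for a semantically secure ciphertext: to an offline adversary
    # it is indistinguishable from random bytes (toy model, no decryption).
    return os.urandom(16)

salaries = [50, 50, 50, 90, 120]
table = [encrypt(s) for s in salaries]     # five unrelated-looking blobs

index = defaultdict(list)                  # naive index grouping equal plaintexts
for row, s in enumerate(salaries):
    index[s].append(row)

# A snapshot of `table` alone reveals nothing, but the shape of the index
# leaks the plaintext histogram: three rows share the same value.
print(sorted(len(rows) for rows in index.values()))   # [1, 1, 3]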


2023 - Practical Evaluation of Graph Neural Networks in Network Intrusion Detection [Conference Paper]
Venturi, A.; Pellegrini, D.; Andreolini, M.; Ferretti, L.; Marchetti, M.; Colajanni, M.

The most recent proposals of machine and deep learning algorithms for Network Intrusion Detection Systems (NIDS) leverage Graph Neural Networks (GNN). These techniques create a graph representation of network traffic and analyze both network topology and netflow features to produce more accurate predictions. Although prior research shows promising results, it is biased by evaluation methodologies that are incompatible with real-world online intrusion detection. We are the first to identify these issues and to evaluate the performance of a state-of-the-art GNN-NIDS under real-world constraints. The experiments demonstrate that the literature overestimates the detection performance of GNN-based NIDS. Our results analyze and discuss the trade-off between detection delay and detection performance for different types of attacks, thus paving the way for the practical deployment of GNN-based NIDS.
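
A minimal sketch of one of the methodological points above: an online NIDS should be evaluated on a time-respecting split, where training flows strictly precede test flows, rather than on randomly shuffled data. The flow record layout (a `timestamp` field) is assumed for illustration:

def temporal_split(flows, train_fraction=0.7):
    """Split netflow records by time instead of randomly."""
    ordered = sorted(flows, key=lambda f: f["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

flows = [{"timestamp": t, "label": t % 2} for t in range(10)]
train, test = temporal_split(flows)
# training data must never peek into the future of the test data
assert max(f["timestamp"] for f in train) <= min(f["timestamp"] for f in test)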


2022 - DAGA: Detecting Attacks to in-vehicle networks via n-Gram Analysis [Journal Article]
Stabili, D.; Ferretti, L.; Andreolini, M.; Marchetti, M.

Recent research showcased several cyber-attacks against unmodified licensed vehicles, demonstrating the vulnerability of their internal networks. Many solutions have already been proposed by industry and academia, aiming to detect and prevent cyber-attacks targeting in-vehicle networks. The majority of these proposals borrow security algorithms and techniques from the classical ICT domain, and in many cases they do not consider the inherent limitations of legacy automotive protocols and resource-constrained microcontrollers. This paper proposes DAGA, an anomaly detection algorithm for in-vehicle networks based on n-gram analysis. DAGA only uses sequences of CAN message IDs to define the n-grams used in the detection process, without requiring the payload or other CAN message fields. The DAGA framework allows the creation of detection models with different memory footprints, enabling their deployment on microcontrollers with different hardware constraints. Experimental results based on three prototype implementations of DAGA showcase the trade-off between hardware requirements and detection performance. DAGA outperforms state-of-the-art detectors on the most powerful microcontrollers, and can execute with lower performance on simple microcontrollers that cannot support the vast majority of IDS approaches proposed in the literature. As additional contributions, we publicly release the full dataset and our reference DAGA implementations.
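
A minimal Python sketch of the n-gram idea (illustrative only; the actual DAGA detectors are compact models tuned to microcontroller memory budgets):

def build_model(can_ids, n=3):
    """Collect the set of n-grams observed in attack-free traffic."""
    return {tuple(can_ids[i:i + n]) for i in range(len(can_ids) - n + 1)}

def detect(can_ids, model, n=3):
    """Flag positions whose n-gram never appeared during training."""
    return [i for i in range(len(can_ids) - n + 1)
            if tuple(can_ids[i:i + n]) not in model]

normal = [0x100, 0x200, 0x100, 0x300] * 50    # benign CAN ID sequence
model = build_model(normal)
suspect = [0x100, 0x200, 0x7FF, 0x300]        # 0x7FF never seen in training
print(detect(suspect, model))                  # -> positions of unseen n-grams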


2022 - Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems [Journal Article]
Apruzzese, Giovanni; Andreolini, Mauro; Ferretti, Luca; Marchetti, Mirco; Colajanni, Michele


2021 - DReLAB - Deep REinforcement Learning Adversarial Botnet: A benchmark dataset for adversarial attacks against botnet Intrusion Detection Systems [Journal Article]
Venturi, A.; Apruzzese, G.; Andreolini, M.; Colajanni, M.; Marchetti, M.

We present the first dataset that aims to serve as a benchmark to validate the resilience of botnet detectors against adversarial attacks. This dataset includes realistic adversarial samples generated by leveraging two widely used Deep Reinforcement Learning (DRL) techniques. These adversarial samples are proven to evade state-of-the-art detectors based on machine- and deep-learning algorithms. The initial corpus of malicious samples consists of network flows belonging to different botnet families presented in three public datasets containing real enterprise network traffic. We use these datasets to devise detectors capable of achieving state-of-the-art performance. We then train two DRL agents, based on Double Deep Q-Network and Deep SARSA, to generate realistic adversarial samples: the goal is achieving misclassifications by performing small modifications to the initial malicious samples. These alterations involve the features that can be most realistically altered by an expert attacker, and do not compromise the underlying malicious logic of the original samples. Our dataset represents an important contribution to the cybersecurity research community as it is the first to include thousands of automatically generated adversarial samples that are able to thwart state-of-the-art classifiers with a high evasion rate. The adversarial samples are grouped by malware variant and provided in CSV format. Researchers can validate their defensive proposals by testing their detectors against the adversarial samples of the proposed dataset. Moreover, the analysis of these samples can pave the way to a deeper comprehension of adversarial attacks and to some form of explainability of machine learning defensive algorithms. They can also support the definition of novel effective defensive techniques.


2021 - Message from the Program Chairs [Conference Paper]
Colajanni, M.; Correia, M.; Andreolini, M.


2021 - Survivable zero trust for cloud computing environments [Journal Article]
Ferretti, L.; Magnanini, F.; Andreolini, M.; Colajanni, M.

The security model relying on the traditional defense of the perimeter cannot protect modern dynamic organizations. The emerging paradigm called zero trust proposes a modern alternative that enforces access control on every request and avoids implicit trust based on the physical location of people and devices. These architectures rely on several trusted components, but existing proposals make the unrealistic assumption that attackers cannot compromise some of them. We overcome these assumptions and present a novel survivable zero trust architecture that can guarantee the necessary security level for cloud computing environments. The proposed architecture guarantees a high level of security and robustness and, under specific conditions, it can tolerate intrusions and recover from failures and successful attacks.


2020 - A Framework for the Evaluation of Trainee Performance in Cyber Range Exercises [Journal Article]
Andreolini, M.; Colacino, V. G.; Colajanni, M.; Marchetti, M.

This paper proposes a novel approach for the evaluation of the performance achieved by trainees involved in cyber security exercises implemented in modern cyber ranges. Our main contributions include: the definition of a distributed monitoring architecture for gathering relevant information about trainee activities; an algorithm for modeling trainee activities through directed graphs; novel scoring algorithms, based on graph operations, that evaluate different aspects (speed, precision) of a trainee during an exercise. With respect to previous work, our proposal makes it possible to measure exactly how fast a trainee is progressing towards an objective and where they go wrong. We highlight that this is currently not possible in the most popular cyber ranges.
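
A possible reading of the graph-based scoring idea, sketched in Python with invented exercise states and a simplified precision metric (the paper's actual algorithms differ):

from collections import deque

exercise = {                      # directed graph: state -> reachable states
    "start": ["recon"], "recon": ["exploit", "dead_end"],
    "dead_end": ["recon"], "exploit": ["root"], "root": [],
}

def shortest_len(src, dst):
    """BFS shortest path length (in edges) over the exercise graph."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in exercise.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def precision_score(trainee_walk, goal="root"):
    optimal = shortest_len("start", goal)   # edges on the ideal path
    steps = len(trainee_walk) - 1           # edges actually traversed
    return optimal / steps if steps else 0.0

print(precision_score(["start", "recon", "exploit", "root"]))             # 1.0
print(precision_score(["start", "recon", "dead_end", "recon",
                       "exploit", "root"]))                               # 0.6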


2020 - AppCon: Mitigating evasion attacks to ML cyber detectors [Journal Article]
Apruzzese, G.; Andreolini, M.; Marchetti, M.; Colacino, V. G.; Russo, G.

Adversarial attacks represent a critical issue that prevents the reliable integration of machine learning methods into cyber defense systems. Past work has shown that even proficient detectors are highly affected by small perturbations to malicious samples, and that existing countermeasures are immature. We address this problem by presenting AppCon, an original approach to harden intrusion detectors against adversarial evasion attacks. Our proposal brings ensemble learning to realistic network environments by combining layers of detectors, each devoted to monitoring the behavior of the applications employed by the organization. Our proposal is validated through extensive experiments performed in heterogeneous network settings simulating botnet detection scenarios, and considers detectors based on distinct machine- and deep-learning algorithms. The results demonstrate the effectiveness of AppCon in mitigating the dangerous threat of adversarial attacks in over 75% of the considered evasion attempts, while not being affected by the limitations of existing countermeasures, such as performance degradation in non-adversarial settings. For these reasons, our proposal represents a valuable contribution to the development of more secure cyber defense platforms.


2020 - Deep Reinforcement Adversarial Learning against Botnet Evasion Attacks [Journal Article]
Apruzzese, G.; Andreolini, M.; Marchetti, M.; Venturi, A.; Colajanni, M.

As cybersecurity detectors increasingly rely on machine learning mechanisms, attacks against these defenses escalate as well. Supervised classifiers are prone to adversarial evasion, and existing countermeasures suffer from many limitations: most solutions degrade performance in the absence of adversarial perturbations, they are unable to face novel attack variants, or they are applicable only to specific machine learning algorithms. We propose the first framework that can protect botnet detectors from adversarial attacks through deep reinforcement learning mechanisms. It automatically generates realistic attack samples that can evade detection, and it uses these samples to produce an augmented training set from which hardened detectors are obtained. In this way, we obtain more resilient detectors that can work even against unforeseen evasion attacks, with the great merit of not penalizing their performance in the absence of such attacks. We validate our proposal through an extensive experimental campaign that considers multiple machine learning algorithms and public datasets. The results highlight the improvements of the proposed solution over the state of the art. Our method paves the way to novel and more robust cybersecurity detectors based on machine learning applied to network traffic analytics.


2020 - Hardening Random Forest Cyber Detectors Against Adversarial Attacks [Journal Article]
Apruzzese, G.; Andreolini, M.; Colajanni, M.; Marchetti, M.

Machine learning algorithms are effective in several applications, but they are not as successful when applied to intrusion detection in cyber security. Due to the high sensitivity to their training data, cyber detectors based on machine learning are vulnerable to targeted adversarial attacks that involve the perturbation of initial samples. Existing defenses assume unrealistic scenarios, yield underwhelming results in non-adversarial settings, or can be applied only to machine learning algorithms that perform poorly for cyber security. We present an original methodology for countering adversarial perturbations targeting intrusion detection systems based on random forests. As a practical application, we integrate the proposed defense method in a cyber detector analyzing network traffic. The experimental results on millions of labelled network flows show that the new detector has a twofold value: it outperforms state-of-the-art detectors that are subject to adversarial attacks, and it exhibits robust results in both adversarial and non-adversarial scenarios.
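
A hedged sketch of the general hardening idea, adversarial training-set augmentation for a random forest, using scikit-learn with invented features and perturbation magnitudes (not the paper's exact methodology):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy labels: 1 = malicious

def perturb(samples, magnitude=0.3):
    """Small additive noise on a few features, mimicking an attacker
    altering only the fields they can realistically control."""
    noisy = samples.copy()
    noisy[:, :2] += rng.uniform(-magnitude, magnitude, size=(len(samples), 2))
    return noisy

X_mal = X[y == 1]
X_aug = np.vstack([X, perturb(X_mal)])        # original + perturbed malicious
y_aug = np.concatenate([y, np.ones(len(X_mal), dtype=int)])

hardened = RandomForestClassifier(n_estimators=100, random_state=0)
hardened.fit(X_aug, y_aug)
# recall of the hardened forest on fresh evasion attempts
print(hardened.score(perturb(X_mal), np.ones(len(X_mal), dtype=int)))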


2018 - A symmetric cryptographic scheme for data integrity verification in cloud databases [Journal Article]
Ferretti, Luca; Marchetti, Mirco; Andreolini, Mauro; Colajanni, Michele

Cloud database services represent a great opportunity for companies and organizations in terms of management and cost savings. However, outsourcing private data to external providers leads to risks of confidentiality and integrity violations. We propose an original solution based on encrypted Bloom filters that addresses the latter problem by allowing a cloud service user to detect unauthorized modifications to their outsourced data. Moreover, we propose an original analytical model that can be used to minimize storage and network overhead depending on the database structure and workload. We assess the effectiveness of the proposal as well as its performance improvements with respect to existing solutions by evaluating storage and network costs through micro-benchmarks and the standard TPC-C workload.
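
A toy sketch of the core mechanism, a Bloom filter fed with keyed (HMAC) digests so that only the data owner can compute membership; parameters and encoding are illustrative, not the paper's scheme:

import hmac, hashlib

class KeyedBloom:
    def __init__(self, key: bytes, m: int = 1024, k: int = 4):
        self.key, self.m, self.k = key, m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item: bytes):
        # k independent keyed hashes; the provider cannot forge them
        for i in range(self.k):
            d = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(d[:4], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = KeyedBloom(key=b"owner-secret")
bf.add(b"row:42|balance=100")
print(bf.maybe_contains(b"row:42|balance=100"))  # True
print(bf.maybe_contains(b"row:42|balance=999"))  # almost surely False: tampered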


2015 - A collaborative framework for intrusion detection in mobile networks [Journal Article]
Andreolini, Mauro; Colajanni, Michele; Marchetti, Mirco

Mobile devices are becoming the most popular means of connection, but the protocols supporting mobility represent a serious source of concern because their initial design did not enforce strong security. This paper introduces a novel class of stealth network attacks, called mobility-based evasion, where an attacker splits a malicious payload in such a way that no part can be recognized by existing defensive mechanisms, including the most modern network intrusion detection systems operating in stateful mode. We propose an original cooperative framework for intrusion detection that can prevent mobility-based evasion. The viability and performance of the proposed solution are shown through a prototype applied to the Mobile IPv4, Mobile IPv6 and WiFi protocols.


2015 - A scalable monitor for large systems [Conference Paper]
Andreolini, M.; Pietri, M.; Tosi, S.; Lancellotti, R.

Current monitoring solutions are ill suited to large data centers in several ways: lack of scalability, poor representativeness of global state conditions, inability to guarantee persistence in service delivery, and the impossibility of monitoring multi-tenant applications. In this paper, we present a novel monitoring architecture that strives to address these problems. It integrates a hierarchical scheme for monitoring the resources in a cluster with a distributed hash table (DHT) for broadcasting system state information among different monitors. This architecture strives to achieve high scalability, effectiveness and resilience, as well as the possibility of monitoring services spanning different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture through an experimental analysis and we measure the overhead of the DHT-based communication scheme.


2015 - Adaptive, scalable and reliable monitoring of big data on clouds [Journal Article]
Andreolini, Mauro; Colajanni, Michele; Pietri, Marcello; Tosi, Stefania

Real-time monitoring of cloud resources is crucial for a variety of tasks such as performance analysis, workload management, capacity planning and fault detection. Applications producing big data make the monitoring task very difficult at high sampling frequencies because of the high computational and communication overheads of collecting, storing, and managing information. We present an adaptive algorithm for monitoring big data applications that adapts the sampling intervals and the frequency of updates to the data characteristics and administrator needs. Adaptivity allows us to limit computational and communication costs and to guarantee high reliability in capturing relevant load changes. Experimental evaluations performed on a large testbed show the ability of the proposed adaptive algorithm to reduce resource utilization and communication overhead of big data monitoring without penalizing the quality of data, and demonstrate our improvements over the state of the art.
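
A minimal sketch of adaptive sampling in this spirit (the stability test and the thresholds are invented for illustration):

from statistics import pstdev

def next_interval(recent, interval, lo=1.0, hi=60.0, tol=0.05):
    """recent: last few samples; interval: current sampling period (s).
    Widen the interval while the signal is stable, shrink it on changes."""
    mean = sum(recent) / len(recent)
    unstable = mean and pstdev(recent) / mean > tol  # coefficient of variation
    return max(lo, interval / 2) if unstable else min(hi, interval * 2)

interval = 10.0
for window in ([50, 51, 50, 52], [50, 80, 120, 90]):  # calm, then bursty load
    interval = next_interval(window, interval)
    print(interval)   # 20.0 (relax sampling), then 10.0 (tighten again)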


2014 - Monitoring large cloud-based systems [Conference Paper]
Andreolini, Mauro; Pietri, Marcello; Tosi, Stefania; Balboni, Andrea

Large scale cloud-based services are built upon a multitude of hardware and software resources, disseminated in one or multiple data centers. Controlling and managing these resources requires the integration of several pieces of software that may yield a representative view of the data center status. Today's monitoring solutions, both closed and open source, fail in different ways, including lack of scalability, poor representativeness of global state conditions, inability to guarantee persistence in service delivery, and the impossibility of monitoring multi-tenant applications. In this paper, we present a novel monitoring architecture that addresses the aforementioned issues. It integrates a hierarchical scheme for monitoring the resources in a cluster with a distributed hash table (DHT) for broadcasting system state information among different monitors. This architecture strives to achieve high scalability, effectiveness and resilience, as well as the possibility of monitoring services spanning different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture through a bottleneck analysis based on experimental results.


2014 - Resilient and adaptive networked systems [Book Chapter]
Andreolini, M.; Casolari, S.; Pietri, M.; Tosi, S.

Nowadays, networks form the backbone of most computing systems, and modern system infrastructures must accommodate continuously changing demands for different types of workloads and time constraints. In a similar context, adaptive management of virtualized application environments among networked systems is becoming one of the most important strategies to guarantee resilience and performance of the available computing resources. In this chapter, Mauro Andreolini et al. analyze the management algorithms that decide, in an adaptive manner, on the transparent reallocation of live sessions of virtual machines across large numbers of networked hosts. They discuss the main challenges and solutions related to the adaptive activation of the migration process.


2013 - Real-time adaptive algorithm for resource monitoring [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Pietri, Marcello; Tosi, Stefania

In large scale systems, real-time monitoring of hardware and software resources is a crucial means for any management purpose. In architectures consisting of thousands of servers and hundreds of thousands of component resources, the amount of data monitored at high sampling frequencies represents an overhead on system performance and communication, while reducing sampling may cause quality degradation. We present a real-time adaptive algorithm for scalable data monitoring that is able to adapt the frequency of sampling and data updating with a twofold goal: to minimize computational and communication costs, and to guarantee that the reduced samples do not affect the accuracy of information about resources. Experiments carried out on heterogeneous data traces referring to synthetic and real environments confirm that the proposed adaptive approach reduces utilization and communication overhead without penalizing the quality of data with respect to existing monitoring algorithms.


2012 - A scalable architecture for real-time monitoring of large information systems [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Pietri, Marcello

Data centers supporting cloud-based services are characterized by a huge number of hardware and software resources often cooperating in complex and unpredictable ways. Understanding the state of these systems for reasons of management and service level agreement requires scalable monitoring architectures that continuously gather and evaluate large data flows in near real time. We propose a novel monitoring architecture that addresses these challenges by combining a hierarchical approach with decentralized monitors. In this context, fully centralized systems do not scale to the required number of flows, while pure peer-to-peer architectures cannot provide a global view of the system state. We evaluate the gathering and evaluation units of the monitoring architecture in real contexts, demonstrating the scalability potential of the proposed system.


2012 - Improving application responsiveness with the BFQ disk I/O scheduler [Conference Paper]
Valente, Paolo; Andreolini, Mauro

BFQ (Budget Fair Queueing) is a production-quality, proportional-share disk scheduler with a relatively large user base. Part of its success is due to a set of simple heuristics that we added to the original algorithm about one year ago. These heuristics are the main focus of this paper. The first heuristic enriches BFQ with one of the most desirable properties for a desktop or handheld system: responsiveness. The remaining heuristics improve the robustness of BFQ across heterogeneous devices, and help BFQ to preserve a high throughput under demanding workloads. To measure the performance of these heuristics we have implemented a suite of micro and macro benchmarks mimicking several real-world tasks, and have run it on three different systems with a single rotational disk. We have also compared our results against Completely Fair Queueing (CFQ), the default Linux disk scheduler.


2011 - A software architecture for the analysis of large sets of data streams in cloud infrastructures [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Tosi, Stefania

System management algorithms in private and public cloud infrastructures have to work with literally thousands of data streams generated from resource, application and event monitors. This cloud context opens two novel issues that we address in this paper: how to design a software architecture that is able to gather and analyze all information within real-time constraints; how it is possible to reduce the analysis of the huge collected data set to the investigation of a reduced set of relevant information. The application of the proposed architecture is based on the most advanced software components, and is oriented to the classification of the statistical behavior of servers and to the analysis of significant state changes. These results guide model-driven management systems to investigate only relevant servers and to apply suitable decision models considering the determ...


2011 - Assessing the overhead and scalability of system monitors for large data centers [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Current data centers are shifting towards cloud-based architectures as a means to obtain a scalable, cost-effective, robust service platform. In spite of this, the underlying management infrastructure has grown in terms of hardware resources and software complexity, making automated resource monitoring a necessity. There are several infrastructure monitoring tools designed to scale to a very high number of physical nodes. However, these tools either collect performance measures at a low frequency (missing the chance to capture the dynamics of a short-term management task) or are simply not equipped with instrumentation specific to cloud computing and virtualization. In this scenario, monitoring the correctness and efficiency of live migrations can become a nightmare. This situation will only worsen in the future, with the increased service demand due to the spreading of the user base. In this paper, we assess the scalability of a prototype monitoring subsystem for different user scenarios. We also identify the major bottlenecks and give insights on how to remove them.


2011 - Dynamic request management algorithms for Web-based services in cloud computing [Conference Paper]
Lancellotti, Riccardo; Andreolini, Mauro; Canali, Claudia; Colajanni, Michele

Service providers of Web-based services can take advantage of many convenient features of cloud computing infrastructures, but they still have to implement request management algorithms that are able to face sudden peaks of requests. We consider distributed algorithms implemented by front-end servers to dispatch and redirect requests among application servers. Current solutions based on load-blind algorithms, or considering just server load and thresholds, are inadequate to cope with the demand patterns reaching modern Internet application servers. In this paper, we propose and evaluate a request management algorithm, namely Performance Gain Prediction, that combines several pieces of information (server load, computational cost of a request, user session migration and redirection delay) to predict whether the redirection of a request to another server may result in a shorter response time. To the best of our knowledge, no other study combines information about infrastructure status, user request characteristics and redirection overhead for dynamic request management in cloud computing. Our results show that the proposed algorithm is able to reduce the response time with respect to existing request management algorithms operating on the basis of thresholds.
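
A hedged sketch of the Performance Gain Prediction idea with an invented cost model: redirect only when the predicted remote response time, including migration and redirection delays, beats the local one:

def predict_response(load, request_cost):
    # Toy queueing-flavoured estimate: service time inflates with load.
    return request_cost / max(1e-6, 1.0 - load)

def should_redirect(local_load, remote_load, request_cost,
                    migration_delay, redirect_delay):
    local = predict_response(local_load, request_cost)
    remote = (predict_response(remote_load, request_cost)
              + migration_delay + redirect_delay)
    return remote < local

# Overloaded front node, lightly loaded peer, 40 ms of redirection overhead:
print(should_redirect(local_load=0.95, remote_load=0.30,
                      request_cost=0.05, migration_delay=0.02,
                      redirect_delay=0.02))   # True: redirection pays off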


2010 - A hierarchical architecture for on-line control of private cloud-based systems [Conference Paper]
Andreolini, Mauro; Casolari, Sara; Tosi, Stefania

Several enterprise data centers are adopting the private cloud computing paradigm as a scalable, cost-effective, robust way to provide services to their end users. The management and control of the underlying hardware/software infrastructure pose several interesting problems. In this paper we show that the monitoring process needs to scale to thousands of heterogeneous resources at different levels (system, network, storage, application) and at different time scales; it has to cope with missing data and detect anomalies in the performance samples; and it has to transform all data into meaningful information and pass it to the decision process (possibly through different, ad-hoc algorithms for different resources). In most cases of interest for this paper, the control management system must operate under real-time constraints. We propose a hierarchical architecture that is able to support the efficient orchestration of an on-line management mechanism for a private cloud-based infrastructure. This architecture integrates a framework that collects samples from monitors, then validates and aggregates them. We motivate the choice of a hierarchical scheme and show some data manipulation, orchestration and control strategies at different time scales. We then focus on a specific context referring to mid-term management objectives. We have applied the proposed hierarchical architecture successfully to data centers made of a large number of nodes that require short- to mid-term control, and from our experience we conclude that it is a viable approach for the control of private cloud-based systems.


2010 - Dynamic load management of virtual machines in cloud architectures [Conference Paper]
Andreolini, M.; Casolari, S.; Colajanni, M.; Messori, M.

Cloud infrastructures must accommodate changing demands for different types of processing with heterogeneous workloads and time constraints. In a similar context, dynamic management of virtualized application environments is becoming very important to exploit computing resources, especially with recent virtualization capabilities that allow live sessions to be moved transparently between servers. This paper proposes novel management algorithms to decide about reallocations of virtual machines in a cloud context characterized by large numbers of hosts. The novel algorithms identify just the truly critical instances and take decisions without resorting to typical thresholds. Moreover, they consider the load trend behavior of the resources instead of instantaneous or average measures. Experimental results show that the proposed algorithms are truly selective and robust even in variable contexts, reducing system instability and triggering migrations only when really necessary.
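
A minimal sketch of trend-based (rather than threshold-based) detection of a critical host, using a fitted slope over recent load samples; parameters are invented for illustration:

import numpy as np

def is_critical(samples, slope_min=0.02, level_min=0.6):
    """React to a sustained upward load trend, not to transient spikes."""
    t = np.arange(len(samples))
    slope, intercept = np.polyfit(t, samples, 1)   # linear load trend
    return slope > slope_min and np.mean(samples) > level_min

spike = [0.5, 0.5, 0.95, 0.5, 0.5, 0.5]            # transient: no migration
ramp  = [0.55, 0.62, 0.68, 0.74, 0.81, 0.88]       # sustained growth
print(is_critical(spike), is_critical(ramp))       # False True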


2010 - Open Source Live Distributions for Computer Forensics [Conference Paper]
Giustini, Giancarlo; Andreolini, Mauro; Colajanni, Michele

Current distributions of open source forensic software provide digital investigators with a large set of heterogeneous tools. Their use is not always focused on the target and requires high technical expertise. We present a new GNU/Linux live distribution, named CAINE (Computer Aided INvestigative Environment) that contains a collection of tools wrapped up into a user friendly environment. The CAINE forensic framework introduces novel important features, aimed at filling the interoperability gap across different forensic tools. Moreover, it provides a homogeneous graphical interface that drives digital investigators during the acquisition and analysis of electronic evidence, and it offers a semi-automatic mechanism for the creation of the final report.


2009 - A flexible and robust lookup algorithm for P2P systems [Conference Paper]
Andreolini, Mauro; Lancellotti, Riccardo

One of the most critical operations performed in a P2P system is the lookup of a resource. The main issues to be addressed by lookup algorithms are: (1) support for flexible search criteria (e.g., wildcard or multi-keyword searches); (2) effectiveness, i.e., the ability to identify all the resources that match the search criteria; (3) efficiency, i.e., low overhead; (4) robustness with respect to node failures and churning. Flood-based P2P networks provide flexible lookup facilities and robust performance at the expense of high overhead, while other systems (e.g., DHTs) provide a very efficient lookup mechanism but lack flexibility. In this paper, we propose a novel resource lookup algorithm, namely fuzzy-DHT, that solves this trade-off by introducing flexible and robust lookup criteria based on multiple keywords on top of a distributed hash table algorithm. We demonstrate that the fuzzy-DHT algorithm satisfies all the requirements of P2P lookup systems, combining the flexibility of flood-based mechanisms with high efficiency, effectiveness and robustness.
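
A rough sketch of multi-keyword lookup on top of a DHT, with the DHT modeled as a local dictionary; the real fuzzy-DHT algorithm is more sophisticated:

import hashlib
from collections import defaultdict

dht = defaultdict(set)              # key -> set of resource ids (toy DHT)

def key(keyword: str) -> str:
    return hashlib.sha1(keyword.lower().encode()).hexdigest()

def publish(resource_id, keywords):
    # publish the resource under the hash of each of its keywords
    for kw in keywords:
        dht[key(kw)].add(resource_id)

def lookup(keywords, min_matches=2):
    hits = defaultdict(int)
    for kw in keywords:
        for rid in dht[key(kw)]:
            hits[rid] += 1          # rank by number of matching keywords
    return [rid for rid, n in hits.items() if n >= min_matches]

publish("res-1", ["ubuntu", "iso", "22.04"])
publish("res-2", ["ubuntu", "wallpaper"])
print(lookup(["ubuntu", "iso"]))    # -> ['res-1']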


2009 - Dynamic load management of virtual machines in a cloud architecture [Conference Paper]
Andreolini, Mauro; Casolari, Sara; Colajanni, Michele; Messori, Michele

Cloud infrastructures must accommodate changing demands for different types of processing with heterogeneous workloads and time constraints. In a similar context, dynamic management of virtualized application environments is becoming very important to exploit computing resources, especially with recent virtualization capabilities that allow live sessions to be moved transparently between servers. This paper proposes novel management algorithms to decide about reallocations of virtual machines in a cloud context characterized by large numbers of hosts. The novel algorithms identify just the truly critical instances and take decisions without resorting to typical thresholds. Moreover, they consider the load trend behavior of the resources instead of instantaneous or average measures. Experimental results show that the proposed algorithms are truly selective and robust even in variable contexts, reducing system instability and triggering migrations only when really necessary.


2008 - Autonomic request management algorithms for geographically distributed Internet-based systems [Conference Paper]
Andreolini, Mauro; Casolari, S.; Colajanni, Michele

Supporting Web-based services through geographically distributed clusters of servers is a common solution to the increasing volume and variability of modern traffic. These architectures pose interesting challenges to request management strategies, where the most important goal is not to achieve maximum performance, but to guarantee stable and robust results. In this paper, we propose novel request management algorithms that are based on autonomic principles, that is, on loose collaboration among the closest nodes and no knowledge about the global system state. Experimental evaluation shows that our autonomic-enhanced algorithms can guarantee robust performance in a variety of settings and reduce the standard deviation of response times with respect to existing request management algorithms.


2008 - CAINE: A new open-source live distribution for digital forensics [Conference Paper]
Giustini, G.; Andreolini, Mauro; Colajanni, Michele

Current distributions of open source forensic software provide digital investigators with a large set of heterogeneous programs. Their use is not always focused on the target and requires high technical expertise. We present a GNU/Linux live distribution, named CAINE (Computer Aided INvestigative Environment), that contains a collection of tools wrapped up into a user-friendly environment. The CAINE forensic framework introduces important novel features: it aims to fill the interoperability gap across different forensic tools, it provides a homogeneous GUI that drives digital investigators during the acquisition and analysis of electronic evidence, and it offers a semi-automatic process for documentation and report compilation.


2008 - Impact of technology trends on the performance of current and future Web-based systems [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Hardware technology continues to improve at a considerable rate. Besides the Moore's law increments of CPU speed, in recent years the capacity of main memory has been increasing at an even more impressive rate. One of the consequences of a continuous increment of main memory resources is the possibility of designing and implementing memory-embedded Web sites in the near future, where both the static resources and the database information are kept in the main memory of the server machines. In this paper, we evaluate the impact of memory and network technology trends on the performance of e-commerce sites, which continue to be an important reference for Web-based services in terms of complexity of the hardware/software technology and in terms of performance, availability and scalability requirements. However, most of the presented considerations can be easily extended to other Web-based services. We demonstrate through experiments on a real system how the system bottlenecks change depending on the amount of memory that is (or will be) available for storing the information of a Web site, with and without taking into account the effects of a WAN. This analysis allows us to anticipate some indications about the interventions on hardware/software components that could improve the capacity of present and future Web-based services.


2008 - Models and framework for supporting run-time decisions in Web-based systems [Journal Article]
Andreolini, Mauro; Casolari, Sara; Colajanni, Michele

Efficient management of distributed Web-based systems requires several mechanisms that decide on request dispatching, load balancing, admission control and request redirection. The algorithms behind these mechanisms typically make fast decisions on the basis of the load conditions of the system resources. The architectural complexity and workloads characterizing most Web-based services make it extremely difficult to deduce a representative view of a resource load from collected measures that show extreme variability even at different time scales. Hence, any decision based on instantaneous or average views of the system load may lead to useless or even wrong actions. As an alternative, we propose a two-phase strategy that first aims to obtain a representative view of the load trend from measured system values and then applies this representation to support runtime decision systems. We consider two classical problems behind decisions: how to detect significant and nontransient load changes of a system resource, and how to predict its future load behavior. The two-phase strategy is based on stochastic functions that are characterized by a computational complexity that is compatible with runtime decisions. We describe, test, and tune the two-phase strategy by considering as a first example a multi-tier Web-based system that is subject to different classes of realistic and synthetic workloads. We also integrate the proposed strategy into a framework that we validate by applying it to support runtime decisions in a cluster Web system and in a locally distributed Network Intrusion Detection System.
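
A minimal sketch of the two-phase idea: smooth noisy samples into a trend (here with an exponential moving average), then take decisions on the trend, e.g. flag a nontransient load change; the smoothing factor and drift test are invented for illustration:

def ema(samples, alpha=0.2):
    """Phase 1: extract a load trend from noisy raw measures."""
    trend, out = samples[0], []
    for s in samples:
        trend = alpha * s + (1 - alpha) * trend
        out.append(trend)
    return out

def load_change(trend, window=5, delta=0.15):
    """Phase 2: detect a persistent change, i.e. the trend moved by more
    than `delta` over the last `window` points."""
    return len(trend) >= window and abs(trend[-1] - trend[-window]) > delta

noisy = [0.4, 0.6, 0.3, 0.5, 0.4, 0.7, 0.8, 0.9, 0.85, 0.95]
trend = ema(noisy)
print(load_change(trend))   # True: the smoothed load is genuinely rising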


2008 - Runtime prediction models for Web-based system resources [Conference Paper]
Casolari, Sara; Andreolini, Mauro; Colajanni, Michele

Several activities of Web-based architectures are managed by algorithms that take runtime decisions on the basis of continuous information about the state of the internal system resources. The problem is that, in this extremely dynamic context, the observed data points are characterized by high variability, dispersion and noise at different time scales, to the extent that existing models cannot guarantee accurate predictions at runtime. In this paper, we evaluate the predictability of the internal resource state and point out the necessity of filtering the noise of raw data measures. We then verify that more accurate prediction models are required which take into account the non-stationary effects of the data sets, the time series trends and the runtime constraints. To these purposes, we propose a new prediction model, called trend-aware regression, specifically designed for on-the-fly, short-term forecasting of time series originating from filtered data points of internal Web system resources. The experimental evaluation for different workload scenarios shows that the proposed trend-aware regression model improves the prediction accuracy with respect to popular algorithms based on auto-regressive and linear models, while satisfying the computational constraints of runtime prediction.


2007 - Dynamic load balancing for network intrusion detection systems based on distributed architectures [Conference Paper]
Andreolini, Mauro; Casolari, Sara; Colajanni, Michele; Marchetti, Mirco

Increasing traffic and the necessity of stateful analyses impose strong computational requirements on network intrusion detection systems (NIDS), and motivate the need for distributed architectures with multiple sensors. In a context of high traffic with heavy-tailed characteristics, static rules for dispatching traffic slices among distributed sensors cause severe imbalance. Hence, the distributed NIDS architecture must be combined with adequate mechanisms for dynamic load redistribution. In this paper, we propose and compare different policies for the activation/deactivation of the dynamic load balancer. In particular, we consider and compare single vs. double threshold schemes, and load representations based on resource measures vs. load aggregation models. Our experimental results show that the best combination, a double threshold scheme with a linear aggregation of resource measures, is able to achieve a really satisfactory balance of the sensor loads together with a significant reduction of the number of load balancer activations.
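
A minimal sketch of the winning combination described above, a double-threshold (hysteresis) activation scheme over a linear aggregation of resource measures; watermarks and weights are invented for illustration:

def aggregate(cpu, mem, net, w=(0.5, 0.25, 0.25)):
    """Linear aggregation of resource measures into one load index."""
    return w[0] * cpu + w[1] * mem + w[2] * net

def balancer_state(load, active, high=0.8, low=0.6):
    if not active and load > high:
        return True                 # activate redistribution
    if active and load < low:
        return False                # deactivate
    return active                   # hysteresis: keep current state

active = False
for cpu, mem, net in [(0.7, 0.6, 0.5), (0.9, 0.9, 0.8), (0.75, 0.7, 0.6)]:
    active = balancer_state(aggregate(cpu, mem, net), active)
    print(active)                   # False, True, True (no thrashing at 0.7)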


2007 - Impact of request dispatching granularity in geographically distributed Web systems [Conference Paper]
Andreolini, Mauro; Canali, Claudia; Lancellotti, Riccardo

The advent of the mobile Web and the increasing demand for personalized contents raise the need for computationally expensive services, such as dynamic generation and on-the-fly adaptation of contents. Providing these services exacerbates the performance issues that have to be addressed by the underlying Web architecture. When performance issues are addressed through geographically distributed Web systems with a large number of nodes located on the network edge, the dispatching mechanism that distributes requests among the system nodes becomes a critical element. In this paper, we investigate how the granularity of request dispatching may affect the performance of a distributed Web system for personalized contents. Through a real prototype, we compare dispatching mechanisms operating at various levels of granularity for different workload and network scenarios. We demonstrate that the choice of the best granularity for request dispatching strongly depends on the characteristics of the workload in terms of heterogeneity and computational requirements. A coarse-grain dispatching is preferable only when the requests have similar computational requirements. In all other instances of skewed workloads, which we consider more realistic, a fine-grain dispatching augments the control over the node load and allows the system to achieve better performance.


2007 - Self-inspection mechanisms for the support of autonomic decisions in Internet-based systems [Conference Paper]
Andreolini, Mauro; Casolari, S.; Colajanni, Michele

Any autonomic system must implement mechanisms to automatically capture the most significant information about its internal state and to adapt the monitoring system to internal and external conditions. We refer to these activities as self-inspection and we consider them in the context of Internet-based services that are subject to workloads characterized by bursty arrivals and heavy-tailed distributions. The large majority of the mechanisms driving these systems must take fast decisions on the basis of past and/or present load conditions of the system resources. In this context, self-inspection requires an adequate representation of the load behavior of the system resources that makes it possible to perform good actions under soft real-time constraints. In this paper, we show through a large set of experiments the need to base load analyses and decisions on linear and non-linear models, such as the Exponential Moving Average and 90th percentile models. All the considered models are applied to a multi-tier Web-based system instrumented with suitable self-inspection mechanisms at the operating system level. However, the results can be extended to other Internet-based contexts where the systems are characterized by similar workload and resource behaviors.


2007 - Trend-based load balancer for a distributed Web system [Conference Paper]
Andreolini, M.; Casolari, S.; Colajanni, M.

The unexpected and continuous changes of the workload reaching any Internet-based service make it really difficult to guarantee a balanced utilization of the server resources. In this paper, we propose a novel class of state-aware dispatching algorithms that take into account not only the present resource load but also the behavioral trend of the server load, that is, whether it is increasing, decreasing or oscillating. We apply one algorithm of this class to a multi-tier Web-based system and demonstrate that it is able to improve load balancing of the most critical server resources.


2007 - Trend-based load balancer for a multi-tier distributed system [Conference Paper]
Andreolini, Mauro; Casolari, S.; Colajanni, Michele

The unexpected and continuous changes of the workload reaching any Internet-based service make it really difficult to guarantee a balanced utilization of the server resources. In this paper, we propose a novel class of state-aware dispatching algorithms that take into account not only the present resource load but also the behavioral trend of the server load, that is, whether it is increasing, decreasing or oscillating. We apply one algorithm of this class to a multi-tier Web-based system and demonstrate that it is able to improve load balancing of the most critical server resources.


2006 - A distributed architecture for gracefully degradable Web-based services [Conference Paper]
Andreolini, Mauro; Casolari, S.; Colajanni, Michele

Modern Web sites provide multiple services that are often deployed through distributed architectures. The importance and the economic impact of Web-based services introduce significant requirements in terms of performance and quality of service. In this paper, we propose an access control mechanism for dynamic, Web-based systems. The proposed architecture takes into account two goals: the service of all requests pertaining to an admitted session until system saturation, and a graceful, controlled degradation of performance in case of overwhelming user request loads. The session-oriented behavior is obtained through an admission control mechanism that denies access to requests starting new sessions if the system is judged to be overloaded. Graceful degradation is achieved through the refusal of single requests with increasing priority. Static priorities, determined for example by the user category (guest, member, gold), are taken into account first. If the system is still overloaded, the access control mechanism dynamically evaluates the popularity of single requests and drops the least popular ones.
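
A hedged sketch of the admission policy described above, with invented priority levels and popularity counts:

def admit(request, overload_level, popularity):
    """overload_level: 0 = normal, 1 = overloaded, 2 = severe."""
    if request["session_admitted"]:
        if overload_level < 2:
            return True                       # serve admitted sessions fully
        # severe overload: shed low-priority, unpopular requests first
        return request["priority"] > 1 or popularity[request["url"]] > 100
    return overload_level == 0                # new sessions only when healthy

# guest (priority 1) asking for a rarely requested page under severe overload
popularity = {"/home": 5000, "/rare-report": 3}
req = {"session_admitted": True, "priority": 1, "url": "/rare-report"}
print(admit(req, overload_level=2, popularity=popularity))   # False: dropped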


2006 - Load prediction models in Web-based systems [Conference Paper]
Andreolini, Mauro; Casolari, S.

Run-time management of modern Web-based services requires the integration of several algorithms and mechanisms for job dispatching, load sharing, admission control and overload detection. All these algorithms should take decisions on the basis of present and/or future load conditions of the system resources. In particular, we address the issue of predicting future resource loads under real-time constraints in the context of Internet-based systems. In this situation, it is extremely difficult to deduce a representative view of a system resource from collected raw measures that show very large variability even at different time scales. For this reason, we propose a two-step approach that first aims to get a representative view of the load trend from measured raw data, and then applies a load prediction algorithm to the load trends. This approach is suitable to support different decision systems even in highly variable contexts and is characterized by a computational complexity that is compatible with run-time decisions. The proposed models are applied to a multi-tier Web-based system, but the results can be extended to other Internet-based contexts where the systems are characterized by similar workloads and resource behaviors.


2006 - Web System Reliability and Performance [Book Chapter]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Modern Web sites provide multiple services that are deployed through complex technologies. The importance and the economic impact of consumer-oriented Web sites introduce significant requirements in terms of performance and reliability. This chapter presents some methods for the design of novel Web sites and for the improvement of existing systems that must satisfy performance requirements even in the case of unpredictable load variations. The chapter concludes with a case study that describes the application of the proposed methods to a typical consumer-oriented Web site.


2005 - Design and testing of scalable Web-based systems with performance constraints [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Valente, Paolo

Modern Web sites provide multiple services that are deployed through complex infrastructures consisting of distributed components, processes and server nodes. The importance and the economic impact of consumer-oriented Web sites introduce significant requirements, mainly in terms of user-perceived performance and scalability. This paper presents some methods for the design and testing of modern, dynamic Web sites and for the improvement of existing systems that must satisfy performance constraints even in the case of unpredictable load variations. A case study describes the application of the proposed methods to a typical consumer-oriented Web site.


2005 - HoneySpam: Honeypots fighting spam at the source [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Mazzoni, F.; Bulgarelli, A.

In this paper, we present the design and implementation of HoneySpam, a fully operating framework based on honeypot technologies that is able to address the most common malicious spammer activities. The idea is to limit unwanted traffic by fighting spamming at the sources rather than at the receivers, as is done by the large majority of present proposals and products. The features of HoneySpam include the slowdown of the e-mail harvesting process, the poisoning of e-mail databases through apparently working addresses, and increased spammer traceability through the use of fake open proxies and open relays.


2005 - Impact of memory technology trends on performance of Web systems [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Hardware technology continues to improve at a considerable rate. Besides the Moore's law increments of CPU speed, the capacity of main memory has been increasing in recent years at an even more impressive rate. One of the consequences of a continuous increment of memory resources is that we can design and implement memory-embedded Web sites, where both the static resources and the database information are kept in main memory. In this paper, we evaluate the impact of memory trends on the performance of e-commerce sites, which continue to be an important reference for Internet-based services in terms of complexity of the hardware/software technology and in terms of performance, availability and scalability requirements. However, most results are valid for other Web-based services as well. We demonstrate through experiments on a real system how the system bottlenecks change depending on the amount of memory that is (or will be) available for the Web site data. This analysis allows us to anticipate the interventions on hardware/software components that could improve the capacity of present and future Web systems for content generation and delivery.


2005 - Impact of technology trends on performance of Web-based services [Conference Paper]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Hardware technology continues to improve at a considerable rate. Besides the Moore's law increments of CPU speed, the capacity of main memory has been increasing in recent years at an even more impressive rate. One of the consequences of a continuous increment of memory resources is that we can design and implement memory-embedded Web sites, where both the static resources and the database information are kept in main memory. In this paper, we evaluate the impact of memory trends on the performance of e-commerce sites, which continue to be an important reference for Internet-based services in terms of complexity of the hardware/software technology and in terms of performance, availability and scalability requirements. However, most results are valid for other Web-based services as well. We demonstrate through experiments on a real system how the system bottlenecks change depending on the amount of memory that is (or will be) available for the Web site data. This analysis allows us to anticipate the interventions on hardware/software components that could improve the capacity of present and future Web systems for content generation and delivery.


2005 - Web system reliability and performance: design and testing methodologies [Book Chapter]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo

Modern Web sites provide multiple services that are deployed through complex technologies. The importance and the economic impact of consumer-oriented Web sites introduce significant requirements in terms of performance and reliability. This chapter presents some methods for the design of novel Web sites and for the improvement of existing systems that must satisfy performance requirements even in the case of unpredictable load variations. The chapter concludes with a case study that describes the application of the proposed methods to a typical consumer-oriented Web site.


2004 - A cluster-based Web system providing differentiated and guaranteed services [Journal Article]
Andreolini, Mauro; Casalicchio, E.; Colajanni, Michele; Mambelli, M.

In a world where many users rely on the Web for up-to-date personal and business information and transactions, it is fundamental to build Web systems that allow service providers to differentiate user expectations through multi-class Service Level Agreements (SLAs). In this paper, we focus on the server components of the Web by implementing QoS principles in a Web-server cluster, that is, an architecture composed of multiple servers and one front-end node called the Web switch. We first propose a methodology to determine a set of confident SLAs in a real Web cluster for multiple classes of users and services. We then implement at the Web switch level all the mechanisms that transform a best-effort Web cluster into a QoS-enhanced system. We also compare three QoS-aware policies through experimental results on a real test-bed system. We show that the policy implementing all QoS principles allows a Web content provider to guarantee the contractual SLA targets even in severe load conditions. Other algorithms lacking some QoS principles cannot be used for respecting SLA constraints, although they provide acceptable performance under some load and system conditions.


2004 - Analysis of peer-to-peer systems: workload characterization and effects on traffic cacheability [Relazione in Atti di Convegno]
Andreolini, Mauro; Lancellotti, Riccardo; Yu, P. S.
abstract

Peer-to-peer file sharing networks have emerged as a popular new application in the Internet scenario. In this paper, we provide an analytical model of resource sizes and of the contents shared at a given node. We also study the composition of the content workload hosted in the Gnutella network over time. Finally, we investigate the negative impact of oversimplified hypotheses (e.g., the use of filenames as resource identifiers) on the potentially achievable hit rate of a file sharing cache. The message coming out of our findings is clear: file sharing traffic can be reduced by using a cache to minimize download time and network usage. The design and tuning of the cache server should take into account the presence of different resources sharing the same name and should consider push-based downloads. Failing to do so can result in reduced effectiveness of the caching mechanism.
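
The warning about filenames as resource identifiers can be illustrated with a toy cache (hypothetical code, not the measurement tool used in the paper): keying by filename conflates distinct contents that happen to share a name, while keying by a content digest does not.

    # Toy cache illustrating filename vs. content-digest keys.
    import hashlib

    class FileSharingCache:
        def __init__(self, key_by_digest):
            self.key_by_digest = key_by_digest
            self.store = {}

        def fetch(self, filename, content):
            key = (hashlib.sha1(content).hexdigest()
                   if self.key_by_digest else filename)
            hit = key in self.store
            self.store[key] = content
            return hit

    by_name = FileSharingCache(key_by_digest=False)
    by_hash = FileSharingCache(key_by_digest=True)
    by_name.fetch("song.mp3", b"encoding A")
    by_hash.fetch("song.mp3", b"encoding A")
    # A different resource with the same name: spurious hit by name,
    # correct miss by digest.
    assert by_name.fetch("song.mp3", b"encoding B") is True
    assert by_hash.fetch("song.mp3", b"encoding B") is False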


2004 - Fine grain performance evaluation of e-commerce sites [Articolo su rivista]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo; Mazzoni, F.
abstract

E-commerce sites are still a reference for Web technology in terms of complexity and performance requirements, including availability and scalability. In this paper we show that a coarse-grained analysis, as used in most performance studies, may lead to incomplete or false deductions about the behavior of the hardware and software components supporting e-commerce sites. Through a fine-grained performance evaluation of a medium-sized e-commerce site, we find some interesting results that demonstrate the importance of an analysis carried out at the software function level, combined with distribution-oriented metrics instead of average values.
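
The point about distribution-oriented metrics can be shown with two invented sets of per-function response times: identical means can hide very different tails, which only a percentile reveals.

    # Invented samples: two software functions with the same mean response
    # time but very different tails; averages alone hide the difference.
    import math

    def percentile(samples, p):
        s = sorted(samples)
        # Nearest-rank percentile.
        return s[min(len(s) - 1, math.ceil(p * len(s)) - 1)]

    fn_a = [10, 11, 9, 10, 12, 10, 9, 11, 10, 8]   # stable function
    fn_b = [2, 3, 2, 2, 3, 2, 2, 3, 2, 79]         # heavy-tailed function
    for name, samples in (("fn_a", fn_a), ("fn_b", fn_b)):
        mean = sum(samples) / len(samples)
        print(name, "mean=%.1f" % mean, "p95=%d" % percentile(samples, 0.95))
    # fn_a: mean 10.0, p95 12 -- fn_b: mean 10.0, p95 79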


2004 - Open issues in self-inspection and self-decision mechanisms for supporting complex and heterogeneous information systems [Relazione in Atti di Convegno]
Colajanni, Michele; Andreolini, Mauro; Lancellotti, Riccardo
abstract

Self-* properties seem an inevitable means of managing the increasing complexity of networked information systems. The implementation of these properties implies sophisticated software and decision supports. Most research results have focused on the former aspect, with many proposals for passing from traditional to reflective middleware. In this paper we focus instead on the supports for the run-time decisions that any self-* software must take, independently of the underlying software used to achieve the self-* properties. We highlight the problems of self-inspection and self-decision models and mechanisms that have to operate in real time and in extremely heterogeneous environments. Without an adequate solution to these inspection and decision problems, self-* systems have no chance of real applicability to complex and heterogeneous information systems.


2004 - Peer-to-Peer workload characterization: techniques and open issues [Relazione in Atti di Convegno]
Andreolini, Mauro; Colajanni, Michele; Lancellotti, Riccardo
abstract

The popularity of peer-to-peer file sharing networks has attracted growing interest, even in the research community. In this paper, we focus on the workload characterization of file-sharing systems, which should be at the basis of performance evaluation and of investigations into possible improvements. The contribution of this paper is twofold: first, we provide a classification of related studies on file-sharing workloads, distinguishing the main information considered and the mechanisms and tools that have been used for data collection. We also point out open issues in file-sharing workload characterization and suggest novel approaches to workload studies.


2003 - Benchmarking of Locally and Geographically Distributed Web-Server Systems [Relazione in Atti di Convegno]
Colajanni, Michele; Andreolini, Mauro; Cardellini, V.
abstract


2003 - Kernel-based Web switches providing content-aware routing [Relazione in Atti di Convegno]
Andreolini, Mauro; Colajanni, Michele; Nuccio, Marcello
abstract

Locally distributed Web server systems represent a cost-effective solution to the performance problems caused by the high traffic volumes reaching popular Web sites. We focus on architectures based on layer-7 Web switches because they allow a much richer set of possibilities for the Web site architecture, at the price of a scalability much lower than that provided by a layer-4 switch. In this paper, we compare the performance of three solutions for a layer-7 Web switch: a two-way application-layer architecture, a two-way kernel-based architecture, and a one-way kernel-based architecture. We show quantitatively how much better the one-way architecture performs with respect to a two-way scheme, even one implemented at the kernel level. We conclude that an accurate implementation of a layer-7 Web switch may become a viable solution to the performance requirements of the majority of cluster-based information systems.
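
The essence of content-aware (layer-7) routing can be sketched in a few lines of user-level logic (names and rules invented; the paper's prototypes operate in the kernel, and in the one-way variant the chosen server replies directly to the client instead of relaying responses through the switch):

    # Toy layer-7 routing: inspect the HTTP request line before choosing
    # a server pool (illustrative, not the kernel prototype).
    import itertools

    class Layer7Switch:
        def __init__(self, static_pool, dynamic_pool):
            self.static_pool = static_pool
            self.dynamic_pool = dynamic_pool
            self.rr = itertools.count()

        def route(self, request_line):
            method, path, version = request_line.split(" ", 2)
            # Dynamic content goes to application servers, static content
            # to lightweight file servers.
            pool = (self.dynamic_pool
                    if "cgi" in path or path.endswith(".php")
                    else self.static_pool)
            return pool[next(self.rr) % len(pool)]   # round-robin in pool

    switch = Layer7Switch(["img1", "img2"], ["app1", "app2"])
    print(switch.route("GET /index.html HTTP/1.0"))    # img1
    print(switch.route("GET /cgi-bin/cart HTTP/1.0"))  # app2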


2003 - Scalability of content-aware server switches for cluster-based Web information systems [Relazione in Atti di Convegno]
Andreolini, Mauro; Colajanni, Michele; Nuccio, M.
abstract

A cluster-based architecture with a front-end Web switch and locally distributed servers seems the most appreciated solution for facing the ever-increasing demand for complex services offered through Web interfaces. The complexity of these novel services is often related to the possibility of content-level identification and personalization, which can be achieved through a content-aware front-end component. It is a common belief that content-based operations prevent the scalability of the Web cluster, to the extent that a content-aware switch alone is seldom used as the front-end of a popular Web site. In this paper, we demonstrate that a careful design and optimized implementation choices based on a modern PC architecture can yield a Web switch with content-aware functionality and very limited overheads. We present the design and prototype implementation of a so-called one-way system based on the Linux kernel, on single-CPU and SMP architectures, for the HTTP/1.0 and HTTP/1.1 protocols. The experimental results confirm that the proposed solution is extremely scalable, thus making a content-aware Web switch a viable solution to the performance requirements of the majority of cluster-based architectures.


2002 - Benchmarking models and tools for distributed Web-server systems [Relazione in Atti di Convegno]
Andreolini, Mauro; Cardellini, V.; Colajanni, Michele
abstract

This tutorial reviews benchmarking tools and techniques that can be used to evaluate the performance and scalability of highly accessed Web-server systems. The focus is on the design and testing of locally and geographically distributed architectures, where the performance evaluation is obtained through workload generators and analyzers in a laboratory environment. The tutorial identifies the qualities and issues of existing tools with respect to the main features that characterize a benchmarking tool (workload representation, load generation, data collection, output analysis and report) and their applicability to the analysis of distributed Web-server systems.
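
The core loop of such a benchmarking tool — workload generation, data collection, output analysis — can be sketched minimally as follows (target URL and request count are placeholders, not from the tutorial):

    # Minimal sketch of a benchmarking loop: generate requests, collect
    # response times, report summary statistics.
    import time
    import statistics
    import urllib.request

    def run_benchmark(url, n_requests):
        latencies = []
        for _ in range(n_requests):
            start = time.perf_counter()
            with urllib.request.urlopen(url) as resp:
                resp.read()   # drain the body so timing covers the transfer
            latencies.append(time.perf_counter() - start)
        return {
            "mean_s": statistics.mean(latencies),
            "p95_s": sorted(latencies)[int(0.95 * len(latencies))],
        }

    # Hypothetical usage: run_benchmark("http://testbed/index.html", 100)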


2002 - Performance study of dispatching algorithms in multi-tier Web architectures [Articolo su rivista]
Andreolini, Mauro; Colajanni, Michele; Morselli, R.
abstract

The number and heterogeneity of requests to Web sites are increasing, also because Web technology is becoming the preferred interface for information systems. Many systems hosting current Web sites are complex architectures composed of multiple server layers with strong scalability and reliability issues. In this paper we compare the performance of several combinations of centralized and distributed dispatching algorithms working at the first and second layer and using different levels of state information. We confirm some known results about load sharing in distributed systems and give new insights into the problem of dispatching requests in multi-tier cluster-based Web systems.
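
The role of state information in dispatching can be contrasted with two toy policies (server names and loads invented): round-robin needs no state at all, while least-loaded requires fresh per-server load information.

    # Two toy dispatchers differing in the state information they use.
    import itertools

    def make_round_robin(servers):
        cycle = itertools.cycle(servers)
        return lambda loads: next(cycle)        # ignores server state

    def least_loaded(loads):
        return min(loads, key=loads.get)        # needs up-to-date loads

    loads = {"srv1": 12, "srv2": 3, "srv3": 7}  # e.g., active connections
    rr = make_round_robin(list(loads))
    print(rr(loads), rr(loads))                 # srv1 srv2
    print(least_loaded(loads))                  # srv2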


2002 - QoS-aware switching policies for a locally distributed Web system [Relazione in Atti di Convegno]
Andreolini, Mauro; Casalicchio, E.; Colajanni, Michele; Mambelli, M.
abstract

We present the implementation and experimental evaluation of a Web switch that transforms a cluster-based Web system (Web cluster) with best-effort management policies into a system that provides guaranteed performance to different classes of users and applications.