Dipartimento di Ingegneria "Enzo Ferrari"

2022 - A big data platform exploiting auditable tokenization to promote good practices inside local energy communities [Journal article]
Gagliardelli, Luca; Zecchini, Luca; Ferretti, Luca; Beneventano, Domenico; Simonini, Giovanni; Bergamaschi, Sonia; Orsini, Mirko; Magnotta, Luca; Mescoli, Emma; Livaldi, Andrea; Gessa, Nicola; De Sabbata, Piero; D’Agosta, Gianluca; Paolucci, Fabrizio; Moretti, Fabio

The Energy Community Platform (ECP) is a modular system conceived to promote a conscious use of energy by the users inside local energy communities. It is composed of two integrated subsystems: the Energy Community Data Platform (ECDP), a middleware platform designed to support the collection and the analysis of big data about the energy consumption inside local energy communities, and the Energy Community Tokenization Platform (ECTP), which focuses on tokenizing processed source data to enable incentives through smart contracts hosted on a decentralized infrastructure possibly governed by multiple authorities. We illustrate the overall design of our system, conceived considering some real-world projects (dealing with different types of local energy community, different amounts and nature of incoming data, and different types of users), analyzing in detail the key aspects of the two subsystems. In particular, the ECDP acquires data of a different nature in a heterogeneous format from multiple sources and supports a data integration workflow and a data lake workflow, designed for different uses of the data. We motivate our technological choices and present the alternatives taken into account, both in terms of software and of architectural design. On the other hand, the ECTP operates a tokenization process via smart contracts to promote good behaviors of users within the local energy community. The peculiarity of this platform is to allow external parties to audit the correct behavior of the whole tokenization process while protecting the confidentiality of the data and the performance of the platform. The main strengths of the presented system are flexibility and scalability (guaranteed by its modular architecture), which allow its applicability to any type of local energy community.

2022 - Big Data Integration & Data-Centric AI for eHealth [Paper in conference proceedings]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni; Zecchini, Luca

Big data integration, i.e., the integration of large amounts of data from multiple sources, is one of the main challenges for the use of techniques and tools based on artificial intelligence in the medical field (eHealth). In this context, it is also of primary importance to guarantee the quality of the data on which these tools and techniques operate (Data-Centric AI), as they now play a central role in the sector. The research activities of the Database Group (DBGroup) of the "Enzo Ferrari" Department of Engineering of the University of Modena and Reggio Emilia move in this direction. We therefore present the main research projects of the DBGroup in the field of eHealth, which are part of collaborations in several application sectors.

2022 - Big Data Integration for Data-Centric AI [Abstract in conference proceedings]
Bergamaschi, Sonia; Beneventano, Domenico; Simonini, Giovanni; Gagliardelli, Luca; Aslam, Adeel; De Sabbata, Giulio; Zecchini, Luca

Big data integration represents one of the main challenges for the use of techniques and tools based on Artificial Intelligence (AI) in several crucial areas: eHealth, energy management, enterprise data, etc. In this context, Data-Centric AI plays a primary role in guaranteeing the quality of the data on which these tools and techniques operate. The activities of the Database Research Group (DBGroup) of the "Enzo Ferrari" Engineering Department of the University of Modena and Reggio Emilia are moving in this direction. We therefore present the main research projects of the DBGroup, which are part of collaborations in various application sectors.

2022 - ECDP: A Big Data Platform for the Smart Monitoring of Local Energy Communities [Paper in conference proceedings]
Gagliardelli, Luca; Zecchini, Luca; Beneventano, Domenico; Simonini, Giovanni; Bergamaschi, Sonia; Orsini, Mirko; Magnotta, Luca; Mescoli, Emma; Livaldi, Andrea; Gessa, Nicola; De Sabbata, Piero; D’Agosta, Gianluca; Paolucci, Fabrizio; Moretti, Fabio

2022 - Entity Resolution On-Demand [Journal article]
Simonini, Giovanni; Zecchini, Luca; Bergamaschi, Sonia; Naumann, Felix

Entity Resolution (ER) aims to identify and merge records that refer to the same real-world entity. ER is typically employed as an expensive cleaning step on the entire data before consuming it. Yet, determining which entities are useful once cleaned depends solely on the user's application, which may need only a fraction of them. For instance, when dealing with Web data, we would like to be able to filter the entities of interest gathered from multiple sources without cleaning the entire, continuously growing data. Similarly, when querying data lakes, we want to transform data on-demand and return the results in a timely manner, a fundamental requirement of ELT (Extract-Load-Transform) pipelines. We propose BrewER, a framework to evaluate SQL SP queries on dirty data while progressively returning results as if they were issued on cleaned data. BrewER tries to focus the cleaning effort on one entity at a time, following an ORDER BY predicate. Thus, it inherently supports top-k and stop-and-resume execution. For a wide range of applications, a significant amount of resources can be saved. We exhaustively evaluate and show the efficacy of BrewER on four real-world datasets.

2022 - Progressive Entity Resolution with Node Embeddings [Paper in conference proceedings]
Simonini, Giovanni; Gagliardelli, Luca; Rinaldi, Michele; Zecchini, Luca; De Sabbata, Giulio; Aslam, Adeel; Beneventano, Domenico; Bergamaschi, Sonia

Entity Resolution (ER) is the task of finding records that refer to the same real-world entity, which are called matches. ER is a fundamental pre-processing step when dealing with dirty and/or heterogeneous datasets; however, it can be very time-consuming when employing complex machine learning models to detect matches, as state-of-the-art ER methods do. Thus, when time is a critical component and having a partial ER result is better than having no result at all, progressive ER methods are employed to try to maximize the number of detected matches as a function of time. In this paper, we study how to perform progressive ER by exploiting graph embeddings. The basic idea is to represent candidate matches in a graph, where each node is a record and each edge is a possible comparison to check; we build this graph on top of a well-known, established graph-based ER framework. We experimentally show that our method performs better than existing state-of-the-art progressive ER methods on real-world benchmark datasets.

2022 - Task-Driven Big Data Integration [Paper in conference proceedings]
Zecchini, Luca

Data integration aims at combining data acquired from different autonomous sources to provide the user with a unified view of this data. One of the main challenges in data integration processes is entity resolution, whose goal is to detect the different representations of the same real-world entity across the sources, in order to produce a unique and consistent representation for it. The advent of big data has challenged traditional data integration paradigms, making the offline batch approach to entity resolution no longer suitable for several scenarios (e.g., when performing data exploration or dealing with datasets that change with a high frequency). Therefore, it becomes of primary importance to produce new solutions capable of operating effectively in such situations. In this paper, I present some contributions made during the first half of my PhD program, mainly focusing on the design of a framework to perform entity resolution in an on-demand fashion, building on the results achieved by the progressive and query-driven approaches to this task. Moreover, I also briefly describe two projects in which I took part as a member of my research group, touching on some real-world applications of big data integration techniques, to conclude with some ideas on the future directions of my research.

2021 - Progressive Query-Driven Entity Resolution [Paper in conference proceedings]
Zecchini, Luca

Entity Resolution (ER) aims to detect in a dirty dataset the records that refer to the same real-world entity, playing a fundamental role in data cleaning and integration tasks. Often, a data scientist is only interested in a portion of the dataset (e.g., during data exploration); this interest can be expressed through a query. The traditional batch approach is far from optimal, since it requires performing ER on the whole dataset before executing a query on its cleaned version, and thus carries out a huge number of useless comparisons. This causes a waste of time, resources and money. Proposed solutions to this problem follow a query-driven approach (perform ER only on the useful data) or a progressive one (the entities in the result are emitted as soon as they are solved), but these two aspects have never been reconciled. This paper introduces the BrewER framework, which allows executing clean queries on dirty datasets in a query-driven and progressive way, thanks to a preliminary filtering and an iteratively managed sorted list that defines emission priority. Early results obtained by the first BrewER prototype on real-world datasets from different domains confirm the benefits of this combined solution, paving the way for a new and more comprehensive approach to ER.
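
The combination of preliminary filtering and a priority-ordered list described in this abstract can be illustrated with a minimal Python sketch. It is not the BrewER implementation: the function names (`cluster`, `progressive_query`), the record-level `keep` predicate, the use of pre-computed candidate blocks, and the choice of ascending order on the minimum value are all assumptions made for illustration.

```python
import heapq
from itertools import count


def cluster(records, is_match):
    """Group records into entities (connected components of pairwise matches)."""
    clusters = []
    for r in records:
        merged, rest = [r], []
        for c in clusters:
            if any(is_match(r, x) for x in c):
                merged.extend(c)  # r links this cluster to the merged one
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters


def progressive_query(blocks, is_match, keep, value):
    """Yield entities in ascending order of their minimum value.

    blocks: lists of records that may refer to the same entity (assumed given).
    keep:   hypothetical query predicate, evaluated on single records.
    value:  hypothetical ORDER BY attribute, aggregated per entity with min().
    """
    tie = count()  # unique tie-breaker, so the heap never compares records
    heap = []
    for block in blocks:
        # Preliminary filtering: skip blocks that cannot contribute to the result.
        if any(keep(r) for r in block):
            bound = min(value(r) for r in block)
            heapq.heappush(heap, (bound, next(tie), False, block))
    while heap:
        _, _, resolved, records = heapq.heappop(heap)
        if resolved:
            # No unresolved block has a smaller bound, so emission is safe.
            yield records
            continue
        # Clean only the block at the head of the list, then re-insert its entities.
        for entity in cluster(records, is_match):
            if any(keep(r) for r in entity):
                key = min(value(r) for r in entity)
                heapq.heappush(heap, (key, next(tie), True, entity))
```

Because the function is a generator, a caller can stop after the first k entities and the remaining blocks are never compared, which is the saving that the query-driven and progressive combination is meant to provide. The ordering is safe in this sketch because the minimum over a block is a lower bound for every entity resolved from it.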

2021 - The Case for Multi-task Active Learning Entity Resolution [Paper in conference proceedings]
Simonini, Giovanni; Saccani, Henrique; Gagliardelli, Luca; Zecchini, Luca; Beneventano, Domenico; Bergamaschi, Sonia

2020 - Entity resolution on camera records without machine learning [Paper in conference proceedings]
Zecchini, L.; Simonini, G.; Bergamaschi, S.

This paper reports the runner-up solution to the ACM SIGMOD 2020 programming contest, whose target was to identify the specifications (i.e., records) collected across 24 e-commerce data sources that refer to the same real-world entities. First, we investigate the machine learning (ML) approach, but surprisingly find that existing state-of-the-art ML-based methods fall short in such a context, not reaching 0.49 F-score. Then, we propose an efficient solution that exploits annotated lists and regular expressions generated by humans and reaches a 0.99 F-score. In our experience, our approach was not more expensive, in terms of human effort, than the labeling of match/non-match pairs required by ML-based methods.
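
As a rough illustration of how annotated lists and hand-written regular expressions can drive matching, the following Python sketch groups product titles by a (brand, model) key. The brand list, the model pattern, and the function names are hypothetical and are not taken from the contest solution.

```python
import re
from collections import defaultdict

# Hypothetical annotated list of brand names (illustrative only).
BRANDS = ["canon", "nikon", "sony"]

# Hypothetical hand-written pattern for a model code: a short letter prefix,
# an optional separator, 2-4 digits and an optional letter suffix.
MODEL_RE = re.compile(r"\b([a-z]{1,4})[\s-]?(\d{2,4}[a-z]?)\b")


def matching_key(title):
    """Return a normalized (brand, model) key, or None if either is missing."""
    text = title.lower()
    brand = next((b for b in BRANDS if b in text), None)
    if brand is None:
        return None
    # Remove the brand so that it cannot be mistaken for the model prefix.
    m = MODEL_RE.search(text.replace(brand, " "))
    if m is None:
        return None
    return (brand, m.group(1) + m.group(2))


def group_records(records):
    """Map each key shared by at least two records to the sorted record ids."""
    groups = defaultdict(list)
    for rid, title in records.items():
        key = matching_key(title)
        if key is not None:
            groups[key].append(rid)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

In this sketch, records with the same key are treated as matches without any pairwise classifier; the human effort goes into curating the list and the pattern rather than into labeling pairs.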