
FRANCESCO DEL BUONO

Research fellow
Department of Engineering "Enzo Ferrari"


Publications

2023 - An Intrinsically Interpretable Entity Matching System [Conference proceedings paper]
Baraldi, A.; Del Buono, F.; Guerra, F.; Paganelli, M.; Vincini, M.
abstract

Explainable classification systems generate predictions along with a weight for each term in the input record, measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions, and the resulting explanations can be difficult for users to understand: they can be very long and can assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or by unique terms that exist in only one of the descriptions. Decision units form a new feature space able to represent pairs of entity descriptions in a compact and meaningful way. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning-based EM models but, unlike them, its predictions are highly interpretable.
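
As a rough illustration of the decision-unit idea described in this abstract, the following minimal sketch pairs similar terms across two entity descriptions and keeps the remaining terms as unique units. It is not the WYM implementation: the function name `decision_units`, the greedy pairing strategy, the character-level similarity from Python's `difflib`, and the 0.8 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def decision_units(left, right, threshold=0.8):
    """Greedily pair similar tokens across two descriptions (hypothetical helper).

    Returns (paired, unique): paired units hold one token from each
    description; unique units hold a token found in one description only.
    """
    left_tokens = left.lower().split()
    right_tokens = right.lower().split()

    # Score every cross-description token pair. Character-level similarity
    # is a stand-in assumption for whatever similarity a real system uses.
    candidates = sorted(
        (
            (SequenceMatcher(None, a, b).ratio(), i, j)
            for i, a in enumerate(left_tokens)
            for j, b in enumerate(right_tokens)
        ),
        reverse=True,
    )

    used_left, used_right, paired = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            paired.append((left_tokens[i], right_tokens[j]))

    # Tokens never paired become unique decision units.
    unique = [(t, None) for i, t in enumerate(left_tokens) if i not in used_left]
    unique += [(None, t) for j, t in enumerate(right_tokens) if j not in used_right]
    return paired, unique


paired, unique = decision_units("Sony Bravia TV 55", "sony bravia television")
```

In this toy input, the shared brand and model terms end up as paired units, while "tv", "55", and "television" remain unique units, since their character-level similarity falls below the assumed threshold.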


2023 - Interpretable Clustering of Multivariate Time Series with Time2Feat [Journal article]
Bonifati, A.; Del Buono, F.; Guerra, F.; Lombardi, M.; Tiano, D.


2023 - Interpretable Entity Matching with WYM [Conference proceedings paper]
Baraldi, A.; Del Buono, F.; Guerra, F.; Guiduzzi, G.; Paganelli, M.; Vincini, M.


2022 - A Framework to Evaluate the Quality of Integrated Datasets [Journal article]
Del Buono, Francesco; Faggioli, Guglielmo; Paganelli, Matteo; Baraldi, Andrea; Guerra, Francesco; Ferro, Nicola


2022 - Analyzing How BERT Performs Entity Matching [Journal article]
Paganelli, M.; Del Buono, F.; Baraldi, A.; Guerra, F.
abstract

State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as BERT, for generating highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proven effective, but they act as black boxes for users, who have limited insight into the motivations behind their decisions. In this paper, we perform a multi-facet analysis of the components of pre-trained and fine-tuned BERT architectures applied to an EM task. The main findings resulting from our extensive experimental evaluation are: (1) the fine-tuning process applied to the EM task mainly modifies the last layers of the BERT components, but in a different way on tokens belonging to descriptions of matching / non-matching entities; (2) the special structure of the EM datasets, where records are pairs of entity descriptions, is recognized by BERT; (3) the pair-wise semantic similarity of tokens is not a key piece of knowledge exploited by BERT-based EM models.


2022 - Evaluating the integration of datasets [Conference proceedings paper]
Paganelli, Matteo; Del Buono, Francesco; Guerra, Francesco; Ferro, Nicola
abstract

Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how representative one dataset is of another, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides extensive experimental evaluations that show the effectiveness and the efficiency of the approach.


2022 - Landmark Explanation: A Tool for Entity Matching [Conference proceedings paper]
Baraldi, A.; Del Buono, F.; Paganelli, M.; Guerra, F.
abstract

We introduce Landmark Explanation, a framework that extends the capabilities of a post-hoc perturbation-based explainer to the EM scenario. Landmark Explanation leverages the specific schema typically adopted by EM datasets, which represent pairs of entity descriptions, to generate word-based explanations that effectively describe the matching model.


2022 - Novelty Detection with Autoencoders for System Health Monitoring in Industrial Environments [Journal article]
Del Buono, Francesco; Calabrese, Francesca; Baraldi, Andrea; Paganelli, Matteo; Guerra, Francesco
abstract

Predictive Maintenance (PdM) is the newest strategy for maintenance management in industrial contexts. It aims to predict the occurrence of a failure to minimize unexpected downtimes and maximize the useful life of components. In data-driven approaches, PdM makes use of Machine Learning (ML) algorithms to extract relevant features from signals, identify and classify possible faults (diagnostics), and predict the components' remaining useful life (prognostics). The major challenge lies in the high complexity of industrial plants, where operational conditions change over time and a large number of unknown modes occur. A solution to this problem is offered by novelty detection, where a representation of the machinery's normal operating state is learned and compared with online measurements to identify new operating conditions. In this paper, a systematic study of autoencoder-based methods for novelty detection is conducted. We introduce an architecture template, which includes a classification layer to detect and separate the operating conditions, and a localizer for identifying the most influential signals. Four implementations, with different deep learning models, are described and used to evaluate the approach on data collected from a test rig. The evaluation shows the effectiveness of the architecture and that the autoencoders outperform the current baselines.


2022 - Time2Feat: Learning Interpretable Representations for Multivariate Time Series Clustering [Journal article]
Bonifati, Angela; Del Buono, Francesco; Guerra, Francesco; Tiano, Donato


2021 - Automated Machine Learning for Entity Matching Tasks [Conference proceedings paper]
Paganelli, Matteo; Del Buono, Francesco; Pevarello, Marco; Guerra, Francesco; Vincini, Maurizio
abstract

The paper studies the application of automated machine learning (AutoML) approaches to the problem of Entity Matching (EM). This would make the existing, highly effective Machine Learning (ML) and Deep Learning-based approaches for EM usable by non-expert users, who do not have the expertise to train and tune such complex systems. Our experiments show that the direct application of AutoML systems to this scenario does not provide high-quality results. To address this issue, we introduce a new component, the EM adapter, to be pipelined with standard AutoML systems, which preprocesses the EM datasets to make them usable by automated approaches. The experimental evaluation shows that our proposal obtains the same effectiveness as state-of-the-art EM systems, but does not require any ML expertise to tune it.


2021 - Transforming ML Predictive Pipelines into SQL with MASQ [Conference proceedings paper]
Del Buono, F.; Paganelli, M.; Sottovia, P.; Interlandi, M.; Guerra, F.
abstract

Inference of Machine Learning (ML) models, i.e., the process of obtaining predictions from trained models, is an often overlooked problem. Model inference is, however, one of the main contributors to both technical debt in ML applications and infrastructure complexity. MASQ is a framework able to run inference of ML models directly on DBMSs. MASQ not only averts expensive data movements for those predictive scenarios where data resides in a database, but it also naturally exploits all the "enterprise-grade" features, such as governance, security, and auditability, which make DBMSs the cornerstone of many businesses. MASQ compiles trained models and ML pipelines implemented in scikit-learn directly into standard SQL: neither UDFs nor vendor-specific syntax is used, and therefore queries can be readily executed on any DBMS. In this demo, we will showcase MASQ's capabilities through a GUI allowing attendees to: (1) train ML pipelines composed of data featurizers and ML models; (2) compile the trained pipelines into SQL, and deploy them on different DBMSs (MySQL and SQL Server in the demo); and (3) compare the related performance under different configurations (e.g., the original pipeline on the ML framework against the SQL implementations).
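
To make the idea of compiling a trained model into plain SQL concrete, here is a minimal sketch for the simplest case, a linear model. It does not use or reproduce MASQ's actual API: the helper `linear_model_to_sql`, the `customers` table, and the weights are hypothetical, and SQLite stands in for the DBMS only because it ships with Python.

```python
import sqlite3


def linear_model_to_sql(coefs, intercept, table):
    """Render a linear model as a plain SQL SELECT (hypothetical helper).

    `coefs` maps column names to weights. Column and table names are
    interpolated as-is, so they are assumed to be trusted identifiers.
    """
    terms = [f"({weight!r} * {column})" for column, weight in coefs.items()]
    expression = " + ".join(terms + [repr(intercept)])
    return f"SELECT id, {expression} AS prediction FROM {table} ORDER BY id"


# Hypothetical weights, as if copied from a fitted linear model.
coefs = {"age": 0.5, "income": 2.0}
intercept = 1.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, income REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 2.0, 3.0), (2, 4.0, 0.5)],
)

# The prediction is computed entirely inside the database engine.
query = linear_model_to_sql(coefs, intercept, "customers")
predictions = conn.execute(query).fetchall()
```

The generated query contains only arithmetic over columns, so it needs no UDFs or vendor-specific syntax; compiling featurizers and non-linear models would take considerably more machinery than this sketch shows.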


2021 - Using Landmarks for Explaining Entity Matching Models [Conference proceedings paper]
Baraldi, Andrea; Del Buono, Francesco; Paganelli, Matteo; Guerra, Francesco
abstract

State-of-the-art approaches to Entity Matching (EM) rely on machine learning and deep learning models for inferring pairs of matching / non-matching entities. Although experimental evaluations demonstrate that these approaches are effective, their adoption in real scenarios is limited by the fact that they are difficult to interpret. Explainable AI systems have recently been proposed to complement deep learning approaches. Their application to the EM scenario is still new and requires addressing the specifics of this task, which is characterized by particular dataset schemas, each record describing a pair of entities, and by imbalanced classes. This paper introduces Landmark Explanation, a generic and extensible framework that extends the capabilities of a post-hoc perturbation-based explainer to the EM scenario. Landmark Explanation generates perturbations that take advantage of the particular schemas of the EM datasets, thus producing explanations that are more accurate and more interesting for users than the ones generated by competing approaches.