
Francesco GUERRA

Full Professor
Dipartimento di Ingegneria "Enzo Ferrari"




Publications

2024 - Explaining Entity Matching with Clusters of Words [Conference Paper]
Benassi, R.; Guerra, F.; Paganelli, M.; Tiano, D.
abstract

Deep learning models achieve state-of-the-art performance in solving the task of Entity Matching, which aims to identify records that refer to the same real-world entity. However, they act as black-box models for the user, who has limited insights into the rationales behind their decisions. Several explainers (e.g., LIME, Mojito, Landmark, LEMON, and CERTA) have been proposed in the literature to address this issue. Their main focus is to generate explanations that are faithful to the model without considering their comprehensibility to the user. For example, verbose explanations could be very complex to analyze, hindering understanding of the model. In this paper, we propose CREW, an explanation system for Entity Matching models that combines the comprehensibility of the explanations and fidelity to the model. To achieve this, CREW creates explanations as clusters of words. The clusters are created by exploiting three different forms of knowledge: the semantic similarity of the words, their arrangement into the dataset attributes, and their importance in explaining the model. Experiments show that CREW generates explanations that are more interpretable for the user and more faithful to the model than those generated by competing explanation techniques.


2024 - Pushing ML Predictions into DBMSs (Extended Abstract) [Conference Paper]
Paganelli, M.; Sottovia, P.; Park, K.; Interlandi, M.; Guerra, F.


2023 - A multi-facet analysis of BERT-based entity matching models [Journal Article]
Paganelli, Matteo; Tiano, Donato; Guerra, Francesco


2023 - An Intrinsically Interpretable Entity Matching System [Conference Paper]
Baraldi, A.; Del Buono, F.; Guerra, F.; Paganelli, M.; Vincini, M.
abstract

Explainable classification systems generate predictions along with a weight for each term in the input record measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions and the resulting explanations can be difficult for users to understand. They can be very long and assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or by unique terms, existing in one of the descriptions only. Decision units form a new feature space, able to represent, in a compact and meaningful way, pairs of entity descriptions. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning based EM models but, unlike them, produces highly interpretable predictions.
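A minimal sketch of the decision-unit idea, not the paper's actual algorithm: each term of one description is greedily paired with its most similar term of the other, and unmatched terms become singleton units. The similarity function here (character-level `difflib` ratio) and the `threshold` parameter are illustrative assumptions; WYM's actual generator works differently.

```python
from difflib import SequenceMatcher

def decision_units(left_desc, right_desc, threshold=0.8):
    """Greedily pair each left-side term with its most similar unused
    right-side term; unmatched terms on either side become singletons.
    Toy similarity: character-level ratio (an assumption, not WYM's)."""
    left, right = left_desc.split(), right_desc.split()
    used, units = set(), []
    for lt in left:
        best, best_sim = None, threshold
        for j, rt in enumerate(right):
            if j in used:
                continue
            sim = SequenceMatcher(None, lt.lower(), rt.lower()).ratio()
            if sim >= best_sim:
                best, best_sim = j, sim
        if best is not None:
            used.add(best)
            units.append((lt, right[best]))   # paired unit
        else:
            units.append((lt, None))          # singleton from left
    units += [(None, rt) for j, rt in enumerate(right) if j not in used]
    return units
```

Running it on two product descriptions yields a compact pairwise feature space, e.g. `decision_units("Apple iPhone 12", "iPhone 12 Apple Inc")` produces three paired units and one singleton for `Inc`.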


2023 - Interpretable Clustering of Multivariate Time Series with Time2Feat [Journal Article]
Bonifati, A.; Del Buono, F.; Guerra, F.; Lombardi, M.; Tiano, D.


2023 - Interpretable Entity Matching with WYM [Conference Paper]
Baraldi, A.; Del Buono, F.; Guerra, F.; Guiduzzi, G.; Paganelli, M.; Vincini, M.


2023 - Progetto di Basi di Dati Relazionali [Monograph/Scientific Treatise]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Guerra, Francesco; Vincini, Maurizio
abstract

The aim of this volume is to provide the reader with the fundamental notions for designing and implementing relational database applications. On the design side, it covers the conceptual and logical design phases and presents the Entity-Relationship and Relational data models, which are the basic tools for conceptual and logical design, respectively. The student is also introduced to the theory of normalization of relational databases. On the implementation side, it presents elements and examples of SQL, the standard language for Relational Database Management Systems (RDBMSs). Ample space is devoted to worked exercises on the topics covered.


2023 - Pushing ML Predictions into DBMSs [Journal Article]
Paganelli, M.; Sottovia, P.; Park, K.; Interlandi, M.; Guerra, F.
abstract

In the past decade, many approaches have been suggested to execute ML workloads on a DBMS. However, most of them have looked at in-DBMS ML from a training perspective, whereas ML inference has been largely overlooked. We think that this is an important gap to fill for two main reasons: (1) in the near future, every application will be infused with some sort of ML capability; (2) behind every web page, application, and enterprise there is a DBMS, whereby in-DBMS inference is an appealing solution for efficiency (e.g., less data movement), performance (e.g., cross-optimizations between relational operators and ML), and governance. In this paper, we study whether DBMSs are a good fit for prediction serving. We introduce a technique for translating trained ML pipelines containing both featurizers (e.g., one-hot encoding) and models (e.g., linear and tree-based models) into SQL queries, and we compare in-DBMS performance against popular ML frameworks such as Sklearn and ml.net. Our experiments show that, when pushed inside a DBMS, trained ML pipelines can have performance comparable to ML frameworks in several scenarios, while they perform quite poorly on text featurization and over (even simple) neural networks.
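To give a flavour of the model-to-SQL translation (a hypothetical sketch, not the paper's compiler): a trained linear model is just a weighted sum, so its inference step can be rendered as a single SQL projection. The function name, parameters, and table/column names below are illustrative assumptions.

```python
def linear_model_to_sql(coefs, intercept, columns, table):
    """Render a trained linear model's inference as one SQL query:
    prediction = intercept + sum(coef_i * column_i).
    `coefs`/`intercept` would come from a trained model (e.g. the
    coef_ and intercept_ attributes of a scikit-learn estimator)."""
    terms = [f"{c!r} * {col}" for c, col in zip(coefs, columns)]
    expr = " + ".join([repr(intercept)] + terms)
    return f"SELECT {expr} AS prediction FROM {table}"
```

For example, a model with weights 0.5 and 2.0 over `age` and `income` compiles to `SELECT 1.0 + 0.5 * age + 2.0 * income AS prediction FROM customers`, which any DBMS can execute without UDFs. Featurizers such as one-hot encoding could similarly be expressed with `CASE WHEN` expressions.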


2022 - A Framework to Evaluate the Quality of Integrated Datasets [Journal Article]
Buono, Francesco Del; Faggioli, Guglielmo; Paganelli, Matteo; Baraldi, Andrea; Guerra, Francesco; Ferro, Nicola


2022 - Analyzing How BERT Performs Entity Matching [Journal Article]
Paganelli, M.; Del Buono, F.; Baraldi, A.; Guerra, F.
abstract

State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as BERT, for generating highly contextualized embeddings of terms. The embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proven effective, but act as black boxes for the users, who have limited insight into the motivations behind their decisions. In this paper, we perform a multi-facet analysis of the components of pre-trained and fine-tuned BERT architectures applied to an EM task. The main findings resulting from our extensive experimental evaluation are: (1) the fine-tuning process applied to the EM task mainly modifies the last layers of the BERT components, but in a different way on tokens belonging to descriptions of matching / non-matching entities; (2) the special structure of the EM datasets, where records are pairs of entity descriptions, is recognized by BERT; (3) the pair-wise semantic similarity of tokens is not a key knowledge exploited by BERT-based EM models.


2022 - Evaluating the integration of datasets [Conference Paper]
Paganelli, Matteo; Buono, Francesco Del; Guerra, Francesco; Ferro, Nicola
abstract

Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all integrated tuples infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how much a dataset is representative of another dataset, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides extensive experimental evaluations that show the effectiveness and the efficiency of the approach.
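A word-frequency representativeness measure of this kind can be sketched as follows. This is a minimal coverage-style proxy invented for illustration; the measure actually proposed in the paper may be defined differently.

```python
from collections import Counter

def representativeness(reference_docs, integrated_docs):
    """Fraction of the reference corpus's token occurrences whose
    words also appear somewhere in the integrated corpus. A score
    near 1.0 suggests the integrated dataset preserves the
    vocabulary (weighted by frequency) of the reference."""
    ref = Counter(w for d in reference_docs for w in d.lower().split())
    seen = {w for d in integrated_docs for w in d.lower().split()}
    total = sum(ref.values())
    covered = sum(c for w, c in ref.items() if w in seen)
    return covered / total if total else 0.0
```

Because it is unsupervised and cheap to compute, such a score could be recomputed after every source update, flagging when a manual inspection is actually needed.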


2022 - Landmark Explanation: A Tool for Entity Matching [Conference Paper]
Baraldi, A.; Del Buono, F.; Paganelli, M.; Guerra, F.
abstract

We introduce Landmark Explanation, a framework that extends the capabilities of a post-hoc perturbation-based explainer to the EM scenario. Landmark Explanation leverages the specific schema typically adopted by EM datasets, representing pairs of entity descriptions, to generate word-based explanations that effectively describe the matching model.


2022 - Novelty Detection with Autoencoders for System Health Monitoring in Industrial Environments [Journal Article]
Del Buono, Francesco; Calabrese, Francesca; Baraldi, Andrea; Paganelli, Matteo; Guerra, Francesco
abstract

Predictive Maintenance (PdM) is the newest strategy for maintenance management in industrial contexts. It aims to predict the occurrence of a failure to minimize unexpected downtimes and maximize the useful life of components. In data-driven approaches, PdM makes use of Machine Learning (ML) algorithms to extract relevant features from signals, identify and classify possible faults (diagnostics), and predict the components’ remaining useful life (prognostics). The major challenge lies in the high complexity of industrial plants, where both operational conditions change over time and a large number of unknown modes occur. A solution to this problem is offered by novelty detection, where a representation of the machinery's normal operating state is learned and compared with online measurements to identify new operating conditions. In this paper, a systematic study of autoencoder-based methods for novelty detection is conducted. We introduce an architecture template, which includes a classification layer to detect and separate the operative conditions, and a localizer for identifying the most influential signals. Four implementations, with different deep learning models, are described and used to evaluate the approach on data collected from a test rig. The evaluation shows the effectiveness of the architecture and that the autoencoders outperform the current baselines.
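The novelty-detection pattern described above (reconstruction error against a learned normal state, plus a localizer over the signals) can be sketched without any deep learning machinery. The `MeanReconstructor` below is a toy stand-in that "reconstructs" every input as the training mean; a real implementation would use a trained autoencoder, and the thresholding rule is an assumption.

```python
def reconstruction_error(x, reconstruction):
    """Mean squared error between a signal vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)

class MeanReconstructor:
    """Toy stand-in for an autoencoder: reconstructs every input as the
    mean of the training data and flags inputs whose reconstruction
    error exceeds the worst error seen during training."""
    def fit(self, X):
        n = len(X)
        self.mean = [sum(col) / n for col in zip(*X)]
        self.threshold = max(reconstruction_error(x, self.mean) for x in X)
        return self

    def is_novel(self, x):
        return reconstruction_error(x, self.mean) > self.threshold

    def localize(self, x):
        """Index of the signal contributing most to the error
        (the role of the 'localizer' in the architecture template)."""
        errs = [(a - b) ** 2 for a, b in zip(x, self.mean)]
        return errs.index(max(errs))
```

Fitting on measurements from the normal operating state and calling `is_novel` on each online measurement reproduces the detect-then-localize workflow, with `localize` pointing the operator at the most anomalous channel.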


2022 - Time2Feat: Learning Interpretable Representations for Multivariate Time Series Clustering [Journal Article]
Bonifati, Angela; DEL BUONO, Francesco; Guerra, Francesco; Tiano, Donato


2021 - Automated Machine Learning for Entity Matching Tasks [Conference Paper]
Paganelli, Matteo; DEL BUONO, Francesco; Pevarello, Marco; Guerra, Francesco; Vincini, Maurizio
abstract

The paper studies the application of automated machine learning (AutoML) approaches to the problem of Entity Matching (EM). This would make the existing, highly effective Machine Learning (ML) and Deep Learning based approaches for EM usable by non-expert users as well, who do not have the expertise to train and tune such complex systems. Our experiments show that the direct application of AutoML systems to this scenario does not provide high-quality results. To address this issue, we introduce a new component, the EM adapter, to be pipelined with standard AutoML systems, which preprocesses the EM datasets to make them usable by automated approaches. The experimental evaluation shows that our proposal obtains the same effectiveness as the state-of-the-art EM systems, but requires no ML expertise to tune.


2021 - Landmark Explanation: An Explainer for Entity Matching Models [Conference Paper]
Baraldi, A.; Del Buono, F.; Paganelli, M.; Guerra, F.
abstract

State-of-the-art approaches model Entity Matching (EM) as a binary classification problem, where Machine Learning (ML) or Deep Learning (DL) based techniques are applied to evaluate whether descriptions of pairs of entities refer to the same real-world instance. Although these approaches have experimentally demonstrated high effectiveness, their adoption in real scenarios is limited by the lack of interpretability of their behavior. This paper showcases Landmark Explanation, a tool that makes generic post-hoc (model-agnostic) perturbation-based explanation systems able to explain the behavior of EM models. In particular, Landmark Explanation computes local interpretations, i.e., given a description of a pair of entities and an EM model, it computes the contribution of each term in generating the prediction. The demonstration shows that the explanations generated by Landmark Explanation are effective even for non-matching pairs of entities, a challenge for explanation systems.


2021 - Preface [Conference Paper]
Mottin, D.; Lissandrini, M.; Roy, S. B.; Velegrakis, Y.; Athanassoulis, M.; Augsten, N.; Hamadou, H. B.; Bergamaschi, S.; Bikakis, N.; Bonifati, A.; Dimou, A.; Di Rocco, L.; Fletcher, G.; Foroni, D.; Freytag, J. -C.; Groth, P.; Guerra, F.; Hartig, O.; Karras, P.; Ke, X.; Kondylakis, H.; Koutrika, G.; Manolescu, I.


2021 - Transforming ML Predictive Pipelines into SQL with MASQ [Conference Paper]
Del Buono, F.; Paganelli, M.; Sottovia, P.; Interlandi, M.; Guerra, F.
abstract

Inference of Machine Learning (ML) models, i.e. the process of obtaining predictions from trained models, is often an overlooked problem. Model inference is, however, one of the main contributors of both technical debt in ML applications and infrastructure complexity. MASQ is a framework able to run inference of ML models directly on DBMSs. MASQ not only averts expensive data movements for those predictive scenarios where data resides on a database, but it also naturally exploits all the "Enterprise-grade" features such as governance, security, and auditability which make DBMSs the cornerstone of many businesses. MASQ compiles trained models and ML pipelines implemented in scikit-learn directly into standard SQL: no UDFs nor vendor-specific syntax are used, and therefore queries can be readily executed on any DBMS. In this demo, we will showcase MASQ's capabilities through a GUI allowing attendees to: (1) train ML pipelines composed of data featurizers and ML models; (2) compile the trained pipelines into SQL, and deploy them on different DBMSs (MySQL and SQLServer in the demo); and (3) compare the related performance under different configurations (e.g., the original pipeline on the ML framework against the SQL implementations).


2021 - Using Landmarks for Explaining Entity Matching Models [Conference Paper]
Baraldi, Andrea; DEL BUONO, Francesco; Paganelli, Matteo; Guerra, Francesco
abstract

The state-of-the-art approaches for performing Entity Matching (EM) rely on machine and deep learning models for inferring pairs of matching / non-matching entities. Although the experimental evaluations demonstrate that these approaches are effective, their adoption in real scenarios is limited by the fact that they are difficult to interpret. Explainable AI systems have recently been proposed for complementing deep learning approaches. Their application to the EM scenario is still new and requires addressing the specificity of this task, characterized by particular dataset schemas, describing a pair of entities, and imbalanced classes. This paper introduces Landmark Explanation, a generic and extensible framework that extends the capabilities of a post-hoc perturbation-based explainer over the EM scenario. Landmark Explanation generates perturbations that take advantage of the particular schemas of the EM datasets, thus generating explanations that are more accurate and more interesting to users than those generated by competing approaches.


2021 - Using descriptions for explaining entity matches [Conference Paper]
Paganelli, M.; Sottovia, P.; Maccioni, A.; Interlandi, M.; Guerra, F.
abstract

Finding entity matches in large datasets is currently one of the most attractive research challenges. The recent interest of the research community in Machine and Deep Learning techniques has led to the development of many reliable approaches. Nevertheless, these are conceived as black-box tools that identify the matches between the entities provided as input. The lack of explainability of the process hampers its application to real-world scenarios where domain experts need to know and understand the reasons why entities can be considered a match, i.e., why they represent the same real-world entity. In this paper, we show how data descriptions—a set of compact, readable and insightful formulas of boolean predicates—can be used to guide domain experts in understanding and evaluating the results of entity matching processes.


2020 - A comparison of approaches for measuring the semantic similarity of short texts based on word embeddings [Journal Article]
Babic, K.; Guerra, F.; Martincic-Ipsic, S.; Mestrovic, A.
abstract

Measuring the semantic similarity of texts has a vital role in various tasks from the field of natural language processing. In this paper, we describe a set of experiments we carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We perform a comparison of four models based on word embeddings: two variants of Word2Vec (one based on Word2Vec trained on a specific dataset and the second extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts based on word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into text embeddings. We introduce three methods by making variations of two commonly used similarity measures. One method is an extension of the cosine similarity based on centroids, and the other two methods are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of the Pearson and Spearman correlation. The results indicate that the extended methods perform better than the original ones in most cases.
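The simplest of the aggregation strategies compared above, the centroid-based cosine similarity, can be sketched as follows. This is a minimal pure-Python illustration of the general technique; the `emb` lookup table stands in for a pre-trained embedding model, and the handling of out-of-vocabulary words (silently skipped) is an assumption.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of word vectors."""
    n = len(vectors)
    return [sum(c) / n for c in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def text_similarity(text1, text2, emb):
    """Similarity of two short texts as the cosine of their centroid
    embeddings; words missing from `emb` are ignored (assumes each
    text contains at least one in-vocabulary word)."""
    v1 = centroid([emb[w] for w in text1.split() if w in emb])
    v2 = centroid([emb[w] for w in text2.split() if w in emb])
    return cosine(v1, v2)
```

The BM25-style variants mentioned in the abstract would instead weight each word's contribution by its inverse document frequency and a saturation term rather than averaging uniformly.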


2020 - Explaining data with descriptions [Journal Article]
Paganelli, Matteo; Sottovia, Paolo; Maccioni, Antonio; Interlandi, Matteo; Guerra, Francesco


2020 - Unsupervised Evaluation of Data Integration Processes [Conference Paper]
Paganelli, M.; Buono, F. D.; Guerra, F.; Ferro, N.
abstract

Evaluation of the quality of data integration processes is usually performed via onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking all the tuples infeasible and the frequent updates, i.e. changes in the sources and/or new sources, require repeating the evaluation over and over. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, which quantifies how much a dataset is representative of another dataset, giving an indication of how good the integration process is and whether deviations are happening and a manual inspection is needed. We also conducted some preliminary experiments, using shared datasets, that show the effectiveness of the proposed measures in typical data integration scenarios.


2019 - Big Data Integration of Heterogeneous Data Sources: The Re-Search Alps Case Study [Conference Paper]
Guerra, Francesco; Sottovia, Paolo; Paganelli, Matteo; Vincini, Maurizio
abstract

The application of big data integration techniques in real scenarios needs to address practical issues related to the scalability of the process and the heterogeneity of data sources. In this paper, we describe the pipeline that has been developed in the context of the Re-search Alps project, a project funded by the EU Commission through the INEA Agency in the CEF Telecom framework, that aims at creating an open dataset describing research centers located in the Alpine area.


2019 - Finding Synonymous Attributes in Evolving Wikipedia Infoboxes [Conference Paper]
Sottovia, Paolo; Paganelli, Matteo; Guerra, Francesco; Velegrakis, Yannis
abstract

Wikipedia Infoboxes are semi-structured data structures organized in an attribute-value fashion. Policies establish, for each type of entity represented in Wikipedia, the attribute names that the Infobox should contain in the form of a template. However, these requirements change over time and users often choose not to strictly obey them. As a result, it is hard to treat the history of Wikipedia pages in an integrated way, making it difficult to analyze the temporal evolution of Wikipedia entities through their Infobox and impossible to perform direct comparison of entities of the same type. To address this challenge, we propose an approach to deal with the misalignment of the attribute names and identify clusters of synonymous Infobox attributes. Elements in the same cluster are considered a temporal evolution of the same attribute. To identify the clusters we use two different distance metrics. The first is the co-occurrence degree, which is treated as a negative distance, and the second is the co-occurrence of similar values in the attributes, which is treated as positive evidence of synonymy. We formalize the problem as a correlation clustering problem over a weighted graph constructed with attributes as nodes and positive and negative evidence as edges. We solve it with a linear programming model that shows a good approximation. Our experiments over a collection of Infoboxes spanning the last 13 years show the potential of our approach.


2019 - Parallelizing computations of full disjunctions [Journal Article]
Paganelli, Matteo; Beneventano, Domenico; Guerra, Francesco; Sottovia, Paolo
abstract

In relational databases, the full disjunction operator is an associative extension of the full outer join to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration and knowledge extraction. One of the main limitations to its adoption in real business scenarios is the long time its computation requires. This paper overcomes this limitation by introducing a novel approach, parafd, based on parallel computing techniques, for implementing the full disjunction operator in an exact and an approximate version. Our proposal has been compared with state-of-the-art algorithms, which have also been re-implemented to run in parallel. The experiments show that the time performance outperforms existing approaches. Finally, we have experimented with treating the full disjunction as a collection of documents indexed by a textual search engine. In this way, we provide a simple technique for performing keyword search over relational databases. The results obtained against a benchmark show high precision and recall levels even compared with the existing proposals.
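To make the building block concrete, here is a two-table full outer join over relations represented as lists of dicts, a minimal sketch only: the full disjunction extends this associatively to all join paths of a whole database (and the paper's parafd adds parallelism and approximation on top, none of which is shown here). Unmatched tuples are kept with their missing attributes simply absent, standing in for NULLs.

```python
def full_outer_join(left, right, on):
    """Full outer join of two relations (lists of dicts) on
    attribute `on`. Matching tuples are merged; tuples with no
    partner survive as-is, with absent keys playing the role of
    NULL padding."""
    joined, matched_right = [], set()
    for l in left:
        hit = False
        for j, r in enumerate(right):
            if l.get(on) is not None and l.get(on) == r.get(on):
                joined.append({**l, **r})   # merge matching tuples
                matched_right.add(j)
                hit = True
        if not hit:
            joined.append(dict(l))          # dangling left tuple
    # dangling right tuples
    joined += [dict(r) for j, r in enumerate(right) if j not in matched_right]
    return joined
```

Indexing each resulting tuple as a document, as the abstract describes, then lets an IR engine answer keyword queries over the connected content of the database.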


2019 - Short Texts Semantic Similarity Based on Word Embeddings [Conference Paper]
Babić, Karlo; Martinčić-Ipšić, Sanda; Meštrović, Ana; Guerra, Francesco
abstract

Evaluating semantic similarity of texts is a task that assumes paramount importance in real-world applications. In this paper, we describe some experiments we carried out to evaluate the performance of different forms of word embeddings and their aggregations in the task of measuring the similarity of short texts. In particular, we explore the results obtained with two publicly available pre-trained word embeddings (one based on word2vec trained on a specific dataset and the second extending it with embeddings of word senses). We test five approaches for aggregating words into text. Two approaches are based on centroids and summarize a text as a word embedding. The other approaches are some variations of the Okapi BM25 function and provide directly a measure of the similarity of two texts.


2019 - TuneR: Fine Tuning of Rule-based Entity Matchers [Conference Paper]
Paganelli, Matteo; Sottovia, Paolo; Guerra, Francesco; Velegrakis, Yannis
abstract

A rule-based entity matching task requires the definition of an effective set of rules, which is a time-consuming and error-prone process. The typical approach adopted for its resolution is a trial and error method, where the rules are incrementally added and modified until satisfactory results are obtained. This approach requires significant human intervention, since a typical dataset needs the definition of a large number of rules and possible interconnections that cannot be manually managed. In this paper, we propose TuneR, a software library supporting developers (i.e., coders, scientists, and domain experts) in tuning sets of matching rules. It aims to reduce human intervention by offering a tool for the optimization of rule sets based on user-defined criteria (such as effectiveness, interpretability, etc.). Our goal is to integrate the framework in the Magellan ecosystem, thus completing the functionalities required by the developers for performing Entity Matching tasks.


2019 - Understanding Data in the Blink of an Eye [Conference Paper]
Paganelli, Matteo; Sottovia, Paolo; Maccioni, Antonio; Interlandi, Matteo; Guerra, Francesco
abstract

Many data analysis and knowledge mining tasks require a basic understanding of the content of a dataset prior to any data access. In this demo, we showcase how data descriptions---a set of compact, readable and insightful formulas of boolean predicates---can be used to guide users in understanding datasets. Finding the best description for a dataset is, unfortunately, both computationally hard and task-specific. This demo shows that not only we can generate descriptions at interactive speed, but also that diverse user needs---from anomaly detection to data exploration---can be accommodated through a user-driven process exploiting dynamic programming in concert with a set of heuristics.


2018 - The KEYSTONE IC1302 COST Action [Conference Paper]
Guerra, Francesco; Velegrakis, Yannis; Cardoso, Jorge; Breslin, John G.
abstract

As more and more data becomes available on the Web, as its complexity increases and as the Web’s user base shifts towards a more general non-technical population, keyword searching is becoming a valuable alternative to traditional SQL queries, mainly due to its simplicity and the lower effort/expertise it requires. Existing approaches suffer from a number of limitations when applied to multi-source scenarios requiring some form of query planning, without direct access to database instances, and with frequent updates precluding any effective implementation of data indexes. Typical scenarios include Deep Web databases, virtual data integration systems and data on the Web. Therefore, building effective keyword searching techniques can have an extensive impact since it allows non-professional users to access large amounts of information stored in structured repositories through simple keyword-based query interfaces. This revolutionises the paradigm of searching for data since users are offered access to structured data in a similar manner to the one they already use for documents. To build a successful, unified and effective solution, the action “semantic KEYword-based Search on sTructured data sOurcEs” (KEYSTONE) promoted synergies across several disciplines, such as semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning, user interaction, interface design, and natural language processing. This paper describes the main achievements of this COST Action.


2018 - Wikidata and DBpedia: A Comparative Study [Conference Paper]
Abián, D.; Guerra, F.; Martínez-Romanos, J.; Trillo-Lado, Raquel
abstract

DBpedia and Wikidata are two online projects focused on offering structured data from Wikipedia in order to ease its exploitation on the Linked Data Web. In this paper, a comparison of these two widely-used structured data sources is presented. This comparison considers the most relevant data quality dimensions in the state of the art of the scientific research. As fundamental differences between both projects, we can highlight that Wikidata has an open centralised nature, whereas DBpedia is more popular in the Semantic Web and the Linked Open Data communities and depends on the different linguistic editions of Wikipedia.


2017 - Back to the sketch-board: Integrating keyword search, semantics, and information retrieval [Conference Paper]
Azzopardi, Joel; Benedetti, Fabio; Guerra, Francesco; Lupu, Mihai
abstract

We reproduce recent research results combining semantic and information retrieval methods. Additionally, we expand the existing state of the art by combining the semantic representations with IR methods from the probabilistic relevance framework. We demonstrate a significant increase in performance, as measured by standard evaluation metrics.


2017 - Cleaning MapReduce Workflows [Conference Paper]
Interlandi, Matteo; Lacroix, Julien; Boucelma, Omar; Guerra, Francesco
abstract

Integrity constraints (ICs) such as Functional Dependencies (FDs) or Inclusion Dependencies (INDs) are commonly used in databases to check whether input relations obey certain pre-defined quality metrics. While Data-Intensive Scalable Computing (DISC) platforms such as MapReduce commonly accept (semi-structured) input data not in relational format, the data is often transformed into key/value pairs when it needs to be re-partitioned, a process commonly referred to as shuffle. In this work, we present a Provenance-Aware model for assessing the quality of shuffled data: more precisely, we capture and model provenance using the PROV-DM W3C recommendation and we extend it with rules expressed à la Datalog to assess data quality dimensions by means of IC metrics over DISC systems. In this way, data (and algorithmic) errors can be promptly and automatically detected without having to go through a lengthy process of output debugging.


2017 - Data exploration on large amount of relational data through keyword queries [Conference Paper]
Beneventano, Domenico; Guerra, Francesco; Velegrakis, Yannis
abstract

The paper describes a new approach for querying relational databases through keyword search by exploiting Information Retrieval (IR) techniques. When users do not know the structures and the content, keyword search becomes the only efficient and effective solution for allowing people to explore a relational database. The approach is based on a unified view of the database relations (obtained through the full disjunction operator), whose composing tuples are considered as documents to be indexed and searched by means of an IR search engine. Moreover, as happens in relational databases, the system can merge the data stored in different documents to provide a complete answer to the user. In particular, two documents can be joined because either their tuples in the original database share some primary key or, again in the original database, some tuple is connected by a primary/foreign key relation. Our preliminary proposal, the description of the tabular data structure for storing and retrieving the possible connections among the documents, and a metric for scoring the results are introduced in the paper.


2017 - Exploiting Linguistic Analysis on URLs for Recommending Web Pages: A Comparative Study [Journal Article]
Cadegnani, Sara; Guerra, Francesco; Ilarri, Sergio; del Carmen Rodríguez Hernández, María; Trillo Lado, Raquel; Velegrakis, Yannis; Amaro, Raquel
abstract

Nowadays, citizens require high-quality information from public institutions in order to guarantee their transparency. Institutional websites of governmental and public bodies must publish and keep updated a large amount of information, stored in thousands of web pages, in order to satisfy the demands of their users. Due to the amount of information, the “search form” typically available on most such websites has proven limited in supporting users, since it requires them to explicitly express their information needs through keywords. The sites are also affected by the so-called “long tail” phenomenon, typically observed in e-commerce portals: not all pages are considered highly important and, as a consequence, users searching for information located in pages that are not considered important have a hard time locating them. The development of a recommender system that can guess the next page a user would like to see on a website has therefore gained a lot of attention. Complex models and approaches have been proposed for recommending web pages to individual users. These approaches typically require personal preferences and other kinds of user information in order to make successful predictions. In this paper, we analyze and compare three different approaches that leverage information embedded in the structure of websites and in the logs of their web servers to improve the effectiveness of web page recommendation. Our proposals exploit the context of the users’ navigation, i.e., their current sessions when surfing a specific web site. These approaches require neither information about the personal preferences of the users to be stored and processed, nor complex structures to be created and maintained. They can easily be incorporated into current large websites to facilitate the users’ navigation experience.
Last but not least, the paper reports comparative experiments on a real-world website to analyze the performance of the proposed approaches.


2017 - From Data Integration to Big Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca
abstract

The Database Group (DBGroup, www.dbgroup.unimore.it) and the Information System Group (ISGroup, www.isgroup.unimore.it) research activities have been mainly devoted to the Data Integration research area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semi-structured data sources and allows a user to pose a single query and to receive a single unified answer. Description Logics, automatic annotation of schemata, and clustering techniques constitute the theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are Peer Data Management Systems and their evolutions toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss, and solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been faced: automatic annotation, mapping discovery, global query processing, provenance, multi-dimensional information integration, and keyword search, within European and national projects. With the incoming requirements of integrating open linked data, textual and multimedia data in a big data scenario, the research has been devoted to the Big Data Integration research area. In particular, the most relevant research results achieved are: a scalable entity resolution method, a scalable join operator, and LODEX, a tool for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DataRiver, data integration was successfully applied to smart e-health.


2017 - Modeling and estimating the economic and social impact of the results of the project Re-search Alps [Working paper]
Russo, M.; Guerra, F.; Pagliacci, F.; Paganelli, M.; Petit, L.; Olland, F.; Weisenburger, E.; Zilio, E.
abstract

The idea behind the Re-search Alps project was conceived within the EUSALP Action Group 1 - “to develop an effective research and innovation ecosystem” (AG1). EUSALP is the EU Strategy for the Alpine Region, which is composed of seven countries: Austria, France, Germany, Italy, Liechtenstein, Slovenia and Switzerland. The strategy aims at ensuring mutually beneficial interactions between the mountain regions at its core and the surrounding lowlands and urban areas. The goal of the Re-search Alps project is the publication on the web of an open dataset describing the private and public laboratories and research and innovation centers (hereinafter referred to as “labs”, in short) existing in the seven aforementioned countries, with particular reference to the 48 regions constituting the Alpine Area.


2017 - The RE-SEARCH ALPS (research laboratories in the alpine area) project [Relazione in Atti di Convegno]
Guerra, Francesco; Russo, Margherita; Fontana, Marco; Paganelli, Matteo; Bancilhon, Francois; Frisch, Christian; Petit, Loic; Giorgi, Anna; Zilio, Emanuela
abstract

The paper describes the RE-SEARCH ALPS project, which aims to gather, consolidate, harmonize and make available to different targets (public and private bodies working at local, regional and national level) data about laboratories and research and innovation centers which are active in the regions of the seven countries constituting the Alpine Area (France, Italy, Switzerland, Austria, Germany, Liechtenstein and Slovenia). The project is complemented by a search engine which allows users to directly query the dataset and to obtain geo-referenced data as a result. The data will be properly visualized thanks to a visualizer developed in the project. From a research perspective, the project has to address hot and challenging Big Data issues, such as big data integration (to join data sources), entity recognition and linkage in large amounts of data (to discover the same institution represented in different sources), and data cleaning and reconciliation (to address issues related to different representations of the same real object). The project was submitted to a call for the creation of Open Datasets promoted by the European Innovation and Networks Executive Agency through the Connecting Europe Facility (CEF) funding instrument. The project has recently been approved (AGREEMENT No INEA/CEF/ICT/A2016/1296967): it lasts two years and will start in July 2017.


2016 - Combining User and Database Perspective for Solving Keyword Queries over Relational Databases [Articolo su rivista]
Bergamaschi, Sonia; Interlandi, Matteo; Guerra, Francesco; TRILLO LADO, Raquel; Velegrakis, Yannis
abstract

Over the last decade, keyword search over relational data has attracted considerable attention. A possible approach to this issue is to transform keyword queries into one or more SQL queries to be executed by the relational DBMS. Finding these queries is a challenging task, since the information they represent may be modeled across different tables and attributes. This means that it is necessary to identify not only the schema elements where the data of interest is stored, but also how these elements are interconnected. All the approaches proposed so far provide a monolithic solution. In this work, we instead divide the problem into three steps: the first, driven by the user's point of view, takes into account what the user has in mind when formulating keyword queries; the second, driven by the database perspective, considers how the data is represented in the database schema; finally, the third step combines these two processes. We present the theory behind our approach and its implementation in a system called QUEST (QUEry generator for STructured sources), which has been extensively tested to show the efficiency and effectiveness of our approach. Furthermore, we report on the outcomes of a number of experiments that we have conducted.


2016 - Entity-Based Keyword Search in Web Documents [Articolo su rivista]
Sartori, Enrico; Velegrakis, Yannis; Guerra, Francesco
abstract

The set of algorithms that compose a search engine rely on the information contained in the document representation to perform their task. Most traditional approaches represent a document as a flat list of words, but this model encounters difficulties in linking information regarding the same object when it is referred to using different words. Moreover, this approach to document modeling cannot capture the relationships holding among the objects appearing in a document. In this work we propose a novel approach to document representation and query answering, which addresses the aforementioned problems through: i) an entity-based representation of the objects referenced in documents; ii) a representation of the document that is aware of the relationships holding among the objects appearing in the text. We provide a test implementation of the approach presented in the paper and report the results of the tests performed to measure its performance.


2016 - Keyword-Based Search Over Databases: A Roadmap for a Reference Architecture Paired with an Evaluation Framework [Articolo su rivista]
Bergamaschi, Sonia; Ferro, Nicola; Guerra, Francesco; Silvello, Gianmaria
abstract

Structured data sources promise to be the next driver of a significant socio-economic impact for both people and companies. Nevertheless, accessing them through formal languages, such as SQL or SPARQL, can become cumbersome and frustrating for end-users. To overcome this issue, keyword search in databases is becoming the technology of choice, even if it suffers from efficiency and effectiveness problems that prevent it from being adopted at Web scale. In this paper, we motivate the need for a reference architecture for keyword search in databases to favor the development of scalable and effective components, also borrowing methods from neighboring fields such as information retrieval and natural language processing. Moreover, we point out the need for a companion evaluation framework, able to assess the efficiency and the effectiveness of such new systems in the light of real and compelling use cases.


2016 - Providing Insight into Data Source Topics [Articolo su rivista]
Bergamaschi, Sonia; Ferrari, Davide; Guerra, Francesco; Simonini, Giovanni; Velegrakis, Yannis
abstract

A fundamental service for the exploitation of the modern large data sources that are available online is the ability to identify the topics of the data they contain. Unfortunately, heterogeneity and the lack of centralized control make it difficult to identify the topics directly from the actual values used in the sources. We present an approach that generates signatures of sources, which are matched against a reference vocabulary of concepts through its respective signature to produce a description of the topics of the source in terms of that reference vocabulary. The reference vocabulary may be provided ready-made, created manually, or created by applying our signature-generation algorithm over a well-curated data source with a clear identification of topics. In our particular case, we have used DBpedia for the creation of the vocabulary, since it is one of the largest known collections of entities and concepts. The signatures are generated by exploiting the entropy and the mutual information of the attributes of the sources to generate semantic identifiers of the various attributes, which combined together form a unique signature of the concepts (i.e., the topics) of the source. The generation of the identifiers is based on the entropy of the values of the attributes; thus, they are independent of the naming heterogeneity of attributes or tables. Although the use of traditional information-theoretical quantities such as entropy and mutual information is not new, these measures may become untrustworthy due to their sensitivity to overfitting, and they require a number of samples equal to that used to construct the reference vocabulary. To overcome these limitations, we normalize and use pseudo-additive entropy measures, which automatically downweight the role of vocabulary items and property values with very low frequencies, resulting in a more stable solution than the traditional counterparts.
We have materialized our theory in a system called WHATSIT and we experimentally demonstrate its effectiveness.
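The entropy-based identifiers described above can be illustrated with a minimal sketch (made-up data; the actual system also exploits mutual information and pseudo-additive entropy measures):

```python
import math
from collections import Counter

def attribute_entropy(values):
    """Shannon entropy (in bits) of an attribute's value distribution."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Two attributes with different value distributions get different
# identifiers, regardless of how the attributes are named in the source.
cities = ["Rome", "Milan", "Rome", "Turin"]
flags = ["Y", "Y", "Y", "Y"]
print(attribute_entropy(cities))  # 1.5
print(attribute_entropy(flags))   # 0.0 (a constant attribute carries no information)
```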


2016 - Towards Keyword-based Pull Recommendation Systems [Relazione in Atti di Convegno]
Guerra, Francesco; Trillo Lado, Raquel; Ilarri, Sergio; Rodríguez Hernández, María del Carmen
abstract

Due to the high availability of data, users are frequently overloaded with a huge amount of alternatives when they need to choose a particular item. This has motivated an increased interest in research on recommendation systems, which filter the options and provide users with suggestions about specific elements (e.g., movies, restaurants, hotels, books, etc.) that are estimated to be potentially relevant for the user. In this paper, we describe and evaluate two possible solutions to the problem of identifying the type of item (e.g., music, movie, book, etc.) that the user specifies in a pull-based recommendation (i.e., a recommendation about certain types of items that are explicitly requested by the user). We evaluate two alternative solutions: one based on the use of a Hidden Markov Model and another one exploiting Information Retrieval techniques. Comparing both proposals experimentally, we observe that the Hidden Markov Model generally performs better than the Information Retrieval technique in our preliminary experimental setup.


2015 - A First Step Towards Keyword-Based Searching for Recommendation Systems [Relazione in Atti di Convegno]
Rodríguez Hernández, María del Carmen; Guerra, Francesco; Ilarri, Sergio; Trillo Lado, Raquel
abstract

Due to the high availability of data, users are frequently overloaded with a huge amount of alternatives when they need to choose a particular item. This has motivated an increased interest in research on recommendation systems, which filter the options and provide users with suggestions about specific elements (e.g., movies, restaurants, hotels, news, etc.) that are estimated to be potentially relevant for the user. Recommendation systems are still an active area of research, and particularly in the last years the concept of context-aware recommendation systems has started to be popular, due to the interest of considering the context of the user in the recommendation process. In this paper, we describe our work-in-progress concerning pull-based recommendations (i.e., recommendations about certain types of items that are explicitly requested by the user). In particular, we focus on the problem of detecting the type of item the user is interested in. Due to its popularity, we consider a keyword-based user interface: the user types a few keywords and the system must determine what the user is searching for. Whereas there is extensive work in the field of keyword-based search, which is still a very active research area, keyword searching has not been applied so far in most recommendation contexts.


2015 - Improving css-KNN classification performance by shifts in training data [Relazione in Atti di Convegno]
Draszawka, Karol; Szymański, Julian; Guerra, Francesco
abstract

This paper presents a new approach to improve the performance of a css-k-NN classifier for the categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of the standard k-NN classifier we proposed in [1]) is a lazy-learning, instance-based classifier. It does not have parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose a training-data preprocessing phase that tries to alleviate this lack of learning. The idea is to compute training data modifications such that class-representative instances are optimized before the actual k-NN algorithm is employed. Empirical text classification experiments on mid-size Wikipedia data sets show that carefully cross-validated settings of such preprocessing yield significant improvements in k-NN performance compared to classification without this step. The proposed approach can be useful for improving the effectiveness of other classifiers as well, and can find applications in the domains of recommendation systems and keyword-based search.


2015 - Keyword Search in structured data and Network Analysis: a preliminary experiment over DBLP [Relazione in Atti di Convegno]
Bernabei, Chiara; Guerra, Francesco; Trillo Lado, Raquel
abstract

Identifying items similar to the ones provided as input to a search system is a challenging task. The main issues concern not only the management of large collections of data, but also the profiling of the users, who usually have different opinions, tastes and expertise. In this paper we propose a preliminary investigation of the improvements in the accuracy of a search system provided by network analysis techniques supporting the discovery of relations among the items stored in the repository. For this reason, we have developed the SEEN prototype, a keyword search tool exploiting network analysis. SEEN has been evaluated against a relational version of the DBLP repository. The results of the preliminary experiments show that the information provided by networks can improve the effectiveness of the results.


2015 - Perspective Look at Keyword-based Search Over Relation Data and its Evaluation (Extended Abstract) [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Ferro, Nicola; Guerra, Francesco; Silvello, Gianmaria
abstract

This position paper discusses the need to consider keyword search over relational databases in the light of broader systems, where keyword search is just one of the components, aimed at better supporting users in their search tasks. These more complex systems call for appropriate evaluation methodologies which go beyond what is typically done today, i.e., measuring the performance of components mostly in isolation or unrelated to actual user needs, and which are instead able to consider the system as a whole, its constituent components, and their inter-relations, with the ultimate goal of supporting actual user search tasks.


2015 - Preface [Relazione in Atti di Convegno]
Cardoso, J.; Guerra, F.; Houben, G. -J.; Pinto, A. M.; Velegrakis, Y.
abstract


2015 - Recommending Web Pages Using Item-Based Collaborative Filtering Approaches [Relazione in Atti di Convegno]
Cadegnani, Sara; Guerra, Francesco; Ilarri, Sergio; Rodríguez Hernández, María del Carmen; Trillo Lado, Raquel; Velegrakis, Yannis
abstract

Predicting the next page a user wants to see in a large website has gained importance along the last decade, due to the fact that the Web has become the main communication medium between a wide set of entities and users. This is true in particular for institutional government and public organization websites, where for transparency reasons a lot of information has to be provided. The “long tail” phenomenon also affects this kind of website, and users need support for improving the effectiveness of their navigation. For this reason, complex models and approaches for recommending web pages, which usually require processing personal user preferences, have been proposed. In this paper, we propose three different approaches to leverage information embedded in the structure of websites and their logs to improve the effectiveness of web page recommendation by considering the context of the users, i.e., their current sessions when surfing a specific web site. This proposal requires neither information about the personal preferences of the users to be stored and processed, nor complex structures to be created and maintained. Thus, it can easily be incorporated into current large websites to facilitate the users’ navigation experience. Experiments using a real-world website are described and analyzed to show the performance of the three approaches.
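As a rough illustration of item-based collaborative filtering on navigation sessions (toy data and a plain cosine similarity over session co-occurrence; the paper's three approaches differ in the information they exploit):

```python
import math
from collections import defaultdict

# Toy navigation sessions extracted from a web server log (hypothetical data).
sessions = [
    ["home", "news", "contacts"],
    ["home", "news", "services"],
    ["home", "services"],
]

# Binary page-session occurrence sets -> cosine similarity between pages.
occurrences = defaultdict(set)
for sid, session in enumerate(sessions):
    for page in session:
        occurrences[page].add(sid)

def cosine(p, q):
    a, b = occurrences[p], occurrences[q]
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def recommend(current_session):
    """Recommend the unvisited page most similar to the last page viewed."""
    last = current_session[-1]
    candidates = [p for p in occurrences if p not in current_session]
    return max(candidates, key=lambda p: cosine(last, p))

print(recommend(["home", "news"]))  # 'contacts' co-occurs more strongly with "news"
```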


2015 - Semantic keyword-based search on structured data sources: First COST action IC1302 international KEYSTONE conference, IKC 2015 coimbra, Portugal, September 8-9, 2015 revised selected papers [Curatela]
Cardoso, Jorge; Guerra, Francesco; Houben, Geert Jan; Pinto, Alexandre Miguel; Velegrakis, Yannis
abstract

Proceedings of the First KEYSTONE Conference


2015 - Support of part-whole relations in query answering [Relazione in Atti di Convegno]
Kozikowski, Piotr; Ioannou, Ekaterini; Velegrakis, Yannis; Guerra, Francesco
abstract

Part-whole relations are ubiquitous in our world, yet they do not get “first-class” treatment in the data management systems most commonly used today. One aspect of part-whole relations that is particularly important is attribute transitivity: some attributes of a whole are also attributes of its parts, and vice versa. We propose an extension to a generic entity-centric data model that supports part-whole relations and attribute transitivity, and as a result provides more meaningful answers to certain types of queries. We describe how this model can be implemented using an RDF repository, together with three approaches to infer the implicit information necessary for query answering that adheres to the semantics of the model. The first approach is a naive implementation, while the other two use indexing to improve performance. We evaluate several aspects of our implementations in a series of experiments, which show that the two approaches that use indexing are far superior to the naive approach and exhibit some advantages and disadvantages when compared to each other.


2015 - Supporting Image Search with Tag Clouds: A Preliminary Approach [Articolo su rivista]
Guerra, Francesco; Simonini, Giovanni; Vincini, Maurizio
abstract

Algorithms and techniques for searching in collections of data address a challenging task, since they have to bridge the gap between the ways in which users express their interests, through natural language expressions or keywords, and the ways in which data is represented and indexed. When the collections of data include images, the task becomes harder, mainly for two reasons. On the one hand, the user expresses his needs through one medium (text) but obtains results via another medium (images). On the other hand, it can be difficult for a user to understand the results retrieved, that is, why a particular image is part of the result set. In this case, techniques for analyzing the query results and giving the users some insight into the content retrieved are needed. In this paper, we propose to address this problem by coupling the image result set with a tag cloud of words describing it. Some techniques for building the tag cloud are introduced and two application scenarios are discussed.
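A minimal sketch of the tag-cloud idea, assuming each retrieved image carries textual annotations (hypothetical data; the paper discusses more elaborate cloud-construction techniques):

```python
from collections import Counter

# Hypothetical result set: retrieved images with their textual annotations.
results = {
    "img1.jpg": ["beach", "sea", "sunset"],
    "img2.jpg": ["sea", "boat"],
    "img3.jpg": ["sea", "sunset"],
}

def tag_cloud(result_set, max_tags=5):
    """Return (tag, weight) pairs; weight = fraction of images carrying the tag."""
    counts = Counter(t for tags in result_set.values() for t in tags)
    n = len(result_set)
    return [(t, c / n) for t, c in counts.most_common(max_tags)]

# 'sea' dominates the cloud: it appears in all three retrieved images.
print(tag_cloud(results))
```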


2014 - Discovering the topics of a data source: A statistical approach? [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Ferrari, Davide; Guerra, Francesco; Simonini, Giovanni
abstract

In this paper, we present a preliminary approach for automatically discovering the topics of a structured data source with respect to a reference ontology. Our technique relies on a signature, i.e., a weighted graph that summarizes the content of a source. Graph-based approaches have already been used in the literature for similar purposes. In these proposals, the weights are typically assigned using traditional information-theoretical quantities such as entropy and mutual information. Here, we propose a novel data-driven technique based on composite likelihood to estimate the weights and other main features of the graphs, making the resulting approach less sensitive to overfitting. By comparing signatures, we can easily discover the topic of a target data source with respect to a reference ontology. This task is carried out by a matching algorithm that retrieves the elements common to both graphs. To illustrate our approach, we discuss a preliminary evaluation in the form of a running example.


2014 - Keyword Search over Relational Databases: Issues, Approaches and Open Challenges [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Simonini, Giovanni
abstract

In this paper, we overview the main research approaches developed in the area of keyword search over relational databases. In particular, we model the process of solving keyword queries in three phases: the management of the user's input, the search algorithms, and the results returned to the user. For each phase we analyze the main problems, the solutions adopted by the most important systems developed by researchers, and the open challenges. Finally, we introduce two open issues related to multi-source scenarios and to database sources whose instances are not fully accessible.


2014 - Using Big Data to Support Automatic Word Sense Disambiguation [Relazione in Atti di Convegno]
Guerra, Francesco; Simonini, Giovanni
abstract

Word Sense Induction (WSI) usually relies on data structures built upon the words to be disambiguated. This is a time-consuming process that requires a huge computational effort. In this paper, we propose an approach to automatically build a generic sense inventory (called iSC) to be used as a reference for disambiguation. The sense inventory is built by extracting insight from Big Data, exploiting a community detection algorithm. Since it is generated from large corpora of data, the iSC is independent of the domain of application and of predefined target words.


2013 - Keyword Search and Evaluation over Relational Databases: an Outlook to the Future [Relazione in Atti di Convegno]
Bergamaschi, Sonia; N., Ferro; Guerra, Francesco; G., Silvello
abstract

This position paper discusses the need to consider keyword search over relational databases in the light of broader systems, where keyword search is just one of the components, aimed at better supporting users in their search tasks. These more complex systems call for appropriate evaluation methodologies which go beyond what is typically done today, i.e., measuring the performance of components mostly in isolation or unrelated to actual user needs, and which are instead able to consider the system as a whole, its constituent components, and their inter-relations, with the ultimate goal of supporting actual user search tasks.


2013 - QUEST: A Keyword Search System for Relational Data based on Semantic and Machine Learning Techniques [Articolo su rivista]
Bergamaschi, Sonia; Guerra, Francesco; Interlandi, Matteo; Trillo Lado, R.; Velegrakis, Y.
abstract

We showcase QUEST (QUEry generator for STructured sources), a search engine for relational databases that combines semantic and machine learning techniques for transforming keyword queries into meaningful SQL queries. The search engine relies on two approaches: the forward one, providing mappings of keywords into database terms (names of tables and attributes, and domains of attributes), and the backward one, computing the paths joining the data structures identified in the forward step. The results provided by the two approaches are combined within a probabilistic framework based on the Dempster-Shafer theory. We demonstrate QUEST's capabilities, and we show how, thanks to the flexibility obtained by the probabilistic combination of different techniques, QUEST is able to compute high-quality results even with little training data and/or with hidden data sources such as those found in the Deep Web.


2013 - The Prosumer Paradigm for Life Cycle Assessment Services. Frameworks of IT Prosumption for Business Development [Capitolo/Saggio]
Guerra, Francesco; Vincini, Maurizio
abstract

Enterprises, governments, and government agencies have started to publish their data on the Internet, especially in the form of open structured data sources. The real exploitation of these free, large open data sources is increasingly becoming a crucial activity for obtaining information and knowledge (i.e., competitive elements) in several business sectors. In addition, with the proliferation of Web 2.0 techniques and applications such as blogs, wikis, tagging systems, and mashups, the notion of user-centricity has gained significant momentum, putting ordinary users in the leading role of delivering exciting and personalized content and services. The term "prosumer," coined by the futurist Alvin Toffler in 1980, has often been referenced in business-related contexts to identify this situation. The chapter describes the application of the "prosumer paradigm" to a real data integration system for Life Cycle Assessment (LCA). ENEA, the Italian National Agency for New Technologies, Energy, and Sustainable Economic Development, promoted the adoption of such a practice in small companies belonging to the industrial and agricultural sectors, supplying them with a simplified LCA system. In this chapter, the authors show how a domain expert user (the prosumer) can use the framework to easily map the classification of data flows and processes provided by the simplified LCA system into the ELCD database, which contains a standard classification provided by the EU. This makes the proposal completely shareable with the whole thematic classification and vision promoted by the European Commission.


2013 - Using a HMM based approach for mapping keyword queries into database terms [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; M., Interlandi; S., Rota; R., Trillo; Y., Velegrakis
abstract

Systems translating keyword queries into SQL queries over relational databases are usually referred to in the literature as schema-based approaches. These techniques exploit the information contained in the database schema to build SQL queries that express the intended meaning of the user query. Typically, they first perform a preliminary step that associates the keywords in the user query with database elements (names of tables, attributes, and attribute domains). In this paper, we present a probabilistic approach based on a Hidden Markov Model to provide such mappings. In contrast to most existing techniques, our proposal does not require any a priori knowledge of the database extension.
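The HMM-based mapping can be illustrated with a tiny Viterbi decoder over made-up probabilities (states are schema elements, observations are keywords; the state space, probabilities, and names below are invented for illustration and are not the paper's model):

```python
# Tiny HMM + Viterbi sketch for mapping keywords to database elements.
states = ["table:Person", "attr:Person.city"]
start = {"table:Person": 0.6, "attr:Person.city": 0.4}
trans = {
    "table:Person": {"table:Person": 0.3, "attr:Person.city": 0.7},
    "attr:Person.city": {"table:Person": 0.5, "attr:Person.city": 0.5},
}
emit = {
    "table:Person": {"person": 0.8, "rome": 0.2},
    "attr:Person.city": {"person": 0.1, "rome": 0.9},
}

def viterbi(keywords):
    """Return the most likely sequence of schema elements for the keywords."""
    # best[s] = (probability, path) of the best state sequence ending in s
    best = {s: (start[s] * emit[s][keywords[0]], [s]) for s in states}
    for kw in keywords[1:]:
        new_best = {}
        for s in states:
            p, path = max(
                ((pp * trans[prev][s] * emit[s][kw], pth)
                 for prev, (pp, pth) in best.items()),
                key=lambda t: t[0],
            )
            new_best[s] = (p, path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]

# "person" maps to the table, "rome" to the city attribute.
print(viterbi(["person", "rome"]))  # ['table:Person', 'attr:Person.city']
```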


2012 - Agents and Peer-to-Peer Computing: 7th International Workshop, AP2PC 2008, Estoril, Portugal, May 2008 and 8th International Workshop, AP2PC 2009, Budapest, Hungary, May 2009, Revised Selected Papers [Curatela]
Beneventano, Domenico; Despotovic, Zoran; Guerra, Francesco; Joseph, Sam; Moro, Gianluca; Perreau de Pinninck, Adrián
abstract

7th International Workshop, AP2PC 2008, Estoril, Portugal, May 13, 2008 and 8th International Workshop, AP2PC 2009, Budapest, Hungary, May 11, 2009, Revised Selected Papers


2012 - Introduction [Prefazione o Postfazione]
De Virgilio, R.; Guerra, F.; Velegrakis, Y.
abstract


2012 - Introduction to the Special Issue on Semantic Web Data Management [Articolo su rivista]
De Virgilio, Roberto; Giunchiglia, Fausto; Guerra, Francesco; Tanca, Letizia; Velegrakis, Yannis
abstract

During the last decade we have witnessed a tremendous increase in the amount of data that is available on the Web in almost every field of human activity. Financial information, weather reports, news feeds, product information, and geographical maps are only a few examples of such data, all intended to be consumed by the millions of users surfing the Web. The advent of Web 2.0 applications, such as wikis, social networking sites and mashups, has brought new forms of data and has radically changed the nature of the modern Web. They have transformed the Web from a publishing-only environment into a vibrant place for information exchange. Web users are no longer plain data consumers but have become active data producers and data dissemination agents, contributing further to the increase of the information plethora on the Web.


2012 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface [Relazione in Atti di Convegno]
Beneventano, D.; Despotovic, Z.; Guerra, F.; Joseph, S.; Moro, G.; De Pinninck, A. P.
abstract


2012 - Mapping and Integration of Dimensional Attributes Using Clustering Techniques. [Relazione in Atti di Convegno]
Guerra, Francesco; Olaru, Marius Octavian; Vincini, Maurizio
abstract

Following recent trends in Data Warehousing, companies have realized that there is great potential in combining their information repositories to obtain a broader view of the economic market. Unfortunately, even though Data Warehouse (DW) integration has been defined from a theoretical point of view, until now no complete, widely used methodology has been proposed to support the integration of the information coming from heterogeneous DWs. This paper deals with the automatic integration of dimensional attributes from heterogeneous DWs. A method relying on topological properties that similar dimensions maintain is proposed for discovering mappings of dimensions, and a technique based on clustering algorithms is introduced for integrating the data associated with the dimensions.
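As a toy illustration of the kind of clustering step this abstract describes (not the paper's actual algorithm), the following Python sketch groups dimension members from two hypothetical warehouses by string similarity; the member names and the 0.8 threshold are invented for the example.

```python
# Illustrative sketch of reconciling members of similar dimensions coming
# from two data warehouses: a greedy, single-link, threshold-based grouping
# over difflib string similarity stands in for the clustering algorithms
# discussed in the paper. Names and the 0.8 threshold are made up.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """True when two member labels are close enough to denote the same entity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster(values):
    """Greedy single-link clustering of dimension members."""
    clusters = []
    for v in values:
        for c in clusters:
            if any(similar(v, m) for m in c):
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

dw1 = ["Emilia-Romagna", "Lombardia", "Veneto"]
dw2 = ["Emilia Romagna", "Lombardy", "Veneto "]
print(cluster(dw1 + dw2))
# [['Emilia-Romagna', 'Emilia Romagna'], ['Lombardia', 'Lombardy'], ['Veneto', 'Veneto ']]
```

Each cluster then induces a candidate mapping between the corresponding dimension members of the two warehouses.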


2012 - Semantic Search over the Web [Curatela]
De Virgilio, Roberto; Guerra, Francesco; Velegrakis, Yannis
abstract

The Web has become the world’s largest database, with search being the main tool that allows organizations and individuals to exploit its huge amount of information. Search on the Web has been traditionally based on textual and structural similarities, ignoring to a large degree the semantic dimension, i.e., understanding the meaning of the query and of the document content. Combining search and semantics gives birth to the idea of semantic search. Traditional search engines have already advertised some semantic dimensions. Some of them, for instance, can enhance their generated result sets with documents that are semantically related to the query terms even though they may not include these terms. Nevertheless, the exploitation of semantic search has not yet reached its full potential. In this book, Roberto De Virgilio, Francesco Guerra and Yannis Velegrakis present an extensive overview of the work done in Semantic Search and other related areas. They explore different technologies and solutions in depth, making their collection a valuable and stimulating reading for both academic and industrial researchers. The book is divided into three parts. The first introduces the readers to the basic notions of the Web of Data. It describes the different kinds of data that exist, their topology, and their storing and indexing techniques. The second part is dedicated to Web Search. It presents different types of search, like the exploratory or the path-oriented, alongside methods for their efficient and effective implementation. Other related topics included in this part are the use of uncertainty in query answering, the exploitation of ontologies, and the use of semantics in mashup design and operation. The focus of the third part is on linked data, and more specifically, on applying ideas originating in recommender systems to linked data management, and on techniques for efficient query answering over linked data.


2012 - Understanding the Semantics of Keyword Queries on Relational Data Without Accessing the Instance [Capitolo/Saggio]
Bergamaschi, Sonia; Domnori, Elton; Guerra, Francesco; Rota, Silvia; Trillo Lado, Raquel; Velegrakis, Yannis
abstract

This chapter deals with the problem of answering a keyword query over a relational database. To do so, one needs to understand the meaning of the keywords in the query, “guess” its possible semantics, and materialize them as SQL queries that can be executed directly on the relational database. The focus of the chapter is on techniques that do not require any prior access to the instance data, making them suitable for sources behind wrappers or Web interfaces or, in general, for sources that disallow prior access to their data in order to construct an index. The chapter describes two techniques that use semantic information and metadata from the sources, alongside the query itself, in order to achieve that. Apart from understanding the semantics of the keywords themselves, the techniques are also exploiting the order and the proximity of the keywords in the query to make a more educated guess. The first approach is based on an extension of the Hungarian algorithm for identifying the data structures having the maximum likelihood to contain the user keywords. In the second approach, the problem of associating keywords into data structures of the relational source is modeled by means of a hidden Markov model, and the Viterbi algorithm is exploited for computing the mappings. Both techniques have been implemented in two systems called KEYMANTIC and KEYRY, respectively.


2012 - Working in a dynamic environment: the NeP4B approach as a MAS [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Mandreoli, Federica; Vincini, Maurizio
abstract

Integration of heterogeneous information in the context of the Internet is becoming a key activity to enable a more organized and semantically meaningful access to several kinds of information in the form of data sources, multimedia documents and web services. In NeP4B (Networked Peers for Business), a project funded by the Italian Ministry of University and Research, we developed an approach for providing a uniform representation of data, multimedia and services, thus allowing users to obtain sets of data, multimedia documents and lists of web services as query results. NeP4B is based on a P2P network of semantic peers, connected to each other by means of automatically generated mappings. In this paper we present a new architecture for NeP4B, based on a Multi-Agent System. We claim that such a solution may be more efficient and effective, thanks to the agents’ autonomy and intelligence, in a dynamic environment where sources are frequently added to (or deleted from) the network.


2011 - 2nd International Workshop on Data Engineering meets the Semantic Web [Esposizione]
Guerra, Francesco; Bergamaschi, Sonia
abstract

The goal of DESWeb is to bring together researchers and practitioners from both fields of Data Management and Semantic Web. It aims at investigating the new challenges that Semantic Web technologies have introduced and new ways through which these technologies can improve existing data management solutions. Furthermore, it intends to study what data management systems and technologies can offer in order to improve the scalability and performance of Semantic Web applications.


2011 - A Hidden Markov Model Approach to Keyword-Based Search over Relational Databases [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Rota, Silvia; Velegrakis, Yannis
abstract

We present a novel method for translating keyword queries over relational databases into SQL queries with the same intended semantic meaning. In contrast to the majority of the existing keyword-based techniques, our approach does not require any a-priori knowledge of the data instance. It follows a probabilistic approach based on a Hidden Markov Model for computing the top-K best mappings of the query keywords into the database terms, i.e., tables, attributes and values. The mappings are then used to generate the SQL queries that are executed to produce the answer to the keyword query. The method has been implemented into a system called KEYRY (from KEYword to queRY).
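The decoding step of such an HMM can be sketched in a few lines of Python. Everything below (the two schema-term states and all probabilities) is invented for illustration; KEYRY derives its actual parameters from source metadata and user feedback.

```python
# Minimal Viterbi decoder sketching how a KEYRY-style system maps a keyword
# sequence to database terms: HMM states are schema terms, observations are
# the user's keywords. All names and probabilities are illustrative only.
states = ["Person.name", "Person.city"]
start_p = {"Person.name": 0.6, "Person.city": 0.4}
trans_p = {
    "Person.name": {"Person.name": 0.3, "Person.city": 0.7},
    "Person.city": {"Person.name": 0.4, "Person.city": 0.6},
}
emit_p = {  # P(keyword | state)
    "Person.name": {"john": 0.7, "london": 0.1},
    "Person.city": {"john": 0.1, "london": 0.8},
}

def viterbi(obs):
    """Return the most likely state (schema-term) sequence for the keywords."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o], V[-1][prev][1])
                for prev in states
            )
            layer[s] = (prob, path + [s])
        V.append(layer)
    return max(V[-1].values())[1]

print(viterbi(["john", "london"]))  # ['Person.name', 'Person.city']
```

A List Viterbi variant would keep the top-K paths per state instead of only the best one, yielding the top-K candidate SQL interpretations.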


2011 - A Semantic Approach to ETL Technologies [Articolo su rivista]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; Sartori, Claudio; Vincini, Maurizio
abstract

Data warehouse architectures rely on extraction, transformation and loading (ETL) processes for the creation of an updated, consistent and materialized view of a set of data sources. In this paper, we aim to support these processes by proposing a tool for the semi-automatic definition of inter-attribute semantic mappings and transformation functions. The tool is based on semantic analysis of the schemas for the mapping definitions amongst the data sources and the data warehouse, and on a set of clustering techniques for defining transformation functions homogenizing data coming from multiple sources. Our proposal couples and extends the functionalities of two previously developed systems: the MOMIS integration system and the RELEVANT data analysis system.


2011 - Aggregated search of data and services [Articolo su rivista]
Palmonari, Matteo; Sala, Antonio; Maurino, Andrea; Guerra, Francesco; Pasi, Gabriella; Frisoni, Giuseppe
abstract

From a user perspective, data and services provide a complementary view of an information source: data provide detailed information about specific needs, while services execute processes involving data and returning an informative result as well. For this reason, users need to perform aggregated searches to identify not only relevant data, but also services able to operate on them. At the current state of the art such aggregated search can only be manually performed by expert users, who first identify relevant data, and then identify existing relevant services. In this paper we propose a semantic approach to perform aggregated search of data and services. In particular, we define a technique that, on the basis of an ontological representation of both data and services related to a domain, supports the translation of a data query into a service discovery process. In order to evaluate our approach, we developed a prototype that combines a data integration system with a novel information retrieval-based Web Service discovery engine (XIRE). The results produced by a wide set of experiments show the effectiveness of our approach with respect to IR approaches, especially when Web Service descriptions are expressed by means of a heterogeneous terminology.


2011 - DESWEB: Data engineering meets the semantic web - A message from the chairs [Relazione in Atti di Convegno]
Guerra, F.; Velegrakis, Y.
abstract


2011 - Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Guerra, Francesco; Orsini, Mirko
abstract

Given the many data integration approaches, a complete and exhaustive comparison of all the research activities is not possible. In this chapter, we will present an overview of the most relevant research activities and ideas in the field investigated in the last 20 years. We will also introduce the MOMIS system, a framework to perform information extraction and integration from both structured and semistructured data sources, that is one of the most interesting results of our research activity. An open source version of the MOMIS system was delivered by the academic startup DataRiver (www.datariver.it).


2011 - KEYRY: A Keyword-Based Search Engine over Relational Databases Based on a Hidden Markov Model [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Rota, Silvia; Velegrakis, Yannis
abstract

We propose the demonstration of KEYRY, a tool for translating keyword queries over structured data sources into queries in the native language of the data source. KEYRY does not assume any prior knowledge of the source contents. This allows it to be used in situations where traditional keyword search techniques over structured data that require such knowledge cannot be applied, i.e., sources on the hidden web or those behind wrappers in integration systems. In KEYRY the search process is modeled as a Hidden Markov Model and the List Viterbi algorithm is applied to compute the top-k queries that better represent the intended meaning of a user keyword query. We demonstrate the tool’s capabilities, and we show how the tool is able to improve its behavior over time by exploiting implicit user feedback provided through the selection among the top-k solutions generated.


2011 - Keyword search over relational databases: a metadata approach [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Domnori, Elton; Guerra, Francesco; Trillo Lado, Raquel; Velegrakis, Yannis
abstract

Keyword queries offer a convenient alternative to traditional SQL in querying relational databases with large, often unknown, schemas and instances. The challenge in answering such queries is to discover their intended semantics, construct the SQL queries that describe them and use them to retrieve the respective tuples. Existing approaches typically rely on indices built a-priori on the database content. This seriously limits their applicability if a-priori access to the database content is not possible. Examples include the on-line databases accessed through web interfaces, or the sources in information integration systems that operate behind wrappers with specific query capabilities. Furthermore, existing literature has not studied to its full extent the inter-dependencies across the ways the different keywords are mapped into the database values and schema elements. In this work, we describe a novel technique for translating keyword queries into SQL based on the Munkres (a.k.a. Hungarian) algorithm. Our approach not only tackles the above two limitations, but it offers significant improvements in the identification of the semantically meaningful SQL queries that describe the intended keyword query semantics. We provide details of the technique implementation and an extensive experimental evaluation.
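The assignment problem at the heart of this approach can be illustrated with a toy example: each keyword must be mapped to a distinct database term so that the total match score is maximal, which is exactly what the Munkres (Hungarian) algorithm solves in O(n^3). The keywords, schema terms, and scores below are invented; for this 3x3 case we simply enumerate permutations rather than implement the full algorithm.

```python
# Toy sketch of keyword-to-schema-term assignment. score[i][j] says how well
# keyword i matches database term j (higher is better); in the real system
# such scores would come from metadata-based similarity measures, not from
# hard-coded values. Brute-force enumeration stands in for the Hungarian
# algorithm at this tiny size.
from itertools import permutations

keywords = ["cheap", "hotel", "paris"]
db_terms = ["Hotel.name", "Hotel.price", "Hotel.city"]

score = [
    [0.1, 0.9, 0.2],  # "cheap"  resembles a price predicate
    [0.8, 0.3, 0.1],  # "hotel"  resembles the table/name
    [0.2, 0.1, 0.9],  # "paris"  resembles a city value
]

# Pick the one-to-one assignment with the maximum total score.
best = max(permutations(range(len(db_terms))),
           key=lambda p: sum(score[i][p[i]] for i in range(len(keywords))))
mapping = {keywords[i]: db_terms[best[i]] for i in range(len(keywords))}
print(mapping)
# {'cheap': 'Hotel.price', 'hotel': 'Hotel.name', 'paris': 'Hotel.city'}
```

The chosen mapping then drives the construction of candidate SQL queries over the matched tables and attributes.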


2011 - Keyword-based Search in Data Integration Systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Domnori, Elton; Guerra, Francesco; Trillo Lado, Raquel; Velegrakis, Yannis
abstract

In this paper we describe Keymantic, a framework for translating keyword queries into SQL queries by assuming that the only available information is the source metadata, i.e., the schema and some external auxiliary information. Such a framework finds application when only intensional knowledge about the data source is available, as in Data Integration Systems.


2011 - The List Viterbi Training Algorithm and Its Application to Keyword Search over Databases [Relazione in Atti di Convegno]
Rota, Silvia; Bergamaschi, Sonia; Guerra, Francesco
abstract

Hidden Markov Models (HMMs) are today employed in a variety of applications, ranging from speech recognition to bioinformatics. In this paper, we present the List Viterbi training algorithm, a version of the Expectation-Maximization (EM) algorithm based on the List Viterbi algorithm instead of the commonly used forward-backward algorithm. We developed the batch and online versions of the algorithm, and we also describe an interesting application in the context of keyword search over databases, where we exploit an HMM for matching keywords into database terms. In our experiments we tested the online version of the training algorithm in a semi-supervised setting that allows us to take into account the feedback provided by the users.


2011 - Understanding linked open data through keyword searching: the KEYRY approach [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Rota, Silvia; Velegrakis, Yannis
abstract

We introduce KEYRY, a tool for translating keyword queries over structured data sources into queries formulated in their native query language. Since it is not based on an analysis of the data source contents, KEYRY finds application in scenarios where sources hold complex and huge schemas subject to frequent changes, such as sources belonging to the linked open data cloud. KEYRY is based on a probabilistic approach that provides the top-k results that better approximate the intended meaning of the user query.


2010 - 1st International Workshop on Data Engineering meets the Semantic Web (DESWeb 2010) [Esposizione]
Guerra, Francesco; Velegrakis, Yannis; Bergamaschi, Sonia
abstract

Modern web applications like Wikis, social networking sites and mashups are radically changing the nature of the modern Web from a publishing-only environment into a vibrant place for information exchange. The successful exploitation of this information largely depends on the ability to successfully communicate the data semantics, which is exactly the vision of the Semantic Web. In this context, new challenges emerge for semantic-aware data management systems. The contribution of the data management community in the Semantic Web effort is fundamental. RDF has already been adopted as the representation model and exchange format for the semantics of the data on the Web. Although, until recently, RDF had not received considerable attention, the recent publication in RDF format of large ontologies with millions of entities from sites like Yahoo! and Wikipedia, the huge amounts of microformats in RDF from life science organizations, and the gigantic RDF bibliographic annotations from publishers have made clear the need for advanced management techniques for RDF data. On the other hand, traditional data management techniques have a lot to gain by incorporating semantic information into their frameworks. Existing data integration, exchange and query solutions are typically based on the actual data values stored in the repositories, and not on the semantics of these values. Incorporation of semantics in the data management process improves query accuracy, and permits more efficient and effective sharing and distribution services. Integration of new content, on-the-fly generation of mappings, queries on loosely structured data, keyword searching on structured data repositories, and entity identification are some of the areas that can benefit from the presence of semantic knowledge alongside the data. The goal of DESWeb is to bring together researchers and practitioners from both fields of Data Management and Semantic Web. It aims at investigating the new challenges that Semantic Web technologies have introduced and new ways through which these technologies can improve existing data management solutions. Furthermore, it intends to study what data management systems and technologies can offer in order to improve the scalability and performance of Semantic Web applications.


2010 - Guest editors' introduction: Information overload [Articolo su rivista]
Bergamaschi, Sonia; Guerra, Francesco; Leiba, Barry
abstract

Search the Internet for the phrase “information overload definition,” and Google will return some 7,310,000 results (at the time of this writing). Bing gets 9,760,000 results for the same query. How is it possible for us to process that much data, to select the most interesting information sources, to summarize and combine different facets highlighted in the results, and to answer the questions we set out to ask? Information overload is present in everything we do on the Internet. Despite the number of occurrences of the term on the Internet, peer-reviewed literature offers only a few accurate definitions of information overload. Among them, we prefer the one that defines it as the situation that “occurs for an individual when the information processing demands on time (Information Load, IL) to perform interactions and internal calculations exceed the supply or capacity of time available (Information Processing Capacity, IPC) for such processing.”1 In other words, when the information available exceeds the user’s ability to process it. This formal definition provides a measure that we can express algebraically as IL > IPC, offering a way for classifying and comparing the different situations in which the phenomenon occurs. But measuring IL and IPC is a complex task because they strictly depend on a set of factors involving both the individual and the information (such as the individual’s skill), as well as the motivations and goals behind the information request. Clay Shirky, who teaches at New York University, takes a different view, focusing on how we sift through the information that’s available to us. We’ve long had access to “more reading material than you could finish in a lifetime,” he says, and “there is no such thing as information overload, there’s only filter failure.”2 But however we look at it, whether it’s too much production or failure in filtering, it’s a general and common problem, and information overload management requires the study and adoption of special, user- and context-dependent solutions. Due to the amount of information available that comes with no guarantee of importance, trust, or accuracy, the Internet’s growth has inevitably amplified preexisting information overload issues. Newspapers, TV networks, and press agencies form an interesting example of overload producers: they collectively make available hundreds of thousands of partially overlapping news articles each day. This large quantity gives rise to information overload in a “spatial” dimension — news articles about the same subject are published in different newspapers — and in a “temporal” dimension — news articles about the same topic are published and updated many times in a short time period. The effects of information overload include difficulty in making decisions due to time spent searching and processing information,3 inability to select among multiple information sources providing information about the same topic,4 and psychological issues concerning excessive interruptions generated by too many information sources.5 To put it colloquially, this excess of information stresses Internet users out.


2010 - IEEE Internet Computing Special Issue on Information Overload [Curatela]
Bergamaschi, Sonia; Guerra, Francesco; Leiba, Barry
abstract

Search the Internet for the phrase “information overload definition,” and Google will return some 7,310,000 results (at the time of this writing). Bing gets 9,760,000 results for the same query. How is it possible for us to process that much data, to select the most interesting information sources, to summarize and combine different facets highlighted in the results, and to answer the questions we set out to ask? Information overload is present in everything we do on the Internet. Despite the number of occurrences of the term on the Internet, peer-reviewed literature offers only a few accurate definitions of information overload. Among them, we prefer the one that defines it as the situation that “occurs for an individual when the information processing demands on time (Information Load, IL) to perform interactions and internal calculations exceed the supply or capacity of time available (Information Processing Capacity, IPC) for such processing.” In other words, when the information available exceeds the user’s ability to process it. This formal definition provides a measure that we can express algebraically as IL > IPC, offering a way for classifying and comparing the different situations in which the phenomenon occurs. But measuring IL and IPC is a complex task because they strictly depend on a set of factors involving both the individual and the information (such as the individual’s skill), as well as the motivations and goals behind the information request. Clay Shirky, who teaches at New York University, takes a different view, focusing on how we sift through the information that’s available to us. We’ve long had access to “more reading material than you could finish in a lifetime,” he says, and “there is no such thing as information overload, there’s only filter failure.” But however we look at it, whether it’s too much production or failure in filtering, it’s a general and common problem, and information overload management requires the study and adoption of special, user- and context-dependent solutions. Due to the amount of information available that comes with no guarantee of importance, trust, or accuracy, the Internet’s growth has inevitably amplified preexisting information overload issues. Newspapers, TV networks, and press agencies form an interesting example of overload producers: they collectively make available hundreds of thousands of partially overlapping news articles each day. This large quantity gives rise to information overload in a “spatial” dimension — news articles about the same subject are published in different newspapers — and in a “temporal” dimension — news articles about the same topic are published and updated many times in a short time period. The effects of information overload include difficulty in making decisions due to time spent searching and processing information, inability to select among multiple information sources providing information about the same topic, and psychological issues concerning excessive interruptions generated by too many information sources. To put it colloquially, this excess of information stresses Internet users out.


2010 - Keymantic: Semantic Keyword-based Searching in Data Integration Systems [Software]
Bergamaschi, Sonia; Domnori, Elton; Guerra, Francesco; Orsini, Mirko; Trillo Lado, Raquel; Velegrakis, Yannis
abstract

Keymantic is a system for keyword-based searching in relational databases that does not require a-priori knowledge of instances held in a database. It finds numerous applications in situations where traditional keyword-based searching techniques are inapplicable due to the unavailability of the database contents for the construction of the required indexes.


2010 - Keymantic: Semantic Keyword-based Searching in Data Integration Systems [Articolo su rivista]
Bergamaschi, Sonia; Domnori, Elton; Guerra, Francesco; Orsini, Mirko; Trillo Lado, R.; Velegrakis, Y.
abstract

We propose the demonstration of Keymantic, a system for keyword-based searching in relational databases that does not require a-priori knowledge of instances held in a database. It finds numerous applications in situations where traditional keyword-based searching techniques are inapplicable due to the unavailability of the database contents for the construction of the required indexes.


2010 - Message from the DESWeb'10 general chairs [Relazione in Atti di Convegno]
Guerra, F.; Velegrakis, Y.
abstract


2009 - An ETL tool based on semantic analysis of schemata and instances [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; Sartori, C.; Vincini, Maurizio
abstract

In this paper we propose a system supporting the semi-automatic definition of inter-attribute mappings and transformation functions used as an ETL tool in a data warehouse project. The tool supports both schema level analysis, exploited for the mapping definitions amongst the data sources and the data warehouse, and instance level operations, exploited for defining transformation functions that integrate data coming from multiple sources in a common representation. Our proposal couples and extends the functionalities of two previously developed systems: the MOMIS integration system and the RELEVANT data analysis system.


2009 - Improving Extraction and Transformation in ETL by Semantic Analysis [Relazione in Atti di Convegno]
Guerra, Francesco; Bergamaschi, Sonia; Orsini, Mirko; Sartori, Claudio; Vincini, Maurizio
abstract

Extraction, Transformation and Loading (ETL) processes are crucial for data warehouse consistency and are typically based on constraints and requirements expressed in natural language in the form of comments and documentation. This task is poorly supported by automatic software applications, making these activities a substantial manual effort in data warehouse projects. In a traditional business scenario, this fact does not represent a major issue, since the sources populating a data warehouse are fixed and directly known by the data administrator. Nowadays, actual business needs require enterprise information systems to have great flexibility concerning the allowed business analysis and the treated data. Temporary alliances of enterprises, market analysis processes, and the availability of data on the Internet push enterprises to quickly integrate unexpected data sources for their activities. Therefore, the reference scenario for data warehouse systems changes radically, since the data sources populating the data warehouse may not be directly known and managed by the designers, thus creating new requirements for ETL tools: improved automation of the extraction and transformation process, the need to manage heterogeneous attribute values, and the ability to manage different kinds of data sources, ranging from DBMSs to flat files, XML documents and spreadsheets. In this paper we propose a semantic-driven tool that couples and extends the functionalities of two systems: the MOMIS integration system and the RELEVANT data analysis system. The tool aims at supporting the semi-automatic definition of ETL inter-attribute mappings and transformations in a data warehouse project. By means of a semantic analysis, two tasks are performed: 1) identification of the parts of the schemata of the data sources which are related to the data warehouse; 2) supporting the definition of transformation rules for populating the data warehouse. We experimented with the approach in a real scenario: preliminary qualitative results show that our tool may really support the data warehouse administrator’s work, considerably reducing data warehouse design time.


2009 - Keymantic: A Keyword-Based Search Engine Using Structural Knowledge [Relazione in Atti di Convegno]
Guerra, Francesco; Bergamaschi, Sonia; Orsini, Mirko; Sala, Antonio; Sartori, C.
abstract

Traditional techniques for query formulation need knowledge of the database contents, i.e. which data are stored in the data source and how they are represented. In this paper, we discuss the development of a keyword-based search engine for structured data sources. The idea is to couple the ease of use and flexibility of keyword-based search with metadata extracted from data schemata and extensional knowledge, which constitute a semantic network of knowledge. By translating keywords into SQL statements, we will develop a search engine that is effective, semantic-based, and applicable also when instances are not continuously available, such as in integrated data sources or in data sources extracted from the deep web.


2009 - Keymantic: A keyword-based search engine using structural knowledge [Relazione in Atti di Convegno]
Guerra, F.; Bergamaschi, S.; Orsini, M.; Sala, A.; Sartori, C.
abstract

Traditional techniques for query formulation need knowledge of the database contents, i.e. which data are stored in the data source and how they are represented. In this paper, we discuss the development of a keyword-based search engine for structured data sources. The idea is to couple the ease of use and flexibility of keyword-based search with metadata extracted from data schemata and extensional knowledge, which constitute a semantic network of knowledge. By translating keywords into SQL statements, we will develop a search engine that is effective, semantic-based, and applicable also when instances are not continuously available, such as in integrated data sources or in data sources extracted from the deep web.


2009 - Searching for Data and Services [Relazione in Atti di Convegno]
Guerra, Francesco; Maurino, A.; Palmonari, M.; Pasi, G.; Sala, Antonio
abstract

The increasing availability of data and eServices on the Web allows users to search for relevant information and to perform operations through eServices. Current technologies do not support users in the execution of such activities as a unique task; thus users have first to find interesting information, and then, as a separate activity, to find and use eServices. In this paper we present a framework able to query an integrated view of heterogeneous data and to search for eServices related to retrieved data.


2009 - Semantic Analysis for an Advanced ETL framework [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; Sartori, C.; Vincini, Maurizio
abstract

In this paper we propose a system supporting the semi-automatic definition of inter-attribute mappings and transformation functions used as an ETL tool in a data warehouse project. The tool supports both schema level analysis, exploited for the mapping definitions amongst the data sources and the data warehouse, and instance level operations, exploited for defining transformation functions that integrate data coming from multiple sources into a common representation. Our proposal couples and extends the functionalities of two previously developed systems: the MOMIS integration system and the RELEVANT data analysis system.


2009 - Unified Semantic Search of Data and Services [Relazione in Atti di Convegno]
Beneventano, Domenico; Guerra, Francesco; A., Maurino; M., Palmonari; G., Pasi; Sala, Antonio
abstract

The increasing availability of data and eServices on the Web allows users to search for relevant information and to perform operations through eServices. Current technologies do not support users in the execution of such activities as a unique task; thus users have first to find interesting information, and then, as a separate activity, to find and use eServices. In this paper we present a framework able to query an integrated view of heterogeneous data and to search for eServices related to retrieved data. A unique view of data and semantically described eServices is the way in which it is possible to unify data and service perspectives.


2008 - 2nd International Workshop on Semantic Web Architectures For Enterprises [Esposizione]
Bergamaschi, Sonia; Guerra, Francesco; Yannis, Velegrakis
abstract

The Semantic Web vision aims at building a "web of data", where applications may share their data on the Internet and relate them to real world objects for interoperability and exchange. Similar ideas have been applied to web services, where different modeling architectures have been proposed for adding semantics to web service descriptions, making services on the web widely available. The potential impact envisaged by these approaches on real business applications is also important in areas such as:

- Semantic-based business integration: business integration allows enterprises to share their data and services with other enterprises for business purposes. Making data and services available satisfies both "structural" requirements of enterprises (e.g. the possibility of sharing data about products or about available services) and "dynamic" requirements (e.g. business-to-business partnerships to execute an order). Information systems implementing semantic web architectures can enable and strongly support this process.
- Semantic interoperability: metadata and ontologies support the dynamic and flexible exchange of data and services across information systems of different organizations. Adding semantics to representations of data and services allows accurate data querying and service discovery.
- Semantic-based lifecycle management: metadata, ontologies and rules are becoming an effective way for modeling corporate processes and business domains, effectively supporting the maintenance and evolution of business processes, corporate data, and knowledge.
- Knowledge management: ontologies and automated reasoning tools seem to provide an innovative support to the elicitation, representation and sharing of corporate knowledge.

SWAE (Semantic Web Architectures for Enterprises) aims at evaluating how and how much the Semantic Web vision has met its promises with respect to business and market needs.
Papers and demonstrations of interest for the workshop will show and highlight the interactions between Semantic Web technologies and business applications. The workshop aims at collecting models, tools, use cases and practical experience in which Semantic Web techniques have been developed and applied to support any relevant business process. It aims at assessing their degree of success, the challenges that have been addressed, the solutions that have been provided and the new tools that have been implemented. Special attention will be paid to proposals of "complete architectures", i.e. applications that can effectively support the maintenance and evolution of business processes as a whole and applications that are able to combine representations of data and services in order to realize a common business knowledge management system.


2008 - A Mediator System for Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Claudio, Gennaro; Guerra, Francesco; Matteo, Mordacchini; Sala, Antonio
abstract

Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying an integrated virtual view of data and multimedia sources.


2008 - A Methodology for Building and Querying an Ontology representing Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Guerra, Francesco; C., Gennaro
abstract

Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying a populated ontology representing data and multimedia sources.


2008 - DEXA 2008: Second international workshop on Semantic Web Architectures for Enterprises - SWAE'08 [Relazione in Atti di Convegno]
Bergamaschi, S.; Guerra, F.; Velegrakis, Y.
abstract

The aim of the second edition of the workshop on Semantic Web Architectures for Enterprises (SWAE) is to evaluate how and how much the Semantic Web vision has met its promises with respect to business and market needs. On the basis of our research experience within the basic research Italian project NeP4B (http://www.dbgroup.unimo.it/nep4b/it/index.htm), the European projects SEWASIE (www.sewasie.org), STASIS (http://www.dbgroup.unimo.it/stasis/), OKKAM (www.okkam.org) and Papyrus (www.ict-papyrus.eu), we focus on the permeation of the Semantic Web technologies in industrial and real applications.


2008 - Eighth International Workshop on Agents and Peer-to-Peer Computing (AP2PC09) [Esposizione]
Adrián Perreau de, Pinninck; Guerra, Francesco; Gianluca, Moro
abstract

P2P networking is the term used to describe a new crop of decentralized approaches that self-organize large overlay networks in which participants can share and exploit enormous autonomous resources. At their heart, P2P systems embody the earliest principles of the Internet: decentralized systems of similarly enabled 'peers'. What makes P2P networking different is that the times have changed; the number of peers involved has multiplied, their rate of turn-over has increased, and they now operate as an overlay within the network application layer. New techniques such as distributed hash-tables (DHTs), semantic routing, and Plaxton Meshes are being combined with traditional concepts such as Hypercubes, Trust Metrics and caching techniques to pool together the untapped computing power at the "edges" of the Internet. The possibilities of this paradigm have generated a lot of interest in research, industry and social networks. P2P collaboration is redefining the way of communicating, publishing, doing business and building collective knowledge, thanks mainly to the advent of free or affordable technologies. For instance, the major film studios and music corporations, after realizing the economic potential of P2P networks, have started selling their products online. Citizen journalism is an example based on P2P interactions, in which people without professional journalism training use the tools of modern technology and the global distribution of the Internet to create, augment or fact-check media on their own or in collaboration with others; P2P reputation-based mechanisms are used to validate facts and news. P2P lending allows people to skip the bank and borrow from individuals; people can borrow from complete strangers or just use P2P lending services to structure loans between friends and family (e.g. Booper, Zopa, Kiva). Recently, projects based on P2P architectures for exchanging and sharing knowledge among companies (e.g. NeP4B) have been funded; companies of any nature, size and geographic location will be able to search for partners, exchange data, negotiate and collaborate without limitations and constraints. For these and similar phenomena, the term Commons-based peer production was coined at Harvard Law School to describe a new model of economic production in which the creative energy of large numbers of people is coordinated into large, meaningful projects, mostly without traditional hierarchical organization or financial compensation. The Internet is going to be revolutionized by applications able to harness the power of P2P networking to bring together communities of people and organizations with similar interests or goals, and agent technology offers the potential for developing such systems. In P2P computing, peers and services organize themselves dynamically without central coordination in order to foster knowledge sharing and collaboration, both in cooperative and non-cooperative environments. The success of P2P systems strongly depends on a number of factors. First, the ability to ensure equitable distribution of content and services: economic and business models which rely on incentive mechanisms to supply contributions to the system are being developed, along with methods for controlling the "free riding" issue. Second, the ability to enforce provision of trusted services: reputation-based P2P trust management models are becoming a focus of the research community as a viable solution. Trust models must balance both the constraints imposed by the environment (e.g. scalability) and the unique properties of trust as a social and psychological phenomenon. Recently, we are also witnessing a move of the P2P paradigm to embrace mobile computing and sensor networks in an attempt to achieve even higher ubiquitousness.
The possibility of services related to physical location and the relation with agents in physical proximity introduce new opportunities and also new technical challenges.


2008 - Search using Metadata, Semantic, and Ontologies [Curatela]
Jorge, Cardoso; Christoph, Bussler; Guerra, Francesco
abstract

Traditional search techniques establish a direct connection between the information provided by users and the search engine. Users are only allowed to specify a set of keywords that will be syntactically matched against a database of keywords and references. This simple approach has several drawbacks, since it yields low precision (the fraction of retrieved results that are actually relevant) and low recall (the fraction of relevant results in the reference base that are actually retrieved). Many factors contribute to this low precision and recall, namely polysemy and synonymy: in the first case, one word specified in a query might have several meanings; in the second case, distinct words may designate the same concept. If appropriate strategies are included in a new generation of search engines, the number of false results can be drastically reduced, and the impact of these two degrading factors can be reduced or even eliminated. As the interconnection of research areas such as artificial intelligence, the semantic web, and linguistics becomes stronger and more mature, it is reasonable to explore how better search engines can be developed to more adequately respond to users' needs. A new kind of search engine, explored for a few years now, has been termed "semantic-based search engines" by many researchers. The underlying paradigm of these engines is to find resources based on similar concepts and logical relationships, not just similar words. These engines typically rely on the use of metadata, controlled vocabularies, thesauri, taxonomies, and ontologies to describe the searchable resources, ensuring that the most relevant items of information are returned.
The intent of this special issue is to bring together a compilation of recent research and developments toward the creation of a new paradigm for search engines that relies on metadata, semantics and ontologies, providing readers with a "broad spectrum vision" of the most important issues in semantic search engines. One of the main problems concerns the recognition of items of interest in web documents.


2007 - A new type of metadata for querying data integration systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; C., Sartori
abstract

Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. In this paper we propose a technique to enrich the intension of an attribute with a new sort of metadata: the "relevant values", extracted from the attribute values. Relevant values enrich schemata with domain knowledge; moreover they can be exploited by a user in the interactive process of creating/refining a query. The technique, fully implemented in a prototype, is automatic, independent of the attribute domain and based on data mining clustering techniques and emerging semantics from data values. It is parametrized with various metrics for similarity measures and is a viable tool for dealing with frequently changing sources.
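The extraction of "relevant values" described above can be sketched with a minimal toy version: attribute values are greedily clustered by string similarity, and each cluster contributes one representative. The threshold, the greedy strategy, and the choice of the cluster seed as representative are assumptions for illustration; the paper's actual clustering algorithm and similarity metrics differ.

```python
# Toy sketch of "relevant values": cluster an attribute's string values
# by similarity and keep one representative per cluster.
from difflib import SequenceMatcher

def relevant_values(values, threshold=0.6):
    """Greedily cluster values; a value joins the first cluster whose
    seed is similar enough, otherwise it starts a new cluster.
    Returns each cluster's seed as its relevant value."""
    clusters = []
    for v in values:
        for cluster in clusters:
            if SequenceMatcher(None, v.lower(), cluster[0].lower()).ratio() >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return [cluster[0] for cluster in clusters]

vals = ["notebook", "notebook pc", "Notebook", "printer", "laser printer"]
print(relevant_values(vals))
# → ['notebook', 'printer']
```

The point of the metadata is visible even in this toy: five raw values collapse to two relevant values a user can browse when refining a query.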


2007 - Extracting Relevant Attribute Values for Improved Search [Articolo su rivista]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; C., Sartori
abstract

A new kind of metadata offers a synthesized view of an attribute's values for a user to exploit when creating or refining a search query in data-integration systems. The extraction technique that obtains these values is automatic and independent of an attribute domain but parameterized with various metrics for similarity measures. The authors describe a fully implemented prototype and some experimental results to show the effectiveness of "relevant values" when searching a knowledge base.


2007 - MELIS: An Incremental Method For The Lexical Annotation Of Domain Ontologies [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system on annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.


2007 - MELIS: a tool for the incremental annotation of domain ontologies [Software]
Bergamaschi, Sonia; Paolo, Bouquet; Daniel, Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

MELIS is a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system on annotating new schemas.


2007 - Melis: an incremental method for the lexical annotation of domain ontologies [Articolo su rivista]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incremental process: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system on annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.


2007 - Progetto di Basi di Dati Relazionali [Monografia/Trattato scientifico]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The aim of this volume is to provide the reader with the fundamental notions for designing and implementing relational database applications. Regarding design, the conceptual and logical design phases are covered, and the Entity-Relationship and Relational data models are presented as the basic tools for conceptual and logical design, respectively. The student is also introduced to the theory of normalization of relational databases. Regarding implementation, elements and examples of SQL, the standard language for RDBMSs (Relational Database Management Systems), are presented. Ample space is devoted to worked exercises on the topics covered. The volume stems from the authors' long teaching experience in the Databases and Information Systems courses for undergraduate and graduate students of the Faculty of Engineering of Modena, the Faculty of Engineering of Reggio Emilia and the Faculty of Economics "Marco Biagi" of the University of Modena and Reggio Emilia. The current volume considerably extends the previous editions, enriching the sections on logical design and SQL. The exercise section is completely new; moreover, further exercises are available on this web page. Like the previous editions, it is more a collection of lecture notes than a proper book, in the sense that it treats the concepts covered rigorously but essentially. Moreover, it does not exhaust all the topics of a Databases course, whose other fundamental component is database technology. This component is, in the authors' opinion, treated excellently by another Databases textbook, written by our colleagues and friends Paolo Ciaccia and Dario Maio of the University of Bologna.
The volume, even in its essentiality, is rich in worked exercises and can therefore be an excellent tool for working groups that, within software houses, deal with the design of relational database applications.


2007 - Querying a super-peer in a schema-based super-peer network [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

We propose a novel approach for defining and querying a super-peer within a schema-based super-peer network organized into a two-level architecture: the lower level, called the peer level, contains mediator nodes; the upper level, called the super-peer level, integrates mediator peers with similar content. We focus on a single super-peer and propose a method to define and solve a query, fully implemented in the SEWASIE project prototype. The problem we face is relevant because a super-peer is a two-level data integration system, thus going beyond the traditional setting in data integration. We have two different levels of Global-as-View mappings: the first mapping is at the super-peer level and maps the Global Virtual Views (GVVs) of several peers into the GVV of the super-peer; the second mapping is within a peer and maps the data sources into the GVV of the peer. Moreover, we propose an approach where the integration designer, supported by a graphical interface, can implicitly define mappings by using resolution functions to solve data conflicts, and the Full Disjunction operator, which has been recognized as providing a natural semantics for data merging queries.
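The role of resolution functions can be illustrated with a small toy example: when records from different peers describe the same entity with conflicting attribute values, a per-attribute resolution function decides the merged value. The function names, the records, and the "keep first" default below are illustrative assumptions, not the SEWASIE implementation.

```python
# Toy sketch of merging conflicting records with per-attribute
# resolution functions.

def merge_records(records, resolution):
    """Merge records referring to the same entity; conflicts on an
    attribute are settled by that attribute's resolution function."""
    merged = {}
    for rec in records:
        for attr, value in rec.items():
            if attr not in merged:
                merged[attr] = value
            else:
                resolve = resolution.get(attr, lambda a, b: a)  # default: keep first
                merged[attr] = resolve(merged[attr], value)
    return merged

peer_a = {"name": "ACME", "employees": 120}
peer_b = {"name": "ACME Inc.", "employees": 150}
merged = merge_records(
    [peer_a, peer_b],
    {"employees": max,                          # numeric conflict: keep the larger value
     "name": lambda a, b: max(a, b, key=len)},  # string conflict: keep the longer name
)
print(merged)
# → {'name': 'ACME Inc.', 'employees': 150}
```

Full Disjunction generalizes this idea at the relational level, combining tuples across sources so that no information about an entity is lost; the sketch only shows the attribute-level conflict resolution step.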


2007 - RELEvant VAlues geNeraTor [Software]
Bergamaschi, Sonia; Claudio, Sartori; Guerra, Francesco; Orsini, Mirko
abstract

A new kind of metadata offers a synthesized view of an attribute's values for a user to exploit when creating or refining a search query in data-integration systems. The extraction technique that obtains these values is automatic and independent of an attribute domain but parameterized with various metrics for similarity measures.


2007 - Relevant News: a semantic news feed aggregator [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; Sartori, C; Vincini, Maurizio
abstract

In this paper we present RELEVANTNews, a web feed reader that automatically groups news related to the same topic published in different newspapers on different days. The tool is based on RELEVANT, a previously developed tool which computes the "relevant values", i.e. a subset of the values of a string attribute. By clustering the titles of the news feeds selected by the user, it is possible to identify sets of related news on the basis of syntactic and lexical similarity. RELEVANTNews may be used in its default configuration or in a personalized way: the user may tune some parameters in order to improve the grouping results. We tested the tool with more than 700 news items published in 30 newspapers over four days, and some preliminary results are discussed.
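The syntactic side of title grouping can be sketched as greedy clustering on word overlap (Jaccard similarity on word sets). This is a toy illustration under assumed parameters; RELEVANTNews additionally exploits lexical similarity, which this version omits.

```python
# Toy sketch: group feed titles by word-overlap (Jaccard) similarity.

def jaccard(a, b):
    """Jaccard similarity between the word sets of two titles."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def group_titles(titles, threshold=0.4):
    """Greedily assign each title to the first group whose seed title
    overlaps enough, otherwise start a new group."""
    groups = []
    for t in titles:
        for g in groups:
            if jaccard(t, g[0]) >= threshold:
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

titles = [
    "Election results announced today",
    "Today election results are announced",
    "Storm hits the coast",
]
print(group_titles(titles))  # two groups: the election pair, and the storm title
```

The threshold plays the role of the user-tunable parameters mentioned in the abstract: lowering it merges more titles into the same topic, raising it splits them.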


2007 - Relevant values: new metadata to provide insight on attribute values at schema level [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Orsini, Mirko; C., Sartori
abstract

Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. In this paper we propose a technique to enrich the intension of an attribute with a new sort of metadata: the "relevant values", extracted from the attribute values. Relevant values enrich schemata with domain knowledge; moreover they can be exploited by a user in the interactive process of creating/refining a query. The technique, fully implemented in a prototype, is automatic, independent of the attribute domain and based on data mining clustering techniques and emerging semantics from data values. It is parametrized with various metrics for similarity measures and is a viable tool for dealing with frequently changing sources, as in the Semantic Web context.


2007 - The SEWASIE MAS for Semantic Search [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The capillary diffusion of the Internet has made accessible an overwhelming amount of data, allowing users to benefit from vast information. However, this information is not really directly available: Internet data are heterogeneous and spread over different places, with several duplications and inconsistencies. The integration of such heterogeneous, inconsistent data, with data reconciliation and data fusion techniques, may therefore represent a key activity enabling a more organized and semantically meaningful access to data sources. Some issues are still to be solved, concerning in particular the discovery and the explicit specification of the relationships between abstract data concepts, and the need for data reliability in dynamic, constantly changing networks. Ontologies provide a key mechanism for solving these challenges, but the web's dynamic nature leaves open the question of how to manage them. Many solutions based on ontology creation by a mediator system have been proposed: a unified virtual view (the ontology) of the underlying data sources is obtained, giving users transparent access to the integrated data sources. The centralized architecture of a mediator system presents several limitations, emphasized in the hidden web: firstly, web data sources hold information according to their particular view of the matter, i.e. each of them uses a specific ontology to represent its data. Also, data sources are usually isolated, i.e. they do not share any topological information concerning the content or structure of other sources. Our proposal is to develop a network of ontology-based mediator systems, where mediators are not isolated from each other and include tools for sharing and mapping their ontologies. In this paper, we describe the use of a multi-agent architecture to achieve and manage the mediator network.
The functional architecture is composed of single peers (implemented as mediator agents) independently carrying out their own integration activities. Such agents may then exchange data and knowledge with other peers by means of specialized agents (called brokering agents) which provide a coherent access plan to the peer network. In this way, two layers are defined in the architecture: at the local level, peers maintain an integrated view of local sources; at the network level, agents maintain mappings among the different peers. The result is the definition of a new type of mediator system network intended to operate in web economies, which we realized within SEWASIE (SEmantic Webs and AgentS in Integrated Economies), an RDT project supported by the 5th Framework IST program of the European Community, successfully ended in September 2005.


2007 - The SEWASIE Network of Mediator Agents for Semantic Search [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Integration of heterogeneous information in the context of the Internet becomes a key activity to enable a more organized and semantically meaningful access to data sources. As the Internet can be viewed as a data-sharing network where sites are data sources, the challenge is twofold. Firstly, sources present information according to their particular view of the matter, i.e. each of them assumes a specific ontology. Secondly, data sources are usually isolated, i.e. they do not share any topological information concerning the content or the structure of other sources. The classical approach to solve these issues is provided by mediator systems, which aim at creating a unified virtual view of the underlying data sources in order to hide the heterogeneity of data and give users transparent access to the integrated information. In this paper we propose to use a multi-agent architecture to build and manage a mediator network. While a single peer (i.e. a mediator agent) independently carries out data integration activities, it exchanges knowledge with other peers by means of specialized agents (i.e. brokers) which provide a coherent plan to access information in the peer network. This defines two layers in the system: at the local level, peers maintain an integrated view of local sources, while at the network level agents maintain mappings among the different peers. The result is the definition of a new networked mediator system intended to operate in web economies, which we realized in the SEWASIE (SEmantic Webs and AgentS in Integrated Economies) project. SEWASIE is an RDT project supported by the 5th Framework IST program of the European Community, successfully ended in September 2005.


2007 - Toward a Unified View of Data and Services [Relazione in Atti di Convegno]
M., Palmonari; Guerra, Francesco; A., Turati; A., Maurino; Beneventano, Domenico; E., DELLA VALLE; Sala, Antonio; D., Cerizza
abstract

We propose an approach for describing a unified view of data and services in a peer-to-peer environment. The research areas of data and services are usually represented with different models and queried by different tools with different requirements. Our approach aims at providing the user with a "complete" knowledge (in terms of data and services) of a domain. Our proposal is not alternative to the techniques developed for representing and querying integrated data and discovering services, but works in conjunction with them by improving the user knowledge. We are experimenting with the approach within the Italian FIRB project NeP4B (Networked Peers for Business), which aims at developing an advanced technological infrastructure to enable companies to search for partners, exchange data, negotiate and collaborate without limitations and constraints.


2007 - W10 - SWAE '07: 1st International Workshop on Semantic Web Architectures for Enterprises [Esposizione]
Bergamaschi, Sonia; Paolo, Bouquet; Guerra, Francesco
abstract

SWAE aims at evaluating how and how much the Semantic Web vision has met its promises with respect to business and market needs. Even though the Semantic Web is a relatively new branch of scientific and technological research, its relevance has already been envisaged for some crucial business processes:

- Semantic-based business data integration: data integration satisfies both "structural" requirements of enterprises (e.g. the possibility of consulting their data in a unified manner) and "dynamic" requirements (e.g. business-to-business partnerships to execute an order). Information systems implementing semantic web architectures can strongly support this process, or simply enable it.
- Semantic interoperability: metadata and ontologies support the dynamic and flexible exchange of data and services across information systems of different organizations. The development of applications for the automatic classification of services and goods on the basis of standard hierarchies, and the translation of such classifications into the different standards used by companies, is a clear example of the potential for semantic interoperability methods and tools.
- Knowledge management: ontologies and automated reasoning tools seem to provide an innovative support to the elicitation, representation and sharing of corporate knowledge, in particular for the shift from a document-centric to an entity-centric KM approach.
- Enterprise and process modeling: ontologies and rules are becoming an effective way for modeling corporate processes and business domains (for example, in cost reduction).

The goal of the workshop is to evaluate and assess how deep the permeation of Semantic Web models, languages, technologies and applications has been in effective enterprise business applications.
It will also identify how semantic-web-based systems, methods and theories sustain business applications such as decision processes, workflow management processes, accountability, and production chain management. Particular attention will be dedicated to metrics and criteria that evaluate the cost-effectiveness of system design processes, knowledge encoding and management, system maintenance, etc.


2006 - An incremental method for meaning elicitation of a domain ontology [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

The Internet has opened access to an overwhelming amount of data, requiring the development of new applications to automatically recognize, process and manage information available in web sites or web-based applications. The standard Semantic Web architecture exploits ontologies to give a shared (and known) meaning to each web source element. In this context, we developed MELIS (Meaning Elicitation and Lexical Integration System). MELIS couples the lexical annotation module of the MOMIS system with some components from CTXMATCH2.0, a tool for eliciting meaning from several types of schemas and matching them. MELIS uses the MOMIS WNEditor and CTXMATCH2.0 to support two main tasks in the MOMIS ontology generation methodology: the source annotation process, i.e. the operation of associating an element of a lexical database to each source element, and the extraction of lexical relationships among elements of different data sources.


2006 - An intelligent data integration approach for collaborative project management in virtual enterprises [Articolo su rivista]
Bergamaschi, Sonia; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

The increasing globalization and flexibility required of companies have generated new issues in the last decade, related to the management of large-scale projects and to the cooperation of enterprises within geographically distributed networks. ICT support systems are required to help enterprises share information, guarantee data consistency and establish synchronized and collaborative processes. In this paper we present a collaborative project management system that integrates data coming from aerospace industries with one main goal: to facilitate the assembly, integration and verification activities of a multi-enterprise project. The main achievement of the system from a data management perspective is that it avoids inconsistencies generated by updates at the sources' level and minimizes data replication. The developed system is composed of a collaborative project management component supported by a web interface, a multi-agent data integration system, which supports information sharing and querying, and web services that ensure the interoperability of the software components. The system was developed by the University of Modena and Reggio Emilia and Gruppo Formula S.p.A., and tested by Alenia Spazio S.p.A. within the EU WINK Project (Web-linked Integration of Network based Knowledge - IST-2000-28221).


2006 - Instances Navigation for Querying Integrated Data from Web-Sites [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia; Bruschi, Stefania; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

Research on data integration has provided a set of rich and well-understood schema mediation languages and systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal with data instances. Such meta-data are necessary for querying the classes resulting from an integration process: the end user typically does not know the contents of such classes, and simply defines his queries on the basis of the names of classes and attributes. In this paper we introduce an approach that enriches the description of selected attributes by specifying as meta-data a list of the “relevant values” for such attributes. Furthermore, relevant values may be hierarchically collected in a taxonomy. In this way, the user may exploit the new meta-data in the interactive process of creating/refining a query. The same meta-data are also exploited by the system in the query rewriting/unfolding process in order to filter the results shown to the user. We conducted an evaluation of the strategy in an e-business context within the EU-IST SEWASIE project. The evaluation proved the practicability of the approach for large sets of value instances.
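The idea of "relevant values" arranged in a taxonomy can be illustrated with a minimal sketch. This is not the SEWASIE implementation; the taxonomy, attribute name and helper functions below are hypothetical stand-ins showing how selecting an internal taxonomy node can expand to its leaf values and filter query results:

```python
# Illustrative sketch only: a toy "relevant values" taxonomy for a
# hypothetical product "category" attribute. Selecting an internal node
# expands to its leaf relevant values, which filter the query results.

# Hypothetical taxonomy: node -> child nodes (empty list = leaf value).
TAXONOMY = {
    "electronics": ["phones", "laptops"],
    "phones": [],
    "laptops": [],
    "food": ["pasta"],
    "pasta": [],
}

def expand(node):
    """Return the set of leaf relevant values under a taxonomy node."""
    children = TAXONOMY.get(node, [])
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= expand(child)
    return leaves

def filter_results(rows, attribute, node):
    """Keep only rows whose attribute value is a relevant value under node."""
    allowed = expand(node)
    return [r for r in rows if r.get(attribute) in allowed]

rows = [
    {"name": "X1", "category": "phones"},
    {"name": "Y2", "category": "pasta"},
]
print(filter_results(rows, "category", "electronics"))
# → [{'name': 'X1', 'category': 'phones'}]
```

The same expansion could, in principle, be applied on the system side during query rewriting/unfolding, as the abstract describes.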


2006 - Instances navigation for querying integrated data from web-sites [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Bruschi, Stefania; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

Research on data integration has provided a set of rich and well-understood schema mediation languages and systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal with data instances. Such meta-data are necessary for querying the classes resulting from an integration process: the end user typically does not know the contents of such classes, and simply defines his queries on the basis of the names of classes and attributes. In this paper we introduce an approach that enriches the description of selected attributes by specifying as meta-data a list of the “relevant values” for such attributes. Furthermore, relevant values may be hierarchically collected in a taxonomy. In this way, the user may exploit the new meta-data in the interactive process of creating/refining a query. The same meta-data are also exploited by the system in the query rewriting/unfolding process in order to filter the results shown to the user. We conducted an evaluation of the strategy in an e-business context within the EU-IST SEWASIE project. The evaluation proved the practicability of the approach for large sets of value instances.


2006 - Using Balanced Scorecards for supporting participations in Public Administrations [Relazione in Atti di Convegno]
Guerra, Francesco
abstract

In recent years, several socio-economic changes have generated new challenges for Public Administration. In particular, the diffusion of ICTs has increased the demand and need for new models of e-government. In this paper, we propose to apply a general modification of the Balanced Scorecard model, a framework developed for spreading knowledge about strategic actions and monitoring activity in business companies, for e-government purposes. We claim that scorecards may encourage citizens’ participation, since they allow a complete evaluation of the activities of a local government.


2005 - Building a tourism information provider with the MOMIS system [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The tourism industry is a good candidate for taking up Semantic Web technology. In fact, there are many portals and websites belonging to the tourism domain that promote tourist products (places to visit, food to eat, museums, etc.) and tourist services (hotels, events, etc.), published by several operators (tourist promoter associations, public agencies, etc.). This article presents how the MOMIS system may be used for building a tourism information provider by exploiting the tourism information available on Internet websites. MOMIS (Mediator envirOnment for Multiple Information Sources) is a mediator framework that performs information extraction and integration from heterogeneous distributed data sources and includes query management facilities to transparently support queries posed to the integrated data sources.


2005 - SEWASIE - SEmantic Webs and AgentS in Integrated Economies. [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Vincini, Maurizio; Guerra, Francesco
abstract

SEWASIE (SEmantic Webs and AgentS in Integrated Economies) aimed to design and implement an advanced search engine enabling intelligent access to heterogeneous data sources on the web via semantic enrichment, providing the basis for structured, secure web-based communication. The implemented prototype provides users with a search client that has an easy-to-use query interface, and which can extract the required information from the Internet and show it in a useful and user-friendly format. From an architectural point of view, the prototype comprises a search engine client, indexing servers and ontologies.


2004 - A Web Service based framework for the semantic mapping between product classification schemas [Articolo su rivista]
Beneventano, Domenico; Guerra, Francesco; Magnani, Stefania; Vincini, Maurizio
abstract

A marketplace is the place where the demands and offers of buyers and sellers participating in a business transaction may meet. Therefore, electronic marketplaces are virtual communities in which buyers may receive proposals from several suppliers and make the best choice. In the electronic commerce world, the comparison between different products is not possible due to the lack of common standards, used by the community, for describing and classifying them. Therefore, B2B and B2C marketplaces have to reclassify products and goods according to different standardization models. In this paper, we propose a semi-automatic methodology, supported by a web-service-based framework, to define semantic mappings amongst different product classification schemas (e-commerce standards and catalogues), and we provide the ability to search and navigate these mappings. The proposed methodology is demonstrated on fragments of the UNSPSC and ecl@ss standards and on a fragment of the eBay online catalogue.


2004 - A peer-to-peer information system for the semantic web [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Data integration, in the context of the web, faces new problems, due in particular to the heterogeneity of sources, to the fragmentation of information and to the absence of a unique way to structure and view information. In such areas, the traditional paradigms on which database foundations are based (i.e. client-server architecture, few sources containing large amounts of information) have to be overcome by new architectures. The peer-to-peer (P2P) architecture seems to be the best way to handle these new kinds of data sources, offering an alternative to the traditional client/server architecture. In this paper we present the SEWASIE system, which aims at providing access to heterogeneous web information sources. An enhancement of the system architecture in the direction of a P2P architecture, where connections among SEWASIE peers rely on the exchange of XML metadata, is described.


2004 - MOMIS: an Ontology-based Information Integration System (software) [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

The Mediator Environment for Multiple Information Sources (MOMIS), developed by the database research group at the University of Modena and Reggio Emilia, aims to construct synthesized, integrated descriptions of information coming from multiple heterogeneous sources. Our goal is to provide users with a global virtual view (GVV) of information sources, independent of their location or their data's heterogeneity. Such a view conceptualizes the underlying domain; you can think of it as an ontology describing the sources involved. An open-source version of the MOMIS system was released in April 2010 by the spin-off DATARIVER (www.datariver.it). The Semantic Web exploits semantic markups to provide Web pages with machine-readable definitions. It thus relies on the a priori existence of ontologies that represent the domains associated with the given information sources. This approach relies on the selected reference ontology's accuracy, but we find that most ontologies in common use are generic and that the annotation phase (in which semantic annotations connect Web page parts to ontology items) causes a loss of semantics. By involving the sources themselves, our approach builds an ontology that more precisely represents the domain. Moreover, the GVV is annotated according to a lexical ontology, which provides an easily understandable meaning to content.


2004 - SOAP-ENABLED WEB SERVICES FOR KNOWLEDGE MANAGEMENT [Articolo su rivista]
I., Benetti; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The widespread diffusion of the World Wide Web among medium and small companies has made a huge amount of information available for doing business online. Nevertheless, the heterogeneity of that information forces even trading partners involved in the same business process to face daily interoperability issues. The challenge is the integration of distributed business processes, which, in turn, means the integration of heterogeneous data coming from distributed sources. This paper presents the new web-services-based architecture of the MOMIS (Mediator envirOnment for Multiple Information Sources) framework, which enhances the semantic integration features of MOMIS by leveraging new technologies such as XML web services and the SOAP protocol. The new architecture decouples the different MOMIS modules, publishing them as XML web services. Since the SOAP protocol used to access XML web services requires the same network security settings as a normal Internet browser, companies are enabled to share knowledge without weakening their protection strategies.


2004 - Synthesizing an Integrated Ontology with MOMIS [Relazione in Atti di Convegno]
Benassi, Roberta; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The Mediator EnvirOnment for Multiple Information Sources (MOMIS) aims at constructing synthesized, integrated descriptions of the information coming from multiple heterogeneous sources, in order to provide the user with a global virtual view of the sources independent of their location and the level of heterogeneity of their data. Such a global virtual view is a conceptualization of the underlying domain and may therefore be thought of as an ontology describing the involved sources. In this article we explore the framework's main elements and discuss how the output of the integration process can be exploited to create a conceptualization of the underlying domain.


2003 - Experiencing AUML for the WINK Multi-Agent System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

In the last few years, efforts have been made towards bridging the gap between agent technology and de facto standard technologies, aiming at introducing multi-agent systems into industrial applications. This paper presents an experience with one such proposal, Agent UML. Agent UML is a graphical modelling language based on UML. The practical usage of this notation has led us to suggest some refinements of the Agent UML features.


2003 - A Peer-to-Peer Agent-Based Semantic Search Engine [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Fergnani, Alain; Guerra, Francesco; Vincini, Maurizio; D., Montanari
abstract

Several architectures, protocols, languages, and candidate standards have been proposed to let the "semantic web" idea take off. In particular, searching for information requires the cooperation of information providers and seekers. Past experience and history show that a successful architecture must support ease of adoption and deployment by a wide and heterogeneous population, a flexible policy to establish an acceptable cost-benefit ratio for using the system, and the growth of a cooperative distributed infrastructure with no central control. In this paper an agent-based peer-to-peer architecture is defined to support search through a flexible integration of semantic information. Two levels of integration are foreseen: strong integration of sources related to the same domain into a single information node by means of a mediator-based system, and weak integration of information nodes on the basis of semantic relationships existing among concepts of different nodes. The EU IST SEWASIE project is described as an instantiation of this architecture. SEWASIE aims at implementing an advanced search engine, which will provide SMEs with intelligent access to heterogeneous information on the Internet.


2003 - Building an Ontology with MOMIS [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco
abstract

Nowadays the Web is a huge collection of data and its expansion rate is very high. Web users need new ways to exploit all this available information and its possibilities. A new vision of the Web thus arises: the Semantic Web, where resources are annotated with machine-processable metadata providing them with background knowledge and meaning. A fundamental component of the Semantic Web is the ontology; this "explicit specification of a conceptualization" allows information providers to give an understandable meaning to their documents. MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous information sources. The system implements a semi-automatic methodology for data integration that follows the Global as View (GAV) approach. The result of the integration process is a global schema, which provides a reconciled, integrated and virtual view of the underlying sources, the GVV (Global Virtual View). The GVV is composed of a set of (global) classes that represent the information contained in the sources. In this paper, we focus on the application of MOMIS to a particular kind of source (i.e. web documents), and show how the result of the integration process can be exploited to create a conceptualization of the underlying domain, i.e. a domain ontology for the integrated sources. The GVV is then semi-automatically annotated according to a lexical ontology. With reference to the Semantic Web area, where the annotation process generally consists of providing a web page with semantic markups according to an ontology, we first mark up the local metadata descriptions and then the MOMIS system generates an annotated conceptualization of the sources. Moreover, our approach "builds" the domain ontology as the synthesis of the integration process, while the usual approach in the Semantic Web is based on the "a priori" existence of an ontology.


2003 - Building an integrated Ontology within SEWASIE system [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Guerra, F.; Vincini, M.
abstract

The SEWASIE (SEmantic Webs and AgentS in Integrated Economies) project (IST-2001-34825) is a European research project that aims at designing and implementing an advanced search engine enabling intelligent access to heterogeneous data sources on the web. In this paper we focus on the Ontology Builder component of the SEWASIE system, which is a framework for information extraction and integration of heterogeneous structured and semi-structured information sources, built upon the MOMIS (Mediator envirOnment for Multiple Information Sources) system. The result of the integration process is a Global Virtual View (in short GVV), which is a set of (global) classes that represent the information contained in the sources being used. In particular, we present the application of our integration approach to a specific type of source (i.e. web documents), and show the extension of a built-up GVV by the addition of another source.


2003 - Building an integrated Ontology within the SEWASIE system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous structured and semi-structured information sources. The result of the integration process is a Global Virtual View (in short GVV), which is a set of (global) classes that represent the information contained in the sources being used. In this paper, we present the application of our integration approach to a specific type of source (i.e. web documents), and show how the result of the integration approach can be exploited to create a conceptualization of the domain underlying the sources, i.e. an ontology. Two new achievements of the MOMIS system are presented: the semi-automatic annotation of the GVV and the extension of a built-up ontology by the addition of another source.


2003 - MIKS: an agent framework supporting information access and integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; J., Gelati; Guerra, Francesco; Vincini, Maurizio
abstract

Providing integrated access to multiple heterogeneous sources is a challenging issue in global information systems for cooperation and interoperability. In the past, companies have equipped themselves with data storing systems, building up informative systems containing data that are related to one another, but which are often redundant, not homogeneous and not always semantically consistent. Moreover, to meet the requirements of global, Internet-based information systems, it is important that the tools developed for supporting these activities are as semi-automatic and scalable as possible. To face the issues related to scalability in the large scale, in this paper we propose the exploitation of mobile agents in the information integration area and, in particular, their integration into the MOMIS infrastructure. MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a system that has been conceived as a pool of tools to provide integrated access to heterogeneous information stored in traditional databases (for example relational or object-oriented databases) or in file systems, as well as in semi-structured data sources (XML files). This proposal has been implemented within the MIKS (Mediator agent for Integration of Knowledge Sources) system and is completely described in this paper.


2003 - Peer to Peer Paradigm for a Semantic Search Engine [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco
abstract

This paper provides a general description of the SEWASIE research project and proposes an architectural evolution of the SEWASIE system in the direction of the peer-to-peer paradigm. The SEWASIE project aims to design and implement an advanced search engine enabling intelligent access to heterogeneous data sources on the web using community-specific multilingual ontologies. After a presentation of the main features of the system, a preliminary proposal for the architectural evolution of the SEWASIE system towards the peer-to-peer paradigm is given.


2003 - Synthesizing an integrated ontology [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

To exploit the Internet’s expanding data collection, current Semantic Web approaches employ annotation techniques to link individual information resources with machine-comprehensible metadata. Before we can realize the potential this new vision presents, however, several issues must be solved. One of these is the need for data reliability in dynamic, constantly changing networks. Another issue is how to explicitly specify relationships between abstract data concepts. Ontologies provide a key mechanism for solving these challenges, but the Web’s dynamic nature leaves open the question of how to manage them. The Mediator Environment for Multiple Information Sources (Momis), developed by the database research group at the University of Modena and Reggio Emilia, aims to construct synthesized, integrated descriptions of information coming from multiple heterogeneous sources. Our goal is to provide users with a global virtual view (GVV) of information sources, independent of their location or their data’s heterogeneity. Such a view conceptualizes the underlying domain; you can think of it as an ontology describing the sources involved. The Semantic Web exploits semantic markups to provide Web pages with machine-readable definitions. It thus relies on the a priori existence of ontologies that represent the domains associated with the given information sources. This approach relies on the selected reference ontology’s accuracy, but we find that most ontologies in common use are generic and that the annotation phase (in which semantic annotations connect Web page parts to ontology items) causes a loss of semantics. By involving the sources themselves, our approach builds an ontology that more precisely represents the domain. Moreover, the GVV is annotated according to a lexical ontology, which provides an easily understandable meaning to content. 
In this article, we use Web documents as a representative information source to describe the Momis methodology’s general application. We explore the framework’s main elements and discuss how the output of the integration process can be exploited to create a conceptualization of the underlying domain. In particular, our method provides a way to extend previously created conceptualizations, rather than starting from scratch, by inserting a new source.


2003 - WINK: A web-based system for collaborative project management in virtual enterprises [Relazione in Atti di Convegno]
Bergamaschi, S.; Gelati, G.; Guerra, F.; Vincini, M.
abstract

The increasing globalization and flexibility required of companies have generated, in the last decade, new issues related to the management of large-scale projects within geographically distributed networks and to the cooperation of enterprises. ICT support systems are required to allow enterprises to share information, guarantee data consistency and establish synchronized and collaborative processes. In this paper we present a collaborative project management system that integrates data coming from aerospace industries with two main goals: avoiding inconsistencies generated by updates at the sources’ level and minimizing data replication. The proposed system is composed of a collaborative project management component supported by a web interface, a multi-agent data integration component, which supports information sharing and querying, and SOAP-enabled web services, which ensure the overall interoperability of the software components. The system was developed by the University of Modena and Reggio Emilia, Gruppo Formula S.p.A. and Alenia Spazio S.p.A. within the EU WINK Project (Web-linked Integration of Network based Knowledge - IST-2000-28221).


2002 - A data integration framework for e-commerce product classification [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

A marketplace is the place in which the demand and supply of buyers and vendors participating in a business process may meet. Therefore, electronic marketplaces are virtual communities in which buyers may evaluate the proposals of several suppliers and make the best choice. In the electronic commerce world, the comparison between different products is hindered by the lack of a common standard (or rather, by the proliferation of standards) for describing and classifying them. Therefore, B2B and B2C marketplaces need to reclassify products and goods according to different standardization models. This paper faces this problem by suggesting a semi-automatic methodology, supported by a tool (SI-Designer), to define the mappings among different e-commerce product classification standards. This methodology was developed for the MOMIS system within the Intelligent Integration of Information research area. We describe our extension to the methodology that makes it applicable to product classification standards in general, by selecting fragments of the ECCMA/UNSPSC and ecl@ss standards.


2002 - An Agent framework for Supporting the MIKS Integration Process [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; M., Felice; D., Gazzotti; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

Providing integrated access to multiple heterogeneous sources is a challenging issue in global information systems for cooperation and interoperability. In the past, companies have equipped themselves with data storing systems, building up informative systems containing data that are related to one another, but which are often redundant, not homogeneous and not always semantically consistent. Moreover, to meet the requirements of global, Internet-based information systems, it is important that the tools developed for supporting these activities are as semi-automatic and scalable as possible. To face the issues related to scalability in the large scale, in this paper we propose the exploitation of mobile agents in the information integration area and, in particular, the roles they play in enhancing the features of the MOMIS infrastructure. MOMIS (Mediator envirOnment for Multiple Information Sources) is a system that has been conceived as a pool of tools to provide integrated access to heterogeneous information stored in traditional databases (for example relational or object-oriented databases) or in file systems, as well as in semi-structured data sources (XML files). In this paper we describe the new agent-based framework for the integration process as implemented in the MIKS (Mediator agent for Integration of Knowledge Sources) system.


2002 - An information integration framework for E-commerce [Articolo su rivista]
I., Benetti; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The Web has transformed electronic information systems from single, isolated nodes into a worldwide network of information exchange and business transactions. In this context, companies have equipped themselves with high-capacity storage systems that contain data in several formats. The problems faced by these companies often emerge because the storage systems lack structural and application homogeneity in addition to a common ontology. The semantic differences generated by the lack of a consistent ontology can lead to conflicts that range from simple name contradictions (when companies use different names to indicate the same data concept) to structural incompatibilities (when companies use different models to represent the same information types). One of the main challenges for e-commerce infrastructure designers is information sharing and retrieving data from different sources to obtain an integrated view that can overcome any contradictions or redundancies. Virtual catalogs can help overcome this challenge because they act as instruments to retrieve information dynamically from multiple catalogs and present unified product data to customers. Instead of having to interact with multiple heterogeneous catalogs, customers can interact with a virtual catalog in a straightforward, uniform manner. This article presents a virtual catalog project called Momis (mediator environment for multiple information sources). Momis is a mediator-based system for information extraction and integration that works with structured and semistructured data sources. Momis includes a component called SI-Designer for semiautomatically integrating the schemas of heterogeneous data sources, such as relational, object, XML, or semistructured sources. Starting from local source descriptions, the Global Schema Builder generates an integrated view of all data sources and expresses those views using XML.
Momis lets you use the infrastructure with other open integration information systems by simply interchanging XML data files. Momis creates the XML global schema in different stages, first by creating a common thesaurus of intra- and interschema relationships. Momis extracts the intraschema relationships by using inference techniques, then shares these relationships in the common thesaurus. After this initial phase, Momis enriches the common thesaurus with interschema relationships obtained using the lexical WordNet system (www.cogsci.princeton.edu/wn), which identifies the affinities between interschema concepts on the basis of their lexical meaning. Momis also enriches the common thesaurus using the Artemis system, which evaluates structural affinities among interschema concepts.
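The staged construction of a common thesaurus can be sketched in miniature. This is an illustrative toy, not the Momis implementation: the schemas are hypothetical, and a hand-written synonym table stands in for WordNet, while a shared-attribute heuristic loosely plays the role of a structural-affinity check like Artemis:

```python
# Illustrative sketch of building a "common thesaurus" of interschema
# relationships. All names and the synonym table are hypothetical stand-ins.

# Hypothetical local schemas: class name -> attribute names.
SCHEMA_A = {"Client": ["name", "address"]}
SCHEMA_B = {"Customer": ["name", "city"]}

# Toy lexical knowledge standing in for WordNet synsets.
SYNONYMS = {frozenset({"client", "customer"})}

def lexical_relationships(schema_a, schema_b):
    """Emit SYN relationships between class names judged synonymous."""
    rels = []
    for a in schema_a:
        for b in schema_b:
            if frozenset({a.lower(), b.lower()}) in SYNONYMS:
                rels.append((a, "SYN", b))
    return rels

def structural_relationships(schema_a, schema_b):
    """Emit RT (related-term) relationships when classes share attribute names,
    a crude stand-in for a structural-affinity evaluation."""
    rels = []
    for a, attrs_a in schema_a.items():
        for b, attrs_b in schema_b.items():
            if set(attrs_a) & set(attrs_b):
                rels.append((a, "RT", b))
    return rels

thesaurus = (lexical_relationships(SCHEMA_A, SCHEMA_B)
             + structural_relationships(SCHEMA_A, SCHEMA_B))
print(thesaurus)
# → [('Client', 'SYN', 'Customer'), ('Client', 'RT', 'Customer')]
```

In the real system each stage enriches a single shared thesaurus, which then drives the clustering of local classes into the global classes of the GVV.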


2002 - MOMIS: Exploiting agents to support information integration [Articolo su rivista]
Cabri, Giacomo; Guerra, Francesco; Vincini, Maurizio; Bergamaschi, Sonia; Leonardi, Letizia; Zambonelli, Franco
abstract

Information overload, introduced by the large amount of data spread over the Internet, must be faced in an appropriate way. The dynamism and uncertainty of the Internet, along with the heterogeneity of the sources of information, are the two main challenges for today's technologies related to information management. In the area of information integration, this paper proposes an approach based on mobile software agents integrated into the MOMIS (Mediator envirOnment for Multiple Information Sources) infrastructure, which enables semi-automatic information integration to deal with the integration and querying of multiple, heterogeneous information sources (relational, object, XML and semi-structured sources). The exploitation of mobile agents in MOMIS can significantly increase the flexibility of the system. In fact, their characteristics of autonomy and adaptability are well suited to distributed and open environments such as the Internet. The aim of this paper is to show the advantages of introducing intelligent and mobile software agents into the MOMIS infrastructure for the autonomous management and coordination of integration and query processing over heterogeneous data sources.


2002 - Product Classification Integration for E-Commerce [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

A marketplace is the place where the demand and supply of buyers and vendors participating in a business process may meet. Electronic marketplaces are therefore virtual communities in which buyers may examine the proposals of several suppliers and make the best choice. In the electronic commerce world, the comparison of different products is hindered by the lack of a single standard (or rather, by the proliferation of standards) for describing and classifying them. B2B and B2C marketplaces therefore need to reclassify products and goods according to different standardization models. This paper addresses the problem by proposing a semi-automatic methodology to define a mapping among different e-commerce product classification standards. The methodology is an extension of the MOMIS system, a mediator system developed within the Intelligent Integration of Information research area.
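The mapping task the abstract describes can be illustrated with a deliberately naive sketch. The category names and the token-overlap similarity are invented for the example; the actual methodology relies on MOMIS's semantic integration machinery rather than string matching.

```python
# Illustrative sketch: map each category of one product classification
# standard to the closest category of another via label-token overlap.

def tokens(label):
    """Split a category label into a set of lowercase word tokens."""
    return set(label.lower().split())

def best_match(category, target_categories):
    """Pick the target category whose label has the highest Jaccard
    similarity with the source category's label."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    src = tokens(category)
    return max(target_categories, key=lambda t: jaccard(src, tokens(t)))

# Two tiny, hypothetical classification standards
standard_one = ["Laptop Computers", "Desktop Computers", "Computer Printers"]
standard_two = ["Portable Computers", "Printers"]

mapping = {c: best_match(c, standard_two) for c in standard_one}
```

Even this toy version shows why the problem calls for semantics rather than syntax: "Computer Printers" shares no exact token with "Portable Computers", so purely lexical overlap only works when labels happen to reuse the same words.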


2002 - SI-Web: a Web based interface for the MOMIS project [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; D., Bianco; Guerra, Francesco; Vincini, Maurizio
abstract

The MOMIS project (Mediator envirOnment for Multiple Information Sources), developed in past years, allows the integration of data from structured and semi-structured data sources. SI-Designer (Source Integrator Designer) is a designer support tool implemented within the MOMIS project for the semi-automatic integration of heterogeneous source schemata. It is a Java application whose modules are available as CORBA objects and interact through established IDL interfaces. The goal of this demonstration is to present a new tool, SI-Web (Source Integrator on Web): it offers the same features as SI-Designer, with the great advantage of being usable over the Internet through a web browser.


2002 - Semantic Integration and Query Optimization of Heterogeneous Data Sources [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Castano, S; DE ANTONELLIS, V; Ferrara, A; Guerra, Francesco; Mandreoli, Federica; ORNETTI G., C; Vincini, Maurizio
abstract

In modern Internet/Intranet-based architectures, an increasing number of applications requires integrated and uniform access to a multitude of heterogeneous and distributed data sources. In this paper, we describe the ARTEMIS/MOMIS system for the semantic integration and query optimization of heterogeneous structured and semistructured data sources.


2002 - The WINK Project for Virtual Enterprise Networking and Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gazzotti, Davide; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

To stay competitive (or sometimes simply to stay) on the market, companies and manufacturers increasingly have to join forces to survive and possibly flourish. Among other solutions, the last decade has seen the growth and spread of an original business model called the Virtual Enterprise. To manage a Virtual Enterprise, modern information systems have to tackle technological issues such as networking, integration and cooperation. The WINK project, born from the partnership between the University of Modena and Reggio Emilia and Gruppo Formula, addresses these problems. The ultimate goal is to design, implement and finally test on a pilot case (provided by Alenia) the WINK system, a combination of two existing and promising software systems (the WHALES and MIKS systems), to meet the Virtual Enterprise requirements for data integration, cooperation and management planning.


2001 - Agents Supporting Information Integration: the MIKS Framework [Relazione in Atti di Convegno]
Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

In past years we developed the MOMIS (Mediator envirOnment for Multiple Information Sources) system for the integration of data from structured and semi-structured data sources. In this paper we propose some preliminary considerations about a feasible extension of the system, intended to improve some of its functionalities by exploiting intelligent and mobile agents. The new framework is named MIKS (Mediator agent for Integration of Knowledge Sources).


2001 - Agents Supporting Information Integration: the MIKS Framework [Articolo su rivista]
Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

In past years we developed the MOMIS (Mediator envirOnment for Multiple Information Sources) system for the integration of data from structured and semi-structured data sources. In this paper we propose some preliminary considerations about a feasible extension of the system, intended to improve some of its functionalities by exploiting intelligent and mobile agents. The new framework is named MIKS (Mediator agent for Integration of Knowledge Sources).


2001 - Exploiting extensional knowledge for query reformulation and object fusion in a data integration system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Query processing in global information systems integrating multiple heterogeneous sources is a challenging issue for the effective extraction of information available on-line. In this paper we propose intelligent, tool-supported techniques for querying global information systems that integrate both structured and semistructured data sources. The techniques have been developed in the environment of a wrapper/mediator-based data integration system, MOMIS, and pursue two main goals: optimized query reformulation w.r.t. local sources, and object fusion, i.e. grouping together information (from the same or different sources) about the same real-world entity. The developed techniques rely on the availability of integration knowledge, i.e. local source schemata, a virtual mediated schema and its mapping descriptions, that is, semantic mappings w.r.t. the underlying sources at both the intensional and extensional level. Mapping descriptions, obtained as a result of the semi-automatic integration process for multiple heterogeneous sources developed in the MOMIS system, include, unlike previous data integration proposals, extensional intra/interschema knowledge. Extensional knowledge is exploited to detect extensionally overlapping classes and to discover implicit join criteria among classes, which enables the goals of optimized query reformulation and object fusion to be achieved. The techniques have been implemented in the MOMIS system but can be applied, in general, to data integration systems that include extensional intra/interschema knowledge in their mapping descriptions.
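The object-fusion step the abstract describes can be sketched minimally: records from two extensionally overlapping sources that refer to the same real-world entity are grouped by a shared join attribute and their fields merged. Source names, attributes, and the join criterion are all invented for the example; in MOMIS such criteria are discovered from the extensional mapping knowledge rather than given by hand.

```python
# Minimal sketch of object fusion over two overlapping sources.
from collections import defaultdict

source1 = [{"ssn": "123", "name": "Ada Lovelace"},
           {"ssn": "456", "name": "Alan Turing"}]
source2 = [{"ssn": "123", "dept": "Mathematics"},
           {"ssn": "789", "dept": "Physics"}]

def fuse(join_attr, *sources):
    """Group records sharing the join attribute and merge their fields,
    so each real-world entity yields one combined object."""
    fused = defaultdict(dict)
    for source in sources:
        for record in source:
            fused[record[join_attr]].update(record)
    return dict(fused)

objects = fuse("ssn", source1, source2)
```

Here the entity with ssn "123" appears in both sources, so fusion yields a single object carrying both its name and its department, while records found in only one source pass through unchanged.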


2001 - SI-Designer: a tool for intelligent integration of information [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; I., Benetti; Corni, Alberto; Guerra, Francesco; G., Malvezzi
abstract

SI-Designer (Source Integrator Designer) is a designer support tool for the semi-automatic integration of heterogeneous source schemata (relational, object and semi-structured sources); it has been implemented within the MOMIS project and carries out integration following a semantic approach which uses intelligent Description Logics-based techniques, clustering techniques and an extended ODMG-ODL language, ODL-I3, to represent schemata and extracted, integrated information. Starting from the sources' ODL-I3 descriptions (local schemata), SI-Designer supports the designer in the creation of an integrated view of all the sources (global schema), which is expressed in the same ODL-I3 language. We propose SI-Designer as a tool to build virtual catalogs in the E-Commerce environment.


2001 - SI-Designer: an Integration Framework for E-Commerce [Relazione in Atti di Convegno]
I., Benetti; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Electronic commerce lets people purchase goods and exchange information on business transactions on-line. One of the main challenges for the designers of e-commerce infrastructures is therefore information sharing: retrieving data located in different sources and obtaining an integrated view that overcomes contradictions and redundancies. Virtual Catalogs embody this approach, as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from the catalogs. In this paper we propose SI-Designer, a support tool for the integration of data from structured and semi-structured data sources, developed within the MOMIS (Mediator environment for Multiple Information Sources) project.


2001 - Supporting information integration with autonomous agents [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Cabri, Giacomo; Guerra, Francesco; Leonardi, Letizia; Vincini, Maurizio; Zambonelli, Franco
abstract

The large amount of information spread over the Internet is an important resource for everyone, but it also introduces issues that must be faced. The dynamism and uncertainty of the Internet, along with the heterogeneity of the sources of information, are the two main challenges for today's technologies. This paper proposes an approach based on mobile agents integrated in an information integration infrastructure. Mobile agents can significantly improve the design and development of Internet applications thanks to their autonomy and adaptability to open and distributed environments such as the Internet. MOMIS (Mediator envirOnment for Multiple Information Sources) is an infrastructure for semi-automatic information integration that deals with the integration and querying of multiple, heterogeneous information sources (relational, object, XML and semi-structured sources). The aim of this paper is to show the advantage of introducing into the MOMIS infrastructure intelligent and mobile software agents for the autonomous management and coordination of the integration and query processes over heterogeneous sources.


2001 - The MOMIS approach to information integration [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Guerra, F.; Vincini, M.
abstract

The web explosion, at both Internet and intranet level, has transformed the electronic information system from a single isolated node into an entry point to a worldwide network of information exchange and business transactions. Business and commerce have seized the opportunity offered by the new technologies to define e-commerce activities. One of the main challenges for the designers of e-commerce infrastructures is therefore information sharing: retrieving data located in different sources and obtaining an integrated view that overcomes contradictions and redundancies. Virtual Catalogs embody this approach, as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from the catalogs. Customers, instead of having to interact with multiple heterogeneous catalogs, can interact in a uniform way with a virtual catalog. In this paper we propose a designer support tool, called SI-Designer, for information integration, developed within the MOMIS project. The MOMIS project (Mediator environment for Multiple Information Sources) aims to integrate data from structured and semi-structured data sources.


2001 - The Momis approach to Information Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The web explosion, at both Internet and intranet level, has transformed the electronic information system from a single isolated node into an entry point to a worldwide network of information exchange and business transactions. Business and commerce have seized the opportunity offered by the new technologies to define e-commerce activities. One of the main challenges for the designers of e-commerce infrastructures is therefore information sharing: retrieving data located in different sources and obtaining an integrated view that overcomes contradictions and redundancies. Virtual Catalogs embody this approach, as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from the catalogs. Customers, instead of having to interact with multiple heterogeneous catalogs, can interact in a uniform way with a virtual catalog. In this paper we propose a designer support tool, called SI-Designer, for information integration, developed within the MOMIS project. The MOMIS project (Mediator environment for Multiple Information Sources) aims to integrate data from structured and semi-structured data sources.


2001 - Towards a comprehensive methodological framework for integration [Relazione in Atti di Convegno]
D., Calvanese; S., Castano; Guerra, Francesco; D., Lembo; M., Melchiori; G., Terracina; D., Ursino; Vincini, Maurizio
abstract

Nowadays, data can be represented and stored using different formats, ranging from unstructured data, typical of file systems, to semi-structured data, typical of Web sources, to highly structured data, typical of relational database systems. The necessity therefore arises to define new models and approaches for uniformly handling all these heterogeneous information sources. In this paper we propose a framework that aims at uniformly managing information sources with different formats and structures, in order to obtain a global, integrated and uniform representation.