
Domenico BENEVENTANO

Associate Professor
"Enzo Ferrari" Department of Engineering




Publications

2024 - The AGILEScience mobile application for the AGILE space mission [Journal article]
Parmiggiani, N.; Bulgarelli, A.; Tavani, M.; Pittori, C.; Baroncelli, L.; Malaspina, M.; Beneventano, D.; Castaldini, L.; Di Piano, A.; Falco, R.; Fioretti, V.; Lucarelli, F.; Panebianco, G.; Verrecchia, F.
abstract

AGILE is a space mission launched in 2007 to study X-ray and gamma-ray phenomena through data acquired by different payload instruments. The AGILE Team developed an application called AGILEScience that lets users view information about the AGILE space mission on mobile devices such as smartphones and tablets. The AGILEScience application can be downloaded freely for iOS and Android devices. Besides sharing information about the AGILE space mission with the public for outreach purposes, as other applications do, the AGILEScience app offers some new and unique features in gamma-ray astrophysics: (i) it gives public access in nearly real time to the sky view of a gamma-ray satellite for the first time, (ii) it interacts with the AGILE remote gamma-ray data storage and analysis system, allowing data analyses to be submitted and results to be visualized, and (iii) it allows the AGILE Team to access a password-protected section of the app to view detailed AGILE pipeline results and submit advanced analyses. The last two features are critical for remote and easy access to the results of the AGILE automated pipelines. In particular, the ability to visualize results and execute manual data analysis from mobile devices is key during the follow-up of transient events and for easily monitoring the satellite status via smartphone.


2023 - A big data platform exploiting auditable tokenization to promote good practices inside local energy communities [Journal article]
Gagliardelli, Luca; Zecchini, Luca; Ferretti, Luca; Beneventano, Domenico; Simonini, Giovanni; Bergamaschi, Sonia; Orsini, Mirko; Magnotta, Luca; Mescoli, Emma; Livaldi, Andrea; Gessa, Nicola; De Sabbata, Piero; D’Agosta, Gianluca; Paolucci, Fabrizio; Moretti, Fabio
abstract

The Energy Community Platform (ECP) is a modular system conceived to promote a conscious use of energy by the users inside local energy communities. It is composed of two integrated subsystems: the Energy Community Data Platform (ECDP), a middleware platform designed to support the collection and the analysis of big data about the energy consumption inside local energy communities, and the Energy Community Tokenization Platform (ECTP), which focuses on tokenizing processed source data to enable incentives through smart contracts hosted on a decentralized infrastructure possibly governed by multiple authorities. We illustrate the overall design of our system, conceived considering some real-world projects (dealing with different types of local energy community, different amounts and nature of incoming data, and different types of users), analyzing in detail the key aspects of the two subsystems. In particular, the ECDP acquires data of a different nature in a heterogeneous format from multiple sources and supports a data integration workflow and a data lake workflow, designed for different uses of the data. We motivate our technological choices and present the alternatives taken into account, both in terms of software and of architectural design. On the other hand, the ECTP operates a tokenization process via smart contracts to promote good behaviors of users within the local energy community. The peculiarity of this platform is to allow external parties to audit the correct behavior of the whole tokenization process while protecting the confidentiality of the data and the performance of the platform. The main strengths of the presented system are flexibility and scalability (guaranteed by its modular architecture), which allow its applicability to any type of local energy community.


2023 - Privacy-Preserving Data Integration for Digital Justice [Conference paper]
Trigiante, L.; Beneventano, D.; Bergamaschi, S.
abstract

The digital transformation of the Justice domain and the resulting availability of vast amounts of data describing people and their criminal behaviors offer significant promise to feed multiple research areas and enhance the criminal justice system. Achieving this vision requires the integration of different sources to create an accurate and unified representation that enables detailed and extensive data analysis. However, the collection and processing of sensitive legal data about individuals requires consideration of privacy legislation and confidentiality implications. This paper presents the lessons learned from the design and development of a Privacy-Preserving Data Integration (PPDI) architecture and process to address the challenges and opportunities of integrating personal data belonging to criminal and court sources within the Italian Justice Domain in compliance with GDPR.


2023 - Progetto di Basi di Dati Relazionali [Monograph/Scientific treatise]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Guerra, Francesco; Vincini, Maurizio
abstract

The aim of the volume is to give the reader the fundamental notions for designing and implementing relational database applications. Regarding design, the conceptual and logical design phases are covered, and the Entity-Relationship and Relational data models are presented, which are the basic tools for conceptual and logical design, respectively. The student is also introduced to the normalization theory of relational databases. Regarding implementation, elements and examples of SQL, the standard language for RDBMSs (Relational Database Management Systems), are presented. Ample space is devoted to worked exercises on the topics covered.


2023 - The AGILE real-time analysis software system to detect short-transient events in the multi-messenger era [Journal article]
Parmiggiani, N.; Bulgarelli, A.; Ursi, A.; Addis, A.; Baroncelli, L.; Fioretti, V.; Di Piano, A.; Panebianco, G.; Tavani, M.; Pittori, C.; Verrecchia, F.; Beneventano, D.


2023 - [Vision Paper] Privacy-Preserving Data Integration [Conference paper]
Trigiante, Lisa; Beneventano, Domenico; Bergamaschi, Sonia
abstract

The digital transformation of different processes and the resulting availability of vast amounts of data describing people and their behaviors offer significant promise to advance multiple research areas and enhance both the public and private sectors. Exploiting the full potential of this vision requires a unified representation of different autonomous data sources to facilitate detailed data analysis. Collecting and processing sensitive data about individuals leads to consideration of privacy requirements and confidentiality concerns. This vision paper provides a concise overview of the research field concerning Privacy-Preserving Data Integration (PPDI), the associated challenges, opportunities, and unexplored aspects, with the primary aim of designing a novel and comprehensive PPDI framework based on a Trusted Third-Party microservices architecture.


2022 - Big Data Integration & Data-Centric AI for eHealth [Conference paper]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni; Zecchini, Luca
abstract

Big data integration, i.e., the integration of large amounts of data from multiple sources, is one of the main challenges for the use of techniques and tools based on artificial intelligence in the medical field (eHealth). In this context, it is also of primary importance to guarantee the quality of the data on which these tools and techniques operate (Data-Centric AI), as they now play a central role in the sector. The research activities of the Database Group (DBGroup) of the "Enzo Ferrari" Department of Engineering of the University of Modena and Reggio Emilia move in this direction. We therefore present the main research projects of the DBGroup in the eHealth field, which are part of collaborations in several application sectors.


2022 - Big Data Integration for Data-Centric AI [Conference abstract]
Bergamaschi, Sonia; Beneventano, Domenico; Simonini, Giovanni; Gagliardelli, Luca; Aslam, Adeel; De Sabbata, Giulio; Zecchini, Luca
abstract

Big data integration represents one of the main challenges for the use of techniques and tools based on Artificial Intelligence (AI) in several crucial areas: eHealth, energy management, enterprise data, etc. In this context, Data-Centric AI plays a primary role in guaranteeing the quality of the data on which these tools and techniques operate. Thus, the activities of the Database Research Group (DBGroup) of the “Enzo Ferrari” Engineering Department of the University of Modena and Reggio Emilia are moving in this direction. Therefore, we present the main research projects of the DBGroup, which are part of collaborations in various application sectors.


2022 - ECDP: A Big Data Platform for the Smart Monitoring of Local Energy Communities [Conference paper]
Gagliardelli, Luca; Zecchini, Luca; Beneventano, Domenico; Simonini, Giovanni; Bergamaschi, Sonia; Orsini, Mirko; Magnotta, Luca; Mescoli, Emma; Livaldi, Andrea; Gessa, Nicola; De Sabbata, Piero; D’Agosta, Gianluca; Paolucci, Fabrizio; Moretti, Fabio


2022 - Progressive Entity Resolution with Node Embeddings [Conference paper]
Simonini, Giovanni; Gagliardelli, Luca; Rinaldi, Michele; Zecchini, Luca; De Sabbata, Giulio; Aslam, Adeel; Beneventano, Domenico; Bergamaschi, Sonia
abstract

Entity Resolution (ER) is the task of finding records that refer to the same real-world entity, which are called matches. ER is a fundamental pre-processing step when dealing with dirty and/or heterogeneous datasets; however, it can be very time-consuming when employing complex machine learning models to detect matches, as state-of-the-art ER methods do. Thus, when time is a critical component and having a partial ER result is better than having no result at all, progressive ER methods are employed to try to maximize the number of detected matches as a function of time. In this paper, we study how to perform progressive ER by exploiting graph embeddings. The basic idea is to represent candidate matches in a graph: each node is a record and each edge is a possible comparison to check—we build that on top of a well-known, established graph-based ER framework. We experimentally show that our method performs better than existing state-of-the-art progressive ER methods on real-world benchmark datasets.


2022 - The AGILE real-time analysis pipelines in the multi-messenger era [Conference paper]
Parmiggiani, Nicolò; Bulgarelli, Andrea; Ursi, Alessandro; Fioretti, Valentina; Baroncelli, Leonardo; Addis, Antonio; Di Piano, Ambra; Pittori, Carlotta; Verrecchia, Francesco; Lucarelli, Fabrizio; Tavani, Marco; Beneventano, Domenico
abstract

In the multi-messenger era, space and ground-based observatories usually develop real-time analysis (RTA) pipelines to rapidly detect transient events and promptly share information with the scientific community to enable follow-up observations. These pipelines can also react to science alerts shared by other observatories through networks such as the Gamma-Ray Coordinates Network (GCN) and the Astronomer's Telegram (ATels). AGILE is a space mission launched in 2007 to study X-ray and gamma-ray phenomena. This contribution presents the technologies used to develop two types of AGILE pipelines using the RTApipe framework and an overview of the main scientific results. The first type performs automated analyses on new AGILE data to detect transient events and automatically sends AGILE notices to the GCN network. Since May 2019, this pipeline has sent more than 50 automated notices with a few minutes delay since data arrival. The second type of pipeline reacts to multi-messenger external alerts (neutrinos, gravitational waves, GRBs, and other transients) received through the GCN network and performs hundreds of analyses searching for counterparts in all AGILE instruments' data. The AGILE Team uses these pipelines to perform fast follow-up of science alerts reported by other facilities, which resulted in the publishing of several ATels and GCN circulars....


2022 - The RTApipe framework for the gamma-ray real-time analysis software development [Journal article]
Parmiggiani, N.; Bulgarelli, A.; Beneventano, D.; Fioretti, V.; Di Piano, A.; Baroncelli, L.; Addis, A.; Tavani, M.; Pittori, C.; Oya, I.
abstract

In the multi-messenger era, coordinating observations between astronomical facilities is mandatory to study transient phenomena (e.g. Gamma-ray bursts) and is achieved by sharing information with the scientific community through networks such as the Gamma-ray Coordinates Network. The facilities usually develop real-time scientific analysis pipelines to detect transient events, alert the astrophysical community, and speed up the reaction time of science alerts received from other observatories. We present in this work the RTApipe framework, designed to facilitate the development of real-time scientific analysis pipelines for present and future gamma-ray observatories. This framework provides pipeline architecture and automatisms, allowing the researchers to focus on the scientific aspects and integrate existing science tools developed with different technologies. The pipelines automatically execute all the configured analyses during the data acquisition. This framework can be interfaced with science alerts networks to perform follow-up analysis of transient events shared by other facilities. The analyses are performed in parallel and can be prioritised. The workload is highly scalable on a cluster of machines. The framework provides the required services using containerisation technology for easy deployment. We present the RTA pipelines developed for the AGILE space mission and the prototype of the SAG system for the ground-based future Cherenkov Telescope Array observatory confirming that the RTApipe framework can be used to successfully develop pipelines for the gamma-ray observatories, both space and ground-based.


2021 - A Deep Learning Method for AGILE-GRID Gamma-Ray Burst Detection [Journal article]
Parmiggiani, N.; Bulgarelli, A.; Fioretti, V.; Di Piano, A.; Giuliani, A.; Longo, F.; Verrecchia, F.; Tavani, M.; Beneventano, D.; Macaluso, A.
abstract

The follow-up of external science alerts received from gamma-ray burst (GRB) and gravitational wave detectors is one of the AGILE Team's current major activities. The AGILE team developed an automated real-time analysis pipeline to analyze AGILE Gamma-Ray Imaging Detector (GRID) data to detect possible counterparts in the energy range 0.1-10 GeV. This work presents a new approach for detecting GRBs that uses a convolutional neural network (CNN) to classify the AGILE-GRID intensity maps, improving the GRB detection capability over the Li & Ma method currently used by the AGILE team. The CNN is trained with large simulated data sets of intensity maps. The complex AGILE observing pattern due to the so-called "spinning mode" is studied to prepare data sets to test and evaluate the CNN. A GRB emission model is defined from the second Fermi-LAT GRB catalog and convolved with the AGILE observing pattern. Different p-value distributions are calculated by using the CNN to evaluate millions of background-only maps simulated with varying background levels. The CNN is then used on real data to analyze the AGILE-GRID data archive, searching for GRB detections using the trigger time and position taken from the Swift-BAT, Fermi-GBM, and Fermi-LAT GRB catalogs. From these catalogs, the CNN detects 21 GRBs with a significance of ≥3σ, while the Li & Ma method detects only two GRBs. The results shown in this work demonstrate that the CNN is more effective in detecting GRBs than the Li & Ma method in this context and can be implemented into the AGILE-GRID real-time analysis pipeline.
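For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the Li & Ma (1983, Eq. 17) significance estimate for an on/off counting measurement. The abstract does not give the formula or any implementation detail, so the function name, the parameter names, and the example counts are assumptions made for illustration only, not the AGILE pipeline's code.

```python
import math


def li_ma_significance(n_on: float, n_off: float, alpha: float) -> float:
    """Li & Ma (1983) Eq. 17 significance of an excess in the on-source region.

    n_on  : counts in the on-source region
    n_off : counts in the off-source (background) region
    alpha : ratio of on-source to off-source exposure
    The names are hypothetical; only the formula comes from Li & Ma.
    """
    total = n_on + n_off

    # x * ln(...) tends to 0 as x goes to 0, so empty regions contribute nothing.
    term_on = 0.0
    if n_on > 0:
        term_on = n_on * math.log((1 + alpha) / alpha * n_on / total)
    term_off = 0.0
    if n_off > 0:
        term_off = n_off * math.log((1 + alpha) * n_off / total)

    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, 2.0 * (term_on + term_off)))


# Illustrative counts only (not taken from the paper).
print(round(li_ma_significance(20, 10, 1.0), 3))
```

The returned value is in units of Gaussian standard deviations, so a threshold such as the ≥3σ quoted above would correspond to `li_ma_significance(...) >= 3`.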


2021 - A deep learning method for AGILE-GRID GRB detection [Other]
Parmiggiani, N.; Bulgarelli, A.; Fioretti, V.; Di Piano, A.; Giuliani, A.; Longo, F.; Verrecchia, F.; Tavani, M.; Beneventano, D.; Macaluso, A.


2021 - LigAdvisor: A versatile and user-friendly web-platform for drug design [Journal article]
Pinzi, L.; Tinivella, A.; Gagliardelli, L.; Beneventano, D.; Rastelli, G.
abstract

Although several tools facilitating in silico drug design are available, their results are usually difficult to integrate with publicly available information or require further processing to be fully exploited. The rational design of multi-target ligands (polypharmacology) and the repositioning of known drugs towards unmet therapeutic needs (drug repurposing) have attracted increasing attention in drug discovery, although they usually require careful planning of tailored drug design strategies. Computational tools and data-driven approaches can help to reveal novel valuable opportunities in these contexts, as they enable efficient mining of publicly available chemical, biological, clinical, and disease-related data. Based on these premises, we developed LigAdvisor, a data-driven webserver which integrates information reported in DrugBank, Protein Data Bank, UniProt, Clinical Trials and Therapeutic Target Database into an intuitive platform, to facilitate drug discovery tasks such as drug repurposing, polypharmacology, target fishing and profiling. As designed, LigAdvisor enables easy integration of similarity estimation results with clinical data, thereby allowing a more efficient exploitation of information in different drug discovery contexts. Users can also develop customizable drug design tasks on their own molecules, by means of ligand- and target-based search modes, and download their results. LigAdvisor is publicly available at https://ligadvisor.unimore.it/.


2021 - The Case for Multi-task Active Learning Entity Resolution [Conference paper]
Simonini, Giovanni; Saccani, Henrique; Gagliardelli, Luca; Zecchini, Luca; Beneventano, Domenico; Bergamaschi, Sonia


2020 - BLAST2: An Efficient Technique for Loose Schema Information Extraction from Heterogeneous Big Data Sources [Journal article]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni
abstract

We present BLAST2, a novel technique to efficiently extract loose schema information, i.e., metadata that can serve as a surrogate of the schema alignment task within the Entity Resolution (ER) process (identifying records that refer to the same real-world entity) when integrating multiple, heterogeneous and voluminous data sources. The loose schema information can be extracted from both structured and unstructured data sources and is exploited to reduce the overall complexity of ER, whose naïve solution would require O(n^2) comparisons, where n is the number of entity representations involved in the process. BLAST2 is completely unsupervised yet able to achieve almost the same precision and recall as supervised state-of-the-art schema alignment techniques when employed for Entity Resolution tasks, as shown in our experimental evaluation performed on two real-world data sets (composed of 7 and 10 data sources, respectively).
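To make the O(n^2) cost mentioned in this abstract concrete, here is a minimal sketch contrasting the naïve all-pairs comparison with plain schema-agnostic token blocking, a generic way to cut the number of candidate pairs. This is not BLAST2 and does not use loose schema information; the function names and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def naive_pairs(records):
    """Every unordered pair of record ids: n * (n - 1) / 2 comparisons."""
    return set(combinations(sorted(records), 2))


def token_blocking_pairs(records):
    """Compare only records that share at least one token (one block per token)."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in set(text.lower().split()):
            blocks[token].add(rid)

    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy records, invented for illustration.
records = {
    1: "apple iphone 12",
    2: "iphone 12 apple",
    3: "samsung galaxy s21",
    4: "galaxy s21 samsung",
    5: "nokia 3310",
}

print(len(naive_pairs(records)), "naive comparisons")
print(sorted(token_blocking_pairs(records)), "after token blocking")
```

On this toy input, blocking keeps only the pairs that share a token, which is the kind of reduction that blocking and meta-blocking techniques aim for at scale.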


2019 - Computing inter-document similarity with Context Semantic Analysis [Journal article]
Beneventano, Domenico; Benedetti, Fabio; Bergamaschi, Sonia; Simonini, Giovanni
abstract

We propose a novel knowledge-based technique for inter-document similarity computation, called Context Semantic Analysis (CSA). Several specialized approaches built on top of specific knowledge bases (e.g. Wikipedia) exist in the literature, but CSA differs from them because it is designed to be portable to any RDF knowledge base. Our technique relies on a generic RDF knowledge base (e.g. DBpedia and Wikidata) to extract from it a contextual graph and a semantic contextual vector able to represent the context of a document. We show how CSA exploits such a Semantic Context Vector to compute inter-document similarity effectively. Moreover, we show how CSA can be effectively applied in the Information Retrieval domain. Experimental results show that our general technique outperforms baselines built on top of traditional methods, and achieves a performance similar to that of methods built on top of specific knowledge bases.


2019 - Entity Resolution and Data Fusion: An Integrated Approach [Conference paper]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Simonini, Giovanni


2019 - Foreword to the Special Issue: "Semantics for Big Data Integration" [Journal article]
Beneventano, Domenico; Vincini, Maurizio
abstract

In recent years, a great deal of interest has been shown toward big data. Much of the work on big data has focused on volume and velocity in order to consider dataset size. Indeed, the problems of variety, velocity, and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data, where semantic technologies can be explored to deal with these issues. This Special Issue aims at discussing emerging approaches from academic and industrial stakeholders for disseminating innovative solutions that explore how big data can leverage semantics, for example, by examining the challenges and opportunities arising from adapting and transferring semantic technologies to the big data context.


2019 - Parallelizing computations of full disjunctions [Journal article]
Paganelli, Matteo; Beneventano, Domenico; Guerra, Francesco; Sottovia, Paolo
abstract

In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. The use of full disjunctions has been envisaged in several scenarios, such as data integration and knowledge extraction. One of the main limitations to its adoption in real business scenarios is the long time its computation requires. This paper overcomes this limitation by introducing a novel approach, parafd, based on parallel computing techniques, for implementing the full disjunction operator in an exact and an approximate version. Our proposal has been compared with state-of-the-art algorithms, which have also been re-implemented to run in parallel. The experiments show that it outperforms existing approaches in execution time. Finally, we have experimented with treating the full disjunction as a collection of documents indexed by a textual search engine. In this way, we provide a simple technique for performing keyword search over relational databases. The results obtained against a benchmark show high precision and recall levels even compared with the existing proposals.


2019 - SparkER: Scaling Entity Resolution in Spark [Conference paper]
Gagliardelli, Luca; Simonini, Giovanni; Beneventano, Domenico; Bergamaschi, Sonia
abstract

We present SparkER, an ER tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full advantage of parallel and distributed computation as well (running on top of Apache Spark). The first SparkER version focused on the blocking step and implemented both schema-agnostic and Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI for SparkER was developed to let non-expert users use it in an unsupervised mode. The new version of SparkER, to be shown in this demo, significantly extends the tool. Entity Matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added. The user can be assisted in supervising the entire process and in injecting their knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.


2018 - How improve Set Similarity Join based on prefix approach in distributed environment [Conference paper]
Zhu, Song; Gagliardelli, Luca; Simonini, Giovanni; Beneventano, Domenico
abstract

Set similarity join is an essential operation in data integration and big data analytics that finds similar pairs of records, where the records contain string or set-based data. To cope with the increasing scale of the data, several techniques have been proposed to perform set similarity joins using distributed frameworks, such as the MapReduce framework. In particular, Vernica et al. [3] proposed a MapReduce implementation of the so-called PPJoin algorithm [2], which a recent study experimentally demonstrated to be one of the best set similarity join algorithms [4]. These techniques, however, usually produce huge amounts of duplicates in order to perform parallel processing successfully. The large number of duplicates incurs both a large shuffle cost and an unnecessary computation cost, which significantly decrease performance. Moreover, these approaches do not provide a load balancing guarantee, which results in a skewness problem and negatively affects the scalability of these techniques. To address these problems, in this paper we propose a duplicate-free framework, called TTJoin, to perform set similarity joins efficiently by utilizing an innovative filter based on prefix tokens, and we implement it with one of the most popular distributed frameworks, i.e., Apache Spark. Experiments on real-world datasets demonstrate the effectiveness of the proposed solution with respect to both traditional PPJoin and the MapReduce implementation proposed in [3].
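The prefix-based filtering that this abstract builds on can be sketched as follows. This is a minimal, single-machine illustration of the generic prefix filter for a Jaccard threshold, as used by PPJoin-style algorithms; it is not TTJoin, it is not distributed, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(sets, threshold):
    """Return pairs (i, j), i < j, whose Jaccard similarity is >= threshold.

    Tokens are ordered globally by ascending frequency. Two sets can reach
    the threshold only if their prefixes of length
    |x| - ceil(threshold * |x|) + 1 share a token, so only those pairs
    are verified.
    """
    freq = Counter(tok for s in sets.values() for tok in s)
    index = defaultdict(list)  # prefix token -> ids of sets already indexed
    results = set()

    for rid in sorted(sets):
        tokens = sorted(sets[rid], key=lambda tok: (freq[tok], tok))
        prefix_len = len(tokens) - math.ceil(threshold * len(tokens)) + 1

        candidates = set()
        for tok in tokens[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(rid)

        # Verification step: compute the exact similarity for candidates only.
        for other in candidates:
            if jaccard(sets[rid], sets[other]) >= threshold:
                results.add((other, rid))
    return results


# Toy input, invented for illustration.
sets = {
    1: {"a", "b", "c", "d"},
    2: {"a", "b", "c", "e"},
    3: {"x", "y", "z"},
    4: {"x", "y", "z", "w"},
}
print(sorted(prefix_filter_join(sets, 0.6)))
```

Ordering tokens by ascending frequency keeps rare tokens in the prefixes, which tends to keep the candidate lists short. A production implementation would also guard the `ceil` computation against floating-point rounding.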


2017 - Analyzing mappings and properties in Data Warehouse integration [Journal article]
Beneventano, Domenico; Olaru, Marius Octavian; Vincini, Maurizio
abstract

The information inside the Data Warehouse (DW) is used to take strategic decisions inside the organization, which is why data quality plays a crucial role in guaranteeing the correctness of the decisions. Data quality also becomes a major issue when integrating information from two or more heterogeneous DWs. In the present paper, we perform an extensive analysis of a mapping-based DW integration methodology and of its properties. In particular, we prove that the proposed methodology guarantees coherency, while in certain cases it is also able to maintain soundness and consistency. Moreover, intra-schema homogeneity is discussed and analysed as a necessary condition for summarizability and for optimization by materializing views of dependent queries.


2017 - Data exploration on large amount of relational data through keyword queries [Conference paper]
Beneventano, Domenico; Guerra, Francesco; Velegrakis, Yannis
abstract

The paper describes a new approach for querying relational databases through keyword search by exploiting Information Retrieval (IR) techniques. When users do not know the structure and the content of a relational database, keyword search becomes the only efficient and effective way to let them explore it. The approach is based on a unified view of the database relations (computed through the full disjunction operator), whose tuples are treated as documents to be indexed and searched by means of an IR search engine. Moreover, as happens in relational databases, the system can merge the data stored in different documents to provide a complete answer to the user. In particular, two documents can be joined either because their tuples in the original database share some Primary Key or because, again in the original database, some of their tuples are connected by a Primary / Foreign Key relation. The paper introduces our preliminary proposal, the description of the tabular data structure for storing and retrieving the possible connections among the documents, and a metric for scoring the results.


2017 - From Data Integration to Big Data Integration [Book chapter/Essay]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca
abstract

The Database Group (DBGroup, www.dbgroup.unimore.it) and Information System Group (ISGroup, www.isgroup.unimore.it) research activities have been mainly devoted to the Data Integration Research Area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semistructured data sources and allows a user to pose a single query and to receive a single unified answer. Description Logics, Automatic Annotation of schemata plus clustering techniques constitute the theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are the Peer Data Management Systems and their evolutions toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss and solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been addressed within European and national projects: automatic annotation, mapping discovery, global query processing, provenance, multi-dimensional information integration, and keyword search. With the incoming new requirements of integrating open linked data, textual and multimedia data in a big data scenario, the research has been devoted to the Big Data Integration Research Area. In particular, the most relevant research results achieved are: a scalable entity resolution method, a scalable join operator and a tool, LODEX, for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DATARIVER, Data Integration was successfully applied to smart e-health.


2016 - Context Semantic Analysis: A Knowledge-Based Technique for Computing Inter-document Similarity [Conference paper]
Bergamaschi, Sonia; Beneventano, Domenico; Benedetti, Fabio
abstract

We propose a novel knowledge-based technique for inter-document similarity, called Context Semantic Analysis (CSA). Several specialized approaches built on top of specific knowledge bases (e.g. Wikipedia) exist in the literature, but CSA differs from them because it is designed to be portable to any RDF knowledge base. Our technique relies on a generic RDF knowledge base (e.g. DBpedia and Wikidata) to extract from it a vector able to represent the context of a document. We show how such a Semantic Context Vector can be effectively exploited to compute inter-document similarity. Experimental results show that our general technique outperforms baselines built on top of traditional methods, and achieves a performance similar to that of specialized methods.


2016 - Driving Innovation in Youth Policies With Open Data [Conference paper]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura
abstract

In December 2007, thirty activists held a meeting in California to define the concept of open public data. For the first time, eight Open Government Data (OPG) principles were settled: OPG should be Complete, Primary (reporting data at a high level of granularity), Timely, Accessible, Machine processable, Non-discriminatory, Non-proprietary, and License-free. Since the inception of the Open Data philosophy, there has been a constant increase in the information released, improving the communication channel between public administrations and their citizens. Open data offers governments, companies and citizens information to make better decisions. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a research project on Open Data for Youth Policies. The project was devoted to exploring the youth situation in the municipalities and provinces of the Emilia Romagna region (Italy), in particular to examining data on population, education and work. We identified interesting data sources both from the open data community and from the private repositories of local governments related to the Youth Policies. The selected sources have been integrated, and the result of the integration has been presented by means of a useful navigator tool. Finally, we published new information on the web as Linked Open Data. Since the process applied and the tools used are generic, we trust this paper to be an example and a guide for new projects that aim to create new knowledge through Open Data.


2016 - Exploiting Semantics for Searching Agricultural Bibliographic Data [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Martoglia, Riccardo
abstract

Filtering and search mechanisms that make it possible to identify key bibliographic references are fundamental for researchers. In this paper we propose a fully automatic and semantic method for filtering/searching bibliographic data, which allows users to look for information by specifying simple keyword queries or document queries, i.e. by simply submitting existing documents to the system. The limitations of standard techniques, based either on syntactical text search or on manually assigned descriptors, are overcome by considering the semantics intrinsically associated to the document/query terms; to this aim, we exploit different kinds of external knowledge sources (both general and specific domain dictionaries or thesauri). The proposed techniques have been developed and successfully tested for agricultural bibliographic data, which play a central role in enabling researchers and policy makers to retrieve related agricultural and scientific information by using the AGROVOC thesaurus.


2015 - Managing the Process of Segmentation on the Mobile Phone Subscribers [Articolo su rivista]
Rodrigue Carlos, Nana Mbinkeu; Beneventano, Domenico
abstract

Most telecommunications providers possess a remarkable amount of data about their subscribers. The knowledge that can be discovered in the databases of telecommunications providers is vital to understanding subscriber behavior. This is the task of subscriber segmentation, which identifies and selects the subscribers most likely to respond favorably to offers. Our paper proposes a set of techniques to analyze and design tools that manage the process of data acquisition, data cleaning and selection of the segmentation algorithm.


2015 - MDP, a database linking drug response data to genomic information, identifies dasatinib and statins as a combinatorial strategy to inhibit YAP/TAZ in cancer cells [Articolo su rivista]
Taccioli, Cristian; Sorrentino, Giovanni; Zannini, Alessandro; Caroli, Jimmy; Beneventano, Domenico; Anderlucci, Laura; Lolli, Marco; Bicciato, Silvio; Del Sal, Giannino
abstract

Targeted anticancer therapies represent the most effective pharmacological strategies in terms of clinical responses. In this context, genetic alteration of several oncogenes represents an optimal predictor of response to targeted therapy. Integration of large-scale molecular and pharmacological data from cancer cell lines promises to be effective in the discovery of new genetic markers of drug sensitivity and of clinically relevant anticancer compounds. To define novel pharmacogenomic dependencies in cancer, we created the Mutations and Drugs Portal (MDP, http://mdp.unimore.it), a web accessible database that combines the cell-based NCI60 screening of more than 50,000 compounds with genomic data extracted from the Cancer Cell Line Encyclopedia and the NCI60 DTP projects. MDP can be queried for drugs active in cancer cell lines carrying mutations in specific cancer genes or for genetic markers associated to sensitivity or resistance to a given compound. As proof of performance, we interrogated MDP to identify both known and novel pharmacogenomics associations and unveiled an unpredicted combination of two FDA-approved compounds, namely statins and Dasatinib, as an effective strategy to potently inhibit YAP/TAZ in cancer cells.


2015 - MOMIS Goes Multimedia: WINDSURF and the Case of Top-K Queries [Relazione in Atti di Convegno]
Bartolini, Iaria; Beneventano, Domenico; Bergamaschi, Sonia; Ciaccia, Paolo; Corni, Alberto; Orsini, Mirko; Patella, Marco; Santese, MARCO MARIA
abstract

In a scenario with “traditional” and “multimedia” data sources, this position paper discusses the following question: “How can a multimedia local source (e.g., Windsurf) supporting ranking queries be integrated into a mediator system without such capabilities (e.g., MOMIS)?” More precisely, “How to support ranking queries coming from a multimedia local source within a mediator system with a “traditional” query processor based on an SQL-engine?” We first describe a naïve approach for the execution of range and Top-K global queries where the MOMIS query processing method remains substantially unchanged, but, in the case of Top-K queries, it does not guarantee to obtain K results. We then discuss two alternative modalities for allowing MOMIS to return the Top-K best results of a global query.


2015 - Multilingual Word Sense Induction to Improve Web Search Result Clustering [Relazione in Atti di Convegno]
Albano, Lorenzo; Beneventano, Domenico; Bergamaschi, Sonia
abstract

In [12] a novel approach to Web search result clustering based on Word Sense Induction, i.e. the automatic discovery of word senses from raw text, was presented; key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the word senses induced. In [1] we proposed an innovative Word Sense Induction method based on multilingual data; key to our approach was the idea that a multilingual context representation, where the context of the words is expanded by considering its translations in different languages, may improve the WSI results; the experiments showed a clear performance gain. In this paper we give some preliminary ideas to exploit our multilingual Word Sense Induction method to Web search result clustering.


2015 - Multilingual Word Sense Induction to Improve Web Search Result Clustering [Relazione in Atti di Convegno]
Albano, Lorenzo; Beneventano, Domenico; Bergamaschi, Sonia
abstract

In [13] a novel approach to Web search result clustering based on Word Sense Induction, i.e. the automatic discovery of word senses from raw text, was presented; key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the word senses induced. In [1] we proposed an innovative Word Sense Induction method based on multilingual data; key to our approach was the idea that a multilingual context representation, where the context of the words is expanded by considering its translations in different languages, may improve the WSI results; the experiments showed a clear performance gain. In this paper we give some preliminary ideas to exploit our multilingual Word Sense Induction method to Web search result clustering.


2015 - Open Data for Improving Youth Policies [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura
abstract

The Open Data philosophy is based on the idea that certain data should be made available to all citizens, in an open form, without any copyright restrictions, patents or other mechanisms of control. Various governments have started to publish open data, first of all the USA and the UK in 2009, and in 2015 the Open Data Barometer project (www.opendatabarometer.org) states that of 77 diverse states across the world, over 55 percent have developed some form of Open Government Data initiative. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a one-year research project on Open Data for Youth Policies. The project was mainly devoted to exploring the youth situation in the municipalities and provinces of the Emilia Romagna region (Italy), in particular to examining data on population, education and work. The project goals were: to identify interesting data sources both from the open data community and from the private repositories of local governments of the Emilia Romagna region related to the Youth Policies; to integrate them and to present the result of the integration by means of a navigator tool; in the end, to publish new information on the web as Linked Open Data. This paper also reports the main issues encountered that may seriously affect the entire process, from consumption and integration to the publication of open data.


2015 - Semantic Annotation of the CEREALAB database by the AGROVOC Linked Dataset [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Serena, Sorrentino; Vincini, Maurizio; Benedetti, Fabio
abstract

Nowadays, there has been an increase in open data government initiatives promoting the idea that particular data should be freely published. However, the great majority of these resources is published in an unstructured format and is typically accessed only by closed communities. Starting from these considerations, in a previous work related to a youth precariousness dataset, we proposed an experimental and preliminary methodology for facilitating resource providers in publishing public data into the Linked Open Data (LOD) cloud, and for helping consumers (companies and citizens) in efficiently accessing and querying them. Linked Open Data play a central role in accessing and analyzing the rapidly growing pool of life science data and, as discussed in recent meetings, it is important for data source providers themselves to make their resources available as Linked Open Data. In this paper we extend and apply our methodology to the agricultural domain, i.e. to the CEREALAB database, created to store both genotypic and phenotypic data and specifically designed for plant breeding, in order to provide its publication into the LOD cloud.


2015 - The on-site analysis of the Cherenkov Telescope Array [Relazione in Atti di Convegno]
Bulgarelli, Andrea; Fioretti, Valentina; Zoli, Andrea; Aboudan, Alessio; Rodríguez Vázquez, Juan José; De Cesare, Giovanni; De Rosa, Adriano; Maier, Gernot; Lyard, Etienne; Bastieri, Denis; Lombardi, Saverio; Tosti, Gino; Bergamaschi, Sonia; Beneventano, Domenico; Lamanna, Giovanni; Jacquemier, Jean; Kosack, Karl; Angelo Antonelli, Lucio; Boisson, Catherine; Borkowski, Jerzy; Buson, Sara; Carosi, Alessandro; Conforti, Vito; Colomé, Pep; De Los Reyes, Raquel; Dumm, Jon; Evans, Phil; Fortson, Lucy; Fuessling, Matthias; Gotz, Diego; Graciani, Ricardo; Gianotti, Fulvio; Grandi, Paola; Hinton, Jim; Humensky, Brian; Inoue, Susumu; Knödlseder, Jürgen; Le Flour, Thierry; Lindemann, Rico; Malaguti, Giuseppe; Markoff, Sera; Marisaldi, Martino; Neyroud, Nadine; Nicastro, Luciano; Ohm, Stefan; Osborne, Julian; Oya, Igor; Rodriguez, Jerome; Rosen, Simon; Ribo, Marc; Tacchini, Alessandro; Schüssle, Fabian; Stolarczyk, Thierry; Torresi, Eleonora; Testa, Vincenzo; Wegner, Peter; Weinstein, Amanda
abstract

The Cherenkov Telescope Array (CTA) observatory will be one of the largest ground-based very-high-energy gamma-ray observatories. The On-Site Analysis will be the first CTA scientific analysis of data acquired from the array of telescopes, in both northern and southern sites. The On-Site Analysis will have two pipelines: the Level-A pipeline (also known as Real-Time Analysis, RTA) and the Level-B one. The RTA performs data quality monitoring and must be able to issue automated alerts on variable and transient astrophysical sources within 30 seconds from the last acquired Cherenkov event that contributes to the alert, with a sensitivity not worse than the one achieved by the final pipeline by more than a factor of 3. The Level-B Analysis has a better sensitivity (not worse than the final one by a factor of 2) and the results should be available within 10 hours from the acquisition of the data: for this reason this analysis could be performed at the end of an observation or next morning. The latency (in particular for the RTA) and the sensitivity requirements are challenging because of the large data rate, a few GByte/s. The remote connection to the CTA candidate site with a rather limited network bandwidth makes the issue of the exported data size extremely critical and prevents any kind of processing in real-time of the data outside the site of the telescopes. For these reasons the analysis will be performed on-site with infrastructures co-located with the telescopes, with limited electrical power availability and with a reduced possibility of human intervention. This means, for example, that the on-site hardware infrastructure should have low-power consumption. A substantial effort towards the optimization of high-throughput computing service is envisioned to provide hardware and software solutions with high-throughput, low-power consumption at a low-cost.
This contribution provides a summary of the design of the on-site analysis and reports some prototyping activities.


2014 - A prototype for the real-time analysis of the Cherenkov Telescope Array [Relazione in Atti di Convegno]
Andrea, Bulgarelli; Valentina, Fioretti; Andrea, Zoli; Alessio, Aboudan; Juan José Rodríguez, Vázquez; Gernot, Maier; Etienne, Lyard; Denis, Bastieri; Saverio, Lombardi; Gino, Tosti; Adriano De, Rosa; Bergamaschi, Sonia; Matteo, Interlandi; Beneventano, Domenico; Giovanni, Lamanna; Jean, Jacquemier; Karl, Kosack; Lucio Angelo, Antonelli; Catherine, Boisson; Jerzy, Burkowski; Sara, Buson; Alessandro, Carosi; Vito, Conforti; Jose Luis, Contreras; Giovanni De, Cesare; Raquel de los, Reyes; Jon, Dumm; Phil, Evans; Lucy, Fortson; Matthias, Fuessling; Ricardo, Graciani; Fulvio, Gianotti; Paola, Grandi; Jim, Hinton; Brian, Humensky; Jürgen, Knödlseder; Giuseppe, Malaguti; Martino, Marisaldi; Nadine, Neyroud; Luciano, Nicastro; Stefan, Ohm; Julian, Osborne; Simon, Rosen; Alessandro, Tacchini; Eleonora, Torresi; Vincenzo, Testa; Massimo, Trifoglio; Amanda, Weinstein
abstract

The Cherenkov Telescope Array (CTA) observatory will be one of the biggest ground-based very-high-energy (VHE) γ-ray observatories. CTA will achieve a factor of 10 improvement in sensitivity from some tens of GeV to beyond 100 TeV with respect to existing telescopes. The CTA observatory will be capable of issuing alerts on variable and transient sources to maximize the scientific return. To capture these phenomena during their evolution and for effective communication to the astrophysical community, speed is crucial. This requires a system with a reliable automated trigger that can issue alerts immediately upon detection of γ-ray flares. This will be accomplished by means of a Real-Time Analysis (RTA) pipeline, a key system of the CTA observatory. The latency and sensitivity requirements of the alarm system impose a challenge because of the anticipated large data rate, between 0.5 and 8 GB/s. As a consequence, substantial efforts toward the optimization of high-throughput computing service are envisioned. For these reasons our working group has started the development of a prototype of the Real-Time Analysis pipeline. The main goals of this prototype are to test: (i) a set of frameworks and design patterns useful for the inter-process communication between software processes running on memory; (ii) the sustainability of the foreseen CTA data rate in terms of data throughput with different hardware (e.g. accelerators) and software configurations; (iii) the reuse of non-real-time algorithms or how much we need to simplify algorithms to be compliant with CTA requirements; (iv) interface issues between the different CTA systems. In this work we focus on goals (i) and (ii). © (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.


2014 - PROVENANCE-AWARE SEMANTIC SEARCH ENGINES BASED ON DATA INTEGRATION SYSTEMS [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

Search engines are common tools for virtually every user of the Internet and companies, such as Google and Yahoo!, have become household names. Semantic Search Engines try to augment and improve traditional Web Search Engines by using not just words, but concepts and logical relationships. Given the openness of the Web and the different sources involved, a Web Search Engine must evaluate quality and trustworthiness of the data; a common approach for such assessments is the analysis of the provenance of information. In this paper a relevant class of Provenance-aware Semantic Search Engines, based on a peer-to-peer, data integration mediator-based architecture is described. The architectural and functional features are an enhancement with provenance of the SEWASIE semantic search engine developed within the IST EU SEWASIE project, coordinated by the authors. The methodology to create a two level ontology and the query processing engine developed within the SEWASIE project, together with provenance extension are fully described.


2014 - THE AGILE ALERT SYSTEM FOR GAMMA-RAY TRANSIENTS [Articolo su rivista]
A., Bulgarelli; M., Trifoglio; F., Gianotti; M., Tavani; Parmiggiani, Nicolò; V., Fioretti; A. W., Chen; S., Vercellone; C., Pittori; F., Verrecchia; F., Lucarelli; P., Santolamazza; G., Fanari; P., Giommi; Beneventano, Domenico; A., Argan; A., Trois; E., Scalise; F., Longo; A., Pellizzoni; G., Pucella; S., Colafrancesco; V., Conforti; P., Tempesta; M., Cerone; P., Sabatini; G., Annoni; G., Valentini; L., Salotti
abstract

In recent years, a new generation of space missions has offered great opportunities for discovery in high-energy astrophysics. In this article we focus on the scientific operations of the Gamma-Ray Imaging Detector (GRID) on board the AGILE space mission. AGILE-GRID, sensitive in the energy range of 30 MeV–30 GeV, has detected many γ-ray transients of both galactic and extragalactic origin. This work presents the AGILE innovative approach to fast γ-ray transient detection, which is a challenging task and a crucial part of the AGILE scientific program. The goals are to describe (1) the AGILE Gamma-Ray Alert System, (2) a new algorithm for blind search identification of transients within a short processing time, (3) the AGILE procedure for γ-ray transient alert management, and (4) the likelihood ratio tests that are necessary to evaluate the post-trial statistical significance of the results. Special algorithms and an optimized sequence of tasks are necessary to reach our goal. Data are automatically analyzed at every orbital downlink by an alert pipeline operating on different timescales. As proper flux thresholds are exceeded, alerts are automatically generated and sent as SMS messages to cellular telephones, via e-mail, and via push notifications from an application for smartphones and tablets. These alerts are crosschecked with the results of two pipelines, and a manual analysis is performed. Being a small scientific-class mission, AGILE is characterized by optimization of both scientific analysis and ground-segment resources. The system is capable of generating alerts within two to three hours of a data downlink, an unprecedented reaction time in γ-ray astrophysics.


2014 - Word Sense Induction with Multilingual Features Representation [Relazione in Atti di Convegno]
Lorenzo, Albano; Beneventano, Domenico; Bergamaschi, Sonia
abstract

The use of word senses in place of surface word forms has been shown to improve performance on many computational tasks, including intelligent web search. In this paper we propose a novel approach to automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Almost all the WSI approaches proposed in the literature deal with monolingual data and only very few proposals incorporate bilingual data. The WSI method we propose is innovative as it uses multilingual data to perform WSI of words in a given language. The experiments show a clear overall improvement of the performance: the single-language setting is outperformed by the multi-language settings on almost all the considered target words. The performance gain, in terms of F-Measure, has an average value of 5% and in some cases it reaches 40%.


2013 - A mediator-based approach for integrating heterogeneous multimedia sources [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; C., Gennaro; F., Rabitti
abstract

In many applications, the information required by the user cannot be found in just one source, but has to be retrieved from many varying sources. This is true not only of formatted data in database management systems, but also of textual documents and multimedia data, such as images and videos. We propose a mediator system that provides the end-user with a single query interface to an integrated view of multiple heterogeneous data sources. We exploit the capabilities of the MOMIS integration system and the MILOS multimedia data management system. Each multimedia source is managed by an instance of MILOS, in which a collection of multimedia records is made accessible by means of similarity searches employing the query-by-example paradigm. MOMIS provides an integrated virtual view of the underlying multimedia sources, thus offering unified multimedia access services. Two features are that MILOS is flexible (it is not tied to any particular similarity function) and that the MOMIS mediator query processor only exploits the ranks of the local answers.


2013 - Analyzing Dimension Mappings and Properties in Data Warehouse Integration. On the Move to Meaningful Internet Systems: OTM 2013 Conferences [Relazione in Atti di Convegno]
Beneventano, Domenico; Olaru, MARIUS OCTAVIAN; Vincini, Maurizio
abstract

ud, and ODBASE 2013


2013 - Semantic Annotation and Publication of Linked Open Data [Relazione in Atti di Convegno]
Sorrentino, Serena; Bergamaschi, Sonia; Elisa, Fusari; Beneventano, Domenico
abstract

Nowadays, there has been an increment of open data government initiatives promoting the idea that particular data produced by public administrations (such as public spending, health care, education etc.) should be freely published. However, the great majority of these resources is published in an unstructured format (such as spreadsheets or CSV) and is typically accessed only by closed communities. Starting from these considerations, we propose a semi-automatic experimental methodology for facilitating resource providers in publishing public data into the Linked Open Data (LOD) cloud, and for helping consumers (companies and citizens) in efficiently accessing and querying them. We present a preliminary method for publishing, linking and semantically enriching open data by performing automatic semantic annotation of schema elements. The methodology has been applied on a set of data provided by the Research Project on Youth Precariousness, of the Modena municipality, Italy.


2013 - Semantic annotation and publication of linked open data [Relazione in Atti di Convegno]
Sorrentino, S.; Bergamaschi, S.; Fusari, E.; Beneventano, D.
abstract

Nowadays, there has been an increment of open data government initiatives promoting the idea that particular data produced by public administrations (such as public spending, health care, education etc.) should be freely published. However, the great majority of these resources is published in an unstructured format (such as spreadsheets or CSV) and is typically accessed only by closed communities. Starting from these considerations, we propose a semi-automatic experimental methodology for facilitating resource providers in publishing public data into the Linked Open Data (LOD) cloud, and for helping consumers (companies and citizens) in efficiently accessing and querying them. We present a preliminary method for publishing, linking and semantically enriching open data by performing automatic semantic annotation of schema elements. The methodology has been applied on a set of data provided by the Research Project on Youth Precariousness, of the Modena municipality, Italy. © 2013 Springer-Verlag Berlin Heidelberg.


2013 - Semantic Annotation of the CEREALAB Database by the AGROVOC Linked Dataset [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Sorrentino, Serena
abstract

The objective of the CEREALAB database is to help the breeders in choosing molecular markers associated to the most important traits. Phenotypic and genotypic data obtained from the integration of open source databases with the data obtained by the CEREALAB project are made available to the users. The CEREALAB database has been and is currently extensively used within the frame of the CEREALAB project. This paper presents the main achievements and the ongoing research to annotate the CEREALAB database and to publish it in the Linking Open Data network, in order to facilitate breeders and geneticists in searching and exploiting linked agricultural resources. One of the main focus of this paper is to discuss the use of the AGROVOC Linked Dataset both to annotate the CEREALAB schema and to discover schema-level mappings among the CEREALAB Dataset and other resources of the Linking Open Data network, such as NALT, the National Agricultural Library Thesaurus, and DBpedia.


2013 - Semantic annotation of the CEREALAB database by the AGROVOC linked dataset [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Sorrentino, S.
abstract

The objective of the CEREALAB database is to help the breeders in choosing molecular markers associated to the most important traits. Phenotypic and genotypic data obtained from the integration of open source databases with the data obtained by the CEREALAB project are made available to the users. The CEREALAB database has been and is currently extensively used within the frame of the CEREALAB project. This paper presents the main achievements and the ongoing research to annotate the CEREALAB database and to publish it in the Linking Open Data network, in order to facilitate breeders and geneticists in searching and exploiting linked agricultural resources. One of the main focus of this paper is to discuss the use of the AGROVOC Linked Dataset both to annotate the CEREALAB schema and to discover schema-level mappings among the CEREALAB Dataset and other resources of the Linking Open Data network, such as NALT, the National Agricultural Library Thesaurus, and DBpedia. © 2013 Springer-Verlag Berlin Heidelberg.


2013 - Semantic Integration of heterogeneous data sources in the MOMIS Data Transformation System [Articolo su rivista]
Vincini, Maurizio; Bergamaschi, Sonia; Beneventano, Domenico
abstract

In the last twenty years, many data integration systems following a classical wrapper/mediator architecture and providing a Global Virtual Schema (a.k.a. Global Virtual View - GVV) have been proposed by the research community. The main issues faced by these approaches range from system-level heterogeneities, to structural and syntax-level heterogeneities, to heterogeneities at the semantic level. Despite the research effort, all the approaches proposed require a lot of user intervention for customizing and managing the data integration and reconciliation tasks. In some cases, the effort and the complexity of the task are huge, since it requires the development of specific program code. Unfortunately, due to the specificity to be addressed, application code and solutions are not frequently reusable in other domains. For this reason, the Lowell Report 2005 has provided the guideline for the definition of a public benchmark for the information integration problem. The proposal, called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches), focuses on how data integration systems manage syntactic and semantic heterogeneities, which definitely are the greatest technical challenges in the field. We developed a Data Transformation System (DTS) that supports data transformation functions and produces query translations in order to push the execution down to the sources. Our DTS is based on MOMIS, a mediator-based data integration system that our research group has been developing and supporting since 1999. In this paper, we show how the DTS is able to solve all the twelve queries of the THALIA benchmark by using a simple combination of declarative translation functions already available in the standard SQL language. We think that this is a remarkable result, mainly for two reasons: firstly, to the best of our knowledge there is no system that has provided a complete answer to the benchmark; secondly, our queries do not require any overhead of new code.


2012 - Agents and Peer-to-Peer Computing: 7th International Workshop, AP2PC 2008, Estoril, Portugal, May 2008 and 8th International Workshop, AP2PC 2009, Budapest, Hungary, May 2009, Revised Selected Papers [Curatela]
Beneventano, Domenico; Zoran, Despotovic; Guerra, Francesco; Sam, Joseph; Gianluca, Moro; Adrián Perreau de, Pinninck
abstract

7th International Workshop, AP2PC 2008, Estoril, Portugal, May 13, 2008 and 8th International Workshop, AP2PC 2009, Budapest, Hungary, May 11, 2009, Revised Selected Papers


2012 - FACIT-SME - Facilitate IT-providing SMEs by Operation-related Models and Methods [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Martoglia, Riccardo
abstract

The FACIT SME project addresses SMEs operating in the ICT domain. The goals are (a) to facilitate the use of Software Engineering (SE) methods and to systematize their application integrated with the business processes, (b) to provide efficient and affordable certification of these processes according to internationally accepted standards, and (c) to securely share best practices, tools and experiences with development partners and customers. The project targets (1) to develop a novel Open Reference Model (ORM) for ICT SME, serving as knowledge backbone in terms of procedures, documents, tools and deployment methods; (2) to develop a customisable Open Source Enactment System (OSES) that provides IT support for the project-specific application of the ORM; and (3) to evaluate these developments with 5 ICT SMEs by establishing the ORM, the OSES and preparing the certifications. The approach combines and amends achievements from Model Generated Workplaces, Certification of SE for SMEs, and model-based document management. The consortium is shaped by 4 significant SME associations as well as a European association exclusively focused on the SME community in the ICT sector. Five R&D partners provide the required competences. Five SMEs operating in the ICT domain will evaluate the results in daily-life application. The major impact is expected for ICT SMEs by (a) optimising their processes based on best practise; (b) achieving internationally accepted certification; and (c) provision of structured reference knowledge. They will improve implementation projects and make their solutions more appealing to SMEs. ICT SME communities (organized by associations) will experience significant benefit through exchange of recent knowledge and best practises. By providing clear assets (ORM and OSES), the associations shape the service offering to their members and strengthen their community. The use of Open Source will further facilitate the spread of the results across European SMEs.


2012 - Integration and Provenance of Cereals Genotypic and Phenotypic Data [Poster]
Beneventano, Domenico; Bergamaschi, Sonia; Abdul Rahman, Dannaoui; Pecchioni, Nicola
abstract

This paper presents the ongoing research on the design and development of a Provenance Management component, PM_MOMIS, for the MOMIS Data Integration System. MOMIS has been developed by the DBGROUP of the University of Modena and Reggio Emilia (www.dbgroup.unimore.it). An open source version of the MOMIS system is delivered and maintained by the academic spin-off DataRiver (www.datariver.it). PM_MOMIS aims to provide the provenance management techniques supported by two of the most relevant data provenance systems, the "Perm" and "Trio" systems, and extends them by including the data fusion and conflict resolution techniques provided by MOMIS. PM_MOMIS functionalities have been studied and partially developed in the domain of genotypic and phenotypic cereal-data management within the CEREALAB project. The CEREALAB Data Integration Application integrates data coming from different databases with MOMIS, with the aim of creating a powerful tool for plant breeders and geneticists. Users of CEREALAB played a major role in the emergence of real needs of provenance management in their domain. We defined the provenance for the "full outerjoin-merge" operator, used in MOMIS to solve conflicts among values; this definition is based on the concept of "PI-CS-provenance" of the "Perm" system; we are using the "Perm" system as the SQL engine of MOMIS, so as to obtain the provenance in our CEREALAB Application. The main drawback of this solution is that conflicting values often represent alternatives; our proposal is therefore to consider the output of the "full outerjoin-merge" operator as an uncertain relation and manage it with a system that supports uncertain data and data lineage, the "Trio" system.


2012 - Integration and Provenance of Cereals Genotypic and Phenotypic Data [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Abdul Rahman, Dannaoui
abstract

This paper presents the ongoing research on the design and development of a Provenance Management component, PM_MOMIS, for the MOMIS Data Integration System. PM_MOMIS aims to provide the provenance management techniques supported by two of the most relevant data provenance systems, the Perm and Trio systems, and extends them by including the data fusion and conflict resolution techniques provided by MOMIS. PM_MOMIS functionalities have been studied and partially developed in the domain of genotypic and phenotypic cereal-data management within the CEREALAB project. The CEREALAB Data Integration Application integrates data coming from different databases with MOMIS, with the aim of creating a powerful tool for plant breeders and geneticists. Users of CEREALAB played a major role in the emergence of real needs of provenance management in their domain.


2012 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface [Relazione in Atti di Convegno]
Beneventano, D.; Despotovic, Z.; Guerra, F.; Joseph, S.; Moro, G.; De Pinninck, A. P.
abstract


2012 - Provenance Based Conflict Handling Strategies [Relazione in Atti di Convegno]
Beneventano, Domenico
abstract

A fundamental task in data integration is data fusion, the process of fusing multiple records representing the same real-world object into a consistent representation; data fusion involves the resolution of possible conflicts between data coming from different sources; several high level strategies to handle inconsistent data have been described and classified in [8]. The MOMIS Data Integration System [2] uses both conflict avoiding strategies (such as the trust your friends strategy, which takes the value of a preferred source) and resolution strategies (such as the meet in the middle strategy, which takes an average value). In this paper we consider other strategies proposed in the literature to handle inconsistent data and we discuss how they can be adopted and extended in the MOMIS Data Integration System. First of all, we consider the methods introduced by the Trio system [1,6], based on the idea of tackling data conflicts by explicitly including information on provenance to represent uncertainty and using it to answer queries. Other possible strategies are to ignore conflicting values at the global level (i.e., only consistent values are considered) and to consider at the global level all conflicting values. The original contribution of this paper is a provenance-based framework which includes all the above mentioned conflict handling strategies and uses them as different search strategies for querying the integrated sources.
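The four strategies named in this abstract can be illustrated with a minimal Python sketch. It is not the MOMIS or Trio implementation: the function names, the source names and the sample values are all hypothetical, chosen only to show how each strategy treats the same pair of conflicting values.

```python
from statistics import mean

# Hypothetical input: the values reported for one attribute of one
# real-world object, keyed by source name. All names are illustrative.

def trust_your_friends(values, preferred):
    """Conflict avoidance: take the value of the preferred source, if any."""
    return values.get(preferred)

def meet_in_the_middle(values):
    """Conflict resolution: take the average of the reported numeric values."""
    return mean(list(values.values()))

def only_consistent(values):
    """Ignore conflicts: return a value only when all sources agree."""
    distinct = set(values.values())
    return distinct.pop() if len(distinct) == 1 else None

def all_alternatives(values):
    """Keep every conflicting value, tagged with the sources reporting it."""
    by_value = {}
    for source, value in values.items():
        by_value.setdefault(value, []).append(source)
    return by_value

# Two sources disagree on the same attribute.
price = {"source_a": 10.0, "source_b": 14.0}
```

In this sketch `all_alternatives` keeps the source names next to each value, which is the simplest form of the provenance information the abstract refers to.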


2012 - RELEASE OF THE CEREALAB DATABASE V 2.0 [Abstract in Atti di Convegno]
Dannaoui, Abdul Rahman; Sala, Antonio; Beneventano, Domenico; Milc, Justyna Anna; Caffagni, Alessandra; Pecchioni, Nicola
abstract

The CEREALAB database is a web-based tool developed for wheat, barley and rice, to help breeders in choosing molecular markers associated with the most economically important phenotypic traits. It contains phenotypic and genotypic data obtained from the integration of available open source databases with the data obtained by the CEREALAB project. In this paper we describe several significant extensions to the CEREALAB database, derived from real needs of the end-user, the modern breeder who uses molecular tools. First, to offer breeders significant new data, the CEREALAB database was extended. Second, to improve and simplify access to the database, a new user-friendly Graphical User Interface (GUI) was developed. Third, to maximize and optimize the accessibility of the available information, new functionalities and additional tools were developed. In particular, to offer breeders an effective tool for the analysis of data, the possibility to obtain structured reports was introduced. Finally, to insert new data in the database, a new data entry module was implemented in the interface. Database URL: http://www.cerealab.org


2012 - The CEREALAB Database: Ongoing Research and Future Challenges [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Dannaoui, Abdul Rahman; Milc, Justyna Anna; Pecchioni, Nicola; Sorrentino, Serena
abstract

The objective of the CEREALAB database is to help breeders in choosing molecular markers associated with the most important traits. Phenotypic and genotypic data obtained from the integration of open source databases with the data obtained by the CEREALAB project are made available to the users. The first version of the CEREALAB database has been extensively used within the frame of the CEREALAB project. This paper presents the main achievements and the ongoing research related to the CEREALAB database. First, as a result of the extensive use of the CEREALAB database, several extensions and improvements to the web application user interface were introduced. Second, again derived from end-user needs, the notion of provenance was introduced and partially implemented in the context of the CEREALAB database. Third, we describe some preliminary ideas to annotate the CEREALAB database and to publish it in the Linking Open Data network.


2011 - Automatic Normalization and Annotation for Discovering Semantic Mappings [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Po, Laura; Sorrentino, Serena
abstract

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically show that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.


2011 - Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Guerra, Francesco; Orsini, Mirko
abstract

Given the many data integration approaches, a complete and exhaustive comparison of all the research activities is not possible. In this chapter, we will present an overview of the most relevant research activities and ideas in the field investigated in the last 20 years. We will also introduce the MOMIS system, a framework to perform information extraction and integration from both structured and semistructured data sources, that is one of the most interesting results of our research activity. An open source version of the MOMIS system was delivered by the academic startup DataRiver (www.datariver.it).


2011 - Data lineage in the MOMIS data fusion system [Relazione in Atti di Convegno]
Beneventano, Domenico; Abdul Rahman, Dannoui; Antonio, Sala
abstract

Data Lineage is an open research problem. This is particularly true in data integration systems, where information coming from different sources, potentially uncertain or even inconsistent with each other, is integrated. In this context, being able to trace the lineage of certain data can help unravel possible unexpected or questionable results. In this paper, we describe our preliminary work on this problem in the context of the MOMIS Data Integration System. We discuss and compare the use of Lineage-CS and PI-CS provenance, introduced respectively in [1] and [2], for the data fusion operator used in the MOMIS system; in particular we evaluate how the computation of the PI-CS provenance should be extended to deal with Resolution Functions used in our data fusion system.


2011 - Information Systems: Editorial [Articolo su rivista]
Carlo, Batini; Beneventano, Domenico; Bergamaschi, Sonia; Tiziana, Catarci
abstract

Research efforts on structured data, multimedia, and services have involved non-overlapping communities. However, from a user perspective, the three kinds of information should behave and be accessed similarly. Instead, a user has to deal with different tools in order to gain a complete knowledge about a domain. There is no integrated view comprising data, multimedia and services retrieved by the specific tools that is automatically computed. A unified approach for dealing with different kinds of information may allow searches across different domains and different starting points / results in the searching processes. Multiple and challenging research issues have to be addressed to achieve this goal, including: mediating among different models for representing information, developing new techniques for extracting and mapping relevant information from heterogeneous kinds of data, devising innovative paradigms for formulating and processing queries ranging over both (multimedia) data and services, investigating new models for visualizing the results and allowing the user to easily manipulate them. This special issue "Semantic Integration of Data, Multimedia, and Services" presents advances in data, multimedia, and services interoperability.


2011 - On Provenance of Data Fusion Queries [Relazione in Atti di Convegno]
Beneventano, Domenico; A., Dannoui; A., Sala
abstract

Data Lineage is an open research problem. This is particularly true in data integration systems, where information coming from different sources, potentially uncertain or even inconsistent with each other, is integrated. In this context, being able to trace the lineage of certain data can help unravel possible unexpected or questionable results. In this paper, we describe our preliminary work on this problem in the context of the MOMIS data fusion system. We discuss and compare the use of lineage and why-provenance for the data fusion operator used in the MOMIS system; in particular we evaluate how the computation of the why-provenance should be extended to deal with Resolution Functions used in our data fusion system.


2011 - The Open Source release of the MOMIS Data Integration System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Corni, Alberto; Entela, Kazazi; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

MOMIS (Mediator EnvirOnment for Multiple Information Sources) is an Open Source Data Integration system able to aggregate data coming from heterogeneous data sources (structured and semistructured) in a semi-automatic way. DataRiver is a Spin-Off of the University of Modena and Reggio Emilia that has re-engineered the MOMIS system, and released its Open Source version both for commercial and academic use. The MOMIS system has been extended with a set of features to minimize the integration process costs, exploiting the semantics of the data sources and optimizing each integration phase. The Open Source MOMIS system has been successfully applied in several industrial sectors: Medical, Agro-food, Tourism, Textile, Mechanical, Logistics. This paper describes the features of the Open Source MOMIS system and how it is able to address real data integration challenges.


2010 - Data Quality Aware Queries in the MOMIS Integration System [Relazione in Atti di Convegno]
Beneventano, Domenico; R., Carlos Nana Mbinkeu
abstract

In the NeP4B project the MOMIS data integration system was extended with data quality concepts. Starting from the resulting framework, the aim of this paper is twofold. First, we consider data quality metadata in the specification of queries on the Global Schema of the integration system; in this way we can express the required quality of the retrieved data by means of acceptance thresholds. Second, we discuss how the quality constraints specified in the query can be used to perform some query optimization techniques.


2010 - MOMIS: Getting through the THALIA benchmark [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Orsini, Mirko; Vincini, Maurizio
abstract

During the last decade many data integration systems characterized by a classical wrapper/mediator architecture based on a Global Virtual Schema (Global Virtual View - GVV) have been proposed. The data sources store data, while the GVV provides a reconciled, integrated, and virtual view of the underlying sources. Each proposed system contributes to the advancement of the state of the art by focusing on different aspects to provide an answer to one or more challenges of the data integration problem, ranging from system-level heterogeneities, to structural and syntax-level heterogeneities, to heterogeneities at the semantic level. The approaches are still in part manual, requiring a great amount of customization for data reconciliation and for writing specific non-reusable programming code. The specialization of mediator systems makes comparisons among the various systems difficult. Therefore, the last Lowell Report [1] has provided the guideline for the definition of a public benchmark for data integration problems. The proposal is called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) [2], and it provides researchers with a collection of downloadable data sources representing University course catalogues, a set of twelve benchmark queries, as well as a scoring function for ranking the performance of an integration system. In this paper we show how the MOMIS mediator system we developed [3,4] can deal with all twelve queries of the THALIA benchmark by simply extending and combining the declarative translation functions available in MOMIS and without any overhead of new code. This is a remarkable result; in fact, as far as we know, no other system has provided a complete answer to the benchmark.


2010 - Object Identification across Multiple Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Matteo Di, Gioia; Monica, Scannapieco
abstract

The problem of identifying the manifold generated copies of an object is known as Object Identification (OI). This problem concerns the quality of the data. Subsequently, the quality of the object (data) could be restored through the identification of the corrupted copies. In the literature the solutions are mainly oriented to discover pairs of duplicates (pairs-oriented OI) rather than sets of similar objects (group-oriented OI). We propose a new technique to resolve the OI problem among many sources in a quasi-decentralized manner. The new technique is based on the concept of constraints and is composed of two phases: extraction and grouping. First we extract constraints by analyzing data at hand (the decentralized phase). Then, we reason about those to find the groups of similar objects (the centralized phase). We have conducted several tests that show the effectiveness of our proposal.


2010 - Quality-Driven Query Processing Techniques in the MOMIS Integration System [Relazione in Atti di Convegno]
Beneventano, Domenico; R., Carlos Nana Mbinkeu
abstract

In the NeP4B project the MOMIS data integration system was extended with data quality concepts. Starting from the resulting framework, the aim of this paper is twofold. First, we consider data quality metadata in the specification of queries on the Global Schema of the integration system; in this way we can express the required quality of the retrieved data by means of acceptance thresholds. Second, we discuss how the quality constraints specified in the query can be used to perform some query optimization techniques.


2009 - An Ontology-Based Data Integration System for Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sala, Antonio; Sorrentino, Serena
abstract

Data integration is the problem of combining data residing at distributed heterogeneous sources, including multimedia sources, and providing the user with a unified view of these data. Ontology based Data Integration involves the use of ontology(s) to effectively combine data and information from multiple heterogeneous sources [16]. Ontologies, with respect to the integration of data sources, can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources [6]. In [5] MOMIS was extended to manage “traditional” and “multimedia” data sources at the same time. STASIS is a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics [1]. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed [7]. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach for Ontology-Based Data Integration for data and multimedia sources.


2009 - DataRiver [Spin Off]
Bergamaschi, Sonia; Orsini, Mirko; Beneventano, Domenico; Sala, Antonio; Corni, Alberto; Po, Laura; Sorrentino, Serena; Quix, Srl
abstract


2009 - Extending WordNet with compound nouns for semi-automatic annotation in data integration systems [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Sorrentino, Serena
abstract

The focus of data integration systems is on producing a comprehensive global schema successfully integrating data from heterogeneous data sources (heterogeneous in format and in structure). Starting from the “meanings” associated to schema elements (i.e. class/attribute labels) and exploiting the structural knowledge of sources, it is possible to discover relationships among the elements of different schemata. Lexical annotation is the explicit inclusion of the “meaning” of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationships. In this paper we propose a new method for the annotation of non-dictionary compound nouns, which draws its inspiration from works in the natural language disambiguation area. The method extends the lexical annotation module of the MOMIS data integration system.


2009 - Lexical Knowledge Extraction: an Effective Approach to Schema and Ontology Matching [Relazione in Atti di Convegno]
Po, Laura; Sorrentino, Serena; Bergamaschi, Sonia; Beneventano, Domenico
abstract

This paper’s aim is to examine what role Lexical Knowledge Extraction plays in data integration as well as ontology engineering. Data integration is the problem of combining data residing at distributed heterogeneous sources, and providing the user with a unified view of these data; a common and important scenario in data integration is that of structured or semi-structured data sources described by a schema. Ontology engineering is a subfield of knowledge engineering that studies the methodologies for building and maintaining ontologies. Ontology engineering offers a direction towards solving the interoperability problems brought about by semantic obstacles, such as the obstacles related to the definitions of business terms and software classes. In these contexts, where users are confronted with heterogeneous information, the support of matching techniques is crucial. Matching techniques aim at finding correspondences between semantically related entities of different schemata/ontologies. Several matching techniques have been proposed in the literature based on different approaches, often derived from other fields, such as text similarity, graph comparison and machine learning. This paper proposes a matching technique based on Lexical Knowledge Extraction: first, an Automatic Lexical Annotation of schemata/ontologies is performed, then lexical relationships are extracted based on such annotations. A Lexical Annotation is a piece of information added to a document (book, online record, video, or other data) that refers to a semantic resource such as WordNet. Each annotation has one or more lexical descriptions. Lexical annotation is performed by the Probabilistic Word Sense Disambiguation (PWSD) method, which combines several disambiguation algorithms. Our hypothesis is that performing lexical annotation of elements (e.g. classes and properties/attributes) of schemata/ontologies makes the system able to automatically extract the lexical knowledge that is implicit in a schema/ontology and then to derive lexical relationships between the elements of a schema/ontology or among elements of different schemata/ontologies. The effectiveness of the method presented in this paper has been proven within the data integration system MOMIS.


2009 - Multi-Source Object Identification With Constraints. [Relazione in Atti di Convegno]
Matteo Di, Gioia; Beneventano, Domenico; Monica, Scannapieco
abstract

The problem of identifying the manifold generated copies of an object is known as Object Identification (OI). This problem concerns the quality of the data. For example, from n corrupted copies of an object the original object could be rebuilt. Subsequently, the quality of the object (data) could be restored through the identification of the corrupted copies. In the literature the solutions are mainly oriented to discover pairs of duplicates (pairs-oriented OI) rather than sets of similar objects (group-oriented OI). We propose a new technique to resolve the OI problem among many sources in a quasi-decentralized manner. The new technique is based on the concept of constraints and is composed of two phases: extraction and grouping. First we extract constraints by analyzing data at hand (the decentralized phase). Then, we reason about those to find the groups of similar objects (the centralized phase). We have conducted several tests that show the effectiveness of our proposal.


2009 - Query Processing in a Mediator System for Data and Multimedia [Relazione in Atti di Convegno]
Beneventano, Domenico; Claudio, Gennaro; Matteo, Mordacchini; R., Carlos Nana Mbinkeu
abstract

Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying an integrated virtual view of data and multimedia sources.


2009 - STASIS (SofTware for Ambient Semantic Interoperable Services) [Software]
Beneventano, Domenico
abstract

See http://www.sewasie.org/


2009 - The MOMIS-STASIS approach for Ontology-Based Data Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

Ontology based Data Integration involves the use of ontology(s) to effectively combine data and information from multiple heterogeneous sources. Ontologies can be used in an integration task to describe the semantics of the information sources and to make the contents explicit. With respect to the integration of data sources, they can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mapping among concepts of the information sources. MOMIS is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources. The goal of the STASIS project is to create a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach for Ontology-Based Data Integration.


2009 - Unified Semantic Search of Data and Services [Relazione in Atti di Convegno]
Beneventano, Domenico; Guerra, Francesco; A., Maurino; M., Palmonari; G., Pasi; Sala, Antonio
abstract

The increasing availability of data and eServices on the Web allows users to search for relevant information and to perform operations through eServices. Current technologies do not support users in the execution of such activities as a unique task; thus users have first to find interesting information, and then, as a separate activity, to find and use eServices. In this paper we present a framework able to query an integrated view of heterogeneous data and to search for eServices related to retrieved data. A unique view of data and semantically described eServices is the way in which it is possible to unify data and service perspectives.


2008 - A Mediator System for Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Claudio, Gennaro; Guerra, Francesco; Matteo, Mordacchini; Sala, Antonio
abstract

Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying an integrated virtual view of data and multimedia sources.


2008 - A Methodology for Building and Querying an Ontology representing Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Guerra, Francesco; C., Gennaro
abstract

Managing data and multimedia sources with a unique tool is a challenging issue. In this paper, the capabilities of the MOMIS integration system and the MILOS multimedia content management system are coupled, thus providing a methodology and a tool for building and querying a populated ontology representing data and multimedia sources.


2008 - Ontological Mappings of Product Catalogues [Relazione in Atti di Convegno]
Beneventano, Domenico; Daniele, Montanari
abstract

In this paper we build on recent efforts in the areas of semantics and interoperability to establish the basis for a comprehensive and sustainable approach to the development and later management of bridging systems among a variety of corporate systems that need to be interconnected without being individually modified. In particular, we collect some preliminary evidence that a sustainable approach exists to the definition of mappings which can withstand changes of the underlying classification schemes. This in turn adds evidence towards the feasibility of a dynamic interoperable infrastructure supporting a global adaptive electronic market place.


2008 - Ontology-driven Semantic Mapping [Relazione in Atti di Convegno]
Beneventano, Domenico; N., Dahlem; S., EL HAOUM; A., Hahn; D., Montanari; M., Reinelt
abstract

When facilitating interoperability at the data level one faces the problem that different data models are used as the basis for business formats. For example relational databases are based on the relational model, while XML Schema is basically a hierarchical model (with some extensions, like references). Our goal is to provide a syntax and a data model neutral format for the representation of business schemata. We have developed a unified description of data models which is called the Logical Data Model (LDM) Ontology. It is a superset of the relational, hierarchical, network, object-oriented data models, which is represented as a graph consisting of nodes with labeled edges. For the representation of different relationships between the nodes in the data-model we introduced different types of edges. For example: is_a for the representation of the subclass relationship, identifies for the representation of unique key values, contains for the containment relationship, etc. In this paper we discuss the mapping process as it is proposed by EU project STASIS (FP6-2005-IST-5-034980). Then we describe the Logical Data-Model in detail and demonstrate its use by giving an example. Finally we discuss future research planned in this context in the STASIS project.
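The graph view described in this abstract (nodes connected by typed, labeled edges) can be sketched in a few lines of Python. This is only an illustration, not the LDM Ontology of the STASIS project: the class, its methods and the example nodes are hypothetical, and only the edge labels `is_a`, `identifies` and `contains` come from the abstract.

```python
from collections import defaultdict

class LabeledGraph:
    """A directed graph whose edges carry a relationship label."""

    def __init__(self):
        # Maps a source node to its list of (label, target) pairs.
        self._edges = defaultdict(list)

    def add_edge(self, source, label, target):
        self._edges[source].append((label, target))

    def targets(self, source, label):
        """Return the nodes reached from `source` via edges with `label`."""
        return [t for (l, t) in self._edges.get(source, []) if l == label]

# Hypothetical schema: a relational table with two columns and a subclass.
schema = LabeledGraph()
schema.add_edge("Customer", "contains", "customer_id")
schema.add_edge("Customer", "contains", "name")
schema.add_edge("customer_id", "identifies", "Customer")
schema.add_edge("PremiumCustomer", "is_a", "Customer")
```

The same structure could hold an XML Schema element and its children through `contains` edges, which is the kind of model neutrality the abstract aims at.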


2007 - Mapping of heterogeneous schemata, business structures, and terminologies [Relazione in Atti di Convegno]
Beneventano, Domenico; EL HAOUM, S; Montanari, D.
abstract

The current effort to extend the power of information systems by making use of the semantics associated with terms and structures has resulted in a need to establish correspondences between different systems to allow a correspondingly rich exchange of information. This paper describes the early efforts taking place in the STASIS EU IST project (FP6-2005-IST-5-034980) to identify the issues underlying support for mapping of corresponding entities between such heterogeneous systems. The STASIS system is meant to help a user establish such mappings by exploiting a semantic environment where he/she can contribute his/her own entities and relate them with other pre-existing entities. This process needs support at the entity representation level, to encapsulate each item into an appropriately rich representation structure, and at the logical level, where the resulting model is verified for consistency towards its future use. Examples are offered and discussed to highlight the issues and propose solutions.


2007 - Progetto di Basi di Dati Relazionali [Monografia/Trattato scientifico]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The aim of this volume is to provide the reader with the fundamental notions for designing and implementing relational database applications. As regards design, the conceptual and logical design phases are covered, and the Entity-Relationship and Relational data models are presented; these are the basic tools for conceptual design and logical design, respectively. The student is also introduced to the normalization theory of relational databases. As regards implementation, elements and examples of SQL, the standard language for RDBMSs (Relational Database Management Systems), are presented. Ample space is devoted to worked exercises on the topics covered. The volume stems from the authors' many years of teaching experience in the Database and Information Systems courses for bachelor's and master's degree students of the Faculty of Engineering of Modena, the Faculty of Engineering of Reggio Emilia and the "Marco Biagi" Faculty of Economics of the University of Modena and Reggio Emilia. The current volume considerably extends the previous editions, enriching the sections on logical design and SQL. The exercise section is completely new; further exercises are available on this web page. Like the previous editions, it is more a collection of notes than a true book, in the sense that it treats the concepts in a rigorous but essential way. Moreover, it does not cover all the topics of a Database course, whose other fundamental component is database technology. This component is, in the authors' opinion, excellently covered by another Database textbook, written by our colleagues and friends Paolo Ciaccia and Dario Maio of the University of Bologna.
The volume, despite its conciseness, is rich in worked exercises and can therefore be an excellent tool for working groups in software houses that deal with the design of relational database applications.


2007 - Query Translation on heterogeneous sources in MOMIS Data Transformation Systems [Relazione in Atti di Convegno]
Beneventano, Domenico; Vincini, Maurizio; Orsini, Mirko; Bergamaschi, Sonia; Nana, C.
abstract



2007 - Querying a super-peer in a schema-based super-peer network [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

We propose a novel approach for defining and querying a super-peer within a schema-based super-peer network organized into a two-level architecture: the lower level, called the peer level (which contains a mediator node), and the upper level, called the super-peer level (which integrates mediator peers with similar content). We focus on a single super-peer and propose a method to define and solve a query, fully implemented in the SEWASIE project prototype. The problem we faced is relevant because a super-peer is a two-level data integration system, which goes beyond the traditional setting in data integration. We have two different levels of Global as View mappings: the first mapping is at the super-peer level and maps several Global Virtual Views (GVVs) of peers into the GVV of the super-peer; the second mapping is within a peer and maps the data sources into the GVV of the peer. Moreover, we propose an approach where the integration designer, supported by a graphical interface, can implicitly define mappings by using Resolution Functions to solve data conflicts, and the Full Disjunction operator, which has been recognized as providing a natural semantics for data merging queries.


2007 - Semantic search engines based on data integration systems [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

As the use of the World Wide Web has become increasingly widespread, the business of commercial search engines has become a vital and lucrative part of the Web. Search engines are commonplace tools for virtually every user of the Internet, and companies such as Google and Yahoo! have become household names. Semantic Search Engines try to augment and improve traditional Web Search Engines by using not just words, but concepts and logical relationships. In this chapter a relevant class of Semantic Search Engines, based on a peer-to-peer, data integration mediator-based architecture, is described. The architectural and functional features are presented with respect to two projects, SEWASIE and WISDOM, involving the authors. The methodology to create a two-level ontology and query processing in the SEWASIE project are fully described.


2007 - The SEWASIE MAS for Semantic Search [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The capillary diffusion of the Internet has made an overwhelming amount of data accessible, allowing users to benefit from a vast amount of information. However, information is not really directly available: Internet data are heterogeneous and spread over different places, with several duplications and inconsistencies. The integration of such heterogeneous, inconsistent data, with data reconciliation and data fusion techniques, may therefore represent a key activity enabling a more organized and semantically meaningful access to data sources. Some issues remain to be solved, concerning in particular the discovery and the explicit specification of the relationships between abstract data concepts and the need for data reliability in dynamic, constantly changing networks. Ontologies provide a key mechanism for solving these challenges, but the web’s dynamic nature leaves open the question of how to manage them. Many solutions based on ontology creation by a mediator system have been proposed: a unified virtual view (the ontology) of the underlying data sources is obtained, giving users transparent access to the integrated data sources. The centralized architecture of a mediator system presents several limitations, emphasized in the hidden web: firstly, web data sources hold information according to their particular view of the matter, i.e. each of them uses a specific ontology to represent its data. Also, data sources are usually isolated, i.e. they do not share any topological information concerning the content or structure of other sources. Our proposal is to develop a network of ontology-based mediator systems, where mediators are not isolated from each other and include tools for sharing and mapping their ontologies. In this paper, we describe the use of a multi-agent architecture to achieve and manage the mediators network.
The functional architecture is composed of single peers (implemented as mediator agents) independently carrying out their own integration activities. Such agents may then exchange data and knowledge with other peers by means of specialized agents (called brokering agents) which provide a coherent access plan to the peer network. In this way, two layers are defined in the architecture: at the local level, peers maintain an integrated view of local sources; at the network level, agents maintain mappings among the different peers. The result is the definition of a new type of mediator system network intended to operate in web economies, which we realized within SEWASIE (SEmantic Webs and AgentS in Integrated Economies), an RDT project supported by the 5th Framework IST program of the European Community, which successfully ended in September 2005.


2007 - The SEWASIE Network of Mediator Agents for Semantic Search [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Integration of heterogeneous information in the context of the Internet becomes a key activity to enable a more organized and semantically meaningful access to data sources. As the Internet can be viewed as a data-sharing network where sites are data sources, the challenge is twofold. Firstly, sources present information according to their particular view of the matter, i.e. each of them assumes a specific ontology. Secondly, data sources are usually isolated, i.e. they do not share any topological information concerning the content or the structure of other sources. The classical approach to solving these issues is provided by mediator systems, which aim at creating a unified virtual view of the underlying data sources in order to hide the heterogeneity of data and give users transparent access to the integrated information. In this paper we propose to use a multi-agent architecture to build and manage a mediators network. While a single peer (i.e. a mediator agent) independently carries out data integration activities, it exchanges knowledge with other peers by means of specialized agents (i.e. brokers) which provide a coherent access plan to access information in the peer network. This defines two layers in the system: at the local level, peers maintain an integrated view of local sources, while at the network level agents maintain mappings among the different peers. The result is the definition of a new networked mediator system intended to operate in web economies, which we realized in the SEWASIE (SEmantic Webs and AgentS in Integrated Economies) project. SEWASIE is an RDT project supported by the 5th Framework IST program of the European Community, which successfully ended in September 2005.


2007 - Toward a Unified View of Data and Services [Relazione in Atti di Convegno]
Palmonari, M.; Guerra, Francesco; Turati, A.; Maurino, A.; Beneventano, Domenico; Della Valle, E.; Sala, Antonio; Cerizza, D.
abstract

We propose an approach for describing a unified view of data and services in a peer-to-peer environment. The research areas of data and services are usually represented with different models and queried by different tools with different requirements. Our approach aims at providing the user with a “complete” knowledge (in terms of data and services) of a domain. Our proposal is not an alternative to the techniques developed for representing and querying integrated data and discovering services, but works in conjunction with them by improving the user's knowledge. We are experimenting with the approach within the Italian FIRB project NeP4B (Networked Peers for Business), which aims at developing an advanced technological infrastructure to enable companies to search for partners, exchange data, negotiate and collaborate without limitations and constraints.


2006 - Instances Navigation for Querying Integrated Data from Web-Sites [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia; Bruschi, Stefania; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

Research on data integration has provided a set of rich and well understood schema mediation languages and systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal with data instances. Such meta-data are necessary for querying the classes that result from an integration process: the end user typically does not know the contents of such classes and simply defines queries on the basis of the names of classes and attributes. In this paper we introduce an approach that enriches the description of selected attributes by specifying, as meta-data, a list of the “relevant values” for such attributes. Furthermore, relevant values may be hierarchically collected in a taxonomy. In this way, the user may exploit new meta-data in the interactive process of creating/refining a query. The same meta-data are also exploited by the system in the query rewriting/unfolding process in order to filter the results shown to the user. We conducted an evaluation of the strategy in an e-business context within the EU-IST SEWASIE project. The evaluation proved the practicability of the approach for large value instances.


2006 - Instances navigation for querying integrated data from web-sites [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Bruschi, Stefania; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

Research on data integration has provided a set of rich and well understood schema mediation languages and systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal with data instances. Such meta-data are necessary for querying the classes that result from an integration process: the end user typically does not know the contents of such classes and simply defines queries on the basis of the names of classes and attributes. In this paper we introduce an approach that enriches the description of selected attributes by specifying, as meta-data, a list of the “relevant values” for such attributes. Furthermore, relevant values may be hierarchically collected in a taxonomy. In this way, the user may exploit new meta-data in the interactive process of creating/refining a query. The same meta-data are also exploited by the system in the query rewriting/unfolding process in order to filter the results shown to the user. We conducted an evaluation of the strategy in an e-business context within the EU-IST SEWASIE project. The evaluation proved the practicability of the approach for large value instances.


2006 - Semantic search engines based on data integration systems [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

As the use of the World Wide Web has become increasingly widespread, the business of commercial search engines has become a vital and lucrative part of the Web. Search engines are commonplace tools for virtually every user of the Internet, and companies such as Google and Yahoo! have become household names. Semantic Search Engines try to augment and improve traditional Web Search Engines by using not just words, but concepts and logical relationships. In this chapter a relevant class of Semantic Search Engines, based on a peer-to-peer, data integration mediator-based architecture, is described. The architectural and functional features are presented with respect to two projects, SEWASIE and WISDOM, involving the authors. The methodology to create a two-level ontology and query processing in the SEWASIE project are fully described.


2005 - Building a tourism information provider with the MOMIS system [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The tourism industry is a good candidate for taking up Semantic Web technology. In fact, there are many portals and websites belonging to the tourism domain that promote tourist products (places to visit, food to eat, museums, etc.) and tourist services (hotels, events, etc.), published by several operators (tourist promoter associations, public agencies, etc.). This article presents how the MOMIS system may be used for building a tourism information provider by exploiting the tourism information that is available in Internet websites. MOMIS (Mediator envirOnment for Multiple Information Sources) is a mediator framework that performs information extraction and integration from heterogeneous distributed data sources and includes query management facilities to transparently support queries posed to the integrated data sources.


2005 - SEWASIE - SEmantic Webs and AgentS in Integrated Economies. [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Vincini, Maurizio; Guerra, Francesco
abstract

SEWASIE (SEmantic Webs and AgentS in Integrated Economies) aims to design and implement an advanced search engine enabling intelligent access to heterogeneous data sources on the web via semantic enrichment to provide the basis of structured secure web-based communication. SEWASIE implemented an advanced search engine that provides intelligent access to heterogeneous data sources on the web via semantic enrichment to provide the basis of structured secure web-based communication. SEWASIE provides users with a search client that has an easy-to-use query interface, and which can extract the required information from the Internet and can show it in a useful and user-friendly format. From an architectural point of view, the prototype provides a search engine client and indexing servers and ontologies.


2004 - A framework for the classification and the reclassification of electronic catalog [Relazione in Atti di Convegno]
Beneventano, Domenico; Magnani, Stefania
abstract

Electronic marketplaces are virtual communities where buyers may receive proposals from several suppliers and make the best choice. The exponential growth of e-commerce amplifies the proliferation of different standards and joint initiatives for the classification of products and services. Therefore, B2B and B2C marketplaces have to classify products and goods according to different product classification standards. In this paper, we propose a framework to classify and reclassify electronic catalogs based on a semi-automatic methodology to define semantic mappings among different product classification standards and catalogs.


2004 - A Web Service based framework for the semantic mapping between product classification schemas [Articolo su rivista]
Beneventano, Domenico; Guerra, Francesco; Magnani, Stefania; Vincini, Maurizio
abstract

A marketplace is the place where the demands and offers of buyers and sellers participating in a business transaction may meet. Therefore, electronic marketplaces are virtual communities in which buyers may receive proposals from several suppliers and make the best choice. In the electronic commerce world, the comparison between different products is not possible due to the lack of common standards, used by the community, describing and classifying them. Therefore, B2B and B2C marketplaces have to reclassify products and goods according to different standardization models. In this paper, we propose a semi-automatic methodology, supported by a web service based framework, to define semantic mappings amongst different product classification schemas (e-commerce standards and catalogues), and we provide the ability to search and navigate these mappings. The proposed methodology is shown over fragments of the UNSPSC and ecl@ss standards and over a fragment of the eBay online catalogue.


2004 - MOMIS: an Ontology-based Information Integration System(software) [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Guerra, Francesco; Orsini, Mirko; Vincini, Maurizio
abstract

The Mediator Environment for Multiple Information Sources (Momis), developed by the database research group at the University of Modena and Reggio Emilia, aims to construct synthesized, integrated descriptions of information coming from multiple heterogeneous sources. Our goal is to provide users with a global virtual view (GVV) of information sources, independent of their location or their data’s heterogeneity. Such a view conceptualizes the underlying domain; you can think of it as an ontology describing the sources involved. The Semantic Web exploits semantic markups to provide Web pages with machine-readable definitions. It thus relies on the a priori existence of ontologies that represent the domains associated with the given information sources. This approach relies on the selected reference ontology’s accuracy, but we find that most ontologies in common use are generic and that the annotation phase (in which semantic annotations connect Web page parts to ontology items) causes a loss of semantics. By involving the sources themselves, our approach builds an ontology that more precisely represents the domain. Moreover, the GVV is annotated according to a lexical ontology, which provides an easily understandable meaning to content. An open source version of the MOMIS system was released in April 2010 by the spin-off DATARIVER (www.datariver.it).


2004 - Synthesizing an Integrated Ontology with MOMIS [Relazione in Atti di Convegno]
Benassi, Roberta; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The Mediator EnvirOnment for Multiple Information Sources (MOMIS) aims at constructing synthesized, integrated descriptions of the information coming from multiple heterogeneous sources, in order to provide the user with a global virtual view of the sources independent from their location and the level of heterogeneity of their data. Such a global virtual view is a conceptualization of the underlying domain and then may be thought of as an ontology describing the involved sources. In this article we explore the framework’s main elements and discuss how the output of the integration process can be exploited to create a conceptualization of the underlying domain.


2004 - The MOMIS methodology for integrating heterogeneous data sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

The Mediator EnvirOnment for Multiple Information Sources (MOMIS) aims at constructing synthesized, integrated descriptions of the information coming from multiple heterogeneous sources, in order to provide the user with a global virtual view of the sources independent from their location and the level of heterogeneity of their data. Such a global virtual view is a conceptualization of the underlying domain and then may be thought of as an ontology describing the involved sources. In this article we explore the framework’s main elements and discuss how the output of the integration process can be exploited to create a conceptualization of the underlying domain.


2003 - A Peer-to-Peer Agent-Based Semantic Search Engine [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Fergnani, Alain; Guerra, Francesco; Vincini, Maurizio; Montanari, D.
abstract

Several architectures, protocols, languages, and candidate standards have been proposed to let the "semantic web" idea take off. In particular, searching for information requires cooperation of the information providers and seekers. Past experience and history show that a successful architecture must support ease of adoption and deployment by a wide and heterogeneous population, a flexible policy to establish an acceptable cost-benefit ratio for using the system, and the growth of a cooperative distributed infrastructure with no central control. In this paper an agent-based peer-to-peer architecture is defined to support search through a flexible integration of semantic information. Two levels of integration are foreseen: strong integration of sources related to the same domain into a single information node by means of a mediator-based system; weak integration of information nodes on the basis of semantic relationships existing among concepts of different nodes. The EU IST SEWASIE project is described as an instantiation of this architecture. SEWASIE aims at implementing an advanced search engine, which will provide SMEs with intelligent access to heterogeneous information on the Internet.


2003 - Building an integrated Ontology within SEWASIE system [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Guerra, F.; Vincini, M.
abstract

The SEWASIE (SEmantic Webs and AgentS in Integrated Economies) project (IST-2001-34825) is a European research project that aims at designing and implementing an advanced search engine enabling intelligent access to heterogeneous data sources on the web. In this paper we focus on the Ontology Builder component of the SEWASIE system, which is a framework for information extraction and integration of heterogeneous structured and semi-structured information sources, built upon the MOMIS (Mediator envirOnment for Multiple Information Sources) system. The result of the integration process is a Global Virtual View (in short GVV), which is a set of (global) classes that represent the information contained in the sources being used. In particular, we present the application of our integration approach to a specific type of source (i.e. web documents), and show the extension of a built-up GVV by the addition of another source.


2003 - Building an Integrated Ontology within the SEWASIE Project: The Ontology Builder Tool [Abstract in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Miselli, D.; Fergnani, A.; Vincini, Maurizio
abstract

See http://www.sewasie.org/


2003 - Building an integrated Ontology within the SEWASIE system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous structured and semi-structured information sources. The result of the integration process is a Global Virtual View (in short GVV), which is a set of (global) classes that represent the information contained in the sources being used. In this paper, we present the application of our integration approach to a specific type of source (i.e. web documents), and show how the result of the integration approach can be exploited to create a conceptualization of the domain to which the sources belong, i.e. an ontology. Two new achievements of the MOMIS system are presented: the semi-automatic annotation of the GVV and the extension of a built-up ontology by the addition of another source.


2003 - Building an Ontology with MOMIS [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco
abstract

Nowadays the Web is a huge collection of data and its expansion rate is very high. Web users need new ways to exploit all this available information and possibilities. A new vision of the Web is emerging: the Semantic Web, where resources are annotated with machine-processable metadata providing them with background knowledge and meaning. A fundamental component of the Semantic Web is the ontology; this “explicit specification of a conceptualization” allows information providers to give an understandable meaning to their documents. MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous information sources. The system implements a semi-automatic methodology for data integration that follows the Global as View (GAV) approach. The result of the integration process is a global schema, the GVV (Global Virtual View), which provides a reconciled, integrated and virtual view of the underlying sources. The GVV is composed of a set of (global) classes that represent the information contained in the sources. In this paper, we focus on the application of MOMIS to a particular kind of source (i.e. web documents), and show how the result of the integration process can be exploited to create a conceptualization of the underlying domain, i.e. a domain ontology for the integrated sources. The GVV is then semi-automatically annotated according to a lexical ontology. With reference to the Semantic Web area, where the annotation process generally consists of providing a web page with semantic markups according to an ontology, we first mark up the local metadata descriptions and then the MOMIS system generates an annotated conceptualization of the sources. Moreover, our approach “builds” the domain ontology as the synthesis of the integration process, while the usual approach in the Semantic Web is based on the “a priori” existence of an ontology.


2003 - Description logics for semantic query optimization in object-oriented database systems [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Sartori, C.
abstract

Semantic query optimization uses semantic knowledge (i.e., integrity constraints) to transform a query into an equivalent one that may be answered more efficiently. This article proposes a general method for semantic query optimization in the framework of Object-Oriented Database Systems. The method is effective for a large class of queries, including conjunctive recursive queries expressed with regular path expressions and is based on three ingredients. The first is a Description Logic, ODLRE, providing a type system capable of expressing: class descriptions, queries, views, integrity constraint rules and inference techniques, such as incoherence detection and subsumption computation. The second is a semantic expansion function for queries, which incorporates restrictions logically implied by the query and the schema (classes + rules) in one query. The third is an optimal rewriting method of a query with respect to the schema classes that rewrites a query into an equivalent one, by determining more specialized classes to be accessed and by reducing the number of factors. We implemented the method in a tool providing an ODMG-compliant interface that allows a full interaction with OQL queries, wrapping underlying Description Logic representation and techniques to the user.


2003 - MIKS: an agent framework supporting information access and integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gelati, J.; Guerra, Francesco; Vincini, Maurizio
abstract

Providing integrated access to multiple heterogeneous sources is a challenging issue in global information systems for cooperation and interoperability. In the past, companies have equipped themselves with data storing systems, building up information systems containing data that are related to one another, but which are often redundant, not homogeneous and not always semantically consistent. Moreover, to meet the requirements of global, Internet-based information systems, it is important that the tools developed for supporting these activities are as semi-automatic and scalable as possible. To face the issues related to large-scale scalability, in this paper we propose the exploitation of mobile agents in the information integration area, and, in particular, their integration in the Momis infrastructure. MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a system that has been conceived as a pool of tools to provide integrated access to heterogeneous information stored in traditional databases (for example relational and object-oriented databases) or in file systems, as well as in semi-structured data sources (XML files). This proposal has been implemented within the MIKS (Mediator agent for Integration of Knowledge Sources) system and it is completely described in this paper.


2003 - Semantic Web Search Engines: the SEWASIE approach [Poster]
Beneventano, Domenico; Bergamaschi, Sonia; Montanari, D.; Ottaviani, L.
abstract

SEWASIE is a research project funded by the European Commission that aims to design and implement an advanced search engine enabling intelligent access to heterogeneous data sources on the web via semantic enrichment to provide the basis of structured secure web-based communication.


2003 - Synthesizing an integrated ontology [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

To exploit the Internet’s expanding data collection, current Semantic Web approaches employ annotation techniques to link individual information resources with machine-comprehensible metadata. Before we can realize the potential this new vision presents, however, several issues must be solved. One of these is the need for data reliability in dynamic, constantly changing networks. Another issue is how to explicitly specify relationships between abstract data concepts. Ontologies provide a key mechanism for solving these challenges, but the Web’s dynamic nature leaves open the question of how to manage them. The Mediator Environment for Multiple Information Sources (Momis), developed by the database research group at the University of Modena and Reggio Emilia, aims to construct synthesized, integrated descriptions of information coming from multiple heterogeneous sources. Our goal is to provide users with a global virtual view (GVV) of information sources, independent of their location or their data’s heterogeneity. Such a view conceptualizes the underlying domain; you can think of it as an ontology describing the sources involved. The Semantic Web exploits semantic markups to provide Web pages with machine-readable definitions. It thus relies on the a priori existence of ontologies that represent the domains associated with the given information sources. This approach relies on the selected reference ontology’s accuracy, but we find that most ontologies in common use are generic and that the annotation phase (in which semantic annotations connect Web page parts to ontology items) causes a loss of semantics. By involving the sources themselves, our approach builds an ontology that more precisely represents the domain. Moreover, the GVV is annotated according to a lexical ontology, which provides an easily understandable meaning to content. 
In this article, we use Web documents as a representative information source to describe the Momis methodology’s general application. We explore the framework’s main elements and discuss how the output of the integration process can be exploited to create a conceptualization of the underlying domain. In particular, our method provides a way to extend previously created conceptualizations, rather than starting from scratch, by inserting a new source.


2002 - An Agent framework for Supporting the MIKS Integration Process [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Felice, M.; Gazzotti, D.; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

Providing integrated access to multiple heterogeneous sources is a challenging issue in global information systems for cooperation and interoperability. In the past, companies have equipped themselves with data storing systems, building up information systems containing data that are related to one another, but which are often redundant, not homogeneous and not always semantically consistent. Moreover, to meet the requirements of global, Internet-based information systems, it is important that the tools developed for supporting these activities are as semi-automatic and scalable as possible. To face the issues related to large-scale scalability, in this paper we propose the exploitation of mobile agents in the information integration area, and, in particular, the roles they play in enhancing the features of the Momis infrastructure. Momis (Mediator EnvirOnment for Multiple Information Sources) is a system that has been conceived as a pool of tools to provide integrated access to heterogeneous information stored in traditional databases (for example relational and object-oriented databases) or in file systems, as well as in semi-structured data sources (XML files). In this paper we describe the new agent-based framework concerning the integration process as implemented in the Miks (Mediator agent for Integration of Knowledge Sources) system.


2002 - An information integration framework for E-commerce [Articolo su rivista]
I., Benetti; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The Web has transformed electronic information systems from single, isolated nodes into a worldwide network of information exchange and business transactions. In this context, companies have equipped themselves with high-capacity storage systems that contain data in several formats. The problems faced by these companies often emerge because the storage systems lack structural and application homogeneity in addition to a common ontology. The semantic differences generated by a lack of consistent ontology can lead to conflicts that range from simple name contradictions (when companies use different names to indicate the same data concept) to structural incompatibilities (when companies use different models to represent the same information types). One of the main challenges for e-commerce infrastructure designers is information sharing and retrieving data from different sources to obtain an integrated view that can overcome any contradictions or redundancies. Virtual catalogs can help overcome this challenge because they act as instruments to retrieve information dynamically from multiple catalogs and present unified product data to customers. Instead of having to interact with multiple heterogeneous catalogs, customers can interact with a virtual catalog in a straightforward, uniform manner. This article presents a virtual catalog project called Momis (mediator environment for multiple information sources). Momis is a mediator-based system for information extraction and integration that works with structured and semistructured data sources. Momis includes a component called the SI-Designer for semiautomatically integrating the schemas of heterogeneous data sources, such as relational, object, XML, or semistructured sources. Starting from local source descriptions, the Global Schema Builder generates an integrated view of all data sources and expresses those views using XML.
Momis lets you use the infrastructure with other open integration information systems by simply interchanging XML data files. Momis creates the XML global schema in stages, first by creating a common thesaurus of intra- and interschema relationships. Momis extracts the intraschema relationships by using inference techniques, then shares these relationships in the common thesaurus. After this initial phase, Momis enriches the common thesaurus with interschema relationships obtained using the lexical WordNet system (www.cogsci.princeton.edu/wn), which identifies the affinities between interschema concepts on the basis of their lexical meaning. Momis also enriches the common thesaurus using the Artemis system, which evaluates structural affinities among interschema concepts.


2002 - Semantic Integration and Query Optimization of Heterogeneous Data Sources [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Castano, S; DE ANTONELLIS, V; Ferrara, A; Guerra, Francesco; Mandreoli, Federica; ORNETTI G., C; Vincini, Maurizio
abstract

In modern Internet/Intranet-based architectures, an increasing number of applications require integrated and uniform access to a multitude of heterogeneous and distributed data sources. In this paper, we describe the ARTEMIS/MOMIS system for the semantic integration and query optimization of heterogeneous structured and semistructured data sources.


2002 - SI-Web: a Web based interface for the MOMIS project [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; D., Bianco; Guerra, Francesco; Vincini, Maurizio
abstract

The MOMIS project (Mediator envirOnment for Multiple Information Sources), developed in the past years, allows the integration of data from structured and semi-structured data sources. SI-Designer (Source Integrator Designer) is a designer support tool implemented within the MOMIS project for semi-automatic integration of heterogeneous source schemata. It is a Java application where all modules involved are available as CORBA objects and interact using established IDL interfaces. The goal of this demonstration is to present a new tool, SI-Web (Source Integrator on Web), which offers the same features as SI-Designer but has the great advantage of being usable on the Internet through a web browser.


2002 - The WINK Project for Virtual Enterprise Networking and Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gazzotti, Davide; Gelati, Gionata; Guerra, Francesco; Vincini, Maurizio
abstract

To stay competitive (or sometimes simply to stay) on the market, companies and manufacturers more and more often have to join forces to survive and possibly flourish. Among other solutions, the last decade has experienced the growth and spread of an original business model called the Virtual Enterprise. To manage a Virtual Enterprise, modern information systems have to tackle technological issues such as networking, integration and cooperation. The WINK project, born from the partnership between the University of Modena and Reggio Emilia and Gruppo Formula, addresses these problems. The ultimate goal is to design, implement and finally test on a pilot case (provided by Alenia) the WINK system, a combination of two existing and promising software systems (the WHALES and MIKS systems), to meet the Virtual Enterprise requirements for data integration, cooperation and management planning.


2001 - Exploiting extensional knowledge for query reformulation and object fusion in a data integration system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Query processing in global information systems integrating multiple heterogeneous sources is a challenging issue in relation to the effective extraction of information available on-line. In this paper we propose intelligent, tool-supported techniques for querying global information systems integrating both structured and semistructured data sources. The techniques have been developed in the environment of a data integration, wrapper/mediator based system, MOMIS, and try to achieve two main goals: optimized query reformulation w.r.t. local sources and object fusion, i.e. grouping together information (from the same or different sources) about the same real-world entity. The developed techniques rely on the availability of integration knowledge, i.e. local source schemata, a virtual mediated schema and its mapping descriptions, that is semantic mappings w.r.t. the underlying sources both at the intensional and extensional level. Mapping descriptions, obtained as a result of the semi-automatic integration process of multiple heterogeneous sources developed for the MOMIS system, include, unlike previous data integration proposals, extensional intra/interschema knowledge. Extensional knowledge is exploited to detect extensionally overlapping classes and to discover implicit join criteria among classes, which enables the goals of optimized query reformulation and object fusion to be achieved. The techniques have been implemented in the MOMIS system but can be applied, in general, to data integration systems including extensional intra/interschema knowledge in mapping descriptions.


2001 - Extensional Knowledge for semantic query optimization in a mediator based system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Mandreoli, Federica
abstract

Query processing in global information systems integrating multiple heterogeneous sources is a challenging issue in relation to the effective extraction of information available on-line. In this paper we propose intelligent, tool-supported techniques for querying global information systems integrating both structured and semistructured data sources. The techniques have been developed in the environment of a data integration, wrapper/mediator based system, MOMIS, and try to achieve the goal of optimized query reformulation w.r.t. local sources. The developed techniques rely on the availability of integration knowledge whose semantics is expressed in terms of description logics. Integration knowledge includes local source schemata, a virtual mediated schema and its mapping descriptions, that is semantic mappings w.r.t. the underlying sources both at the intensional and extensional level. Mapping descriptions, obtained as a result of the semi-automatic integration process of multiple heterogeneous sources developed for the MOMIS system, include, unlike previous data integration proposals, extensional intra/interschema knowledge. Extensional knowledge is exploited to perform semantic query optimization in a mediator based system, as it allows an optimized query reformulation method to be devised. The techniques are under development in the MOMIS system but can be applied, in general, to data integration systems including extensional intra/interschema knowledge in mapping descriptions.


2001 - Semantic Integration of Heterogeneous Information Sources [Articolo su rivista]
Bergamaschi, Sonia; Castano, S.; Vincini, Maurizio; Beneventano, Domenico
abstract

Developing intelligent tools for the integration of information extracted from multiple heterogeneous sources is a challenging issue to effectively exploit the numerous sources available on-line in global information systems. In this paper, we propose intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources. An object-oriented language with an underlying Description Logic, called ODLI3 and derived from the standard ODMG, is introduced for information extraction. ODLI3 descriptions of the source schemas are exploited first to set a Common Thesaurus for the sources. Information integration is then performed in a semiautomatic way by exploiting the knowledge in the Common Thesaurus and ODLI3 descriptions of source schemas with a combination of clustering techniques and Description Logics. This integration process gives rise to a virtual integrated view of the underlying sources for which mapping rules and integrity constraints are specified to handle heterogeneity. Integration techniques described in the paper are provided in the framework of the MOMIS system based on a conventional wrapper/mediator architecture.


2001 - SI-Designer: a tool for intelligent integration of information [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; I., Benetti; Corni, Alberto; Guerra, Francesco; G., Malvezzi
abstract

SI-Designer (Source Integrator Designer) is a designer support tool for semi-automatic integration of heterogeneous source schemata (relational, object and semi-structured sources); it has been implemented within the MOMIS project and it carries out integration following a semantic approach which uses intelligent Description Logics-based techniques, clustering techniques and an extended ODMG-ODL language, ODL-I3, to represent schemata and extracted, integrated information. Starting from the sources' ODL-I3 descriptions (local schemata), SI-Designer supports the designer in the creation of an integrated view of all the sources (global schema), which is expressed in the same ODL-I3 language. We propose SI-Designer as a tool to build virtual catalogs in the E-Commerce environment.


2001 - SI-Designer: an Integration Framework for E-Commerce [Relazione in Atti di Convegno]
I., Benetti; Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

Electronic commerce lets people purchase goods and exchange information on business transactions on-line. Therefore one of the main challenges for the designers of e-commerce infrastructures is information sharing: retrieving data located in different sources, thus obtaining an integrated view to overcome any contradiction or redundancy. Virtual Catalogs synthesize this approach as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from catalogs. In this paper we propose SI-Designer, a support tool for the integration of data from structured and semi-structured data sources, developed within the MOMIS (Mediator environment for Multiple Information Sources) project.


2001 - The MOMIS approach to information integration [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Guerra, F.; Vincini, M.
abstract

The web explosion, both at internet and intranet level, has transformed the electronic information system from a single isolated node to an entry point into a worldwide network of information exchange and business transactions. Business and commerce have taken the opportunity of the new technologies to define the e-commerce activity. Therefore one of the main challenges for the designers of e-commerce infrastructures is information sharing: retrieving data located in different sources, thus obtaining an integrated view to overcome any contradiction or redundancy. Virtual Catalogs synthesize this approach as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from catalogs. Customers, instead of having to interact with multiple heterogeneous catalogs, can interact in a uniform way with a virtual catalog. In this paper we propose a designer support tool, called SI-Designer, for information integration developed within the MOMIS project. The MOMIS project (Mediator environment for Multiple Information Sources) aims to integrate data from structured and semi-structured data sources.


2001 - The Momis approach to Information Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Guerra, Francesco; Vincini, Maurizio
abstract

The web explosion, both at internet and intranet level, has transformed the electronic information system from a single isolated node to an entry point into a worldwide network of information exchange and business transactions. Business and commerce have taken the opportunity of the new technologies to define the e-commerce activity. Therefore one of the main challenges for the designers of e-commerce infrastructures is information sharing: retrieving data located in different sources, thus obtaining an integrated view to overcome any contradiction or redundancy. Virtual Catalogs synthesize this approach as they are conceived as instruments to dynamically retrieve information from multiple catalogs and present product data in a unified manner, without directly storing product data from catalogs. Customers, instead of having to interact with multiple heterogeneous catalogs, can interact in a uniform way with a virtual catalog. In this paper we propose a designer support tool, called SI-Designer, for information integration developed within the MOMIS project. The MOMIS project (Mediator environment for Multiple Information Sources) aims to integrate data from structured and semi-structured data sources.


2000 - Creazione di una vista globale d'impresa con il sistema MOMIS basato su Description Logics [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; A., Corni; Vincini, Maurizio
abstract

-


2000 - Creazione di una vista globale d'impresa con il sistema MOMIS basato su Description Logics [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Castano; A., Corni; R., Guidetti; G., Malvezzi; M., Melchiori; Vincini, Maurizio
abstract

Developing intelligent tools for integrating information from heterogeneous sources within an enterprise is a topic of strong research interest. In this article we propose techniques, based on intelligent tools, for extracting and integrating information from structured and semistructured sources, as provided by the MOMIS system. To describe the sources we present and use the object-oriented language ODLI3, derived from the ODMG standard. The sources described in ODLI3 are processed to create a thesaurus of the information shared among the sources. The sources are then integrated semi-automatically by processing the information that describes them with techniques based on Description Logics and clustering techniques, generating a global schema that provides a virtual integrated view of the sources.


2000 - Fondamenti di Informatica [Monografia/Trattato scientifico]
Beneventano, Domenico; Bergamaschi, Sonia; Claudio, Sartori
abstract

A textbook on the fundamentals of computer programming, aimed in particular at developing a rigorous method for solving different classes of problems. Particular emphasis is placed on the fundamental constructs and on the possibility of building solutions based on software reuse.


2000 - Information integration - the MOMIS project demostration [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Castano; Corni, Alberto; G., Guidetti; M., Malvezzi; M., Melchiori; Vincini, Maurizio
abstract

The goal of this demonstration is to present the main features of a Mediator component, the Global Schema Builder, of an I3 system called MOMIS (Mediator envirOnment for Multiple Information Sources). MOMIS has been conceived to provide integrated access to heterogeneous information stored in traditional databases (e.g., relational, object-oriented) or file systems, as well as in semistructured sources. The demonstration is based on the integration of two simple sources of different kinds, structured and semi-structured.


2000 - Information integration: The momis project demonstration [Relazione in Atti di Convegno]
Beneventano, D.; Bergamaschi, S.; Castano, S.; Corni, A.; Guidetti, R.; Malvezzi, G.; Melchiori, M.; Vincini, M.
abstract


2000 - MOMIS: un sistema di Description Logics per l'integrazione del sistema informativo d'impresa [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Corni, Alberto; Vincini, Maurizio
abstract

Taormina


2000 - SI-DESIGNER: un tool di ausilio all'integrazione intelligente di sorgenti di informazione [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Corni, Alberto; R., Guidetti; G., Malvezzi
abstract

SI-Designer (Source Integrator Designer) is a designer support tool for the semi-automatic integration of heterogeneous source schemata (relational, object and semistructured). Developed within the MOMIS project, SI-Designer carries out integration following a semantic approach that uses intelligent techniques based on the OLCD Description Logic, clustering techniques and an object-oriented language for representing the extracted and integrated information, ODLII3, derived from the ODMG standard. Starting from the ODLII3 descriptions of the sources (the local schemata), SI-Designer assists the designer in creating an integrated view of all the sources (the global schema), itself expressed in ODLII3.


1999 - Integration of information from multiple sources of textual data. [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

The chapter presents two ongoing projects towards an intelligent integration of information. They adopt a structural and a semantic approach, respectively: TSIMMIS (The Stanford IBM Manager of Multiple Information Sources) and MOMIS (Mediator environment for Multiple Information Sources). Both projects focus on mediator based information systems. The chapter describes the architecture of a wrapper and how to generate a mediator agent in TSIMMIS. Wrapper agents in TSIMMIS extract information from a textual source and convert local data into a common data model; the mediator is a tool for integrating and refining the data provided by the wrapper agents. In the second project, MOMIS, a conceptual schema for each source is provided adopting a common standard model and language. The MOMIS approach uses a description logic or concept language for knowledge representation to obtain a semiautomatic generation of a common thesaurus. Clustering techniques are used to build the unified schema, i.e. the unified view of the data to be used for query processing in distributed heterogeneous and autonomous databases by a mediator.


1999 - Intelligent Techniques for the Extraction and Integration of Heterogeneous Information [Relazione in Atti di Convegno]
Bergamaschi, Sonia; S., Castano; Vincini, Maurizio; Beneventano, Domenico
abstract

Developing intelligent tools for the integration of information extracted from multiple heterogeneous sources is a challenging issue to effectively exploit the numerous sources available on-line in global information systems. In this paper, we propose intelligent, tool-supported techniques for information extraction and integration which take into account both structured and semistructured data sources. An object-oriented language called odli3, derived from the standard ODMG, with an underlying Description Logics, is introduced for information extraction. Odli3 descriptions of the information sources are exploited first to set a shared vocabulary for the sources. Information integration is performed in a semi-automatic way, by exploiting odli3 descriptions of source schemas with a combination of Description Logics and clustering techniques. Techniques described in the paper have been implemented in the MOMIS system, based on a conventional mediator architecture.


1999 - ODL-Designer UNISQL: Un'Interfaccia per la Specifica Dichiarativa di Vincoli di Integrità in OODBMS [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; F., Sgarbi; Vincini, Maurizio
abstract

The specification and handling of integrity constraints is a fundamental research topic in the database field; indeed, constraints often constitute the most burdensome part of developing real DBMS-based applications. The main goal of the ODL-Designer UNISQL software component presented in this work is to let the database designer express integrity constraints through a declarative language, thus going beyond the approach of current OODBMSs, which allow constraints to be expressed only through procedures (methods and triggers). ODL-Designer UNISQL acquires declarative constraints and automatically generates, transparently to the designer, the "procedures" that implement them. The language supported by ODL-Designer UNISQL is the ODL-ODMG standard, suitably extended to express integrity constraints, while the commercial OODBMS used is UNISQL.


1998 - Consistency checking in complex object database schemata with integrity constraints [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

Integrity constraints are rules that should guarantee the integrity of a database. Provided an adequate mechanism to express them is available, the following question arises: Is there any way to populate a database which satisfies the constraints supplied by a database designer? That is, does the database schema, including constraints, admit at least a nonempty model? This work answers the above question in a complex object database environment, providing a theoretical framework, including the following ingredients: 1) two alternative formalisms, able to express a relevant set of state integrity constraints with a declarative style; 2) two specialized reasoners, based on the tableaux calculus, able to check the consistency of complex objects database schemata expressed with the two formalisms. The proposed formalisms share a common kernel, which supports complex objects and object identifiers, and which allows the expression of acyclic descriptions of classes, nested relations and views, built up by means of the recursive use of record, quantified set, and object type constructors and by the intersection, union, and complement operators. Furthermore, the kernel formalism allows the declarative formulation of typing constraints and integrity rules. In order to improve the expressiveness and maintain the decidability of the reasoning activities, we extend the kernel formalism in two alternative directions. The first formalism, OLCP, introduces the capability of expressing path relations. Because cyclic schemas are extremely useful, we introduce a second formalism, OLCD, with the capability of expressing cyclic descriptions but disallowing the expression of path relations. In fact, we show that the reasoning activity in OLCDP (i.e., OLCP with cycles) is undecidable.


1997 - A semantics-driven query optimizer for OODBs [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Vincini, Maurizio; Beneventano, Domenico
abstract

ODB-QOptimizer is an ODMG 93 compliant tool for schema validation and semantic query optimization. The approach is based on two fundamental ingredients. The first is the OCDL description logics (DLs), proposed as a common formalism to express class descriptions, a relevant set of integrity constraints rules (IC rules) and queries. The second is DLs inference techniques, exploited to evaluate the logical implications expressed by IC rules and thus to produce the semantic expansion of a given query.


1997 - Incoherence and Subsumption for recursive views and queries in Object-Oriented Data Models [Articolo su rivista]
Bergamaschi, Sonia; Beneventano, Domenico
abstract

Elsevier Science B.V. (North-Holland)


1997 - ODB-QOptimizer: a tool for semantic query optimization in OODB [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori; Vincini, Maurizio
abstract

Birmingham, UK


1997 - ODB-QOptimizer: a tool for semantic query optimization in OODB [Software]
Beneventano, Domenico; Bergamaschi, Sonia; Sartori, Claudio; Vincini, Maurizio
abstract

ODB-QOPTIMIZER is an ODMG 93 compliant tool for schema validation and semantic query optimization. The approach is based on two fundamental ingredients. The first is the OCDL description logics (DLs), proposed as a common formalism to express class descriptions, a relevant set of integrity constraints rules (IC rules) and queries. The second is DLs inference techniques, exploited to evaluate the logical implications expressed by IC rules and thus to produce the semantic expansion of a given query.


1997 - ODB-Tools: a description logics based tool for schema validation and semantic query optimization in Object Oriented Databases [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori; Vincini, Maurizio
abstract

LNAI 1321. Roma


1996 - ODB- Reasoner: un ambiente per la verifica di schemi e l’ottimizzazione di interrogazioni in OODB [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; A., Garuti; Vincini, Maurizio; C., Sartori
abstract

S. Miniato. Proceedings edited by Fausto Rabitti et al.


1996 - Scoperta di regole per l’ottimizzazione semantica delle interrogazioni [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori
abstract

Semantic query optimization in a relational setting, with reference to integrity constraints that represent restrictions and simple rules on attributes. Use of the Explora system for the automatic derivation of the rules to be used in the semantic query optimization process.


1996 - Semantic Query Optimization by Subsumption in OODB [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori
abstract

The purpose of semantic query optimization is to use semantic knowledge (e.g. integrity constraints) for transforming a query into a form that may be answered more efficiently than the original version. This paper proposes a general method for semantic query optimization in the framework of Object Oriented Database Systems. The method is applicable to the class of conjunctive queries and is based on two ingredients: a formalism able to express both class descriptions and integrity constraints rules as types; subsumption computation between types to evaluate the logical implications expressed by integrity constraints rules.


1995 - A semantics-driven query optimizer for OODBs [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori; J. P., Ballerini; Vincini, Maurizio
abstract

Semantic query optimization uses problem-specific knowledge (e.g. integrity constraints) for transforming a query into an equivalent one (i.e., with the same answer set) that may be answered more efficiently. The optimizer is applicable to the class of conjunctive queries and is based on two fundamental ingredients. The first is the ODL description logics, proposed as a common formalism to express class descriptions, a relevant set of integrity constraints rules (IC rules), and queries as ODL types. The second is DLs (Description Logics) inference techniques, exploited to evaluate the logical implications expressed by IC rules and thus to produce the semantic expansion of a given query. The optimizer tentatively applies all the possible transformations and delays the choice of beneficial transformations till the end. Some preliminary ideas on filtering activities on the semantically expanded query are reported. A prototype semantic query optimizer (ODB-QOptimizer) for object-oriented database systems (OODBs) is described.


1995 - Consistency checking in Complex Objects Database schemata with integrity constraints [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

Integrity constraints are rules which should guarantee the integrity of a database. Provided that an adequate mechanism to express them is available, the following question arises: is there any way to populate a database which satisfies the constraints supplied by a designer? I.e., does the database schema, including constraints, admit at least one model in which all classes are non-empty? This work gives an answer to the above question in an OODB environment, providing a Data Definition Language (DDL) able to express the semantics of a relevant set of state constraints and a specialized reasoner able to check the consistency of a schema with such constraints. The choice of the set of constraints expressed in the DDL is motivated by decidability issues.


1995 - FuzzyBase: A Fuzzy Logic Aid for Relational Database Queries [Relazione in Atti di Convegno]
Davide, Gazzotti; L., Piancastelli; Claudio, Sartori; Beneventano, Domenico
abstract

This paper presents a similarity query generator for DBMSs. A user query which turns out to be too restrictive and returns an empty set of rows is relaxed and transformed into a similar one: the resulting set of tuples will resemble, to some degree, the set defined by the original query. The relaxing activity is based on fuzzy logic and the system provides a user interface to express the query, to obtain suggestions on possible search values and to validate, on the basis of semantic integrity rules, the expressed conditions.


1995 - ODBQOptimizer: un ottimizzatore semantico per interrogazioni in OODB [Relazione in Atti di Convegno]
J. P., Ballerini; Beneventano, Domenico; Bergamaschi, Sonia; Vincini, Maurizio
abstract

Proceedings edited by Antonio Albano et al.


1995 - Terminological logics for schema design and query processing in OODBs [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

The paper introduces ideas which make feasible and effective the application of Terminological Logic (TL) techniques for schema design and query optimization in Object Oriented Databases (OODBs).


1994 - Constraints in Complex Object Database Models [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

Database design almost invariably includes a specification of a set of rules (the integrity constraints) which should guarantee its consistency. Constraints are expressed in various fashions, depending on the data model, e.g. subsets of first-order logic, inclusion dependencies and predicates on row values, or methods in OO environments. Provided that an adequate formalism to express them is available, the following question arises: is there any way to populate a database which satisfies the constraints supplied by a designer? Means of answering this question should be embedded in automatic design tools, whose use is recommended, and often required, in the difficult task of designing complex database schemas. The contribution of this research is to propose a computational solution to the problem of schema consistency in Complex Object Data Models.


1994 - Reasoning with constraints in Database Models [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

Database design almost invariably includes a specification of a set of rules (the integrity constraints) which should guarantee its consistency. Provided that an adequate mechanism to express them is available, the following question arises: is there any way to populate a database which satisfies the constraints supplied by a designer? In other words, does the database schema, including constraints, admit at least one model in which all classes are non-empty? This work gives an answer to the above question in an OODB environment, providing a Data Definition Language (DDL) able to express the semantics of a relevant set of state constraints and a specialized reasoner able to check the consistency of a schema with such constraints.


1994 - Using subsumption for semantic query optimization [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

The purpose of semantic query optimization is to use semantic knowledge (e.g. integrity constraints) for transforming a query into an equivalent one that may be answered more efficiently than the original version. This paper proposes a general method for semantic query optimization in the framework of OODBs (Object Oriented Database Systems). The method is applicable to the class of conjunctive queries and is based on two ingredients: a description logic able to express both class descriptions and integrity constraint rules (IC rules) as types; subsumption computation between types to evaluate the logical implications expressed by IC rules.
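The interplay between IC rules and subsumption can be sketched as follows. This is a deliberately simplified stand-in, not the paper's description logic: a type is assumed to be a set of atomic conditions, subsumption is reduced to set inclusion, and the rule contents and names (`subsumes`, `semantic_expansion`) are hypothetical. An IC rule is a pair (antecedent, consequent); whenever the antecedent subsumes the query, the consequent can be added to the query without changing its answers.

```python
from typing import FrozenSet, Iterable, List, Tuple

Type = FrozenSet[str]
Rule = Tuple[Type, Type]  # (antecedent, consequent)


def subsumes(general: Iterable[str], specific: Iterable[str]) -> bool:
    """Assumed simplification: `general` subsumes `specific` when every
    condition of `general` also appears in `specific`."""
    return set(general) <= set(specific)


def semantic_expansion(query: Type, ic_rules: List[Rule]) -> Type:
    """Add to the query every condition implied by the IC rules, until nothing changes."""
    expanded = set(query)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in ic_rules:
            # The rule fires when the query is a specialization of its antecedent.
            if subsumes(antecedent, expanded) and not consequent <= expanded:
                expanded |= consequent
                changed = True
    return frozenset(expanded)


# Hypothetical IC rules: managers earn at least 50000; such salaries imply level >= 5.
rules = [
    (frozenset({"class:Manager"}), frozenset({"salary>=50000"})),
    (frozenset({"salary>=50000"}), frozenset({"level>=5"})),
]
query = frozenset({"class:Manager", "dept=R&D"})
expanded_query = semantic_expansion(query, rules)
print(sorted(expanded_query))
```

The expanded query is equivalent to the original one on every database that satisfies the rules, but it exposes extra conditions (here on `salary` and `level`) that an optimizer could exploit, for example to use an index.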


1993 - Taxonomic Reasoning in LOGIDATA+ [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori; A., Artale; F., Cesarini; G., Soda
abstract

This chapter introduces the subsumption computation techniques for a LOGIDATA+ schema.


1993 - Taxonomic Reasoning with Cycles in LOGIDATA+ [Capitolo/Saggio]
Beneventano, Domenico; Bergamaschi, Sonia; Sartori, C.
abstract

This chapter shows the subsumption computation techniques for a LOGIDATA+ schema allowing cyclic definitions for classes. The formal framework LOGIDATA_CYC*, which extends LOGIDATA* to perform taxonomic reasoning in the presence of cyclic class definitions, is introduced. It includes the notions of possible instances of a schema; the legal instance of a schema, defined as the greatest fixed point of the possible instances; and the subsumption relation. On the basis of this framework, the definitions of coherent type and consistent class are introduced, and the algorithms needed to detect incoherence and compute subsumption in a LOGIDATA+ schema are given. Some examples of subsumption computation show its feasibility for schema design and validation.
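The role of the greatest fixed point for cyclic definitions can be illustrated with a small sketch. It does not reproduce the LOGIDATA_CYC* formalism: the schema encoding (each class requires some attributes to point to members of given classes), the function name and the sample data are assumptions. The computation starts from the largest candidate extension of every class and repeatedly discards the objects that violate the class description, until nothing changes.

```python
from typing import Dict, Set

Objects = Dict[str, Dict[str, str]]  # object id -> {attribute: referenced object id}
Schema = Dict[str, Dict[str, str]]   # class -> {attribute: required class of the value}


def greatest_fixed_point(objects: Objects, schema: Schema) -> Dict[str, Set[str]]:
    """Largest class extensions in which every member satisfies its class description."""
    # Start from the top: every object is a candidate member of every class.
    ext = {cls: set(objects) for cls in schema}
    changed = True
    while changed:
        changed = False
        for cls, required in schema.items():
            keep = {
                oid
                for oid in ext[cls]
                if all(
                    objects[oid].get(attr) in ext[target]
                    for attr, target in required.items()
                )
            }
            if keep != ext[cls]:
                ext[cls] = keep
                changed = True
    return ext


# Hypothetical cyclic definition: a Manager is an object whose boss is a Manager.
schema = {"Manager": {"boss": "Manager"}}
objects = {
    "a": {"boss": "b"},
    "b": {"boss": "a"},
    "c": {"boss": "d"},
    "d": {},
}
print(greatest_fixed_point(objects, schema))
```

Objects `a` and `b` support each other through the cycle and remain in the extension, while `d` (no boss) and then `c` (whose boss is `d`) are removed. Building the extension bottom-up from the empty set would instead leave `Manager` empty, which is why the choice of fixed point matters for cyclic schemas.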


1993 - Using Subsumption in Semantic Query Optimization [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

The purpose of semantic query optimization is to use semantic knowledge (e.g. integrity constraints) for transforming a query into an equivalent one that may be answered more efficiently than the original version. This paper proposes a general method for semantic query optimization in the framework of OODBs (Object Oriented Database Systems). The method is applicable to the class of conjunctive queries and is based on two ingredients: a description logic able to express both class descriptions and integrity constraint rules (IC rules) as types; subsumption computation between types to evaluate the logical implications expressed by IC rules.


1993 - Uso della Subsumption per l'Ottimizzazione Semantica delle Queries [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; S., Lodi; C., Sartori
abstract

This work analyzes the possibility of performing Semantic Query Optimization using the subsumption relation. The work includes a formalization of complex object data models, enriched with the notion of subsumption, which identifies all specialization relationships between object classes on the basis of their descriptions.


1992 - Subsumption for Complex Object Data Models [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia
abstract

We adopt a formalism, similar to terminological logic languages developed in AI knowledge representation systems, to express the semantics of complex object data models. Two main extensions are proposed with respect to previously proposed models: the conjunction operator, which permits the expression of multiple inheritance between types (classes) as a semantic property, and the introduction in the schema of derived classes, similar to views. These extensions, together with the adoption of suitable semantics able to deal with cyclic descriptions, allow for the automatic placement of classes in a specialization hierarchy. By mapping schemata to nondeterministic finite automata, we face and solve interesting problems such as the detection of emptiness of a class extension and the computation of a specialization ordering for the greatest, least and descriptive semantics. As queries can be expressed as derived classes, these results also apply to intensional query answering and query validation.


1991 - Taxonomic Reasoning in Complex Object Data Models [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori
abstract

We adopt a formalism, similar to terminological logic languages developed in AI knowledge representation systems, to express the semantics of complex object data models. Two main extensions are proposed with respect to previously proposed models: the conjunction operator, which permits the expression of multiple inheritance between types (classes) as a semantic property, and the introduction in the schema of derived classes, similar to views. Then we introduce the notion of subsumption between classes.


1991 - Taxonomic Reasoning in LOGIDATA+ [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; C., Sartori
abstract

This paper introduces the subsumption computation techniques for a LOGIDATA+ schema.