
Laura PO

Associate Professor, Department of Engineering "Enzo Ferrari"




Publications

2020 - Crime event localization and deduplication [Conference Paper]
Rollo, Federica; Po, Laura


2020 - Empirical Evaluation of Linked Data Visualization Tools [Journal Article]
Desimoni, Federico; Po, Laura

The economic impact of open data in Europe has an estimated value of €140 billion a year between direct and indirect effects. The social impact is also known to be high, as the use of more transparent open data has been enhancing public services and creating new opportunities for citizens and organizations. We are witnessing staggering growth in the production and consumption of Linked Data (LD). Exploring, visualizing and analyzing LD is a core task for a variety of users in numerous scenarios. This paper analyzes in depth the state of the art of tools for LD visualization. Linked Data visualization aims to provide graphical representations of datasets, or of some information of interest selected by a user, in order to facilitate their analysis. A complete list of 77 LD visualization tools has been created, starting from tools listed in previous surveys or research papers and integrating newer tools recently published online. The visualization tools have been described and compared based on their usability and features. A set of goals that LD tools should implement in order to provide clear and convincing visualizations has been defined, and 14 tools have been tested on a big LD dataset. The results of this comparison and test led us to define some suggestions for LD consumers, so that they can select the most appropriate tools based on the type of analysis they wish to perform.


2020 - Linked Data Visualization: Techniques, Tools, and Big Data [Monograph]
Po, Laura; Bikakis, Nikos; Desimoni, Federico; Papastefanatos, George

Linked Data (LD) is nowadays a well-established standard for publishing and managing structured information on the Web, gathering and bridging together knowledge from very different scientific and commercial domains. The development of Linked Data visualization techniques and tools has been the primary means for the analysis of this vast amount of information by data scientists, domain experts, business users and citizens. This book aims to provide an overview of the recent advances in this area, focusing on techniques, tools and use cases of visualization and visual analysis of LD. It presents all the necessary preliminary concepts related to LD technology, the main techniques employed for data visualization based on the characteristics of the underlying data, use cases and tools for LD visualization, and finally a thorough assessment of the usability of these tools under different business scenarios. The goal of this book is to offer interested readers a complete guide on the evolution of LD visualization and to empower them to get started with the visual analysis of such data.


2020 - Providing effective visualizations over big linked data [Conference Paper]
Desimoni, Federico; Po, Laura

The number and the size of Linked Data sources are constantly increasing. In some fortunate cases, the data source is equipped with a tool that guides and helps the user during the exploration of the data, but in most cases the data are published as an RDF dump through a SPARQL endpoint that can be accessed only through SPARQL queries. Although the RDF format was designed to be processed by machines, there is a strong need for visualization and exploration tools. Data visualization makes big and small Linked Data easier for the human brain to understand, and it also makes it easier to detect patterns, trends, and outliers in groups of data. For this reason, we developed a tool called H-BOLD (High-level Visualization over Big Linked Open Data). H-BOLD aims to help users explore the content of a Linked Data source by providing a high-level view of the structure of the dataset and an interactive exploration that allows users to focus on the connections and attributes of one or more classes. Moreover, it provides a visual interface for querying the endpoint that automatically generates SPARQL queries.
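The last step described above, turning a visual selection of a class and its attributes into a SPARQL query, can be sketched as follows. This is an illustrative assumption of how such generation might work, not H-BOLD's actual implementation; the function name and query shape are invented for the example.

```python
def build_sparql_query(class_uri, properties, limit=100):
    """Compose a SPARQL SELECT that retrieves instances of the chosen
    class together with the values of the chosen properties."""
    var_names = [f"?v{i}" for i in range(len(properties))]
    select = " ".join(["?s"] + var_names)
    patterns = [f"?s a <{class_uri}> ."]
    for var, prop in zip(var_names, properties):
        # OPTIONAL keeps instances that lack some of the selected properties
        patterns.append(f"OPTIONAL {{ ?s <{prop}> {var} . }}")
    body = "\n  ".join(patterns)
    return f"SELECT {select}\nWHERE {{\n  {body}\n}}\nLIMIT {limit}"

query = build_sparql_query(
    "http://dbpedia.org/ontology/City",
    ["http://dbpedia.org/ontology/populationTotal"])
print(query)
```

The generated string could then be submitted to the endpoint on the user's behalf, which is the interaction pattern the abstract describes.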


2020 - Real-time data cleaning in traffic sensor networks [Conference Paper]
Bachechi, Chiara; Rollo, Federica; Po, Laura


2020 - Semantic Traffic Sensor Data: The TRAFAIR Experience [Journal Article]
Desimoni, Federico; Ilarri, Sergio; Po, Laura; Rollo, Federica; Trillo Lado, Raquel

Modern cities face pressing problems with transportation systems including, but not limited to, traffic congestion, safety, health, and pollution. To tackle them, public administrations have implemented roadside infrastructures such as cameras and sensors to collect data about environmental and traffic conditions. In the case of traffic sensor data, not only are the real-time data essential, but historical values also need to be preserved and published. When real-time and historical data of smart cities become available, everyone can join an evidence-based debate on the city's future evolution. The TRAFAIR (Understanding Traffic Flows to Improve Air Quality) project seeks to understand how traffic affects urban air quality. The project develops a platform to provide real-time and predicted values on air quality in several cities in Europe, encompassing tasks such as the deployment of low-cost air quality sensors, data collection and integration, modeling and prediction, the publication of open data, and the development of applications for end users and public administrations. This paper focuses specifically on the modeling and semantic annotation of traffic data. We present the tools and techniques used in the project and validate our strategies for data modeling and its semantic enrichment over two cities: Modena (Italy) and Zaragoza (Spain). An experimental evaluation shows that our approach to publishing Linked Data is effective.


2020 - Using real sensors data to calibrate a traffic model for the city of Modena [Conference Paper]
Bachechi, Chiara; Rollo, Federica; Desimoni, Federico; Po, Laura

In Italy, road vehicles are the preferred means of transport. Over the last few years, in almost all the EU Member States, the passenger car fleet has increased. The high number of vehicles complicates urban planning and often results in traffic congestion and areas of increased air pollution. Overall, efficient traffic control is profitable in individual, societal, financial, and environmental terms. Traffic management solutions typically require the use of simulators able to capture in detail all the characteristics and dependencies associated with real-life traffic. Therefore, the realization of a traffic model can help to discover and control traffic bottlenecks in the urban context. In this paper, we analyze how to better simulate vehicle flows measured by traffic sensors in the streets. A dynamic traffic model was set up starting from traffic sensor data collected every minute at about 300 locations in the city of Modena. The reliability of the model is discussed and proved through a comparison between simulated values and real values from traffic sensors. This analysis pointed out some critical issues. Therefore, to better understand the origin of spurious jams and incoherence with real data, we tried different configurations of the model as possible solutions.


2020 - Visual analytics for spatio-temporal air quality data [Conference Paper]
Bachechi, Chiara; Desimoni, Federico; Po, Laura

Air pollution is the second biggest environmental concern for Europeans after climate change and the major risk to public health. It is imperative to monitor the spatio-temporal patterns of urban air pollution. The TRAFAIR air quality dashboard is an effective web application that empowers decision-makers to be aware of urban air quality conditions, define new policies, and keep monitoring their effects. The architecture copes with the multidimensionality of the data and the real-time visualization challenge of big data streams coming from a network of low-cost sensors. Moreover, it handles the visualization and management of the series of predictive air quality maps produced by an air pollution dispersion model. Air quality data are visualized not only at a limited set of locations at different times but in the continuous space-time domain, thanks to interpolated maps that estimate the pollution at unsampled locations.
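The abstract does not name the interpolation method used for those continuous maps; one common choice for estimating a value at unsampled locations from point sensor readings is inverse distance weighting (IDW), sketched minimally below. The choice of IDW here is an assumption for illustration only.

```python
def idw(sample_points, query_point, power=2):
    """Inverse distance weighting: estimate a value at query_point from
    (x, y, value) samples; nearer sensors get proportionally more weight."""
    num = den = 0.0
    for x, y, v in sample_points:
        d2 = (x - query_point[0]) ** 2 + (y - query_point[1]) ** 2
        if d2 == 0:
            return v  # query point coincides with a sensor location
        w = d2 ** (-power / 2)  # weight = 1 / distance**power
        num += w * v
        den += w
    return num / den

sensors = [(0, 0, 10.0), (1, 0, 20.0), (0, 1, 30.0)]
estimate = idw(sensors, (0.5, 0.5))
```

Evaluating the function over a grid of query points yields exactly the kind of interpolated pollution surface the dashboard displays.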


2019 - Forecast of the impact by local emissions at an urban micro scale by the combination of Lagrangian modelling and low cost sensing technology: The TRAFAIR project [Conference Paper]
Bigi, A.; Veratti, G.; Fabbi, S.; Po, L.; Ghermandi, G.


2019 - From Sensors Data to Urban Traffic Flow Analysis [Conference Paper]
Po, Laura; Rollo, Federica; Bachechi, Chiara; Corni, Alberto

By 2050, almost 70% of the population will live in cities. As the population grows, travel demand increases, and this might affect air quality in urban areas. Traffic is among the main sources of pollution within cities. Therefore, monitoring urban traffic means not only identifying congestion and managing accidents but also preventing the impact on air pollution. Urban traffic modeling and analysis is part of the advanced intelligent traffic management technologies that have become a crucial sector for smart cities. Its main purpose is to predict congestion states of a specific urban transport network and propose improvements to the traffic network that might result in a decrease of travel times, air pollution and fuel consumption. This paper describes the implementation of an urban traffic flow model in the city of Modena based on real traffic sensor data. This work is part of a wider European project that aims at studying the correlation between traffic and air pollution, and therefore at combining traffic and air pollution simulations to test various urban scenarios and raise citizen awareness about air quality where necessary.


2019 - Implementing an urban dynamic traffic model [Conference Paper]
Bachechi, C.; Po, L.

The world of mobility is constantly evolving and proposing new technologies, such as autonomous driving, electromobility, shared mobility or even new air transport systems. We do not know how people and things will be moving within cities in 30 years, but we know for sure that road network planning and traffic management will remain critical issues. The goal of our research is the implementation of a data-driven micro-simulation traffic model for computing everyday simulations of road traffic in a medium-sized city. A dynamic traffic model is needed in every urban area, and we introduce an easy-to-set-up solution for cities that already have traffic sensors installed. Daily traffic flows are created from real data measured by induction loop detectors along the urban roads of Modena. The result of the simulation provides a set of "snapshots" of the traffic flow within the Modena road network every minute. The main contribution of the implemented model is its ability, starting from point traffic measurements at 400 locations, to provide an overview of traffic intensity on more than 800 km of roads.


2019 - TRAFAIR: Understanding Traffic Flow to Improve Air Quality [Conference Paper]
Po, Laura; Rollo, Federica; Ramón Ríos Viqueira, José; Trillo Lado, Raquel; Bigi, Alessandro; Cacheiro López, Javier; Paolucci, Michela; Nesi, Paolo

Environmental impacts of traffic are of major concern throughout many European metropolitan areas. Air pollution causes 400,000 deaths per year, making it the first environmental cause of premature death in Europe. Among the main sources of air pollution in Europe are road traffic, domestic heating, and industrial combustion. The TRAFAIR project brings together 9 partners from two European countries (Italy and Spain) to develop innovative and sustainable services combining air quality, weather conditions, and traffic flow data to produce new information for the benefit of citizens and government decision-makers. The project started in November 2018 and lasts two years. It is motivated by the large number of deaths caused by air pollution. Nowadays, the situation is particularly critical in some member states of Europe: in February 2017, the European Commission warned five countries, including Spain and Italy, of continued air pollution breaches. In this context, public administrations and citizens suffer from the lack of comprehensive and fast tools to estimate the level of pollution on an urban scale resulting from varying traffic flow conditions, tools that would allow optimizing control strategies and increasing air quality awareness. The goals of the project are twofold: monitoring urban air quality by using sensors in 6 European cities and making urban air quality predictions thanks to simulation models. The project is co-financed by the European Commission under the CEF TELECOM call on Open Data.


2019 - Traffic analysis in a smart city [Conference Paper]
Bachechi, C.; Po, L.

Urbanization is accelerating at a high pace. This raises new and critical issues in the transition towards smarter, more efficient, livable, as well as economically, socially and environmentally sustainable cities. Urban mobility is one of the toughest challenges. In many cities, existing mobility systems are already inadequate, yet urbanization and increasing populations will increase mobility demand still further. Understanding traffic flows within an urban environment, studying similarities (or dissimilarities) among weekdays, and finding the peaks within a day are the first steps towards understanding urban mobility. Following the implementation of a micro-simulation model in the city of Modena based on actual data from traffic sensors, a huge amount of information describing daily traffic flows within the city became available. This paper reports an in-depth investigation of traffic flows in order to discover trends. Traffic analyses comparing working days and weekends and identifying significant deviations are performed. Moreover, traffic flow estimates were studied during special days, such as weather alert days or holidays, to discover particular tendencies. This preliminary study allowed us to identify the main critical points in the mobility of the city.


2018 - An Integrated Smart City Platform [Conference Paper]
Nesi, Paolo; Po, Laura; R. Viqueira, José; Trillo Lado, Raquel

Smart cities aim to create a higher quality of life for their citizens, improve business services and promote the tourism experience. Fostering smart city innovation at local and regional level requires a set of mature technologies to discover, integrate and harmonize multiple data sources, and the exposure of effective applications for end users (citizens, administrators, tourists...). In this context, Semantic Web technologies and Linked Open Data principles provide a means for sharing knowledge about cities as physical, economical, social, and technical systems, enabling the development of smart city services. Despite the tremendous effort these communities have made so far, there is a lack of comprehensive and effective platforms that handle the entire process of identification, ingestion, consumption and publication of data for smart cities. In this paper, a complete open-source platform to boost the integration, semantic enrichment, publication and exploitation of public data to foster smart cities in local and national administrations is proposed. Starting from mature software solutions, we propose a platform to facilitate the harmonization of datasets (open and private, static and dynamic in real time) of the same domain generated by different authorities. The platform provides a unified dataset oriented to smart cities that can be exploited to offer services to the citizens in a uniform way, to easily release open data, and to monitor the status of city services in real time by means of a suite of web applications.


2018 - Building an Urban Theft Map by Analyzing Newspaper Crime Reports [Conference Paper]
Po, Laura; Rollo, Federica

One of the main issues in today's cities is public safety, which can be improved by implementing a systematic analysis for identifying and analyzing patterns and trends in crime, also called crime mapping. Mapping crime allows police analysts to identify crime hot spots; moreover, it increases public confidence and citizen engagement and promotes transparency. This paper focuses on analyzing and mapping thefts in an Italian city through online newspapers, using text mining techniques.


2018 - Community detection applied on big linked data [Journal Article]
Po, L.; Malvezzi, D.

The Linked Open Data (LOD) Cloud has more than tripled its sources in just six years (from 295 sources in 2011 to 1163 datasets in 2017). The actual Web of Data contains more than 150 billion triples. We are witnessing staggering growth in the production and consumption of LOD and the generation of increasingly large datasets. In this scenario, providing researchers, domain experts, but also businessmen and citizens with visual representations and intuitive interactions can significantly aid the exploration and understanding of the domains and knowledge represented by Linked Data. Various tools and web applications have been developed to enable the navigation and browsing of the Web of Data. However, these tools fall short in producing high-level representations of large datasets and in supporting users in the exploration and querying of these big sources. Following this trend, we devised a new method and a tool called H-BOLD (High-level Visualizations on Big Open Linked Data). H-BOLD enables the exploratory search and multilevel analysis of Linked Open Data. It offers different levels of abstraction on Big Linked Data. Through user interaction and the dynamic adaptation of the graph representing the dataset, it is possible to perform an effective exploration of the dataset, starting from a set of few classes and adding new ones. The performance and portability of H-BOLD have been evaluated on the SPARQL endpoints listed on SPARQL Endpoint Status. The effectiveness of H-BOLD as a visualization tool is described through a user study.


2018 - H-BOLD (High-level Visualizations on Big Open Linked Data) [Software]
Po, Laura; Desimoni, Federico; Malvezzi, Davide


2018 - High-level visualization over big linked data [Conference Paper]
Po, Laura; Malvezzi, Davide

The Linked Open Data (LOD) Cloud is continuously expanding, and the number of complex and large sources is rising. Understanding an unknown source at a glance is a critical task for LOD users, but it can be facilitated by visualization or exploration tools. H-BOLD (High-level Visualization over Big Open Linked Data) is a tool that allows users with no a priori knowledge of the domain and no SPARQL skills to start navigating and exploring Big Linked Data. Users can start from a high-level visualization and then focus on an element of interest to incrementally explore the source, as well as perform a visual query on certain classes of interest. At the moment, 32 Big Linked Data sources (with more than 500,000 triples) exposing a SPARQL endpoint can be explored using H-BOLD.


2017 - From Data Integration to Big Data Integration [Book Chapter]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca

The Database Group (DBGroup, www.dbgroup.unimore.it) and Information System Group (ISGroup, www.isgroup.unimore.it) research activities have been mainly devoted to the Data Integration research area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semi-structured data sources and allows a user to pose a single query and to receive a single unified answer. Description Logics, automatic annotation of schemata, and clustering techniques constitute the theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are Peer Data Management Systems and their evolutions toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss, and solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been faced: automatic annotation, mapping discovery, global query processing, provenance, multidimensional information integration, and keyword search, within European and national projects. With the incoming new requirements of integrating open linked data, textual and multimedia data in a big data scenario, the research has been devoted to the Big Data Integration research area. In particular, the most relevant research results achieved are: a scalable entity resolution method, a scalable join operator, and a tool, LODEX, for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DataRiver, data integration was successfully applied to smart e-health.


2017 - Managing Road Safety through the Use of Linked Data and Heat Maps [Conference Paper]
Colacino, Vincenzo Giuseppe; Po, Laura

Road traffic injuries are a critical public health challenge that requires valuable efforts for effective and sustainable prevention. Worldwide, an estimated 1.2 million people are killed in road crashes each year, and as many as 50 million are injured. An analysis of data provided by authoritative sources can be valuable for understanding which are the most critical points of the road network. The aim of this paper is to discover data about road accidents in Italy and to provide useful visualizations for improving road safety. Starting from the annual report on road accidents of the Automobile Club of Italy, we transform the original data into an RDF dataset according to the Linked Open Data principles and connect it to external datasets. Then, an integration with OpenStreetMap allows displaying the accident data on a map. Here, the final user is able to identify which road sections are most critical based on the number of deaths, injuries or accidents.
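The transformation of a tabular accident record into RDF triples can be sketched as below. The URIs and property names here are illustrative assumptions, not the vocabulary actually used by the paper; the output follows the standard N-Triples syntax.

```python
def accident_to_ntriples(acc_id, lat, lon, deaths):
    """Serialize one accident record as N-Triples lines.
    All URIs below are hypothetical placeholders for illustration."""
    s = f"<http://example.org/accident/{acc_id}>"
    geo = "http://www.w3.org/2003/01/geo/wgs84_pos#"
    xsd = "http://www.w3.org/2001/XMLSchema#"
    lines = [
        f'{s} <{geo}lat> "{lat}"^^<{xsd}decimal> .',
        f'{s} <{geo}long> "{lon}"^^<{xsd}decimal> .',
        f'{s} <http://example.org/ontology/deaths> "{deaths}"^^<{xsd}integer> .',
    ]
    return "\n".join(lines)

triples = accident_to_ntriples(42, 44.6471, 10.9252, 1)
```

With coordinates expressed in a shared vocabulary such as WGS84, the resulting dataset can be joined with OpenStreetMap geometry to place each accident on the map.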


2017 - Topic detection in multichannel Italian newspapers [Conference Paper]
Po, Laura; Rollo, Federica; Trillo Lado, Raquel

Nowadays, any person, company or public institution uses and exploits different channels to share private or public information with other people (friends, customers, relatives, etc.) or institutions. This context has changed journalism: major newspapers report news not just on their own websites, but also on several social media platforms such as Twitter or YouTube. The use of multiple communication media stimulates the need for integration and analysis of the content published globally, and not just at the level of a single medium. An analysis is needed to achieve a comprehensive overview of the information that reaches end users and of how they consume it. This analysis should identify the main topics in the news flow and reveal the mechanisms of publication of news on different media (e.g. news timelines). Currently, most of the work in this area is still focused on a single medium, so an analysis across different media (channels) should improve the results of topic detection. This paper shows the application of a graph analytical approach, called KeyGraph, to a set of very heterogeneous documents such as the news published on various media. A preliminary evaluation on the news published over a 5-day period was able to identify the main topics within the publications of a single newspaper, and also within the publications of 20 newspapers on several online channels.
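KeyGraph-style approaches start from a term co-occurrence graph built over the document collection; densely connected clusters in that graph are then treated as topics. A minimal sketch of that first step, under the simplifying assumption that terms are whitespace-separated words (the clustering stage is omitted):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(documents, min_count=2):
    """Count how often each pair of terms appears in the same document,
    keeping only pairs seen at least min_count times."""
    edges = Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))  # unique terms per document
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}

docs = ["traffic jam modena", "modena traffic sensors", "air quality modena"]
graph = cooccurrence_graph(docs)
```

Pairs that recur across many articles from different channels form the dense subgraphs that a topic-detection step would group together.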


2016 - Driving Innovation in Youth Policies With Open Data [Conference Paper]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura

In December 2007, thirty activists held a meeting in California to define the concept of open public data. For the first time, eight Open Government Data (OGD) principles were settled: OGD should be complete, primary (reporting data at a high level of granularity), timely, accessible, machine processable, non-discriminatory, non-proprietary, and license-free. Since the inception of the Open Data philosophy there has been a constant increase in the information released, improving the communication channel between public administrations and their citizens. Open data offers governments, companies and citizens information to make better decisions. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a research project on Open Data for Youth Policies. The project was devoted to exploring the youth situation in the municipalities and provinces of the Emilia-Romagna region (Italy), in particular to examining data on population, education and work. We identified interesting data sources related to youth policies, both from the open data community and from the private repositories of local governments. The selected sources have been integrated, and the result of the integration has been presented by means of a useful navigator tool. In the end, we published new information on the web as Linked Open Data. Since the process applied and the tools used are generic, we trust this paper can be an example and a guide for new projects that aim to create new knowledge through Open Data.


2016 - Exposing the Underlying Schema of LOD Sources [Conference Paper]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura

The Linked Data principles defined by Tim Berners-Lee promise that a large portion of Web data will be usable as one big interlinked RDF database. Today, with more than one thousand Linked Open Data (LOD) sources available on the Web, we are witnessing an emerging trend in the publication and consumption of LOD datasets. However, the pervasive use of external resources, together with a deficiency in the definition of the internal structure of a dataset, makes many LOD sources extremely complex to understand. In this paper, we describe a formal method to unveil the implicit structure of a LOD dataset by building a (Clustered) Schema Summary. The Schema Summary contains all the main classes and properties used within the dataset, whether they are taken from external vocabularies or not, and is conceivable as an RDFS ontology. The Clustered Schema Summary, suitable for large LOD datasets, provides a higher-level view of the classes and properties used, by gathering together classes that are objects of multiple instantiations.


2015 - Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems [Conference Paper]
Bergamaschi, Sonia; Po, Laura

We propose a plot-based recommendation system, which is based upon an evaluation of the similarity between the plot of a video that was watched by a user and a large number of plots stored in a movie database. Our system is independent of the number of user ratings, thus it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. The system implements and compares two topic models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), on a movie database of two hundred thousand plots that has been constructed by integrating different movie databases into a local NoSQL (MongoDB) DBMS. The behaviour of the topic models has been examined on the basis of standard metrics and user evaluations, and performance assessments with 30 users have been conducted to compare our tool with a commercial system.
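The core operation described above is ranking plots by textual similarity. The actual systems compare plots in LSA or LDA topic space; as a simplified stand-in for illustration, the underlying notion of plot similarity can be sketched with a plain bag-of-words cosine, which topic models refine by projecting the same vectors into a lower-dimensional space.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two plot descriptions."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

s = cosine_similarity("a spy saves the world", "a retired spy saves the city")
```

Ranking the whole database by this score against the watched movie's plot yields a recommendation list that needs no user ratings at all, which is the property the abstract emphasizes.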


2015 - LODeX: A tool for Visual Querying Linked Open Data [Conference Paper]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura

Formulating a query on a Linked Open Data (LOD) source is not an easy task: technical knowledge of the query language and awareness of the structure of the dataset are essential to create a query. We present a revised version of LODeX that provides the user with an easy way to build queries in a fast and interactive manner. When users decide to explore a LOD source, they can take advantage of the Schema Summary produced by LODeX (i.e. a synthetic view of the dataset's structure) and pick graphical elements from it to create a visual query. The tool also supports the user in browsing the results and, eventually, in refining the query. The prototype has been evaluated on hundreds of public SPARQL endpoints (listed in Data Hub) and is available online at http://dbgroup.unimo.it/lodex2. A survey conducted on 27 users has demonstrated that our tool can effectively support both unskilled and skilled users in exploring and querying LOD datasets.


2015 - Open Data for Improving Youth Policies [Conference Paper]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura

The Open Data philosophy is based on the idea that certain data should be made available to all citizens, in an open form, without any copyright restrictions, patents or other mechanisms of control. Various governments have started to publish open data, first of all the USA and the UK in 2009, and in 2015 the Open Data Barometer project (www.opendatabarometer.org) reported that, of 77 diverse states across the world, over 55 percent had developed some form of Open Government Data initiative. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a one-year research project on Open Data for Youth Policies. The project was mainly devoted to exploring the youth situation in the municipalities and provinces of the Emilia-Romagna region (Italy), in particular to examining data on population, education and work. The project goals were: to identify interesting data sources related to youth policies, both from the open data community and from the private repositories of local governments of the Emilia-Romagna region; to integrate them and to show the result of the integration by means of a useful navigator tool; and, in the end, to publish new information on the web as Linked Open Data. This paper also reports the main issues encountered, which may seriously affect the entire process, from the consumption and integration to the publication of open data.


2015 - Visual Querying LOD sources with LODeX [Conference Paper]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura

The Linked Open Data (LOD) Cloud has more than tripled its sources in just three years (from 295 sources in 2011 to 1014 in 2014). While LOD data are being produced at an increasing rate, LOD tools fall short in producing a high-level representation of datasets and in supporting users in the exploration and querying of a source. To overcome the above problems and significantly increase the number of consumers of LOD data, we devised a new method and a tool, called LODeX, that promotes the understanding, navigation and querying of LOD sources both for experts and for beginners. It also provides a standardized and homogeneous summary of LOD sources and supports users in the creation of visual queries on previously unknown datasets. We have extensively evaluated the portability and usability of the tool. LODeX has been tested on the entire set of datasets available at Data Hub, i.e. 302 sources. In this paper, we showcase the usability evaluation of the different features of the tool (the Schema Summary representation and the visual query building) obtained with 27 users (comprising both Semantic Web experts and beginners).


2014 - A Visual Summary for Linked Open Data sources [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

In this paper we propose LODeX, a tool that produces a representative summary of a Linked Open Data (LOD) source starting from scratch, thus supporting users in exploring and understanding the contents of a dataset. The tool takes as input the URL of a SPARQL endpoint and launches a set of predefined SPARQL queries; from the results of these queries it generates a visual summary of the source. The summary reports statistical and structural information about the LOD dataset and can be browsed to focus on particular classes or to explore their properties and their use. LODeX was tested on the 137 public SPARQL endpoints contained in Data Hub (formerly CKAN), one of the main Open Data catalogues. The statistical and structural information of the 107 successful extractions is collected and available in the online version of LODeX (http://dbgroup.unimo.it/lodex).
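As a rough illustration of the kind of structural statistics such a summary reports, the sketch below evaluates a class-count query over a handful of toy triples. The query text, prefixes and data are illustrative and not taken from LODeX itself.

```python
from collections import Counter

# The kind of predefined query such a tool sends to a SPARQL endpoint
# (illustrative; the actual LODeX query set is not reproduced here):
CLASS_COUNT_QUERY = """
SELECT ?cls (COUNT(?s) AS ?n) WHERE { ?s a ?cls } GROUP BY ?cls
"""

# Toy triples standing in for the endpoint's data.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob",   "rdf:type", "ex:Person"),
    ("ex:acme",  "rdf:type", "ex:Company"),
]

# Evaluate the query by hand over the toy triples: instances per class,
# one of the statistics a dataset summary would display.
summary = Counter(o for s, p, o in triples if p == "rdf:type")
print(dict(summary))  # {'ex:Person': 2, 'ex:Company': 1}
```

A real extraction would send `CLASS_COUNT_QUERY` to the endpoint and render the counts in the visual summary instead of counting locally.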


2014 - Comparing Topic Models for a Movie Recommendation System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

Recommendation systems have become successful at suggesting content that is likely to be of interest to the user; however, their performance greatly suffers when little information about the user's preferences is available. In this paper we propose an automated movie recommendation system based on the similarity of movies: given a target movie selected by the user, the goal of the system is to provide a list of the movies most similar to the target one, without knowing any user preferences. The topic models Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) have been applied and extensively compared on a movie database of two hundred thousand plots. Experiments are an important part of the paper: we examined the behaviour of the topic models based on standard metrics and on user evaluations, and we conducted performance assessments with 30 users to compare our approach with a commercial system. The outcome was that the performance of LSA was superior to that of LDA in supporting the selection of similar plots. Even if our system does not outperform commercial systems, it does not rely on human effort, so it can be ported to any domain where natural language descriptions exist. Since it is independent of the number of user ratings, it is able to suggest famous movies as well as old or little-known movies that are still strongly related to the content of the video the user has watched.
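A minimal sketch of the LSA side of such a comparison, on a four-plot toy corpus (the plots, parameters and similarity threshold are illustrative, not the paper's 200,000-plot setup):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy plot summaries: two spy stories, two cooking stories.
plots = [
    "a spy infiltrates an enemy agency to steal secret documents",
    "an undercover agent steals secret files from a rival spy agency",
    "a young chef opens a small restaurant and cooks family recipes",
    "a cook trains in a famous restaurant kitchen to master recipes",
]

# LSA = truncated SVD over a TF-IDF term-document matrix.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(plots)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Recommend the plot most similar to a target (index 0), excluding itself.
sims = cosine_similarity(lsa[0:1], lsa)[0]
sims[0] = -1.0
print("most similar to plot 0:", int(np.argmax(sims)))
```

Swapping `TruncatedSVD` for scikit-learn's `LatentDirichletAllocation` over raw term counts would give the LDA side of the same pipeline.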


2014 - LODeX: A visualization tool for Linked Open Data navigation and querying. [Software]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

We present LODeX, a tool for visualizing, browsing and querying a LOD source starting from the URL of its SPARQL endpoint. LODeX creates a visual summary for a LOD dataset and allows users to perform queries on it. Users can select the classes of interest to discover which instances are stored in the LOD source, without any knowledge of the underlying vocabulary used to describe the data. The tool couples the overall view of the LOD source with a preview of the instances so that the user can easily build and refine his/her query. The tool has been evaluated on hundreds of public SPARQL endpoints (listed in Data Hub). The schema summaries of 40 LOD sources are stored and available for online querying at http://dbgroup.unimo.it/lodex2.


2014 - Online Index Extraction from Linked Open Data Sources [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

The production of machine-readable data in the form of RDF datasets belonging to the Linked Open Data (LOD) Cloud is growing very fast. However, selecting relevant knowledge sources from the Cloud, assessing their quality and extracting synthetic information from a LOD source are all tasks that require a strong human effort. This paper proposes an approach for the automatic extraction of the most representative information from a LOD source and the creation of a set of indexes that enhance the description of the dataset. These indexes collect statistical information regarding the size and the complexity of the dataset (e.g. the number of instances), but also depict all the instantiated classes and the properties among them, supplying the user with a concise view of the LOD source. The technique is fully implemented in LODeX, a tool able to deal with the performance issues of systems that expose SPARQL endpoints and to cope with the heterogeneity in the knowledge representation of RDF data. LODeX has been evaluated on a large number of endpoints (244) belonging to the LOD Cloud, and the effectiveness of the index extraction process is presented.


2013 - An iPad Order Management System for Fashion Trade [Relazione in Atti di Convegno]
I., Baroni; Bergamaschi, Sonia; Po, Laura
abstract

The fashion industry loves the new tablets. In 2011 we noted a 38% growth of e-commerce in the Italian fashion industry. A large number of brands have understood the value of mobile devices as the key channel for consumer communication. The interest of brands in mobile marketing and service applications has made a big step forward, with an increase of 129% in 2011 (osservatori.net, 2012). This paper presents a mobile version of the Fashion OMS (Order Management System) web application. Fashion Touch is a mobile application that allows clients and the company's sales network to process commercial orders, consult the product catalog and manage customers as the OMS web version does, with the added functionality of an off-line order entry mode. To develop an effective mobile app, we started by analyzing the new web technologies for mobile applications (HTML5, CSS3, Ajax) and their development frameworks, comparing them with Apple's native programming language. We selected Titanium, a multi-platform framework for developing native mobile and desktop applications via web technologies, as the best framework for our purpose. We faced issues concerning network synchronization and studied different database solutions depending on the device hardware characteristics and performance. This paper reports every aspect of the app development up to the publication on the Apple Store.


2012 - A meta-language for MDX queries in eLog Business Solution [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Interlandi, Matteo; Mario, Longo; Po, Laura; Vincini, Maurizio
abstract

The adoption of business intelligence technology in industries is growing rapidly. Business managers are not satisfied with ad hoc and static reports and they ask for more flexible and easy to use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators adopting a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have exploited the technological services of eBilling S.p.A. to manage their document flows. In particular, eLog services have been used by the major Italian media and telecommunications companies and their foreign branches, such as Sky, Mediaset, H3G, Tim Brazil etc. The largest customer can provide up to 30 million mail pieces within 6 months (about 200 GB of data in the relational DBMS). Over a period of 18 months, eLog is expected to handle 150 million mail pieces (1 TB of data).
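To make the report-to-MDX translation step concrete, here is a minimal sketch in which a report is just a dictionary naming measures, a row dimension and a cube; the dictionary shape, names and cube are hypothetical, since the paper's actual meta-language is not reproduced here.

```python
# Hypothetical mini meta-language: a report description as a plain dict.
def report_to_mdx(report):
    """Translate a simple report description into an MDX query string."""
    measures = ", ".join(f"[Measures].[{m}]" for m in report["measures"])
    rows = ", ".join(f"[{dim}].[{lvl}].Members" for dim, lvl in report["rows"])
    return (
        f"SELECT {{{measures}}} ON COLUMNS, "
        f"{{{rows}}} ON ROWS "
        f"FROM [{report['cube']}]"
    )

report = {
    "measures": ["Mail Pieces", "Data Volume"],
    "rows": [("Customer", "Company")],
    "cube": "DocumentFlows",  # illustrative cube name
}
print(report_to_mdx(report))
```

A framework like the one described would additionally inspect the requested measures to pick the most suitable cube automatically, rather than taking it from the report.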


2012 - A non-intrusive movie recommendation system [Relazione in Atti di Convegno]
Farinella, Tania; Bergamaschi, Sonia; Po, Laura
abstract

Several recommendation systems have been developed to support the user in choosing an interesting movie from multimedia repositories. The widely used collaborative-filtering systems focus on the analysis of user profiles or user ratings of the items. However, these systems see their performance decrease at the start-up phase and, due to privacy issues, when a user hides most of his personal data. On the other hand, content-based recommendation systems compare movie features to suggest similar multimedia contents; these systems are based on less invasive observations, yet they have some difficulty supplying tailored suggestions. In this paper, we propose a plot-based recommendation system, based upon an evaluation of the similarity between the plot of a video the user watched and a large number of plots stored in a movie database. Since it is independent of the number of user ratings, it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. We experimented with different methodologies to compare natural language descriptions of movies (plots) and found Latent Semantic Analysis (LSA) to be the best at supporting the selection of similar plots. In order to increase the efficiency of LSA, different models were tested and, in the end, a recommendation system able to compare about two hundred thousand movie plots in less than a minute was developed.


2011 - Automatic Normalization and Annotation for Discovering Semantic Mappings [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Po, Laura; Sorrentino, Serena
abstract

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.


2011 - Automatic generation of probabilistic relationships for improving schema matching [Articolo su rivista]
Po, Laura; Sorrentino, Serena
abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method aimed at discovering probabilistic lexical relationships in the environment of “on the fly” data integration. Our method is based on a probabilistic lexical annotation technique, which automatically associates one or more meanings with schema elements w.r.t. a thesaurus/lexical resource. However, the accuracy of automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. We address this problem by including a method to perform schema label normalization which increases the number of comparable labels. From the annotated schemata, we derive the probabilistic lexical relationships to be collected in the Probabilistic Common Thesaurus. The method is applied within the MOMIS data integration system but can easily be generalized to other data integration systems.


2011 - The Open Source release of the MOMIS Data Integration System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Corni, Alberto; Entela, Kazazi; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

MOMIS (Mediator EnvirOnment for Multiple Information Sources) is an Open Source data integration system able to aggregate data coming from heterogeneous data sources (structured and semi-structured) in a semi-automatic way. DataRiver is a spin-off of the University of Modena and Reggio Emilia that has re-engineered the MOMIS system and released its Open Source version for both commercial and academic use. The MOMIS system has been extended with a set of features to minimize the integration process costs, exploiting the semantics of the data sources and optimizing each integration phase. The Open Source MOMIS system has been successfully applied in several industrial sectors: medical, agro-food, tourism, textile, mechanical, logistics. This paper describes the features of the Open Source MOMIS system and how it is able to address real data integration challenges.


2011 - Using semantic techniques to access web data [Articolo su rivista]
Raquel, Trillo; Po, Laura; Sergio, Ilarri; Bergamaschi, Sonia; Eduardo, Mena
abstract

Nowadays, people frequently use different keyword-based web search engines to find the information they need on the web. However, many words are polysemous and, when these words are used to query a search engine, its output usually includes links to web pages referring to their different meanings. Besides, results with different meanings are mixed up, which makes the task of finding the relevant information difficult for users, especially if the user-intended meanings behind the input keywords are not among the most popular on the web. In this paper, we propose a set of semantic techniques to group the results provided by a traditional search engine into categories defined by the different meanings of the input keywords. Differently from other proposals, our method considers the knowledge provided by ontologies available on the web in order to dynamically define the possible categories. Thus, it is independent of the sources providing the results that must be grouped. Our experimental results show the interest of the proposal.


2010 - Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher [Relazione in Atti di Convegno]
Po, Laura; Bergamaschi, Sonia
abstract

This paper proposes lexical annotation as an effective method to solve the ambiguity problems that affect ontology matchers. Lexical annotation associates with each ontology element a set of meanings belonging to a semantic resource. Performing lexical annotation on the ontologies involved in the matching process makes it possible to detect false positive mappings and to enrich matching results by adding new mappings (i.e. lexical relationships between elements on the basis of the semantic relationships holding among meanings). The paper goes through the explanation of how to apply lexical annotation to the results obtained by a matcher. In particular, the paper shows an application to the SCARLET matcher. We adopt an experimental approach on two test cases, where SCARLET was previously tested, to investigate the potential of lexical annotation. Experiments yielded promising results, showing that lexical annotation improves the precision of the matcher.


2010 - Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher (Extended Abstract) [Relazione in Atti di Convegno]
Po, Laura
abstract

This paper proposes lexical annotation as an effective method to solve the ambiguity problems that affect ontology matchers. Lexical annotation associates with each ontology element a set of meanings belonging to a semantic resource. Performing lexical annotation on the ontologies involved in the matching process makes it possible to detect false positive mappings and to enrich matching results by adding new mappings (i.e. lexical relationships between elements on the basis of the semantic relationships holding among meanings). The paper goes through the explanation of how to apply lexical annotation to a matcher. In particular, we show an application to the SCARLET matcher. Experiments yielded promising results, showing that lexical annotation improves the precision of the matcher.


2010 - Schema Label Normalization for Improving Schema Matching [Articolo su rivista]
Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. The method semi-automatically expands abbreviations/acronyms and annotates compound nouns, with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching results.


2010 - Uncertainty in data integration systems: automatic generation of probabilistic relationships [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

This paper proposes a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is based on probabilistic word sense disambiguation (PWSD), which automatically lexically annotates (i.e. performs annotation w.r.t. a thesaurus/lexical resource) the schemata of a given set of data sources to be integrated. From the annotated schemata and the relationships defined in the thesaurus, we derive the probabilistic lexical relationships among schema elements. Lexical relationships, as well as structural relationships, are collected in the Probabilistic Common Thesaurus (PCT).
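The step from probabilistic annotations to probabilistic relationships can be sketched in a few lines. The labels, sense identifiers and probability values below are all hypothetical toy data; a real system derives the probabilities by combining several disambiguation algorithms, as PWSD does, and reads the sense relationships from a lexical resource.

```python
# Toy annotations: schema label -> candidate senses with probabilities
# (hypothetical values, standing in for PWSD output).
senses = {
    "article": {"article#text": 0.7, "article#clause": 0.3},
    "paper":   {"paper#text": 0.6, "paper#material": 0.4},
}

# Toy sense-level relationships from the thesaurus: synonymous sense pairs.
synonyms = {("article#text", "paper#text")}

def prob_synonym(label_a, label_b):
    """P(label_a SYN label_b): sum P(sense_a) * P(sense_b) over the
    sense pairs the thesaurus marks as synonyms."""
    return sum(
        pa * pb
        for sa, pa in senses[label_a].items()
        for sb, pb in senses[label_b].items()
        if (sa, sb) in synonyms or (sb, sa) in synonyms
    )

print(prob_synonym("article", "paper"))  # ~0.42, i.e. 0.7 * 0.6
```

A thesaurus such as the PCT would store this pair together with the computed probability, alongside the structural relationships.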


2009 - ALA: Dealing with Uncertainty in Lexical Annotation [Software]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. an annotation is associated with a probability value. By performing probabilistic lexical annotation, we discover probabilistic inter-source lexical relationships among schema elements. ALA extends the lexical annotation module of the MOMIS data integration system. However, it may be applied in general in the context of schema mapping discovery, ontology merging and data integration systems, and it is particularly suitable for performing “on-the-fly” data integration or probabilistic ontology matching.


2009 - An Ontology-Based Data Integration System for Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sala, Antonio; Sorrentino, Serena
abstract

Data integration is the problem of combining data residing at distributed heterogeneous sources, including multimedia sources, and providing the user with a unified view of these data. Ontology-based data integration involves the use of ontology(s) to effectively combine data and information from multiple heterogeneous sources [16]. Ontologies, with respect to the integration of data sources, can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a data integration system which performs information extraction and integration from both structured and semi-structured data sources [6]. In [5] MOMIS was extended to manage “traditional” and “multimedia” data sources at the same time. STASIS is a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics [1]. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed [7]. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach to ontology-based data integration for data and multimedia sources.


2009 - DataRiver [Spin Off]
Bergamaschi, Sonia; Orsini, Mirko; Beneventano, Domenico; Sala, Antonio; Corni, Alberto; Po, Laura; Sorrentino, Serena; Quix, Srl
abstract


2009 - Dealing with Uncertainty in Lexical Annotation [Articolo su rivista]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. an annotation is associated with a probability value. By performing probabilistic lexical annotation, we discover probabilistic inter-source lexical relationships among schema elements. ALA extends the lexical annotation module of the MOMIS data integration system. However, it may be applied in general in the context of schema mapping discovery, ontology merging and data integration systems, and it is particularly suitable for performing “on-the-fly” data integration or probabilistic ontology matching.


2009 - Lexical Knowledge Extraction: an Effective Approach to Schema and Ontology Matching [Relazione in Atti di Convegno]
Po, Laura; Sorrentino, Serena; Bergamaschi, Sonia; Beneventano, Domenico
abstract

This paper's aim is to examine what role Lexical Knowledge Extraction plays in data integration as well as in ontology engineering. Data integration is the problem of combining data residing at distributed heterogeneous sources and providing the user with a unified view of these data; a common and important scenario in data integration is that of structured or semi-structured data sources described by a schema. Ontology engineering is a subfield of knowledge engineering that studies the methodologies for building and maintaining ontologies. Ontology engineering offers a direction towards solving the interoperability problems brought about by semantic obstacles, such as those related to the definitions of business terms and software classes. In these contexts, where users are confronted with heterogeneous information, the support of matching techniques is crucial. Matching techniques aim at finding correspondences between semantically related entities of different schemata/ontologies. Several matching techniques have been proposed in the literature based on different approaches, often derived from other fields, such as text similarity, graph comparison and machine learning. This paper proposes a matching technique based on Lexical Knowledge Extraction: first, an Automatic Lexical Annotation of schemata/ontologies is performed, then lexical relationships are extracted based on such annotations. A lexical annotation is a piece of information added to a document (book, online record, video, or other data) that refers to a semantic resource such as WordNet. Each annotation has the property of owning one or more lexical descriptions. Lexical annotation is performed by the Probabilistic Word Sense Disambiguation (PWSD) method, which combines several disambiguation algorithms. Our hypothesis is that performing lexical annotation of the elements (e.g. classes and properties/attributes) of schemata/ontologies enables the system to automatically extract the lexical knowledge that is implicit in a schema/ontology and then to derive lexical relationships between the elements of a schema/ontology or among elements of different schemata/ontologies. The effectiveness of the method presented in this paper has been proven within the MOMIS data integration system.


2009 - Schema Normalization for Improving Schema Matching [Relazione in Atti di Convegno]
Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
abstract

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
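A minimal sketch of what label normalization buys: tokenizing schema labels and expanding abbreviations so that labels become comparable. The abbreviation table and example labels are invented for illustration; a real system builds the expansions semi-automatically from the schema context and a lexical resource.

```python
import re

# Hypothetical abbreviation table (a real one is built semi-automatically).
ABBREVIATIONS = {"qty": "quantity", "addr": "address", "cust": "customer"}

def normalize_label(label):
    """Split a schema label into tokens (camelCase / underscores)
    and expand known abbreviations into dictionary words."""
    # e.g. "orderQty" -> "order Qty"; "CUST_ADDR" is split on "_" below.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    parts = re.split(r"[^A-Za-z]+", spaced)
    return [ABBREVIATIONS.get(p.lower(), p.lower()) for p in parts if p]

print(normalize_label("CUST_ADDR"))  # ['customer', 'address']
print(normalize_label("orderQty"))   # ['order', 'quantity']
```

After normalization, `CUST_ADDR` and a label like `customerAddress` map to the same dictionary words, so a lexical annotator can assign them comparable meanings.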


2009 - Semantic Access to Data from the Web [Relazione in Atti di Convegno]
Raquel, Trillo; Po, Laura; Sergio, Ilarri; Bergamaschi, Sonia; Eduardo, Mena
abstract

There is a great amount of information available on the web, so users typically use different keyword-based web search engines to find the information they need. However, many words are polysemous and therefore the output of the search engine will include links to web pages referring to different meanings of the keywords. Besides, results with different meanings are mixed up, which makes the task of finding the relevant information difficult for the user, especially if the meanings behind the input keywords are not among the most popular on the web. In this paper, we propose a semantics-based approach to group the results returned to the user into clusters defined by the different meanings of the input keywords. Differently from other proposals, our method considers the knowledge provided by a pool of ontologies available on the Web in order to dynamically define the different categories (or clusters). Thus, it is independent of the sources providing the results that must be grouped.


2009 - The MOMIS-STASIS approach for Ontology-Based Data Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

Ontology-based data integration involves the use of ontology(s) to effectively combine data and information from multiple heterogeneous sources. Ontologies can be used in an integration task to describe the semantics of the information sources and to make their contents explicit. With respect to the integration of data sources, they can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a data integration system which performs information extraction and integration from both structured and semi-structured data sources. The goal of the STASIS project is to create a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach to ontology-based data integration.


2008 - Automatic annotation for mapping discovery in data integration systems (Extended abstract) [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

In this article we present CWSD (Combined Word Sense Disambiguation), a method and a software tool enabling the automatic lexical annotation of local (structured and semi-structured) data sources in a data integration system. CWSD is based on the exploitation of WordNet Domains and the lexical and structural knowledge of the data sources. The method extends the semi-automatic lexical annotation module of the MOMIS data integration system. The distinguishing feature of the method is its independence from, or low dependence on, human intervention. CWSD is a valid method to satisfy two important tasks: (1) the source lexical annotation process, i.e. the operation of associating an element of a lexical reference database (WordNet) with all source elements, and (2) the discovery of mappings among concepts of distributed data sources/ontologies.


2008 - Improving Data Integration through Disambiguation Techniques [Relazione in Atti di Convegno]
Po, Laura
abstract

In this paper the Word Sense Disambiguation (WSD) issue in the context of data integration is outlined, and an Approximate Word Sense Disambiguation approach (AWSD) is proposed for the automatic lexical annotation of structured and semi-structured data sources.


2008 - Open Source come modello di business per le PMI: analisi critica e casi di studio [Capitolo/Saggio]
Bergamaschi, Sonia; Nigro, Francesco; Po, Laura; Vincini, Maurizio
abstract

Open Source software is attracting attention at all levels, within both the economic and the productive world, because it proposes a new, strongly innovative model of technological and economic development that breaks with the past. This work analyzes the reasons behind the success of this model and presents some case studies in which Open Source proves advantageous, highlighting the most interesting aspects both for users and for software producers.


2007 - Automatic annotation in data integration systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

We propose CWSD (Combined Word Sense Disambiguation), an algorithm for the automatic annotation of structured and semi-structured data sources. Rather than being targeted at textual data sources like most of the traditional WSD algorithms found in the literature, our algorithm can exploit information coming from the structure of the sources together with the lexical knowledge associated with the terms (elements of the schemata).


2007 - Automatic annotation of local data sources for data integration systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sala, Antonio; Sorrentino, Serena
abstract

In this article we present CWSD (Combined Word Sense Disambiguation), a method and a software tool enabling the automatic annotation of local structured and semi-structured data sources with lexical information in a data integration system. CWSD is based on the exploitation of WordNet Domains and structural knowledge, and on the extension of the lexical annotation module of the MOMIS data integration system. The distinguishing feature of the algorithm is its low dependence on human intervention. Our approach is a valid method to satisfy two important tasks: (1) the source annotation process, i.e. the operation of associating an element of a lexical reference database (WordNet) with all source elements, and (2) the discovery of mappings among concepts of distributed data sources/ontologies.


2007 - MELIS: An Incremental Method For The Lexical Annotation Of Domain Ontologies [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of the domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.
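The incremental behaviour described above, where sense choices learned on earlier schemas are retained as domain knowledge and reused on later ones, can be sketched as follows. This is a hedged toy under simplified assumptions (a plain term-to-sense cache and a stand-in disambiguator), not the MELIS/MOMIS pipeline.

```python
# Toy sketch of incremental annotation: senses learned on earlier
# schemas are kept as domain knowledge and reused, so fewer terms
# need full disambiguation as more schemas are processed.
class IncrementalAnnotator:
    def __init__(self, disambiguate):
        self.disambiguate = disambiguate   # the expensive sense chooser
        self.domain_knowledge = {}         # term -> learned sense
        self.calls = 0                     # how often disambiguation ran

    def annotate_schema(self, schema):
        annotations = {}
        for term in schema:
            if term not in self.domain_knowledge:
                self.calls += 1
                self.domain_knowledge[term] = self.disambiguate(term)
            annotations[term] = self.domain_knowledge[term]
        return annotations

# A stand-in disambiguator that just tags each term with a fake sense id.
ann = IncrementalAnnotator(lambda t: f"{t}#1")
ann.annotate_schema(["author", "title"])          # 2 disambiguation calls
ann.annotate_schema(["author", "title", "isbn"])  # only 1 new call
print(ann.calls)  # 3
```

The second schema triggers only one disambiguation call because "author" and "title" were already learned, which is the cost profile the incremental design aims for.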


2007 - MELIS: a tool for the incremental annotation of domain ontologies [Software]
Bergamaschi, Sonia; Paolo, Bouquet; Daniel, Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

MELIS is a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of the domain ontology is learned at every step), and the better the performance of the system in annotating new schemas.


2007 - Melis: an incremental method for the lexical annotation of domain ontologies [Articolo su rivista]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is the incremental process: the higher the number of schemas which are processed, the more background/domain knowledge is accumulated in the system (a portion of the domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS.


2006 - An incremental method for meaning elicitation of a domain ontology [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

The Internet has opened access to an overwhelming amount of data, requiring the development of new applications to automatically recognize, process and manage information available in web sites or web-based applications. The standard Semantic Web architecture exploits ontologies to give a shared (and known) meaning to each web source element. In this context, we developed MELIS (Meaning Elicitation and Lexical Integration System). MELIS couples the lexical annotation module of the MOMIS system with some components from CTXMATCH2.0, a tool for eliciting meaning from several types of schemas and matching them. MELIS uses the MOMIS WNEditor and CTXMATCH2.0 to support two main tasks in the MOMIS ontology generation methodology: the source annotation process, i.e. the operation of associating an element of a lexical database to each source element, and the extraction of lexical relationships among elements of different data sources.