
Laura PO

Associate Professor
"Enzo Ferrari" Department of Engineering




Publications

2024 - A Comparative Analysis of Word Embeddings Techniques for Italian News Categorization [Journal article]
Rollo, Federica; Bonisoli, Giovanni; Po, Laura


2024 - HypeAIR: A novel framework for real-time low-cost sensor calibration for air quality monitoring in smart cities [Journal article]
Bachechi, Chiara; Rollo, Federica; Po, Laura


2024 - MitH: A framework for Mitigating Hygroscopicity in low-cost PM sensors [Journal article]
Casari, Martina; Po, Laura


2023 - AirMLP - SPS30 low-cost sensors and Tecora reference station PM 2.5 data [Dataset]
Casari, Martina; Po, Laura

Dataset related to a study conducted in Turin, Italy, involving low-cost laser-scattering SPS30 sensors placed by Wiseair SRL and a Tecora reference station placed by Arpa Piemonte (Italian Air Quality Agency). This dataset spans two different time periods in 2022, specifically from March 1, 2022, to April 29, 2022, and from October 26, 2022, to December 30, 2022. The data in this dataset pertains to the mass concentration of PM2.5 (particulate matter with a diameter of 2.5 micrometres or less).


2023 - AirMLP: A Multilayer Perceptron Neural Network for Temporal Correction of PM2.5 Values in Turin [Journal article]
Casari, Martina; Po, Laura; Zini, Leonardo


2023 - AirMLP: PM 2.5 Correction MLP Network - Source Code v1.0.1 [Software]
Casari, Martina; Zini, Leonardo; Po, Laura

This repository houses a Multilayer Perceptron (MLP) network designed to correct PM 2.5 data affected by hygroscopicity, gathered from low-cost SPS30 sensors.
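The correction idea can be sketched in plain Python: a tiny one-hidden-layer MLP trained by per-sample SGD to map (raw PM 2.5, relative humidity) onto a reference value. Everything here is illustrative — the synthetic humidity-inflation model, the layer size, and the learning rate are invented for the demo, not the repository's actual architecture.

```python
import math
import random

def make_toy_samples(n=200, seed=0):
    """Synthetic (raw PM2.5, humidity) -> true PM2.5 pairs. The inflation
    model raw = true * (1 + 0.01 * RH) is invented for this demo only."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        true = rng.uniform(5, 40)
        rh = rng.uniform(30, 95)
        raw = true * (1 + 0.01 * rh)
        # scale inputs and targets roughly into [0, 1] for stable training
        samples.append(((raw / 80, rh / 100), true / 80))
    return samples

def train_correction_mlp(samples, hidden=8, lr=0.05, epochs=200, seed=42):
    """One-hidden-layer tanh MLP trained with plain per-sample SGD.
    Returns the mean squared error before and after training."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(r[0] * x[0] + r[1] * x[1] + b) for r, b in zip(w1, b1)]
        return h, sum(w * v for w, v in zip(w2, h)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

    before = mse()
    for _ in range(epochs):
        for x, y in samples:
            h, out = forward(x)
            err = out - y
            # hidden-layer gradients use the pre-update output weights
            grads = [err * w2[j] * (1 - h[j] ** 2) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grads[j]
                w1[j][0] -= lr * grads[j] * x[0]
                w1[j][1] -= lr * grads[j] * x[1]
            b2 -= lr * err
    return before, mse()
```

On this toy task the error drops sharply within a few hundred epochs, which is the essence of the approach: let the network learn the humidity-dependent bias directly from co-located reference data.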


2023 - Anomaly Detection and Repairing for Improving Air Quality Monitoring [Journal article]
Rollo, F.; Bachechi, C.; Po, L.

Clean air in cities improves our health and overall quality of life and helps fight climate change and preserve our environment. High-resolution measures of pollutants’ concentrations can support the identification of urban areas with poor air quality and raise citizens’ awareness while encouraging more sustainable behaviors. Recent advances in Internet of Things (IoT) technology have led to extensive use of low-cost air quality sensors for hyper-local air quality monitoring. As a result, public administrations and citizens increasingly rely on information obtained from sensors to make decisions in their daily lives and mitigate pollution effects. Unfortunately, in most sensing applications, sensors are known to be error-prone. Thanks to Artificial Intelligence (AI) technologies, it is possible to devise computationally efficient methods that can automatically pinpoint anomalies in those data streams in real time. In order to enhance the reliability of air quality sensing applications, we believe that it is highly important to set up a data-cleaning process. In this work, we propose AIrSense, a novel AI-based framework for obtaining reliable pollutant concentrations from raw data collected by a network of low-cost sensors. It enacts an anomaly detection and repairing procedure on raw measurements before applying the calibration model, which converts raw measurements to concentration measurements of gasses. There are very few studies of anomaly detection in raw air quality sensor data (millivolts). Our approach is the first that proposes to detect and repair anomalies in raw data before they are calibrated by considering the temporal sequence of the measurements and the correlations between different sensor features. If at least some previous measurements are available and not anomalous, it trains a model and uses the prediction to repair the observations; otherwise, it exploits the previous observation. 
Firstly, a majority voting system based on three different algorithms detects anomalies in raw data. Then, anomalies are repaired to avoid missing values in the measurement time series. In the end, the calibration model provides the pollutant concentrations. Experiments conducted on a real dataset of 12,000 observations produced by 12 low-cost sensors demonstrated the importance of the data-cleaning process in improving calibration algorithms’ performances.
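The detect-then-repair step described above can be illustrated with a toy univariate version. The three voters below (z-score, median distance, jump test) are simple stand-ins for AIrSense's three algorithms, which the abstract does not name; the thresholds are arbitrary.

```python
import statistics

def detect_anomalies(series, window=5):
    """Flag a point as anomalous when at least 2 of 3 simple detectors
    agree, each judging the point against a sliding window of history."""
    flags = []
    for i, v in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < 3:
            flags.append(False)  # not enough history to vote yet
            continue
        mean = statistics.mean(hist)
        stdev = statistics.pstdev(hist) or 1e-9
        votes = [
            abs(v - mean) / stdev > 3,                      # z-score test
            abs(v - statistics.median(hist)) > 4 * stdev,   # median distance
            abs(v - hist[-1]) > 5 * stdev,                  # jump detector
        ]
        flags.append(sum(votes) >= 2)                       # majority vote
    return flags

def repair(series, flags):
    """Replace flagged values with the previous clean observation,
    mirroring the fallback strategy the abstract describes."""
    out, last = [], None
    for v, bad in zip(series, flags):
        if bad and last is not None:
            v = last
        out.append(v)
        last = v
    return out
```

A spike of 50.0 in an otherwise flat stream around 10 gets all three votes and is replaced by the last clean reading, so the downstream calibration model never sees the outlier or a missing value.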


2023 - CEM: an Ontology for Crime Events in Newspaper Articles [Conference paper]
Rollo, Federica; Po, Laura; Castellucci, Alessandro


2023 - DICE: a Dataset of Italian Crime Event news [Conference paper]
Bonisoli, Giovanni; Di Buono, Maria Pia; Po, Laura; Rollo, Federica


2023 - DICE: a Dataset of Italian Crime Event news [Dataset]
Rollo, Federica; Bonisoli, Giovanni; Po, Laura

DICE is a collection of 10,395 Italian news articles describing 13 types of crime events that happened in the province of Modena, Italy, between the end of 2011 and 2021.


2023 - Italian FastText models [Software]
Rollo, Federica; Bonisoli, Giovanni; Po, Laura


2023 - Italian GloVe models [Software]
Rollo, Federica; Bonisoli, Giovanni; Po, Laura


2023 - Italian Word2Vec models [Software]
Rollo, Federica; Bonisoli, Giovanni; Po, Laura


2023 - Mitigating the Impact of Humidity on Low-Cost PM Sensors [Conference paper]
Casari, Martina; Po, Laura

This preliminary study, conducted in Italy, aims to investigate the potential of growth functions and multi-layer perceptron neural networks (MLP NN) in reducing the impact of humidity on low-cost particulate matter (PM) sensors, with a focus on the sustainability of low-cost sensors compared to reference stations. All over the world, low-cost sensors are increasingly being used for air quality monitoring due to their cost-effectiveness and portability. However, low-cost sensors are susceptible to high humidity, which can lead to inaccurate measurements due to the hygroscopic properties of particles. This issue is particularly relevant in Italy, where many cities such as Rome, Milan, Naples, and Turin experience high mean relative humidity levels (>70%) for most months of the year. To improve data quality and obtain data usable for quantitative analysis, techniques must be developed to reduce the impact of humidity on the final data. The sensors used in this study were placed in proximity to a reference station, which served solely for validation in the case of the corrective functions and was also involved in the training phase in the case of the MLP NN.
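A growth-function correction of the kind the study investigates can be sketched as follows. The κ-Köhler-style form and the default κ value are common choices in the low-cost-sensor literature, assumed here for illustration — they are not necessarily the growth functions used in the paper.

```python
def humidity_corrected_pm(raw_pm, rh, kappa=0.4):
    """Divide a raw optical PM reading by a hygroscopic growth factor.
    The form follows the kappa-Kohler-style corrections often applied to
    low-cost PM sensors; kappa is an assumed aerosol hygroscopicity."""
    if not 0 < rh < 100:
        raise ValueError("relative humidity must be in (0, 100)%")
    growth = 1 + kappa / (100.0 / rh - 1.0)
    return raw_pm / growth
```

At 80% RH with κ = 0.4 the growth factor is 2.6, so a raw reading of 30 µg/m³ corrects to about 11.5 µg/m³, while at 20% RH the same reading is barely reduced — capturing why humid Italian cities need such corrections at all.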


2023 - Modeling Event-Centric Knowledge Graph for Crime Analysis on Online News [Book chapter]
Rollo, F.; Po, L.


2022 - Big Data Analytics and Visualization in Traffic Monitoring [Journal article]
Bachechi, Chiara; Po, Laura; Rollo, Federica

This paper presents a system that employs information visualization techniques to analyze urban traffic data and the impact of traffic emissions on urban air quality. Effective visualizations allow citizens and public authorities to identify trends, detect congested road sections at specific times, and perform monitoring and maintenance of traffic sensors. Since road transport is a major source of air pollution, the impact of traffic on air quality has also emerged as a new issue that traffic visualizations should address. The Trafair Traffic Dashboard exploits traffic sensor data and traffic flow simulations to create an interactive layout focused on investigating the evolution of traffic in the urban area over time and space. The dashboard is the last step of a complex data framework that starts from the ingestion of traffic sensor observations and includes anomaly detection, traffic modeling, and air quality impact analysis. We present the results of applying our proposed framework to two cities (Modena, in Italy, and Santiago de Compostela, in Spain), demonstrating the potential of the dashboard in identifying trends, seasonal events, and abnormal behaviors, and in understanding how the urban vehicle fleet affects air quality. We believe that the framework provides a powerful environment that may guide public decision-makers through effective analysis of traffic trends devoted to reducing traffic issues and mitigating the polluting effect of transportation.


2022 - Crime Event Model [Software]
Rollo, Federica; Po, Laura; Castellucci, Alessandro


2022 - Detection and Classification of Sensor Anomalies for Simulating Urban Traffic Scenarios [Journal article]
Bachechi, Chiara; Rollo, Federica; Po, Laura


2022 - GIS-Based Geospatial Data Analysis: the Security of Cycle Paths in Modena [Conference paper]
Bachechi, Chiara; Po, Laura; Degliangeli, Federico

The use of fossil fuels is contributing to the global climate crisis and threatening the sustainability of the planet. Bicycles are a vital component of the solution, as they can help mitigate the effects of climate change and improve the quality of life for all. However, cities need to be equipped with the necessary infrastructure to support their use while guaranteeing safety for cyclists. Moreover, cyclists should plan their routes considering the level of security associated with the different available options to reach their destination. The paper presents and tests a method that integrates geographical data from various sources, with different geometries and formats, into a single view of the cycle paths in the province of Modena, Italy. Geographic Information System (GIS) software functionalities have been exploited to classify paths into 5 categories, from protected bike lanes to streets with no bike infrastructure. The type of traffic that co-exists on each cycle path was analysed too. The main outcome of this research is a visualization of the cycle paths in the province of Modena highlighting the security of paths, the discontinuity of the routes, and the less covered areas. Moreover, a cycle path graph data model was generated to perform routing based on the security level.


2022 - Knowledge Graphs for Community Detection in Textual Data [Conference paper]
Rollo, F.; Po, L.

Online sources produce a huge amount of textual data, i.e., freeform text. To derive insightful information from them and facilitate the application of Machine Learning algorithms, textual data need to be processed and structured. Knowledge Graphs (KGs) are intelligent systems for the analysis of documents. In recent years, they have been adopted in multiple contexts, including text mining for the development of data-driven solutions to different problems. The scope of this paper is to provide a methodology to build KGs from textual data and apply algorithms to group similar documents into communities. The methodology exploits semantic and statistical approaches to extract relevant insights from each document; these data are then organized in a KG that allows for their interconnection. The methodology has been successfully tested on news articles related to crime events that occurred in the city of Modena, in Italy. The promising results demonstrate how KG-based analysis can improve the management of information coming from online sources.


2022 - Online News Event Extraction for Crime Analysis [Conference paper]
Rollo, F.; Po, L.; Bonisoli, G.

Event Extraction is a complex and interesting topic in Information Extraction that includes methods for identifying an event's type, participants, location, and date from free text or web data. The results of event extraction systems can be used in several fields, such as online monitoring systems or decision support tools. In this paper, we introduce a framework that combines several techniques (lexical, semantic, machine learning, neural networks) to extract events from Italian news articles for crime analysis purposes. Furthermore, we concentrate on representing the extracted events in a Knowledge Graph. An evaluation on crimes in the province of Modena is reported.


2022 - Real-Time Visual Analytics for Air Quality [Book chapter]
Bachechi, C.; Po, L.; Desimoni, F.

Raising collective awareness about the daily levels of human exposure to toxic chemicals in the air is of great significance in motivating citizens to act and embrace a more sustainable lifestyle. For this reason, Public Administrations are involved in effectively monitoring urban air quality at high resolution and providing understandable visualizations of the air quality conditions in their cities. Moreover, collecting data over a long period can help estimate the impact of the policies adopted to reduce air pollutant concentrations. The easiest and most cost-effective way to monitor air quality is by employing low-cost sensors distributed in urban areas. These sensors generate a real-time data stream that needs elaboration to generate adequate visualizations. The TRAFAIR Air Quality dashboard proposed in this paper is a web application that informs citizens and decision-makers about the current, past, and future air quality conditions of three European cities: Modena, Santiago de Compostela, and Zaragoza. Air quality data are multidimensional observations updated in real time, and each observation has both a space and a time reference. Interpolation techniques are employed to generate space-continuous visualizations that estimate the concentration of the pollutants where sensors are not available. The TRAFAIR project consists of a chain of simulation models that estimates the levels of NO and NO2 for up to 2 days ahead. Furthermore, new future air quality scenarios evaluating the impact on air quality of changes in urban traffic can be explored. All these processes generate heterogeneous data: coming from different sources, some continuous and others discrete in the space-time domain, some historical and others in real time. The dashboard provides a unique environment where all these data and the derived statistics can be observed and understood.


2022 - Road Network Graph Representation for Traffic Analysis and Routing [Conference paper]
Bachechi, C.; Po, L.

The road network is the infrastructure along which the mobility of users and goods takes place; the analysis of these networks in terms of spatial and graph-theoretical approaches can provide insights to understand urban mobility, improve daily commuting, and reflect on new, more sustainable scenarios. This paper presents an open-source framework to analyze the road network and investigate the relationship between its topology and traffic conditions. Open-source geographical data are stored in a graph database containing roads, junctions, and Points of Interest (POI), and traffic data can be imported into it. The framework includes routing algorithms to obtain the optimal path based on different aspects such as distance, traffic volume, and the number of traversed junctions; furthermore, it allows simulating road closures to observe how they affect road viability. The framework was tested in the use case of the city of Modena (Italy), providing promising results.
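The multi-criteria routing idea — one graph, several edge weights — can be sketched with a plain Dijkstra search. The toy graph in the usage below is invented, not the Modena network, and the real framework runs on a graph database rather than in-memory dictionaries.

```python
import heapq

def shortest_path(graph, src, dst, weight="distance"):
    """Dijkstra over a road graph stored as adjacency lists. Each edge
    carries several weights (distance, traffic volume, ...) so the same
    graph can answer different routing questions by switching `weight`."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nxt, weights in graph.get(node, []):
            nd = d + weights[weight]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    # walk the predecessor chain back from the destination
    path, cur = [], dst
    while cur != src:
        path.append(cur)
        cur = prev[cur]
    path.append(src)
    return list(reversed(path)), dist[dst]
```

With distinct `distance` and `traffic` weights on the same edges, the optimizer can return a geometrically shorter route or a less congested one from the same data structure, which is the point the abstract makes about routing "based on different aspects".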


2022 - Semi Real-time Data Cleaning of Spatially Correlated Data in Traffic Sensor Networks [Conference paper]
Rollo, Federica; Bachechi, Chiara; Po, Laura


2022 - Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset [Conference paper]
Rollo, F.; Bonisoli, G.; Po, L.

The automatic categorization of crime news is useful to create statistics on the types of crimes occurring in a certain area. This assignment can be treated as a text categorization problem. Several studies have shown that the use of word embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. The scope of this paper is to explore the use of word embeddings for Italian crime news text categorization. The approach followed is to compare different document pre-processing steps, Word2Vec models, and methods to obtain word embeddings, including the extraction of bigrams and keyphrases. Then, supervised and unsupervised Machine Learning categorization algorithms have been applied and compared. In addition, the imbalance issue of the input dataset has been addressed by using the Synthetic Minority Oversampling Technique (SMOTE) to oversample the elements in the minority classes. Experiments conducted on an Italian dataset of 17,500 crime news articles collected from 2011 to 2021 show very promising results. The supervised categorization has proven to be better than the unsupervised one, exceeding 80% in both precision and recall and reaching an accuracy of 0.86. Furthermore, lemmatization, bigrams, and keyphrase extraction turned out not to be decisive. In the end, the availability of our model on GitHub, together with the code we used to extract word embeddings, allows replicating our approach on other corpora, either in Italian or in other languages.
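The SMOTE step can be illustrated with a bare-bones, pure-Python interpolation between minority-class samples; a real pipeline would use a library implementation (e.g. imbalanced-learn's `SMOTE`) rather than this sketch.

```python
import random

def smote_like(minority, n_new, k=3, seed=7):
    """Generate synthetic minority-class vectors by interpolating each
    base sample toward one of its k nearest neighbours, the core idea
    behind SMOTE for rebalancing an imbalanced training set."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the real data occupies instead of being duplicated verbatim.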


2022 - TAQE: A Data Modeling Framework for Traffic and Air Quality Applications in Smart Cities [Conference paper]
Martínez, David; Po, Laura; Trillo Lado, Raquel; Viqueira, José R. R.

Air quality and traffic monitoring and prediction are critical problems in urban areas. Therefore, in the context of smart cities, many relevant conceptual models and ontologies have already been proposed. However, the lack of standardized solutions boosts development costs and hinders data integration between different cities and with other application domains. This paper proposes a classification of existing models and ontologies related to Earth observation and modeling and to smart cities in four levels of abstraction, which range from completely general-purpose frameworks to application-specific solutions. Based on such a classification and on requirements extracted from a comprehensive set of state-of-the-art applications, TAQE, a new data modeling framework for air quality and traffic data, is defined. The effectiveness of TAQE is evaluated both by comparing its expressiveness with the state of the art of the same application domain and by its application in the ``TRAFAIR -- Understanding traffic flows to improve air quality" EU project.


2021 - Air Quality Sensor Network Data Acquisition, Cleaning, Visualization, and Analytics: A Real-world IoT Use Case [Conference paper]
Rollo, Federica; Sudharsan, Bharath; Po, Laura; Breslin, John

Monitoring and analyzing air quality is of primary importance to encourage more sustainable lifestyles and plan corrective actions. This paper presents the design and end-to-end implementation of a real-world urban air quality data collection and analytics use case which is part of the TRAFAIR (Understanding Traffic Flows to Improve Air Quality) European project [1, 2]. This implementation covers the project work done in the city of Modena, Italy, starting from the installation of distributed low-cost multi-sensor IoT devices, LoRa network setup, data collection in the LoRa server database, ML-based anomalous measurement detection and cleaning, sensor calibration, and central control and visualization using the purpose-built SenseBoard [3].


2021 - Anomaly Detection in Multivariate Spatial Time Series: A Ready-to-Use implementation [Conference paper]
Bachechi, Chiara; Rollo, Federica; Po, Laura; Quattrini, Fabio


2021 - SenseBoard: Sensor monitoring for air quality experts [Conference paper]
Rollo, F.; Po, L.

Air quality monitoring is crucial within cities since air pollution is one of the main causes of premature death in Europe. However, performing trustworthy monitoring of urban air quality is not a simple process, especially when trying to create extensive and timely monitoring of the entire urban area using low-cost sensors. In order to collect reliable measurements from low-cost sensors, a lot of work is required from the environmental experts who deploy and maintain the air quality network and daily calibrate, control, and clean up the data generated by these sensors. In this paper, we describe SenseBoard, an interactive dashboard created to support environmental experts in sensor network control, management of sensor data calibration, and anomaly detection.


2021 - Using Word Embeddings for Italian Crime News Categorization [Conference paper]
Bonisoli, Giovanni; Rollo, Federica; Po, Laura


2020 - A comparative study of state-of-the-art linked data visualization tools [Conference paper]
Desimoni, F.; Bikakis, N.; Po, L.; Papastefanatos, G.

Data visualization tools are of great importance for the exploration and analysis of Linked Data (LD) datasets. Such tools allow users to get an overview, understand content, and discover interesting insights in a dataset. Visualization approaches vary according to the domain, the type of data, the task that the user is trying to perform, as well as the skills of the user. Thus, studying the capabilities that each approach offers is crucial in supporting users in selecting the proper tool/technique based on their needs. In this paper we present a comparative study of state-of-the-art LD visualization tools over a list of fundamental use cases. First, we define 16 use cases that are representative in the setting of LD visual exploration, examining several aspects of the tools, e.g., functionality capabilities and feature richness. Then, we evaluate these use cases over 10 LD visualization tools, examining: (1) if the tools have the required functionality for the tasks; and (2) if they allow the successful completion of the tasks over the DBpedia dataset. Finally, we discuss the insights derived from the evaluation, and we point out possible future directions.


2020 - Automatic Publication of Open Data from OGC Services: the Use Case of TRAFAIR Project [Conference paper]
Nogueras-Iso, Javier; Ochoa-Ortiz, Héctor; Angel Janez, Manuel; Viqueira, Jose R. R.; Po, Laura; Trillo-Lado, Raquel

This work proposes a workflow for the publication of Open Spatial Data. The main contribution of this work is the automatic generation of metadata extracted from OGC spatial services providing access to feature types and coverages. Besides, this work adopts the GeoDCAT-AP metadata profile for the description of datasets because it allows for an appropriate crosswalk between the annotation requirements in the spatial domain and the metadata models accepted in general Open Data portals. The feasibility of the proposed workflow has been tested within the framework of the TRAFAIR project to publish monitoring and forecasting air quality data.


2020 - Crime event localization and deduplication [Conference paper]
Rollo, Federica; Po, Laura


2020 - Empirical Evaluation of Linked Data Visualization Tools [Journal article]
Desimoni, Federico; Po, Laura

The economic impact of open data in Europe has an estimated value of €140 billion a year between direct and indirect effects. The social impact is also known to be high, as the use of more transparent open data has been enhancing public services and creating new opportunities for citizens and organizations. We are witnessing staggering growth in the production and consumption of Linked Data (LD). Exploring, visualizing, and analyzing LD is a core task for a variety of users in numerous scenarios. This paper deeply analyzes the state of the art of tools for LD visualization. Linked Data visualization aims to provide graphical representations of datasets, or of some information of interest selected by a user, with the aim of facilitating their analysis. A complete list of 77 LD visualization tools has been created, starting from tools listed in previous surveys or research papers and integrating newer tools recently published online. The visualization tools have been described and compared based on their usability and features. A set of goals that LD tools should implement in order to provide clear and convincing visualizations has been defined, and 14 tools have been tested on a big LD dataset. The results of this comparison and test led us to define some suggestions for LD consumers, so that they can select the most appropriate tools based on the type of analysis they wish to perform.


2020 - Linked Data Visualization: Techniques, Tools, and Big Data [Monograph]
Po, Laura; Bikakis, Nikos; Desimoni, Federico; Papastefanatos, George

Linked Data (LD) is nowadays a well-established standard for publishing and managing structured information on the Web, gathering and bridging together knowledge from very different scientific and commercial domains. The development of Linked Data visualization techniques and tools has emerged as the primary means for the analysis of this vast amount of information by data scientists, domain experts, business users, and citizens. This book aims at providing an overview of the recent advances in this area, focusing on techniques, tools, and use cases of visualization and visual analysis of LD. It presents all necessary preliminary concepts related to LD technology, the main techniques employed for data visualization based on the characteristics of the underlying data, use cases and tools for LD visualization, and finally a thorough assessment of the usability of these tools under different business scenarios. The goal of this book is to offer interested readers a complete guide on the evolution of LD visualization and to empower them to get started with the visual analysis of such data.


2020 - Providing effective visualizations over big linked data [Conference paper]
Desimoni, Federico; Po, Laura

The number and the size of Linked Data sources are constantly increasing. In some fortunate cases, the data source is equipped with a tool that guides and helps the user during the exploration of the data, but in most cases the data are published as an RDF dump behind a SPARQL endpoint that can be accessed only through SPARQL queries. Although the RDF format was designed to be processed by machines, there is a strong need for visualization and exploration tools. Data visualizations make big and small linked data easier for the human brain to understand, and visualization also makes it easier to detect patterns, trends, and outliers in groups of data. For this reason, we developed a tool called H-BOLD (High-level Visualization over Big Linked Open Data). H-BOLD aims to help the user explore the content of a Linked Data source by providing a high-level view of the structure of the dataset and an interactive exploration that allows users to focus on the connections and attributes of one or more classes. Moreover, it provides a visual interface for querying the endpoint that automatically generates SPARQL queries.
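The last point — generating SPARQL automatically from a visual selection of a class and its attributes — might look like the following sketch. The function name, the variable-naming rule, and the query shape are all illustrative assumptions, not H-BOLD's actual code.

```python
def class_overview_query(class_uri, attributes, limit=100):
    """Build a SELECT query for one class and a list of attribute URIs,
    the kind of query a visual explorer could issue when the user
    focuses on a class. Variable names are derived from the URI tail."""
    vars_ = ["?" + a.rstrip("/").split("/")[-1] for a in attributes]
    triples = "\n  ".join(
        f"?s <{a}> {v} ." for a, v in zip(attributes, vars_))
    return (f"SELECT ?s {' '.join(vars_)} WHERE {{\n"
            f"  ?s a <{class_uri}> .\n"
            f"  {triples}\n"
            f"}} LIMIT {limit}")
```

For example, selecting the hypothetical class `dbo:City` with attribute `dbo:populationTotal` yields a query listing each instance with its population, capped at the requested limit — the user never writes SPARQL by hand.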


2020 - Real-time data cleaning in traffic sensor networks [Conference paper]
Bachechi, Chiara; Rollo, Federica; Po, Laura


2020 - Semantic Traffic Sensor Data: The TRAFAIR Experience [Journal article]
Desimoni, Federico; Ilarri, Sergio; Po, Laura; Rollo, Federica; Trillo Lado, Raquel

Modern cities face pressing problems with transportation systems including, but not limited to, traffic congestion, safety, health, and pollution. To tackle them, public administrations have implemented roadside infrastructures such as cameras and sensors to collect data about environmental and traffic conditions. In the case of traffic sensor data not only the real-time data are essential, but also historical values need to be preserved and published. When real-time and historical data of smart cities become available, everyone can join an evidence-based debate on the city’s future evolution. The TRAFAIR (Understanding Traffic Flows to Improve Air Quality) project seeks to understand how traffic affects urban air quality. The project develops a platform to provide real-time and predicted values on air quality in several cities in Europe, encompassing tasks such as the deployment of low-cost air quality sensors, data collection and integration, modeling and prediction, the publication of open data, and the development of applications for end-users and public administrations. This paper explicitly focuses on the modeling and semantic annotation of traffic data. We present the tools and techniques used in the project and validate our strategies for data modeling and its semantic enrichment over two cities: Modena (Italy) and Zaragoza (Spain). An experimental evaluation shows that our approach to publish Linked Data is effective.


2020 - Special issue on smart data and semantics in a sensor world [Journal article]
Ilarri, S.; Po, L.; Trillo-Lado, R.


2020 - Using real sensors data to calibrate a traffic model for the city of Modena [Conference paper]
Bachechi, Chiara; Rollo, Federica; Desimoni, Federico; Po, Laura

In Italy, road vehicles are the preferred means of transport. Over recent years, in almost all the EU Member States, the passenger car fleet has increased. The high number of vehicles complicates urban planning and often results in traffic congestion and areas of increased air pollution. Overall, efficient traffic control is profitable in individual, societal, financial, and environmental terms. Traffic management solutions typically require the use of simulators able to capture in detail all the characteristics and dependencies associated with real-life traffic. Therefore, the realization of a traffic model can help to discover and control traffic bottlenecks in the urban context. In this paper, we analyze how to better simulate vehicle flows measured by traffic sensors in the streets. A dynamic traffic model was set up starting from traffic sensor data collected every minute in about 300 locations in the city of Modena. The reliability of the model is discussed and proved with a comparison between simulated values and real values from traffic sensors. This analysis pointed out some critical issues. Therefore, to better understand the origin of fake jams and incoherence with real data, we explored different configurations of the model as possible solutions.


2020 - Visual analytics for spatio-temporal air quality data [Conference paper]
Bachechi, Chiara; Desimoni, Federico; Po, Laura

Air pollution is the second biggest environmental concern for Europeans after climate change and the major risk to public health. It is imperative to monitor the spatio-temporal patterns of urban air pollution. The TRAFAIR air quality dashboard is an effective web application that empowers decision-makers to be aware of urban air quality conditions, define new policies, and keep monitoring their effects. The architecture copes with the multidimensionality of the data and the real-time visualization challenge of big data streams coming from a network of low-cost sensors. Moreover, it handles the visualization and management of the series of predictive air quality maps produced by an air pollution dispersion model. Air quality data are visualized not only at a limited set of locations at different times but in the continuous space-time domain, thanks to interpolated maps that estimate the pollution at unsampled locations.
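One simple way to produce such interpolated maps is inverse-distance weighting; this is an illustrative choice, since the abstract does not say which interpolation technique the dashboard actually uses.

```python
def idw(samples, x, y, power=2):
    """Inverse-distance-weighted estimate of a pollutant at (x, y) from
    scattered sensor readings given as (sx, sy, value) triples. Nearby
    sensors dominate; the `power` exponent controls how quickly their
    influence decays with distance."""
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value  # query point sits exactly on a sensor
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den
```

Evaluating this estimator over a grid of (x, y) points yields exactly the kind of space-continuous pollution surface the dashboard renders between sensor locations.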


2019 - Forecast of the impact by local emissions at an urban micro scale by the combination of Lagrangian modelling and low cost sensing technology: The TRAFAIR project [Conference paper]
Bigi, A.; Veratti, G.; Fabbi, S.; Po, L.; Ghermandi, G.


2019 - From Sensors Data to Urban Traffic Flow Analysis [Conference paper]
Po, Laura; Rollo, Federica; Bachechi, Chiara; Corni, Alberto

By 2050, almost 70% of the population will live in cities. As the population grows, travel demand increases, and this might affect air quality in urban areas. Traffic is among the main sources of pollution within cities. Therefore, monitoring urban traffic means not only identifying congestion and managing accidents but also preventing the impact on air pollution. Urban traffic modeling and analysis is part of the advanced intelligent traffic management technologies that have become a crucial sector for smart cities. Its main purpose is to predict congestion states of a specific urban transport network and propose improvements to the traffic network that might result in a decrease in travel times, air pollution, and fuel consumption. This paper describes the implementation of an urban traffic flow model in the city of Modena based on real traffic sensor data. This is part of a wide European project that aims at studying the correlation between traffic and air pollution, and therefore at combining traffic and air pollution simulations for testing various urban scenarios and raising citizen awareness about air quality where necessary.


2019 - Implementing an urban dynamic traffic model [Relazione in Atti di Convegno]
Bachechi, C.; Po, L.
abstract

The world of mobility is constantly evolving and proposing new technologies, such as autonomous driving, electromobility, shared mobility, or even new air transport systems. We do not know how people and things will be moving within cities in 30 years, but we do know that road network planning and traffic management will remain critical issues. The goal of our research is the implementation of a data-driven micro-simulation traffic model for computing everyday simulations of road traffic in a medium-sized city. A dynamic traffic model is needed in every urban area; we introduce an easy-to-set-up solution for cities that already have traffic sensors installed. Daily traffic flows are created from real data measured by induction loop detectors along the urban roads of Modena. The result of the simulation provides a set of "snapshots" of the traffic flow within the Modena road network every minute. The main contribution of the implemented model is its ability, starting from point traffic measurements at 400 locations, to provide an overview of traffic intensity on more than 800 km of roads.
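The per-minute "snapshots" start from raw induction-loop events. A minimal sketch of that aggregation step, assuming a simple (hypothetical) event format of `(detector_id, timestamp_seconds)`; the actual data model of the Modena detectors is not described in the abstract.

```python
from collections import defaultdict

def minute_flows(readings):
    """Aggregate raw induction-loop events into per-minute vehicle
    counts per detector, i.e. the input to a per-minute snapshot.

    readings: iterable of (detector_id, timestamp_seconds) events.
    Returns {(detector_id, minute_index): vehicle_count}.
    """
    flows = defaultdict(int)
    for detector_id, ts in readings:
        flows[(detector_id, ts // 60)] += 1  # bucket by minute
    return dict(flows)

# Hypothetical events from two detectors over two minutes.
events = [("D1", 5), ("D1", 42), ("D2", 30), ("D1", 65), ("D2", 90)]
print(minute_flows(events))
```

A micro-simulator then propagates these point counts over the rest of the road graph, which is how 400 measurement locations can cover 800 km of roads.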


2019 - TRAFAIR: Understanding Traffic Flow to Improve Air Quality [Relazione in Atti di Convegno]
Po, Laura; Rollo, Federica; Ríos Viqueira, José Ramón; Trillo Lado, Raquel; Bigi, Alessandro; Cacheiro López, Javier; Paolucci, Michela; Nesi, Paolo
abstract

Environmental impacts of traffic are of major concern throughout many European metropolitan areas. Air pollution causes 400,000 deaths per year, making it the first environmental cause of premature death in Europe. Among the main sources of air pollution in Europe are road traffic, domestic heating, and industrial combustion. The TRAFAIR project brings together 9 partners from two European countries (Italy and Spain) to develop innovative and sustainable services that combine air quality, weather conditions, and traffic flow data to produce new information for the benefit of citizens and government decision-makers. The project started in November 2018 and lasts two years. It is motivated by the large number of deaths caused by air pollution. Nowadays, the situation is particularly critical in some European member states: in February 2017, the European Commission warned five countries, including Spain and Italy, of continued air pollution breaches. In this context, public administrations and citizens lack comprehensive and fast tools to estimate the level of pollution on an urban scale resulting from varying traffic flow conditions, tools that would allow optimizing control strategies and increasing air quality awareness. The goals of the project are twofold: monitoring urban air quality by means of sensors in 6 European cities, and making urban air quality predictions thanks to simulation models. The project is co-financed by the European Commission under the CEF TELECOM call on Open Data.


2019 - Traffic analysis in a smart city [Relazione in Atti di Convegno]
Bachechi, C.; Po, L.
abstract

Urbanization is accelerating at a high pace. This raises new and critical issues in the transition towards smarter, more efficient, livable, and economically, socially, and environmentally sustainable cities. Urban mobility is one of the toughest challenges. In many cities, existing mobility systems are already inadequate, yet urbanization and growing populations will increase mobility demand still further. Understanding traffic flows within an urban environment, studying similarities (or dissimilarities) among weekdays, and finding the peaks within a day are the first steps towards understanding urban mobility. Following the implementation of a micro-simulation model of the city of Modena based on actual data from traffic sensors, a huge amount of information describing daily traffic flows within the city became available. This paper reports an in-depth investigation of these traffic flows in order to discover trends. Traffic analyses are performed to compare working days and weekends and to identify significant deviations. Moreover, traffic flow estimates for special days, such as weather alert days or holidays, are studied to discover particular tendencies. This preliminary study allowed us to identify the main critical points in the mobility of the city.


2018 - An Integrated Smart City Platform [Relazione in Atti di Convegno]
Nesi, Paolo; Po, Laura; R. Viqueira, Josè; Trillo Lado, Raquel
abstract

Smart Cities aim to create a higher quality of life for their citizens, improve business services, and promote the tourism experience. Fostering smart city innovation at the local and regional level requires a set of mature technologies to discover, integrate, and harmonize multiple data sources, and the exposure of effective applications for end users (citizens, administrators, tourists...). In this context, Semantic Web technologies and Linked Open Data principles provide a means for sharing knowledge about cities as physical, economical, social, and technical systems, enabling the development of smart city services. Despite the tremendous effort these communities have made so far, there is a lack of comprehensive and effective platforms that handle the entire process of identification, ingestion, consumption, and publication of data for Smart Cities. In this paper, a complete open-source platform to boost the integration, semantic enrichment, publication, and exploitation of public data to foster smart cities in local and national administrations is proposed. Starting from mature software solutions, we propose a platform to facilitate the harmonization of datasets (open and private, static and dynamic in real time) of the same domain generated by different authorities. The platform provides a unified, smart-city-oriented dataset that can be exploited to offer services to citizens in a uniform way, to easily release open data, and to monitor the status of city services in real time by means of a suite of web applications.


2018 - Building an Urban Theft Map by Analyzing Newspaper Crime Reports [Relazione in Atti di Convegno]
Po, Laura; Rollo, Federica
abstract

One of the main issues in today's cities is public safety, which can be improved by implementing a systematic analysis for identifying and analyzing patterns and trends in crime, also called crime mapping. Mapping crime allows police analysts to identify crime hot spots; moreover, it increases public confidence and citizen engagement and promotes transparency. This paper focuses on analyzing and mapping thefts in an Italian city from on-line newspapers using text mining techniques.
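The abstract does not detail the text-mining pipeline, so the following is only an illustrative sketch: a hypothetical keyword lexicon used to flag theft-related articles and count them per city area, the raw material of a theft map.

```python
import re
from collections import Counter

# Hypothetical Italian theft-related keywords; the paper's actual
# lexicon and classifier are not given in the abstract.
THEFT_KEYWORDS = {"furto", "rubato", "scassinato", "borseggio"}

def theft_hotspots(articles):
    """Count theft-related news articles per named city area.

    articles: iterable of (area, text). An article counts as
    theft-related if its text contains at least one keyword."""
    hits = Counter()
    for area, text in articles:
        words = set(re.findall(r"\w+", text.lower()))
        if words & THEFT_KEYWORDS:
            hits[area] += 1
    return hits

news = [
    ("Centro", "Furto in appartamento nella notte"),
    ("Centro", "Borseggio ai danni di un turista"),
    ("Crocetta", "Inaugurata la nuova pista ciclabile"),
]
print(theft_hotspots(news))  # Counter({'Centro': 2})
```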


2018 - Community detection applied on big linked data [Articolo su rivista]
Po, L.; Malvezzi, D.
abstract

The Linked Open Data (LOD) Cloud has more than tripled its sources in just six years (from 295 sources in 2011 to 1163 datasets in 2017). The actual Web of Data contains more than 150 billion triples. We are witnessing a staggering growth in the production and consumption of LOD and the generation of increasingly large datasets. In this scenario, providing researchers, domain experts, but also business people and citizens with visual representations and intuitive interactions can significantly aid the exploration and understanding of the domains and knowledge represented by Linked Data. Various tools and web applications have been developed to enable the navigation and browsing of the Web of Data. However, these tools fall short in producing high-level representations of large datasets and in supporting users in the exploration and querying of these big sources. Following this trend, we devised a new method and a tool called H-BOLD (High-level visualizations on Big Open Linked Data). H-BOLD enables the exploratory search and multilevel analysis of Linked Open Data. It offers different levels of abstraction on Big Linked Data. Through user interaction and the dynamic adaptation of the graph representing the dataset, it is possible to perform an effective exploration of the dataset, starting from a set of a few classes and adding new ones. The performance and portability of H-BOLD have been evaluated on the SPARQL endpoints listed on SPARQL ENDPOINT STATUS. The effectiveness of H-BOLD as a visualization tool is described through a user study.


2018 - H-BOLD (High level visualizations on Big Open Linked Data) [Software]
Po, Laura; Desimoni, Federico; Malvezzi, Davide
abstract


2018 - High-level visualization over big linked data [Relazione in Atti di Convegno]
Po, Laura; Malvezzi, Davide
abstract

The Linked Open Data (LOD) Cloud is continuously expanding, and the number of complex and large sources is rising. Understanding an unknown source at a glance is a critical task for LOD users, but it can be facilitated by visualization or exploration tools. H-BOLD (High-level visualization over Big Open Linked Data) is a tool that allows users with no a-priori knowledge of the domain and no SPARQL skills to start navigating and exploring Big Linked Data. Users can start from a high-level visualization and then focus on an element of interest to incrementally explore the source, as well as perform a visual query on certain classes of interest. At the moment, 32 Big Linked Data sources (with more than 500,000 triples) exposing a SPARQL endpoint can be explored by using H-BOLD.


2017 - From Data Integration to Big Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca
abstract

The research activities of the Database Group (DBGroup, www.dbgroup.unimore.it) and the Information System Group (ISGroup, www.isgroup.unimore.it) have been mainly devoted to the Data Integration research area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semistructured data sources and allows a user to pose a single query and receive a single unified answer. Description Logics, automatic annotation of schemata, and clustering techniques constitute the theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are Peer Data Management Systems and their evolutions toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss, and solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been faced: automatic annotation, mapping discovery, global query processing, provenance, multidimensional information integration, and keyword search, within European and national projects. With the incoming new requirements of integrating open linked data, textual and multimedia data in a big data scenario, the research has been devoted to the Big Data Integration research area. In particular, the most relevant research results achieved are: a scalable entity resolution method, a scalable join operator, and LODeX, a tool for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DataRiver, data integration was successfully applied to smart e-health.


2017 - Managing Road Safety through the Use of Linked Data and Heat Maps [Relazione in Atti di Convegno]
Colacino, Vincenzo Giuseppe; Po, Laura
abstract

Road traffic injuries are a critical public health challenge that requires valuable efforts for effective and sustainable prevention. Worldwide, an estimated 1.2 million people are killed in road crashes each year, and as many as 50 million are injured. An analysis of data provided by authoritative sources can be valuable for understanding the most critical points on the road network. The aim of this paper is to discover data about road accidents in Italy and to provide useful visualizations for improving road safety. Starting from the annual report of road accidents of the Automobile Club of Italy, we transform the original data into an RDF dataset according to the Linked Open Data principles and connect it to external datasets. Then, an integration with Open Street Map allows the accident data to be displayed on a map. Here, the final user is able to identify which road sections are most critical based on the number of deaths, injuries, or accidents.
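A minimal sketch of the RDF conversion step: serializing one accident record as N-Triples and linking it to an Open Street Map way. The URIs and property names below are illustrative only, not the vocabulary actually used in the paper.

```python
def accident_to_ntriples(accident_id, road_osm_id, deaths, injuries):
    """Serialize one accident record as N-Triples, linking it to the
    Open Street Map road it occurred on. All URIs are hypothetical."""
    s = f"<http://example.org/accident/{accident_id}>"
    return "\n".join([
        f"{s} <http://example.org/onRoad> "
        f"<http://www.openstreetmap.org/way/{road_osm_id}> .",
        f'{s} <http://example.org/deaths> "{deaths}" .',
        f'{s} <http://example.org/injuries> "{injuries}" .',
    ])

print(accident_to_ntriples(42, 123456, 1, 3))
```

Once the accidents are triples keyed by OSM ways, aggregating deaths or injuries per road section is a straightforward query, which is what drives the map visualization.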


2017 - Topic detection in multichannel Italian newspapers [Relazione in Atti di Convegno]
Po, Laura; Rollo, Federica; Lado, Raquel Trillo
abstract

Nowadays, any person, company, or public institution uses different channels to share private or public information with other people (friends, customers, relatives, etc.) or institutions. This context has changed journalism: the major newspapers report news not just on their own web sites but also on several social media such as Twitter or YouTube. The use of multiple communication media stimulates the need for integration and analysis of the content published globally, not just at the level of a single medium. An analysis is needed to achieve a comprehensive overview of the information that reaches end users and of how they consume it. This analysis should identify the main topics in the news flow and reveal the mechanisms of publication of news on different media (e.g., the news timeline). Currently, most of the work in this area is still focused on a single medium, so an analysis across different media (channels) should improve the results of topic detection. This paper shows the application of a graph-based analytical approach, called KeyGraph, to a set of very heterogeneous documents such as the news published on various media. A preliminary evaluation on the news published over a 5-day period was able to identify the main topics within the publications of a single newspaper, as well as within the publications of 20 newspapers on several on-line channels.


2016 - Driving Innovation in Youth Policies With Open Data [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura
abstract

In December 2007, thirty activists held a meeting in California to define the concept of open public data. For the first time, eight Open Government Data (OGD) principles were established: OGD should be complete, primary (reporting data at a high level of granularity), timely, accessible, machine processable, non-discriminatory, non-proprietary, and license-free. Since the inception of the Open Data philosophy, there has been a constant increase in the information released, improving the communication channel between public administrations and their citizens. Open data offers governments, companies, and citizens information to make better decisions. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a research project on Open Data for Youth Policies. The project was devoted to exploring the youth situation in the municipalities and provinces of the Emilia Romagna region (Italy), in particular examining data on population, education, and work. We identified interesting data sources related to youth policies, both from the open data community and from the private repositories of local governments. The selected sources have been integrated, and the result of the integration is shown by means of a useful navigator tool. In the end, we published new information on the web as Linked Open Data. Since the process applied and the tools used are generic, we hope this paper can serve as an example and a guide for new projects that aim to create new knowledge through Open Data.


2016 - Exposing the Underlying Schema of LOD Sources [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

The Linked Data Principles defined by Tim Berners-Lee promise that a large portion of Web Data will be usable as one big interlinked RDF database. Today, with more than one thousand Linked Open Data (LOD) sources available on the Web, we are witnessing an emerging trend in the publication and consumption of LOD datasets. However, the pervasive use of external resources, together with a deficiency in the definition of the internal structure of a dataset, makes many LOD sources extremely complex to understand. In this paper, we describe a formal method to unveil the implicit structure of a LOD dataset by building a (Clustered) Schema Summary. The Schema Summary contains all the main classes and properties used within the dataset, whether they are taken from external vocabularies or not, and is conceivable as an RDFS ontology. The Clustered Schema Summary, suitable for large LOD datasets, provides a higher-level view of the classes and properties used, by gathering together classes that are the object of multiple instantiations.


2015 - Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura
abstract

We propose a plot-based recommendation system, based upon an evaluation of the similarity between the plot of a video that was watched by a user and a large number of plots stored in a movie database. Our system is independent of the number of user ratings; thus it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. The system implements and compares two topic models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), on a movie database of two hundred thousand plots that has been constructed by integrating different movie databases in a local NoSQL (MongoDB) DBMS. The behaviour of the topic models has been examined on the basis of standard metrics and user evaluations, and performance assessments with 30 users have been conducted to compare our tool with a commercial system.
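The core operation is ranking plots by textual similarity to the watched plot. The paper uses LSA and LDA; as a minimal stdlib stand-in (no topic modeling), the sketch below ranks plots by cosine similarity over raw term frequencies. Titles and plots are invented.

```python
import math
import re
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(target_plot, plots):
    """Rank (title, plot) pairs by similarity to the target plot."""
    vec = lambda text: Counter(re.findall(r"\w+", text.lower()))
    t = vec(target_plot)
    return sorted(plots, key=lambda p: cosine(t, vec(p[1])), reverse=True)

movies = [
    ("Heist film", "a gang plans a daring bank robbery"),
    ("Romance", "two strangers fall in love in Paris"),
]
print(most_similar("thieves attempt a bank robbery", movies)[0][0])  # Heist film
```

A topic model such as LSA replaces the raw term vectors with low-dimensional topic vectors, so that plots with no words in common can still be matched.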


2015 - LODeX: A tool for Visual Querying Linked Open Data [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

Formulating a query on a Linked Open Data (LOD) source is not an easy task: technical knowledge of the query language and awareness of the structure of the dataset are essential to create a query. We present a revised version of LODeX that provides the user an easy way to build queries in a fast and interactive manner. When users decide to explore a LOD source, they can take advantage of the Schema Summary produced by LODeX (i.e., a synthetic view of the dataset's structure) and pick graphical elements from it to create a visual query. The tool also supports the user in browsing the results and, eventually, in refining the query. The prototype has been evaluated on hundreds of public SPARQL endpoints (listed in Data Hub) and is available online at http://dbgroup.unimo.it/lodex2. A survey conducted on 27 users has demonstrated that our tool can effectively support both unskilled and skilled users in exploring and querying LOD datasets.


2015 - Open Data for Improving Youth Policies [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Gagliardelli, Luca; Po, Laura
abstract

The Open Data philosophy is based on the idea that certain data should be made available to all citizens, in an open form, without any copyright restrictions, patents, or other mechanisms of control. Various governments have started to publish open data, first of all the USA and the UK in 2009; in 2015, the Open Data Barometer project (www.opendatabarometer.org) reported that, of 77 diverse states across the world, over 55 percent have developed some form of Open Government Data initiative. We claim that Public Administrations, which are the main producers and among the consumers of Open Data, might effectively extract important information by integrating their own data with open data sources. This paper reports the activities carried out during a one-year research project on Open Data for Youth Policies. The project was mainly devoted to exploring the youth situation in the municipalities and provinces of the Emilia Romagna region (Italy), in particular examining data on population, education, and work. The project goals were: to identify interesting data sources related to youth policies, both from the open data community and from the private repositories of local governments of the Emilia Romagna region; to integrate them and show the result of the integration by means of a useful navigator tool; and, in the end, to publish new information on the web as Linked Open Data. This paper also reports the main issues encountered, which may seriously affect the entire process from the consumption and integration to the publication of open data.


2015 - Visual Querying LOD sources with LODeX [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

The Linked Open Data (LOD) Cloud has more than tripled its sources in just three years (from 295 sources in 2011 to 1014 in 2014). While LOD data are being produced at an increasing rate, LOD tools fall short in producing a high-level representation of datasets and in supporting users in the exploration and querying of a source. To overcome these problems and significantly increase the number of consumers of LOD data, we devised a new method and a tool, called LODeX, that promotes the understanding, navigation, and querying of LOD sources for both experts and beginners. It also provides a standardized and homogeneous summary of LOD sources and supports users in the creation of visual queries on previously unknown datasets. We have extensively evaluated the portability and usability of the tool. LODeX has been tested on the entire set of datasets available at Data Hub, i.e., 302 sources. In this paper, we showcase the usability evaluation of the different features of the tool (the Schema Summary representation and the visual query building) obtained with 27 users (comprising both Semantic Web experts and beginners).


2014 - A Visual Summary for Linked Open Data sources [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

In this paper we propose LODeX, a tool that produces a representative summary of a Linked Open Data (LOD) source starting from scratch, thus supporting users in exploring and understanding the contents of a dataset. The tool takes as input the URL of a SPARQL endpoint and launches a set of predefined SPARQL queries; from the results of the queries, it generates a visual summary of the source. The summary reports statistical and structural information about the LOD dataset and can be browsed to focus on particular classes or to explore their properties and their use. LODeX was tested on the 137 public SPARQL endpoints contained in Data Hub (formerly CKAN), one of the main Open Data catalogues. The statistical and structural information of the 107 successful extractions is collected and available in the online version of LODeX (http://dbgroup.unimo.it/lodex).
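A sketch in the spirit of the predefined extraction queries: counting the instances of each class gives the statistical backbone of a summary. The tool's actual query set is not given in the abstract, so the query and the request builder below are illustrative assumptions.

```python
from urllib.parse import urlencode

# Illustrative structure-probing query: instances per class.
CLASS_COUNT_QUERY = """SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class . }
GROUP BY ?class
ORDER BY DESC(?instances)"""

def endpoint_request(endpoint_url, query):
    """Build the GET URL that would submit the query to a SPARQL
    endpoint, asking for JSON results (no network call is made here)."""
    return endpoint_url + "?" + urlencode(
        {"query": query, "format": "application/sparql-results+json"})

print(endpoint_request("http://dbpedia.org/sparql", CLASS_COUNT_QUERY)[:60])
```

Running a small battery of such queries (classes, properties, counts) against an endpoint yields exactly the statistical and structural information a visual summary needs.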


2014 - Comparing Topic Models for a Movie Recommendation System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

Recommendation systems have become successful at suggesting content that is likely to be of interest to the user; however, their performance greatly suffers when little information about the user's preferences is given. In this paper we propose an automated movie recommendation system based on movie similarity: given a target movie selected by the user, the goal of the system is to provide a list of the movies most similar to the target one, without knowing any user preferences. The topic models Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) have been applied and extensively compared on a movie database of two hundred thousand plots. Experiments are an important part of the paper: we examined the behaviour of the topic models based on standard metrics and on user evaluations, and we conducted performance assessments with 30 users to compare our approach with a commercial system. The outcome was that the performance of LSA was superior to that of LDA in supporting the selection of similar plots. Even if our system does not outperform commercial systems, it does not rely on human effort, so it can be ported to any domain where natural language descriptions exist. Since it is independent of the number of user ratings, it is able to suggest famous movies as well as old or little-known movies that are still strongly related to the content of the video the user has watched.


2014 - LODeX: A visualization tool for Linked Open Data navigation and querying. [Software]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

We present LODeX, a tool for visualizing, browsing, and querying a LOD source starting from the URL of its SPARQL endpoint. LODeX creates a visual summary of a LOD dataset and allows users to perform queries on it. Users can select the classes of interest to discover which instances are stored in the LOD source, without any knowledge of the underlying vocabulary used for describing the data. The tool couples the overall view of the LOD source with a preview of the instances so that users can easily build and refine their queries. The tool has been evaluated on hundreds of public SPARQL endpoints (listed in Data Hub). The schema summaries of 40 LOD sources are stored and available for online querying at http://dbgroup.unimo.it/lodex2.


2014 - Online Index Extraction from Linked Open Data Sources [Relazione in Atti di Convegno]
Benedetti, Fabio; Bergamaschi, Sonia; Po, Laura
abstract

The production of machine-readable data in the form of RDF datasets belonging to the Linked Open Data (LOD) Cloud is growing very fast. However, selecting relevant knowledge sources from the Cloud, assessing their quality, and extracting synthetic information from a LOD source are all tasks that require a strong human effort. This paper proposes an approach for the automatic extraction of the most representative information from a LOD source and the creation of a set of indexes that enhance the description of the dataset. These indexes collect statistical information regarding the size and the complexity of the dataset (e.g., the number of instances), but also depict all the instantiated classes and the properties among them, supplying users with a synthetic view of the LOD source. The technique is fully implemented in LODeX, a tool able to deal with the performance issues of systems that expose SPARQL endpoints and to cope with the heterogeneity in the knowledge representation of RDF data. An evaluation of LODeX on a large number of endpoints (244) belonging to the LOD Cloud has been performed, and the effectiveness of the index extraction process is presented.


2013 - An iPad Order Management System for Fashion Trade [Relazione in Atti di Convegno]
I., Baroni; Bergamaschi, Sonia; Po, Laura
abstract

The fashion industry loves the new tablets. In 2011 we noted a 38% growth of e-commerce in the Italian fashion industry. A large number of brands have understood the value of mobile devices as a key channel for consumer communication. The interest of brands in mobile marketing applications and services has made a big step forward, with an increase of 129% in 2011 (osservatori.net, 2012). This paper presents a mobile version of the Fashion OMS (Order Management System) web application. Fashion Touch is a mobile application that allows clients and the company's sales network to process commercial orders, consult the product catalog, and manage customers as the OMS web version does, with the added functionality of an off-line order entry mode. To develop an effective mobile App, we started by analyzing the new web technologies for mobile applications (HTML5, CSS3, Ajax) and their development frameworks, comparing them with Apple's native programming language. We selected Titanium, a multi-platform framework for developing native mobile and desktop applications via web technologies, as the best framework for our purpose. We faced issues concerning network synchronization and studied different database solutions depending on the device's hardware characteristics and performance. This paper reports every aspect of the App development up to its publication on the App Store.


2012 - A meta-language for MDX queries in eLog Business Solution [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Interlandi, Matteo; Mario, Longo; Po, Laura; Vincini, Maurizio
abstract

The adoption of business intelligence technology in industries is growing rapidly. Business managers are not satisfied with ad hoc and static reports, and they ask for more flexible and easy-to-use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production, and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators by adopting a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have exploited the technological services of eBilling S.p.A. to manage their document flows. In particular, eLog services have been used by the major Italian media and telecommunications companies and their foreign annexes, such as Sky, Mediaset, H3G, and TIM Brazil. The largest customer can produce up to 30 million mail pieces within 6 months (about 200 GB of data in the relational DBMS). Over a period of 18 months, eLog could reach 150 million mail pieces (1 TB of data) to handle.


2012 - A non-intrusive movie recommendation system [Relazione in Atti di Convegno]
Farinella, Tania; Bergamaschi, Sonia; Po, Laura
abstract

Several recommendation systems have been developed to support users in choosing an interesting movie from multimedia repositories. The widely utilized collaborative-filtering systems focus on the analysis of user profiles or user ratings of the items. However, these systems degrade in performance in the start-up phase and, due to privacy issues, when a user hides most of his or her personal data. On the other hand, content-based recommendation systems compare movie features to suggest similar multimedia contents; these systems are based on less invasive observations, yet they have some difficulty supplying tailored suggestions. In this paper, we propose a plot-based recommendation system, based upon an evaluation of the similarity between the plot of a video that was watched by the user and a large number of plots stored in a movie database. Since it is independent of the number of user ratings, it is able to propose famous and beloved movies as well as old or little-known movies/programs that are still strongly related to the content of the video the user has watched. We experimented with different methodologies for comparing natural language descriptions of movies (plots) and found Latent Semantic Analysis (LSA) to be the superior one in supporting the selection of similar plots. In order to increase the efficiency of LSA, different models have been tried, and in the end a recommendation system able to compare about two hundred thousand movie plots in less than a minute has been developed.


2011 - Automatic Normalization and Annotation for Discovering Semantic Mappings [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Po, Laura; Sorrentino, Serena
abstract

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms, with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
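A minimal sketch of the normalization idea, assuming a hand-made abbreviation table: compound labels are split and known abbreviations expanded so more labels become dictionary words. The paper's method is semi-automatic and lexicon-driven; the table and labels here are invented.

```python
# Illustrative schema-label normalization: split CamelCase/underscore
# compounds, then expand abbreviations via a (here hand-made) table.
import re

ABBREVIATIONS = {"emp": "employee", "num": "number", "addr": "address"}

def normalize(label: str) -> list[str]:
    """Return the label as a list of expanded dictionary words."""
    # Insert a separator at lower->upper boundaries, then split.
    parts = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", label).lower().split("_")
    return [ABBREVIATIONS.get(p, p) for p in parts if p]

expanded_camel = normalize("EmpNum")
expanded_snake = normalize("home_addr")
```

After normalization, labels such as `EmpNum` and `employee_number` map to the same word sequence and become comparable during matching.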


2011 - Automatic generation of probabilistic relationships for improving schema matching [Articolo su rivista]
Po, Laura; Sorrentino, Serena
abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method aimed at discovering probabilistic lexical relationships in the environment of “on-the-fly” data integration. Our method is based on a probabilistic lexical annotation technique, which automatically associates one or more meanings with schema elements w.r.t. a thesaurus/lexical resource. However, the accuracy of automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. We address this problem by including a method to perform schema label normalization which increases the number of comparable labels. From the annotated schemata, we derive the probabilistic lexical relationships to be collected in the Probabilistic Common Thesaurus. The method is applied within the MOMIS data integration system but can easily be generalized to other data integration systems.


2011 - The Open Source release of the MOMIS Data Integration System [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Corni, Alberto; Entela, Kazazi; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

MOMIS (Mediator EnvirOnment for Multiple Information Sources) is an Open Source Data Integration system able to aggregate data coming from heterogeneous data sources (structured and semi-structured) in a semi-automatic way. DataRiver is a Spin-Off of the University of Modena and Reggio Emilia that has re-engineered the MOMIS system and released its Open Source version for both commercial and academic use. The MOMIS system has been extended with a set of features to minimize the integration process costs, exploiting the semantics of the data sources and optimizing each integration phase. The Open Source MOMIS system has been successfully applied in several industrial sectors: Medical, Agro-food, Tourism, Textile, Mechanical, Logistics. This paper describes the features of the Open Source MOMIS system and how it is able to address real data integration challenges.


2011 - Using semantic techniques to access web data [Articolo su rivista]
Raquel, Trillo; Po, Laura; Sergio, Ilarri; Bergamaschi, Sonia; Eduardo, Mena
abstract

Nowadays, people frequently use different keyword-based web search engines to find the information they need on the web. However, many words are polysemous and, when these words are used to query a search engine, its output usually includes links to web pages referring to their different meanings. Besides, results with different meanings are mixed up, which makes the task of finding the relevant information difficult for the users, especially if the user-intended meanings behind the input keywords are not among the most popular on the web. In this paper, we propose a set of semantic techniques to group the results provided by a traditional search engine into categories defined by the different meanings of the input keywords. Differently from other proposals, our method considers the knowledge provided by ontologies available on the web in order to dynamically define the possible categories. Thus, it is independent of the sources providing the results that must be grouped. Our experimental results show the value of the proposal.
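The grouping step can be illustrated with a toy sense inventory: each result is assigned to the sense whose gloss words it overlaps most. Senses and snippets below are invented; the paper derives the categories dynamically from ontologies available on the web.

```python
# Toy sense-based grouping of search results by word overlap with each
# sense's gloss. The sense inventory and snippets are invented.
senses = {
    "java (island)":   {"island", "indonesia", "travel"},
    "java (language)": {"programming", "language", "code"},
}
results = [
    "learn java programming with code examples",
    "travel guide to the island of java indonesia",
]

def categorize(snippet: str) -> str:
    """Assign a result snippet to the sense with the largest word overlap."""
    words = set(snippet.split())
    return max(senses, key=lambda s: len(senses[s] & words))

groups = {categorize(r): r for r in results}
```

A user looking for the programming language can then ignore the whole "island" category instead of sifting through mixed results.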


2010 - Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher [Relazione in Atti di Convegno]
Po, Laura; Bergamaschi, Sonia
abstract

This paper proposes lexical annotation as an effective method to solve the ambiguity problems that affect ontology matchers. Lexical annotation associates with each ontology element a set of meanings belonging to a semantic resource. Performing lexical annotation on the ontologies involved in the matching process makes it possible to detect false positive mappings and to enrich matching results by adding new mappings (i.e. lexical relationships between elements on the basis of the semantic relationships holding among meanings). The paper explains how to apply lexical annotation to the results obtained by a matcher; in particular, it shows an application on the SCARLET matcher. We adopt an experimental approach on two test cases, where SCARLET was previously tested, to investigate the potential of lexical annotation. Experiments yielded promising results, showing that lexical annotation improves the precision of the matcher.


2010 - Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher (Extended Abstract) [Relazione in Atti di Convegno]
Po, Laura
abstract

This paper proposes lexical annotation as an effective method to solve the ambiguity problems that affect ontology matchers. Lexical annotation associates with each ontology element a set of meanings belonging to a semantic resource. Performing lexical annotation on the ontologies involved in the matching process makes it possible to detect false positive mappings and to enrich matching results by adding new mappings (i.e. lexical relationships between elements on the basis of the semantic relationships holding among meanings). The paper explains how to apply lexical annotation to a matcher; in particular, we show an application on the SCARLET matcher. Experiments yielded promising results, showing that lexical annotation improves the precision of the matcher.


2010 - Schema Label Normalization for Improving Schema Matching [Articolo su rivista]
Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. The method semi-automatically expands abbreviations/acronyms and annotates compound nouns, with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching results.


2010 - Uncertainty in data integration systems: automatic generation of probabilistic relationships [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

This paper proposes a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is based on probabilistic word sense disambiguation (PWSD), which automatically lexically annotates (i.e. performs annotation w.r.t. a thesaurus/lexical resource) the schemata of a given set of data sources to be integrated. From the annotated schemata and the relationships defined in the thesaurus, we derive the probabilistic lexical relationships among schema elements. Lexical relationships, together with structural relationships, are collected in the Probabilistic Common Thesaurus (PCT).
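A toy illustration of how probabilistic annotations can combine into a probabilistic relationship (the numbers and sense identifiers are invented, and the actual PWSD method is more elaborate): if two labels are annotated with the same sense with probabilities 0.8 and 0.6, and the annotations are treated as independent, the derived synonymy relationship gets probability 0.48.

```python
# Invented probabilistic annotations: label -> {sense: probability}.
annotations = {
    "person":     {"person#n#1": 0.8},
    "individual": {"person#n#1": 0.6},
}

def syn_probability(label_a: str, label_b: str) -> float:
    """Probability that two labels share a sense, assuming independent annotations."""
    shared = set(annotations[label_a]) & set(annotations[label_b])
    return sum(annotations[label_a][s] * annotations[label_b][s] for s in shared)

p = syn_probability("person", "individual")  # 0.8 * 0.6
```

Relationships derived this way carry an explicit probability into the Probabilistic Common Thesaurus instead of a hard yes/no judgment.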


2009 - ALA: Dealing with Uncertainty in Lexical Annotation [Software]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and for the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. each annotation is associated with a probability value. By performing probabilistic lexical annotation, we discover probabilistic inter-source lexical relationships among schema elements. ALA extends the lexical annotation module of the MOMIS data integration system. However, it may be applied more generally in the context of schema mapping discovery, ontology merging and data integration systems, and it is particularly suitable for performing “on-the-fly” data integration or probabilistic ontology matching.


2009 - An Ontology-Based Data Integration System for Data and Multimedia Sources [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sala, Antonio; Sorrentino, Serena
abstract

Data integration is the problem of combining data residing at distributed heterogeneous sources, including multimedia sources, and providing the user with a unified view of these data. Ontology-based Data Integration involves the use of ontologies to effectively combine data and information from multiple heterogeneous sources [16]. With respect to the integration of data sources, ontologies can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources [6]. In [5] MOMIS was extended to manage “traditional” and “multimedia” data sources at the same time. STASIS is a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics [1]. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed [7]. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach to Ontology-Based Data Integration for data and multimedia sources.


2009 - DataRiver [Spin Off]
Bergamaschi, Sonia; Orsini, Mirko; Beneventano, Domenico; Sala, Antonio; Corni, Alberto; Po, Laura; Sorrentino, Serena; Quix, Srl
abstract


2009 - Dealing with Uncertainty in Lexical Annotation [Articolo su rivista]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena; Corni, Alberto
abstract

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and for the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. each annotation is associated with a probability value. By performing probabilistic lexical annotation, we discover probabilistic inter-source lexical relationships among schema elements. ALA extends the lexical annotation module of the MOMIS data integration system. However, it may be applied more generally in the context of schema mapping discovery, ontology merging and data integration systems, and it is particularly suitable for performing “on-the-fly” data integration or probabilistic ontology matching.


2009 - Lexical Knowledge Extraction: an Effective Approach to Schema and Ontology Matching [Relazione in Atti di Convegno]
Po, Laura; Sorrentino, Serena; Bergamaschi, Sonia; Beneventano, Domenico
abstract

This paper’s aim is to examine the role Lexical Knowledge Extraction plays in data integration as well as in ontology engineering. Data integration is the problem of combining data residing at distributed heterogeneous sources and providing the user with a unified view of these data; a common and important scenario in data integration is that of structured or semi-structured data sources described by a schema. Ontology engineering is a subfield of knowledge engineering that studies the methodologies for building and maintaining ontologies. It offers a direction towards solving the interoperability problems brought about by semantic obstacles, such as those related to the definitions of business terms and software classes. In these contexts, where users are confronted with heterogeneous information, the support of matching techniques is crucial. Matching techniques aim at finding correspondences between semantically related entities of different schemata/ontologies. Several matching techniques have been proposed in the literature, based on different approaches often derived from other fields, such as text similarity, graph comparison and machine learning. This paper proposes a matching technique based on Lexical Knowledge Extraction: first, an Automatic Lexical Annotation of schemata/ontologies is performed, then lexical relationships are extracted based on such annotations. A lexical annotation is a piece of information added to a document (book, online record, video, or other data) that refers to a semantic resource such as WordNet. Each annotation owns one or more lexical descriptions. Lexical annotation is performed by the Probabilistic Word Sense Disambiguation (PWSD) method, which combines several disambiguation algorithms. Our hypothesis is that performing lexical annotation of the elements (e.g. classes and properties/attributes) of schemata/ontologies enables the system to automatically extract the lexical knowledge that is implicit in a schema/ontology and then to derive lexical relationships between the elements of a schema/ontology or among elements of different schemata/ontologies. The effectiveness of the method presented in this paper has been proven within the MOMIS data integration system.


2009 - Schema Normalization for Improving Schema Matching [Relazione in Atti di Convegno]
Sorrentino, Serena; Bergamaschi, Sonia; Gawinecki, Maciej; Po, Laura
abstract

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names) it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms, with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.


2009 - Semantic Access to Data from the Web [Relazione in Atti di Convegno]
Raquel, Trillo; Po, Laura; Sergio, Ilarri; Bergamaschi, Sonia; Eduardo, Mena
abstract

There is a great amount of information available on the web, so users typically rely on different keyword-based web search engines to find the information they need. However, many words are polysemous, and therefore the output of a search engine will include links to web pages referring to different meanings of the keywords. Besides, results with different meanings are mixed up, which makes the task of finding the relevant information difficult for the user, especially if the meanings behind the input keywords are not among the most popular on the web. In this paper, we propose a semantics-based approach to group the results returned to the user into clusters defined by the different meanings of the input keywords. Differently from other proposals, our method considers the knowledge provided by a pool of ontologies available on the Web in order to dynamically define the different categories (or clusters). Thus, it is independent of the sources providing the results that must be grouped.


2009 - The MOMIS-STASIS approach for Ontology-Based Data Integration [Relazione in Atti di Convegno]
Beneventano, Domenico; Orsini, Mirko; Po, Laura; Sorrentino, Serena
abstract

Ontology-based Data Integration involves the use of ontologies to effectively combine data and information from multiple heterogeneous sources. Ontologies can be used in an integration task to describe the semantics of the information sources and to make their contents explicit. With respect to the integration of data sources, they can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources. The goal of the STASIS project is to create a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed. This paper describes the early effort to combine the MOMIS and the STASIS frameworks in order to obtain an effective approach to Ontology-Based Data Integration.


2008 - Automatic annotation for mapping discovery in data integration systems (Extended abstract) [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

In this article we present CWSD (Combined Word Sense Disambiguation), a method and a software tool for enabling automatic lexical annotation of local (structured and semi-structured) data sources in a data integration system. CWSD is based on the exploitation of WordNet Domains and of the lexical and structural knowledge of the data sources. The method extends the semi-automatic lexical annotation module of the MOMIS data integration system. Its distinguishing feature is its independence from, or low dependence on, human intervention. CWSD is a valid method for two important tasks: (1) the source lexical annotation process, i.e. the operation of associating an element of a lexical reference database (WordNet) with every source element, and (2) the discovery of mappings among concepts of distributed data sources/ontologies.


2008 - Improving Data Integration through Disambiguation Techniques [Relazione in Atti di Convegno]
Po, Laura
abstract

In this paper, the Word Sense Disambiguation (WSD) issue is outlined in the context of data integration, and an Approximate Word Sense Disambiguation (AWSD) approach is proposed for the automatic lexical annotation of structured and semi-structured data sources.


2008 - Open Source come modello di business per le PMI: analisi critica e casi di studio [Capitolo/Saggio]
Bergamaschi, Sonia; Nigro, Francesco; Po, Laura; Vincini, Maurizio
abstract

Open Source software is attracting attention at every level, both in the economic and in the manufacturing world, because it proposes a strongly innovative model of technological and economic development that breaks with the past. This work analyses the reasons behind the success of this model and presents some cases in which Open Source proves advantageous, highlighting the most interesting aspects both for users and for producers of software.


2007 - Automatic annotation in data integration systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sorrentino, Serena
abstract

We propose CWSD (Combined Word Sense Disambiguation), an algorithm for the automatic annotation of structured and semi-structured data sources. Rather than being targeted at textual data sources like most traditional WSD algorithms found in the literature, our algorithm can exploit information coming from the structure of the sources together with the lexical knowledge associated with the terms (the elements of the schemata).
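One simple way to picture the combination of structural and lexical evidence, not necessarily CWSD's actual scheme, is a weighted vote over candidate senses: each knowledge source nominates a sense with a weight, and the sense with the highest combined score wins. Weights and senses below are invented.

```python
# Illustrative combination of disambiguation evidence by weighted voting.
from collections import defaultdict

def combine(votes):
    """votes: iterable of (sense, weight) pairs from different evidence sources."""
    scores = defaultdict(float)
    for sense, weight in votes:
        scores[sense] += weight
    return max(scores, key=scores.get)

evidence = [
    ("bank#n#1", 0.4),  # lexical: gloss overlap favours 'financial institution'
    ("bank#n#2", 0.3),  # structural: sibling labels suggest 'river bank'
    ("bank#n#1", 0.5),  # domain evidence also favours the finance sense
]
chosen = combine(evidence)
```

The point is that structural context can overrule or reinforce the purely lexical signal, which is what distinguishes the approach from text-only WSD.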


2007 - Automatic annotation of local data sources for data integration systems [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Po, Laura; Sala, Antonio; Sorrentino, Serena
abstract

In this article we present CWSD (Combined Word Sense Disambiguation), a method and a software tool for enabling automatic annotation of local structured and semi-structured data sources, with lexical information, in a data integration system. CWSD is based on the exploitation of WordNet Domains and structural knowledge, and on the extension of the lexical annotation module of the MOMIS data integration system. The distinguishing feature of the algorithm is its low dependence on human intervention. Our approach is a valid method for two important tasks: (1) the source annotation process, i.e. the operation of associating an element of a lexical reference database (WordNet) with every source element, and (2) the discovery of mappings among concepts of distributed data sources/ontologies.


2007 - MELIS: An Incremental Method For The Lexical Annotation Of Domain Ontologies [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is cumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language, wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results for MELIS both as a standalone tool and as a component integrated in MOMIS.
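The incrementality described above can be sketched as a loop in which every annotated schema feeds a growing domain lexicon that is consulted before any fallback disambiguator. The labels, senses, and fallback below are invented for illustration.

```python
# Illustrative MELIS-style incrementality: annotations cumulate in a
# domain lexicon that is consulted first for every new schema.
domain_lexicon: dict[str, str] = {}

def annotate_schema(labels, fallback):
    """Annotate labels from the cumulated lexicon, else via a fallback
    disambiguator, feeding new results back into the lexicon."""
    result = {}
    for label in labels:
        sense = domain_lexicon.get(label) or fallback(label)
        result[label] = sense
        domain_lexicon[label] = sense   # knowledge cumulates here
    return result

# Each processed schema enlarges the lexicon available to the next one.
first = annotate_schema(["price"], fallback=lambda l: f"{l}#n#1")
second = annotate_schema(["price", "cost"], fallback=lambda l: f"{l}#n#1")
```

On the second schema, "price" is resolved from accumulated knowledge without invoking the fallback, which is the mechanism behind the claim that performance improves as more schemas are processed.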


2007 - MELIS: a tool for the incremental annotation of domain ontologies [Software]
Bergamaschi, Sonia; Paolo, Bouquet; Daniel, Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

MELIS is a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is cumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system in annotating new schemas.


2007 - Melis: an incremental method for the lexical annotation of domain ontologies [Articolo su rivista]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool for enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incrementality: the higher the number of schemas which are processed, the more background/domain knowledge is cumulated in the system (a portion of domain ontology is learned at every step), and the better the performance of the system in annotating new schemas. MELIS has been tested as a component of the MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language, wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results for MELIS as a standalone tool and as a component integrated in MOMIS.


2006 - An incremental method for meaning elicitation of a domain ontology [Relazione in Atti di Convegno]
Bergamaschi, Sonia; P., Bouquet; D., Giacomuzzi; Guerra, Francesco; Po, Laura; Vincini, Maurizio
abstract

The Internet has opened access to an overwhelming amount of data, requiring the development of new applications to automatically recognize, process and manage information available in web sites or web-based applications. The standard Semantic Web architecture exploits ontologies to give a shared (and known) meaning to the elements of each web source. In this context, we developed MELIS (Meaning Elicitation and Lexical Integration System). MELIS couples the lexical annotation module of the MOMIS system with some components from CTXMATCH2.0, a tool for eliciting meaning from several types of schemas and matching them. MELIS uses the MOMIS WNEditor and CTXMATCH2.0 to support two main tasks in the MOMIS ontology generation methodology: the source annotation process, i.e. the operation of associating an element of a lexical database with each source element, and the extraction of lexical relationships among elements of different data sources.