Riccardo MARTOGLIA - UniMoRe staff

Riccardo MARTOGLIA

Associate Professor
Department of Physics, Informatics and Mathematics (former Mathematics site)

Publications

2023 - Does the venue of scientific conferences leverage their impact? A large scale study on Computer Science conferences [Journal article]
Bedogni, L.; Cabri, G.; Martoglia, R.; Poggi, F.
abstract

Purpose: Conferences bring scientists together and provide one of the most timely means of disseminating new ideas and cutting-edge work. The importance of conferences in many scientific areas is attested by quantitative indexes. The main goal of this paper is to investigate a novel research question: is there any correlation between the impact of scientific conferences and the venue where they took place?
Design/methodology/approach: To measure the impact of conferences, the authors conducted a large-scale analysis of bibliographic data extracted from 3,838 Computer Science conference series and over 2.5 million papers spanning more than 30 years of research. To quantify the “touristicity” of a venue, the authors exploited indexes of venue attractiveness from World Economic Forum reports and extracted four country-wide and two city-wide touristic indexes, which measure the attractiveness and touristicity of any country or city.
Findings: The authors found that the two aspects are related and that the correlation with conference impact is stronger for country-wide touristic indexes, reaching a correlation value of more than 0.5 for average citations and more than 0.8 for total citations. Moreover, the almost linear correlation with the Tourist Service Infrastructure index attests to the specific importance of tourist/accommodation facilities in a given country.
Research limitations/implications: There are two main limitations of this work: (1) the use of citations to evaluate the attractiveness of the conferences and (2) the difficulty of formally defining the touristic attractiveness of a venue.
Practical implications: Starting from the results concerning the correlation between different touristicity indicators and the outcome of a conference in terms of citations, it would be possible to support conference organizers in their decisions. For instance, they could plan conference venues in advance using the same touristicity indicators, comparing different options and selecting cities with high scores. This would allow rapid planning of a conference venue that takes into account ease of travel and the attractiveness of the venue, hence increasing the potential outcomes of the conference.
Social implications: This study will enable municipalities and conference organizers to understand what can be improved in a specific venue to make it more attractive. This may include better transport connections or selecting cities that show high potential in terms of the touristicity index. The willingness of a researcher to submit a paper to a specific conference would be unaltered; rather, the results suggest that researchers already take these indicators into account in a mental process that precedes submission.
Originality/value: This is the first attempt to focus on the relationship between venue characteristics and conference papers. The results open up new possibilities, such as supporting conference organizers in their organization efforts.
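The abstract reports correlation values between touristic indexes and citation counts but does not state how they were computed. As an illustration only, the sketch below computes Pearson's correlation coefficient in plain Python; the choice of Pearson's r, the function name `pearson` and the sample values are assumptions, not data or code from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root of the sum of squared deviations for each series.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical per-country values, NOT taken from the paper:
touristic_index = [4.1, 5.3, 3.2, 6.0, 4.8]        # a country-wide touristic score
total_citations = [1200, 2100, 800, 2600, 1500]    # citations of conferences hosted there
print(round(pearson(touristic_index, total_citations), 3))
```

Since the abstract distinguishes average from total citations, either series could be passed as the second argument in this sketch.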

2023 - Knowledge extraction, management and long-term preservation of non-Latin cultural heritages - Digital Maktaba project presentation [Paper in conference proceedings]
Martoglia, Riccardo; Bergamaschi, Sonia; Ruozzi, Federico; Vanzini, Matteo; Sala, Luca; Vigliermo, Riccardo Amerigo

2022 - A Novel Real-Time Edge-Cloud Big Data Management and Analytics Framework for Smart Cities [Journal article]
Cavicchioli, Roberto; Martoglia, Riccardo; Verucchi, Micaela
abstract

Exposing city information to dynamic, distributed, powerful, scalable, and user-friendly big data systems is expected to enable a wide range of new opportunities; however, the size, heterogeneity and geographical dispersion of the data often make it difficult to combine, analyze and consume them in a single system. In the context of the H2020 CLASS project, we describe an innovative framework aimed at facilitating the design of advanced big data analytics workflows. The proposal covers the whole compute continuum, from edge to cloud, and relies on a well-organized distributed infrastructure exploiting: a) edge solutions with advanced computer vision technologies enabling the real-time generation of “rich” data from a vast array of sensor types; b) cloud data management techniques offering efficient storage, real-time querying and updating of the high-frequency incoming data at different granularity levels. We specifically focus on obstacle detection and tracking for edge processing, and consider a traffic density monitoring application with hierarchical data aggregation features for cloud processing; the discussed techniques will constitute the groundwork for many further services. The tests are performed on the real use case of the Modena Automotive Smart Area (MASA).
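The hierarchical aggregation mentioned in the abstract can be illustrated with a minimal sketch, assuming each observation is a `(timestamp_in_seconds, road_segment, vehicle_count)` tuple. The record layout and the function names `aggregate` and `roll_up` are hypothetical and do not describe the CLASS framework's actual implementation. The point shown is that a coarser level (hourly totals) can be derived from an already-aggregated finer level (per-minute totals) without rescanning raw data, provided the coarse granularity is a multiple of the fine one.

```python
from collections import defaultdict

def aggregate(records, granularity_s):
    """Sum vehicle counts per (road segment, bucket start) at the given granularity in seconds."""
    totals = defaultdict(int)
    for ts, segment, count in records:
        bucket_start = ts - ts % granularity_s  # align timestamp to its bucket boundary
        totals[(segment, bucket_start)] += count
    return dict(totals)

def roll_up(fine_totals, coarse_s):
    """Derive a coarser level from an already-aggregated finer level.

    Correct only if coarse_s is a multiple of the finer granularity, so that
    every fine bucket falls entirely inside one coarse bucket.
    """
    totals = defaultdict(int)
    for (segment, start), total in fine_totals.items():
        totals[(segment, start - start % coarse_s)] += total
    return dict(totals)

# Hypothetical observations: (seconds since midnight, segment id, vehicles counted)
records = [(0, "A", 3), (59, "A", 2), (60, "A", 4), (3600, "A", 1), (10, "B", 5)]
per_minute = aggregate(records, 60)
per_hour = roll_up(per_minute, 3600)
assert per_hour == aggregate(records, 3600)  # same result without rescanning raw data
```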

2022 - A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario [Journal article]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

Museums are embracing social technologies in an attempt to broaden their audience and engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts compete every day for visibility in terms of likes and shares, and very little research has focused on museum communication to identify best practices. In this article, we focus on Twitter and propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet is likely to be appreciated by Twitter users; (b) present simple suggestions that help enhance the message and increase its probability of success. Using a real-world dataset of around 40,000 tweets written by 23 world-famous museums, we show that our proposed method can identify the tweet features most likely to influence tweet success.

2022 - A tool for semiautomatic cataloguing of an Islamic digital library: a use case from the Digital Maktaba project (short paper) [Paper in conference proceedings]
Martoglia, R.; Sala, L.; Vanzini, M.; Vigliermo, R.

2022 - About Challenges in Data Analytics and Machine Learning for Social Good [Journal article]
Martoglia, Riccardo; Montangero, Manuela
abstract

The large number of new services and applications and, in general, all our everyday activities result in mass data production: all these data can become a golden source of information that might be used to improve our lives, wellness and working days. (Interpretable) Machine Learning approaches, whose use is increasingly ubiquitous in various settings, are definitely one of the most effective tools for retrieving and obtaining essential information from data. However, many challenges arise when trying to exploit them effectively. In this paper, we analyze key scenarios in which large amounts of data and machine learning techniques can be used for social good: social network analytics for enhancing cultural heritage dissemination; game analytics to foster Computational Thinking in education; medical analytics to improve the quality of life of the elderly and reduce health care expenses; exploration of the potential of work datafication in improving human resource management (HRM). For the first two scenarios, we present new results related to previously published research, framing them within a more general discussion of the challenges that arise when adopting machine learning techniques for social good.

2022 - Invited Speech: Data Analytics and (Interpretable) Machine Learning for Social Good [Paper in conference proceedings]
Martoglia, R.
abstract

In recent years, we have seen a real explosion of data in all contexts of our lives. From a research standpoint, data processing needs have become common in an ever-growing number of applications, with potential benefits not only for our work but also for our lives: the need is not just to acquire, store and perform modest operational tasks, but also to analyze and properly interpret data. In this talk, we consider some of the hottest and most demanding scenarios in our daily lives, which include: medical analytics to improve the quality of life of the elderly and reduce health care expenses; social network analytics for enhancing cultural heritage dissemination; exploration of the potential of work datafication in improving human resource management (HRM); game analytics to foster Computational Thinking in education. We describe the recent findings we have obtained in our research in these contexts using the latest technology for data analytics, including interpretable machine learning, and discuss the consequences and directions for the future.

2022 - Let the Games Speak by Themselves: Towards Game Features Discovery Through Data-Driven Analysis and Explainable AI [Paper in conference proceedings]
Martoglia, R.; Pontiroli, M.
abstract

The idea behind this work is to start exploring the application of data analytics and (explainable) machine learning techniques to better understand games and discover new features that may help exploit them effectively in different socially useful domains. We demonstrate the feasibility of the idea by: (i) collecting a large dataset of board game information; (ii) designing and testing an information processing pipeline for automatically discovering game categories and game mechanics, with some encouraging first results. In the future, we plan to further generalize this approach to different kinds of games and to discovering currently unknown but useful aspects, e.g. games or game features that could better foster Computational Thinking in education, those better suited to social distancing contexts, and so on.

2022 - Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach [Journal article]
Bergamaschi, Sonia; De Nardis, Stefania; Martoglia, Riccardo; Ruozzi, Federico; Sala, Luca; Vanzini, Matteo; Vigliermo, Riccardo Amerigo
abstract

The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating an urgent need for systems and procedures for managing and sharing cultural heritage in both supranational and multi-literate contexts. To achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques to provide both highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state-of-the-art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view.

2022 - On Designing a Time Sensitive Interaction Graph to Identify Twitter Opinion Leaders [Paper in conference proceedings]
Furini, M.; Mariotti, L.; Martoglia, R.; Montangero, M.
abstract

What happened on social media during the recent pandemic? Who were the opinion leaders of the conversations? Who influenced whom? Were they medical doctors, ordinary people, scientific experts? Did health institutions play an important role in informing and updating citizens? Identifying opinion leaders within social platforms is of particular importance; in this paper, we introduce the idea of a time sensitive interaction graph to identify opinion leaders within Twitter conversations. To evaluate our proposal, we considered all Italian-language tweets related to COVID-19 posted on Twitter in the period 2020-21. After mapping these tweets into the graph, we applied the PageRank algorithm to extract the opinion leaders of these conversations. Results show that our approach is effective in identifying opinion leaders and could therefore be used to monitor the role that specific accounts (e.g., health authorities, politicians, city administrators) play within specific conversations.
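A possible reading of this approach in code: the sketch below builds a weighted interaction graph in which an edge from `u` to `v` records that `u` replied to, retweeted or mentioned `v` (so it passes endorsement to `v`), weights each interaction by an exponential time decay, and runs a weighted PageRank by power iteration. The decay rule, the `half_life` parameter and all function names are assumptions made for illustration; the abstract does not specify how the paper's graph is made time sensitive.

```python
from collections import defaultdict

def build_graph(interactions, now, half_life):
    """Build weighted edges src -> dst from (src, dst, timestamp) interactions.

    ASSUMPTION: time sensitivity is modelled as exponential decay, so an
    interaction that is `half_life` time units old counts half as much as one at `now`.
    """
    nodes, weights = set(), defaultdict(float)
    for src, dst, ts in interactions:
        nodes.update((src, dst))
        weights[(src, dst)] += 0.5 ** ((now - ts) / half_life)
    return nodes, dict(weights)

def pagerank(nodes, weights, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration; dangling nodes spread their rank uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _dst), w in weights.items():
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            # Each node splits its rank among out-neighbours in proportion to edge weight.
            new[dst] += damping * rank[src] * w / out_weight[src]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical interactions: (who interacted, with whom, time)
interactions = [("B", "A", 100), ("C", "A", 90), ("D", "A", 50), ("A", "B", 10), ("C", "B", 20)]
nodes, weights = build_graph(interactions, now=100, half_life=30)
ranks = pagerank(nodes, weights)
leaders = sorted(ranks, key=ranks.get, reverse=True)  # highest rank = candidate opinion leader
print(leaders)
```

Exponential decay is only one way to favour recent interactions; a sliding time window that simply drops old edges would be an equally plausible design.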

2022 - Towards Multi-Model Big Data Road Traffic Forecast at Different Time Aggregations and Forecast Horizons [Journal article]
Martoglia, Riccardo; Savoia, Gabriele

2022 - Visual Exploratory Data Analysis for Copy Number Variation Studies in Biomedical Research [Journal article]
Vischioni, C.; Bove, F.; Mandreoli, F.; Martoglia, R.; Pisi, V.; Taccioli, C.
abstract

The study of Copy Number Variations (CNVs) has recently emerged as a hot topic in biomedical cancer research. While different data sources, websites, and tools concerning genomic CNVs have been made publicly available, CNV data is still a largely unexplored source of biological information, due to the limitations of currently available analysis tools. In this respect, we propose a novel platform, named VarNuCopy, that overcomes such limitations by pursuing the core principles of Exploratory Data Analysis (EDA) in the context of CNV data. The platform has been made publicly available as a web application and is, to the best of our knowledge, the first tool enabling visual, interactive exploration and analysis of the CNV landscape of multiple species. Through novel client- and server-side optimizations inspired by scalable data science, VarNuCopy implements a comprehensive and efficient data exploration solution that empowers researchers to easily recognize complex trends and patterns within a huge amount of CNV data, and to identify new target genes that might function as tumor suppressors and oncogenes.

2022 - Work Datafication and Digital Work Behavior Analysis as a Source of HRM Insights [Book chapter/Essay]
Fabbri, T.; Scapolan, A.; Bertolotti, F.; Mandreoli, F.; Martoglia, R.
abstract

The digital transformation of organizations is boosting workplace networking and collaboration while making it “observable” with unprecedented timeliness and detail. However, the informational and managerial potential of work datafication is still largely unutilized in Human Resource Management (HRM), and its benefits, both at the individual and the organizational level, remain largely unexplored. Our research focuses on the relationship between digitally tracked work behaviors and employee attitudes and, in so doing, explores work datafication as a source of data-driven HRM policies and practices. As a chapter of a wider research program, this paper presents some data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data, in search of promising correlations between behavioral and relational (digital) work patterns and employee attitudes. To this end, the digital actions performed by 106 employees in one year are transformed into a graph representation in order to analyze data from two different points of view: the individual (behavioral) perspective, according to the user who performed the action and the action performed, and the social (relational) perspective, making explicit the interactions between users and the objects of their actions. Different employee rankings are thus derived and correlated with their attitudes. Finally, we discuss the obtained results and their implications in terms of People Analytics and data-driven HRM.
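To make the two perspectives concrete, the sketch below derives a behavioral score (actions per employee) and a relational score (distinct colleagues who acted on the same objects, a simple projection of the user-object graph) from a hypothetical `(user, action, object)` log, then measures rank agreement with attitude scores using Spearman's coefficient. The log format, both scoring rules and the use of Spearman's rho are assumptions for illustration, not the measures used in the chapter.

```python
from collections import defaultdict

def behavioral_score(actions):
    """Individual perspective: number of actions each user performed."""
    counts = defaultdict(int)
    for user, _action, _obj in actions:
        counts[user] += 1
    return dict(counts)

def relational_score(actions):
    """Social perspective: number of distinct colleagues who acted on the same objects."""
    users_by_obj = defaultdict(set)
    for user, _action, obj in actions:
        users_by_obj[obj].add(user)
    peers = defaultdict(set)
    for users in users_by_obj.values():
        for u in users:
            peers[u] |= users - {u}
    return {u: len(p) for u, p in peers.items()}

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical log and attitude scores, NOT data from the study:
log = [("ann", "post", "doc1"), ("bob", "comment", "doc1"),
       ("ann", "like", "doc2"), ("cat", "post", "doc3"), ("bob", "edit", "doc3")]
attitude = {"ann": 4.0, "bob": 3.5, "cat": 2.0}
users = sorted(attitude)
rel = relational_score(log)
print(spearman([rel[u] for u in users], [attitude[u] for u in users]))
```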

2022 - miRNAs Copy Number Variations Repertoire as Hallmark Indicator of Cancer Species Predisposition [Journal article]
Vischioni, Chiara; Bove, Fabio; De Chiara, Matteo; Mandreoli, Federica; Martoglia, Riccardo; Pisi, Valentino; Liti, Gianni; Taccioli, Cristian
abstract

Aging is one of the hallmarks of multiple human diseases, including cancer. We hypothesized that variations in the number of copies (CNVs) of specific genes may protect some long-living organisms theoretically more susceptible to tumorigenesis from the onset of cancer. Based on the statistical comparison of gene copy numbers within the genomes of both cancer-prone and -resistant species, we identified novel gene targets linked to tumor predisposition, such as CD52, SAT1 and SUMO. Moreover, considering their genome-wide copy number landscape, we discovered that microRNAs (miRNAs) are among the most significant gene families enriched for cancer progression and predisposition. Through bioinformatics analyses, we identified several alterations in miRNAs copy number patterns, involving miR-221, miR-222, miR-21, miR-372, miR-30b, miR-30d and miR-31, among others. Therefore, our analyses provide the first evidence that an altered miRNAs copy number signature can statistically discriminate species more susceptible to cancer from those that are tumor resistant, paving the way for further investigations.

2021 - Preserving and conserving culture: First steps towards a knowledge extractor and cataloguer for multilingual and multi-alphabetic heritages [Paper in conference proceedings]
Bergamaschi, S.; Martoglia, R.; Ruozzi, F.; Vigliermo, R. A.; De Nardis, S.; Sala, L.; Vanzini, M.
abstract

Managing and sharing cultural heritage, including in supranational and multi-literate contexts, is a very hot research topic. In this paper we discuss the research we are conducting in the DigitalMaktaba project, presenting the first steps towards designing an innovative workflow and tool for the automatic extraction of knowledge from documents written in multiple non-Latin languages (Arabic, Persian and Azerbaijani). The tool leverages different OCR and text processing techniques and linguistic corpora to provide both highly accurate extracted text and rich metadata content, overcoming typical limitations of current state-of-the-art systems; in the near future, this will enable the development of an automatic cataloguer which we hope will ultimately help in better preserving and conserving culture in such a demanding scenario.

2021 - Unleashing the power of querying streaming data in a temporal database world: A relational algebra approach [Journal article]
Grandi, F.; Mandreoli, F.; Martoglia, R.; Penzo, W.
abstract

Modern data-intensive applications have to manage huge quantities of streaming/relational data and need advanced query capabilities involving combinations of continuous queries (CQs) and one-time queries (OTQs), which may also require the verification of complex temporal conditions. In this paper, we go beyond the fragmented landscape of current approaches and adopt a new holistic approach to integrating stream processing capabilities into the temporal database world, based on the streaming table concept. To this end, we propose a full-fledged query interface composed of a TSQL2-like query language with an underlying algebraic framework. The algebraic framework, which is aimed at implementing the query interface on top of a working DBMS, is made up of: (a) the extended temporal algebra TA⋆, supporting OTQs with a hybrid temporal semantics (sequenced and non-sequenced); (b) the continuous temporal algebra CTA, which extends TA⋆ with window expressions for CQ specification; (c) the translation of CTA expressions into TA⋆ ones that can be executed by a traditional DBMS with an extended kernel.
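The general idea of evaluating a windowed continuous query by repeatedly running an ordinary one-time query can be sketched as follows. This is a plain-Python illustration with assumed names (`window_snapshot`, `one_time_query`, `continuous_query`) and an assumed stream layout of `(timestamp, (key, value))` pairs; it is not the TA⋆/CTA algebra or its translation rules, which the abstract does not detail.

```python
from collections import Counter

def window_snapshot(stream, t, range_s):
    """Time-based sliding window: tuples whose timestamp lies in (t - range_s, t]."""
    return [row for ts, row in stream if t - range_s < ts <= t]

def one_time_query(rows):
    """An ordinary one-time query over a finite relation: count tuples per key."""
    return Counter(key for key, _value in rows)

def continuous_query(stream, eval_times, range_s):
    """Emulate a continuous query by re-running the one-time query on each window snapshot."""
    return {t: one_time_query(window_snapshot(stream, t, range_s)) for t in eval_times}

# Hypothetical stream of (timestamp, (key, value)) tuples
stream = [(1, ("a", 10)), (2, ("b", 7)), (4, ("a", 3))]
for t, result in continuous_query(stream, eval_times=[2, 4], range_s=2).items():
    print(t, dict(result))
```

Re-evaluating from scratch at every instant keeps the sketch simple; a real engine would typically maintain window contents incrementally.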

2021 - Web2Touch 2021, Semantic Technologies for Smart Information Sharing and Web Collaboration [Paper in conference proceedings]
Bonacin, R.; Fugini, M.; Martoglia, R.; Nabuco, O.; Sais, F.
abstract

This foreword introduces a summary of themes and papers of the Web2Touch (W2T) 2021 Track at the 30th IEEE WETICE Conference, held as a virtual conference in October 2021. W2T 2021 includes four full papers. They all address relevant issues in the fields of collaborative web, semantic technologies, ontologies, knowledge engineering, linked data and the Internet of Things, applied to themes of high impact on society, such as education, social inclusion and health. These papers explore affordable technologies to promote and valorize rural areas, develop ontologies supporting simulation-based training in medicine, use semantic technologies in a framework promoting the reuse and interoperability of Electronic Health Records, and use these technologies to provide recommendations in an Internet of Things device migration scenario.

2020 - Agilechains: Agile supply chains through smart digital twins [Paper in conference proceedings]
Pernici, B.; Plebani, P.; Mecella, M.; Leotta, F.; Mandreoli, F.; Martoglia, R.; Cabri, G.
abstract

Currently, the production and logistics performance of a single organization depends only partially on its internal resources; more and more often, it also depends on the interactions that happen across the so-called supply chain, that is, the interactions between the organization and its customers and suppliers. In particular, production and logistics coordination between actors in the supply chain is often a difficult activity that draws significant resources. Moreover, such coordination requires continuous revisions and updates. In Industry 4.0, the digital twin paradigm is currently adopted to represent, simulate and test the behavior of one or more machines and production plants belonging to an organization. This paper introduces the AgileChains paradigm, extending the digital twin paradigm to supply chains and the dynamics of their participants. This extension also positively affects the reactivity and resilience of internal processes in case the supply chain has to be reconfigured. We propose a novel conceptual framework that combines Service Oriented Architectures (SOA) with Cyber-Physical Systems (CPS), in order to create service oriented systems suited for exchanging data in a dynamic and adaptive way. In addition, we propose a novel data management mechanism capable of finding the right balance between the internal needs of each organization when handling its data and the need to securely and efficiently export data in the supply chain (cf. smart data movement). Finally, we plan to define governance tools to model and manage the supply chain that treat agility as a first-class citizen. These tools will allow users to dynamically and predictively change the involved actors, as well as the nature of the exchanged data and the data exchange policies, focusing in particular on adverse, risk-prone events, so as to minimize risk and optimize supply chain performance in terms of both efficiency and effectiveness.

2020 - An Intelligent Dashboard for Assisted Tweet Composition in the Cultural Heritage Area (Work-in-progress) [Paper in conference proceedings]
Martoglia, R.; Montangero, M.
abstract

Cultural Heritage institutions are nowadays using social media to communicate with citizens and tourists. However, achieving truly effective communication is not an easy task, as millions of messages are posted on social media every day; thus, getting visibility is not trivial. In this paper we present the architecture of a dashboard, accessible from Android mobile devices, that supports museum social media managers in composing effective tweets by providing suggestions to improve message drafts. To this end, the application exploits machine learning techniques over data related to tweets posted by museums in the past.

2020 - Data-driven vs knowledge-driven inference of health outcomes in the ageing population: A case study [Paper in conference proceedings]
Ferrari, D.; Guaraldi, G.; Mandreoli, F.; Martoglia, R.; Milić, Jovana; Missier, Paolo
abstract

Preventive, Predictive, Personalised and Participative (P4) medicine has the potential not only to vastly improve people's quality of life, but also to significantly reduce healthcare costs and improve its efficiency. Our research focuses on age-related diseases and explores the opportunities offered by a data-driven approach to predict wellness states of ageing individuals, in contrast to the commonly adopted knowledge-driven approach that relies on easy-to-interpret metrics manually introduced by clinical experts. This is done by means of machine learning models applied to the My Smart Age with HIV (MySAwH) dataset, which is collected through an approach that is relatively new, especially for older HIV patient cohorts. The dataset includes Patient Related Outcomes values from mobile smartphone apps and activity traces from commercial-grade activity loggers. Our results show better predictive performance for the data-driven approach. We also show that a post hoc interpretation method applied to the predictive models can provide intelligible explanations that enable new forms of personalised and preventive medicine.

2020 - InstaCircos: A Web Application for Fast and Interactive Circular Visualization of Large Genomic Data (Work in Progress) [Paper in conference proceedings]
Ghidoni, G.; Martoglia, R.; Taccioli, C.; Vischioni, C.
abstract

One of the most effective visualizations for genomics data is the circular one, supported by popular packages and visualization suites. Many tools are available; however, most of them share a number of drawbacks, including limited ease of installation/usage, slow performance and memory limitations (making them unfeasible for very large genomes such as the human one), and lack of interactivity. In this paper we present ongoing work on InstaCircos, a web application born from the scientific collaboration between Big Data Analytics and Bioinformatics researchers and aimed at overcoming the limitations of the available tools. It provides advanced visualization features through an easy-to-use web interface and offers interactive functionalities and near real-time performance thanks to an integrated big data management back-end based on MongoDB.

2020 - VarCopy: A Visual Exploratory Data Analysis Platform for Copy Number Variation Studies [Paper in conference proceedings]
Bove, F.; Mandreoli, F.; Martoglia, R.; Pisi, V.; Taccioli, C.; Vischioni, C.
abstract

The study of such a complex phenomenon as cancer, which depends on several still unexplored and unclear factors, needs new ways to visualize, analyze and combine different data on both species characteristics and gene function. In this respect, we propose a novel platform, named VarCopy, supporting visual Exploratory Data Analysis (EDA) in the context of Copy Number Variation (CNV) data. The platform will soon be publicly available as a web application and is, to the best of our knowledge, the first tool allowing visual, interactive exploration and analysis of the CNV landscape of multiple species, enabling the identification of new target genes that might be useful for biomedical research.

2020 - Web2Touch 2020-21: Semantic Technologies for Smart Information Sharing and Web Collaboration [Paper in conference proceedings]
Bonacin, R.; Fugini, M.; Martoglia, R.; Nabuco, O.; Sais, F.
abstract

This foreword introduces a summary of themes and papers of the Web2Touch (W2T) 2020-21 Track at the 29th IEEE WETICE Conference, held as a virtual conference in October 2020. W2T 2020-21 includes six full papers and four short papers. They all address relevant issues in the field of information sharing for collaboration, including big data analytics, knowledge engineering, linked open data, applications of smart Web technologies, and smart care. The papers address a portfolio of hot issues in research and applications of semantics and smart technologies (e.g., IoT, sensors, devices for tele-monitoring, and smart content management), with crucial topics such as big data analysis, knowledge representation and smart enterprise management, among others. This track shows how cooperative technologies based on knowledge representation, intelligent tools, and enhanced Web engineering can enhance collaborative work through smart service design and delivery, thereby contributing to a radical change in the role of the semantic Web and its applications.

2020 - Work datafication and digital work behavior analysis as a source of social good [Paper in conference proceedings]
Bertolotti, F.; Fabbri, T.; Mandreoli, F.; Martoglia, R.; Scapolan, A. C.
abstract

The digital transformation of organizations is boosting workplace networking and collaboration while making it 'observable' with unprecedented timeliness and detail. However, the informational and managerial potential of work datafication is still largely unutilized in Human Resource Management (HRM), and its social benefits, both at the individual and the organizational level, remain largely unexplored. Our research focuses on the relationship between digitally tracked work behaviors and employee attitudes and, in so doing, explores work datafication as a source of social good. As part of a wider research program, this paper presents some data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data, in search of promising correlations between behavioral and relational (digital) work patterns and employee attitudes. To this end, we transformed the digital actions performed by 106 employees over a one-year period into a graph representation to analyze data from two different points of view: the individual (behavioral) perspective, according to the user who performed the action and the action undertaken, and the social (relational) perspective, making explicit the interactions between users and the objects of their actions. Different employee rankings are thus derived and correlated with their attitudes. We discuss the obtained results and their benefits in terms of prospective social good for both the company and the employee.

2019 - Employee attitudes and (Digital) collaboration data: a preliminary analysis in the HRM field [Paper in conference proceedings]
Fabbri, T.; Mandreoli, F.; Martoglia, R.; Scapolan, A. C.
abstract

The digital transformation of organizations is making workplace collaboration more and more powerful and work always "observable"; however, the informational and managerial potential of the generated data is still largely unutilized in Human Resource Management (HRM). Our research, conducted in collaboration with business engineers and economists, aims at exploring the relationship between digital work behaviors and employee attitudes. This paper is a work-in-progress contribution that presents a preliminary phase of data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data. In the exploratory data analysis step, we analyze the data in their original table format and process them according to the user who performed the action and the action performed. Then, we move to a graph representation in order to make explicit the interaction between users and the objects of their actions. Finally, we introduce the concept of employee-attitude-oriented pattern as a means of deriving significant views over the overall graph and discuss Social Network Analysis (SNA) approaches that can be exploited for our purposes.

2019 - Fitness tracking wearable devices and a dedicated smart phone app (MySAwH App) to predict quality of life in PLWH: a multi-centre prospective study [Abstract in Atti di Convegno]
Guaraldi, G; Orsini, M; Caselgrandi, A; Malagoli, A; D'Imprima, F; Milic, J; Ghinelli, F; Martoglia, R; Mandreoli, F; Ferrari, D; Liu, G; Bloch, M
abstract

2019 - Towards Patient-Centric Healthcare: Multi-Version Ontology-Based Personalization of Clinical Guidelines [Capitolo/Saggio]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo
abstract

Retrieving personalized care plans from a guideline repository is an ever-increasing need in the medical world, not only for physicians but also for empowered patients. In this chapter, we continue our long-standing research on ontology-based personalized access to very large collections of multi-version documents by addressing a novel challenge: dealing not only with multi-version clinical guidelines but also with a multi-version ontology used to support personalized access to them. Efficiency is ensured by a newly introduced annotation scheme for guidelines and by solutions to cope with the evolution of the ontology structure. The tests performed on a prototype implementation confirm the validity of the approach. Finally, the chapter provides an exhaustive analysis of the state of the art in this field and, in its final part, a discussion in which we extend our vision to related research themes and possible further developments of our work.

2019 - Web2Touch 2019: Semantic Technologies for Smart Information Sharing and Web Collaboration [Relazione in Atti di Convegno]
Bonacin, R.; Fugini, M.; Martoglia, R.; Nabuco, O.; Sais, F.
abstract

This foreword summarizes the themes and papers of the Web2Touch (W2T) 2019 Track at the 28th IEEE WETICE Conference, held in Capri in June 2019. W2T 2019 includes ten full papers and one short paper. They all address relevant issues in the field of information sharing for collaboration, including big data analytics, knowledge engineering, linked open data, applications of smart Web technologies, and smart care. The papers form a portfolio of hot issues in research on and applications of semantics and smart technologies (e.g., IoT, sensors, devices for tele-monitoring, and smart content management), together with crucial topics such as big data analysis, knowledge representation, and smart enterprise management, among others. The track shows how cooperative technologies based on knowledge representation, intelligent tools, and enhanced Web engineering can enhance collaborative work through smart service design and delivery, thereby contributing to radically changing the role of the semantic Web and its applications.

2018 - 5 steps to make art museums tweet influentially [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

A growing number of museums have started using social networks as forms of engagement that can act beyond the museum's architectural bounds. Specifically, museum leaders are praising Twitter as a necessary tool for any online programming or presence in museums today. Nevertheless, using Twitter effectively so as to increase a museum's influence is not an easy task, and there has been a gap between its actual usage and the possibilities it offers. In this paper, we propose an easily understandable framework to analyze the key content factors in museum conversations, including novel formulas for evaluating the influence of tweets and Twitter accounts. We apply the framework to a dataset of 100,000 messages related to 26 museum accounts to understand which museums write the most influential tweets and which features have the greatest impact on the influence of a tweet. Finally, we propose 5 key steps that museums can follow in order to write more influential tweets.

2018 - A User-Aware and Semantic Approach for Enterprise Search [Articolo su rivista]
Cabri, Giacomo; Martoglia, Riccardo
abstract

This article describes how, in addition to general-purpose search engines, specialized search engines have appeared and gained their share of the market. An enterprise search engine enables searching within enterprise information, mainly web pages but also other kinds of documents; the search is performed by people inside the enterprise or by customers. This article proposes an enterprise search engine called AMBIT-SE that relies on two enhancements: first, it is user-aware, in the sense that it takes into consideration the profile of the users who perform the query; second, it exploits semantic techniques to consider not only exact matches but also synonyms and related terms. It performs two main activities: (1) information processing, to analyse the documents and build the user profile, and (2) search and retrieval, to find information that matches the user's query and profile. An experimental evaluation of the proposed approach on different real websites shows its benefits over other well-established approaches.
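
To make the two enhancements concrete, the Python sketch below scores documents by exact query-term matches, falls back to synonym matches at a lower weight, and adds a small boost for overlap with the user's profile. This is a hypothetical toy scoring rule under stated assumptions: the weights `w_exact`, `w_syn`, `w_profile`, the synonym table and the documents are all invented for illustration and are not AMBIT-SE's actual ranking function.

```python
from typing import Dict, List, Set


def score(doc_terms: List[str], query_terms: List[str],
          synonyms: Dict[str, Set[str]], profile_terms: Set[str],
          w_exact: float = 1.0, w_syn: float = 0.5, w_profile: float = 0.2) -> float:
    """Hypothetical relevance score: exact matches, then synonym matches, plus a profile boost."""
    doc = set(doc_terms)
    total = 0.0
    for term in query_terms:
        if term in doc:
            total += w_exact  # exact (syntactic) match
        elif synonyms.get(term, set()) & doc:
            total += w_syn  # semantic match through a synonym or related term
    # User-aware component: reward documents that overlap with the user's profile.
    total += w_profile * len(doc & profile_terms)
    return total


def rank(docs: Dict[str, List[str]], query_terms: List[str],
         synonyms: Dict[str, Set[str]], profile_terms: Set[str]) -> List[str]:
    """Return document ids sorted by decreasing score."""
    return sorted(docs, key=lambda d: score(docs[d], query_terms, synonyms, profile_terms),
                  reverse=True)


docs = {
    "d1": ["car", "rental", "price"],
    "d2": ["automobile", "rental"],
    "d3": ["bike", "repair"],
}
synonyms = {"car": {"automobile", "auto"}}
print(rank(docs, ["car", "rental"], synonyms, profile_terms={"price"}))
# prints ['d1', 'd2', 'd3']
```

Here `d2` is retrieved only because "automobile" is a synonym of "car", while `d1` is ranked first thanks to both exact matches and the profile term "price".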

2018 - SocialGQ: Towards semantically approximated and user-Aware querying of social-graph data [Relazione in Atti di Convegno]
Martoglia, Riccardo
abstract

The proliferation of social and collaborative sites makes users increasingly active in the generation of social-graph data; however, such a sea of data often hinders them from finding the information they need. In this paper, we present SocialGQ ("Social-Graph Querying"), a novel approach for the effective and efficient querying of social-graph data that overcomes the limitations of typical search approaches proposed in the literature. SocialGQ allows users to compose complex queries in a simple way, and is able to retrieve useful knowledge (top-k answers) by jointly exploiting: (a) the structure of the graph, semantically approximating the user's requests with meaningful answers; (b) the unstructured textual resources of the graph; (c) its social and user-aware dimension. An experimental evaluation comparing SocialGQ to leading approaches shows strong gains in a real social-graph data scenario.

2018 - Standards, Security and Business Models: Key Challenges for the IoT Scenario [Articolo su rivista]
Bujari, Armir; Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela; Ronzani, Daniele
abstract

The number of physical objects connected to the Internet is constantly growing, and it is commonly held that the IoT scenario will change the way we live and work. Since IoT technologies have the potential to be pervasive in almost every aspect of human life, in this paper we analyze the IoT scenario in depth. First, we describe IoT in simple terms and then we investigate what current technologies can achieve. Our analysis identifies four major issues that may limit the use of IoT (i.e., interoperability, security, privacy, and business models) and highlights possible solutions to these problems. Finally, we provide a simulation analysis that emphasizes these issues and suggests practical research directions.

2018 - Towards tweet content suggestions for museum media managers [Relazione in Atti di Convegno]
Furini, Marco; Martoglia, Riccardo; Mandreoli, Federica; Montangero, Manuela
abstract

Cultural Heritage institutions are embracing social technologies in an attempt to communicate effectively with citizens. Although it seems easy to reach millions of people with a simple message posted on social media platforms, media managers know that practice differs from theory. Millions of posts compete every day for visibility in terms of likes and retweets, and the way text, images, hashtags and links are combined is critical for the visibility of a post. In this paper, we propose to exploit machine learning techniques to predict whether a tweet is likely to be appreciated by Twitter users. Through an experimental assessment, we show that it is possible to provide insights into the tweet features that are likely to influence its reception/recommendation among readers. Preliminary tests, performed on a real-world dataset of 19,527 museum tweets, show promising accuracy results.

2018 - Web2Touch 2018: Semantic technologies in smart information sharing and web collaboration [Relazione in Atti di Convegno]
Bonacin, Rodrigo; Fugini, Mariagrazia; Martoglia, Riccardo; Nabuco, Olga
abstract

We present Web2Touch 2018, one of the Tracks at the 27th IEEE WETICE Conference. Web2Touch 2018 includes five full papers and one short paper tackling very up-to-date issues in information sharing and collaboration, including, among others, big data analytics, the development of virtual agents and assistants, and privacy and security analysis and evaluation. Papers come from areas such as knowledge engineering, linked data, big data, security, safety, and web science. The overall focus is on how research on semantics, coupled with crucial topics such as big data analysis, privacy, knowledge representation, and enterprise content management, among others, can improve services and collaboration and push forward new ways of interpreting the role of the web.

2017 - A relational algebra for streaming tables living in a temporal database world [Relazione in Atti di Convegno]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

The recently introduced streaming table concept, a fully native representation of streaming data inside a DBMS, has provided modern data-intensive applications with one-time query (OTQ) and continuous query (CQ) capabilities on both streaming and standard relational tables. In this paper, we fully acknowledge the temporal nature of streaming tables and propose to go one step further by integrating them in a temporal DBMS context, where time management is native. Our aim is to break down the traditional barrier between the streaming and the temporal worlds, offering complete interoperability between streams and temporal data. To this end, we present a continuous temporal algebra that seamlessly supports both OTQs and CQs on streaming, standard and temporal relational tables. We further show how the transition from continuous to one-time semantics can be managed by defining suitable translation rules, which can also serve as a basis for implementing the proposed continuous algebra in a temporal DBMS.

2017 - From Data Integration to Big Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca
abstract

The research activities of the Database Group (DBGroup, www.dbgroup.unimore.it) and the Information System Group (ISGroup, www.isgroup.unimore.it) have been mainly devoted to the Data Integration Research Area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise, DataRiver (www.datariver.it), which distributes MOMIS as open source. MOMIS provides integrated access to structured and semistructured data sources and allows a user to pose a single query and to receive a single unified answer. Description Logics, automatic annotation of schemata, and clustering techniques constitute its theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are Peer Data Management Systems and their evolution toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss, as well as solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration (automatic annotation, mapping discovery, global query processing, provenance, multidimensional information integration, keyword search) have been addressed within European and national projects. With the new requirements of integrating open linked data, textual and multimedia data in a big data scenario, research has turned to the Big Data Integration Research Area. In particular, the most relevant research results achieved are: a scalable entity resolution method, a scalable join operator, and a tool, LODEX, for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DATARIVER, Data Integration was successfully applied to smart e-health.

2017 - IoT: Science Fiction or real revolution? [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

It's been many years since the media began talking about the wonders of the IoT scenario, where a smart fridge checks the milk expiration date and automatically compiles the shopping list, but in real life how many people have such a smart fridge in their kitchen? Yet interest in the IoT scenario is growing every day, so in this paper we try to figure out whether IoT is science fiction or a real revolution. In particular, we describe the IoT scenario in simple terms, what can be done with current technologies, and what the main obstacles are that limit the success and widespread use of IoT, and we highlight directions that can make IoT a true reality.

2017 - Multi-version ontology-based personalization of clinical guidelines for patient-centric healthcare [Articolo su rivista]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo
abstract

When dealing with a specific patient case, physicians are often interested in retrieving a personalized version of a clinical guideline, that is, a version tailored to their needs. In a patient-centric scenario, empowered patients make up another class of users interested in retrieving personalized care plans from a guideline repository. In their previous work, the authors proposed techniques to efficiently provide ontology-based personalized access to very large collections of multi-version clinical guidelines. In this paper, they address the problem of also dealing with a multi-version ontology used to support personalized access to clinical guidelines. The authors' approach allows the semantic indexing of guideline contents with respect to multi-version ontology classes and exploits the IS-A relationship among such classes to grant personalized access. Efficiency is ensured by a newly introduced annotation scheme for guidelines and by solutions to cope with the evolution of the ontology structure. The tests performed on a prototype implementation confirm the validity of the approach.

2017 - Streaming Tables: Native Support to Streaming Data in DBMSs [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

Data stream management systems (DSMSs) are conceived for running continuous queries (CQs) on the most recently streamed data. This model does not completely fit the needs of several modern data-intensive applications that must manage recent, historical and static data and execute both CQs and one-time queries (OTQs) joining such data. To cope with these new needs, some DSMSs have moved toward integrating database management system (DBMS) functionalities to augment their capabilities. In this paper we adopt the opposite perspective and lay the groundwork for extending DBMSs to natively support streaming facilities. To this end, we introduce a new kind of table, the streaming table, a persistent structure where streaming data enters and remains stored for a long period, ideally forever. Streaming tables feature a novel access paradigm: continuous writes and both one-time and continuous reads. We present a streaming table implementation and two novel types of indices that efficiently support high rates of both updates and scans. A detailed experimental evaluation shows the effectiveness of the proposed technology.
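
The streaming table itself is a DBMS-level structure with dedicated indices. The Python sketch below only illustrates the access paradigm named in the abstract (continuous writes, one-time reads, continuous reads) using an in-memory, timestamp-ordered list and binary search. The class `StreamingTable` and its methods are hypothetical and do not reproduce the paper's implementation or its indices.

```python
import bisect
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


class StreamingTable:
    """Hypothetical in-memory sketch: an append-only table ordered by timestamp."""

    def __init__(self) -> None:
        self._ts: List[int] = []
        self._rows: List[Row] = []
        self._cqs: List[Tuple[Callable[[Row], bool], Callable[[int, Row], None]]] = []

    def append(self, ts: int, row: Row) -> None:
        """Continuous write: rows arrive in non-decreasing timestamp order and are kept."""
        if self._ts and ts < self._ts[-1]:
            raise ValueError("out-of-order timestamp")
        self._ts.append(ts)
        self._rows.append(row)
        # Continuous read: push the new row to every registered CQ whose filter matches.
        for predicate, callback in self._cqs:
            if predicate(row):
                callback(ts, row)

    def one_time_query(self, t_from: int, t_to: int) -> List[Tuple[int, Row]]:
        """One-time read: binary search on the sorted timestamps, then a range scan."""
        lo = bisect.bisect_left(self._ts, t_from)
        hi = bisect.bisect_right(self._ts, t_to)
        return list(zip(self._ts[lo:hi], self._rows[lo:hi]))

    def continuous_query(self, predicate: Callable[[Row], bool],
                         callback: Callable[[int, Row], None]) -> None:
        """Register a CQ; it only sees rows written after registration."""
        self._cqs.append((predicate, callback))


table = StreamingTable()
alerts: List[int] = []
table.continuous_query(lambda r: r["speed"] > 100, lambda ts, r: alerts.append(ts))
for ts, speed in [(1, 80), (2, 120), (3, 90), (4, 130)]:
    table.append(ts, {"speed": speed})
print(alerts)                                         # prints [2, 4]
print([ts for ts, _ in table.one_time_query(2, 3)])  # prints [2, 3]
```

Because the data stays stored after being streamed, the same rows serve both the push-based continuous query and later historical range queries, which is the combination that a pure DSMS window model does not cover.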

2017 - The Use of Hashtags in the Promotion of Art Exhibitions [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

Hashtags are increasingly used to promote, foster and group conversations around specific topics. For example, the entertainment industry widely uses hashtags to increase interest in its products. In this paper, we analyze whether hashtags are effective in a niche scenario such as art exhibitions. The results show very different behaviors and confused strategies: from museums that do not consider hashtags at all, to museums that create official hashtags but hardly mention them; from museums that create multiple hashtags for the same exhibition, to those that are very confused about hashtag usage. Furthermore, we discovered an interesting case where a smart use of hashtags stimulated interest in art. Finally, we highlight a few practical guidelines, with behaviors to follow and to avoid, that might help promote art exhibitions.

2017 - Web2Touch 2017: Semantic technologies in smart information sharing and web collaboration [Relazione in Atti di Convegno]
Nabuco, Olga; Bonacin, Rodrigo; Martoglia, Riccardo
abstract

This report presents Web2Touch 2017, a Track at the 26th IEEE WETICE Conference. This year Web2Touch completed 10 editions focusing on scientific and practical works about the semantic web as a support for collaborative platforms in their need to share knowledge. Web2Touch is an open forum for studies in multiple application domains including, for example, web science, health systems, collaborative learning, smart cooperative systems, and web collaboration and communication in general. Web2Touch 2017 includes five full papers and one short paper. The overall focus of the contributions is on research into how semantics can improve information sharing, services, and collaboration on "the new web".

2016 - A Data Management Middleware for ITS Services in Smart Cities [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

A major societal challenge to be tackled in megacities is sustainable urban transportation. Intelligent Transportation Systems (ITSs) are actually data-centric applications that need to store and query real-time as well as historical/static data from various data sources and have to provide timely responses to users' transportation needs. In this paper we introduce a data management middleware that offers the robustness of a common framework to support the development of smart applications having the above needs. It supports the efficient storage and access to real-time and historical/static data and provides both one-time and continuous query capabilities. While the middleware has been designed to be general and versatile to support data management for any kind of application, in this paper we explore its suitability to ITS smart services also by means of an experimental evaluation conducted on a variety of traffic scenarios.

2016 - AMBIT-SE: Towards a User-aware Semantic Enterprise Search Engine [Relazione in Atti di Convegno]
Cabri, Giacomo; Gaddi, Stefano; Martoglia, Riccardo
abstract

Search engines are among the most exploited tools in both our everyday life and our work. In this paper we propose a user-aware semantic enterprise search engine called AMBIT-SE. It is "enterprise" in the sense that it focuses on search in enterprise websites; it is "semantic" in that it does not rely only on exact word matches but also on the meaning of words, by means of synonyms and related terms; finally, to produce query results it also takes user information into account, which turns out to be very useful for improving the search. We explain how our system works and report the results of experiments on different websites.

2016 - Designing a Collaborative Middleware for Semantic and User-aware Service Composition [Relazione in Atti di Convegno]
Cabri, Giacomo; Martoglia, Riccardo; Zambonelli, Franco
abstract

The wide availability of services, provided by different means such as the Web, smartphone apps and wearable devices, gives users valuable support for their everyday activities, but at the same time introduces the need for a tailored choice and exploitation of them. Several approaches have been proposed that take users' preferences into account, but a comprehensive user-aware approach is still missing. In this paper we propose a middleware for composing and exploiting services that exhibits some key features: (i) it considers the profile of the users who exploit the services in order to choose appropriate ones, (ii) it exploits semantic similarity techniques to make the choice more effective, and (iii) it enables collaboration among users. By means of a case study we present a possible scenario that can take advantage of our middleware, and show how it can be exploited.

2016 - Exploiting Semantics for Searching Agricultural Bibliographic Data [Articolo su rivista]
Beneventano, Domenico; Bergamaschi, Sonia; Martoglia, Riccardo
abstract

Filtering and search mechanisms that make it possible to identify key bibliographic references are fundamental for researchers. In this paper we propose a fully automatic semantic method for filtering/searching bibliographic data, which allows users to look for information by specifying simple keyword queries or document queries, i.e. by simply submitting existing documents to the system. The limitations of standard techniques, based either on syntactic text search or on manually assigned descriptors, are overcome by considering the semantics intrinsically associated with the document/query terms; to this aim, we exploit different kinds of external knowledge sources (both general and domain-specific dictionaries or thesauri). The proposed techniques have been developed and successfully tested, using the AGROVOC thesaurus, on agricultural bibliographic data, which play a central role in enabling researchers and policy makers to retrieve related agricultural and scientific information.

2016 - Journal of Computer and System Sciences Special Issue on Query Answering on Graph-Structured Data [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

Graph-based data models have recently gained much popularity as powerful means for data representation in several database application areas. Notable examples of application domains where data is naturally represented in graph-based form are knowledge bases, biological and chemical databases, Web-scattered data, healthcare, personal information management (PIM), enterprise information management (EIM) systems, online mapping/routing services, and social networks, just to mention a few. The heterogeneity, complexity and size of the contents that characterize datasets in these fields unquestionably make querying a challenging task. This special issue of the Journal of Computer and System Sciences follows the 2013 and 2014 editions of the International Workshop on Querying Graph Structured Data (GraphQ), which were co-located with the International Conference on Extending Database Technology and were held in Genoa, Italy and in Athens, Greece, respectively. The two editions of the workshop attracted a large world-wide audience of researchers and professionals, and yielded several excellent presentations exploring how to effectively and efficiently support graph queries in different application domains. This special issue includes a shortlist of selected contributions that were extended to provide deeper investigations along three main research directions: (1) graph query answering; (2) graph query processing; (3) graph data dynamics.

2016 - Towards User-Aware Service Composition [Relazione in Atti di Convegno]
Cabri, Giacomo; Leoncini, Mauro; Martoglia, Riccardo; Zambonelli, Franco
abstract

Our everyday life is increasingly supported by information technology in general and by specific services provided by means of our electronic devices. The AMBIT project (Algorithms and Models for Building context-dependent Information delivery Tools) aims at supporting the development of services that are automatically tailored to the user profile. However, while the adaptation of single services is the first step, the next step is to achieve adaptation in the composition of different services. In this paper, we explore how services can be composed in a user-aware way, in order to choose the composition that best meets users' requirements. That is, we exploit the user profile not only to provide her with customized services, but also to compose them in a suitable way.

2016 - Web2Touch 2016: Evolution and security of collaborative web knowledge [Relazione in Atti di Convegno]
Nabuco, Olga; Bonacin, Rodrigo; Fugini, Mariagrazia; Martoglia, Riccardo
abstract

This report introduces the Web2Touch 2016, a Track at the 25th IEEE WETICE Conference. This track involves works from collaborative web knowledge research community and related themes. Web2Touch 2016 explores the state-of-the-art on users' practical experiences, as well as trends and research topics paving the way for future collaborative approaches to knowledge management. Papers come from areas such as computational analysis, management of contextual information, support to personalized information management, collaborative knowledge production, consistency, knowledge engineering and security modeling for multiple knowledge sources. The overall focus is on determining how to route, organize, and present contextual and meaningful information and services to facilitate collaboration.

2015 - AMBIT: Semantic Engine Foundations for Knowledge Management in Context-dependent Applications [Relazione in Atti di Convegno]
Martoglia, Riccardo
abstract

Context-aware applications and services proposing potentially useful information to users are more and more widespread; however, their actual usefulness is often limited by the “syntactical” notion of context they adopt. The recently started AMBIT project aims to provide a general software architecture for developing semantic-based context-aware tools in a number of vertical case study applications. In this paper, we focus on the knowledge management foundations we are laying for the Semantic Engine of the AMBIT architecture. The proposed semantic analysis and similarity techniques: (a) exploit the textual information that deeply characterizes both users and the information to be retrieved; (b) overcome the limits of syntactic methods by leveraging the strengths of both classic information retrieval and knowledge-based analysis and classification, ultimately proposing information relevant to the user's interests. The experimental evaluation of a preliminary implementation in an actual “cultural territorial enhancement” scenario already shows promising results.

2015 - Approximating expressive queries on graph-modeled data: The GeX approach [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

We present the GeX (Graph-eXplorer) approach for the approximate matching of complex queries on graph-modeled data. GeX generalizes existing approaches and provides for a highly expressive graph-based query language that supports queries ranging from keyword-based to structured ones. The GeX query answering model gracefully blends label approximation with structural relaxation, under the primary objective of delivering meaningfully approximated results only. GeX implements ad-hoc data structures that are exploited by a top-k retrieval algorithm which enhances the approximate matching of complex queries. An extensive experimental evaluation on real world datasets demonstrates the efficiency of the GeX query answering.

2015 - Effective Aggregation and Querying of Probabilistic RFID Data in a Location Tracking Context [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

RFID applications usually rely on RFID deployments to manage high-level events such as tracking the locations that products visit for supply-chain management, localizing intruders for alerting services, and so on. However, transforming low-level streams into high-level events poses a number of challenges. In this paper, we deal with the well-known issues of data redundancy and data-information mismatch: we propose an on-line summarization mechanism that provides a small-space representation of massive RFID probabilistic data streams while preserving the meaningfulness of the information. We also show that common information needs, i.e. detecting complex events meaningful to applications, can be effectively answered by executing temporal probabilistic SQL queries directly on the summarized data. All the techniques presented in this paper are implemented in a complete framework and successfully evaluated in real-world location tracking scenarios.
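
The abstract does not detail the summarization mechanism. As a hedged illustration of the general idea of replacing per-reading tuples with compact interval tuples, the Python sketch below merges consecutive time steps whose probabilistic location distributions differ by at most a tolerance `tol`. The `Interval` representation, the tolerance rule and the example locations are assumptions, not the paper's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # location -> probability


@dataclass
class Interval:
    start: int
    end: int
    dist: Dist  # representative (first) distribution for the whole interval


def close(a: Dist, b: Dist, tol: float) -> bool:
    """True if every location's probability differs by at most tol."""
    return all(abs(a.get(k, 0.0) - b.get(k, 0.0)) <= tol for k in set(a) | set(b))


def summarize(stream: List[Tuple[int, Dist]], tol: float = 0.05) -> List[Interval]:
    """On-line pass: merge consecutive time steps whose distributions are nearly equal."""
    out: List[Interval] = []
    for t, dist in stream:
        last = out[-1] if out else None
        # Compare against the interval's first distribution, so drift stays within tol.
        if last is not None and last.end == t - 1 and close(last.dist, dist, tol):
            last.end = t  # extend the current interval instead of storing a new tuple
        else:
            out.append(Interval(t, t, dict(dist)))
    return out


stream = [
    (0, {"dock": 0.90, "aisle": 0.10}),
    (1, {"dock": 0.88, "aisle": 0.12}),
    (2, {"dock": 0.20, "aisle": 0.80}),
    (3, {"dock": 0.22, "aisle": 0.78}),
]
for iv in summarize(stream):
    print(iv.start, iv.end, iv.dist)
```

Four per-timestep tuples collapse into two interval tuples here; a larger `tol` compresses more aggressively at the cost of coarser probabilities, which is the space/meaningfulness trade-off the abstract refers to.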

2015 - Exploiting Semantics for Filtering and Searching Knowledge in a Software Development Context [Articolo su rivista]
Bergamaschi, Sonia; Martoglia, Riccardo; Sorrentino, Serena
abstract

Software development is still considered a bottleneck for SMEs (Small and Medium Enterprises) in the advance of the Information Society. SMEs usually store and collect a large amount of textual software documentation; these documents might be profitably used to help them use (and re-use) Software Engineering methods for systematically designing their applications, thus reducing software development costs. Specific semantic textual filtering/search mechanisms, supporting the identification of adequate processes and practices for the enterprise's needs, are fundamental in this context. To this aim, we present an automatic document retrieval method based on semantic similarity and Word Sense Disambiguation (WSD) techniques. The proposal leverages the strengths of both classic information retrieval and knowledge-based techniques, exploiting syntactic and semantic information provided by general and domain-specific knowledge sources. For any SME, it is as easily and generally applicable as the search techniques offered by common enterprise Content Management Systems (CMSs). Our method was developed within the FACIT-SME European FP-7 project, whose aim is to facilitate the diffusion of Software Engineering methods and best practices among SMEs. As shown by a detailed experimental evaluation, the achieved effectiveness goes well beyond that of typical retrieval solutions.

2014 - AMBIT: Towards an Architecture for the Development of Context-dependent Applications and Systems [Relazione in Atti di Convegno]
Cabri, Giacomo; Leoncini, Mauro; Martoglia, Riccardo
abstract

The development of ubiquitous services tailored to the needs and expectations of a very large number of potential users (especially mobile users) requires that future applications and systems be aware of the contexts in which services are used and possibly of accurate user profiles. The AMBIT research project aims at providing a general model of context as well as a platform that can be exploited to build and deploy different kinds of context-dependent applications and systems. We aim at overcoming the restrictions of the existing approaches, which are mainly due to the limited notion of context they propose (if any). In particular, we stress the fact that current technologies do not accurately consider the notion of context semantics and user profile, and this shortcoming is the main source of the flood of useless data that overloads systems and often users' minds.

2014 - Advanced Data Management for real-time data intensive applications and services [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo
abstract

This work focuses on Data Management for data-intensive real-time applications and services in a mobile and pervasive transportation scenario. It presents the main goals achieved in the course of the PEGASUS Project, a project funded by the Industria 2015 programme with the overall goal of building an advanced Intelligent Transportation System (ITS).

2014 - Data management techniques for active RFID applications [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

In the last several years, RFID technology has gained significant popularity due to its ability to detect objects and people carrying small RFID tags in an environment equipped with RFID readers. This research involved the design, implementation and experimental evaluation of a real-time system that addresses the above-mentioned data management issues in the context of RFID location tracking systems.

2014 - Online filtering and uncertainty management techniques for RFID data processing [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

RFID is one of the emerging technologies for a wide range of applications, including supply chain and asset management, healthcare and intruder localization. However, RFID data streams are by nature noisy, redundant and unreliable, which makes them unsuitable for direct use in applications. In this paper, we propose specific RFID Online Filtering and Uncertainty Management techniques that operate on unreliable and imprecise data streams in order to transform them into reliable probabilistic data that are meaningful to the applications. Our proposal makes use of a Hidden Markov Model (HMM) that continuously infers hidden variables (e.g., the locations of the tracked objects) based on sensor readings. The resulting data can be directly stored in a probabilistic database table for further analysis. All the techniques presented in this paper are implemented in a complete framework and successfully evaluated in real-world object tracking scenarios.
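The HMM-based inference described above can be illustrated with the standard forward (filtering) recursion. The following Python sketch is not the authors' implementation: the zone names, the transition and emission probabilities, and the use of `None` for a missed read are hypothetical, chosen only to show how noisy tag reads become a probability distribution over locations at each time step.

```python
"""Minimal HMM forward filtering for RFID location tracking (illustrative sketch).

Hypothetical model: hidden states are zones, an observation is the zone whose
reader detected the tag, and None stands for a missed read.
"""


def hmm_filter(prior, transition, emission, observations):
    """Return P(location_t | readings_1..t) for every time step t."""
    belief = dict(prior)
    history = []
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            s2: sum(belief[s1] * transition[s1].get(s2, 0.0) for s1 in belief)
            for s2 in prior
        }
        # Update: weight each zone by how likely it is to produce this reading.
        unnorm = {s: p * emission[s].get(obs, 0.0) for s, p in predicted.items()}
        total = sum(unnorm.values())
        # If the reading is impossible under the model, keep the prediction.
        belief = {s: p / total for s, p in unnorm.items()} if total > 0 else predicted
        history.append(belief)
    return history


# Hypothetical two-zone example: the tag tends to stay where it is.
prior = {"A": 0.5, "B": 0.5}
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emission = {
    "A": {"A": 0.7, "B": 0.1, None: 0.2},
    "B": {"A": 0.1, "B": 0.7, None: 0.2},
}
history = hmm_filter(prior, transition, emission, ["A", None, "A"])
for t, dist in enumerate(history, start=1):
    print(t, {zone: round(p, 3) for zone, p in dist.items()})
```

Each per-step distribution is the kind of probabilistic tuple that could be stored in a probabilistic table. In this toy model the missed read at step 2 is equally likely in both zones, so the belief at that step is driven by the motion model alone.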

2014 - RPDM: A System for RFID Probabilistic Data Management [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

Data streams are increasingly generated in a large number of scenarios by audio and video devices, Global Positioning System (GPS), Radio Frequency Identification (RFID) and other types of sensors. In particular, RFID technology has recently gained significant popularity, especially for real-time tracking of people and goods; however, the noisy, redundant and unreliable nature of RFID streams, coupled with their huge size, can make their exploitation and management difficult. In this paper, we present a real-time system for RFID Probabilistic Data Management (RPDM). The system manages unreliable and noisy raw RFID data and transforms them into reliable, meaningful probabilistic data streams by means of a newly proposed method based on a probabilistic Hidden Markov Model (HMM). Moreover, to handle the huge data volume generated by RFID deployments, RPDM proposes and implements a simple on-line summarization mechanism, which provides a small-space representation of the massive RFID probabilistic data streams while preserving the meaningful information. The results are promptly stored in a probabilistic database, so that a wide range of probabilistic queries can be submitted and answered effectively. The experimental evaluation proves the feasibility of the approach in real-world object tracking scenarios.

2014 - UCbase 2.0: ultraconserved sequences database (2014 update) [Articolo su rivista]
Lomonaco, V; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, L; Emmett, W; Bicciato, Silvio; Taccioli, C.
abstract

UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences longer than 200 bases that share 100% identity among the human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to the UCSC genome browser. Database URL: http://ucbase.unimore.it.

2013 - A Framework for ITS Data Management in a Smart City Scenario [Relazione in Atti di Convegno]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, W.
abstract

In this paper we introduce a technological framework to efficiently support data management in a modern Intelligent Transportation System (ITS). The proposed technology enables the efficient storage of a variety of recent/historical/static data and guarantees its effective querying by supporting continuous as well as one-time queries for the delivery of real-time traffic services. The framework also offers a scalable solution for coping with the acquisition of huge volumes of data by employing data reduction techniques in Vehicle-to-Infrastructure transmissions. Experimental evaluation on the Linear Road ITS benchmark and in various simulated scenarios demonstrates that the proposed framework efficiently supports smart city data needs.

2013 - UNIMORE at ImageCLEF 2013: Scalable Concept Image Annotation [Relazione in Atti di Convegno]
Grana, Costantino; Serra, Giuseppe; Manfredi, Marco; Cucchiara, Rita; Martoglia, Riccardo; Mandreoli, Federica
abstract

In this paper we propose a large-scale image annotation system for the Scalable Concept Image Annotation task. For each concept to be detected, a separate classifier is built using the provided textual annotations. Images are represented as a Multivariate Gaussian distribution of a set of local features extracted over a dense regular grid. Textual analysis of the web pages containing the training images is performed to retrieve a relevant set of samples for learning each concept classifier. An online SVM solver based on Stochastic Gradient Descent is used to manage the large amount of training data. Experimental results show that the combination of different kinds of local features encoded with our strategy achieves very competitive performance in terms of both mAP and mean F-measure.
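The abstract mentions an online SVM solver based on Stochastic Gradient Descent. As a generic illustration of that family of solvers (not the authors' code, and without their Gaussian-based image descriptors), the sketch below trains one linear SVM with a Pegasos-style SGD update on the hinge loss; in a per-concept setting, one such model would be trained for each concept. The toy feature vectors, the regularization constant `lam`, and the absence of a bias term are assumptions made for brevity.

```python
import random


def train_linear_svm_sgd(samples, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD for a linear SVM without a bias term (sketch).

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)), labels in {+1, -1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a random order each epoch
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization gradient: shrink the weight vector.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward classifying x correctly.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical toy data: positive vs. negative samples for one concept.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
Y = [1, 1, -1, -1]
w = train_linear_svm_sgd(X, Y)
print([predict(w, x) for x in X])
```

Because each update touches a single sample, memory use does not grow with the size of the training set, which is the property that makes SGD solvers attractive for large amounts of training data.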

2013 - Wearable Queries: Adapting Common Retrieval Needs to Data and Users [Relazione in Atti di Convegno]
Catania, Barbara; Guerrini, Giovanna; Belussi, Alberto; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

The wealth of information generated by users interacting with the network and its applications is often under-utilized due to the complications of accessing heterogeneous and dynamic data and of retrieving relevant information from sources with possibly unknown formats and structures. Processing complex requests on such information sources can thus be costly, without guaranteeing user satisfaction. Furthermore, dynamic contexts prevent substantial user involvement in the interpretation of the request. The paper envisions an innovative solution to process such requests while limiting user involvement, by exploiting information on: (a) user context (geo-location, interests, needs); (b) data and processing quality; (c) similar requests repeated over time. By interpreting a request in a novel way by means of a Wearable Query (WQ), i.e., a query that captures the specificities of the user and of the request, we envision a methodological and technological solution for WQs in the presence of repeated information needs in distributed, heterogeneous, dynamic environments, with emphasis on the geo-spatial dimension and on data quality.

2012 - A Framework For Biological Data Normalization, Interoperability, and Mining for Cancer Microenvironment Analysis [Relazione in Atti di Convegno]
M., Ceci; M., Coluccia; F., Fumarola; P. H., Guzzi; Mandreoli, Federica; Martoglia, Riccardo; E., Masciari; M., Mecella; W., Penzo
abstract

Over the last decade, the advances in the high-throughput omic technologies have made it possible to profile tumor cells at different levels, fostering the discovery of new biological data and the proliferation of a large number of bio-technological databases. In this paper we describe a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. In this framework, a key role is played by the Connectivity Map, a databank which relates diseases, physiological processes, and the action of drugs. The system will be used in a pilot study on the Multiple Myeloma (MM).

2012 - A Semantic Method for Searching Knowledge in a Software Development Context [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Martoglia, Riccardo; Sorrentino, Serena
abstract

The FACIT-SME European FP-7 project aims to facilitate the use and sharing of Software Engineering (SE) methods and best practices among software-developing SMEs. In this context, we present an automatic semantic document search method based on Word Sense Disambiguation, which exploits both syntactic and semantic information provided by external dictionaries and is easily applicable for any SME.

2012 - Efficient management of multi-version clinical guidelines [Articolo su rivista]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo
abstract

Clinical medicine and health-care developments in recent years have witnessed a tremendous increase in the number of available guidelines, i.e., "best practices" encoding and standardizing care procedures for a given disease. Clinical guidelines are subject to continuous development and revision by committees of expert physicians and health authorities; as a consequence of clinical and healthcare activities, multiple versions thus coexist. Moreover, several alternatives are usually included in order to make the guidelines as general as possible, which makes them difficult to handle both manually and automatically. In this work, we introduce techniques to model, and to provide efficient personalized access to, very large collections of multi-version clinical guidelines, which can be stored both in textual and in executable format in an XML repository. In this way, multiple temporal perspectives, patient profile and context information can be used by an automated personalization service to efficiently build on demand a guideline version tailored to a specific use case.
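The abstract describes building, on demand, a guideline version tailored by temporal and patient-profile information. The following Python sketch only illustrates that selection idea on in-memory objects: the `Fragment` model, its single valid-time interval, and the tag-based applicability test are hypothetical simplifications, not the paper's XML model, its multiple temporal dimensions, or its query processing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fragment:
    """A versioned piece of a guideline (hypothetical data model)."""
    text: str
    valid_from: date  # inclusive
    valid_to: date    # exclusive
    applies_to: frozenset = frozenset()  # empty set: applies to every patient


def tailor(fragments, on, profile):
    """Keep fragments valid on date `on` whose conditions match the profile."""
    return [
        f.text
        for f in fragments
        if f.valid_from <= on < f.valid_to
        and (not f.applies_to or f.applies_to & profile)
    ]


END_OF_TIME = date.max
guideline = [
    Fragment("recommendation v1", date(2008, 1, 1), date(2010, 1, 1)),
    Fragment("recommendation v2", date(2010, 1, 1), END_OF_TIME),
    Fragment("extra check for renal patients", date(2009, 1, 1), END_OF_TIME,
             frozenset({"renal"})),
]
print(tailor(guideline, date(2011, 6, 1), {"renal"}))
print(tailor(guideline, date(2009, 6, 1), set()))
```

The same fragment collection yields different versions depending on the reference date and on the profile; at scale, a repository would index the validity intervals rather than scanning every fragment as this sketch does.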

2012 - Evaluation of Data Reduction Techniques for Vehicle to Infrastructure Communication Saving Purposes [Relazione in Atti di Convegno]
L., Carafoli; Mandreoli, Federica; Martoglia, Riccardo; W., Penzo
abstract

In this paper we investigate the use of different data reduction techniques to minimize V2I communication in an Intelligent Transportation System (ITS). We consider the context of the PEGASUS Project, where vehicles are equipped with sensor-based devices able to compute and communicate to a Control Centre (CC) information such as the vehicle's position and speed. The CC relies on a general-purpose data management module that supports the execution of continuous queries as well as standard SQL one-time queries on the collected data to provide various info-mobility services. The paper explores two categories of data reduction techniques: independent techniques, where vehicles autonomously send data to the CC, and information-need techniques, where data is sent by taking into account additional data received from the CC. The paper discusses and implements the technical changes needed in the CC to support the required info-mobility services under the reduced availability of data. All the investigated techniques have been extensively evaluated in a variety of traffic scenarios.
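The abstract does not detail which independent techniques were evaluated. Purely as an illustration of that category (vehicles deciding on their own what to send), the sketch below implements a simple dead-reckoning policy: the vehicle transmits a reading only when the Control Centre's linear extrapolation from the last transmitted reading would be off by more than a threshold. The function name, the reading layout `(t, x, y, vx, vy)` and the threshold value are assumptions, not details from the paper.

```python
import math


def dead_reckoning_filter(readings, threshold):
    """Return the readings a vehicle would send under a dead-reckoning policy.

    readings: iterable of (t, x, y, vx, vy) tuples from the on-board unit.
    threshold: maximum tolerated distance between the CC's extrapolated
               position and the true position (same unit as x and y).
    """
    sent = []
    for reading in readings:
        t, x, y, _, _ = reading
        if sent:
            t0, x0, y0, vx0, vy0 = sent[-1]
            # Position the CC would assume by extrapolating the last update.
            px, py = x0 + vx0 * (t - t0), y0 + vy0 * (t - t0)
            if math.hypot(x - px, y - py) <= threshold:
                continue  # prediction is good enough: skip transmission
        sent.append(reading)
    return sent


# Hypothetical trace: straight motion along x, then a turn at t = 5.
trace = [(t, 10.0 * t, 0.0, 10.0, 0.0) for t in range(5)]
trace += [(5, 45.0, 5.0, 0.0, 5.0), (6, 45.0, 10.0, 0.0, 5.0)]
updates = dead_reckoning_filter(trace, threshold=5.0)
print([u[0] for u in updates], f"{len(updates)}/{len(trace)} readings sent")
```

The threshold trades communication savings against the position error the CC's services must tolerate, which is why the CC side has to be adapted to work with reduced data.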

2012 - FACIT-SME - Facilitate IT-providing SMEs by Operation-related Models and Methods [Software]
Bergamaschi, Sonia; Beneventano, Domenico; Martoglia, Riccardo
abstract

The FACIT SME project addresses SMEs operating in the ICT domain. The goals are (a) to facilitate the use of Software Engineering (SE) methods and to systematize their application integrated with the business processes, (b) to provide efficient and affordable certification of these processes according to internationally accepted standards, and (c) to securely share best practices, tools and experiences with development partners and customers. The project aims (1) to develop a novel Open Reference Model (ORM) for ICT SMEs, serving as knowledge backbone in terms of procedures, documents, tools and deployment methods; (2) to develop a customisable Open Source Enactment System (OSES) that provides IT support for the project-specific application of the ORM; and (3) to evaluate these developments with 5 ICT SMEs by establishing the ORM and the OSES and preparing the certifications. The approach combines and amends achievements from Model Generated Workplaces, Certification of SE for SMEs, and model-based document management. The consortium is shaped by 4 significant SME associations as well as a European association exclusively focused on the SME community in the ICT sector. Five R&D partners provide the required competences. Five SMEs operating in the ICT domain will evaluate the results in daily-life application. The major impact is expected for ICT SMEs by (a) optimising their processes based on best practice; (b) achieving internationally accepted certification; and (c) the provision of structured reference knowledge. They will improve implementation projects and make their solutions more appealing to SMEs. ICT SME communities (organized by associations) will experience significant benefit through the exchange of recent knowledge and best practices. By providing clear assets (ORM and OSES), the associations shape the service offering to their members and strengthen their community. The use of Open Source will further facilitate the spread of the results across European SMEs.

2012 - Fast On-Line Summarization of RFID Probabilistic Data Streams [Relazione in Atti di Convegno]
R., Haider; Mandreoli, Federica; Martoglia, Riccardo; S., Sassatelli
abstract

RFID applications usually rely on RFID deployments to manage high-level events. A fundamental relation for these purposes is the location of people and objects over time. However, RFID data streams are by nature noisy, redundant and unreliable; thus, streams of low-level tag reads can be transformed into probabilistic data streams, which in practical cases can reach the size of gigabytes per day. In this paper, we propose a simple on-line summarization mechanism, which provides a small-space representation of massive RFID probabilistic data streams while preserving the meaningful information. The main idea behind the proposed approach is to keep aggregating tuples incrementally until a state transition is detected. Probabilistic tuples are processed as they arrive, hence avoiding expensive offline disk-based operations, and the output is stored in a probabilistic database in such a way that, as we also experimentally prove, a wide range of probabilistic queries can be answered effectively.
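The core idea, aggregating tuples incrementally until a state transition is detected, can be sketched as a Python generator that consumes probabilistic tuples one at a time. This is a simplification under stated assumptions rather than the paper's mechanism: here a tuple is `(timestamp, {location: probability})`, the "state" is taken to be the most probable location, and a summary tuple stores the time interval and the averaged distribution.

```python
def summarize(stream):
    """Incrementally summarize (timestamp, {location: prob}) tuples (sketch).

    Consecutive tuples with the same most likely location are merged into a
    single (start, end, averaged distribution) tuple; a change of the most
    likely location is treated as a state transition.
    """
    run = None  # open run: [start, end, summed distribution, count, state]
    for ts, dist in stream:
        state = max(dist, key=dist.get)
        if run is not None and state == run[4]:
            # Same state: extend the open run and accumulate probabilities.
            run[1] = ts
            for loc, p in dist.items():
                run[2][loc] = run[2].get(loc, 0.0) + p
            run[3] += 1
            continue
        if run is not None:
            yield _emit(run)  # state transition: close the previous run
        run = [ts, ts, dict(dist), 1, state]
    if run is not None:
        yield _emit(run)


def _emit(run):
    start, end, summed, count, _ = run
    # Locations missing from some tuples implicitly contribute probability 0.
    return start, end, {loc: p / count for loc, p in summed.items()}


# Hypothetical stream: the tag moves from zone A to zone B at t = 3.
stream = [
    (1, {"A": 0.8, "B": 0.2}),
    (2, {"A": 0.6, "B": 0.4}),
    (3, {"A": 0.3, "B": 0.7}),
    (4, {"B": 1.0}),
]
for summary in summarize(stream):
    print(summary)
```

Since the generator holds only the currently open run, memory stays constant regardless of stream length, in line with the abstract's goal of avoiding offline disk-based processing.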

2012 - The IS-BioBank project: a framework for biological data normalization, interoperability, and mining for cancer microenvironment analysis [Articolo su rivista]
M., Ceci; P. H., Guzzi; E., Masciari; M., Coluccia; Mandreoli, Federica; M., Mecella; F., Fumarola; Martoglia, Riccardo; W., Penzo
abstract

Advances in high-throughput technologies have made it possible to investigate both healthy and diseased human cells at different levels. Consequently, this has enabled the discovery of new biological and biomedical data and the proliferation of a large number of databases. In this paper, we describe the IS-BioBank (Integrated Semantic Biological Data Bank) proposal. It consists of the realization of a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. In this framework, a key role has been played by the Connectivity Map, a databank which relates diseases, physiological processes, and the action of drugs. The system will be used in a pilot study on the Multiple Myeloma (MM).

2012 - Toward a Semantic Framework for the Querying, Mining and Visualization of Cancer Microenvironment Data [Relazione in Atti di Convegno]
M., Ceci; F., Fumarola; P. H., Guzzi; Mandreoli, Federica; Martoglia, Riccardo; E., Masciari; W., Penzo
abstract

Over the last decade, the advances in the high-throughput omic technologies have made it possible to profile tumor cells at different levels, fostering the discovery of new biological data and the proliferation of a large number of bio-technological databases. In this paper we describe a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. The system will be used in a pilot study on the Multiple Myeloma (MM).

2011 - A Reasoning Engine for Intruders' Localization in Wide Open Areas using a Network of Cameras and RFIDs [Relazione in Atti di Convegno]
Cucchiara, Rita; Fornaciari, Michele; Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo; Prati, Andrea; Sassatelli, Simona
abstract

Wide open areas represent challenging scenarios for surveillance systems, since sensory data can be affected by noise, uncertainty, and distractors. Therefore, localizing and identifying targets (e.g., people) in such environments calls for going beyond camera-only deployments. In this paper, we propose an innovative system relying on the joint use of cameras and RFIDs, which allows us to "map" RFID tags to people detected by cameras and, thus, to highlight potential intruders. To this end, sophisticated filtering techniques preserve the uncertainty of data and overcome the heterogeneity of sensors, while an evidential fusion architecture, based on the Transferable Belief Model, combines the two sources of information and manages the conflict between them. The conducted experimental evaluation shows very promising results.
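In the Transferable Belief Model, evidence from independent sources is usually combined with the unnormalized conjunctive rule, which keeps the mass falling on the empty set as an explicit measure of conflict; beliefs are turned into probabilities only at decision time via the pignistic transformation. The Python sketch below shows these two standard TBM operations on a hypothetical question (which of two detected people, `p1` or `p2`, carries a given tag). The mass values are invented, and the paper's actual fusion architecture is not reproduced.

```python
from itertools import product


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule of the TBM.

    m1, m2: dicts mapping frozenset focal elements to masses summing to 1.
    Mass that ends up on frozenset() quantifies the conflict between sources.
    """
    combined = {}
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        combined[inter] = combined.get(inter, 0.0) + ma * mb
    return combined


def pignistic(m):
    """Pignistic probability: spread each non-empty mass evenly over its elements."""
    conflict = m.get(frozenset(), 0.0)
    betp = {}
    for focal, mass in m.items():
        if not focal:
            continue  # conflict mass is accounted for by the normalization below
        share = mass / (len(focal) * (1.0 - conflict))
        for x in focal:
            betp[x] = betp.get(x, 0.0) + share
    return betp


BOTH = frozenset({"p1", "p2"})
camera = {frozenset({"p1"}): 0.6, BOTH: 0.4}  # hypothetical camera evidence
rfid = {frozenset({"p2"}): 0.3, BOTH: 0.7}    # hypothetical RFID evidence
fused = conjunctive_combination(camera, rfid)
print("conflict:", round(fused[frozenset()], 2))
print({k: round(v, 3) for k, v in pignistic(fused).items()})
```

Keeping the conflict mass explicit, instead of normalizing it away as in Dempster's rule, lets a system notice when camera and RFID evidence disagree strongly; the sketch assumes the conflict is below 1, otherwise the pignistic step would divide by zero.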

2011 - A Unified Multimedia and Semantic Perspective for Data Retrieval in the Semantic Web [Articolo su rivista]
R., Lenzi; C., Gennaro; Mandreoli, Federica; Martoglia, Riccardo; M., Mordacchini; W., Penzo; S., Sassatelli
abstract

In recent years, the diffusion of peer-to-peer networks has been going beyond the single-domain paradigm, such as, for instance, monothematic file sharing (e.g., Napster for music). Peers are more and more often heterogeneous data sources that need to share data for commercial, educational, and/or collaboration purposes, just to mention a few. Moreover, in current information processing applications, data cannot be meaningfully searched by precise database queries that return exact matches (e.g., when dealing with multimedia, proteomic, or statistical data). In this paper we move a step towards multi-domain, multi-type data sharing systems by introducing an advanced technological infrastructure that enables users to meet these emerging needs. A fundamental issue in this context is data heterogeneity, which is pervasive and intrinsically present both at the intensional level where, due to peers' autonomy, different semantic descriptions of the available information are provided, and at the extensional level, where multiple data types can coexist, including content-based searchable data types such as multimedia data. Our proposal relies on a Peer Data Management System (PDMS) framework to present innovative network organization and query routing mechanisms that exploit both peers' data descriptions and data content to achieve effective and efficient network management and data retrieval in such a context. The validity of our proposal is demonstrated by an absolutely satisfactory experimental evaluation in a real setting.

2011 - Facilitate IT-Providing SMEs in Software Development: a Semantic Helper for Filtering and Searching Knowledge [Relazione in Atti di Convegno]
Martoglia, Riccardo
abstract

Software development is still considered a bottleneck in the advance of the Information Society. The recently started FACIT-SME European FP-7 project aims to facilitate the use and sharing of Software Engineering methods and best practices among software-developing SMEs. On top of an Open Reference Model (ORM) serving as the underlying knowledge backbone, specific filtering/search mechanisms will support the identification of adequate processes and practices for specific enterprise needs. In this paper, we focus on the proposal of knowledge-based text analysis and retrieval techniques that will form a key component of the advanced filtering mechanisms of the project. The proposed solution is designed to be more powerful and flexible than standard syntactic search techniques, but also to be easily applicable for any SME. The experimental evaluation on the preliminary implementation shows promising results.

2011 - Knowledge-based sense disambiguation (almost) for all structures [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo
abstract

Structural disambiguation is acknowledged as a very real and frequent problem for many semantic-aware applications. In this paper, we propose a unified answer to sense disambiguation on a large variety of structures both at data and metadata level such as relational schemas, XML data and schemas, taxonomies, and ontologies. Our knowledge-based approach achieves a general applicability by converting the input structures into a common format and by allowing users to tailor the extraction of the context to the specific application needs and structure characteristics. Flexibility is ensured by supporting the combination of different disambiguation methods together with different information extracted from different sources of knowledge. Further, we support both assisted and completely automatic semantic annotation tasks, while several novel feedback techniques allow us to improve the initial disambiguation results without necessarily requiring user intervention. An extensive evaluation of the obtained results shows the good effectiveness of the proposed solutions on a large variety of structure-based information and disambiguation requirements.

2010 - Data Management Issues for Intelligent Transportation Systems [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

In this paper we discuss the technical challenges of devising a Data Stream Management System (DSMS) in the intelligent transportation scenario considered in the PEGASUS project, where the final aim is to provide reliable and timely information to improve the safety and the efficiency of vehicles' and goods' flows. The system should collect and integrate the large amounts of geo-located stream items coming from On Board Units (OBUs) installed on vehicles, with the aim of producing real-time maps including traffic and Points Of Interest (POIs) information to be then distributed to OBUs. OBUs' smart navigation engines will exploit these maps to enhance mobility and provide user-targeted information. We propose a two-tiered GIS DSMS architecture where stream items are pulled from the source input stream, processed and stored in a result container to be further pulled by other operators. The system reduces the data acquisition costs by adopting communication-saving policies, supports ad-hoc strategies for reducing the storage management costs (lowering response times and memory consumption), and provides the required data access functionalities through an SQL-like query language enhanced with stream, event, spatial and temporal operators. OBU stream items are also exploited to detect Events Of Interest (EOIs) such as jams and accidents and to support a collaborative mechanism for user-powered POI management and rating. EOIs and POIs are modeled through specific ontologies which allow for a flexible and extensible data management and guarantee data independence from the raw streams.

2010 - Information Retrieval Techniques for Pattern Matching - Managing and Searching Textual and XML Information in 21st Century Applications [Monografia/Trattato scientifico]
Martoglia, Riccardo
abstract

Information is the main value of the Information Society. The recent developments in computing power and telecommunications, along with the constant drop in the costs of Internet access and of data management and storage, created the right conditions for the global diffusion of the Web and, more generally, of new research tools able to analyze information and its contents. Depending on the particular application scenario and on the type of information that has to be managed and searched, different techniques need to be devised. In this book, the author deals with the two most common types of information: plain text, discussed in the first part, and semi-structured data, in particular XML documents, discussed in depth in the second part. The detailed analysis of approximate matching, duplicate document detection, exact, approximate and semantic query answering, multi-version document management and personalized access techniques offered in this book will guide Information Technology professionals and users in effectively and efficiently managing information and knowledge, thus answering the increasingly complex information needs of most 21st century applications.

2010 - Leveraging Semantic Approximations in Heterogeneous XML Data Sharing Networks: The SUNRISE Approach. [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

In recent years, the huge amount of data available from Internet information sources has focused much attention on the sharing of distributed information through P2P and, in line with the Semantic Web vision, through Peer Data Management Systems (PDMSs). On the other hand, XML is without doubt the most popular data representation and exchange format on the Web, and more and more Internet applications are conforming to this de facto standard for data sharing. In this chapter we present SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration) for XML data sharing. SUNRISE is a complete PDMS infrastructure aiming at semantic interoperability in heterogeneous networks. Decentralized data sharing is supported by a set of autonomous peers which model their local data through schemas and which are locally connected through semantic mappings. SUNRISE leverages the semantic approximations originating from schemas' heterogeneity for an effective and efficient organization and exploration of the network. For these purposes, SUNRISE implements soft computing techniques which cluster peers in Semantic Overlay Networks according to their own contents, and promote the routing of queries towards the semantically best directions in the network.

2010 - Preface [Relazione in Atti di Convegno]
Bergamaschi, S.; Lodi, S.; Martoglia, R.; Sartori, C.
abstract

2010 - SEBD 2010 Proceedings of the 18th Italian Symposium on Advanced Database Systems [Curatela]
Bergamaschi, Sonia; S., Lodi; Martoglia, Riccardo; C., Sartori
abstract

Preface. This volume collects the papers selected for presentation at the Eighteenth Italian Symposium on Advanced Database Systems (SEBD 2010), held in Rimini, Italy, from the 20th to the 23rd of June 2010. SEBD is the major annual event of the Italian database research community. The symposium is conceived as a forum for the discussion and exchange of ideas and experiences among researchers and experts from academia and industry, about all aspects of database systems and their applications. SEBD is back in Rimini after sixteen years, and it is interesting to observe how the landscape of the Italian database research community has changed. In 1994 twenty-one papers were accepted; now the number has more than doubled, meaning that the community has been steadily growing. Most of the topics considered in 1994 are still around, even if the language, the formalisms and the reference applications have changed. Back then, the Web was e-mail, FTP, Usenet, a small number of HTML pages here and there, and little more; now it is the pervasive engine of information dissemination and search. The Web is so powerful that a series of brand new ideas and applications have arisen from it, due to a mix of possibility and necessity. Social systems across the Web, mobility, and heterogeneity were not conceivable in the early 1990s. Semantic web, data mining and warehousing, streaming techniques, and large-scale integration are necessary to deal with the growing amount of data and information. The SEBD 2010 program reflects the current interests of the Italian database researchers and covers most of the topics considered by the international research community. Sixty papers were submitted to SEBD 2010, of which twenty-two were research papers, two were software demonstrations, and thirty-four were extended abstracts, i.e., papers containing descriptions of ongoing projects or presenting results already published.
Fifty-one papers were accepted for presentation, of which seventeen were research papers, two were software demonstrations, and thirty-two were extended abstracts. Besides paper presentations, the program includes a tutorial by Divesh Srivastava (AT&T Labs-Research) and two invited talks, the first by Hector Garcia-Molina (Stanford University, CA) and the second by Amr El Abbadi (University of California, CA). We would like to thank all the authors who submitted papers and all symposium participants. We are grateful to the members of the Program Committee and the external referees for their thorough work in reviewing submissions with expertise and patience, and to the members of the SEBD Steering Committee for their support in the organization of SEBD 2010. Special thanks are due to the members of the Organizing Committee and to the University of Bologna, Polo di Rimini, which made this event possible. Finally, we gratefully thank all cooperating institutions. Rimini, June 2010. Sonia Bergamaschi, Stefano Lodi, Riccardo Martoglia, Claudio Sartori

2010 - STRIDER: Structural Disambiguation [Software]
Martoglia, Riccardo
abstract

The software implements versatile disambiguation approaches that can be used to make explicit the meaning of structure-based information such as XML schemas, XML document structures, web directories, and ontologies.

2010 - SUNRISE: P2P Networks for Data and Service Sharing [Software]
Martoglia, Riccardo
abstract

The software implements techniques for creating, maintaining and accessing Peer-to-Peer networks for data and service sharing.

2010 - Toward a Flexible Data Management Middleware for Wireless Sensor Networks [Relazione in Atti di Convegno]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; Tiberio, Paolo
abstract

In this paper we present the research activity we are carrying out in the "Mobile Semantic Self-Organizing Wireless Sensor Networks" Project at the Department of Information Engineering of the University of Modena and Reggio Emilia. In this context, the main aim of our research is to study solutions for the flexible querying of distributed data collected by heterogeneous devices providing measurement readings. To this end, we propose a middleware for wireless sensor networks which is able to autonomously configure the communication and the operations required of each device in order to reduce energy and temporal costs.

2010 - Toward an Effective and Efficient Query Processing in the NeP4B Project [Relazione in Atti di Convegno]
C., Gennaro; Mandreoli, Federica; Martoglia, Riccardo; M., Mordacchini; S., Orlando; W., Penzo; Sassatelli, Simona; Tiberio, Paolo
abstract

In this paper we present our main current research activity in the Italian co-funded FIRB Project NeP4B (Networked Peers for Business). In particular, we provide an overview of our P2P query routing approach which combines semantics and multimedia aspects in order to make query processing effective and efficient.

2009 - Combining Semantic and Multimedia Query Routing Techniques for Unified Data Retrieval in a PDMS [Relazione in Atti di Convegno]
C., Gennaro; Mandreoli, Federica; Martoglia, Riccardo; M., Mordacchini; W., Penzo; Sassatelli, Simona
abstract

The NeP4B project aims at the development of an advanced technological infrastructure for data sharing in a network of business partners. In this paper we leverage our distinct experiences in semantic and multimedia query routing, and propose an innovative mechanism for effective and efficient unified retrieval of both semantic and multimedia data in the context of the NeP4B project.

2009 - Data-Sharing P2P Networks with Semantic Approximation Capabilities [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

The synergy between Peer-to-Peer systems and Semantic Web technologies has paved the way for large-scale sharing of semantically rich data, usually represented through schemas like, for instance, RDF or ontologies. Because of the lack of common understanding of the vocabulary used by peers, the resulting heterogeneity of data representations opens new challenges as to the efficient and effective retrieval of relevant information. In this paper, as opposed to viewing semantic misalignment as a limit for interoperability, we leverage the presence of semantic approximations between the peers' schemas as a means for giving effective hints along two directions: 1) for query routing purposes, to identify the peers which best satisfy the user's requests, and 2) for making users aware of the relevance of the returned answers through a ranking mechanism which promotes the most semantically related results.

2009 - Flexible Query Answering on Graph-modeled Data [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Villani, Giorgio
abstract

The size and heterogeneity of most graph-modeled datasets in several database application areas make the query process a real challenge, because of the lack of complete knowledge of the vocabulary used, as well as of the information about the structural relationships between the data. To overcome these problems, flexible query answering capabilities are an essential need. In this paper we present a general model for supporting approximate queries on graph-modeled data. Approximation is both on the vocabularies and on the structure. The model is general in that it is not bound to a specific graph data model; rather, it gracefully accommodates labeled directed/undirected data graphs with labeled/unlabeled edges. The query answering principles underlying the model are not tied to a specific data graph; instead, they are founded on properties inferable from the data model the data graph conforms to. We complement the work with a ranking model to deal with data approximations and with an efficient top-k retrieval algorithm which smartly accesses ad-hoc data structures and generates the most promising answers in an order correlated with the ranking measures. Experimental results show the good effectiveness and efficiency of our proposal on different real-world datasets.

2009 - Issues in Personalized Access to Multiversion XML Documents [Capitolo/Saggio]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo
abstract

In several application fields including legal and medical domains, XML documents are “versioned” along different dimensions of interest, whose nature depends on the application needs such as time, space and security. Specifically, temporal and semantic versioning is particularly demanding in a broad range of application domains where temporal versioning can be used to maintain histories of the underlying resources along various time dimensions, and semantic versioning can then be used to model limited applicability of resources to individual cases or contexts. The selection and reconstruction of the version(s) of interest for a user means the retrieval of those fragments of documents that match both the implicit and explicit user needs, which can be formalized as what we call personalization queries. In this chapter, we focus on the design and implementation issues of a personalization query processor. We consider different design options and, among them, we introduce an in-depth study of a native solution by showing, also through experimental evaluation, how some of the best performing technological solutions available today for XML data management can be successfully extended and optimally combined in order to support personalization queries.

2009 - Native Temporal Slicing Support for XML Databases [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Ronchetti, Enrico
abstract

XML databases, providing structural querying support, are becoming more and more popular. As we know, XML data may change over time, and providing efficient support to queries which also involve temporal aspects is still an open issue. In this paper we present our native Temporal XML Query Processor, which exploits an ad-hoc temporal indexing scheme relying on relational approaches and a technology supporting temporal slicing. As we show through an extensive experimental evaluation, our solution achieves good efficiency results, outperforming stratum-based solutions when dealing with time-related application requirements while continuing to guarantee good performance in traditional scenarios.

2009 - Paving the Way to an Effective and Efficient Retrieval of Data over Semantic Overlay Networks [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

In a Peer-to-Peer (P2P) system, a Semantic Overlay Network (SON) models a network of peers whose connections are influenced by the peers’ content, so that semantically related peers connect with each other. This is very common in P2P communities, where peers share common interests, and a peer can belong to more than one SON, depending on its own interests. Querying such systems is not an easy task: the retrieval of relevant data cannot rely on flooding approaches, which forward a query to the whole network. A way of selecting which peers are more likely to provide relevant answers is necessary to support more efficient and effective query processing strategies. This chapter presents a semantic infrastructure for routing queries effectively in a network of SONs. Peers are semantically rich, in that peers’ content is modelled with a schema on their local data, and peers are related to each other through semantic mappings defined between their own schemas. A query is routed through the network by means of a sequence of reformulations, according to the semantic mappings encountered in the routing path. As reformulations may lead to semantic approximations, we define a fully distributed indexing mechanism which summarizes the semantics underlying whole subnetworks, in order to locate the semantically best directions to forward a query to. In support of our proposal, we demonstrate through a rich set of experiments that our routing mechanism outperforms algorithms which are usually limited to knowledge of the peers directly connected to the querying peer, and that our approach is particularly successful in a SONs scenario.

2009 - Principles of Holism for Sequential Twig Pattern Matching [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

Modern applications face the challenge of dealing with structured and semi-structured data. They have to deal with complex objects, most of them presenting some kind of internal structure, which often forms a hierarchy. Though XML documents are the best known, chemical compounds, CAD drawings, web-sites and many other applications have to deal with similar problems. In such environments, ordered and unordered tree pattern matching are the fundamental search operations. One of the main thrusts of research activities for tree pattern matching is the class of holistic approaches. Their ultimate goal is to evaluate a query twig as a whole by relying on sequential access patterns and non-trivial auxiliary storage structures, typically stored in main memory. Based on the pre/post-order ranks of individual tree nodes, we establish strong theoretical bases as a foundation for correct and efficient holistic pattern matching algorithms. In particular, we define and prove sufficient and necessary conditions to minimize the amount of data retained in memory, thus introducing a correct and complete framework on which different holistic solutions can be compared. We also show how these rules can be applied for building algorithms for ordered and unordered tree-pattern matching. Thanks to the above theoretical achievements, each holistic algorithm gains in efficiency as it is directly implemented on the adopted numbering scheme, avoids expensive matching refinements and keeps memory requirements stable. An experimental analysis and comparison with previous approaches confirms the superiority of our approach, tested on synthetic as well as real-life data sets.
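The pre/post-order numbering mentioned above can be illustrated with a short Python sketch. It is not the holistic algorithm from the paper: it only shows the standard property that a node a is a proper ancestor of d exactly when a comes before d in pre-order and after it in post-order, together with a naive evaluation of a single ancestor-descendant twig edge. The Node class and the helper names (number, is_ancestor, precedes, ancestor_descendant_pairs) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of an ordered, labeled tree (e.g. an XML element)."""
    label: str
    children: list[Node] = field(default_factory=list)
    pre: int = -1   # pre-order rank, assigned by number()
    post: int = -1  # post-order rank, assigned by number()


def number(root: Node) -> list[Node]:
    """Assign pre- and post-order ranks in one depth-first visit.

    Returns the nodes in pre-order (document order)."""
    order: list[Node] = []
    post_counter = 0

    def visit(n: Node) -> None:
        nonlocal post_counter
        n.pre = len(order)
        order.append(n)
        for child in n.children:
            visit(child)
        n.post = post_counter
        post_counter += 1

    visit(root)
    return order


def is_ancestor(a: Node, d: Node) -> bool:
    # a contains d iff a starts earlier (pre-order) and finishes later (post-order)
    return a.pre < d.pre and d.post < a.post


def precedes(a: Node, b: Node) -> bool:
    # a comes before b in document order without containing it (ordered matching)
    return a.pre < b.pre and a.post < b.post


def ancestor_descendant_pairs(nodes: list[Node], anc: str, desc: str) -> list[tuple[int, int]]:
    """Naive evaluation of the twig edge anc//desc; returns (pre, pre) rank pairs."""
    return [
        (a.pre, d.pre)
        for a in nodes if a.label == anc
        for d in nodes if d.label == desc and is_ancestor(a, d)
    ]
```

For a tree book(title, chapter(title, section)), the ranks alone answer book//title with the pairs (0, 1) and (0, 3) and chapter//title with (2, 3), without following any parent or child pointer.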

2009 - Semantics-driven Approximate Query Answering on Graph Databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Villani, Giorgio
abstract

Several database application areas need to deal with graph-modeled datasets. The main features of these datasets are the largeness and the heterogeneity of the data, which make it impractical to answer exact queries. In this paper we present our recent research efforts in modeling flexible query answering capabilities in this context. Flexibility is captured by approximations both on the labels and on the structure of graph-based queries, by guaranteeing semantically meaningful relaxations only. In order to cope with the excess of results, we adapt a well-known top-k retrieval algorithm to our context. The good effectiveness and efficiency of our proposal are proved by an extensive experimental evaluation on different real world datasets.
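The abstract does not name the top-k algorithm it adapts. As an assumption, the sketch below uses Fagin's Threshold Algorithm, a common choice for this kind of problem, to show how ranked answers can be produced without scanning every candidate. Each input is a list of (answer, score) pairs sorted by decreasing score (hypothetically, one list per approximated query component), and the scan stops once the k-th best aggregated score reaches a threshold computed from the last scores read. The function name, the summing aggregation and the treatment of missing entries as 0.0 are illustrative choices, not details taken from the paper.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple


def threshold_topk(
    lists: Sequence[Sequence[Tuple[Hashable, float]]],
    k: int,
    agg: Callable[[Iterable[float]], float] = sum,
) -> List[Tuple[Hashable, float]]:
    """Threshold-Algorithm-style top-k over non-empty, descending-sorted score lists.

    Assumes non-negative scores, a monotone agg, and that an object
    missing from a list scores 0.0 in that list."""
    lookup = [dict(lst) for lst in lists]   # random access into each list
    seen: Dict[Hashable, float] = {}        # object -> aggregated score
    for depth in range(max(len(lst) for lst in lists)):
        frontier = []                       # last score read (sorted access) per list
        for lst in lists:
            if depth < len(lst):
                obj, score = lst[depth]
                frontier.append(score)
                if obj not in seen:
                    seen[obj] = agg(d.get(obj, 0.0) for d in lookup)
            else:
                frontier.append(0.0)        # exhausted list: unseen objects score 0.0 there
        threshold = agg(frontier)           # best score any unseen object could still reach
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                      # early stop: no unseen object can do better
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

With the lists [("a", 0.9), ("b", 0.8), ("c", 0.1)] and [("b", 0.9), ("c", 0.8), ("a", 0.2)], a top-1 request stops after reading only two entries per list and returns b with an aggregated score of about 1.7.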

2009 - Shaping Tomorrow Information Management, Today [Articolo su rivista]
Martoglia, Riccardo
abstract

The recent developments in computing power and telecommunications, and, in general, the advanced ICT (Information and Communication Technology) of the 20th century, have accelerated the use and value of Information in our society. Indeed, Information is the main value of the Information Society. In this respect, the World Wide Web, Peer-to-Peer networks, mobile devices and ubiquitous computing systems and sensors give us more and more interesting possibilities today; however, current research on the relevant technologies, structures and services is still not mature enough. Research at the Information Systems Group (ISGroup), inside the Information Engineering Department (DII) of the Modena and Reggio Emilia University, is focused on the design and development of new systems, algorithms and data structures for the access and management of Information. The group constantly devises and puts into practice, also by means of national and international research projects and collaborations, innovative solutions able to answer, both effectively and efficiently, increasingly complex Information needs in several 21st century applications.

2009 - X-SITER: Efficient XML Query Processing [Software]
Martoglia, Riccardo
abstract

The software includes twig query processing techniques allowing flexible methods of structural querying, both ordered (the order of the sibling nodes of a query matters) and unordered (the order of the sibling nodes is irrelevant). Further, it includes algorithms and data structures allowing efficient execution also on large amounts of data.

2008 - Boosting a Network of Semantic Peers [Relazione in Atti di Convegno]
S., Lodi; Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

In a Peer Data Management System (PDMS), semantic peers connect with each other through semantic mappings between their own schemas. Because of schema heterogeneity, due to peers’ autonomy with respect to data representation, querying a PDMS implies query reformulations across semantic mappings, possibly incurring semantic degradation due to the repeated approximations introduced by the traversal of long paths. The linkage closeness of semantically similar peers is thus a crucial issue. In this paper we present a strategy for the incremental maintenance of a flexible network organization for PDMSs that clusters together semantically related peers.

2008 - Building a PDMS Infrastructure for XML Data Sharing with SUNRISE [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

Semantic support for data representation, as well as a flexible machine-readable format, have made XML the de facto standard for the semantic interoperability of Internet applications. Its applicability is primarily evident in settings where actors are heterogeneous data sources which interact with each other for data sharing purposes. This is exactly the scenario envisioned by Peer Data Management Systems (PDMSs), where autonomous sources (peers) model their local data according to a schema, and are connected in a peer-to-peer network by means of pairwise semantic mappings between the peers' own schemas. One of the main challenges in such a semantically heterogeneous environment is concerned with query processing when dealing with the inherent semantic approximations occurring in the data. In this paper we present an instantiation of SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration) for XML data sources. SUNRISE is a complete PDMS infrastructure which extends each peer with functionalities for capturing the semantic approximation originating from schema heterogeneity and exploiting it for a semantically driven network organization and query routing.

2008 - Efficient and Effective Query Answering in a PDMS with SUNRISE [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

Peer Data Management Systems (PDMSs) have been recently proposed as an evolution of Peer-to-Peer (P2P) systems toward a more semantics-based description of peers’ contents and relationships. In a PDMS scenario a key challenge is query routing, i.e. the capability of selecting small subsets of semantically relevant peers to forward a query to. In this paper we demonstrate SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration), a complete infrastructure which supports an effective and efficient exploration of a PDMS network for query answering purposes. SUNRISE offers several routing policies designed around different performance priorities in order to minimize the information spanning over the network.

2008 - Ontology-Based Personalization of E-Government Services [Capitolo/Saggio]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo; Ronchetti, Enrico; M. R., Scalas; Tiberio, Paolo
abstract

While the World Wide Web user is suffering from the disease caused by information overload, for which personalization is one of the treatments which work, the citizen who gets ready to use the e-Government services which are made available on the Web is not immune from contagion. This seems a good reason to try to prescribe a personalization treatment also to the e-Government user. Hence, we introduce the design and implementation of Web information systems supporting personalized access to multi-version resources in an e-Government scenario. Personalization is supported by means of Semantic Web techniques and relies on an ontology-based profiling of users (citizens). Resources we consider are collections of norm documents (laws, decrees, regulations, etc.) in XML format but can also be generic Web pages and portals or e-Government transactional services. We introduce a reference infrastructure, describe the organization and present performance figures of a prototype system we have developed.

2008 - Semantic Peer, Here are the Neighbors You Want! [Relazione in Atti di Convegno]
S., Lodi; Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

Peer Data Management Systems (PDMSs) have been introduced as a solution to the problem of large-scale sharing of semantically rich data. A PDMS consists of semantic peers connected through semantic mappings. Querying a PDMS may lead to very poor results, because of the semantic degradation due to the approximations given by the traversal of the semantic mappings, thus leading to the problem of how to boost a network of mappings in a PDMS. In this paper we propose a strategy for the incremental maintenance of a flexible network organization that clusters together peers which are semantically related in Semantic Overlay Networks (SONs), while maintaining a high degree of node autonomy. Semantic features, a summarized representation of clusters, are stored in a “light” structure which effectively assists a newly entering peer when choosing its semantically closest overlay networks. Then, each peer is supported in the selection of its own neighbors within each overlay network according to two policies: Range-based selection and k-NN selection. For both policies, we introduce specific algorithms which exploit a distributed indexing mechanism for efficient network navigation. The proposed approach has been implemented in a prototype where its effectiveness and efficiency have been extensively tested.

2007 - Disambiguation of Structure-Based Information in the STRIDER System [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

We present the current version of STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web directories. It can be of support to the semantic-awareness of a wide range of applications, thanks to its novel and fully-automated disambiguation algorithms.

2007 - Efficient Management of Multi-version XML Documents for e-Government Applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; F., Grandi; M. R., Scalas
abstract

This paper describes our research activities in developing efficient systems for the management of multi-version XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and transaction times) were used to represent the evolution of norms in time and their resulting versioning and a stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version management system has migrated to a different architecture (“native” approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the stratum approach and a good scalability behaviour. Current work includes a more accurate modeling of the citizen’s ontology, which could also require a redesign of the document storage scheme, and the development of a complete infrastructure for the management of the citizen’s digital identity.

2007 - Native Temporal Slicing Support for XML Databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

XML databases, providing structural querying support, are becoming more and more popular. As we know, XML data may change over time and providing an efficient support to queries which also involve temporal aspects is still an open issue. In this paper we present our native Temporal XML Query Processor, which exploits an ad-hoc temporal indexing scheme relying on relational approaches and a technology supporting temporal slicing. As we show through an extensive experimental evaluation, our solution achieves good efficiency results, outperforming stratum-based solutions when dealing with time-related application requirements while continuing to guarantee good performance in traditional scenarios.

2007 - SRI@work: Efficient and Effective Routing Strategies in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

In recent years, information sharing has gained much benefit from the large diffusion of distributed computing, namely through P2P systems and, in line with the Semantic Web vision, through Peer Data Management Systems (PDMSs). In a PDMS scenario one of the most difficult challenges is query routing, i.e. the capability of selecting small subsets of semantically relevant peers to forward a query to. In this paper, we put to work the Semantic Routing Index (SRI) distributed mechanism we proposed in [6]. In particular, we present general SRI-based query execution models, designed around different performance priorities and minimizing the information spanning over the network. Starting from these models, we devise several SRI-enabled routing policies, characterized by different effectiveness and efficiency targets, and we test them extensively in ad-hoc PDMS simulation environments.

2007 - SUNRISE: Exploring PDMS Networks with Semantic Routing Indexes [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

We demonstrate SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration), a complete infrastructure supporting the construction of a PDMS semantic layer and providing a series of techniques that can be used for an effective and efficient exploration of a semantic network, for instance in a query answering setting.

2007 - Semantic Routing for Effective Search in Heterogeneous and Distributed Digital Libraries [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

Next generation Digital Libraries (DLs) will offer an entire ensemble of systems and services designed to help users to easily find and access the information they are looking for. However, much work is still required in order to achieve this vision. In this paper, we concentrate our attention on devising techniques allowing an effective routing of queries, which we think can be of the utmost importance in providing effective and efficient querying in heterogeneous and distributed DLs, identifying the best ways to navigate the available nodes and, thus, the documents (or their parts) which are most suitable to best answer the user needs. We describe a routing mechanism, which we call routing by mapping, in which the query is sent to the DL peers whose subnetworks best approximate the concepts required. To this end a distributed index mechanism is adopted, which we call Semantic Routing Index (SRI). We also present some exploratory experiments showing the effectiveness of the proposed approach.

2006 - A Native Extensible XML Query Processor Towards Efficient and Effective MPEG-7 Querying [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; M., Righini
abstract

In recent years the production of massive amounts of visual information has led to the arrival of very large multimedia Digital Libraries (DLs). The key to support efficient search and management operations in such repositories is to exploit metadata information for digital media, such as MPEG-7 based ones, which seem to be the most widely accepted. The underlying XML syntax, together with the high versatility of the provided constructs, make it easy to specify significant and complex queries, however executing them efficiently on huge quantities of data is not a trivial task. In this paper we provide an overview of the XSiter system, a native and extensible XML query processor providing very high performance in general XML querying settings and whose flexible architecture can be easily enhanced to better support the peculiarities of retrieving multimedia objects through MPEG-7 annotation metadata. Further, we consider possible "use-cases" and tasks related to multimedia and video DLs querying and management which our system can successfully accomplish.

2006 - An eGovernment system for temporal- and semantic-aware access to norms [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present the results of ongoing research involving the design and implementation, in an eGovernment scenario, of a semantic-aware system supporting efficient and personalized access to a multi-version repository of normative texts. The research activity is entitled “Semantic web techniques for the management of digital identity and the access to norms”. In the context of a complete and modular infrastructure, we defined a multi-version XML data model and developed a temporal and semantic XML query processor supporting both temporal versioning (essential in normative systems) and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions only containing provisions which are applicable to their personal case. The whole infrastructure, which we plan to complete in the near future, will integrate the querying component with several auxiliary services, including automatic citizen identification and classification and assisted update of the repository data.

2006 - EXTRA: a system for example-based translation assistance [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Nowadays we are witnessing the need to translate ever increasing quantities of texts, with an ever increasing quality. The expertise and skill of professional translators alone are not sufficient to achieve highly effective and efficient translation performance. The best way to translate very large quantities of documents, while ensuring optimal translation time and costs, is to exploit Example-Based Machine Translation (EBMT), which is devised in the aim of achieving better quality and quantity in less time, while preserving and treasuring the richness and accuracy that only human translation can achieve. In this paper we present EXTRA (EXample-based TRanslation Assistant), the EBMT system we have developed over the last few years to support the translation of texts written in Western languages. EXTRA is able to propose effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; the search is performed efficiently in large amounts of bilingual texts thanks to its advanced retrieval techniques. Furthermore, EXTRA does not use external knowledge requiring the intervention of users and is completely customizable and portable as it has been implemented on top of a standard DataBase Management System (DBMS). In the paper we also provide a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, in order to quantify the benefits offered by EXTRA assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models.

2006 - SRI: Exploiting Semantic Information for Effective Query Routing in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; W., Penzo
abstract

The huge amount of data available from Internet information sources has focused much attention on the sharing of distributed information through Peer Data Management Systems (PDMSs). In a PDMS, peers have a schema on their local data, and they are related to each other through semantic mappings that can be defined between their own schemas. Querying a PDMS means either flooding the network with messages to all peers or taking advantage of a routing mechanism to reformulate a query only on the best peers selected according to some given criteria. As reformulations may lead to semantic approximations, we deem that such approximations can be exploited for locating the semantically best directions to forward a query to. In this paper, we propose a distributed index mechanism where each peer is provided with a Semantic Routing Index (SRI) for routing queries effectively. A fuzzy-oriented model for SRI is presented where operations for creating and maintaining SRIs are well-founded. In addition, we show how SRIs can be employed in the query processing phase with the aim of reducing the space of reformulations. Finally, we conduct a series of meaningful experiments showing the effectiveness of the proposed approach.

2006 - STRIDER: a Versatile System for Structural Disambiguation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

We present STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web directories. The system performs high-quality fully-automated disambiguation by exploiting a novel and versatile structural disambiguation approach.

2006 - Semantic Query Routing Experiences in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; W., Penzo
abstract

Querying a PDMS means either flooding the network with messages to all peers or taking advantage of a routing mechanism to reformulate a query only on the best peers selected according to some given criteria. As reformulations may lead to semantic approximations, we deem that such approximations can be exploited for locating the semantically best directions to forward a query to. In this paper, we present our experiences in devising and testing a mechanism for effective query routing in a PDMS. In particular, we describe a distributed index mechanism where each peer is provided with a Semantic Routing Index (SRI) for routing queries effectively. We illustrate SRIs’ structure, their use and the framework we devised for their incremental update, then we provide an extensive evaluation of their effectiveness through a set of query routing experiments on a variety of scenarios. This work is partially supported by the PRIN WISDOM and FIRB NeP4B national projects.

2006 - Semantic Web Techniques for Personalization of eGovernment Services [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present the results of an ongoing research involving the design and implementation of systems supporting personalized access to multi-version resources in an eGovernment scenario. Personalization is supported by means of Semantic Web techniques and is based on an ontology-based profiling of users (citizens). Resources we consider are collections of norm documents in XML format but can also be generic Web pages and portals or eGovernment services. We introduce a reference infrastructure, describe the organization and present performance figures of a prototype system we have developed.

2006 - Supporting temporal slicing in XML databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

Nowadays XML is universally accepted as the standard for structural data representation; XML databases, which provide structural querying support, are thus becoming more and more popular. However, XML data changes over time, and efficiently supporting queries that also involve temporal aspects requires the tricky task of time-slicing the input data. In this paper we take up the challenge of providing a native and efficient solution for building an XML query processor that supports temporal slicing, thus dealing with non-conventional application requirements while continuing to guarantee good performance in traditional scenarios. Our contributions include a novel temporal indexing scheme relying on relational approaches and a technology supporting the time-slice operator.
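
As a rough illustration of what the time-slice operator computes (not of the paper's relational indexing scheme), the sketch below assumes that each XML element carries a hypothetical half-open validity interval in `start`/`end` attributes. It prunes the document tree to the parts valid in a query interval by plain recursive traversal.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical encoding: every element has a half-open validity interval
# [start, end) stored in the "start" and "end" attributes (e.g. years).
DOC = """
<law start="1990" end="9999">
  <article start="1990" end="2000">Old text</article>
  <article start="2000" end="9999">New text</article>
  <article start="1995" end="2005">Transitional text</article>
</law>
"""


def overlaps(elem: ET.Element, t1: int, t2: int) -> bool:
    """True if the element's validity [start, end) intersects [t1, t2)."""
    start, end = int(elem.get("start")), int(elem.get("end"))
    return start < t2 and t1 < end


def time_slice(elem: ET.Element, t1: int, t2: int) -> Optional[ET.Element]:
    """Return a pruned copy of elem keeping only the parts valid in [t1, t2)."""
    if not overlaps(elem, t1, t2):
        return None
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text
    for child in elem:
        kept = time_slice(child, t1, t2)
        if kept is not None:
            copy.append(kept)
    return copy


root = ET.fromstring(DOC.strip())
snapshot = time_slice(root, 2001, 2002)
print([a.text for a in snapshot])  # ['New text', 'Transitional text']
```

A naive traversal like this visits every element; avoiding that cost on large collections is precisely what a dedicated temporal index is for.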

2006 - Using Semantic Mappings for Query Routing in a PDMS Environment [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; Tiberio, Paolo; W., Penzo
abstract

In this paper we present the current achievements of our research activity in the WISDOM project, whose aim is the definition of intelligent techniques enabling effective and efficient information search in a distributed and decentralized PDMS scenario. We focus on the query routing problem and define a new routing mechanism, which we call routing by mapping, in which the query is sent to the peers whose subnetworks best approximate the required concepts. In order to select the best subnetworks, the peer receiving the query exploits information about the semantic approximation of the query concepts when moving towards each neighbour. This information is computed starting from the semantic mappings established with the peer's neighbours and is maintained in specifically devised data structures called Semantic Routing Indices (SRIs), for whose update we propose specific algorithms and protocols. The effectiveness of the achieved results has been experimentally proved through a series of exploratory tests.

2005 - Accesso Personalizzato a Documenti Multiversione per Applicazioni nel Settore dell’E-Government [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; E., Ronchetti; F., Grandi; M. R., Scalas
abstract

This work presents research activity concerning the development of prototype systems for the efficient management of multi-version XML documents in an e-Government scenario. The application goal of these systems is to allow citizens to access personalized versions of resources such as normative texts and information made available on the Web by Public Administrations. To represent the evolution of norms over time and the resulting “versioning”, four temporal dimensions were used, together with an additional semantic dimension representing the applicability of norms to different classes of citizens, according to their digital identity. The classification of citizens is based on the management of an ontology and on the adoption of Semantic Web techniques. The current implementation, an evolution of a “stratum” approach (developed on top of an RDBMS platform), is based on a “native” approach consisting of an XML query processor developed ad hoc. A preliminary experimental evaluation showed good levels of performance and scalability for the new system.

2005 - Efficient Management of Multi-Version XML Documents for eGovernment Applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; F., Grandi; M. R., Scalas
abstract

This paper describes our research activities in developing efficient systems for the management of multiversion XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and transaction times) were used to represent the evolution of norms in time and their resulting versioning and a stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version management system has migrated to a different architecture ("native" approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the stratum approach and a good scalability behaviour. Current work includes a more accurate modeling of the citizen’s ontology, which could also require a redesign of the document storage scheme, and the development of a complete infrastructure for the management of the citizen’s digital identity.

2005 - Enhanced access to eGovernment services: temporal and semantics-aware retrieval of norms [Relazione in Atti di Convegno]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti; M. R., Scalas; Tiberio, Paolo
abstract

In this paper, we summarize the results of an ongoing research involving the design and implementation of a multi-version repository of norm texts supporting efficient and personalized access in an eGovernment scenario. The research activity is entitled "Semantic web techniques for the management of digital identity and the access to norms". In the context of a complete and modular infrastructure, we defined a multiversion XML data model and developed an XML query processor supporting both temporal and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions only containing provisions which are applicable to their personal case. The whole infrastructure, which we plan to complete in the near future, will integrate the query answering component with several auxiliary services, including automatic citizen identification and classification and computer-aided update of the repository data.

2005 - Improving Semantic Awareness of Knowledge-based Applications through Structural Disambiguation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

In this paper, we summarize the features of the versatile disambiguation approach we recently presented. Its main aim is to make explicit the meaning of structure-based information such as XML schemas, XML document structures, web directories, and ontologies. It can support the semantic awareness of a wide range of applications, from schema matching and query rewriting to peer data management systems, from XML data clustering to ontology-based automatic annotation of web pages and query expansion. The effectiveness of the achieved results has been experimentally proved and is founded both on a flexible exploitation of the structure context, whose extraction can be tailored to the specific application needs, and on the information provided by commonly available thesauri such as WordNet. This work is partially supported by the Italian Council co-funded project WISDOM.

2005 - Personalized Access to Multi-Version Documents for E-Government Applications [Relazione in Atti di Convegno]
F., Grandi; M. R., Scalas; Mandreoli, Federica; Martoglia, Riccardo
abstract

In this paper we describe the design and implementation of two prototype systems for the efficient management of multi-version XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (validity, efficacy, transaction and publication times) were used to represent the evolution of norms in time and their resulting versioning and a “stratum” approach was used for its implementation on top of an object-relational DBMS. Recently, the multi-version management system has migrated to a different architecture (“native” approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the “stratum” approach and a good scalability behavior. This work has been supported by the MIUR-PRIN Project: “The European citizen in e-Governance: philosophical-juridical, legal, information and economic profiles”.

2005 - Personalized Access to Multi-version Norm Texts in an eGovernment Scenario [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Tiberio; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present some results of an ongoing research activity involving the design and implementation, in an eGovernment scenario, of a multi-version repository of norm texts supporting efficient and personalized access. In particular, we defined a multi-version XML data model supporting both temporal versioning, which is essential in normative systems, and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions containing only the provisions that are applicable to their personal case. We describe the organization and present preliminary performance figures of a prototype system we developed.
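
The sketch below illustrates how temporal and semantic versioning might combine when extracting a personalized norm version. It simplifies under explicit assumptions: a single validity interval instead of the four temporal dimensions mentioned in related abstracts, a toy citizen ontology encoded as a child-to-parent map, and hypothetical class and provision names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical citizen ontology: class -> parent class (None for the root).
ONTOLOGY: dict[str, Optional[str]] = {
    "Citizen": None,
    "Worker": "Citizen",
    "Unemployed": "Citizen",
    "PublicEmployee": "Worker",
    "PrivateEmployee": "Worker",
}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ONTOLOGY[cls]
    return False


@dataclass
class Provision:
    text: str
    valid_from: int  # inclusive
    valid_to: int    # exclusive
    applies_to: str  # ontology class the provision is applicable to


def personalized_version(provisions: list[Provision], at_time: int, citizen_class: str) -> list[str]:
    """Keep provisions valid at at_time (temporal versioning) and applicable
    to citizen_class or one of its superclasses (semantic versioning)."""
    return [
        p.text for p in provisions
        if p.valid_from <= at_time < p.valid_to and is_a(citizen_class, p.applies_to)
    ]


NORM = [
    Provision("Art. 1 (general rule)", 2000, 9999, "Citizen"),
    Provision("Art. 2 (workers)", 2000, 2010, "Worker"),
    Provision("Art. 2 (workers, amended)", 2010, 9999, "Worker"),
    Provision("Art. 3 (public employees)", 2005, 9999, "PublicEmployee"),
]

print(personalized_version(NORM, 2012, "PrivateEmployee"))
# ['Art. 1 (general rule)', 'Art. 2 (workers, amended)']
```

A provision addressed to a general class (here `Citizen`) reaches every more specific class, which is why the ontology lookup walks upwards from the citizen's own class.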

2005 - Personalized access to multi-version XML documents in an eGovernment scenario [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti

2005 - Text Clustering as a Mining Task [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

In this chapter we introduce readers to the various aspects of cluster analysis performed on textual data in a mining framework. We first provide a brief overview on the techniques and the background notions on general clustering. Then, we focus on the importance and on the goals of clustering in a text mining scenario, analyzing and describing the issues which are specific to this particular field. Effective information extraction from highly dimensional textual data, clustering algorithms specifically designed to efficiently work on very large unstructured and, possibly, hyperlinked data sets, and comprehension of the clustering output are among the covered topics.

2005 - Versatile structural disambiguation for semantic-aware applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure-based information such as XML schemas, XML document structures, web directories, and ontologies. It can support the semantic awareness of a wide range of applications, from schema matching and query rewriting to peer data management systems, from XML data clustering to ontology-based automatic annotation of web pages and query expansion. The effectiveness of the achieved results has been experimentally proved and is founded both on a flexible exploitation of the structure context, whose extraction can be tailored to the specific application needs, and on the information provided by commonly available thesauri such as WordNet.

2005 - XML-S3MART: Similarity Search on Semi-Structured Data [Software]
Martoglia, Riccardo
abstract

The software addresses the problem of approximate search on XML data coming from heterogeneous sources. In particular, it targets advanced search engines for digital libraries containing semi-structured information that describes the same reality but comes from different sources and therefore satisfies different structural requirements. In this context, it includes techniques for automatically rewriting the queries submitted by users (query rewriting) with respect to every document of the digital library that can be useful to satisfy their information need.

2004 - A Document Comparison Scheme for Secure Duplicate Detection [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

The ever-growing amounts of textual information coming from different sources have fostered the development of digital libraries, making digital contents readily accessible but also easy for malicious users to plagiarize, thus giving rise to security problems. In this paper, we introduce a duplicate detection scheme that is able to determine, with particularly high accuracy, how similar one document is to another. Our pairwise document comparison scheme detects the resemblance between the contents of documents by considering document chunks, representing contexts of words selected from the text. The resulting duplicate detection technique offers a good level of security in the protection of intellectual property, while improving the availability of the data stored in the digital library and the correctness of the search results. Finally, the paper addresses efficiency and scalability issues by introducing new data reduction techniques.
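
To illustrate chunk-based comparison in general terms, the following sketch uses classic word shingling. Each document becomes the set of its w-word windows, and resemblance is the Jaccard coefficient of the two sets. This standard technique is a stand-in for illustration: the paper's own chunking strategy, similarity measure, and data reduction techniques are not reproduced, and the window size of 3 is an arbitrary assumption.

```python
import re


def chunks(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of w-word windows ("chunks") of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts become a single chunk (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of the chunk sets of two documents, in [0, 1]."""
    ca, cb = chunks(a, w), chunks(b, w)
    if not ca and not cb:
        return 1.0
    return len(ca & cb) / len(ca | cb)


a = "The quick brown fox jumps over the lazy dog"
b = "The quick brown fox leaps over the lazy dog"
print(resemblance(a, b))  # 0.4
```

A single changed word breaks every window that contains it (here 3 of 7 chunks per document), which is why chunk-based measures penalize local edits more sharply than plain word overlap does.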

2004 - Approximate Query Answering for a Heterogeneous XML Document Base [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

In this paper, we deal with the problem of effective search and query answering in heterogeneous web document bases containing XML documents whose schemas are available. We propose a new solution for the structural approximation of the submitted queries which, in a preliminary schema matching process, automatically identifies the similarities between the involved schemas and uses them in the query processing phase, when a query written on a source schema is automatically rewritten in order to be compatible with the other useful XML documents. The proposed approach has been implemented in a web service and can deliver middleware rewriting services in any open-architecture XML repository system offering advanced search capabilities.
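
A minimal sketch of the rewriting step follows. It assumes schema matching has already produced a one-to-one mapping from source to target element names, each with a similarity score. The `MAPPING` contents, the restriction to simple child-axis paths, and scoring a rewriting by the product of the step similarities are all illustrative assumptions, not the approach's actual rules.

```python
from typing import Optional

# Hypothetical output of a schema matching step: for each source element,
# the best-matching target element and a similarity score in [0, 1].
MAPPING: dict[str, tuple[str, float]] = {
    "book": ("publication", 0.8),
    "author": ("writer", 0.9),
    "title": ("title", 1.0),
}


def rewrite(path: str, mapping: dict[str, tuple[str, float]]) -> Optional[tuple[str, float]]:
    """Rewrite a simple child-axis path query onto the target schema.

    Returns the rewritten path and a score (the product of the step
    similarities), or None if some step has no match in the target schema.
    """
    steps = [s for s in path.split("/") if s]
    rewritten, score = [], 1.0
    for step in steps:
        if step not in mapping:
            return None  # the query cannot be answered on the target documents
        target, sim = mapping[step]
        rewritten.append(target)
        score *= sim
    return "/" + "/".join(rewritten), score


print(rewrite("/book/author", MAPPING))  # path '/publication/writer', score about 0.72
```

Keeping a score alongside each rewritten query lets results from different target schemas be ranked by how approximate the rewriting was.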

2004 - EXTRA: Example Based Machine Translation [Software]
Martoglia, Riccardo
abstract

Developed in collaboration with the LOGOS Group, the software belongs to the EBMT (Example-Based Machine Translation) field, in which approximate matching techniques are applied to sentences, also taking into account additional problems related to multilingualism. It also extends syntactic similarity to the semantic field by means of disambiguation techniques based on WordNet.

2004 - Exploiting related digital library corpora with query rewriting [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo
abstract

In this paper, we present the preliminary results of the ongoing research activity we are carrying out in the context of approximate XML query answering when the schemas of the XML documents are available. The method we propose involves a preliminary schema matching process, which automatically identifies the semantic and structural similarities between the schema elements to be used in the subsequent operation of query rewriting, in which a query written on a source schema is automatically rewritten in order to be compatible with the other useful XML documents. The proposed approach has been implemented in a web service, named XML S3MART, which is part of the open architecture proposed in the ongoing Italian CNR co-funded ECD Project.

2004 - Tree Signatures and Unordered XML Pattern Matching [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient algorithm is proposed and its implementation tested on real-life data collections.
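
In the tree signature representation, each node is listed in preorder together with its postorder rank. For nodes at preorder positions i < j, node j is a descendant of node i exactly when its postorder rank is smaller. The sketch below builds such a signature and answers a simplified unordered twig: one ancestor label plus a set of descendant labels, matched in any order. The matching loop is a naive quadratic scan written for clarity, not the efficient algorithm the paper proposes, and it does not require distinct nodes for repeated query labels.

```python
# A tree is represented as (label, [children]).
Tree = tuple[str, list]


def signature(tree: Tree) -> list[tuple[str, int]]:
    """Tree signature: (label, postorder rank) pairs listed in preorder."""
    sig: list[tuple[str, int]] = []
    counter = [0]  # running postorder rank (1-based)

    def visit(node: Tree) -> None:
        label, children = node
        slot = len(sig)
        sig.append((label, -1))  # placeholder until the postorder rank is known
        for child in children:
            visit(child)
        counter[0] += 1
        sig[slot] = (label, counter[0])

    visit(tree)
    return sig


def is_descendant(sig: list[tuple[str, int]], i: int, j: int) -> bool:
    """Node at preorder position j descends from node i iff i < j and post(j) < post(i)."""
    return i < j and sig[j][1] < sig[i][1]


def match_unordered_twig(sig: list[tuple[str, int]], root_label: str, desc_labels: list[str]) -> list[int]:
    """Preorder positions of root_label nodes that have a descendant for every
    label in desc_labels, regardless of the order in which those appear."""
    hits = []
    for i, (label, _) in enumerate(sig):
        if label != root_label:
            continue
        below = {sig[j][0] for j in range(i + 1, len(sig)) if is_descendant(sig, i, j)}
        if all(lbl in below for lbl in desc_labels):
            hits.append(i)
    return hits


lib = ("lib", [("book", [("title", []), ("author", [])]),
               ("book", [("author", [])])])
sig = signature(lib)
print(match_unordered_twig(sig, "book", ["author", "title"]))  # [1]
```

The query lists `author` before `title` while the document stores them the other way round, yet the first `book` still matches: that order independence is what "unordered" pattern matching refers to.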

2004 - Unordered XML Pattern Matching with Tree Signatures [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient technique is proposed and its implementation tested on real-life data collections.

2003 - DANCER: Similarity Search on Plain Text [Software]
Martoglia, Riccardo
abstract

The software includes techniques for accessing textual data, based both on syntactic and on semantic analysis. In particular, it exploits similarity search techniques that go beyond simple exact search, together with syntactic similarity metrics suitable for textual sequences of any type, i.e. sequences of words (phrases) or generic sequences of symbols (such as genetic codes).

2003 - Exploiting multi-lingual text potentialities in EBMT systems [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Translating documents from a source to a target language is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream. Among the several types of approaches in Machine Translation (MT), one of the most promising paradigms is Example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar source-language material into the target language. In this paper we introduce EXTRA (EXample-based TRanslation Assistant), a complete EBMT system that exploits some innovative ideas in information retrieval and multilingual text management to effectively and efficiently extract useful suggestions from past translations and present them to the translator. This work has been developed as a joint work with the LOGOS group, a worldwide leader in multilingual document translation.

2003 - Un Metodo per il Riconoscimento di Duplicati in Collezioni di Documenti [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Recent advances in computing power and telecommunications have created the right conditions for the global spread of huge amounts of electronic information and of new tools for analyzing their content, raising problems of information overload and, in particular, of duplicate detection. Duplicates, that is, very similar documents containing approximately the same information, degrade the effectiveness and efficiency of searches and often also constitute copyright violations. In this paper we introduce DANCER (Document ANalysis and Comparison ExpeRt), a complete duplicate detection system that exploits innovative ideas from information retrieval to identify duplicate documents, using algorithms and similarity measures that are new to this field and fine-grained enough to achieve good effectiveness in most applications. In addition, the system proposes several new data reduction techniques that reduce both execution time and the space required to store the data, without compromising the good quality of the results.

2002 - A syntactic approach for searching similarities within sentences [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Textual data is the main electronic form of knowledge representation. Sentences, meant as logical units of meaningful word sequences, can be considered its backbone. In this paper, we propose a solution based on a purely syntactic approach for searching similarities within sentences, named approximate sub2sequence matching. Since this process is very time-consuming, efficiency in retrieving the most similar parts available in large repositories of textual data is ensured by new filtering techniques. As far as system design is concerned, we chose a solution that allows us to deploy approximate sub2sequence matching without changing the underlying database.
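
The core idea of finding the part of a sentence closest to a query can be illustrated with classical approximate substring matching (Sellers-style dynamic programming) applied at the word level. The edit distance is computed as usual, except that a match may start and end anywhere in the longer sentence. This textbook technique is shown only to convey the general flavour of the approach; the paper's own sub2sequence definition and its filtering techniques are not reproduced.

```python
def best_substring_distance(query: list[str], text: list[str]) -> int:
    """Minimum word-level edit distance between query and any contiguous
    part of text (Sellers-style approximate substring matching)."""
    m, n = len(query), len(text)
    # prev[j]: best distance for the query prefix processed so far, for a
    # match ending at text position j. Row 0 is all zeros because a match
    # may start anywhere in the text at no cost.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # i query words against an empty text part
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # query word left unmatched
                          curr[j - 1] + 1,     # extra word in the text
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return min(prev)  # a match may end anywhere in the text


query = "the cat sat on the mat".split()
text = "yesterday the black cat sat on the mat quietly".split()
print(best_substring_distance(query, text))  # 1
```

Here the closest part of the text is "the black cat sat on the mat", one insertion away from the query. Filtering techniques such as those the abstract mentions are needed because this quadratic computation becomes expensive over large sentence repositories.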

2002 - Searching Similar (Sub)Sentences for Example-Based Machine Translation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Translation is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream; in recent years, research in this field has attracted growing interest, making some forms of Machine Translation (MT) a reality. Among the several types of approaches in MT, one of the most promising paradigms is MAHT and, in particular, Example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar source-language sentences into the target language. The basic premise is that, if a previously translated sentence occurs again, the same translation is likely to be correct. In this paper, we propose a solution based on a purely syntactic approach for searching similar sentences and parts of them in an EBMT system; the underlying similarity measure is based on the similarity between sequences of terms, so that the sentences closest to a given one are those that preserve most of its original form and contents. The system efficiently retrieves and ranks the most similar sentences available and, when no useful suggestion exists, proceeds with the retrieval of similar parts. We opted for a design that requires minimal changes to existing databases and whose similarity measure and search algorithms are completely independent of the languages involved. This work has been developed jointly with LOGOS S.p.A., a worldwide leader in multilingual document translation.

Università degli studi di Modena e Reggio Emilia
