Federica MANDREOLI - UniMoRe staff

Federica MANDREOLI

Full Professor
Department of Physics, Informatics and Mathematics, former Mathematics building

Publications

2023 - A Machine Learning Approach to Predict Weight Change in ART-Experienced People Living with HIV [Journal article]
Motta, F.; Milic, J.; Gozzi, L.; Belli, M.; Sighinolfi, L.; Cuomo, G.; Carli, F.; Dolci, G.; Iadisernia, V.; Burastero, G.; Mussini, C.; Missier, P.; Mandreoli, F.; Guaraldi, G.

Introduction: The objective of the study was to develop machine learning (ML) models that predict the percentage weight change in each interval of time in antiretroviral therapy-experienced people living with HIV. Methods: This was an observational study that comprised consecutive people living with HIV attending Modena HIV Metabolic Clinic with at least 2 visits. Data were partitioned in an 80/20 training/test set to generate 10 progressively parsimonious predictive ML models. Weight gain was defined as any weight change >5% at the next visit. SHapley Additive exPlanations values were used to quantify the positive or negative impact of any single variable included in each model on the predicted weight changes. Results: A total of 3,321 patients generated 18,322 observations. At the last observation, the median age was 50 years and 69% of patients were male. Model 1 (the only one including body composition assessed with dual-energy x-ray absorptiometry) had an accuracy greater than 90%. This model could predict weight at the next visit with an error of <5%. Conclusions: ML models with the inclusion of body composition and metabolic and endocrinological variables had an excellent performance. The parsimonious models available in standard clinical evaluation are insufficient to obtain reliable prediction, but are good enough to predict who will not experience weight gain.
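As a rough illustration of the setup this abstract describes, the Python sketch below computes the percentage weight change between two visits, applies the >5% weight-gain threshold, and partitions observations 80/20. The function names, the seeded random shuffle, and the sample weights are assumptions made for illustration only; the abstract does not say how the partition was drawn (for instance, whether it was grouped by patient).

```python
import random


def percent_weight_change(weight_now_kg, weight_next_kg):
    """Percentage change in weight between the current and the next visit."""
    return 100.0 * (weight_next_kg - weight_now_kg) / weight_now_kg


def is_weight_gain(weight_now_kg, weight_next_kg, threshold_pct=5.0):
    """Weight gain as defined in the abstract: a change above 5% at the next visit."""
    return percent_weight_change(weight_now_kg, weight_next_kg) > threshold_pct


def split_80_20(observations, seed=0):
    """Shuffle a copy of the observations and split it 80/20 (assumed random split)."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (current, next) weights in kg for four visit pairs.
visit_pairs = [(70.0, 74.0), (80.0, 81.0), (65.0, 69.0), (90.0, 85.0)]
labels = [is_weight_gain(now, nxt) for now, nxt in visit_pairs]

# Ten placeholder observation ids split into training and test sets.
train_ids, test_ids = split_80_20(range(10))
```

A real pipeline with repeated visits per patient would probably want to keep all visits of one patient on the same side of the split; that choice is not covered by the abstract.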

2023 - Automated Knowledge Graph Completion for Natural Language Understanding: Known Paths and Future Directions [Conference proceedings paper]
Buzzega, G.; Guidetti, V.; Mandreoli, F.; Mariotti, L.; Belli, A.; Lombardi, P.

Knowledge Graphs (KGs) are large collections of structured data that can model real world knowledge and are important assets for the companies that employ them. KGs are usually constructed iteratively and often show a sparse structure. Also, as knowledge evolves, KGs must be updated and completed. Many automatic methods for KG Completion (KGC) have been proposed in the literature to reduce the costs associated with manual maintenance. Motivated by an industrial case study aiming to enrich a KG specifically designed for Natural Language Understanding tasks, this paper presents an overview of classical and modern deep learning completion methods. In particular, we delve into Large Language Models (LLMs), which are the most promising deep learning architectures. We show that their applications to KGC are affected by several shortcomings, namely they neglect the structure of KG and treat KGC as a classification problem. Such limitations, together with the brittleness of the LLMs themselves, stress the need to create KGC solutions at the interface between symbolic and neural approaches and lead to the way ahead for future research in intelligible corpus-based KGC.

2023 - Death After Liver Transplantation: Mining Interpretable Risk Factors for Survival Prediction [Conference proceedings paper]
Guidetti, V.; Dolci, G.; Franceschini, E.; Bacca, E.; Burastero, G. J.; Ferrari, D.; Serra, V.; Di Benedetto, F.; Mussini, C.; Mandreoli, F.

This study introduces a novel approach to mine risk factors for short-term death after liver transplantation (LT). The method outputs intelligible survival models by combining Cox's regression with a genetic programming technique known as multi-objective symbolic regression (MOSR). We consider 485 Electronic Health Records (EHRs) of patients who underwent LT, containing information on hospitalization and preoperative conditions, with a focus on infections and colonizations by multi-resistant Gram-negative bacteria. We evaluate MOSR outcomes against several performance metrics and demonstrate that they are well-calibrated, predictive, safe, and parsimonious. Finally, we select the most promising post-LT early survival risk score based on information criteria, performance, and out-of-distribution safety. Validating this technique at a multicenter level could improve service pipeline logistics through a trustworthy machine-learning method.

2022 - A Machine learning approach to predict Weight change in ART experienced PLWH [Presentation]
Motta, Federico; Milić, Jovana; Barbieri, Sara; Gozzi, Licia; Aprile, Emanuele; Belli, Michela; Venuta, Maria; Cuomo, Gianluca; Carli, Federica; Dolci, Giovanni; Iadisernia, Vittorio; Burastero, Giulia; Mussini, Cristina; Mandreoli, Federica; Guaraldi, Giovanni

2022 - A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario [Journal article]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela

Museums are embracing social technologies in an attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts compete every day for visibility in terms of likes and shares, and very little research has focused on museum communication to identify best practices. In this article, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help to enhance the message and increase the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.

2022 - Data-driven, AI-based clinical practice: experiences, challenges, and research directions [Conference proceedings paper]
Ferrari, Davide; Mandreoli, Federica; Motta, Federico; Missier, Paolo

Clinical practice is evolving rapidly, away from the traditional but inefficient detect-and-cure approach, and towards a Preventive, Predictive, Personalised and Participative (P4) vision that focuses on extending people’s wellness state. This vision is increasingly data-driven, AI-based, and is underpinned by many forms of "Big Health Data" including periodic clinical assessments and electronic health records, but also using new forms of self-assessment, such as mobile-based questionnaires and personal wearable devices. Over the last few years, we have been conducting a fruitful research collaboration with the Infectious Disease Clinic of the University Hospital of Modena having the main aim of exploring specific opportunities offered by data-driven AI-based approaches to support diagnosis, hospital organization and clinical research. Drawing from this experience, in this paper we provide an overview of the main research challenges that need to be addressed to design and implement data-driven healthcare applications. We present concrete instantiations of these challenges in three real-world use cases and summarise the specific solutions we devised to address them and, finally, we propose a research agenda that outlines the future of research in this field.

2022 - Machine learning algorithm to predict >5% Weight Gain in PWH switching to InSTI [Abstract in conference proceedings]
Guaraldi, Giovanni; Motta, Federico; Milić, Jovana; Barbieri, Sara; Gozzi, Licia; Aprile, Emanuele; Belli, Michela; Venuta, Maria; Cuomo, Gianluca; Carli, Federica; Dolci, Giovanni; Iadisernia, Vittorio; Burastero, Giulia; Mussini, Cristina; Mandreoli, Federica

2022 - Machine learning algorithm to predict >5% weight gain in PWH switching to INSTI [Poster]
Guaraldi, Giovanni; Motta, Federico; Milić, Jovana; Barbieri, Sara; Gozzi, Licia; Aprile, Emanuele; Belli, Michela; Venuta, Maria; Cuomo, Gianluca; Carli, Federica; Dolci, Giovanni; Iadisernia, Vittorio; Burastero, Giulia; Mussini, Cristina; Mandreoli, Federica

Background: Weight gain (WG) is a well-described phenomenon in PWH starting or switching ART. Machine learning (ML) methods are a tool of P4 medicine (Predictive, Preventive, Personalized & Participatory) and can generate models to identify patients at risk of WG. The objective was to develop an ML algorithm that predicts a 9-month WG≥5% in PLWH switching to InSTI with/without TAF. Methods: This was an observational study that comprised ART-experienced PWH attending Modena HIV metabolic clinic from 2004 to 2020. The patients' medical, HIV and ART data were partitioned in an 80/20 training/test set to generate predictive models. An ML model was used to leverage a hybrid approach where clinical expertise is applied along with data-driven analysis. The study outcome was the prediction at 9 months of weight change with a cut-off of 5%: at any patient visit (model 1) and in the subset of PWH switching to InSTI with/without TAF (model 2). The 9-month prediction horizon was chosen as the minimum time occurring between any two given visits in 95% of cases. A robust implementation of linear regressor algorithms was able to predict weight gain/loss while tolerating missing data. Intelligible explanations were obtained through Shapley Additive exPlanations values (SHAP), which quantified the positive or negative impact of each variable included in each model on the predicted outcome. A measure of effectiveness (E-measure) was chosen as a performance metric, because unlike accuracy it can penalize errors, particularly underestimation ones. Results: A total of 2817 patients contributed to generate 10877 observations, which allowed construction of 2 predictive models based on 44 variables including anthropometric, HIV and laboratory biomarkers. At last observation median age was 51 years (IQR 11); 70% were male. Median CD4 nadir was 200 cells/μL (IQR 217), current CD4 was 659 cells/μL (IQR 372), 97% had undetectable VL and time since HIV diagnosis was 20 years (IQR 13). Median BMI was 23.4 (IQR 4.5) and 5.8% had obesity. The highest ranked variables used to train the models were weight at time of prediction and the ones depicted in the figure. Model 1 had accuracy of 84.4% and 83.9% E-measure; model 2 had accuracy of 84.4% and 86.4% E-measure. Conclusion: We developed an ML tool with a remarkable E-measure that may assist clinicians in decision-making and shift HIV care towards a P4 medicine. Immune-metabolic variables were more relevant than ART switching in the prediction of WG.

2022 - Machine learning algorithm to predict weight change in ART experienced PWH [Abstract in conference proceedings]
Motta, Federico; Milić, Jovana; Barbieri, Sara; Gozzi, Licia; Aprile, Emanuele; Belli, Michela; Venuta, Maria; Cuomo, Gianluca; Carli, Federica; Dolci, Giovanni; Iadisernia, Vittorio; Burastero, Giulia; Mussini, Cristina; Mandreoli, Federica; Guaraldi, Giovanni

2022 - Multi-Objective Symbolic Regression for Data-Driven Scoring System Management [Conference proceedings paper]
Ferrari, D.; Guidetti, V.; Mandreoli, F.

Scores are mathematical combinations of elementary indicators (EIs) widely used to measure complex phenomena. Once the theoretical framework is defined, score construction requires a method to aggregate EIs. Aggregation is usually chosen among known methodologies, fixing its shape through a trial-and-error approach. Only then are the predictive power, the distribution of the index, and its ability to stratify the population measured. In this paper, we propose a novel data-driven approach that generates analytic aggregation methods relying on multi-objective symbolic regression. We translate the properties that the index must exhibit into optimization goals so that optimal index candidates replicate target variables, data balancing, and stratification. We run experiments on real data sets to solve three main score management problems: data-driven score simplification, generation, and combination. The results obtained show the effectiveness and robustness of the proposed approach.
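Multi-objective optimization generally returns a set of non-dominated trade-offs rather than a single best candidate. The Python sketch below shows that selection step in isolation, assuming every objective is minimised. The two objectives used here (prediction error and expression size) and the candidate labels are hypothetical stand-ins; the abstract names its own goals (replicating target variables, data balancing, stratification) without giving formulas, and no symbolic regression is performed in this example.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates whose objective vectors no other candidate dominates."""
    return [
        (name, objectives)
        for name, objectives in candidates
        if not any(dominates(other, objectives) for _, other in candidates)
    ]


# Hypothetical candidate scores: (label, (prediction error, expression size)).
candidates = [
    ("f1", (0.10, 12)),
    ("f2", (0.15, 5)),
    ("f3", (0.20, 9)),  # dominated by f2: higher error and larger expression
    ("f4", (0.30, 3)),
]
front = pareto_front(candidates)
front_names = [name for name, _ in front]
```

The final choice among the candidates left on the front still needs a separate criterion, which this sketch does not attempt to model.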

2022 - Real-world data mining meets clinical practice: Research challenges and perspective [Journal article]
Mandreoli, Federica; Ferrari, Davide; Guidetti, Veronica; Motta, Federico; Missier, Paolo

As Big Data Analysis meets healthcare applications, domain-specific challenges and opportunities materialize in all aspects of data science. Advanced statistical methods and Artificial Intelligence (AI) on Electronic Health Records (EHRs) are used both for knowledge discovery purposes and clinical decision support. Such techniques enable the emerging Predictive, Preventative, Personalized, and Participatory Medicine (P4M) paradigm. Working with the Infectious Disease Clinic of the University Hospital of Modena, Italy, we have developed a range of Data-Driven (DD) approaches to solve critical clinical applications using statistics, Machine Learning (ML) and Big Data Analytics on real-world EHRs. Here, we describe our perspective on the challenges we encountered. Some are connected to medical data and their sparse, scarce, and unbalanced nature. Others are bound to the application environment, as medical AI tools can affect people's health and life. For each of these problems, we report some available techniques to tackle them, present examples drawn from our experience, and propose which approaches, in our opinion, could lead to successful real-world, end-to-end implementations. DESY report number: DESY-22-153.

2022 - The interplay of post-acute COVID-19 syndrome and aging: a biological, clinical and public health approach [Journal article]
Guaraldi, Giovanni; Milic, Jovana; Cesari, Matteo; Leibovici, Leonard; Mandreoli, Federica; Missier, Paolo; Rozzini, Renzo; Cattelan, Anna Maria; Motta, Federico; Mussini, Cristina; Cossarizza, Andrea

The post-acute COVID-19 syndrome (PACS) is characterized by the persistence of fluctuating symptoms over three months from the onset of the possible or confirmed COVID-19 acute phase. Current data suggests that at least 10% of people with previously documented infection may develop PACS, and a prevalence of up to 50-80% is reported among survivors after hospital discharge. This viewpoint will discuss various aspects of PACS, particularly in older adults, with a specific hypothesis to describe PACS as the expression of a modified aging trajectory induced by SARS-CoV-2. This hypothesis will be argued from biological, clinical and public health viewpoints, addressing three main questions: (i) do SARS-CoV-2-induced alterations in aging trajectories play a role in PACS?; (ii) do people with PACS face immuno-metabolic derangements that lead to increased susceptibility to age-related diseases?; (iii) is it possible to restore the healthy aging trajectory followed by the individual before COVID-19? A particular focus will be given to the well-being of people with PACS that could be assessed by the intrinsic capacity model and support the definition of the healthy aging trajectory.

2022 - Visual Exploratory Data Analysis for Copy Number Variation Studies in Biomedical Research [Journal article]
Vischioni, C.; Bove, F.; Mandreoli, F.; Martoglia, R.; Pisi, V.; Taccioli, C.

The study of Copy Number Variations (CNVs) has recently emerged as a hot topic in biomedical cancer research. While different data sources, websites, and tools concerning genomic CNVs have been made publicly available, CNV data is still a largely unexplored source of biological information, due to the limitations of currently available analysis tools. In this respect, we propose a novel platform, named VarNuCopy, that overcomes such limitations by pursuing the core principles of Exploratory Data Analysis (EDA) in the context of Copy Number Variation (CNV) data. The platform has been made publicly available as a web application, and is, to the best of our knowledge, the first tool enabling visual, interactive exploration and analysis of the CNV landscape of multiple species. Through novel client and server-side optimizations inspired by scalable data science, VarNuCopy implements a comprehensive and efficient data exploration solution that empowers researchers to easily recognize complex trends and patterns within a huge amount of data concerning CNVs, and to identify new target genes that might function as tumor suppressors and oncogenes.

2022 - Work Datafication and Digital Work Behavior Analysis as a Source of HRM Insights [Book chapter/Essay]
Fabbri, T.; Scapolan, A.; Bertolotti, F.; Mandreoli, F.; Martoglia, R

The digital transformation of organizations is boosting workplace networking and collaboration while making it “observable” with unprecedented timeliness and detail. However, the informational and managerial potential of work datafication is still largely unutilized in Human Resource Management (HRM) and its benefits, both at the individual and the organizational level, remain largely unexplored. Our research focuses on the relationship between digitally tracked work behaviors and employee attitudes and, in so doing, it explores work datafication as a source of data driven HRM policies and practices. As a chapter of a wider research program, this paper presents some data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data, in search for promising correlations between behavioral and relational (digital) work patterns and employee attitudes. To this end, the digital actions performed by 106 employees in one year are transformed into a graph representation in order to analyze data under two different points of view: the individual (behavioral) perspective, according to the user who performed the action and the performed action, and the social (relational) perspective, making explicit the interactions between users and the objects of their actions. Different employees’ rankings are thus derived and correlated with their attitudes. Finally, we discuss the obtained results and their implications in terms of People Analytics and data driven HRM.
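The two perspectives described in this abstract can be pictured with a small Python sketch: per-user action counts for the behavioral view, and a user-object graph for the relational view, from which one possible ranking is derived. The event format, the user and object names, and the "reach" ranking (number of distinct colleagues touching the same objects) are assumptions made for illustration; the chapter's actual graph model and ranking measures are not given in the abstract.

```python
from collections import Counter, defaultdict


def build_views(events):
    """Build both views from (user, action, target) events.

    Returns per-user action counts (behavioral view) and an undirected
    user-object graph stored as adjacency sets (relational view).
    """
    action_counts = defaultdict(Counter)
    graph = defaultdict(set)
    for user, action, target in events:
        action_counts[user][action] += 1
        graph[("user", user)].add(("object", target))
        graph[("object", target)].add(("user", user))
    return action_counts, graph


def rank_users_by_reach(graph):
    """Rank users by how many distinct other users share an object with them."""
    reach = {}
    for node, objects in graph.items():
        kind, name = node
        if kind != "user":
            continue
        peers = set()
        for obj in objects:
            peers |= graph[obj]
        peers.discard(node)
        reach[name] = len(peers)
    # Highest reach first; ties broken alphabetically for a stable order.
    return sorted(reach.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical collaboration-software events.
events = [
    ("ann", "post", "doc1"),
    ("bob", "comment", "doc1"),
    ("bob", "post", "doc2"),
    ("cat", "like", "doc2"),
    ("ann", "post", "doc3"),
]
action_counts, graph = build_views(events)
ranking = rank_users_by_reach(graph)
```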

2022 - miRNAs Copy Number Variations Repertoire as Hallmark Indicator of Cancer Species Predisposition [Journal article]
Vischioni, Chiara; Bove, Fabio; De Chiara, Matteo; Mandreoli, Federica; Martoglia, Riccardo; Pisi, Valentino; Liti, Gianni; Taccioli, Cristian

Aging is one of the hallmarks of multiple human diseases, including cancer. We hypothesized that variations in the number of copies (CNVs) of specific genes may protect some long-living organisms theoretically more susceptible to tumorigenesis from the onset of cancer. Based on the statistical comparison of gene copy numbers within the genomes of both cancer-prone and -resistant species, we identified novel gene targets linked to tumor predisposition, such as CD52, SAT1 and SUMO. Moreover, considering their genome-wide copy number landscape, we discovered that microRNAs (miRNAs) are among the most significant gene families enriched for cancer progression and predisposition. Through bioinformatics analyses, we identified several alterations in miRNAs copy number patterns, involving miR-221, miR-222, miR-21, miR-372, miR-30b, miR-30d and miR-31, among others. Therefore, our analyses provide the first evidence that an altered miRNAs copy number signature can statistically discriminate species more susceptible to cancer from those that are tumor resistant, paving the way for further investigations.

2021 - An HMM-ensemble approach to predict severity progression of ICU treatment for hospitalized COVID-19 patients [Conference proceedings paper]
Mandreoli, Federica; Motta, Federico; Missier, Paolo

COVID-19-related pneumonia requires different modalities of Intensive Care Unit (ICU) interventions at different times to facilitate breathing, depending on severity progression. The ability for clinical staff to predict how patients admitted to hospital will require more or less ICU treatment on a daily basis is critical to ICU management. For real datasets that are sparse and incomplete and where the most important state transitions (dismissal, death) are rare, a standard Hidden Markov Model (HMM) approach is insufficient, as it is prone to overfitting. In this paper we propose a more sophisticated ensemble-based approach that involves training multiple HMMs, each specialized in a subset of the state transitions, and then obtaining the most plausible predictions by either selecting or combining the models. We have validated the approach on a live dataset of about 1,000 patients from a partner hospital. Our results show that, for rare events as well as for transitions to the most severe treatments, the approach outperforms state-of-the-art approaches.
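To make the idea of combining several HMMs concrete, here is a minimal Python sketch: each model filters the observation sequence with the standard forward recursion, predicts the next-state distribution, and the predictions are averaged. Plain averaging, the two-state setup, and all probabilities are assumptions chosen for illustration; the abstract does not specify how the models are specialized, selected, or combined.

```python
def _normalise(values):
    """Scale non-negative values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def _step(belief, transition):
    """Push a state distribution one step through the transition matrix."""
    n = len(belief)
    return [sum(belief[p] * transition[p][s] for p in range(n)) for s in range(n)]


def forward_filter(initial, transition, emission, observations):
    """Forward recursion: P(current state | all observations so far)."""
    n = len(initial)
    belief = _normalise([initial[s] * emission[s][observations[0]] for s in range(n)])
    for obs in observations[1:]:
        predicted = _step(belief, transition)
        belief = _normalise([predicted[s] * emission[s][obs] for s in range(n)])
    return belief


def predict_next_state(model, observations):
    """Next-state distribution predicted by one HMM."""
    initial, transition, emission = model
    return _step(forward_filter(initial, transition, emission, observations), transition)


def average_predictions(predictions):
    """Combine the models by averaging their predicted distributions (assumed rule)."""
    return [sum(column) / len(predictions) for column in zip(*predictions)]


# Two hypothetical 2-state models (0 = lower-intensity care, 1 = higher-intensity care).
fully_observed = [[1.0, 0.0], [0.0, 1.0]]
model_a = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], fully_observed)
model_b = ([0.5, 0.5], [[0.7, 0.3], [0.4, 0.6]], fully_observed)

observations = [0, 0, 1]
combined = average_predictions(
    [predict_next_state(m, observations) for m in (model_a, model_b)]
)
```

With the identity emission matrix used here the states are observed directly, so each model's prediction reduces to the transition row of the last observed state; a noisy emission matrix would make the filtering step do real work.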

2021 - An HMM–ensemble approach to predict severity progression of ICU treatment for hospitalized Covid–19 patients [Presentation]
Mandreoli, Federica; Motta, Federico; Missier, Paolo

COVID–19–related pneumonia requires different modalities of Intensive Care Unit (ICU) interventions at different times to facilitate breathing, depending on severity progression. The ability for clinical staff to predict how patients admitted to hospital will require more or less ICU treatment on a daily basis is critical to ICU management. For real datasets that are sparse and incomplete and where the most important state transitions (dismissal, death) are rare, a standard Hidden Markov Model (HMM) approach is insufficient, as it is prone to overfitting. In this paper we propose a more sophisticated ensemble-based approach that involves training multiple HMMs, each specialized in a subset of the state transitions, and then obtaining the most plausible predictions by either selecting or combining the models. We have validated the approach on a live dataset of about 1,000 patients from a partner hospital. Our results show that, for rare events as well as for transitions to the most severe treatments, the approach outperforms state-of-the-art approaches.

2021 - Autonomous, context-aware, adaptive Digital Twins—State of the art and roadmap [Journal article]
Hribernik, K.; Cabri, G.; Mandreoli, F.; Mentzas, G.

Digital Twins are an important concept in the comprehensive digital representation of manufacturing assets, products, and other resources, comprising their design and configuration, state, and behaviour. Digital Twins provide information about and services based on their physical counterpart's current condition, history and predicted future. They are the building blocks of a vision of future Digital Factories where stakeholders collaborate via the information Digital Twins provide about physical assets in the factory and throughout the product lifecycle. Digital Twins may also contribute to more flexible and resilient Digital Factories. To achieve this, Digital Twins will need to evolve from today's expert-centric tools towards active entities which extend the capabilities of their physical counterparts. Required features include sensing and processing their environment and situation, pro-actively communicating with each other, taking decisions towards their own or cooperative goals, and adapting themselves and their physical counterparts to achieve those goals. Future Digital Twins will need to be context-aware, autonomous, and adaptive. This paper aims to establish a roadmap for this evolution. It sets the scene by proposing a working definition of Digital Twins and examines the state-of-the-art in the three topics in their relation to DTs. It then elaborates potentials for each topic mapped against the working definition, to finally identify research gaps allowing for the definition of a roadmap towards the full realisation of autonomous, context-aware, adaptive Digital Twins as building blocks of tomorrow's Digital Factories.

2021 - Editorial: Big Data Management in Industry 4.0 [Journal article]
Firmani, D.; Leotta, F.; Mandreoli, F.; Mecella, M.

2021 - Unleashing the power of querying streaming data in a temporal database world: A relational algebra approach [Journal article]
Grandi, F.; Mandreoli, F.; Martoglia, R.; Penzo, W.

Modern data-intensive applications have to manage huge quantities of streaming/relational data and need advanced query capabilities involving combinations of continuous queries (CQs) and one-time queries (OTQs) also requiring the verification of complex temporal conditions. In this paper, we go beyond the disjointed panorama of current approaches and adopt a new holistic approach to the integration of stream processing capabilities into the temporal database world based on the streaming table concept. To this end, we propose a full-fledged query interface composed of a TSQL2-like query language with an underlying algebraic framework. The algebraic framework, which is aimed at implementing the query interface on top of a working DBMS, is made up of: (a) the extended temporal algebra TA⋆ supporting OTQs with a hybrid temporal semantics (sequenced and non-sequenced); (b) the continuous temporal algebra CTA that extends TA⋆ with window expressions for CQ specification; (c) the translation of CTA expressions into TA⋆ ones that can be executed by a traditional DBMS with an extended kernel.

2020 - Agilechains: Agile supply chains through smart digital twins [Conference proceedings paper]
Pernici, B.; Plebani, P.; Mecella, M.; Leotta, F.; Mandreoli, F.; Martoglia, R.; Cabri, G.

Currently, the production and logistics performance of a single organization is only partially dependent on its internal resources; more and more often, it also depends on the interactions that happen across the so-called supply chain, that is, the interactions between the organization and its customers and suppliers. In particular, the production and logistics coordination between actors in the supply chain is often a difficult activity which draws significant resources. Also, such coordination requires continuous revisions and updates to be performed. In Industry 4.0, the digital twins paradigm is currently adopted to represent, simulate and test the behavior of one or more machines and production plants belonging to an organization. This paper introduces the AgileChains paradigm, extending the digital twin paradigm to supply chains and the dynamics of their participants. This extension also positively affects the reactivity and resilience of the internal processes in case the supply chain has to be reconfigured. We propose a novel conceptual framework that combines Service Oriented Architectures (SOA) with Cyber-Physical Systems (CPS), in order to create service oriented systems suited for exchanging data in a dynamic and adaptive way. In addition, we propose a novel data management mechanism capable of finding the right balance between the internal needs of each organization when handling their data and the need to securely and efficiently export data in the supply chain (cf. smart data movement). Finally, we plan to define governance tools to model and manage the supply chain that treat agility as a first-class citizen. These tools will allow users to dynamically and predictively change the involved actors, as well as the nature of the exchanged data and the data exchange policies, focusing in particular on adverse, risk-prone events, so as to minimize the risk and to optimize the supply chain performance both in terms of efficiency and effectiveness.

2020 - Data-driven vs knowledge-driven inference of health outcomes in the ageing population: A case study [Conference proceedings paper]
Ferrari, D.; Guaraldi, G.; Mandreoli, F.; Martoglia, R.; Milić, Jovana; Missier, Paolo

Preventive, Predictive, Personalised and Participative (P4) medicine has the potential to not only vastly improve people's quality of life, but also to significantly reduce healthcare costs and improve its efficiency. Our research focuses on age-related diseases and explores the opportunities offered by a data-driven approach to predict wellness states of ageing individuals, in contrast to the commonly adopted knowledge-driven approach that relies on easy-to-interpret metrics manually introduced by clinical experts. This is done by means of machine learning models applied on the My Smart Age with HIV (MySAwH) dataset, which is collected through a relatively new approach especially for older HIV patient cohorts. This includes Patient Related Outcomes values from mobile smartphone apps and activity traces from commercial-grade activity loggers. Our results show better predictive performance for the data-driven approach. We also show that a post hoc interpretation method applied to the predictive models can provide intelligible explanations that enable new forms of personalised and preventive medicine.

2020 - Machine learning in predicting respiratory failure in patients with COVID-19 pneumonia - challenges, strengths, and opportunities in a global health emergency. [Journal article]
Ferrari, D; Milic, J; Tonelli, R; Ghinelli, F; Meschiari, M; Volpi, S; Faltoni, M; Franceschi, G; Iadisernia, V; Yaacoub, D; Ciusa, G; Bacca, E; Rogati, C; Tutone, M; Burastero, G; Raimondi, A; Menozzi, M; Franceschini, E; Cuomo, G; Corradi, L; Orlando, G; Santoro, A; Di Gaetano, M; Puzzolante, C; Carli, F; Borghi, V; Bedini, A; Fantini, R; Tabbì, L; Castaniere, I; Busani, S; Clini, E; Girardis, M; Sarti, M; Cossarizza, A; Mussini, C; Mandreoli, F; Missier, P; Guaraldi, G.

Aims- The aim of this study was to estimate a 48-hour prediction of moderate to severe respiratory failure, requiring mechanical ventilation, in hospitalized patients with COVID-19 pneumonia. Methods- This was an observational study that comprised consecutive patients with COVID-19 pneumonia admitted to hospital from 21 February to 6 April 2020. The patients’ medical history, demographic, epidemiologic and clinical data were collected in an electronic patient chart. The dataset was used to train predictive models using an established machine learning framework leveraging a hybrid approach where clinical expertise is applied alongside a data-driven analysis. The study outcome was the onset of moderate to severe respiratory failure, defined as a PaO2/FiO2 ratio <150 mmHg in at least one of two consecutive arterial blood gas analyses in the following 48 hours. Shapley Additive exPlanations values were used to quantify the positive or negative impact of each variable included in each model on the predicted outcome. Results- A total of 198 patients contributed to generate 1068 usable observations, which allowed building 3 predictive models based respectively on 31 variables (signs and symptoms), 39 variables (laboratory biomarkers) and 91 variables (a composition of the two). A fourth “boosted mixed model”, including 20 variables selected from model 3, achieved the best predictive performance (AUC=0.84) without worsening the FN rate. Its clinical performance was applied in a narrative case report as an example. Conclusion- This study developed a machine learning model with 84% prediction accuracy, which is able to assist clinicians in the decision-making process and contribute to developing new analytics to improve care at high technology readiness levels.

2020 - Predicting Respiratory Failure in Patients with COVID-19 pneumonia: a case study from Northern Italy [Conference proceedings paper]
Ferrari, Davide; Mandreoli, Federica; Guaraldi, Giovanni; Milić, Jovana; Missier, Paolo

The Covid-19 crisis caught health care services around the world by surprise, putting unprecedented pressure on Intensive Care Units (ICU). To help clinical staff to manage the limited ICU capacity, we have developed a Machine Learning model to estimate the probability that a patient admitted to hospital with COVID-19 symptoms would develop severe respiratory failure and require Intensive Care within 48 hours of admission. The model was trained on an initial cohort of 198 patients admitted to the Infectious Disease ward of Modena University Hospital, in Italy, at the peak of the epidemic, and subsequently refined as more patients were admitted. Using the LightGBM Decision Tree ensemble approach, we were able to achieve good accuracy (AUC = 0.84) despite a high rate of missing values. Furthermore, we have been able to provide clinicians with explanations in the form of personalised ranked lists of features for each prediction, using only 20 out of more than 90 variables, using Shapley values to describe the importance of each feature.
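The per-prediction ranked feature lists mentioned in this abstract rest on Shapley values. The Python sketch below computes exact Shapley values for a single prediction by enumerating feature orderings, then ranks the features by absolute contribution. It does not use LightGBM or the SHAP library: `toy_risk_model`, its coefficients, the three unnamed features, and the baseline are all hypothetical, and brute-force enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition take their baseline value. Every feature
    ordering is enumerated, so the cost grows factorially with the feature count.
    """
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            output = predict(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def toy_risk_model(features):
    """Hypothetical scoring function standing in for a trained model."""
    x0, x1, x2 = features
    return 0.02 * x0 + 0.05 * x1 + 0.01 * x2 + 0.001 * x1 * x2


# Hypothetical patient and reference (baseline) feature values.
instance = [60.0, 30.0, 100.0]
baseline = [50.0, 16.0, 10.0]

values = shapley_values(toy_risk_model, instance, baseline)
# Feature indices ordered from most to least influential for this prediction.
ranking = sorted(range(len(values)), key=lambda i: -abs(values[i]))
```

By construction the values sum to the difference between the prediction for the instance and the prediction for the baseline, which is what makes them usable as an additive explanation of one prediction.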

2020 - The FIRST (vF Interoperation suppoRting buSiness innovaTion) Project: Service Management for Virtual Factories [Relazione in Atti di Convegno]
Bai, Y.; Bose, S.; Cabri, G.; de Vrieze, P.; Eder, N.; Lazovik, A.; Mandreoli, F.; Mecella, M.; Mu, H.; Xu, L.
abstract

The H2020 FIRST project addresses virtual factories, which are digital abstractions of real factories. The exploitation of virtual factories enables interoperability between real components inside a factory as well as between different factories belonging to the same supply chain. Moreover, virtual factories can be exploited to manage and compose services inside a factory, defining dynamic adaptation of sets of services depending on high-level goals. In this paper we sketch the project results and its current state.

2020 - Towards Smart Manufacturing with Dynamic Dataspace Alignment [Relazione in Atti di Convegno]
Firmani, Donatella; Leotta, Francesco; Mandreoli, Federica; Mecella, Massimo
abstract

The technological foundation of smart manufacturing consists of cyber-physical systems and the Internet-of-Things (IoT). Although smart manufacturing has become a key paradigm to promote the integration of manufacturing processes using digital technologies, the manufacturing processes themselves are designed by human experts in a traditional way and have limited ability to adapt their behavior to exceptional circumstances. We leverage the fact that each IoT device in a smart factory can be coupled with a digital twin (that is, a software artefact that faithfully represents the physical system using real-time sensor data) to envision a software architecture to support adaptation of the manufacturing process when divergence from reference practices occurs.

2020 - VarCopy: A Visual Exploratory Data Analysis Platform for Copy Number Variation Studies [Relazione in Atti di Convegno]
Bove, F.; Mandreoli, F.; Martoglia, R.; Pisi, V.; Taccioli, C.; Vischioni, C.
abstract

The study of such a complex phenomenon as cancer, which depends on several unexplored and unclear factors, needs new ways to visualize, analyze and combine different data on both species characteristics and gene function. In this respect, we propose a novel platform, named VarCopy, supporting visual Exploratory Data Analysis (EDA) in the context of Copy Number Variation (CNV) data. The platform will be publicly available as a web application soon, and is, to the best of our knowledge, the first tool allowing visual, interactive exploration and analysis of the CNV landscape of multiple species, allowing the identification of new target genes that might be useful for biomedical research.

2020 - Work datafication and digital work behavior analysis as a source of social good [Relazione in Atti di Convegno]
Bertolotti, F.; Fabbri, T.; Mandreoli, F.; Martoglia, R.; Scapolan, A. C.
abstract

The digital transformation of organizations is boosting workplace networking and collaboration while making it 'observable' with unprecedented timeliness and detail. However, the informational and managerial potential of work datafication is still largely unutilized in Human Resource Management (HRM) and its social benefits, both at the individual and the organizational level, remain largely unexplored. Our research focuses on the relationship between digitally tracked work behaviors and employee attitudes and, in so doing, it explores work datafication as a source of social good. As part of a wider research program, this paper presents some data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data, in search of promising correlations between behavioral and relational (digital) work patterns and employee attitudes. To this end, we transformed the digital actions performed by 106 employees during a one-year period into a graph representation to analyze data from two different points of view: the individual (behavioral) perspective, according to the user who performed the action and the action undertaken, and the social (relational) perspective, making explicit the interactions between users and the objects of their actions. Different employee rankings are thus derived and correlated with their attitudes. We discuss the obtained results and their benefits in terms of prospective social good for both the company and the employee.

2019 - A Conceptual Architecture and Model for Smart Manufacturing Relying on Service-Based Digital Twins [Relazione in Atti di Convegno]
Catarci, Tiziana; Firmani, Donatella; Leotta, Francesco; Mandreoli, Federica; Mecella, Massimo; Sapio, Francesco
abstract

The technological foundation of smart manufacturing consists of cyber-physical systems and the Internet-of-Things (IoT). Each IoT device in a smart factory can be coupled with a digital twin, that is, a dynamic virtual representation of the physical system across its life-cycle using real-time sensor data. Currently, the manufacturing process itself, the involved devices, and how they interact are designed by human experts in a traditional way. We envision an architecture where humans can instead specify a goal and take advantage of technologies such as digital twins to automatically compose the corresponding physical processes, sharing some analogies with the notion of Web service composition.

2019 - An architectural approach for digital factories [Relazione in Atti di Convegno]
Bicocchi, N.; Cabri, G.; Leotta, F.; Mandreoli, F.; Mecella, M.; Sapio, F.
abstract

Digital factories comprise a multi-layered integration of various activities along the factories and product life-cycles. A central aspect of a digital factory is that of enabling the product lifecycle stakeholders to collaborate through the use of software solutions. The digital factory expands outside the company boundaries and allows collaboration on business processes over the whole supply chain. This extended abstract, based on a recently published paper, discusses an interoperability architecture for digital factories. It analyses the key requirements for enabling a scalable factory architecture characterized by access to services, aggregation of data, and orchestration of production processes.

2019 - Dealing With Data Heterogeneity in a Data Fusion Perspective: Models, Methodologies, and Algorithms [Capitolo/Saggio]
Mandreoli, F.; Montangero, M.
abstract

Dealing with multiple manifestations of the same real-world entity across several data sources is a very common challenge for many modern applications, including life science applications. This challenge is referred to as data heterogeneity in the data management research field, where the final aim is often to get a unified or integrated view of the real-world entities represented in the data sources. Data heterogeneity is a long-standing challenge that has attracted much interest in different computer science disciplines. The main aim of the chapter is to show how data heterogeneity problems that are typical of life science application contexts can be addressed by adopting systematic solutions stemming from the computer science field. To this end, it focusses on the main sources of heterogeneity in the life science context, presents the main problems that arise when dealing with heterogeneity, and provides a review of the solutions proposed in the computer science literature.

2019 - Dynamic digital factories for agile supply chains: An architectural approach [Articolo su rivista]
Bicocchi, Nicola; Cabri, Giacomo; Mandreoli, Federica; Mecella, Massimo
abstract

Digital factories comprise a multi-layered integration of various activities along the factories and product lifecycles. A central aspect of a digital factory is that of enabling the product lifecycle stakeholders to collaborate through the use of software solutions. The digital factory thus expands outside the company boundaries and offers the opportunity to collaborate on business processes affecting the whole supply chain. This paper discusses an interoperability architecture for digital factories. To this end, it delves into the issue by analysing the key requirements for enabling a scalable factory architecture characterized by access to services, aggregation of data, and orchestration of production processes. Then, the paper reviews the state of the art w.r.t. these requirements and proposes an architectural framework combining features of both service-oriented and data-sharing architectures. The framework is exemplified through a case study.

2019 - EU H2020 MSCA RISE Project FIRST - “virtual Factories: Interoperation suppoRting buSiness innovation” [Capitolo/Saggio]
Boese, Stephan; Cabri, Giacomo; Eder, Norbert; Mandreoli, Federica; Lazovik, Alexander; Mecella, Massimo; Phalp, Keith; de Vrieze, Paul; Xu, Lai; Yu, Hongnian
abstract

FIRST (“virtual Factories: Interoperation suppoRting buSiness innovation”) is a European H2020 project, funded by the RESEARCH AND INNOVATION STAFF EXCHANGE (RISE) Work Programme as part of the Marie Skłodowska-Curie actions. The project is concerned with Manufacturing 2.0 and aims at providing the new technology and methodology to describe manufacturing assets; to compose and integrate the existing services into collaborative virtual manufacturing processes; and to deal with evolution of changes. This Chapter provides an overview of the state of the art for the research topics related to the project research objectives, and then it presents the progress the project has achieved so far towards the implementation of the proposed innovations.

2019 - Employee attitudes and (Digital) collaboration data: a preliminary analysis in the HRM field [Relazione in Atti di Convegno]
Fabbri, T.; Mandreoli, F.; Martoglia, R.; Scapolan, A. C.
abstract

The digital transformation of organizations is making workplace collaboration more and more powerful and work always "observable"; however, the informational and managerial potential of the generated data is still largely unutilized in Human Resource Management (HRM). Our research, conducted in collaboration with business engineers and economists, aims at exploring the relationship between digital work behaviors and employee attitudes. This paper is a work-in-progress contribution that presents a preliminary phase of data analysis we performed on a collection of Enterprise Collaboration Software (ECS) data. In the exploratory data analysis step, we analyze data in their original table format and elaborate them according to the user who performed the action and the performed action. Then, we move to a graph representation in order to make explicit the interaction between users and the objects of their actions. Finally, we introduce the concept of employee-attitude-oriented pattern as a means to derive significant views over the overall graph and discuss Social Network Analysis (SNA) approaches that can be exploited for our purposes.

2019 - Fitness tracking wearable devices and a dedicated smart phone app (MySAwH App) to predict quality of life in PLWH: a multi-centre prospective study [Abstract in Atti di Convegno]
Guaraldi, G; Orsini, M; Caselgrandi, A; Malagoli, A; D'Imprima, F; Milic, J; Ghinelli, F; Martoglia, R; Mandreoli, F; Ferrari, D; Liu, G; Bloch, M
abstract

2019 - Towards Patient-Centric Healthcare: Multi-Version Ontology-Based Personalization of Clinical Guidelines [Capitolo/Saggio]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo
abstract

Retrieving personalized care plans from a guideline repository is an ever-increasing need in the medical world, not only for physicians but also for empowered patients. In this chapter, we continue our long-lasting research on ontology-based personalized access to very large collections of multi-version documents by addressing a novel challenge: dealing with multi-version clinical guidelines but also with a multi-version ontology used to support personalized access to them. Efficiency is ensured by a newly introduced annotation scheme for guidelines and solutions to cope with the evolution of ontology structure. The tests performed on a prototype implementation confirm the goodness of the approach. Finally, the chapter proposes an exhaustive analysis of the state of the art in this field and, in the final part, a discussion where we expand our vision to related research themes and possible further developments of our work.

2018 - 5 steps to make art museums tweet influentially [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

A growing number of museums have started using social networks as different forms of engagement that can act outside museum architectural bounds. Specifically, museum leaders are praising Twitter as a necessary tool for any online programming or presence in museums today. Nevertheless, using Twitter in a satisfactory way so as to increase museums' influence is not an easy task and there has been a gap between its usage and the possibilities it represents. In this paper, we propose an easily understandable framework to analyze the key content factors in museum conversations, including novel formulas for the evaluation of the influence of tweets and Twitter accounts. We apply the framework to a dataset of 100,000 messages related to 26 museum accounts to understand which museum is more influential in writing tweets, and which features have more impact on the influence of a tweet. Finally, we propose 5 key steps that museums can perform in order to write more influential tweets.

2018 - Dealing with data and software interoperability issues in digital factories [Relazione in Atti di Convegno]
Bicocchi, Nicola; Cabri, Giacomo; Mandreoli, Federica; Mecella, Massimo
abstract

The digital factory paradigm comprises a multi-layered integration of the information related to various activities along the factory and product lifecycle manufacturing related resources. A central aspect of a digital factory is that of enabling the product lifecycle stakeholders to collaborate through the use of software solutions. The digital factory thus expands outside the actual company boundaries and offers the opportunity for the business and its suppliers to collaborate on business processes that affect the whole supply chain. This paper discusses an interoperability architecture for digital factories. To this end, it delves into the issue by analysing the main challenges that must be addressed to support an integrated and scalable factory architecture characterized by access to services, aggregation of data, and orchestration of production processes. Then, it reviews the state of the art in the light of these requirements and proposes a general architectural framework combining the most interesting features of service-oriented architectures and data sharing architectures. The study is exemplified through a case study.

2018 - Standards, Security and Business Models: Key Challenges for the IoT Scenario [Articolo su rivista]
Bujari, Armir; Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela; Ronzani, Daniele
abstract

The number of physical objects connected to the Internet constantly grows and a common thought says the IoT scenario will change the way we live and work. Since IoT technologies have the potential to be pervasive in almost every aspect of a human life, in this paper, we deeply analyze the IoT scenario. First, we describe IoT in simple terms and then we investigate what current technologies can achieve. Our analysis shows four major issues that may limit the use of IoT (i.e., interoperability, security, privacy, and business models) and it highlights possible solutions to solve these problems. Finally, we provide a simulation analysis that emphasizes issues and suggests practical research directions.

2018 - Towards tweet content suggestions for museum media managers [Relazione in Atti di Convegno]
Furini, Marco; Martoglia, Riccardo; Mandreoli, Federica; Montangero, Manuela
abstract

Cultural Heritage institutions are embracing social technologies in the attempt to provide an effective communication towards citizens. Although it seems easy to reach millions of people with a simple message posted on social media platforms, media managers know that practice is different from theory. Millions of posts are competing every day to get visibility in terms of likes and retweets. The way text, images, hashtags and links are combined together is critical for the visibility of a post. In this paper, we propose to exploit machine learning techniques in order to predict whether a tweet will likely be appreciated by Twitter users or not. Through an experimental assessment, we show that it is possible to provide insights about the tweet features that will likely influence its reception/recommendation among readers. The preliminary tests, performed on a real-world dataset of 19,527 museum tweets, show promising accuracy results.

2017 - A Framework for user-driven mapping discovery in rich spaces of heterogeneous data [Relazione in Atti di Convegno]
Mandreoli, Federica
abstract

Data analysis in rich spaces of heterogeneous data sources is an increasingly common activity. Examples include exploratory data analysis and personal information management. Mapping specification is one of the key issues in this data management setting, as it answers the need for a unified search over the full spectrum of relevant knowledge. Indeed, while users in data analytics are engaged in an open-ended interaction between data discovery and data orchestration, most of the solutions for mapping specification available so far are intended for expert users. This paper proposes a general framework for a novel paradigm for user-driven mapping discovery where mapping specification is interactively driven by the information seeking activities of users and the exclusive role of mappings is to contribute to user satisfaction. The underlying key idea is that data semantics is in the eye of the consumers. Thus, we start from user queries which we try to satisfy in the dataspace. In this process of satisfaction, we often need to discover new mappings, to expose the user to the data thereby discovered for their feedback, and possibly to continue towards user satisfaction. The framework is made up of (a) a theoretical foundation where we formally introduce the notion of candidate mapping sets for a user query, and (b) an interactive and incremental algorithm that, given a user query, finds a candidate mapping set that satisfies the user. The algorithm incrementally builds the candidate mapping set by searching in the dataspace data samples and deriving mapping lattices that are explored to deliver mappings for user feedback. With the aim of fitting the user information need in a limited number of interactions, the algorithm provides for a multi-criteria selection strategy for candidate mapping sets. Finally, a proof of the correctness of the algorithm is provided in the paper.

2017 - A relational algebra for streaming tables living in a temporal database world [Relazione in Atti di Convegno]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

The recently introduced streaming table concept, a fully native representation of streaming data inside a DBMS, enabled modern data-intensive applications with one-time queries (OTQs) and continuous queries (CQs) capabilities on both streaming and standard relational tables. In this paper, we fully acknowledge the temporal nature of streaming tables and we propose to go one step further and integrate them in a temporal DBMS context, where time management is native. Our aim is to break the traditional barrier between the streaming and the temporal worlds, offering complete interoperability between streams and temporal data. To this end, we present a continuous temporal algebra supporting both OTQs and CQs seamlessly on streaming, standard and temporal relational tables. We further show how the transition from continuous to one-time semantics can be managed by defining suitable translation rules, which can also be used as a basis for the implementation of the proposed continuous algebra in a temporal DBMS.

2017 - From Data Integration to Big Data Integration [Capitolo/Saggio]
Bergamaschi, Sonia; Beneventano, Domenico; Mandreoli, Federica; Martoglia, Riccardo; Guerra, Francesco; Orsini, Mirko; Po, Laura; Vincini, Maurizio; Simonini, Giovanni; Zhu, Song; Gagliardelli, Luca; Magnotta, Luca
abstract

The Database Group (DBGroup, www.dbgroup.unimore.it) and Information System Group (ISGroup, www.isgroup.unimore.it) research activities have been mainly devoted to the Data Integration Research Area. The DBGroup designed and developed the MOMIS data integration system, giving rise to a successful innovative enterprise DataRiver (www.datariver.it), distributing MOMIS as open source. MOMIS provides an integrated access to structured and semistructured data sources and allows a user to pose a single query and to receive a single unified answer. Description Logics, Automatic Annotation of schemata plus clustering techniques constitute the theoretical framework. In the context of data integration, the ISGroup addressed problems related to the management and querying of heterogeneous data sources in large-scale and dynamic scenarios. The reference architectures are the Peer Data Management Systems and its evolutions toward dataspaces. In these contexts, the ISGroup proposed and evaluated effective and efficient mechanisms for network creation with limited information loss and solutions for mapping management, query reformulation and processing, and query routing. The main issues of data integration have been faced: automatic annotation, mapping discovery, global query processing, provenance, multidimensional Information integration, keyword search, within European and national projects. With the incoming new requirements of integrating open linked data, textual and multimedia data in a big data scenario, the research has been devoted to the Big Data Integration Research Area. In particular, the most relevant achieved research results are: a scalable entity resolution method, a scalable join operator and a tool, LODEX, for automatically extracting metadata from Linked Open Data (LOD) resources and for visual query formulation on LOD resources. Moreover, in collaboration with DATARIVER, Data Integration was successfully applied to smart e-health.

2017 - IoT: Science Fiction or real revolution? [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

It's been many years since media began talking about the wonders of the IoT scenario, where a smart fridge checks the milk expiration date and automatically compiles the shopping list, but in real life how many people have this smart fridge in the kitchen? Yet the interest around the IoT scenario is growing every day, so in this paper we try to figure out if IoT is science fiction or a real revolution. In particular, we describe in simple terms the IoT scenario, what can be done with current technologies, what are the main obstacles that limit the success and the wide use of IoT and we highlight directions that can make IoT a true reality.

2017 - Multi-version ontology-based personalization of clinical guidelines for patient-centric healthcare [Articolo su rivista]
Grandi, Fabio; Mandreoli, Federica; Martoglia, Riccardo
abstract

When dealing with a specific patient case, physicians are often interested in retrieving a personalized version of a clinical guideline, that is a version tailored to their use needs. In a patient-centric scenario, empowered patients make up another class of users interested in retrieving personalized care plans from a guideline repository. In their previous work, the authors proposed techniques to efficiently provide ontology-based personalized access to very large collections of multi-version clinical guidelines. In this paper, they address the problem of also dealing with a multi-version ontology used to support personalized access to clinical guidelines. The authors' approach allows the semantic indexing of guideline contents with respect to multi-version ontology classes and exploits the IS-A relationship among such classes for granting personalized access. Efficiency is ensured by a newly introduced annotation scheme for guidelines and solutions to cope with the evolution of ontology structure. The tests performed on a prototype implementation confirm the goodness of the approach.

2017 - Streaming Tables: Native Support to Streaming Data in DBMSs [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

Data stream management systems (DSMSs) are conceived for running continuous queries (CQs) on the most recently streamed data. This model does not completely fit the needs of several modern data-intensive applications that need to manage recent/historical/static data and execute both CQs and one-time queries (OTQs) joining such data. In order to cope with these new needs, some DSMSs have moved toward the integration of database management systems (DBMSs) functionalities to augment their capabilities. In this paper we adopt the opposite perspective and we lay the groundwork for extending DBMSs to natively support streaming facilities. To this end, we introduce a new kind of table, the streaming table, as a persistent structure where streaming data enters and remains stored for a long period, ideally forever. Streaming tables feature a novel access paradigm: continuous writes and one-time as well as continuous reads. We present a streaming table implementation and two novel types of indices that efficiently support high rates of both updates and scans. A detailed experimental evaluation shows the effectiveness of the proposed technology.

2017 - The Use of Hashtags in the Promotion of Art Exhibitions [Relazione in Atti di Convegno]
Furini, Marco; Mandreoli, Federica; Martoglia, Riccardo; Montangero, Manuela
abstract

Hashtags are increasingly used to promote, foster and group conversations around specific topics. For example, the entertainment industry widely uses hashtags to increase interest around their products. In this paper, we analyze whether hashtags are effective in a niche scenario like art exhibitions. The obtained results show very different behaviors and confused strategies: from museums that do not consider hashtags at all, to museums that create official hashtags, but hardly mention them; from museums that create multiple hashtags for the same exhibition, to those that are very confused about hashtag usage. Furthermore, we discovered an interesting case, where a smart usage of hashtags stimulated the interest around art. Finally, we highlight a few practical guidelines with behaviors to follow and to avoid; the guidelines might help promote art exhibitions.

2016 - A Data Management Middleware for ITS Services in Smart Cities [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

A major societal challenge to be tackled in megacities is sustainable urban transportation. Intelligent Transportation Systems (ITSs) are actually data-centric applications that need to store and query real-time as well as historical/static data from various data sources and have to provide timely responses to users' transportation needs. In this paper we introduce a data management middleware that offers the robustness of a common framework to support the development of smart applications having the above needs. It supports the efficient storage and access to real-time and historical/static data and provides both one-time and continuous query capabilities. While the middleware has been designed to be general and versatile to support data management for any kind of application, in this paper we explore its suitability to ITS smart services also by means of an experimental evaluation conducted on a variety of traffic scenarios.

2016 - Journal of Computer and System Sciences Special Issue on Query Answering on Graph-Structured Data [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

Graph-based data models have recently gained much popularity as powerful means for data representation in several database application areas. Notable examples of application domains where data is naturally represented in graph-based form are knowledge bases, biological and chemical databases, Web-scattered data, healthcare, personal information management (PIM), enterprise information management (EIM) systems, online mapping/routing services, and social networks, just to mention a few. The heterogeneity, complexity and largeness of contents that characterize datasets in these fields unquestionably make the querying experience a really challenging task. This special issue of the Journal of Computer and System Sciences follows the 2013 and 2014 editions of the International Workshop on Querying Graph Structured Data (GraphQ), which were co-located with the International Conference on Extending Database Technology and were held in Genoa, Italy and in Athens, Greece, respectively. The two editions of the workshop attracted a large world-wide audience of researchers and professionals, and yielded several excellent presentations exploring how to effectively and efficiently support graph queries in different application domains. This special issue includes a shortlist of selected contributions that were extended to provide deeper investigations along three main research directions: (1) graph query answering; (2) graph query processing; (3) graph data dynamics.

2016 - No users no dataspaces! Query-driven dataspace orchestration [Relazione in Atti di Convegno]
Mandreoli, Federica; Fletcher, George H. L.
abstract

Data analysis in rich spaces of heterogeneous data sources is an increasingly common activity. Examples include querying the web of linked data and personal information management. Such analytics on dataspaces is often iterative and dynamic, in an open-ended interaction between discovery and data orchestration. The current state of the art in integration and orchestration in dataspaces is primarily geared towards close-ended analysis, targeting the discovery of stable data mappings or one-time, pay-as-you-go ad hoc data mappings. The perspective here is dataspace-centric. In this paper, we propose a shift to a user-centric perspective on dataspace orchestration. We outline basic conceptual and technical challenges in supporting data analytics which is open-ended and always evolving, as users respond to new discoveries and connections.

2015 - Approximating expressive queries on graph-modeled data: The GeX approach [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

We present the GeX (Graph-eXplorer) approach for the approximate matching of complex queries on graph-modeled data. GeX generalizes existing approaches and provides for a highly expressive graph-based query language that supports queries ranging from keyword-based to structured ones. The GeX query answering model gracefully blends label approximation with structural relaxation, under the primary objective of delivering meaningfully approximated results only. GeX implements ad-hoc data structures that are exploited by a top-k retrieval algorithm which enhances the approximate matching of complex queries. An extensive experimental evaluation on real world datasets demonstrates the efficiency of the GeX query answering.

2015 - Effective Aggregation and Querying of Probabilistic RFID Data in a Location Tracking Context [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

RFID applications usually rely on RFID deployments to manage high-level events such as tracking the location that products visit for supply-chain management, localizing intruders for alerting services, and so on. However, transforming low-level streams into high-level events poses a number of challenges. In this paper, we deal with the well known issues of data redundancy and data-information mismatch: we propose an on-line summarization mechanism that is able to provide small space representation for massive RFID probabilistic data streams while preserving the meaningfulness of the information. We also show that common information needs, i.e. detecting complex events meaningful to applications, can be effectively answered by executing temporal probabilistic SQL queries directly on the summarized data. All the techniques presented in this paper are implemented in a complete framework and successfully evaluated in real-world location tracking scenarios.

2014 - Advanced Data Management for real-time data intensive applications and services [Articolo su rivista]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo
abstract

This work focuses on Data Management for data-intensive real-time applications and services in a mobile and pervasive transportation scenario. It presents the main goals achieved in the course of the PEGASUS Project, funded by the Industria 2015 programme with the overall goal of building an advanced Intelligent Transportation System (ITS).

2014 - Data management techniques for active RFID applications [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

In the last several years, RFID technology has gained significant popularity due to its ability to detect objects and people carrying small RFID tags in an environment equipped with RFID readers. This research involved the design, implementation and experimental evaluation of a real-time system that addresses the above-mentioned data management issues in the context of RFID location tracking systems.

2014 - Online filtering and uncertainty management techniques for rfid data processing [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

RFID is one of the emerging technologies for a wide range of applications, including supply chain and asset management, healthcare and intruder localization. However, the nature of an RFID data stream is noisy, redundant and unreliable, making it unsuitable for direct use in applications. In this paper, we propose specific RFID Online Filtering and Uncertainty Management techniques that operate on unreliable and imprecise data streams in order to transform them into reliable probabilistic data that can be meaningful to the applications. Our proposal makes use of a Hidden Markov Model (HMM) that continuously infers hidden variables (locations, in case of above example) based on sensor readings. The resulting data can be directly stored in a probabilistic database table for further analysis. All the techniques presented in this paper are implemented in a complete framework and successfully evaluated in real-world object tracking scenarios.
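
As an illustration of how an HMM can infer a hidden location from noisy reader events, the following is a minimal sketch of standard forward filtering. It is not the model from the paper: the state names, reader names, and all probabilities below are hypothetical values chosen for the example.

```python
from typing import Dict, List

Belief = Dict[str, float]


def forward_step(
    belief: Belief,
    transition: Dict[str, Dict[str, float]],
    emission: Dict[str, Dict[str, float]],
    observation: str,
) -> Belief:
    """One HMM filtering step: predict with the transition model,
    then reweight by the likelihood of the observed reader event."""
    # Predict: probability of being in each location after one time step.
    predicted = {
        s: sum(belief[p] * transition[p][s] for p in belief) for s in belief
    }
    # Update: multiply by P(observation | location) and renormalize.
    unnormalized = {s: predicted[s] * emission[s][observation] for s in predicted}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood in every state")
    return {s: v / total for s, v in unnormalized.items()}


def filter_stream(
    prior: Belief,
    transition: Dict[str, Dict[str, float]],
    emission: Dict[str, Dict[str, float]],
    observations: List[str],
) -> List[Belief]:
    """Return the location distribution after each observation."""
    beliefs = []
    belief = dict(prior)
    for obs in observations:
        belief = forward_step(belief, transition, emission, obs)
        beliefs.append(belief)
    return beliefs


# Hypothetical two-room deployment with one reader per room.
PRIOR = {"A": 0.5, "B": 0.5}
TRANSITION = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
EMISSION = {"A": {"r1": 0.8, "r2": 0.2}, "B": {"r1": 0.2, "r2": 0.8}}
```

Each belief returned by `filter_stream` is a probability distribution over locations, which is the kind of row one could store in a probabilistic table.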

2014 - RPDM: A System for RFID Probabilistic Data Management [Articolo su rivista]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo
abstract

Data streams are more and more commonly generated in a large number of scenarios by audio and video devices, Global Positioning System (GPS), Radio Frequency Identification (RFID) and other types of sensors. In particular, RFID technology has recently gained significant popularity, especially for real-time people and goods tracking; however, the noisy, redundant and unreliable nature of RFID streams, coupled with their huge size, can make their exploitation and management difficult. In this paper, we present a real-time system for RFID Probabilistic Data Management (RPDM). The system manages unreliable and noisy raw RFID data and transforms them into reliable meaningful probabilistic data streams by means of a newly proposed method based on a probabilistic Hidden Markov Model (HMM). Moreover, to handle the huge data volume generated by RFID deployments, RPDM proposes and implements a simple on-line summarization mechanism, which is able to provide small space representation for the massive RFID probabilistic data streams while preserving the meaningful information. The results are promptly stored in a probabilistic database, in such a way that a wide range of probabilistic queries can be submitted and answered effectively. The experimental evaluation proves the feasibility of the approach in real-world object tracking scenarios.

2014 - UCbase 2.0: ultraconserved sequences database (2014 update) [Articolo su rivista]
Lomonaco, V; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, L; Emmett, W; Bicciato, Silvio; Taccioli, C.
abstract

UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it.

2013 - 4th Workshop on Advances in Programming Languages [Curatela]
Lukovic, I.; Mernik, M.; Slivnik, B.; Janousek, J.; Aycock, J.; Chen, H.; Henriques, P. R.; Horvath, Z.; Ivanovic, M.; Kardas, G.; Kollar, J.; Kosar, T.; Liu, S. -H.; Lukovic, I.; Mandreoli, F.; Martinez Lopez, P. E.; Mernik, M.; Milasinovic, B.; Moessenboeck, H.; Papaspyrou, N.; Pereira, M. J. V.; Poruban, J.; Rodriguez, J. L. S.; Slivnik, B.; Splawski, Z.; Watson, B.
abstract

2013 - A Framework for ITS Data Management in a Smart City Scenario [Relazione in Atti di Convegno]
Carafoli, Luca; Mandreoli, Federica; Martoglia, Riccardo; Penzo, W.
abstract

In this paper we introduce a technological framework to efficiently support data management in a modern Intelligent Transportation System (ITS). The proposed technology enables the efficient storage of a variety of recent/historical/static data and guarantees its effective querying by supporting continuous as well as one-time queries for the delivery of real-time traffic services. The framework also offers a scalable solution for coping with the acquisition of huge volumes of data by employing data reduction techniques in Vehicle-to-Infrastructure transmissions. Experimental evaluation on the Linear Road ITS benchmark and along various simulated scenarios demonstrates that the proposed framework efficiently supports smart city data needs.

2013 - UNIMORE at ImageCLEF 2013: Scalable Concept Image Annotation [Relazione in Atti di Convegno]
Grana, Costantino; Serra, Giuseppe; Manfredi, Marco; Cucchiara, Rita; Martoglia, Riccardo; Mandreoli, Federica
abstract

In this paper we propose a large-scale image annotation system for the Scalable Concept Image Annotation task. For each concept to be detected, a separate classifier is built using the provided textual annotation. Images are represented as a Multivariate Gaussian distribution of a set of local features extracted over a dense regular grid. Textual analysis, on the web pages containing training images, is performed to retrieve a relevant set of samples for learning each concept classifier. An online SVM solver based on Stochastic Gradient Descent is used to manage the large amount of training data. Experimental results show that the combination of different kinds of local features encoded with our strategy achieves very competitive performance both in terms of mAP and mean F-measure.

2013 - Wearable Queries: Adapting Common Retrieval Needs to Data and Users [Relazione in Atti di Convegno]
Catania, Barbara; Guerrini, Giovanna; Belussi, Alberto; Mandreoli, Federica; Martoglia, Riccardo; Penzo, Wilma
abstract

The wealth of information generated by users interacting with the network and its applications is often under-utilized due to complications in accessing heterogeneous and dynamic data and retrieving relevant information from sources having possibly unknown formats and structures. Processing complex requests on such information sources can, thus, be costly, though not guaranteeing user satisfaction. Furthermore, dynamic contexts prevent substantial user involvement in the interpretation of the request. The paper envisions an innovative solution to process the above-mentioned requests, limiting user involvement by exploiting information on: (a) user context (geo-location, interests, needs); (b) data and processing quality; (c) similar requests repeated over time. By interpreting a request in a novel way by means of a Wearable Query (WQ), i.e., a query that captures the user and request specificities, we envision a methodological and technological solution for WQs in the presence of repeated information needs in distributed, heterogeneous, dynamic environments, with emphasis on the geo-spatial dimension and on data quality.

2012 - A Framework For Biological Data Normalization, Interoperability, and Mining for Cancer Microenvironment Analysis [Relazione in Atti di Convegno]
Ceci, M.; Coluccia, M.; Fumarola, F.; Guzzi, P. H.; Mandreoli, Federica; Martoglia, Riccardo; Masciari, E.; Mecella, M.; Penzo, W.
abstract

Over the last decade, the advances in high-throughput omic technologies have given the possibility to profile tumor cells at different levels, fostering the discovery of new biological data and the proliferation of a large number of bio-technological databases. In this paper we describe a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. In this framework, a key role is played by the Connectivity Map, a databank which relates diseases, physiological processes, and the action of drugs. The system will be used in a pilot study on Multiple Myeloma (MM).

2012 - A query reformulation framework for P2P OLAP [Relazione in Atti di Convegno]
Golfarelli, M.; Mandreoli, Federica; Penzo, W.; Rizzi, S.; Turricchia, E.
abstract

The idea of collaborative business intelligence is to extend the decision-making process beyond the company boundaries thanks to cooperation and data sharing with other companies and organizations. In this direction, we propose a query reformulation framework based on a P2P network of heterogeneous peers, each exposing OLAP query answering functionalities aimed at sharing business information. In our framework, an OLAP query expressed on a peer is reformulated on other peers by relying on a set of mappings between the multidimensional schemata of peers. In this extended abstract we sketch the user interaction scenario we envision and briefly discuss each phase of the reformulation process.

2012 - BIN: Business Intelligence Networks [Capitolo/Saggio]
Golfarelli, M.; Mandreoli, Federica; Rizzi, S.; Penzo, W.; Turricchia, E.
abstract

Cooperation is seen by companies as one of the major means for increasing flexibility and innovating. Business intelligence (BI) platforms are aimed at serving individual companies, and they cannot operate over networks of companies characterized by an organizational, lexical, and semantic heterogeneity. In this chapter we propose a framework, called Business Intelligence Network (BIN), for sharing BI functionalities over complex networks of companies that are chasing mutual advantages through the sharing of strategic information. A BIN is based on a network of peers, one for each company participating in the consortium. Peers are equipped with independent BI platforms that expose some querying functionalities aimed at sharing business information for the decision-making process. After proposing an architecture for a BIN, we outline the main research issues involved in its building and operating, and we focus on the definition of an ad hoc language for expressing semantic mappings between the multidimensional schemata owned by the different peers, aimed at enabling query reformulation over the network.

2012 - Efficient management of multi-version clinical guidelines [Articolo su rivista]
Grandi, F.; Mandreoli, Federica; Martoglia, Riccardo
abstract

In recent years, clinical medicine and health care have witnessed a tremendous increase in the number of available guidelines, i.e., "best practices" encoding and standardizing care procedures for a given disease. Clinical guidelines are subject to continuous development and revision by committees of expert physicians and health authorities and, thus, multiple versions coexist as a consequence of the clinical and healthcare activities. Moreover, several alternatives are usually included in order to make the guidelines as general as possible, making them difficult to handle both in manual and automated fashions. In this work, we introduce techniques to model and to provide efficient personalized access to very large collections of multi-version clinical guidelines, which can be stored both in textual and in executable format in an XML repository. In this way, multiple temporal perspectives, patient profile and context information can be used by an automated personalization service to efficiently build on demand a guideline version tailored to a specific use case.

2012 - Evaluation of Data Reduction Techniques for Vehicle to Infrastructure Communication Saving Purposes [Relazione in Atti di Convegno]
Carafoli, L.; Mandreoli, Federica; Martoglia, Riccardo; Penzo, W.
abstract

In this paper we investigate the employment of different data reduction techniques to minimize V2I communication in an Intelligent Transportation System (ITS). We consider the context of the PEGASUS Project, where vehicles are equipped with sensor-based devices able to compute and communicate to a Control Centre (CC) information like the vehicle's position and speed. The CC relies on a general-purpose data management module that supports the execution of continuous queries as well as standard SQL one-time queries on the collected data to provide various infomobility services. The paper explores two categories of data reduction techniques: independent techniques, where vehicles autonomously send data to the CC, and information-need techniques, where data is sent by taking into account additional data received from the CC. The paper discusses and implements the technical changes needed in the CC to support the required infomobility services under the reduced availability of data. All the investigated techniques have been extensively evaluated in a variety of traffic scenarios.

2012 - Fast On-Line Summarization of RFID Probabilistic Data Streams [Relazione in Atti di Convegno]
Haider, R.; Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, S.
abstract

RFID applications usually rely on RFID deployments to manage high-level events. A fundamental relation for these purposes is the location of people and objects over time. However, the nature of RFID data streams is noisy, redundant and unreliable, and thus streams of low-level tag-reads can be transformed into probabilistic data streams that can reach in practical cases the size of gigabytes in a day. In this paper, we propose a simple on-line summarization mechanism, which is able to provide small space representation for massive RFID probabilistic data streams while preserving the meaningful information. The main idea behind the proposed approach is to keep on aggregating tuples in an incremental way until a state transition is detected. Probabilistic tuples are processed as they arrive, hence avoiding the use of expensive offline disk-based operations, and the output is stored in a probabilistic database in such a way that, as we also experimentally prove, a wide range of probabilistic queries can be applied and answered effectively.
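
The idea of aggregating tuples incrementally until a state transition is detected can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' mechanism: the tuple layout, the transition test (a change of reported location) and the use of a running mean probability are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Reading:
    """Hypothetical probabilistic tuple: `tag` is at `location` at time `t` with probability `prob`."""
    tag: str
    t: int
    location: str
    prob: float


@dataclass
class Summary:
    """One aggregated stay of a tag in a single location."""
    tag: str
    location: str
    t_start: int
    t_end: int
    mean_prob: float
    count: int


def summarize(stream: Iterable[Reading]) -> Iterator[Summary]:
    """Fold consecutive readings into one summary until the location changes.

    Assumes the stream carries a single tag, ordered by time, with one
    location per timestamp. Tuples are processed one at a time as they
    arrive; nothing is buffered beyond the current aggregate.
    """
    current: Optional[Summary] = None
    for r in stream:
        if current is not None and r.location == current.location:
            # Same state: update the running aggregate in place.
            total = current.mean_prob * current.count + r.prob
            current.count += 1
            current.mean_prob = total / current.count
            current.t_end = r.t
        else:
            # State transition: emit the finished aggregate, start a new one.
            if current is not None:
                yield current
            current = Summary(r.tag, r.location, r.t, r.t, r.prob, 1)
    if current is not None:
        yield current
```

Because `summarize` is a generator, each summary row is available as soon as the transition that closes it has been seen, which matches the on-line setting described above.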

2012 - OLAP query reformulation in peer-to-peer data warehousing [Articolo su rivista]
Golfarelli, M.; Mandreoli, Federica; Penzo, W.; Rizzi, S.; Turricchia, E.
abstract

Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities, but traditional business intelligence systems do not provide support to this end. To fill this gap, in this paper we envision a peer-to-peer data warehousing architecture based on a network of heterogeneous peers, each exposing query answering functionalities aimed at sharing business information. To enhance the decision making process, an OLAP query expressed on a peer needs to be properly reformulated on the local multidimensional schemata of the other peers. To this end, we present a language for the definition of mappings between the multidimensional schemata of peers and we introduce a query reformulation framework that relies on the translation of mappings, queries, and multidimensional schemata onto the relational level. Then, we formalize a query reformulation algorithm and prove two properties: correctness and closure, that are essential in a peer-to-peer setting. Finally, we discuss the main implementation issues related to the reformulation setting proposed, with specific reference to the case in which the local multidimensional engines hosted by peers use the standard MDX language.

2012 - The IS-BioBank project: a framework for biological data normalization, interoperability, and mining for cancer microenvironment analysis [Articolo su rivista]
Ceci, M.; Guzzi, P. H.; Masciari, E.; Coluccia, M.; Mandreoli, Federica; Mecella, M.; Fumarola, F.; Martoglia, Riccardo; Penzo, W.
abstract

Advances in high-throughput technologies have made it possible to investigate healthy and diseased human cells at different levels. Consequently, this has made possible the discovery of new biological and biomedical data and the proliferation of a large number of databases. In this paper, we describe the IS-BioBank (Integrated Semantic Biological Data Bank) proposal. It consists of the realization of a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. In this framework, a key role is played by the Connectivity Map, a databank which relates diseases, physiological processes, and the action of drugs. The system will be used in a pilot study on Multiple Myeloma (MM).

2012 - Toward a Semantic Framework for the Querying, Mining and Visualization of Cancer Microenvironment Data [Relazione in Atti di Convegno]
Ceci, M.; Fumarola, F.; Guzzi, P. H.; Mandreoli, Federica; Martoglia, Riccardo; Masciari, E.; Penzo, W.
abstract

Over the last decade, the advances in high-throughput omic technologies have given the possibility to profile tumor cells at different levels, fostering the discovery of new biological data and the proliferation of a large number of bio-technological databases. In this paper we describe a framework for enabling the interoperability among different biological data sources and for ultimately supporting expert users in the complex process of extraction, navigation and visualization of the precious knowledge hidden in such a huge quantity of data. The system will be used in a pilot study on Multiple Myeloma (MM).

2012 - Working in a dynamic environment: the NeP4B approach as a MAS [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Guerra, Francesco; Mandreoli, Federica; Vincini, Maurizio
abstract

Integration of heterogeneous information in the context of the Internet is becoming a key activity to enable a more organized and semantically meaningful access to several kinds of information in the form of data sources, multimedia documents and web services. In NeP4B (Networked Peers for Business), a project funded by the Italian Ministry of University and Research, we developed an approach for providing a uniform representation of data, multimedia and services, thus allowing users to obtain sets of data, multimedia documents and lists of web services as query results. NeP4B is based on a P2P network of semantic peers, connected to one another by means of automatically generated mappings. In this paper we present a new architecture for NeP4B, based on a Multi-Agent System. We claim that such a solution may be more efficient and effective, thanks to the agents' autonomy and intelligence, in a dynamic environment, where sources are frequently added to (or deleted from) the network.

2011 - A Reasoning Engine for Intruders' Localization in Wide Open Areas using a Network of Cameras and RFIDs [Relazione in Atti di Convegno]
Cucchiara, Rita; Fornaciari, Michele; Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo; Prati, Andrea; Sassatelli, Simona
abstract

Wide open areas represent challenging scenarios for surveillance systems, since sensory data can be affected by noise, uncertainty, and distractors. Therefore, the tasks of localizing and identifying targets (e.g., people) in such environments suggest to go beyond the use of camera-only deployments. In this paper, we propose an innovative system relying on the joint use of cameras and RFIDs, allowing us to “map” RFID tags to people detected by cameras and, thus, highlighting potential intruders. To this end, sophisticated filtering techniques preserve the uncertainty of data and overcome the heterogeneity of sensors, while an evidential fusion architecture, based on Transferable Belief Model, combines the two sources of information and manages conflict between them. The conducted experimental evaluation shows very promising results.

2011 - A Unified Multimedia and Semantic Perspective for Data Retrieval in the Semantic Web [Articolo su rivista]
Lenzi, R.; Gennaro, C.; Mandreoli, Federica; Martoglia, Riccardo; Mordacchini, M.; Penzo, W.; Sassatelli, S.
abstract

In recent years, the emerging diffusion of peer-to-peer networks is going beyond the single-domain paradigm like, for instance, the monothematic file sharing one (e.g., Napster for music). Peers are more and more heterogeneous data sources which need to share data with commercial, educational, and/or collaboration purposes, just to mention a few. Moreover, in current information processing applications data cannot be meaningfully searched by precise database queries that would return exact matches (e.g., when dealing with multimedia, proteomic, statistical data). In this paper we move a step towards multi-domain multi-type data sharing systems by introducing an advanced technological infrastructure which enables users to meet these new emerging needs. A fundamental issue in this context is data heterogeneity, which is pervasive and intrinsically present both at intensional level where, due to peers' autonomy, different semantic descriptions of the available information are provided, and at extensional level, where multiple data types can coexist, also including content-based searchable data types such as multimedia data. Our proposal relies on a Peer Data Management Systems (PDMS) framework to present innovative network organization and query routing mechanisms which exploit both peers' data description and data content to achieve effective and efficient network management and data retrieval in such a context. The validity of our proposal is demonstrated by an absolutely satisfactory experimental evaluation on a real setting.

2011 - Identification of Intruders in Groups of People using Cameras and RFIDs [Relazione in Atti di Convegno]
Cucchiara, Rita; Fornaciari, Michele; Haider, Razia; Mandreoli, Federica; Prati, Andrea
abstract

The identification of intruders in groups of people moving in wide open areas represents a challenging scenario where coordination between cameras can be certainly used but this solution is not enough. In this paper, we propose to go beyond pure vision-based approaches by integrating the use of distributed cameras with the RFID technology. To this end, we introduce a system that “maps” RFID tags to people detected by cameras by using sophisticated techniques to filter the singular modalities and an evidential fusion architecture, based on Transferable Belief Model, to combine the two sources of information and manage conflict between them. The conducted experimental evaluation shows very promising results, especially in treating groups of people.

2011 - Knowledge-based sense disambiguation (almost) for all structures [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo
abstract

Structural disambiguation is acknowledged as a very real and frequent problem for many semantic-aware applications. In this paper, we propose a unified answer to sense disambiguation on a large variety of structures both at data and metadata level such as relational schemas, XML data and schemas, taxonomies, and ontologies. Our knowledge-based approach achieves a general applicability by converting the input structures into a common format and by allowing users to tailor the extraction of the context to the specific application needs and structure characteristics. Flexibility is ensured by supporting the combination of different disambiguation methods together with different information extracted from different sources of knowledge. Further, we support both assisted and completely automatic semantic annotation tasks, while several novel feedback techniques allow us to improve the initial disambiguation results without necessarily requiring user intervention. An extensive evaluation of the obtained results shows the good effectiveness of the proposed solutions on a large variety of structure-based information and disambiguation requirements.

2010 - Data Management Issues for Intelligent Transportation Systems [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, W.; Sassatelli, Simona
abstract

In this paper we discuss the technical challenges of devising a Data Stream Management System (DSMS) in the intelligent transportation scenario considered in the PEGASUS project, where the final aim is to provide reliable and timely information to improve the safety and the efficiency of vehicles' and goods' flows. The system should collect and integrate the large amounts of geo-located stream items coming from On Board Units (OBUs) installed on vehicles, with the aim of producing real-time maps including traffic and Points Of Interest (POIs) information to be then distributed to OBUs. OBUs' smart navigation engines will exploit these maps to enhance mobility and provide user-targeted information. We propose a two-tiered GIS DSMS architecture where stream items are pulled from the source input stream, processed and stored in a result container to be further pulled by other operators. The system reduces the data acquisition costs by adopting communication-saving policies, supports ad-hoc strategies for reducing the storage management costs (lowering response times and memory consumption), and provides the required data access functionalities through an SQL-like query language enhanced with stream, event, spatial and temporal operators. OBU stream items are also exploited to detect Events Of Interest (EOIs) such as jams and accidents and to support a collaborative mechanism for user-powered POI management and rating. EOIs and POIs are modeled through specific ontologies which allow for a flexible and extensible data management and guarantee data independence from the raw streams.

2010 - Leveraging Semantic Approximations in Heterogeneous XML Data Sharing Networks: The SUNRISE Approach. [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; Penzo, W.; Sassatelli, Simona; Villani, Giorgio
abstract

In recent years, the huge amount of data available from Internet information sources has focused much attention on the sharing of distributed information through P2P and, in line with the Semantic Web vision, through Peer Data Management Systems (PDMSs). On the other hand, XML is without doubt the most popular data representation and exchange format on the Web and more and more Internet applications are conforming to this de facto standard for data sharing. In this chapter we present SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration) for XML data sharing. SUNRISE is a complete PDMS infrastructure aiming at semantic interoperability in heterogeneous networks. Decentralized data sharing is supported by a set of autonomous peers which model their local data through schemas and which are locally connected through semantic mappings. SUNRISE leverages the semantic approximations originating from schemas' heterogeneity for an effective and efficient organization and exploration of the network. For these purposes, SUNRISE implements soft computing techniques which cluster peers in Semantic Overlay Networks according to their own contents, and promote the routing of queries towards the semantically best directions in the network.

2010 - Toward a Flexible Data Management Middleware for Wireless Sensor Networks [Relazione in Atti di Convegno]
Haider, Razia; Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; Tiberio, Paolo
abstract

In this paper we present the research activity we are carrying out in the "Mobile Semantic Self-Organizing Wireless Sensor Networks" Project at the Department of Information Engineering of the University of Modena and Reggio Emilia. In this context, the main aim of our research is to study solutions for the flexible querying of distributed data collected by heterogeneous devices providing measurement readings. To this end, we propose a middleware for wireless sensor networks which is able to autonomously configure the communication and the operations required of each device in order to reduce energy and temporal costs.

2010 - Toward an Effective and Efficient Query Processing in the NeP4B Project [Relazione in Atti di Convegno]
Gennaro, C.; Mandreoli, Federica; Martoglia, Riccardo; Mordacchini, M.; Orlando, S.; Penzo, W.; Sassatelli, Simona; Tiberio, Paolo
abstract

In this paper we present our main current research activity in the Italian co-funded FIRB Project NeP4B (Networked Peers for Business). In particular, we provide an overview of our P2P query routing approach which combines semantics and multimedia aspects in order to make query processing effective and efficient.

2010 - Towards OLAP Query Reformulation in Peer-to-Peer Data Warehousing [Relazione in Atti di Convegno]
Golfarelli, M.; Mandreoli, Federica; Penzo, W.; Rizzi, S.; Turricchia, E.
abstract

Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities, respecting their own autonomy and heterogeneity. Traditional business intelligence systems, which were born to support stand-alone decision-making, do not provide support to this end. Peer Data Management Systems (PDMSs) have been proposed in the literature as architectures to support sharing of operational data across networks of peers while guaranteeing peers' autonomy, based on semantic mappings that mediate between the heterogeneous schemata exposed by peers. In line with the PDMS infrastructure, in this paper we envision a peer-to-peer data warehousing architecture based on a network of heterogeneous peers, each equipped with an independent data warehouse system that exposes query answering functionalities aimed at sharing business information. To enhance the decision-making process, an OLAP query expressed on a peer needs to be properly reformulated on the other peers. In this direction, we present a language for the definition of mappings between the multidimensional schemata of peers, and we introduce a query reformulation framework that relies on the translation of these mappings towards relational schemata. Finally, we sketch the query reformulation algorithm by outlining the reformulation steps of typical OLAP queries.

2009 - Combining Semantic and Multimedia Query Routing Techniques for Unified Data Retrieval in a PDMS [Relazione in Atti di Convegno]
C., Gennaro; Mandreoli, Federica; Martoglia, Riccardo; M., Mordacchini; W., Penzo; Sassatelli, Simona
abstract

The NeP4B project aims at the development of an advanced technological infrastructure for data sharing in a network of business partners. In this paper we leverage our distinct experiences on semantic and multimedia query routing, and propose an innovative mechanism for an effective and efficient unified data retrieval of both semantic and multimedia data in the context of the NeP4B project.

2009 - Data-Sharing P2P Networks with Semantic Approximation Capabilities [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

The synergy between Peer-to-Peer systems and Semantic Web technologies has paved the way for large-scale sharing of semantically rich data, usually represented through schemas such as RDF or ontologies. Because of the lack of common understanding of the vocabulary used by peers, the resulting heterogeneity of data representations opens new challenges as to the efficient and effective retrieval of relevant information. In this paper, as opposed to viewing semantic misalignment as a limit for interoperability, we leverage the presence of semantic approximations between the peers' schemas as a means for giving effective hints along two directions: 1) for query routing purposes, to identify the peers which best satisfy the user's requests, and 2) for making users aware of the relevance of the returned answers through a ranking mechanism which promotes the most semantically related results.

2009 - Flexible Query Answering on Graph-modeled Data [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Villani, Giorgio
abstract

The largeness and the heterogeneity of most graph-modeled datasets in several database application areas make the query process a real challenge because of the lack of a complete knowledge of the vocabulary used, as well as of the information about the structural relationships between the data. To overcome these problems, flexible query answering capabilities are an essential need. In this paper we present a general model for supporting approximate queries on graph-modeled data. Approximation is both on the vocabularies and the structure. The model is general in that it is not bound to a specific graph data model, rather it gracefully accommodates labeled directed/undirected data graphs with labeled/unlabeled edges. The query answering principles underlying the model are not compelled to a specific data graph, instead they are founded on properties inferable from the data model the data graph conforms to. We complement the work with a ranking model to deal with data approximations and with an efficient top-k retrieval algorithm which smartly accesses ad-hoc data structures and generates the most promising answers in an order correlated with the ranking measures. Experimental results prove the good effectiveness and efficiency of our proposal on different real world datasets.

2009 - Issues in Personalized Access to Multiversion XML Documents [Capitolo/Saggio]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo
abstract

In several application fields including legal and medical domains, XML documents are “versioned” along different dimensions of interest, whose nature depends on the application needs such as time, space and security. Specifically, temporal and semantic versioning is particularly demanding in a broad range of application domains where temporal versioning can be used to maintain histories of the underlying resources along various time dimensions, and semantic versioning can then be used to model limited applicability of resources to individual cases or contexts. The selection and reconstruction of the version(s) of interest for a user means the retrieval of those fragments of documents that match both the implicit and explicit user needs, which can be formalized as what we call personalization queries. In this chapter, we focus on the design and implementation issues of a personalization query processor. We consider different design options and, among them, we introduce an in-depth study of a native solution by showing, also through experimental evaluation, how some of the best performing technological solutions available today for XML data management can be successfully extended and optimally combined in order to support personalization queries.

2009 - Native Temporal Slicing Support for XML Databases [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Ronchetti, Enrico
abstract

XML databases, providing structural querying support, are becoming more and more popular. As we know, XML data may change over time and providing an efficient support to queries which also involve temporal aspects is still an open issue. In this paper we present our native Temporal XML Query Processor, which exploits an ad-hoc temporal indexing scheme relying on relational approaches and a technology supporting temporal slicing. As we show through an extensive experimental evaluation, our solution achieves good efficiency results, outperforming stratum-based solutions when dealing with time-related application requirements while continuing to guarantee good performance in traditional scenarios.

2009 - Paving the Way to an Effective and Efficient Retrieval of Data over Semantic Overlay Networks [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

In a Peer-to-Peer (P2P) system, a Semantic Overlay Network (SON) models a network of peers whose connections are influenced by the peers’ content, so that semantically related peers connect with each other. This is very common in P2P communities, where peers share common interests, and a peer can belong to more than one SON, depending on its own interests. Querying such systems is not an easy task: the retrieval of relevant data cannot rely on flooding approaches which forward a query to the overall network. A way of selecting which peers are more likely to provide relevant answers is necessary to support more efficient and effective query processing strategies. This chapter presents a semantic infrastructure for routing queries effectively in a network of SONs. Peers are semantically rich, in that peers’ content is modelled with a schema on their local data, and peers are related to each other through semantic mappings defined between their own schemas. A query is routed through the network by means of a sequence of reformulations, according to the semantic mappings encountered in the routing path. As reformulations may lead to semantic approximations, we define a fully distributed indexing mechanism which summarizes the semantics underlying whole subnetworks, in order to be able to locate the semantically best directions to forward a query to. In support of our proposal, we demonstrate through a rich set of experiments that our routing mechanism outperforms algorithms which are usually limited to knowledge of only the peers directly connected to the querying peer, and that our approach is particularly successful in a SONs scenario.

2009 - Principles of Holism for Sequential Twig Pattern Matching [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

Modern applications face the challenge of dealing with structured and semi-structured data. They have to deal with complex objects, most of them presenting some kind of internal structure, which often forms a hierarchy. Though XML documents are the most known, chemical compounds, CAD drawings, web-sites and many other applications have to deal with similar problems. In such environments, ordered and unordered tree pattern matching are the fundamental search operations. One of the main thrusts of research activities for tree pattern matching is the class of holistic approaches. Their ultimate goal is to evaluate a query twig as a whole by relying on sequential access patterns and non trivial auxiliary storage structures, typically stored in main memory. Based on the pre/post-order ranks of individual tree nodes, we establish strong theoretical bases as a foundation for correct and efficient holistic pattern matching algorithms. In particular, we define and prove sufficient and necessary conditions to minimize the amount of data retained in memory, thus introducing a correct and complete framework on which different holistic solutions can be compared. We also show how these rules can be applied for building algorithms for ordered and unordered tree-pattern matching. Thanks to the above theoretical achievements, each holistic algorithm gains in efficiency as it is directly implemented on the adopted numbering scheme, avoids expensive matching refinements and keeps memory requirements stable. An experimental analysis and comparison with previous approaches confirms the superiority of our approach tested on synthetic as well as real-life data sets.

2009 - Semantics-driven Approximate Query Answering on Graph Databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Villani, Giorgio
abstract

Several database application areas need to deal with graph-modeled datasets. The main features of these datasets are the largeness and the heterogeneity of the data, which make it impractical to answer exact queries. In this paper we present our recent research efforts in modeling flexible query answering capabilities in this context. Flexibility is captured by approximations both on the labels and on the structure of graph-based queries, by guaranteeing semantically meaningful relaxations only. In order to cope with the excess of results, we adapt a well-known top-k retrieval algorithm to our context. The good effectiveness and efficiency of our proposal are proved by an extensive experimental evaluation on different real world datasets.

2008 - Boosting a Network of Semantic Peers [Relazione in Atti di Convegno]
S., Lodi; Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

In a Peer Data Management System (PDMS), semantic peers connect with each other through semantic mappings between their own schemas. Because of schema heterogeneity, due to peers’ autonomy as for data representation, querying a PDMS implies query reformulations across semantic mappings, possibly incurring a semantic degradation due to the reiterated approximations given by the traversal of long paths. The linkage closeness of semantically similar peers is thus a crucial issue. In this paper we present a strategy for the incremental maintenance of a flexible network organization for PDMSs that clusters together semantically related peers.

2008 - Building a PDMS Infrastructure for XML Data Sharing with SUNRISE [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

Semantic support for data representation as well as a flexible machine-readable format have made XML the de facto standard for the semantic interoperability of Internet applications. Its applicability is primarily evident in settings where the actors are heterogeneous data sources which interact with each other for data sharing purposes. This is exactly the scenario envisioned by Peer Data Management Systems (PDMSs), where autonomous sources (peers) model their local data according to a schema, and are connected in a peer-to-peer network by means of pairwise semantic mappings between the peers' own schemas. One of the main challenges in such a semantically heterogeneous environment is concerned with query processing when dealing with the inherent semantic approximations occurring in the data. In this paper we present an instantiation of SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration) for XML data sources. SUNRISE is a complete PDMS infrastructure which extends each peer with functionalities for capturing the semantic approximation originating from schema heterogeneity and exploiting it for a semantically driven network organization and query routing.

2008 - Efficient and Effective Query Answering in a PDMS with SUNRISE [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

Peer Data Management Systems (PDMSs) have been recently proposed as an evolution of Peer-To-Peer (P2P) systems toward a more semantics-based description of peers’ contents and relationships. In a PDMS scenario a key challenge is query routing, i.e. the capability of selecting small subsets of semantically relevant peers to forward a query to. In this paper we demonstrate SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration), a complete infrastructure which supports an effective and efficient exploration of a PDMS network for query answering purposes. SUNRISE offers several routing policies designed around different performance priorities in order to minimize the information spanning over the network.

2008 - Ontology-Based Personalization of E-Government Services [Capitolo/Saggio]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo; Ronchetti, Enrico; M. R., Scalas; Tiberio, Paolo
abstract

While the World Wide Web user is suffering from the disease caused by information overload, for which personalization is one of the treatments which work, the citizen who gets ready to use the e-Government services which are made available on the Web is not immune from contagion. This seems a good reason to try to prescribe a personalization treatment also to the e-Government user. Hence, we introduce the design and implementation of Web information systems supporting personalized access to multi-version resources in an e-Government scenario. Personalization is supported by means of Semantic Web techniques and relies on an ontology-based profiling of users (citizens). Resources we consider are collections of norm documents (laws, decrees, regulations, etc.) in XML format but can also be generic Web pages and portals or e-Government transactional services. We introduce a reference infrastructure, describe the organization and present performance figures of a prototype system we have developed.

2008 - Semantic Peer, Here are the Neighbors You Want! [Relazione in Atti di Convegno]
S., Lodi; Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

Peer Data Management Systems (PDMSs) have been introduced as a solution to the problem of large-scale sharing of semantically rich data. A PDMS consists of semantic peers connected through semantic mappings. Querying a PDMS may lead to very poor results, because of the semantic degradation due to the approximations given by the traversal of the semantic mappings, thus leading to the problem of how to boost a network of mappings in a PDMS. In this paper we propose a strategy for the incremental maintenance of a flexible network organization that clusters together peers which are semantically related in Semantic Overlay Networks (SONs), while maintaining a high degree of node autonomy. Semantic features, a summarized representation of clusters, are stored in a “light” structure which effectively assists a newly entering peer when choosing its semantically closest overlay networks. Then, each peer is supported in the selection of its own neighbors within each overlay network according to two policies: Range-based selection and k-NN selection. For both policies, we introduce specific algorithms which exploit a distributed indexing mechanism for efficient network navigation. The proposed approach has been implemented in a prototype where its effectiveness and efficiency have been extensively tested.

2007 - A P2P-based Architecture for Semantic Web Service Automatic Composition [Relazione in Atti di Convegno]
Mandreoli, Federica; Penzo, W; Perdichizzi, A. M.
abstract

The problem of efficiently evaluating XPath and XQuery queries has become increasingly significant since more and more XML data is stored in its native form. We propose a novel optimisation technique for XML queries that is based on the semantic properties exhibited by XML data. In sharp contrast to previous studies on selectivity estimation we propose to specify bounds on the number of element nodes in an XML tree that form the root of isomorphic subtrees. It turns out that efficient reasoning about these constraints provides effective means to predict the number of XPath and XQuery query answers, to predict the number of updates using the XQuery update facility, to predict the number of en(de)cryptions using XML encryption, and to optimise XML queries.

2007 - Disambiguation of Structure-Based Information in the STRIDER System [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

We present the current version of STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web directories. It can be of support to the semantic-awareness of a wide range of applications, thanks to its novel and fully-automated disambiguation algorithms.

2007 - Efficient Management of Multi-version XML Documents for e-Government Applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; F., Grandi; M. R., Scalas
abstract

This paper describes our research activities in developing efficient systems for the management of multi-version XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and transaction times) were used to represent the evolution of norms in time and their resulting versioning and a stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version management system has migrated to a different architecture (“native” approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the stratum approach and a good scalability behaviour. Current work includes a more accurate modeling of the citizen’s ontology, which could also require a redesign of the document storage scheme, and the development of a complete infrastructure for the management of the citizen’s digital identity.

2007 - Native Temporal Slicing Support for XML Databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

XML databases, providing structural querying support, are becoming more and more popular. As we know, XML data may change over time and providing an efficient support to queries which also involve temporal aspects is still an open issue. In this paper we present our native Temporal XML Query Processor, which exploits an ad-hoc temporal indexing scheme relying on relational approaches and a technology supporting temporal slicing. As we show through an extensive experimental evaluation, our solution achieves good efficiency results, outperforming stratum-based solutions when dealing with time-related application requirements while continuing to guarantee good performance in traditional scenarios.

2007 - SRI@work: Efficient and Effective Routing Strategies in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

In recent years, information sharing has benefited greatly from the wide diffusion of distributed computing, namely through P2P systems and, in line with the Semantic Web vision, through Peer Data Management Systems (PDMSs). In a PDMS scenario one of the most difficult challenges is query routing, i.e. the capability of selecting small subsets of semantically relevant peers to forward a query to. In this paper, we put the Semantic Routing Index (SRI) distributed mechanism we proposed in [6] to work. In particular, we present general SRI-based query execution models, designed around different performance priorities and minimizing the information spanning over the network. Starting from these models, we devise several SRI-enabled routing policies, characterized by different effectiveness and efficiency targets, and we test them in depth in ad-hoc PDMS simulation environments.

2007 - SUNRISE: Exploring PDMS Networks with Semantic Routing Indexes [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona; Villani, Giorgio
abstract

We demonstrate SUNRISE (System for Unified Network Routing, Indexing and Semantic Exploration), a complete infrastructure supporting the construction of a PDMS semantic layer and providing a series of techniques that can be used for an effective and efficient exploration of a semantic network, for instance in a query answering setting.

2007 - Semantic Routing for Effective Search in Heterogeneous and Distributed Digital Libraries [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; W., Penzo; Sassatelli, Simona
abstract

Next generation Digital Libraries (DLs) will offer an entire ensemble of systems and services designed to help users to easily find and access the information they are looking for. However, much work is still required in order to achieve this vision. In this paper, we concentrate our attention on devising techniques allowing an effective routing of queries, which we think can be of the utmost importance in providing effective and efficient querying in heterogeneous and distributed DLs, identifying the best ways to navigate the available nodes and, thus, the documents (or their parts) which are most suitable to best answer the user needs. We describe a routing mechanism, which we call routing by mapping, in which the query is sent to the DL peers whose subnetworks best approximate the concepts required. To this end a distributed index mechanism is adopted, which we call Semantic Routing Index (SRI). We also present some exploratory experiments showing the effectiveness of the proposed approach.

2007 - Semantic Web Service Composition in the NeP4B Project: Challenges and Architectural Issues [Relazione in Atti di Convegno]
Mandreoli, Federica; W., Penzo; A. M., Perdichizzi
abstract

Semantic Web service discovery and composition frameworks proposed so far assume for the most part a centralized registry that holds information on all the Web services available at any given time. This solution does not cope well with the scalability and flexibility requirements of dynamic, fast changing contexts. As part of the NeP4B project, in this paper we propose an alternative peer-to-peer architecture based on the Goal concept.

2006 - A Native Extensible XML Query Processor Towards Efficient and Effective MPEG-7 Querying [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; M., Righini
abstract

In recent years the production of massive amounts of visual information has led to the arrival of very large multimedia Digital Libraries (DLs). The key to support efficient search and management operations in such repositories is to exploit metadata information for digital media, such as MPEG-7 based ones, which seem to be the most widely accepted. The underlying XML syntax, together with the high versatility of the provided constructs, make it easy to specify significant and complex queries, however executing them efficiently on huge quantities of data is not a trivial task. In this paper we provide an overview of the XSiter system, a native and extensible XML query processor providing very high performance in general XML querying settings and whose flexible architecture can be easily enhanced to better support the peculiarities of retrieving multimedia objects through MPEG-7 annotation metadata. Further, we consider possible "use-cases" and tasks related to multimedia and video DLs querying and management which our system can successfully accomplish.

2006 - An eGovernment system for temporal- and semantic-aware access to norms [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present the results of ongoing research involving the design and implementation, in an eGovernment scenario, of a semantic-aware system supporting efficient and personalized access to a multi-version repository of normative texts. The research activity is entitled “Semantic web techniques for the management of digital identity and the access to norms”. In the context of a complete and modular infrastructure, we defined a multi-version XML data model and developed a temporal and semantic XML query processor supporting both temporal versioning (essential in normative systems) and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions only containing provisions which are applicable to their personal case. The whole infrastructure, which we plan to complete in the near future, will integrate the querying component with several auxiliary services, including automatic citizen identification and classification and assisted update of the repository data.

2006 - EXTRA: a system for example-based translation assistance [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Nowadays we are witnessing the need to translate ever increasing quantities of texts, with an ever increasing quality. The expertise and skill of professional translators alone are not sufficient to achieve highly effective and efficient translation performance. The best way to translate very large quantities of documents, while ensuring optimal translation time and costs, is to exploit Example-Based Machine Translation (EBMT), which is devised with the aim of achieving better quality and quantity in less time, while preserving and treasuring the richness and accuracy that only human translation can achieve. In this paper we present EXTRA (EXample-based TRanslation Assistant), the EBMT system we have developed over the last few years to support the translation of texts written in Western languages. EXTRA is able to propose effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; the search is performed efficiently in large amounts of bilingual texts thanks to its advanced retrieval techniques. Furthermore, EXTRA does not use external knowledge requiring the intervention of users and is completely customizable and portable as it has been implemented on top of a standard DataBase Management System (DBMS). In the paper we also provide a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, in order to quantify the benefits offered by EXTRA assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models.

2006 - SRI: Exploiting Semantic Information for Effective Query Routing in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; W., Penzo
abstract

The huge amount of data available from Internet information sources has focused much attention on the sharing of distributed information through Peer Data Management Systems (PDMSs). In a PDMS, peers have a schema on their local data, and they are related to each other through semantic mappings that can be defined between their own schemas. Querying a PDMS means either flooding the network with messages to all peers or taking advantage of a routing mechanism to reformulate a query only on the best peers selected according to some given criteria. As reformulations may lead to semantic approximations, we deem that such approximations can be exploited for locating the semantically best directions to forward a query to. In this paper, we propose a distributed index mechanism where each peer is provided with a Semantic Routing Index (SRI) for routing queries effectively. A fuzzy-oriented model for SRI is presented where operations for creating and maintaining SRIs are well-founded. In addition, we show how SRIs can be employed in the query processing phase with the aim of reducing the space of reformulations. Finally, we conduct a series of meaningful experiments showing the effectiveness of the proposed approach.

2006 - STRIDER: a Versatile System for Structural Disambiguation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

We present STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web directories. The system performs high-quality fully-automated disambiguation by exploiting a novel and versatile structural disambiguation approach.

2006 - Semantic Query Routing Experiences in a PDMS [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; W., Penzo
abstract

Querying a PDMS means either flooding the network with messages to all peers or taking advantage of a routing mechanism to reformulate a query only on the best peers selected according to some given criteria. As reformulations may lead to semantic approximations, we deem that such approximations can be exploited for locating the semantically best directions to forward a query to. In this paper, we present our experiences in devising and testing a mechanism for effective query routing in a PDMS. In particular, we describe a distributed index mechanism where each peer is provided with a Semantic Routing Index (SRI) for routing queries effectively. We illustrate SRIs’ structure, their use and the framework we devised for their incremental update, then we provide an extensive evaluation of their effectiveness through a set of query routing experiments on a variety of scenarios. This work is partially supported by the PRIN WISDOM and FIRB NeP4B national projects.

2006 - Semantic Web Techniques for Personalization of eGovernment Services [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present the results of an ongoing research involving the design and implementation of systems supporting personalized access to multi-version resources in an eGovernment scenario. Personalization is supported by means of Semantic Web techniques and is based on an ontology-based profiling of users (citizens). Resources we consider are collections of norm documents in XML format but can also be generic Web pages and portals or eGovernment services. We introduce a reference infrastructure, describe the organization and present performance figures of a prototype system we have developed.

2006 - Supporting temporal slicing in XML databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

Nowadays XML is universally accepted as the standard for structural data representation; XML databases, providing structural querying support, are thus becoming more and more popular. However, XML data changes over time and the task of providing efficient support to queries which also involve temporal aspects goes through the tricky task of time-slicing the input data. In this paper we take up the challenge of providing a native and efficient solution in constructing an XML query processor supporting temporal slicing, thus dealing with non-conventional application requirements while continuing to guarantee good performance in traditional scenarios. Our contributions include a novel temporal indexing scheme relying on relational approaches and a technology supporting the time-slice operator.

2006 - Using Semantic Mappings for Query Routing in a PDMS Environment [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Sassatelli, Simona; Tiberio, Paolo; W., Penzo
abstract

In this paper we present the current achievements of our research activity in the WISDOM project, whose aim is the definition of intelligent techniques enabling effective and efficient information search in a distributed and decentralized PDMS scenario. We focus on the query routing problem and we define a new routing mechanism, which we call routing by mapping, in which the query is sent to the peers whose subnetworks best approximate the concepts required. In order to select the best subnetworks, the peer receiving the query exploits information about the semantic approximation of the query concepts when moving towards each neighbour. This information is computed starting from the semantic mappings established with the peer's neighbours and it is maintained in specifically devised data structures called Semantic Routing Indices (SRIs), for whose update we propose specific algorithms and protocols. The effectiveness of the achieved results has been experimentally proved through a series of exploratory tests.

2005 - Accesso Personalizzato a Documenti Multiversione per Applicazioni nel Settore dell’E-Government [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; E., Ronchetti; F., Grandi; M. R., Scalas
abstract

This work presents research activity concerning the development of prototype systems for the efficient management of multi-version XML documents in an e-Government scenario. The application aim of these systems is to give citizens access to personalized versions of resources such as norm texts and information made available on the Web by public administrations. To represent the evolution of norms over time and the resulting versioning, four temporal dimensions were used, plus a further semantic dimension to represent the applicability of norms to different classes of citizens according to their digital identity. The classification of citizens is based on the management of an ontology and the adoption of Semantic Web techniques. The current implementation, an evolution of a “stratum” approach (developed on top of an RDBMS platform), is based on a “native” approach consisting of an ad hoc XML query processor. Preliminary experiments showed good levels of performance and scalability in the new system.

2005 - Efficient Management of Multi-Version XML Documents for eGovernment Applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; F., Grandi; M. R., Scalas
abstract

This paper describes our research activities in developing efficient systems for the management of multiversion XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and transaction times) were used to represent the evolution of norms in time and their resulting versioning and a stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version management system has migrated to a different architecture ("native" approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the stratum approach and a good scalability behaviour. Current work includes a more accurate modeling of the citizen’s ontology, which could also require a redesign of the document storage scheme, and the development of a complete infrastructure for the management of the citizen’s digital identity.

2005 - Enhanced access to eGovernment services: temporal and semantics-aware retrieval of norms [Relazione in Atti di Convegno]
F., Grandi; Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti; M. R., Scalas; Tiberio, Paolo
abstract

In this paper, we summarize the results of ongoing research involving the design and implementation of a multi-version repository of norm texts supporting efficient and personalized access in an eGovernment scenario. The research activity is entitled "Semantic web techniques for the management of digital identity and the access to norms". In the context of a complete and modular infrastructure, we defined a multiversion XML data model and developed an XML query processor supporting both temporal and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions containing only the provisions that are applicable to their personal case. The whole infrastructure, which we plan to complete in the near future, will integrate the query answering component with several auxiliary services, including automatic citizen identification and classification and computer-aided update of the repository data.

2005 - Improving Semantic Awareness of Knowledge-based Applications through Structural Disambiguation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

In this paper, we summarize the features of the versatile disambiguation approach we recently presented. Its main aim is to make explicit the meaning of structure-based information such as XML schemas, XML document structures, web directories, and ontologies. It can be of support to the semantic-awareness of a wide range of applications, from schema matching and query rewriting to peer data management systems, from XML data clustering to ontology-based automatic annotation of web pages and query expansion. The effectiveness of the achieved results has been experimentally proved and is founded both on a flexible exploitation of the structure context, whose extraction can be tailored on the specific application needs, and of the information provided by commonly available thesauri such as WordNet. This work is partially supported by the Italian Council co-funded project WISDOM.

2005 - Personalized Access to Multi-Version Documents for E-Government Applications [Relazione in Atti di Convegno]
F., Grandi; M. R., Scalas; Mandreoli, Federica; Martoglia, Riccardo
abstract

In this paper we describe the design and implementation of two prototype systems for the efficient management of multi-version XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (validity, efficacy, transaction and publication times) were used to represent the evolution of norms in time and their resulting versioning and a “stratum” approach was used for its implementation on top of an object-relational DBMS. Recently, the multi-version management system has migrated to a different architecture (“native” approach) based on a multi-version XML query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the “stratum” approach and a good scalability behavior. This work has been supported by the MIUR-PRIN Project: “The European citizen in e-Governance: philosophical-juridical, legal, information and economic profiles”.

2005 - Personalized Access to Multi-version Norm Texts in an eGovernment Scenario [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Tiberio; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

In this paper, we present some results of ongoing research involving the design and implementation, in an eGovernment scenario, of a multiversion repository of norm texts supporting efficient and personalized access. In particular, we defined a multi-version XML data model supporting both temporal versioning, essential in normative systems, and semantic versioning. Semantic versioning is based on the applicability of different norm parts to different classes of citizens and allows users to retrieve personalized norm versions containing only the provisions that are applicable to their personal case. We describe the organization and present preliminary performance figures of a prototype system we developed.

2005 - Personalized access to multi-version XML documents in an eGovernment scenario [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo; F., Grandi; M. R., Scalas; E., Ronchetti
abstract

2005 - Temporal modelling and management of normative documents in XML format [Articolo su rivista]
F., Grandi; Mandreoli, Federica; Tiberio, Paolo
abstract

In this paper, we present the results of a research project concerning the temporal management of normative texts in XML format. In particular, four temporal dimensions (publication, validity, efficacy and transaction times) are used to correctly represent the evolution of norms in time and their resulting versioning. Hence, we introduce a multiversion data model based on XML schema and define basic mechanisms for the maintenance and retrieval of multiversion norm texts. Finally, we describe a prototype management system which has been implemented and evaluated.

2005 - Text Clustering as a Mining Task [Capitolo/Saggio]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

In this chapter we introduce readers to the various aspects of cluster analysis performed on textual data in a mining framework. We first provide a brief overview on the techniques and the background notions on general clustering. Then, we focus on the importance and on the goals of clustering in a text mining scenario, analyzing and describing the issues which are specific to this particular field. Effective information extraction from highly dimensional textual data, clustering algorithms specifically designed to efficiently work on very large unstructured and, possibly, hyperlinked data sets, and comprehension of the clustering output are among the covered topics.

2005 - Versatile structural disambiguation for semantic-aware applications [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; E., Ronchetti
abstract

In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure based information such as XML schemas, XML document structures, web directories, and ontologies. It can be of support to the semantic-awareness of a wide range of applications, from schema matching and query rewriting to peer data management systems, from XML data clustering to ontology-based automatic annotation of web pages and query expansion. The effectiveness of the achieved results has been experimentally proved and is founded both on a flexible exploitation of the structure context, whose extraction can be tailored on the specific application needs, and of the information provided by commonly available thesauri such as WordNet.

2004 - A Document Comparison Scheme for Secure Duplicate Detection [Articolo su rivista]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

The ever-growing amounts of textual information coming from different sources have fostered the development of digital libraries, making digital contents readily accessible but also easy for malicious users to plagiarize, thus giving rise to security problems. In this paper, we introduce a duplicate detection scheme that is able to determine, with a particularly high accuracy, how much a document is similar to another. Our pairwise document comparison scheme detects the resemblance between the content of documents by considering document chunks, representing contexts of words selected from the text. The resulting duplicate detection technique presents a good level of security in the protection of intellectual property, while improving the availability of the data stored in the digital library and the correctness of the search results. Finally, the paper addresses efficiency and scalability issues by introducing new data reduction techniques.

2004 - Approximate Query Answering for a Heterogeneous XML Document Base [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

In this paper, we deal with the problem of effective search and query answering in heterogeneous web document bases containing documents in XML format of which the schemas are available. We propose a new solution for the structural approximation of the submitted queries which, in a preliminary schema matching process, is able to automatically identify the similarities between the involved schemas and to use them in the query processing phase, when a query written on a source schema is automatically rewritten in order to be compatible with the other useful XML documents. The proposed approach has been implemented in a web service and can deliver middleware rewriting services in any open-architecture XML repository system offering advanced search capabilities.

2004 - Exploiting related digital library corpora with query rewriting [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo
abstract

In this paper, we present the preliminary results of the ongoing research activity we are carrying out in the context of approximate XML query answering when the schemas of the XML documents are available. The method we propose involves a preliminary schema matching process, which automatically identifies the semantic and structural similarities between the schema elements to be used in the subsequent operation of query rewriting, in which a query written on a source schema is automatically rewritten in order to be compatible with the other useful XML documents. The proposed approach has been implemented in a web service, named XML S3MART, which is part of the open architecture proposed in the ongoing Italian CNR co-funded ECD Project.

2004 - Management of the Citizen's Digital Identity and Access to Multi-version Norm Texts on the Semantic Web [Relazione in Atti di Convegno]
Mandreoli, Federica; Tiberio, Paolo; F., Grandi; M. R., Scalas
abstract

This paper describes an ongoing research project involving the implementation of e-Government services on the Semantic Web. In particular, the project is aimed at managing the “digital identity” of citizens on the Internet, enabling them to benefit from “personalized” versions of the online services offered by the Public Administration, which can improve and optimize their involvement in the e-Governance process. The kind of service we will consider is the selective access to norm texts available on Web repositories. The project requires the definition and maintenance of a citizen’s ontology, the semantic markup and versioning of the stored norm texts which takes into account the actual applicability to different classes of citizens, the definition and enactment of Web services for the reconstruction of the citizen’s digital identity and its classification with respect to the ontology, the design and implementation of a legal document management system for the selective access to personalized norm versions.

2004 - Tree Signatures and Unordered XML Pattern Matching [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient algorithm is proposed and its implementation tested on real-life data collections.

2004 - Unordered XML Pattern Matching with Tree Signatures [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; P., Zezula
abstract

We propose an efficient approach for finding relevant XML data twigs defined by unordered query tree specifications. We use the tree signatures as the index structure and find qualifying patterns through integration of structurally consistent query path qualifications. An efficient technique is proposed and its implementation tested on real-life data collections.

2003 - A Formal Model for Temporal Schema Versioning in Object-Oriented Databases [Articolo su rivista]
Mandreoli, Federica; Grandi, F.
abstract

In this paper we present a formal model for the support of temporal schema versions in object-oriented databases. Its definition is partially based on a generic (ODMG compatible) object model and partially introduces new concepts. The proposed model supports all the schema changes which are usually considered in the OODB literature, for which an operational semantics and a formal analysis of their correct behaviour is provided. Semantic issues arising from the introduction of temporal schema versioning in a conventional or temporal database (concerning the interaction between the intensional and extensional levels of versioning and the management of data in the presence of multiple schema versions) are also considered.

2003 - A temporal data model and management system for normative texts in XML format [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F; Bergonzini, M; Tiberio, Paolo
abstract

In this paper, we present the results of an on-going research activity concerning the temporal management of normative texts in XML format. In particular, four temporal dimensions (publication, validity, efficacy and transaction times) are used to correctly represent the evolution of norms in time and their resulting versioning. Hence, we introduce a multiversion data model based on XML schema and define basic mechanisms for the management of norm texts. Finally, we describe a prototype management system which has been implemented and evaluated.

2003 - A temporal data model and system architecture for the management of normative texts [Relazione in Atti di Convegno]
Mandreoli, Federica; Tiberio, Paolo; F., Grandi; M., Bergonzini
abstract

In this paper, we present the preliminary results of an ongoing research activity concerning the temporal management of normative texts in XML format. In particular, four temporal dimensions (publication, validity, efficacy and transaction times) are used to correctly represent the evolution of norms in time and their resulting versioning. Hence, we introduce a multiversion data model based on XML schema and define three basic operators for the management of norm texts. Finally, we describe the architecture of a management system prototype which is being implemented.

2003 - Description Logics for Modeling Dynamic Information [Capitolo/Saggio]
Mandreoli, Federica; Franconi, E; Artale, A.
abstract

This chapter presents a complete formal characterization of the semantics of supporting temporal aspects in DBMS both on data and on schemata.

2003 - Exploiting multi-lingual text potentialities in EBMT systems [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Translating documents from a source to a target language is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream. Among the several types of approaches in Machine Translation (MT), one of the most promising paradigms is Example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar source-language material into the target language. In this paper we introduce EXTRA (EXample-based TRanslation Assistant), a complete EBMT system that exploits some innovative ideas in information retrieval and multilingual text management to effectively and efficiently extract useful suggestions from past translations and present them to the translator. This work has been developed as a joint work with the LOGOS group, a worldwide leader in multilingual document translation.

2003 - Un Metodo per il Riconoscimento di Duplicati in Collezioni di Documenti [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Recent advances in computing power and telecommunications have created the right conditions for the global spread of huge amounts of electronic information and of new tools for analyzing its content, raising problems of information overload and, in particular, of duplicate detection. Duplicates, that is, very similar documents containing approximately the same information, degrade the effectiveness and efficiency of searches and often also constitute copyright violations. In this paper we introduce DANCER (Document ANalysis and Comparison ExpeRt), a complete duplicate detection system that exploits innovative ideas from information retrieval to identify duplicate documents, using algorithms and similarity measures that are new to this field and fine-grained enough to achieve good effectiveness in most applications. In addition, the system proposes several new data reduction techniques that reduce both the running time and the space required for storing the data, without compromising the good quality of the results.

2002 - A syntactic approach for searching similarities within sentences [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Textual data is the main electronic form of knowledge representation. Sentences, meant as logic units of meaningful word sequences, can be considered its backbone. In this paper, we propose a solution based on a purely syntactic approach for searching similarities within sentences, named approximate sub²sequence matching. This process being very time consuming, efficiency in retrieving the most similar parts available in large repositories of textual data is ensured by making use of new filtering techniques. As far as the design of the system is concerned, we chose a solution that allows us to deploy approximate sub²sequence matching without changing the underlying database.

2002 - Searching Similar (Sub)Sentences for Example-Based Machine Translation [Relazione in Atti di Convegno]
Mandreoli, Federica; Martoglia, Riccardo; Tiberio, Paolo
abstract

Translation is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream; in the past years research in this field has acquired a growing interest, making some forms of Machine Translation (MT) a reality. Among the several types of approaches in MT, one of the most promising paradigms is MAHT and, in particular, Example-Based Machine Translation (EBMT). An EBMT system translates by analogy, using past translations to translate other, similar source-language sentences into the target language. The basic premise is that, if a previously translated sentence occurs again, the same translation is likely to be correct. In this paper, we propose a solution based on a purely syntactic approach for searching similar sentences and parts of them in an EBMT system; the underlying similarity measure is based on the similarity between sequences of terms, such that the sentences closest to a given one are those that maintain most of the original form and contents. The system efficiently retrieves and ranks the most similar sentences available and, when no useful suggestion exists, it proceeds with the retrieval of similar parts. We opted for a design that would require minimal changes to existing databases and whose similarity measure and search algorithms are completely independent from the involved languages. This work has been developed as a joint work with LOGOS S.p.A., a worldwide leader in multilingual document translation.

2002 - Semantic Integration and Query Optimization of Heterogeneous Data Sources [Relazione in Atti di Convegno]
Bergamaschi, Sonia; Beneventano, Domenico; Castano, S; DE ANTONELLIS, V; Ferrara, A; Guerra, Francesco; Mandreoli, Federica; ORNETTI G., C; Vincini, Maurizio
abstract

In modern Internet/Intranet-based architectures, an increasing number of applications requires an integrated and uniform access to a multitude of heterogeneous and distributed data sources. In this paper, we describe the ARTEMIS/MOMIS system for the semantic integration and query optimization of heterogeneous structured and semistructured data sources.

2002 - The Valid Web: un'infrastruttura XML/XSL per la gestione temporale di documenti Web [Articolo su rivista]
F., Grandi; Mandreoli, Federica
abstract

In this work we present a temporal extension of the Web for the support and management of valid time, defined through an XML/XSL infrastructure.

2001 - Beyond Schema Versioning: A Flexible Model for Spatio-Temporal Schema selection [Articolo su rivista]
Mandreoli, Federica; Grandi, F.; Scalas, M. R.; Roddick, J. F.
abstract

Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data items are formatted also becomes an issue. This paper presents a generalized model that accommodates temporal, spatial and spatio-temporal schema versioning within databases.

2001 - Codifica XML e Gestione di Informazione Temporale in Fonti Storiche Digitalizzate di Grandi Dimensioni [Relazione in Atti di Convegno]
Mandreoli, Federica; F., Grandi
abstract

This work deals with the use of XML-related technologies for Cultural Heritage applications that exploit the encoding of temporal semantics in the management of historical documents in electronic form. The research is part of a project aimed at producing an XML digital version, accessible via the Internet, of the Repetti dictionary (19th century), which is of great interest for the study of the medieval history and archaeology of Tuscany. In particular, we present a proposal for the uniform classification and encoding of the temporal information contained in textual sources characterized by indeterminacy and by the use of multiple granularities and calendars. The proposal is based on extending the probabilistic approach to indeterminacy (in the style of TSQL2) with piecewise-constant probability distributions, which are semantically correct and lend themselves to particularly efficient processing. The paper also briefly describes two tools whose prototypes are at an advanced stage of development: a user-friendly development tool for the assisted insertion of temporal markup into documents, and a system for managing the collection of XML documents that makes an efficient temporal search engine available via the Web.

2001 - Effective representation and efficient management of indeterminate dates [Relazione in Atti di Convegno]
F., Grandi; Mandreoli, Federica
abstract

Management of indeterminate temporal expressions is useful in a wide range of applications, from designing and querying temporal databases to knowledge representation and reasoning in artificial intelligence. In this paper, we focus on the representation and management of indeterminate dates, corresponding to a common use of temporal indeterminacy which can be found in (historical) texts written in natural language, as in expressions like: around 1624, near the end of the fourteenth century, etc. In this context, we adapt and improve the probabilistic approach designed for the TSQL2 language and further developed by Dyreson and Snodgrass, and show how it can be effectively and efficiently adopted for the management of indeterminate dates.

2001 - Extensional Knowledge for semantic query optimization in a mediator based system [Relazione in Atti di Convegno]
Beneventano, Domenico; Bergamaschi, Sonia; Mandreoli, Federica
abstract

Query processing in global information systems integrating multiple heterogeneous sources is a challenging issue in relation to the effective extraction of information available on-line. In this paper we propose intelligent, tool-supported techniques for querying global information systems integrating both structured and semistructured data sources. The techniques have been developed in the environment of a data integration, wrapper/mediator based system, MOMIS, and try to achieve the goal of optimized query reformulation w.r.t. local sources. The developed techniques rely on the availability of integration knowledge whose semantics is expressed in terms of description logics. Integration knowledge includes local source schemata, a virtual mediated schema and its mapping descriptions, that is, semantic mappings w.r.t. the underlying sources both at the intensional and extensional level. Mapping descriptions, obtained as a result of the semi-automatic integration process of multiple heterogeneous sources developed for the MOMIS system, include, unlike previous data integration proposals, extensional intra/interschema knowledge. Extensional knowledge is exploited to perform semantic query optimization in a mediator based system, as it makes it possible to devise an optimized query reformulation method. The techniques are under development in the MOMIS system but can be applied, in general, to data integration systems including extensional intra/interschema knowledge in mapping descriptions.

2001 - Schema evolution and versioning: A logical and computational characterisation [Relazione in Atti di Convegno]
Franconi, E.; Grandi, F.; Mandreoli, F.
abstract

2001 - The "XML/Repetti" Project: Encoding and Manipulation of Temporal Information in Historical Text Sources [Relazione in Atti di Convegno]
Mandreoli, Federica; F., Grandi
abstract

The paper deals with the deployment of XML-related technologies in Cultural Heritage applications concerning the encoding of temporal semantics in the digital version of historical documents. Since written sources have often the same importance as material evidence in medieval archaeology, our approach can be applied to the development of tools for the support of archaeological research. In previous work, we developed an XML/XSL infrastructure called “The Valid Web” for the definition and management of historical information within Web documents. In this paper we describe the application and extension of such an approach to the realization of the electronic version of Repetti's historical-geographical dictionary of Tuscany. The extension concerns the uniform management of temporal indeterminacy, the use of multiple calendars and granularities and the proposed solutions have been inspired by similar research done for temporal query languages. From the user viewpoint, the proposed XML extensions allow the addition of historical metainformation to the encoded text sources and their “intelligent” temporal navigation via standard Web browsers. The project also involves the definition of optimized search algorithms, storage and temporal indexing of XML-encoded Repetti's Dictionary items, implementation of a prototype. As a byproduct, also a tool for computer-aided temporal XML-encoding of text sources will be developed to be used by Cultural Heritage operators (e.g. archaeology researchers).

2000 - A Generalized Modeling Framework for Schema Versioning Support [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F.; Scalas, M. R.
abstract

Advanced object-oriented applications require the management of schema versions, in order to cope with changes in the structure of the stored data. Two types of versioning have been separately considered so far: branching and temporal. The former arose in application domains like CAD/CAM and software engineering, where different solutions have been proposed to support design schema versions (consolidated versions). The latter concerns temporal databases, where some works considered temporal schema versioning to fulfil advanced needs of other typical object-oriented applications like GIS and the multimedia ones. In this work, we propose a general model which integrates the two approaches by supporting both design and temporal schema versions. The model is provided with a complete set of schema change primitives for full-fledged version manipulation whose semantics is described in the paper.

2000 - A Semantic Approach for Schema Evolution and Versioning in Object-Oriented Databases [Relazione in Atti di Convegno]
Mandreoli, Federica; Franconi, E.; Grandi, F.
abstract

In this paper a semantic approach for the specification and the management of databases with evolving schemata is introduced. It is shown how a general object-oriented model for schema versioning and evolution can be formalized; how the semantics of schema change operations can be defined; how interesting reasoning tasks can be supported, based on an encoding in description logics.

2000 - A Semantic Approach for Schema Evolution and Versioning in Object-Oriented Databases. [Relazione in Atti di Convegno]
Franconi, E.; Grandi, F.; Mandreoli, Federica
abstract

2000 - A general framework for evolving schemata support. [Relazione in Atti di Convegno]
Franconi, E.; Grandi, F.; Mandreoli, Federica
abstract

In this paper a semantic approach for the specification and the management of databases with evolving schemata is introduced. It is shown how a general object-oriented model for schema versioning and evolution can be formalised; how the semantics of schema change operations can be defined; how interesting reasoning tasks can be supported, based on an encoding in description logics.

2000 - Evolution and Change in Data Management - Issues and Directions [Articolo su rivista]
Mandreoli, Federica; Roddick, J. F.
abstract

One of the fundamental aspects of information and database systems is that they change. Moreover, in so doing they evolve, although the manner and quality of this evolution is highly dependent on the mechanisms in place to handle it. While changes in data are handled well, changes in other aspects, such as structure, rules, constraints, the model, etc., are handled to varying levels of sophistication and completeness. In order to study this in more detail a workshop on Evolution and Change in Data Management was held in Paris in November 1999. It brought together researchers from a wide range of disciplines with a common interest in handling the fundamental characteristics and the conceptual modelling of change in information and database systems. This short report of the workshop concentrates on some of the general lessons that emerged during the four days.

2000 - Schema evolution and versioning: a logical and computational characterisation [Relazione in Atti di Convegno]
Mandreoli, Federica; Franconi, E.; Grandi, F.
abstract

In this paper we study the logical and computational properties of schema evolution and versioning support in object-oriented databases. To this end, we present the formalisation of a general model for an object base with evolving schemata and define the semantics of the provided schema change operations. We will then sketch how the encoding of such a framework in a suitable Description Logic will allow the introduction and solution of interesting reasoning tasks at global database and single schema version levels.

2000 - The Valid Web [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F.
abstract

The Valid Web is a software prototype implementing temporal extensions of the World Wide Web. The temporal dimension of interest is valid time, which represents the evolution of data with respect to the real-world (or virtual) environment they describe. The prototype consists of a Web site browsable with MS Internet Explorer 5 (IE5), which allows the selective processing of HTML/XML documents containing historical information or temporal data. The base techniques employed in the prototype design and development, which derive from temporal database theory, are the adoption of data timestamping and temporal selection operators for the creation and management of Web pages, respectively.

2000 - The Valid Web: an XML/XSL Infrastructure for Temporal Management of Web Documents. [Relazione in Atti di Convegno]
Grandi, F.; Mandreoli, Federica
abstract

In this paper we present a temporal extension of the World Wide Web based on a complete XML/XSL infrastructure to support valid time. The proposed technique enables the explicit definition of temporal information within HTML/XML documents, whose contents can then be selectively accessed according to their valid time. By acting on a navigation validity context, the proposed solution makes it possible to “travel in time” in a given virtual environment with any XML-compliant browser; this makes it possible, for instance, to cut out personalized visit routes for a specific epoch in a virtual museum or a digital historical library, to visualize the evolution of an archaeological site through successive ages, to selectively access past issues of magazines, to browse historical time series (e.g. stock quote archives), etc. The proposed Web extensions have been tested on a demo prototype showing, as an application example, the functionalities of a temporal Web museum.

2000 - Un'infrastruttura XML/XSL per la gestione temporale di documenti e dati in ambiente Web. [Relazione in Atti di Convegno]
Grandi, F.; Mandreoli, Federica
abstract

In this work we present a temporal extension of the Web for the support and management of valid time, defined through an XML/XSL infrastructure. This extension enables the explicit definition of temporal information within Web pages (HTML or XML documents), whose contents can thus be selectively accessed and used according to their validity. With the proposed solution, by acting on a temporal navigation context, it is possible to "travel in time" in a given virtual environment with any browser that recognizes XML code. From the user's point of view, this functionality makes it possible, for example, to cut out personalized visit routes, restricted to a particular epoch, within a virtual museum or a digital historical library, or to visualize the evolution of an archaeological site through successive epochs, or to selectively access historical data series (e.g. stock quotes), past editions of online publications and anything else that can be organized along the temporal dimension. In addition to these navigation functionalities, the proposed infrastructure can also be used, in a straightforward way, for the management of semistructured data encoded in XML, laying the foundations for the management of temporal data and the development of temporal query languages for XML. The proposed Web extensions have been tested on a software prototype that shows two application examples: the realization of a temporal Web site (virtual museum) and the management of temporal XML data with TSQL2-style query functionalities.

1999 - ODMG language extensions for generalized schema versioning support [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F.
abstract

The management of different schema versions is required in long-lived database systems to accomplish data structural changes and represent their history. Once a suitable data model for schema versioning support has been defined, appropriate extensions must also be introduced in the data definition and manipulation languages. Such an extension is aimed at making the versioning facilities available at user-interface level and is the basis for the development of advanced multi-schema applications. In this paper we present extensions to the definition and manipulation language of the standard object-oriented data model ODMG for a generalized schema versioning support. To this end, two versioning modalities will be considered in a single powerful system: temporal versioning and management of alternative design versions. As far as the temporal components are concerned, the proposed extensions of ODL and OQL will be consistent with the TSQL temporal query language.

1999 - Towards a Model for Spatio-Temporal Schema Selection [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F.; Scalas, M. R.; Roddick, J. F.
abstract

Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data is formatted also becomes an issue. This paper presents a generalised model that accommodates schema versioning within static, temporal, spatial and spatio-temporal relational and object-oriented databases.

1999 - Un Nuovo Modello per la Gestione di Versioni di Progetto e Versioni Temporali di Schema nelle Basi di Dati Object-Oriented [Relazione in Atti di Convegno]
Mandreoli, Federica; Grandi, F.; Scalas, M. R.
abstract

The problem of managing schema versions (schema versioning) in object-oriented databases has been studied within two main lines of research. The first concerns static (non-temporal) systems, for which numerous solutions exist for the support of design schema versions (consolidated versions), based on the needs of application domains such as CAD/CAM and software engineering. The second line of research instead concerns temporal databases. In this context, to satisfy the advanced requirements of other typical object-oriented applications, such as GIS and multimedia, some proposals for the management of temporal schema versions have been presented. In this work we aim to integrate the two approaches by introducing a generalized object-oriented model for the management of both design and temporal schema versions. The proposed model extends the application possibilities of a single system by enriching the expressiveness of versions and the potential opened up by their handling. To this end, a complete set of schema change primitives has been formally defined, whose use will be exemplified in the paper.

Università degli studi di Modena e Reggio Emilia
