
MARCO LIPPI

Associate Professor
Department of Sciences and Methods for Engineering




Publications

2023 - A General Pipeline for Online Gesture Recognition in Human–Robot Interaction [Journal article]
Villani, Valeria; Secchi, Cristian; Lippi, Marco; Sabattini, Lorenzo


2023 - Application of Machine Learning Demand Forecasting Techniques in the Italian Processed Meat Industry [Abstract in conference proceedings]
Mucciarini, Mirko; Caselli, Giulia; Iori, Manuel; Lippi, Marco


2023 - Cross-Load Generalization of Bearing Fault Recognition with Decision Trees [Conference paper]
Briglia, Giovanni; Immovilli, Fabio; Cocconcelli, Marco; Lippi, Marco
abstract

The literature on condition monitoring is nowadays characterized by a wide variety of machine learning approaches. We argue that, in most of the works, the experimental evaluation is conducted in an oversimplified scenario, where training and test data contain samples obtained under the same radial and torsional load conditions. In this paper, we propose to apply an interpretable machine learning model, namely decision trees, to perform fault detection and recognition across different load configurations, a challenging benchmark that requires generalization capabilities. The rules extracted from the trees provide explanations of the classification process.


2023 - Enabling causality learning in smart factories with hierarchical digital twins [Journal article]
Lippi, M.; Martinelli, M.; Picone, M.; Zambonelli, F.
abstract

Smart factories are complex systems where many different components need to interact and cooperate in order to achieve common goals. In particular, devices must be endowed with the skill of learning how to react to evolving situations and unexpected scenarios. In order to develop these capabilities, we argue that systems will need to build an internal, and possibly shared, representation of their operational world that represents causal relations between actions and observed variables. Within this context, digital twins will play a crucial role, by providing the ideal infrastructure for the standardisation and digitisation of the whole industrial process, laying the groundwork for the high-level learning and inference processes. In this paper, we introduce a novel hierarchical architecture enabled by digital twins that can be exploited to build logical abstractions of the overall system, and to learn causal models of the environment directly from data. We implement our vision through a case study of a simulated production process. Our results in that scenario show that Bayesian networks and intervention via do-calculus can be effectively exploited within the proposed architecture to learn interpretable models of the environment. Moreover, we evaluate how the use of digital twins has a strong impact on the reduction of the physical complexity perceived by external applications.


2023 - Multi-Task Attentive Residual Networks for Argument Mining [Journal article]
Galassi, A.; Lippi, M.; Torroni, P.
abstract

We explore the use of residual networks and neural attention for multiple argument mining tasks. We propose a residual architecture that exploits attention and multi-task learning and makes use of ensembling, without any assumption on document or argument structure. We present an extensive experimental evaluation on five different corpora of user-generated comments, scientific publications, and persuasive essays. Our results show that our approach is a strong competitor against state-of-the-art architectures with a higher computational footprint or corpus-specific design, representing an interesting compromise between generality, performance accuracy and reduced model size.


2022 - Activity Imputation of Shared e-Bikes Travels in Urban Areas [Conference paper]
Hadjidimitriou, N. S.; Lippi, M.; Mamei, M.
abstract

In 2017, about 900 thousand motorbikes were registered in Europe. These types of vehicles are often selected as the only alternative when the congestion in urban areas is high, thus consistently contributing to environmental emissions. This work proposes a data-driven approach to analyse the trip purposes of shared electric bike users in urban areas. Knowing how e-bikes are used in terms of trip duration and purpose is important to integrate them in the current transportation system. The data set consists of GPS traces collected during one year and three months, representing 6,705 trips performed by 91 users of the e-bike sharing service located in three South European cities (Malaga, Rome and Bari). The proposed methodology consists of computing a set of features related to the temporal (time of the day, day of the week), meteorological (e.g. weather, season) and topological (the percentage of km traveled on roads with cycleways, speed on different types of roads, proximity of arrival to the nearest Point of Interest) characteristics of the trip. Based on the identified features, logistic regression and random forest classifiers are trained to predict the purpose of the trip. The random forest performs better, with an average accuracy of 82% over the 10 random splits of the train and test set. The overall accuracy decreases to 67% when training and test sets are split at the level of users and not at the level of trips. Finally, the travel activities are predicted for the entire data set and the features are analysed to provide a description of the behaviour of shared e-bike users.


2022 - AMICA: An Argumentative Search Engine for COVID-19 Literature [Conference paper]
Lippi, M.; Antici, F.; Brambilla, G.; Cisbani, E.; Galassi, A.; Giansanti, D.; Magurano, F.; Rosi, A.; Ruggeri, F.; Torroni, P.
abstract

AMICA is an argument mining-based search engine, specifically designed for the analysis of scientific literature related to COVID-19. AMICA retrieves scientific papers based on matching keywords and ranks the results based on the papers' argumentative content. An experimental evaluation conducted on a case study in collaboration with the Italian National Institute of Health shows that the AMICA ranking agrees with expert opinion, as well as, importantly, with the impartial quality criteria indicated by Cochrane Systematic Reviews.


2022 - Argument mining as rapid screening tool of COVID-19 literature quality: Preliminary evidence [Journal article]
Brambilla, G.; Rosi, A.; Antici, F.; Galassi, A.; Giansanti, D.; Magurano, F.; Ruggeri, F.; Torroni, P.; Cisbani, E.; Lippi, M.
abstract

The COVID-19 pandemic prompted the scientific community to share timely evidence, also in the form of pre-printed papers, not peer reviewed yet.


2022 - Demand Forecasting Methods: A Case Study in the Italian Processed Meat Industry [Abstract in conference proceedings]
Mucciarini, Mirko; Caselli, Giulia; Iori, Manuel; Lippi, Marco


2022 - Detecting and explaining unfairness in consumer contracts through memory networks [Journal article]
Ruggeri, F.; Lagioia, F.; Lippi, M.; Torroni, P.
abstract

Recent work has demonstrated how data-driven AI methods can leverage consumer protection by supporting the automated analysis of legal documents. However, a shortcoming of data-driven approaches is poor explainability. We posit that in this domain useful explanations of classifier outcomes can be provided by resorting to legal rationales. We thus consider several configurations of memory-augmented neural networks where rationales are given a special role in the modeling of context knowledge. Our results show that rationales not only help improve the classification accuracy, but are also able to offer meaningful, natural language explanations of otherwise opaque classifier outcomes.


2022 - Detection of Unsorted Metal Components for Robot Bin Picking Using an Inexpensive RGB-D Sensor [Conference paper]
Monica, R.; Saccuti, A.; Aleotti, J.; Lippi, M.
abstract

This work investigates the problem of 6D pose estimation and robot bin picking of non-Lambertian reflecting objects based on a low-cost commercial 3D sensor. In particular, we address the task of estimating the pose of small metal hydraulic components of the same type, randomly placed in a bin. The system consists of a robot arm and an RGB-D sensor in eye-in-hand configuration. The proposed method works in two main phases. In the first phase, a Convolutional Neural Network (CNN) extracts the bounding boxes of the objects contained in the bin from a single RGB image of the environment. In the second phase, the 6D pose of the objects is estimated using a dense 3D reconstruction of the scene and by applying a template matching algorithm from multiple virtual views of the object CAD model. Experiments have been carried out on a dataset containing both RGB and depth images. Preliminary experiments are also reported in the real setup.


2022 - Individual and Collective Self-Development: Concepts and Challenges [Conference paper]
Lippi, Marco; Mariani, Stefano; Martinelli, Matteo; Zambonelli, Franco


2022 - Poka Yoke Meets Deep Learning: A Proof of Concept for an Assembly Line Application [Journal article]
Martinelli, M.; Lippi, M.; Gamberini, R.
abstract

In this paper, we present the re-engineering process of an assembly line that features speed reducers and multipliers for agricultural applications. The “as-is” line was highly inefficient due to several issues, including the age of the machines, a non-optimal arrangement of the shop floor, and the absence of process standards. The assembly line issues were analysed with Lean Manufacturing tools, identifying irregularities and operations that require effort (Mura), overload (Muri), and waste (Muda). The definition of the “to-be” line included actions to update the department layout, modify the assembly process, and design the line feeding system in compliance with the concepts of Golden Zone (i.e., the horizontal space more ergonomically and easily accessible by the operator) and Strike Zone (i.e., the vertical workspace setup in accordance with ergonomics specifications). The re-engineering process identified a critical problem in the incorrect assembly of the oil seals, mainly caused by the difficulty in visually identifying the correct side of the component, due to different reasons. Convolutional neural networks were used to address this issue. The proposed solution turned out to be a Poka Yoke. The whole re-engineering process induced a productivity increase that is estimated from 46% to 80%. The study demonstrates how Lean Manufacturing tools together with deep learning technologies can be effective in the development of smart manufacturing lines.


2022 - Self-Development and Causality in Intelligent Environments [Conference paper]
Martinelli, Matteo; Mariani, Stefano; Lippi, Marco; Zambonelli, Franco


2021 - A Data Driven Approach to Match Demand and Supply for Public Transport Planning [Journal article]
Hadjidimitriou, Natalia; Lippi, Marco; Mamei, Marco


2021 - Assessing the Cross-Market Generalization Capability of the CLAUDETTE System [Conference paper]
Jablonowska, A.; Lagioia, F.; Lippi, M.; Micklitz, H. -W.; Sartor, G.; Tagiuri, G.
abstract

We present a study aimed at testing the CLAUDETTE system's ability to generalise the concept of unfairness in consumer contracts across diverse market sectors. The data set includes 142 terms of service grouped in five sub-sets: travel and accommodation, games and entertainment, finance and payments, health and well-being, and the more general others. Preliminary results show that the classifier has satisfactory performance on all the sectors.


2021 - Attention in Natural Language Processing [Journal article]
Galassi, Andrea; Lippi, Marco; Torroni, Paolo
abstract

Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of the textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.


2021 - Developing a 'Sense of Agency' in IoT Systems: Preliminary Experiments in a Smart Home Scenario [Conference paper]
Lippi, M.; Mariani, S.; Zambonelli, F.
abstract

Smart IoT systems are increasingly required to take decisions and act in contexts that are only partially known, or that dynamically evolve through time. Therefore, they should become able to autonomously learn models of their context, including a model of the effects of their own actions on it (that is, developing a 'sense of agency'). This would enable them to learn how to act purposefully towards the achievement of specific goals. In this paper we propose a general-purpose Bayesian learning approach to build such context models and the associated sense of agency, and present some promising preliminary experiments performed in a smart home scenario.


2021 - Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? [Journal article]
De Cesarei, A.; Cavicchi, S.; Cristadoro, G.; Lippi, M.
abstract

The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even if the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low/high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information compared to the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The difficulty with which the CNNs were shown to categorize high-passed natural scenes was reduced by picture whitening, a procedure which is inspired by how visual systems process natural images. The results are discussed concerning the adaptation to regularities in the visual environment (scene statistics); if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend only on a subset of the visual information on which humans rely, for example, on low spatial frequency information.


2021 - Lead time forecasting with machine learning techniques for a pharmaceutical supply chain [Conference paper]
Biazon de Oliveira, Maiza; Zucchi, Giorgio; Lippi, Marco; Farias Cordeiro, Douglas; Rosa da Silva, Nubia; Iori, Manuel
abstract

Purchasing lead time is the time elapsed between the moment in which an order for a good is sent to a supplier and the moment in which the order is delivered to the company that requested it. Forecasting of purchasing lead time is an essential task in the planning, management and control of industrial processes. It is of particular importance in the context of the pharmaceutical supply chain, where avoiding long waiting times is essential to provide efficient healthcare services. The forecasting of lead times is, however, a very difficult task, due to the complexity of the production processes and the significant heterogeneity in the data. In this paper, we use machine learning regression algorithms to forecast purchasing lead times in a pharmaceutical supply chain, using a real-world industrial database. We compare five algorithms, namely k-nearest neighbors, support vector machines, random forests, linear regression and multilayer perceptrons. The support vector machines approach obtained the best performance overall, with an average error lower than two days. The dataset used in our experiments is made publicly available for future research.


2021 - Sensing and Forecasting Crowd Distribution in Smart Cities: Potentials and Approaches [Journal article]
Cecaj, Alket; Lippi, Marco; Mamei, Marco; Zambonelli, Franco
abstract

The possibility of sensing and predicting the movements of crowds in modern cities is of fundamental importance for improving urban planning, urban mobility, urban safety, and tourism activities. However, it also introduces several challenges at the level of sensing technologies and data analysis. The objective of this survey is to overview: (i) the many potential application areas of crowd sensing and prediction; (ii) the technologies that can be exploited to sense crowds, along with their potential and limitations; (iii) the data analysis techniques that can be effectively used to forecast crowd distribution. Finally, the article tries to identify open and promising research challenges.


2020 - Comparing deep learning and statistical methods in forecasting crowd distribution from aggregated mobile phone data [Journal article]
Cecaj, A.; Lippi, M.; Mamei, M.; Zambonelli, F.
abstract

Accurately forecasting how crowds of people are distributed in urban areas during daily activities is of key importance for the smart city vision and related applications. In this work we forecast the crowd density and distribution in an urban area by analyzing an aggregated mobile phone dataset. By comparing the forecasting performance of statistical and deep learning methods on the aggregated mobile data we show that each class of methods has its advantages and disadvantages depending on the forecasting scenario. However, for our time-series forecasting problem, deep learning methods are preferable when it comes to simplicity and immediacy of use, since they do not require a time-consuming model selection for each different cell. Deep learning approaches are also appropriate when aiming to reduce the maximum forecasting error. Statistical methods instead show their superiority in providing more precise forecasting results, but they require data domain knowledge and computationally expensive techniques in order to select the best parameters.


2020 - Explaining potentially unfair clauses to the consumer with the claudette tool [Conference paper]
Liepina, R.; Ruggeri, F.; Lagioia, F.; Lippi, M.; Drazewski, K.; Torroni, P.
abstract

This paper presents the latest developments in the use of memory network models for detecting and explaining unfair terms in online consumer contracts. We extend the CLAUDETTE tool for the detection of potentially unfair clauses in online Terms of Service by providing users with explanations of unfairness (legal rationales) for five different categories: arbitration, unilateral change, content removal, unilateral termination, and limitation of liability.


2020 - Forecasting Crowd Distribution in Smart Cities [Conference paper]
Cecaj, A.; Lippi, M.; Mamei, M.; Zambonelli, F.
abstract

In this work we present a forecasting method that can be used to predict crowd distribution across the city. Specifically, we analyze and forecast cellular network traffic and estimate crowds on that basis. Our forecasting model is based on a neural network combined with time series decomposition techniques. Our analysis shows that this approach can give interesting results in two directions. First, it creates a forecasting solution that fits all the variability in our dataset without having to create specific features and without complex search procedures for optimal parameters. Second, the method performs well, proving robust even in the presence of spikes in the data, thus enabling better applications such as event management and detection of crowd gathering.


2020 - Machine Learning for Severity Classification of Accidents Involving Powered Two Wheelers [Journal article]
Hadjidimitriou, N. S.; Dell'Amico, M.; Lippi, M.; Skiera, A.
abstract

Road traffic safety is one of the major challenges for the future of smart cities and transportation networks. Although several solutions exist to reduce the number of fatalities and severe accidents happening daily on our roads, this reduction is smaller than expected and new methods and intelligent systems are needed. The emergency Call is an initiative of the European Commission aimed at providing rapid assistance to motorists thanks to the implementation of a unique emergency number. In this work, we study the problem of classifying the severity of accidents involving Powered Two Wheelers, by exploiting machine learning systems based on features that could be reasonably collected at the moment of the accident. An extended study on the set of features allows us to identify the most important factors for distinguishing accident severity. The system we develop achieves over 90% precision and recall on a large, publicly available corpus, using only a set of twelve features.


2020 - Neural-Symbolic Argumentation Mining: An Argument in Favor of Deep Learning and Reasoning [Journal article]
Galassi, A.; Kersting, K.; Lippi, M.; Shao, X.; Torroni, P.
abstract

Deep learning is bringing remarkable contributions to the field of argumentation mining, but the existing approaches still need to fill the gap toward performing advanced reasoning tasks. In this position paper, we posit that neural-symbolic and statistical relational learning could play a crucial role in the integration of symbolic and sub-symbolic methods to achieve this goal.


2020 - Parallelizing Machine Learning as a service for the end-user [Journal article]
Loreti, D.; Lippi, M.; Torroni, P.
abstract

As Machine Learning (ML) applications are becoming ever more pervasive, fully-trained systems are made increasingly available to a wide public, allowing end-users to submit queries with their own data, and to efficiently retrieve results. As such services become increasingly sophisticated, a new challenge is how to scale up to ever growing user bases. In this paper, we present a distributed architecture that could be exploited to parallelize a typical ML system pipeline. We propose a case study consisting of a text mining service, and discuss how the method can be generalized to many similar applications. We demonstrate the significance of the computational gain boosted by the distributed architecture by way of an extensive experimental evaluation.


2020 - Texture analysis and multiple-instance learning for the classification of malignant lymphomas [Journal article]
Lippi, Marco; Gianotti, Stefania; Fama, Angelo; Casali, Massimiliano; Barbolini, Elisa; Ferrari, Angela; Fioroni, Federica; Iori, Mauro; Luminari, Stefano; Menga, Massimo; Merli, Francesco; Trojani, Valeria; Versari, Annibale; Zanelli, Magda; Bertolini, Marco
abstract

Background and objectives: Malignant lymphomas are cancers of the immune system and are characterized by enlarged lymph nodes that typically spread across many different sites. Many different histological subtypes exist, whose diagnosis is typically based on sampling (biopsy) of a single tumor site, whereas total body examinations with computed tomography and positron emission tomography, though not diagnostic, are able to provide a comprehensive picture of the patient. In this work, we exploit a data-driven approach based on multiple-instance learning algorithms and texture analysis features extracted from positron emission tomography, to predict differential diagnosis of the main malignant lymphoma subtypes. Methods: We exploit a multiple-instance learning setting where support vector machines and random forests are used as classifiers both at the level of single VOIs (instances) and at the level of patients (bags). We present results on two datasets comprising patients that suffer from four different types of malignant lymphomas, namely diffuse large B cell lymphoma, follicular lymphoma, Hodgkin's lymphoma, and mantle cell lymphoma. Results: Despite the complexity of the task, experimental results show that, with sufficient data samples, some cancer subtypes, such as Hodgkin's lymphoma, can be identified from texture information: in particular, we achieve 97.0% sensitivity (recall) and 94.1% positive predictive value (precision) on a dataset that consists of 60 patients. Conclusions: The presented study indicates that texture analysis features extracted from positron emission tomography, combined with multiple-instance machine learning algorithms, can be discriminating for different malignant lymphoma subtypes.


2020 - The Force Awakens: Artificial intelligence for consumer law [Journal article]
Lippi, M.; Contissa, G.; Jablonowska, A. J.; Lagioia, F.; Micklitz, H. -W.; Palka, P.; Sartor, G.; Torroni, P.
abstract

Recent years have been tainted by market practices that continuously expose us, as consumers, to new risks and threats. We have become accustomed, and sometimes even resigned, to businesses monitoring our activities, examining our data, and even meddling with our choices. Artificial Intelligence (AI) is often depicted as a weapon in the hands of businesses and blamed for allowing this to happen. In this paper, we envision a paradigm shift, where AI technologies are brought to the side of consumers and their organizations, with the aim of building an efficient and effective counter-power. AI-powered tools can support a massive-scale automated analysis of textual and audiovisual data, as well as code, for the benefit of consumers and their organizations. This in turn can lead to a better oversight of business activities, help consumers exercise their rights, and enable the civil society to mitigate information overload. We discuss the societal, political, and technological challenges that stand before that vision.


2019 - Argumentation-based coordination in IoT: A speaking objects proof-of-concept [Conference paper]
Mariani, S.; Bicego, A.; Lippi, M.; Mamei, M.; Zambonelli, F.
abstract

Coordination of Cyberphysical Systems is an increasingly relevant concern for distributed systems engineering, mostly due to the rise of the Internet of Things vision in many application domains. Against this background, Speaking Objects has been proposed as a vision of future smart objects coordinating their collective perception and action through argumentation. Along this line, in this paper we describe a Proof-of-Concept implementation of the Speaking Objects vision in a smart home deployment.


2019 - Attention, please! A critical review of neural attention models in natural language processing [Journal article]
Galassi, A.; Lippi, M.; Torroni, P.
abstract

Attention is an increasingly popular mechanism used in a wide range of neural architectures. Because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures for natural language processing, with a focus on architectures designed to work with vector representation of the textual data. We discuss the dimensions along which proposals differ, the possible uses of attention, and chart the major research activities and open challenges in the area.


2019 - Automated Bearing Fault Detection via Long Short-Term Memory Networks [Conference paper]
Immovilli, F.; Lippi, M.; Cocconcelli, M.
abstract

This paper presents a method for automated bearing fault detection via motor current analysis using Long Short-Term Memory networks. Minimal pre-processing is applied to current signals. The proposed approach is experimentally validated on a laboratory trial comprising different test sets for condition monitoring and fault diagnosis of a 6-pole induction motor. Preliminary results confirmed the effectiveness of the proposed method in detecting various bearing faults under different operating conditions, such as shaft radial load and output torque.


2019 - CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service [Journal article]
Lippi, Marco; Pałka, Przemysław; Contissa, Giuseppe; Lagioia, Francesca; Micklitz, Hans-Wolfgang; Sartor, Giovanni; Torroni, Paolo
abstract

Terms of service of on-line platforms too often contain clauses that are potentially unfair to the consumer. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike.


2019 - Consumer protection requires artificial intelligence [Journal article]
Lippi, Marco; Contissa, Giuseppe; Lagioia, Francesca; Micklitz, Hans-Wolfgang; Pałka, Przemysław; Sartor, Giovanni; Torroni, Paolo
abstract

Technology companies have quickly become powerful with their access to large amounts of data and machine learning technologies, but consumers could be empowered too with automated tools to protect their rights.


2019 - Counts-of-counts similarity for prediction and search in relational data [Journal article]
Jaeger, Manfred; Lippi, Marco; Pellegrini, Giovanni; Passerini, Andrea
abstract

Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.


2019 - Deep learning for detecting and explaining unfairness in consumer contracts [Conference paper]
Lagioia, F.; Ruggeri, F.; Drazewski, K.; Lippi, M.; Micklitz, H. -W.; Torroni, P.; Sartor, G.
abstract

Consumer contracts often contain unfair clauses, in apparent violation of the relevant legislation. In this paper we present a new methodology for evaluating such clauses in online Terms of Services. We expand a set of tagged documents (terms of service) with a structured corpus where unfair clauses are linked to a knowledge base of rationales for unfairness, and experiment with machine learning methods on this expanded training set. Our experimental study is based on deep neural networks that aim to combine learning and reasoning tasks, one major example being Memory Networks. Preliminary results show that this approach may not only provide reasons and explanations to the user, but also enhance the automated detection of unfair clauses.


2019 - Distributed Speaking Objects: A Case for Massive Multiagent Systems [Relazione in Atti di Convegno]
Lippi, M.; Mamei, M.; Mariani, S.; Zambonelli, F.
abstract

Smart sensors and actuators, embedding learning and reasoning features and associated with everyday objects and locations, will soon densely populate our everyday environments. Being capable of understanding, reasoning, and reporting about what is happening (for sensors) and about what they can possibly make happen (for actuators), these “speaking objects” will thus be comparable to autonomous situated agents. Accordingly, populations of speaking objects will define dense and massive multiagent systems, devoted to monitoring and controlling our environments, be they homes, industries or, at a larger scale, whole cities. In this context, the necessary coordination among speaking objects is likely to become associated with the capability of arguing about situations and about the current state of affairs, triggering and directing proper distributed conversations, and eventually collectively reaching a desirable future state of affairs. In this article, we detail the speaking objects vision, overview the key enabling technologies, and analyze the key challenges for engineering large-scale collectives of speaking objects and their conversations.


2019 - Editorial: Statistical relational artificial intelligence [Articolo su rivista]
Riguzzi, F.; Kersting, K.; Lippi, M.; Natarajan, S.
abstract


2019 - Evaluating origin–destination matrices obtained from CDR data [Articolo su rivista]
Mamei, M.; Bicocchi, N.; Lippi, M.; Mariani, S.; Zambonelli, F.
abstract

Understanding and correctly modeling urban mobility is a crucial issue for the development of smart cities. The estimation of individual trips from mobile phone positioning data (i.e., call detail records (CDR)) can naturally support urban and transport studies as well as marketing applications. Individual trips are often aggregated in an origin–destination (OD) matrix counting the number of trips from a given origin to a given destination. In the literature dealing with CDR data there are two main approaches to extracting OD matrices from such data: (a) in time-based matrices, the analysis focuses on estimating mobility directly from a sequence of CDRs; (b) in routine-based matrices (OD by purpose), the analysis focuses on routine movements, such as the home-work commute, derived from a trip generation model. In both cases, the OD matrix measured by CDR counts is scaled to match the actual number of people moving in the area, and projected onto the road network to estimate actual flows on the streets. In this paper, we describe prototypical approaches to estimating OD matrices, describe an actual implementation, and present a number of experiments to evaluate the results from multiple perspectives.
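
The time-based approach described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in the paper: each user's CDRs are assumed to be already reduced to a time-ordered list of zone identifiers, a trip is assumed to be any change of zone between consecutive records, and the helper names (`trips_from_cdrs`, `od_matrix`, `scale_od`) and the single global scaling factor are hypothetical.

```python
from collections import Counter


def trips_from_cdrs(zones):
    """Turn one user's time-ordered zone sequence into (origin, destination)
    pairs. Assumption: every change of zone counts as one trip."""
    return [(a, b) for a, b in zip(zones, zones[1:]) if a != b]


def od_matrix(trips):
    """Count trips per (origin, destination) pair."""
    return Counter(trips)


def scale_od(od, factor):
    """Scale observed counts towards the actual population, e.g. with
    factor = residents / observed phone users (an assumed, global ratio)."""
    return {pair: count * factor for pair, count in od.items()}


# Usage: two users, then scaling by an assumed factor of 10.
all_trips = trips_from_cdrs(["A", "A", "B", "C"]) + trips_from_cdrs(["A", "B"])
scaled = scale_od(od_matrix(all_trips), 10.0)
```

A sparse dictionary keyed by zone pairs is used here instead of a dense matrix, since most origin–destination pairs are typically empty.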


2019 - GDPR privacy policies in CLAUDETTE: Challenges of omission, context and multilingualism [Relazione in Atti di Convegno]
Liepin, R.; Contissa, G.; Drazewski, K.; Lagioia, F.; Lippi, M.; Micklitz, H. -W.; Palka, P.; Sartor, G.; Torroni, P.
abstract

The latest developments in natural language processing and machine learning have created new opportunities in legal text analysis. In particular, we look at the texts of online privacy policies after the implementation of the European General Data Protection Regulation (GDPR). We analyse 32 privacy policies to design a methodology for automated detection and assessment of compliance of these documents. Preliminary results confirm the pressing issues with current privacy policies and the beneficial use of this approach in empowering consumers in making more informed decisions. However, we also encountered several serious issues in the process. This paper introduces the challenges through concrete examples of context dependence, omission of information, and multilingualism.


2019 - Improve Education Opportunities for Better Integration of Syrian Refugees in Turkey [Capitolo/Saggio]
Mamei, M.; Cylasun, S. M.; Lippi, M.; Pancotto, F.; Tumen, S.
abstract


2019 - Natural Language Statistical Features of LSTM-Generated Texts [Articolo su rivista]
Lippi, Marco; Montemurro, Marcelo A; Esposti, Mirko Degli; Cristadoro, Giampaolo
abstract

Long short-term memory (LSTM) networks have recently shown remarkable performance in several tasks dealing with natural language generation, such as image captioning or poetry composition. Yet, only a few works have analyzed text generated by LSTMs in order to quantitatively evaluate to what extent such artificial texts resemble those generated by humans. We compared the statistical structure of LSTM-generated language to that of written natural language, and to that of texts produced by Markov models of various orders. In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures. Our main finding is that while both LSTM- and Markov-generated texts can exhibit features similar to real ones in their word-frequency statistics and entropy measures, LSTM texts are shown to reproduce long-range correlations at scales comparable to those found in natural language. Moreover, for LSTM networks, a temperature-like parameter controlling the generation process shows an optimal value, for which the produced texts are closest to real language, that is consistent across the different statistical features investigated.
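
To make the role of a temperature-like parameter more concrete, the sketch below rescales a vector of output scores before normalising it, and measures the Shannon entropy of the result. It assumes the common softmax-with-temperature formulation, which the abstract does not spell out; the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, dividing by the temperature
    first (assumed formulation; lower values sharpen the distribution)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Under this formulation a low temperature concentrates probability on the most likely symbol (low entropy, repetitive text), while a high temperature flattens the distribution (high entropy, noisier text), which is why an intermediate value can bring generated text closest to natural language.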


2019 - Policy implications of the D4R Challenge [Capitolo/Saggio]
Ali Salah, Albert; Tarık Altuncu, M.; Balcisoy, Selim; Frydenlund, Erika; Mamei, Marco; Ali Akyol, Mehmet; Yavuz Arslanlı, Kerem; Bensason, Ivon; Boshuijzen-van Burken, Christine; Bosetti, Paolo; Boy, Jeremy; Bozcaga, Tugba; Mümin Cilasun, Seyit; Işık, Oğuz; Kalaycıoğlu, Sibel; Seyyide Kaptaner, Ayse; Kayi, Ilker; Ozan Kılıç, Özgün; Kjamili, Berat; Kucukali, Huseyin; Martin, Aaron; Lippi, Marco; Pancotto, Francesca; Rhoads, Daniel; Sevencan, Nur; Sezgin, Ervin; Solé-Ribalta, Albert; Sterly, Harald; Surer, Elif; Taşkaya Temizel, Tuğba; Tümen, Semih; Uluturk, Ismail
abstract

The Data for Refugees (D4R) Challenge resulted in many insights related to the movement patterns of the Syrian refugees within Turkey. In this chapter, we summarize some of the important findings, and suggest policy recommendations for the main areas of the challenge. These recommendations are sometimes broad suggestions, as the policy interventions involve many factors that are difficult to take into account. We give examples of such issues to help policy-makers.


2018 - An Argumentation-based Perspective over the Social IoT [Articolo su rivista]
Lippi, Marco; Mamei, Marco; Mariani, Stefano; Zambonelli, Franco
abstract

The crucial role played by social interactions between smart objects in the Internet of Things is being rapidly recognized by the Social Internet of Things (SIoT) vision. In this paper, we build upon the recently introduced vision of Speaking Objects – “things” interacting through argumentation – to show how different forms of human dialogue naturally fit cooperation and coordination requirements of the SIoT. In particular, we show how speaking objects can exchange arguments in order to seek information, negotiate over an issue, persuade others, deliberate actions, and so on, namely, striving to reach consensus about the state of affairs and their goals. In this context, we illustrate how argumentation naturally enables such a form of conversational coordination through practical examples and a case study scenario.


2018 - Argument mining on clinical trials [Relazione in Atti di Convegno]
Mayer, T.; Cabrio, E.; Lippi, M.; Torroni, P.; Villata, S.
abstract

Argument-based decision making has been employed to support a variety of reasoning tasks over medical knowledge. These include evidence-based justifications of the effects of treatments, the detection of conflicts in the knowledge base, and the enabling of uncertain and defeasible reasoning in the health-care sector. However, a common limitation of these approaches is that they rely on structured input information. Recent advances in argument mining have shown increasingly accurate results in detecting argument components and predicting their relations from unstructured, natural language texts. In this study, we discuss evidence and claim detection from Randomized Clinical Trials. To this end, we create a new annotated dataset about four different diseases (glaucoma, diabetes, hepatitis B, and hypertension), containing 976 argument components (697 containing evidence, 279 claims). Empirical results are promising, and show the portability of the proposed approach over different branches of medicine.


2018 - Argumentative Link Prediction using Residual Networks and Multi-Objective Learning [Relazione in Atti di Convegno]
Galassi, A.; Lippi, M.; Torroni, P.
abstract

We explore the use of residual networks for argumentation mining, with an emphasis on link prediction. The method we propose makes no assumptions on document or argument structure. We evaluate it on a challenging dataset consisting of user-generated comments collected from an online platform. Results show that our model outperforms an equivalent deep network and offers results comparable with state-of-the-art methods that rely on domain knowledge.


2018 - Automated processing of privacy policies under the EU general data protection regulation [Relazione in Atti di Convegno]
Contissa, G.; Docter, K.; Lagioia, F.; Lippi, M.; Micklitz, H. -W.; Palka, P.; Sartor, G.; Torroni, P.
abstract

Two years after its entry into force, the EU General Data Protection Regulation became applicable on the 25th May 2018. Despite the long time for preparation, privacy policies of online platforms and services still often fail to comply with information duties and the standard of lawfulness of data processing. In this paper we present a new methodology for processing privacy policies under GDPR's provisions, and a novel annotated corpus, to be used by machine learning systems to automatically check the compliance and adequacy of privacy policies. Preliminary results confirm the potential of the methodology.


2018 - Can Deep Networks Learn to Play by the Rules? A Case Study on Nine Men's Morris [Articolo su rivista]
Chesani, Federico; Galassi, Andrea; Lippi, Marco; Mello, Paola
abstract

Deep networks have been successfully applied to a wide range of tasks in artificial intelligence, and game playing is certainly no exception. In this paper, we present an experimental study to assess whether purely sub-symbolic systems, such as deep networks, are capable of learning to play by the rules, without any a priori knowledge of either the game or its rules, but only by observing the matches played by another player. Similar problems arise in many other application domains, where the goal is to learn rules, policies, behaviours, or decisions, simply by the observation of the dynamics of a system. We present a case study conducted with residual networks on the popular board game of Nine Men's Morris, showing that this kind of sub-symbolic architecture is capable of correctly discriminating legal from illegal decisions, just from the observation of past matches of a single player.


2018 - Claim Detection in Judgments of the EU Court of Justice [Relazione in Atti di Convegno]
Lippi, M.; Lagioia, F.; Contissa, G.; Sartor, G.; Torroni, P.
abstract

Mining arguments from text has recently become a hot topic in Artificial Intelligence. The legal domain offers an ideal scenario to apply novel techniques coming from machine learning and natural language processing, addressing this challenging task. Following recent approaches to argumentation mining in juridical documents, this paper presents two distinct contributions. The first one is a novel annotated corpus for argumentation mining in the legal domain, together with a set of annotation guidelines. The second one is the empirical evaluation of a recent machine learning method for claim detection in judgments. The method, which is based on Tree Kernels, has been applied to context-independent claim detection in other genres such as Wikipedia articles and essays. Here we show that this method also provides a useful instrument in the legal domain, especially when used in combination with domain-specific information.


2018 - Predict Cellular network traffic with Markov logic [Relazione in Atti di Convegno]
Lippi, M.; Mamei, M.; Zambonelli, F.
abstract

Forecasting spatio-temporal data is a challenging task in transportation scenarios involving agents. In this paper, we propose a statistical relational learning approach to cellular network traffic forecasting, that exploits spatial relationships between close cells in the network grid. The approach is based on Markov logic networks, a powerful framework that combines first-order logic and graphical models into a hybrid model capable of handling both uncertainty in data, and background knowledge of the problem. Experimental results conducted on a real-world data set show the potential of using such information. The proposed methodology can have a strong impact in mobility demand forecasting and in transportation applications.


2018 - Predicting the usefulness of Amazon reviews using off-the-shelf argumentation mining [Relazione in Atti di Convegno]
Passon, M.; Lippi, M.; Serra, G.; Tasso, C.
abstract

Internet users generate content at unprecedented rates. Building intelligent systems capable of discriminating useful content within this ocean of information is thus becoming an urgent need. In this paper, we aim to predict the usefulness of Amazon reviews, and to do this we exploit features coming from an off-the-shelf argumentation mining system. We argue that the usefulness of a review is, in fact, strictly related to its argumentative content, whereas the use of an already trained system avoids the costly need to relabel a novel dataset. Results obtained on a large publicly available corpus support this hypothesis.


2018 - Towards Consumer-Empowering Artificial Intelligence [Relazione in Atti di Convegno]
Contissa, Giuseppe; Lagioia, Francesca; Lippi, Marco; Micklitz, Hans-Wolfgang; Palka, Przemyslaw; Sartor, Giovanni; Torroni, Paolo
abstract

Artificial Intelligence and Law is undergoing a critical transformation. Traditionally focused on the development of expert systems and on a scholarly effort to develop theories and methods for knowledge representation and reasoning in the legal domain, this discipline is now adapting to a sudden change of scenery. No longer confined to the walls of academia, it has welcomed new actors, such as businesses and companies, who are willing to play a major role and seize new opportunities offered by the same transformational impact that recent AI breakthroughs are having on many other areas. As it happens, commercial interests create new opportunities but they also represent a potential threat to consumers, as the balance of power seems increasingly determined by the availability of data. We believe that while this transformation is still in progress, time is ripe for the next frontier of this field of study, where a new shift of balance may be enabled by tools and services that can be of service not only to businesses but also to consumers and, more generally, the civil society. We call that frontier consumer-empowering AI.


2017 - Argumentation in social media [Articolo su rivista]
Gurevych, Iryna; Lippi, Marco; Torroni, Paolo
abstract

No abstract available.


2017 - Automated detection of unfair clauses in online consumer contracts [Relazione in Atti di Convegno]
Lippi, M.; Palka, P.; Contissa, G.; Lagioia, F.; Micklitz, H. -W.; Panagis, Y.; Sartor, G.; Torroni, P.
abstract

Consumer contracts too often present clauses that are potentially unfair to the subscriber. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses in online contracts. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike.


2017 - Coordinating Distributed Speaking Objects [Relazione in Atti di Convegno]
Lippi, Marco; Mamei, Marco; Mariani, Stefano; Zambonelli, Franco
abstract

In this paper we sketch a vision of future environments densely populated by smart sensors and actuators (possibly embedded in everyday objects) that, rather than simply producing streams of data, are capable of understanding and reporting, via factual assertions and arguments, about what is happening (for sensors) and about what they can possibly make happen (for actuators). These 'speaking objects' form the nodes of a dense distributed computing infrastructure that can be exploited to monitor and control activities in our everyday environment. However, the nature of speaking objects will dramatically change the approaches to implementing and coordinating the activities of distributed processes. In fact, distributed coordination is likely to become associated with the capability of arguing about situations and about the current 'state of affairs', with the aim of triggering and directing proper distributed 'conversations' to collectively reach a future desirable state. Accordingly, we discuss how such a novel vision can build upon some readily available technologies, and the research challenges that it poses. Two case studies are used as exemplary scenarios.


2017 - Driving behaviour clustering for realistic traffic micro-simulators [Relazione in Atti di Convegno]
Petraro, Alessandro; Caselli, Federico; Milano, Michela; Lippi, Marco
abstract

Traffic simulators are effective tools to support decisions in urban planning systems, to identify criticalities, to observe emerging behaviours in road networks and to configure road infrastructures, such as road side units and traffic lights. Clearly, the more realistic the simulator, the more precise the insight provided to decision makers. This paper provides a first step toward the design and calibration of a traffic micro-simulator that produces realistic behaviour. The long-term idea is to collect and analyse real traffic traces containing vehicular information, to cluster them in groups representing similar driving behaviours, and then to extract from these clusters relevant parameters to tune the micro-simulator. In this paper we have run controlled experiments where traffic traces have been synthesized to obtain different driving styles, so that the effectiveness of the clustering algorithm could be checked on known labels. We describe the overall methodology and the results already achieved on the controlled experiment, showing the clusters obtained and reporting guidelines for future experiments.


2017 - Reasoning with deep learning: An open challenge [Relazione in Atti di Convegno]
Lippi, Marco
abstract

Building machines capable of performing automated reasoning is one of the most complex but fascinating challenges in AI. In particular, providing an effective integration of learning and reasoning mechanisms is a long-standing research problem at the intersection of many different areas, such as machine learning, cognitive neuroscience, psychology, linguistics, and logic. The recent breakthrough achieved by deep learning methods in a variety of AI-related domains has opened novel research lines attempting to solve this complex and challenging task.


2016 - Argument mining from speech: Detecting claims in political debates [Relazione in Atti di Convegno]
Lippi, M.; Torroni, P.
abstract

The automatic extraction of arguments from text, also known as argument mining, has recently become a hot topic in artificial intelligence. Current research has only focused on linguistic analysis. However, in many domains where communication may be also vocal or visual, paralinguistic features too may contribute to the transmission of the message that arguments intend to convey. For example, in political debates a crucial role is played by speech. The research question we address in this work is whether in such domains one can improve claim detection for argument mining, by employing features from text and speech in combination. To explore this hypothesis, we develop a machine learning classifier and train it on an original dataset based on the 2015 UK political elections debate.


2016 - Argumentation mining: State of the art and emerging trends [Articolo su rivista]
Lippi, Marco; Torroni, Paolo
abstract

Argumentation mining aims at automatically extracting structured arguments from unstructured textual documents. It has recently become a hot topic also due to its potential in processing information originating from the Web, and in particular from social media, in innovative ways. Recent advances in machine learning methods promise to enable breakthrough applications to social and economic sciences, policy making, and information technology: something that only a few years ago was unthinkable. In this survey article, we introduce argumentation models and methods, review existing systems and applications, and discuss challenges and perspectives of this exciting new research area.


2016 - Constraint detection in natural language problem descriptions [Relazione in Atti di Convegno]
Kiziltan, Z.; Lippi, M.; Torroni, P.
abstract

Modeling in constraint programming is a hard task that requires considerable expertise. Automated model reformulation aims at assisting a naive user in modeling constraint problems. In this context, formal specification languages have been devised to express constraint problems in a manner similar to natural yet rigorous specifications that use a mixture of natural language and discrete mathematics. Yet, a gap remains between such languages and the natural language in which humans informally describe problems. This work aims to alleviate this issue by proposing a method for detecting constraints in natural language problem descriptions using a structured-output classifier. To evaluate the method, we develop an original annotated corpus which gathers 110 problem descriptions from several resources. Our results show significant accuracy with respect to metrics used in cognate tasks.


2016 - MARGOT: A web server for argumentation mining [Articolo su rivista]
Lippi, Marco; Torroni, Paolo
abstract

Argumentation mining is a recent challenge concerning the automatic extraction of arguments from unstructured textual corpora. Argumentation mining technologies are rapidly evolving and show a clear potential for application in diverse areas such as recommender systems, policy-making and the legal domain. There is a long-recognised need for tools that enable users to browse, visualise, search, and manipulate arguments and argument structures. There is, however, a lack of widely accessible tools. In this article we describe the technology behind MARGOT, the first online argumentation mining system designed to reach out to the wider community of potential users of these new technologies. We evaluate its performance and discuss its possible application in the analysis of content from various domains.


2016 - Optimally solving permutation sorting problems with efficient partial expansion bidirectional heuristic search [Articolo su rivista]
Lippi, Marco; Ernandes, Marco; Felner, Ariel
abstract

In this paper we consider several variants of the problem of sorting integer permutations with a minimum number of moves, a task with many potential applications ranging from computational biology to logistics. Each problem is formulated as a heuristic search problem, where different variants induce different sets of allowed moves within the search tree. Due to the intrinsic nature of this category of problems, which in many cases present a very large branching factor, classic unidirectional heuristic search algorithms such as A∗ and IDA∗ quickly become inefficient or even infeasible as the problem dimension grows. Therefore, more sophisticated algorithms are needed. To this aim, we propose to combine two recent paradigms which have been employed in difficult heuristic search problems showing good performance: enhanced partial expansion (EPE) and efficient single-frontier bidirectional search (eSBS). We propose a new class of algorithms combining the benefits of EPE and eSBS, named efficient Single-frontier Bidirectional Search with Enhanced Partial Expansion (eSBS-EPE). We then present an experimental evaluation that shows that eSBS-EPE is a very effective approach for this family of problems, often outperforming previous methods on large-size instances. With the new eSBS-EPE class of methods we were able to push the limit and solve the largest size instances of some of the problem domains (the pancake and the burnt pancake puzzles). This novel search paradigm hence provides a very promising framework also for other domains.
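
For readers unfamiliar with the pancake puzzle mentioned above, the sketch below finds the minimum number of prefix reversals needed to sort a permutation. It deliberately uses plain breadth-first search rather than eSBS-EPE or any heuristic method from the paper, so it is only practical for very small instances; the function name `min_flips` is hypothetical.

```python
from collections import deque


def min_flips(perm):
    """Minimum number of prefix reversals that sort `perm`, found by
    uninformed breadth-first search (not the paper's algorithm)."""
    start = tuple(perm)
    goal = tuple(sorted(perm))
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        # Each state has len(state) - 1 successors: reverse the first k items.
        for k in range(2, len(state) + 1):
            successor = state[:k][::-1] + state[k:]
            if successor == goal:
                return depth + 1
            if successor not in seen:
                seen.add(successor)
                queue.append((successor, depth + 1))
    return None  # not reached: every permutation can be sorted by flips
```

Since a permutation of n items has n - 1 successors, the frontier grows quickly with n, which illustrates why the large branching factor noted in the abstract calls for more sophisticated search schemes.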


2016 - Semantic video labeling by developmental visual agents [Articolo su rivista]
Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano
abstract

In recent years, computer vision has been undergoing a period of great development, as attested by the many successful applications that are currently available in a variety of industrial products. Yet, when we come to the most challenging and foundational problem of building autonomous agents capable of performing scene understanding in unrestricted videos, there is still a lot to be done. In this paper we focus on semantic labeling of video streams, in which a set of semantic classes must be predicted for each pixel of the video. We propose to attack the problem from bottom to top, by introducing Developmental Visual Agents (DVAs) as general purpose visual systems that can progressively acquire visual skills from video data and experience, by continuously interacting with the environment and following lifelong learning principles. DVAs gradually develop a hierarchy of architectural stages, from unsupervised feature extraction to the symbolic level, where supervisions are provided by external users, pixel-wise. Unlike classic machine learning algorithms applied to computer vision, which typically employ huge datasets of fully labeled images to perform recognition tasks, DVAs can exploit even a few supervisions per semantic category, by enforcing coherence constraints based on motion estimation. Experiments on different vision tasks, performed on a variety of heterogeneous visual worlds, confirm the great potential of the proposed approach.


2016 - Statistical relational learning for game theory [Articolo su rivista]
Lippi, Marco
abstract

In this paper we motivate the use of models and algorithms from the area of Statistical Relational Learning (SRL) as a framework for the description and the analysis of games. SRL combines the powerful formalism of first-order logic with the capability of probabilistic graphical models in handling uncertainty in data and representing dependencies between random variables: for this reason, SRL models can be effectively used to represent several categories of games, including games with partial information, graphical games and stochastic games. Inference algorithms can be used to approach the opponent modeling problem, as well as to find Nash equilibria or Pareto optimal solutions. Structure learning algorithms can be applied, in order to automatically extract probabilistic logic clauses describing the strategies of an opponent with a high-level, human-interpretable formalism. Experiments conducted using Markov logic networks, one of the most used SRL frameworks, show the potential of the approach.


2015 - Argument mining: A machine learning perspective [Relazione in Atti di Convegno]
Lippi, Marco; Torroni, Paolo
abstract

Argument mining has recently become a hot topic, attracting the interest of several diverse research communities, ranging from artificial intelligence to computational linguistics, natural language processing, and the social and philosophical sciences. In this paper, we attempt to describe the problems and challenges of argument mining from a machine learning angle. In particular, we argue that machine learning techniques have so far been under-exploited, and that a more proper standardization of the problem, also with regard to the underlying argument model, could provide a crucial element to develop better systems.


2015 - Context-independent claim detection for argument mining [Relazione in Atti di Convegno]
Lippi, Marco; Torroni, Paolo
abstract

Argumentation mining aims to automatically identify structured argument data from unstructured natural language text. This challenging, multifaceted task has recently been gaining growing attention, especially due to its many potential applications. One particularly important aspect of argumentation mining is claim identification. Most of the current approaches are engineered to address specific domains. However, argumentative sentences are often characterized by common rhetorical structures, independently of the domain. We thus propose a method that exploits structured parsing information to detect claims without resorting to contextual information, and yet achieves a performance comparable to that of state-of-the-art methods that heavily rely on the context.


2015 - En plein air visual agents [Relazione in Atti di Convegno]
Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano; Pelillo, Marcello
abstract

Nowadays, machine learning is playing a dominant role in most challenging computer vision problems. This paper advocates an extreme evolution of this interplay, where visual agents continuously process videos and interact with humans, just like children, exploiting lifelong learning computational schemes. This opens the challenge of en plein air visual agents, whose behavior is progressively monitored and evaluated by novel mechanisms, where dynamic man-machine interaction plays a fundamental role. Going beyond classic benchmarks, we argue that appropriate crowd-sourcing schemes are suitable for performance evaluation of visual agents operating in this framework. We provide a proof of concept of this novel view, by showing methods and concrete solutions for en plein air visual agents. A crowd-sourcing evaluation is reported, along with a lifelong experiment on “The Aristocats” cartoon. We expect that the proposed radically new framework will stimulate related approaches and solutions.


2014 - Markov logic networks for optical chemical structure recognition [Articolo su rivista]
Frasconi, Paolo; Gabbrielli, Francesco; Lippi, Marco; Marinai, Simone
abstract

Optical chemical structure recognition is the problem of converting a bitmap image containing a chemical structure formula into a standard structured representation of the molecule. We introduce a novel approach to this problem based on the pipelined integration of pattern recognition techniques with probabilistic knowledge representation and reasoning. Basic entities and relations (such as textual elements, points, lines, etc.) are first extracted by a low-level processing module. A probabilistic reasoning engine based on Markov logic, embodying chemical and graphical knowledge, is subsequently used to refine these pieces of information. An annotated connection table of atoms and bonds is finally assembled and converted into a standard chemical exchange format. We report a successful evaluation on two large image data sets, showing that the method compares favorably with the current state-of-the-art, especially on degraded low-resolution images. The system is available as a web server at http://mlocsr.dinfo.unifi.it. © 2014 American Chemical Society.


2014 - On-line video motion estimation by invariant receptive inputs [Relazione in Atti di Convegno]
Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano
abstract

In this paper, we address the problem of estimating the optical flow in long-term video sequences. We devise a computational scheme that exploits the idea of receptive fields, in which the pixel flow depends not only on the brightness level of the pixel itself, but also on neighborhood-related information. Our approach relies on the definition of receptive units that are invariant to affine transformations of the input data. This distinguishing characteristic allows us to build a video-receptive-inputs database with arbitrary detail level, that can be used to match local features and to determine their motion. We propose a parallel computational scheme, well suited to today's parallel architectures, to exploit motion information and invariant features from real-time video streams, for deep feature extraction, object detection, tracking, and other applications.


2013 - Balancing recall and precision in stock market predictors using support vector machines [Conference proceedings paper]
Lippi, Marco; Menconi, Lorenzo; Gori, Marco
abstract

Computational finance is one of the fields where machine learning and data mining have found wide application in recent years. Nevertheless, there are still many open issues regarding the predictability of the stock market, and the possibility of building an automatic intelligent trader able to make forecasts on stock prices and to develop a profitable trading strategy. In this paper, we propose an automatic trading strategy based on support vector machines, which employs recall-precision curves in order to allow a buying action for the trader only when the confidence of the prediction is high. We present an extensive experimental evaluation which compares our trader with several classic competitors. © Springer-Verlag Berlin Heidelberg 2013.
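
The idea of acting only on high-confidence predictions can be illustrated with a small sketch. It is not the trading strategy of the paper: the function name `pick_threshold`, the toy scores, and the rule "lowest threshold that still meets a target precision" are all assumptions made for illustration. In practice the scores would be classifier outputs (for example SVM decision values) on held-out data.

```python
def pick_threshold(scores, labels, min_precision):
    """Return the lowest score threshold whose precision on (scores, labels)
    is at least min_precision, or None if no threshold qualifies.

    labels are 1 for a correct "buy" signal and 0 otherwise (hypothetical data).
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        # Labels of all examples the trader would act on at threshold t.
        predicted = [y for s, y in zip(scores, labels) if s >= t]
        precision = sum(predicted) / len(predicted)
        if precision >= min_precision:
            best = t  # remember the lowest qualifying threshold seen so far
    return best


# Made-up validation scores and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1, 0]
print(pick_threshold(scores, labels, 0.75))  # 0.6
```

Choosing the lowest qualifying threshold keeps recall as high as possible while respecting the precision target; a stricter target (say 0.9) pushes the threshold up and makes the trader act less often.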


2013 - Information-based learning of deep architectures for feature extraction [Conference proceedings paper]
Melacci, Stefano; Lippi, Marco; Gori, Marco; Maggini, Marco
abstract

Feature extraction is a crucial phase in complex computer vision systems. Two main approaches have been proposed so far. A quite common solution is the design of appropriate filters and features based on image processing techniques, such as the SIFT descriptors. On the other hand, machine learning techniques can be applied, relying on their capabilities to automatically develop optimal processing schemes from a significant set of training examples. Recently, deep neural networks and convolutional neural networks have been shown to yield promising results in many computer vision tasks, such as object detection and recognition. This paper introduces a new deep architecture model for computer vision, designed for the hierarchical extraction of pixel-based features that naturally embed scale and rotation invariances. Hence, the proposed feature extraction process combines the two mentioned approaches, by merging design criteria derived from image processing tools with a learning algorithm able to extract structured feature representations from data. In particular, the learning algorithm is based on information-theoretic principles and is able to develop invariant features from unsupervised examples. Preliminary experimental results on image classification support this new challenging research direction, when compared with other deep architecture models. © 2013 Springer-Verlag.


2013 - On-line Laplacian one-class support vector machines [Conference proceedings paper]
Frandina, Salvatore; Lippi, Marco; Maggini, Marco; Melacci, Stefano
abstract

We propose a manifold regularization algorithm designed to work in an on-line scenario where data arrive continuously over time and it is not feasible to completely store the data stream for training the classifier in batch mode. The On-line Laplacian One-Class SVM (OLapOCSVM) algorithm exploits both positively labeled and totally unlabeled examples, updating the classifier hypothesis as new data become available. The learning procedure is based on conjugate gradient descent in the primal formulation of the SVM. The on-line algorithm uses an efficient buffering technique to deal with the continuous incoming data. In particular, we define a buffering policy that is based on the current estimate of the support of the input data distribution. The experimental results on real-world data show that OLapOCSVM compares favorably with the corresponding batch algorithms, while remaining applicable in generic on-line scenarios with limited memory requirements. © 2013 Springer-Verlag Berlin Heidelberg.


2013 - Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning [Journal article]
Lippi, Marco; Bertini, Matteo; Frasconi, Paolo
abstract

The literature on short-term traffic flow forecasting has undergone great development recently. Many works, describing a wide variety of different approaches, which very often share similar features and ideas, have been published. However, publications presenting new prediction algorithms usually employ different settings, data sets, and performance measurements, making it difficult to infer a clear picture of the advantages and limitations of each model. The aim of this paper is twofold. First, we review existing approaches to short-term traffic flow forecasting methods under the common view of probabilistic graphical models, presenting an extensive experimental comparison, which proposes a common baseline for their performance analysis and provides the infrastructure to operate on a publicly available data set. Second, we present two new support vector regression models, which are specifically devised to benefit from typical traffic flow seasonality and are shown to represent an interesting compromise between prediction accuracy and computational efficiency. The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods. © 2011 IEEE.


2013 - Type Extension Trees for feature construction and learning in relational domains [Journal article]
Jaeger, Manfred; Lippi, Marco; Passerini, Andrea; Frasconi, Paolo
abstract

Type Extension Trees are a powerful representation language for "count-of-count" features characterizing the combinatorial structure of neighborhoods of entities in relational domains. In this paper we present a learning algorithm for Type Extension Trees (TET) that discovers informative count-of-count features in the supervised learning setting. Experiments on bibliographic data show that TET-learning is able to discover the count-of-count feature underlying the definition of the h-index, and the inverse document frequency feature commonly used in information retrieval. We also introduce a metric on TET feature values. This metric is defined as a recursive application of the Wasserstein-Kantorovich metric. Experiments with a k-NN classifier show that exploiting the recursive count-of-count statistics encoded in TET values improves classification accuracy over alternative methods based on simple count statistics. © 2013 Elsevier B.V.
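
To see why the h-index is a "count-of-count" feature, note that it is the largest h such that the count of papers whose citation count is at least h is itself at least h. The snippet below is only a plain illustration of that definition; it is not the TET learning algorithm, and the function name `h_index` and the sample data are assumptions.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; rank r qualifies if the r-th
    # paper still has at least r citations.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```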


2013 - Variational foundations of online backpropagation [Conference proceedings paper]
Frandina, Salvatore; Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano
abstract

On-line Backpropagation has become very popular and it has been the subject of in-depth theoretical analyses and massive experimentation. Yet, almost three decades after its publication, it is still surprisingly the source of tough theoretical questions and of experimental results that are somewhat shrouded in mystery. Although seriously plagued by local minima, the batch-mode version of the algorithm is clearly posed as an optimization problem, whereas the on-line version, in spite of its effectiveness in many real-world problems, has not yet been given a clean formulation. Using variational arguments, in this paper, the on-line formulation is proposed as the minimization of a classic functional that is inspired by the principle of minimal action in analytic mechanics. The proposed approach clashes sharply with common interpretations of on-line learning as an approximation of batch-mode, and it suggests that processing data all at once might be just an artificial formulation of learning that is hopeless in difficult real-world problems. © 2013 Springer-Verlag Berlin Heidelberg.


2012 - Efficient single frontier bidirectional search [Conference proceedings paper]
Lippi, Marco; Ernandes, Marco; Felner, Ariel
abstract

The Single Frontier Bi-Directional Search (SBS) framework was recently introduced. A node in SBS corresponds to a pair of states, one from each of the frontiers, and SBS uses front-to-front heuristics. In this paper we present an enhanced version of SBS, called eSBS, where pruning and caching techniques are applied, which significantly reduce both the time and memory needs of SBS. We then present a hybrid of eSBS and IDA* which potentially uses only the square root of the memory required by A* but makes it possible to prune many nodes that IDA* would generate. Experimental results show the benefit of our new approaches on a number of domains. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.


2012 - Information theoretic learning for pixel-based visual agents [Conference proceedings paper]
Gori, Marco; Melacci, Stefano; Lippi, Marco; Maggini, Marco
abstract

In this paper we promote the idea of using pixel-based models not only for low level vision, but also to extract high level symbolic representations. We use a deep architecture which has the distinctive property of relying on computational units that incorporate classic computer vision invariances, especially scale invariance. The proposed learning algorithm, which is based on information theory principles, develops the parameters of the computational units and, at the same time, makes it possible to detect the optimal scale for each pixel. We give experimental evidence of the mechanism of feature extraction at the first level of the hierarchy, which is closely related to SIFT-like features. The comparison shows clearly that, whenever we can rely on the massive availability of training data, the proposed model leads to better performance than SIFT. © 2012 Springer-Verlag.


2012 - Metal binding in proteins: Machine learning complements X-ray absorption spectroscopy [Conference proceedings paper]
Lippi, Marco; Passerini, Andrea; Punta, Marco; Frasconi, Paolo
abstract

We present an application of machine learning algorithms for the identification of metalloproteins and metal binding sites on a genome scale. An extensive evaluation conducted in combination with X-ray absorption spectroscopy shows the great potential of the approach. © 2012 Springer-Verlag.


2012 - Predicting metal-binding sites from protein sequence [Journal article]
Passerini, Andrea; Lippi, Marco; Frasconi, Paolo
abstract

Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous number of alternative candidate configurations. Previous research has only considered this prediction problem starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available. The sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from the ones used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds. © 2011 IEEE.
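
The reason a matroid structure matters is that a simple greedy scan then finds a maximum-weight independent set. The sketch below shows the generic greedy scheme on a deliberately simplified toy constraint (each residue appears in at most one bond, a partition matroid). It is not the paper's construction or scoring function: the names `greedy_max_weight` and `one_ion_per_residue`, the residue and ion labels, and the scores are all hypothetical.

```python
def greedy_max_weight(candidates, weight, is_independent):
    """Generic matroid greedy: scan candidates by decreasing weight and keep
    each one that leaves the selected set independent."""
    selected = []
    for item in sorted(candidates, key=weight, reverse=True):
        if weight(item) <= 0:
            break  # non-positive weights cannot improve the total
        if is_independent(selected + [item]):
            selected.append(item)
    return selected


def one_ion_per_residue(bonds):
    """Toy independence test (a partition matroid): no residue appears twice."""
    residues = [residue for residue, _ion in bonds]
    return len(residues) == len(set(residues))


# Hypothetical scores for (residue, ion) bonds; names and numbers are made up.
scores = {
    ("C12", "ion1"): 0.9,
    ("C12", "ion2"): 0.4,
    ("H45", "ion1"): 0.7,
    ("H45", "ion2"): 0.8,
    ("C80", "ion2"): 0.3,
}
best = greedy_max_weight(scores, scores.get, one_ion_per_residue)
print(best)  # [('C12', 'ion1'), ('H45', 'ion2'), ('C80', 'ion2')]
```

Passing the independence test as a function keeps the greedy loop generic, so the same loop would work with any other matroid oracle.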


2011 - Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy [Journal article]
Shi, Wuxian; Punta, Marco; Bohon, Jen; Sauder, J. Michael; D'Mello, Rhijuta; Sullivan, Mike; Toomey, John; Abel, Don; Lippi, Marco; Passerini, Andrea; Frasconi, Paolo; Burley, Stephen K.; Rost, Burkhard; Chance, Mark R.
abstract

High-throughput X-ray absorption spectroscopy was used to measure transition metal content based on quantitative detection of X-ray fluorescence signals for 3879 purified proteins from several hundred different protein families generated by the New York SGX Research Center for Structural Genomics. Approximately 9% of the proteins analyzed showed the presence of transition metal atoms (Zn, Cu, Ni, Co, Fe, or Mn) in stoichiometric amounts. The method is highly automated and highly reliable based on comparison of the results to crystal structure data derived from the same protein set. To leverage the experimental metalloprotein annotations, we used a sequence-based de novo prediction method, MetalDetector, to identify Cys and His residues that bind to transition metals for the redundancy-reduced subset of 2411 sequences sharing <70% sequence identity and having at least one His or Cys. As the HT-XAS identifies metal type and protein binding, while the bioinformatics analysis identifies metal-binding residues, the results were combined to identify putative metal-binding sites in the proteins and their associated families. We explored the combination of this data with homology models to generate detailed structure models of metal-binding sites for representative proteins. Finally, we used extended X-ray absorption fine structure data from two of the purified Zn metalloproteins to validate predicted metalloprotein binding site structures. This combination of experimental and bioinformatics approaches provides comprehensive active site analysis on the genome scale for metalloproteins as a class, revealing new insights into metalloprotein structure and function. © 2011 by Cold Spring Harbor Laboratory Press.


2011 - MetalDetector v2.0: Predicting the geometry of metal binding sites from protein sequence [Journal article]
Passerini, Andrea; Lippi, Marco; Frasconi, Paolo
abstract

MetalDetector identifies CYS and HIS involved in transition metal protein binding sites, starting from sequence alone. A major new feature of release 2.0 is the ability to predict which residues are jointly involved in the coordination of the same metal ion. The server is available at http://metaldetector.dsi.unifi.it/v2.0/. © 2011 The Author(s).


2011 - Relational information gain [Journal article]
Lippi, Marco; Jaeger, Manfred; Frasconi, Paolo; Passerini, Andrea
abstract

We introduce relational information gain, a refinement scoring function measuring the informativeness of newly introduced variables. The gain can be interpreted as a conditional entropy in a well-defined sense and can be efficiently approximated. In conjunction with simple greedy general-to-specific search algorithms such as FOIL, it yields an efficient and competitive algorithm in terms of predictive accuracy and compactness of the learned theory. In conjunction with the decision tree learner TILDE, it offers a beneficial alternative to lookahead, achieving similar performance while significantly reducing the number of evaluated literals. © The Author(s) 2010.


2010 - Collective traffic forecasting [Conference proceedings paper]
Lippi, Marco; Bertini, Matteo; Frasconi, Paolo
abstract

Traffic forecasting has recently become a crucial task in the area of intelligent transportation systems, and in particular in the development of traffic management and control. We focus on the simultaneous prediction of the congestion state at multiple lead times and at multiple nodes of a transport network, given historical and recent information. This is a highly relational task along the spatial and the temporal dimensions and we advocate the application of statistical relational learning techniques. We formulate the task in the supervised learning from interpretations setting and use Markov logic networks with grounding-specific weights to perform collective classification. Experimental results on data obtained from the California Freeway Performance Measurement System (PeMS) show the advantages of the proposed solution, with respect to propositional classifiers. In particular, we obtained significant performance improvement at larger time leads. © 2010 Springer-Verlag Berlin Heidelberg.


2009 - Prediction of protein β-residue contacts by Markov logic networks with grounding-specific weights [Journal article]
Lippi, Marco; Frasconi, Paolo
abstract

Motivation: Accurate prediction of contacts between β-strand residues can significantly contribute towards ab initio prediction of the 3D structure of many proteins. Contacts in the same protein are highly interdependent. Therefore, significant improvements can be expected by applying statistical relational learners that overcome the usual machine learning assumption that examples are independent and identically distributed. Furthermore, the dependencies among β-residue contacts are subject to strong regularities, many of which are known a priori. In this article, we take advantage of Markov logic, a statistical relational learning framework that is able to capture dependencies between contacts, and constrain the solution according to domain knowledge expressed by means of weighted rules in a logical language. Results: We introduce a novel hybrid architecture based on neural and Markov logic networks with grounding-specific weights. On a non-redundant dataset, our method achieves 44.9% F1 measure, with 47.3% precision and 42.7% recall, which is significantly better (P < 0.01) than previously reported performance obtained by 2D recursive neural networks. Our approach also significantly improves the number of chains for which β-strands are nearly perfectly paired (36% of the chains are predicted with F1 ≥ 70% on coarse map). It also outperforms more general contact predictors on recent CASP 2008 targets. © The Author 2009. Published by Oxford University Press. All rights reserved.
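
As a quick arithmetic check of the figures above, the F1 measure is the harmonic mean of precision and recall. The snippet is only an illustrative sketch; the function name `f1_score` is an assumption and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(47.3, 42.7)
print(round(f1, 1))  # 44.9
```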


2008 - A semiparametric generative model for efficient structured-output supervised learning [Journal article]
Costa, Fabrizio; Passerini, Andrea; Lippi, Marco; Frasconi, Paolo
abstract

We present a semiparametric generative model for supervised learning with structured outputs. The main algorithmic idea is to replace the parameters of an underlying generative model (such as a stochastic grammar) with input-dependent predictions obtained by (kernel) logistic regression. This method avoids the computational burden associated with the comparison between target and predicted structure during the training phase, but requires as an additional input a vector of sufficient statistics for each training example. The resulting training algorithm is asymptotically more efficient than structured output SVM as the size of the output structure grows. At the same time, by computing parameters of a joint distribution as a function of the full input structure, typical expressiveness limitations of related conditional models (such as maximum entropy Markov models) can be potentially avoided. Empirical results on artificial and real data (in the domains of natural language parsing and RNA secondary structure prediction) show that the method works well in practice and scales up with the size of the output structures. © Springer Science+Business Media B.V. 2009.


2008 - MetalDetector: A web server for predicting metal-binding sites and disulfide bridges in proteins from sequence [Journal article]
Lippi, Marco; Passerini, Andrea; Punta, Marco; Rost, Burkhard; Frasconi, Paolo
abstract

The web server MetalDetector classifies histidine residues in proteins into one of two states (free or metal bound) and cysteines into one of three states (free, metal bound or disulfide bridged). A decision tree integrates predictions from two previously developed methods (DISULFIND and Metal Ligand Predictor). Cross-validated performance assessment indicates that our server predicts disulfide bonding state at 88.6% precision and 85.1% recall, while it identifies cysteines and histidines in transition metal-binding sites at 79.9% precision and 76.8% recall, and at 60.8% precision and 40.7% recall, respectively. © The Author 2008. Published by Oxford University Press. All rights reserved.