Personal page of MARCO LIPPI

Department of Sciences and Methods for Engineering

Lippi, Marco (2017) - Reasoning with deep learning: An open challenge (2016 AI*IA Workshop on Deep Understanding and Reasoning: A Challenge for Next-Generation Intelligent Agents, URANIA 2016 - Genova - 2016) (CEUR Workshop Proceedings) (CEUR-WS) - CEUR WORKSHOP PROCEEDINGS - vol. 1802 - pp. 38-43 ISSN: 1613-0073 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Building machines capable of performing automated reasoning is one of the most complex but fascinating challenges in AI. In particular, providing an effective integration of learning and reasoning mechanisms is a long-standing research problem at the intersection of many different areas, such as machine learning, cognitive neuroscience, psychology, linguistics, and logic. The recent breakthrough achieved by deep learning methods in a variety of AI-related domains has opened novel research lines attempting to solve this complex and challenging task.

Lippi, Marco; Torroni, Paolo (2016) - Argumentation mining: State of the art and emerging trends - ACM TRANSACTIONS ON INTERNET TECHNOLOGY - vol. 16 - pp. 1-25 ISSN: 1533-5399 [Journal article (262) - Journal article]
Abstract

Argumentation mining aims at automatically extracting structured arguments from unstructured textual documents. It has recently become a hot topic also due to its potential in processing information originating from the Web, and in particular from social media, in innovative ways. Recent advances in machine learning methods promise to enable breakthrough applications to social and economic sciences, policy making, and information technology: something that only a few years ago was unthinkable. In this survey article, we introduce argumentation models and methods, review existing systems and applications, and discuss challenges and perspectives of this exciting new research area.

Lippi, Marco; Torroni, Paolo (2016) - MARGOT: A web server for argumentation mining - EXPERT SYSTEMS WITH APPLICATIONS - vol. 65 - pp. 292-303 ISSN: 0957-4174 [Journal article (262) - Journal article]
Abstract

Argumentation mining is a recent challenge concerning the automatic extraction of arguments from unstructured textual corpora. Argumentation mining technologies are rapidly evolving and show a clear potential for application in diverse areas such as recommender systems, policy-making and the legal domain. There is a long-recognised need for tools that enable users to browse, visualise, search, and manipulate arguments and argument structures. There is, however, a lack of widely accessible tools. In this article we describe the technology behind MARGOT, the first online argumentation mining system designed to reach out to the wider community of potential users of these new technologies. We evaluate its performance and discuss its possible application in the analysis of content from various domains.

Lippi, Marco; Ernandes, Marco; Felner, Ariel (2016) - Optimally solving permutation sorting problems with efficient partial expansion bidirectional heuristic search - AI COMMUNICATIONS - vol. 29 - pp. 513-536 ISSN: 0921-7126 [Journal article (262) - Journal article]
Abstract

In this paper we consider several variants of the problem of sorting integer permutations with a minimum number of moves, a task with many potential applications ranging from computational biology to logistics. Each problem is formulated as a heuristic search problem, where different variants induce different sets of allowed moves within the search tree. Due to the intrinsic nature of this category of problems, which in many cases present a very large branching factor, classic unidirectional heuristic search algorithms such as A* and IDA* quickly become inefficient or even infeasible as the problem dimension grows. Therefore, more sophisticated algorithms are needed. To this end, we propose to combine two recent paradigms which have been employed in difficult heuristic search problems showing good performance: enhanced partial expansion (EPE) and efficient single-frontier bidirectional search (eSBS). We propose a new class of algorithms combining the benefits of EPE and eSBS, named efficient Single-frontier Bidirectional Search with Enhanced Partial Expansion (eSBS-EPE). We then present an experimental evaluation that shows that eSBS-EPE is a very effective approach for this family of problems, often outperforming previous methods on large-size instances. With the new eSBS-EPE class of methods we were able to push the limit and solve the largest size instances of some of the problem domains (the pancake and the burnt pancake puzzles). This novel search paradigm hence provides a very promising framework also for other domains.
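
As a rough illustration of why the branching factor grows with the problem dimension, the sketch below enumerates the moves of the two puzzles named in the abstract, using the standard prefix-reversal definition of a pancake flip. The function names are hypothetical, and this is only a successor generator, not the eSBS-EPE algorithm itself.

```python
def pancake_successors(perm):
    """Yield every permutation reachable with one prefix reversal (flip).

    `perm` is a tuple; flipping a prefix of length 1 changes nothing,
    so a stack of n pancakes has n - 1 distinct moves.
    """
    n = len(perm)
    for k in range(2, n + 1):
        yield perm[:k][::-1] + perm[k:]


def burnt_pancake_successors(perm):
    """Signed variant: a flip reverses the prefix and negates its elements.

    Here a flip of length 1 does change the state (it turns one pancake
    over), so a stack of n pancakes has n distinct moves.
    """
    n = len(perm)
    for k in range(1, n + 1):
        yield tuple(-x for x in reversed(perm[:k])) + perm[k:]
```

For example, `list(pancake_successors((3, 1, 2)))` contains the two states obtained by flipping the top two and the top three pancakes.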

Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano (2016) - Semantic video labeling by developmental visual agents - COMPUTER VISION AND IMAGE UNDERSTANDING - vol. 146 - pp. 9-26 ISSN: 1077-3142 [Journal article (262) - Journal article]
Abstract

In recent years, computer vision has been undergoing a period of great development, attested by the many successful applications that are currently available in a variety of industrial products. Yet, when we come to the most challenging and foundational problem of building autonomous agents capable of performing scene understanding in unrestricted videos, there is still a lot to be done. In this paper we focus on semantic labeling of video streams, in which a set of semantic classes must be predicted for each pixel of the video. We propose to attack the problem from bottom to top, by introducing Developmental Visual Agents (DVAs) as general purpose visual systems that can progressively acquire visual skills from video data and experience, by continuously interacting with the environment and following lifelong learning principles. DVAs gradually develop a hierarchy of architectural stages, from unsupervised feature extraction to the symbolic level, where supervisions are provided by external users, pixel-wise. Unlike classic machine learning algorithms applied to computer vision, which typically employ huge datasets of fully labeled images to perform recognition tasks, DVAs can exploit even a few supervisions per semantic category, by enforcing coherence constraints based on motion estimation. Experiments on different vision tasks, performed on a variety of heterogeneous visual worlds, confirm the great potential of the proposed approach.

Lippi, Marco (2016) - Statistical relational learning for game theory - IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES - vol. 8 - pp. 412-425 ISSN: 1943-068X [Journal article (262) - Journal article]
Abstract

In this paper we motivate the use of models and algorithms from the area of Statistical Relational Learning (SRL) as a framework for the description and the analysis of games. SRL combines the powerful formalism of first-order logic with the capability of probabilistic graphical models in handling uncertainty in data and representing dependencies between random variables: for this reason, SRL models can be effectively used to represent several categories of games, including games with partial information, graphical games and stochastic games. Inference algorithms can be used to approach the opponent modeling problem, as well as to find Nash equilibria or Pareto optimal solutions. Structure learning algorithms can be applied, in order to automatically extract probabilistic logic clauses describing the strategies of an opponent with a high-level, human-interpretable formalism. Experiments conducted using Markov logic networks, one of the most used SRL frameworks, show the potential of the approach.

Lippi, Marco; Torroni, Paolo (2015) - Argument mining: A machine learning perspective (3rd International Workshop on Theory and Applications of Formal Argumentation, TAFA 2015 - Buenos Aires, Argentina - July 25-26, 2015) (Theory and Applications of Formal Argumentation) (Springer Verlag, Heidelberg, Germany) - vol. 9524 - pp. 163-176 ISBN: 9783319284590 ISSN: 0302-9743 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Argument mining has recently become a hot topic, attracting the interest of several diverse research communities, ranging from artificial intelligence to computational linguistics, natural language processing, and the social and philosophical sciences. In this paper, we attempt to describe the problems and challenges of argument mining from a machine learning angle. In particular, we advocate that machine learning techniques so far have been under-exploited, and that a more proper standardization of the problem, also with regard to the underlying argument model, could provide a crucial element to develop better systems.

Lippi, Marco; Torroni, Paolo (2015) - Context-independent claim detection for argument mining (24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina - 25 July 2015 through 31 July 2015) (Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence: Buenos Aires, Argentina, 25–31 July 2015) (AAAI Press; International Joint Conferences on Artificial Intelligence, Palo Alto (CA), USA) - IJCAI - vol. 2015 - pp. 185-191 ISBN: 9781577357384 ISSN: 1045-0823 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Argumentation mining aims to automatically identify structured argument data from unstructured natural language text. This challenging, multifaceted task has recently been gaining growing attention, especially due to its many potential applications. One particularly important aspect of argumentation mining is claim identification. Most of the current approaches are engineered to address specific domains. However, argumentative sentences are often characterized by common rhetorical structures, independently of the domain. We thus propose a method that exploits structured parsing information to detect claims without resorting to contextual information, and yet achieves a performance comparable to that of state-of-the-art methods that heavily rely on the context.

Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano; Pelillo, Marcello (2015) - En plein air visual agents (18th International Conference on Image Analysis and Processing, ICIAP 2015 - Genoa, Italy - September 7-11, 2015) (Image Analysis and Processing — ICIAP 2015: 18th International Conference, Genoa, Italy, September 7-11, 2015, Proceedings, Part II) (Springer Verlag, Heidelberg, Germany) - vol. 9280 - pp. 697-709 ISBN: 9783319232331 ISSN: 0302-9743 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Nowadays, machine learning is playing a dominant role in most challenging computer vision problems. This paper advocates an extreme evolution of this interplay, where visual agents continuously process videos and interact with humans, just like children, exploiting life-long learning computational schemes. This opens the challenge of en plein air visual agents, whose behavior is progressively monitored and evaluated by novel mechanisms, where dynamic man-machine interaction plays a fundamental role. Going beyond classic benchmarks, we argue that appropriate crowd-sourcing schemes are suitable for performance evaluation of visual agents operating in this framework. We provide a proof of concept of this novel view, by showing methods and concrete solutions for en plein air visual agents. Crowdsourcing evaluation is reported, along with a life-long experiment on “The Aristocats” cartoon. We expect that the proposed radically new framework will stimulate related approaches and solutions.

Frasconi, Paolo; Gabbrielli, Francesco; Lippi, Marco; Marinai, Simone (2014) - Markov logic networks for optical chemical structure recognition - JOURNAL OF CHEMICAL INFORMATION AND MODELING - vol. 54 - pp. 2380-2390 ISSN: 1549-9596 [Journal article (262) - Journal article]
Abstract

Optical chemical structure recognition is the problem of converting a bitmap image containing a chemical structure formula into a standard structured representation of the molecule. We introduce a novel approach to this problem based on the pipelined integration of pattern recognition techniques with probabilistic knowledge representation and reasoning. Basic entities and relations (such as textual elements, points, lines, etc.) are first extracted by a low-level processing module. A probabilistic reasoning engine based on Markov logic, embodying chemical and graphical knowledge, is subsequently used to refine these pieces of information. An annotated connection table of atoms and bonds is finally assembled and converted into a standard chemical exchange format. We report a successful evaluation on two large image data sets, showing that the method compares favorably with the current state-of-the-art, especially on degraded low-resolution images. The system is available as a web server at http://mlocsr.dinfo.unifi.it. © 2014 American Chemical Society.

Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano (2014) - On-line video motion estimation by invariant receptive inputs (2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2014 - USA - 2014) (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops) (IEEE Computer Society) - pp. 726-731 ISBN: 9781479943098 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

In this paper, we address the problem of estimating the optical flow in long-term video sequences. We devise a computational scheme that exploits the idea of receptive fields, in which the pixel flow depends not only on the brightness level of the pixel itself, but also on neighborhood-related information. Our approach relies on the definition of receptive units that are invariant to affine transformations of the input data. This distinguishing characteristic allows us to build a video-receptive-inputs database with arbitrary detail level, that can be used to match local features and to determine their motion. We propose a parallel computational scheme, well suited to today's parallel architectures, to exploit motion information and invariant features from real-time video streams, for deep feature extraction, object detection, tracking, and other applications.

Lippi, Marco; Menconi, Lorenzo; Gori, Marco (2013) - Balancing recall and precision in stock market predictors using support vector machines (22nd Italian Workshop on Neural Nets (WIRN) 2012 - Vietri sul Mare, Napoli - 17-19 May, 2012) (Smart Innovation, Systems and Technologies) - vol. 19 - pp. 51-58 ISBN: 9783642354663 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Computational finance is one of the fields where machine learning and data mining have found wide application in recent years. Nevertheless, there are still many open issues regarding the predictability of the stock market, and the possibility of building an automatic intelligent trader able to make forecasts on stock prices and to develop a profitable trading strategy. In this paper, we propose an automatic trading strategy based on support vector machines, which employs recall-precision curves in order to allow a buying action for the trader only when the confidence of the prediction is high. We present an extensive experimental evaluation which compares our trader with several classic competitors. © Springer-Verlag Berlin Heidelberg 2013.
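
The idea of allowing a buy only when the prediction confidence is high can be sketched as choosing a score threshold from a recall-precision curve computed on held-out data. The sketch below is a minimal illustration under stated assumptions: the function names, the target precision, and the use of generic classifier scores (for instance SVM decision values) are hypothetical and are not taken from the paper.

```python
def threshold_for_precision(scores, labels, target_precision):
    """Return the lowest score threshold whose precision meets the target.

    `scores` are classifier confidences on validation examples and
    `labels` are 1 (price went up) or 0 (it did not). Lowering the
    threshold raises recall, so the lowest qualifying threshold gives
    the highest recall at the requested precision. Returns None when no
    threshold reaches the target.
    """
    pairs = sorted(zip(scores, labels), reverse=True)
    best = None
    true_positives = 0
    for i, (score, label) in enumerate(pairs, start=1):
        true_positives += label
        # Evaluate only after the last example of a group of tied scores.
        if i < len(pairs) and pairs[i][0] == score:
            continue
        if true_positives / i >= target_precision:
            best = score
    return best


def should_buy(score, threshold):
    """Allow a buying action only for predictions at or above the threshold."""
    return threshold is not None and score >= threshold
```

A trader built this way would abstain whenever `should_buy` returns `False`, trading recall for precision.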

Melacci, Stefano; Lippi, Marco; Gori, Marco; Maggini, Marco (2013) - Information-based learning of deep architectures for feature extraction (17th International Conference on Image Analysis and Processing, ICIAP 2013 - Naples, Italy - 2013) (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)) - vol. 8157 - pp. 101-110 ISBN: 9783642411830 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Feature extraction is a crucial phase in complex computer vision systems. Two main approaches have been proposed so far. A quite common solution is the design of appropriate filters and features based on image processing techniques, such as the SIFT descriptors. On the other hand, machine learning techniques can be applied, relying on their capabilities to automatically develop optimal processing schemes from a significant set of training examples. Recently, deep neural networks and convolutional neural networks have been shown to yield promising results in many computer vision tasks, such as object detection and recognition. This paper introduces a new computer vision deep architecture model for the hierarchical extraction of pixel-based features that naturally embed scale and rotation invariances. Hence, the proposed feature extraction process combines the two mentioned approaches, by merging design criteria derived from image processing tools with a learning algorithm able to extract structured feature representations from data. In particular, the learning algorithm is based on information-theoretic principles and it is able to develop invariant features from unsupervised examples. Preliminary experimental results on image classification support this new challenging research direction, when compared with other deep architecture models. © 2013 Springer-Verlag.

Frandina, Salvatore; Lippi, Marco; Maggini, Marco; Melacci, Stefano (2013) - On-line laplacian one-class support vector machines (23rd International Conference on Artificial Neural Networks, ICANN 2013 - Sofia, Bulgaria - 2013) (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)) - vol. 8131 - pp. 186-193 ISBN: 9783642407277 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

We propose a manifold regularization algorithm designed to work in an on-line scenario where data arrive continuously over time and it is not feasible to completely store the data stream for training the classifier in batch mode. The On-line Laplacian One-Class SVM (OLapOCSVM) algorithm exploits both positively labeled and totally unlabeled examples, updating the classifier hypothesis as new data becomes available. The learning procedure is based on conjugate gradient descent in the primal formulation of the SVM. The on-line algorithm uses an efficient buffering technique to deal with the continuous incoming data. In particular, we define a buffering policy that is based on the current estimate of the support of the input data distribution. The experimental results on real-world data show that OLapOCSVM compares favorably with the corresponding batch algorithms, while making it possible to be applied in generic on-line scenarios with limited memory requirements. © 2013 Springer-Verlag Berlin Heidelberg.

Lippi, Marco; Bertini, Matteo; Frasconi, Paolo (2013) - Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning - IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS - vol. 14 - pp. 871-882 ISSN: 1524-9050 [Journal article (262) - Journal article]
Abstract

The literature on short-term traffic flow forecasting has undergone great development recently. Many works, describing a wide variety of different approaches, which very often share similar features and ideas, have been published. However, publications presenting new prediction algorithms usually employ different settings, data sets, and performance measurements, making it difficult to infer a clear picture of the advantages and limitations of each model. The aim of this paper is twofold. First, we review existing approaches to short-term traffic flow forecasting methods under the common view of probabilistic graphical models, presenting an extensive experimental comparison, which proposes a common baseline for their performance analysis and provides the infrastructure to operate on a publicly available data set. Second, we present two new support vector regression models, which are specifically devised to benefit from typical traffic flow seasonality and are shown to represent an interesting compromise between prediction accuracy and computational efficiency. The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods. © 2011 IEEE.

Jaeger, Manfred; Lippi, Marco; Passerini, Andrea; Frasconi, Paolo (2013) - Type Extension Trees for feature construction and learning in relational domains - ARTIFICIAL INTELLIGENCE - vol. 204 - pp. 30-55 ISSN: 0004-3702 [Journal article (262) - Journal article]
Abstract

Type Extension Trees are a powerful representation language for "count-of-count" features characterizing the combinatorial structure of neighborhoods of entities in relational domains. In this paper we present a learning algorithm for Type Extension Trees (TET) that discovers informative count-of-count features in the supervised learning setting. Experiments on bibliographic data show that TET-learning is able to discover the count-of-count feature underlying the definition of the h-index, and the inverse document frequency feature commonly used in information retrieval. We also introduce a metric on TET feature values. This metric is defined as a recursive application of the Wasserstein-Kantorovich metric. Experiments with a k-NN classifier show that exploiting the recursive count-of-count statistics encoded in TET values improves classification accuracy over alternative methods based on simple count statistics. © 2013 Elsevier B.V.
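
To make the notion of a "count-of-count" feature concrete, the sketch below computes the h-index mentioned in the abstract: it counts papers, and the condition on each paper is itself a count (its citations). This is only the standard h-index definition with a hypothetical function name; it is not the TET learning algorithm.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` holds one citation count per paper. The outer count ranges
    over papers and the inner count is the number of citations of each
    paper, which is what makes this a count-of-count feature.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```

For an author whose papers have 10, 8, 5, 4 and 3 citations, four papers have at least four citations each, so the function returns 4.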

Frandina, Salvatore; Gori, Marco; Lippi, Marco; Maggini, Marco; Melacci, Stefano (2013) - Variational foundations of online backpropagation (23rd International Conference on Artificial Neural Networks, ICANN 2013 - Sofia, Bulgaria - 2013) (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)) - vol. 8131 - pp. 82-89 ISBN: 9783642407277 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

On-line Backpropagation has become very popular and it has been the subject of in-depth theoretical analyses and massive experimentation. Yet, after almost three decades from its publication, it is still surprisingly the source of tough theoretical questions and of experimental results that are somewhat shrouded in mystery. Although seriously plagued by local minima, the batch-mode version of the algorithm is clearly posed as an optimization problem while, in spite of its effectiveness, in many real-world problems the on-line mode version has not been given a clean formulation, yet. Using variational arguments, in this paper, the on-line formulation is proposed as the minimization of a classic functional that is inspired by the principle of minimal action in analytic mechanics. The proposed approach clashes sharply with common interpretations of on-line learning as an approximation of batch-mode, and it suggests that processing data all at once might be just an artificial formulation of learning that is hopeless in difficult real-world problems. © 2013 Springer-Verlag Berlin Heidelberg.

Lippi, Marco; Ernandes, Marco; Felner, Ariel (2012) - Efficient single frontier bidirectional search (5th International Symposium on Combinatorial Search, SoCS 2012 - Niagara Falls, ON, Canada - 2012) (Proceedings of the 5th Annual Symposium on Combinatorial Search, SoCS 2012) - pp. 49-56 ISBN: 9781577355847 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

The Single Frontier Bi-Directional Search (SBS) framework was recently introduced. A node in SBS corresponds to a pair of states, one from each of the frontiers, and it uses front-to-front heuristics. In this paper we present an enhanced version of SBS, called eSBS, where pruning and caching techniques are applied, which significantly reduce both time and memory needs of SBS. We then present a hybrid of eSBS and IDA* which potentially uses only the square root of the memory required by A* but makes it possible to prune many nodes that IDA* would generate. Experimental results show the benefit of our new approaches on a number of domains. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Gori, Marco; Melacci, Stefano; Lippi, Marco; Maggini, Marco (2012) - Information theoretic learning for pixel-based visual agents (12th European Conference on Computer Vision, ECCV 2012 - Florence, Italy - 2012) (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)) - vol. 7577 - pp. 864-875 ISBN: 9783642337826 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

In this paper we promote the idea of using pixel-based models not only for low level vision, but also to extract high level symbolic representations. We use a deep architecture which has the distinctive property of relying on computational units that incorporate classic computer vision invariances and, especially, the scale invariance. The learning algorithm that is proposed, which is based on information theory principles, develops the parameters of the computational units and, at the same time, makes it possible to detect the optimal scale for each pixel. We give experimental evidence of the mechanism of feature extraction at the first level of the hierarchy, which is very much related to SIFT-like features. The comparison shows clearly that, whenever we can rely on the massive availability of training data, the proposed model leads to better performance with respect to SIFT. © 2012 Springer-Verlag.

Lippi, Marco; Passerini, Andrea; Punta, Marco; Frasconi, Paolo (2012) - Metal binding in proteins: Machine learning complements X-ray absorption spectroscopy (2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2012 - Bristol, UK - 2012) (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)) - vol. 7524 - pp. 854-857 ISBN: 9783642334856 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

We present an application of machine learning algorithms for the identification of metalloproteins and metal binding sites on a genome scale. An extensive evaluation conducted in combination with X-ray absorption spectroscopy shows the great potential of the approach. © 2012 Springer-Verlag.

Passerini, Andrea; Lippi, Marco; Frasconi, Paolo (2012) - Predicting metal-binding sites from protein sequence - IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS - vol. 9 - pp. 203-213 ISSN: 1545-5963 [Journal article (262) - Journal article]
Abstract

Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous amount of alternative candidate configurations. Previous research has only considered this prediction problem starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available. The sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from the ones used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds. © 2011 IEEE.

Shi, Wuxian; Punta, Marco; Bohon, Jen; Sauder, J. Michael; D'Mello, Rhijuta; Sullivan, Mike; Toomey, John; Abel, Don; Lippi, Marco; Passerini, Andrea; Frasconi, Paolo; Burley, Stephen K.; Rost, Burkhard; Chance, Mark R. (2011) - Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy - GENOME RESEARCH - vol. 21 - pp. 898-907 ISSN: 1088-9051 [Journal article (262) - Journal article]
Abstract

High-throughput X-ray absorption spectroscopy was used to measure transition metal content based on quantitative detection of X-ray fluorescence signals for 3879 purified proteins from several hundred different protein families generated by the New York SGX Research Center for Structural Genomics. Approximately 9% of the proteins analyzed showed the presence of transition metal atoms (Zn, Cu, Ni, Co, Fe, or Mn) in stoichiometric amounts. The method is highly automated and highly reliable based on comparison of the results to crystal structure data derived from the same protein set. To leverage the experimental metalloprotein annotations, we used a sequence-based de novo prediction method, MetalDetector, to identify Cys and His residues that bind to transition metals for the redundancy reduced subset of 2411 sequences sharing <70% sequence identity and having at least one His or Cys. As the HT-XAS identifies metal type and protein binding, while the bioinformatics analysis identifies metal-binding residues, the results were combined to identify putative metal-binding sites in the proteins and their associated families. We explored the combination of this data with homology models to generate detailed structure models of metal-binding sites for representative proteins. Finally, we used extended X-ray absorption fine structure data from two of the purified Zn metalloproteins to validate predicted metalloprotein binding site structures. This combination of experimental and bioinformatics approaches provides comprehensive active site analysis on the genome scale for metalloproteins as a class, revealing new insights into metalloprotein structure and function. © 2011 by Cold Spring Harbor Laboratory Press.

Passerini, Andrea; Lippi, Marco; Frasconi, Paolo (2011) - MetalDetector v2.0: Predicting the geometry of metal binding sites from protein sequence - NUCLEIC ACIDS RESEARCH - vol. 39 [Journal article (262) - Journal article]
Abstract

MetalDetector identifies CYS and HIS involved in transition metal protein binding sites, starting from sequence alone. A major new feature of release 2.0 is the ability to predict which residues are jointly involved in the coordination of the same metal ion. The server is available at http://metaldetector.dsi.unifi.it/v2.0/. © 2011 The Author(s).

Lippi, Marco; Jaeger, Manfred; Frasconi, Paolo; Passerini, Andrea ( 2011 ) - Relational information gain - MACHINE LEARNING - vol. 83 - pp. 219-239 ISSN: 0885-6125 [Journal article (262) - Journal article]
Abstract

We introduce relational information gain, a refinement scoring function measuring the informativeness of newly introduced variables. The gain can be interpreted as a conditional entropy in a well-defined sense and can be approximated efficiently. In conjunction with simple greedy general-to-specific search algorithms such as FOIL, it yields an efficient and competitive algorithm in terms of predictive accuracy and compactness of the learned theory. In conjunction with the decision tree learner TILDE, it offers a beneficial alternative to lookahead, achieving similar performance while significantly reducing the number of evaluated literals. © The Author(s) 2010.
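
As background for the entropy interpretation mentioned in the abstract, the following is a minimal sketch of the classical, propositional information gain (entropy of the labels minus their conditional entropy given a feature). It is not the relational information gain defined in the paper; the function names `entropy` and `information_gain` and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, feature_values):
    """Classical information gain: H(labels) - H(labels | feature).

    Illustrative propositional version only; the relational variant
    described in the abstract is defined differently.
    """
    n = len(labels)
    # Group the labels by the value the feature takes on each example.
    groups = {}
    for y, v in zip(labels, feature_values):
        groups.setdefault(v, []).append(y)
    # Conditional entropy: weighted average of the per-group entropies.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): a feature that separates the classes perfectly
# yields a gain of 1 bit, while an uninformative feature yields 0 bits.
labels = [1, 1, 0, 0]
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0
print(information_gain(labels, ["a", "b", "a", "b"]))  # 0.0
```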

Lippi, Marco; Bertini, Matteo; Frasconi, Paolo ( 2010 ) - Collective traffic forecasting ( European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010 - Barcelona, Spain - 2010) ( Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ) - vol. 6322 - pp. 259-273 ISBN: 364215882X [Contribution to conference proceedings (273) - Paper in conference proceedings]
Abstract

Traffic forecasting has recently become a crucial task in the area of intelligent transportation systems, and in particular in the development of traffic management and control. We focus on the simultaneous prediction of the congestion state at multiple lead times and at multiple nodes of a transport network, given historical and recent information. This is a highly relational task along the spatial and the temporal dimensions and we advocate the application of statistical relational learning techniques. We formulate the task in the supervised learning from interpretations setting and use Markov logic networks with grounding-specific weights to perform collective classification. Experimental results on data obtained from the California Freeway Performance Measurement System (PeMS) show the advantages of the proposed solution, with respect to propositional classifiers. In particular, we obtained significant performance improvement at larger time leads. © 2010 Springer-Verlag Berlin Heidelberg.

Lippi, Marco; Frasconi, Paolo ( 2009 ) - Prediction of protein β-residue contacts by Markov logic networks with grounding-specific weights - BIOINFORMATICS - vol. 25 - pp. 2326-2333 ISSN: 1367-4803 [Journal article (262) - Journal article]
Abstract

Motivation: Accurate prediction of contacts between β-strand residues can significantly contribute towards ab initio prediction of the 3D structure of many proteins. Contacts in the same protein are highly interdependent. Therefore, significant improvements can be expected by applying statistical relational learners that overcome the usual machine learning assumption that examples are independent and identically distributed. Furthermore, the dependencies among β-residue contacts are subject to strong regularities, many of which are known a priori. In this article, we take advantage of Markov logic, a statistical relational learning framework that is able to capture dependencies between contacts, and constrain the solution according to domain knowledge expressed by means of weighted rules in a logical language. Results: We introduce a novel hybrid architecture based on neural and Markov logic networks with grounding-specific weights. On a non-redundant dataset, our method achieves 44.9% F1 measure, with 47.3% precision and 42.7% recall, which is significantly better (P < 0.01) than previously reported performance obtained by 2D recursive neural networks. Our approach also significantly improves the number of chains for which β-strands are nearly perfectly paired (36% of the chains are predicted with F1 ≥ 70% on coarse map). It also outperforms more general contact predictors on recent CASP 2008 targets. © The Author 2009. Published by Oxford University Press. All rights reserved.
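
The F1 measure reported above is the harmonic mean of precision and recall. The short check below, a sketch whose helper name `f1_score` is an assumption for illustration, shows that the stated precision (47.3%) and recall (42.7%) are consistent with the stated F1 (44.9%).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.473, 0.427)
print(f"{f1:.1%}")  # 44.9%
```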

Costa, Fabrizio; Passerini, Andrea; Lippi, Marco; Frasconi, Paolo ( 2008 ) - A semiparametric generative model for efficient structured-output supervised learning - ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE - vol. 54 - pp. 207-222 ISSN: 1012-2443 [Journal article (262) - Journal article]
Abstract

We present a semiparametric generative model for supervised learning with structured outputs. The main algorithmic idea is to replace the parameters of an underlying generative model (such as a stochastic grammar) with input-dependent predictions obtained by (kernel) logistic regression. This method avoids the computational burden associated with the comparison between target and predicted structure during the training phase, but requires as an additional input a vector of sufficient statistics for each training example. The resulting training algorithm is asymptotically more efficient than structured output SVM as the size of the output structure grows. At the same time, by computing parameters of a joint distribution as a function of the full input structure, typical expressiveness limitations of related conditional models (such as maximum entropy Markov models) can be potentially avoided. Empirical results on artificial and real data (in the domains of natural language parsing and RNA secondary structure prediction) show that the method works well in practice and scales up with the size of the output structures. © Springer Science+Business Media B.V. 2009.

Lippi, Marco; Passerini, Andrea; Punta, Marco; Rost, Burkhard; Frasconi, Paolo ( 2008 ) - MetalDetector: A web server for predicting metal-binding sites and disulfide bridges in proteins from sequence - BIOINFORMATICS - vol. 24 - pp. 2094-2095 ISSN: 1367-4803 [Journal article (262) - Journal article]
Abstract

The web server MetalDetector classifies histidine residues in proteins into one of two states (free or metal bound) and cysteines into one of three states (free, metal bound or disulfide bridged). A decision tree integrates predictions from two previously developed methods (DISULFIND and Metal Ligand Predictor). Cross-validated performance assessment indicates that our server predicts disulfide bonding state at 88.6% precision and 85.1% recall, while it identifies cysteines and histidines in transition metal-binding sites at 79.9% precision and 76.8% recall, and at 60.8% precision and 40.7% recall, respectively. © The Author 2008. Published by Oxford University Press. All rights reserved.