Personal page of Rita CUCCHIARA

Department of Engineering "Enzo Ferrari"

Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita ( 2017 ) - Affective level design for a role-playing videogame evaluated by a brain–computer interface and machine learning methods - THE VISUAL COMPUTER - vol. 33 - pp. 413-427 ISSN: 0178-2789 [Journal article (262) - Article in journal]
Abstract

Game science has become a research field that attracts industry attention thanks to a rich worldwide market. To understand the player experience, mental states such as flow or boredom require formalization and empirical investigation, taking advantage of the objective data that psychophysiological methods like electroencephalography (EEG) can provide. This work studies affective ludology and presents two different game levels for Neverwinter Nights 2 developed with the aim of manipulating emotions; two sets of affective design guidelines are presented, with a rigorous formalization that considers the characteristics of the role-playing genre and its specific gameplay. An empirical investigation with a brain–computer interface headset has been conducted: by extracting numerical data features, machine learning techniques classify the different activities of the gaming sessions (tasks and events) to verify whether their design differentiation coincides with the affective one. The observed results, also supported by subjective questionnaire data, confirm the validity of the proposed guidelines, suggesting that this evaluation methodology could be extended to other evaluation tasks.

Borghi, Guido; Gasparini, Riccardo; Vezzani, Roberto; Cucchiara, Rita ( 2017 ) - Embedded Recurrent Network for Head Pose Estimation in Car ( IEEE Intelligent Vehicles Symposium - Redondo Beach CA, USA - June 11-14) ( - Proceedings of the 2017 IEEE Intelligent Vehicles Symposium ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Accurate and fast estimation of the driver's head pose is a rich source of information, particularly in the automotive context. Head pose is a key element for investigating driver behavior, pose analysis and attention monitoring, and a useful component for improving the efficacy of Human-Car Interaction systems. In this paper, a Recurrent Neural Network is exploited to tackle the problem of driver head pose estimation, working directly and only on depth images to be more reliable in the presence of varying or insufficient illumination. Experimental results, obtained on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP Database, show the efficacy of the proposed method, which outperforms state-of-the-art works. In addition, the entire system is implemented and tested on two embedded boards with real-time performance.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2017 ) - Hierarchical Boundary-Aware Neural Encoder for Video Captioning ( IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - Honolulu, Hawaii - July, 22-25) ( - 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and leverage the hierarchical structure of the video. Unlike the classical encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose a novel LSTM cell, which can identify discontinuity points between frames or segments and modify the temporal connections of the encoding layer accordingly. We evaluate our approach on three large-scale datasets: the Montreal Video Annotation dataset, the MPII Movie Description dataset and the Microsoft Video Description Corpus. Experiments show that our approach can discover appropriate hierarchical representations of input videos and improve the state of the art results on movie description datasets.

Borghi, Guido; Venturelli, Marco; Vezzani, Roberto; Cucchiara, Rita ( 2017 ) - POSEidon: Face-from-Depth for Driver Pose Estimation ( 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) - Honolulu, Hawaii - July, 22-25, 2017) ( - Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer, specially conceived for understanding the pose from depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth approach for learning image faces from depth. Results in face reconstruction are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works, running in real time at more than 30 frames per second.

Baraldi, Lorenzo; Grana, Costantino; Messina, Alberto; Cucchiara, Rita ( 2016 ) - A Browsing and Retrieval System for Broadcast Videos using Scene Detection and Automatic Annotation ( 24th ACM international conference on Multimedia - Amsterdam, The Netherlands - 15 - 19 October 2016) ( - Proceedings of the 2016 ACM on Multimedia Conference ) (ACM ) - pp. 733-734 ISBN: 9781450336031 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

This paper presents a novel video access and retrieval system for edited videos. The key element of the proposal is that videos are automatically decomposed into semantically coherent parts (called scenes) to provide a more manageable unit for browsing, tagging and searching. The system features an automatic annotation pipeline, with which videos are tagged by exploiting both the transcript and the video itself. Scenes can also be retrieved with textual queries; the best thumbnail for a query is selected according to both semantic and aesthetic criteria.

Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - A Deep Multi-Level Network for Saliency Prediction ( 23rd International Conference on Pattern Recognition - Cancun, Mexico - 4-8 Dec 2016) ( - Proceedings of the 23rd International Conference on Pattern Recognition ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ Fully Convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network, which weights low and high level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms them under all evaluation metrics on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.

Fiore, Giuseppe Del; Mainetti, Luca; Mighali, Vincenzo; Patrono, Luigi; Alletto, Stefano; Cucchiara, Rita; Serra, Giuseppe ( 2016 ) - A location-aware architecture for an IoT-based smart museum - INTERNATIONAL JOURNAL OF ELECTRONIC GOVERNMENT RESEARCH - vol. 12 - pp. 39-55 ISSN: 1548-3886 [Journal article (262) - Article in journal]
Abstract

The Internet of Things, whose main goal is to automatically predict users' desires, can find very interesting opportunities in the art and culture field, as tourism is one of the main driving engines of modern society. Currently, the innovation process in this field is growing at a slower pace, so cultural heritage remains the prerogative of a restricted category of users. To address this issue, a significant technological improvement is necessary in culture-dedicated locations, which do not usually allow the installation of hardware infrastructures. In this paper, we design and validate a non-invasive indoor location-aware architecture able to enhance the user experience in a museum. The system relies on the user's smartphone and a wearable device (with image recognition and localization capabilities) to automatically deliver personalized cultural content related to the observed artworks. The proposal was validated in the MUST museum in Lecce (Italy).

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2016 ) - Analysis and Re-use of Videos in Educational Digital Libraries with Automatic Scene Detection ( 11th Italian Research Conference on Digital Libraries - Bolzano - Jan. 29-30) ( - Digital Libraries on the Move ) (Springer International Publishing CHE ) - vol. 612 - pp. 155-164 ISBN: 978-3-319-41937-4 ISSN: 1865-0937 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

The advent of modern approaches to education, like Massive Open Online Courses (MOOC), made video the basic media for educating and transmitting knowledge. However, IT tools are still not adequate to allow video content re-use, tagging, annotation and personalization. In this paper we analyze the problem of identifying coherent sequences, called scenes, in order to provide the users with a more manageable editing unit. A simple spectral clustering technique is proposed and compared with state-of-the-art results. We also discuss correct ways to evaluate the performance of automatic scene detection algorithms.

Fergnani, Federica; Alletto, Stefano; Serra, Giuseppe; De Mira, Joaquim; Cucchiara, Rita ( 2016 ) - Body Part Based Re-identification from an Egocentric Perspective ( Computer Vision and Pattern Recognition - Las Vegas, USA - 26/06/2016) ( - Proceedings of CVPR ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

With the spread of wearable cameras, many consumer applications ranging from social tagging to video summarization would greatly benefit from people re-identification methods capable of dealing with the egocentric perspective. In this regard, first-person camera views present such a unique setting that traditional re-identification methods result in poor performance when applied to this scenario. In this paper, we present a simple but effective solution that overcomes the limitations of traditional approaches by dividing people images into meaningful body parts. Furthermore, by taking into account human gaze information concerning where people look when trying to recognize a person, we devise a meaningful way to weight the contributions of different body parts. Experimental results validate the proposal on a novel egocentric re-identification dataset, the first of its kind, showing that the performance increase over the current state of the art on egocentric sequences is significant.

Paci, Francesco; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita; Benini, Luca ( 2016 ) - Context Change Detection for an Ultra-Low Power Low-Resolution Ego-Vision Imager ( First International Workshop on Egocentric Perception, Interaction and Computing - Amsterdam, The Netherlands - October 8-10, 2016) ( - Computer Vision – ECCV 2016 Workshops ) (Springer International Publishing CHE ) - vol. 9913 - pp. 589-602 ISBN: 9783319466033 ISSN: 1611-3349 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

With the increasing popularity of wearable cameras, such as GoPro or Narrative Clip, research on continuous activity monitoring from egocentric cameras has received a lot of attention. Research in hardware and software is devoted to finding new efficient, stable and long-running solutions; however, devices are too power-hungry for truly always-on operation, and are aggressively duty-cycled to achieve acceptable lifetimes. In this paper we present a wearable system for context change detection based on an egocentric camera with ultra-low power consumption that can collect data 24/7. Although the resolution of the captured images is low, experimental results in real scenarios demonstrate how our approach, based on Siamese Neural Networks, can achieve visual context awareness. In particular, we compare our solution with hand-crafted features and with state-of-the-art techniques, and propose a novel and challenging dataset composed of roughly 30000 low-resolution images.

Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita ( 2016 ) - Deep Head Pose Estimation from Depth Data for In-car Automotive Applications ( 2nd International Workshop on Understanding Human Activities through 3D Sensors (UHA3DS'16) - Cancun (Mexico) - Dec 4 , 2016) ( - Proceedings of the 2nd International Workshop on Understanding Human Activities through 3D Sensors ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation through a Convolutional Neural Network (CNN). Differently from other proposals in the literature, the described system is able to work directly and based only on raw depth data. Moreover, the head pose estimation is solved as a regression problem and does not rely on visual facial features like facial landmarks. We tested our system on a well known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and is able to meet real-time performance requirements.

Alletto, Stefano; Palazzi, Andrea; Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2016 ) - DR(eye)VE: a Dataset for Attention-Based Tasks with Applications to Autonomous and Assisted Driving ( IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - Las Vegas - 2016) ( - IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Autonomous and assisted driving are undoubtedly hot topics in computer vision. However, the driving task is extremely complex and a deep understanding of drivers' behavior is still lacking. Several researchers are now investigating the attention mechanism in order to define computational models for detecting salient and interesting objects in the scene. Nevertheless, most of these models only refer to bottom-up visual saliency and are focused on still images. Instead, during the driving experience the temporal nature and peculiarity of the task influence the attention mechanisms, leading to the conclusion that real-life driving data is mandatory. In this paper we propose a novel and publicly available dataset acquired during actual driving. Our dataset, composed of more than 500,000 frames, contains drivers' gaze fixations and their temporal integration, providing task-specific saliency maps. Geo-referenced locations, driving speed and course complete the set of released data. To the best of our knowledge, this is the first publicly available dataset of this kind, and it can foster new discussions on better understanding, exploiting and reproducing the driver's attention process in the autonomous and assisted cars of future generations.

Alletto, Stefano; Abati, Davide; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - Exploring Architectural Details Through a Wearable Egocentric Vision Device - SENSORS - vol. 16(2) - pp. 1-15 ISSN: 1424-8220 [Journal article (262) - Article in journal]
Abstract

Augmented user experiences in the cultural heritage domain are in increasing demand by the new digital native tourists of the 21st century. In this paper, we propose a novel solution that aims at assisting the visitor during an outdoor tour of a cultural site using the unique first-person perspective of wearable cameras. In particular, the approach exploits computer vision techniques to retrieve the details by proposing a robust descriptor based on the covariance of local features. Using a lightweight wearable board, the solution can localize the user with respect to the 3D point cloud of the historical landmark and provide him with information about the details he is currently looking at. Experimental results validate the method both in terms of accuracy and computational effort. Furthermore, user evaluation based on real-world experiments shows that the proposal is deemed effective in enriching a cultural experience.

Cucchiara, Rita; Bulling, Andreas; Kunze, Kai; Rehg, James ( 2016 ) - Eyewear Computing – Augmenting the Human with Head-Mounted Wearable Assistants ( - Report of Dagstuhl Seminar ) [Review in volume (301) - Review in volume]
Abstract

The seminar was composed of workshops and tutorials on head-mounted eye tracking, egocentric vision, optics, and head-mounted displays. The seminar welcomed 30 academic and industry researchers from Europe, the US, and Asia with a diverse background, including wearable and ubiquitous computing, computer vision, developmental psychology, optics, and human-computer interaction. In contrast to several previous Dagstuhl seminars, we used an ignite talk format to reduce the time of talks to one half-day and to leave the rest of the week for hands-on sessions, group work, general discussions, and socialising. The key results of this seminar are 1) the identification of key research challenges and summaries of breakout groups on multimodal eyewear computing, egocentric vision, security and privacy issues, skill augmentation and task guidance, eyewear computing for gaming, as well as prototyping of VR applications, 2) a list of datasets and research tools for eyewear computing, 3) three small-scale datasets recorded during the seminar, 4) an article in ACM Interactions entitled “Eyewear Computers for Human-Computer Interaction”, as well as 5) two follow-up workshops on “Egocentric Perception, Interaction, and Computing” at the European Conference on Computer Vision (ECCV) as well as “Eyewear Computing” at the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp).

Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita ( 2016 ) - Fast gesture recognition with Multiple Stream Discrete HMMs on 3D Skeletons ( 23rd International Conference on Pattern Recognition - Cancun - Dec 4-8, 2016) ( - Proceedings of the 23rd International Conference on Pattern Recognition ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirements, scalability and high parallelism. They perform well even with a limited training set. All these characteristics are hard to find together in other, even more accurate, methods. In this paper, we propose a novel double-stage classification approach, based on Multiple Stream Discrete Hidden Markov Models (MSD-HMM) and 3D skeleton joint data, able to reach high performance while maintaining all the advantages listed above. The approach allows both quick classification of pre-segmented gestures (offline classification) and temporal segmentation on streams of gestures (online classification) faster than real time. We test our system on three public datasets, MSRAction3D, UTKinect-Action and MSRDailyAction, and on a new dataset, Kinteract Dataset, explicitly created for Human Computer Interaction (HCI). We obtain state-of-the-art performance on all of them.

Venturelli, Marco; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita ( 2016 ) - From Depth Data to Head Pose Estimation: a Siamese approach ( 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2017) - Porto, Portugal - February 27 - March 1, 2017) ( - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP) ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

The correct estimation of the head pose is a problem of great importance for many applications. For instance, it is an enabling technology in the automotive field for driver attention monitoring. In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner. Traditional methods usually rely on visual facial features, such as facial landmarks or nose tip position. In contrast, we exploit a Convolutional Neural Network (CNN) to perform head pose estimation directly from depth data. We exploit a Siamese architecture and we propose a novel loss function to improve the learning of the regression network layer. The system has been tested on two public datasets, Biwi Kinect Head Pose and ICT-3DHP database. The reported results demonstrate the improvement in accuracy with respect to current state-of-the-art approaches and the real-time capabilities of the overall framework.

Corbelli, Andrea; Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2016 ) - Historical Document Digitization through Layout Analysis and Deep Content Classification ( 23rd International Conference on Pattern Recognition - Cancun, Mexico - 4-8 Dec 2016) ( - Proceedings of the 23rd International Conference on Pattern Recognition ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Document layout segmentation and recognition is an important task in the creation of digitized document collections, especially when dealing with historical documents. This paper presents a hybrid approach to layout segmentation as well as a strategy to classify document regions, which is applied to the process of digitization of a historical encyclopedia. Our layout analysis method merges a classic top-down approach and a bottom-up classification process based on local geometrical features, while regions are classified by means of features extracted from a Convolutional Neural Network merged in a Random Forest classifier. Experiments are conducted on the first volume of the “Enciclopedia Treccani”, a large dataset containing 999 manually annotated pages from the historical Italian encyclopedia.

Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita ( 2016 ) - Layout analysis and content classification in digitized books ( 12th Italian Research Conference on Digital Libraries - Firenze - Feb. 4-5) ( - Proceedings of the 12th Italian Research Conference on Digital Libraries ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Automatic layout analysis has proven to be extremely important in the process of digitization of large amounts of documents. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete and structured annotation in JSON format, which contains the digitized text as well as all the references to the illustrations of the input page, and which can be used by visualization interfaces as well as annotation interfaces. We evaluate our algorithm on a large dataset built upon the first volume of the “Enciclopedia Treccani”.

Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - Motion Segmentation using Visual and Bio-mechanical Features ( ACM Multimedia - Amsterdam - October 2016) ( - 1 ) [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Nowadays, egocentric wearable devices are becoming increasingly widespread among both the academic community and the general public. For this reason, methods capable of automatically segmenting the video based on the recorder's motion patterns are gaining attention. These devices present the unique opportunity of both high quality video recordings and multimodal sensor readings. Significant efforts have been made in analyzing either the video stream recorded by these devices or the bio-mechanical sensor information. So far, the integration between these two realities has not been fully addressed, and the real capabilities of these devices are not yet exploited. In this paper, we present a solution to segment a video sequence into motion activities by introducing a novel data fusion technique based on the covariance of visual and bio-mechanical features. The experimental results are promising and show that the proposed integration strategy outperforms the results achieved focusing solely on a single source.

Cornia, Marcella; Baraldi, Lorenzo; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - Multi-Level Net: a Visual Saliency Prediction Model ( Fourth International Workshop on Assistive Computer Vision and Robotics - Amsterdam, The Netherlands - October 9th, 2016) ( - Computer Vision – ECCV 2016 Workshops ) (Springer International Publishing CHE ) - vol. 9914 - pp. 302-315 ISBN: 9783319488806 ISSN: 1611-3349 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

State-of-the-art approaches for saliency prediction are based on Fully Convolutional Networks, in which saliency maps are built using the last layer. In contrast, we here present a novel model that predicts saliency maps exploiting a non-linear combination of features coming from different layers of the network. We also present a new loss function to deal with the imbalance issue on saliency masks. Extensive results on three public datasets demonstrate the robustness of our solution. Our model outperforms the state of the art on SALICON, which is the largest and unconstrained dataset available, and obtains competitive results on MIT300 and CAT2000 benchmarks.

Gasparini, Riccardo; Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - Optimizing image registration for interactive applications ( 3rd International Conference on Augmented Reality, Virtual Reality, and Computer Graphics, AVR 2016 - ita - 2016) ( - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ) (Springer Verlag ) - LECTURE NOTES IN COMPUTER SCIENCE - vol. 9768 - pp. 479-488 ISBN: 9783319406206 ISSN: 1611-3349 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

With the spread of wearable and mobile devices, the demand for interactive augmented reality applications is in constant growth. Among the different possibilities, we focus on the cultural heritage domain, where a key step in the development of applications for augmented cultural experiences is to obtain a precise localization of the user, i.e. the 6 degree-of-freedom pose of the camera acquiring the images used by the application. Current state-of-the-art methods perform this task by extracting local descriptors from a query and exhaustively matching them to a sparse 3D model of the environment. While this procedure obtains good localization performance, the vast search space involved in the retrieval of 2D-3D correspondences often makes it infeasible in real-time and interactive environments. In this paper we hence propose to perform descriptor quantization to reduce the search space and employ multiple KD-Trees combined with a principal component analysis dimensionality reduction to enable an efficient search. We experimentally show that our solution can halve the computational requirements of the correspondence search with regard to the state of the art while maintaining similar accuracy levels.

Barnard, Shanis; Calderara, Simone; Pistocchi, Simone; Cucchiara, Rita; Podaliri-Vulpiani, Michele; Messori, Stefano; Ferri, Nicola ( 2016 ) - Quick, accurate, smart: 3D computer vision technology helps assessing confined animals' behaviour - PLOS ONE - vol. 11 - pp. 1-20 ISSN: 1932-6203 [Journal article (262) - Article in journal]
Abstract

Mankind directly controls the environment and lifestyles of several domestic species for purposes ranging from production and research to conservation and companionship. These environments and lifestyles may not offer these animals the best quality of life. Behaviour is a direct reflection of how the animal is coping with its environment. Behavioural indicators are thus among the preferred parameters to assess welfare. However, behavioural recording (usually from video) can be very time consuming and the accuracy and reliability of the output rely on the experience and background of the observers. The outburst of new video technology and computer image processing gives the basis for promising solutions. In this pilot study, we present a new prototype software able to automatically infer the behaviour of dogs housed in kennels from 3D visual data and through structured machine learning frameworks. Depth information acquired through 3D features, body part detection and training are the key elements that allow the machine to recognise postures, trajectories inside the kennel and patterns of movement that can be later labelled at convenience. The main innovation of the software is its ability to automatically cluster frequently observed temporal patterns of movement without any pre-set ethogram. Conversely, when common patterns are defined through training, a deviation from normal behaviour in time or between individuals could be assessed. The software accuracy in correctly detecting the dogs' behaviour was checked through a validation process. An automatic behaviour recognition system, independent from human subjectivity, could add scientific knowledge on animals' quality of life in confinement as well as saving time and resources. This 3D framework was designed to be invariant to the dog's shape and size and could be extended to farm, laboratory and zoo quadrupeds in artificial housing. 
The computer vision technique applied to this software is innovative in non-human animal behaviour science. Further improvements and validation are needed, and future applications and limitations are discussed.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2016 ) - Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks - IEEE TRANSACTIONS ON MULTIMEDIA - pp. 1-14 ISSN: 1520-9210 [Journal article (262) - Article in journal]
Abstract

In this paper, we propose a novel scene detection algorithm which employs semantic, visual, textual and audio cues. We also show how the hierarchical decomposition of the storytelling video structure can improve retrieval results presentation with semantically and aesthetically effective thumbnails. Our method is built upon two advancements of the state of the art: 1) semantic feature extraction which builds video specific concept detectors; 2) multimodal feature embedding learning, that maps the feature vector of a shot to a space in which the Euclidean distance has task specific semantic properties. The proposed method is able to decompose the video in annotated temporal segments which allow for a query specific thumbnail extraction. Extensive experiments are performed on different data sets to demonstrate the effectiveness of our algorithm. An in-depth discussion on how to deal with the subjectivity of the task is conducted and a strategy to overcome the problem is suggested.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2016 ) - Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features ( 6th ACM on International Conference on Multimedia Retrieval - New York, USA - 6-9 June 2016) ( - Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval ) (ACM ) - pp. 23-29 ISBN: 978-1-4503-4359-6 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent and story-telling scenes, then a retrieval algorithm based on deep learning is proposed to retrieve the most significant scenes for a textual query. A ranking strategy based on deep features is finally used to tackle the problem of visualizing the best thumbnail. Qualitative and quantitative experiments are conducted on a collection of edited videos to demonstrate the effectiveness of our approach.

Manfredi, Marco; Grana, Costantino; Cucchiara, Rita; Smeulders, Arnold W.M. ( 2016 ) - Segmentation models diversity for object proposals - COMPUTER VISION AND IMAGE UNDERSTANDING - pp. 1-9 ISSN: 1077-3142 [Journal article (262) - Journal article]
Abstract

In this paper we present a segmentation proposal method which employs a box-hypotheses generation step followed by a lightweight segmentation strategy. Inspired by interactive segmentation, for each automatically placed bounding box we compute a precise segmentation mask. We introduce diversity in segmentation strategies, enhancing the performance of a generic model by exploiting class-independent regional appearance features. Foreground probability scores are learned from groups of objects with peculiar characteristics to specialize segmentation models. We demonstrate results comparable to the state of the art on PASCAL VOC 2012 and a further improvement by merging our proposals with those of a recent solution. The ability to generalize to unseen object categories is demonstrated on Microsoft COCO 2014.

Baraldi, Lorenzo; Grana, Costantino; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita ( 2016 ) - Shot, scene and keyframe ordering for interactive video re-use ( 11th International Conference on Computer Vision Theory and Applications - Rome - Feb 27-29, 2016) ( - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications ) - vol. 4 - pp. 626-631 ISBN: 9789897581755 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

This paper presents a complete system for shot and scene detection in broadcast videos, as well as a method to select the best representative key-frames, which could be used in new interactive interfaces for accessing large collections of edited videos. The final goal is to enable an improved access to video footage and the re-use of video content with the direct management of user-selected video-clips.

Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2016 ) - Socially Constrained Structural Learning for Groups Detection in Crowd - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - vol. 38 (5) - pp. 995-1008 ISSN: 0162-8828 [Journal article (262) - Journal article]
Abstract

Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people's trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and heat maps. To adhere to sociological observations, we introduce a loss function (G-MITRE) able to deal with the complexity of evaluating group detection performance. We show that our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.

Palazzi, Andrea; Calderara, Simone; Bicocchi, Nicola; Vezzali, Loris; Di Bernardo, Gian Antonio; Zambonelli, Franco; Cucchiara, Rita ( 2016 ) - Spotting prejudice with nonverbal behaviours ( 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2016) - Heidelberg, Germany - 12-16 September 2016) ( - Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing 2016 ) (ACM New York USA ) - pp. 853-862 ISBN: 9781450344616 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Although prejudice cannot be directly observed, nonverbal behaviours provide profound hints about people's inclinations. In this paper, we use recent sensing technologies and machine learning techniques to automatically infer the results of psychological questionnaires frequently used to assess implicit prejudice. In particular, we recorded 32 students in discussion with both white and black collaborators. Then, we identified a set of features that can be extracted automatically and measured their degree of correlation with psychological scores. Results confirmed that automated analysis of nonverbal behaviour is actually possible, thus paving the way for innovative clinical tools and eventually more secure societies.

Coppi, Dalia; Calderara, Simone; Cucchiara, Rita ( 2016 ) - Transductive People Tracking in Unconstrained Surveillance - IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY - vol. 26 (4) - pp. 762-775 ISSN: 1051-8215 [Journal article (262) - Journal article]
Abstract

Long-term tracking of people in unconstrained scenarios is still an open problem due to the absence of constant elements in the problem setting. The camera, when active, may move, and both the background and the target appearance may change abruptly, leading to the inadequacy of most standard tracking techniques. We propose to exploit a learning approach that considers the tracking task as a semi-supervised learning (SSL) problem. Given a few target samples, the aim is to search for the target occurrences in the video stream, re-interpreting the problem as label propagation on a similarity graph. We propose a solution based on graph transduction that works iteratively frame by frame. Additionally, in order to avoid drifting, we introduce an update strategy based on an evolutionary clustering technique that chooses the visual templates that better describe the target appearance, evolving the model during the processing of the video. Since we model people's appearance by means of covariance matrices on color and gradient information, our framework is directly related to structure learning on Riemannian manifolds. Tests on publicly available datasets and comparisons with state-of-the-art techniques allow us to conclude that our solution exhibits interesting performance in terms of tracking precision and recall in most of the considered scenarios.
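The idea of treating target search as label propagation on a similarity graph can be illustrated with a minimal sketch. This is not the authors' tracker: it assumes a Gaussian similarity graph over generic feature vectors and a standard closed-form propagation rule, and the function name `propagate_labels` and the parameters `sigma` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def propagate_labels(features, labels, sigma=1.0, alpha=0.9):
    """Spread a few known labels over a similarity graph.

    features: (n, d) array of sample descriptors (hypothetical input).
    labels:   length-n integer array, -1 marks unlabeled samples.
    Returns the predicted class for every sample.
    """
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y[y >= 0])

    # Gaussian similarity between every pair of samples (assumed kernel).
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)

    # Symmetrically normalized affinity matrix S = D^-1/2 W D^-1/2.
    degree = w.sum(axis=1)
    s = w / np.sqrt(np.outer(degree, degree))

    # One-hot matrix of the initial labels (all-zero rows for unlabeled samples).
    y0 = (y[:, None] == classes[None, :]).astype(float)

    # Closed-form fixed point of F <- alpha * S * F + Y0, up to a constant factor.
    f = np.linalg.solve(np.eye(len(x)) - alpha * s, y0)
    return classes[f.argmax(axis=1)]
```

With one labeled sample per cluster, for example `propagate_labels([[0], [0.5], [1], [10], [10.5], [11]], [0, -1, -1, 1, -1, -1])`, the labels spread to the neighbouring unlabeled samples of each cluster.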

Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2016 ) - Video registration in egocentric vision under day and night illumination changes - COMPUTER VISION AND IMAGE UNDERSTANDING - pp. 1-25 ISSN: 1077-3142 [Journal article (262) - Journal article]
Abstract

With the spread of wearable devices and head-mounted cameras, a wide range of applications requiring precise user localization is now possible. In this paper we propose to treat the problem of obtaining the user position with respect to a known environment as a video registration problem. Video registration, i.e. the task of aligning an input video sequence to a pre-built 3D model, relies on a matching process of local keypoints extracted on the query sequence to a 3D point cloud. The overall registration performance is strictly tied to the actual quality of this 2D-3D matching, and can degrade if environmental conditions change sharply, as lighting does between day and night. To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process. To overcome the shortcomings of standard matching techniques, we introduce a novel embedding space that allows us to obtain robust matches by jointly taking into account local descriptors, their spatial arrangement and their temporal robustness. The proposal is evaluated using unconstrained egocentric video sequences both in terms of matching quality and resulting registration performance using different 3D models of historical landmarks. The results show that the proposed method can outperform state-of-the-art registration algorithms, in particular when dealing with the challenges of night and day sequences.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2015 ) - A Deep Siamese Network for Scene Detection in Broadcast Videos ( 23rd ACM International Conference on Multimedia - Brisbane, Australia - Oct. 26-30) ( - Proceedings of the 23rd ACM international conference on Multimedia ) (ACM New York USA ) - pp. 1199-1202 ISBN: 978-1-4503-3459-4 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against recent proposals for automatic scene segmentation. We also propose an improved performance measure that aims to reduce the gap between numerical evaluation and expected results, and propose and release a new benchmark dataset.

Vezzani, Roberto; Lombardi, Martino; Pieracci, Augusto; Santinelli, Paolo; Cucchiara, Rita ( 2015 ) - A General-Purpose Sensing Floor Architecture for Human-Environment Interaction - ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS - vol. 5 - pp. 1-26 ISSN: 2160-6455 [Journal article (262) - Journal article]
Abstract

Smart environments are now designed as natural interfaces to capture and understand human behavior without a need for explicit human-computer interaction. In this paper, we present a general-purpose architecture that acquires and understands human behaviors through a sensing floor. The pressure field generated by moving people is captured and analyzed. Specific actions and events are then detected by a low-level processing engine and sent to high-level interfaces providing different functions. The proposed architecture and sensors are modular, general-purpose, cheap, and suitable for both small- and large-area coverage. Some sample entertainment and virtual reality applications that we developed to test the platform are presented.

Coppi, Dalia; Calderara, Simone; Cucchiara, Rita ( 2015 ) - Active query process for digital video surveillance forensic applications - SIGNAL, IMAGE AND VIDEO PROCESSING - vol. 9 - pp. 749-759 ISSN: 1863-1703 [Journal article (262) - Journal article]
Abstract

Multimedia forensics is an emerging discipline concerned with the analysis and exploitation of digital data to support investigations and extract probative elements. Among these data, visual data about people and their activities, extracted efficiently from videos, are becoming increasingly appealing for forensics, due to the availability of large amounts of video-surveillance footage. Thus, many research studies and prototypes investigate the analysis of soft biometrics data, such as people's appearance and trajectories. In this work, we propose new solutions for querying and retrieving visual data in an interactive and active fashion for soft biometrics in forensics. The proposal combines the capability of transductive learning for semi-supervised search by similarity with a typical multimedia methodology based on user-guided relevance feedback, allowing active interaction with the visual data of people, their appearance and their trajectories in large surveillance areas. The proposed approaches are very general and can be exploited independently of the surveillance setting and the type of video analytic tools.

Vezzani, Roberto; Lombardi, Martino; Cucchiara, Rita ( 2015 ) - Automatic configuration and calibration of modular sensing floors ( 12th IEEE International Conference on Advanced Video and Signal-Based Surveillance - Karlsruhe, Germany - Aug. 28 2015) ( - Proceedings of the 12th IEEE International Conference on Advanced Video and Signal-Based Surveillance ) [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Sensing floors are an emerging solution for many privacy-compliant and large-area surveillance systems. Many research and even commercial technologies have been proposed in recent years. As with distributed camera networks, the problem of calibration is crucial, especially when the floors are installed in wide areas. This paper addresses the general problem of automatic calibration and configuration of modular and scalable sensing floors. Working on training data only, the system automatically finds the spatial placement of each sensor module and estimates the threshold parameters needed for people detection. Tests on several training sequences captured with a commercial sensing floor are provided to validate the method.

Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita ( 2015 ) - Classification of Affective Data to Evaluate the Level Design in a Role-Playing Videogame ( 7th International Conference on Games and Virtual Worlds for Serious Applications, VS-Games 2015 - University of Skövde, Sweden - 2015) ( - VS-Games 2015 - 7th International Conference on Games and Virtual Worlds for Serious Applications ) (Institute of Electrical and Electronics Engineers Inc. ) - pp. 1-8 ISBN: 9781479981021 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

This paper presents a novel approach to evaluate game level design strategies, applied to role-playing games. Following a set of well-defined guidelines, two game levels were designed for Neverwinter Nights 2 to manipulate particular emotions like boredom or flow, and tested by 13 subjects wearing a brain-computer interface helmet. A set of features was extracted from the affective data logs and used to classify different parts of the gaming sessions, to verify the correspondence between the original level aims and the actual effects on people's emotions. The correlations observed suggest that the technique is extensible to other similar evaluation tasks.

Lombardi, Martino; Vezzani, Roberto; Cucchiara, Rita ( 2015 ) - Detection of Human Movements with Pressure Floor Sensors ( International Conference on Image Analysis and Processing - Genoa, Italy - Sep. 7-11, 2015) ( - Proceedings of the 18th International Conference on Image Analysis and Processing, LNCS 9280 ) - vol. 9280 - pp. 620-630 ISSN: 0302-9743 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Following the recent Internet of Everything (IoE) trend, several general-purpose devices have been proposed to acquire as much information as possible from the environment and from people interacting with it. Among these, sensing floors have recently attracted the interest of the research community. In this paper, we propose a new model to store and process floor data. The model does not assume a regular grid distribution of the sensing elements and is based on the ground reaction force (GRF) concept, widely used in biomechanics. It allows the correct detection and tracking of people, outperforming the common background subtraction scheme adopted in the past. Several tests on a real sensing floor prototype are reported and discussed.

Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Egocentric Object Tracking: An Odometry-Based Solution ( International Conference on Image Analysis and Processing - Genoa - September, 2015) ( - International Conference on Image Analysis and Processing - ICIAP 2015 ) [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Tracking objects moving around a person is one of the key steps in human visual augmentation: we could estimate their locations when they are out of our field of view, or know their position, distance or velocity, just to name a few possibilities. This is no easy task: in this paper, we show how current state-of-the-art visual tracking algorithms fail if challenged with a first-person sequence recorded from a wearable camera attached to a moving user. We propose an evaluation that highlights these algorithms' limitations and, accordingly, develop a novel approach based on visual odometry and 3D localization that overcomes many issues typical of egocentric vision. We implement our algorithm on a wearable board and evaluate its robustness, showing in our preliminary experiments an increase in tracking performance of nearly 20% compared to current state-of-the-art techniques.

Varini, Patrizia; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Egocentric video personalization in cultural experiences scenarios ( 18th International Conference on Image Analysis and Processing, ICIAP 2015 - Italy - 2015) ( - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ) (Springer Verlag ) - LECTURE NOTES IN COMPUTER SCIENCE - vol. 9279 - pp. 694-704 ISBN: 9783319232300 ISSN: 1611-3349 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

In this paper we propose a novel approach for egocentric video personalization in a cultural experience scenario, based on the automatic labelling of shots according to different semantic dimensions, such as web-leveraged knowledge of the surrounding cultural points of interest and information about stops and moves, both relying on geolocalization, and the behaviour of the camera wearer. Moreover, we present a video personalization web system based on the multi-dimensional semantic classification of shots, designed to help the visitor browse and retrieve relevant information to obtain a customized video. Experimental results show that the proposed techniques for video analysis achieve good performance in unconstrained scenarios, and user evaluation tests confirm that our solution is useful and effective.

Varini, Patrizia; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Egocentric Video Summarization of Cultural Tour based on User Preferences ( 23rd ACM international conference on Multimedia - Brisbane - 26-30 Oct 2015) ( - MM '15: Proceedings of the 23rd ACM international conference on Multimedia ) [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

In this paper, we propose a new method to obtain customized video summarization according to specific user preferences. Our approach is tailored to cultural heritage scenarios and is designed to identify candidate shots, selecting from the original streams only the scenes with behavior patterns related to the presence of relevant experiences, and to further filter them in order to obtain a summary matching the requested user preferences. Our preliminary results show that the proposed approach is able to leverage the user's preferences in order to obtain a customized summary, so that different users may extract different summaries from the same stream.

Baraldi, Lorenzo; Paci, Francesco; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Gesture Recognition using Wearable Vision Sensors to Enhance Visitors' Museum Experiences - IEEE SENSORS JOURNAL - vol. 15 - pp. 2705-2714 ISSN: 1530-437X [Journal article (262) - Journal article]
Abstract

We introduce a novel approach to cultural heritage experience: by means of ego-vision embedded devices we develop a system which offers a more natural and entertaining way of accessing museum knowledge. Our method is based on distributed self-gesture and artwork recognition, and needs neither fixed cameras nor radio-frequency identification sensors. We propose the use of dense trajectories sampled around the hand region to perform self-gesture recognition, understanding the way a user naturally interacts with an artwork, and demonstrate that our approach can benefit from distributed training. We test our algorithms on publicly available data sets and we extend our experiments to both virtual and real museum scenarios, where our method shows robustness when challenged with real-world data. Furthermore, we run an extensive performance analysis on our ARM-based wearable device.

Serra, Giuseppe; Grana, Costantino; Manfredi, Marco; Cucchiara, Rita ( 2015 ) - GOLD: Gaussians of Local Descriptors for Image Representation - COMPUTER VISION AND IMAGE UNDERSTANDING - vol. 134 - pp. 22-32 ISSN: 1077-3142 [Journal article (262) - Journal article]
Abstract

The Bag of Words paradigm has been the baseline from which several successful image classification solutions were developed in the last decade. These represent images by quantizing local descriptors and summarizing their distribution. The quantization step introduces a dependency on the dataset which, even if it significantly boosts performance in some contexts, severely limits generalization capabilities. In contrast, in this paper we propose to model the local feature distribution with a multivariate Gaussian, without any quantization. The full-rank covariance matrix, which lies on a Riemannian manifold, is projected onto the tangent Euclidean space and concatenated to the mean vector. The resulting representation, a Gaussian of local descriptors (GOLD), allows the dot product to be used to closely approximate a distance between distributions without the need for expensive kernel computations. We describe an image by an improved spatial pyramid, which avoids boundary effects with soft assignment: local descriptors contribute to neighboring Gaussians, forming a weighted spatial pyramid of GOLD descriptors. In addition, we extend the model by leveraging dataset characteristics in a mixture of Gaussians formulation, further improving the classification accuracy. To deal with large-scale datasets and high-dimensional feature spaces, the Stochastic Gradient Descent solver is adopted. Experimental results on several publicly available datasets show that the proposed method obtains state-of-the-art performance.
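A minimal sketch of the core representation described in this abstract (mean vector concatenated with the covariance matrix mapped to a Euclidean tangent space) is given below. Several details are assumptions not stated in the abstract: the tangent projection is taken to be the matrix logarithm, a small `eps` is added to keep the covariance full rank, and off-diagonal entries are scaled by sqrt(2) when vectorizing. The function name `gold_descriptor` is hypothetical, and the spatial pyramid and mixture extensions are not covered.

```python
import numpy as np

def gold_descriptor(local_descriptors, eps=1e-6):
    """Summarize a set of local descriptors (one per row) as mean + log-covariance."""
    x = np.asarray(local_descriptors, dtype=float)
    d = x.shape[1]
    mean = x.mean(axis=0)

    # Sample covariance, regularized so that it stays strictly positive definite.
    cov = np.atleast_2d(np.cov(x, rowvar=False)) + eps * np.eye(d)

    # Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition
    # (assumed form of the projection onto the tangent space).
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T

    # Keep the upper triangle only; scaling off-diagonal terms by sqrt(2) makes the
    # dot product of two such vectors equal the Frobenius inner product.
    rows, cols = np.triu_indices(d)
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.concatenate([mean, weights * log_cov[rows, cols]])
```

For descriptors of dimension d, the resulting vector has d + d(d+1)/2 entries, so two images can be compared with a plain dot product or fed to a linear classifier.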

Mighali, Vincenzo; Del Fiore, Giuseppe; Patrono, Luigi; Mainetti; Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Innovative IoT-aware Services for a Smart Museum ( International Conference on World Wide Web workshop - Florence - 18-22 May 2015) ( - Proceedings of the 24th International Conference on World Wide Web ) [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Smart cities are a trending topic in both the academic literature and the industrial world. The capability to provide users with added-value services through low-power and low-cost smart objects is very attractive in many fields. Among these, art and culture represent very interesting examples, as tourism is one of the main driving engines of modern society. In this paper, we propose an IoT-aware architecture to improve the cultural experience of the user by involving the most important recent innovations in the ICT field. The main components of the proposed architecture are: (i) an indoor localization service based on the Bluetooth Low Energy technology, (ii) a wearable device able to capture and process images related to the user's point of view, (iii) the user's mobile device, used to display customized cultural contents and to share multimedia data in the Cloud, and (iv) a processing center that manages the core of the whole business logic. In particular, it interacts with both wearable and mobile devices, and communicates with the outside world to retrieve contents from the Cloud and to provide services also to external users. The proposal is currently under development and it will be validated in the MUST museum in Lecce.

Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2015 ) - Learning to Divide and Conquer for Online Multi-Target Tracking ( 2015 IEEE International Conference on Computer Vision - Santiago (Chile) - 11-18 December 2015) ( - 2015 IEEE International Conference on Computer Vision ) (Institute of Electrical and Electronics Engineers Danvers (MA) USA ) - vol. 11-18-December-2015 - pp. 4373-4381 ISBN: 978-1-4673-8390-5; 978-1-4673-8391-2 ISSN: 1550-5499 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Online Multiple Target Tracking (MTT) is often addressed within the tracking-by-detection paradigm. Detections are first extracted independently in each frame, and then object trajectories are built by maximizing specifically designed coherence functions. Nevertheless, ambiguities arise in the presence of occlusions or detection errors. In this paper we claim that the ambiguities in tracking could be solved by a selective use of the features, by working with more reliable features if possible and exploiting a deeper representation of the target only if necessary. To this end, we propose an online divide and conquer tracker for static camera scenes, which partitions the assignment problem into local subproblems and solves them by selectively choosing and combining the best features. The complete framework is cast as a structural learning task that unifies these phases and learns tracker parameters from examples. Experiments on two different datasets highlight a significant improvement in tracking performance (MOTA +10%) over the state of the art.

Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2015 ) - Learning to identify leaders in crowd ( 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - Boston USA - 7-12 June 2015) ( - 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) ) (IEEE Piscataway USA ) - pp. 43-48 ISBN: 978-1-4673-6759-2 ISSN: 2160-7508 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Leader identification is a crucial task in social analysis, crowd management and emergency planning. In this paper, we investigate a computational model for the identification of leaders in crowded scenes. We deal with the lack of a formal definition of leadership by learning, in a supervised fashion, a metric space based exclusively on people's spatiotemporal information. Based on Tarde's work on crowd psychology, individuals are modeled as nodes of a directed graph and leaders inherit their relevance from other members' references. We note this is analogous to the way websites are ranked by the PageRank algorithm. During experiments, we observed different feature weights depending on the specific type of crowd, highlighting the impossibility of providing a unique interpretation of leadership. To our knowledge, this is the first attempt to study leader identification as a metric learning problem.
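The PageRank analogy mentioned in this abstract can be sketched as a power iteration over a directed "who refers to whom" graph. This is only an illustration of the analogy, not the authors' learned model: the adjacency matrix is assumed to be given, the damping factor 0.85 is the conventional default, and the function name `pagerank` and its parameters are hypothetical.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank nodes of a directed graph; adjacency[i, j] > 0 means i refers to j."""
    a = np.asarray(adjacency, dtype=float)
    n = a.shape[0]

    # Row-normalize outgoing references; nodes without any spread uniformly.
    out = a.sum(axis=1, keepdims=True)
    transition = np.where(out > 0, a / np.where(out > 0, out, 1.0), 1.0 / n)

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank
```

In a toy crowd where members 1, 2 and 3 all refer to member 0, member 0 receives the highest score and would be the candidate leader under this analogy.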

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2015 ) - Measuring scene detection performance ( 7th Iberian Conference on Pattern Recognition and Image Analysis - Santiago de Compostela; Spain - 17-19 June 2015) ( - Pattern Recognition and Image Analysis ) (Springer Verlag ) - vol. 9117 - pp. 395-403 ISBN: 9783319193892 ISSN: 1611-3349 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

In this paper we evaluate the performance of scene detection techniques, starting from the classic precision/recall approach, moving to the better designed coverage/overflow measures, and finally proposing an improved metric, in order to solve frequently observed cases in which the numeric interpretation is different from the expected results. Numerical evaluation is performed on two recent proposals for automatic scene detection, which are compared with a simple but effective novel approach. Experiments are conducted to show how different measures may lead to different interpretations.

Varini, Patrizia; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Personalized Egocentric Video Summarization for Cultural Experience ( ICMR '15 5th ACM on International Conference on Multimedia Retrieval - Shanghai - 23-26 June 2015) ( - ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval ) [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Recent egocentric video summarization approaches have dealt with motion analysis and social interaction without considering that the user may be interested in preserving only the part of the video related to his or her interests. In this paper we propose a new method for personalized video summarization of cultural experiences, with the goal of extracting from the streams only the scenes corresponding to a user's specific topic requests, chosen among the shots in which it is possible to deduce that the visitor was focusing on a point of interest. Preliminary experiments show that our approach is promising and allows visitors to better customize the summary of their experience.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2015 ) - Scene segmentation using temporal clustering for accessing and re-using broadcast video ( IEEE International Conference on Multimedia and Expo, ICME 2015 - Turin, Italy - 2015) ( - Proceedings - IEEE International Conference on Multimedia and Expo ) (IEEE Computer Society ) - vol. 2015- - pp. 1-6 ISBN: 9781479970827 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Scene detection is a fundamental tool for allowing effective video browsing and re-using. In this paper we present a model that automatically divides videos into coherent scenes, which is based on a novel combination of local image descriptors and temporal clustering techniques. Experiments are performed to demonstrate the effectiveness of our approach, by comparing our algorithm against two recent proposals for automatic scene segmentation. We also propose improved performance measures that aim to reduce the gap between numerical evaluation and expected results.

Baraldi, Lorenzo; Grana, Costantino; Cucchiara, Rita ( 2015 ) - Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video ( 16th International Conference on Computer Analysis of Images and Patterns - Valletta, Malta - Sep. 2-4) ( - Computer Analysis of Images and Patterns ) (Springer Verlag Germany Heidelberg DEU ) - vol. 9256 - pp. 801-811 ISBN: 978-3-319-23191-4; 978-3-319-23192-1 ISSN: 0302-9743 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Video decomposition techniques are fundamental tools for allowing effective video browsing and re-using. In this work, we consider the problem of segmenting broadcast videos into coherent scenes, and propose a scene detection algorithm based on hierarchical clustering, along with a very fast state-of-the-art shot segmentation approach. Experiments are performed to demonstrate the effectiveness of our algorithms, by comparing against recent proposals for automatic shot and scene segmentation.
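As a rough illustration of scene detection by hierarchical clustering, the sketch below greedily merges temporally adjacent groups of shots until a requested number of scenes remains. It is a generic agglomerative scheme under stated assumptions, not the algorithm of the paper: shots are assumed to be already described by feature vectors, the merge criterion (Euclidean distance between group means) and the stopping rule (`n_scenes`) are hypothetical choices, and the function name `merge_adjacent_shots` is invented for this example.

```python
import numpy as np

def merge_adjacent_shots(shot_features, n_scenes):
    """Group consecutive shots into n_scenes scenes by bottom-up merging.

    shot_features: (n_shots, d) array, one descriptor per shot in temporal order.
    Returns a list of scenes, each a list of consecutive shot indices.
    """
    feats = np.asarray(shot_features, dtype=float)
    scenes = [[i] for i in range(len(feats))]  # start with one scene per shot

    while len(scenes) > n_scenes:
        means = [feats[s].mean(axis=0) for s in scenes]
        # Only temporally adjacent scenes may merge, which keeps scenes contiguous.
        gaps = [np.linalg.norm(means[i] - means[i + 1]) for i in range(len(scenes) - 1)]
        i = int(np.argmin(gaps))
        scenes[i:i + 2] = [scenes[i] + scenes[i + 1]]
    return scenes
```

Restricting merges to neighbouring groups is what distinguishes this from ordinary clustering: a scene is always an uninterrupted run of shots.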

Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2015 ) - Towards the evaluation of reproducible robustness in tracking-by-detection ( 12th IEEE International Conference on Advanced Video and Signal-Based Surveillance - Karlsruhe (Germany) - 25-28 August) ( - AVSS 2015 : 12th IEEE International Conference on Advanced Video and Signal-Based Surveillance : August 25-28, 2015, Karlsruhe Institute of Technology & Fraunhofer IOSB, Karlsruhe, Germany ) (IEEE Danvers (MA) USA ) - pp. 1-6 ISBN: 9781467376327 [Conference proceedings contribution (273) - Paper in conference proceedings]
Abstract

Conventional experiments on multi-target tracking (MTT) are built upon the belief that feeding the same fixed detections to different trackers is sufficient to obtain a fair comparison. In this work we argue that the true behavior of a tracker is exposed when it is evaluated by varying the input detections rather than by fixing them. We propose a systematic and reproducible protocol and a MATLAB toolbox for generating synthetic data starting from ground truth detections, a proper set of metrics to understand and compare the trackers' peculiarities, and the respective visualization solutions.

Alletto, Stefano; Serra, Giuseppe; Calderara, Simone; Cucchiara, Rita ( 2015 ) - Understanding social relationships in egocentric vision - PATTERN RECOGNITION - n. volume 48 - pp. da 4082 a 4096 ISSN: 0031-3203 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The understanding of mutual interaction between people is a key component for recognizing people's social behavior, but it strongly relies on a personal point of view that is difficult to model a priori. We propose adopting the unique first-person perspective of head-mounted cameras (ego-vision) to promptly detect people's interactions in different social contexts. The proposal relies on a complete and reliable system that extracts people's head pose by combining landmarks and shape descriptors in a temporally smoothed HMM framework. Finally, interactions are detected through supervised clustering on mutual head orientation and people distances, exploiting a structural learning framework that specifically adjusts the clustering measure according to the particular scenario. Our solution provides the flexibility to capture the interactions regardless of the number of individuals involved and their level of acquaintance, in contexts with a variable degree of social involvement. The proposed system shows competitive performance on both publicly available ego-vision datasets and ad hoc benchmarks built from real-life situations.

Alletto, Stefano; Serra, Giuseppe; Cucchiara, Rita ( 2015 ) - Wearable Vision for Retrieving Architectural Details in Augmented Tourist Experiences ( International Conference on Intelligent Technologies for Interactive Entertainment - Turin, Italy - 2015) ( - Proceedings of International Conference on Intelligent Technologies for Interactive Entertainment ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The interest in cultural cities is in constant growth, and so is the demand for new multimedia tools and applications that enrich the way they are experienced. In this paper we propose an egocentric vision system to enhance tourists' cultural heritage experience. Exploiting a wearable board and a glass-mounted camera, visitors can retrieve architectural details of the historical building they are observing and receive related multimedia content. To obtain an effective retrieval procedure we propose a visual descriptor based on the covariance of local features. Unlike the common Bag of Words approaches, our feature vector does not rely on a generated visual vocabulary, removing the dependence on a specific dataset and reducing the computational cost. 3D modeling is used to achieve a precise localization of the visitor, which allows browsing visible relevant details that the user may otherwise miss. Experimental results conducted on a publicly available cultural heritage dataset show that the proposed feature descriptor outperforms Bag of Words techniques.

M. Manfredi; C. Grana; S. Calderara; R. Cucchiara ( 2014 ) - A complete system for garment segmentation and color classification - MACHINE VISION AND APPLICATIONS - n. volume 25 - pp. da 955 a 969 ISSN: 0932-8092 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper, we propose a general approach for automatic segmentation, color-based retrieval and classification of garments in fashion store databases, exploiting shape and color information. The garment segmentation is automatically initialized by learning geometric constraints and shape cues, then it is performed by modeling both skin and accessory colors with Gaussian Mixture Models. For color similarity retrieval and classification, to adapt the color description to the users’ perception and the company marketing directives, a color histogram with an optimized binning strategy, learned on the given color classes, is introduced and combined with HOG features for garment classification. Experiments validating the proposed strategy, and a free-to-use dataset publicly available for scientific purposes, are finally detailed.

Vezzani, Roberto; Cucchiara, Rita ( 2014 ) - Benchmarking for Person Re-identification ( - Person Re-Identification ) (Springer-Verlag London London GBR ) - pp. da 333 a 349 ISBN: 9781447162957; 9781447162964 | 9781447162964 ISSN: 2191-6586 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

The evaluation of computer vision and pattern recognition systems is usually a burdensome and time-consuming activity. In this chapter all the benchmarks publicly available for re-identification will be reviewed and compared, starting from the ancestors VIPeR and Caviar to the most recent datasets for 3D modeling such as SARC3d (with calibrated cameras) and RGBD-ID (with range sensors). Specific requirements and constraints are highlighted and reported for each of the described collections. In addition, details on the metrics that are mostly used to test and evaluate the re-identification systems are provided.

G. Serra; C. Grana; M. Manfredi; R. Cucchiara ( 2014 ) - Covariance of Covariance Features for Image Classification ( ACM International Conference on Multimedia Retrieval - Glasgow - Apr 1-4) ( - Proceedings of International Conference on Multimedia Retrieval ) (ACM New York, NY USA ) - pp. da 411 a 414 ISBN: 9781450327824 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose a novel image descriptor built by computing the covariance of pixel level features on densely sampled patches and encoding them using their covariance. Appropriate projections to the Euclidean space and feature normalizations are employed in order to provide a strong descriptor usable with linear classifiers. In order to remove border effects, we further enhance the Spatial Pyramid representation with bilinear interpolation. Experimental results conducted on two common datasets for object and texture classification show that the performance of our method is comparable with state-of-the-art techniques, while removing any dataset-specific dependency in the feature encoding step.

Manfredi, Marco; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita ( 2014 ) - Detection of static groups and crowds gathered in open spaces by texture classification ( - Pattern Recognition Letters ) - PATTERN RECOGNITION LETTERS - n. volume 44 - pp. da 39 a 48 ISSN: 0167-8655 [Articolo in rivista (262) - Articolo su rivista]
Abstract

A surveillance system specifically developed to manage crowded scenes is described in this paper. In particular, we focus on static crowds, composed of groups of people who gather and stay in the same place for a while. The detection and spatial localization of static crowd situations is performed by means of a One Class Support Vector Machine, working on texture features extracted at patch level. Spatial regions containing crowds are identified and filtered using motion information to prevent noise and false alarms due to moving flows of people. By means of one class classification and inner texture descriptors, we are able to obtain, from a single training set, a sufficiently general crowd model that can be used for all the scenarios that share a similar viewpoint. Tests on public datasets and real setups validate the proposed system.

Alletto, Stefano; Serra, Giuseppe; Calderara, Simone; Solera, Francesco; Cucchiara, Rita ( 2014 ) - From Ego to Nos-Vision: Detecting Social Relationships in First-Person Views ( Workshop on Egocentric (First-person) Vision - Columbus, Ohio - 23-28 June 2014) ( - 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a novel approach to detect groups in ego-vision scenarios. People in the scene are tracked through the video sequence and their head pose and 3D location are estimated. Based on the concept of f-formation, we define with the orientation and distance an inherently social pairwise feature that describes the affinity of a pair of people in the scene. We apply a correlation clustering algorithm that merges pairs of people into socially related groups. Due to the very shifting nature of social interactions and the different meanings that orientations and distances can assume in different contexts, we learn the weight vector of the correlation clustering using Structural SVMs. We extensively test our approach on two publicly available datasets showing encouraging results when detecting groups from first-person camera views.

Baraldi, Lorenzo; Paci, Francesco; Serra, Giuseppe; Benini, Luca; Cucchiara, Rita ( 2014 ) - Gesture Recognition in Ego-Centric Videos using Dense Trajectories and Hand Segmentation ( IEEE Embedded Vision Workshop - Columbus, Ohio - 23-28 June 2014) ( - Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on ) (IEEE ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

We present a novel method for monocular hand gesture recognition in ego-vision scenarios that deals with static and dynamic gestures and can achieve high accuracy using only a few positive samples. Specifically, we use and extend the dense trajectories approach that has been successfully introduced for action recognition. Dense features are extracted around regions selected by a new hand segmentation technique that integrates superpixel classification, temporal and spatial coherence. We extensively test our gesture recognition and segmentation algorithms on public datasets and propose a new dataset shot with a wearable camera. In addition, we demonstrate that our solution can work in near real-time on a wearable device.

Alletto, Stefano; Serra, Giuseppe; Calderara, Simone; Cucchiara, Rita ( 2014 ) - Head Pose Estimation in First-Person Camera Views ( International Conference on Pattern Recognition - Stockholm, Sweden - 24-28 Aug. 2014) ( - International Conference on Pattern Recognition ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a new method for real-time head pose estimation in ego-vision scenarios, which is a key step in the understanding of social interactions. In order to robustly detect heads under changing aspect ratio, scale and orientation, we use and extend the Hough-Based Tracker, which makes it possible to follow each subject in the scene simultaneously. In an ego-vision scenario where a group interacts in a discussion, each subject's head orientation will be more likely to remain focused for a while on the person who has the floor. In order to encode this behavior we include a stateful Hidden Markov Model technique that enforces the predicted pose with the temporal coherence from a video sequence. We extensively test our approach on several indoor and outdoor ego-vision videos with high illumination variations, showing its validity and outperforming other recent related state-of-the-art approaches.

D. Coppi; C. Grana; R. Cucchiara ( 2014 ) - Illustrations Segmentation in Digitized Documents Using Local Correlation Features ( 10th Italian Research Conference on Digital Libraries - Padova - Jan. 30-31) ( - Proceedings of the 10th Italian Research Conference on Digital Libraries ) (Elsevier Science BV Amsterdam NLD ) - PROCEDIA COMPUTER SCIENCE - n. volume 38 - pp. da 76 a 83 ISSN: 1877-0509 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose an approach for Document Layout Analysis based on local correlation features. We identify and extract illustrations in digitized documents by learning the discriminative patterns of textual and pictorial regions. The proposal has been demonstrated to be effective on historical datasets and to outperform the state-of-the-art in presence of challenging documents with a large variety of pictorial elements.

Pistocchi, S.; Calderara, S.; Barnard, S.; Ferri, N.; Cucchiara, R. ( 2014 ) - Kernelized Structural Classification for 3D Dogs Body Parts Detection ( Pattern Recognition (ICPR), 2014 22nd International Conference on - Stockholm SWE - 24-28 Aug 2014) ( - Pattern Recognition (ICPR), 2014 22nd International Conference on ) (IEEE USA ) - pp. da 1993 a 1998 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Although pattern recognition methods for human behavioral analysis have flourished in the last decade, animal behavioral analysis has been almost neglected. The few existing approaches are mostly focused on preserving the economic value of livestock, while attention to the welfare of companion animals, like dogs, is now emerging as a social need. In this work, following the analogy with human behavior recognition, we propose a system for recognizing body parts of dogs kept in pens. We adopt both 2D and 3D features in order to obtain a rich description of the dog model. Images are acquired using the Microsoft Kinect to capture the depth map images of the dog. On the depth maps, a Structural Support Vector Machine (SSVM) is employed to identify the body parts using both 3D features and 2D images. The proposal relies on a kernelized discriminative structural classifier specifically tailored for dogs, independently of size and breed. The classification is performed in an online fashion using the LaRank optimization technique to obtain real-time performance. Promising results have emerged during the experimental evaluation carried out at a dog shelter, managed by IZSAM, in Teramo, Italy.

Grana, Costantino; Serra, Giuseppe; Manfredi, Marco; Coppi, Dalia; Cucchiara, Rita ( 2014 ) - Layout analysis and content enrichment of digitized books - MULTIMEDIA TOOLS AND APPLICATIONS - pp. da 1 a 22 ISSN: 1380-7501 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper we describe a system for automatically analyzing old documents and creating hyperlinks between different epochs, thus opening ancient documents to young people and making them available on the web with old and current content. We propose a supervised learning approach to segment text and illustrations of digitized old documents using a texture feature based on local correlation, aimed at detecting the repeating patterns of text regions and differentiating them from pictorial elements. Moreover, we present a solution to help the user find contemporary content connected to what is automatically extracted from the ancient documents.

M. Manfredi; C. Grana; R. Cucchiara ( 2014 ) - Learning Graph Cut Energy Functions for Image Segmentation ( 22nd International Conference on Pattern Recognition - Stockholm, Sweden - Aug. 24-28) ( - Proceedings of the 22nd International Conference on Pattern Recognition ) (IEEE - Institute of Electrical and Electronics Engineers Piscataway, NJ USA ) - pp. da 960 a 965 ISBN: 978-1-4799-5208-3 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we address the task of learning how to segment a particular class of objects, by means of a training set of images and their segmentations. In particular we propose a method to overcome the extremely high training time of a previously proposed solution to this problem, Kernelized Structural Support Vector Machines. We employ a one-class SVM working with joint kernels to robustly learn significant support vectors (representative image-mask pairs) and accordingly weight them to build a suitable energy function for the graph cut framework. We report results obtained on two public datasets and a comparison of training times on different training set sizes.

M. Manfredi; C. Grana; R. Cucchiara ( 2014 ) - Learning Superpixel Relations for Supervised Image Segmentation ( 21st International Conference on Image Processing - Paris, France - Oct. 27-30) ( - Proceedings of the 21st International Conference on Image Processing ) (IEEE - Institute of Electrical and Electronics Engineers Piscataway, NJ USA ) - pp. da 4437 a 4441 ISBN: 978-1-4799-5750-7 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose to extend the well known graph cut segmentation framework by learning superpixel relations and use them to weight superpixel-to-superpixel edges in a superpixel graph. Adjacent superpixel-pairs are analyzed to build an object boundary model, able to discriminate between superpixel-pairs belonging to the same object or placed on the edge between the foreground object and the background. Several superpixel-pair features are investigated and exploited to build a non-linear SVM to learn object boundary appearance. The adoption of this modified graph cut enhances the performance of a previously proposed segmentation method on two publicly available datasets, reaching state-of-the-art results.

Baltieri, Davide; Vezzani, Roberto; Cucchiara, Rita ( 2014 ) - Mapping Appearance Descriptors on 3D Body Models for People Re-identification ( - International Journal of Computer Vision ) - INTERNATIONAL JOURNAL OF COMPUTER VISION - n. volume 111 - pp. da 345 a 364 ISSN: 0920-5691 [Articolo in rivista (262) - Articolo su rivista]
Abstract

People Re-identification aims at associating multiple instances of a person’s appearance acquired from different points of view, different cameras, or after a spatial or a limited temporal gap to the same identifier. The basic hypothesis is that the person’s appearance is mostly constant. Many appearance descriptors have been adopted in the past, but they are often subject to severe perspective and view-point issues. In this paper, we propose a complete re-identification framework which exploits non-articulated 3D body models to spatially map appearance descriptors (color and gradient histograms) into the vertices of a regularly sampled 3D body surface. The matching and the shot integration steps are directly handled in the 3D body model, reducing the effects of occlusions, partial views or pose changes, which normally afflict 2D descriptors. A fast and effective model-to-image alignment is also proposed. It allows operation on common surveillance cameras or image collections. A comprehensive experimental evaluation is presented using the benchmark suite 3DPeS.

D. Borghesani; C. Grana; R. Cucchiara ( 2014 ) - Miniature illustrations retrieval and innovative interaction for digital illuminated manuscripts - MULTIMEDIA SYSTEMS - n. volume 20 - pp. da 65 a 79 ISSN: 0942-4962 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper we propose a multimedia solution for the interactive exploration of illuminated manuscripts. We leverage the joint exploitation of content-based image retrieval and relevance feedback to provide an effective mechanism to navigate through the manuscript and add custom knowledge in the form of tags. The similarity retrieval between miniature illustrations is based on covariance descriptors, integrating color, spatial and gradient information. The proposed relevance feedback technique, namely Query Remapping Feature Space Warping, accounts for the user’s opinions by accordingly warping the data points. This is obtained by means of a remapping strategy (from the Riemannian space where covariance matrices lie, referring back to Euclidean space) useful to boost the retrieval performance. Experiments are reported to show the quality of the proposal. Moreover, the complete prototype with user interaction, as already showcased at museums and exhibitions, is presented.

Lucchese, Claudio; Cucchiara, Rita; Lombardi, Martino; Pieracci, Augusto; Santinelli, Paolo; Vezzani, Roberto ( 2014 ) - Substrate for a sensitive floor and method for displaying loads on the substrate [Brevetto (285) - Brevetto]
Abstract

The substrate (1; 50) for making a sensitive floor comprises: a first frame made of high-conductivity sensing means (2a-2d) having a first orientation; a second frame made of high-conductivity sensing means (3a-3d) which is adapted to be laid on said first frame and has a second orientation, other than said first orientation, said second frame (3a-3d) forming a support layer for floor finishing products; an element (4) made of a conductive material, which comprises: an elastically compressible thickness (S1), two opposite faces (104, 204) contacting said two first and second frames (2a-2d), (3a-3d), an electric resistor whose resistance is proportional to said thickness (S1).

A. W. M. Smeulders; D. M. Chu; R. Cucchiara; S. Calderara; A. Dehghan; M. Shah ( 2014 ) - Visual Tracking: An Experimental Survey - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - n. volume 36 - pp. da 1442 a 1468 ISSN: 0162-8828 [Articolo in rivista (262) - Articolo su rivista]
Abstract

There is a large variety of trackers, which have been proposed in the literature during the last two decades with some mixed success. Object tracking in realistic scenarios is a difficult problem, therefore it remains a most active area of research in Computer Vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan Meier statistics, and Grubbs testing. We find that in the evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
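To make the F-score mentioned in this abstract concrete, the following minimal sketch computes it for a single-target sequence from per-frame bounding boxes. The counting convention is an assumption for illustration and may differ from the survey's exact definition: a frame is a true positive when the predicted box overlaps the ground truth with intersection-over-union of at least 0.5, and the names `iou` and `tracking_f_score` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tracking_f_score(predicted, ground_truth, threshold=0.5):
    """F-score over a sequence; None means 'no box' in that frame.

    Assumed convention: a sufficiently overlapping prediction is a true
    positive; any other prediction is a false positive; any frame whose
    ground-truth box is not matched is a false negative.
    """
    tp = fp = fn = 0
    for pred, truth in zip(predicted, ground_truth):
        if pred is not None and truth is not None and iou(pred, truth) >= threshold:
            tp += 1
        else:
            if pred is not None:
                fp += 1
            if truth is not None:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


truth = [(0, 0, 10, 10)] * 4
pred = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30), None]
print(tracking_f_score(pred, truth))  # 2 TP, 1 FP, 2 FN
```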

Camurri, Marco; Vezzani, Roberto; Cucchiara, Rita ( 2014 ) - 3D Hough transform for sphere recognition on point clouds - MACHINE VISION AND APPLICATIONS - n. volume 25 - pp. da 1877 a 1891 ISSN: 0932-8092 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Three-dimensional object recognition on range data and 3D point clouds is becoming more important nowadays. Since many real objects have a shape that could be approximated by simple primitives, robust pattern recognition can be used to search for primitive models. For example, the Hough transform is a well-known technique which is largely adopted in 2D image space. In this paper, we systematically analyze different probabilistic/randomized Hough transform algorithms for spherical object detection in dense point clouds. In particular, we study and compare four variants which are characterized by the number of points drawn together for surface computation into the parametric space and we formally discuss their models. We also propose a new method that combines the advantages of both single-point and multi-point approaches for a faster and more accurate detection. The methods are tested on synthetic and real datasets.
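The multi-point idea can be illustrated with a small randomized Hough sketch in which four points are drawn together, the unique sphere through them is computed, and a vote is cast in a quantized (center, radius) parameter space. This is only an assumed, simplified variant written for illustration, not the authors' implementation; the function names, the accumulator cell size and the number of trials are hypothetical.

```python
import math
import random
from collections import Counter


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def sphere_from_four_points(p0, p1, p2, p3, eps=1e-9):
    """Return ((cx, cy, cz), r) or None if the points are (nearly) coplanar.

    Subtracting |p0 - c|^2 = r^2 from |pi - c|^2 = r^2 gives the linear
    system 2 (pi - p0) . c = |pi|^2 - |p0|^2, solved here by Cramer's rule.
    """
    others = (p1, p2, p3)
    rows = [[2 * (p[i] - p0[i]) for i in range(3)] for p in others]
    rhs = [sum(c * c for c in p) - sum(c * c for c in p0) for p in others]
    d = det3(rows)
    if abs(d) < eps:
        return None
    center = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)
    return tuple(center), math.dist(center, p0)


def randomized_hough_sphere(points, trials=500, cell=0.1, seed=0):
    """Vote for quantized sphere parameters using random 4-point samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(trials):
        fit = sphere_from_four_points(*rng.sample(points, 4))
        if fit is None:
            continue  # degenerate sample, no vote
        (cx, cy, cz), r = fit
        votes[tuple(round(v / cell) for v in (cx, cy, cz, r))] += 1
    if not votes:
        return None
    best, _ = votes.most_common(1)[0]
    return tuple(k * cell for k in best)  # (cx, cy, cz, r) of the winning cell


pts = [(3, 2, 3), (-1, 2, 3), (1, 4, 3), (1, 0, 3), (1, 2, 5), (1, 2, 1)]
print(randomized_hough_sphere(pts))
```

Drawing more points per sample makes each vote more specific but also more sensitive to outliers in the sample, which is the kind of trade-off a comparison between single-point and multi-point variants has to weigh.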

C. Grana; D. Borghesani; M. Manfredi; R. Cucchiara ( 2013 ) - A Fast Approach for Integrating ORB Descriptors in the Bag of Words Model ( IS&T/SPIE Electronic Imaging - Burlingame, California, USA - Feb 4-6) ( - Multimedia Content and Mobile Devices ) (SPIE - Society of Photo-Optical Instrumentation Bellingham, Washington USA ) - n. volume 8667 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose to integrate the recently introduced ORB descriptors in the currently favored approach for image classification, that is, the Bag of Words model. In particular, the problem to be solved is to provide a clustering method able to deal with the binary string nature of the ORB descriptors. We suggest using a k-means-like approach, called k-majority, substituting Euclidean distance with Hamming distance and using the majority-selected vector as the new cluster center. Results combining this new approach with other features are provided over the ImageCLEF 2011 dataset.
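The k-majority idea can be sketched in a few lines: descriptors are assigned to the nearest center by Hamming distance, and each center is then rebuilt bit by bit by majority vote among its members. The code below is an illustrative sketch rather than the authors' implementation; descriptors are stored as Python integers, and the function names, the tie-breaking rule (ties produce a 0 bit) and the optional `init_centers` parameter are assumptions.

```python
import random


def hamming(a, b):
    """Number of differing bits between two binary descriptors (ints)."""
    return bin(a ^ b).count("1")


def majority_center(members, n_bits):
    """Bitwise majority vote: a bit is 1 if more than half the members set it."""
    center = 0
    for bit in range(n_bits):
        ones = sum((m >> bit) & 1 for m in members)
        if 2 * ones > len(members):
            center |= 1 << bit
    return center


def k_majority(descriptors, k, n_bits, init_centers=None, iters=20, seed=0):
    """k-means-like clustering with Hamming distance and majority centers."""
    if init_centers is None:
        init_centers = random.Random(seed).sample(descriptors, k)
    centers = list(init_centers)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center in Hamming distance.
        labels = [min(range(k), key=lambda c: hamming(d, centers[c]))
                  for d in descriptors]
        # Update step: majority vote per cluster (empty clusters keep their center).
        new_centers = []
        for c in range(k):
            members = [d for d, lab in zip(descriptors, labels) if lab == c]
            new_centers.append(majority_center(members, n_bits) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


data = [0b00000000, 0b00000001, 0b00000010, 0b11111111, 0b11111110, 0b11111101]
centers, labels = k_majority(data, k=2, n_bits=8, init_centers=[0b00000001, 0b11111110])
print([format(c, "08b") for c in centers], labels)
```

Like k-means, the result depends on the initial centers, so in practice several restarts or a more careful initialization may be needed.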

PICCININI P.; GAMBERINI R.; PRATI A.; RIMINI B.; CUCCHIARA R. ( 2013 ) - AN AUTOMATED PICKING WORKSTATION FOR HEALTHCARE APPLICATIONS - COMPUTERS & INDUSTRIAL ENGINEERING - n. volume 64 - pp. da 653 a 668 ISSN: 0360-8352 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The costs associated with the management of healthcare systems have been subject to continuous scrutiny for some time now, with a view to reducing them without affecting the quality as perceived by final users. A number of different solutions have arisen based on centralisation of healthcare services and investments in Information Technology (IT). One such example is centralised management of pharmaceuticals among a group of hospitals which is then incorporated into the different steps of the automation supply chain. This paper focuses on a new picking workstation available for insertion in automated pharmaceutical distribution centres and which is capable of replacing manual workstations and bringing about improvements in working time. The workstation described uses a sophisticated computer vision algorithm to allow picking of very diverse and complex objects randomly available on a belt or in bins. The algorithm exploits state-of-the-art feature descriptors for an approach that is robust against occlusions and distracting objects, and invariant to scale, rotation or illumination changes. Finally, the performance of the designed picking workstation is tested in a large experimentation focused on the management of pharmaceutical items.

M. Manfredi; C. Grana; R. Cucchiara ( 2013 ) - Automatic Single-Image People Segmentation and Removal for Cultural Heritage Imaging ( 2nd International Workshop on Multimedia for Cultural Heritage - Napoli - Sep 9) ( - New Trends in Image Analysis and Processing – ICIAP 2013 ) (Springer-Verlag Berlin Heidelberg DEU ) - n. volume LNCS 8158 - pp. da 188 a 197 ISBN: 9783642411892; 9783642411908 | 9783642411908 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, the problem of automatic people removal from digital photographs is addressed. Removing unintended people from a scene can be very useful to focus further steps of image analysis only on the object of interest. A supervised segmentation algorithm is presented and tested in several scenarios.

C. Grana; G. Serra; M. Manfredi; R. Cucchiara ( 2013 ) - Beyond Bag of Words for Concept Detection and Search of Cultural Heritage Archives ( 6th International Conference on Similarity Search and Applications (SISAP 2013) - A Coruña, Spain - Oct 2-4) ( - SISAP 2013 ) (Springer-Verlag Berlin Heidelberg DEU ) - n. volume LNCS 8199 - pp. da 233 a 244 ISBN: 978-3-642-41061-1; 9783642410628 | 9783642410628 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Several local features have become quite popular for concept detection and search, due to their ability to capture distinctive details. Typically a Bag of Words approach is followed, where a codebook is built by quantizing the local features. In this paper, we propose to represent SIFT local features extracted from an image as a multivariate Gaussian distribution, obtaining a mean vector and a covariance matrix. Unlike common techniques based on the Bag of Words model, our solution does not rely on the construction of a visual vocabulary, thus removing the dependence of the image descriptors on the specific dataset and allowing the features to be immediately retargeted to different classification and search problems. Experimental results are reported on two very different Cultural Heritage image archives, composed of illuminated manuscript miniatures and pictures of architectural elements collected from the web, on which the proposed approach outperforms the Bag of Words technique both in classification and retrieval.
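As a simplified illustration of describing an image by a Gaussian fitted to its local features, the sketch below estimates the sample mean and covariance of a set of feature vectors and concatenates the mean with the upper triangle of the covariance matrix. It is an assumed toy version, not the authors' descriptor: any further mapping of the covariance matrix before classification is left out, and the name `gaussian_descriptor` is hypothetical.

```python
def gaussian_descriptor(features):
    """Mean vector followed by the upper triangle of the sample covariance.

    features: list of equal-length local feature vectors (at least two).
    No codebook or visual vocabulary is involved, so nothing is learned
    from a specific dataset.
    """
    n = len(features)
    d = len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
            for j in range(d)]
           for i in range(d)]
    # The covariance matrix is symmetric, so the upper triangle is enough.
    upper = [cov[i][j] for i in range(d) for j in range(i, d)]
    return mean + upper


local_features = [[1, 2], [3, 6], [5, 10]]
print(gaussian_descriptor(local_features))  # [3.0, 6.0, 4.0, 8.0, 16.0]
```

For d-dimensional local features the resulting vector has d + d(d+1)/2 entries, regardless of how many local features the image produced.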

Giuseppe Serra; Marco Camurri; Lorenzo Baraldi; Michela Benedetti; Rita Cucchiara ( 2013 ) - Hand Segmentation for Gesture Recognition in EGO-Vision ( ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices - Barcelona, Spain - 21 October 2013) ( - Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices ) (ACM New York USA ) - pp. da 31 a 36 ISBN: 978-1-4503-2399-4 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Portable devices for first-person camera views will play a central role in future interactive systems. One necessary step for feasible human-computer guided activities is gesture recognition, preceded by a reliable hand segmentation from egocentric vision. In this work we provide a novel hand segmentation algorithm based on Random Forest superpixel classification that integrates light, time and space consistency. We also propose a gesture recognition method based on Exemplar SVMs, since it requires only a small set of positive samples and is therefore well suited to egocentric video applications. Furthermore, this method is enhanced by using segmented images instead of full frames during the test phase. Experimental results show that our hand segmentation algorithm outperforms the state-of-the-art approaches and improves the gesture recognition accuracy on both the publicly available EDSH dataset and our dataset designed for cultural heritage applications.

Lombardi, Martino; Pieracci, Augusto; Santinelli, Paolo; Vezzani, Roberto; Cucchiara, Rita ( 2013 ) - Human Behavior Understanding with Wide Area Sensing Floors ( 4th International Workshop on Human Behavior Understanding, HBU 2013 - Barcelona, Spain - 22 October 2013) ( - Lecture Notes in Computer Science: Human Behavior Understanding ) (Springer International Publishing Cham (ZG) CHE ) - n. volume 8212 - pp. da 112 a 123 ISBN: 9783319027135; 9783319027142 | 9783319027142 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The research on innovative and natural interfaces aims at developing devices able to capture and understand the human behavior without the need of a direct interaction. In this paper we propose and describe a framework based on a sensing floor device. The pressure field generated by people or objects standing on the floor is captured and analyzed. Local and global features are computed by a low level processing unit and sent to high level interfaces. The framework can be used in different applications, such as entertainment, education or surveillance. A detailed description of the sensing element and the processing architectures is provided, together with some sample applications developed to test the device capabilities.

C. Grana; G. Serra; M. Manfredi; R. Cucchiara ( 2013 ) - Image Classification with Multivariate Gaussian Descriptors ( 17th International Conference on Image Analysis and Processing (ICIAP 2013) - Napoli - Sep 11-13) ( - 17th International Conference on Image Analysis and Processing (ICIAP 2013) ) (Springer-Verlag Berlin Heidelberg DEU ) - n. volume LNCS 8157 - pp. da 111 a 120 ISBN: 978-3-642-41183-0; 978-3-642-41184-7; 978364241183 | 978-3-642-41184-7 | 9783642411830 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Techniques based on the Bag of Words approach represent images by quantizing local descriptors and summarizing their distribution in a histogram. Differently, in this paper we describe an image as a multivariate Gaussian distribution, estimated over the extracted local descriptors. The estimated distribution is mapped to a high-dimensional descriptor, by concatenating the mean vector and the projection of the covariance matrix on the Euclidean space tangent to the Riemannian manifold. To deal with large scale datasets and high dimensional feature spaces the Stochastic Gradient Descent solver is adopted. The experimental results on Caltech-101 and ImageCLEF2011 show that the method obtains competitive performance with state-of-the-art approaches.

Baltieri, Davide; Vezzani, Roberto; Cucchiara, Rita ( 2013 ) - Learning articulated body models for people re-identification ( 21st ACM international conference on Multimedia - MM '13 - Barcelona - October 21-25, 2013) ( - Proceedings of the 21st ACM international conference on Multimedia - MM '13 ) (ACM New York, NY, USA USA ) - pp. da 557 a 560 ISBN: 9781450324045 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

People re-identification is a challenging problem in surveillance and forensics and it aims at associating multiple instances of the same person which have been acquired from different points of view and after a temporal gap. Image-based appearance features are usually adopted but, in addition to their intrinsically low discriminability, they are subject to perspective and view-point issues. We propose to completely change the approach by mapping local descriptors extracted from RGB-D sensors on a 3D body model for creating a view-independent signature. An original bone-wise color descriptor is generated and reduced with PCA to compute the person signature. The virtual bone set used to map appearance features is learned using a recursive splitting approach. Finally, people matching for re-identification is performed using the Relaxed Pairwise Metric Learning, which simultaneously provides feature reduction and weighting. Experiments on a specific dataset created with the Microsoft Kinect sensor and the OpenNi libraries prove the advantages of the proposed technique with respect to state of the art methods based on 2D or non-articulated 3D body models.

M. Fornaciari; A. Prati; C. Grana; R. Cucchiara ( 2013 ) - Lightweight Sign Recognition for Mobile Devices ( Seventh ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2013) - Palm Spring, CA - Oct. 29 - Nov. 1) ( - Proceedings Of the Seventh ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2013) ) (IEEE - Institute of Electrical and Electronics Engineers Piscataway, NJ USA ) - pp. da 124 a 129 ISBN: 9781479921645 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The diffusion of powerful mobile devices has laid the basis for new applications that run sophisticated computer vision and pattern recognition algorithms directly on the devices, which are embedded platforms. This paper describes the implementation of a complete system for the automatic recognition of places localized on a map through the recognition of significant signs by means of the camera of a mobile device (smartphone, tablet, etc.). The paper proposes a novel classification algorithm based on the innovative use of bag-of-words on ORB features. The recognition is achieved using a simple yet effective search scheme which exploits GPS localization to limit the possible matches. This simple solution brings several advantages, such as speed even on limited-resource devices, usability even with limited training samples, and ease of adaptation to new training samples and classes. The overall system is based on a REST-JSON client-server architecture. The experiments have been conducted in a real scenario, evaluating the different parameters which influence the performance.

G. Serra; C. Grana; M. Manfredi; R. Cucchiara ( 2013 ) - Modeling Local Descriptors with Multivariate Gaussians for Object and Scene Recognition ( 21st International Conference on Multimedia (ACM Multimedia 2013) - Barcelona, Catalunya, Spain - Oct 21-25) ( - Proceedings of the 21st International Conference on Multimedia (ACM Multimedia 2013) ) (ACM New York USA ) - pp. da 709 a 712 ISBN: 978-1-4503-2404-5 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Common techniques represent images by quantizing local descriptors and summarizing their distribution in a histogram. In this paper we propose to employ a parametric description and compare its capabilities to histogram-based approaches. We use the multivariate Gaussian distribution, applied over SIFT descriptors extracted with dense sampling on a spatial pyramid. Every distribution is converted to a high-dimensional descriptor by concatenating the mean vector and the projection of the covariance matrix on the Euclidean space tangent to the Riemannian manifold. Experiments on Caltech-101 and ImageCLEF2011 are performed using the Stochastic Gradient Descent solver, which makes it possible to deal with large-scale datasets and high-dimensional feature spaces.

Vezzani, Roberto; Baltieri, Davide; Cucchiara, Rita ( 2013 ) - People reidentification in surveillance and forensics: a Survey - ACM COMPUTING SURVEYS - n. volume 46 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The field of surveillance and forensics research is currently shifting focus and is now showing an ever increasing interest in the task of people reidentification. This is the task of assigning the same identifier to all instances of a particular individual captured in a series of images or videos, even after the occurrence of significant gaps over time or space. People reidentification can be a useful tool for people analysis in security as a data association method for long-term tracking in surveillance. However, current identification techniques being utilized present many difficulties and shortcomings. For instance, they rely solely on the exploitation of visual cues such as color, texture, and the object's shape. Despite the many advances in this field, reidentification is still an open problem. This survey aims to tackle all the issues and challenging aspects of people reidentification while simultaneously describing the previously proposed solutions for the encountered problems. This begins with the first attempts of holistic descriptors and progresses to the more recently adopted 2D and 3D model-based approaches. The survey also includes an exhaustive treatise of all the aspects of people reidentification, including available datasets, evaluation metrics, and benchmarking.

Martino Lombardi;Augusto Pieracci;Paolo Santinelli;Roberto Vezzani;Rita Cucchiara ( 2013 ) - Sensing floors for privacy-compliant surveillance of wide areas ( 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2013 - Krakow, Poland - 27-30 Aug. 2013) ( - Advanced Video and Signal Based Surveillance (AVSS), 2013 10th IEEE International Conference on ) (IEEE - Institute of Electrical and Electronics Engineering - USA ) - n. volume 1 - pp. da 105 a 110 ISBN: 9781479907038 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Surveillance systems can really benefit from the integration of multiple and heterogeneous sensors. In this paper we describe an innovative sensing floor. Thanks to its low cost and ease of installation, the floor is suitable for both private and public environments, from narrow zones to wide areas. The floor is made by adding a sensing layer below commercial floating tiles. The sensor is scalable, reliable, and completely invisible to the users. The temporal and spatial resolutions of the data are high enough to identify the presence of people, to recognize their behavior and to detect events in a privacy-compliant way. Experimental results on a real prototype implementation confirm the potential of the framework.

Solera, Francesco; Calderara, Simone; Cucchiara, Rita ( 2013 ) - Structured learning for detection of social groups in crowd ( 10th IEEE International Conference on Advanced Video and Signal-Based Surveillance: AVSS 2013 - Kraków (PL) - August 27-30 2013) ( - 2013 10th IEEE International Conference on Advanced Video and Signal-Based Surveillance : AVSS 2013 : August 27-30, 2013, Kraków, Poland ) (IEEE Piscataway (NJ) USA ) - n. volume 0 - pp. da 7 a 12 ISBN: 9781479907038 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Group detection in crowds will play a key role in future behavior analysis surveillance systems. In this work we build a new Structural SVM-based learning framework able to solve the group detection task by exploiting annotated video data to deduce a sociologically motivated distance measure founded on Hall's proxemics and Granger's causality. We improve over state-of-the-art results even in the most crowded test scenarios, while keeping the classification time affordable for quasi-real time applications. A new scoring scheme specifically designed for the group detection task is also proposed.

C. Grana; G. Serra; M. Manfredi; R. Cucchiara; R. Martoglia; F. Mandreoli ( 2013 ) - UNIMORE at ImageCLEF 2013: Scalable Concept Image Annotation ( CLEF 2013 Labs - Valencia, Spain - Sep 23-26) ( - CLEF 2013 Working Notes ) (- Valencia ESP ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose a large-scale image annotation system for the Scalable Concept Image Annotation task. For each concept to be detected, a separate classifier is built using the provided textual annotation. Images are represented as a Multivariate Gaussian distribution of a set of local features extracted over a dense regular grid. Textual analysis of the web pages containing training images is performed to retrieve a relevant set of samples for learning each concept classifier. An online SVM solver based on Stochastic Gradient Descent is used to manage the large amount of training data. Experimental results show that the combination of different kinds of local features encoded with our strategy achieves very competitive performance both in terms of mAP and mean F-measure.

Vezzani, Roberto; Cucchiara, Rita ( 2013 ) - Video surveillance online repository (ViSOR) ( 4th ACM Multimedia Systems Conference on - MMSys '13 - Oslo - Norvegia - Feb. 27th, 2013) ( - Proceedings of the 4th ACM Multimedia Systems Conference on - MMSys '13 ) (ACM New York USA ) - pp. da 90 a 95 ISBN: 9781450318945 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes ViSOR (Video Surveillance Online Repository), designed with the aim of establishing an open platform for collecting, annotating, retrieving, and sharing surveillance videos, as well as evaluating the performance of automatic surveillance systems. The repository is free and researchers can collaborate by sharing their own videos or datasets. Most of the included videos are annotated. Annotations are based on a reference ontology which has been defined by integrating hundreds of concepts, some of them coming from the LSCOM and MediaMill ontologies. A new annotation classification schema is also provided, which is aimed at identifying the spatial, temporal and domain detail level used. The web interface allows video browsing, querying by annotated concepts or by keywords, compressed video previewing, media downloading and uploading. Finally, ViSOR includes a performance evaluation desk which can be used to compare different annotations.

C. Grana; D. Borghesani; R. Cucchiara ( 2012 ) - Class-based color bag of words for fashion retrieval ( 2012 IEEE International Conference on Multimedia and Expo - Melbourne, Australia - Jul 9-13) ( - Proceedings of the 2012 IEEE International Conference on Multimedia and Expo ) (IEEE / Institute of Electrical and Electronics Engineers Incorporated Piscataway, NJ USA ) - pp. da 444 a 449 ISBN: 9780769547114 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Color signatures, histograms and bag of colors are basic and effective strategies for describing the color content of images, for retrieving images by their color appearance or providing color annotation. In some domains, colors assume a specific meaning for users and the color-based classification and retrieval should mirror the initial suggestions given by users in the training set. For instance, in the fashion world, the names given to the dominant color of a garment or a dress reflect the dictates of fashion and not a uniform division of the color space. In this paper we propose a general approach to implement a color signature as a trained bag of words, defined on the basis of user-defined color classes. The novel Class-based Color Bag of Words is an easily computable bag of words of color, constructed following an approach similar to the Median Cut algorithm, but biased by the color distribution in the trained classes. Moreover, to dramatically reduce the computational effort we propose 3D integral histograms, a 3D extension of integral images, easily extensible to many histogram-based signatures in 3D color space. Several comparisons on large fashion datasets confirm the discriminant power of this signature.
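
One plausible reading of the "3D integral histogram" mentioned above is a cumulative sum of the color histogram along the three color axes, so that the number of pixels falling in any axis-aligned box of color space can be read with eight lookups. The sketch below follows that reading; the function names, the 8-bins-per-channel quantization and the half-open box convention are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def integral_histogram_3d(pixels, bins=8):
    """Build a zero-padded cumulative 3D color histogram from (n, 3) uint8 pixels."""
    idx = (np.asarray(pixels, dtype=np.int64) * bins) // 256
    hist = np.zeros((bins, bins, bins), dtype=np.int64)
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    integral = hist.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    # Pad with a leading zero plane on each axis to simplify box queries.
    return np.pad(integral, ((1, 0), (1, 0), (1, 0)))

def box_count(ii, lo, hi):
    """Count pixels whose bin indices lie in the half-open box [lo, hi)."""
    (r0, g0, b0), (r1, g1, b1) = lo, hi
    # Inclusion-exclusion over the eight corners of the box.
    return (ii[r1, g1, b1] - ii[r0, g1, b1] - ii[r1, g0, b1] - ii[r1, g1, b0]
            + ii[r0, g0, b1] + ii[r0, g1, b0] + ii[r1, g0, b0] - ii[r0, g0, b0])
```

Once the cumulative array is built, each box query costs a constant number of operations regardless of the box size, which is what makes repeated median-cut style splits cheap.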

Calderara, Simone; Prati, Andrea; Cucchiara, Rita ( 2012 ) - Integrate tool for online analysis and offline mining of people trajectories - IET COMPUTER VISION - n. volume 6 - pp. da 334 a 347 ISSN: 1751-9632 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In the past literature, online alarm-based video surveillance and offline forensic-based data mining systems are often treated separately, even by different scientific communities. However, the founding techniques are almost the same and, despite some examples in commercial systems, the cases in which an integrated approach is followed are limited. For this reason, this study describes an integrated tool capable of putting together these two subsystems in an effective way. Despite its generality, the proposal is here reported for the case of people trajectory analysis, both in real time and offline. Trajectories are modelled based on either their spatial location or their shape, and proper similarity measures are proposed. Special solutions to meet real-time requirements in both cases are also presented, and the trade-off between efficiency and efficacy is analysed by comparing results obtained with and without a statistical model. Examples of results on large datasets acquired on the University campus are reported as a preliminary evaluation of the system.

R. Cucchiara; A. Prati; R. Vezzani ( 2012 ) - Intelligent Video Surveillance ( - Critical Infrastructure Security: Assessment, Prevention, Detection, Response ) (WIT Press Southampton GBR ) - pp. da 177 a 189 ISBN: 9781845645625 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

Safety and security reasons are pushing the growth of surveillance systems, for both prevention and forensic tasks. Unfortunately, most of the installed systems have recording capability only, with quality so poor that it makes them completely unhelpful. This chapter introduces the concepts of modern systems for Intelligent Video Surveillance (IVS); it claims to provide neither a complete treatment nor a technical description of the topic, but rather a simple and concise panorama of the motivations, components, and trends of these systems. Unlike CCTV systems, IVS should be able, for instance, to monitor people in public areas and smart homes, to control urban traffic, and to perform identity assessment for the security and safety of critical infrastructure.

C. Grana; S. Calderara; D. Borghesani; R. Cucchiara ( 2012 ) - Learning Non-Target Items for Interesting Clothes Segmentation in Fashion Images ( 21st International Conference on Pattern Recognition (ICPR 2012) - Tsukuba Science City, Japan - Nov 11-15) ( - Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012) ) (IEEE Computer Society Press Los Alamitos, CA USA ) - pp. da 3317 a 3320 ISBN: 9784990644116 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose a color-based approach for skin detection and interest garment selection aimed at an automatic segmentation of pieces of clothing. For both purposes, the color description is extracted by an iterative energy minimization approach and an automatic initialization strategy is proposed by learning geometric constraints and shape cues. Experiments confirm the good performance of this technique both in the context of skin removal and in the context of classification of garments.

R. Cucchiara; C. Grana; D. Borghesani; M. Agosti; A.D. Bagdanov ( 2012 ) - Multimedia for Cultural Heritage: Key Issues ( International Workshop on Multimedia for Cultural Heritage - Modena - May 3) ( - Multimedia for Cultural Heritage ) (Springer Heidelberg DEU ) - n. volume CCIS 247 - pp. da 206 a 216 ISBN: 9783642279775 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Multimedia technologies have recently created the conditions for a true revolution in the Cultural Heritage domain, particularly in reference to the study, exploitation, and fruition of artistic works. New opportunities are arising for researchers in the field of multimedia to share their research results with people coming from the field of art and culture, and vice versa. This paper gathers together opinions and ideas shared during the final discussion session at the 1st International Workshop on Multimedia for Cultural Heritage, as a summary of the problems and possible directions for solving them.

Baltieri, Davide; Vezzani, Roberto; Cucchiara, Rita ( 2012 ) - People Orientation Recognition by Mixtures of Wrapped Distributions on Random Trees ( 12th European Conference on Computer Vision - Florence, Italy - October 7-13, 2012) ( - Computer Vision -- ECCV 2012 ) - n. volume 7576 - pp. da 270 a 283 ISBN: 978-3-642-33714-7 ISSN: 0302-9743 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The recognition of people's orientation in single images is still an open issue in several real cases, when the image resolution is poor, body parts cannot be distinguished and localized, or motion cannot be exploited. However, the estimation of a person's orientation, even an approximate one, could be very useful to improve people tracking and re-identification systems, or to provide a coarse alignment of body models on the input images. In these situations, holistic features seem to be more effective and faster than model-based 3D reconstructions. In this paper we propose to describe people's appearance with multi-level HoG feature sets and to classify their orientation using an array of Extremely Randomized Trees classifiers trained on quantized directions. The outputs of the classifiers are then integrated into a global continuous probability density function using a Mixture of Approximated Wrapped Gaussian distributions. Experiments on the TUD Multiview Pedestrians, the Sarc3D, and the 3DPeS datasets confirm the efficacy of the method and the improvement with respect to state-of-the-art approaches.
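
To make the last step of this abstract concrete, the sketch below turns per-direction classifier scores into a continuous density over the angle, using a mixture of wrapped Gaussians whose infinite wrap sum is truncated to a few terms. The abstract does not specify the approximation, so the truncation, the fixed `sigma`, the evenly spaced direction centers and the function names are all assumptions of this example.

```python
import numpy as np

def wrapped_gaussian(theta, mu, sigma, wraps=2):
    """Wrapped normal density on the circle, truncated to 2*wraps+1 terms."""
    theta = np.asarray(theta, dtype=float)
    k = np.arange(-wraps, wraps + 1)
    d = theta[..., None] - mu + 2 * np.pi * k
    return np.exp(-0.5 * (d / sigma) ** 2).sum(axis=-1) / (sigma * np.sqrt(2 * np.pi))

def orientation_density(theta, class_probs, sigma=np.pi / 8):
    """Mix one wrapped Gaussian per quantized direction, weighted by its score."""
    probs = np.asarray(class_probs, dtype=float)
    centers = 2 * np.pi * np.arange(len(probs)) / len(probs)
    weights = probs / probs.sum()
    return sum(w * wrapped_gaussian(theta, c, sigma) for w, c in zip(weights, centers))
```

Evaluating `orientation_density` on a fine grid of angles and taking the maximum gives a continuous orientation estimate instead of one of the quantized classes.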

D. Borghesani; C. Grana; R. Cucchiara ( 2012 ) - Relevance Feedback as an Interactive Navigation Tool ( International Conference on Computer Vision Theory and Applications - Rome, Italy - Feb 24-26) ( - VISAPP 2012 - Proceedings of the International Conference on Computer Vision Theory and Applications ) (SciTePress – Science and Technology Publications Setubal PRT ) - n. volume 2 - pp. da 54 a 59 ISBN: 9789898565037 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Image collections are searched in common retrieval systems in many different ways, but the typical presentation is by means of a grid styled view. In this paper we try to suggest a novel use of relevance feedback as a tool to warp the view and allow the user to spatially navigate the image collection, and at the same time focus on his retrieval aim. This is obtained by the use of a distance based space warping on the 2D projection of the distance matrix.

R. Cucchiara; C. Grana ( 2012 ) - Special Issue: Recent Achievements in Multimedia for Cultural Heritage - Guest Editorial - JOURNAL OF MULTIMEDIA - n. volume 7 (2) - pp. da 107 a 108 ISSN: 1796-2048 [Articolo in rivista (262) - Articolo su rivista]
Abstract

For quite some time, libraries, document and historical centers from opposite corners of the world have been the caretakers of our rich and assorted social legacy. They have protected and furnished access to the testimonies of knowledge, beauty and inspiration, such as sculptures, paintings, music and literature. The new information technologies have created unbelievable opportunities to make this common heritage more accessible for all. Culture is following the digital path and “memory institutions” are adapting the way in which they communicate with their public. Multimedia technologies have recently created the conditions for a true revolution in the cultural heritage area, with reference to the study, valorization, and fruition of artistic works. New multimedia technologies can be used to design unique approaches to the perception and enjoyment of the artistic legacy, for instance, through smart cultural objects and new interfaces supported by elements such as story-telling, gaming and learning. The whole plurality of masterpieces (paintings, books, manuscripts, even photos of sculptures and architecture) can be effectively embedded into a unique “paradigm” through digitization. This allows a significant reduction in costs, an enormous expansion of public accessibility (and therefore income), and at the same time a tremendous freedom for data elaboration. In brief, digitization enhances pleasure for the public and usefulness to experts on cultural heritage assets.

D. Borghesani; C. Grana; R. Cucchiara ( 2012 ) - Towards Artistic Collections Navigation Tools based on Relevance Feedback ( International Workshop on Multimedia for Cultural Heritage - Modena, Italy - May 3) ( - Multimedia for Cultural Heritage ) (Springer Heidelberg DEU ) - n. volume CCIS 247 - pp. da 143 a 153 ISBN: 9783642279775 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Artistic image collections are usually managed via textual metadata in standard content management systems. More sophisticated searches can be performed using image retrieval technologies based on visual content. Nevertheless, the problem of the information presentation remains. In this paper we try to move beyond the classic grid-styled presentation model, suggesting a novel use of relevance feedback as a navigation tool. Relevance feedback is therefore used to warp the view and allow the user to spatially navigate the image collection, and at the same time focus on his retrieval aim. This is obtained by exploiting a distance-based space warping on the 2D projection of the distance matrix. Multitouch gestures are employed to provide feedback through natural interaction with the system.

Simone Calderara;Rita Cucchiara ( 2012 ) - Understanding dyadic interactions applying proxemic theory on videosurveillance trajectories ( Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference - Providence USA - 16-21 June 2012) ( - 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops ) (IEEE - Institute of Electrical and Electronics Engineers New York USA ) - pp. da 20 a 27 ISBN: 9781467316101; 9781467316118; 9781467316125 | 9781467316118 | 9781467316125 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Understanding social and collective people behaviour in open spaces is one of the frontiers of modern video surveillance. Many sociological theories, and proxemics in particular, have proved their validity as a support for classifying and interpreting human behaviour. Proxemics suggests some simple but effective behavioural rules, useful to understand what people are doing and their social involvement with other individuals. In this paper we propose to extend the proxemic analysis over time and provide a solution for analysing sequences of proxemic states computed between trajectories of people pairs (dyads). Trajectories, computed from videosurveillance videos, are first analysed and converted to a sequence of symbols according to proxemic theory. Then an elastic measure for comparing those sequences is introduced. Finally, interactions are classified both in an off-line unsupervised way and in an on-line fashion. Results on videosurveillance data demonstrate that sequences of proxemic states can be effective in characterizing mutual interactions. Experiments are presented on capturing the most frequent dyadic interactions and on classifying them on-line when a labelled training set is available.
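
The pipeline in this abstract (trajectories, then symbols, then an elastic comparison) can be sketched as follows. The zone thresholds are the commonly cited values for Hall's intimate, personal and social distances and are an assumption here, as are the symbol alphabet and the function names. The paper's own elastic measure is not described in the abstract, so plain edit distance is swapped in purely as a stand-in.

```python
import numpy as np

# Upper distance limits in metres for each zone (assumed values); beyond the
# last limit the pair is labelled "U" for the public zone.
ZONES = ((0.45, "I"), (1.2, "P"), (3.6, "S"))

def proxemic_symbols(traj_a, traj_b):
    """Convert two synchronized (n, 2) trajectories into a string of zone symbols."""
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    dist = np.linalg.norm(a - b, axis=1)
    return "".join(next((s for lim, s in ZONES if d < lim), "U") for d in dist)

def edit_distance(a, b):
    """Levenshtein distance, used here as a simple elastic sequence measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

Pairwise distances between such symbol strings could then feed a clustering step for the unsupervised case, or a nearest-neighbour classifier when labelled sequences are available.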

C. Grana; D. Borghesani; P. Santinelli; R. Cucchiara ( 2012 ) - Veiling Luminance estimation on FPGA-based embedded smart camera ( 2012 IEEE Intelligent Vehicles Symposium (IV) - Alcalá de Henares, Spain - Jun 3-7) ( - Proceedings of the 2012 IEEE Intelligent Vehicles Symposium (IV) ) (IEEE - Institute of Electrical and Electronics Engineers Piscataway, NJ USA ) - pp. da 334 a 339 ISBN: 9781467321174; 9781467321198 | 9781467321198 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes the design and development of a Veiling Luminance estimation system based on the use of a CMOS image sensor, fully implemented on FPGA. The system is composed of the CMOS image sensor, FPGA, DDR SDRAM, USB controller and SPI (Serial Peripheral Interface) Flash. The FPGA is used to build a system-on-chip integrating a soft processor (Xilinx MicroBlaze) and all the hardware blocks needed to handle the external peripherals and memory. The soft processor is used to handle image acquisition and all computational tasks needed to compute the Veiling Luminance value. The advantages of this single-chip FPGA implementation include the reduction of the hardware requirements, power consumption, and system complexity. The problem of high dynamic range images has been addressed with multiple acquisitions at different exposure times. Vignetting, radial distortion and angular weighting, as required by the veiling luminance definition, are handled by a single integer look-up table (LUT) access. Results are compared with a state-of-the-art certified instrument.

D. Borghesani; C. Grana; R. Cucchiara ( 2012 ) - 2D Images Map Warping for Improved User Interaction ( 21st International Conference on Pattern Recognition (ICPR 2012) - Tsukuba Science City, Japan - Nov 11-15) ( - Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012) ) (IEEE Computer Society Press Los Alamitos, CA USA ) - pp. da 1096 a 1099 ISBN: 9784990644116 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we suggest an interaction model designed to fit users' expectations in front of an image retrieval system. A lightweight relevance feedback strategy, working directly on the 2D projection of image features, allows the user to spatially navigate the media collection maintaining the real-time constraint. A preliminary evaluation of this relevance feedback strategy shows good performance compared with other known approaches.

S. Cattini; C. Grana; R. Cucchiara; L. Rovati ( 2011 ) - A low-cost system and calibration method for veiling luminance measurement ( 2011 IEEE Instrumentation and Measurement Technology Conference (I2MTC) - Binjiang, China - May 10-12) ( - Proceedings of 2011 IEEE Instrumentation and Measurement Technology Conference (I2MTC) ) (IEEE Press Piscataway, NJ USA ) - pp. da 1 a 6 ISBN: 9781424479337 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

A CCD-based measuring instrument aimed at veiling luminance estimation and the related low-cost calibration method are described. The system may allow the estimation of the optimum luminance levels in road-tunnel lighting, thus both increasing drivers' safety and avoiding energy waste and hence unjustified higher lighting costs.

A. Rashid; A. Prati; R. Cucchiara ( 2011 ) - A Real-Time Embedded Solution for Skew Correction in Banknote Analysis ( Seventh IEEE Workshop on Embedded Computer Vision - Colorado Springs, CO (USA) - 20 June 2010) ( - Proceedings of CVPR 2011 Workshops ) (IEEE Computer Society Washington, DC USA ) - pp. da 1 a 8 ISBN: 9781457705281 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Several industrial applications require embedded solutions both for reducing hardware footprint and energy consumption and for achieving high-speed performance. This paper presents a computer vision system developed for correcting image skew in applications for banknote analysis and classification. The system must be very efficient and run on a fixed-point DSP with limited computational resources. Consequently, we propose three innovative improvements to basic and general-purpose image processing techniques that can be helpful in other computer vision applications on embedded devices. In particular, we address: a) an efficient labeling with a union-find approach for hole filling, b) a fast Hough transform implementation, and c) a very high-speed estimation of the affine transformation for skew correction. The reported results demonstrate both the accuracy and the efficiency of the system, also in the presence of severe skew. In terms of efficiency, the computational time is reduced by about two orders of magnitude.
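
For readers unfamiliar with point a), the sketch below shows generic connected-component labeling with a union-find structure on a binary grid, using 4-connectivity. It is a textbook two-pass version written for clarity, not the optimized fixed-point DSP implementation of the paper; the function name and the list-of-lists image representation are assumptions of this example.

```python
def label_components(grid):
    """Label 4-connected foreground regions of a binary grid (0 = background)."""
    parent = [0]  # parent[i] is the representative of provisional label i

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    labels = [[0] * len(row) for row in grid]
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            if not value:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                # Both neighbours are labelled: merge their equivalence classes.
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)
                labels[y][x] = min(ru, rl)
            elif up or left:
                labels[y][x] = up or left
            else:
                # No labelled neighbour: open a new provisional label.
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1
    # Second pass: replace every provisional label with its representative.
    return [[find(l) if l else 0 for l in row] for row in labels]
```

For hole filling, the same routine can be run on the inverted image: background components that do not touch the image border are holes and can be set to foreground.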

R. Cucchiara; M. Fornaciari; R. Haider; F. Mandreoli; R. Martoglia; A. Prati; S. Sassatelli ( 2011 ) - A Reasoning Engine for Intruders' Localization in Wide Open Areas using a Network of Cameras and RFIDs ( First IEEE Workshop on Camera Networks and Wide Area Scene Analysis (CNWASA 2011) - Colorado Springs, CO (USA) - 20 June 2010) ( - Proceedings of CVPR 2011 Workshops ) (IEEE Computer Society Washington, DC USA ) - pp. da 1 a 8 ISBN: 9781457705281 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Wide open areas represent challenging scenarios for surveillance systems, since sensory data can be affected by noise, uncertainty, and distractors. Therefore, the tasks of localizing and identifying targets (e.g., people) in such environments suggest to go beyond the use of camera-only deployments. In this paper, we propose an innovative system relying on the joint use of cameras and RFIDs, allowing us to “map” RFID tags to people detected by cameras and, thus, highlighting potential intruders. To this end, sophisticated filtering techniques preserve the uncertainty of data and overcome the heterogeneity of sensors, while an evidential fusion architecture, based on Transferable Belief Model, combines the two sources of information and manages conflict between them. The conducted experimental evaluation shows very promising results.

M. Fornaciari; Davide Sottara; Andrea Prati; Paola Mello; Rita Cucchiara ( 2011 ) - An Evidential Fusion Architecture for People Surveillance in Wide Open Areas ( 6th International Conference on Hybrid Artificial Intelligent Systems (HAIS 2011) - Wroclaw, Poland - May 23-25, 2011) ( - Lecture Notes in Computer Science ) (Springer Heidelberg DEU ) - n. volume 6678 - pp. da 239 a 246 ISBN: 9783642212185 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

A new evidential fusion architecture is proposed to build a hybrid artificial intelligent system for people surveillance in wide open areas. Authorized people and intruders are identified and localized thanks to the joint employment of cameras and RFID tags. Complex Event Processing and Transferable Belief Model are exploited for handling noisy data and uncertainty propagation. Experimental results on complex synthetic scenarios demonstrate the accuracy of the proposed solution.

D. Coppi; S. Calderara; R. Cucchiara ( 2011 ) - Appearance tracking by transduction in surveillance scenarios ( Advanced Video and Signal-Based Surveillance (AVSS), 2011 - Klagenfurt, Austria - Aug. 30 2011) ( - Advanced Video and Signal-Based Surveillance (AVSS), 2011 8th IEEE International Conference on ) (IEEE Ed. Washington DC USA ) - pp. da 142 a 147 ISBN: 9781457708442 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

We propose a formulation of the people tracking problem as a Transductive Learning (TL) problem. TL is an effective semi-supervised learning technique by which many classification problems have been recently reinterpreted as learning labels from incomplete datasets. In our proposal the joint exploitation of spectral graph theory and Riemannian manifold learning tools leads to the formulation of a robust approach for appearance-based tracking in video surveillance scenarios. The key advantage of the presented method is a continuously updated model of the tracked target, used in the TL process, which allows the target's visual appearance to be learned on-line and consequently improves the tracker accuracy. Experiments on public datasets show an encouraging advancement over alternative state-of-the-art techniques.

C. Grana; D. Borghesani; R. Cucchiara ( 2011 ) - Automatic segmentation of digitalized historical manuscripts - MULTIMEDIA TOOLS AND APPLICATIONS - vol. 55 (3) - pp. 483-506 ISSN: 1380-7501 [Journal article (262) - Article in journal]
Abstract

The artistic content of historical manuscripts provides a lot of challenges in terms of automatic text extraction, picture segmentation and retrieval by similarity. In particular this work addresses the problem of automatic extraction of meaningful pictures, distinguishing them from handwritten text and floral and abstract decorations. The proposed solution firstly employs a circular statistics description of a directional histogram in order to extract text. Then visual descriptors are computed over the pictorial regions of the page: the semantic content is distinguished from the decorative parts using color histograms and a novel texture feature called Gradient Spatial Dependency Matrix. The feature vectors are finally processed using an embedding procedure which allows increased performance in later SVM classification. Results for both feature extraction and embedding based classification are reported, supporting the effectiveness of the proposal on high resolution replicas of artistic manuscripts.

G. Gualdi; A. Prati; R. Cucchiara ( 2011 ) - Contextual Information and Covariance Descriptors for People Surveillance: An Application for Safety of Construction Workers - EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING - vol. 2011 - pp. 1-16 ISSN: 1687-5176 [Journal article (262) - Article in journal]
Abstract

In computer science, contextual information can be used both to reduce computations and to increase accuracy. This paper discusses how it can be exploited for people surveillance in very cluttered environments in terms of perspective (i.e., weak scene calibration) and appearance of the objects of interest (i.e., relevance feedback on the training of a classifier). These techniques are applied to a pedestrian detector that uses a LogitBoost classifier, appropriately modified to work with covariance descriptors which lie on Riemannian manifolds. On each detected pedestrian, a similar classifier is employed to obtain a precise localization of the head. Two novelties on the algorithms are proposed in this case: polar image transformations to better exploit the circular feature of the head appearance and multispectral image derivatives that catch not only luminance but also chrominance variations. The complete approach has been tested on the surveillance of a construction site to detect workers that do not wear the hard hat: in such scenarios, the complexity and dynamics are very high, making pedestrian detection a real challenge.

Simone Calderara; Uri Heinemann; Andrea Prati; Rita Cucchiara; Naftali Tishby ( 2011 ) - Detecting Anomalies in People’s Trajectories using Spectral Graph Analysis - COMPUTER VISION AND IMAGE UNDERSTANDING - vol. 115(8) - pp. 1099-1111 ISSN: 1077-3142 [Journal article (262) - Article in journal]
Abstract

Video surveillance is becoming the technology of choice for monitoring crowded areas for security threats. While video provides ample information for human inspectors, there is a great need for robust automated techniques that can efficiently detect anomalous behavior in streaming video from single or multiple cameras. In this work we synergistically combine two state-of-the-art methodologies. The first is the ability to track and label single person trajectories in a crowded area using multiple video cameras, and the second is a new class of novelty detection algorithms based on spectral analysis of graphs. By representing the trajectories as sequences of transitions between nodes in a graph, shared individual trajectories capture only a small subspace of the possible trajectories on the graph. This subspace is characterized by large connected components of the graph, which are spanned by the eigenvectors with the low eigenvalues of the graph Laplacian matrix. Using this technique, we develop robust invariant distance measures for detecting anomalous trajectories, and demonstrate their application on real video data.
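The link between connected components and low Laplacian eigenvalues mentioned in this abstract can be illustrated with a small sketch. It builds an unnormalized graph Laplacian from a few toy trajectories written as node sequences and checks that the indicator vector of a connected component is mapped to zero (an eigenvector with eigenvalue 0). The node labels, helper names, and data are assumptions made up for this example; the paper's actual distance measures are not reproduced here.

```python
def build_laplacian(trajectories):
    """Return (nodes, L) with L = D - W, the unnormalized graph Laplacian.

    W[i][j] counts observed transitions between nodes i and j; the graph is
    treated as undirected, which is a simplifying assumption of this sketch.
    """
    nodes = sorted({n for t in trajectories for n in t})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    W = [[0.0] * size for _ in range(size)]
    for t in trajectories:
        for a, b in zip(t, t[1:]):
            if a != b:  # ignore self-transitions
                i, j = index[a], index[b]
                W[i][j] += 1.0
                W[j][i] += 1.0
    L = [[-W[i][j] for j in range(size)] for i in range(size)]
    for i in range(size):
        L[i][i] = sum(W[i])  # degree on the diagonal
    return nodes, L


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical trajectories: two people walk A -> B -> C, one walks D -> E.
trajectories = [["A", "B", "C"], ["A", "B", "C"], ["D", "E"]]
nodes, L = build_laplacian(trajectories)

# Indicator vector of the connected component {A, B, C}.
indicator = [1.0 if n in {"A", "B", "C"} else 0.0 for n in nodes]
residual = matvec(L, indicator)  # all zeros: eigenvalue 0
```

A vector that cuts through a component, such as the indicator of {A, B} alone, is not mapped to zero, which is why the low end of the spectrum reflects the component structure.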

M. Casares; P. Santinelli; S. Velipasalar; A. Prati; R. Cucchiara ( 2011 ) - Energy-efficient Feedback Tracking on Embedded Smart Cameras by Hardware-level Optimization ( Fifth ACM/IEEE International Conference on Distributed Smart Cameras - Ghent, Belgium - 22-25 August 2011) ( - Proceedings of ICDSC 2011 ) (IEEE Washington, DC USA ) - pp. 1-6 ISBN: 9781457717079 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Embedded systems have limited processing power, memory and energy. When camera sensors are added to an embedded system, the problem of limited resources becomes even more pronounced. In this paper, we introduce two methodologies to increase the energy-efficiency and battery-life of an embedded smart camera by hardware-level operations when performing object detection and tracking. The CITRIC platform is employed as our embedded smart camera. First, down-sampling is performed at hardware level on the micro-controller of the image sensor rather than performing software-level down-sampling at the main microprocessor of the camera board. In addition, instead of performing object detection and tracking on the whole image, we first estimate the location of the target in the next frame, form a search region around it, then crop the next frame by using the HREF and VSYNC signals at the micro-controller of the image sensor, and perform detection and tracking only in the cropped search region. Thus, the amount of data that is moved from the image sensor to the main memory at each frame is optimized. Also, we can adaptively change the size of the cropped window during tracking depending on the object size. Reducing the amount of transferred data, better use of the memory resources, and delegating image down-sampling and cropping tasks to the micro-controller on the image sensor result in a significant decrease in energy consumption and increase in battery-life. Experimental results show that hardware-level down-sampling and cropping, and performing detection and tracking in cropped regions provide a 41.24% decrease in energy consumption and a 107.2% increase in battery-life. Compared to performing software-level down-sampling and processing whole frames, the proposed methodology provides an additional 8 hours of continuous processing on 4 AA batteries, increasing the lifetime of the camera to 15.5 hours.

M. Casares; P. Santinelli; S. Velipasalar; A. Prati; R. Cucchiara ( 2011 ) - Energy-efficient Object Detection and Tracking on Embedded Smart Cameras by Hardware-level Operations at the Image Sensor ( Seventh IEEE Workshop on Embedded Computer Vision - Colorado Springs, CO (USA) - 20 June 2010) ( - Proceedings of CVPR 2011 Workshops ) (IEEE Computer Society Washington, DC USA ) - pp. 1-8 ISBN: 9781457705281 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Embedded smart cameras have limited processing power, memory and energy. In this paper, we introduce two methodologies to increase the energy-efficiency and the battery-life of an embedded smart camera by hardware-level operations when performing object detection and tracking. We use the CITRIC platform as our embedded smart camera. We first perform down-sampling at hardware-level on the microcontroller of the image sensor rather than performing software-level down-sampling at the main microprocessor of the camera board. In addition, instead of performing object detection on the whole image, we first estimate the location of the target in the next frame, form a search region around it, then crop the next frame by using the HREF and VSYNC signals at the microcontroller of the image sensor, and perform detection and tracking only in the cropped search region. Thus, the amount of data that is moved from the image sensor to the main memory at each frame is greatly reduced. Thanks to reduced data transfer, better use of the memory resources and not occupying the main microprocessor with image down-sampling and cropping tasks, we obtain significant savings in energy consumption and battery-life. Experimental results show that hardware-level down-sampling and cropping, and performing detection in cropped regions provide a 54.14% decrease in energy consumption and a 121.25% increase in battery-life compared to performing software-level down-sampling and processing the whole frame.

D. Borghesani; D. Coppi; C. Grana; S. Calderara; R. Cucchiara ( 2011 ) - Feature Space Warping Relevance Feedback with Transductive Learning ( 13th International Conference on Advanced Concepts for Intelligent Vision Systems - Ghent, Belgium - Aug 22-25) ( - Advanced Concepts for Intelligent Vision Systems ) (Springer-Verlag Berlin Heidelberg DEU ) - vol. LNCS 6915 - pp. 70-81 ISBN: 9783642236860 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Relevance feedback is a widely adopted approach to improve content-based information retrieval systems by keeping the user in the retrieval loop. Among the fundamental relevance feedback approaches, feature space warping has been proposed as an effective approach for bridging the gap between high-level semantics and the low-level features. Recently, a combination of feature space warping and query point movement techniques has been proposed in contrast to learning-based approaches, showing good performance under different data distributions. In this paper we propose to merge feature space warping and transductive learning, in order to benefit from both the ability of adapting data to the user hints and the information coming from unlabeled samples. Experimental results on an image retrieval task reveal significant performance improvements from the proposed method.

R. Cucchiara; M. Fornaciari; R. Haider; F. Mandreoli; A. Prati ( 2011 ) - Identification of Intruders in Groups of People using Cameras and RFIDs ( Fifth ACM/IEEE International Conference on Distributed Smart Cameras - Ghent, Belgium - 22-25 August 2011) ( - Proceedings of ICDSC 2011 ) (IEEE Washington, DC USA ) - pp. 1-6 ISBN: 9781457717079 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

The identification of intruders in groups of people moving in wide open areas represents a challenging scenario, where coordination between cameras can certainly be used but is not sufficient on its own. In this paper, we propose to go beyond pure vision-based approaches by integrating the use of distributed cameras with RFID technology. To this end, we introduce a system that “maps” RFID tags to people detected by cameras, using sophisticated techniques to filter the individual modalities and an evidential fusion architecture, based on the Transferable Belief Model, to combine the two sources of information and manage conflict between them. The conducted experimental evaluation shows very promising results, especially in treating groups of people.
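As a rough illustration of how two sources can be combined and their conflict tracked in a Transferable Belief Model setting, the sketch below applies the unnormalized conjunctive combination rule to two mass functions. The frame of discernment, the two sources, and all mass values are invented for this example, and the function name is hypothetical; the abstract does not describe the system's actual fusion pipeline, so this is only the textbook rule.

```python
from itertools import product


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule: the mass assigned to a set A is the sum
    of m1(B) * m2(C) over all pairs of focal sets with B & C == A.
    Mass landing on the empty set measures the conflict between the sources."""
    combined = {}
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        a = b & c
        combined[a] = combined.get(a, 0.0) + mass_b * mass_c
    return combined


# Hypothetical frame of discernment for one detected person.
AUTH = frozenset({"authorized"})
INTR = frozenset({"intruder"})
FRAME = AUTH | INTR  # total ignorance

# Invented evidence: the camera leans towards "intruder",
# while an RFID reading leans towards "authorized".
camera = {INTR: 0.6, FRAME: 0.4}
rfid = {AUTH: 0.7, FRAME: 0.3}

fused = conjunctive_combination(camera, rfid)
conflict = fused.get(frozenset(), 0.0)  # 0.6 * 0.7
```

Keeping the conflict mass explicit, rather than normalizing it away, lets a later decision stage notice when the two sensors disagree.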

D. Coppi; S. Calderara; R. Cucchiara ( 2011 ) - Iterative active querying for surveillance data retrieval in crime detection and forensics ( International Conference on Imaging for Crime Detection and Prevention, - London, UK - 3-4 Nov. 2011) ( - 4th International Conference on Imaging for Crime Detection and Prevention, ICDP-11. ) (IET London GBR ) - pp. 216-222 ISBN: 9781849195652 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Large sets of visual data are now available, both in real time and off line, at the time of investigation in multimedia forensics; however, passive querying systems often encounter difficulties in retrieving significant results. In this paper we propose an iterative active querying system for video surveillance and forensic applications based on the continuous interaction between the user and the system. Positive and negative user feedback is exploited as the input of a graph-based transductive procedure for iteratively refining the initial query results. Experiments are shown using people trajectories and people appearance as distance metrics.

S. Calderara; A. Prati; R. Cucchiara ( 2011 ) - Markerless Body Part Tracking for Action Recognition - INTERNATIONAL JOURNAL OF MULTIMEDIA INTELLIGENCE AND SECURITY - vol. 1(1) - pp. 76-89 ISSN: 2042-3462 [Journal article (262) - Article in journal]
Abstract

This paper presents a method for recognising human actions by tracking body parts without using artificial markers. A sophisticated appearance-based tracking able to cope with occlusions is exploited to extract a probability map for each moving object. A segmentation technique based on mixture of Gaussians (MoG) is then employed to extract and track significant points on this map, corresponding to significant regions on the human silhouette. The evolution of the mixture in time is analysed by transforming it into a sequence of symbols (each corresponding to a MoG). The similarity between actions is computed by applying global alignment and dynamic programming techniques to the corresponding sequences and using a variational approximation of the Kullback-Leibler divergence to measure the dissimilarity between two MoGs. Experiments on publicly available datasets and comparison with existing methods are provided.
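The global alignment step can be pictured with a generic dynamic-programming sketch in the style of Needleman-Wunsch, where the cost of matching two symbols is supplied by a pluggable dissimilarity function. Everything here is an assumption for illustration: the function names, the gap cost, and the 0/1 stand-in dissimilarity. In the paper the dissimilarity between two symbols would come from the Kullback-Leibler approximation between MoGs, which is not implemented below.

```python
def global_alignment_cost(seq_a, seq_b, dissimilarity, gap_cost=1.0):
    """Minimum total cost of globally aligning seq_a with seq_b.

    cost[i][j] holds the best cost of aligning the first i symbols of seq_a
    with the first j symbols of seq_b.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap_cost  # seq_a prefix aligned against gaps only
    for j in range(1, cols):
        cost[0][j] = j * gap_cost  # seq_b prefix aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i - 1][j - 1] + dissimilarity(seq_a[i - 1], seq_b[j - 1]),
                cost[i - 1][j] + gap_cost,  # gap inserted in seq_b
                cost[i][j - 1] + gap_cost,  # gap inserted in seq_a
            )
    return cost[-1][-1]


def zero_one(x, y):
    """Placeholder symbol dissimilarity: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


# Two hypothetical action sequences that differ by one missing symbol.
cost_same = global_alignment_cost("ABC", "ABC", zero_one)
cost_gap = global_alignment_cost("ABC", "AC", zero_one)
```

With the 0/1 dissimilarity and unit gap cost this reduces to the edit distance; swapping in a graded dissimilarity lets similar symbols align cheaply.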

S. Calderara; A. Prati; R. Cucchiara ( 2011 ) - Mixtures of von Mises Distributions for People Trajectory Shape Analysis - IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY - vol. 21(4) - pp. 457-471 ISSN: 1051-8215 [Journal article (262) - Article in journal]
Abstract

People trajectory analysis is a recurrent task in many pattern recognition applications, such as surveillance, behavior analysis, video annotation, and many others. In this paper we propose a new framework for analyzing trajectory shape, invariant to spatial shifts of the people motion in the scene. In order to cope with the noise and the uncertainty of the trajectory samples, we propose to describe the trajectories as a sequence of angles modelled by distributions of circular statistics, i.e. a mixture of von Mises (MovM) distributions. To deal with MovM, we define a new specific EM algorithm for estimating the parameters and derive a closed form of the Bhattacharyya distance between single vM pdfs. Trajectories are then modelled with a sequence of symbols, corresponding to the most suitable distribution in the mixture, and compared with each other after a global alignment procedure to cope with trajectories of different lengths. The trajectories in the training set are clustered according to their shape similarity in an off-line phase, and testing trajectories are then classified with a specific on-line EM, based on sufficient statistics. The approach is particularly suitable for classifying people trajectories in video surveillance, searching for abnormal (i.e. infrequent) paths. Tests on synthetic and real data are provided, together with a complete comparison with other circular statistical and alignment methods.
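To make the von Mises ingredients concrete, the sketch below evaluates a single von Mises density and a closed-form Bhattacharyya coefficient between two such densities. The closed form used here is the standard result of integrating the square root of the product of the two pdfs, BC = I0(R/2) / sqrt(I0(k1) I0(k2)) with R^2 = k1^2 + k2^2 + 2 k1 k2 cos(mu1 - mu2); it is not copied from the paper, whose exact expression and distance definition may differ. The function names are hypothetical, the distance is taken as -ln(BC) by assumption, and the series for I0 is only intended for moderate concentration values.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, from its power
    series sum_k (x^2 / 4)^k / (k!)^2 (adequate for moderate x)."""
    total, term = 0.0, 1.0
    q = (x * x) / 4.0
    for k in range(terms):
        total += term
        term *= q / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def bhattacharyya_coefficient(mu1, kappa1, mu2, kappa2):
    """Closed-form integral of sqrt(f1 * f2) over the circle."""
    r_squared = kappa1 ** 2 + kappa2 ** 2 + 2.0 * kappa1 * kappa2 * math.cos(mu1 - mu2)
    r = math.sqrt(max(0.0, r_squared))  # guard against tiny negative rounding
    return bessel_i0(r / 2.0) / math.sqrt(bessel_i0(kappa1) * bessel_i0(kappa2))


def bhattacharyya_distance(mu1, kappa1, mu2, kappa2):
    """Distance defined here as -ln(BC); an assumption of this sketch."""
    return -math.log(bhattacharyya_coefficient(mu1, kappa1, mu2, kappa2))


# Hypothetical pair of direction models (angles in radians).
bc = bhattacharyya_coefficient(0.3, 2.0, 1.5, 4.0)
```

The coefficient equals 1 for identical distributions and shrinks as the mean directions move apart or the concentrations diverge.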

D. Baltieri; R. Vezzani; R. Cucchiara; A. Utasi; C. Benedek; T. Sziranyi ( 2011 ) - Multi-view people surveillance using 3D information ( 11th International Workshop on Visual Surveillance - Barcelona, Spain - Nov, 2011) ( - Proceedings of the 11th International Workshop on Visual Surveillance ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. 1817-1824 ISBN: 9781467300629 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

In this paper we introduce a novel surveillance system, which uses 3D information extracted from multiple cameras to detect, track and re-identify people. The detection method is based on a 3D Marked Point Process model using two pixel-level features extracted from multi-plane projections of binary foreground masks, and uses a stochastic optimization framework to estimate the position and the height of each person. We apply a rule-based Kalman-filter tracking on the detection results to find the object-to-object correspondence between consecutive time steps. Finally, a 3D body model based long-term tracking module connects broken tracks and is also used to re-identify people.

C. Grana; M. Montangero; D. Borghesani; R. Cucchiara ( 2011 ) - Optimal Decision Trees Generation from OR-Decision Tables ( 16th International Conference on Image Analysis and Processing - Ravenna, Italy - Sep 14-16) ( - Image Analysis and Processing - ICIAP 2011 ) (Springer-Verlag Berlin Heidelberg DEU ) - vol. LNCS 6978 - pp. 443-452 ISBN: 9783642240843 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

In this paper we present a novel dynamic programming algorithm to synthesize an optimal decision tree from OR-decision tables, an extension of standard decision tables which allows choosing between several alternative actions in the same rule. Experiments are reported, showing the computational time improvements over state-of-the-art implementations of connected components labeling using this modelling technique.

D. Coppi; S. Calderara; R. Cucchiara ( 2011 ) - People appearance tracing in video by spectral graph transduction ( Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams - Barcelona, Spain - Nov 13 2011) ( - Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on ) (IEEE Ed. Washington DC USA ) - pp. 920-927 ISBN: 9781467300629 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Following people in different video sources is a challenging task: variations in the type of camera, in the lighting conditions, in the scene settings (e.g. crowd or occlusions) and in the point of view must be accounted for. In this paper we propose a system based only on appearance information that, disregarding temporal and spatial information, can be flexibly applied to both moving and static cameras. We exploit the joint use of transductive learning and spectral properties of graph Laplacians, proposing a formulation of the people tracing problem as a semi-supervised classification. The knowledge encoded in two labeled input sets of positive and negative samples of the target person and the continuous spectral update of these models allow us to obtain a robust approach for people tracing in surveillance video sequences. Experiments on publicly available datasets show satisfactory results and exhibit good robustness in dealing with short- and long-term occlusions.

Vezzani, Roberto; Grana, Costantino; Cucchiara, Rita ( 2011 ) - Probabilistic people tracking with appearance models and occlusion classification: The AD-HOC system - PATTERN RECOGNITION LETTERS - vol. 32 (6) - pp. 867-877 ISSN: 0167-8655 [Journal article (262) - Article in journal]
Abstract

AD-HOC (Appearance Driven Human tracking with Occlusion Classification) is a complete framework for multiple people tracking in video surveillance applications in the presence of large occlusions. The appearance-based approach allows the estimation of the pixel-wise shape of each tracked person even during the occlusion. This peculiarity can be very useful for higher level processes, such as action recognition or event detection. A first step predicts the position of all the objects in the new frame while a MAP framework provides a solution for best placement. A second step associates each candidate foreground pixel to an object according to mutual object position and color similarity. A novel definition of non-visible regions accounts for the parts of the objects that are not detected in the current frame, classifying them as dynamic, scene or apparent occlusions. Results on surveillance videos are reported, using in-house produced videos and the PETS2006 test set.

C. Grana; D. Borghesani; R. Cucchiara ( 2011 ) - Relevance feedback strategies for artistic image collections tagging ( 1st ACM International Conference on Multimedia Retrieval - Trento, Italy - Apr 18-20) ( - Proceedings of the 1st ACM International Conference on Multimedia Retrieval ) (ACM Press New York USA ) - pp. 353-360 ISBN: 9781450303361 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

This paper provides an analysis of relevance feedback techniques in a multimedia system designed for the interactive exploration and annotation of artistic collections, in particular illuminated manuscripts. The relevance feedback is presented not only as a very effective technique to improve the performance of the system, but also as a clever way to improve the user experience, mixing the interactive surfing through the artistic content with the possibility to gather valuable information from the user, and consequently improving the user's retrieval satisfaction. We compare a modification of the Mean-Shift Feature Space Warping algorithm, as representative of the standard RF procedures, and a learning-based technique based on transduction, considered in order to overcome some limitations of the previous technique. Experiments are reported regarding the adopted visual features based on covariance matrices.

Davide Baltieri; Roberto Vezzani; Rita Cucchiara ( 2011 ) - SARC3D: a new 3D body model for People Tracking and Re-identification ( 16th International Conference on Image Analysis and Processing - Ravenna, Italy - Sept. 14-16) ( - Image Analysis and Processing – ICIAP 2011 ) - vol. 1 - pp. 197-206 ISSN: 0302-9743 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

We propose a new simplified 3D body model (called Sarc3D) for surveillance applications, which can be created, updated and compared in real-time. People are detected and tracked in each calibrated camera, and their silhouette, appearance, position and orientation are extracted and used to place, scale and orientate a 3D body model. For each vertex of the model a signature (color features, reliability and saliency) is computed from the 2D appearance images and exploited for matching. This approach achieves robustness against partial occlusions, pose and viewpoint changes. The complete proposal and a full experimental evaluation are presented, using a new benchmark suite and the PETS2009 dataset.

G. Gualdi; A. Prati; R. Cucchiara ( 2011 ) - Using Monolithic Classifiers On Multi-stage Pedestrian Detection ( 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance - Klagenfurt, Austria - August 30 – September 2, 2011) ( - Proceedings of AVSS 2011 ) (IEEE Press Washington, DC USA ) - pp. 267-272 ISBN: 9780769512723 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

Despite the many efforts in finding effective feature sets or accurate classifiers for people detection, few works have addressed ways of reducing the computational burden introduced by the sliding window paradigm. This paper proposes a multi-stage procedure for refining the search for pedestrians using the HOG features and the monolithic SVM classifier. The multi-stage procedure is based on particle-based estimation of pdfs and exploits the margin provided by the classifier to draw more particles on the areas where the classifier’s response is higher. This iterative algorithm achieves the same accuracy as the sliding window approach using fewer particles (and thus being more efficient) and, conversely, is more accurate when configured to work at the same computational load. Experimental results on publicly available datasets demonstrate that this method, previously proposed for boosted classifiers only, can be successfully applied to monolithic classifiers.

S. Calderara; P. Piccinini; R. Cucchiara ( 2011 ) - Vision based smoke detection system using image energy and color information - MACHINE VISION AND APPLICATIONS - vol. 22 - pp. 705-719 ISSN: 0932-8092 [Journal article (262) - Article in journal]
Abstract

Smoke detection is a crucial task in many video surveillance applications and could have a great impact on raising the level of safety of urban areas. Many commercial smoke detection sensors exist but most of them cannot be applied in open space or outdoor scenarios. With this aim, the paper presents a smoke detection system that uses a common CCD camera sensor to detect smoke in images and trigger alarms. First, a proper background model is proposed to reliably extract smoke regions and avoid over-segmentation and false positives in outdoor scenarios where many distractors are present, such as moving trees or light reflections. A novel Bayesian approach is adopted to detect smoke regions in the scene by analyzing image energy by means of the Wavelet Transform coefficients and color information. A statistical model of image energy is built, using a temporal Gaussian Mixture, to analyze the energy decay that typically occurs when smoke covers the scene; the detection is then strengthened by evaluating the color blending between a reference smoke color and the input frame. The proposed system is capable of rapidly detecting smoke events in both night and day conditions with a reduced number of false alarms, hence it is particularly suitable for monitoring large outdoor scenarios where common sensors would fail. An extensive experimental campaign on both recorded videos and live cameras evaluates the efficacy and efficiency of the system in many real world scenarios, such as outdoor storages and forests.

D. Baltieri; R. Vezzani; R. Cucchiara ( 2011 ) - 3DPes: 3D People Dataset for Surveillance and Forensics ( J-HGBU '11 - joint ACM workshop on Human gesture and behavior understanding - Scottsdale, Arizona, USA - Nov 28 - Dec 1 2011) ( - Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding ) (ACM New York USA ) - vol. 1 - pp. 59-64 ISBN: 9781450309981 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

The interest of the research community in creating reference datasets for performance analysis is always very high. Although new datasets collecting large amounts of video footage are spreading in surveillance and forensics, few benchmarks with annotation data are available for testing specific tasks and especially for 3D/multi-view analysis. In this paper we present 3DPeS, a new dataset for 3D/multi-view surveillance and forensic applications. This has been designed for discussing and evaluating research results in people re-identification and other related activities (people detection, people segmentation and people tracking). The new assessed version of the dataset contains hundreds of video sequences of 200 people taken from a multi-camera distributed surveillance system over several days, with different light conditions; each person is detected multiple times and from different points of view. In surveillance scenarios, the dataset can be exploited to evaluate people reacquisition, 3D body models and people activity reconstruction algorithms. In forensics it can be adopted too, by relaxing some constraints (e.g. real time) and neglecting some information (e.g. calibration). Some results on this new dataset are presented using state-of-the-art methods for people re-identification as a benchmark for future comparisons.

G. Monti; A. Prati; P. Piccinini; R. Cucchiara ( 2010 ) - A feature-based separation method, for separating a plurality of loosely-arranged duplicate articles and a system for actuating the method for supplying a packaging machine [Patent (285) - Patent]
Abstract

The invention relates to a segmentation method based on the characteristics for segmenting a plurality of duplicate articles (3) arranged loosely, which comprises stages of: acquiring an image (M) of a sample article (30); calculating keypoint-descriptors of the image (M); defining an identifying figure (Z) on the image (M); acquiring a first image (11) of a plurality of duplicate articles; performing a matching of the thus-defined keypoint-descriptor pairs; acquiring a position and an orientation of the identifying figure (Z) with respect to a first keypoint-descriptor pair of the image (M) having a match with a second keypoint-descriptor pair of the first image (11); defining, in the first image (11), an identifying figure of projection as a Euclidean transformation of the identifying figure (Z), with reference to the first and second pairs; applying the two preceding stages to a plurality of keypoint-descriptor pairs of the image (M) having a match with a keypoint-descriptor pair of the first image (11); collecting together identifying figures of projection having between them a predetermined degree of superposing; defining a representative figure for each group of identifying figures of projection which is formed by a minimum predetermined number of identifying figures of projection, which representative figure has a same shape and dimension as an identifying figure of projection, and is selected in order to estimate a position of a corresponding article illustrated in the first image of a plurality of duplicate articles. The invention also relates to a method for picking up articles (3) arranged loosely in a storage zone of articles (5) and for positioning the articles (3) in an outlet station (SU), and a group for actuating the method.

Calderara, Simone; Prati, Andrea; Cucchiara, Rita ( 2010 ) - Alignment-based Similarity of People Trajectories using Semi-directional Statistics ( 20th international conference on Pattern Recognition: ICPR 2010 - Istanbul, Turkey - 23-26 August 2010) ( - 2010 20th international conference on Pattern Recognition: ICPR 2010 ) (IEEE Los Alamitos (CA) USA ) - pp. 4275-4278 ISBN: 9780769541099 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

This paper presents a method for comparing people trajectories for video surveillance applications, based on semi-directional statistics. Modelling a trajectory as a sequence of angles, speeds and time lags requires a statistical tool capable of jointly considering periodic and linear variables. Our statistical method is compared with two state-of-the-art methods.

C. Grana; D. Borghesani; G. Gualdi; R. Cucchiara ( 2010 ) - Bag-Of-Words Classification of Miniature Illustrations ( 11th International Workshop on Image Analysis for Multimedia Interactive Services - Desenzano del Garda, Brescia, Italy - Apr 12-14) ( - Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services ) (IEEE Computer Society Press Los Alamitos, CA USA ) - pp. 61-64 ISBN: 9781424478484 [Contribution in conference proceedings (273) - Paper in conference proceedings]
Abstract

In this paper a system for the analysis of illuminated manuscript images is presented. In particular the bag-of-keypoints strategy, commonly adopted for object recognition, image classification and scene recognition, is applied to the classification of automatically extracted miniatures. Pictures are characterized by SURF descriptors, and a classification procedure is performed, comparing the results of Naive Bayes and histogram intersection distance measures.
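For readers unfamiliar with the bag-of-keypoints representation, the following sketch shows how a normalized visual-word histogram can be built and how two histograms are compared by histogram intersection. It assumes the SURF descriptors have already been quantized into integer visual-word ids against a codebook, a step that is not shown; the function names, vocabulary size, and word ids are invented for illustration and do not come from the paper.

```python
from collections import Counter


def bag_of_words_histogram(word_ids, vocabulary_size):
    """Normalized histogram of visual-word occurrences for one image."""
    total = len(word_ids)
    if total == 0:
        return [0.0] * vocabulary_size
    counts = Counter(word_ids)
    return [counts.get(w, 0) / total for w in range(vocabulary_size)]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical visual-word ids for two miniatures, with a 4-word vocabulary.
image_a = bag_of_words_histogram([0, 0, 1, 2], vocabulary_size=4)
image_b = bag_of_words_histogram([0, 1, 1, 3], vocabulary_size=4)
similarity = histogram_intersection(image_a, image_b)
```

Histogram intersection is a similarity (higher means closer), so a nearest-neighbour style classifier would pick the class whose reference histogram gives the largest value.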

C. Grana; D. Borghesani; R. Cucchiara ( 2010 ) - Decision Trees for Fast Thinning Algorithms ( 20th International Conference on Pattern Recognition - Istanbul, Turkey - Aug 23-27) ( - Proceedings of the 20th International Conference on Pattern Recognition - ICPR 2010 ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 2836 a 2839 ISBN: 9780769541099 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

We propose a new efficient approach for neighborhood exploration, optimized with decision tables and decision trees, suitable for local algorithms in image processing. In this work, it is employed to speed up two widely used thinning techniques. The performance gain is shown over a large freely available dataset of scanned document images.

R. Vezzani; R. Cucchiara ( 2010 ) - Event Driven Software Architecture for Multi-camera and Distributed Surveillance Research Systems ( First IEEE Workshop on Camera Networks - CVPRW - San Francisco - 13-18 June 2010) ( - Proceedings of Computer Vision and Pattern Recognition Workshops ) (IEEE Computer Society Press Washington DC USA ) - pp. da 1 a 8 ISBN: 9781424470297 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Surveillance of wide areas with several connected cameras integrated in the same automatic system is no longer a chimera, but modular, scalable and flexible architectures are mandatory to manage them. This paper points out the main issues in the development of distributed surveillance systems and proposes an integrated framework particularly suitable for research purposes. First, exploiting a computer architecture analogy, a three-layer tracking system is proposed, which copes with the integration of both overlapping and non-overlapping cameras. Then, a static service-oriented architecture is adopted to collect and manage the plethora of high-level modules, such as face detection and recognition, posture and action classification, and so on. Finally, the overall architecture is controlled by an event-driven communication infrastructure, which assures the scalability and the flexibility of the system.

Davide Baltieri; Roberto Vezzani; Rita Cucchiara ( 2010 ) - Fast Background Initialization with Recursive Hadamard Transform ( IEEE International Conference on Advanced Video and Signal Based Surveillance AVSS 2010 - Boston, Massachusetts, USA - 29 August-1 September 2010) ( - Proceedings of IEEE International Conference on Advanced Video and Signal Based Surveillance AVSS 2010 ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 165 a 171 ISBN: 9780769542645 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present a new and fast technique for background estimation from cluttered image sequences. Most of the background initialization approaches developed so far collect a number of initial frames and then require a slow estimation step which introduces a delay whenever it is applied. Conversely, the proposed technique redistributes the computational load among all the frames by means of a patch by patch preprocessing, which makes the overall algorithm more suitable for real-time applications. For each patch location a prototype set is created and maintained. The background is then iteratively estimated by choosing from each set the most appropriate candidate patch, which should verify a sort of frequency coherence with its neighbors. To this aim, the Hadamard transform has been adopted, which requires less computation time than the commonly used DCT. Finally, a refinement step exploits spatial continuity constraints along the patch borders to prevent erroneous patch selections. The approach has been compared with the state of the art on videos from available datasets (ViSOR and CAVIAR), showing a speed up of about 10 times and an improved accuracy.
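As a rough illustration of why the Hadamard transform is computationally cheap, the sketch below implements a standard fast Walsh-Hadamard transform on a one-dimensional signal using only additions and subtractions. It is a generic textbook version, not the recursive patch-based procedure of the paper; the function name `fwht` and the toy input are assumptions introduced here.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (unnormalized, natural ordering).

    Hypothetical helper for illustration only. The input length must be a
    power of two; only additions and subtractions are used.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine pairs of elements that are h positions apart (butterfly step).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


signal = [1, 0, 1, 0]           # toy data, not taken from the paper
coefficients = fwht(signal)     # [2, 2, 0, 0]
restored = [c // len(signal) for c in fwht(coefficients)]  # inverse = forward / n
```

Applying the transform twice and dividing by the length recovers the input, which is one reason the transform is convenient for patch comparisons.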

G. Monti; A. Prati; P. Piccinini; R. Cucchiara ( 2010 ) - Feature-based segmentation method, for segmenting a plurality of loosely-arranged duplicate articles and a group for actuating the method for supplying a packaging machine [Brevetto (285) - Brevetto]
Abstract

The invention relates to a segmentation method based on the characteristics for segmenting a plurality of duplicate articles (3) arranged loosely, which comprises stages of: acquiring an image (M) of a sample article (30); calculating keypoint-descriptors of the image (M); defining an identifying figure (Z) on the image (M); acquiring a first image (11) of a plurality of duplicate articles; performing a matching of the thus-defined keypoint-descriptor pairs; acquiring a position and an orientation of the identifying figure (Z) with respect to a first keypoint-descriptor pair of the image (M) having a match with a second keypoint-descriptor pair of the first image (11); defining, in the first image (11), an identifying figure of projection as a Euclidean transformation of the identifying figure (Z), with reference to the first and second pairs; applying the two preceding stages to a plurality of keypoint-descriptor pairs of the image (M) having a match with a keypoint-descriptor pair of the first image (11); collecting together identifying figures of projection having between them a predetermined degree of superposing; defining a representative figure for each group of identifying figures of projection which is formed by a minimum predetermined number of identifying figures of projection, which representative figure has a same shape and dimension as an identifying figure of projection, and is selected in order to estimate a position of a corresponding article illustrated in the first image of a plurality of duplicate articles. The invention also relates to a method for picking up articles (3) arranged loosely in a storage zone of articles (5) and for positioning the articles (3) in an outlet station (SU), and a group for actuating the method.

C. Grana; D. Borghesani; P. Santinelli; R. Cucchiara ( 2010 ) - High Performance Connected Components Labeling on FPGA ( First International Workshop Interactive Multimodal Pattern Recognition in Embedded Systems - Bilbao, Spain - Sep 1) ( - 2010 Workshops on Database and Expert Systems Applications ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 221 a 225 ISBN: 9780769541747 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a comparison of the two most advanced algorithms for connected components labeling, highlighting how they perform on a soft core SoC architecture based on FPGA. In particular we test our block based connected components labeling algorithm, optimized with decision tables and decision trees. The embedded system is composed of the CMOS image sensor, FPGA, DDR SDRAM, USB controller and SPI Flash. Results highlight the importance of caching, and of instruction and data cache sizes, for high performance image processing tasks.

Roberto Vezzani; Davide Baltieri; Rita Cucchiara ( 2010 ) - HMM Based Action Recognition with Projection Histogram Features ( ICPR 2010 Contests - SDHA2010 - Semantic Description of Human Activities - Istanbul, Turkey - Aug 22, 2010) - LECTURE NOTES IN COMPUTER SCIENCE - n. volume 6388 - pp. da 286 a 293 ISSN: 0302-9743 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Hidden Markov Models (HMM) have been widely used for action recognition, since they make it easy to model the temporal evolution of a single numeric feature, or a set of them, extracted from the data. The selection of the feature set and the related emission probability function are the key issues to be defined. In particular, if the training set is not sufficiently large, a manual or automatic feature selection and reduction is mandatory. In this paper we propose to model the emission probability function as a Mixture of Gaussians, and the feature set is obtained from the projection histograms of the foreground mask. The projection histograms contain the number of moving pixels for each row and for each column of the frame, and they provide sufficient information to infer the instantaneous posture of the person. Then, the HMM framework recovers the temporal evolution of the postures, thereby recognizing the global action. The proposed method has been successfully tested on the UT-Tower and Weizmann datasets.
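The projection histograms mentioned in this abstract can be sketched in a few lines. The example below assumes the foreground mask is given as a list of rows of 0/1 values; the function name `projection_histograms` and the toy mask are hypothetical and only illustrate the row and column counts, not the full feature pipeline of the paper.

```python
def projection_histograms(mask):
    """Return (row_counts, col_counts) for a binary foreground mask.

    mask is a list of equal-length rows; nonzero entries mark moving pixels.
    Illustrative helper, not the authors' implementation.
    """
    if not mask:
        return [], []
    # Number of moving pixels in each row of the frame.
    row_counts = [sum(1 for v in row if v) for row in mask]
    # Number of moving pixels in each column of the frame.
    col_counts = [sum(1 for row in mask if row[c]) for c in range(len(mask[0]))]
    return row_counts, col_counts


toy_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
rows, cols = projection_histograms(toy_mask)
# rows == [2, 2, 4, 1], cols == [1, 4, 3, 1]
```

The two count vectors computed for each frame would then form the observation passed to the HMM.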

C. Grana; D. Borghesani; R. Cucchiara ( 2010 ) - Improving classification and retrieval of illuminated manuscripts with semantic information ( 6th Italian Research Conference on Digital Libraries - Padova, Italy - Jan 28-29) ( - Digital Libraries ) (Springer-Verlag Berlin Heidelberg DEU ) - pp. da 183 a 193 ISBN: 9783642158490 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we detail a proposal for exploiting expert-made commentaries in a unified system for the analysis of illuminated manuscript images. In particular we explore the possibility of improving the automatic segmentation of meaningful pictures, as well as the retrieval-by-similarity search engine, using clusters of keywords extracted from commentaries as semantic information.

S. Calderara; A. Prati; R. Cucchiara ( 2010 ) - Moving pixels in static cameras: detecting dangerous situations due to environment or people ( - Intelligent Multimedia Analysis for Security Applications ) (Springer Eds. New York City, USA USA ) - pp. da 1 a 28 ISBN: 9783642117541 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

Dangerous situations arise in everyday life and many efforts have been devoted to exploiting technology to increase the level of safety in urban areas. Video analysis is one of the most important emerging technologies for security purposes. Automatic video surveillance systems commonly analyze the scene searching for moving objects. Well-known techniques exist to cope with this problem, which is commonly referred to as "change detection". Every time a difference against a reference model is sensed, it should be analyzed to allow the system to discriminate between a usual situation and a possible threat. When the sensor is a camera, motion is the key element to detect changes, and moving objects must be correctly classified according to their nature. In this context we can distinguish between two different kinds of threat that can lead to dangerous situations in a video-surveilled environment. The first is due to environmental changes such as rain, fog or smoke present in the scene. Phenomena of this kind are sensed by the camera as moving pixels and, subsequently, as moving objects in the scene. Threats of this kind share some common characteristics such as texture, shape and color information and can be detected by observing the features' evolution in time. The second situation arises when people are directly responsible for the dangerous situation. In this case a subject is acting in an unusual way, leading to an abnormal situation. From the sensor's point of view, moving pixels are still observed, but specific features and time-dependent statistical models should be adopted to learn and then correctly detect unusual and dangerous behaviors. With these premises, this chapter presents two different case studies.
The first one describes the detection of environmental changes in the observed scene and details the problem of reliably detecting smoke in outdoor environments using both motion information and global image features, such as color information and texture energy computed by means of the Wavelet transform. The second refers to the problem of detecting suspicious or abnormal people behaviors by means of people trajectory analysis in a multiple-camera video-surveillance scenario. Specifically, a technique to infer and learn the concept of normality is proposed jointly with a suitable statistical tool to model and robustly compare people trajectories.

G. Gualdi; A. Prati; R. Cucchiara ( 2010 ) - Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos ( 11th European Conference on Computer Vision (ECCV) - Heraklion, Crete (Greece) - 5-11 September 2010) ( - Lectures Notes in Computer Science ) (Springer-Verlag Berlin DEU ) - n. volume 6316 - pp. da 196 a 209 ISBN: 9783642155666 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Many works address the problem of object detection by means of machine learning with boosted classifiers. They exploit sliding window search, spanning the whole image: the patches, at all possible positions and sizes, are sent to the classifier. Several methods have been proposed to speed up the search (adding complementary features or using specialized hardware). In this paper we propose a statistical-based search approach for object detection which uses a Monte Carlo sampling approach for estimating the likelihood density function with Gaussian kernels. The estimation relies on a multi-stage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifier (i.e. its response). For videos, this approach is plugged in a Bayesian-recursive framework which exploits the temporal coherency of the pedestrians. Several tests on both still images and videos on common datasets are provided in order to demonstrate the relevant speedup and the increased localization accuracy with respect to sliding window strategy using a pedestrian classifier based on covariance descriptors and a cascade of Logitboost classifiers.

R. Cucchiara; M. Fornaciari; A. Prati; P. Santinelli ( 2010 ) - Mutual Calibration of Camera Motes and RFIDs for People Localization and Identification ( 4th ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC) - Atlanta, GA (USA) - 31 August-3 September 2010) ( - Proceedings of 4th ACM/IEEE International Conference on Distributed Smart Cameras ) (ACM New York, New York USA ) - pp. da 1 a 8 ISBN: 9781450303170 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Achieving both localization and identification of people in a wide open area using only cameras can be a challenging task, which involves conflicting requirements: high resolution for identification, but low resolution for wide coverage in localization. Consequently, this paper proposes the joint use of cameras (only devoted to localization) and RFID sensors (devoted to identification) with the final objective of detecting and localizing intruders. To ground the observations on a common coordinate system, a calibration procedure is defined. This procedure only demands a training phase with a single person moving in the scene holding an RFID tag. Although preliminary, the results demonstrate that this calibration is sufficiently accurate to be applied in different scenarios where the area of overlap between the field of view (FoV) of a camera and the "field of sense" (FoS) of a (blind) sensor must be efficiently determined.

C. Grana; D. Borghesani; R. Cucchiara ( 2010 ) - Optimized Block-based Connected Components Labeling with Decision Trees - IEEE TRANSACTIONS ON IMAGE PROCESSING - n. volume 19 (6) - pp. da 1596 a 1609 ISSN: 1057-7149 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper we define a new paradigm for 8-connection labeling, which employs a general approach to improve neighborhood exploration and minimizes the number of memory accesses. Firstly we exploit and extend the decision table formalism introducing OR-decision tables, in which multiple alternative actions are managed. An automatic procedure to synthesize the optimal decision tree from the decision table is used, providing the most effective order for evaluating the conditions. Secondly we propose a new scanning technique that moves on a 2x2 pixel grid over the image, which is optimized by the automatically generated decision tree. An extensive comparison with the state-of-the-art approaches is proposed, both on synthetic and real datasets. The synthetic dataset is composed of random images of different sizes and densities, while the real datasets are an artistic image analysis dataset, a document analysis dataset for text detection and recognition, and finally a standard resolution dataset for picture segmentation tasks. The algorithm provides an impressive speedup over the state-of-the-art algorithms.
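For readers unfamiliar with the task, the sketch below shows a plain two-pass 8-connected labeling with union-find on a binary image stored as a list of rows. It is a baseline stand-in only: it scans pixel by pixel and does not implement the 2x2 block scanning or the OR-decision tables and decision trees proposed in the paper. The name `label_8_connected` and the toy image are assumptions.

```python
def label_8_connected(image):
    """Two-pass 8-connected labeling of a binary image (list of rows).

    Baseline illustration, not the block-based algorithm of the paper.
    Returns (labels, number_of_components); background pixels keep label 0.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbours = []
            # Already-visited 8-neighbours: upper-left, up, upper-right, left.
            for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if ny >= 0 and 0 <= nx < width and labels[ny][nx]:
                    neighbours.append(labels[ny][nx])
            if not neighbours:
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = min(neighbours)
                for n in neighbours:
                    union(labels[y][x], n)

    # Second pass: replace provisional labels with consecutive final labels.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


toy_image = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
toy_labels, toy_count = label_8_connected(toy_image)  # toy_count == 3
```

Each foreground pixel here inspects up to four neighbours individually; reducing such neighbourhood checks and memory accesses is the kind of cost the abstract's decision-tree approach targets.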

S. Calderara; R. Cucchiara ( 2010 ) - People trajectory mining with statistical pattern recognition ( International Workshop on Socially Intelligent Surveillance and Monitoring - San Francisco, USA - June 13 2010) ( - Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on ) (IEEE ED. Washington DC USA ) - pp. da 1 a 8 ISBN: 9781424470297 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The analysis of people's social interactions is a complex and interesting problem that can be approached from several points of view depending on the application context. In video surveillance contexts many indicators of people's habits and relations exist and, among these, the analysis of people trajectories can reveal many aspects of the way people behave in social environments. We propose a statistical framework for trajectory mining that analyzes, in an integrated solution, several aspects of the trajectories such as location, shape and speed properties. Three different models are proposed to deal with non-idealities of the selected features, in conjunction with a robust inexact-matching similarity measure for comparing sequences of different lengths. Experimental results in a real scenario demonstrate the efficacy of the framework in clustering people trajectories with the purpose of analyzing frequent behaviors in complex environments.

G. Gualdi; A. Prati; R. Cucchiara ( 2010 ) - Perspective and Appearance Context for People Surveillance in Open Areas ( 2nd International Workshop on Use of Context in Video Processing (UCVP 2010) - San Francisco (CA), USA - 13 June 2010) ( - Proceedings of 2nd International Workshop on Use of Context in Video Processing (UCVP 2010) ) (IEEE Computer Society Washington, DC, USA USA ) - pp. da 1 a 6 ISBN: 9781424470280 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Contextual information can be used both to reduce computations and to increase accuracy and this paper presents how it can be exploited for people surveillance in terms of perspective (i.e. weak scene calibration) and appearance of the objects of interest (i.e. relevance feedback on the training of a classifier). These techniques are applied to a pedestrian detector that exploits covariance descriptors through a LogitBoost classifier on Riemannian manifolds. The approach has been tested on a construction working site where complexity and dynamics are very high, making human detection a real challenge. The experimental results demonstrate the improvements achieved by the proposed approach.

G. Gualdi; A. Prati; R. Cucchiara ( 2010 ) - Polar Representation of Covariance Descriptors for Circular Features - ELECTRONICS LETTERS - n. volume 46(15) - pp. da 1063 a 1064 ISSN: 0013-5194 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The use of polar representation of covariance descriptors, suitable for the classification of circular feature sets, is proposed. It overcomes the implicit limits of state-of-the-art methods based on axis-oriented rectangular patches. The suitability of the proposed solution is verified on two case studies, namely head detection and polymer classification in photomicrograph contexts.

D. Borghesani; C. Grana; R. Cucchiara ( 2010 ) - Rerum Novarum: Interactive Exploration of Illuminated Manuscripts ( 18th International Conference on Multimedia (ACM Multimedia 2010) - Florence, Italy - Oct 25-29) ( - Proceedings of the 18th International Conference on Multimedia (ACM Multimedia 2010) ) (ACM New York USA ) - pp. da 1621 a 1623 ISBN: 9781605589336 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes an interactive application for the exploration and annotation of illuminated manuscripts, which typically contain thousands of pictures, used to comment or embellish the manuscript Gothic text. The system is composed of a modern user interface for browsing, surfing and querying, an automatic segmentation module, to ease the initial picture extraction task, and a similarity based retrieval engine, used to provide visually assisted tagging capabilities. A relevance feedback procedure is included to further refine the results.

C. Grana; D. Borghesani; R. Cucchiara ( 2010 ) - Surfing on Artistic Documents with Visually Assisted Tagging ( 18th International Conference on Multimedia (ACM Multimedia 2010) - Florence, Italy - Oct 25-29) ( - Proceedings of the 18th International Conference on Multimedia (ACM Multimedia 2010) ) (ACM New York USA ) - pp. da 1343 a 1352 ISBN: 9781605589336 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes a complete architecture for the interactive exploration and annotation of artistic collections. In particular the focus is on Renaissance illuminated manuscripts, which typically contain thousands of pictures, used to comment or embellish the manuscript Gothic text. The final aim is to create a human centered multimedia application allowing non-practitioners to enjoy these masterpieces and expert users to share their knowledge. The system is composed of a modern user interface for browsing, surfing and querying, an automatic segmentation module, to ease the initial picture extraction task, and a similarity based retrieval engine, used to provide visually assisted tagging capabilities. A relevance feedback procedure is included to further refine the results. Experiments are reported regarding the adopted visual features based on covariance matrices and the Mean Shift Feature Space Warping relevance feedback. Finally some hints on the user interface for museum installations are discussed.

N. Bicocchi; M. Lasagni; M. Mamei; A. Prati; R. Cucchiara; F. Zambonelli ( 2010 ) - Unsupervised Learning in Body-area Networks ( International ICST Conference on Body Area Networks - Corfu Island - September 10-12, 2010) ( - International ICST Conference on Body Area Networks ) (ICST - Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering Begijnhoflaan BEL ) - pp. da 164 a 170 ISBN: 9789639995017 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Pattern recognition is becoming a key application in body-area networks. This paper presents a framework promoting unsupervised training for multi-modal, multi-sensor classification systems. Specifically, it enables sensors provided with pattern-recognition capabilities to autonomously supervise the learning process of other sensors. The approach is discussed using a case study combining a smart camera and a body-worn accelerometer. The body-worn accelerometer sensor is trained to recognize four user activities by pairing accelerometer data with labels coming from the camera. Experimental results illustrate the applicability of the approach in different conditions.

R. Vezzani; R. Cucchiara ( 2010 ) - Video Surveillance Online Repository (ViSOR): an integrated framework - MULTIMEDIA TOOLS AND APPLICATIONS - n. volume 50 - pp. da 359 a 380 ISSN: 1380-7501 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The availability of new techniques and tools for Video Surveillance and the capability of storing huge amounts of visual data acquired by hundreds of cameras every day call for a convergence between pattern recognition, computer vision and multimedia paradigms. A clear need for this convergence is shown by new research projects which attempt to exploit both ontology-based retrieval and video analysis techniques also in the field of surveillance. This paper presents the ViSOR (Video Surveillance Online Repository) framework, designed with the aim of establishing an open platform for collecting, annotating, retrieving, and sharing surveillance videos, as well as evaluating the performance of automatic surveillance systems. Annotations are based on a reference ontology which has been defined integrating hundreds of concepts, some of them coming from the LSCOM and MediaMill ontologies. A new annotation classification schema is also provided, which is aimed at identifying the spatial, temporal and domain detail level used. The ViSOR web interface allows video browsing, querying by annotated concepts or by keywords, compressed video previewing, media downloading and uploading. Finally, ViSOR includes a performance evaluation desk which can be used to compare different annotations.

Davide Baltieri; Roberto Vezzani; Rita Cucchiara ( 2010 ) - 3D Body Model Construction and Matching for Real Time People Re-Identification ( Italian Chapter Conference - Genova - 18-19 Nov 2010) ( - Italian Chapter Conference ) (Eurographics Association Genova ITA ) - pp. da 65 a 71 ISBN: 9783905673807 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Wide area video surveillance always requires extracting and integrating information coming from different cameras and views. Re-identification of people captured from different cameras or different views is one of the most challenging problems. In this paper, we present a novel approach for people matching with vertices-based 3D human models. People are detected and tracked in each calibrated camera, and their silhouette, appearance, position and orientation are extracted and used to place, scale and orient a 3D body model. Colour features are computed from the 2D appearance images and mapped to the 3D model vertices, generating the 3D model for each tracked person. A distance function between 3D models is defined in order to find matches among models belonging to the same person. This approach achieves robustness against partial occlusions, pose and viewpoint changes. A first experimental evaluation is conducted using images extracted from a real camera set-up.

P. Piccinini; A. Prati; R. Cucchiara ( 2009 ) - A Fast Multi-model Approach for Object Duplicate Extraction ( Ninth IEEE Computer Society Workshop on Application of Computer Vision (WACV 2009) - Snowbird, UT (USA) - 7-8 December 2009) ( - Proceedings of Ninth IEEE Computer Society Workshop on Application of Computer Vision (WACV 2009) ) (IEEE Washington, DC (USA) USA ) - pp. da 106 a 111 ISBN: 9781424454969 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents an innovative approach for localizing and segmenting duplicate objects for industrial applications. The working conditions are challenging, with complex heavily-occluded objects, arranged at random in the scene. To account for high flexibility and processing speed, this approach exploits SIFT keypoint extraction and mean shift clustering to efficiently partition the correspondences between the object model and the duplicates onto the different object instances. The re-projection (by means of a Euclidean transform) of some delimiting points onto the current image is used to segment the object shapes. This procedure is compared in terms of accuracy with existing homography-based solutions which make use of RANSAC to eliminate outliers in the homography estimation. Moreover, in order to improve the extraction in the case of reflective or transparent objects, multiple object models are used and fused together. Experimental results on different and challenging kinds of objects are reported.

S. Calderara; C. Alaimo; A. Prati; R. Cucchiara ( 2009 ) - A Real-Time System for Abnormal Path Detection ( 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP-09) - London (UK) - 3 December 2009) ( - Proceedings of 3rd International Conference on Imaging for Crime Detection and Prevention (ICDP-09) ) (IET Stevenage Herts GBR ) - pp. da 1 a 6 ISBN: 9781849192071 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a real-time system capable of extracting and modeling object trajectories from a multi-camera setup with the aim of identifying abnormal paths. The trajectories are modeled as a sequence of positional distributions (2D Gaussians) and clustered in the training phase by exploiting an innovative distance measure based on a global alignment technique and Bhattacharyya distance between Gaussians. An on-line classification procedure is proposed in order to classify new trajectories on the fly as either “normal” or “abnormal” (in the sense of rarely seen before, thus unusual and potentially interesting). Experiments on a real scenario will be presented.
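The Bhattacharyya distance between two Gaussians has a closed form, sketched below for the 2D case. Only this standard formula is shown; the global alignment step and the way the paper combines both into a trajectory distance are not reproduced. The function name `bhattacharyya_2d` and the sample means and covariances are hypothetical.

```python
import math


def bhattacharyya_2d(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two 2D Gaussians.

    mu1, mu2 are 2-element sequences; cov1, cov2 are 2x2 nested lists
    (assumed symmetric positive definite). Illustrative helper only.
    """
    def det(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    # Average of the two covariance matrices.
    s = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    det_s = det(s)
    # Closed-form inverse of the 2x2 averaged covariance.
    inv = [[s[1][1] / det_s, -s[0][1] / det_s],
           [-s[1][0] / det_s, s[0][0] / det_s]]
    d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # Mahalanobis-like term between the two means.
    mean_term = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2)) / 8.0
    # Term penalizing differences between the covariances.
    cov_term = 0.5 * math.log(det_s / math.sqrt(det(cov1) * det(cov2)))
    return mean_term + cov_term


identity = [[1.0, 0.0], [0.0, 1.0]]
same = bhattacharyya_2d((0.0, 0.0), identity, (0.0, 0.0), identity)     # 0.0
shifted = bhattacharyya_2d((0.0, 0.0), identity, (2.0, 0.0), identity)  # 0.5
```

With equal covariances the second term vanishes and the distance reduces to one eighth of the squared Mahalanobis distance between the means.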

Vezzani, Roberto; Piccardi, Massimo; Cucchiara, Rita ( 2009 ) - An efficient Bayesian framework for on-line action recognition ( IEEE International Conference on Image Processing - Cairo, Egypt - November 7-11, 2009) ( - Proceedings of the IEEE International Conference on Image Processing ) (IEEE Signal Processing Society Piscataway USA ) - pp. da 3553 a 3556 ISBN: 9781424456536 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

On-line action recognition from a continuous stream of actions is still an open problem with fewer solutions proposed compared to time-segmented action recognition. The most challenging task is to classify the current action while finding its time boundaries at the same time. In this paper we propose an approach capable of performing on-line action segmentation and recognition by means of batteries of HMMs, taking into account all the possible time boundaries and action classes. A suitable Bayesian normalization is applied to make observation sequences of different length comparable, and computational optimizations are introduced to achieve real-time performance. Results on a well-known action dataset prove the efficacy of the proposed method.

C. Grana; D. Borghesani; R. Cucchiara ( 2009 ) - Automatic Analysis of Historical Manuscripts ( 9th International Workshop on Pattern Recognition in Information Systems (PRIS 2009) - Milano - May 7) ( - Pattern Recognition in Information Systems ) (INSTICC Press Lisbona PRT ) - pp. da 93 a 102 ISBN: 9789898111890 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper a document analysis tool for historical manuscripts is proposed. The goal is to automatically segment layout components of the page, that is text, pictures and decorations. We specifically focused on the pictures, proposing a set of visual features able to identify significant pictures and separate them from all the floral and abstract decorations. The analysis is performed by blocks using a limited set of color and texture features, including a new texture descriptor particularly effective for this task, namely Gradient Spatial Dependency Matrix. The feature vectors are processed by an embedding procedure which allows increased performance in later SVM classification.

D. Borghesani; C. Grana; R. Cucchiara ( 2009 ) - Color features performance comparison for image retrieval ( 15th International Conference on Image Analysis and Processing - Vietri sul Mare, Salerno, Italy - Sep 8-11) ( - Image Analysis and Processing - ICIAP 2009 ) (Springer Heidelberg DEU ) - n. volume LNCS 5716 - pp. da 902 a 910 ISBN: 9783642041457 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a comparison of color features for image retrieval. In particular the UCID image database has been employed to compare the retrieval capabilities of different color descriptors. The set of descriptors comprises global and spatially related features, and the tests show that HSV based global features provide the best performance at varying brightness and contrast settings.

C. Grana; D. Borghesani; R. Cucchiara ( 2009 ) - Connected component labeling techniques on modern architectures ( 15th International Conference on Image Analysis and Processing - Vietri sul Mare, Salerno, Italy - Sep 8-11) ( - Image Analysis and Processing - ICIAP 2009 ) (Springer Heidelberg DEU ) - n. volume 5716 - pp. da 816 a 824 ISBN: 9783642041457 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present an overview of the historical evolution of connected component labeling algorithms, and in particular the ones applied on images stored in raster scan order. This brief survey aims at providing a comprehensive comparison of their performance on modern architectures, since the high availability of memory and the presence of caches make some solutions more suitable and faster. Moreover we propose a new strategy for label propagation based on 2x2 blocks, which improves the performance of many existing algorithms. The tests are conducted on high resolution images obtained from digitized historical manuscripts, and a set of transformations is applied in order to show the algorithms' behavior at different image resolutions and with a varying number of labels.

G. Gualdi; A. Prati; R. Cucchiara ( 2009 ) - Covariance Descriptors on Moving Regions for Human Detection in Very Complex Outdoor Scenes ( Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2009) - Como, Italy - 30 Aug-2 Sept, 2009) ( - Proceedings of Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2009) ) (IEEE Washington, DC (USA) USA ) - pp. da 1 a 8 ISBN: 9781424446209 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The detection of humans in very complex scenes can be very challenging, due to the performance degradation of classical motion detection and tracking approaches. An alternative approach is the detection of human-like patterns over the whole image. The present paper follows this line by extending Tuzel et al.’s technique [1] based on covariance descriptors and LogitBoost algorithm applied over Riemannian manifolds. Our proposal represents a significant extension of it by: (a) exploiting motion information to focus the attention over areas in which motion is present or was present in the recent past; (b) enriching the human classifier by additional, dedicated cascades trained on positive and negative samples taken from the specific scene; (c) using a rough estimation of the scene perspective, to reduce false detections and improve system performance. This approach is suitable in multi-camera scenarios, since the monolithic block for human-detection remains the same for the whole system, whereas the parameter tuning and set-up of the three proposed extensions (the only camera-dependent parts of the system), are automatically computed for each camera. The approach has been tested on a construction working site in which complexity and dynamics are very high, making human detection a real challenge. The experimental results demonstrate the improvements achieved by the proposed approach.

Bertini, Marco; Del Bimbo, Alberto; Serra, Giuseppe; Torniai, Carlo; Cucchiara, Rita; Grana, Costantino; Vezzani, Roberto ( 2009 ) - Dynamic Pictorially Enriched Ontologies for Video Digital Libraries - IEEE MULTIMEDIA - n. volume 16 (2) - pp. da 42 a 51 ISSN: 1070-986X [Articolo in rivista (262) - Articolo su rivista]
Abstract

This article presents a framework for automatic semantic annotation of video streams with an ontology that includes concepts expressed using linguistic terms and visual data.

R. Serra; R. Cucchiara ( 2009 ) - Emergent perspectives in artificial intelligence (Springer Heidelberg DEU ) - pp. da 1 a 508 ISBN: 9783642102905 [Monografia o trattato scientifico (276) - Monografia/Trattato scientifico]
Abstract

Proceedings of the XIth International Conference on Artificial Intelligence

C. Grana; D. Borghesani; R. Cucchiara ( 2009 ) - Fast Block Based Connected Components Labeling ( IEEE International Conference on Image Processing - Cairo, Egypt - Nov 7-12) ( - Proceedings of the IEEE International Conference on Image Processing ) (Conference Management Services, Inc. Bryan, Texas USA ) - pp. da 4061 a 4064 ISBN: 9781424456536; 9781424456550 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a new optimization technique for the neighborhood computation in connected component labeling focused on images stored in raster scan order. This new technique is based on a 2x2 square block analysis of the image, and it exploits the fact that, when using 8-connection, the pixels of a 2x2 square are all connected to each other. This implies that they will share the same label at the end of the computation. To prove the effectiveness of our proposal, we show a comprehensive comparison of the most used and advanced connected components labeling techniques presented so far. The tests are conducted on high resolution images obtained from digitized historical manuscripts, and a set of transformations is applied in order to show the algorithms' behavior at different image resolutions and with a varying number of labels.
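
The 2x2 block property mentioned in this abstract can be checked with a short sketch. The code below is only an illustration under stated assumptions: `all_pairs_connected` and `label_8` are hypothetical names, and `label_8` is a plain union-find labeling with 8-connectivity, not the block-based optimization proposed in the paper.

```python
from itertools import combinations


def are_8_neighbors(p, q):
    """Two distinct pixels are 8-connected if they differ by at most 1 in each coordinate."""
    return p != q and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1


def all_pairs_connected(r, c):
    """Check that every pair of pixels in the 2x2 block with top-left corner (r, c) is 8-connected."""
    block = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
    return all(are_8_neighbors(p, q) for p, q in combinations(block, 2))


def label_8(image):
    """Plain union-find labeling of foreground pixels with 8-connectivity (illustrative baseline)."""
    rows, cols = len(image), len(image[0])
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            parent[(r, c)] = (r, c)
            # Neighbors already visited in raster scan order: W, NW, N, NE.
            for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc]:
                    union((r, c), (nr, nc))
    # Each foreground pixel is mapped to the representative pixel of its component.
    return {p: find(p) for p in parent}
```

Because all four pixels of a block are mutual neighbors, any foreground pixels that fall in the same block necessarily receive the same label, which is what allows a labeling pass to reason per block rather than per pixel.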

Calderara, Simone; Prati, Andrea; Cucchiara, Rita ( 2009 ) - Learning People Trajectories using Semi-directional Statistics ( Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance - Genoa, Italy - 2-4 September 2009) ( - Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance ) (IEEE Danvers (MA) USA ) - pp. da 213 a 218 ISBN: 9780769537184 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a system for people trajectory shape analysis by exploiting a statistical approach which accounts for sequences of both directional (the directions of the trajectory) and linear (the speeds) data. A semi-directional distribution (AWLG - Approximated Wrapped and Linear Gaussian) is used with a mixture to find main directions and speeds. A variational version of the mutual information criterion is proposed to prove the statistical dependency of the data. Then, in order to compare data sequences, we define an inexact method with a Kullback-Leibler-based distance measure and employ a global alignment technique to handle sequences of different lengths and with local shifts or deformations. A comprehensive analysis of variable dependency and parameter estimation techniques is reported and evaluated on both synthetic and real data sets.

P. Piccinini; A. Prati; R. Cucchiara ( 2009 ) - METODO DI SEGMENTAZIONE BASATO SULLE CARATTERISTICHE PER SEGMENTARE UNA PLURALITA’ DI ARTICOLI DUPLICATI DISPOSTI ALLA RINFUSA E GRUPPO CHE ATTUA TALE METODO PER ALIMENTARE UNA MACCHINA CONFEZIONATRICE [Brevetto (285) - Brevetto]
Abstract

A feature-based segmentation method is disclosed for segmenting a plurality of duplicate articles (3) arranged in bulk, comprising the steps of: acquiring an image (M) of a sample article (30); computing keypoint-descriptor pairs of the image (M); defining an identifying figure (Z) on the image (M); acquiring a first image (I1) of a plurality of articles; computing keypoint-descriptor pairs of the first image (I1); matching the keypoint-descriptor pairs thus defined; acquiring the position and relative orientation of the identifying figure (Z) with respect to a first keypoint-descriptor pair of the image (M) that has a match with a second keypoint-descriptor pair of the first image (I1); defining in the first image (I1) a projected identifying figure as a Euclidean transformation of the identifying figure (Z) with reference to said first and second pairs; applying the two preceding steps to a plurality of keypoint-descriptor pairs of the image (M) that have a match with a keypoint-descriptor pair of the first image (I1); grouping together projected identifying figures that have a predetermined degree of overlap with one another; defining a representative figure for each group of projected identifying figures that is formed by a predetermined minimum number of projected identifying figures, the representative figure having the same shape and dimensions as a projected identifying figure and being chosen to estimate the position of a corresponding article shown in the first image (I1). Also disclosed are a method for picking articles (3) arranged in bulk in an article accumulation area (5) and placing such articles (3) in an output station (SU), and a unit that implements such a method.

P. Piccinini; A. Prati; R. Cucchiara ( 2009 ) - Multiple Object Segmentation for Pick-and-Place Applications ( IAPR CONFERENCE ON MACHINE VISION APPLICATIONS MVA2009 - Yokohama, Japan - May 20-22, 2009) ( - Proceedings of IAPR CONFERENCE ON MACHINE VISION APPLICATIONS MVA2009 ) (MVA Conference Committee Yokohama, Japan JPN ) - pp. da 361 a 366 ISBN: 9784901122092 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents a novel approach for detecting multiple instances of the same object for pick-and-place automation. The working conditions are very challenging, with complex objects, arranged at random in the scene, and heavily occluded. This approach exploits SIFT to obtain a set of correspondences between the object model and the current image. In order to segment the multiple instances of the object, the correspondences are clustered among the objects using a voting scheme which determines the best estimate of the object’s center through mean shift. This procedure is compared in terms of accuracy with existing homography-based solutions which make use of RANSAC to eliminate outliers in the homography estimation.
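
As a rough illustration of the voting step, the sketch below runs a Gaussian-kernel mean shift over 2D center votes and returns one mode per object instance. It assumes each matched correspondence has already cast a vote for the object center; the function name `mean_shift_modes`, the kernel choice, and the merging rule are assumptions made for this example, not details taken from the paper.

```python
import math


def mean_shift_modes(votes, bandwidth, iters=100, tol=1e-6):
    """Move a copy of every vote uphill on a Gaussian kernel density and merge the results."""
    points = [tuple(v) for v in votes]
    for _ in range(iters):
        shifted = []
        for px, py in points:
            sw = sx = sy = 0.0
            for vx, vy in votes:
                d2 = (px - vx) ** 2 + (py - vy) ** 2
                w = math.exp(-d2 / (2.0 * bandwidth ** 2))
                sw += w
                sx += w * vx
                sy += w * vy
            shifted.append((sx / sw, sy / sw))  # kernel-weighted mean of the votes
        moved = max(math.dist(a, b) for a, b in zip(points, shifted))
        points = shifted
        if moved < tol:
            break
    # Converged points closer than one bandwidth are treated as the same mode.
    modes = []
    for p in points:
        if all(math.dist(p, m) >= bandwidth for m in modes):
            modes.append(p)
    return modes


# Hypothetical votes for two object instances.
votes = [(9, 10), (10, 9), (11, 10), (10, 11), (49, 40), (50, 39), (51, 40), (50, 41)]
centers = mean_shift_modes(votes, bandwidth=5.0)
```

Each returned mode is a candidate object center; the correspondences whose votes converge to the same mode can then be grouped as one instance.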

R. Vezzani; D. Baltieri; R. Cucchiara ( 2009 ) - Pathnodes integration of standalone Particle Filters for people tracking on distributed surveillance systems ( International Conference on Image Analysis and Processing – ICIAP 2009 - Vietri Sul Mare, Salerno - 8 - 11 Settembre 2009) ( - Lecture Notes In Computer Science - Image Analysis and Processing – ICIAP 2009 ) (Springer-Verlag Berlin DEU ) - n. volume 5716/2009 - pp. da 404 a 413 ISBN: 9783642041457 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present a new approach to object tracking based on batteries of particle filters working in multi-camera systems with non-overlapping fields of view. In each view the moving objects are tracked with independent particle filters; each filter exploits a likelihood function based on both color and motion information. The consistent labeling of people exiting from one camera's field of view and entering a neighboring one is obtained by sharing particle information for the initialization of new filtering trackers. The information exchange algorithm is based on path-nodes, which are a graph-based scene representation usually adopted in computer graphics. The approach has been tested even in case of simultaneous transitions, occlusions, and groups of people. Promising results, obtained using a real setup of non-overlapping cameras, are presented.

C. Grana; D. Borghesani; R. Cucchiara ( 2009 ) - Picture Extraction from Digitized Historical Manuscripts ( ACM International Conference on Image and Video Retrieval - Santorini, Greece - Jul 8-10) ( - Proceedings of ACM International Conference on Image and Video Retrieval (CIVR2009) ) (ACM Press New York USA ) - pp. da 169 a 176 ISBN: 9781605584805 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work we propose a system for automatic document segmentation to extract graphical elements from historical manuscripts and then to identify significant pictures from them, removing floral and abstract decorations. The system performs a block based analysis by means of color and texture features. The Gradient Spatial Dependency Matrix, a new texture operator particularly effective for this task, is proposed. The feature vectors are processed by an embedding procedure which allows increased performance in later SVM classification. Results for both feature extraction and embedding based classification are reported, supporting the effectiveness of the proposal.

S. Calderara; R. Cucchiara; A. Prati; R. Vezzani ( 2009 ) - Statistical Pattern Recognition for Multi-Camera Detection, Tracking and Trajectory Analysis ( - Multi-Camera Networks: Concepts and Applications ) (Academic Press Burlington, MA (USA) USA ) - pp. da 389 a 414 ISBN: 978 0 12 374633 7 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

This chapter addresses most aspects of modern video surveillance with reference to the research activity conducted at the University of Modena and Reggio Emilia, Italy, within the scope of the national FREE SURF (FREE SUrveillance in a pRivacy-respectFul way) and NATO-funded BE SAFE (Behavioral lEarning in Surveilled Areas with Feature Extraction) projects. Moving object detection and tracking from a single camera, multi-camera consistent labeling and trajectory shape analysis for path classification are the main topics of this chapter.

A. PRATI; R. CUCCHIARA ( 2009 ) - Video Analysis for Ambient intelligence in Urban Environments ( - Intelligent Environments: Methods, Algorithms and Applications ) (Springer Heidelberg DEU ) [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

Ambient Intelligence (AmI) is an emerging field of research that comprises new paradigms, techniques and systems for intelligent processing of distributed sensing. A challenging arena for the AmI framework is represented by urban environments, which are characterized by high complexity, numerous sources of data, and the spreading of interesting and non-trivial applications. In this context, the project LAICA (Laboratory of Ambient Intelligence for a friendly city) represents a real experiment on the usefulness of AmI for advanced services to citizens. This chapter addresses video analysis solutions that can be directly applied in urban AmI. It describes in detail the uniqueness of the LAICA approach, focusing in particular on the use of computer vision techniques for monitoring public parks. People surveillance and web-based video broadcasting are taken into account.

Calderara, Simone; Prati, Andrea; Cucchiara, Rita ( 2009 ) - Video surveillance and multimedia forensics: an application to trajectory analysis ( First ACM Workshop on Multimedia in Forensics - Beijing, China - 19-24 October 2009) ( - Proceedings of the First ACM Workshop on Multimedia in Forensics ) (ACM New York, USA USA ) - pp. da 13 a 18 ISBN: 9781605587554 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper reports an application of trajectory analysis in which forensics and video surveillance techniques are jointly employed to provide a new tool for multimedia forensics. Advanced video surveillance techniques are used to extract from a multi-camera system the trajectories of the moving people, which are then modelled by either their positions (projected on the ground plane) or their directions of movement. Both representations can be very suitable for querying large video repositories, by searching for similar trajectories in terms of either sequences of positions or trajectory shape (encoded as a sequence of angles, where positions do not matter). Preliminary examples of the possible use of this approach are shown.
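
The shape encoding mentioned in this abstract can be sketched as follows. This is a minimal example, assuming a trajectory is given as a list of ground-plane (x, y) positions; the function name `direction_sequence` is hypothetical and the paper's actual encoding may differ.

```python
import math


def direction_sequence(positions):
    """Angles in radians, within [-pi, pi], of the steps between consecutive positions."""
    angles = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # a stationary step has no defined direction
        angles.append(math.atan2(dy, dx))
    return angles


path = [(0, 0), (1, 0), (1, 1)]
shifted = [(x + 5, y - 3) for x, y in path]
```

Since only differences between consecutive positions are used, `path` and `shifted` yield the same angle sequence, which is the sense in which absolute positions are irrelevant for shape-based queries.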

S. Calderara; A. Prati; R. Cucchiara ( 2008 ) - A Markerless Approach for Consistent Action Recognition in a Multi-camera System ( ACM/IEEE International Conference on Distributed Smart Cameras - Stanford, CA, USA - 7-10 September 2008) ( - Proceedings of ICDSC 2008 ) (IEEE Computer Society Washington, DC, USA USA ) - pp. da 1 a 8 ISBN: 9781424426645 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents a method for recognizing human actions in a multi-camera setup. The proposed method automatically extracts significant points on the human body, without the need for artificial markers. A sophisticated appearance-based tracker able to cope with occlusions is exploited to extract a probability map for each moving object. A segmentation technique based on a mixture of Gaussians is then employed to extract and track significant points on this map, corresponding to significant regions of the human silhouette. The point tracking produces a set of 3D trajectories that are compared with other trajectories by means of global alignment and dynamic programming techniques. Preliminary experiments show the potential of the proposed approach.

Calderara, S.; Cucchiara, R.; Prati, A. ( 2008 ) - Action Signature: a Novel Holistic Representation for Action Recognition ( IEEE International Conference on Advanced Video and Signal Based Surveillance - Santa Fè (NM) - 1-3 September 2008) ( - AVSS 2008 : IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance ) (IEEE Danvers (MA) USA ) - pp. da 121 a 128 ISBN: 9780769533414 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Recognizing different actions with a unique approach can be a difficult task. This paper proposes a novel holistic representation of actions that we called "action signature". This 1D trajectory is obtained by parsing the 2D image containing the orientations of the gradient calculated on the motion feature map called motion-history image. In this way, the trajectory is a sketch representation of how the object motion varies in time. A robust statistical framework based on mixtures of von Mises distributions and dynamic programming for sequence alignment are used to compare and classify actions/trajectories. The experimental results show a rather high accuracy in distinguishing quite complicated actions, such as drinking, jumping, or abandoning an object.
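
For readers unfamiliar with the statistical model named here, the following sketch evaluates the density of a mixture of von Mises distributions over angles. It is a textbook formulation written for illustration, with hypothetical function names and made-up component parameters; it does not reproduce the paper's fitting or classification procedure.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_pdf(theta, components):
    """components: list of (weight, mu, kappa) tuples whose weights sum to 1."""
    return sum(w * von_mises_pdf(theta, mu, kappa) for w, mu, kappa in components)


# Hypothetical two-component mixture: a sharp mode at 0 and a broader one at pi.
components = [(0.6, 0.0, 4.0), (0.4, math.pi, 2.0)]
density_at_zero = mixture_pdf(0.0, components)
```

The series used for `bessel_i0` converges quickly for the moderate concentrations used here; a library implementation would be preferable for very large `kappa`.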

R. Vezzani; R. Cucchiara ( 2008 ) - AD-HOC: Appearance Driven Human tracking with Occlusion Handling ( First International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences - Leeds, UK - 5 September 2008) ( - Proceedings of THEMIS 2008 ) (J. Gonzàlez, T.B. Moeslund, L. Wang - ESP ) - n. volume 1 - pp. da 9 a 18 ISBN: 9788493525194 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

AD-HOC copes with the problem of multiple people tracking in video surveillance in the presence of large occlusions. The main novelty is the adoption of an appearance-based approach in a formal Bayesian framework: the status of each object is defined at pixel level, where each pixel is characterized by the appearance, i.e. the color (integrated over time) and the likelihood of belonging to the object. With these data at pixel level and a probability of non-occlusion at object level, the problem of occlusions is addressed. The method does not only aim at detecting the presence of an occlusion, but also classifies the type of occlusion at a sub-region level and evolves the status of the object in a selective way. The AD-HOC tracker has been tested in many applications for indoor and outdoor surveillance. Results on the PETS2006 test set are reported, where many people and abandoned objects are detected and tracked.

Roberto Vezzani; Rita Cucchiara ( 2008 ) - Annotation Collection and Online Performance Evaluation for Video Surveillance: the ViSOR Project ( 5th IEEE International Conference On Advanced Video and Signal Based Surveillance - Santa Fe, New Mexico - 1-3 september 2008) ( - Proceedings of AVSS2008 ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 1 - pp. da 227 a 234 ISBN: 9780769533414 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents the ViSOR (VIdeo Surveillance Online Repository) project, designed with the aim of establishing an open platform for collecting, annotating, retrieving, and sharing surveillance videos, and for evaluating the performance of automatic surveillance systems. The main idea is to exploit the collaborative paradigm spreading in the web community to join together the ontology-based annotation and retrieval concepts and the requirements of the computer vision and video surveillance communities. The ViSOR open repository is based on a reference ontology which integrates many concepts, also coming from the LSCOM and MediaMill ontologies. The web interface allows video browsing, querying by annotated concepts or by keywords, compressed video preview, and media download and upload. The repository contains metadata annotations, which can be either manually created as ground truth or automatically generated by video surveillance systems. These automatic annotations can be compared with each other or with the reference ground truth by means of an integrated on-line performance evaluator.

Calderara, Simone; Cucchiara, Rita; Prati, Andrea ( 2008 ) - Bayesian-competitive Consistent Labeling for People Surveillance (IEEE / Institute of Electrical and Electronics Engineers Incorporated:445 Hoes Lane:Piscataway, NJ 08854:(800)701-4333, (732)981-0060, EMAIL: subscription-service@ieee.org, INTERNET: http://www.ieee.org, Fax: (732)981-9667 ) - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - n. volume 30 - pp. da 354 a 360 ISSN: 0162-8828 [Articolo in rivista (262) - Articolo su rivista]
Abstract

This paper presents a novel and robust approach to consistent labeling for people surveillance in multi-camera systems. A general framework scalable to any number of cameras with overlapped views is devised. An off-line training process automatically computes ground-plane homography and recovers epipolar geometry. When a new object is detected in any one camera, hypotheses for potential matching objects in the other cameras are established. Each of the hypotheses is evaluated using a prior and likelihood value. The prior accounts for the positions of the potential matching objects, while the likelihood is computed by warping the vertical axis of the new object on the field of view of the other cameras and measuring the amount of match. In the likelihood, two contributions (forward and backward) are considered so as to correctly handle the case of groups of people merged into single objects. Eventually, a maximum-a-posteriori approach estimates the best label assignment for the new object. Comparisons with other methods based on homography and extensive outdoor experiments demonstrate that the proposed approach is accurate and robust in coping with segmentation errors and in disambiguating groups.
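
The final decision step described above amounts to a maximum-a-posteriori choice among candidate labels. The sketch below shows only that selection, with the prior and likelihood of each hypothesis taken as given numbers; the function name `map_label` and the example values are assumptions for illustration, and the way the paper computes the prior and the forward and backward likelihood contributions is not reproduced.

```python
def map_label(hypotheses):
    """hypotheses: dict mapping a candidate label to a (prior, likelihood) pair.

    Returns the label with the highest posterior and the normalized posteriors.
    """
    scores = {label: prior * lik for label, (prior, lik) in hypotheses.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all hypotheses have zero posterior mass")
    posteriors = {label: s / total for label, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical candidates seen in another camera for a newly detected object.
best, posteriors = map_label({"A": (0.5, 0.2), "B": (0.3, 0.6), "C": (0.2, 0.1)})
```

In this toy example the hypothesis with the largest prior is not the winner, because its likelihood is low; the posterior weighs both terms together.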

C. Grana; D. Borghesani; R. Cucchiara ( 2008 ) - Describing Texture Directions with Von Mises Distributions ( 19th International Conference on Pattern Recognition - Tampa, Florida, USA - Dec 8-11) ( - Proceedings of the 19th International Conference on Pattern Recognition ) (IEEE Computer Society Press Los Alamitos, CA USA ) - pp. da 1 a 4 ISBN: 9781424421756 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work we describe a new approach for texture characterization. Starting from the autocorrelation matrix, an elegant description through a mixture of Von Mises distributions is proposed. A compact 6-valued descriptor is produced for each block and serves as input to an SVM classifier. Tests are carried out on high resolution images of illuminated manuscripts.

G. Gualdi; A. Prati; R. Cucchiara; E. Ardizzone; M. La Cascia; L. Lo Presti; M. Morana ( 2008 ) - Enabling Technologies on Hybrid Camera Networks for Behavioral Analysis of Unattended Indoor Environments and Their Surroundings ( 1st ACM International Workshop on Vision Network for Behaviour Analysis - Vancouver, BC, Canada - 31 October 2008) ( - Proceedings of VNBA 2008 ) (ACM New York, NY USA ) - pp. da 101 a 108 ISBN: 9781605583136 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents a layered network architecture and the enabling technologies for accomplishing vision-based behavioral analysis of unattended environments. Specifically the vision network covers both the attended environment and its surroundings by means of hybrid cameras. The layer overlooking the surroundings is placed outdoors and tracks people, monitoring entrance/exit points. It recovers the geometry of the site under surveillance and communicates people positions to a higher level layer. The layer monitoring the unattended environment pursues similar goals, and in addition maintains a global mosaic of the observed scene for further understanding. Moreover, it merges information coming from non-vision sensors to deepen the understanding or increase the reliability of the system. The behavioral analysis is delegated to a third layer that merges the information received from the two other layers and infers knowledge about what happened, is happening and is likely to happen in the environment. The paper also describes a case study that was implemented in the Engineering Campus of the University of Modena and Reggio Emilia, where our surveillance system has been deployed in a computer laboratory which was often inaccessible due to lack of attendance.

Calderara, Simone; Prati, Andrea; Cucchiara, Rita ( 2008 ) - HECOL: Homography and Epipolar-based Consistent Labeling for Outdoor Park Surveillance (Academic Press Incorporated:6277 Sea Harbor Drive:Orlando, FL 32887:(800)543-9534, (407)345-4100, EMAIL: ap@acad.com, INTERNET: http://www.idealibrary.com, Fax: (407)352-3445 ) - COMPUTER VISION AND IMAGE UNDERSTANDING - n. volume 111(1) - pp. da 21 a 42 ISSN: 1077-3142 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Outdoor surveillance is one of the most attractive applications of video processing and analysis. Robust algorithms must be defined and tuned to cope with the non-idealities of outdoor scenes. For instance, in a public park, an automatic video surveillance system must discriminate between shadows, reflections, waving trees, people standing still or moving, and other objects. Visual knowledge coming from multiple cameras can disambiguate cluttered and occluded targets by providing a continuous consistent labeling of tracked objects among the different views. This work proposes a new approach for coping with this problem in multi-camera systems with overlapped Fields of View (FoVs). The presence of overlapped zones allows the definition of a geometry-based approach to reconstruct correspondences between FoVs, using only homography and epipolar lines (hereinafter HECOL: Homography and Epipolar-based COnsistent Labeling) computed automatically with a training phase. We also propose a complete system that provides segmentation and tracking of people in each camera module. Segmentation is performed by means of the SAKBOT (Statistical and Knowledge Based Object Tracker) approach, suitably modified to cope with multi-modal backgrounds, reflections and other artefacts, typical of outdoor scenes. The extracted objects are tracked using a statistical appearance model robust against occlusions and segmentation errors. The main novelty of this paper is the approach to consistent labeling. A specific Camera Transition Graph is adopted to efficiently select the possible correspondence hypotheses between labels. A Bayesian MAP optimization assigns consistent labels to objects detected from several points of view: the object axis is computed from the shape tracked in each camera module, and homography and epipolar lines allow a correct axis warping in other image planes.
Both forward and backward probability contributions from the two different warping directions make the approach robust against segmentation errors, and capable of disambiguating groups of people. The system has been tested in a real setup in an urban public park, within the Italian LAICA (Laboratory of Ambient Intelligence for a friendly city) project. The experiments show how the system can correctly track and label objects in a distributed system with real-time performance. Comparisons with simpler consistent labeling methods and extensive outdoor experiments with ground truth demonstrate the accuracy and robustness of the proposed approach.

C. Grana; D. Borghesani; S. Calderara; R. Cucchiara ( 2008 ) - "Inside the Bible": Segmentation, Annotation and Retrieval for a New Browsing Experience ( 1st ACM SIGMM International Conference on Multimedia Information Retrieval (MIR 2008) - Vancouver, British Columbia, Canada - Oct 30-31) ( - Proceeding of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval (MIR 2008) ) (ACM Press New York USA ) - pp. da 379 a 386 ISBN: 9781605583129 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a system for automatic segmentation, annotation and image retrieval based on content, focused on illuminated manuscripts and in particular the Borso D'Este Holy Bible. To enhance the interaction possibilities with this work, full of decorations and illustrations, we exploit some well known document analysis techniques in addition to some new approaches, in order to achieve good segmentation of pages into meaningful visual objects with the relative annotation. We wanted to extend the standard keyword-based retrieval approach in a commentary with a modern visual-based retrieval by appearance similarity: an entire software user interface for exploration and visual search of illuminated manuscripts.

N. Bicocchi; M. Mamei; A. Prati; R. Cucchiara; F. Zambonelli ( 2008 ) - Pervasive Self-Learning with multi-modal distributed sensors ( SASO 2008: IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops - Venice, Italy - October 20-October 24 2008) ( - IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops ) (IEEE Computer Society Los Alamitos, CA, USA USA ) - pp. da 61 a 66 ISBN: 9780769535531 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Truly ubiquitous computing poses new and significant challenges. One of the key aspects that will condition the impact of these new technologies is how to obtain a manageable representation of the surrounding environment starting from simple sensing capabilities. This will make devices able to adapt their computing activities to an ever-changing environment. This paper presents a framework to promote unsupervised training processes among different sensors. This framework allows different sensors to exchange the knowledge needed to create a model to classify events. In particular we developed, as a case study, a multi-modal multi-sensor classification system combining data from a camera and a body-worn accelerometer to identify the user motion state. The body-worn accelerometer learns a model of the user behavior exploiting the information coming from the camera and uses it later on to classify the user motion in an autonomous way. Experiments demonstrate the accuracy of the proposed approach in different situations.

Piccinini, Paolo; Calderara, Simone; Cucchiara, Rita ( 2008 ) - Reliable smoke detection system in the domains of image energy and color ( ICIP 2008 : 2008 IEEE International Conference on Image Processing - San Diego (CA) - 2008) ( - ICIP 2008 : 2008 IEEE International Conference on Image Processing : proceedings : October 12-15, 20078 [sic], San Diego, California, U.S.A. ) (IEEE Piscataway (NJ) USA ) - pp. da 1376 a 1379 ISBN: 1424417643 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Smoke detection calls for a reliable and fast distinction between background, moving objects and variable shapes that are recognizable as smoke. In our system we propose a stable background suppression module joined with a smoke detection module working on segmented objects. It exploits two features: the energy variation in the wavelet domain and a color model of the smoke. The decrease of the energy ratio in the wavelet domain between background and current image is a clue for detecting smoke, as it represents the variation of the texture level. A mixture of Gaussians models the temporal evolution of this texture ratio. The color model is used as a reference to measure the deviation of the current pixel color from the model. The two features are combined using a Bayesian classifier to detect smoke in the scene. Experiments on real data and a comparison between our background model and the Mixture of Gaussians (MoG) model for smoke detection are presented. © 2008 IEEE.
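
The energy-ratio cue can be illustrated with a small sketch. It assumes a single-level Haar decomposition computed on non-overlapping 2x2 blocks of a grayscale image; the abstract does not name the wavelet, so the Haar choice and the function names `haar_detail_energy` and `energy_ratio` are assumptions made for this example.

```python
def haar_detail_energy(image):
    """Sum of squared single-level Haar detail coefficients over non-overlapping 2x2 blocks."""
    rows, cols = len(image), len(image[0])
    energy = 0.0
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            left_right = (a - b + d - e) / 2.0  # difference between the two columns
            top_bottom = (a + b - d - e) / 2.0  # difference between the two rows
            diagonal = (a - b - d + e) / 2.0    # difference between the two diagonals
            energy += left_right ** 2 + top_bottom ** 2 + diagonal ** 2
    return energy


def energy_ratio(background, current):
    """Ratio of high-frequency energy in the current image to that of the background."""
    bg = haar_detail_energy(background)
    if bg == 0.0:
        raise ValueError("background has no high-frequency energy")
    return haar_detail_energy(current) / bg


# Hypothetical textured background and a lower-contrast version of the same patch.
background = [[0, 10, 0, 10], [10, 0, 10, 0], [0, 10, 0, 10], [10, 0, 10, 0]]
current = [[4, 6, 4, 6], [6, 4, 6, 4], [4, 6, 4, 6], [6, 4, 6, 4]]
ratio = energy_ratio(background, current)
```

A ratio well below 1 indicates that the texture of the background has been attenuated in the current frame, which is the kind of evidence the abstract associates with smoke; in the described system this value would be tracked over time rather than thresholded on a single frame.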

Calderara, Simone; Piccinini, Paolo; Cucchiara, Rita ( 2008 ) - Smoke detection in video surveillance: A MoG model in the wavelet domain ( - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ) - n. volume 5008 - pp. da 119 a 128 ISBN: 3540795464 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

The paper presents a new fast and robust technique for smoke detection in video surveillance images. The approach aims at detecting the onset or the presence of smoke by analyzing color and texture features of moving objects, segmented with background subtraction. The proposal embodies some novelties. First, the temporal behavior of the smoke is modeled by a Mixture of Gaussians (MoG) of the energy variation in the wavelet domain. The MoG takes into account the image energy variation due to either external luminance changes or the smoke propagation, and allows it to be distinguished from the energy variation due to the presence of real moving objects such as people and vehicles. Second, this textural analysis is enriched by a color analysis based on the blending function. Third, a Bayesian model is defined in which the texture and color features, detected at block level, contribute to modeling the likelihood, while a global evaluation of the entire image models the prior probability contribution. The resulting approach is very flexible and can be adopted in conjunction with any video surveillance system based on a dynamic background model. Several tests on tens of different contexts, both outdoor and indoor, prove its robustness and precision. © 2008 Springer-Verlag Berlin Heidelberg.

Roberto Vezzani; Simone Calderara; Paolo Piccinini; Rita Cucchiara ( 2008 ) - Smoke detection in videosurveillance: the use of VISOR (Video Surveillance On-line Repository) ( Proceeding of ACM International Conference on Image and Video Retrieval - Niagara Falls, Ontario, Canada - 7-9 July 2008) ( - Proceedings of CIVR 2008 ) (ACM - ) - n. volume 1 - pp. da 289 a 298 ISBN: 9781605580708 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

ViSOR (VIdeo Surveillance Online Repository) is a large video repository, designed for containing annotated video surveillance footage, comparing annotations, evaluating system performance, and performing retrieval tasks. The web interface allows video browsing, querying by annotated concepts or by keywords, compressed video preview, and media download and upload. The repository contains metadata annotations, both manually created ground-truth data and automatically obtained outputs of particular systems. An example of application is the collection of videos and annotations for smoke detection, an important video surveillance task. In this paper we present the architecture of ViSOR, the built-in surveillance ontology, which integrates many concepts, including some coming from LSCOM and MediaMill, the annotation tools, and the visualization of results for performance evaluation. The annotation is obtained with an automatic smoke detection system, capable of detecting people, moving objects, and smoke in real time.

Prati, Andrea; Calderara, Simone; Cucchiara, Rita ( 2008 ) - Using circular statistics for trajectory shape analysis ( 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR - Anchorage (AK) - 2008) ( - 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 23-28 June 2008 ) - pp. da 3847 a 3854 ISBN: 9781424422425 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The analysis of patterns of movement is a crucial task for several surveillance applications, for instance to classify normal or abnormal people trajectories on the basis of their occurrence. This paper proposes to model the shape of a single trajectory as a sequence of angles described using a Mixture of Von Mises (MoVM) distribution. A complete EM (Expectation Maximization) algorithm is derived for MoVM parameter estimation, and an on-line version is proposed to meet real-time requirements. Maximum-A-Posteriori is used to encode the trajectory as a sequence of symbols corresponding to the MoVM components. Iterative k-medoids clustering groups trajectories into a variable number of similarity classes. The similarity is computed by aligning two sequences (with dynamic programming) and taking as symbol-to-symbol distance the Bhattacharyya distance between von Mises distributions. Extensive experiments have been performed on both synthetic and real data. ©2008 IEEE.
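The two ingredients named above, MAP encoding of angles against a mixture of von Mises components and a Bhattacharyya distance between components, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mixture parameters are assumed to be already estimated (the EM step is omitted), the helper names are hypothetical, and the distance is taken in its common form as the negative logarithm of the Bhattacharyya coefficient, which may differ from the exact definition used by the authors.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def encode_trajectory(angles, components):
    """MAP encoding: map each angle to the index of its most probable component.

    `components` is a list of (weight, mu, kappa) tuples.
    """
    symbols = []
    for theta in angles:
        scores = [w * von_mises_pdf(theta, mu, k) for (w, mu, k) in components]
        symbols.append(scores.index(max(scores)))
    return symbols


def bhattacharyya_distance(mu1, kappa1, mu2, kappa2):
    """-ln of the Bhattacharyya coefficient between two von Mises densities."""
    r2 = kappa1 ** 2 + kappa2 ** 2 + 2.0 * kappa1 * kappa2 * math.cos(mu1 - mu2)
    r = math.sqrt(max(0.0, r2))
    coeff = bessel_i0(r / 2.0) / math.sqrt(bessel_i0(kappa1) * bessel_i0(kappa2))
    return -math.log(coeff)
```

The closed form for the coefficient follows from the fact that the product of two von Mises exponentials is again an exponential of a single cosine term, so the integral reduces to a Bessel function.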

G. Gualdi; A. Albarelli; A. Prati; A. Torsello; M. Pelillo; R. Cucchiara ( 2008 ) - Using Dominant Sets for Object Tracking with Freely Moving Camera ( Workshop on Visual Surveillance - Marseille, France - 17 October 2008) ( - Proceedings of VS 2008 ) (- Marseille FRA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Object tracking with freely moving cameras is an open issue, since background information cannot be exploited for foreground segmentation, and plain feature tracking is not robust enough for target tracking, due to occlusions, distractors and object deformations. In order to deal with such challenging conditions a traditional approach, based on Camshift-like color-based features, is augmented by introducing a structural model of the object to be tracked incorporating previous knowledge about the spatial relations between the parts. Hence, an attributed graph is built on top of the features extracted from each frame and a graph matching technique is used to extract the optimal match with the model. Pixel-wise and object-wise comparisons with other tracking techniques with respect to manually obtained ground truth are presented.

Roberto Vezzani; Rita Cucchiara ( 2008 ) - ViSOR: Video Surveillance On-line Repository for Annotation Retrieval ( IEEE International Conference on Multimedia & Expo - Hannover, DE - 23-26 june 2008) ( - Proceedings of ICME 2008 ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 1 - pp. da 1281 a 1284 ISBN: 9781424425716 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The Imagelab Laboratory of the University of Modena and Reggio Emilia has designed a large video repository, aiming at containing annotated video surveillance footage. The web interface, named ViSOR (VIdeo Surveillance Online Repository), allows video browsing, querying by annotated concepts or by keywords, compressed preview, and video download and upload. The repository contains metadata annotation, both manually annotated ground-truth data and automatically obtained outputs of a particular system. In such a manner, the users of the repository are able to perform validation tasks of their own algorithms as well as comparative activities.

Calderara, Simone; Cucchiara, Rita; Prati, Andrea ( 2007 ) - A Distributed Outdoor Video Surveillance System for Detection of Abnormal People Trajectories ( First ACM/IEEE International Conference on Distributed Smart Cameras - Vienna, Austria - September 25-28 2007) ( - 2007 First ACM/IEEE International Conference on Distributed Smart Cameras ) (IEEE Piscataway (NJ) USA ) - pp. da 364 a 371 ISBN: 1424413540 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Distributed surveillance systems are nowadays widely adopted to monitor large areas for security purposes. In this paper, we present a complete multicamera system designed for people tracking from multiple partially overlapped views and capable of inferring and detecting abnormal people trajectories. Detection and tracking are performed by means of background suppression and an appearance-based probabilistic approach. Objects' label ambiguities are geometrically solved and the concept of "normality" is learned from data using a robust statistical model based on Von Mises distributions. Abnormal trajectories are detected using a first-order Bayesian network and, for each abnormal event, the appearance of the subject from each view is logged. Experiments demonstrate that our system can process with real-time performance up to three cameras simultaneously in an unsupervised setup and under varying environmental conditions.

Calderara, S.; Cucchiara, R.; Prati, A. ( 2007 ) - A Dynamic Programming Technique for Classifying Trajectories ( 14th International Conference on Image Analysis and Processing - Modena Italy - 10-14 September 2007) ( - ICIAP 2007: 14th International Conference on Image Analysis and Processing ) (IEEE Washington, DC, USA USA ) - pp. da 137 a 142 ISBN: 9780769528779 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes the exploitation of a dynamic programming technique for efficiently comparing people trajectories adopting an encoding scheme that jointly takes into account both the direction and the velocity of movement. With this approach, each pair of trajectories in the training set is compared and the corresponding distance computed. Clustering is achieved by using the k-medoids algorithm and each cluster is modeled with a 1-D Gaussian over the distance from the medoid. A MAP framework is adopted for the testing phase. The reported results are encouraging.
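A dynamic programming comparison of two encoded trajectories can be sketched as a global alignment, in the style of an edit distance. The symbol encoding used here, a `(direction_bin, speed_bin)` tuple, the symbol distance and the gap cost are all assumptions made for illustration; the abstract does not specify the paper's actual encoding or costs.

```python
def symbol_distance(s, t):
    """Hypothetical distance between two (direction_bin, speed_bin) symbols.

    Each mismatching field contributes 0.5, so the result lies in [0, 1].
    """
    return 0.5 * (s[0] != t[0]) + 0.5 * (s[1] != t[1])


def alignment_distance(seq_a, seq_b, dist=symbol_distance, gap_cost=1.0):
    """Minimum cost of globally aligning two symbol sequences."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = cheapest alignment of the first i symbols of seq_a
    # with the first j symbols of seq_b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + dist(seq_a[i - 1], seq_b[j - 1]),  # match
                dp[i - 1][j] + gap_cost,  # skip a symbol of seq_a
                dp[i][j - 1] + gap_cost,  # skip a symbol of seq_b
            )
    return dp[n][m]
```

The resulting pairwise distances can be collected in a matrix and passed to a clustering algorithm that only needs distances, such as k-medoids.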

R. Cucchiara; A. Prati; R. Vezzani ( 2007 ) - A Multi-Camera Vision System for Fall Detection and Alarm Generation - EXPERT SYSTEMS - n. volume 24 (4) - pp. da 334 a 345 ISSN: 0266-4720 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In-house video surveillance can represent an excellent support for people with some difficulties (e.g. elderly or disabled people) living alone and with a limited autonomy. New hardware technologies and in particular digital cameras are now affordable and they have recently gained credit as tools for (semi-)automatically assuring people's safety. In this paper a multi-camera vision system for detecting and tracking people and recognizing dangerous behaviours and events such as a fall is presented. In such a situation a suitable alarm can be sent, e.g. by means of an SMS. A novel technique of warping people's silhouette is proposed to exchange visual information between partially overlapped cameras whenever a camera handover occurs. Finally, a multi-client and multi-threaded transcoding video server delivers live video streams to operators/remote users in order to check the validity of a received alarm. Semantic and event-based transcoding algorithms are used to optimize the bandwidth usage. A two-room setup has been created in our laboratory to test the performance of the overall system and some of the results obtained are reported.

G. Gualdi; A. Prati; R. Cucchiara ( 2007 ) - An Open Source Architecture for Low-Latency Video Streaming on PDAs ( International Symposium on Multimedia - Taichung, Taiwan - 10-12 December 2007) ( - Proceedings of ISM 2007 ) (IEEE Computer Society Washington, DC, USA USA ) - pp. da 302 a 309 ISBN: 9780769530581 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents an open-source system for low-latency video streaming on PDAs, specifically addressing mobile video surveillance requirements. The system is based on H.264 and suitably modified to obtain the best trade-off between image quality and video fluidity, working also at very limited bandwidths. Moreover, the controls used keep the number of lost frames very low. A large set of experiments and comparisons has been carried out and the achieved results demonstrate the efficacy and efficiency of our system.

R. Cucchiara; A. Prati; S. Calderara; R. Vezzani ( 2007 ) - Behavioral lEarning in Surveilled Areas with Feature Extraction [Altro (298) - Partecipazione a progetti di ricerca]
Abstract

The project aims at exploring how visual features can be automatically extracted from video using computer vision techniques and exploited by a classifier (generated by machine learning) to detect and identify suspicious people behavior in public places in real time. In this sense, CV and ML are jointly developed and studied to provide a better mix of innovative techniques.

C. Grana; R. Vezzani; D. Borghesani; R. Cucchiara ( 2007 ) - Compressed Domain Features Extraction for Shot Characterization ( 1st International Workshop on Knowledge Acquisition from Multimedia Content - Genova, Italy - Dec 5) ( - Proceedings of the 1st International Workshop on Knowledge Acquisition from Multimedia Content ) (T. Bürger, S. Dasiopoulou, C. Eckes, S.J. Perantonis, J. Pereira, V. Tzouvaras Innsbruck AUT ) - pp. da 71 a 80 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we propose a system for shot comparison working directly on the MPEG-1 stream in the compressed domain, extracting color, texture and motion features from all frames at a reasonable computational cost, with results comparable to those obtained on uncompressed keyframes. In particular a summary descriptor for each Group Of Pictures (GOP) is computed and employed for shot characterization and comparison. The Mallows distance allows clips of different lengths to be matched in a unified framework.

Calderara, S.; Cucchiara, R.; Prati, A. ( 2007 ) - Detection of Abnormal Behaviors using a Mixture of Von Mises Distributions ( IEEE Conference on Advanced Video and Signal based Surveillance - London (UK) - 5-7 September 2007) ( - 2007 IEEE Conference on advanced video and signal based surveillance : AVSS 2007 ) (IEEE Piscataway (NJ) USA ) - pp. da 141 a 146 ISBN: 9781424416967 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes the use of a mixture of Von Mises distributions to detect abnormal behaviors of moving people. The mixture is created from an unsupervised training set by exploiting the k-medoids clustering algorithm based on the Bhattacharyya distance between distributions. The extracted medoids are used as modes in the multi-modal mixture, whose weights are the priors of the specific medoids. Given the mixture model, a new trajectory is verified against the model by considering each direction composing it as independent. Experiments over a real scenario composed of multiple, partially overlapped cameras are reported.
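The clustering step can be illustrated with a plain k-medoids routine that works on a precomputed distance matrix, so that any distance between distributions (for example a Bhattacharyya distance) can be plugged in. This is a generic sketch under simple assumptions: a fixed `k`, deterministic initialization with the first `k` items, and a hypothetical function name; it is not the exact procedure of the paper.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items 0..n-1 given a symmetric distance matrix `dist`.

    Returns (medoids, clusters), where `clusters` maps each medoid index
    to the list of item indices assigned to it.
    """
    n = len(dist)
    medoids = list(range(k))  # assumption: the first k items seed the clusters

    def assign(current):
        clusters = {m: [] for m in current}
        for i in range(n):
            nearest = min(current, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        return clusters

    for _ in range(max_iter):
        clusters = assign(medoids)
        updated = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                updated.append(m)
                continue
            # The new medoid minimizes the total distance to its cluster.
            updated.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return sorted(medoids), assign(medoids)
```

The relative size of each returned cluster could then serve as the prior weight of the corresponding medoid in a mixture, which is one way to read the description in the abstract.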

M. Bertini; A. Del Bimbo; C. Torniai; C. Grana; R. Cucchiara ( 2007 ) - Dynamic Pictorial Ontologies for Video Digital libraries Annotation ( 1st ACM Workshop on The Many Faces of Multimedia Semantics (MS 2007) - Augsburg, Germany - Sep 28) ( - Proceedings of the 1st ACM Workshop on The Many Faces of Multimedia Semantics (MS 2007) ) (ACM Press New York USA ) - pp. da 47 a 56 ISBN: 9781595937827 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present the dynamic pictorial ontology paradigm for video annotation. Ontologies are often used to describe a given domain for different goals, including the description of multimedia data. In the case of video annotation, the visual knowledge cannot be described using only abstract concepts but is more effectively represented in a visual form. To this aim, we introduce visual concepts, elicited from the data set as the most representative prototypes that specialize abstract concepts. The ontology created is intrinsically dynamic since it must embrace the perceptual and visual experience during annotation. Thus visual concepts can change, adapting to the multimedia content analyzed. The motivations for this new ontology paradigm are discussed together with a proposal of a framework for ontology creation, maintenance, and automatic annotation of video. The creation and usage of dynamic pictorial ontologies have been tested for the soccer domain, exploiting low level perceptual features and higher level domain features.

C. Grana; R. Vezzani; R. Cucchiara ( 2007 ) - Enhancing HSV Histograms with Achromatic Points Detection for Video Retrieval ( 6th ACM International Conference on Image and Video Retrieval (CIVR 2007) - Amsterdam, The Netherlands - Jul 9-11) ( - Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR 2007) ) (ACM New York USA ) - pp. da 302 a 308 ISBN: 9781595937339 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Color is one of the most meaningful features used in content based retrieval of visual data. In video content based retrieval, color features computed on selected frames are integrated with other low-level features concerning texture, shape and motion in order to find clip similarities. For example, the Scalable Color feature defined in the MPEG-7 standard exploits HSV histograms to create color feature vectors. HSV is a widely adopted space in image and video retrieval, but its quantization for histogram generation can create misleading errors in classification of achromatic and low saturated colors. In this paper we propose an Enhanced HSV Histogram with achromatic point detection based on a single Hue and Saturation parameter that can correct this limitation. The enhanced histograms have proven to be effective in color analysis and they have been used in a system for automatic clip annotation called PEANO, where pictorial concepts are extracted by a clip clustering and used for similarity based automatic annotation.
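The idea of separating achromatic points from the hue bins can be sketched as follows. This is only an illustration of the general principle: the rule used here (a plain saturation threshold), the number of bins and the function name are assumptions, and the paper's actual achromatic detection, described as based on a single Hue and Saturation parameter, is not reproduced.

```python
import colorsys


def enhanced_hsv_histogram(pixels, hue_bins=8, gray_bins=4, sat_threshold=0.2):
    """Histogram with `hue_bins` chromatic bins followed by `gray_bins` gray bins.

    `pixels` is an iterable of (r, g, b) tuples with components in 0..255.
    Pixels whose saturation is below `sat_threshold` (a hypothetical rule)
    are counted in a gray-level bin chosen by their value, so that their
    unreliable hue does not pollute the chromatic bins.
    """
    hist = [0] * (hue_bins + gray_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < sat_threshold:
            index = hue_bins + min(int(v * gray_bins), gray_bins - 1)
        else:
            index = min(int(h * hue_bins), hue_bins - 1)
        hist[index] += 1
    return hist
```

With a standard HSV quantization, black, gray and white pixels would all fall into an arbitrary hue bin, which is the kind of misleading classification the abstract refers to.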

P. REMAGNINO; A. PRATI; G.L. FORESTI; R. CUCCHIARA ( 2007 ) - Guest Editorial: Expert environments: machine intelligence methods for ambient intelligence (Blackwell Publishing Limited, Oxford, United Kingdom ) - EXPERT SYSTEMS - n. volume 24 (5) - pp. da 293 a 294 ISSN: 0266-4720 [Articolo in rivista (262) - Articolo su rivista]
Abstract

-

C. Grana; R. Cucchiara ( 2007 ) - Linear Transition Detection as a Unified Shot Detection Approach - IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY - n. volume 17 (4) - pp. da 483 a 489 ISSN: 1051-8215 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper, we propose an automatic system for video shot segmentation, called Linear Transition Detector (LTD), which handles the detection of both cuts and linear transitions in a unified way. A comparison with publicly available shot detection systems is reported on different sports (Formula 1, basketball, soccer and cycling), and TRECVID 2005 results are also reported.

G. GUALDI; A. PRATI; R. CUCCHIARA ( 2007 ) - Mobile Video Surveillance with Low-Bandwidth Low-Latency Video Streaming ( ACM Workshop on Mobile Video - Augsburg (Germany) - 28 October 2007) ( - Proceedings of MV 2007 ) (ACM New York, NY USA ) - pp. da 67 a 72 ISBN: 9781595937797 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents a system for remote live video surveillance. Videos are acquired from a fixed camera at 10 fps and QVGA resolution, compressed at 5 or 20 kbit/s with H.264, and streamed to a remote site, where they get processed by an automatic video surveillance system. The target surveillance application performs moving object segmentation and tracking. Both ends (video acquisition and processing) could be connected through a wireless network, specifically GPRS. The whole system is studied and optimized to maintain low latency. The reported experiments demonstrate that the proposed system is able to send up to four video streams over GPRS or E-GPRS network, without significantly affecting the performance of the automatic video surveillance system. Comparative tests have been performed with other existing streaming solutions.

C. Grana; D. Vanini; S. Seidenari; G. Pellacani; R. Cucchiara ( 2007 ) - Network patterns recognition for automatic dermatoscopic images classification ( Medical Imaging 2007 - San Diego (CA) U.S.A. - Feb 17-22) ( - Proceedings of SPIE Medical Imaging ) (SPIE - The International Society for Optical Engineering Bellingham, WA USA ) - n. volume 6512 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we focus on the problem of automatic classification of melanocytic lesions, aiming at identifying the presence of reticular patterns. The recognition of reticular lesions is an important step in the description of the pigmented network, in order to obtain meaningful diagnostic information. Parameters like color, size or symmetry could benefit from the knowledge of having a reticular or non-reticular lesion. The detection of network patterns is performed with a three-step procedure. The first step is the localization of line points, by means of the line point detection algorithm first described by Steger. The second step is the linking of such points into a line, considering the direction of the line at its endpoints and the number of line points connected to these. Finally, a third step discards the meshes which could not be closed at the end of the linking procedure and the ones characterized by anomalous values of area or circularity. The number of valid meshes left and their area with respect to the whole area of the lesion are the inputs of a discriminant function which classifies the lesions into reticular and non-reticular. This approach was tested on two balanced training and testing sets (each formed by 50 reticular and 50 non-reticular images). We obtained above 86% correct classification of the reticular and non-reticular lesions on real skin images, with a specificity value never lower than 92%.
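The third step and the two classifier inputs can be illustrated with a short sketch. It assumes that each closed mesh is available as a polygon (a list of vertices) and uses the common circularity measure 4πA/P²; the thresholds and function names are hypothetical, since the abstract does not state which values count as anomalous.

```python
import math


def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a closed polygon."""
    twice_area, perimeter = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def circularity(points):
    """4*pi*A / P^2: 1 for a circle, lower for elongated shapes."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / (perimeter ** 2) if perimeter > 0 else 0.0


def mesh_features(meshes, lesion_area, min_area=4.0, max_area=400.0, min_circ=0.5):
    """Count the valid meshes and compute their area relative to the lesion.

    The three thresholds are illustrative assumptions, not values from the paper.
    """
    valid = [
        m for m in meshes
        if min_area <= polygon_area_perimeter(m)[0] <= max_area
        and circularity(m) >= min_circ
    ]
    covered = sum(polygon_area_perimeter(m)[0] for m in valid)
    return len(valid), covered / lesion_area
```

The returned pair corresponds to the two quantities that the abstract names as inputs of the discriminant function.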

C. Grana; R. Vezzani; R. Cucchiara ( 2007 ) - Prototypes Selection with Context Based Intra-class Clustering for Video Annotation with Mpeg7 Features ( First International DELOS Conference - Pisa, Italy - Feb 13-14) ( - Digital Libraries: Research and Development ) (Springer Heidelberg DEU ) - n. volume 4877 - pp. da 268 a 277 ISBN: 9783540770879 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we analyze the effectiveness of perceptual features to automatically annotate video clips in domain-specific video digital libraries. Typically, automatic annotation is provided by computing clip similarity with respect to given examples, which constitute the knowledge base, in accordance with a given ontology or a classification scheme. Since the amount of training clips is normally very large, we propose to automatically extract some prototypes, or visual concepts, for each class instead of using the whole knowledge base. The prototypes are generated after a Complete Link clustering based on perceptual features with an automatic selection of the number of clusters. Context based information is used in an intra-class clustering framework to provide selection of more discriminative clips. Reducing the number of samples makes the matching process faster and lessens the storage requirements. Clips are annotated following the MPEG-7 directives to provide easier portability. Results are provided on videos taken from sports and news digital libraries.

R. Cucchiara; C. Grana; R. Vezzani ( 2007 ) - Semi-automatic Video Digital Library Annotation Tools ( Third Italian Research Conference on Digital Library Systems (IRCDL 2007) - Padova - Jan 29-30) ( - Post-proceedings of the Third Italian Research Conference on Digital Library Systems (IRCDL 2007) ) (DELOS: a Network of Excellence on Digital Libraries Padova ITA ) - pp. da 18 a 21 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we present a general purpose system for hierarchical structural segmentation and automatic annotation of video clips, by means of standardized low level features. We propose to automatically extract some prototypes for each class with a context based intra-class clustering. Clips are annotated following the MPEG-7 standard directives to provide easier portability. Results of automatic annotation and semiautomatic metadata creation are provided.

C. Grana; M. Davolio; R. Cucchiara ( 2007 ) - Similarity-Based Retrieval with MPEG-7 3D Descriptors: Performance Evaluation on the Princeton Shape Benchmark ( First International DELOS Conference - Pisa, Italy - Feb 13-14) ( - Digital Libraries: Research and Development ) (Springer Heidelberg DEU ) - n. volume LNCS 4877 - pp. da 308 a 317 ISBN: 9783540770879 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we describe in detail the new MPEG-7 Perceptual 3D Shape Descriptor and provide a set of tests with different 3D object databases, mainly with the Princeton Shape Benchmark. For this purpose we created a function library called Retrieval-3D and fixed some bugs of the MPEG-7 eXperimentation Model (XM). We explain how to match the Attributed Relational Graph (ARG) of every 3D model with the modified nested Earth Mover’s Distance (mnEMD). Finally we compare our results with the best found in the literature, including the first MPEG-7 3D descriptor, i.e. the Shape Spectrum Descriptor.

M. Bertini; A. Del Bimbo; C. Torniai; C. Grana; R. Vezzani; R. Cucchiara ( 2007 ) - Sports Video Annotation Using Enhanced HSV Histograms in Multimedia Ontologies ( International Workshop on Visual and Multimedia Digital Libraries - Modena, Italy - Sep 14) ( - ICIAP 2007 Workshops ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 160 a 167 ISBN: 9780769529219 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents multimedia ontologies, where multimedia data and traditional textual ontologies are merged. A solution for their implementation for the soccer video domain and a method to perform automatic soccer video annotation using these extended ontologies are shown. HSV is a widely adopted space in image and video retrieval, but its quantization for histogram generation can create misleading errors in classification of achromatic and low saturated colors. In this paper we propose an Enhanced HSV Histogram with achromatic point detection based on a single Hue and Saturation parameter that can correct this limitation. The more general concepts of the sport domain (e.g. play/break, crowd, etc.) are put in correspondence with the more general visual features of the video like color and texture, while the more specific concepts of the soccer domain (e.g. highlights such as attack actions) are put in correspondence with domain specific visual features like the soccer playfield and the players. Experimental results for annotation of soccer videos using generic concepts are presented.

R. Cucchiara; A. Prati; R. Vezzani; L. Benini; E. Farella; P. Zappi ( 2007 ) - Using a Wireless Sensor Network to Enhance Video Surveillance - JOURNAL OF UBIQUITOUS COMPUTING AND INTELLIGENCE - n. volume 1 (2) - pp. da 1 a 11 ISSN: 1555-1326 [Articolo in rivista (262) - Articolo su rivista]
Abstract

To enhance video surveillance systems, multi-modal sensor integration can be a successful strategy. In this work, a computer vision system able to detect and track people from multiple cameras is integrated with a wireless sensor network mounting passive Pyroelectric InfraRed sensors. The two subsystems are briefly described and possible cases in which computer vision algorithms are likely to fail are discussed. Then, simple but reliable outputs from the sensor nodes are exploited to improve the accuracy of the vision system. In particular, two case studies are reported: the first uses the presence detection of sensors to disambiguate between an open door and a moving person, while the second handles motion direction changes during occlusions. Preliminary results are reported and demonstrate the usefulness of the integration of the two subsystems.

C. Grana; D. Borghesani; R. Cucchiara ( 2007 ) - Video Shots Comparison using the Mallows Distance ( 1st International Workshop on Multimedia Data Mining and Management - Regensburg, Germany - Sep 3) ( - Eighteenth International Workshop on Database and Expert Systems Applications ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 49 a 53 ISBN: 9780769529325 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we focus on two aspects of the comparison of video shots. We present a new approach to extract a variable number of key frames from a shot, by the use of a hierarchical clustering with automatic level selection, in order to provide optimal allocation of features on different parts of the shot. We then employ the Mallows distance as an effective technique to compare the discrete distributions of features, independently from the features selected for the specific application. Results and comparisons on a soccer documentary video are provided.
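For one-dimensional features with equally weighted samples, the Mallows distance of order 1 coincides with the area between the two empirical cumulative distribution functions, which also works when the two sets have different sizes. The sketch below covers only this special case, with a hypothetical function name; the multi-dimensional case used for real feature vectors requires solving a transportation problem and is not shown.

```python
def mallows_l1(xs, ys):
    """1-D Mallows (Wasserstein) distance of order 1 between two sample sets.

    Both inputs must be non-empty; every sample in a set carries the same
    weight, so the sets may contain different numbers of samples.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Advance the counters so that i/len(xs) and j/len(ys) are the
        # empirical CDF values on the interval [left, right).
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total
```

For example, comparing two key-frame feature values `[0, 2]` with a single value `[1]` gives a distance of 1.0, since each half of the mass has to move by one unit.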

Rita Cucchiara; Roberto Vezzani ( 2007 ) - VidiVideo [Altro (298) - Partecipazione a progetti di ricerca]
Abstract

Roberto Vezzani; Rita Cucchiara ( 2007 ) - Visor: Video Surveillance Online Repository ( BMVA symposium on Security and surveillance: performance evaluation - London, UK - 13 december 2007) ( - Proceedings of BMVA symposium on Security and surveillance: performance evaluation ) (- - ) - n. volume 1 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The aim of the ViSOR project [1] is to gather and make freely available a repository of surveillance and video footage for the research community on pattern recognition and multimedia retrieval. The goal is to create an open forum and a free repository to exchange, compare and discuss results of many problems in video surveillance and retrieval. Together with the videos, the repository contains metadata annotation, both manually annotated as ground truth and automatically obtained by video surveillance systems. Annotation refers to a large ontology of concepts on surveillance and security related objects and events. The ontology has been defined including concepts from the LSCOM and MediaMill ontologies. As well as videos and annotations, ViSOR provides tools for enriching the ontology, annotating new videos, searching by textual queries, and composing and downloading videos.

R. Cucchiara; C. Grana; A. Prati; R. Vezzani ( 2006 ) - A Distributed Domotic Surveillance System ( - Intelligent Distributed Video Surveillance Systems ) (IEE Press LONDON GBR ) - pp. da 91 a 117 ISBN: 9780863415043 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

Distributed video surveillance has a direct application in intelligent home automation or domotics (from the Latin word domus, that means “home”, and informatics); in particular, in-house video surveillance can provide good support for people with some difficulties (e.g., elderly or disabled people) living alone and with a limited autonomy. New hardware technologies for surveillance are now affordable and provide high reliability. Problems related to reliable software solutions are not completely solved, especially concerning the application of general-purpose computer vision techniques in indoor environments. Indeed, assuming the objective is to detect the presence of people, track them, and recognize dangerous behaviours by means of abrupt changes in their posture, robust techniques must cope with non-trivial difficulties. In particular, luminance changes and shadows must be taken into account, frequent posture changes must be faced, and large and long-lasting occlusions are common due to the vicinity of the cameras and the presence of furniture and doors that can often hide parts of the person’s body. These problems are analyzed and solutions based on background suppression, appearance-based probabilistic tracking, and probabilistic reasoning for posture recognition are described.

L. BERTELLI; R. CUCCHIARA; G. PATERNOSTRO; A. PRATI ( 2006 ) - A semi-automatic system for segmentation of cardiac M-mode images - PATTERN ANALYSIS AND APPLICATIONS - n. volume 9 (4) - pp. da 293 a 306 ISSN: 1433-7541 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Pixel classifiers are often adopted in pattern recognition as a suitable method for image segmentation. A common approach to the performance evaluation of classifier systems is based on the measurement of the classification errors and, at the same time, on the computational time. In general, multiclassifiers have proven to be more precise in many applications, but at the cost of a higher computational load. This paper analyzes different classifiers and proposes an evaluation of the classifiers in the case of semi-automatic processes with human interaction. Medical imaging is a typical application, where automatic or semi-automatic segmentation can be a valuable support to the diagnosis. The paper focuses on the segmentation of cardiac images of fruit flies (a genetic model for analyzing human heart diseases). The analysis is based on M-modes, which are gray-level images derived from mono-dimensional projections of the video frames on a line. Segmentation of the M-mode images is provided by classifiers and integrated in a multiclassifier. A neural network classifier, a Bayesian classifier, and a classifier based on hidden Markov chains are joined by means of a Behavior Knowledge Space fusion rule. The comparative evaluation is discussed in terms of both accuracy and required time, in which the time to correct the classifier errors by means of human intervention is also taken into account.
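
As an illustration of the fusion rule named in the abstract above, the following is a minimal sketch of a Behavior Knowledge Space (BKS) combiner, not the authors' implementation. The function names, the "fg"/"bg" labels and the majority-vote fallback for unseen combinations are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bks(decisions, labels):
    """Build the BKS table.

    decisions: list of tuples, one per training pixel, holding the label
               output by each base classifier (hypothetical data layout).
    labels:    the true label of each training pixel.
    Each distinct tuple of classifier outputs indexes a cell that counts
    how often every true label occurred with that tuple.
    """
    table = defaultdict(Counter)
    for combo, label in zip(decisions, labels):
        table[tuple(combo)][label] += 1
    return table


def bks_predict(table, combo):
    """Return the most frequent true label for this output combination.

    Unseen combinations fall back to a plain majority vote among the
    base classifiers (an assumption; other fallbacks are possible).
    """
    counts = table.get(tuple(combo))
    if not counts:
        return Counter(combo).most_common(1)[0][0]
    return counts.most_common(1)[0][0]


# Three base classifiers, four training pixels (toy data).
train_decisions = [("fg", "fg", "bg"), ("fg", "fg", "bg"),
                   ("fg", "fg", "bg"), ("bg", "bg", "bg")]
train_labels = ["bg", "bg", "fg", "bg"]
bks_table = train_bks(train_decisions, train_labels)
```

With this toy table, the combination ("fg", "fg", "bg") is mapped to "bg" even though two of the three classifiers voted "fg", which is the behaviour that distinguishes BKS from simple voting.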

R. Cucchiara; C. Grana; D. Bulgarelli; R. Vezzani ( 2006 ) - A semi-automatic video annotation tool with MPEG-7 content collections ( Eighth IEEE International Symposium on Multimedia - San Diego, CA, USA - Dec 11-13) ( - Eight IEEE International Symposium on Multimedia ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 742 a 745 ISBN: 9780769527468 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we present a general-purpose system for hierarchical structural segmentation and automatic annotation of video clips, by means of standardized low-level features. We propose to automatically extract some prototypes for each class with a context-based intra-class clustering. Clips are annotated following the MPEG-7 standard directives to provide easier portability. Results of automatic annotation and semi-automatic metadata creation are provided.

R. Cucchiara; A. Prati; R. Vezzani ( 2006 ) - A system for automatic face obscuration for privacy purposes - PATTERN RECOGNITION LETTERS - n. volume 27 (15) - pp. da 1809 a 1815 ISSN: 0167-8655 [Articolo in rivista (262) - Articolo su rivista]
Abstract

This work proposes a method for automatic face obscuration capable of protecting people's identity. Since face detection heavily benefits from the possibility to exploit tracking, multi-camera people tracking has been integrated with a face detector based on colour clustering and Hough transform. Moreover, the multiple viewpoints provided by multiple cameras are exploited in order to always obtain a good-quality image of the face. The identity of people in different views is kept consistent by means of a geometrical, uncalibrated approach based on homographies. Experimental results show the accuracy of the proposed approach. (c) 2006 Elsevier B.V. All rights reserved.

R. CUCCHIARA; A. PRATI; R. VEZZANI ( 2006 ) - Advanced video surveillance with pan tilt zoom cameras ( Workshop on Visual Surveillance (VS) - Graz, Austria - 13 May 2006) ( - Proceeding of VS2006 ) (Faculty of Computing, Information Systems and Mathematics, Kingston University Kingston upon Thames, Surrey GBR ) - n. volume 1 - pp. da 49 a 56 ISBN: 00955300304 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper an advanced video surveillance system is proposed. Our goal is the detection of people’s heads to allow their obscuration for privacy issues or to perform recognition tasks. We propose a system based on active PTZ (Pan-Tilt-Zoom) cameras that produce head images having a large enough size, and can cover an area larger than still cameras. Since conventional approaches are not suitable to PTZ cameras, the proposed approach is based on the so-called direction histograms to compute the ego-motion and on frame differencing for detecting moving objects. It exploits post-processing and active contours to extract the precise shape of moving objects to be fed to a probabilistic algorithm to track moving people in the scene. Person following, instead, is based on simple heuristic rules that move the camera as soon as the selected person is close to the border of the field of view. Finally, a color and shape based head detection that takes advantage of the people tracking is presented. Experimental results on a live active camera demonstrate the feasibility of real-time person following and of the subsequent head detection phase.

R. Melli; C. Grana; R. Cucchiara ( 2006 ) - Comparison of color clustering algorithms for segmentation of dermatological images ( Medical Imaging 2006: Image Processing - San Diego, California, USA - Feb 13-16) ( - Medical Imaging 2006: Image Processing ) (SPIE Bellingham, WA USA ) - n. volume 6144 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Automatic segmentation of skin lesions in clinical images is a very challenging task; it is necessary for visual analysis of the edges, shape and colors of the lesions to support the melanoma diagnosis, but, at the same time, it is cumbersome since lesions (both naevi and melanomas) do not have regular shape, uniform color, or univocal structure. Most of the approaches adopt unsupervised color clustering. This work compares the most widespread color clustering algorithms, namely median cut, k-means, fuzzy c-means and mean shift, applied to a method for automatic border extraction, providing an evaluation of the upper bound in accuracy that can be reached with these approaches. Different tests have been performed to examine the influence of the choice of the parameter settings on the performance of the algorithms. Then a new supervised learning phase is proposed to select the best number of clusters and to segment the lesion automatically. Experiments have been carried out on a large database of medical images, manually segmented by dermatologists. In these experiments mean shift proved to be the best technique in terms of sensitivity and specificity. Finally, a qualitative evaluation of the goodness of segmentation has been carried out by human experts too, confirming the results of the quantitative comparison.

C. Grana; G. Pellacani; S. Seidenari; R. Cucchiara ( 2006 ) - Distance transform for automatic dermatologic images composition ( Medical Imaging 2006: Image Processing - San Diego, California, USA - Feb 13-16) ( - Medical Imaging 2006: Image Processing ) (SPIE Bellingham, WA USA ) - n. volume 6144 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we focus on the problem of automatically registering dermatological images, because even if different products are available, most of them share the problem of a limited field of view on the skin. A possible solution is then the composition of multiple takes of the same lesion with digital software, such as that for panorama image creation. In this work, to perform an automatic selection of matching points the Harris Corner Detector is used, and to cope with outlier pairs we employed the RANSAC method. Projective mapping is then used to match the two images. Given a set of correspondence points, Singular Value Decomposition was used to compute the transform parameters. At this point the two images need to be blended together. One initial assumption is often implicitly made: the aim is to merge two rectangular images. But when merging occurs between more than two images iteratively, this assumption will fail. To cope with differently shaped images, we employed the Distance Transform and provided a weighted merging of images. Different tests were conducted with dermatological images, both with a standard rectangular frame and with atypical shapes, for example a ring due to the objective and lens selection. The successive composition of different circular images with other blending functions, such as the Hat function, does not correctly get rid of the border, and residuals of the circular mask are still visible. By applying Distance Transform blending, the result produced is insensitive to the outer shape of the image.
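
The weighted merging described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the paper's code: it uses a city-block distance computed with a two-pass scan (the abstract does not say which metric was used), it treats everything outside the image as invalid, and the names distance_transform and blend are hypothetical. Each pixel is weighted by its distance from the edge of its own valid region, so weights fade to zero at the border whatever the region's shape.

```python
def distance_transform(mask):
    """City-block distance from each valid pixel (mask == 1) to the nearest
    invalid pixel, treating everything outside the image as invalid."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any possible city-block distance here
    d = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if d[y][x]:
                up = d[y - 1][x] if y > 0 else 0
                left = d[y][x - 1] if x > 0 else 0
                d[y][x] = min(d[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in reversed(range(h)):
        for x in reversed(range(w)):
            if d[y][x]:
                down = d[y + 1][x] if y < h - 1 else 0
                right = d[y][x + 1] if x < w - 1 else 0
                d[y][x] = min(d[y][x], down + 1, right + 1)
    return d


def blend(img_a, mask_a, img_b, mask_b):
    """Weighted average of two registered images, using each image's
    distance transform as its per-pixel weight."""
    wa, wb = distance_transform(mask_a), distance_transform(mask_b)
    h, w = len(img_a), len(img_a[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = wa[y][x] + wb[y][x]
            if total:
                out[y][x] = (wa[y][x] * img_a[y][x]
                             + wb[y][x] * img_b[y][x]) / total
    return out


# Toy example: image A covers the whole 3x3 frame, image B only the two
# right-hand columns.
img_a = [[10.0] * 3 for _ in range(3)]
mask_a = [[1, 1, 1] for _ in range(3)]
img_b = [[20.0] * 3 for _ in range(3)]
mask_b = [[0, 1, 1] for _ in range(3)]
merged = blend(img_a, mask_a, img_b, mask_b)
```

Because the weights depend only on the mask, the same routine applies unchanged to circular or ring-shaped regions.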

A. HAKEEM; R. VEZZANI; S. SHAH; R. CUCCHIARA ( 2006 ) - Estimating Geospatial Trajectory of a Moving Camera ( ICPR 2006 - Hong Kong - 20-24 Aug) ( - Proc. of International Conference on Pattern Recognition (ICPR 2006) ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 2 - pp. da 82 a 87 ISBN: 9780769525211 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a novel method for estimating the geospatial trajectory of a moving camera. The proposed method uses a set of reference images with known GPS (global positioning system) locations to recover the trajectory of a moving camera using geometric constraints. The proposed method has three main steps. First, scale-invariant feature transform (SIFT) features are detected and matched between the reference images and the video frames to calculate a weighted adjacency matrix (WAM) based on the number of SIFT matches. Second, using the estimated WAM, the maximum matching reference image is selected for the current video frame, which is then used to estimate the relative position (rotation and translation) of the video frame using the fundamental matrix constraint. The relative position is recovered up to a scale factor, and a triangulation among the video frame and two reference images is performed to resolve the scale ambiguity. Third, an outlier rejection and trajectory smoothing (using b-spline) post-processing step is employed. This is because the estimated camera locations may be noisy due to bad point correspondence or degenerate estimates of fundamental matrices. Results of recovering camera trajectory are reported for real sequences.

E. Perini; S. Soria; A. Prati; R. Cucchiara ( 2006 ) - FaceMouse: a Human-Computer Interface for Tetraplegic People ( Intern. Workshop on Human-Computer Interaction (HCI) - Graz (Austria) - May 7) ( - Proc. of Intern. Workshop on Human-Computer Interaction (HCI) ) (Springer Washington, DC USA ) - pp. da 99 a 108 ISBN: 03029743 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a new human-machine interface particularly conceived for people with severe disabilities (specifically tetraplegic people), which allows them to interact with the computer in their everyday life by means of the mouse pointer. In this system, called FaceMouse, instead of the classical pointer paradigm that requires the user to look at the point where to move, we propose to use a paradigm called the derivative paradigm, where the user does not indicate the precise position, but the direction along which the mouse pointer must be moved. The proposed system is composed of a common, low-cost webcam and of a set of computer vision techniques developed to identify the parts of the user's face (the only body part that a tetraplegic person can move) and exploit them for moving the pointer. Specifically, the implemented algorithm is based on template matching to track the nose of the user and on cross-correlation to calculate the best match. Finally, several real applications of the system are described and experimental results obtained with disabled people are reported.
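
To make the derivative paradigm concrete, here is a minimal sketch, assuming the tracked nose position is compared with a calibrated rest position: the offset gives only a direction, and the pointer advances along it at a fixed speed. The function name, the dead zone and the speed value are hypothetical and are not taken from the paper.

```python
import math


def pointer_step(pointer, nose, rest, dead_zone=5.0, speed=4.0):
    """Advance the pointer one step in the direction the nose has moved.

    pointer:   current (x, y) pointer position in screen pixels.
    nose:      tracked nose position in the camera image.
    rest:      nose position recorded while the user looks straight ahead.
    dead_zone: offsets up to this length are ignored, so that small
               involuntary movements do not move the pointer (assumption).
    speed:     pixels travelled per step (assumption).
    """
    dx, dy = nose[0] - rest[0], nose[1] - rest[1]
    dist = math.hypot(dx, dy)
    if dist <= dead_zone:
        return pointer
    # Only the direction of the offset is used, not its magnitude.
    return (pointer[0] + speed * dx / dist, pointer[1] + speed * dy / dist)


moved = pointer_step((100.0, 100.0), nose=(30.0, 0.0), rest=(0.0, 0.0))
still = pointer_step((100.0, 100.0), nose=(2.0, 1.0), rest=(0.0, 0.0))
```

Calling the function once per video frame keeps the pointer moving for as long as the user holds the head turned, which is what distinguishes this scheme from pointing at an absolute position.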

A. PRATI; F. SEGHEDONI; R. CUCCHIARA ( 2006 ) - Fast Dynamic Mosaicing and Person Following ( Proc. of International Conference on Pattern Recognition - Hong Kong - 20-24 August 2006) ( - Proceedings of ICPR 2006 ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 4 - pp. da 920 a 923 ISBN: 9780769525211 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

A system for video surveillance in wide areas based on active cameras, also capable of following a person in the scene by keeping him framed, is presented. The proposed approach is based on the so-called direction histograms to compute the ego-motion and on frame differencing for detecting moving objects. It exploits post-processing and active contours to extract the precise shape of moving objects to be fed to a probabilistic algorithm to track moving people in the scene. Person following, instead, is based on simple heuristic rules that move the camera as soon as the selected person is close to the border of the field of view. Experimental results on a live active camera demonstrate the feasibility of real-time person following.
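
The frame-differencing step mentioned above can be illustrated with a short sketch. It assumes the two frames have already been aligned to compensate for the camera ego-motion (the direction-histogram step is not reproduced here), and the function name and threshold value are hypothetical.

```python
def moving_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose gray level changed by more than `threshold`
    between two aligned frames (lists of rows of 0-255 integers)."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Toy 2x3 frames: only the last pixel of the second row changes noticeably.
frame_a = [[10, 10, 10], [10, 10, 10]]
frame_b = [[12, 10, 9], [10, 11, 200]]
motion = moving_mask(frame_a, frame_b)
```

The resulting binary mask is only a starting point; the abstract indicates that post-processing and active contours are then used to refine the object shape.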

R. Cucchiara; A. Prati; C. Grana; R. Vezzani ( 2006 ) - FREE Surveillance in a pRivacy respectFul way [Altro (298) - Partecipazione a progetti di ricerca]
Abstract

The FREE SURF project aims at proposing new technologies for the next generations of video surveillance systems oriented to the automatic real-time control of the presence and actions undertaken by people in the environment, without the direct control of a human operator. The FREE SURF project is born with a twofold aim: first, innovative scientific research in the field of Computer Vision and Pattern Recognition; second, innovative applied research for the development of new generations of video surveillance systems, both effective and socially acceptable with respect to privacy concerns. The first objective is to conduct thorough research in the field of Computer Engineering for video surveillance of people in "structural constraint FREE" systems, that is, in systems free from structural and environmental constraints. The automatic visual control of human presence and actions in a given environment is, indeed, one of the most studied problems in the last decade. Nowadays, a very large literature exists, which presents algorithms and robust implementations for the recognition of single persons in structured environments: closed environments with controlled illumination, open environments with a large field of view (in order to consider people as small rigid moving objects), with few people, with only partially occluded fields of view, controlled by fixed cameras (to segment objects as different from the background), and installed with a precise manual calibration (for an exact 3D reconstruction). The final objective of the project is to study innovative methodologies and techniques for going further: the final targets are environments free from structural constraints, in scenes with more people who live together and interact with each other, as in parks or tourist areas.
The foreseen activities are devoted to the study of new ways to extract visual data from distributed camera systems, from hybrid systems with active cameras capable of automatically moving toward a target, and from moving cameras, coordinated with networks of sensors. New algorithms will be studied and working prototypes developed for people segmentation and tracking in videos acquired by multiple auto-calibrated cameras, by exploiting geometrical information and appearance (color and texture). Approaches for active camera control and mosaicing of the scene from moving cameras will be studied. Moreover, mobile agent systems will be studied to coordinate cameras and sensor networks in large scenes like archaeological sites. These techniques will all be implemented in separate modules by each RU, but they will be coordinated in a single architecture to provide a common interface for the reasoning modules. All the previous modules have the common objective of extracting visual data on the people in the scene. In particular, trajectory computation with invariants independent of the point of view, people posture analysis and soft biometrics are the main data that will be extracted. Differently from projects dealing with biometric analysis, the FREE SURF project is oriented to the automatic visual analysis of the presence and behavior of people independently of their identities, which are not easy to assess in noisy, low-resolution videos with a large field of view, like those typical of distributed video surveillance systems. As a further support, hybrid systems with PTZ and mobile cameras can provide, if needed, information with more details, which can be used in "posterity logging" by the experts. The visual data are provided to modules for dual activities: to monitor dangerous situations in real time, and to annotate interesting situations for future off-line queries.
The first is a strategic tool to help the human operator in the prevention and fast responsiveness to facts regarding security, the second provides a valid support to investigations and a-posteriori analysis. These solutions may enable the many existing surveillance systems to provide effective su

S. CALDERARA; R. CUCCHIARA; A. PRATI ( 2006 ) - Group Detection at Camera Handoff for Collecting People Appearance in Multi-camera Systems ( Conference on Advanced Video and Signal-based Surveillance - Sydney, Australia - 22-24 November 2006) ( - Proceedings of AVSS 2006 ) (IEEE Computer Society Los Alamitos, California USA ) - pp. da 36 a 41 ISBN: 9780769526881 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Logging information on moving objects is crucial in video surveillance systems. Distributed multi-camera systems can provide the appearance of objects/people from different viewpoints and at different resolutions, allowing a more complete and precise logging of the information. This is achieved through consistent labeling to correlate collected information of the same person. This paper proposes a novel approach to consistent labeling that is also capable of fully characterizing groups of people and of managing miss segmentations. The ground-plane homography and the epipolar geometry are automatically learned and exploited to warp objects' principal axes between overlapped cameras. A MAP estimator that exploits two contributions (forward and backward) is used to choose the most probable label configuration to be assigned at the handoff of a new object. Extensive experiments demonstrate the accuracy of the proposed method in detecting single and simultaneous handoffs, miss segmentations, and groups.

C. Grana; R. Cucchiara; G. Pellacani; S. Seidenari ( 2006 ) - Line Detection and Texture Characterization of Network Patterns ( International Conference on Pattern Recognition - Hong Kong - Aug 20-24) ( - Proceedings of International Conference on Pattern Recognition ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 2 - pp. da 275 a 278 ISBN: 9780769525211 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes a complete approach to detect, localize and describe network patterns. Such texture is automatically detected with Gaussian derivative kernels and Fisher linear discriminant analysis; line closure and thinning are provided by morphological masking, and line luminance profile fitting provides width estimation. Detection results on dermatological images are reported and discussed.

G. GUALDI; R. CUCCHIARA; A. PRATI ( 2006 ) - Low-latency Live Video Streaming over Low-Capacity Networks ( International Symposium on Multimedia - San Diego, CA (USA) - 11-13 December 2006) ( - Proceedings of ISM 2006 ) (IEEE Computer Society Washington, DC, USA USA ) - pp. da 449 a 456 ISBN: 9780769527468 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents an effective system for low-latency streaming of live videos over low-capacity networks (such as GPRS and EGPRS). Existing solutions are either too complex or not suited to our purposes. For this reason, we developed a complete, ready-to-use streaming system based on the H.264/AVC codec and the UDP/IP stack. The system employs adaptive controls to achieve the best tradeoff between low latency and good video fluency, by keeping the UDP buffer occupancy at the decoder side between two given levels. Our experiments demonstrate that this system is able to transmit live videos at CIF format and 10 fps over GPRS/EGPRS with very low latency (1.73 sec on average, basically due to the network delay), good fluency and average quality, measured with PSNR, of 31 dB on GPRS at 23 kbps at 10 fps.
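
One way to keep a decoder-side buffer between two levels is a simple hysteresis rule on the playback rate; the sketch below illustrates that idea only. The abstract does not specify the actual control law, so the rule itself, the function name, the 20% adjustment step and the example levels are all assumptions.

```python
def playback_rate(occupancy, low, high, nominal_fps=10.0, step=0.2):
    """Choose a playback rate from the current buffer occupancy.

    occupancy: number of frames (or bytes) waiting in the decoder buffer.
    low, high: the two levels the occupancy should stay between.
    Above `high` the buffered backlog adds latency, so playback speeds up;
    below `low` the buffer risks running dry, so playback slows down.
    """
    if occupancy > high:
        return nominal_fps * (1.0 + step)
    if occupancy < low:
        return nominal_fps * (1.0 - step)
    return nominal_fps


fast = playback_rate(50, low=10, high=40)
slow = playback_rate(5, low=10, high=40)
steady = playback_rate(25, low=10, high=40)
```

Speeding up drains the backlog and reduces latency, while slowing down trades a little latency for fluency, which mirrors the tradeoff named in the abstract.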

M. Bertini; A. Del Bimbo; C. Torniai; C. Grana; R. Cucchiara ( 2006 ) - MOM: multimedia ontology manager. A framework for automatic annotation and semantic retrieval of video sequences ( 14th ACM International Conference on Multimedia (ACM Multimedia 2006) - Santa Barbara, CA, USA - Oct 23-27) ( - Proceedings of the 14th ACM International Conference on Multimedia (ACM Multimedia 2006) ) (ACM New York USA ) - pp. da 787 a 788 ISBN: 1595934472 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Effective usage of multimedia digital libraries has to deal with the problem of building efficient content annotation and retrieval tools. MOM (Multimedia Ontology Manager) is a complete system that allows the creation of multimedia ontologies, supports automatic annotation and creation of extended text (and audio) commentaries of video sequences, and permits complex queries by reasoning on the ontology.

C. Grana; R. Vezzani; D. Bulgarelli; R. Cucchiara ( 2006 ) - MPEG-7 Pictorially Enriched Ontologies for Video Annotation ( Seconda Conferenza Italiana sui Sistemi Intelligenti - Ancona, Italy - Sep 27-29) ( - Atti della Seconda Conferenza Italiana sui Sistemi Intelligenti ) (- Ancona ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

A system for the automatic creation of Pictorially Enriched Ontologies is presented, that is, ontologies for context-based video digital libraries, enriched by pictorial concepts for video annotation, summarization and similarity-based retrieval. Extraction of pictorial concepts with video clip clustering, ontology storing with MPEG-7, and the use of the ontology for stored video annotation are described. Results on sport videos and TRECVID2005 video material are reported.

Calderara, S.; Cucchiara, R.; Prati, A. ( 2006 ) - Multimedia Surveillance: Content-based Retrieval with Multicamera People Tracking ( 4th ACM international workshop on Video surveillance and sensor networks - Santa Barbara (CA) - 27 October 2006) ( - Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks ) (ACM New York (NY) USA ) - pp. da 95 a 100 ISBN: 9781595934963; 9781604232486 | 9781604232486 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Multimedia surveillance relates to the exploitation of multimedia tools for retrieving information from surveillance data, for emerging applications such as video post-analysis for forensic purposes. Searching for all the sequences in which a certain person was present is a typical query that is carried out by means of example images. Unfortunately, surveillance cameras often have low resolution, making retrieval based on appearance difficult. This paper proposes to exploit a two-step retrieval process that merges similarity-based retrieval with multicamera tracking-based retrieval able to create consistent traces of a person from different views and, thus, different resolutions. A mixture model is used to summarize these traces into a single prototype on which retrieval is performed. Experimental results demonstrate the accuracy of the retrieval process also in the case of varying illumination conditions.

C. Grana; R. Vezzani; D. Bulgarelli; G. Gualdi; R. Cucchiara; M. Bertini; C. Torniai; A. Del Bimbo ( 2006 ) - PEANO: Pictorial Enriched Annotation of Video ( 14th ACM International Conference on Multimedia (ACM Multimedia 2006) - Santa Barbara USA - Oct 23-27) ( - Proceedings of the 14th ACM International Conference on Multimedia (ACM Multimedia 2006) ) (ACM New York USA ) - pp. da 793 a 794 ISBN: 1595934472 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this DEMO, we present a tool set for video digital library management that allows i) structural annotation of edited videos in MPEG-7 by automatically extracting shots and clips; ii) automatic semantic annotation based on perceptual similarity against a taxonomy enriched with pictorial concepts; iii) video clip access and hierarchical summarization with stand-alone and web interfaces; iv) access to clips from mobile platforms in GPRS-UMTS video streaming. The tools can be applied in different domain-specific Video Digital Libraries. The main novelty is the possibility to enrich the annotation with pictorial concepts that are added to a textual taxonomy in order to make the automatic annotation process faster and often more effective. The resulting multimedia ontology is described in the MPEG-7 framework. The PEANO (Perceptual Annotation of Video) tool has been tested over video art, sport (Soccer, Olympic Games 2006, Formula 1) and news clips.

C. Grana; R. Cucchiara ( 2006 ) - Performance of the MPEG-7 Shape Spectrum Descriptor for 3D objects retrieval ( Second Italian Research Conference on Digital Library Management Systems - Padova - Jan 27) ( - Second Italian Research Conference on Digital Library Management Systems ) (ISTI-CNR Pisa ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we describe in detail the MPEG-7 Shape Spectrum Descriptor and provide a set of tests with different 3D object databases. To verify whether the low performance of this descriptor reported in the literature was due to the comparison employed, we also used the Earth Mover's Distance, which allows much more detailed histogram comparisons. Finally, we compare our outcomes with the best results in related work.

Calderara, S.; Melli, R.; Prati, A.; Cucchiara, R. ( 2006 ) - Reliable background suppression for complex scenes ( 4th ACM international workshop on Video surveillance and sensor networks - Santa Barbara (CA) - 27 October 2006) ( - Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks ) (ACM New York (NY) USA ) - pp. da 211 a 214 ISBN: 9781595934963; 9781604232486 | 9781604232486 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper describes a system for motion detection based on background suppression, specifically conceived for working in complex scenes with vacillating background, camouflage, illumination changes, etc. The system contains proper techniques for background bootstrapping, shadow removal, ghost suppression and selective updating of the background model. The results on the challenging videos provided in the VSSN '06 Open Source Algorithm Competition dataset demonstrate that the proposed system outperforms the widely used mixture-of-Gaussians approach.

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2006 ) - Semantic adaptation of sport videos with user-centred performance analysis - IEEE TRANSACTIONS ON MULTIMEDIA - n. volume 8 (3) - pp. da 433 a 443 ISSN: 1520-9210 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In semantic video adaptation, measures of performance must consider the impact of the errors in the automatic annotation on the adaptation, in relation to the preferences and expectations of the user. In this paper, we define two new performance measures, Viewing Quality Loss and Bit-rate Cost Increase, that are obtained from the classical peak signal-to-noise ratio (PSNR) and bit rate, and relate the results of semantic adaptation to the errors in the annotation of events and objects and to the user's preferences and expectations. We present and discuss results obtained with a system that performs automatic annotation of soccer video highlights and applies different coding strategies to different parts of the video according to their relative importance for the end user. With reference to this framework, we analyze how highlights' statistics and the errors of the annotation engine influence the performance of semantic adaptation and are reflected in the quality of the video displayed at the user's client and in the increase of transmission costs.
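
For reference, the classical PSNR on which the two measures above are built is 10 log10(peak^2 / MSE), where MSE is the mean squared error between the reference and the decoded pixels. The sketch below computes only this standard quantity on flat lists of gray levels; it does not attempt to reproduce the paper's Viewing Quality Loss or Bit-rate Cost Increase, whose definitions are not given in the abstract. The function name is an assumption.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally long
    sequences of pixel values; `peak` is the maximum possible value."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


identical = psnr([100, 100], [100, 100])
worst = psnr([0, 0, 0, 0], [255, 255, 255, 255])
typical = psnr([100, 100], [110, 90])
```

In the last call the mean squared error is 100, so the result is 10 log10(65025 / 100), roughly 28.13 dB.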

M. Bertini; R. Cucchiara; A. Del Bimbo; A. Prati ( 2006 ) - Semantic Annotation and Adaptation of Live Sports Videos ( Second Italian Research Conference on Digital Library Management Systems - Padova, Italy - 27 January 2006) ( - Proceedings of IRCDL 2006 ) (- - ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper addresses multimedia tools for universal multimedia access to sports videos by means of automatic annotation and content-based adaptation. The goal is to provide boosting technologies to allow the new generations of mobile devices (phones and PDAs) to better exploit the available bandwidth and to achieve a reasonable cost/quality trade-off in remote access to long-lasting live events, such as sport competitions. Although the available bandwidth for mobile communication has increased thanks to new telecommunication standards such as GPRS and UMTS, it is still insufficient for high quality video transmission. The limited resources of low-cost terminals and the high costs of data transfer hinder de facto many possible multimedia services. First, the quality is limited by the small display size and memory available on many mobile devices. Second, the limited bandwidth may affect user satisfaction either because of the time spent waiting for the download or the latency in streaming a live video. Moreover, even if the user is willing to wait for the download or accepts frame dropping, a reduction of data to send would be unavoidable in order to bring down the costs of the service. As a matter of fact, most telecommunication companies charge a fee proportional to the number of bytes transferred. Hence, the cost of accessing a long-lasting live video, such as a 90-minute soccer competition, is still too high for most users.

C. Grana; R. Cucchiara ( 2006 ) - Sub-Shot Summarization for MPEG-7 based Fast Browsing ( Second Italian Research Conference on Digital Library Management Systems - Padova - Jan 27) ( - Proceedings of the Second Italian Research Conference on Digital Library Management Systems ) (ISTI-CNR Pisa ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we propose a system for automatic video summarization at sub-shot level. Our work covers two main aspects: the first is the sub-shot detection, which is performed without a priori constraints on the number or length of the shots. The algorithm is based on color histograms and motion features, and employs fuzzy c-means with variable number of clusters. The second aspect is an in depth discussion on the annotation of summaries with the MPEG-7 standard. Results on mixed genres TV material, from TRECVID videos, are reported.

S. CALDERARA; R. CUCCHIARA; A. PRATI ( 2006 ) - The LAICA project: Experiments on Multicamera People Tracking and Logging ( Conferenza Italiana Sistemi Intelligenti - Ancona, Italy - 27-29 September 2006) ( - Atti di CISI 2006 ) (- - ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Logging information on moving objects is crucial in video surveillance systems. Distributed multi-camera systems can provide the appearance of objects/people from different viewpoints and at different resolutions, allowing a more complete and precise logging of the information. This is achieved through consistent labeling to correlate collected information of the same person. This paper proposes a novel approach to consistent labeling that is also capable of fully characterizing groups of people and of managing miss segmentations. The ground-plane homography and the epipolar geometry are automatically learned and exploited to warp objects’ principal axes between overlapped cameras. A MAP estimator that exploits two contributions (forward and backward) is used to choose the most probable label configuration to be assigned at the handoff of a new object. Extensive experiments demonstrate the accuracy of the proposed method in detecting single and simultaneous handoffs, miss segmentations, and groups.

C. Grana; R. Vezzani; R. Cucchiara ( 2006 ) - University of Modena and Reggio Emilia at TRECVID 2006 ( 2006 TREC Video Retrieval Evaluation - Gaithersburg, MD, USA - Nov 13-14) ( - 2006 TREC Video Retrieval Evaluation Notebook Papers and Slides ) (NIST Gaithersburg, MD USA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

What approach or combination of approaches did you test in each of your submitted runs? TRECVID2005_UNIMORE_??.xml: the same linear transition detector (LTD) was tested for every run, with ten uniformly spaced thresholds for the detection. What, if any, significant differences (in terms of what measures) did you find among the runs? The system behaved as expected: the higher the threshold, the better the recall. Of course, the precision decreased correspondingly. Interestingly, it seems that we cannot overcome the overall limit of around 80% for recall and 88% for precision, independently of the other parameter. Based on the results, can you estimate the relative contribution of each component of your system/approach to its effectiveness? One of the main objectives of our system was to test the performance of a single algorithm for both cuts and gradual transitions, so all the merits and demerits are related to our LTD. Overall, what did you learn about runs/approaches and the research question(s) that motivated them? The use of a single algorithm allows the system to be run without training. Just a single parameter may be employed to tune the sensitivity of the system, thus allowing its use in general-purpose, user-friendly systems.

C. Grana; D. Bulgarelli; R. Cucchiara ( 2006 ) - Video Clip Clustering for Assisted Creation of MPEG-7 Pictorially Enriched Ontologies ( Second International Symposium on Communications, Control and Signal Processing - Marrakech, Morocco - Mar 13-15) ( - Proceedings of Second International Symposium on Communications, Control and Signal Processing ) (SuviSoft Oy Ltd. Tampere FIN ) - pp. da 904 a 907 ISBN: 9782908849172 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present a system for the assisted creation of Pictorially Enriched Ontologies, that is ontologies for context-based digital libraries enriched by pictorial concepts for video annotation, summarization and similarity based retrieval. Here we detail the approach for video clips clustering and pictorial concepts extraction together with the approach for storing the ontology within the MPEG-7 framework. The clustering is performed by Complete Link hierarchical clustering on color histograms and motion features. Results on Formula 1 TV material are reported.
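
Complete-link hierarchical clustering, as mentioned in this abstract, can be sketched as follows. This is a naive agglomerative version under stated assumptions: the caller supplies the distance function and the target number of clusters, and both the function name and the stopping rule are illustrative rather than taken from the paper.

```python
def complete_link(points, num_clusters, distance):
    """Merge clusters until num_clusters remain; returns lists of point indices."""
    if not 1 <= num_clusters <= len(points):
        raise ValueError("num_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete link: cluster distance is the largest pairwise distance.
                d = max(distance(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For video clips, `points` would hold one feature vector per clip and `distance` a histogram or motion-feature dissimilarity.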

R. VEZZANI; R. CUCCHIARA; A. MALIZIA; L. CINQUE ( 2006 ) - 3-D Virtual Environments on Mobile Devices for Remote Surveillance ( IEEE International Conference on Advanced Video and Signal-Based Surveillance 2006 - Sydney, Australia - 22-24 November 2006) ( - Proceedings of AVSS 2006 ) (IEEE Computer Society Washington, DC USA ) - n. volume 1 - pp. da 100 a 104 ISBN: 9780769526881 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a distributed video surveillance framework. Our aim is the remote monitoring of the behavior of people moving in a scene, exploiting a virtual reconstruction on low-capability devices such as PDAs and cell phones. The main novelty of this system is the effective integration of the computer vision and computer graphics modules. The first, using a probabilistic framework, can detect the position, the trajectory and the posture of people moving in the scene. The second exploits both the new availability of standard 3D graphics libraries on mobile devices (namely JSR184 and the M3G graphics format) and the processing capability of new PDAs in order to reconstruct the remote surveillance data in real time.

R. Cucchiara; C. Grana; A. Prati; R. Vezzani ( 2005 ) - A computer vision system for in-house video surveillance - IEE PROCEEDINGS. VISION, IMAGE AND SIGNAL PROCESSING - n. volume 152 (2) - pp. da 242 a 249 ISSN: 1350-245X [Articolo in rivista (262) - Articolo su rivista]
Abstract

In-house video surveillance to control the safety of people living in domestic environments is considered. In this context, common problems and general purpose computer vision techniques are discussed and implemented in an integrated solution comprising a robust moving object detection module which is able to disregard shadows, a tracking module designed to handle large occlusions, and a posture detector. These factors, shadows, large occlusions and people's posture, are the key problems that are encountered with in-house surveillance systems. A distributed system with cameras installed in each room of a house can be used to provide full coverage of people's movements. Tracking is based on a probabilistic approach in which the appearance and probability of occlusions are computed for the current camera and warped in the next camera's view by positioning the cameras to disambiguate the occlusions. The application context is the emerging area of domotics (from the Latin word domus, meaning 'home', and informatics). In particular, indoor video surveillance makes it possible for elderly and disabled people to live with a sufficient degree of autonomy, via interaction with this new technology, which can be distributed in a house at affordable costs and with high reliability.

C. Grana; G. Tardini; R. Cucchiara ( 2005 ) - Adaptation and Annotation of Formula 1 Sport Videos ( First Italian Research Conference on Digital Library Management Systems - Padova - Jan 28) ( - Post-proceedings of the First Italian Research Conference on Digital Library Management Systems ) (ISTI-CNR Pisa ITA ) - pp. da 85 a 90 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we approach the problem of detecting editing features suitable for video annotation, by paying attention to artifacts and effects introduced in video editing. In particular, a linear transition detection algorithm is presented, which can characterize the transition center and length with high precision. The technique works with sub-frame granularity and is able to include both abrupt cuts and longer dissolves in a single approach. Theoretical justification for the algorithm is provided with an optimization technique for real cases. We present results obtained exploiting the editing features on a Formula 1 video digital library, detecting replays and providing pre classification hints for automatic shot annotation.

R. CUCCHIARA; A. PRATI; R. VEZZANI ( 2005 ) - Ambient Intelligence for Security in Public Parks: the LAICA Project ( IEE International Symposium on Imaging for Crime Detection and Prevention 2005 - London, UK - 7-8 June 2005) ( - Proceedings of ICDP 2005 ) (Institution of Electrical Engineers London GBR ) - n. volume 1 - pp. da 139 a 144 ISBN: 9780863415357 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we address the exploitation of computer vision techniques to develop multimedia services and automatic monitoring systems related to security and privacy in public areas. The research is part of a two-year Italian project called LAICA, intended to provide advanced services for citizens and public officers. Citizens want fast and friendly web access to public places, to see the environment in real time without violating the privacy laws. Public officers and policy centres want a fast and reactive monitoring system, capable of automatically detecting dangerous situations, given the huge number of cameras that cannot be monitored simultaneously by human operators. In this work, we describe the project and the defined methodologies in multi-camera video mosaicing, people tracking and consistent labelling, and access to processed data with face obscuration.

R. CUCCHIARA; A. PRATI; C. OSTI; S. PAVANI ( 2005 ) - Ambient Intelligence in Urban Environments ( Nono Congresso della Associazione Italiana per l’Intelligenza Artificiale - Milano, Italy - 20 September 2005) ( - Atti del Nono Congresso della Associazione Italiana per l’Intelligenza Artificiale ) (Associazione Italiana per l'Intelligenza Artificiale - ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper reports advances achieved within a project called LAICA (Laboratorio di Ambient Intelligence per una Città Amica) on Ambient Intelligence in urban environments. The overall LAICA architecture is described and the unified operative centre developed by Regulus SpA (partner of the project) to collect and correlate data from different sensors and prototypes is depicted. Moreover, the paper describes the results obtained in developing a system for video surveillance in public parks, devoted to create a mosaic image of the scene and to extract and track moving people. Moreover, the system takes the privacy issues into account, proposing a method for face detection and tracking able to obscure faces in order to protect people’s identity.

M. Bertini; R. Cucchiara; A. Del Bimbo; A. Prati ( 2005 ) - An integrated framework for semantic annotation and adaptation - MULTIMEDIA TOOLS AND APPLICATIONS - n. volume 26 (3) - pp. da 345 a 363 ISSN: 1380-7501 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Tools for the interpretation of significant events from video and video clip adaptation can effectively support automatic extraction and distribution of relevant content from video streams. In fact, adaptation can adjust meaningful content, previously detected and extracted, to the user/client capabilities and requirements. The integration of these two functions is increasingly important, due to the growing demand for multimedia data from remote clients with limited resources (PDAs, HCCs, Smart phones). In this paper we propose a unified framework for event-based and object-based semantic extraction from video and semantic on-line adaptation. Two cases of application, highlight detection and recognition from soccer videos and people behavior detection in domotic applications, are analyzed and discussed.

R. CUCCHIARA; R. VEZZANI ( 2005 ) - Assessing Temporal Coherence for Posture Classification with Large Occlusions ( IEEE Computer Society Workshop on Motion and Video Computing - Breckenridge, Colorado - 5-7 January 2005) ( - Proceedings of Motion 2005 ) (IEEE Computer Society Washington, DC USA ) - n. volume 2 - pp. da 269 a 274 ISBN: 07695227182 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present a people posture classification approach especially devoted to coping with occlusions. In particular, the approach aims at assessing temporal coherence of visual data over probabilistic models. A mixed predictive and probabilistic tracking is proposed: a probabilistic tracking maintains over time the actual appearance of detected people and evaluates the occlusion probability; an additional tracking with Kalman prediction improves the estimation of the people position inside the room. Probabilistic Projection Maps (PPMs) created with a learning phase are matched against the appearance mask of the track. Finally, a Hidden Markov Model formulation of the posture corrects the frame-by-frame classification uncertainties and makes the system reliable even in the presence of occlusions. Results obtained over real indoor sequences are discussed.
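
How a Hidden Markov Model can correct frame-by-frame classification noise is sketched below with a standard Viterbi decoder. The state names, the dictionary layout and all probabilities are illustrative assumptions; the paper's actual HMM formulation may differ.

```python
import math

def viterbi(frame_probs, transition, prior):
    """Most likely posture sequence given per-frame classifier likelihoods.

    frame_probs: list of dicts, posture -> P(observation | posture), one per frame
    transition:  dict prev -> dict next -> P(next | prev)
    prior:       dict posture -> initial probability
    """
    def safe_log(p):
        return math.log(p) if p > 0 else -math.inf

    states = list(prior)
    score = {s: safe_log(prior[s]) + safe_log(frame_probs[0][s]) for s in states}
    back = []
    for obs in frame_probs[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor for state s in the log domain.
            best_prev = max(states,
                            key=lambda p: prev_score[p] + safe_log(transition[p][s]))
            score[s] = (prev_score[best_prev]
                        + safe_log(transition[best_prev][s])
                        + safe_log(obs[s]))
            pointers[s] = best_prev
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With "sticky" transition probabilities, a single frame whose classifier output briefly favours another posture is overruled by its temporal context.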

R. CUCCHIARA; R. MELLI; A. PRATI ( 2005 ) - Auto-iris Compensation for Traffic Surveillance Systems ( IEEE Intelligent Transportation Systems Conference - Vienna, Austria - 13-15 September 2005) ( - Proceedings of ITSC 2005 ) (IEEE Piscataway, NJ, USA USA ) - pp. da 851 a 856 ISBN: 9780780392151 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper addresses auto-iris compensation. Auto-iris can be really troublesome for motion detection and tracking techniques based on background or frame differencing, since it can quickly change the average intensity of the current frame. To cope with this, we introduced a two-step auto-iris compensation approach in our traffic monitoring system. First, the auto-iris detection is based on the computation of the average of the luminance difference obtained by background suppression. Then, if an auto-iris is detected, the compensation phase is started. In this phase, the auto-iris’ behaviour is empirically modelled and, thus, compensated. Experimental results demonstrate the accuracy of the proposed approach, with both quantitative measures and visual analysis.

S. Calderara; A. Prati; R. Vezzani; R. Cucchiara ( 2005 ) - Consistent labeling for multi-camera object tracking ( 13th International Conference on Image Analysis and Processing - Cagliari, Italy - Sept. 6-8) ( - Image Analysis and Processing – ICIAP 2005 ) (Springer Heidelberg DEU ) - n. volume LNCS 3617 - pp. da 1206 a 1214 ISBN: 9783540288695 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present a new approach to multi-camera object tracking based on consistent labeling. An automatic and reliable procedure obtains the homographic transformation between two overlapped views, without any manual calibration of the cameras. Object positions are matched by using the homography when the object is first detected in one of the two views. The approach has also been tested in the case of simultaneous transitions and in the case in which people are detected as a group during the transition. Promising results are reported over a real setup of overlapped cameras.
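
The homography-based matching described here can be illustrated with a small sketch. It assumes the 3x3 homography is already known (the paper learns it automatically) and uses a plain nearest-neighbour rule with a distance gate as a stand-in for the paper's matching; `warp_point`, `match_label` and `max_dist` are hypothetical names.

```python
def warp_point(H, x, y):
    """Map (x, y) through a 3x3 homography H given as nested row-major lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return xh / w, yh / w

def match_label(H, new_point, tracks, max_dist):
    """Label of the track in the other view nearest to the warped point, or None.

    tracks: dict label -> (x, y) position in the other camera's image plane.
    """
    wx, wy = warp_point(H, *new_point)
    best_label, best_d2 = None, max_dist ** 2
    for label, (tx, ty) in tracks.items():
        d2 = (tx - wx) ** 2 + (ty - wy) ** 2
        if d2 <= best_d2:
            best_label, best_d2 = label, d2
    return best_label
```

Returning `None` when no track falls within `max_dist` lets the caller assign a fresh label to a person who is genuinely new to both views.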

S. Calderara; R. Vezzani; A. Prati; R. Cucchiara ( 2005 ) - Entry Edge of Field of View for multi-camera tracking in distributed video surveillance ( IEEE International Conference on Advanced Video and Signal-Based Surveillance - Como, Italy - 15-16 September 2005) ( - Proceedings of IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS2005) ) (IEEE Computer Society - ) - n. volume 1 - pp. da 93 a 98 ISBN: 9780780393851 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Efficient solutions to people tracking in distributed video surveillance are required to monitor crowded and large environments. This paper proposes a novel use of the Entry Edges of Field of View (E2oFoV) to solve the consistent labeling problem between partially overlapped views. An automatic and reliable procedure obtains the homographic transformation between two overlapped views, without any manual calibration of the cameras. Through the homography, the consistent labeling is established each time a new track is detected in one of the cameras. A Camera Transition Graph (CTG) is defined to speed up the establishment process by reducing the search space. Experimental results prove the effectiveness of the proposed solution also in challenging conditions.

R. Cucchiara; A. Prati; R. Vezzani ( 2005 ) - Making the home safer and more secure through visual surveillance ( 5th International Conference on Methods and Techniques in Behavioral Research - Wageningen, The Netherlands - 30 August - 2 September 2005) ( - Proceedings of Measuring Behavior 2005 ) (Lucas P.J.J. Noldus, Fabrizio Grieco, Leanne W.S. Loijens, Patrick H. Zimmerman - ) - n. volume 1 - pp. da 172 a 175 ISBN: 9789074821711 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Video surveillance has a direct application in intelligent home automation or domotics (from the Latin word domus, that means “home”, and informatics). In particular, in-house video surveillance can provide good support for people with some difficulties (e.g. elderly or disabled people) living alone and with limited autonomy. A key aspect in video surveillance systems for domotics is that of analyzing behaviours of the monitored people. To accomplish this task, people must be detected and tracked, and their posture must be analyzed in order to model behaviours recognizing abrupt changes in it. Problems related to reliable software solutions are not completely solved, in particular luminance changes, shadows and frequent posture changes must be taken into account. Long-lasting occlusions are common due to the proximity of the cameras and the presence of furniture and doors that can often hide parts of a person’s body. For these reasons, a probabilistic and appearance-based tracking, particularly conceivable for people tracking and posture classification, has been developed. However, despite its effectiveness for long-lasting and large occlusions, this approach tends to fail whenever the person is monitored with multiple cameras and he appears in one of them already occluded. Different views provided by multiple cameras can be exploited to solve occlusions by warping known object appearance into the occluded view. To this aim, this paper describes an approach to posture classification based on projection histograms, reinforced by HMM for assuring temporal coherence of the posture.

C. Grana; G. Tardini; R. Cucchiara ( 2005 ) - MPEG-7 Compliant Shot Detection in Sport Videos ( Seventh IEEE International Symposium on Multimedia - Irvine - Dec 12-14) ( - Seventh IEEE International Symposium on Multimedia ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 395 a 402 ISBN: 9780769524894 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose a system for automatic detection of shots in sport videos. Our work covers two main aspects: the first is robust shot detection in the presence of fast object motion and camera operations. To this aim we propose a new algorithm, unique for both cuts and linear transitions detection, which only needs the tuning of two parameters. An extended comparison with four transition detection algorithms, representing the state of the art in literature, is reported. Examples with Formula 1, basketball, soccer and cycling videos are analyzed. The second aspect is an in-depth discussion on the annotation of shots and transitions with the MPEG-7 standard.

A. PRATI; R. CUCCHIARA ( 2005 ) - On the usefulness of object shape coding with MPEG-4 ( IEEE International Symposium on Multimedia - Irvine, CA, USA - 12-14 December 2005) ( - Proceedings of ISM 2005 ) (IEEE Computer Society Los Alamitos, California USA ) - pp. da 483 a 490 ISBN: 0 7695 2489 3 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper reports the results of an in-depth analysis of the degree of usefulness of object shape coding in video compression. In particular, MPEG-4 is used as the reference standard. The influence of different coding parameters on the performance is examined in depth and the results are discussed. Object shape coding is compared with classical (MPEG-2) frame-based coding both at an objective level (by comparing PSNR/quality and bitrate/file size) and at a subjective level (asking a set of users to express their opinion on overall quality, cognitive effectiveness, and willingness to pay). In conclusion, this paper aims at answering the question of whether it is convenient to use object shape coding instead of frame-based coding.
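
The objective comparison relies on PSNR, which can be computed as in the following minimal sketch. It assumes 8-bit samples passed as flat sequences of equal length; the function name and the `peak` default are illustrative choices.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values mean the decoded frame is closer to the reference; identical frames give an infinite PSNR.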

R. MELLI; R. CUCCHIARA; A. PRATI; L. DE COCK ( 2005 ) - Predictive and Probabilistic Tracking to Detect Stopped Vehicles ( IEEE Workshop on Applications of Computer Vision - Breckenridge, CO, USA - 5-7 January 2005) ( - Proceedings of WACV 2005 ) (IEEE Computer Society Los Alamitos, California USA ) - pp. da 388 a 393 ISBN: 0 7695 2271 8 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Many techniques and models have been proposed for vehicle surveillance on highways. In the past, tracking algorithms based on the Kalman filter have been widely used for their efficiency in prediction and low computational cost. However, predictive filters cannot solve long-lasting occlusions. In this paper, we propose a new mixed predictive and probabilistic tracking that exploits the advantages of predictive filters for moving vehicles and adopts probabilistic and appearance-based tracking for stopped vehicles. The proposed tracking is part of a complete video surveillance system, oriented to control tunnels and highways from cluttered views, that is implemented in an embedded DSP platform and provides background suppression, a novel shadow detection algorithm, tracking, and a scene recognition module. The experimental results are obtained over several hours of videos acquired in pre-existing platforms of CCTV surveillance systems.
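
The predictive component can be illustrated with one predict/update cycle of a constant-velocity Kalman filter along a single axis. The state model, the noise levels `q` and `r`, and the function name are assumptions made for this sketch; the abstract does not specify the filter used in the system.

```python
def kalman_cv_step(state, cov, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state: (position, velocity); cov: 2x2 covariance as nested lists;
    z: measured position; q, r: assumed process and measurement noise variances.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov
    # Predict with F = [[1, dt], [0, 1]] and Q = q * I.
    p_pred = p + dt * v
    v_pred = v
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q
    # Update with measurement matrix H = [1, 0].
    s = a00 + r
    k0, k1 = a00 / s, a10 / s
    y = z - p_pred
    new_state = (p_pred + k0 * y, v_pred + k1 * y)
    new_cov = [[(1 - k0) * a00, (1 - k0) * a01],
               [a10 - k1 * a00, a11 - k1 * a01]]
    return new_state, new_cov
```

Fed with positions of a vehicle moving at constant speed, the velocity estimate converges to that speed within a few frames; during a short occlusion the predict half alone can carry the track forward.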

Cucchiara, Rita; Grana, Costantino; Prati, Andrea; Vezzani, Roberto ( 2005 ) - Probabilistic posture classification for human-behavior analysis - IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS - n. volume 35 (1) - pp. da 42 a 54 ISSN: 1083-4427 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Computer vision and ubiquitous multimedia access nowadays make feasible the development of a mostly automated system for human-behavior analysis. In this context, our proposal is to analyze human behaviors by classifying the posture of the monitored person and, consequently, detecting corresponding events and alarm situations, like a fall. To this aim, our approach can be divided into two phases: for each frame, the projection histograms (Haritaoglu et al., 1998) of each person are computed and compared with the probabilistic projection maps stored for each posture during the training phase; then, the obtained posture is further validated exploiting the information extracted by a tracking module in order to take into account the reliability of the classification of the first phase. Moreover, the tracking algorithm is used to handle occlusions, making the system particularly robust even in indoor environments. Extensive experimental results demonstrate a promising average accuracy of more than 95% in correctly classifying human postures, even in the case of challenging conditions.
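
Projection histograms of a silhouette can be computed as in the sketch below. The comparison step is a deliberate simplification: it uses an L1 distance to one fixed template per posture, whereas the paper compares against probabilistic projection maps learned during training. All names are illustrative.

```python
def projection_histograms(mask):
    """Per-row and per-column foreground counts of a binary mask (lists of 0/1)."""
    rows = [sum(row) for row in mask]
    cols = [sum(col) for col in zip(*mask)]
    return rows, cols

def classify_posture(mask, templates):
    """Name of the template whose histograms are closest (L1) to the mask's.

    templates: dict posture -> (row_hist, col_hist), same size as the mask's.
    """
    rows, cols = projection_histograms(mask)

    def distance(template):
        t_rows, t_cols = template
        return (sum(abs(a - b) for a, b in zip(rows, t_rows))
                + sum(abs(a - b) for a, b in zip(cols, t_cols)))

    return min(templates, key=lambda name: distance(templates[name]))
```

A tall, narrow silhouette concentrates its mass in few columns, while a lying one concentrates it in few rows, which is what makes the two histograms discriminative.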

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2005 ) - Real Time Semantic Adaptation of Sports Video with User-centred Performance Analysis ( International Workshop on Image Analysis for Multimedia Interactive Services - Montreux, Switzerland - 13-15 April 2005) ( - Proceedings of WIAMIS 2005 ) (IEE London GBR ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Semantic video adaptation improves traditional adaptation by taking into account the degree of relevance of the different portions of the content. It employs solutions to detect the significant parts of the video and applies different compression ratios to elements that have different importance. Performance of semantic adaptation heavily depends on the quality and precision of the automatic annotation, whether it operates in strict or nonstrict real time, and the codec which is used to perform adaptation at the event or object level. It should consider the effects of the errors in the automatic extraction of objects and events over the operation of the adaptation subsystem, and relate these effects to the preferences for the objects and events of the video program, that have been decided by the user. In this paper, we present strict real time annotation and adaptation of sports video and introduce two new performance measures: Viewing Quality Loss and Bit-rate Cost Increase, that are obtained from classical PSNR and Bit Ratio, but relate the results of semantic adaptation with the user’s preferences and expectations.

G. Tardini; C. Grana; R. Marchi; R. Cucchiara ( 2005 ) - Shot detection and motion analysis for automatic MPEG-7 annotation of sports videos ( 13th International Conference on Image Analysis and Processing - Cagliari, Italy - Sep 6-8) ( - Image Analysis and Processing – ICIAP 2005 ) (Springer Heidelberg DEU ) - n. volume LNCS 3617 - pp. da 653 a 660 ISBN: 9783540288695 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we describe general algorithms that are devised for MPEG-7 automatic annotation of Formula 1 videos, and in particular for camera-car shots detection. We employed a shot detection algorithm suitable for cuts and linear transitions detection, which is able to precisely detect both the transition's center and length. Statistical features based on MPEG motion compensation vectors are then employed to provide motion characterization, using a subset of the motion types defined in MPEG-7, and shot type classification. Results on shot detection and classification are provided.

R. Cucchiara; C. Grana; G. Tardini ( 2005 ) - Shot Detection for Formula 1 Video Digital Libraries ( 7th International Workshop of the EU Network of Excellence DELOS on Audio-Visual Content and Information Visualization in Digital Libraries - Cortona (AR), Italy - May 4-6) ( - AVIVDiLib'05 Proceedings ) (Centromedia Capannori (Lucca) ITA ) - pp. da 131 a 140 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Metadata extraction is one of the first tasks to be performed for automatic Digital Library annotation, and in particular shot detection has been widely explored in literature. While a lot of methods have been proposed for the detection of abrupt cuts, only a small number of them has explicitly addressed the problem of gradual transitions. In this paper we propose an algorithm that exploits a precise model of linear transition. Experimental results on Formula 1 car races videos show the robustness of this method. These test videos are characterized by extreme situations such as fast camera and objects motion and very different kinds of shots. The algorithm is able to estimate the exact length of the transition and an error score is also given as a fitness measure to the linear model, to discriminate true transitions from false detections. The final shot segmentation is delivered as an MPEG7 compliant output.
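
The idea of scoring how well a candidate transition fits a linear model can be illustrated as follows. This is a simplified sketch, not the paper's algorithm: it assumes one scalar feature per frame (for example mean luminance) and measures the mean absolute deviation from a straight blend between the two endpoint frames; the function name is hypothetical.

```python
def linear_fit_error(values, start, end):
    """Mean absolute deviation of values[start..end] from a linear blend of the endpoints."""
    if not 0 <= start < end < len(values):
        raise ValueError("need 0 <= start < end < len(values)")
    span = end - start
    total = 0.0
    for t in range(start, end + 1):
        alpha = (t - start) / span
        expected = (1 - alpha) * values[start] + alpha * values[end]
        total += abs(values[t] - expected)
    return total / (span + 1)
```

A low error suggests a gradual, dissolve-like change over the window, while a large one indicates that the window does not behave like a linear transition and is more likely a false detection.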

R. CUCCHIARA; A. PRATI; L. BENINI; E. FARELLA ( 2005 ) - T_PARK: Ambient Intelligence for Security in Public Parks ( IEE International Workshop on Intelligent Environments, Special session on "Ambient Intelligence" - Colchester, UK - 28-29 June 2005) ( - Proceedings of IE 2005 ) (IEE London GBR ) - pp. da 243 a 251 ISBN: 0 86341 519 9 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, we present joint research activities in computer vision and sensor networks for a distributed surveillance of urban parks. Distributed visual surveillance of urban environments is one of the most interesting scenarios in Ambient Intelligence; in addition, the automated monitoring of public parks, often crowded by children and adults, is still a very difficult task due to the number of objects of interest. In this context, integrating the power of low cost sensors with the information provided by cameras can lead to a more reliable solution to people tracking in wide areas. Specifically, the deficiencies of one approach can be (at least partially) covered by the advantages of the other. The goal is to perform people tracking in parks (to achieve trackable parks - T-Parks), both in zones covered by overlapped cameras and also, thanks to sensors, in areas not covered by any camera. In this paper, we propose a new technique for multi-camera people tracking based on a learning phase to automatically calibrate pairs of cameras and to build Areas of Field Views (AoFoVs) in order to establish consistent labelling of people. In addition, sensor networks distributed at the borders of the AoFoV give an estimation of the probability of people overlapping, triggering specific algorithms of face detection or head counting to identify the single person. The research of T-Parks is part of a two-year Italian project called LAICA, intended to provide advanced services for citizens and public officers based on ambient intelligence technologies.

Y. Zhai; J. Liu; X. Cao; A. Basharat; A. Hakeem; S. Ali; M. Shah; C. Grana; R. Cucchiara ( 2005 ) - Video understanding and content-based retrieval ( 2005 TREC Video Retrieval Evaluation - Gaithersburg, MD - Nov 14-15) ( - 2005 TREC Video Retrieval Evaluation Notebook Papers and Slides ) (NIST Gaithersburg, MD USA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This year, the joint team of UCF and the University of Modena has participated in the following tasks: (1) shot boundary detection, (2) low-level feature extraction, (3) high-level feature extraction, (4) topic search and (5) BBC rushes management. The shot boundary detection was contributed by the Image Lab at the University of Modena. The other tasks were performed by the Computer Vision Team at UCF.

R. Cucchiara; A. Prati; R. Vezzani ( 2004 ) - An Intelligent Surveillance System for Dangerous Situation Detection in Home Environments - INTELLIGENZA ARTIFICIALE - n. volume 1 (1) - pp. da 11 a 15 ISSN: 1724-8035 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper we address the problem of human posture classification, focusing in particular on an indoor surveillance application. The approach was initially inspired by a previous work of Haritaoglu et al. [5] that uses histogram projections to classify people’s posture. Projection histograms are here exploited as the main feature for the posture classification, but, differently from [5], we propose a supervised statistical learning phase to create probability maps adopted as posture templates. Moreover, camera calibration and homography are included to solve perspective problems and to improve the precision of the classification. Furthermore, we make use of a finite state machine to detect dangerous situations such as falls and to activate a suitable alarm generator. The system works on-line on standard workstations with network cameras.
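
A finite state machine for fall detection could look like the sketch below. The states, the posture labels and the rule "raise an alarm after `min_lying_frames` consecutive lying frames" are hypothetical choices made for illustration; the abstract does not describe the actual machine.

```python
def detect_falls(postures, min_lying_frames=3):
    """Return the frame indices at which a fall alarm is raised.

    postures: per-frame posture labels; only the label "lying" is treated as a fall cue.
    """
    if min_lying_frames < 2:
        raise ValueError("min_lying_frames must be at least 2")
    state, lying_count, alarms = "UPRIGHT", 0, []
    for idx, posture in enumerate(postures):
        if state == "UPRIGHT":
            if posture == "lying":
                state, lying_count = "MAYBE_FALLEN", 1
        elif state == "MAYBE_FALLEN":
            if posture == "lying":
                lying_count += 1
                if lying_count >= min_lying_frames:
                    state = "ALARM"
                    alarms.append(idx)
            else:
                state = "UPRIGHT"  # brief lying classification, treated as noise
        elif state == "ALARM":
            if posture != "lying":
                state = "UPRIGHT"  # person got up; re-arm the detector
    return alarms
```

The intermediate state keeps a single misclassified frame from triggering the alarm generator.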

G. Pellacani; C. Grana; R. Cucchiara; S. Seidenari ( 2004 ) - Automated extraction and description of dark areas in surface microscopy melanocytic lesion images - DERMATOLOGY - n. volume 208 (1) - pp. da 21 a 26 ISSN: 1018-8665 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Background: Identification of dark areas inside a melanocytic lesion (ML) is of great importance for melanoma diagnosis, both during clinical examination and employing programs for automated image analysis. Objective: The aim of our study was to compare two different methods for the automated identification and description of dark areas in epiluminescence microscopy images of MLs and to evaluate their diagnostic capability. Methods: Two methods for the automated extraction of 'absolute' (ADAs) and 'relative' dark areas (RDAs) and a set of parameters for their description were developed and tested on 339 images of MLs acquired by means of a polarized-light videomicroscope. Results: Significant differences in dark area distribution between melanomas and nevi were observed employing both methods, permitting a good discrimination of MLs (diagnostic accuracy = 74.6 and 71.2% for ADAs and RDAs, respectively). Conclusions: Both methods for the automated identification of dark areas are useful for melanoma diagnosis and can be implemented in programs for image analysis.

C. Grana; G. Pellacani; S. Seidenari; R. Cucchiara ( 2004 ) - Color Calibration for a Dermatological Video Camera System ( 17th International Conference on Pattern Recognition - Cambridge, UK - Aug 23-26) ( - Proceedings of the 17th International Conference on Pattern Recognition ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 3 - pp. da 798 a 801 ISBN: 9780769521282 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work, we describe a technique to calibrate images for skin analysis in dermatology. Using a common reference we correct non-uniform illumination effects, give an estimation of the gamma correction and produce an XYZ conversion matrix. The final result is then reverted to a non-standard RGB color space, built from the instrument images. In this way different instruments behave uniformly, allowing colorimetric characterization, while improving the results of common algorithms. The proposed techniques should be the initial support for a distributed framework where dermatological images can be consistently compared.
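
Once a gamma value and a conversion matrix have been estimated, applying them to a pixel is straightforward, as this sketch shows. The gamma of 2.2 and the matrix below (the standard linear sRGB to XYZ matrix for a D65 white point) are placeholders: in the paper both are estimated from the instrument's reference images.

```python
# Placeholder matrix: linear sRGB -> CIE XYZ (D65). A calibrated instrument
# would use its own estimated matrix instead.
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

def rgb_to_xyz(rgb, gamma=2.2, matrix=SRGB_TO_XYZ):
    """Undo a simple power-law gamma on 8-bit RGB, then apply a 3x3 matrix."""
    linear = [(c / 255.0) ** gamma for c in rgb]
    return tuple(sum(m * c for m, c in zip(row, linear)) for row in matrix)
```

With these placeholder values, pure white maps to a luminance Y of about 1 and pure black to the origin.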

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2004 ) - Content-based Video Adaptation with User's Preference ( IEEE International Conference on Multimedia & Expo - Taipei, Taiwan - 27-30 June 2004) ( - Proceedings of ICME 2004 ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 3 - pp. da 1695 a 1698 ISBN: 0 7803 8603 5 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present an integrated system designed to support automatic semantic extraction of highlights in sports video and automatic video adaptation according to the user’s preferences. To analyze the user’s satisfaction, we propose a new performance measure that explicitly takes into account the user’s preferences and considers the number and type of errors produced by the annotation engine and the way in which these errors affect the compressed video quality and bandwidth allocation. We provide experimental results with application to soccer and swimming.

R. Cucchiara; A. Prati; C. Grana; R. Vezzani ( 2004 ) - DELOS: a Network of Excellence on Digital Libraries [Altro (298) - Partecipazione a progetti di ricerca]
Abstract

Digital libraries represent a new infrastructure and environment that has been made possible by the integration and use of a number of IC technologies, the availability of digital content on a global scale and a strong demand from users who are now online. They are destined to become an essential part of the information infrastructure in the 21st century. On the basis of these considerations, our 10-year grand vision for digital libraries is the following: digital libraries should enable any citizen to access all human knowledge any time and anywhere, in a friendly, multi-modal, efficient and effective way, by overcoming barriers of distance, language, and culture and by using multiple Internet-connected devices. The new generation digital libraries should not just be seen as static information repositories but as growing, interactively, and collaboratively used nuclei of what will be, at some stage, a good part of human knowledge that depends as much on information as on communication. The challenges and opportunities that motivate advanced digital library initiatives are associated with this view of the digital library environment. In recent years, a large number of digital library systems have been developed. However, each system is typically built from scratch and develops its own techniques, focusing on a specific type of information or services, and addressing the needs of a specific application domain.
After this first experience, it has become clear that the future of digital libraries goes beyond what these initial efforts may indicate individually. It is time for generic digital library technology to be developed and incorporated into industrial-strength Digital Library Management Systems (DLMSs), offering advanced functionality through reliable and extensible services. The main objective of the DELOS network is thus to define and conduct a joint program of activities (JPA) in order to integrate and coordinate the ongoing research activities of the major European research teams in the field of digital libraries for the purpose of developing the next generation digital library technologies. The implementation of an integrated programme of this type will make the accomplishment of our grand vision for digital libraries feasible. Another main objective of the DELOS network is to integrate research activities carried out in a number of related fields crucial for the development of the next generation of digital libraries with ongoing research activities in the digital library field itself.

R. Cucchiara; D. Lovell; A. Prati; M.M. Trivedi ( 2004 ) - Introduction to the special section on in vehicle computer vision systems - IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY - n. volume 53 (6) - pp. da 1633 a 1635 ISSN: 0018-9545 [Articolo in rivista (262) - Articolo su rivista]
Abstract

-

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2004 ) - Neighbor cache prefetching for multimedia image and video processing - IEEE TRANSACTIONS ON MULTIMEDIA - n. volume 6 (4) - pp. da 539 a 552 ISSN: 1520-9210 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Cache performance is strongly influenced by the type of locality embodied in programs. In particular, multimedia programs handling images and videos are characterized by a bidimensional spatial locality, which is not adequately exploited by standard caches. In this paper we propose novel cache prefetching techniques for image data, called neighbor prefetching, able to improve exploitation of bidimensional spatial locality. A performance comparison is provided against other assessed prefetching techniques on a multimedia workload (with MPEG-2 and MPEG-4 decoding, image processing, and visual object segmentation), including a detailed evaluation of both the miss rate and the memory access time. Results prove that neighbor prefetching achieves a significant reduction in the time due to delayed memory cycles (more than 97% on MPEG-4 with respect to 75% of the second performing technique). This reduction leads to a substantial speedup on the overall memory access time (up to 140% for MPEG-4). Performance has been measured with the PRIMA trace-driven simulator, specifically devised to support cache prefetching.

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2004 ) - Object-based and Event-based Semantic Video Adaptation ( International Conference on Pattern Recognition - Cambridge, UK - 23-26 August 2004) ( - Proceedings of ICPR 2004 ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 4 - pp. da 987 a 990 ISBN: 0 7695 2128 2 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Semantic video adaptation allows video content to be transmitted with different viewing quality, depending on the relevance of the content from the user’s viewpoint. To this end, an automatic annotation subsystem must be employed that automatically detects relevant objects and events in the video stream. In this paper we present a composite framework made of an automatic annotation engine and a semantics-based adaptation module. Three new compression solutions are proposed that work at the object or event level. Their performance is compared according to a new measure that takes into account the user’s satisfaction and the effect that errors in the annotation module have on it.

M. BERTINI; A. DEL BIMBO; A. PRATI; R. CUCCHIARA ( 2004 ) - Objects and Events Recognition for Sport Videos Transcoding ( 2nd International Symposium on Image/Video Communications over fixed and mobile - Brest, France - 7-9 July 2004) ( - Proceedings of ISIVC 2004 ) (École Nationale Supérieure des Télécommunications de Bretagne Brest FRA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

-

R. Cucchiara; C. Grana; G. Tardini; R. Vezzani ( 2004 ) - Probabilistic People Tracking for Occlusion Handling ( 17th International Conference on Pattern Recognition - Cambridge, UK - Aug 23-26) ( - Proceedings of the 17th International Conference on Pattern Recognition ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 1 - pp. da 132 a 135 ISBN: 9780769521282 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This work presents a novel people tracking approach, able to cope with frequent shape changes and large occlusions. In particular, the tracks are described by means of probabilistic masks and appearance models. Occlusions due to other tracks or due to background objects and false occlusions are discriminated. The tracking system is general enough to be applied with any motion segmentation module; it can track people interacting with each other, and it maintains the pixel-to-track assignment even with large occlusions. At the same time, the update model is very reactive, so as to cope with sudden body motion and changes in the silhouette's shape. Due to its robustness, it has been used in many experiments of people behavior control in indoor situations.

R. Cucchiara; A. Prati; R. Vezzani ( 2004 ) - Real-time motion segmentation from moving cameras - REAL-TIME IMAGING - n. volume 10 - pp. da 127 a 143 ISSN: 1077-2014 [Articolo in rivista (262) - Articolo su rivista]
Abstract

This paper describes our approach to real-time detection of camera motion and moving object segmentation in videos acquired from moving cameras. As far as we know, none of the proposals reported in the literature is able to meet real-time requirements. In this work, we present an approach based on a color segmentation followed by a region-merging on motion through Markov Random Fields (MRFs). The technique we propose is inspired by the work of Gelgon and Bouthemy (Pattern Recognition 33 (2000) 725-40), which has been modified to reduce computational cost in order to achieve a fast segmentation (about 10 frames per second). To this end, a modified region matching algorithm (namely Partitioned Region Matching) and an innovative arc-based MRF optimization algorithm with a suitable definition of the motion reliability are proposed. Results on both synthetic and real sequences are reported to confirm the validity of our solution.

M. BERTINI; A. DEL BIMBO; A. PRATI; R. CUCCHIARA ( 2004 ) - Semantic Annotation and Transcoding for Sport Videos ( International Workshop on Image Analysis for Multimedia Interactive Services - Lisboa, Portugal - 21-23 April 2004) ( - Proceedings of WIAMIS 2004 ) (- Lisboa PRT ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Telecommunication companies are demonstrating interest in providing mobile video services. The availability of larger bandwidth, and the improvements in the resolution of the displays of third generation mobile phones, let telecom and content provider companies provide new services to their customers. Among these services, users can watch a certain number of sport videos, usually a selection of the best actions that occurred during a game. In order to provide a timely and satisfying service to customers, there is a need for tools and systems that help to detect and recognize the interesting events, and optimize the use of bandwidth, coding these events and the most interesting objects within them at the best visual quality/bandwidth ratio.

M. BERTINI; A. DEL BIMBO; R. CUCCHIARA; A. PRATI ( 2004 ) - Semantic Annotation and Transcoding of Soccer Videos ( Asian Conference on Computer Vision - Jeju, Korea - 27-30 January 2004) ( - Proceedings of ACCV 2004 ) (- Jeju KOR ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

-

Cucchiara, Rita; Grana, Costantino; Prati, Andrea ( 2004 ) - Semantic Transcoding of Videos by using Adaptive Quantization - WANGJÌ WANGLÙ JÌSHÙ XUÉKAN - n. volume 5 - pp. da 31 a 39 ISSN: 1607-9264 [Articolo in rivista (262) - Articolo su rivista]
Abstract

This paper proposes an approach to video transcoding that is driven by the video content and relies on the adaptive quantization of MPEG standards. Computer vision techniques can extract semantics from videos according to the user's interests: the video semantics is exploited to adapt the video in order to meet the device's capabilities and the user's requirements and preserve the best quality possible. Well-assessed video analysis techniques are used to segment the video into objects grouped in classes of relevance, to which the user can assign a weight proportional to their relevance. This weight is used to decide the quantization values to be applied in the MPEG-2 encoding to each macroblock. A modified version of the PSNR (Peak Signal-to-Noise Ratio) is used as performance metric and a comparative evaluation is reported with respect to other coding standards such as JPEG, JPEG 2000, (basic) MPEG-2, and MPEG-4. Experimental results are provided on different situations, one indoor and one outdoor. Keywords: video transcoding, adaptive quantization, motion detection

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2004 ) - Semantic Video Adaptation based on Automatic Annotation of Sport Videos ( ACM SIGMM International Workshop on Multimedia Information Retrieval - New York, NY, USA - 15-16 October 2004) ( - Proceedings of MIR 2004 ) (ACM New York, NY, USA USA ) - pp. da 291 a 298 ISBN: 9781581139402 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Semantic video adaptation improves traditional adaptation by taking into account the degree of relevance of the different portions of the content. It employs solutions to detect the significant parts of the video and applies different compression ratios to elements that have different importance. Performance of semantic adaptation heavily depends on the precision of the automatic annotation and the way of operation of the codec which is used to perform adaptation at the event or object level. In this paper, we discuss critical factors that affect performance of automatic annotation and define new performance measures of semantic adaptation, Viewing Quality Loss and Bitrate Cost Increase, that are obtained from classical PSNR and Bit Rate, but relate the results of semantic adaptation with the user’s preferences and expectations. The new measures are discussed in detail for a system of sport annotation and adaptation with reference to different user profiles.

R. Cucchiara; C. Grana; G. Tardini ( 2004 ) - Track-based and object-based occlusion for people tracking refinement in indoor surveillance ( 2nd International Workshop on Video Surveillance & Sensor Networks - New York - Oct 15-16) ( - Proceedings of the ACM 2nd International Workshop on Video Surveillance & Sensor Networks ) (ACM New York USA ) - pp. da 81 a 87 ISBN: 9781581139341 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

People tracking deals with problems of shape changes, self-occlusions and track occlusions due to other interfering tracks and fixed objects that hide parts of the person's shape. These problems are more critical in indoor surveillance and in particular in home automation settings, in which the need to merge information obtained from different cameras distributed around the house calls for the integration of reliable data obtained over time. Therefore, tracking algorithms should be carefully tuned to cope with occlusions and shape changes, working not only at pixel level but also at region level. In this work we provide a novel technique for object tracking, based on probabilistic masks and appearance models. Occlusions due to other tracks or due to background objects and false occlusions are discriminated. The classification of occluded regions of the track is exploited in a selective model update. The tracking system is general enough to be applied with any motion segmentation module; it can track people interacting with each other, and it maintains the pixel-to-track assignment even with large occlusions. At the same time, the model update is very reactive, so as to cope with sudden body motion and changes in the silhouette's shape. Due to its robustness, it has been used in different experiments of people behavior control in indoor situations.

R. Cucchiara; C. Grana; A. Prati; G. Tardini; R. Vezzani ( 2004 ) - Using computer vision techniques for dangerous situation detection in domotic applications ( IEE Symposium on Intelligent Distributed Surveillance Systems - Londra - Feb 23) ( - IEE Symposium on Intelligent Distributed Surveillance Systems ) (IEE Londra GBR ) - pp. da 1 a 5 ISBN: 9780863413926 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

We describe an integrated solution devised for in-house video surveillance, to monitor the safety of people living in a domestic environment. The system is composed of a robust moving object detection module able to disregard shadows, a tracking module designed to handle large occlusions, and a posture detector. Shadows, large occlusions and the deformable model of people are key features of in-house surveillance. Moreover, the requirement of high-speed reaction to dangerous situations and the need to implement a reliable and low-cost televiewing system led to the introduction of a new multimedia model of semantic transcoding, capable of supporting different users' requests and the constraints of their devices (PDA, smart phones, ...). Our application context is the emerging area of domotics (from the Latin word domus, meaning "home", and informatics) and, in particular, indoor video surveillance of the house, where people with some difficulties (elders and disabled people) can now live with a sufficient degree of autonomy, thanks to the strong interaction with the new technologies that can be distributed in the house with affordable costs and high reliability.

R. Cucchiara; C. Grana; A. Prati; R. Vezzani ( 2003 ) - A Hough transform-based method for radial lens distortion correction ( 12th International Conference on Image Analysis and Processing - Mantova, Italy - Sep 17-19) ( - Proceedings of the 12th International Conference on Image Analysis and Processing ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 182 a 187 ISBN: 9780769519487 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The paper presents an approach for a robust (semi-)automatic correction of radial lens distortion in images and videos. This method, based on the Hough transform, is also applicable to videos from unknown cameras that, consequently, cannot be calibrated a priori. We approximate the lens distortion by considering only the lower-order term of the radial distortion. Thus, the method relies on the assumption that pure radial distortion transforms straight lines into curves. The computation of the best value of the distortion parameter is performed in a multi-resolution way. The precision of the method depends on the scale of the multi-resolution and on the resolution of the Hough space. Experiments are provided for both an outdoor, uncalibrated camera and an indoor, calibrated one. The stability of the value found in different frames of the same video demonstrates the reliability of the proposed method.
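
As a rough illustration of the idea in this abstract, the sketch below estimates a single radial distortion parameter with a coarse-to-fine search. It is not the authors' implementation: the correction model `p_u = c + (p_d - c) * (1 + k * r_d**2)`, the function names, the search range and the straightness score are all assumptions, and a least-squares straightness measure stands in for the Hough-based evaluation described in the paper.

```python
import math

def undistort(points, k, center=(0.0, 0.0)):
    """Assumed one-parameter correction: p_u = c + (p_d - c) * (1 + k * r_d**2)."""
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        scale = 1.0 + k * (dx * dx + dy * dy)
        out.append((cx + dx * scale, cy + dy * scale))
    return out

def line_residual(points):
    """Scale-invariant straightness score: 0 for collinear points, larger for curved sets."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Eigenvalues of the 2x2 scatter matrix: spread along and across the best-fit line.
    half_tr = (sxx + syy) / 2.0
    root = math.sqrt(max(half_tr * half_tr - (sxx * syy - sxy * sxy), 0.0))
    lam_max, lam_min = half_tr + root, max(half_tr - root, 0.0)
    if lam_max <= 0.0:
        return float("inf")  # all points collapsed onto one location
    return math.sqrt(lam_min / lam_max)

def estimate_k(points, k_lo=-1.0, k_hi=1.0, levels=4, steps=21):
    """Coarse-to-fine (multi-resolution) search for the k that best straightens the points."""
    best = k_lo
    for _ in range(levels):
        width = (k_hi - k_lo) / (steps - 1)
        candidates = [k_lo + i * width for i in range(steps)]
        best = min(candidates, key=lambda k: line_residual(undistort(points, k)))
        k_lo, k_hi = best - width, best + width  # zoom in around the current best
    return best
```

Here `points` would be samples of an image curve that is known to be a straight line in the scene, expressed in coordinates normalized around the distortion center; with several such curves, the scores could simply be summed.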

C. Grana; G. Pellacani; R. Cucchiara; S. Seidenari ( 2003 ) - A new algorithm for border description of polarized light surface microscopic images of pigmented skin lesions - IEEE TRANSACTIONS ON MEDICAL IMAGING - n. volume 22 (8) - pp. da 959 a 964 ISSN: 0278-0062 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The aim of this study was to provide mathematical descriptors for the border of pigmented skin lesion images and to assess their efficacy in distinguishing among different lesion groups. New descriptors such as lesion slope and lesion slope regularity are introduced and mathematically defined. A new algorithm was employed, based on the Catmull-Rom spline method and on the computation of the gray-level gradient at points extracted by interpolation along the normal direction at spline points. The efficacy of these new descriptors was tested on a data set of 510 pigmented skin lesions, comprising 85 melanomas and 425 nevi, by employing statistical methods for discrimination between the two populations.
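
The spline step mentioned in this abstract can be sketched as follows. This is only a minimal illustration of a uniform Catmull-Rom segment and of the unit normal at each sampled point, along which gray levels could then be read; the function names and the sampling density are hypothetical, and the lesion slope descriptors themselves are not reproduced here.

```python
import math

def catmull_rom(p0, p1, p2, p3, t):
    """Point and tangent of the uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    point, tangent = [], []
    for a, b, c, d in zip(p0, p1, p2, p3):
        c1 = -a + c
        c2 = 2 * a - 5 * b + 4 * c - d
        c3 = -a + 3 * b - 3 * c + d
        point.append(0.5 * (2 * b + c1 * t + c2 * t * t + c3 * t ** 3))
        tangent.append(0.5 * (c1 + 2 * c2 * t + 3 * c3 * t * t))
    return tuple(point), tuple(tangent)

def sample_closed_border(control_points, samples_per_segment=4):
    """Return (point, unit_normal) pairs along a closed Catmull-Rom curve."""
    n = len(control_points)
    out = []
    for i in range(n):
        # Four consecutive control points, wrapping around the closed border.
        p0, p1, p2, p3 = (control_points[(i + j - 1) % n] for j in range(4))
        for s in range(samples_per_segment):
            (x, y), (tx, ty) = catmull_rom(p0, p1, p2, p3, s / samples_per_segment)
            norm = math.hypot(tx, ty)
            if norm == 0.0:
                continue  # degenerate tangent; skip this sample
            # Rotate the tangent by -90 degrees: outward normal for a counterclockwise border.
            out.append(((x, y), (ty / norm, -tx / norm)))
    return out
```

For a counterclockwise border given as a list of `(x, y)` control points, each returned normal points away from the lesion interior, so sampling the image at `point + d * normal` for a few offsets `d` would give a gray-level profile across the border.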

R. Cucchiara; C. Grana; A. Prati; F. Vigetti; M. Piccardi ( 2003 ) - Camera-car Video Analysis for Steering Wheel's Tracking ( 1st International Workshop on In-Vehicle Cognitive Computer Vision Systems - Graz, Austria - Apr 3) ( - Proceedings of 1st International Workshop on In-Vehicle Cognitive Computer Vision Systems ) (- - ITA ) - pp. da 36 a 43 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Monitoring and controlling the driver’s guidance by analyzing the rotation applied to the steering wheel can be a very important task for improving safety. This paper proposes a general-purpose method to track the steering wheel’s absolute angle by using a single-camera vision system mounted inside the car. The absolute angle is computed by accumulating inter-frame relative rotations, and error propagation is prevented with an alignment process. The approach is based on modeling the motion of the steering wheel as it appears perspectively distorted from the point of view of the uncalibrated camera. We modified the Lucas-Kanade method for an approximately rotational motion model in order to provide the detection and tracking of significant features on the wheel. The experimental results are compared with ground-truth data obtained with different types of sensors.

R. Cucchiara; C. Grana; A. Prati; R. Vezzani ( 2003 ) - Computer Vision Techniques for PDA Accessibility of In-House Video Surveillance ( First ACM SIGMM international workshop on Video surveillance - Berkeley, California - Nov 2-8) ( - First ACM SIGMM international workshop on Video surveillance ) (ACM New York USA ) - pp. da 87 a 97 ISBN: 158113780X [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we propose an approach to indoor environment surveillance and, in particular, to people behaviour control in a home automation context. The reference application is a silent and automatic control of the behaviour of people living alone in the house, specially conceived for people with limited autonomy (e.g., elders or disabled people). The aim is to detect dangerous events (such as a person falling down) and to react to these events by establishing a remote connection with low-performance clients, such as a PDA (Personal Digital Assistant). To this end, we propose an integrated server architecture, typically connected over an intranet to network cameras, able to segment and track objects of interest; in the case of objects classified as people, the system must also evaluate the person's posture and infer possible dangerous situations. Finally, the system is equipped with a specifically designed transcoding server to adapt the video content to PDA requirements (display area and bandwidth) and to the user's requests. The main components of the proposal are a reliable real-time object detection and tracking module, a simple but effective posture classifier improved by a supervised learning phase, and a high-performance transcoder inspired by the MPEG-4 object-level standard and tailored to PDAs. Results on different video sequences and performance analysis are discussed.

R. Cucchiara; C. Grana; M. Piccardi; A. Prati ( 2003 ) - Detecting moving objects, ghosts, and shadows in video streams - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - n. volume 25 (10) - pp. da 1337 a 1342 ISSN: 0162-8828 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Background subtraction methods are widely exploited for moving object detection in videos in many applications, such as traffic monitoring, human motion capture, and video surveillance. How to correctly and efficiently model and update the background model and how to deal with shadows are two of the most distinguishing and challenging aspects of such approaches. This work proposes a general-purpose method that combines statistical assumptions with the object-level knowledge of moving objects, apparent objects (ghosts), and shadows acquired in the processing of the previous frames. Pixels belonging to moving objects, ghosts, and shadows are processed differently in order to supply an object-based selective update. The proposed approach exploits color information for both background subtraction and shadow detection to improve object segmentation and background update. The approach proves fast, flexible, and precise in terms of both pixel accuracy and reactivity to background changes.
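
A much simplified sketch of the two ingredients named in this abstract, color-based shadow detection and a selective background update, is given below. The abstract does not state the tests or thresholds, so everything here is an assumption: the HSV rule (a shadow darkens the background within a ratio range while leaving hue and saturation roughly unchanged), the threshold values, the blending rate and the function names are illustrative, and ghost handling is omitted entirely.

```python
import colorsys

def classify_pixel(pixel, background, diff_thresh=0.1,
                   alpha=0.4, beta=0.9, tau_s=0.1, tau_h=0.1):
    """Label one RGB pixel (floats in [0, 1]) as 'background', 'shadow' or 'foreground'."""
    if max(abs(p - b) for p, b in zip(pixel, background)) < diff_thresh:
        return "background"
    h_i, s_i, v_i = colorsys.rgb_to_hsv(*pixel)
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*background)
    dh = abs(h_i - h_b)
    dh = min(dh, 1.0 - dh)  # hue is circular in [0, 1)
    darker = v_b > 0 and alpha <= v_i / v_b <= beta
    if darker and (s_i - s_b) <= tau_s and dh <= tau_h:
        return "shadow"
    return "foreground"

def selective_update(background, frame, labels, rate=0.05):
    """Blend the frame into the background model only where the pixel was labeled background."""
    updated = []
    for bg, px, label in zip(background, frame, labels):
        if label == "background":
            updated.append(tuple((1 - rate) * b + rate * p for b, p in zip(bg, px)))
        else:
            updated.append(bg)  # keep the old model under moving objects and their shadows
    return updated
```

Images are represented here as flat lists of RGB tuples to keep the sketch dependency-free; labeling every pixel with `classify_pixel` and then calling `selective_update` gives one iteration of the loop.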

A. PRATI; I. MIKIC; MM TRIVEDI; R. CUCCHIARA ( 2003 ) - Detecting moving shadows: Algorithms and evaluation - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - n. volume 25 (7) - pp. da 918 a 923 ISSN: 0162-8828 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Moving shadows need careful consideration in the development of robust dynamic scene analysis systems. Moving shadow detection is critical for accurate object detection in video streams since shadow points are often misclassified as object points, causing errors in segmentation and tracking. Many algorithms have been proposed in the literature that deal with shadows. However, a comparative evaluation of the existing approaches is still lacking. In this paper, we present a comprehensive survey of moving shadow detection approaches. We organize contributions reported in the literature into four classes: two of them are statistical and two are deterministic. We also present a comparative empirical evaluation of representative algorithms selected from these four classes. Novel quantitative (detection and discrimination rate) and qualitative metrics (scene and object independence, flexibility to shadow situations, and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences. These video sequences and associated ground-truth data are made available at http://cvrr.ucsd.edu/aton/shadow to allow others in the community to experiment with new algorithms and metrics.
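
The two quantitative metrics are only named in this abstract, so the sketch below rests on an assumed formulation that should be checked against the paper: the detection rate is taken as the fraction of ground-truth shadow pixels labeled as shadow, and the discrimination rate as the fraction of ground-truth foreground pixels that were not mistaken for shadow. The label strings and the function name are hypothetical.

```python
def shadow_rates(ground_truth, predicted):
    """Return (detection_rate, discrimination_rate) from two per-pixel label sequences.

    Labels are the strings 'background', 'shadow' and 'foreground'.
    """
    gt_shadow = gt_foreground = tp_shadow = fg_as_shadow = 0
    for truth, guess in zip(ground_truth, predicted):
        if truth == "shadow":
            gt_shadow += 1
            if guess == "shadow":
                tp_shadow += 1
        elif truth == "foreground":
            gt_foreground += 1
            if guess == "shadow":
                fg_as_shadow += 1
    nan = float("nan")
    detection = tp_shadow / gt_shadow if gt_shadow else nan
    discrimination = (gt_foreground - fg_as_shadow) / gt_foreground if gt_foreground else nan
    return detection, discrimination
```

A detector that labels everything as shadow would score a perfect detection rate but a zero discrimination rate under this formulation, which is why the two numbers are reported together.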

R. Cucchiara; A. Prati; R. Vezzani ( 2003 ) - Domotics for disability: smart surveillance and smart video server ( 8th Conference of the Italian Association of Artificial Intelligence - Pisa - 23-26 September) ( - Proceedings of the Workshop on Ambient Intelligence ) (- - ) - n. volume 1 - pp. da 46 a 57 ISBN: 9783540201199 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we address the problem of human posture classification, focusing in particular on an indoor surveillance application. The approach was initially inspired by a previous work of Haritaoglu et al. [6] that uses histogram projections to classify people’s posture. Projection histograms are here exploited as the main feature for the posture classification, but, differently from [6], we propose a supervised statistical learning phase to create probability maps adopted as posture templates. Moreover, camera calibration and homography are included to resolve perspective problems and improve the precision of classification. Furthermore, we make use of a finite state machine to detect dangerous situations such as falls and to activate a suitable alarm generator. The system works online on a standard workstation with network cameras.
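
To make the main feature concrete, the sketch below computes the horizontal and vertical projection histograms of a binary silhouette and compares them with stored templates. The nearest-template rule with an L1 distance is a deliberately simple stand-in for the probability maps learned in the paper, and the function names and template format are hypothetical.

```python
def projection_histograms(mask):
    """Row and column sums of a binary silhouette given as a list of equal-length rows."""
    rows = [sum(row) for row in mask]
    cols = [sum(col) for col in zip(*mask)]
    return rows, cols

def normalize(hist):
    """Scale a histogram to sum to 1 so that silhouettes of different area are comparable."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0 for _ in hist]

def classify_posture(mask, templates):
    """Return the name of the template whose normalized histograms are closest in L1 distance.

    templates maps a posture name to a (row_hist, col_hist) pair of normalized histograms
    with the same lengths as those of the mask.
    """
    rows, cols = (normalize(h) for h in projection_histograms(mask))
    def distance(template):
        t_rows, t_cols = template
        return (sum(abs(a - b) for a, b in zip(rows, t_rows)) +
                sum(abs(a - b) for a, b in zip(cols, t_cols)))
    return min(templates, key=lambda name: distance(templates[name]))
```

Templates can be built by normalizing the histograms of labeled example silhouettes, after rescaling every silhouette to a common bounding-box size.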

C. Grana; G. Pellacani; S. Seidenari; R. Cucchiara ( 2003 ) - Image Representation and Retrieval with Topological Trees ( Image: E-Learning, Understanding, Information Retrieval and Medical - Cagliari, Italy - Jun 9-10) ( - Image: E-Learning, Understanding, Information Retrieval and Medical Proceedings of the First International Workshop ) (World Scientific Publishing Co. Pte. Ltd. Singapore SGP ) - pp. da 112 a 122 ISBN: 9789812385871 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Typical processes of image representation comprise an initial region segmentation followed by a description of the features of single regions and of their relationships. A graph model can then be exploited in order to integrate the knowledge of the specific regions (the nodes of the attributed relational graph, ARG) and the regions’ relations (the ARG’s edges). In this work we use color features to guide region segmentation, geometric features to characterize regions one by one and topological features (in particular inclusion) to describe regions’ relationships. Guided by the inclusion property, we define the Topological Tree (TT) as an image representation model that, exploiting the transitive property of inclusion, uses the adjacency and inclusion topological features. We propose an approach based on a recursive version of fuzzy c-means to construct the topological tree directly from the initial image, performing both segmentation and TT construction. The TT can be exploited in many applications of image analysis and image retrieval by similarity in those contexts where inclusion is a key feature: we propose an application to the analysis of dermatological images to support melanoma diagnosis. In this paper we describe the details of the TT algorithm, including the management of non-ideal cases and an approximate measure of tree similarity used to retrieve skin lesions with a similar TT-based description.

R. Cucchiara; A. Prati; M. Piccardi ( 2003 ) - Improving data prefetching efficacy in multimedia applications - MULTIMEDIA TOOLS AND APPLICATIONS - n. volume 20 (3) - pp. da 159 a 178 ISSN: 1380-7501 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely-explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches are unable to exploit the potential improvement in performance, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including MPEG-2 decoding and encoding, convolution, thresholding, and edge chain coding.

M. BERTINI; R. CUCCHIARA; A. DEL BIMBO; A. PRATI ( 2003 ) - Object and Event Detection for Semantic Annotation and Transcoding ( IEEE International Conference on Multimedia & Expo - Baltimore, MD, USA - 6-9 July 2003) ( - Proceedings of ICME 2003 ) (IEEE Piscataway, NJ, USA USA ) - n. volume II - pp. da 421 a 424 ISBN: 0 7803 7965 9 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Video annotation provides a suitable way to describe, organize, and index stored videos. On the other hand, transcoding aims at adapting content to the capabilities and requirements of the client in use. Both are now mandatory, given the tremendous demand for multimedia access from remote clients, in particular now that new terminals with limited resources (PDAs, HCCs, Smart phones) have access to the network. In this paper we propose a unified framework to define event-based and object-based semantic extraction from video, in order to provide both semantic video annotation for stored video and semantic on-line transcoding from live cameras. Two case studies (highlights’ extraction from soccer videos for the annotation, and people behavior detection in a domotic application for transcoding) and the corresponding experimental results are reported.

R. Cucchiara; A. Prati; R. Vezzani ( 2003 ) - Object Segmentation in Videos from Moving Camera with MRFs on Color and Motion Features ( IEEE Conference on Computer Vision and Pattern Recognition - Madison, Wisconsin, USA - 16-22 June) ( - Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2003) ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 1 - pp. da 405 a 410 ISBN: 9780769519005 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we address the problem of fast segmentation of moving objects in video acquired by a moving camera or, more generally, with a moving background. We present an approach based on a color segmentation followed by a region-merging on motion through Markov Random Fields (MRFs). The technique we propose is inspired by the work of Gelgon and Bouthemy [6], which has been modified to reduce computational cost in order to achieve a fast segmentation (about ten frames per second). To this end, a modified region matching algorithm (namely Partitioned Region Matching) and an innovative arc-based MRF optimization algorithm with a suitable definition of the motion reliability are proposed. Results on both synthetic and real sequences are reported to confirm the validity of our solution.

R. Cucchiara; M. Trivedi; A. Prati ( 2003 ) - Proceedings of 1st Workshop on “In-Vehicle (Cognitive) Computer Vision Systems” (Dipartimento di Scienze dell'Informazione, Università di Modena Modena ITA ) [Curatela (284) - Curatela]
Abstract

-

R. Cucchiara; C. Grana; A. Prati ( 2003 ) - Semantic video transcoding using classes of relevance - INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS - n. volume 3 (1) - pp. da 145 a 169 ISSN: 0219-4678 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this work we present a framework for on-the-fly video transcoding that exploits computer vision-based techniques to adapt Web access to the user requirements. The proposed transcoding approach aims at coping with both the user's bandwidth and resource capabilities and the user's interests in the video's content. We propose an object-based semantic transcoding that, according to user-defined classes of relevance, applies different transcoding techniques to the objects segmented in a scene. Object extraction is provided by on-the-fly video processing, without manual annotation. Multiple transcoding policies are reviewed, and a performance evaluation metric is defined, based on the Weighted Mean Square Error (and corresponding PSNR), that takes into account the perceptual user requirements by means of classes of relevance. Results are analyzed by varying transcoding techniques, bandwidth requirements and video types (with indoor and outdoor scenes), showing that the use of semantics can dramatically improve the bandwidth to distortion ratio.
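As an illustration of the kind of metric mentioned in this abstract, the following is a minimal sketch of a class-weighted mean square error with its corresponding PSNR. The exact definition used in the paper is not given here, so the formula (per-class MSE combined with normalized class weights), the function names, and the example class labels and weights are all assumptions made for illustration only.

```python
import math


def weighted_mse(original, transcoded, class_map, weights):
    """Class-weighted MSE (illustrative form, not necessarily the paper's).

    original, transcoded: flat sequences of pixel intensities.
    class_map: per-pixel class label (a "class of relevance").
    weights: hypothetical mapping from class label to relevance weight.
    """
    sq_err = {}
    counts = {}
    for o, t, c in zip(original, transcoded, class_map):
        sq_err[c] = sq_err.get(c, 0.0) + (o - t) ** 2
        counts[c] = counts.get(c, 0) + 1
    # Normalize by the total weight of the classes actually present.
    total_w = sum(weights[c] for c in counts)
    return sum(weights[c] * sq_err[c] / counts[c] for c in counts) / total_w


def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given (weighted) MSE."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


# Example: errors on the highly relevant class count far more than
# errors on the background.
orig = [100, 100, 100, 100]
trans = [100, 102, 110, 110]
classes = ["people", "people", "background", "background"]
w = {"people": 0.9, "background": 0.1}
wmse = weighted_mse(orig, trans, classes, w)
quality_db = psnr(wmse)
```

With such a measure, a policy that degrades only low-relevance regions scores better than a plain MSE would suggest, which is the intuition behind comparing policies by class of relevance.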

R. CUCCHIARA; A. PRATI; F. VIGETTI ( 2003 ) - Steering wheel's angle tracking from camera-car ( IEEE Intelligent Vehicle Symposium - Columbus, OH, USA - 9-11 June 2003) ( - Proceedings of IV 2003 ) (IEEE Piscataway, NJ, USA USA ) - pp. da 406 a 409 ISBN: 0 7803 7848 2 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper proposes a general-purpose method to track the steering wheel’s absolute angle by using a single-camera vision system mounted inside the car. The approach is based on modeling the motion of the steering wheel as it appears, perspectively distorted, from the point of view of the uncalibrated camera. We modified the Lucas-Kanade method for an approximately rotational motion model in order to provide the detection and tracking of significant features on the wheel. The experimental results are compared with ground-truth data obtained with different types of sensors.

L. CINQUE; R. CUCCHIARA; S. LEVIALDI; G. PIGNALBERI ( 2002 ) - A Decision Support System for Range Image Segmentation ( 3rd International Conference on Digital Information Processing and Control in Ex - - - 28-30 May 2002) ( - Proceedings of 3rd International Conference on Digital Information Processing and Control in Ex ) (- - USA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

-

R. Cucchiara; C. Grana; A. Prati ( 2002 ) - A Framework for Semantic Video Transcoding ( Ottavo Convegno della Associazione Italiana per l'Intelligenza Artificiale - Siena, Italy - Sep 10-13) ( - Atti dell'Ottavo Convegno Associazione Italiana per l'Intelligenza Artificiale ) (Associazione Italiana per l'Intelligenza Artificiale - ITA ) - pp. da 637 a 644 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work we present a transcoding framework and an object-based technique to adapt live and stored videos to the user's bandwidth and resource capabilities. Multiple transcoding policies are reviewed, and a performance evaluation metric based on the Weighted Mean Square Error that allows different classes of relevance is presented. We present results for different transcoding policies and for different bandwidth requirements, showing that the use of semantics can improve the bandwidth to distortion ratio.

R. Cucchiara; C. Grana; A. Prati; S. Seidenari; G. Pellacani ( 2002 ) - Building the Topological Tree by Recursive FCM Color Clustering ( 16th International Conference on Pattern Recognition - Quebec City, Canada - Aug 11-15) ( - Proceedings of the 16th International Conference on Pattern Recognition ) (IEEE Computer Society Los Alamitos, CA USA ) - n. volume 1 - pp. da 759 a 762 ISBN: 9780769516967 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we define a Topological Tree (TT) as a knowledge representation method that aims to describe important visual and spatial features of image regions, namely the color similarity, the inclusion and the spatial adjacency. The topological tree exhibits some interesting properties that can be exploited to extract knowledge from images for information retrieval, image understanding and diagnosis purposes. Examples of applications in dermatology are described. The TT can be constructed after segmentation, by computing the spatial relationships of regions or can be generated directly during the segmentation: to this aim we present a novel recursive fuzzy c-means (FCM) clustering algorithm based on the Principal Component Analysis of the color space. The recursive FCM proves to be effective for underlining the adjacency and inclusion property of regions.

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2002 ) - Data-type Dependent Cache Prefetching for MPEG Applications ( IEEE International Performance, Computing, and Communications Conference - Phoenix, Arizona, USA - 3-5 April 2002) ( - Proceedings of IPCCC 2002 ) (IEEE Computer Society, IEEE Communications Society, IEEE Computer Society Technical Committee on the Internet Los Alamitos, California USA ) - pp. da 115 a 122 ISBN: 0 7803 7371 5 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Data cache prefetching is an effective technique to improve the performance of cache memories, whenever the prefetching algorithm is able to correctly predict useful data to be prefetched. To this aim, adequate information on the program’s data locality must be used by the prefetching algorithm. In particular, multimedia applications are characterized by a substantial amount of image and video processing, which exhibits spatial locality in both dimensions of the 2D data structures used for images and frames. However, in multimedia programs many memory references are also made to non-image data, characterized by standard spatial locality. In this work, we explore the adoption of different prefetching techniques depending on the data type (i.e., image and non-image), thus making it possible to tune the prefetching algorithms to the different forms of locality and achieve overall performance optimization. In order to prevent interference between the two data types, a split cache with two separate caches for image and non-image data is also evaluated as an alternative to a standard unified cache. Results on a multimedia workload (MPEG-2 and MPEG-4 decoders) show that standard prefetching techniques such as One-block-lookahead and the Stride Prediction Table are effective for standard data, while novel 2D prefetching techniques perform best on image data. In addition, at equal size, unified caches generally offer better performance than split caches, thanks to the more flexible allocation of the unified cache space.
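To make the One-block-lookahead idea mentioned in this abstract concrete, here is a toy sketch of a direct-mapped cache with a prefetch-on-miss variant of OBL: whenever block b misses, block b+1 is loaded as well. The cache geometry, the function name, and the access trace are hypothetical choices for illustration; this is not the simulator or the configuration used in the paper.

```python
def count_misses(addresses, num_lines=64, block_size=32, prefetch=False):
    """Count demand misses in a toy direct-mapped cache.

    If prefetch is True, a miss on block b also loads block b + 1
    (one-block-lookahead, prefetch-on-miss variant).
    """
    lines = [None] * num_lines  # block number currently held by each line

    def present(block):
        return lines[block % num_lines] == block

    def load(block):
        lines[block % num_lines] = block

    misses = 0
    for addr in addresses:
        block = addr // block_size
        if not present(block):
            misses += 1
            load(block)
            if prefetch:
                load(block + 1)  # speculative load of the next block
    return misses


# A purely sequential byte scan touches 32 blocks of 32 bytes each.
trace = range(1024)
baseline = count_misses(trace)
with_obl = count_misses(trace, prefetch=True)
```

On this sequential trace every second block is already present when it is first referenced, so the demand misses are halved. Image data accessed along both rows and columns would not benefit in the same way, which is the motivation the abstract gives for 2D prefetching.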

R. Cucchiara; C. Grana; A. Prati ( 2002 ) - Detecting Moving Objects and their Shadows: An Evaluation with the PETS2002 Dataset ( Third IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS’2002) - Copenhagen, Denmark - Jun 1) ( - Proceedings of the Third IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS’2002) ) (James M. Ferryman Reading, UK GBR ) - pp. da 18 a 25 ISBN: 076951698X [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This work presents a general-purpose method for moving visual object segmentation in videos and discusses results attained on sequences of the PETS2002 datasets. The proposed approach, called Sakbot, exploits color and motion information to detect objects, shadows and ghosts, i.e. foreground objects with apparent motion. The method is based on background suppression in the color space. The main peculiarity of the approach is the exploitation of motion and shadow information to selectively update the background, improving the statistical background model with the knowledge of detected objects. The approach is able to detect Moving Visual Objects (MVOs), and stopped objects too, since the motion status is maintained at the level of the tracking module. The HSV color space is exploited for shadow detection in order to enhance both segmentation and background update. Time measures and a precision performance analysis in tracking and counting people are provided for surveillance and monitoring purposes.

S. Seidenari; G. Pellacani; C. Grana; R. Cucchiara ( 2002 ) - Development of a new program for image analysis of digital videomicroscopic images of pigmented skin lesions ( 10th Congress of the European Academy of Dermatology (EADV) - Prague, Czech Republic - Oct 2-6) ( - - ) (Elsevier Amsterdam NLD ) - JOURNAL OF THE EUROPEAN ACADEMY OF DERMATOLOGY AND VENEREOLOGY - n. volume 16 suppl. 1 - pp. da 188 a 188 ISSN: 0926-9959 [Abstract in rivista (266) - Abstract in Rivista]
Abstract

Although an improvement in the diagnostic accuracy of pigmented skin lesions (PSL) has been achieved by the epiluminescence technique (ELM), the interpretation of ELM criteria is often confusing, especially for inexperienced observers. To enhance the reproducibility and accuracy of clinical judgement and the training of inexperienced operators, programs for PSL image analysis and algorithms for automatic diagnosis have been developed. The aim of our study was to develop a new program for PSL image analysis, able to describe different aspects of PSLs, and to test its descriptive capability on PSLs acquired by means of a digital videomicroscope (VMS 110A, Scalar Mitsubishi, Japan) using 20-fold magnification. After automatic border identification and barycentre determination, some geometric parameters describing shape characteristics of the lesion were calculated. A mathematical description of the border cut-off was obtained. The texture of the lesion was calculated by applying the co-occurrence matrix at different image resolutions. Dark areas and colour areas, referring to selected colour groups, were obtained, and their aspect and distribution were mathematically defined and calculated. 281 common nevi and 117 melanomas were numerically described by our program, and the capability of the mathematical parameters to distinguish between benign and malignant lesions was tested by means of discriminant analysis. Significant differences were observed for most parameters between different PSL populations. The automatic classification enabled the distinction between melanomas and nevi with 100% sensitivity and 82.9% specificity.

R. Cucchiara; C. Grana; S. Seidenari; G. Pellacani ( 2002 ) - Exploiting color and topological features for region segmentation with recursive fuzzy c-means - MACHINE GRAPHICS & VISION - n. volume 11 (2/3) - pp. da 169 a 182 ISSN: 1230-0535 [Articolo in rivista (262) - Articolo su rivista]
Abstract

In this paper we define a novel approach for image segmentation into regions which focuses on both visual and topological cues, namely color similarity, inclusion and spatial adjacency. Many color clustering algorithms have been proposed in the past for skin lesion images, but none explicitly exploits the inclusion properties between regions. Our algorithm is based on a recursive version of the fuzzy c-means (FCM) clustering algorithm in the 2D color histogram constructed by Principal Component Analysis (PCA) of the color space. The distinctive feature of the proposal is that recursion is guided by the evaluation of adjacency and mutual inclusion properties of extracted regions; then, the recursive analysis addresses only included regions or regions with a non-negligible size. This approach allows a coarse-to-fine segmentation which focuses the attention on the inner parts of the images, in order to highlight the internal structure of the object depicted in the image. This could be particularly useful in many applications, especially in biomedical image analysis. In this work we apply the technique to the segmentation of skin lesions in dermatoscopic images. It could be a suitable support for the diagnosis of skin melanoma, since dermatologists are interested in the analysis of the spatial relations, the symmetrical positions and the inclusion of regions.

R. Cucchiara; C. Grana; M. Piccardi ( 2002 ) - Iterative fuzzy clustering for detecting regions of interest in skin lesions - AIIA NOTIZIE - n. volume 15 - pp. da 36 a 39 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Image analysis tools are spreading in dermatology since the introduction of dermoscopy (epiluminescence microscopy), in the effort of algorithmically reproducing clinical evaluations. Color-based region segmentation of skin lesions is one of the key steps for correctly collecting statistics that can help clinicians in their diagnosis. Nevertheless, an efficient and accurate region segmentation algorithm has not been proposed in the literature yet. This work proposes an iterative fuzzy c-means clustering algorithm based on PCA with the Karhunen-Loève transform of the color space. A topological tree is provided to store the mutual inclusions of the regions and then used to summarize the structural properties of the skin lesion. Preliminary experimental results are presented and discussed.

F. CAVALLI; R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2002 ) - Performance analysis of MPEG-4 decoder and encoder ( IEEE Region 8 International Symposium on Video/Image Processing and Multimedia Communications - Zadar, Croatia - 16-19 June 2002) ( - Proceedings VIPromCom-2002 ) (Croatian Society Electronics in Marine - Elmar Zadar HRV ) - pp. da 227 a 231 ISBN: 9789537044015 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper, a performance analysis of MPEG-4 encoder and decoder programs on a standard personal computer is presented. The paper first describes the MPEG-4 computational load and discusses related works, then outlines the performance analysis. Experimental results show that, while the decoder program can easily be executed in real time, the encoder requires execution times on the order of seconds per frame, which calls for substantial optimisation to satisfy real-time constraints.

R. Cucchiara; C. Grana; A. Prati ( 2002 ) - Semantic Transcoding for Live Video Server ( Tenth ACM international conference on Multimedia - Juan-les-Pins, France - Dec 1-6) ( - Proceedings of the tenth ACM international conference on Multimedia ) (ACM New York USA ) - pp. da 223 a 226 ISBN: 9781581136203 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this paper we present transcoding techniques for a video server architecture that enables the user to access live video streams by using different devices with different capabilities. For live videos, annotation methods cannot be exploited. Instead, we propose methods of on-the-fly transcoding that adapt the video content with respect to the user resources and the video semantics. Thus we propose an object-based transcoding with "classes of relevance" (for instance People, Face and Background). To compare the different strategies we propose a metric based on the Weighted Mean Square Error that allows the analysis of different application scenarios by means of a class-wise distortion measure. The obtained results show that the use of semantics can significantly improve the bandwidth to distortion ratio.

R. Cucchiara; C. Grana ( 2002 ) - Using the Topological Tree for skin lesion structure description ( Sixth International Conference on Knowledge-Based Intelligent Information & Engineering Systems - Podere d'Ombriano, Crema, Italy - Sep 16-18) ( - Knowledge-based Intelligent Information Engineering Systems & Allied Technologies ) (IOS Press/Ohmsha Amsterdam NLD ) - pp. da 166 a 170 ISBN: 9781586032807 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In this work we describe the Topological Tree (TT) as a knowledge representation method that relates some important visual and spatial features of image regions, namely the color similarity, the inclusion and the spatial adjacency. Starting from color-based segmentation of an image into disjoint regions, their spatial relationships can be derived and described with graph-based methods. We are interested in the region’s property “to be included into” (in the sense of “surrounded by”) another region. This property could be very useful in biomedical imaging and in particular in the diagnosis of skin melanoma. The TT can be constructed after segmentation, by computing the spatial relationships of regions, or can be generated directly during the segmentation: to this aim we present a novel recursive fuzzy c-means (FCM) clustering algorithm based on the PCA of the color space. In the paper, in addition to the TT definition and the construction algorithm description, some results are presented and discussed.

A. PRATI; R. CUCCHIARA; I. MIKIC; MM TRIVEDI ( 2001 ) - Analysis and detection of shadows in video streams: a comparative evaluation ( IEEE-CS Computer Vision and Pattern Recognition conference - Kauai, Hawaii, USA - 8-14 December 2001) ( - Proceedings of IEEE-CS Computer Vision and Pattern Recognition conference ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 2 - pp. da 571 a 576 ISBN: 9780769512723 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Robustness to changes in illumination conditions as well as viewing perspectives is an important requirement for many computer vision applications. One of the key factors in enhancing the robustness of dynamic scene analysis is an accurate and reliable means of shadow detection. Shadow detection is critical for correct object detection in image sequences. Many algorithms that deal with shadows have been proposed in the literature. However, a comparative evaluation of the existing approaches is still lacking. In this paper, the full range of problems underlying shadow detection is identified and discussed. We classify the proposed solutions to this problem using a taxonomy of four main classes: deterministic model-based, deterministic non-model-based, statistical parametric, and statistical nonparametric. Novel quantitative (detection and discrimination accuracy) and qualitative metrics (scene and object independence, flexibility to shadow situations and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences.

A. PRATI; I. MIKIC; R. CUCCHIARA; M.M. TRIVEDI ( 2001 ) - Comparative Evaluation of Moving Shadow Detection Algorithms ( 3rd Workshop on Empirical Evaluation in Computer Vision - Kauai, Hawaii, USA - 14 December 2001) ( - Proceedings of 3rd Workshop on Empirical Evaluation in Computer Vision ) (IEEE Computer Society Los Alamitos, California USA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Moving shadows need careful consideration in the development of robust dynamic scene analysis systems. Moving shadow detection is critical for accurate object detection in video streams, since shadow points are often misclassified as object points, causing errors in segmentation and tracking. Many algorithms that deal with shadows have been proposed in the literature. However, a comparative evaluation of the existing approaches is still lacking. In this paper, the full range of problems underlying shadow detection is identified and discussed. We present a comprehensive survey of moving shadow detection approaches. We organize contributions reported in the literature in four classes. We also present a comparative empirical evaluation of representative algorithms selected from these four classes. Quantitative (detection and discrimination accuracy) and qualitative metrics (scene and object independence, flexibility to shadow situations and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences. These video sequences and associated “ground-truth” data are made available at http://cvrr.ucsd.edu:88/aton/shadow to allow others in the community to experiment with new algorithms and metrics.

R. Cucchiara; C. Grana; M. Piccardi; A. Prati ( 2001 ) - Detecting objects, shadows and ghosts in video streams by exploiting color and motion information ( 11th International Conference on Image Analysis and Processing (ICIAP 2001) - Palermo, Italy - Sep 26-28) ( - Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP 2001) ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 360 a 365 ISBN: 9780769511832 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Many approaches to moving object detection for traffic monitoring and video surveillance proposed in the literature are based on background suppression methods. How to correctly and efficiently update the background model and how to deal with shadows are two of the most distinguishing and challenging features of such approaches. This work presents a general-purpose method for segmentation of moving visual objects (MVOs) based on an object-level classification into MVOs, ghosts and shadows. Background suppression needs a background model to be estimated and updated: we use motion and shadow information to selectively exclude MVOs and their shadows from the background model, while retaining ghosts. The color information (in the HSV color space) is exploited for shadow suppression and, consequently, to enhance both MVO segmentation and background update.

R. Cucchiara; C. Grana; M. Piccardi; A. Prati; S. Sirotti ( 2001 ) - Improving shadow suppression in moving object detection with HSV color information ( IEEE Conference on Intelligent Transportation Systems - Oakland, CA - Aug 25-29) ( - IEEE Conference on Intelligent Transportation Systems ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 334 a 339 ISBN: 9780780371941 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Video-surveillance and traffic analysis systems can be heavily improved using vision-based techniques able to extract, manage and track objects in the scene. However, problems arise due to shadows. In particular, moving shadows can affect the correct localization, measurement and detection of moving objects. This work presents a technique for shadow detection and suppression used in a system for moving visual object detection and tracking. The major novelty of the shadow detection technique is the analysis carried out in the HSV color space to improve the accuracy in detecting shadows. The signal processing and optical motivations of the proposed approach are described. The integration and exploitation of the shadow detection module in the system are outlined, and experimental results are shown and evaluated.
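The abstract does not state the detection rule itself. As a hedged illustration of how an HSV-based shadow test can work, the sketch below marks a pixel as shadow when it is darker than the background by a bounded factor while its hue and saturation stay close to the background's. The thresholds (alpha, beta, tau_s, tau_h), the function name, and the use of an absolute saturation difference are assumptions for this example, not values or details taken from the paper.

```python
import colorsys


def is_shadow(pixel_rgb, background_rgb,
              alpha=0.4, beta=0.9, tau_s=0.1, tau_h=0.1):
    """Per-pixel HSV shadow test (simplified, hypothetical thresholds).

    pixel_rgb, background_rgb: (R, G, B) tuples with components in 0..255.
    """
    h_i, s_i, v_i = colorsys.rgb_to_hsv(*[c / 255.0 for c in pixel_rgb])
    h_b, s_b, v_b = colorsys.rgb_to_hsv(*[c / 255.0 for c in background_rgb])
    if v_b == 0:
        return False  # a black background cannot get darker
    ratio = v_i / v_b  # shadows lower the brightness by a bounded factor
    dh = abs(h_i - h_b)
    dh = min(dh, 1.0 - dh)  # hue is circular in [0, 1)
    return (alpha <= ratio <= beta
            and abs(s_i - s_b) <= tau_s
            and dh <= tau_h)


background = (200, 100, 50)
shadowed = (120, 60, 30)    # same color, 60% of the brightness
foreground = (30, 60, 200)  # a differently colored object
```

The lower bound alpha keeps very dark objects from being mistaken for shadows, and the upper bound beta keeps small brightness fluctuations due to noise from being flagged.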

R. Cucchiara; C. Grana; M. Piccardi ( 2001 ) - Iterative fuzzy clustering for detecting regions of interest in skin lesions ( Workshop su "Intelligenza Artificiale, Visione e Pattern Recognition" - Bari - Sep 24) ( - Atti del Workshop su "Intelligenza Artificiale, Visione e Pattern Recognition" ) (- - ITA ) - pp. da 31 a 38 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Image analysis tools are spreading in dermatology since the introduction of dermoscopy (epiluminescence microscopy), in the effort of algorithmically reproducing clinical evaluations. Color-based region segmentation of skin lesions is one of the key steps for correctly collecting statistics that can help clinicians in their diagnosis. Nevertheless, an efficient and accurate region segmentation algorithm has not been proposed in the literature yet. This work proposes an iterative fuzzy c-means clustering algorithm based on PCA with the Karhunen-Loève transform of the color space. A topological tree is provided to store the mutual inclusions of the regions and then used to summarize the structural properties of the skin lesion. Preliminary experimental results are presented and discussed.

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2001 ) - Temporal analysis of cache prefetching strategies for multimedia applications ( IEEE International Performance, Computing, and Communications Conference - Phoenix, Arizona, USA - 4-6 April 2001) ( - Proceedings of IEEE International Performance, Computing, and Communications Conference ) (IEEE Computer Society, IEEE Communications Society, IEEE Computer Society Technical Committee on the Internet Los Alamitos, California USA ) - pp. da 311 a 318 ISBN: 0 7803 7001 5 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Prefetching is a widely adopted technique for improving the performance of cache memories. Performance is typically affected by design parameters, such as cache size and associativity, but also by the type of locality embodied in the programs. In particular, multimedia tools and programs handling images and video are characterized by a bi-dimensional spatial locality that could be greatly exploited by the inclusion of prefetching in the cache architecture. In this paper we compare some prefetching techniques for multimedia programs (such as MPEG compression, image processing, visual object segmentation) by performing a detailed evaluation of the memory access time. The goal is to prove that a significant speedup can be achieved by using either standard prefetching techniques (such as OBL or adaptive prefetching) or some innovative and image-oriented prefetching methods, like the neighbor prefetching described in the paper. Performance is measured with the PRIMA trace-driven simulator.

R. Cucchiara; C. Grana; G. Neri; M. Piccardi; A. Prati ( 2001 ) - The Sakbot system for moving object detection and tracking ( 2nd European Workshop on Advanced Video-Based Surveillance Systems - Kingston upon Thames, UK - Sep 4) ( - Proceedings of 2nd European Workshop on Advanced Video-Based Surveillance Systems ) (- - GBR ) - pp. da 159 a 171 ISBN: 0000000000 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

This paper presents Sakbot, a system for moving object detection and tracking in traffic monitoring and video surveillance applications. The system is endowed with robust and efficient detection techniques, whose main features are the statistical and knowledge-based background update and the use of HSV color information for shadow suppression. Tracking is performed by means of a flexible tracking module based on symbolic reasoning, which can be tuned to several different applications.

R. Cucchiara; C. Grana; G. Neri; M. Piccardi; A. Prati ( 2001 ) - The Sakbot system for moving object detection and tracking ( - Video-Based Surveillance Systems: Computer Vision and Distributed Processing ) (Springer Heidelberg DEU ) - pp. da 145 a 158 ISBN: 9780792376323 [Contributo in volume (Capitolo o Saggio) (268) - Capitolo/Saggio]
Abstract

This paper presents Sakbot, a system for moving object detection in traffic monitoring and video surveillance applications. The system is endowed with robust and efficient detection techniques, whose main features are the statistical and knowledge-based background update and the use of HSV color information for shadow suppression. Tracking is provided by a symbolic reasoning module allowing flexible object tracking over a variety of different applications. This system proves effective in many different situations, both from the point of view of the scene appearance and the purpose of the application.

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2000 ) - Focus based Feature Extraction for Pallets Recognition ( British Machine Vision Conference - Bristol, UK - 11-14 September 2000) ( - Proceedings of British Machine Vision Conference ) (IEE London GBR ) - n. volume 2 - pp. da 695 a 704 ISBN: 1 901725 13 8 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Visual recognition for object grasping is a well-known challenge for robot automation in industrial applications. A typical example is pallet recognition in an industrial environment for automated pick-and-place processes. The aim of vision and reasoning algorithms is to help robots in choosing the best locations of the pallet holes. This work proposes an application-based approach which fulfils all requirements, dealing with every possible kind of occlusion and lighting situation. Even some “meaning noise” (or “meaning misunderstanding”) is considered. A pallet model, with limited degrees of freedom, is described and, starting from it, a complete approach to pallet recognition is outlined. In the model we define both virtual and real corners, which are geometrical object properties computed by different image analysis operators. Real corners are perceived by processing brightness information directly from the image, while virtual corners are inferred at a higher level of abstraction. A final reasoning stage selects the best solution fitting the model. Experimental results and performance are reported in order to demonstrate the suitability of the proposed approach.

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2000 ) - Hardware prefetching techniques for cache memories in multimedia applications ( International Workshop on Computer Architectures for Machine Perception - Padova, Italy - 11-13 September 2000) ( - Proceedings of International Workshop on Computer Architectures for Machine Perception ) (IEEE Computer Society Los Alamitos, California USA ) - pp. da 311 a 319 ISBN: 0 7695 0740 9 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches partially miss the potential performance improvement, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including convolutions with kernels, MPEG-2 decoding, and edge chain coding.

R. CUCCHIARA; M. PICCARDI; P. MELLO ( 2000 ) - Image Analysis and Rule-Based Reasoning for a Traffic Monitoring (IEEE / Institute of Electrical and Electronics Engineers Incorporated, 445 Hoes Lane, Piscataway, NJ 08854 ) - IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS - n. volume 1(2) - pp. da 119 a 130 ISSN: 1524-9050 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The paper presents an approach for detecting vehicles in urban traffic scenes by means of rule-based reasoning on visual data. The strength of the approach is its formal separation between the low-level image processing modules (used for extracting visual data under various illumination conditions) and the high-level module, which provides a general-purpose knowledge-based framework for tracking vehicles in the scene. The image-processing modules extract visual data from the scene by spatio-temporal analysis during daytime, and by morphological analysis of headlights at night. The high-level module is designed as a forward chaining production rule system, working on symbolic data, i.e., vehicles and their attributes (area, pattern, direction, and others), and exploiting a set of heuristic rules tuned to urban traffic conditions. The synergy between the artificial intelligence techniques of the high-level module and the low-level image analysis techniques provides the system with flexibility and robustness.

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 2000 ) - Improving data prefetching efficacy in multimedia applications ( Convegno Scuola Superiore G. Romolo Reiss - L'Aquila, Italy - July 2000) ( - SSGRR 2000 ) (Scuola Superiore G. Romolo Reiss L'Aquila, Italy ITA ) [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches are unable to exploit the potential improvement in performance, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results (both on efficiency and on efficacy of the proposed approach) are reported for a suite of multimedia image processing programs including MPEG-2 decoding and encoding, convolution, thresholding, and edge chain coding.

R. Cucchiara; M. Piccardi; A. Prati ( 2000 ) - Scuola "La Visione delle Macchine" (Dipartimento di Scienze dell'Informazione, Università di Modena Modena ITA ) [Curatela (284) - Curatela]
Abstract

-

R. Cucchiara; C. Grana; M. Piccardi; A. Prati ( 2000 ) - Statistic and knowledge-based moving object detection in traffic scenes ( 3rd IEEE Conference on Intelligent Transportation Systems - Dearborn, MI, USA - Oct 1-3) ( - Proceedings of the 3rd IEEE Conference on Intelligent Transportation Systems ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 27 a 32 ISBN: 9780780359710 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The most common approach used for vision-based traffic surveillance consists of a fast segmentation of moving visual objects (MVOs) in the scene together with an intelligent reasoning module capable of identifying, tracking and classifying the MVOs depending on the system goal. In this paper we describe our approach for MVO segmentation in an unstructured traffic environment. We consider complex situations with moving people, vehicles and infrastructures that have different aspect and motion models. In this case we define a specific approach based on background subtraction with statistical and knowledge-based background update. We show many results of real-time tracking of traffic MVOs in outdoor traffic scenes such as roads, parking areas, intersections, and entrances with barriers.

E. LAMMA; P. MELLO; M. MILANO; R. CUCCHIARA; G. GAVANELLI; M. PICCARDI ( 1999 ) - Constraint Propagation and Value Acquisition: why we should do it Interactively ( Sixteenth International Joint Conference on Artificial Intelligence (IJCAI99) - Stockholm, Sweden - July 31 - Aug. 6) ( - Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence ) (Morgan Kaufmann Publishers Inc. San Francisco, CA USA ) - pp. da 468 a 477 ISBN: 9781558606135 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

In Constraint Satisfaction Problems (CSPs), values belonging to variable domains should be completely known before the constraint propagation process starts. In many applications, however, the acquisition of domain values is a computationally expensive process, or some domain values may not be available at the beginning of the computation. For this purpose, we introduce an Interactive Constraint Satisfaction Problem (ICSP) model as an extension of the widely used CSP model. The variable domain values can be acquired when needed during the resolution process by means of Interactive Constraints, which retrieve (possibly consistent) information. Experimental results on randomly generated CSPs and for 3D object recognition show the effectiveness of the proposed approach.

R. CUCCHIARA; M. PICCARDI ( 1999 ) - Eliciting Visual Primitives for Detecting Elongated Shapes - IMAGE AND VISION COMPUTING - n. volume 17(5) - pp. da 347 a 355 ISSN: 0262-8856 [Articolo in rivista (262) - Articolo su rivista]
Abstract

Elsevier eds

R. CUCCHIARA; M. PICCARDI; A. PRATI ( 1999 ) - Exploiting Cache in Multimedia ( IEEE International Conference on Multimedia Computing and Systems (ICMCS) - Florence, Italy - 7-11 June 1999) ( - International Conference on Multimedia Computing and Systems ) (IEEE Computer Society Los Alamitos, California USA ) - n. volume 1 - pp. da 345 a 350 ISBN: 9780769502533 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The paper explores cache strategies for multimedia. Although many architectural improvements have been designed for multimedia, the cache structure and the standard caching policies of general-purpose processors exhibit poor performance in exploiting the 2D spatial locality typical of programs handling and processing images. In this paper we propose a novel caching approach suitably tailored to the requirements of multimedia programs. Our proposal exploits hardware pre-fetching for allocating in cache blocks of data that satisfy the 2D spatial locality requirements. Results refer to a benchmark suite of multimedia programs including MPEG decoding and image processing programs with different data dependencies and access schemes to image data.

R. Cucchiara; M. Gavanelli; E. Lamma; P. Mello; M. Milano; M. Piccardi ( 1999 ) - Extending CLP(FD) with Interactive Data Acquisition for 3D Visual Object Recognition ( First International Conference on the Practical Application of Constraint Technologies and Logic Programming - London, UK - Apr 19-21) ( - Proceedings of the First International Conference on the Practical Application of Constraint Technologies and Logic Programming ) (The Practical Application Company Blackpool, UK GBR ) - pp. da 137 a 155 ISBN: 978 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

-

R. CUCCHIARA; M. PICCARDI; P. MELLO ( 1999 ) - Image Analysis and Rule-Based Reasoning for a Traffic Monitoring ( IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems - Tokyo - Oct. 5-8) ( - Proceedings of the IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems ) (IEEE Computer Society Los Alamitos, CA USA ) - pp. da 758 a 763 ISBN: 9780780349759 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The paper describes a system for detecting vehicles in urban traffic scenes in daytime and at night by means of image analysis and rule-based reasoning. The strength of the proposed approach is its formal separation between the low-level image processing modules (detecting moving vehicles under day and night light) and the high-level module, which provides a single framework for tracking vehicles in the scene. The image processing modules perform spatio-temporal analysis on moving templates in daytime images, and morphological analysis of headlight pairs in night images. The high-level module is designed as a forward chained production rule system, working on symbolic data, i.e. vehicles and their attributes, and exploiting a set of heuristic rules tuned to urban traffic conditions. The synergy between the artificial intelligence techniques of the high-level module and the low-level image analysis techniques provides the system with flexibility and robustness.

R. CUCCHIARA; M. PICCARDI; A. PRATI; N. SCARABOTTOLO ( 1999 ) - Real-time Detection of Moving Vehicles ( International Conference on Image Analysis and Processing (ICIAP) - Venice, Italy - 27-29 September 1999) ( - Proceedings of International Conference on Image Analysis and Processing ) (IEEE Computer Society Los Alamitos, California USA ) - pp. da 618 a 623 ISBN: 0 7695 0040 4 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Computer vision-based traffic flow monitoring is of major importance for enforcing traffic management policies. Information such as the number of vehicles passing on a road per time unit, or vehicles' turning rates at intersections, is exploited by traffic management policies to supervise traffic-light timings. Computer vision-based traffic flow monitoring requires extraction of moving vehicles from traffic scenes in real time. To accomplish this task, efficient algorithms must be used and effective, low-cost hardware implementation must be pursued. This paper first describes the algorithms used in VTTS (Vehicular Traffic Tracking System) to achieve segmentation of moving vehicles. Then, hardware implementation on a re-programmable FPGA-based board is described in detail.

R. CUCCHIARA; M. GAVANELLI; A. PRATI; M. PICCARDI ( 1999 ) - Rule-based reasoning on visual data for urban traffic monitoring ( Sesto Convegno della Associazione Italiana per l'Intelligenza Artificiale - Bologna, Italy - -) ( - Atti del Sesto Convegno della Associazione Italiana per l'Intelligenza Artificiale ) (Associazione Italiana per l'Intelligenza Artificiale - ITA ) - pp. da 89 a 98 ISBN: 978 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

The paper describes a system for detecting vehicles in urban traffic scenes by means of rule-based reasoning on visual data. The strength of the proposed approach is its formal separation between low-level image processing modules (used for extracting visual data under various illumination conditions) and the high-level module, which provides a single framework for tracking vehicles in the scene. The image processing modules extract visual data from the scene by spatio-temporal analysis during daytime, and by morphological analysis of headlights at night. The high-level module is designed as a forward chaining production rule system, working on symbolic data, i.e. vehicles and their attributes (area, pattern, direction, ...) and exploiting a set of heuristic rules tuned to urban traffic conditions. The synergy between the artificial intelligence techniques of the high-level module and the low-level image analysis techniques provides the system with flexibility and robustness.

R. CUCCHIARA; P. ONFIANI; A. PRATI; N. SCARABOTTOLO ( 1999 ) - Segmentation of Moving Objects at Frame Rate: A Dedicated Hardware Solution ( IEE Conf. on Image Processing and its Applications (IPA) - Manchester, UK - 13-15 July 1999) ( - Proceedings of IEE Conf. on Image Processing and its Applications ) (IEE London GBR ) - n. volume 1 - pp. da 138 a 142 ISBN: 0 85296 717 9 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Many works in image processing concern segmentation of moving objects in image sequences. This problem is particularly critical, since it represents the first step of many complex computer vision processes, for applications like object tracking, video-surveillance, monitoring, and autonomous navigation. In such applications, both real-time and low-cost requirements should be satisfied. To this end, we propose a dedicated hardware solution, based on reconfigurable logic, that provides motion detection and moving object segmentation at frame rate.

R. CUCCHIARA; M. PICCARDI ( 1999 ) - Vehicle Detection under Day and Night Illumination ( International Symposia on Intelligent Industrial Automation - Genova - June 1-4) ( - Proceedings of the International Symposia on Intelligent Industrial Automation ) (Academic Press Rochester, NY, USA USA ) - pp. da 789 a 784 ISBN: 9783906454160 [Contributo in Atti di convegno (273) - Relazione in Atti di Convegno]
Abstract

Effective detection of vehicles in urban traffic scenes can be achieved by exploiting image analysis techniques. Nevertheless, vehicle detection in daytime and at night cannot be approached with the same image analysis algorithms, due to the strongly different illumination conditions. This paper describes the two different sets of image analysis algorithms that have been used in the VTTS system (Vehicular Traffic Tracking System) for extracting vehicles from image sequences acquired in daytime and at night. In the system, a supervising level selects the set of algorithms to apply and performs vehicle tracking under control of a rule-based decision module. The paper describes the tracking module, and reports experimental results for both vehicle detection and tracking.

R. CUCCHIARA; G. NERI; M. PICCARDI ( 1998 ) - A real-time hardware implementation of the hough transform (Elsevier BV:PO Box 211, 1000 AE Amsterdam Netherlands:011 31 20 4853757, 011 31 20 4853642, 011 31 20 4853641, EMAIL: nlinfo-f@elsevier.nl, INTERNET: http://www.elsevier.nl, Fax: 011 31 20 4853598 ) - JOURNAL OF SYSTEMS ARCHITECTURE - n. volume 45 (1) - pp. da 31 a 45 ISSN: 1383-7621 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The paper presents a hardware implementation of algorithms based on the Hough transform (HT) for real-time straight line detection. In particular, the basic HT on the edge points (EHT) and the Gradient-Weighted Hough transform (GWHT) for gray-level images are analyzed in detail and implemented on a pipelined architecture using Field Programmable Gate Arrays (FPGA). Algorithm execution times are compared with other hardware- and software-based systems in order to assess the efficiency of the presented approach. The paper shows how the achievable performance can meet the real-time requirements of an industrial inspection application.

R. CUCCHIARA ( 1998 ) - Genetic algorithms for clustering in machine vision - MACHINE VISION AND APPLICATIONS - n. volume 11(1) - pp. da 1 a 6 ISSN: 0932-8092 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The paper presents a genetic algorithm for clustering objects in images based on their visual features. In particular, a novel solution code (named Boolean Matching Code) and a corresponding reproduction operator (the Single Gene Crossover) are defined specifically for clustering and are compared with other standard genetic approaches. The paper describes the clustering algorithm in detail, in order to show the suitability of the genetic paradigm and underline the importance of effective tuning of algorithm parameters to the application. The algorithm is evaluated on some test sets and an example of its application in automated visual inspection is presented.

R. CUCCHIARA; F. FILICORI ( 1998 ) - The Vector-Gradient Hough Transform (IEEE / Institute of Electrical and Electronics Engineers Incorporated:445 Hoes Lane:Piscataway, NJ 08854:(800)701-4333, (732)981-0060, EMAIL: subscription-service@ieee.org, INTERNET: http://www.ieee.org, Fax: (732)981-9667 ) - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE - n. volume 20 (7) - pp. da 746 a 751 ISSN: 0162-8828 [Articolo in rivista (262) - Articolo su rivista]
Abstract

The paper presents a new transform, called vector-gradient Hough transform, for identifying elongated shapes in gray-scale images. This goal is achieved not only by collecting information on the edges of the objects, but also by reconstructing their transversal profile of luminosity. The main features of the new approach are related to its vector space formulation and the associated capability of exploiting all the vector information of the luminosity gradient.