Vittorio CUCULO

Fixed-term Researcher (art. 24, c. 3, lett. A)
Dipartimento di Ingegneria "Enzo Ferrari"




Publications

2024 - Pain and Fear in the Eyes: Gaze Dynamics Predicts Social Anxiety from Fear Generalisation [Conference Proceedings]
Patania, Sabrina; D’Amelio, Alessandro; Cuculo, Vittorio; Limoncini, Matteo; Ghezzi, Marco; Conversano, Vincenzo; Boccignone, Giuseppe


2024 - Trends, Applications, and Challenges in Human Attention Modelling [Conference Proceedings]
Cartella, Giuseppe; Cornia, Marcella; Cuculo, Vittorio; D'Amelio, Alessandro; Zanca, Dario; Boccignone, Giuseppe; Cucchiara, Rita

Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling. This survey offers a reasoned overview of recent efforts to integrate human attention mechanisms into contemporary deep learning models and discusses future research directions and challenges.


2024 - Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images [Journal Article]
Cartella, Giuseppe; Cuculo, Vittorio; Cornia, Marcella; Cucchiara, Rita

Creating high-quality and realistic images is now possible thanks to the impressive advancements in image generation. A description in natural language of your desired output is all you need to obtain breathtaking results. However, as the use of generative models grows, so do concerns about the propagation of malicious content and misinformation. Consequently, the research community is actively working on the development of novel fake detection techniques, primarily focusing on low-level features and possible fingerprints left by generative models during the image generation process. In a different vein, our work leverages human semantic knowledge to investigate whether it can be incorporated into fake image detection frameworks. To achieve this, we collect a novel dataset of partially manipulated images using diffusion models and conduct an eye-tracking experiment to record the eye movements of different observers while viewing real and fake stimuli. A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images. Statistical findings reveal that, when perceiving counterfeit samples, humans tend to focus on more confined regions of the image, in contrast to the more dispersed observational pattern observed when viewing genuine images. Our dataset is publicly available at: https://github.com/aimagelab/unveiling-the-truth.
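
The kind of dispersion comparison described above can be sketched in a few lines. A minimal, illustrative example with synthetic fixations standing in for the eye-tracking data (function and variable names are ours, not the paper's):

```python
# Sketch: compare fixation dispersion between real and fake stimuli.
# Synthetic (N, 2) fixation arrays stand in for the recorded data.
import numpy as np
from scipy.stats import mannwhitneyu

def dispersion(fixations):
    """Mean Euclidean distance of fixations from their centroid."""
    centroid = fixations.mean(axis=0)
    return np.linalg.norm(fixations - centroid, axis=1).mean()

rng = np.random.default_rng(0)
real_fix = [rng.normal(0, 80, size=(50, 2)) for _ in range(20)]  # dispersed
fake_fix = [rng.normal(0, 40, size=(50, 2)) for _ in range(20)]  # confined

real_d = [dispersion(f) for f in real_fix]
fake_d = [dispersion(f) for f in fake_fix]

# One-sided nonparametric test: is dispersion lower on fake images?
stat, p = mannwhitneyu(fake_d, real_d, alternative='less')
print(f"U = {stat:.1f}, p = {p:.2g}")
```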


2023 - Inferring Causal Factors of Core Affect Dynamics on Social Participation through the Lens of the Observer [Journal Article]
D'Amelio, Alessandro; Patania, Sabrina; Buršić, Sathya; Cuculo, Vittorio; Boccignone, Giuseppe

A core endeavour in current affective computing and social signal processing research is the construction of datasets embedding suitable ground truths to foster machine learning methods. This practice brings up hitherto overlooked intricacies. In this paper, we consider causal factors potentially arising when human raters evaluate the affect fluctuations of subjects involved in dyadic interactions and subsequently categorise them in terms of social participation traits. To gauge such factors, we propose an emulator as a statistical approximation of the human rater, and we first discuss the motivations and the rationale behind the approach. The emulator is laid down in the next section as a phenomenological model where the core affect stochastic dynamics as perceived by the rater are captured through an Ornstein-Uhlenbeck process; its parameters are then exploited to infer potential causal effects in the attribution of social traits. Following that, by resorting to a publicly available dataset, the adequacy of the model is evaluated in terms of both human raters' emulation and machine learning predictive capabilities. We then present the results, which are followed by a general discussion concerning findings and their implications, together with advantages and potential applications of the approach.
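
For concreteness, the Ornstein-Uhlenbeck backbone of such an emulator can be simulated in a few lines. The Euler-Maruyama sketch below (parameter values are illustrative, not those inferred in the paper) integrates dX_t = theta(mu - X_t)dt + sigma dW_t for a single affect dimension:

```python
# Sketch: Euler-Maruyama simulation of an Ornstein-Uhlenbeck process,
#   dX_t = theta * (mu - X_t) dt + sigma * dW_t
# Parameter values are illustrative, not those inferred in the paper.
import numpy as np

rng = np.random.default_rng(1)
theta, mu, sigma = 1.5, 0.0, 0.4    # mean reversion, mean, noise scale
dt, n = 0.01, 1000

x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt \
           + sigma * np.sqrt(dt) * rng.standard_normal()
```

Roughly speaking, one such process per core affect dimension is fitted to the rater's perceived dynamics, and its parameters (here theta, mu, sigma) are the quantities subsequently probed for causal effects on trait attribution.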


2023 - On Using rPPG Signals for DeepFake Detection: A Cautionary Note [Conference Proceedings]
D’Amelio, Alessandro; Lanzarotti, Raffaella; Patania, Sabrina; Grossi, Giuliano; Cuculo, Vittorio; Valota, Andrea; Boccignone, Giuseppe


2023 - Using Gaze for Behavioural Biometrics [Journal Article]
D’Amelio, Alessandro; Patania, Sabrina; Bursic, Sathya; Cuculo, Vittorio; Boccignone, Giuseppe

A principled approach to the analysis of eye movements for behavioural biometrics is laid down. The approach is grounded in foraging theory, which provides a sound basis to capture the uniqueness of individual eye movement behaviour. We propose a composite Ornstein-Uhlenbeck process for quantifying the exploration/exploitation signature characterising the foraging eye behaviour. The relevant parameters of the composite model, inferred from eye-tracking data via Bayesian analysis, are shown to yield a suitable feature set for biometric identification; the latter is eventually accomplished via a classical classification technique. A proof of concept of the method is provided by measuring its identification performance on a publicly available dataset. Data and code for reproducing the analyses are made available. Overall, we argue that the approach offers a fresh view on both the analysis of eye-tracking data and prospective applications in this field.
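
To make the identification step concrete, a toy version could look as follows; the classifier choice (an SVM) and the synthetic per-subject parameters are ours, standing in for the Bayesian-inferred composite-model parameters:

```python
# Sketch: inferred OU-style parameters (theta, mu, sigma) as biometric
# features, fed to a classical classifier. Data are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_subjects, n_trials = 5, 20
subject_params = rng.uniform([0.5, -1.0, 0.1], [3.0, 1.0, 0.8],
                             size=(n_subjects, 3))
# Each trial yields a noisy estimate of the subject's parameters.
X = np.vstack([p + 0.05 * rng.standard_normal((n_trials, 3))
               for p in subject_params])
y = np.repeat(np.arange(n_subjects), n_trials)

print(cross_val_score(SVC(), X, y, cv=5).mean())  # identification accuracy
```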


2022 - DeepFakes Have No Heart: A Simple rPPG-Based Method to Reveal Fake Videos [Conference Proceedings]
Boccignone, Giuseppe; Bursic, Sathya; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Patania, Sabrina

We present a simple, yet general method to detect fake videos displaying human subjects, generated via Deep Learning techniques. The method relies on gauging the complexity of heart rate dynamics as derived from the facial video streams through remote photoplethysmography (rPPG). The features analysed have a clear semantics with respect to such physiological behaviour. The approach is thus explainable both in terms of the underlying context model and the entailed computational steps. Most importantly, when compared to more complex state-of-the-art detection methods, results so far achieved give evidence of its capability to cope with datasets produced by different deep fake models.
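
One simple instance of a "complexity of heart rate dynamics" feature (ours for illustration; the paper details its own feature set) is the normalised spectral entropy of the rPPG-derived pulse signal, which should be low for a genuine, quasi-periodic pulse and high for the incoherent signal a synthesised face tends to yield:

```python
# Sketch: normalised spectral entropy of a BVP-like signal as one
# illustrative complexity feature (the paper's exact features differ).
import numpy as np
from scipy.signal import welch

def spectral_entropy(sig, fs):
    f, pxx = welch(sig, fs=fs, nperseg=min(256, len(sig)))
    p = pxx / pxx.sum()
    return float(-(p * np.log(p + 1e-12)).sum() / np.log(len(p)))

fs = 30.0                                    # video frame rate (Hz)
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(3)
bvp_real = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)
bvp_fake = rng.standard_normal(t.size)       # no coherent pulse
print(spectral_entropy(bvp_real, fs), spectral_entropy(bvp_fake, fs))
```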


2022 - pyVHR: a Python framework for remote photoplethysmography [Journal Article]
Boccignone, G.; Conte, Donatello; Cuculo, V.; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, R.; Mortara, Edoardo

Remote photoplethysmography (rPPG) aspires to automatically estimate heart rate (HR) variability from videos in realistic environments. A number of effective methods relying on data-driven, model-based and statistical approaches have emerged in the past two decades. They exhibit increasing ability to estimate the blood volume pulse (BVP) signal upon which BPMs (Beats per Minute) can be estimated. Furthermore, learning-based rPPG methods have been recently proposed. The present pyVHR framework represents a multi-stage pipeline covering the whole process for extracting and analyzing HR fluctuations. It is designed for both theoretical studies and practical applications in contexts where wearable sensors are inconvenient to use. Namely, pyVHR supports both the development, assessment and statistical analysis of novel rPPG methods, whether traditional or learning-based, and the sound comparison of well-established methods on multiple datasets. It is built on accelerated Python libraries for video and signal processing, and is equipped with parallel/accelerated ad-hoc procedures paving the way to online processing on a GPU. The whole accelerated process can be safely run in real time for 30 fps HD videos with an average speedup of around 5×. This paper is shaped in the form of a gentle tutorial presentation of the framework.
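
The framework's API is documented in its repository; as a flavour of what such a pipeline computes internally, here is a NumPy sketch of the core of POS (Wang et al., 2017), one of the classical methods pyVHR implements (this is our simplified rendering, not pyVHR code):

```python
# Sketch of the core of the POS rPPG method: project temporally
# normalised RGB traces onto a plane orthogonal to skin tone, then
# overlap-add the per-window pulse estimates. Simplified illustration.
import numpy as np

def pos_bvp(rgb, fs, win_sec=1.6):
    """rgb: (N, 3) mean skin-pixel colour per frame -> (N,) BVP."""
    n = rgb.shape[0]
    w = int(win_sec * fs)
    P = np.array([[0., 1., -1.], [-2., 1., 1.]])
    h = np.zeros(n)
    for t in range(n - w + 1):
        c = rgb[t:t + w]
        cn = c / c.mean(axis=0)                   # temporal normalisation
        s = cn @ P.T                              # (w, 2) projections
        hw = s[:, 0] + (s[:, 0].std() / (s[:, 1].std() + 1e-9)) * s[:, 1]
        h[t:t + w] += hw - hw.mean()              # overlap-add
    return h

# Toy skin-colour trace carrying a 1.2 Hz (72 BPM) pulse.
t = np.arange(0, 10, 1 / 30)
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * t)
rgb = 1.0 + pulse[:, None] * np.array([0.33, 0.77, 0.53])
print(pos_bvp(rgb, fs=30).shape)
```

BPM is then read off the dominant frequency of the BVP estimate, e.g. via a Welch periodogram.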


2020 - An Open Framework for Remote-PPG Methods and Their Assessment [Journal Article]
Boccignone, Giuseppe; Conte, Donatello; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

This paper presents a comprehensive framework for studying methods of pulse rate estimation relying on remote photoplethysmography (rPPG). There has been a remarkable development of rPPG techniques in recent years, and the publication of several surveys too, yet a sound assessment of their performance has been, at best, overlooked, if not left undeveloped. The methodological rationale behind the framework we propose is that in order to study, develop and compare new rPPG methods in a principled and reproducible way, the following conditions should be met: 1) a structured pipeline to monitor rPPG algorithms' input, output, and main control parameters; 2) the availability and the use of multiple datasets; and 3) a sound statistical assessment of methods' performance. The proposed framework is instantiated in the form of a Python package named pyVHR (short for Python tool for Virtual Heart Rate), which is made freely available on GitHub (github.com/phuselab/pyVHR). Here, to substantiate our approach, we evaluate eight well-known rPPG methods, through extensive experiments across five public video datasets, and subsequent nonparametric statistical analysis. Surprisingly, the performances achieved by the four best methods, namely POS, CHROM, PCA and SSR, are not significantly different from a statistical standpoint, highlighting the importance of evaluating the different approaches with a sound statistical assessment.
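
The statistical machinery involved is standard; below is a sketch of the kind of nonparametric assessment meant here, with synthetic per-dataset errors, an omnibus Friedman test, and one illustrative post hoc comparison (the paper's actual analysis may use different post hoc tests):

```python
# Sketch: nonparametric comparison of rPPG methods across datasets.
# Rows: datasets; columns: per-method MAE (synthetic stand-ins).
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(4)
mae = np.abs(rng.normal([2.0, 2.1, 3.5], 0.3, size=(6, 3)))

stat, p = friedmanchisquare(mae[:, 0], mae[:, 1], mae[:, 2])
print(f"Friedman: chi2 = {stat:.2f}, p = {p:.3f}")
if p < 0.05:                       # omnibus rejects: inspect a pair
    w, pw = wilcoxon(mae[:, 0], mae[:, 2])
    print(f"method 1 vs method 3: p = {pw:.3f}")
```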


2020 - Anomaly detection from log files using unsupervised deep learning [Conference Proceedings]
Bursic, S.; Cuculo, V.; D'Amelio, A.

Computer systems have grown in complexity to the point where manual inspection of system behaviour for purposes of malfunction detection has become unfeasible. As these systems output voluminous logs of their activity, machine-led analysis of them is a growing need, with several solutions already existing. These largely depend on having hand-crafted features, require raw log preprocessing and feature extraction, or use supervised learning, necessitating a labeled log dataset that is not always easily procurable. We propose a two-part deep autoencoder model with LSTM units that requires no hand-crafted features and no preprocessing of data, as it works on raw text and outputs an anomaly score for each log entry. This anomaly score represents the rarity of a log event both in terms of its content and its temporal context. The model was trained and tested on a dataset of HDFS logs containing 2 million raw lines, of which half was used for training and half for testing. While this model cannot match the performance of a supervised binary classifier, it could be a useful tool as a coarse filter for manual inspection of log files where a labeled dataset is unavailable.
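
A compact PyTorch rendering of the idea (dimensions, tokenisation and all names are illustrative; the paper's exact architecture differs):

```python
# Sketch: an LSTM autoencoder reconstructs log lines encoded as
# character-id sequences; an entry's reconstruction error serves as
# its anomaly score (higher = rarer).
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, vocab=128, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.enc = nn.LSTM(emb, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        e = self.embed(x)
        _, (h, _) = self.enc(e)                     # sequence summary
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        d, _ = self.dec(z)
        return self.out(d)                          # logits per position

def anomaly_score(model, x):
    """Mean cross-entropy of reconstructing x."""
    logits = model(x)
    return nn.functional.cross_entropy(
        logits.transpose(1, 2), x, reduction='none').mean(dim=1)

# Toy usage: raw log lines as ASCII codes, padded to equal length.
lines = ["INFO block served ok", "FATAL disk failure!!"]
x = torch.tensor([[ord(c) for c in s.ljust(20)[:20]] for s in lines])
model = LSTMAutoencoder()
with torch.no_grad():
    print(anomaly_score(model, x))  # untrained: scores uninformative
```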


2020 - Gender recognition in the wild with small sample size: A dictionary learning approach [Conference Proceedings]
D'Amelio, A.; Cuculo, V.; Bursic, S.

In this work we address the problem of gender recognition from facial images acquired in the wild. This problem is particularly difficult due to the presence of variations in pose, ethnicity, age and image quality. Moreover, we consider the special case in which only a small sample size is available for the training phase. We rely on a feature representation obtained from the well-known VGG-Face Deep Convolutional Neural Network (DCNN) and exploit the effectiveness of a sparse-driven sub-dictionary learning strategy which has proven able to represent both local and global characteristics of the training and probe faces. Results on the publicly available LFW dataset are provided in order to demonstrate the effectiveness of the proposed method.
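
The classification-by-residual logic can be sketched with scikit-learn's generic sparse coding standing in for the paper's sub-dictionary strategy (synthetic vectors stand in for VGG-Face descriptors):

```python
# Sketch: class-wise dictionaries learned on deep features; a probe is
# assigned to the class whose dictionary reconstructs it best.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(5)
classes = {0: rng.normal(0.0, 1.0, size=(30, 64)),   # stand-in features
           1: rng.normal(3.0, 1.0, size=(30, 64))}

dicts = {c: DictionaryLearning(n_components=10,
                               transform_algorithm='lasso_lars',
                               transform_alpha=0.1,
                               random_state=0).fit(X)
         for c, X in classes.items()}

def predict(x):
    residuals = {}
    for c, dl in dicts.items():
        code = dl.transform(x[None, :])          # sparse code of probe
        recon = code @ dl.components_
        residuals[c] = np.linalg.norm(x - recon[0])
    return min(residuals, key=residuals.get)

probe = rng.normal(3.0, 1.0, size=64)
print(predict(probe))   # expected: 1
```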


2020 - How to look next? A data-driven approach for scanpath prediction [Conference Proceedings]
Boccignone, G.; Cuculo, V.; D'Amelio, A.

By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down, spatial biases, etc.). In this note we propose a novel sequential scheme that consists of three-stage processing relying on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
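
As a baseline for intuition, the generic sequential-sampling skeleton that such schemes refine looks as follows; the paper's contribution is precisely to replace the single static priority map below with three stage-specific components:

```python
# Sketch: fixations drawn one at a time from an evolving priority map
# with inhibition of return. A random map stands in for saliency.
import numpy as np

rng = np.random.default_rng(6)
H, W, sigma_ior = 60, 80, 6.0
priority = rng.random((H, W))

yy, xx = np.mgrid[0:H, 0:W]
scanpath = []
for _ in range(8):
    p = priority.ravel() / priority.sum()
    idx = rng.choice(p.size, p=p)              # sample next fixation
    y, x = divmod(idx, W)
    scanpath.append((x, y))
    ior = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_ior ** 2))
    priority = priority * (1 - 0.9 * ior)      # inhibition of return
print(scanpath)
```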


2020 - On Gaze Deployment to Audio-Visual Cues of Social Interactions [Journal Article]
Boccignone, G.; Cuculo, V.; D'Amelio, A.; Grossi, G.; Lanzarotti, R.

Attention supports our urge to forage on social cues. Under certain circumstances, we spend the majority of time scrutinising people, markedly their eyes and faces, and spotting persons that are talking. To account for such behaviour, this article develops a computational model for the deployment of gaze within a multimodal landscape, namely a conversational scene. Gaze dynamics is derived in a principled way by reformulating attention deployment as a stochastic foraging problem. Model simulation experiments on a publicly available dataset of eye-tracked subjects are presented. Results show that the simulated scan paths exhibit trends similar to the eye movements of human observers watching and listening to conversational clips in a free-viewing condition.


2019 - Give Ear to My Face: Modelling Multimodal Attention to Social Interactions [Conference Proceedings]
Boccignone, Giuseppe; Cuculo, Vittorio; D’Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

We address the deployment of perceptual attention to social interactions as displayed in conversational clips, when relying on multimodal information (audio and video). A probabilistic modelling framework is proposed that goes beyond the classic saliency paradigm while integrating multiple information cues. Attentional allocation is determined not just by stimulus-driven selection but, importantly, by social value as modulating the selection history of relevant multimodal items. Thus, the construction of attentional priority is the result of a sampling procedure conditioned on the potential value dynamics of socially relevant objects emerging moment to moment within the scene. Preliminary experiments on a publicly available dataset are presented.


2019 - OpenFACS: An Open Source FACS-Based 3D Face Animation System [Conference Proceedings]
Cuculo, V.; D'Amelio, A.

We present OpenFACS, an open source FACS-based 3D face animation system. OpenFACS is a software system that allows the simulation of realistic facial expressions through the manipulation of specific action units as defined in the Facial Action Coding System. OpenFACS has been developed together with an API which is suitable for generating real-time dynamic facial expressions for a three-dimensional character. It can be easily embedded in existing systems without any prior experience in computer graphics. In this note, we discuss the adopted face model and the implemented architecture, and provide additional details of model dynamics. Finally, a validation experiment is proposed to assess the effectiveness of the model.
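
As a purely hypothetical illustration of what driving such a character through an API might look like (message format, port and field names are invented here; consult the paper and its repository for the actual OpenFACS API):

```python
# Hypothetical client sketch: send action-unit intensities to a local
# animation server as JSON. All protocol details below are assumptions.
import json
import socket

def send_aus(au_values, host="127.0.0.1", port=5000):
    """au_values: dict such as {"AU06": 0.8, "AU12": 1.0} (a smile)."""
    msg = json.dumps(au_values).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall(msg)

try:
    send_aus({"AU06": 0.8, "AU12": 1.0})  # cheek raiser + lip corner puller
except OSError:
    print("no animation server listening (expected outside a live setup)")
```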


2019 - Predictive Sampling of Facial Expression Dynamics Driven by a Latent Action Space [Conference Proceedings]
Boccignone, G.; Bodini, M.; Cuculo, V.; Grossi, G.

We present a probabilistic generative model for tracking, by prediction, the dynamics of affective facial expressions in videos. The model relies on Bayesian filter sampling of facial landmarks conditioned on motor action parameter dynamics; namely, trajectories shaped by an autoregressive Gaussian Process Latent Variable state-space model. The analysis-by-synthesis approach at the heart of the model allows for both inference and generation of affective expressions. Robustness of the method to occlusions and degradation of video quality has been assessed on a publicly available dataset.


2019 - Problems with Saliency Maps [Conference Proceedings]
Boccignone, Giuseppe; Cuculo, Vittorio; D’Amelio, Alessandro

Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation it is shown how neglecting this dimension leads to results that at best cast shadows on the predictive performance of a model and its assessment via benchmarking procedures.


2019 - Robust single-sample face recognition by sparsity-driven sub-dictionary learning using deep features [Journal Article]
Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella; Lin, Jianyi

Face recognition using a single reference image per subject is challenging, above all when referring to a large gallery of subjects. Furthermore, the problem hardness seriously increases when the images are acquired in unconstrained conditions. In this paper we address the challenging Single Sample Per Person (SSPP) problem considering large datasets of images acquired in the wild, thus possibly featuring illumination, pose, face expression, partial occlusions, and low-resolution hurdles. The proposed technique alternates a sparse dictionary learning technique based on the Method of Optimal Directions with the iterative ℓ0-norm minimization algorithm called k-LIMAPS. It works on robust deep-learned features, provided that the image variability is extended by standard augmentation techniques. Experiments show the effectiveness of our method against the hardness introduced above: first, we report extensive experiments on the unconstrained LFW dataset when referring to large galleries of up to 1680 subjects; second, we present experiments on very low-resolution test images down to 8 × 8 pixels; third, tests on the AR dataset are analyzed against specific disguises such as partial occlusions, facial expressions, and illumination problems. In all three scenarios our method outperforms the state-of-the-art approaches adopting similar configurations.


2019 - Social traits from stochastic paths in the core affect space [Conference Proceedings]
Boccignone, Giuseppe; Cuculo, Vittorio; D'Amelio, Alessandro; Lanzarotti, Raffaella

We discuss a preliminary investigation on the feasibility of inferring traits of social participation from the observable behaviour of individuals involved in dyadic interactions. Trait inference relies on a stochastic model of the dynamics occurring in the individual core affect state-space. Results obtained on a publicly available interaction dataset are presented and examined.


2019 - Worldly eyes on video: Learnt vs. reactive deployment of attention to dynamic stimuli [Conference Proceedings]
Cuculo, V.; D'Amelio, A.; Grossi, G.; Lanzarotti, R.

Computational visual attention is a hot topic in computer vision. However, most efforts are devoted to modelling saliency, whilst the actual eye guidance problem, which brings into play the sequence of gaze shifts characterising overt attention, is overlooked. Further, in those cases where the generation of gaze behaviour is considered, the stimuli of interest are by and large static (still images) rather than dynamic ones (videos). Under such circumstances, the work described in this note has a twofold aim: (i) addressing the problem of estimating and generating visual scan paths, that is, the sequences of gaze shifts over videos; (ii) investigating the effectiveness in scan path generation offered by features dynamically learned on the basis of human observers' attention dynamics, as opposed to bottom-up derived features. To this end a probabilistic model is proposed. By using a publicly available dataset, our approach is compared against a model of scan path simulation that does not rely on a learning step.


2018 - Deep construction of an affective latent space via multimodal enactment [Journal Article]
Boccignone, Giuseppe; Conte, Donatello; Cuculo, Vittorio; D'Amelio, Alessandro; Grossi, Giuliano; Lanzarotti, Raffaella

We draw on a simulationist approach to the analysis of facially displayed emotions, e.g., in the course of a face-to-face interaction between an expresser and an observer. At the heart of such perspective lies the enactment of the perceived emotion in the observer. We propose a novel probabilistic framework based on a deep latent representation of a continuous affect space, which can be exploited for both the estimation and the enactment of affective states in a multimodal space (visible facial expressions and physiological signals). The rationale behind the approach lies in the large body of evidence from affective neuroscience showing that when we observe emotional facial expressions, we react with congruent facial mimicry. Further, in more complex situations, affect understanding is likely to rely on a comprehensive representation grounding the reconstruction of the state of the body associated with the displayed emotion. We show that our approach can address such problems in a unified and principled perspective, thus avoiding ad hoc heuristics while minimizing learning efforts.


2018 - Personality Gaze Patterns Unveiled via Automatic Relevance Determination [Conference Proceedings]
Cuculo, Vittorio; D’Amelio, Alessandro; Lanzarotti, Raffaella; Boccignone, Giuseppe

Understanding human gaze behaviour in a social context, such as during a face-to-face interaction, remains an open research issue which is strictly related to personality traits. In the effort to bridge the gap between available data and models, typical approaches focus on the analysis of spatial and temporal preferences of gaze deployment over specific regions of the observed face, while adopting classic statistical methods. In this note we propose a different analysis perspective based on novel data-mining techniques and a probabilistic classification method that relies on Gaussian Processes exploiting an Automatic Relevance Determination (ARD) kernel. Preliminary results obtained on a publicly available dataset are provided.
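
The appeal of ARD is easy to demonstrate: an anisotropic kernel learns one length-scale per feature, so dimensions irrelevant to the trait are automatically down-weighted. A minimal scikit-learn sketch with synthetic stand-ins for the gaze features:

```python
# Sketch: GP classification with an ARD (anisotropic RBF) kernel.
# Only feature 0 carries signal; its learned length-scale stays small.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)
n, d = 80, 4
X = rng.standard_normal((n, d))
y = (X[:, 0] > 0).astype(int)

kernel = 1.0 * RBF(length_scale=np.ones(d))   # one scale per dimension
gpc = GaussianProcessClassifier(kernel=kernel).fit(X, y)
print(gpc.kernel_.k2.length_scale)  # large scales flag irrelevant dims
```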


2017 - A Note on Modelling a Somatic Motor Space for Affective Facial Expressions [Conference Proceedings]
D'Amelio, Alessandro; Cuculo, V.; Grossi, G.; Lanzarotti, R.; Lin, J.

We discuss modelling issues related to the design of a somatic facial motor space. The variants proposed are conceived to be part of a larger system for dealing with simulation-based face emotion analysis during dyadic interactions.


2017 - AMHUSE: A Multimodal dataset for HUmour SEnsing [Conference Proceedings]
Boccignone, G.; Conte, Donatello; Cuculo, V.; Lanzarotti, R.

We present AMHUSE (A Multimodal dataset for HUmour SEnsing) along with a novel web-based annotation tool named DANTE (Dimensional ANnotation Tool for Emotions). The dataset is the result of an experiment concerning amusement elicitation, involving 36 subjects, in order to record their reactions in the presence of 3 amusing and 1 neutral video stimuli. Gathered data include RGB video and depth sequences along with physiological responses (electrodermal activity, blood volume pulse, temperature). The videos were later annotated by 4 experts in terms of the continuous dimensions of valence and arousal. Both the dataset and the annotation tool are made publicly available for research purposes.


2017 - Taking the Hidden Route: Deep Mapping of Affect via 3D Neural Networks [Conference Proceedings]
Ceruti, C.; Cuculo, V.; D’Amelio, A.; Grossi, G.; Lanzarotti, R.

In this note we address the problem of providing a fast, automatic, and coarse processing of the early mapping from emotional facial expression stimuli to the basic continuous dimensions of the core affect representation of emotions, namely valence and arousal. Taking stock of results in affective neuroscience, such mapping is assumed to be the earliest stage of a complex unfolding of processes that eventually entail detailed perception and emotional reaction involving the proper body. Thus, differently from the vast majority of approaches in the field of affective facial expression processing, we assume and design such a feedforward mechanism as a preliminary step to provide a suitable prior to the subsequent core affect dynamics, in which recognition is actually grounded. To this end we conceive and exploit a 3D spatiotemporal deep network as a suitable architecture to instantiate such early component, and experiments on the MAHNOB dataset prove the rationality of this approach.
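
The flavour of such an early feedforward component can be given in a few PyTorch lines: a stack of 3D convolutions over a short face clip, regressing the two core affect dimensions. Layer sizes and names are illustrative; the paper's network differs.

```python
# Sketch: a 3D spatiotemporal network mapping a face-video clip to
# (valence, arousal). Purely illustrative architecture.
import torch
import torch.nn as nn

class Affect3DNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 2)     # -> (valence, arousal)

    def forward(self, clip):             # clip: (B, 3, T, H, W)
        z = self.features(clip).flatten(1)
        return torch.tanh(self.head(z))  # values in [-1, 1]

clip = torch.randn(1, 3, 16, 64, 64)     # one 16-frame face clip
print(Affect3DNet()(clip))
```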


2017 - Virtual EMG via Facial Video Analysis [Conference Proceedings]
Boccignone, G.; Cuculo, V.; Grossi, G.; Lanzarotti, R.; Migliaccio, R.

In this note, we address the problem of simulating electromyographic signals arising from muscles involved in facial expressions (markedly those conveying affective information), by relying solely on facial landmarks detected on video sequences. We propose a method that uses the framework of Gaussian Process regression to predict the facial electromyographic signal from videos where people display non-posed affective expressions. To this end, experiments have been conducted on the OPEN EmoRec II multimodal corpus.
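
In miniature, the regression setting is the textbook one; below is a sketch with a one-dimensional synthetic feature (think of a landmark displacement) standing in for the landmark trajectories and an EMG envelope as the target:

```python
# Sketch: GP regression from a landmark-derived feature to an EMG
# amplitude. Synthetic 1-D data are purely illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 1, 40))[:, None]    # e.g. mouth-corner shift
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(40)  # EMG envelope

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(x, y)
mean, std = gpr.predict(np.array([[0.5]]), return_std=True)
print(mean, std)   # predictive mean and uncertainty at a new frame
```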


2015 - The Color of Smiling: Computational Synaesthesia of Facial Expressions [Conference Proceedings]
Cuculo, V.; Lanzarotti, R.; Boccignone, G.

This note gives a preliminary account of the transcoding or rechanneling problem between different stimuli, as it is of interest for the natural interaction and affective computing fields. By considering a simple example, namely the color response of an affective lamp to a sensed facial expression, we frame the problem within an information-theoretic perspective. A full justification in terms of the Information Bottleneck principle promotes a latent affective space, hitherto surmised as an appealing and intuitive solution, as a suitable mediator between the different stimuli.


2014 - Using sparse coding for landmark localization in facial expressions [Conference Proceedings]
Cuculo, V.; Lanzarotti, R.; Boccignone, G.

In this article we address the issue of adopting a local sparse coding representation (Histogram of Sparse Codes) in a part-based framework for inferring the locations of facial landmarks. The rationale behind this approach is that unsupervised learning of sparse code dictionaries from face data can be an effective approach to cope with such a challenging problem. Results obtained on the CMU Multi-PIE Face dataset are presented, providing support for this approach.