Nuova ricerca

ALESSANDRO SIMONI

Dottorando
Dipartimento di Ingegneria "Enzo Ferrari"


Home | Curriculum(pdf) |


Pubblicazioni

2022 - Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [Articolo su rivista]
Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto
abstract

Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications, such as the detection of unsafe situations or the study of mutual interactions for statistical and social purposes. In this paper, we propose a non-invasive and light-invariant framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera. The method can be applied to any robot without requiring hardware access to the internal states. We introduce a novel representation of the predicted pose, namely Semi-Perspective Decoupled Heatmaps (SPDH), to accurately compute 3D joint locations in world coordinates adapting efficient deep networks designed for the 2D Human Pose Estimation. The proposed approach, which takes as input a depth representation based on XYZ coordinates, can be trained on synthetic depth data and applied to real-world settings without the need for domain adaptation techniques. To this end, we present the SimBa dataset, based on both synthetic and real depth images, and use it for the experimental evaluation. Results show that the proposed approach, made of a specific depth map representation and the SPDH, overcomes the current state of the art.


2022 - Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data [Relazione in Atti di Convegno]
D'Eusanio, A.; Pini, S.; Borghi, G.; Simoni, A.; Vezzani, R.
abstract

The effective and reliable detection and classification of dynamic hand gestures is a key element for building Natural User Interfaces, systems that allow the users to interact using free movements of their body instead of traditional mechanical tools. However, methods that temporally segment and classify dynamic gestures usually rely on a great amount of labeled data, including annotations regarding the class and the temporal segmentation of each gesture. In this paper, we propose an unsupervised approach to train a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence. The input data is represented by the 3D position of the hand joints, along with their speed and acceleration, collected through a Leap Motion device. Experimental results show a promising accuracy on both the detection and the classification task and that only limited computational power is required, confirming that the proposed method can be applied in real-world applications.


2021 - Extracting accurate long-term behavior changes from a large pig dataset [Relazione in Atti di Convegno]
Bergamini, L.; Pini, S.; Simoni, A.; Vezzani, R.; Calderara, S.; Eath, R. B. D.; Fisher, R. B.
abstract

Visual observation of uncontrolled real-world behavior leads to noisy observations, complicated by occlusions, ambiguity, variable motion rates, detection and tracking errors, slow transitions between behaviors, etc. We show in this paper that reliable estimates of long-term trends can be extracted given enough data, even though estimates from individual frames may be noisy. We validate this concept using a new public dataset of approximately 20+ million daytime pig observations over 6 weeks of their main growth stage, and we provide annotations for various tasks including 5 individual behaviors. Our pipeline chains detection, tracking and behavior classification combining deep and shallow computer vision techniques. While individual detections may be noisy, we show that long-term behavior changes can still be extracted reliably, and we validate these results qualitatively on the full dataset. Eventually, starting from raw RGB video data we are able to both tell what pigs main daily activities are, and how these change through time.


2021 - Improving Car Model Classification through Vehicle Keypoint Localization [Relazione in Atti di Convegno]
Simoni, Alessandro; D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Vezzani, Roberto
abstract

In this paper, we present a novel multi-task framework which aims to improve the performance of car model classification leveraging visual features and pose information extracted from single RGB images. In particular, we merge the visual features obtained through an image classification network and the features computed by a model able to predict the pose in terms of 2D car keypoints. We show how this approach considerably improves the performance on the model classification task testing our framework on a subset of the Pascal3D dataset containing the car classes. Finally, we conduct an ablation study to demonstrate the performance improvement obtained with respect to a single visual classifier network.


2021 - Multi-Category Mesh Reconstruction From Image Collections [Relazione in Atti di Convegno]
Simoni, Alessandro; Pini, Stefano; Vezzani, Roberto; Cucchiara, Rita
abstract

Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image. However, current methods are trained on image collections of a single category in order to exploit specific priors, and they often make use of category-specific 3D templates. In this paper, we present an alternative approach that infers the textured mesh of objects combining a series of deformable 3D models and a set of instance-specific deformation, pose, and texture. Differently from previous works, our method is trained with images of multiple object categories using only foreground masks and rough camera poses as supervision. Without specific 3D templates, the framework learns category-level models which are deformed to recover the 3D shape of the depicted object. The instance-specific deformations are predicted independently for each vertex of the learned 3D mesh, enabling the dynamic subdivision of the mesh during the training process. Experiments show that the proposed framework can distinguish between different object categories and learn category-specific shape priors in an unsupervised manner. Predicted shapes are smooth and can leverage from multiple steps of subdivision during the training process, obtaining comparable or state-of-the-art results on two public datasets. Models and code are publicly released.


2021 - SHREC 2021: Skeleton-based hand gesture recognition in the wild [Articolo su rivista]
Caputo, Ariel; Giacchetti, Andrea; Soso, Simone; Pintani, Deborah; D'Eusanio, Andrea; Pini, Stefano; Borghi, Guido; Simoni, Alessandro; Vezzani, Roberto; Cucchiara, Rita; Ranieri, Andrea; Giannini, Franca; Lupinetti, Katia; Monti, Marina; Maghoumi, Mehran; LaViola Jr, Joseph; Le, Minh-Quan; Nguyen, Hai-Dang; Tran, Minh-Triet
abstract

This paper presents the results of the Eurographics 2019 SHape Retrieval Contest track on online gesture recognition. The goal of this contest was to test state-of-the-art methods that can be used to online detect command gestures from hands' movements tracking on a basic benchmark where simple gestures are performed interleaving them with other actions. Unlike previous contests and benchmarks on trajectory-based gesture recognition, we proposed an online gesture recognition task, not providing pre-segmented gestures, but asking the participants to find gestures within recorded trajectories. The results submitted by the participants show that an online detection and recognition of sets of very simple gestures from 3D trajectories captured with a cheap sensor can be effectively performed. The best methods proposed could be, therefore, directly exploited to design effective gesture-based interfaces to be used in different contexts, from Virtual and Mixed reality applications to the remote control of home devices.


2020 - A Transformer-Based Network for Dynamic Hand Gesture Recognition [Relazione in Atti di Convegno]
D'Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
abstract

Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that the employment of a single active depth sensor, specifically the usage of depth maps and the surface normals estimated from them, achieves state-of-the-art results, overcoming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.


2020 - Future Urban Scenes Generation Through Vehicles Synthesis [Relazione in Atti di Convegno]
Simoni, Alessandro; Bergamini, Luca; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita
abstract

In this work we propose a deep learning pipeline to predict the visual future appearance of an urban scene. Despite recent advances, generating the entire scene in an end-to-end fashion is still far from being achieved. Instead, here we follow a two stages approach, where interpretable information is included in the loop and each actor is modelled independently. We leverage a per-object novel view synthesis paradigm; i.e. generating a synthetic representation of an object undergoing a geometrical roto-translation in the 3D space. Our model can be easily conditioned with constraints (e.g. input trajectories) provided by state-of-the-art tracking methods or by the user itself. This allows us to generate a set of diverse realistic futures starting from the same input in a multi-modal fashion. We visually and quantitatively show the superiority of this approach over traditional end-to-end scene-generation methods on CityFlow, a challenging real world dataset.


2020 - Multimodal Hand Gesture Classification for the Human-Car Interaction [Articolo su rivista]
D'Eusanio, Andrea; Simoni, Alessandro; Pini, Stefano; Borghi, Guido; Vezzani, Roberto; Cucchiara, Rita
abstract