Nuova ricerca


Assegnista di ricerca presso: Centro Interdipartimentale di Ricerca Artificial Intelligence Research and Innovation center (AIRI)

Home |


2020 - Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [Relazione in Atti di Convegno]
Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Alletto, Stefano; Cucchiara, Rita

In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at the constant runtime of 8 fps regardless of the number of subjects in the scene.

2018 - Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face [Relazione in Atti di Convegno]
Fabbri, Matteo; Borghi, Guido; Lanzi, Fabio; Vezzani, Roberto; Calderara, Simone; Cucchiara, Rita

Can faces acquired by low-cost depth sensors be useful to see some characteristic details of the faces? Typically the answer is not. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for a face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features for unknown individual faces, it is capable to reconstruct plausible faces; their appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network capability to hallucinate with some Perceptual Probes, as for instance face aspect classification or landmark detection. Depth face can be used in spite of the correspondent RGB images, that often are not available for darkness of difficult luminance conditions. Experimental results are very promising and are as far as better than previous proposed approaches: this domain translation can constitute a new way to exploit depth data in new future applications.

2018 - Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World [Relazione in Atti di Convegno]
Fabbri, Matteo; Lanzi, Fabio; Calderara, Simone; Palazzi, Andrea; Vezzani, Roberto; Cucchiara, Rita

Multi-People Tracking in an open-world setting requires a special effort in precise detection. Moreover, temporal continuity in the detection phase gains more importance when scene cluttering introduces the challenging problems of occluded targets. For the purpose, we propose a deep network architecture that jointly extracts people body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts, by hallucinating plausible solutions of not visible joints. We propose a new end-to-end architecture composed by four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations we created the vastest Computer Graphics dataset for people tracking in urban scenarios by exploiting a photorealistic videogame. It is up to now the vastest dataset (about 500.000 frames, almost 10 million body poses) of human body parts for people tracking in urban scenarios. Our architecture trained on virtual data exhibits good generalization capabilities also on public real tracking benchmarks, when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-id modules.