
MATTEO TOMEI




Publications

2023 - Superpixel Positional Encoding to Improve ViT-based Semantic Segmentation Models [Conference paper]
Amoroso, Roberto; Tomei, Matteo; Baraldi, Lorenzo; Cucchiara, Rita


2022 - Matching Faces and Attributes Between the Artistic and the Real Domain: the PersonArt Approach [Journal article]
Cornia, Marcella; Tomei, Matteo; Baraldi, Lorenzo; Cucchiara, Rita


2021 - A Computational Approach for Progressive Architecture Shrinkage in Action Recognition [Journal article]
Tomei, Matteo; Baraldi, Lorenzo; Fiameni, Giuseppe; Bronzin, Simone; Cucchiara, Rita


2021 - Estimating (and fixing) the Effect of Face Obfuscation in Video Recognition [Conference paper]
Tomei, Matteo; Baraldi, Lorenzo; Bronzin, Simone; Cucchiara, Rita


2021 - RMS-Net: Regression and Masking for Soccer Event Spotting [Conference paper]
Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita


2021 - Video action detection by learning graph-based spatio-temporal interactions [Journal article]
Tomei, Matteo; Baraldi, Lorenzo; Calderara, Simone; Bronzin, Simone; Cucchiara, Rita

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, increasing attention has been devoted to relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone-independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.
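The abstract describes self-attention over a graph whose nodes are detected people and objects, with edges linking entities within and across consecutive clips. As an illustration only (the function name, single-head formulation, and fully connected toy adjacency are assumptions of this sketch, not the paper's actual STAGE module), the core operation can be written as masked scaled dot-product attention:

```python
import numpy as np

def graph_self_attention(nodes, adj):
    """Scaled dot-product self-attention restricted to graph edges.

    nodes: (N, d) features of detected entities (people/objects)
    adj:   (N, N) boolean adjacency; True connects entities within the
           same clip or across consecutive clips
    """
    d = nodes.shape[1]
    scores = nodes @ nodes.T / np.sqrt(d)        # pairwise affinities
    scores = np.where(adj, scores, -1e9)         # mask non-edges
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ nodes                       # context-aggregated features

# toy example: 3 entities in clip t plus 2 in clip t+1, all connected
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))
adj = np.ones((5, 5), dtype=bool)
out = graph_self_attention(feats, adj)
print(out.shape)  # (5, 16)
```

Masking the score matrix before the softmax is what turns plain self-attention into attention over a graph: each entity aggregates features only from its neighbours.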


2019 - Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation [Conference paper]
Tomei, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain. This is partially due to the small amount of annotated artistic data, which is not even comparable to that of natural images captured by cameras. In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. Our architecture can generate natural images by retrieving and learning details from real photos through a similarity matching strategy which leverages a weakly-supervised semantic understanding of the scene. Experimental results show that the proposed technique leads to increased realism and to a reduction in domain shift, which improves the performance of pre-trained architectures for classification, detection, and segmentation. Code is publicly available at: https://github.com/aimagelab/art2real.
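The similarity matching described above retrieves details from real photos, guided by a weakly-supervised semantic understanding of the scene. A minimal sketch of the idea, under the assumption that real patches are grouped into per-class memory banks and matched by cosine similarity (the function and variable names here are illustrative, not the paper's API):

```python
import numpy as np

def retrieve_real_patch(gen_patch, class_banks, patch_class):
    """Return the most similar real patch, searching only the memory
    bank of the patch's predicted semantic class.

    gen_patch:   (d,) flattened patch from the generated image
    class_banks: dict mapping class id -> (M, d) array of real patches
    patch_class: semantic class predicted at this patch location
    """
    bank = class_banks[patch_class]
    # cosine similarity against every real patch of the same class
    sims = bank @ gen_patch / (
        np.linalg.norm(bank, axis=1) * np.linalg.norm(gen_patch) + 1e-8)
    return bank[int(np.argmax(sims))]

# toy example: two classes; the class-0 bank contains an exact copy
rng = np.random.default_rng(1)
banks = {0: rng.normal(size=(4, 8)), 1: rng.normal(size=(4, 8))}
query = banks[0][2].copy()
best = retrieve_real_patch(query, banks, 0)
```

Restricting retrieval to the semantic class of the patch is what makes the matching "semantically aware": a sky patch is completed with sky details, not foliage.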


2019 - Image-to-Image Translation to Unfold the Reality of Artworks: an Empirical Analysis [Conference paper]
Tomei, Matteo; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

State-of-the-art Computer Vision pipelines show poor performance on artworks and data coming from the artistic domain, thus limiting the applicability of current architectures to the automatic understanding of cultural heritage. This is mainly due to the difference in texture and low-level feature distribution between artistic and real images, on which state-of-the-art approaches are usually trained. To enhance the applicability of pre-trained architectures on artistic data, we have recently proposed an unpaired domain translation approach which can translate artworks to photo-realistic visualizations. Our approach leverages semantically-aware memory banks of real patches, which are used to drive the generation of the translated image while improving its realism. In this paper, we provide additional analyses and experimental results which demonstrate the effectiveness of our approach. In particular, we evaluate the quality of generated results in the case of the translation of landscapes, portraits, and paintings from four different styles using automatic distance metrics. Also, we analyze the response of pre-trained architectures for classification, detection and segmentation both in terms of feature distribution and entropy of prediction, and show that our approach effectively reduces the domain shift of paintings. As an additional contribution, we also provide a qualitative analysis of the reduction of the domain shift for detection, segmentation and image captioning.


2019 - What was Monet seeing while painting? Translating artworks to photo-realistic images [Conference paper]
Tomei, Matteo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita

State-of-the-art Computer Vision techniques exploit the availability of large-scale datasets, most of which consist of images captured from the world as it is. This creates an incompatibility between such methods and digital data from the artistic domain, on which current techniques under-perform. A possible solution is to reduce the domain shift at the pixel level, thus translating artistic images to realistic copies. In this paper, we present a model capable of translating paintings to photo-realistic images, trained without paired examples. The idea is to enforce a patch-level similarity between real and generated images, aiming to reproduce photo-realistic details from a memory bank of real images. This is subsequently adopted in the context of an unpaired image-to-image translation framework, mapping each image from one distribution to a new one belonging to the other distribution. Qualitative and quantitative results are presented on Monet, Cezanne and Van Gogh painting translation tasks, showing that our approach increases the realism of generated images compared to the CycleGAN approach.
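The patch-level similarity objective sketched in the abstract rewards generated patches that resemble some real patch in the memory bank. As an illustration only, and not the paper's actual loss (the nearest-neighbour cosine formulation and all names here are assumptions of this sketch), the idea can be expressed as:

```python
import numpy as np

def patch_similarity_loss(gen_patches, real_bank):
    """Average (1 - cosine similarity) between each generated patch and
    its nearest neighbour in a memory bank of real patches.

    gen_patches: (N, d) flattened patches cut from the generated image
    real_bank:   (M, d) flattened patches cut from real photos
    """
    g = gen_patches / (np.linalg.norm(gen_patches, axis=1, keepdims=True) + 1e-8)
    r = real_bank / (np.linalg.norm(real_bank, axis=1, keepdims=True) + 1e-8)
    best = (g @ r.T).max(axis=1)       # closest real patch per generated patch
    return float(np.mean(1.0 - best))  # near 0 when every patch has a match

# patches already present in the bank score near zero; random ones do not
rng = np.random.default_rng(2)
bank = rng.normal(size=(10, 32))
loss_same = patch_similarity_loss(bank[:4], bank)
loss_rand = patch_similarity_loss(rng.normal(size=(4, 32)), bank)
```

Minimising such a term pushes every generated patch toward the manifold of real photographic detail, which is the intuition behind the memory-bank strategy the abstract describes.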