Nuova ricerca


Home |


2023 - Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates [Articolo su rivista]
Moratelli, Nicholas; Barraco, Manuele; Morelli, Davide; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently-proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed with the integration of external textual memory that could be accessed through k-nearest neighbor (kNN) searches. From an architectural point of view, the proposed transformer model can read and retrieve items from the external memory through cross-attention operations, and tune the flow of information coming from the external memory thanks to a novel fully attentive gate. Experimental analyses were carried out on the fashion captioning dataset (FACAD) for fashion image captioning, which contains more than 130k fine-grained descriptions, validating the effectiveness of the proposed approach and the proposed architectural strategies in comparison with carefully designed baselines and state-of-the-art approaches. The presented method constantly outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning.

2023 - Positive-Augmented Constrastive Learning for Image and Video Captioning Evaluation [Relazione in Atti di Convegno]
Sarto, Sara; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language models. In this paper, we propose a new recipe for a contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S), that in a novel way unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos, outperforming existing reference-based metrics like CIDEr and SPICE and reference-free metrics like CLIP-Score. Finally, we test the system-level correlation of the proposed metric when considering popular image captioning approaches, and assess the impact of employing different cross-modal features. We publicly release our source code and trained models.

2023 - SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning [Relazione in Atti di Convegno]
Caffagni, Davide; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Image captioning is a challenging task that combines Computer Vision and Natural Language Processing to generate descriptive and accurate textual descriptions for input images. Research efforts in this field mainly focus on developing novel architectural components to extend image captioning models and using large-scale image-text datasets crawled from the web to boost final performance. In this work, we explore an alternative to web-crawled data and augment the training dataset with synthetic images generated by a latent diffusion model. In particular, we propose a simple yet effective synthetic data augmentation framework that is capable of significantly improving the quality of captions generated by a standard Transformer-based model, leading to competitive results on the COCO dataset.

2023 - With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [Relazione in Atti di Convegno]
Barraco, Manuele; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

2022 - CaMEL: Mean Teacher Learning for Image Captioning [Relazione in Atti di Convegno]
Barraco, Manuele; Stefanini, Matteo; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

2022 - The Unreasonable Effectiveness of CLIP features for Image Captioning: an Experimental Analysis [Relazione in Atti di Convegno]
Barraco, Manuele; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita

2021 - Forecasting Parking Lots Availability: Analysis from a Real-World Deployment [Relazione in Atti di Convegno]
Barraco, M.; Bicocchi, N.; Mamei, M.; Zambonelli, F.

Smart parking technologies are rapidly being deployed in cities and public/private places around the world for the sake of enabling users to know in real time the occupancy of parking lots and offer applications and services on top of that information. In this work, we detail a real-world deployment of a full-stack smart parking system based on industrial-grade components. We also propose innovative forecasting models (based on CNN-LSTM) to analyze and predict parking occupancy ahead of time. Experimental results show that our model can predict the number of available parking lots in a ±3% range with about 80% accuracy over the next 1-8 hours. Finally, we describe novel applications and services that can be developed given such forecasts and associated analysis.