Nuova ricerca

DAVIDE CAFFAGNI

Dottorando
Dipartimento di Ingegneria "Enzo Ferrari"


Home |


Pubblicazioni

2024 - The Revolution of Multimodal Large Language Models: A Survey [Relazione in Atti di Convegno]
Caffagni, Davide; Cocchi, Federico; Barsellotti, Luca; Moratelli, Nicholas; Sarto, Sara; Baraldi, Lorenzo; Baraldi, Lorenzo; Cornia, Marcella; Cucchiara, Rita
abstract


2024 - Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs [Relazione in Atti di Convegno]
Caffagni, Davide; Cocchi, Federico; Moratelli, Nicholas; Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
abstract


2023 - SynthCap: Augmenting Transformers with Synthetic Data for Image Captioning [Relazione in Atti di Convegno]
Caffagni, Davide; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
abstract

Image captioning is a challenging task that combines Computer Vision and Natural Language Processing to generate descriptive and accurate textual descriptions for input images. Research efforts in this field mainly focus on developing novel architectural components to extend image captioning models and using large-scale image-text datasets crawled from the web to boost final performance. In this work, we explore an alternative to web-crawled data and augment the training dataset with synthetic images generated by a latent diffusion model. In particular, we propose a simple yet effective synthetic data augmentation framework that is capable of significantly improving the quality of captions generated by a standard Transformer-based model, leading to competitive results on the COCO dataset.