Nuova ricerca


Ricercatore t.d. art. 24 c. 3 lett. A
Dipartimento di Ingegneria "Enzo Ferrari"

Home | Didattica |


2023 - Enhancing PFI Prediction with GDS-MIL: A Graph-based Dual Stream MIL Approach [Relazione in Atti di Convegno]
Bontempo, Gianpaolo; Bartolini, Nicola; Lovino, Marta; Bolelli, Federico; Anni, Virtanen; Ficarra, Elisa

Whole-Slide Images (WSI) are emerging as a promising resource for studying biological tissues, demonstrating a great potential in aiding cancer diagnosis and improving patient treatment. However, the manual pixel-level annotation of WSIs is extremely time-consuming and practically unfeasible in real-world scenarios. Multi-Instance Learning (MIL) have gained attention as a weakly supervised approach able to address lack of annotation tasks. MIL models aggregate patches (e.g., cropping of a WSI) into bag-level representations (e.g., WSI label), but neglect spatial information of the WSIs, crucial for histological analysis. In the High-Grade Serous Ovarian Cancer (HGSOC) context, spatial information is essential to predict a prognosis indicator (the Platinum-Free Interval, PFI) from WSIs. Such a prediction would bring highly valuable insights both for patient treatment and prognosis of chemotherapy resistance. Indeed, NeoAdjuvant ChemoTherapy (NACT) induces changes in tumor tissue morphology and composition, making the prediction of PFI from WSIs extremely challenging. In this paper, we propose GDS-MIL, a method that integrates a state-of-the-art MIL model with a Graph ATtention layer (GAT in short) to inject a local context into each instance before MIL aggregation. Our approach achieves a significant improvement in accuracy on the ``Ome18'' PFI dataset. In summary, this paper presents a novel solution for enhancing PFI prediction in HGSOC, with the potential of significantly improving treatment decisions and patient outcomes.

2023 - Predicting gene and protein expression levels from DNA and protein sequences with Perceiver [Articolo su rivista]
Stefanini, Matteo; Lovino, Marta; Cucchiara, Rita; Ficarra, Elisa

Background and objective: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. Concerning the prediction of mRNA levels, the available approaches use the sequence upstream and downstream of the Transcription Start Site (TSS) as input to neural networks. The State-of-the-art models (e.g., Xpresso and Basenjii) predict mRNA levels exploiting Convolutional (CNN) or Long Short Term Memory (LSTM) Networks. However, CNN prediction depends on convolutional kernel size, and LSTM suffers from capturing long-range dependencies in the sequence. Concerning the prediction of protein levels, as far as we know, there is no model for predicting protein levels by exploiting the gene or protein sequences. Methods: Here, we exploit a new model type (called Perceiver) for mRNA and protein level prediction, exploiting a Transformer-based architecture with an attention module to attend to long-range interactions in the sequences. In addition, the Perceiver model overcomes the quadratic complexity of the standard Transformer architectures. This work's contributions are 1. DNAPerceiver model to predict mRNA levels from the sequence upstream and downstream of the TSS; 2. ProteinPerceiver model to predict protein levels from the protein sequence; 3. Protein&DNAPerceiver model to predict protein levels from TSS and protein sequences. Results: The models are evaluated on cell lines, mice, glioblastoma, and lung cancer tissues. The results show the effectiveness of the Perceiver-type models in predicting mRNA and protein levels. Conclusions: This paper presents a Perceiver architecture for mRNA and protein level prediction. In the future, inserting regulatory and epigenetic information into the model could improve mRNA and protein level predictions. The source code is freely available at

2022 - A survey on data integration for multi-omics sample clustering [Articolo su rivista]
Lovino, Marta; Randazzo, Vincenzo; Ciravegna, Gabriele; Barbiero, Pietro; Ficarra, Elisa; Cirrincione, Giansalvo

2022 - FusionFlow: An Integrated System Workflow for Gene Fusion Detection in Genomic Samples [Capitolo/Saggio]
Citarrella, F.; Bontempo, G.; Lovino, M.; Ficarra, E.

2022 - FusionFlow: an integrated system workflow for gene fusion detection in genomic samples [Relazione in Atti di Convegno]
Citarrella, Francesca; Bontempo, Gianpaolo; Lovino, Marta; Ficarra, Elisa

2022 - Identifying the oncogenic potential of gene fusions exploiting miRNAs [Articolo su rivista]
Lovino, M.; Montemurro, M.; Barrese, V. S.; Ficarra, E.

It is estimated that oncogenic gene fusions cause about 20% of human cancer morbidity. Identifying potentially oncogenic gene fusions may improve affected patients’ diagnosis and treatment. Previous approaches to this issue included exploiting specific gene-related information, such as gene function and regulation. Here we propose a model that profits from the previous findings and includes the microRNAs in the oncogenic assessment. We present ChimerDriver, a tool to classify gene fusions as oncogenic or not oncogenic. ChimerDriver is based on a specifically designed neural network and trained on genetic and post-transcriptional information to obtain a reliable classification. The designed neural network integrates information related to transcription factors, gene ontologies, microRNAs and other detailed information related to the functions of the genes involved in the fusion and the gene fusion structure. As a result, the performances on the test set reached 0.83 f1-score and 96% recall. The comparison with state-of-the-art tools returned comparable or higher results. Moreover, ChimerDriver performed well in a real-world case where 21 out of 24 validated gene fusion samples were detected by the gene fusion detection tool Starfusion. ChimerDriver integrates transcriptional and post-transcriptional information in an ad-hoc designed neural network to effectively discriminate oncogenic gene fusions from passenger ones. ChimerDriver source code is freely available at

2022 - Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers [Articolo su rivista]
Pipoli, Vittorio; Cappelli, Mattia; Palladini, Alessandro; Peluso, Carlo; Lovino, Marta; Ficarra, Elisa

Background and objectives: In the latest years, the prediction of gene expression levels has been crucial due to its potential applications in the clinics. In this context, Xpresso and others methods based on Convolutional Neural Networks and Transformers were firstly proposed to this aim. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in impressively sparse matrices. In addition, post-transcriptional regulation processes, which are of uttermost importance in the gene expression process, are not considered in the model.Methods: This paper presents Transformer DeepLncLoc, a novel method to predict the abundance of the mRNA (i.e., gene expression levels) by processing gene promoter sequences, managing the problem as a regression task. The model exploits a transformer-based architecture, introducing the DeepLncLoc method to perform the data embedding. Since DeepLncloc is based on word2vec algorithm, it avoids the sparse matrices problem.Results: Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performances compared to the state-of-the-art works. Transformer DeepLncLoc reached 0.76 of R-2 evaluation metric compared to 0.74 of Xpresso.Conclusion: The Multi-Headed Attention mechanisms which characterizes the transformer methodology is suitable for modeling the interactions between DNA's locations, overcoming the recurrent models. Finally, the integration of the transcription factors data in the pipeline leads to impressive gains in predictive power. (C) 2022 Elsevier B.V. All rights reserved.

2022 - SARS-CoV-2 variants classification and characterization [Relazione in Atti di Convegno]
Borgato, S.; Bottino, M.; Lovino, M.; Ficarra, E.

As of late 2019, the SARS-CoV-2 virus has spread globally, giving several variants over time. These variants, unfortunately, differ from the original sequence identified in Wuhan, thus risking compromising the efficacy of the vaccines developed. Some software has been released to recognize currently known and newly spread variants. However, some of these tools are not entirely automatic. Some others, instead, do not return a detailed characterization of all the mutations in the samples. Indeed, such characterization can be helpful for biologists to understand the variability between samples. This paper presents a Machine Learning (ML) approach to identifying existing and new variants completely automatically. In addition, a detailed table showing all the alterations and mutations found in the samples is provided in output to the user. SARS-CoV-2 sequences are obtained from the GISAID database, and a list of features is custom designed (e.g., number of mutations in each gene of the virus) to train the algorithm. The recognition of existing variants is performed through a Random Forest classifier while identifying newly spread variants is accomplished by the DBSCAN algorithm. Both Random Forest and DBSCAN techniques demonstrated high precision on a new variant that arose during the drafting of this paper (used only in the testing phase of the algorithm). Therefore, researchers will significantly benefit from the proposed algorithm and the detailed output with the main alterations of the samples. Data availability: the tool is freely available at

2021 - Circular RNA profiling distinguishes medulloblastoma groups and shows aberrant RMST overexpression in WNT medulloblastoma [Articolo su rivista]
Rickert, Daniel; Bartl, Jasmin; Picard, Daniel; Bernardi, Flavia; Qin, Nan; Lovino, Marta; Puget, Stéphanie; Meyer, Frauke-Dorothee; Mahoungou Koumba, Idriss; Beez, Thomas; Varlet, Pascale; Dufour, Christelle; Fischer, Ute; Borkhardt, Arndt; Reifenberger, Guido; Ayrault, Olivier; Remke, Marc

2020 - DEEPrior: a deep learning tool for the prioritization of gene fusions [Articolo su rivista]
Lovino, Marta; Ciaburri, Maria Serena; Urgese, Gianvito; Di Cataldo, Santa; Ficarra, Elisa

Summary: In the last decade, increasing attention has been paid to the study of gene fusions. However, the problem of determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process, by directly exploiting the amino acid sequence of the fused protein. Retraining mode allows to obtain a custom prediction model including new data provided by the user. Availability and implementation: Both DEEPrior and the protein fusions dataset are freely available from GitHub at ( The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 - Multi-omics Classification on Kidney Samples Exploiting Uncertainty-Aware Models [Relazione in Atti di Convegno]
Lovino, Marta; Bontempo, Gianpaolo; Cirrincione, Giansalvo; Ficarra, Elisa

Due to the huge amount of available omic data, classifying samples according to various omics is a complex process. One of the most common approaches consists of creating a classifier for each omic and subsequently making a consensus among the classifiers that assign to each sample the most voted class among the outputs on the individual omics. However, this approach does not consider the confidence in the prediction ignoring that biological information coming from a certain omic may be more reliable than others. Therefore, it is here proposed a method consisting of a tree-based multi-layer perceptron (MLP), which estimates the class-membership probabilities for classification. In this way, it is not only possible to give relevance to all the omics, but also to label as Unknown those samples for which the classifier is uncertain in its prediction. The method was applied to a dataset composed of 909 kidney cancer samples for which these three omics were available: gene expression (mRNA), microRNA expression (miRNA), and methylation profiles (meth) data. The method is valid also for other tissues and on other omics (e.g. proteomics, copy number alterations data, single nucleotide polymorphism data). The accuracy and weighted average f1-score of the model are both higher than 95%. This tool can therefore be particularly useful in clinical practice, allowing physicians to focus on the most interesting and challenging samples.

2020 - Predicting the oncogenic potential of gene fusions using convolutional neural networks [Relazione in Atti di Convegno]
Lovino, Marta; Gianvito, Urgese; Enrico, Macii; Santa Di Cataldo, ; Ficarra, Elisa

Predicting the oncogenic potential of a gene fusion transcript is an important and challenging task in the study of cancer development. To this date, the available approaches mostly rely on protein domain analysis to provide a probability score explaining the oncogenic potential of a gene fusion. In this paper, a Convolutional Neural Network model is proposed to discriminate gene fusions into oncogenic or non-oncogenic, exploiting only the protein sequence without protein domain information. Our proposed model obtained accuracy value close to 90% on a dataset of fused sequences.

2020 - Unsupervised Multi-Omic Data Fusion: the Neural Graph Learning Network [Relazione in Atti di Convegno]
Barbiero, Pietro; Lovino, Marta; Siviero, Mattia; Ciravegna, Gabriele; Randazzo, Vincenzo; Ficarra, Elisa; Cirrincione, Giansalvo

In recent years, due to the high availability of omic data, data-driven biology has greatly expanded. However, the analysis of different data sources is still an open challenge. A few multi-omics approaches have been proposed in the literature, none of which takes into consideration the intrinsic topology of each omic, though. In this work, an unsupervised learning method based on a deep neural network is proposed. Foreach omic, a separate network is trained, whose outputs are fused into a single graph; at this purpose, an innovative loss function has been designed to better represent the data cluster manifolds. The graph adjacency matrix is exploited to determine similarities among samples. With this approach, omics having a different number of features are merged into a unique representation. Quantitative and qualitative analyses show that the proposed method has comparable results to the state of the art. The method has great intrinsic flexibility as it can be customized according to the complexity of the tasks and it has a lot of room for future improvements compared to more fine-tuned methods, opening the way for future research.

2019 - A Deep Learning Approach to the Screening of Oncogenic Gene Fusions in Humans [Articolo su rivista]
Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa

Gene fusions have a very important role in the study of cancer development. In this regard, predicting the probability of protein fusion transcripts of developing into a cancer is a very challenging and yet not fully explored research problem. To this date, all the available approaches in literature try to explain the oncogenic potential of gene fusions based on protein domain analysis, that is cancer-specific and not easy to adapt to newly developed information. In our work, we choose the raw protein sequences as the input baseline, and propose the use of deep learning, and more specifically Convolutional Neural Networks, to infer the oncogenity probability score of gene fusion transcripts and to group them into a number of categories (e.g., oncogenic/not oncogenic). This is an inherently flexible methodology that, unlike previous approaches, can be re-trained with very less efforts on newly available data (for example, from a different cancer). Based on experimental results on a large dataset of pre-annotated gene fusions, our method is able to predict the oncogenity potential of gene fusion transcripts with accuracy of about 72%, which increases to 86% if we consider the only instances that are classified with a high confidence level.

2019 - Exploiting Gene Expression Profiles for the Automated Prediction of Connectivity between Brain Regions [Articolo su rivista]
Roberti, Ilaria; Lovino, Marta; Di Cataldo, Santa; Ficarra, Elisa; Urgese, Gianvito

The brain comprises a complex system of neurons interconnected by an intricate network of anatomical links. While recent studies demonstrated the correlation between anatomical connectivity patterns and gene expression of neurons, using transcriptomic information to automatically predict such patterns is still an open challenge. In this work, we present a completely data-driven approach relying on machine learning (i.e., neural networks) to learn the anatomical connection directly from a training set of gene expression data. To do so, we combined gene expression and connectivity data from the Allen Mouse Brain Atlas to generate thousands of gene expression profile pairs from different brain regions. To each pair, we assigned a label describing the physical connection between the corresponding brain regions. Then, we exploited these data to train neural networks, designed to predict brain area connectivity. We assessed our solution on two prediction problems (with three and two connectivity class categories) involving cortical and cerebellum regions. As demonstrated by our results, we distinguish between connected and unconnected regions with 85% prediction accuracy and good balance of precision and recall. In our future work we may extend the analysis to more complex brain structures and consider RNA-Seq data as additional input to our model.