ELISA FICARRA - UniMoRe staff

ELISA FICARRA

Full Professor
Department of Engineering "Enzo Ferrari"

Publications

2024 - A Graph-Based Multi-Scale Approach with Knowledge Distillation for WSI Classification [Journal article]
Bontempo, Gianpaolo; Bolelli, Federico; Porrello, Angelo; Calderara, Simone; Ficarra, Elisa
abstract

The use of Multi-Instance Learning (MIL) for classifying Whole Slide Images (WSIs) has recently increased. Due to their gigapixel size, the pixel-level annotation of such data is extremely expensive and time-consuming, and thus practically unfeasible. For this reason, multiple automatic approaches have been proposed in recent years to support clinical practice and diagnosis. Unfortunately, most state-of-the-art proposals apply attention mechanisms without considering the spatial instance correlation and usually work on a single-scale resolution. To leverage the full potential of pyramidal structured WSI, we propose a graph-based multi-scale MIL approach, DAS-MIL. Our model comprises three modules: i) a self-supervised feature extractor; ii) a graph-based architecture that precedes the MIL mechanism and aims at creating a more contextualized representation of the WSI structure by considering the mutual (spatial) instance correlation both inter- and intra-scale; and iii) a (self) distillation loss between resolutions, introduced to compensate for their informative gap and significantly improve the final prediction. The effectiveness of the proposed framework is demonstrated on two well-known datasets, where we outperform SOTA on WSI classification, gaining a +2.7% AUC and +3.7% accuracy on the popular Camelyon16 benchmark.

2024 - ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision [Conference proceedings paper]
Capitani, Giacomo; Bolelli, Federico; Porrello, Angelo; Calderara, Simone; Ficarra, Elisa
abstract

The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be unfeasible to gather enough prior information to characterize the bias, or it could even raise ethical considerations. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster allows for identifying examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.

2024 - Enhancing Patch-Based Learning for the Segmentation of the Mandibular Canal [Journal article]
Lumetti, Luca; Pipoli, Vittorio; Bolelli, Federico; Ficarra, Elisa; Grana, Costantino
abstract

Segmentation of the Inferior Alveolar Canal (IAC) is a critical aspect of dentistry and maxillofacial imaging, garnering considerable attention in recent research endeavors. Deep learning techniques have shown promising results in this domain, yet their efficacy is still significantly hindered by the limited availability of 3D maxillofacial datasets. An inherent challenge is posed by the size of input volumes, which necessitates a patch-based processing approach that compromises the neural network performance due to the absence of global contextual information. This study introduces a novel approach that harnesses the spatial information within the extracted patches and incorporates it into a Transformer architecture, thereby enhancing the segmentation process through the use of prior knowledge about the patch location. Our method improves the Dice score by 4 points with respect to the previous work proposed by Cipriano et al., while also reducing the training steps required by the entire pipeline. By integrating spatial information and leveraging the power of Transformer architectures, this research not only advances the accuracy of IAC segmentation, but also streamlines the training process, offering a promising direction for improving dental and maxillofacial image analysis.
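For readers unfamiliar with the metric quoted above, the Dice score between a predicted and a reference binary mask is 2|A∩B| / (|A| + |B|). The snippet below is a minimal sketch of that formula on flattened masks; the function name `dice_score` and the toy masks are illustrative assumptions, not code from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 reference voxels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_score(pred, target), 3))  # 2*2 / (3+3) -> 0.667
```

A "4 point" improvement in this context would typically mean a gain of 0.04 on this 0-to-1 scale (or 4 on a percentage scale).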

2024 - Integrated microRNA and proteome analysis of cancer datasets with MoPC [Journal article]
Lovino, M.; Ficarra, E.; Martignetti, L.
abstract

MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression by post-transcriptional gene silencing. Their study is crucial in revealing the fundamental processes underlying pathologies and, in particular, cancer. To date, most studies on miRNA regulation consider the effect of specific miRNAs on specific target mRNAs, providing wet-lab validation. However, few tools have been developed to explain the miRNA-mediated regulation at the protein level. This paper presents the MoPC computational tool, which relies on the partial correlation between mRNAs and proteins conditioned on the miRNA expression to predict miRNA-target interactions in multi-omic datasets. MoPC returns the list of significant miRNA-target interactions and plots the significant correlations on a heatmap in which the miRNAs and targets are ordered by chromosomal location. The software was applied to three TCGA/CPTAC datasets (breast, glioblastoma, and lung cancer), returning enriched results in three independent target databases.
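The partial correlation mentioned in this abstract can be illustrated with the textbook first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The following is only a sketch of that general statistic, assuming one mRNA vector, one protein vector and one miRNA vector measured over the same samples; the function names and toy values are hypothetical and do not reproduce MoPC's actual implementation or significance testing.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, conditioned on z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical expression values over five samples (illustrative only).
mrna = [2.0, 3.1, 4.2, 5.0, 6.3]
protein = [1.0, 1.9, 2.2, 3.5, 3.9]
mirna = [5.0, 4.1, 3.7, 2.2, 1.8]

print("plain correlation:  ", pearson(mrna, protein))
print("partial correlation:", partial_correlation(mrna, protein, mirna))
```

Comparing the plain and the partial correlation indicates how much of the mRNA-protein relationship is explained by the conditioning variable.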

2024 - PIK3R1 fusion drives chemoresistance in ovarian cancer by activating ERK1/2 and inducing rod and ring-like structures [Journal article]
Rausio, H.; Cervera, A.; Heuser, V. D.; West, G.; Oikkonen, J.; Pianfetti, E.; Lovino, M.; Ficarra, E.; Taimen, P.; Hynninen, J.; Lehtonen, R.; Hautaniemi, S.; Carpen, O.; Huhtinen, K.
abstract

Gene fusions are common in high-grade serous ovarian cancer (HGSC). Such genetic lesions may promote tumorigenesis, but the pathogenic mechanisms are currently poorly understood. Here, we investigated the role of a PIK3R1-CCDC178 fusion identified from a patient with advanced HGSC. We show that the fusion induces HGSC cell migration by regulating ERK1/2 and increases resistance to platinum treatment. Platinum resistance was associated with rod and ring-like cellular structure formation. These structures contained, in addition to the fusion protein, CIN85, a key regulator of PI3K-AKT-mTOR signaling. Our data suggest that the fusion-driven structure formation induces a previously unrecognized cell survival and resistance mechanism, which depends on ERK1/2-activation.

2023 - BERT Classifies SARS-CoV-2 Variants [Book chapter/Essay]
Ghione, G.; Lovino, M.; Ficarra, E.; Cirrincione, G.
abstract

Medical diagnostics faced numerous difficulties during the COVID-19 pandemic. One of these has been the need for ongoing monitoring of SARS-CoV-2 mutations. Genomics is the technique most frequently used for precisely identifying variants. The ongoing global gathering of RNA samples of the virus has made such an approach possible. Nevertheless, variant identification techniques are frequently resource-intensive. As a result, the diagnostic capability of small medical laboratories might not be sufficient. In this work, an effective deep learning strategy for identifying SARS-CoV-2 variants is presented. This work makes two contributions: (1) a fine-tuning architecture of Bidirectional Encoder Representations from Transformers (BERT) to identify SARS-CoV-2 variants; (2) providing biological insights by exploiting BERT self-attention. Such an approach enables the analysis of the S gene of the virus to quickly recognize its variant. The selected model, BERT, is a transformer-based neural network first developed for natural language processing. Nonetheless, it has been effectively used in numerous applications, such as genomic sequence analysis. Thus, BERT was fine-tuned to adapt it to the RNA sequence domain, achieving a 98.59% F1-score on test data: it was successful in identifying variants circulating to date. The interpretability of the model was examined, since BERT utilizes the self-attention mechanism. In fact, it was discovered that by attending to particular areas of the S gene, BERT extracts pertinent biological information on variants. Thus, the presented approach allows obtaining insights into the particular characteristics of SARS-CoV-2 RNA samples.

2023 - Buffer-MIL: Robust Multi-instance Learning with a Buffer-Based Approach [Conference proceedings paper]
Bontempo, G.; Lumetti, L.; Porrello, A.; Bolelli, F.; Calderara, S.; Ficarra, E.
abstract

Histopathological image analysis is a critical area of research with the potential to aid pathologists in faster and more accurate diagnoses. However, Whole-Slide Images (WSIs) present challenges for deep learning frameworks due to their large size and lack of pixel-level annotations. Multi-Instance Learning (MIL) is a popular approach that can be employed for handling WSIs, treating each slide as a bag composed of multiple patches or instances. In this work we propose Buffer-MIL, which aims at tackling the covariate shift and class imbalance characterizing most of the existing histopathological datasets. With this goal, a buffer containing the most representative instances of each disease-positive slide of the training set is incorporated into our model. An attention mechanism is then used to compare all the instances against the buffer, to find the most critical ones in a given slide. We evaluate Buffer-MIL on two publicly available WSI datasets, Camelyon16 and TCGA lung cancer, outperforming current state-of-the-art models by 2.2% in accuracy on Camelyon16.
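As background for the MIL setting described here, the snippet below sketches generic attention pooling over a bag of instance embeddings: each patch receives a softmax weight from a scaled dot product with a query vector, and the slide-level representation is the weighted average. This is not the Buffer-MIL model; the fixed `query` vector, the function names and the toy embeddings are assumptions for illustration (in practice the scoring function is learned).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, query):
    """Aggregate a bag of instance embeddings into one bag-level embedding.

    instances: list of equally sized embedding vectors (one per patch).
    query:     vector of the same size used to score each instance
               (hypothetical stand-in for a learned attention module).
    """
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(inst, query)) / scale for inst in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag of three 2-dimensional patch embeddings.
patches = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
bag_embedding, attention = attention_pool(patches, query=[1.0, 0.0])
print(bag_embedding, attention)
```

The attention weights also give a rough indication of which patches contributed most to the slide-level prediction.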

2023 - DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs [Conference proceedings paper]
Bontempo, Gianpaolo; Porrello, Angelo; Bolelli, Federico; Calderara, Simone; Ficarra, Elisa
abstract

The adoption of Multi-Instance Learning (MIL) for classifying Whole-Slide Images (WSIs) has increased in recent years. Indeed, pixel-level annotation of gigapixel WSI is mostly unfeasible and time-consuming in practice. For this reason, MIL approaches have been profitably integrated with the most recent deep-learning solutions for WSI classification to support clinical practice and diagnosis. Nevertheless, the majority of such approaches overlook the multi-scale nature of the WSIs; the few existing hierarchical MIL proposals simply flatten the multi-scale representations by concatenation or summation of feature vectors, neglecting the spatial structure of the WSI. Our work aims to unleash the full potential of pyramidal structured WSI; to do so, we propose a graph-based multi-scale MIL approach, termed DAS-MIL, that exploits message passing to let information flow across multiple scales. By means of a knowledge distillation schema, the alignment between the latent space representation at different resolutions is encouraged while preserving the diversity in the informative content. The effectiveness of the proposed framework is demonstrated on two well-known datasets, where we outperform SOTA on WSI classification, gaining a +1.9% AUC and +3.3% accuracy on the popular Camelyon16 benchmark.

2023 - Enhancing PFI Prediction with GDS-MIL: A Graph-based Dual Stream MIL Approach [Conference proceedings paper]
Bontempo, Gianpaolo; Bartolini, Nicola; Lovino, Marta; Bolelli, Federico; Virtanen, Anni; Ficarra, Elisa
abstract

Whole-Slide Images (WSIs) are emerging as a promising resource for studying biological tissues, demonstrating great potential in aiding cancer diagnosis and improving patient treatment. However, the manual pixel-level annotation of WSIs is extremely time-consuming and practically unfeasible in real-world scenarios. Multi-Instance Learning (MIL) has gained attention as a weakly supervised approach able to address tasks that lack annotations. MIL models aggregate patches (e.g., crops of a WSI) into bag-level representations (e.g., WSI label), but neglect spatial information of the WSIs, crucial for histological analysis. In the High-Grade Serous Ovarian Cancer (HGSOC) context, spatial information is essential to predict a prognosis indicator (the Platinum-Free Interval, PFI) from WSIs. Such a prediction would bring highly valuable insights both for patient treatment and prognosis of chemotherapy resistance. Indeed, NeoAdjuvant ChemoTherapy (NACT) induces changes in tumor tissue morphology and composition, making the prediction of PFI from WSIs extremely challenging. In this paper, we propose GDS-MIL, a method that integrates a state-of-the-art MIL model with a Graph ATtention layer (GAT in short) to inject a local context into each instance before MIL aggregation. Our approach achieves a significant improvement in accuracy on the "Ome18" PFI dataset. In summary, this paper presents a novel solution for enhancing PFI prediction in HGSOC, with the potential of significantly improving treatment decisions and patient outcomes.

2023 - MiREx: mRNA levels prediction from gene sequence and miRNA target knowledge [Journal article]
Pianfetti, E.; Lovino, M.; Ficarra, E.; Martignetti, L.
abstract

Messenger RNA (mRNA) has an essential role in the protein production process. Predicting mRNA expression levels accurately is crucial for understanding gene regulation, and various models (statistical and neural network-based) have been developed for this purpose. A few models predict mRNA expression levels from the DNA sequence, exploiting the DNA sequence and gene features (e.g., number of exons/introns, gene length). Other models include information about long-range interaction molecules (i.e., enhancers/silencers) and transcriptional regulators as predictive features, such as transcription factors (TFs) and small RNAs (e.g., microRNAs - miRNAs). Recently, a convolutional neural network (CNN) model, called Xpresso, has been proposed for mRNA expression level prediction leveraging the promoter sequence and mRNAs’ half-life features (gene features). To push forward the mRNA level prediction, we present miREx, a CNN-based tool that includes information about miRNA targets and expression levels in the model. Indeed, each miRNA can target specific genes, and the model exploits this information to guide the learning process. In detail, not all miRNAs are included, only a selected subset with the highest impact on the model. MiREx has been evaluated on four cancer primary sites from the genomics data commons (GDC) database: lung, kidney, breast, and corpus uteri. Results show that mRNA level prediction benefits from selected miRNA targets and expression information. Future model developments could include other transcriptional regulators or be trained with proteomics data to infer protein levels.

2023 - Neuro Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal [Working paper]
Marconato, Emanuele; Bontempo, Gianpaolo; Ficarra, Elisa; Calderara, Simone; Passerini, Andrea; Teso, Stefano

2023 - Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal [Conference proceedings paper]
Marconato, E.; Bontempo, G.; Ficarra, E.; Calderara, S.; Passerini, A.; Teso, S.
abstract

We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. Our key observation is that neuro-symbolic tasks, although different, often share concepts whose semantics remains stable over time. Traditional approaches fall short: existing continual strategies ignore knowledge altogether, while stock neuro-symbolic architectures suffer from catastrophic forgetting. We show that leveraging prior knowledge by combining neuro-symbolic architectures with continual strategies does help avoid catastrophic forgetting, but also that doing so can yield models affected by reasoning shortcuts. These undermine the semantics of the acquired concepts, even when detailed prior knowledge is provided upfront and inference is exact, and in turn continual performance. To overcome these issues, we introduce COOL, a COncept-level cOntinual Learning strategy tailored for neuro-symbolic continual problems that acquires high-quality concepts and remembers them over time. Our experiments on three novel benchmarks highlight how COOL attains sustained high performance on neuro-symbolic continual learning tasks in which other strategies fail.

2023 - Predicting gene and protein expression levels from DNA and protein sequences with Perceiver [Journal article]
Stefanini, Matteo; Lovino, Marta; Cucchiara, Rita; Ficarra, Elisa
abstract

Background and objective: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore, quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. Concerning the prediction of mRNA levels, the available approaches use the sequence upstream and downstream of the Transcription Start Site (TSS) as input to neural networks. State-of-the-art models (e.g., Xpresso and Basenji) predict mRNA levels exploiting Convolutional (CNN) or Long Short Term Memory (LSTM) Networks. However, CNN prediction depends on convolutional kernel size, and LSTM struggles to capture long-range dependencies in the sequence. Concerning the prediction of protein levels, as far as we know, there is no model for predicting protein levels by exploiting the gene or protein sequences. Methods: Here, we exploit a new model type (called Perceiver) for mRNA and protein level prediction, exploiting a Transformer-based architecture with an attention module to attend to long-range interactions in the sequences. In addition, the Perceiver model overcomes the quadratic complexity of the standard Transformer architectures. This work's contributions are: 1. DNAPerceiver model to predict mRNA levels from the sequence upstream and downstream of the TSS; 2. ProteinPerceiver model to predict protein levels from the protein sequence; 3. Protein&DNAPerceiver model to predict protein levels from TSS and protein sequences. Results: The models are evaluated on cell lines, mice, glioblastoma, and lung cancer tissues. The results show the effectiveness of the Perceiver-type models in predicting mRNA and protein levels. Conclusions: This paper presents a Perceiver architecture for mRNA and protein level prediction. In the future, inserting regulatory and epigenetic information into the model could improve mRNA and protein level predictions. The source code is freely available at https://github.com/MatteoStefanini/DNAPerceiver.

2023 - W2WNet: A two-module probabilistic Convolutional Neural Network with embedded data cleansing functionality [Journal article]
Ponzio, F.; Macii, E.; Ficarra, E.; Di Cataldo, S.
abstract

Ideally, Convolutional Neural Networks (CNNs) should be trained with high quality images with minimum noise and correct ground truth labels. Nonetheless, in many real-world scenarios, such high quality is very hard to obtain, and datasets may be affected by any sort of image degradation and mislabelling issues. This negatively impacts the performance of standard CNNs, both during the training and the inference phase. To address this issue we propose Wise2WipedNet (W2WNet), a new two-module Convolutional Neural Network, where a Wise module exploits Bayesian inference to identify and discard spurious images during the training and a Wiped module takes care of the final classification, while broadcasting information on the prediction confidence at inference time. The goodness of our solution is demonstrated on a number of public benchmarks addressing different image classification tasks, as well as on a real-world case study on histological image analysis. Overall, our experiments demonstrate that W2WNet is able to identify image degradation and mislabelling issues both at training and at inference time, with positive impact on the final classification accuracy.

2022 - A survey on data integration for multi-omics sample clustering [Journal article]
Lovino, Marta; Randazzo, Vincenzo; Ciravegna, Gabriele; Barbiero, Pietro; Ficarra, Elisa; Cirrincione, Giansalvo

2022 - Catastrophic Forgetting in Continual Concept Bottleneck Models [Conference proceedings paper]
Marconato, E.; Bontempo, G.; Teso, S.; Ficarra, E.; Calderara, S.; Passerini, A.

2022 - Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations [Journal article]
Mascolini, Alessio; Cardamone, Dario; Ponzio, Francesco; Di Cataldo, Santa; Ficarra, Elisa
abstract

Computer-aided analysis of biological images typically requires extensive training on large-scale annotated datasets, which is not viable in many situations. In this paper, we present Generative Adversarial Network Discriminator Learner (GAN-DL), a novel self-supervised learning paradigm based on the StyleGAN2 architecture, which we employ for self-supervised image representation learning in the case of fluorescent biological images.

2022 - FusionFlow: an integrated system workflow for gene fusion detection in genomic samples [Conference proceedings paper]
Citarrella, Francesca; Bontempo, Gianpaolo; Lovino, Marta; Ficarra, Elisa

2022 - High Resolution Explanation Maps for CNNs using Segmentation Networks [Conference proceedings paper]
Mascolini, A.; Ponzio, F.; Macii, E.; Ficarra, E.; Di Cataldo, S.
abstract

Recent developments have resulted in multiple techniques trying to explain how deep neural networks achieve their predictions. The explainability maps provided by such techniques are useful to understand what the network has learned and increase user confidence in critical applications such as the medical field or autonomous driving. Nonetheless, they typically have very low resolutions, severely limiting their capability of identifying finer details or multiple subjects. In this paper we employ an encoder-decoder architecture with skip connections known as U-Net, originally developed for segmenting medical images, as an image classifier, and we show that state-of-the-art explainability techniques applied to U-Net can generate pixel-level explanation maps for images of any resolution.

2022 - Identifying the oncogenic potential of gene fusions exploiting miRNAs [Journal article]
Lovino, M.; Montemurro, M.; Barrese, V. S.; Ficarra, E.
abstract

It is estimated that oncogenic gene fusions cause about 20% of human cancer morbidity. Identifying potentially oncogenic gene fusions may improve affected patients’ diagnosis and treatment. Previous approaches to this issue included exploiting specific gene-related information, such as gene function and regulation. Here we propose a model that profits from the previous findings and includes the microRNAs in the oncogenic assessment. We present ChimerDriver, a tool to classify gene fusions as oncogenic or not oncogenic. ChimerDriver is based on a specifically designed neural network and trained on genetic and post-transcriptional information to obtain a reliable classification. The designed neural network integrates information related to transcription factors, gene ontologies, microRNAs and other detailed information related to the functions of the genes involved in the fusion and the gene fusion structure. As a result, the performances on the test set reached 0.83 f1-score and 96% recall. The comparison with state-of-the-art tools returned comparable or higher results. Moreover, ChimerDriver performed well in a real-world case where 21 out of 24 validated gene fusion samples were detected by the gene fusion detection tool Starfusion. ChimerDriver integrates transcriptional and post-transcriptional information in an ad-hoc designed neural network to effectively discriminate oncogenic gene fusions from passenger ones. ChimerDriver source code is freely available at https://github.com/martalovino/ChimerDriver.

2022 - LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences [Journal article]
Winkler, J.; Urgese, G.; Ficarra, E.; Reinert, K.
abstract

Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add additional complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2 that is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.

2022 - Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers [Journal article]
Pipoli, Vittorio; Cappelli, Mattia; Palladini, Alessandro; Peluso, Carlo; Lovino, Marta; Ficarra, Elisa
abstract

Background and objectives: In recent years, the prediction of gene expression levels has become crucial due to its potential applications in the clinic. In this context, Xpresso and other methods based on Convolutional Neural Networks and Transformers were first proposed for this aim. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in highly sparse matrices. In addition, post-transcriptional regulation processes, which are of utmost importance in the gene expression process, are not considered in the model. Methods: This paper presents Transformer DeepLncLoc, a novel method to predict the abundance of the mRNA (i.e., gene expression levels) by processing gene promoter sequences, managing the problem as a regression task. The model exploits a transformer-based architecture, introducing the DeepLncLoc method to perform the data embedding. Since DeepLncLoc is based on the word2vec algorithm, it avoids the sparse matrices problem. Results: Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performance compared to state-of-the-art works. Transformer DeepLncLoc reached an R² of 0.76, compared to 0.74 for Xpresso. Conclusion: The Multi-Headed Attention mechanism, which characterizes the transformer methodology, is suitable for modeling the interactions between DNA locations, overcoming the recurrent models. Finally, the integration of the transcription factors data in the pipeline leads to impressive gains in predictive power.
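The sparsity issue raised in this abstract can be seen in a small sketch: one-hot encoding a DNA sequence produces a matrix in which three of every four entries are zero, whereas a word2vec-style pipeline first splits the sequence into tokens that are later mapped to dense vectors. The overlapping k-mer tokenization below is an assumed, simplified stand-in for that first step; the function names are hypothetical and this is not the DeepLncLoc code.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """One row per nucleotide, one column per letter of ALPHABET."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def sparsity(matrix):
    """Fraction of zero entries in a list-of-lists matrix."""
    cells = [v for row in matrix for v in row]
    return cells.count(0) / len(cells)


def kmer_tokens(seq, k=3):
    """Overlapping k-mers, usable as 'words' for a word2vec-style embedding."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


promoter = "ACGTTGCA"  # toy sequence, not real promoter data
print(sparsity(one_hot(promoter)))  # 0.75 for any sequence over A, C, G, T
print(kmer_tokens(promoter))
```

Each k-mer token would then be looked up in a learned embedding table, giving a dense input representation instead of a sparse one.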

2022 - SARS-CoV-2 variants classification and characterization [Conference proceedings paper]
Borgato, S.; Bottino, M.; Lovino, M.; Ficarra, E.
abstract

Since late 2019, the SARS-CoV-2 virus has spread globally, giving rise to several variants over time. These variants, unfortunately, differ from the original sequence identified in Wuhan, thus risking compromising the efficacy of the vaccines developed. Some software has been released to recognize currently known and newly spread variants. However, some of these tools are not entirely automatic. Others, instead, do not return a detailed characterization of all the mutations in the samples. Indeed, such characterization can be helpful for biologists to understand the variability between samples. This paper presents a Machine Learning (ML) approach to identifying existing and new variants completely automatically. In addition, a detailed table showing all the alterations and mutations found in the samples is provided as output to the user. SARS-CoV-2 sequences are obtained from the GISAID database, and a list of features is custom designed (e.g., number of mutations in each gene of the virus) to train the algorithm. The recognition of existing variants is performed through a Random Forest classifier, while the identification of newly spread variants is accomplished by the DBSCAN algorithm. Both Random Forest and DBSCAN techniques demonstrated high precision on a new variant that arose during the drafting of this paper (used only in the testing phase of the algorithm). Therefore, researchers will significantly benefit from the proposed algorithm and the detailed output with the main alterations of the samples. Data availability: the tool is freely available at https://github.com/sofiaborgato/-SARS-CoV-2-variants-classification-and-characterization.

2021 - A Bayesian approach to Expert Gate Incremental Learning [Conference proceedings paper]
Mieuli, V.; Ponzio, F.; Mascolini, A.; Macii, E.; Ficarra, E.; Di Cataldo, S.
abstract

Incremental learning involves Machine Learning paradigms that dynamically adjust their previous knowledge whenever new training samples emerge. To address the problem of multi-task incremental learning without storing any samples of the previous tasks, the so-called Expert Gate paradigm was proposed, which consists of a Gate and a downstream network of task-specific CNNs, a.k.a. the Experts. The gate forwards the input to a certain expert, based on the decision made by a set of autoencoders. Unfortunately, as a CNN is intrinsically incapable of dealing with inputs of a class it was not specifically trained on, the activation of the wrong expert will invariably result in a classification error. To address this issue, we propose a probabilistic extension of the classic Expert Gate paradigm. Exploiting the prediction uncertainty estimations provided by Bayesian Convolutional Neural Networks (B-CNNs), the proposed paradigm is able to either reduce, or correct at a later stage, wrong decisions of the gate. The goodness of our approach is shown by experimental comparisons with state-of-the-art incremental learning methods.

2021 - A Novel Proof-of-concept Framework for the Exploitation of ConvNets on Whole Slide Images [Book chapter/Essay]
Mascolini, Alessio; Puzzo, S.; Incatasciato, G.; Ponzio, F.; Ficarra, E.; Di Cataldo, S.
abstract

Traditionally, the analysis of histological samples is visually performed by a pathologist, who inspects the tissue samples under the microscope, looking for malignancies and anomalies. This visual assessment is both time-consuming and highly unreliable due to the subjectivity of the evaluation. Hence, there are growing efforts towards the automatisation of such analysis, oriented to the development of computer-aided diagnostic tools, with an ever-growing role of techniques based on deep learning. In this work, we analyze some of the issues commonly associated with providing deep learning based techniques to medical professionals. We thus introduce a tool, aimed at both researchers and medical professionals, which simplifies and accelerates the training and exploitation of such models. The outcome of the tool is an attention map representing cancer probability distribution on top of the Whole Slide Image, driving the pathologist through a faster and more accurate diagnostic procedure.

2021 - Exploration of Convolutional Neural Network models for source code classification [Articolo su rivista]
Barchi, F.; Parisi, E.; Urgese, G.; Ficarra, E.; Acquaviva, A.
abstract

The application of Artificial Intelligence is becoming common in many engineering fields. Among them, one of the newest and most rapidly evolving is software generation, where AI can be used to automatically optimise the implementation of an algorithm for a given computing platform. In particular, Deep Learning technologies can be used to decide how to allocate pieces of code to hardware platforms with multiple cores and accelerators, which are common in high-performance and edge computing applications. In this work, we explore the use of Convolutional Neural Networks (CNNs) to analyse the application source code and decide the best compute unit to minimise the execution time. We demonstrate that CNN models can be successfully applied to source code classification, providing higher accuracy with consistently reduced learning time with respect to state-of-the-art methods. Moreover, we show the robustness of the method with respect to source code pre-processing, compiler options and hyper-parameter selection.

2021 - FUNGI: FUsioN Gene Integration toolset [Articolo su rivista]
Cervera, Alejandra; Rausio, Heidi; Kähkönen, Tiia; Andersson, Noora; Partel, Gabriele; Rantanen, Ville; Paciello, Giulia; Ficarra, Elisa; Hynninen, Johanna; Hietanen, Sakari; Carpén, Olli; Lehtonen, Rainer; Hautaniemi, Sampsa; Huhtinen, Kaisa
abstract

2021 - Optimizing Quality Inspection and Control in Powder Bed Metal Additive Manufacturing: Challenges and Research Directions [Articolo su rivista]
Di Cataldo, Santa; Vinco, Sara; Urgese, Gianvito; Calignano, Flaviana; Ficarra, Elisa; Macii, Alberto; Macii, Enrico
abstract

2021 - PhyliCS: a Python library to explore scCNA data and quantify spatial tumor heterogeneity [Articolo su rivista]
Montemurro, M.; Grassi, E.; Pizzino, C. G.; Bertotti, A.; Ficarra, E.; Urgese, G.
abstract

Background: Tumors are composed of a number of cancer cell subpopulations (subclones), each characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). Nowadays ITH can be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking. Results: We present PhyliCS, the first tool that exploits scCNA data from multiple samples of the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third-party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools and a flexible clustering module. Conclusions: PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumor sampling, exploiting the potential of scCNA data.

2020 - BioSeqZip: a collapser of NGS redundant reads for the optimisation of sequence analysis [Articolo su rivista]
Urgese, Gianvito; Parisi, Emanuele; Scicolone, Orazio; Di Cataldo, Santa; Ficarra, Elisa
abstract

Motivation: High-Throughput Next-Generation-Sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the two-fold advantage of reducing file size and speeding up the alignment, since the same sequence is not mapped multiple times. Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single or multi-sample datasets even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state. Results: Our extensive experiments on RNA-seq data show that BioSeqZip considerably brings down the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures that typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%. Availability: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip. Supplementary information: Supplementary data are available at Bioinformatics online.
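The collapsing idea described in this abstract can be illustrated with a minimal Python sketch. It is not BioSeqZip's implementation: the tool relies on a memory-constrained external sort and also tracks quality scores, whereas this example assumes the reads fit in memory and ignores qualities. The function names `collapse_reads` and `expand_reads` are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse redundant reads into a sorted list of (sequence, count) pairs."""
    counts = Counter(reads)
    # Sorting by sequence makes the output deterministic.
    return sorted(counts.items())


def expand_reads(collapsed):
    """Re-expand (sequence, count) pairs into individual reads.

    The original read order is not preserved in this sketch.
    """
    return [seq for seq, n in collapsed for _ in range(n)]


# Invented toy input: six reads, three distinct sequences.
reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
collapsed = collapse_reads(reads)
```

An aligner then needs to map only the three unique sequences instead of six reads, and the stored counts allow per-read statistics to be recovered afterwards.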

2020 - Cytoarchitectural analysis of the neuron-to-glia association in the dorsal root ganglia of normal and diabetic mice [Articolo su rivista]
Ciglieri, Elisa; Vacca, Maurizia; Ferrini, Francesco; Atteya, Mona A; Aimar, Patrizia; Ficarra, Elisa; Di Cataldo, Santa; Merighi, Adalberto; Salio, Chiara
abstract

Dorsal root ganglia (DRGs) host the somata of sensory neurons which convey information from the periphery to the central nervous system. These neurons have heterogeneous size and neurochemistry, and those of small-to-medium size, which play an important role in nociception, form two distinct subpopulations based on the presence (peptidergic) or absence (non-peptidergic) of transmitter neuropeptides. Few investigations have so far addressed the spatial relationship between neurochemically different subpopulations of DRG neurons and glia. We used a whole-mount mouse lumbar DRG preparation, confocal microscopy and computer-aided 3D analysis to reveal that IB4+ non-peptidergic neurons form small clusters of 4.7 ± 0.26 cells, unlike CGRP+ peptidergic neurons, which are, for the most part, isolated (1.89 ± 0.11 cells). Both subpopulations of neurons are ensheathed by a thin layer of satellite glial cells (SGCs) that can be observed after immunolabeling with the specific marker glutamine synthetase (GS). Notably, at the ultrastructural level we observed that this glial layer was discontinuous, as there were patches of direct contact between the membranes of two adjacent IB4+ neurons. To test whether this cytoarchitectonic organization was modified in diabetic neuropathy, one of the most devastating sensory pathologies, mice were made diabetic by streptozotocin (STZ). In diabetic animals, cluster organization of the IB4+ non-peptidergic neurons was maintained, but the neuro-glial relationship was altered, as STZ treatment caused a statistically significant increase of GS staining around CGRP+ neurons but a reduction around IB4+ neurons. Ultrastructural analysis revealed that SGC coverage was increased at the interface between IB4+ cluster-forming neurons in diabetic mice, with a 50% reduction in the points of direct contact between cells. These observations demonstrate the existence of a structural plasticity of the DRG cytoarchitecture in response to STZ.

2020 - DEEPrior: a deep learning tool for the prioritization of gene fusions [Articolo su rivista]
Lovino, Marta; Ciaburri, Maria Serena; Urgese, Gianvito; Di Cataldo, Santa; Ficarra, Elisa
abstract

Summary: In the last decade, increasing attention has been paid to the study of gene fusions. However, the problem of determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process, by directly exploiting the amino acid sequence of the fused protein. Retraining mode allows users to obtain a custom prediction model that includes new data they provide. Availability and implementation: Both DEEPrior and the protein fusions dataset are freely available from GitHub at https://github.com/bioinformatics-polito/DEEPrior. The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 - Effective evaluation of clustering algorithms on single-cell CNA data [Relazione in Atti di Convegno]
Montemurro, Marilisa; Urgese, Gianvito; Grassi, Elena; Pizzino, Carmelo Gabriele; Bertotti, Andrea; Ficarra, Elisa
abstract

Clustering methods are increasingly applied to single-cell DNA sequencing (scDNAseq) data to infer the subclonal structure of cancer. However, the complexity of these data exacerbates some data-science issues and affects clustering results. Additionally, determining whether such inferences are accurate and clusters recapitulate the real cell phylogeny is not trivial, mainly because ground truth information is not available for most experimental settings. Here, by exploiting simulated sequencing data representing known phylogenies of cancer cells, we propose a formal and systematic assessment of well-known clustering methods to study their performance and identify the approach providing the most accurate reconstruction of phylogenetic relationships.

2020 - Exploiting "uncertain" deep networks for data cleaning in digital pathology [Relazione in Atti di Convegno]
Ponzio, Francesco; Deodato, Giacomo; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa
abstract

2020 - Multi-omics Classification on Kidney Samples Exploiting Uncertainty-Aware Models [Relazione in Atti di Convegno]
Lovino, Marta; Bontempo, Gianpaolo; Cirrincione, Giansalvo; Ficarra, Elisa
abstract

Due to the huge amount of available omic data, classifying samples according to various omics is a complex process. One of the most common approaches consists of creating a classifier for each omic and subsequently building a consensus among the classifiers, assigning to each sample the most voted class among the outputs on the individual omics. However, this approach does not consider the confidence in the prediction, ignoring that biological information coming from a certain omic may be more reliable than others. Therefore, we propose a method consisting of a tree-based multi-layer perceptron (MLP), which estimates the class-membership probabilities for classification. In this way, it is not only possible to give relevance to all the omics, but also to label as Unknown those samples for which the classifier is uncertain in its prediction. The method was applied to a dataset composed of 909 kidney cancer samples for which these three omics were available: gene expression (mRNA), microRNA expression (miRNA), and methylation profiles (meth) data. The method is also valid for other tissues and other omics (e.g., proteomics, copy number alterations data, single nucleotide polymorphism data). The accuracy and weighted average F1-score of the model are both higher than 95%. This tool can therefore be particularly useful in clinical practice, allowing physicians to focus on the most interesting and challenging samples.
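The contrast between plain majority voting and a confidence-aware consensus with an Unknown label can be sketched in a few lines of Python. This is only an illustration of the general idea, not the tree-based MLP proposed in the paper: averaging the per-omic probabilities, the `threshold` value, the function names and the class labels are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(per_omic_labels):
    """Baseline consensus: the most voted class across the omics."""
    return Counter(per_omic_labels).most_common(1)[0][0]


def confident_consensus(per_omic_probs, threshold=0.7):
    """Average class-membership probabilities across omics.

    Returns the top class, or "Unknown" when its averaged probability
    is below `threshold` (a hypothetical cut-off).
    """
    classes = per_omic_probs[0].keys()
    n = len(per_omic_probs)
    mean = {c: sum(p[c] for p in per_omic_probs) / n for c in classes}
    best = max(mean, key=mean.get)
    return best if mean[best] >= threshold else "Unknown"


# Invented probabilities from three per-omic classifiers (mRNA, miRNA, meth).
clear_sample = [{"A": 0.9, "B": 0.1}, {"A": 0.9, "B": 0.1}, {"A": 0.6, "B": 0.4}]
ambiguous_sample = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}, {"A": 0.5, "B": 0.5}]

clear_label = confident_consensus(clear_sample)
ambiguous_label = confident_consensus(ambiguous_sample)
```

Majority voting would force a class on the ambiguous sample, whereas the confidence-aware rule flags it for closer inspection.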

2020 - Predicting the oncogenic potential of gene fusions using convolutional neural networks [Relazione in Atti di Convegno]
Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa
abstract

Predicting the oncogenic potential of a gene fusion transcript is an important and challenging task in the study of cancer development. To date, the available approaches mostly rely on protein domain analysis to provide a probability score explaining the oncogenic potential of a gene fusion. In this paper, a Convolutional Neural Network model is proposed to classify gene fusions as oncogenic or non-oncogenic, exploiting only the protein sequence without protein domain information. Our proposed model obtained an accuracy close to 90% on a dataset of fused sequences.

2020 - Unification of miRNA and isomiR research: the mirGFF3 format and the mirtop API [Articolo su rivista]
Desvignes, Thomas; Loher, Phillipe; Eilbeck, Karen; Ma, Jeffery; Urgese, Gianvito; Fromm, Bastian; Sydes, Jason; Aparicio-Puerta, Ernesto; Barrera, Victor; Espin, Roderic; Londin, Eric; Telonis, Aristeidis G; Ficarra, Elisa; Friedlander, Marc R; Postlethwait, John H; Rigoutsos, Isidore; Hackenberg, Michael; Vlachos, Ioannis S; Halushka, Marc K.; Pantano, Lorena
abstract

Motivation: MicroRNAs (miRNAs) are small RNA molecules (∼22 nucleotides long) involved in post-transcriptional gene regulation. Advances in high-throughput sequencing technologies led to the discovery of isomiRs, which are miRNA sequence variants. While many miRNA-seq analysis tools exist, the diversity of output formats hinders accurate comparisons between tools and precludes data sharing and the development of common downstream analysis methods. Results: To overcome this situation, we present here a community-based project, miRNA Transcriptomic Open Project (miRTOP), working towards the optimization of miRNA analyses. The aim of miRTOP is to promote the development of downstream isomiR analysis tools that are compatible with existing detection and quantification tools. Based on the existing GFF3 format, we first created a new standard format, mirGFF3, for the output of miRNA/isomiR detection and quantification results from small RNA-seq data. Additionally, we developed a command line Python tool, mirtop, to create and manage the mirGFF3 format. Currently, mirtop can convert into mirGFF3 the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, Prost! as well as BAM files. Some tools have also incorporated the mirGFF3 format directly into their code, such as miRge2.0, IsoMIRmap and OptimiR. Its open architecture enables any tool or pipeline to output or convert results into mirGFF3. Collectively, this isomiR categorization system, along with the accompanying mirGFF3 and mirtop API, provides a comprehensive solution for the standardization of miRNA and isomiR annotation, enabling data sharing, reporting, comparative analyses and benchmarking, while promoting the development of common miRNA methods focusing on downstream steps of miRNA detection, annotation and quantification. Availability and implementation: https://github.com/miRTop/mirGFF3/ and https://github.com/miRTop/mirtop.

2020 - Unsupervised Multi-Omic Data Fusion: the Neural Graph Learning Network [Relazione in Atti di Convegno]
Barbiero, Pietro; Lovino, Marta; Siviero, Mattia; Ciravegna, Gabriele; Randazzo, Vincenzo; Ficarra, Elisa; Cirrincione, Giansalvo
abstract

In recent years, due to the high availability of omic data, data-driven biology has greatly expanded. However, the analysis of different data sources is still an open challenge. A few multi-omics approaches have been proposed in the literature, but none of them takes into consideration the intrinsic topology of each omic. In this work, an unsupervised learning method based on a deep neural network is proposed. For each omic, a separate network is trained, whose outputs are fused into a single graph; to this end, an innovative loss function has been designed to better represent the data cluster manifolds. The graph adjacency matrix is exploited to determine similarities among samples. With this approach, omics having a different number of features are merged into a unique representation. Quantitative and qualitative analyses show that the proposed method has comparable results to the state of the art. The method has great intrinsic flexibility as it can be customized according to the complexity of the tasks, and it has a lot of room for future improvements compared to more fine-tuned methods, opening the way for future research.

2019 - A Deep Learning Approach to the Screening of Oncogenic Gene Fusions in Humans [Articolo su rivista]
Lovino, Marta; Urgese, Gianvito; Macii, Enrico; Di Cataldo, Santa; Ficarra, Elisa
abstract

Gene fusions have a very important role in the study of cancer development. In this regard, predicting the probability that protein fusion transcripts develop into a cancer is a very challenging and not yet fully explored research problem. To date, all the available approaches in the literature try to explain the oncogenic potential of gene fusions based on protein domain analysis, which is cancer-specific and not easy to adapt to newly developed information. In our work, we choose the raw protein sequences as the input baseline, and propose the use of deep learning, and more specifically Convolutional Neural Networks, to infer the oncogenicity probability score of gene fusion transcripts and to group them into a number of categories (e.g., oncogenic/not oncogenic). This is an inherently flexible methodology that, unlike previous approaches, can be re-trained with very little effort on newly available data (for example, from a different cancer). Based on experimental results on a large dataset of pre-annotated gene fusions, our method is able to predict the oncogenic potential of gene fusion transcripts with an accuracy of about 72%, which increases to 86% if we consider only the instances that are classified with a high confidence level.

2019 - Aneuploid acute myeloid leukemia exhibits a signature of genomic alterations in the cell cycle and protein degradation machinery [Articolo su rivista]
Simonetti, Giorgia; Padella, Antonella; do Valle, Italo Farìa; Fontana, Maria Chiara; Fonzi, Eugenio; Bruno, Samantha; Baldazzi, Carmen; Guadagnuolo, Viviana; Manfrini, Marco; Ferrari, Anna; Paolini, Stefania; Papayannidis, Cristina; Marconi, Giovanni; Franchini, Eugenia; Zuffa, Elisa; Laginestra, Maria Antonella; Zanotti, Federica; Astolfi, Annalisa; Iacobucci, Ilaria; Bernardi, Simona; Sazzini, Marco; Ficarra, Elisa; Hernandez, Jesus Maria; Vandenberghe, Peter; Cools, Jan; Bullinger, Lars; Ottaviani, Emanuela; Testoni, Nicoletta; Cavo, Michele; Haferlach, Torsten; Castellani, Gastone; Remondini, Daniel; Martinelli, Giovanni
abstract

2019 - Dealing with Lack of Training Data for Convolutional Neural Networks: The Case of Digital Pathology [Articolo su rivista]
Ponzio, Francesco; Urgese, Gianvito; Ficarra, Elisa; Di Cataldo, Santa
abstract

Thanks to their capability to learn generalizable descriptors directly from images, deep Convolutional Neural Networks (CNNs) seem the ideal solution to most pattern recognition problems. On the other hand, to learn the image representation, CNNs need huge sets of annotated samples that are infeasible to obtain in many everyday scenarios. This is the case, for example, for Computer-Aided Diagnosis (CAD) systems for digital pathology, where additional challenges are posed by the high variability of the cancerous tissue characteristics. In our experiments, state-of-the-art CNNs trained from scratch on histological images were less accurate and less robust to variability than a traditional machine learning framework, highlighting all the issues of fully training deep networks with limited data from real patients. To solve this problem, we designed and compared three transfer learning frameworks, leveraging CNNs pre-trained on non-medical images. This approach obtained very high accuracy while requiring far fewer computational resources for training. Our findings demonstrate that transfer learning is a solution to the automated classification of histological samples and solves the problem of designing accurate and computationally efficient CAD systems with limited training data.

2019 - Exploiting Gene Expression Profiles for the Automated Prediction of Connectivity between Brain Regions [Articolo su rivista]
Roberti, Ilaria; Lovino, Marta; Di Cataldo, Santa; Ficarra, Elisa; Urgese, Gianvito
abstract

The brain comprises a complex system of neurons interconnected by an intricate network of anatomical links. While recent studies demonstrated the correlation between anatomical connectivity patterns and gene expression of neurons, using transcriptomic information to automatically predict such patterns is still an open challenge. In this work, we present a completely data-driven approach relying on machine learning (i.e., neural networks) to learn the anatomical connection directly from a training set of gene expression data. To do so, we combined gene expression and connectivity data from the Allen Mouse Brain Atlas to generate thousands of gene expression profile pairs from different brain regions. To each pair, we assigned a label describing the physical connection between the corresponding brain regions. Then, we exploited these data to train neural networks, designed to predict brain area connectivity. We assessed our solution on two prediction problems (with three and two connectivity class categories) involving cortical and cerebellum regions. As demonstrated by our results, we distinguish between connected and unconnected regions with 85% prediction accuracy and good balance of precision and recall. In our future work we may extend the analysis to more complex brain structures and consider RNA-Seq data as additional input to our model.

2019 - Going Deeper into Colorectal Cancer Histopathology [Capitolo/Saggio]
Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa
abstract

The early diagnosis of colorectal cancer (CRC) traditionally relies on the microscopic examination of histological slides by experienced pathologists, which is very time-consuming and raises many issues about the reliability of the results. In this paper we propose using Convolutional Neural Networks (CNNs), a class of deep networks that are successfully used in many contexts of pattern recognition, to automatically distinguish cancerous tissues from either healthy tissues or benign lesions. For this purpose, we designed and compared different CNN-based classification frameworks, involving either training CNNs from scratch on three classes of colorectal images, or transfer learning from a different classification problem. While a CNN trained from scratch obtained very good (about 90%) classification accuracy in our tests, the same CNN model pre-trained on the ImageNet dataset obtained even better accuracy (around 96%) on the same testing samples, requiring far fewer computational resources.

2019 - Novel and Rare Fusion Transcripts Involving Transcription Factors and Tumor Suppressor Genes in Acute Myeloid Leukemia [Articolo su rivista]
Padella, Antonella; Simonetti, Giorgia; Paciello, Giulia; Giotopoulos, George; Baldazzi, Carmen; Righi, Simona; Ghetti, Martina; Stengel, Anna; Guadagnuolo, Viviana; De Tommaso, Rossella; Papayannidis, Cristina; Robustelli, Valentina; Franchini, Eugenia; Ghelli Luserna di Rorà, Andrea; Ferrari, Anna; Fontana, Maria Chiara; Bruno, Samantha; Ottaviani, Emanuela; Soverini, Simona; Storlazzi, Clelia Tiziana; Haferlach, Claudia; Sabattini, Elena; Testoni, Nicoletta; Iacobucci, Ilaria; Huntly, Brian J. P.; Ficarra, Elisa; Martinelli, Giovanni
abstract

Approximately 18% of acute myeloid leukemia (AML) cases express a fusion transcript. However, few fusions are recurrent across AML and the identification of these rare chimeras is of interest to characterize AML patients. Here, we studied the transcriptome of 8 adult AML patients with poorly described chromosomal translocation(s), with the aim of identifying novel and rare fusion transcripts. We integrated RNA-sequencing data with multiple approaches including computational analysis, Sanger sequencing, fluorescence in situ hybridization and in vitro studies to assess the oncogenic potential of the ZEB2-BCL11B chimera. We detected 7 different fusions with partner genes involving transcription factors (OAZ-MAFK, ZEB2-BCL11B), tumor suppressors (SAV1-GYPB, PUF60-TYW1, CNOT2-WT1) and rearrangements associated with the loss of NF1 (CPD-PXT1, UTP6-CRLF3). Notably, ZEB2-BCL11B rearrangements co-occurred with FLT3 mutations and were associated with a poorly differentiated or mixed phenotype leukemia. Although the fusion alone did not transform murine c-Kit+ bone marrow cells, 45.4% of 14q32 non-rearranged AML cases were also BCL11B-positive, suggesting a more general and complex mechanism of leukemogenesis associated with BCL11B expression. Overall, by combining different approaches, we described rare fusion events contributing to the complexity of AML and we linked the expression of some chimeras to genomic alterations hitting known genes in AML.

2019 - Single-cell DNA Sequencing Data: a Pipeline for Multi-Sample Analysis [Relazione in Atti di Convegno]
Montemurro, Marilisa; Grassi, Elena; Urgese, Gianvito; Parisi, Emanuele; Pizzino, Carmelo Gabriele; Bertotti, Andrea; Ficarra, Elisa
abstract

Single-cell DNA (sc-DNA) sequencing is proving to be a valuable instrument to investigate intra- and inter-tumor heterogeneity and infer its evolutionary dynamics, thanks to the high-resolution data it produces. As a result, the demand for analytical tools to manage this kind of data is increasing. Here we propose a pipeline that performs multi-sample copy-number variation (CNV) analysis on large-scale single-cell DNA sequencing data and supports the investigation of spatial and temporal tumor heterogeneity.

2019 - Single-cell DNA Sequencing Data: a Pipeline for Multi-Sample Analysis [Abstract in Atti di Convegno]
Montemurro, Marilisa; Grassi, Elena; Urgese, Gianvito; Pizzino, Carmelo Gabriele; Bertotti, Andrea; Ficarra, Elisa
abstract

In order to help cancer researchers in understanding tumor heterogeneity and its evolutionary dynamics, we propose a software pipeline to explore intra-tumor heterogeneity by means of scDNA sequencing data.

2018 - Colorectal Cancer Classification using Deep Convolutional Networks. An Experimental Study [Relazione in Atti di Convegno]
Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa
abstract

The analysis of histological samples is of paramount importance for the early diagnosis of colorectal cancer (CRC). The traditional visual assessment is time-consuming and highly unreliable because of the subjectivity of the evaluation. On the other hand, automated analysis is extremely challenging due to the variability of the architectural and colouring characteristics of the histological images. In this work, we propose a deep learning technique based on Convolutional Neural Networks (CNNs) to differentiate adenocarcinomas from healthy tissues and benign lesions. Fully training the CNN on a large set of annotated CRC samples provides good classification accuracy (around 90% in our tests), but has the drawback of a very computationally intensive training procedure. Hence, in our work we also investigate the use of transfer learning approaches, based on CNN models pre-trained on a completely different dataset (i.e., ImageNet). In our results, transfer learning considerably outperforms the CNN fully trained on CRC samples, obtaining an accuracy of about 96% on the same test dataset.

2018 - Low-cost pupillometry for human-computer interface [Poster]
Goddi, A.; Ponzio, F.; Ficarra, E.; Di Cataldo, S.; Roatta, S.
abstract

Changes in pupil size are governed by the autonomic nervous system but may also be systematically driven by voluntarily shifting the gaze in depth. Thus, the pupil accommodative response (PAR) that accompanies voluntary gaze shifts from a far (3 m distance) to a near (30 cm) visual target might be exploited as a simple human-computer interface (HCI), bypassing the somato-motor system.

2018 - MDM2 and Aurora Kinase A Contribute to SETD2 Loss of Function in Advanced Systemic Mastocytosis: Implications for Pathogenesis and Treatment [Abstract in Rivista]
Mancini, Manuela; Monaldi, Cecilia; De Santis, Sara; Papayannidis, Cristina; Rondoni, Michela; Bavaro, Luana; Martelli, Margherita; Abbenante, Maria Chiara; Curti, Antonio; Ficarra, Elisa; Paciello, Giulia; Fontana, Maria Chiara; Zanotti, Roberta; Bonifacio, Massimiliano; Scaffidi, Luigi; Pagano, Livio; Criscuolo, Marianna; Albano, Francesco; Ciceri, Fabio; Elena, Chiara; Tosi, Patrizia; Delledonne, Massimo; Avanzato, Carla; Xumerle, Luciano; Valent, Peter; Martinelli, Giovanni; Cavo, Michele; Soverini, Simona
abstract

2018 - RALE051: a novel established cell line of sporadic Burkitt lymphoma [Articolo su rivista]
L’Abbate, Alberto; Iacobucci, Ilaria; Lonoce, Angelo; Turchiano, Antonella; Ficarra, Elisa; Paciello, Giulia; Cattina, Federica; Ferrari, Anna; Imbrogno, Enrica; Agostinelli, Claudio; Zinzani, Pierluigi; Martinelli, Giovanni; Derenzini, Enrico; Storlazzi, Clelia Tiziana
abstract

2018 - geneEX a novel tool to assess differential expression from gene and exon sequencing data [Relazione in Atti di Convegno]
Scicolone, Orazio Maria; Paciello, Giulia; Ficarra, Elisa
abstract

2017 - A multi-modal brain image registration framework for US-guided neuronavigation systems. Integrating MR and US for minimally invasive neuroimaging [Relazione in Atti di Convegno]
Ponzio, Francesco; Macii, Enrico; Ficarra, Elisa; Di Cataldo, Santa
abstract

US-guided neuronavigation exploits the simplicity of use and minimal invasiveness of Ultrasound (US) imaging and the high tissue resolution and signal-to-noise ratio of Magnetic Resonance Imaging (MRI) to guide brain surgeries. More specifically, the intra-operative 3D US images are combined with pre-operative MR images to accurately localise the course of instruments in the operative field with minimal invasiveness. Multi-modal image registration of 3D US and MR images is an essential part of such a system. In this paper, we present a complete software framework that enables the registration of US and MR brain scans based on a multi-resolution deformable transform, tackling elastic deformations (i.e., brain shifts) possibly occurring during the surgical procedure. The framework also supports simpler and faster registration techniques, based on rigid or affine transforms, and enables the interactive visualisation and rendering of the overlaid US and MRI volumes. The registration was experimentally validated on a public dataset of realistic brain phantom images, at different levels of artificially induced deformations.

2017 - FuGePrior: A novel gene fusion prioritization algorithm based on accurate fusion structure analysis in cancer RNA-seq samples [Articolo su rivista]
Paciello, Giulia; Ficarra, Elisa
abstract

2017 - Mining textural knowledge in biological images: applications, methods and trends [Articolo su rivista]
Di Cataldo, Santa; Ficarra, Elisa
abstract

Texture analysis is a major task in many areas of computer vision and pattern recognition, including biological imaging. Indeed, visual textures can be exploited to distinguish specific tissues or cells in a biological sample, to highlight chemical reactions between molecules, as well as to detect subcellular patterns that can be evidence of certain pathologies. This makes automated texture analysis fundamental in many applications of biomedicine, such as the accurate detection and grading of multiple types of cancer, the differential diagnosis of autoimmune diseases, or the study of physiological processes. Due to their specific characteristics and challenges, the design of texture analysis systems for biological images has attracted ever-growing attention in the last few years. In this paper, we perform a critical review of this important topic. First, we provide a general definition of texture analysis and discuss its role in the context of bioimaging, with examples of applications from the recent literature. Then, we review the main approaches to automated texture analysis, with special attention to the methods of feature extraction and encoding that can be successfully applied to microscopy images of cells or tissues. Our aim is to provide an overview of the state of the art, as well as a glimpse into the latest and future trends of research in this area.

2017 - Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer [Articolo su rivista]
Isella, Claudio; Brundu, FRANCESCO GAVINO; Bellomo, Sara E.; Galimi, Francesco; Zanella, Eugenia; Consalvo Petti, Roberta; Fiori, Alessandro; Orzan, Francesca; Senetta, Rebecca; Boccaccio, Carla; Ficarra, Elisa; Marchionni, Luigi; Trusolino, Livio; Medico, Enzo; Bertotti, Andrea
abstract

Stromal content heavily impacts the transcriptional classification of colorectal cancer (CRC), with clinical and biological implications. Lineage-dependent stromal transcriptional components could therefore dominate over more subtle expression traits inherent to cancer cells. Since in patient-derived xenografts (PDXs) stromal cells of the human tumour are substituted by murine counterparts, here we deploy human-specific expression profiling of CRC PDXs to assess cancer-cell intrinsic transcriptional features. Through this approach, we identify five CRC intrinsic subtypes (CRIS) endowed with distinctive molecular, functional and phenotypic peculiarities: (i) CRIS-A: mucinous, glycolytic, enriched for microsatellite instability or KRAS mutations; (ii) CRIS-B: TGF-β pathway activity, epithelial–mesenchymal transition, poor prognosis; (iii) CRIS-C: elevated EGFR signalling, sensitivity to EGFR inhibitors; (iv) CRIS-D: WNT activation, IGF2 gene overexpression and amplification; and (v) CRIS-E: Paneth cell-like phenotype, TP53 mutations. CRIS subtypes successfully categorize independent sets of primary and metastatic CRCs, with limited overlap on existing transcriptional classes and unprecedented predictive and prognostic performances.

2017 - Spatial distribution of peptidergic and non-peptidergic nociceptors in mouse dorsal root ganglia: a cluster story [Abstract in Atti di Convegno]
Ciglieri, Elisa; Vacca, Maurizia; Ferrini, Francesco; Di Cataldo, Santa; Ficarra, Elisa; Salio, Chiara
abstract

2017 - isomiR-SEA: miRNA and isomiR expression level detection in seven RNA-Seq datasets [Poster]
Urgese, Gianvito; Paciello, Giulia; Macii, Enrico; Acquaviva, Andrea; Ficarra, Elisa
abstract

Background: Massive parallel sequencing of transcriptomes revealed the presence of miRNA variants named isomiRs. The sequence variations identified within isomiR molecules can affect their targeting activity, with consequences in gene expression and potential impact in multi-factorial diseases. miRNAs are considered good biomarkers, making their adoption for disease characterization highly desirable. Several methodologies and tools were devised to identify and quantify miRNAs from sequencing data. However, all these tools are built on top of general-purpose alignment algorithms, providing results of limited accuracy and no information concerning isomiRs and conserved miRNA-mRNA interaction sites. Method: To overcome these limitations we developed the isomiR-SEA algorithm. By implementing a miRNA-specific alignment procedure, isomiR-SEA analysis accounts for accurate miRNA/isomiR expression levels and for a precise evaluation of the conserved interaction sites. First, isomiR-SEA identifies miRNA seeds within the tags. If the seed is found, the alignment is extended and the positions of the encountered mismatches are recorded. Then, the collected information is evaluated to distinguish among miRNAs and isomiRs and to assess the conservation of the interaction sites. Results & Conclusion: isomiR-SEA performance was assessed on 7 public RNA-Seq datasets. 40% of the reads attributed to miRNAs (189M) come from mature miRNAs, 50% derive instead from 3' isomiRs, and the remaining reads account for 5'/SNP isomiRs or combinations of them. Furthermore, about 2% of reads lost some interaction sites. This proves the importance of a miRNA-specific alignment algorithm to correctly evaluate miRNA targeting activity. Expression levels of isomiRs detected in the two experiments were aggregated and classified at two levels of detail. In experiment 1, isoforms with an indel (at one or both ends) are grouped together, whereas in experiment 2 we make a distinction between reads aligned on the mature miRNA with an insertion (+) or deletion (-) at the 5' or 3' end. This shows the capability of isomiR-SEA to generate enriched results that can be analysed in downstream analyses customized for the investigation purpose.
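
The seed-then-extend idea described in the abstract above can be illustrated with a minimal Python sketch. This is not the isomiR-SEA implementation: the function and class names, the choice of miRNA nucleotides 2-7 as the seed, the exact-match seed search, and the mismatch threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Assumption: the seed is taken as miRNA nucleotides 2-7 (0-based slice 1:7).
SEED_START, SEED_END = 1, 7


@dataclass
class Hit:
    offset: int                                      # tag index aligned to miRNA position 0
    mismatches: list = field(default_factory=list)   # 0-based miRNA positions that differ


def seed_and_extend(tag, mirna, max_mismatches=2):
    """Look for the miRNA seed in the tag, then extend and record mismatch positions."""
    seed = mirna[SEED_START:SEED_END]
    pos = tag.find(seed)              # hypothetical simplification: exact seed match only
    if pos < 0:
        return None                   # no seed, so the tag is not assigned to this miRNA
    offset = pos - SEED_START         # a non-zero offset hints at a 5' variation
    mismatches = []
    for i, base in enumerate(mirna):
        j = offset + i
        # Compare only where tag and miRNA overlap; overhangs are not scored here.
        if 0 <= j < len(tag) and tag[j] != base:
            mismatches.append(i)
    if len(mismatches) > max_mismatches:
        return None
    return Hit(offset, mismatches)
```

A hit with offset 0, no mismatches, and a tag as long as the miRNA would correspond to the mature sequence; other combinations of offset, length, and mismatch positions would be the raw material for telling isomiR classes apart.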

2016 - A COMBINED APPROACH TO DETECT RARE FUSION EVENTS IN ACUTE MYELOID LEUKEMIA [Relazione in Atti di Convegno]
A., Padella; G., Simonetti; Paciello, Giulia; A., Ferrari; E., Zago; C., Baldazzi; V., Guadagnuolo; C., Papayannidis; V., Robustelli; E., Imbrogno; N., Testoni; M., Cavo; M., Delledonne; I., Iacobucci; Ct, Storlazzi; Ficarra, Elisa; G., Martinelli
abstract

2016 - A novel gaussian extrapolation approach for 2-D gel electrophoresis saturated protein spots [Capitolo/Saggio]
Natale, Massimo; Caiazzo, Alfonso; Ficarra, Elisa
abstract

2016 - ANAlyte: a modular image analysis tool for ANA testing with Indirect Immunofluorescence [Articolo su rivista]
DI CATALDO, Santa; Tonti, Simone; Bottino, ANDREA GIUSEPPE; Ficarra, Elisa
abstract

Background and objectives. The automated analysis of Indirect Immunofluorescence (IIF) images for Anti-Nuclear Autoantibody (ANA) testing is a fairly recent field that is receiving ever-growing interest from the research community. ANA testing relies on the categorization of the intensity level and fluorescent pattern of IIF images of HEp-2 cells to perform a differential diagnosis of important autoimmune diseases. Nevertheless, it suffers from a severe lack of repeatability due to subjectivity in the visual interpretation of the images. The automatization of the analysis is seen as the only valid solution to this problem. Several works in the literature address individual steps of the workflow; nonetheless, integrating such steps and assessing their effectiveness as a whole is still an open challenge. Methods. We present a modular tool, ANAlyte, able to characterize an IIF image in terms of fluorescent intensity level and fluorescent pattern without any user interaction. For this purpose, ANAlyte integrates the following: (i) Intensity Classifier module, which categorizes the intensity level of the input slide based on multi-scale contrast assessment; (ii) Cell Segmenter module, which splits the input slide into individual HEp-2 cells; (iii) Pattern Classifier module, which determines the fluorescent pattern of the slide based on the pattern of the individual cells. Results. To demonstrate the accuracy and robustness of our tool, we experimentally validated ANAlyte on two different public benchmarks of IIF HEp-2 images with a rigorous leave-one-out cross-validation strategy. We obtained an overall accuracy of around 85% for fluorescent intensity classification and above 90% for pattern classification. We assessed all results by comparison with some of the most representative state-of-the-art works. Conclusions. Unlike most of the other works in the recent literature, ANAlyte aims at the automatization of all the major steps of ANA image analysis. Results on public benchmarks demonstrate that the tool can characterize HEp-2 slides in terms of intensity and fluorescent pattern with accuracy better than or comparable to the state-of-the-art techniques, even when such techniques are run on manually segmented cells. Hence, ANAlyte can be proposed as a valid solution to the problem of ANA testing automatization.

2016 - Automated 3D immunofluorescence analysis of Dorsal Root Ganglia for the investigation of neural circuit alterations: a preliminary study [Relazione in Atti di Convegno]
DI CATALDO, Santa; Tonti, Simone; Ciglieri, Elisa; Ferrini, Francesco; Macii, Enrico; Ficarra, Elisa; Salio, Chiara
abstract

Diabetic polyneuropathy is a major complication of diabetes mellitus, causing severe alterations of the neural circuits between spinal nerves and spinal cord. The analysis of 3D confocal images of dorsal root ganglia in diabetic mice, where different fluorescent markers are used to identify different types of nociceptors, can help to understand the unknown mechanisms of this pathology. Nevertheless, due to the inherent challenges of 3D confocal imaging, a thorough and comprehensive visual investigation is very difficult. In this work we introduce a tool, 3DRG, that provides a fully-automated segmentation and 3D rendering of positively labeled nociceptors in a dorsal root ganglion, as well as a quantitative characterisation of its immunopositivity to each fluorescent marker. Our preliminary experiments on 3D confocal images of entire dorsal root ganglia from healthy and diabetic mice provided very interesting insights about the effects of the pathology on two different types of nociceptors.

2016 - Detection of rearranged light chain sequences by RNA-Seq in B-cell lymphomas and reactive lymphadenopathies [Abstract in Atti di Convegno]
Paciello, Giulia; Pighi, C.; Ficarra, Elisa; Zamo', A.
abstract

2016 - Novel fusion transcripts identified by RNAseq cooperate with somatic mutations in the pathogenesis of acute myeloid leukemia [Abstract in Rivista]
Antonella, Padella; Giorgia, Simonetti; Anna, Ferrari; Paciello, Giulia; Elisa, Zago; Carmen, Baldazzi; Viviana, Guadagnuolo; Cristina, Papayannidis; Valentina, Robustelli; Enrica, Imbrogno; Nicoletta, Testoni; Massimo, Delledonne; Ilaria, Iacobucci; Tiziana Clelia, Storlazzi; Ficarra, Elisa; Pier Luigi, Lollini; Giovanni, Martinelli
abstract

2016 - The Genomic and Transcriptomic Landscape of Systemic Mastocytosis [Abstract in Rivista]
Simona, Soverini; Caterina De, Benedittis; Michela, Rondoni; Cristina, Papayannidis; Ficarra, Elisa; Paciello, Giulia; Marco, Manfrini; Manuela, Mancini; Roberta, Zanotti; Luigi, Scaffidi; Giorgina, Specchia; Francesco, Albano; Serena, Merante; Chiara, Elena; Livio, Pagano; Domenica, Gangemi; Patrizia, Tosi; Luana, Bavaro
abstract

2016 - Unsupervised analysis of cancer-cell intrinsic transcriptional traits defines a new classification system for colorectal cancer with improved predictive and prognostic value [Abstract in Rivista]
Andrea, Bertotti; Claudio, Isella; Sara E., Bellomo; Brundu, FRANCESCO GAVINO; Francesco, Galimi; Ficarra, Elisa; Livio, Trusolino; Enzo, Medico
abstract

2016 - isomiR-SEA: An RNA-Seq analysis tool for miRNAs/isomiRs expression level profiling and miRNA-mRNA interaction sites evaluation [Articolo su rivista]
Urgese, Gianvito; Paciello, Giulia; Acquaviva, Andrea; Ficarra, Elisa
abstract

Background: Massive parallel sequencing of transcriptomes revealed the presence of many miRNAs and miRNA variants named isomiRs with a potential role in several cellular processes through their interaction with a target mRNA. Many methods and tools have been recently devised to detect and quantify miRNAs from sequencing data. However, all of them are implemented on top of general-purpose alignment methods, thus providing results of limited accuracy and no information concerning isomiRs and conserved miRNA-mRNA interaction sites. Results: To overcome these limitations we present a novel algorithm named isomiR-SEA, which provides users with very accurate miRNA expression levels and precise classifications of both isomiRs and miRNA-mRNA interaction sites. Tags are mapped on the known miRNA sequences thanks to a specialized alignment algorithm developed on top of biological evidence concerning miRNA structure. Specifically, isomiR-SEA checks for miRNA seed presence in the input tags and evaluates, during all the alignment phases, the positions of the encountered mismatches, thus making it possible to distinguish among the different isomiRs and conserved miRNA-mRNA interaction sites. Conclusions: The performance of isomiR-SEA has been assessed on two public RNA-Seq datasets, proving that the implemented algorithm is able to account for more reliable and accurate miRNA expression levels with respect to those provided by two compared state-of-the-art tools. Moreover, unlike the few methods currently available to perform isomiR detection, the proposed algorithm implements the evaluation of isomiRs and conserved miRNA-mRNA interaction sites already in the first alignment phases, thus avoiding any additional filtering stages potentially responsible for the loss of useful information.

2015 - A novel patient-derived tumorgraft model with TRAF1-ALK anaplastic large-cell lymphoma translocation [Articolo su rivista]
F., Abate; M., Todaro; J. A., van der Krogt; M., Boi; I., Landra; R., Machiorlatti; F., Tabbò; K., Messana; A., Barreca; D., Novero; M., Gaudiano; S., Aliberti; F., Di Giacomo; T., Tousseyn; E., Lasorsa; R., Crescenzo; L., Bessone; Ficarra, Elisa; Acquaviva, Andrea; A., Rinaldi; M., Ponzoni; Dl, Longo; S., Aime; M., Cheng; B., Ruggeri; Pp, Piccaluga; S., Pileri; E., Tiacci; B., Falini; B., Pera Gresely; L., Cerchietti; J., Iqbal; Wc, Chan; Ld, Shultz; I., Kwee; R., Piva; I., Wlodarska; R., Rabadan; F., Bertoni; G., Inghirami; The European T., cell Lymphoma Study Group
abstract

Although anaplastic large-cell lymphomas (ALCL) carrying anaplastic lymphoma kinase (ALK) have a relatively good prognosis, aggressive forms exist. We have identified a novel translocation, causing the fusion of the TRAF1 and ALK genes, in one patient who presented with a leukemic ALK+ ALCL (ALCL-11). To uncover the mechanisms leading to high-grade ALCL, we developed a human patient-derived tumorgraft (hPDT) line. Molecular characterization of primary and PDT cells demonstrated the activation of ALK and nuclear factor kB (NFkB) pathways. Genomic studies of ALCL-11 showed TP53 loss and the in vivo subclonal expansion of lymphoma cells lacking PRDM1/Blimp1 and carrying c-MYC gene amplification. The treatment of TRAF1-ALK cells with proteasome inhibitors led to the downregulation of p50/p52 and lymphoma growth inhibition. Moreover, a NFkB gene set classifier stratified ALCL into distinct subsets with different clinical outcomes. Although a selective ALK inhibitor (CEP28122) resulted in a significant clinical response of hPDT mice, the disease could not be eradicated. These data indicate that the activation of NFkB signaling contributes to the neoplastic phenotype of TRAF1-ALK ALCL. ALCL hPDTs are invaluable tools to validate the role of druggable molecules, predict therapeutic responses and implement patient-specific therapies.

2015 - An automated approach to the segmentation of HEp-2 cells for the indirect immunofluorescence ANA test [Articolo su rivista]
Tonti, Simone; Di Cataldo, Santa; Bottino, Andrea Giuseppe; Ficarra, Elisa
abstract

The automatization of the analysis of Indirect Immunofluorescence (IIF) images is of paramount importance for the diagnosis of autoimmune diseases. This paper proposes a solution to one of the most challenging steps of this process, the segmentation of HEp-2 cells, through an adaptive marker-controlled watershed approach. Our algorithm automatically adapts the marker selection pipeline to the peculiar characteristics of the input image; hence, it is able to cope with different fluorescent intensities and staining patterns without any a priori knowledge. Furthermore, it shows a reduced sensitivity to over-segmentation errors and uneven illumination, which are typical issues of IIF imaging.

2015 - An integrated approach for morphofunctional analysis of DRGs in normal and diabetic mice [Abstract in Atti di Convegno]
Ciglieri, Elisa; Ferrini, Francesco; Tonti, Simone; DI CATALDO, Santa; Ficarra, Elisa; Salio, Chiara
abstract

2015 - Convergent Mutations and Kinase Fusions Lead to Oncogenic STAT3 Activation in Anaplastic Large Cell Lymphoma [Articolo su rivista]
Ramona, Crescenzo; Francesco, Abate; Elena, Lasorsa; Fabrizio, Tabbo’; Marcello, Gaudiano; Nicoletta, Chiesa; Filomena Di, Giacomo; Elisa, Spaccarotella; Luigi, Barbarossa; Elisabetta, Ercole; Maria, Todaro; Michela, Boi; Acquaviva, Andrea; Ficarra, Elisa; Domenico, Novero; Andrea, Rinaldi; Thomas, Tousseyn; Andreas, Rosenwald; Lukas, Kenner; Lorenzo, Cerroni; Alexander, Tzankov; Maurilio, Ponzoni; Marco, Paulli; Dennis, Weisenburger; Wing C., Chan; Javeed, Iqbal; Miguel A., Piris; Alberto, Zamo’; Carmela, Ciardullo; Davide, Rossi; Gianluca, Gaidano; Stefano, Pileri; Enrico, Tiacci; Brunangelo, Falini; Leonard D., Shultz; Laurence, Mevellec; Jorge E., Vialard; Roberto, Piva; Francesco, Bertoni; Raul, Rabadan; Giorgio, Inghirami
abstract

The JAK/STAT3 signaling pathway is often deregulated in hematopoietic disorders including peripheral T-cell lymphoma. We describe two novel mechanisms leading to the constitutive activation of STAT3 in ALK- ALCL. Oncogenic JAK1 or STAT3 mutations are associated with hyperactive pSTAT3 that regulates canonical STAT3 and ATF3 genes. Moreover, synergizing JAK1 and STAT3 mutants sustain the neoplastic growth, which can be efficiently controlled in vitro and in an ALCL patient-derived tumorgraft model by JAK1/2 inhibitors. We have discovered that novel chimeras, displaying concomitant transcriptional and kinase activities, are powerful oncogenes capable of sustaining the ALCL phenotype via STAT3 and can be uniquely neutralized by a novel ROS1 inhibitor. The pharmacological inhibition of JAK/STAT3 represents a novel strategy for the treatment of molecularly stratified ALCL.

2015 - Erratum to Convergent Mutations and Kinase Fusions Lead to Oncogenic STAT3 Activation in Anaplastic Large Cell Lymphoma [Cancer Cell 27, 516-532] April 13, 2015 10.1016/j.ccell.2015.04.014 [Articolo su rivista]
Crescenzo, R.; Abate, F.; Lasorsa, E.; Tabbo', F.; Gaudiano, M.; Chiesa, N.; Di Giacomo, F.; Spaccarotella, E.; Barbarossa, L.; Ercole, E.; Todaro, M.; Boi, M.; Acquaviva, A.; Ficarra, E.; Novero, D.; Rinaldi, A.; Tousseyn, T.; Rosenwald, A.; Kenner, L.; Cerroni, L.; Tzankov, A.; Ponzoni, M.; Paulli, M.; Weisenburger, D.; Chan, W. C.; Iqbal, J.; Piris, M. A.; Zamo', A.; Ciardullo, C.; Rossi, D.; Gaidano, G.; Pileri, S.; Tiacci, E.; Falini, B.; Shultz, L. D.; Mevellec, L.; Vialard, J. E.; Piva, R.; Bertoni, F.; Rabadan, R.; Inghirami, G.
abstract

2015 - MicroRNA/mRNA interactions underlying colorectal cancer molecular subtypes [Articolo su rivista]
Cantini, Laura; Isella, Claudio; Petti, Consalvo; Picco, Gabriele; Chiola, Simone; Ficarra, Elisa; Caselle, Michele; Medico, Enzo
abstract

Colorectal cancer (CRC) molecular subtypes have been recently identified by gene expression profiling. To search for microRNAs potentially driving the subtypes, we designed an analytical pipeline, microRNA Master Regulator Analysis (MMRA). As input, MMRA requires a paired microRNA/mRNA expression dataset, with samples subdivided into two or more subgroups, and gene expression signatures specific for each subgroup. MMRA then identifies candidate regulator microRNAs by assessing their subtype-specific expression, target gene enrichment in subtype signatures and network analysis-based contribution to subtype gene expression. MMRA was applied to a CRC dataset of 450 samples, assigned to various subtypes by three different transcriptional classifiers. In total, 24 microRNAs were associated with subtypes, in most cases negatively contributing to the stem/serrated/mesenchymal (SSM) poor prognosis subtype. Functional validation in CRC cell lines confirmed downregulation of the SSM subtype by miR-194, miR-200b, miR-203 and miR-429, and highlighted shared target genes and pathways mediating this effect.

2015 - On the relevance of a complete characterisation of miRNAs, isomiRs and miRNA-mRNA interaction sites through miRNA-specific alignment tools [Abstract in Atti di Convegno]
Urgese, Gianvito; Paciello, Giulia; Ficarra, Elisa; Macii, Enrico; Acquaviva, Andrea
abstract

The advent of NGS dramatically changed the characterisation of multifactorial pathologies such as cancer. The high molecular variability of cancer makes it essential to identify biomarkers able to explain the differences among cancer sub-types, allowing physicians to provide patients with suitable therapies. In this context, miRNAs are considered adequate biomarkers and miRNA profiling from miRNA-sequencing is widely used. However, state-of-the-art tools performing miRNA read mapping rely on general-purpose alignment algorithms. On the other hand, research carried out in the last decade has led to the identification of many miRNA-specific features that are not exploited by miRNA aligners. Moreover, the role of miRNA variants called 'isomiRs' is still an open issue. IsomiRs impact the characterization of miRNA target affinity, and their analysis enables a more accurate evaluation of miRNA expression profiles. In light of these considerations, there is a need for algorithmic methodologies able to provide users with a complete and accurate picture of the whole spectrum of miRNAs, isomiRs and interaction sites. We report the impact of the application of such a methodology on 23 human miRNA-Seq datasets from GEO, for which the overall isomiR expression level and the characteristics of the interaction sites have been evaluated. As a result, 40% of the 189M miRNA mapped reads showed an exact miRNA sequence, whereas 50% are characterized by a sequence accounting for 3' isomiRs and the remaining reads possess sequences compatible with 5' and SNP isomiRs or combinations of them. Furthermore, in 2% of the cases some interaction sites are missed. Two other samples (hESCs and NSCs), recently analysed to confirm the importance of isomiRs, have also been studied in terms of isomiR and interaction site profiles, pointing out that such characteristics require a suitable methodology for miRNA sequence analysis because they cannot be appreciated from the overall miRNA expression profile.

2015 - RNA Sequencing Reveals Novel and Rare Fusion Transcripts in Acute Myeloid Leukemia [Abstract in Rivista]
Padella, A.; Simonetti, G.; Paciello, Giulia; Ferrari, A.; Zago, E.; Baldazzi, C.; Guadagnuolo, V.; Papayannidis, C.; Robustelli, V.; Imbrogno, E.; Testoni, N.; Musuraca, G.; Soverini, S.; Delledonne, M.; Iacobucci, I.; Storlazzi, C. T.; Ficarra, Elisa; Martinelli, G.
abstract

2015 - Unsupervised HEp-2 mitosis recognition in Indirect Immunofluorescence Imaging [Relazione in Atti di Convegno]
Tonti, Simone; DI CATALDO, Santa; Macii, Enrico; Ficarra, Elisa
abstract

Automated HEp-2 mitotic cell recognition in IIF images is an important and yet scarcely explored step in the computer-aided diagnosis of autoimmune disorders. Such a step is necessary to assess the quality of the HEp-2 samples and helps the early diagnosis of the most difficult or ambiguous cases. In this work, we propose a completely unsupervised approach for HEp-2 mitotic cell recognition that overcomes the problem of mitotic/non-mitotic class imbalance due to the limited number of mitotic cells. Our technique automatically selects a limited set of candidate cells from the HEp-2 slide and then applies a clustering algorithm to identify the mitotic ones based on their texture. Finally, a second stage of clustering discriminates between positive and negative mitoses. Experiments on public IIF images demonstrate the performance of our technique compared to previous approaches.

2015 - VDJSeq-Solver: In Silico V(D)J Recombination Detection tool [Articolo su rivista]
Paciello, Giulia; Acquaviva, Andrea; Chiara, Pighi; Alberto, Ferrarini; Macii, Enrico; Alberto, Zamò; Ficarra, Elisa
abstract

In this paper we present VDJSeq-Solver, a methodology and tool to identify clonal lymphocyte populations from paired-end RNA Sequencing reads derived from the sequencing of mRNA of neoplastic cells. The tool detects the main clone that characterises the tissue of interest by recognizing the most abundant V(D)J rearrangement among the existing ones in the sample under study. The exact sequence reported for the identified clone accounts for the modifications introduced by the enzymatic processes. The proposed tool overcomes limitations of currently available methods for recognizing lymphocyte rearrangements, which work on a single sequence at a time and are not applicable to high-throughput sequencing data. In this work, VDJSeq-Solver has been applied to correctly detect the main clone and identify its sequence on five Mantle Cell Lymphoma samples; then the tool has been tested on twelve Diffuse Large B-Cell Lymphoma samples. In order to comply with the privacy, ethics and intellectual property policies of the University Hospital and the University of Verona, data is available upon request to supporto.utenti@ateneo.univr.it after signing a mandatory Materials Transfer Agreement. The VDJSeq-Solver JAVA/Perl/Bash software implementation is free and available at http://eda.polito.it/VDJSeq-Solver/.

2014 - A Novel Pipeline for Identification and Prioritization of Gene Fusions in Patient-derived Xenografts of Metastatic Colorectal Cancer [Relazione in Atti di Convegno]
Paciello, Giulia; Acquaviva, Andrea; Consalvo, Petti; Claudio, Isella; Enzo, Medico; Ficarra, Elisa
abstract

Metastatic spread to the liver is a frequent complication of colorectal cancer (CRC), occurring in almost half of the cases, for which personalized treatment strategies are highly desirable. To this aim, it has been proven that patient-derived mouse xenografts (PDX) of liver-metastatic CRC can be used to discover new therapeutic targets and determinants of drug resistance. To identify gene fusions in RNA-Seq data obtained from such PDX samples, we propose a novel pipeline that tackles the following issues: (i) discriminating human from murine RNA, to filter out transcripts contributed by the mouse stroma that supports the PDX; (ii) increasing sensitivity in case of suboptimal RNA-Seq coverage; (iii) prioritizing the detected chimeric transcripts by molecular features of the fusion and by functional relevance of the involved genes; (iv) providing appropriate sequence information for subsequent validation of the identified fusions. The pipeline, built on top of the Chimerascan (R. Iyer, 2011) and deFuse (McPherson, 2011) aligner tools, was successfully applied to RNA-Seq data from 11 PDX samples. Among the 299 fusion genes identified by the aforementioned tools, five passed all the filtering stages implemented in the proposed pipeline and were selected as biologically relevant fusions. Three of them were experimentally confirmed.

2014 - A Preliminary Analysis on HEp-2 Pattern Classification: Evaluating Strategies Based on Support Vector Machines and Subclass Discriminant Analysis [Capitolo/Saggio]
UL-ISLAM, Ihtesham; DI CATALDO, Santa; Bottino, ANDREA GIUSEPPE; Macii, Enrico; Ficarra, Elisa
abstract

The categorization of different staining patterns in HEp-2 cell slides by means of indirect immunofluorescence (IIF) is important for the differential diagnosis of autoimmune diseases. The clinical practice usually relies on the visual evaluation of the slides, which is time-consuming and subject to the specialist's experience. Thus, there is a growing demand for computer-aided solutions capable of automatically classifying HEp-2 staining patterns. In the attempt to identify the most suitable strategy for this task, in this work we compare two approaches based on Support Vector Machines and Subclass Discriminant Analysis. These techniques classify the available samples, characterized through a limited set of optimal textural attributes that are identified with a feature selection scheme. Our experimental results show that both strategies have a good concordance with the diagnosis of the human specialist, and that Subclass Discriminant Analysis (91% accuracy) performs better than Support Vector Machines (87% accuracy).

2014 - Computational Methods for CLIP-seq Data Processing [Articolo su rivista]
Paula H., Reyes Herrera; Ficarra, Elisa
abstract

RNA-binding proteins (RBPs) are at the core of post-transcriptional regulation and thus of gene expression control at the RNA level. One of the principal challenges in the field of gene expression regulation is to understand the mechanisms of action of RBPs. As a result of the recent evolution of experimental techniques, it is now possible to obtain the RNA regions recognized by RBPs on a transcriptome-wide scale. In fact, CLIP-seq protocols use the joint action of CLIP, crosslinking immunoprecipitation, and high-throughput sequencing to recover the transcriptome-wide set of interaction regions for a particular protein. Nevertheless, computational methods are necessary to process CLIP-seq experimental data and are key to advancing the understanding of gene regulatory mechanisms. Considering the importance of computational methods in this area, we present a review of the current status of computational approaches used and proposed for CLIP-seq data.

2014 - Dynamic Gap Selector: A Smith Waterman Sequence Alignment Algorithm with Affine Gap Model Optimisation [Relazione in Atti di Convegno]
Urgese, Gianvito; Paciello, Giulia; Acquaviva, Andrea; Ficarra, Elisa; Graziano, Mariagrazia; Zamboni, Maurizio
abstract

The Smith Waterman algorithm (S-W) is nowadays considered one of the best methods to perform local alignments of biological sequences characterizing proteins, DNA and RNA molecules. Indeed, S-W is able to ensure better accuracy levels with respect to the heuristic alignment algorithms by extensively exploring all the possible alignment configurations between the sequences under examination. It has been proven that the first amino acid (AA) or nucleotide (NT) inserted/deleted (which identifies a gap open) found during the alignment operations performed on sequences is more significant from a biological point of view than the subsequent ones (called gap extensions), making the so-called Affine Gap model a viable solution for biomolecule alignment. However, this version of the S-W algorithm is expensive both in terms of computation and in terms of memory requirements with respect to other less demanding solutions such as the ones using a Linear Gap model. In order to overcome these drawbacks we have developed an optimised version of the S-W algorithm based on the Affine Gap model called Dynamic Gap Selector (DGS S-W). Unlike the standard S-W Affine Gap method, the proposed DGS S-W method reduces the memory requirements from 3*N*M to N*M, where N and M represent the sizes of the compared sequences. In terms of computational costs, the proposed algorithm reduces by a factor of 2 the number of operations required by the standard Affine Gap model. The DGS S-W method has been tested on two protein and one RNA sequence datasets, showing mapping scores very similar to those reached with the classical S-W Affine Gap method and, at the same time, reduced computational costs and memory usage.
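
For reference, the standard Affine Gap formulation that the abstract above takes as its baseline can be sketched as follows. This is a textbook three-matrix (Gotoh-style) Smith-Waterman scorer, not the DGS S-W optimisation, whose details the abstract does not give; the function name and the scoring values are illustrative assumptions. The three (N+1) x (M+1) tables are what the 3*N*M memory figure refers to.

```python
def sw_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps (score only, no traceback).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    # Three tables: H (best score ending at i, j), E and F (alignments ending in a gap).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend the gap already tracked in E / F.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 floor is what makes the alignment local.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these values, a two-residue gap costs 4 instead of the 6 it would cost under a linear model charging 3 per gapped position, which is why a single longer gap is favoured over scattered ones.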

2014 - FunMod: A Cytoscape Plugin for Identifying Functional Modules in Undirected Protein–Protein Networks [Articolo su rivista]
Natale, M.; Benso, Alfredo; DI CARLO, Stefano; Ficarra, Elisa
abstract

The characterization of the interacting behaviors of complex biological systems is a primary objective in protein–protein network analysis and computational biology. In this paper we present FunMod, an innovative Cytoscape version 2.8 plugin that is able to mine undirected protein–protein networks and to infer sub-networks of interacting proteins intimately correlated with relevant biological pathways. This plugin may enable the discovery of new pathways involved in diseases. In order to describe the role of each protein within the relevant biological pathways, FunMod computes and scores three topological features of the identified sub-networks. By integrating the results from biological pathway clustering and topological network analysis, FunMod proved to be useful for the data interpretation and the generation of new hypotheses in two case studies.

2014 - Identifying sub-network functional modules in protein undirected networks [Relazione in Atti di Convegno]
Natale, Massimo; Benso, Alfredo; Di Carlo, Stefano; Ficarra, Elisa
abstract

2014 - Identifying sub-network functional modules in protein undirected networks [Relazione in Atti di Convegno]
Natale, Massimo; Benso, Alfredo; DI CARLO, Stefano; Ficarra, Elisa
abstract

Protein networks are commonly used to describe the interacting behaviours of complex biosystems. Bioinformatics must provide methods to mine undirected protein networks and to infer subnetworks of interacting proteins in order to identify relevant biological pathways. Here we present FunMod, an innovative Cytoscape version 2.8 plugin able to identify biologically significant sub-networks within informative protein networks, enabling new opportunities for elucidating pathways involved in diseases. Moreover, FunMod calculates three topological coefficients for each subnetwork, giving a better understanding of the cooperative interactions between proteins and discriminating the role played by each protein within a functional module. FunMod is the first Cytoscape plugin able to combine pathway and topological analysis, allowing the identification of the key proteins within sub-network functional modules.

2014 - Next-Generation Sequencing Analysis Revealed That BCL11B Chromosomal Translocation Cooperates with Point Mutations in the Pathogenesis of Acute Myeloid Leukemia [Abstract in Rivista]
Antonella, Padella; Giorgia, Simonetti; Viviana, Guadagnuolo; Emanuela, Ottaviani; Anna, Ferrari; Elisa, Zago; Francesca, Griggio; Marianna, Garonzi; Paciello, Giulia; Simona, Bernardi; Carmen, Baldazzi; Cristina, Papayannidis; Maria Chiara, Abbenante; Francesca, Volpato; Raffaele, Calogero; Nicoletta, Testoni; Ficarra, Elisa; Alberto, Ferrarini; Massimo, Delledonne; Ilaria, Iacobucci; Giovanni, Martinelli
abstract

2014 - Pegasus: a comprehensive annotation and prediction tool for detection of driver gene fusions in cancer [Articolo su rivista]
Abate, Francesco; Sakellarios, Zairis; Ficarra, Elisa; Acquaviva, Andrea; Chris H., Wiggins; Veronique, Frattini; Anna, Lasorella; Antonio, Iavarone; Giorgio, Inghirami; Raul, Rabadan
abstract

2014 - Subclass Discriminant Analysis of Morphological and Textural Features for HEp-2 Staining Pattern Classification [Articolo su rivista]
DI CATALDO, Santa; Bottino, ANDREA GIUSEPPE; UL-ISLAM, Ihtesham; FIGUEIREDO VIEIRA, Tiago; Ficarra, Elisa
abstract

Classifying HEp-2 fluorescence patterns in Indirect Immunofluorescence (IIF) HEp-2 cell imaging is important for the differential diagnosis of autoimmune diseases. The current technique, based on human visual inspection, is time-consuming, subjective and dependent on the operator's experience. Automating this process may be a solution to these limitations, making IIF faster and more reliable. This work proposes a classification approach based on Subclass Discriminant Analysis (SDA), a dimensionality reduction technique that provides an effective representation of the cells in the feature space, suitably coping with the high within-class variance typical of HEp-2 cell patterns. In order to generate an adequate characterization of the fluorescence patterns, we investigate the individual and combined contributions of several image attributes, showing that the integration of morphological, global and local textural features is the most suited for this purpose. The proposed approach provides an accuracy of the staining pattern classification of about 90%.

2013 - A novel pipeline for V(D)J junction identification using RNA-Seq paired-end reads [Relazione in Atti di Convegno]
Paciello, Giulia; Ficarra, Elisa; Alberto, Zamò; Chiara, Pighi; Carmelo, Foti; Abate, Francesco; Macii, Enrico; Acquaviva, Andrea
abstract

2013 - Acceleration of Coarse Grain Molecular Dynamics on GPU Architectures [Articolo su rivista]
Shkurti, Ardita; Mario, Orsi; Macii, Enrico; Ficarra, Elisa; Acquaviva, Andrea
abstract

Coarse grain (CG) molecular models have been proposed to simulate complex systems with lower computational overheads and longer timescales with respect to atomistic level models. However, their acceleration on parallel architectures such as Graphic Processing Units (GPU) presents original challenges that must be carefully evaluated. The objective of this work is to characterize the impact of CG model features on parallel simulation performance. To achieve this, we implemented a GPU-accelerated version of a CG molecular dynamics simulator, to which we applied specific optimizations for CG models, such as dedicated data structures to handle different bead type interactions, obtaining a maximum speed-up of 14 on the NVIDIA GTX480 GPU with Fermi architecture. We provide a complete characterization and evaluation of algorithmic and simulated system features of CG models impacting the achievable speed-up and accuracy of results, using three different GPU architectures as case studies.

2013 - Classification of HEp-2 staining patterns in ImmunoFluorescence images. Comparison of Support Vector Machines and Subclass Discriminant Analysis strategies [Relazione in Atti di Convegno]
UL-ISLAM, Ihtesham; DI CATALDO, Santa; Bottino, ANDREA GIUSEPPE; Ficarra, Elisa; Macii, Enrico
abstract

The anti-nuclear antibodies test is based on the visual evaluation of the intensity and staining pattern in HEp-2 cell slides by means of indirect immunofluorescence (IIF) imaging, revealing the presence of autoantibodies responsible for important immune pathologies. In particular, the categorization of the staining pattern is crucial for differential diagnosis, because it provides information about the autoantibody type. Manual classification is very time-consuming and not very reliable, since it depends on the subjectivity and experience of the specialist. This motivates the growing demand for computer-aided solutions able to perform staining pattern classification in a fully automated way. In this work we compare two classification techniques, based respectively on Support Vector Machines and Subclass Discriminant Analysis. A set of textural features characterizing the available samples is first extracted. Then, a feature selection scheme is applied in order to produce different datasets, each containing a limited number of image attributes that are best suited to the classification purpose. Experiments on IIF images showed that our computer-aided method is able to identify staining patterns with an average accuracy of about 91% and demonstrated, for this specific problem, a better performance of Subclass Discriminant Analysis with respect to Support Vector Machines.

2013 - Gelsius: A Literature-Based Workflow for Determining Quantitative Associations between Genes and Biological Processes [Articolo su rivista]
Abate, Francesco; Acquaviva, Andrea; Ficarra, Elisa; Piva, R.; Macii, Enrico
abstract

2013 - Integration of Literature with Heterogeneous Information for Genes Correlation Scoring [Articolo su rivista]
Abate, Francesco; Acquaviva, Andrea; Ficarra, Elisa; Macii, Enrico
abstract

2013 - Optimization of Molecular Dynamics Simulations from a High Performance Computing Viewpoint [Poster]
Shkurti, Ardita; Mario, Orsi; Acquaviva, Andrea; Ficarra, Elisa; Macii, Enrico; Sophia, Wheeler; Jonathan W., Essex
abstract

2012 - A Novel Gaussian Extrapolation Approach for 2D Gel Electrophoresis Saturated Protein Spots [Articolo su rivista]
Natale, Massimo; Caiazzo, A.; Bucci, E. M.; Ficarra, Elisa
abstract

Analysis of images obtained from two-dimensional gel electrophoresis (2D-GE) is a topic of utmost importance in bioinformatics research, since commercial and academic software available currently has proven to be neither completely effective nor fully automatic, often requiring manual revision and refinement of computer generated matches. In this work, we present an effective technique for the detection and the reconstruction of over-saturated protein spots. Firstly, the algorithm reveals overexposed areas, where spots may be truncated, and plateau regions caused by smeared and overlapping spots. Next, it reconstructs the correct distribution of pixel values in these overexposed areas and plateau regions, using a two-dimensional least-squares fitting based on a generalized Gaussian distribution. Pixel correction in saturated and smeared spots allows more accurate quantification, providing more reliable image analysis results. The method is validated for processing highly exposed 2D-GE images, comparing reconstructed spots with the corresponding non-saturated image, demonstrating that the algorithm enables correct spot quantification.
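The reconstruction idea described above can be illustrated with a deliberately simplified sketch: a one-dimensional intensity profile and a plain (not generalized) Gaussian, fitted by least squares on the logarithm of the unsaturated samples only. The paper's method is two-dimensional and uses a generalized Gaussian; the function name `fit_gaussian_profile` and the synthetic data in the usage note are assumptions made for illustration.

```python
import math

def fit_gaussian_profile(xs, ys, saturation):
    """Fit A * exp(-(x - mu)^2 / (2 sigma^2)) to the unsaturated samples.

    Simplified 1D sketch: samples at or above `saturation` are ignored and
    the peak is extrapolated from the remaining flanks of the spot.
    """
    pts = [(x, math.log(y)) for x, y in zip(xs, ys) if 0 < y < saturation]
    # Least squares for ln y = c0 + c1*x + c2*x^2 via the normal equations.
    S = [sum(x ** k for x, _ in pts) for k in range(5)]
    T = [sum((x ** k) * ly for x, ly in pts) for k in range(3)]
    M = [[S[0], S[1], S[2], T[0]],
         [S[1], S[2], S[3], T[1]],
         [S[2], S[3], S[4], T[2]]]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    c0, c1, c2 = (M[i][3] / M[i][i] for i in range(3))
    # Convert the parabola coefficients back to Gaussian parameters.
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    amplitude = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return amplitude, mu, sigma
```

As a usage example, a synthetic profile with true amplitude 100 clipped at 60 can be passed in; the fit then extrapolates the truncated peak from the six unsaturated flank samples.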

2012 - A novel Gaussian fitting approach for 2D gel electrophoresis saturated protein spots [Relazione in Atti di Convegno]
Natale, Massimo; Caiazzo, A.; Bucci, E. M.; Ficarra, Elisa
abstract

Analysis of 2D-GE images is a hot topic in bioinformatics research, since currently available commercial and academic software has proven to be neither really effective nor completely automatic, often requiring manual revision of spot detection and refinement of computer-generated matches. In this work, we present an effective technique for the detection and the reconstruction of over-saturated protein spots. First, it reveals overexposed areas where spots may be truncated, and plateau regions caused by smeared and overlapping spots. Next, the correct distribution of pixel values in the overexposed areas and plateau regions is recovered by a two-dimensional fitting based on a generalized Gaussian distribution approximating the spot volume. Pixel correction according to the generalized Gaussian curve in saturated and smeared spots allows more accurate quantification, providing more reliable image analysis results. As validation, we process a highly exposed 2D-GE image containing saturated spots and compare it with the corresponding non-saturated image, confirming that the method can effectively fix the saturated spots and enable correct spot quantification.

2012 - A novel analysis flow for fused transcripts discovery from paired-end RNA-SEQ data [Relazione in Atti di Convegno]
Abate, F.; Paciello, G.; Acquaviva, A.; Ficarra, E.; Ferrarini, A.; Delledonne, M.; Macii, E.
abstract

Chimeric phenomena have recently been recognized to play a significant role in the investigation and understanding of the fundamental mechanisms behind widespread pathologies such as tumors. In this paper we present a new methodology for the detection of fusion transcripts from Next Generation Sequencing (NGS) data. The methodology exploits short paired-end reads coming from RNA-Seq experiments to determine a list of fused genes and to identify the fusion boundaries exactly, so that the exact chimeric sequence can be analysed. Both known and unknown transcripts are considered, enabling the detection of fusions involving unannotated genes. An automated toolflow that reports a set of candidate fused genes and the associated junctions has been implemented and applied to a publicly available melanoma data set.

2012 - ALK signaling and target therapy in anaplastic large cell lymphoma [Articolo su rivista]
F., Tabbó; A., Barreca; R., Piva; G., Inghirami; R., Bruna; D., Corino; D., Cortese; R., Crescenzo; G., Cuccuru; F., Di Giacomo; A., Fioravanti; M., Ladetto; I., Landra; K., Messana; R., Machiorlatti; B., Martinoglio; E., Medico; M., Mossino; E., Pellegrino; M., Todaro; P., Campisi; L., Chiusa; A., Chiappella; D., Novero; U., Vitolo; Abate, Francesco; Acquaviva, Andrea; Ficarra, Elisa; R., Freilone; M., Chilosi; A., Zamó; F., Facchetti; S., Lonardi; A., De Chiara; F., Fulciniti; C., Doglioni; M., Ponzoni; L., Agnelli; A., Neri; K., Todoerti; C., Agostinelli; P. P., Piccaluga; S., Pileri; B., Falini; E., Tiacci; P., Van Loo; T., Tousseyn; C., De Wolf Peeters; E., Geissinger; H. K., Muller Hermelink; A., Rosenwald; M. A., Piris; M. E., Rodriguez; F., Bertoni; M., Boi; I., Kwee
abstract

The discovery by Morris et al. (1994) of the genes contributing to the t(2;5)(p23;q35) translocation has laid the foundation for a molecular-based recognition of anaplastic large cell lymphoma and highlighted the need for a further stratification of T-cell neoplasia. Likewise, the detection of anaplastic lymphoma kinase (ALK) genetic lesions among many human cancers has defined unique subsets of cancer patients, providing new opportunities for innovative therapeutic interventions. The objective of this review is to appraise the molecular mechanisms that drive ALK-mediated transformation and maintain the neoplastic phenotype. The understanding of these events will allow the design and implementation of novel tailored strategies for a well-defined subset of cancer patients.

2012 - Applying Textural Features to the Classification of HEp-2 Cell Patterns in IIF images [Relazione in Atti di Convegno]
Di Cataldo, Santa; Bottino, Andrea Giuseppe; Ficarra, Elisa; Macii, Enrico
abstract

The analysis of anti-nuclear antibodies in HEp-2 cells by indirect immunofluorescence (IIF) is fundamental for the diagnosis of important immune pathologies; in particular, classifying the staining pattern of the cell is critical for the differential diagnosis of several types of diseases. Current tests based on human evaluation are time-consuming and suffer from very high variability, which impacts on the reliability of the results. As a solution to this problem, in this work we propose a technique that performs automated classification of the staining pattern. Our method combines textural feature extraction and a two-step feature selection scheme to select a limited number of image attributes that are best suited to the classification purpose and then recognizes the staining pattern by means of a Support Vector Machine module. Experiments on IIF images showed that our method is able to identify staining patterns with average accuracy of about 87%.

2012 - Bellerophontes: a RNA-seq data analysis framework for chimeric transcripts discovery based on accurate fusion model [Articolo su rivista]
Abate, Francesco; Acquaviva, Andrea; Paciello, Giulia; Foti, Carmelo; Ficarra, Elisa; Ferrarini, A.; Delle donne, M.; Iacobucci, I.; Soverini, S.; Martinelli, G.; Macii, Enrico
abstract

2012 - CHARACTERIZATION OF COARSE GRAIN MOLECULAR DYNAMIC SIMULATION PERFORMANCE ON GRAPHIC PROCESSING UNIT ARCHITECTURES [Relazione in Atti di Convegno]
Shkurti, Ardita; Acquaviva, Andrea; Ficarra, Elisa; Orsi, Mario; Macii, Enrico
abstract

2012 - Computer-aided techniques for Chromogenic Immunohistochemistry: Status and Directions [Articolo su rivista]
Di Cataldo, Santa; Ficarra, Elisa; Macii, Enrico
abstract

2012 - Multiscale Modelling of Cellular Actin Filaments: From Atomistic Molecular to Coarse Grained Dynamics [Articolo su rivista]
Deriu, Marco Agostino; Shkurti, Ardita; Paciello, Giulia; Bidone, Tamara Carla; Morbiducci, Umberto; Ficarra, Elisa; Audenino, Alberto; Acquaviva, Andrea
abstract

In this article, we present a computational multiscale model for the characterization of subcellular proteins. The model is encoded inside a simulation tool that builds coarse-grained (CG) force fields from atomistic simulations. Equilibrium molecular dynamics simulations on an all-atom model of the actin filament are performed. Then, using the statistical distribution of the distances between pairs of selected groups of atoms at the output of the MD simulations, the force field is parameterized using the Boltzmann inversion approach. This CG force field is further used to characterize the dynamics of the protein via Brownian dynamics simulations. This combination of methods into a single computational tool flow enables the simulation of actin filaments with length up to 400 nm, extending the time and length scales compared to state-of-the-art approaches. Moreover, the proposed multiscale modeling approach allows to investigate the relationship between atomistic structure and changes on the overall dynamics and mechanics of the filament and can be easily (i) extended to the characterization of other subcellular structures and (ii) used to investigate the cellular effects of molecular alterations due to pathological conditions.
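The Boltzmann inversion step mentioned above can be sketched in its simplest textbook form, U(r) = -k_B T ln P(r), applied to a histogram of sampled distances. This is only an assumed minimal version: it omits any Jacobian correction, smoothing or iterative refinement that a real parameterization may use, and the function name, bin layout and units (kcal/mol) are choices made for the example.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_inversion(distances, r_min, r_max, n_bins, temperature=300.0):
    """Turn sampled distances into a tabulated potential U(r) = -kB*T*ln P(r).

    Returns a list of (bin_centre, potential) pairs, shifted so that the
    minimum of the potential is zero. Empty bins are skipped.
    """
    width = (r_max - r_min) / n_bins
    counts = [0] * n_bins
    for r in distances:
        if r_min <= r < r_max:
            # Guard against floating-point rounding at the upper edge.
            k = min(int((r - r_min) / width), n_bins - 1)
            counts[k] += 1
    total = sum(counts)
    potential = []
    for k, c in enumerate(counts):
        if c == 0:
            continue  # ln(0) is undefined; no estimate for unsampled bins
        centre = r_min + (k + 0.5) * width
        potential.append((centre, -KB * temperature * math.log(c / total)))
    u0 = min(u for _, u in potential)
    return [(r, u - u0) for r, u in potential]
```

The most populated bin ends up at zero energy, and a bin sampled three times less often sits k_B T ln 3 above it.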

2012 - New Software for the Identification and Characterization of Peptides Generated during Fontina Cheese Ripening Using Mass Spectrometry Data [Articolo su rivista]
Valentini, S.; Natale, Massimo; Ficarra, Elisa; Barmaz, A.
abstract

The aim of this work was to design and implement new bioinformatics software able to identify protein peptides from the peaks that arise from in-source or MS/MS fragmentation. The oligopeptide fraction was extracted from Fontina cheese at different ages of ripening and subsequently analyzed by LC/MS/MS. On the resulting total ion chromatograms, the peptides were identified both by a method based on the in-source fragmentation detectable with a single-quadrupole mass analyzer and by the newly developed software. This software performs an in-silico digestion of the major milk proteins, calculates all the possible peptide fragments generated by the loss of the first N- or C-terminal amino acids and, finally, matches the experimental ion chromatogram with the in-silico generated theoretical spectrum to identify the exact amino-acid sequence of the unknown oligopeptide. This tool provides useful insights into the proteolytic processes that occur during Fontina cheese aging, leading to better knowledge of the functional features of the proteolysis end products.
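The fragment enumeration and matching described above can be sketched roughly as follows. This is an illustrative reconstruction, not the published software: the function names, the `min_len` cutoff and the mass tolerance are assumptions, and the match is done on neutral monoisotopic peptide masses rather than on charged m/z values.

```python
# Standard monoisotopic residue masses in Da; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def terminal_fragments(seq, min_len=2):
    """Peptides obtained by removing residues from the N- or the C-terminus."""
    frags = set()
    for k in range(1, len(seq) - min_len + 1):
        frags.add(seq[k:])    # loss of the first k N-terminal residues
        frags.add(seq[:-k])   # loss of the last k C-terminal residues
    return sorted(frags)

def match_peak(observed_mass, candidates, tol=0.01):
    """Candidates whose theoretical mass lies within `tol` Da of the peak."""
    return [p for p in candidates if abs(peptide_mass(p) - observed_mass) <= tol]
```

For a hypothetical parent peptide `"GAVL"`, `terminal_fragments` yields the two N-terminal and two C-terminal truncations, and `match_peak` then selects whichever of them agrees with an observed mass.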

2012 - One Decade of Development and Evolution of MicroRNA Target Prediction Algorithms [Articolo su rivista]
Paula H., Reyes Herrera; Ficarra, Elisa
abstract

Nearly two decades have passed since the publication of the first study reporting the discovery of microRNAs (miRNAs). The key role of miRNAs in post-transcriptional gene regulation led to the performance of an increasing number of studies focusing on origins, mechanisms of action and functionality of miRNAs. In order to associate each miRNA to a specific functionality it is essential to unveil the rules that govern miRNA action. Despite the fact that there has been significant improvement exposing structural characteristics of the miRNA-mRNA interaction, the entire physical mechanism is not yet fully understood. In this respect, the development of computational algorithms for miRNA target prediction becomes increasingly important. This manuscript summarizes the research done on miRNA target prediction. It describes the experimental data currently available and used in the field and presents three lines of computational approaches for target prediction. Finally, the authors put forward a number of considerations regarding current challenges and future directions.

2012 - Optimizing Splicing Junction Detection in Next Generation Sequencing Data on a Virtual-GRID Infrastructure [Relazione in Atti di Convegno]
Terzo, Olivier; Mossucca, L; Acquaviva, Andrea; Abate, Francesco; Ficarra, Elisa; Provenzano, R.
abstract

RNA-seq, the new protocol for sequencing the messenger RNA in a cell, produces millions of short sequence fragments. Next Generation Sequencing technology allows more accurate analysis but increases the need for computational resources. This paper describes the optimization of an RNA-seq analysis pipeline devoted to the detection of splicing variants, aimed at reducing computation time and providing a multi-user/multi-sample environment. This work brings two main contributions. First, we optimized a well-known algorithm called TopHat by parallelizing some sequential mapping steps. Second, we designed and implemented a hybrid virtual GRID infrastructure that efficiently executes multiple instances of TopHat running on different samples or on behalf of different users, thus optimizing the overall execution time and enabling a flexible multi-user environment.

2012 - Reverse Engineering of TopHat: Splice Junction Mapper for Improving Computational Aspect [Relazione in Atti di Convegno]
Terzo, Olivier; Mossucca, L; Acquaviva, Andrea; Abate, Francesco; Ficarra, Elisa; Provenzano, R.
abstract

TopHat is a fast splice junction mapper for Next Generation Sequencing analysis, a technology for functional genomic research. Next Generation Sequencing technology allows more accurate analysis but increases the amount of data to process, which opens new challenges in the development of tools and computational infrastructures. We present a solution that covers both software and hardware aspects. On the software side, after a reverse engineering phase, we improve the TopHat algorithm by making it parallelizable; on the hardware side, we implement a hybrid infrastructure combining grid and virtual grid computing. Moreover, the system provides a multi-sample environment and processes data automatically, in a way that is fully transparent to the user.

2012 - Towards Low Cost Virtual Biological Laboratories: Molecular Modelling Simulation on Commodity Hardware [Poster]
Shkurti, Ardita; Acquaviva, Andrea; Ficarra, Elisa; Orsi, M.; Macii, Enrico; Essex, J. W.
abstract

Many essential cell processes, such as the conformation of embedded proteins, membrane permeability, interaction with drugs and signalling, are directly connected to the molecular dynamics of cell membranes. The importance of this biology has led to an intensifying demand for hardware and software optimized models and tools, implemented on commodity high performance low-cost hardware, in order to provide the scientific community with virtual low cost laboratories. In the light of these considerations, we implemented an accelerated version of a molecular dynamics coarse-grain lipid bilayers simulator on commodity Graphic Processing Units (GPU) architectures. The characteristics of this molecular dynamics model, such as new force fields for pair potentials that include an unconventional representation for water and charges, were particularly challenging. We introduced new algorithms and data structures required by coarse-grain models compared to atomistic ones, for the modelling of the integration timestep, neighbour list generation, and nonbonded force interactions. We characterized the impact on performance of biological systems of differing complexity in terms of size, particle type and timestep. We also compared the simulations of many particle-type systems against single particle-type systems, to evaluate the overhead of additional structures needed to model more complex molecules. Moreover, we performed a detailed analysis on the profiling of the simulation code and its execution flows due to the computation of the non-bonded forces. Finally, we characterized the acceleration and accuracy of the simulations on three GPUs having different computation capabilities and parallelism, achieving one order of magnitude faster simulation execution times.

2011 - A Molecular Dynamics study of a miRNA:mRNA interaction [Articolo su rivista]
Paciello, Giulia; Acquaviva, Andrea; Ficarra, Elisa; Deriu, Marco Agostino; Macii, Enrico
abstract

2011 - A new latent semantic analysis based methodology for knowledge extraction from biomedical literature and biological pathways databases [Relazione in Atti di Convegno]
Abate, F.; Acquaviva, A.; Ficarra, E.; Macii, E.
abstract

Nowadays, a considerable number of genetic and biomedical studies are disseminated on the Web and freely available. While this opens the way to new scenarios of cooperative research, it also makes knowledge retrieval and extraction an extremely time-consuming operation. In this context, new tools and algorithms are needed that automatically support scientists in reaching a reliable interpretation of the complex interactions among biological entities. In this paper we present a new methodology aimed at quantifying the degree of biological correlation among biomedical terms present in the literature. The proposed method overcomes the limitation of current tools based on public literature information only, by exploiting the trustworthy information provided by biological pathway databases. We demonstrate how to integrate trusted pathway information in a semantic correlation extraction chain based on the UMLS Metathesaurus and relying on PubMed as the literature database. The obtained results underline the importance of automatically quantifying the degree of correlation among biomedical terms in order to support scientists' research activity.

2011 - A novel framework for chimeric transcript detection based on accurate gene fusion model [Relazione in Atti di Convegno]
Abate, Francesco; Acquaviva, Andrea; Ficarra, Elisa; Paciello, Giulia; Macii, Enrico; A., Ferrarini; M., Delledonne; S., Soverini; G., Martinelli
abstract

2011 - An effective grid infrastructure for efficiently support high throughput sequencing analysis [Relazione in Atti di Convegno]
Terzo, Olivier; Mossucca, L.; Ruiu, Pietro; Abate, Francesco; Acquaviva, Andrea; Ficarra, Elisa; Macii, Enrico
abstract

2011 - Automated Segmentation of Cells with IHC Membrane Staining [Articolo su rivista]
Ficarra, Elisa; Di Cataldo, Santa; Acquaviva, Andrea; Macii, Enrico
abstract

This study presents a fully automated membrane segmentation technique for immunohistochemical tissue images with membrane staining, which is a critical task in computerized immunohistochemistry (IHC). Membrane segmentation is particularly tricky in immunohistochemical tissue images because the cellular membranes are visible only in the stained tracts of the cell, while the unstained tracts are not visible. Our automated method provides accurate segmentation of the cellular membranes in the stained tracts and reconstructs the approximate location of the unstained tracts using nuclear membranes as a spatial reference. Accurate cell-by-cell membrane segmentation allows per cell morphological analysis and quantification of the target membrane proteins that is fundamental in several medical applications such as cancer characterization and classification, personalized therapy design, and for any other applications requiring cell morphology characterization. Experimental results on real datasets from different anatomical locations demonstrate the wide applicability and high accuracy of our approach in the context of IHC analysis.

2011 - Binding free energy calculation via molecular dynamics simulations for a miRNA:mRNA interaction [Relazione in Atti di Convegno]
Paciello, G.; Acquaviva, A.; Ficarra, E.; Deriu, M. A.; Grosso, A.; Macii, E.
abstract

In this paper we present a methodology to evaluate the binding free energy of a miRNA-mRNA complex through Molecular Dynamics-Thermodynamic Integration simulations. We applied our method to the C. elegans let-7 miRNA:lin-41 mRNA complex, known to be a validated miRNA:mRNA interaction, in order to evaluate the energetic stability of the structure. The methodology has been designed to face the various challenges of nucleic acid simulations and binding free energy computations and to allow an optimal trade-off between accuracy and computational cost.
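The final step of a thermodynamic integration calculation, in its generic textbook form, is a numerical integral of the ensemble-averaged derivative of the potential energy with respect to the coupling parameter lambda, from 0 to 1. The sketch below uses the trapezoidal rule; it is an assumed minimal version, not the protocol of the paper, and the function name and the numbers in the usage note are invented for illustration.

```python
def thermodynamic_integration(lambdas, mean_dudl):
    """Free energy difference as the integral of <dU/dlambda> over lambda.

    `lambdas` are the coupling values at which simulations were run (sorted,
    typically from 0 to 1) and `mean_dudl` the corresponding ensemble
    averages of dU/dlambda. Integration uses the trapezoidal rule.
    """
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda windows")
    delta_g = 0.0
    for k in range(len(lambdas) - 1):
        step = lambdas[k + 1] - lambdas[k]
        delta_g += 0.5 * (mean_dudl[k] + mean_dudl[k + 1]) * step
    return delta_g
```

For example, made-up averages of 2, 4 and 6 (in arbitrary energy units) at lambda = 0, 0.5 and 1 integrate to 4. In practice the accuracy depends on how many lambda windows are simulated and how well each average has converged.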

2011 - Improving Latent Semantic Analysis of Biomedical Literature Integrating UMLS Metathesaurus and Biomedical Pathways Databases [Relazione in Atti di Convegno]
Abate, F.; Ficarra, E.; Acquaviva, A.; Macii, E.
abstract

The increasing pace of biotechnological advances has produced an unprecedented amount of both experimental data and biological information, mostly disseminated on the web. However, the heterogeneity of data organization and the different knowledge representations pose new challenges in integrating and extracting the biological information that is fundamental for correctly interpreting experimental results. In the present work we introduce a new methodology for quantitatively scoring the degree of biological correlation among biological terms occurring in biomedical abstracts. The proposed flow is based on the latent semantic analysis of biomedical literature coupled with the UMLS Metathesaurus and PubMed literature information. The results demonstrate that the structured and consolidated knowledge in the UMLS and pathway databases efficiently improves the accuracy of the latent semantic analysis of biomedical literature. © Springer-Verlag Berlin Heidelberg 2013.

2011 - Motion artifact correction in ASL images: an improved automated procedure [Relazione in Atti di Convegno]
DI CATALDO, Santa; Ficarra, Elisa; Acquaviva, Andrea; Macii, Enrico
abstract

2011 - Solid state photodetectors for nuclear medical imaging applications [Relazione in Atti di Convegno]
Mazzillo, M.; Fallica, P. G.; Ficarra, Elisa; Messina, A.; Romeo, M.; Zafalon, R.
abstract

2011 - miREE: miRNA Recognition Elements Ensemble [Articolo su rivista]
REYES HERRERA, PAULA HELENA; Ficarra, Elisa; Acquaviva, Andrea; Macii, Enrico
abstract

2010 - Achieving the Way for Automated Segmentation of Nuclei in Cancer Tissue Images through Morphology-Based Approach: a Quantitative Evaluation [Articolo su rivista]
DI CATALDO, Santa; Ficarra, Elisa; Acquaviva, Andrea; Macii, E.
abstract

2010 - An Automated Tool for Scoring Biomedical Terms Correlation Based on Semantic Analysis [Relazione in Atti di Convegno]
Abate, Francesco; Ficarra, Elisa; Acquaviva, Andrea; Macii, Enrico
abstract

2010 - Automated segmentation of tissue images for computerized IHC analysis [Articolo su rivista]
Di Cataldo, Santa; Ficarra, Elisa; Acquaviva, Andrea; Macii, Enrico
abstract

This paper presents two automated methods for the segmentation of immunohistochemical tissue images that overcome the limitations of the manual approach as well as of the existing computerized techniques. The first independent method, based on unsupervised color clustering, recognizes automatically the target cancerous areas in the specimen and disregards the stroma; the second method, based on colors separation and morphological processing, exploits automated segmentation of the nuclear membranes of the cancerous cells. Extensive experimental results on real tissue images demonstrate the accuracy of our techniques compared to manual segmentations; additional experiments show that our techniques are more effective in immunohistochemical images than popular approaches based on supervised learning or active contours. The proposed procedure can be exploited for any applications that require tissues and cells exploration and to perform reliable and standardized measures of the activity of specific proteins involved in multi-factorial genetic pathologies.

2010 - GPU acceleration of simulation tool for lipid-bilayers [Relazione in Atti di Convegno]
Orsi, M.; Shkurti, A.; Acquaviva, A.; Ficarra, E.; Macii, E.; Ruggiero, M.
abstract

Nowadays the need for powerful hardware architectures, which allow for high throughput data analysis and calculus, is fundamental especially for biological applications. We have been focused on utilizing the Graphic Processing Unit (GPU) architectures of NVIDIA for accelerating a lipid bilayer simulation tool for biomembranes. ©2010 IEEE.

2010 - MicroRNA target prediction and exploration through candidate binding sites generation [Relazione in Atti di Convegno]
Reyes-Herrera, P. H.; Acquaviva, A.; Ficarra, E.; Macii, E.
abstract

Gene regulation is one of the most important processes in molecular biology. In recent years the microRNA molecule, one of the non-coding RNAs involved in the process, has been the focus of several studies. Computational research in this area has gained notable importance, considering the small amount of experimental information available and the limited understanding of the microRNA binding mechanism. This article deals with microRNA target prediction and presents an innovative method for it. First, it generates a set of promising binding sites for a given microRNA using a Genetic Algorithm; at the same time, a set of target genes is selected based on the biological process under study. Second, the set of promising binding sites is mapped onto the selected set of target genes in order to provide real binding sites. Finally, the resulting targets are filtered according to a biological or structural property. The objectives are to provide a flexible method that can easily incorporate new knowledge, is independent of the availability of experimental information, and can give hints for research into new characteristics of microRNA binding sites, such as motifs. The results present some of these novel properties and a comparison with the most frequently used methods in the field. © 2010 IEEE.

2009 - Extraction of Constraints from Biological Data [Capitolo/Saggio]
Apiletti, Daniele; Bruno, Giulia; Ficarra, Elisa; Baralis, Elena Maria

2009 - Novel Method for MicroRNA Target Prediction Using a Genetic Algorithm [Relazione in Atti di Convegno]
Reyes Herrera, Paula Helena; Acquaviva, Andrea; Ficarra, Elisa; Macii, Enrico

2008 - Automated Discrimination of Pathological Regions in Tissue Images: Unsupervised Clustering vs Supervised SVM Classification [Capitolo/Saggio]
Di Cataldo, Santa; Ficarra, Elisa; Macii, Enrico
abstract

Recognizing and isolating cancerous cells from non-pathological tissue areas (e.g. connective stroma) is crucial for fast and objective immunohistochemical analysis of tissue images. This operation enables the subsequent application of fully-automated techniques for quantitative evaluation of protein activity, since it removes the need for a prior manual selection of the representative pathological areas in the image, and for imaging only the purely cancerous portions of the tissue. In this paper we present a fully-automated method based on unsupervised clustering that produces tissue segmentations highly comparable with those provided by a skilled operator, achieving an average accuracy of 90%. Experimental results on a heterogeneous dataset of immunohistochemical lung cancer tissue images demonstrate that our proposed unsupervised approach exceeds the accuracy of a theoretically superior supervised method, the Support Vector Machine (SVM), by 8%.

2008 - Fully-Automated Segmentation of Tumor Areas in Tissue Confocal Images [Relazione in Atti di Convegno]
Di Cataldo, Santa; Ficarra, Elisa; Macii, Enrico

2008 - Joint co-clustering: co-clustering of genomic and clinical bioimaging data [Articolo su rivista]
Ficarra, Elisa; De Micheli, G; Yoon, S; Benini, L; Macii, Enrico

2008 - Segmentation of Nuclei in Cancer Tissue Images: Contrasting Active Contours with Morphology-Based Approach [Relazione in Atti di Convegno]
Di Cataldo, Santa; Ficarra, Elisa; Acquaviva, Andrea; Macii, Enrico

2008 - Temporal Association Rules for Gene Regulatory Networks [Relazione in Atti di Convegno]
Baralis, Elena Maria; Bruno, Giulia; Ficarra, Elisa

2007 - Gene-Markers Representation for Microarray Data Integration [Relazione in Atti di Convegno]
Baralis, Elena Maria; Ficarra, Elisa; Fiori, Alessandro; Macii, Enrico

2007 - Selection of Tumor Areas and Segmentation of Nuclear Membranes in Tissue Confocal Images: a Fully-Automated Approach [Relazione in Atti di Convegno]
Di Cataldo, Santa; Ficarra, Elisa; Macii, Enrico

2006 - Bioimaging and Clinical Genomics [Relazione in Atti di Convegno]
Ficarra, E.; De Micheli, G.; Yoon, S.; Benini, L.; Macii, E.

2006 - Clinical bioimaging and functional genomics [Relazione in Atti di Convegno]
Ficarra, Elisa; Yoon, S; Benini, L; Macii, E; De Micheli, G.

2006 - Computer-aided evaluation of protein expression in pathological tissue images [Relazione in Atti di Convegno]
Ficarra, Elisa; Macii, Enrico; Benini, L; De Micheli, G.

2006 - Data Cleaning and Semantic Improvement in Biological Databases [Articolo su rivista]
Apiletti, Daniele; Bruno, Giulia; Ficarra, Elisa; Baralis, Elena Maria

2006 - Optimized Techniques for DNA structural properties investigation [Articolo su rivista]
Masotti, D; Ficarra, Elisa; Benini, L; Macii, Enrico; Zuccheri, G.

2005 - Automated DNA Fragments Recognition and Sizing through AFM Image Processing [Articolo su rivista]
Ficarra, Elisa; Benini, L; Macii, Enrico; Zuccheri, G.
abstract

This paper presents an automated algorithm to determine DNA fragment size from atomic force microscope (AFM) images and to extract the molecular profiles. The sizing of DNA fragments is a widely used procedure for investigating the physical properties of individual or protein-bound DNA molecules. Several real and computer-generated AFM images were tested for different pixel and fragment sizes and for different background noises. The automated approach minimizes processing time with respect to manual and semi-automated DNA sizing. Moreover, the DNA molecule profile recognition can be used to perform further structural analysis. For computer-generated images, the root mean square error incurred by the automated algorithm in the length estimation is 0.6% for a 7.8 nm image pixel size and 0.34% for a 3.9 nm image pixel size. For real AFM images we obtain a distribution of lengths with a standard deviation of 2.3% of the mean and a measured average length very close to the real one, with an error around 0.33%.

2005 - Automatic Intrinsic DNA Curvature Computation from AFM Images [Articolo su rivista]
Ficarra, Elisa; Masotti, D; Benini, L; Macii, Enrico; Zuccheri, G; Samori, B.

2004 - A Robust Algorithm for Automated Analysis of DNA Molecules in AFM Images [Relazione in Atti di Convegno]
Ficarra, Elisa; Benini, L; Macii, Enrico; Zuccheri, G.

2004 - Techniques for Enhancing Computation of DNA Curvature Molecules [Relazione in Atti di Convegno]
Masotti, D; Ficarra, Elisa; Macii, Enrico; Benini, L.

2002 - Automated DNA sizing in atomic force microscope images [Relazione in Atti di Convegno]
Ficarra, Elisa
abstract

An automated algorithm is presented to determine DNA fragment size from Atomic Force Microscope images. Several real and synthetic images were tested for different image and fragment sizes and different background noises. The automated approach makes it possible to minimize processing time with respect to manual DNA sizing and to extract information that can be used to perform further analysis on the molecules. For computer-generated test images the percentage error in length estimation is less than 1% and its average value is 0.4%. For real images the deviation with respect to manually performed length estimation is around 1%.

Università degli studi di Modena e Reggio Emilia
