
Giorgia FRANCHINI

Fixed-term Researcher (Law 240/10)
Department of Physics, Informatics and Mathematics (former Mathematics site)




Publications

2023 - A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction [Journal article]
Franchini, G.; Porta, F.; Ruggiero, V.; Trombini, I.
abstract

Many optimization problems arising from machine learning applications can be cast as the minimization of the sum of two functions: the first typically represents the expected risk, which in practice is replaced by the empirical risk, while the second imposes a priori information on the solution. Since in general the first term is differentiable and the second one is convex, proximal gradient methods are well suited to address such optimization problems. However, when dealing with large-scale machine learning problems, the computation of the full gradient of the differentiable term can be prohibitively expensive, making these algorithms unsuitable. For this reason, proximal stochastic gradient methods have been extensively studied in the optimization literature over the last decades. In this paper we develop a proximal stochastic gradient algorithm based on two main ingredients: a technique to dynamically reduce the variance of the stochastic gradients along the iterative process, combined with a descent condition in expectation for the objective function, aimed at fixing the value of the steplength parameter at each iteration. For general objective functionals, the a.s. convergence of the limit points of the sequence generated by the proposed scheme to stationary points can be proved. For convex objective functionals, both the a.s. convergence of the whole sequence of iterates to a minimum point and an O(1/k) convergence rate for the objective function values are shown. The practical implementation of the proposed method requires neither the computation of the exact gradient of the empirical risk during the iterations nor the tuning of an optimal value for the steplength. An extensive numerical experimentation highlights that the proposed approach is robust with respect to the setting of the hyperparameters and competitive compared to state-of-the-art methods.
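As a rough illustration of the kind of scheme discussed in this abstract, the following Python sketch runs a proximal stochastic gradient loop on an l1-regularized least-squares toy problem; the function names and the mini-batch quadratic-model backtracking test are assumptions for illustration, not the authors' exact variance-reduction and steplength rules.

import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1 (the convex non-smooth term in this sketch).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sgd(loss_batch, grad_batch, x0, n_samples, lam=1e-2, alpha0=1.0,
             batch_size=64, epochs=5, shrink=0.5, rng=None):
    # Proximal stochastic gradient loop; the steplength is found by backtracking
    # until the mini-batch loss lies below its quadratic model (a cheap surrogate
    # of the descent-in-expectation condition described in the abstract).
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(epochs):
        for _ in range(max(1, n_samples // batch_size)):
            idx = rng.choice(n_samples, size=batch_size, replace=False)
            f, g = loss_batch(x, idx), grad_batch(x, idx)
            alpha = alpha0
            while alpha > 1e-10:
                x_new = soft_threshold(x - alpha * g, alpha * lam)
                d = x_new - x
                if loss_batch(x_new, idx) <= f + g @ d + (d @ d) / (2 * alpha):
                    break
                alpha *= shrink
            x = x_new
    return x

# Toy usage on a sparse least-squares problem: min_x 0.5*mean((A x - b)^2) + lam*||x||_1
rng = np.random.default_rng(1)
A, b = rng.standard_normal((500, 20)), rng.standard_normal(500)
loss = lambda x, idx: 0.5 * np.mean((A[idx] @ x - b[idx]) ** 2)
grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = prox_sgd(loss, grad, np.zeros(20), n_samples=500)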


2023 - Constrained and unconstrained deep image prior optimization models with automatic regularization [Journal article]
Cascarano, Pasquale; Franchini, Giorgia; Kobler, Erich; Porta, Federica; Sebastiani, Andrea
abstract

Deep Image Prior (DIP) is currently among the most efficient unsupervised deep learning based methods for ill-posed inverse problems in imaging. This novel framework relies on the implicit regularization provided by representing images as the output of generative Convolutional Neural Network (CNN) architectures. So far, DIP has been shown to be an effective approach when combined with classical and novel regularizers. Unfortunately, to obtain appropriate solutions, all the models proposed up to now require an accurate estimate of the regularization parameter. To overcome this difficulty, we consider a locally adapted regularized unconstrained model whose local regularization parameters are automatically estimated for additively separable regularizers. Moreover, we propose a novel constrained formulation, in analogy with Morozov's discrepancy principle, which enables the application of a broader range of regularizers. Both the unconstrained and the constrained models are solved via the proximal gradient descent-ascent method. Numerical results on both denoising and deblurring of simulated as well as real natural and medical images demonstrate the robustness of the proposed models with respect to image content, noise levels and hyperparameters.


2023 - Correction to: A Line Search Based Proximal Stochastic Gradient Algorithm with Dynamical Variance Reduction (Journal of Scientific Computing, (2023), 94, 1, (23), 10.1007/s10915-022-02084-3) [Journal article]
Franchini, G.; Porta, F.; Ruggiero, V.; Trombini, I.
abstract

We restated both the statement and the proof of Theorem 3. We stress that the proof changes only in the derivation of inequality (A11) from (A10), but for better readability we report all the arguments of the proof.


2023 - DCT-Former: Efficient Self-Attention with Discrete Cosine Transform [Journal article]
Scribano, C.; Franchini, G.; Prato, M.; Bertogna, M.
abstract

Since their introduction, Transformer architectures have emerged as the dominant architectures for both natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of “fully-attentive” architectures arises from the computation of the dot-product attention, whose memory consumption and number of operations grow as O(n^2), where n stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive set of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time. Moreover, we believe that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public.
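To make the general idea concrete, here is a minimal NumPy/SciPy sketch in which the sequence axis of keys and values is compressed with a truncated Discrete Cosine Transform before a standard softmax attention is computed; the function name and the simple truncation strategy are assumptions for illustration and do not reproduce the exact DCT-Former formulation.

import numpy as np
from scipy.fft import dct

def dct_attention(Q, K, V, m):
    # Toy single-head attention: keys and values are compressed along the
    # sequence axis with a truncated orthonormal DCT, so the score matrix has
    # shape (n, m) instead of (n, n).
    K_c = dct(K, axis=0, norm="ortho")[:m]
    V_c = dct(V, axis=0, norm="ortho")[:m]
    scores = Q @ K_c.T / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V_c

rng = np.random.default_rng(0)
n, d, m = 1024, 64, 128
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = dct_attention(Q, K, V, m)   # memory now scales with n*m rather than n^2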


2023 - Denoising Diffusion Models on Model-Based Latent Space [Journal article]
Scribano, C.; Pezzi, D.; Franchini, G.; Prato, M.
abstract

With the recent advancements in the field of diffusion generative models, it has been shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. This approach, by abstracting away imperceptible image details and introducing substantial spatial compression, renders the learning of the generative process more manageable while significantly reducing computational and memory demands. In this work, we propose to replace autoencoder coding with a model-based coding scheme based on traditional lossy image compression techniques; this choice not only further diminishes computational expenses but also allows us to probe the boundaries of latent-space image generation. Our objectives culminate in the proposal of a valuable approximation for training continuous diffusion models within a discrete space, accompanied by enhancements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work holds promise for enhancing the adaptability of generative diffusion models across diverse data types beyond the realm of imagery.


2023 - Diagonal Barzilai-Borwein Rules in Stochastic Gradient-Like Methods [Conference proceedings]
Franchini, G.; Porta, F.; Ruggiero, V.; Trombini, I.; Zanni, L.
abstract


2023 - Explainable bilevel optimization: An application to the Helsinki deblur challenge [Journal article]
Bonettini, Silvia; Franchini, Giorgia; Pezzi, Danilo; Prato, Marco
abstract

In this paper we present a bilevel optimization scheme for the solution of a general image deblurring problem, in which a parametric variational-like approach is encapsulated within a machine learning scheme to provide a high quality reconstructed image with automatically learned parameters. The ingredients of the variational lower level and the machine learning upper level are specifically chosen for the Helsinki Deblur Challenge 2021, in which sequences of letters must be recovered from out-of-focus photographs with increasing levels of blur. Our proposed reconstruction procedure consists of a fixed number of FISTA iterations applied to the minimization of an edge-preserving and binarization-enforcing regularized least-squares functional. The parameters defining the variational model and the optimization steps, which, unlike in most deep learning approaches, all have a precise and interpretable meaning, are learned via either a similarity index or a support vector machine strategy. Numerical experiments on the test images provided by the challenge authors show significant gains with respect to a standard variational approach and performance comparable with that of some of the proposed deep learning based algorithms, which require the optimization of millions of parameters.
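For readers unfamiliar with FISTA, the following NumPy sketch shows the classical iteration for the textbook l1-regularized least-squares case; the challenge model instead uses an edge-preserving, binarization-enforcing regularizer, so only the overall structure (gradient step, proximal step, momentum) carries over.

import numpy as np

def fista(A, b, lam, n_iter=200, L=None):
    # FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1.
    if L is None:
        L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        z = y - grad / L
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # proximal step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + (t - 1) / t_new * (x_new - x)                   # momentum step
        x, t = x_new, t_new
    return x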


2023 - Learning rate selection in stochastic gradient methods based on line search strategies [Journal article]
Franchini, G.; Porta, F.; Ruggiero, V.; Trombini, I.; Zanni, L.
abstract

Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and often arise in machine learning applications with large scale data sets. A very popular approach to address finite-sum problems is the stochastic gradient method. It is well known that a proper strategy to select the hyperparameters of this method (i.e. the set of a-priori selected parameters) and, in particular, the learning rate, is needed to guarantee convergence properties and good practical performance. In this paper, we analyse standard and line search based updating rules to fix the learning rate sequence, also in relation to the size of the mini-batch chosen to compute the current stochastic gradient. An extensive numerical experimentation is carried out in order to evaluate the effectiveness of the discussed strategies for convex and non-convex finite-sum test problems, highlighting that the line search based methods avoid an expensive initial tuning of the hyperparameters. The line search based approaches have also been applied to train a Convolutional Neural Network, providing very promising results.
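A common line-search variant in this family is a backtracking Armijo rule evaluated on the current mini-batch; the sketch below (names and constants are assumptions, and the paper compares several such rules rather than this exact one) shows how a single SGD step would select its learning rate.

import numpy as np

def stochastic_armijo_step(loss_batch, grad_batch, x, idx,
                           lr0=1.0, c=0.1, shrink=0.5, lr_min=1e-8):
    # One SGD step whose learning rate is chosen by backtracking until an
    # Armijo sufficient-decrease condition holds on the current mini-batch.
    f0 = loss_batch(x, idx)
    g = grad_batch(x, idx)
    lr = lr0
    while lr > lr_min:
        x_new = x - lr * g
        if loss_batch(x_new, idx) <= f0 - c * lr * (g @ g):   # Armijo on the batch
            return x_new, lr
        lr *= shrink
    return x - lr_min * g, lr_min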


2023 - Machine Learning Techniques for Understanding and Predicting Memory Interference in CPU-GPU Embedded Systems [Conference proceedings]
Masola, A.; Capodieci, N.; Rouxel, B.; Franchini, G.; Cavicchioli, R.
abstract


2023 - Neural architecture search via standard machine learning methodologies [Journal article]
Franchini, Giorgia; Ruggiero, Valeria; Porta, Federica; Zanni, Luca
abstract

In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related to both the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behaviour after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations together with the corresponding CNN measures of performance obtained with only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such a dataset is used as the training set for Support Vector Regression and/or Random Forest techniques to predict the performance of the considered learning methodology, given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at a quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation, carried out on CNNs, together with the use of our performance predictor with NAS-Bench-101, highlight how the proposed methodology for the hyperparameter setting appears very promising.
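A minimal scikit-learn sketch of the predictor idea follows; the features, the synthetic data and the regressor settings are assumptions used only to show how a configuration plus its early-training accuracy can be mapped to a predicted final accuracy.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Hypothetical data: each row describes one hyperparameter configuration plus the
# validation accuracy reached after a few early training steps; the target is the
# accuracy after a complete training (both synthetic here, for illustration).
rng = np.random.default_rng(0)
n_configs = 200
hyperparams = rng.uniform(size=(n_configs, 4))         # e.g. filters, kernel, lr, batch
early_acc = rng.uniform(0.2, 0.6, size=(n_configs, 1))
X = np.hstack([hyperparams, early_acc])
y = np.clip(early_acc.ravel() + 0.3 + 0.05 * rng.standard_normal(n_configs), 0, 1)

svr = SVR(kernel="rbf", C=10.0).fit(X, y)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Predict the final performance of an unseen configuration from its early behaviour.
x_new = np.array([[0.5, 0.2, 0.7, 0.3, 0.45]])
print(svr.predict(x_new), rf.predict(x_new))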


2023 - Piece-wise Constant Image Segmentation with a Deep Image Prior Approach [Conference proceedings]
Benfenati, A.; Catozzi, A.; Franchini, G.; Porta, F.
abstract

Image segmentation is a key topic in image processing and computer vision and several approaches have been proposed in the literature to address it. The formulation of the image segmentation problem as the minimization of the Mumford-Shah energy has been one of the most commonly used techniques in the past decades. More recently, deep learning methods have yielded a new generation of image segmentation models with remarkable performance. In this paper we propose an unsupervised deep learning approach for piece-wise constant image segmentation based on the so-called Deep Image Prior, by parameterizing the Mumford-Shah functional in terms of the weights of a convolutional neural network. Several numerical experiments on both biomedical and natural images highlight the effectiveness of the suggested approach. The implicit regularization provided by the Deep Image Prior model also allows us to consider noisy input images and to investigate the robustness of the proposed technique with respect to the level of noise.
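For reference, the piecewise constant form of the Mumford-Shah energy, written here in generic notation (g is the observed image, Omega_k the regions, c_k the constant values; the symbols are not necessarily those used in the paper), reads

E(\{\Omega_k\},\{c_k\}) \;=\; \sum_{k=1}^{K} \int_{\Omega_k} \bigl(g(x)-c_k\bigr)^2 \,dx \;+\; \lambda \sum_{k=1}^{K} \operatorname{Per}(\Omega_k),

and in the Deep Image Prior setting the segmentation variables are reparameterized through the weights of a convolutional network, as stated in the abstract.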


2023 - Sex differences in schizophrenia-spectrum diagnoses: results from a 30-year health record registry [Journal article]
Ferrara, M.; Curtarello, E. M. A.; Gentili, E.; Domenicano, I.; Vecchioni, L.; Zese, R.; Alberti, M.; Franchini, G.; Sorio, C.; Benini, L.; Little, J.; Carozza, P.; Dazzan, P.; Grassi, L.
abstract

This study investigated sociodemographic and clinical differences between the sexes in individuals affected by schizophrenia-spectrum disorders (SSD) who accessed outpatient mental health services. Within a retrospective cohort of 45,361 outpatients receiving care in Ferrara (Italy) from 1991 to 2021, those with a SSD diagnosis were compared between the sexes for sociodemographic and clinical characteristics before and after the index date (when the ICD-9: 295.* diagnosis was first recorded) to assess early trajectory, age and type of diagnosis, and severity of illness indicated by medication use, hospitalization, and duration of psychiatric care. Predictors of discharge were also investigated. Among 2439 patients, 1191 were women (48.8%). Compared to men, women were significantly older at first visit (43.7 vs. 36.8 years) and at index date (47.8 vs. 40.6) with peak frequency at age 48 (vs. 30). The most frequent last diagnosis recorded before the index date was delusional disorder (27.7%) or personality disorder (24.3%) in men and depression (24%) and delusional disorder (30.1%) in women. After the index date, long-acting antipsychotics and clozapine were more frequently prescribed to men (46.5% vs. 36.3%; 13.2% vs. 9.4%, p < 0.05) and mood stabilizers and antidepressants to women (24.3% vs. 21.1%; 50.1% vs. 35.5%; p < 0.05). Women had fewer involuntary admissions (10.1% vs. 13.6%) and were more likely to be discharged as the time under care increased (p = 0.009). After adjusting for covariates, sex was not a significant predictor of discharge. Our study confirmed that sex differences exist in clinical and sociodemographic characteristics of outpatients with SSD and that gender considerations might influence the rapidity of diagnosis and the medications prescribed. These findings highlight the need to implement a women-tailored approach in specialist care programs for psychoses.


2022 - Biomedical Image Classification via Dynamically Early Stopped Artificial Neural Network [Journal article]
Franchini, Giorgia; Verucchi, Micaela; Catozzi, Ambra; Porta, Federica; Prato, Marco
abstract

It is well known that biomedical imaging analysis plays a crucial role in the healthcare sector and produces a huge quantity of data. These data can be exploited to study diseases and their evolution in a deeper way or to predict their onset. In particular, image classification represents one of the main problems in the biomedical imaging context. Due to the data complexity, biomedical image classification can be carried out by trainable mathematical models, such as artificial neural networks. When employing a neural network, one of the main challenges is to determine the optimal duration of the training phase to achieve the best performance. This paper introduces a new adaptive early stopping technique to set the optimal training time, based on dynamic selection strategies to fix the learning rate and the mini-batch size of the stochastic gradient method exploited as the optimizer. The numerical experiments, carried out on different artificial neural networks for image classification, show that the developed adaptive early stopping procedure matches the performance reported in the literature while finalizing the training in fewer epochs. The numerical examples have been performed on the CIFAR100 dataset and on two distinct MedMNIST2D datasets, a large-scale lightweight benchmark for biomedical image classification.
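As a point of comparison, a plain patience-based early stopping monitor is sketched below; note that this is the generic baseline, whereas the rule proposed in the paper additionally adapts the learning rate and mini-batch size dynamically.

class EarlyStopping:
    # Generic patience-based early stopping on a validation metric.
    def __init__(self, patience=5, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage inside a training loop (train_one_epoch and evaluate are placeholders):
# stopper = EarlyStopping(patience=5)
# for epoch in range(max_epochs):
#     train_one_epoch(); val_loss = evaluate()
#     if stopper.step(val_loss):
#         break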


2022 - Deep Image Prior for medical image denoising, a study about parameter initialization [Journal article]
Sapienza, Davide; Franchini, Giorgia; Govi, Elena; Bertogna, Marko; Prato, Marco
abstract

Convolutional Neural Networks are widely known and used architectures in image processing contexts, in particular for medical images. These Deep Learning techniques, known for their ability to extract high-level features, almost always require a labeled dataset, whose construction can be computationally expensive. Moreover, in the biomedical context the available images are often noisy and the ground truth is unknown. For this reason, and in the context of Green Artificial Intelligence, an unsupervised method that employs Convolutional Neural Networks, or more precisely autoencoders, has recently appeared in the Deep Learning panorama. This technique, called Deep Image Prior (DIP) by its authors, can be used in areas such as denoising, super-resolution and inpainting. Starting from these assumptions, this work analyses the robustness of these networks with respect to different types of initialization. First, we analyze the different types of parameters, namely those related to the Batch Norm and the Convolutional layers, focusing on the speed of convergence and on the maximum performance obtained. The paper then applies the acquired information to noisy Computed Tomography images: the final purpose is to test the best initializations of the first phase on a phantom image and then on a real Computed Tomography one. Computed Tomography, together with Magnetic Resonance Imaging and Positron Emission Tomography, is among the diagnostic tools currently available to neuroscientists and oncologists. This work shows how initializations affect final performance and how they should be used in the medical image reconstruction field. The section on numerical experiments shows results that on the one hand confirm the importance of a good initialization to obtain fast convergence and high performance; on the other hand, they show how the method is robust to the processing of different image types, natural and medical. No single best initialization is identified, but several can be chosen according to the specific needs of the single problem.


2022 - Learning the Image Prior by Unrolling an Optimization Method [Conference proceedings]
Bonettini, S.; Franchini, G.; Pezzi, D.; Prato, M.
abstract

Nowadays neural networks are omnipresent thanks to their remarkable adaptability, despite their poor interpretability and the difficulty of manipulating their parameters. On the other hand, in the classical variational approach the restoration is obtained as the solution of a given optimization problem. The bilevel approach connects the two and consists first in devising a parametric formulation of the variational problem, then in optimizing these parameters with respect to a given dataset of training data. In this work we analyze the classical bilevel approach in combination with unrolling techniques, where the parameters of the variational problem are trained with respect to the results obtained after a fixed number of iterations of an optimization method applied to it. This results in a large scale optimization problem which can be solved by means of stochastic methods; as we observed in our numerical experiments, the stochastic approach can produce medium accuracy results in very few epochs. Moreover, our experiments also show that the unrolling approach leads to results which are comparable with those of the original bilevel method in terms of accuracy.
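A minimal PyTorch sketch of the unrolling idea is given below: a fixed number of gradient steps on a parametric variational model is written as a network whose steplengths and prior are learnable. The architecture and parameterization are assumptions chosen for illustration, not the exact formulation used in the paper.

import torch
import torch.nn as nn

class UnrolledDenoiser(nn.Module):
    # Unrolls K gradient-descent steps on 0.5*||x - y||^2 + R_theta(x), where the
    # gradient of the learned prior R_theta is modelled by a small CNN.
    def __init__(self, K=8, channels=1):
        super().__init__()
        self.K = K
        self.steps = nn.Parameter(0.1 * torch.ones(K))        # learnable steplengths
        self.prior_grad = nn.Sequential(                      # learnable prior gradient
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))

    def forward(self, y):
        x = y.clone()
        for k in range(self.K):
            grad = (x - y) + self.prior_grad(x)               # data term + prior term
            x = x - self.steps[k] * grad
        return x

# Training then proceeds as in a standard supervised setting, e.g. minimizing the
# MSE between model(noisy) and clean images with a stochastic optimizer.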


2022 - Machine Learning and Non-Affective Psychosis: Identification, Differential Diagnosis, and Treatment [Journal article]
Ferrara, Maria; Franchini, Giorgia; Funaro, Melissa; Cutroni, Marcello; Valier, Beatrice; Toffanin, Tommaso; Palagini, Laura; Zerbinati, Luigi; Folesani, Federica; Murri, Martino Belvederi; Caruso, Rosangela; Grassi, Luigi
abstract


2022 - On the First-Order Optimization Methods in Deep Image Prior [Journal article]
Cascarano, P.; Franchini, G.; Porta, F.; Sebastiani, A.
abstract

Deep learning methods achieve state-of-the-art performance in many image restoration tasks. Their effectiveness is mostly related to the size of the dataset used for the training. Deep image prior (DIP) is an energy-function framework which eliminates the dependency on the training set by considering the structure of a neural network as a handcrafted prior offering high impedance to noise and low impedance to signal. In this paper, we analyze and compare the use of different optimization schemes inside the DIP framework for the denoising task.
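The basic DIP denoising loop that such a comparison revolves around can be sketched in a few lines of PyTorch; the function name, the fixed iteration budget and the choice of network are assumptions, with the optimizer class left swappable precisely because that is the quantity being compared.

import torch
import torch.nn as nn

def dip_denoise(noisy, net, optimizer_cls=torch.optim.Adam, lr=1e-2, n_iter=2000):
    # Fit a randomly initialized network to the noisy image from a fixed random
    # input; in practice the iteration count must stop before the noise is fitted.
    z = torch.randn_like(noisy)                 # fixed random input tensor
    opt = optimizer_cls(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(n_iter):
        opt.zero_grad()
        loss = loss_fn(net(z), noisy)
        loss.backward()
        opt.step()
    return net(z).detach()

# Any image-to-image CNN (e.g. a small encoder-decoder) can play the role of `net`;
# `noisy` is a (1, C, H, W) tensor holding the corrupted image.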


2022 - Thresholding Procedure via Barzilai-Borwein Rules for the Steplength Selection in Stochastic Gradient Methods [Conference proceedings]
Franchini, G.; Ruggiero, V.; Trombini, I.
abstract

A crucial aspect in designing a learning algorithm is the selection of the hyperparameters (parameters that are not trained during the learning process). In particular, the effectiveness of stochastic gradient methods strongly depends on the steplength selection. In recent papers [9, 10], Franchini et al. propose to adopt an adaptive selection rule borrowed from the full-gradient scheme known as the Limited Memory Steepest Descent method [8] and appropriately tailored to the stochastic framework. This strategy is based on the computation of the eigenvalues (Ritz-like values) of a suitable matrix obtained from the gradients of the most recent iterations, and it provides an estimate of the local Lipschitz constant of the current gradient of the objective function, without introducing line-search techniques. The possible increase of the size of the sub-sample used to compute the stochastic gradient is driven by means of an augmented inner product test approach [3]. The whole procedure makes the tuning of the parameters less expensive than the selection of a fixed steplength, although it remains dependent on the choice of threshold values bounding the variability of the steplength sequences. The contribution of this paper is to exploit a stochastic version of the Barzilai-Borwein formulas [1] to adaptively select the endpoints of the range for the Ritz-like values. A numerical experimentation on some convex loss functions highlights that the proposed procedure remains stable and the tuning of the hyperparameters becomes less expensive.
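For reference, the two classical Barzilai-Borwein steplengths, written in standard notation with s_{k-1} = x_k - x_{k-1} and y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1}) (in the stochastic setting the gradients are replaced by mini-batch estimates), are

\alpha_k^{BB1} = \frac{s_{k-1}^{\top} s_{k-1}}{s_{k-1}^{\top} y_{k-1}}, \qquad \alpha_k^{BB2} = \frac{s_{k-1}^{\top} y_{k-1}}{y_{k-1}^{\top} y_{k-1}},

and in this paper such values are used to bound the admissible range of the Ritz-like steplengths.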


2021 - All you can embed: Natural language based vehicle retrieval with spatio-temporal transformers [Conference proceedings]
Scribano, C.; Sapienza, D.; Franchini, G.; Verucchi, M.; Bertogna, M.
abstract

Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
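The retrieval objective can be illustrated with the standard triplet margin loss available in PyTorch; AYCE uses a variation of this loss, so the margin value, embedding sizes and sampling shown here are assumptions for illustration only.

import torch
import torch.nn as nn

# Pull the embedding of a textual description (anchor) towards the embedding of
# the matching vehicle track (positive) and push it away from a non-matching one.
triplet = nn.TripletMarginLoss(margin=0.5)
text_emb = torch.randn(16, 256)      # anchor: BERT-based description embeddings
pos_track = torch.randn(16, 256)     # positive: embeddings of the matching tracks
neg_track = torch.randn(16, 256)     # negative: embeddings of other tracks
loss = triplet(text_emb, pos_track, neg_track)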


2021 - Combining Weighted Total Variation and Deep Image Prior for natural and medical image restoration via ADMM [Conference proceedings]
Cascarano, P.; Sebastiani, A.; Comes, M. C.; Franchini, G.; Porta, F.
abstract

In the last decades, unsupervised deep learning based methods have caught researchers' attention, since in many real applications, such as medical imaging, collecting a large amount of training examples is not always feasible. Moreover, the construction of a good training set is time consuming and hard because the selected data have to be sufficiently representative of the task. In this paper, we focus on the Deep Image Prior (DIP) framework and we propose to combine it with a space-variant Total Variation regularizer with an automatic estimation of the local regularization parameters. Differently from other existing approaches, we solve the arising minimization problem via the flexible Alternating Direction Method of Multipliers (ADMM). Furthermore, we provide a specific implementation also for the standard isotropic Total Variation. The promising performance of the proposed approach, in terms of PSNR and SSIM values, is assessed through several experiments on simulated as well as real natural and medical corrupted images.
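In generic notation, one plausible way to write the combined model as a constrained problem amenable to ADMM (a sketch under assumed symbols, not necessarily the exact formulation of the paper) is

\min_{\theta,\,u} \; \tfrac{1}{2}\|f_\theta(z) - b\|^2 + \sum_i \lambda_i \,\|(\nabla u)_i\| \quad \text{s.t.} \quad u = f_\theta(z),

where f_theta is the DIP network with fixed input z, b the corrupted image and lambda_i the local regularization parameters; ADMM then alternates updates of theta, u and the dual variable associated with the constraint.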


2020 - Artificial Neural Networks: The Missing Link Between Curiosity and Accuracy [Conference proceedings]
Franchini, G.; Burgio, P.; Zanni, L.
abstract

Artificial Neural Networks, as the name itself suggests, are biologically inspired algorithms designed to simulate the way in which the human brain processes information. Like neurons, which consist of a cell nucleus that receives input from other neurons through a web of input terminals, an Artificial Neural Network includes hundreds of single units, artificial neurons or processing elements, connected with coefficients (weights) and organized in layers. The power of neural computations comes from connecting neurons in a network: in fact, an Artificial Neural Network can process many pieces of information at the same time. What is not fully understood is the most efficient way to train an Artificial Neural Network, and in particular the best mini-batch size for maximizing accuracy while minimizing training time. The idea developed in this study has its roots in the biological world that inspired the creation of Artificial Neural Networks in the first place. Humans have altered the face of the world through extraordinary adaptive and technological advances: those changes were made possible by our cognitive structure, particularly the ability to reason and build causal models of external events. This dynamism is made possible by a high degree of curiosity. In the biological world, and especially in human beings, curiosity arises from the constant search for knowledge and information: behaviours that support this information sampling mechanism range from the very small (initial mini-batch size) to the very elaborate and sustained (increasing mini-batch size). The goal of this project is to train an Artificial Neural Network by increasing the mini-batch size dynamically, in an adaptive manner (using a validation set); our hypothesis is that this training method is more efficient (in terms of time and costs) compared to the ones implemented so far.
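One possible reading of this strategy is a training loop that grows the mini-batch size whenever the validation metric stagnates; the sketch below is a hypothetical patience-based variant (the callables, growth factor and trigger rule are assumptions, not the rule proposed in the paper).

def adaptive_batch_training(train_one_epoch, validate, max_epochs=100,
                            batch_size=32, max_batch=1024, growth=2,
                            patience=3, tol=1e-3):
    # Grow the mini-batch size when the validation loss stops improving,
    # starting from small "curious" samples and moving to larger, sustained ones.
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(batch_size)
        val = validate()
        if val < best - tol:
            best, stale = val, 0
        else:
            stale += 1
        if stale >= patience and batch_size < max_batch:
            batch_size = min(batch_size * growth, max_batch)   # sample more per step
            stale = 0
    return best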


2020 - Automatic stochastic dithering techniques on GPU: Image quality and processing time improved [Journal article]
Franchini, G.; Cavicchioli, R.; Hu, J. C.
abstract

Dithering, or error diffusion, is a technique used to obtain a binary image, suitable for printing, from a grayscale one. At each step, the algorithm computes an allowed value of a pixel from a grayscale one by applying a threshold, therefore causing a conversion error. To obtain the optical illusion of a continuous tone, the obtained error is distributed to adjacent pixels. In the literature there are many algorithms of this type, such as Jarvis, Judice and Ninke (JJN), Stucki, Atkinson, Burkes and Sierra, but the best known and most widely used is Floyd-Steinberg. We compared various types of dithering, which differ from each other in the weights and number of pixels involved in the error diffusion scheme. All these algorithms suffer from two problems: artifacts and slowness. First, we address the artifacts, which are undesired texture patterns generated by the dithering algorithm, leading to less appealing visual results. To address this problem, we developed a stochastic version of Floyd-Steinberg's algorithm. The Weighted Signal to Noise Ratio (WSNR) is adopted to evaluate the outcome of the procedure, an error measure based on human visual perception that also takes artifacts into account. This measure behaves similarly to a low-pass filter and, in particular, exploits a contrast sensitivity function to compare the algorithm's result and the original image in terms of similarity. We show that the new stochastic algorithm is better in terms of both WSNR measurement and visual analysis. Secondly, we address the method's inherent computational slowness: we implemented a parallel version of the Floyd-Steinberg algorithm that takes advantage of GPGPU (General Purpose Graphics Processing Unit) computing, drastically reducing the execution time. Specifically, we observed a quadratic time complexity with respect to the input size for the serial case, whereas the computational time required for our parallel implementation increased linearly. We then evaluated both image quality and the performance of the parallel algorithm on an extensive image database. Finally, to make the method fully automatic, an empirical technique is presented to choose the best degree of stochasticity.
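A serial sketch of a stochastic Floyd-Steinberg variant is given below: the classical error-diffusion weights (7, 3, 5, 1)/16 are randomly perturbed at every pixel to break up artifact patterns. The perturbation scheme and its magnitude are assumptions for illustration; the paper selects the degree of stochasticity automatically and parallelizes the computation on GPU.

import numpy as np

def stochastic_floyd_steinberg(img, sigma=0.05, rng=None):
    # img: 2D float array in [0, 1]; returns a binary (0/1) image.
    rng = rng or np.random.default_rng(0)
    out = img.astype(float).copy()
    h, w = out.shape
    offsets = [(0, 1), (1, -1), (1, 0), (1, 1)]       # right, down-left, down, down-right
    base = np.array([7.0, 3.0, 5.0, 1.0]) / 16.0      # classical Floyd-Steinberg weights
    for y in range(h):
        for x in range(w):
            old = out[y, x]
            new = 1.0 if old >= 0.5 else 0.0          # threshold to an allowed value
            out[y, x] = new
            err = old - new
            wts = np.clip(base + sigma * rng.standard_normal(4), 0, None)
            wts /= wts.sum()                          # keep the diffused error balanced
            for (dy, dx), wgt in zip(offsets, wts):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    out[yy, xx] += err * wgt
    return out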


2020 - On the Steplength Selection in Stochastic Gradient Methods [Conference proceedings]
Franchini, G.; Ruggiero, V.; Zanni, L.
abstract

This paper deals with the steplength selection in stochastic gradient methods for large scale optimization problems arising in machine learning. We introduce an adaptive steplength selection derived by tailoring a limited memory steplength rule, recently developed in the deterministic context, to the stochastic gradient approach. The proposed steplength rule provides values within an interval whose bounds need to be prefixed by the user. A suitable choice of the interval bounds allows the method to perform similarly to the standard stochastic gradient method equipped with the best-tuned steplength. Since the setting of the bounds only slightly affects the performance, the new rule makes the tuning of the parameters less expensive than the choice of the optimal prefixed steplength in the standard stochastic gradient method. We evaluate the behaviour of the proposed steplength selection in training binary classifiers on well known data sets and by using different loss functions.


2020 - Ritz-like values in steplength selections for stochastic gradient methods [Journal article]
Franchini, G.; Ruggiero, V.; Zanni, L.
abstract

The steplength selection is a crucial issue for the effectiveness of the stochastic gradient methods for large-scale optimization problems arising in machine learning. In a recent paper, Bollapragada et al. (SIAM J Optim 28(4):3312–3343, 2018) propose to include an adaptive subsampling strategy into a stochastic gradient scheme, with the aim of assuring the descent feature in expectation of the stochastic gradient directions. In this approach, theoretical convergence properties are preserved under the assumption that the positive steplength satisfies at any iteration a suitable bound depending on the inverse of the Lipschitz constant of the objective function gradient. In this paper, we propose to tailor for the stochastic gradient scheme the steplength selection adopted in the full-gradient method known as the limited memory steepest descent method. This strategy, based on the Ritz-like values of a suitable matrix, provides a local estimate of the inverse of the local Lipschitz parameter, without introducing line search techniques, while the possible increase in the size of the subsample used to compute the stochastic gradient makes it possible to control the variance of this direction. An extensive numerical experimentation highlights that the new rule makes the tuning of the parameters less expensive than the trial procedure for the efficient selection of a constant step in standard and mini-batch stochastic gradient methods.


2020 - Steplength and Mini-batch Size Selection in Stochastic Gradient Methods [Conference proceedings]
Franchini, G.; Ruggiero, V.; Zanni, L.
abstract

The steplength selection is a crucial issue for the effectiveness of the stochastic gradient methods for large-scale optimization problems arising in machine learning. In a recent paper, Bollapragada et al. [1] propose to include an adaptive subsampling strategy into a stochastic gradient scheme. We propose to combine this approach with a selection rule for the steplength, borrowed from the full-gradient scheme known as the Limited Memory Steepest Descent (LMSD) method [4] and suitably tailored to the stochastic framework. This strategy, based on the Ritz-like values of a suitable matrix, provides a local estimate of the local Lipschitz constant of the gradient of the objective function, without introducing line-search techniques, while the possible increase of the subsample size used to compute the stochastic gradient makes it possible to control the variance of this direction. An extensive numerical experimentation for convex and non-convex loss functions highlights that the new rule makes the tuning of the parameters less expensive than the selection of a suitable constant steplength in standard and mini-batch stochastic gradient methods. The proposed procedure has also been compared with the Momentum and ADAM methods.


2019 - Stochastic Floyd-Steinberg dithering on GPU: image quality and processing time improved [Conference proceedings]
Franchini, G.; Cavicchioli, R.; Hu, J. C.
abstract

Error diffusion dithering is a technique used to represent a grey-scale image in a format usable by a printer. At every step, an algorithm converts the grey-scale value of a pixel to a new value within the allowed ones, generating a conversion error. To achieve the effect of a continuous-tone illusion, the error is distributed to the neighboring pixels. Among the existing algorithms, the most commonly used is Floyd-Steinberg. However, this algorithm suffers from two issues: artifacts and slowness. Artifacts are textures that can appear after the image processing, making the result visually different from the original. In order to avoid this effect, we use a stochastic version of the Floyd-Steinberg algorithm. To evaluate the results, we apply the Weighted Signal to Noise Ratio (WSNR), a visual-based model that accounts for the perceptibility of dithered textures. This filter has a low-pass characteristic and, in particular, it uses a Contrast Sensitivity Function to evaluate the similarity between the original image and the final image. Our claim is that the new stochastic algorithm performs better in terms of both the WSNR measure and visual analysis. Secondly, we address slowness: we describe a parallel version of the Floyd-Steinberg algorithm that exploits the GPU (Graphics Processing Unit), drastically reducing the execution time. Specifically, we observed that the computational time of the serial version increases quadratically with the input size, while that of the parallel version increases linearly. Both the image quality and the computational performance of the parallel algorithm are evaluated on several large-scale images.