In this document, we prove the convergence of the model proposed in [1], which aims at estimating the LoRaWAN network performance in a single-gateway scenario. First, we provide an analytical proof of the existence of a fixed-point solution for such a system. Then, we report experimental results showing that the system of two inter-dependent equations provided by the model can be solved through fixed-point iterations, and that a limited number of iterations is enough to reach convergence.
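To illustrate the numerical procedure, the sketch below runs a plain fixed-point iteration on two coupled scalar equations until successive iterates stop changing. The update functions are hypothetical stand-ins, since the actual performance equations of [1] are not reproduced here; only the iteration scheme and the stopping rule are shown.

    import math

    # Hypothetical stand-ins for the two inter-dependent model equations of [1];
    # only the fixed-point iteration scheme and convergence check are illustrated.
    def update_x(y):
        return 1.0 / (1.0 + y)

    def update_y(x):
        return 0.5 * math.exp(-x)

    x, y = 0.5, 0.5                        # arbitrary initial guess
    for it in range(100):
        x_new, y_new = update_x(y), update_y(x)
        delta = max(abs(x_new - x), abs(y_new - y))
        x, y = x_new, y_new
        if delta < 1e-9:                   # successive iterates no longer change
            break
    print(f"converged after {it + 1} iterations: x={x:.6f}, y={y:.6f}")

When the composed update is a contraction, such an iteration converges linearly, which is consistent with the small number of iterations observed experimentally.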
The Polyak-Łojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings. While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more general optimization methods. In this work, we present a PL-based analysis for linear convergence of generalized mirror descent (GMD), a generalization of mirror descent with a possibly time-dependent mirror. GMD subsumes popular first-order optimization methods including gradient descent, mirror descent, and preconditioned gradient descent methods such as Adagrad. Since the standard PL analysis cannot be extended naturally from GMD to stochastic GMD, we present a Taylor-series-based analysis to establish sufficient conditions for linear convergence of stochastic GMD. As a corollary, our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad. Lastly, for functions that are locally PL*, our analysis implies the existence of an interpolating solution and convergence of GMD to this solution.
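For reference, a differentiable function $f$ with infimum $f^*$ satisfies the PL inequality with constant $\mu > 0$ if $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for all $x$; under $L$-smoothness, gradient descent with step size $1/L$ then obeys the classical linear rate $f(x_k) - f^* \le (1 - \mu/L)^k\,(f(x_0) - f^*)$. The refinements needed for GMD and its stochastic variant are established in the paper and not restated here.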
We consider the problem of estimating the variance function in the regression model by means of aggregation methods. We focus on two particular aggregation settings: Model Selection aggregation (MS), where the goal is to select the best candidate from a collection of candidate estimators, and Convex aggregation (C), where the goal is to build the best convex combination of the candidates. In both cases, the construction of the estimator relies on a two-step procedure and requires two independent samples. The first step exploits the first sample to build candidate estimators of the variance function by the residual-based method; the second sample is then used to perform the aggregation step. We show the consistency of the proposed method with respect to the $L_2$ error for both MS and C aggregation. We evaluate the performance of these two methods in the heteroscedastic model and illustrate their interest in the regression problem with reject option.
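Schematically, if $\hat f_1, \dots, \hat f_M$ denote the candidate variance estimators built on the first sample and $\hat R$ an empirical risk computed on the second sample, MS aggregation returns $\hat f_{\hat\jmath}$ with $\hat\jmath \in \arg\min_j \hat R(\hat f_j)$, while C aggregation returns $\sum_{j=1}^M \hat\lambda_j \hat f_j$ with weights minimizing $\hat R$ over the simplex $\{\lambda : \lambda_j \ge 0,\ \sum_j \lambda_j = 1\}$. This is only the generic template of the two aggregation schemes; the specific risk used for the variance function is the one defined in the paper.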
In this paper, a class of statistics based on high-frequency observations of oscillating and skew Brownian motions is considered. Their convergence rate towards the local time of the underlying process is obtained in the form of a Central Limit Theorem. Oscillating and skew Brownian motions are solutions to stochastic differential equations with singular coefficients: a piecewise constant diffusion coefficient or a drift involving the local time. The result is applied to provide estimators of the parameter of the skew Brownian motion and to study their asymptotic behavior. Moreover, in the case of the classical statistic given by the normalized number of crossings, the result is proved to hold for a larger class of Itô processes with singular coefficients.
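As a rough illustration of the crossing statistic, the sketch below simulates a discretized Brownian path (a simple stand-in for the oscillating or skew case) and counts sign changes between consecutive observations, scaled by the square root of the step size. The proportionality constant linking this quantity to the local time at zero, and the exact statistics studied in the paper, are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 100_000, 1.0
    h = T / n
    # Discretized standard Brownian path observed at times i*h
    X = np.concatenate(([0.0], np.cumsum(np.sqrt(h) * rng.standard_normal(n))))

    # Number of crossings of level 0 between consecutive observations, scaled by
    # sqrt(h); up to a constant this approximates the local time of X at 0.
    crossings = np.sum(X[:-1] * X[1:] < 0)
    print(np.sqrt(h) * crossings)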
Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy. Nonetheless, nonideal communication links and limited transmission resources may hinder the implementation of fast and accurate FL. In this paper, we study joint optimization of communications and FL based on analog aggregation transmission in realistic wireless networks. We first derive closed-form expressions for the expected convergence rate of FL over the air, which theoretically quantify the impact of analog aggregation on FL. Based on the analytical results, we develop a joint optimization model for accurate FL implementation, which allows a parameter server to select a subset of workers and determine an appropriate power scaling factor. Since the practical setting of FL over the air encounters unobservable parameters, we reformulate the joint optimization of worker selection and power allocation using controlled approximation. Finally, we efficiently solve the resulting mixed-integer programming problem via a simple yet optimal finite-set search method by reducing the search space. Simulation results show that the proposed solutions developed for realistic wireless analog channels outperform a benchmark method and achieve performance comparable to that of the ideal case where FL is implemented over noise-free wireless channels.
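The toy sketch below mimics one round of analog (over-the-air) aggregation: selected workers transmit their scaled updates simultaneously, and the server receives the superposed signal corrupted by receiver noise and rescales it. The worker-selection mask, power scaling factor, and noise level are arbitrary illustrative values, not the quantities produced by the optimization model described above.

    import numpy as np

    rng = np.random.default_rng(0)
    num_workers, dim = 10, 5
    local_updates = rng.standard_normal((num_workers, dim))  # stand-in local gradients
    selected = rng.random(num_workers) < 0.7                 # illustrative worker selection
    power_scale = 0.8                                        # illustrative power scaling factor
    noise_std = 0.1                                          # receiver noise level

    # Superposition of the selected workers' scaled signals plus channel noise,
    # then rescaling at the parameter server to form the global update.
    received = power_scale * local_updates[selected].sum(axis=0)
    received += noise_std * rng.standard_normal(dim)
    global_update = received / (power_scale * max(selected.sum(), 1))
    print(global_update)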
Block coordinate descent (BCD), also known as nonlinear Gauss-Seidel, is a simple iterative algorithm for nonconvex optimization that sequentially minimizes the objective function in each block coordinate while the other coordinates are held fixed. We propose a version of BCD that, for block multi-convex and smooth objective functions under constraints, is guaranteed to converge to the stationary points with worst-case rate of convergence of $O((\log n)^{2}/n)$ for $n$ iterations, and a bound of $O(\epsilon^{-1}(\log \epsilon^{-1})^{2})$ for the number of iterations to achieve an $\epsilon$-approximate stationary point. Furthermore, we show that these results continue to hold even when the convex sub-problems are inexactly solved if the optimality gaps are uniformly summable against initialization. A key idea is to restrict the parameter search within a diminishing radius to promote stability of iterates. As an application, we provide an alternating least squares algorithm with diminishing radius for nonnegative CP tensor decomposition that converges to the stationary points of the reconstruction error with the same robust worst-case convergence rate and complexity bounds. We also experimentally validate our results with both synthetic and real-world data and demonstrate that using auxiliary search radius restriction can in fact improve the rate of convergence.
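A minimal sketch of the diminishing-radius idea, using nonnegative matrix factorization as a simpler stand-in for the nonnegative CP tensor problem: each block is updated by alternating least squares and then pulled back into a ball of shrinking radius around the previous iterate. The radius schedule below is illustrative, not the one analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.abs(rng.standard_normal((20, 15)))            # data matrix to factorize
    rank = 5
    X = np.abs(rng.standard_normal((20, rank)))
    Y = np.abs(rng.standard_normal((rank, 15)))

    def clip_to_ball(new, old, radius):
        """Pull a block update back into a ball of the given radius around old."""
        step = new - old
        norm = np.linalg.norm(step)
        return old + step * min(1.0, radius / norm) if norm > 0 else old

    for n in range(1, 201):
        radius = 10.0 / (np.sqrt(n) * np.log(n + 1))      # illustrative diminishing schedule
        # Block 1: nonnegative least squares in X with Y fixed, then radius restriction
        X_new = np.maximum(np.linalg.lstsq(Y.T, A.T, rcond=None)[0].T, 0.0)
        X = clip_to_ball(X_new, X, radius)
        # Block 2: nonnegative least squares in Y with X fixed, then radius restriction
        Y_new = np.maximum(np.linalg.lstsq(X, A, rcond=None)[0], 0.0)
        Y = clip_to_ball(Y_new, Y, radius)
    print(np.linalg.norm(A - X @ Y))                      # reconstruction error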
We consider a model of energy minimization arising in the study of the mechanical behavior caused by cell contraction within a fibrous biological medium. The macroscopic model is based on the theory of non-rank-one convex nonlinear elasticity for phase transitions. We study appropriate numerical approximations based on the discontinuous Galerkin treatment of higher gradients and used successfully in numerical simulations of experiments. We show that the discrete minimizers converge in the limit to minimizers of the continuous problem. This is achieved by employing the theory of $\Gamma$-convergence of the approximate energy functionals to the continuous model when the discretization parameter tends to zero. The analysis is involved due to the structure of the numerical approximations, which are defined in spaces with lower regularity than the space where the minimizers of the continuous variational problem are sought. This fact leads to the development of a new approach to $\Gamma$-convergence, appropriate for discontinuous finite element discretizations, which can be applied to quite general energy minimization problems. Furthermore, the adoption of exponential terms penalising the interpenetration of matter requires a new framework based on Orlicz spaces for discontinuous Galerkin methods, which is developed in this paper as well.
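For reference, the standard notion used here: a sequence of functionals $E_h$ $\Gamma$-converges to $E$ if (i) for every sequence $u_h \to u$ one has $E(u) \le \liminf_{h \to 0} E_h(u_h)$, and (ii) for every $u$ there exists a recovery sequence $u_h \to u$ with $\limsup_{h \to 0} E_h(u_h) \le E(u)$. Combined with equicoercivity, these two properties imply that minimizers of $E_h$ converge (up to subsequences) to minimizers of $E$, which is the mechanism behind the convergence of the discrete minimizers; the choice of topology and the discrete spaces specific to the paper are not reproduced here.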
Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.
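The sketch below illustrates the kind of experiment behind the universality claim: gradient descent is run on random least-squares problems whose entries are drawn from two different distributions, and the number of iterations needed to reach a fixed gradient tolerance (the halting time) is recorded. The problem sizes, tolerance, and distributions are arbitrary choices for illustration; in a full experiment, halting-time histograms over many trials would be compared.

    import numpy as np

    def halting_time(sample, n=400, d=200, tol=1e-6, max_iter=10_000):
        """Iterations of gradient descent on a random least-squares problem."""
        A = sample((n, d)) / np.sqrt(n)
        b = sample((n,))
        lr = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L
        x = np.zeros(d)
        for k in range(max_iter):
            g = A.T @ (A @ x - b)
            if np.linalg.norm(g) < tol:
                return k
            x -= lr * g
        return max_iter

    rng = np.random.default_rng(0)
    gaussian = lambda shape: rng.standard_normal(shape)
    rademacher = lambda shape: rng.choice([-1.0, 1.0], size=shape)
    print(halting_time(gaussian), halting_time(rademacher))  # similar counts hint at universality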
Considering the large number of fractional operators that exist, and since their number does not seem likely to stop increasing any time soon, this paper presents, for the first time as far as the authors know, a simplified and compact way to work with fractional calculus through the classification of fractional operators using sets. This new way of working with fractional operators, which may be called the fractional calculus of sets, makes it possible to generalize objects of conventional calculus such as tensor operators, the diffusion equation, the heat equation, the Taylor series of a vector-valued function, and the fixed-point method in several variables, which gives rise to the method known as the fractional fixed-point method. It is also shown that each fractional fixed-point method that generates a convergent sequence has the ability to generate an uncountable family of fractional fixed-point methods that generate convergent sequences. Furthermore, one way to numerically estimate the mean order of convergence of any fractional fixed-point method in a region $\Omega$ is presented, through the problem of determining the critical points of a scalar function, together with a way to construct a hybrid fractional iterative method for determining those critical points.
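The fractional operators themselves are not implemented below; the sketch only shows the generic numerical device alluded to above: run a (here classical) multivariate fixed-point iteration and estimate its order of convergence from successive increments via $p_k \approx \log(\|e_{k+1}\|/\|e_k\|)/\log(\|e_k\|/\|e_{k-1}\|)$, where $e_k$ is the difference between consecutive iterates. The contraction map used is an arbitrary example.

    import numpy as np

    def fixed_point(Phi, x0, tol=1e-12, max_iter=200):
        """Classical multivariate fixed-point iteration x_{k+1} = Phi(x_k)."""
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(max_iter):
            xs.append(Phi(xs[-1]))
            if np.linalg.norm(xs[-1] - xs[-2]) < tol:
                break
        return xs

    def estimated_orders(xs):
        """Numerical order of convergence from successive increments."""
        e = [np.linalg.norm(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
        return [np.log(e[k + 1] / e[k]) / np.log(e[k] / e[k - 1])
                for k in range(1, len(e) - 1) if min(e[k - 1], e[k], e[k + 1]) > 0]

    # Arbitrary contraction in two variables; a (fractional) Newton-type map for the
    # critical points of a scalar function would be substituted here.
    Phi = lambda v: np.array([0.5 * np.cos(v[1]), 0.5 * np.sin(v[0]) + 0.1])
    orders = estimated_orders(fixed_point(Phi, [1.0, 1.0]))
    print(np.mean(orders))                 # close to 1 for a linearly convergent map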
Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method and is particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. In this paper, we propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems. Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models are governed by ordinary differential equations. We present simple convergence proofs of these two algorithms in the deterministic setting with minimal assumptions. Our analysis also provides intuition for improving upon AdaGrad's convergence rate. We provide empirical results on the MNIST dataset to reinforce our claims on the convergence and performance of G-AdaGrad and Adam.
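For concreteness, the sketch below implements the standard AdaGrad and Adam updates on a toy quadratic; it reproduces only the textbook update rules, not the G-AdaGrad generalization or the state-space (ODE) models used in the analysis.

    import numpy as np

    def grad(x):                                   # toy objective f(x) = 0.5 * ||x||^2
        return x

    def adagrad(x, lr=0.5, eps=1e-8, steps=200):
        s = np.zeros_like(x)
        for _ in range(steps):
            g = grad(x)
            s += g ** 2                            # per-coordinate squared-gradient sum
            x = x - lr * g / (np.sqrt(s) + eps)
        return x

    def adam(x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
        m, v = np.zeros_like(x), np.zeros_like(x)
        for t in range(1, steps + 1):
            g = grad(x)
            m = b1 * m + (1 - b1) * g              # first-moment estimate
            v = b2 * v + (1 - b2) * g ** 2         # second-moment estimate
            m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
            x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        return x

    x0 = np.array([3.0, -2.0])
    print(adagrad(x0.copy()), adam(x0.copy()))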
Text to Image Synthesis refers to the process of automatic generation of a photo-realistic image starting from a given text and is revolutionizing many real-world applications. In order to perform such a process, it is necessary to exploit datasets containing captioned images, meaning that each image is associated with one (or more) captions describing it. Despite the abundance of uncaptioned image datasets, the number of captioned datasets is limited. To address this issue, in this paper we propose an approach capable of generating images starting from a given text using conditional GANs trained on an uncaptioned image dataset. In particular, uncaptioned images are fed to an Image Captioning Module to generate the descriptions. Then, the GAN Module is trained on both the input image and the machine-generated caption. To evaluate the results, the performance of our solution is compared with the results obtained by the unconditional GAN. For the experiments, we chose to use the uncaptioned LSUN bedroom dataset. The results obtained in our study are preliminary but still promising.
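A schematic of the training loop described above, with hypothetical interfaces (describe, encode_text, train_step) standing in for the actual Image Captioning Module and conditional GAN, which are not specified here:

    def train_on_uncaptioned(images, captioning_module, cond_gan, epochs=1):
        """Caption each uncaptioned image, then train the conditional GAN on the pair."""
        for _ in range(epochs):
            for image in images:
                caption = captioning_module.describe(image)   # Image Captioning Module (hypothetical API)
                text_embedding = cond_gan.encode_text(caption)
                cond_gan.train_step(image, text_embedding)    # conditional GAN update (hypothetical API)
        return cond_gan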