We consider the problem of minimizing a convex function over a closed convex set with Projected Gradient Descent (PGD). We propose a fully parameter-free version of AdaGrad that adapts to the distance between the initialization and the optimum, as well as to the sum of the squared norms of the subgradients. Our algorithm handles projection steps and involves no restarts, no reweighting along the trajectory, and no additional gradient evaluations compared to classical PGD. It also achieves optimal rates of convergence for cumulative regret up to logarithmic factors. We extend our approach to stochastic optimization and conduct numerical experiments supporting the developed theory.
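As a concrete reference point, the classical AdaGrad-norm stepsize that such parameter-free methods build upon can be sketched as follows. The function names, the box constraint, and the `radius` parameter (standing in for the unknown initial distance) are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def adagrad_norm_pgd(grad, project, x0, radius=1.0, T=500, eps=1e-12):
    """Projected gradient descent with an AdaGrad-norm stepsize:
    eta_t = radius / sqrt(sum of squared gradient norms so far)."""
    x = x0.astype(float)
    sq_sum = 0.0
    for _ in range(T):
        g = grad(x)
        sq_sum += np.dot(g, g)
        x = project(x - radius / np.sqrt(sq_sum + eps) * g)
    return x

# Toy problem: minimize ||x - c||^2 over the box [0, 1]^2; the
# optimum is the Euclidean projection of c onto the box.
c = np.array([2.0, -0.5])
grad = lambda x: 2 * (x - c)
project = lambda x: np.clip(x, 0.0, 1.0)
x_star = adagrad_norm_pgd(grad, project, x0=np.zeros(2), radius=2.0)
```

Note that `radius` plays the role of the distance estimate that the parameter-free method learns on the fly instead of requiring as input.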
The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy and inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing expert judgement to be incorporated in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.
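For orientation, a naive baseline for the task is to count the local maxima of a kernel density estimate on a grid. The following is only a minimal sketch (the function name, fixed bandwidth and grid size are our own assumptions), not the Bayesian spline-based method proposed here:

```python
import numpy as np
from scipy.stats import gaussian_kde

def count_kde_modes(sample, bandwidth=0.5, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate
    evaluated on a regular grid covering the sample range."""
    kde = gaussian_kde(sample, bw_method=bandwidth)
    grid = np.linspace(sample.min() - 1.0, sample.max() + 1.0, grid_size)
    density = kde(grid)
    interior = density[1:-1]
    # A grid point is a mode candidate if it beats both neighbours.
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())

rng = np.random.default_rng(0)
# Two well-separated subpopulations -> two modes.
sample = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
n_modes = count_kde_modes(sample)
```

The sensitivity of this count to the bandwidth choice is precisely the kind of instability that motivates treating modes as subjective and uncertain objects.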
This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent, by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/(T\log T))$ rate for gradient descent is also motivated, along with simple numerical validation.
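The qualitative phenomenon can be reproduced in a toy experiment: a cyclic stepsize pattern with one "long" step above the classical $2/L$ threshold momentarily increases the objective, yet the overall iteration still converges quickly. The quadratic, the pattern, and the function names below are illustrative choices, not the certified stepsize sequences from the analysis:

```python
import numpy as np

def gd_with_pattern(A, x0, pattern, cycles):
    """Gradient descent on f(x) = 0.5 x^T A x, cycling through a
    fixed stepsize pattern; records f after every step."""
    x = x0.astype(float)
    history = [0.5 * x @ A @ x]
    for _ in range(cycles):
        for h in pattern:
            x = x - h * (A @ x)
            history.append(0.5 * x @ A @ x)
    return x, np.array(history)

# Quadratic with smoothness constant L = 1 (largest eigenvalue of A).
A = np.diag([0.05, 0.4, 1.0])
x0 = np.ones(3)
# One long step of 3.9 > 2/L, then seven short steps. The long step
# can increase f, but the cycle as a whole contracts every eigenmode.
pattern = [3.9] + [1.4] * 7
x, hist = gd_with_pattern(A, x0, pattern, cycles=20)
increased = bool(np.any(np.diff(hist) > 0))
```

Here `increased` confirms that descent is violated at some step even though the final objective value is many orders of magnitude below the initial one.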
The Black-Scholes (B-S) equation has recently been extended to a class of tempered time-fractional B-S equations, which have become an interesting mathematical model in option pricing. In this study, we provide a fast numerical method to approximate the solution of the tempered time-fractional B-S model. To achieve high-order accuracy in space and overcome the weak initial singularity of the exact solution, we combine a compact difference operator with an L1-type approximation on nonuniform time steps to construct the numerical scheme. The proposed difference scheme is proved to be unconditionally stable and convergent. Moreover, the kernel function in the tempered Caputo fractional derivative is approximated by a sum of exponentials, which yields a fast, unconditionally stable compact difference method with reduced computational cost. Finally, numerical results demonstrate the effectiveness of the proposed methods.
This work proposes novel techniques for the efficient numerical simulation of parameterized, unsteady partial differential equations. Projection-based reduced order models (ROMs) such as the reduced basis method employ a (Petrov-)Galerkin projection onto a linear low-dimensional subspace. In unsteady applications, space-time reduced basis (ST-RB) methods have been developed to achieve a dimension reduction in both space and time, eliminating the computational burden of time-marching schemes. However, nonaffine parameterizations dilute any computational speedup achievable by traditional ROMs. Computational efficiency can be recovered by linearizing the nonaffine operators via hyper-reduction, such as the empirical interpolation method in matrix form. In this work, we implement new hyper-reduction techniques explicitly tailored to unsteady problems and embed them in an ST-RB framework. For each of the proposed methods, we develop a posteriori error bounds. We run numerical tests to compare the performance of the proposed ROMs against high-fidelity simulations, in which we combine the finite element method for space discretization on 3D geometries with the Backward Euler time integrator. In particular, we consider a heat equation and an unsteady Stokes equation. The numerical experiments demonstrate the accuracy and computational efficiency that our methods retain with respect to the high-fidelity simulations.
In this short paper, we explore a new way to refactor a simple but tricky-to-parallelize tree-traversal algorithm to harness multicore parallelism. Crucially, the refactoring draws from some classic techniques from programming-languages research, such as the continuation-passing-style transform and defunctionalization. The algorithm we consider faces a particularly acute granularity-control challenge, owing to the wide range of inputs it has to deal with. Our solution achieves efficiency from heartbeat scheduling, a recent approach to automatic granularity control. We present our solution in a series of individually simple refactoring steps, starting from a high-level, recursive specification of the algorithm. As such, our approach may prove useful as a teaching tool, and perhaps be used for one-off parallelizations, as the technique requires no special compiler support.
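The first refactoring steps, a continuation-passing-style transform followed by defunctionalization, can be illustrated on a tree-sum toy. In this sketch (the tuple encoding and names are our own, not the paper's algorithm), the implicit call stack of the recursive specification becomes an explicit, inspectable data structure, which is the prerequisite for splitting work under a scheduler:

```python
# A binary tree encoded as nested tuples: leaf = int, node = (left, right).
def tree_sum_rec(t):
    """High-level recursive specification."""
    if isinstance(t, int):
        return t
    left, right = t
    return tree_sum_rec(left) + tree_sum_rec(right)

def tree_sum_defun(t):
    """CPS-transformed and defunctionalized version: the continuation
    is reified as an explicit stack of frames, so the traversal runs
    in a loop whose state can be inspected (or split for parallelism)."""
    stack = [('done',)]
    while True:
        if isinstance(t, int):
            # Apply the reified continuation to the leaf value.
            acc = t
            frame = stack.pop()
            while frame[0] == 'add':
                acc += frame[1]          # finished left-subtree sum
                frame = stack.pop()
            if frame[0] == 'done':
                return acc
            # frame == ('sum_right', r): remember acc, traverse r next.
            stack.append(('add', acc))
            t = frame[1]
        else:
            left, right = t
            stack.append(('sum_right', right))
            t = left

tree = ((1, 2), (3, (4, 5)))
total = tree_sum_defun(tree)
```

A side benefit of the loop form is that it is immune to call-stack overflow on degenerate, deeply nested inputs that would crash the recursive version.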
We consider distributed online min-max resource allocation with a set of parallel agents and a parameter server. Our goal is to minimize the pointwise maximum over a set of time-varying and decreasing cost functions, without a priori information about these functions. We propose a novel online algorithm, termed Distributed Online resource Re-Allocation (DORA), in which non-stragglers learn to relinquish resources and share them with stragglers. A notable feature of DORA is that it requires neither gradient calculations nor projection operations, unlike most existing online optimization strategies. This allows it to substantially reduce the computation overhead in large-scale distributed networks. We analyze the worst-case performance of DORA and derive an upper bound on its dynamic regret for non-convex functions. We further consider an application to the bandwidth allocation problem in distributed online machine learning. Our numerical study demonstrates the efficacy of the proposed solution and its performance advantage, in terms of wall-clock time, over gradient- and/or projection-based resource allocation algorithms.
In this study, we present an integro-differential model to simulate the local spread of infections. The model incorporates a standard susceptible-infected-recovered (\textit{SIR}) model enhanced by an integral kernel, allowing for non-homogeneous mixing between susceptibles and infectives. We define requirements for the kernel function and derive analytical results for both the \textit{SIR} model and a reduced susceptible-infected-susceptible (\textit{SIS}) model, in particular the uniqueness of solutions. In order to optimize the balance between disease containment and the social and political costs associated with lockdown measures, we set up requirements for the implementation of the control function and show examples of three different formulations of the control: continuous and time-dependent, continuous and space- and time-dependent, and piecewise constant in space and time. The latter represents reality most closely, as the control cannot in practice be updated at every time and location. We compute the optimal control values for all of these setups; these are by nature best for a continuous, space- and time-dependent control, yet the discrete setting also yields reasonable results. To validate the numerical results of the integro-differential model, we compare them to an established agent-based model that incorporates social and other microscopic factors more accurately and thus acts as a benchmark for the validity of the integro-differential approach. A close match between the results of both models validates the integro-differential model as an efficient macroscopic proxy. Since computing an optimal control strategy is computationally very expensive for agent-based models, yet comparatively cheap for the integro-differential model, using the proxy model might have interesting implications for future research.
In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both single-timescale and two-timescale cases. We show that our proposed framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD. Furthermore, when the objective function adopts a finite-sum formulation, we prove the convergence properties for these SGD-type methods based on our proposed framework. In particular, we prove that these SGD-type methods find the Clarke stationary points of the objective function with randomly chosen stepsizes and initial points under mild assumptions. Preliminary numerical experiments demonstrate the high efficiency of our analyzed SGD-type methods.
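The two-timescale idea can be sketched for the heavy-ball instance: the momentum buffer and the variable are updated with separate stepsize sequences chosen on different scales. The concrete schedules, names, and toy objective below are illustrative assumptions, not the paper's framework:

```python
import numpy as np

def heavy_ball_sgd(grad, x0, alpha, beta, T, rng):
    """Heavy-ball SGD with separate stepsizes: beta(k) drives the
    momentum buffer (fast timescale), alpha(k) drives the variable
    (slow timescale)."""
    x = x0.astype(float)
    m = np.zeros_like(x)
    for k in range(1, T + 1):
        g = grad(x, rng)                      # stochastic gradient
        m = (1 - beta(k)) * m + beta(k) * g   # momentum timescale
        x = x - alpha(k) * m                  # variable timescale
    return x

# Noisy quadratic: f(x) = 0.5 ||x - 1||^2 with additive gradient noise.
def grad(x, rng):
    return (x - 1.0) + 0.1 * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = heavy_ball_sgd(grad, np.zeros(3),
                   alpha=lambda k: 0.5 / np.sqrt(k),  # slow: O(k^-1/2)
                   beta=lambda k: k ** -0.25,         # fast: O(k^-1/4)
                   T=2000, rng=rng)
```

With `beta` decaying more slowly than `alpha`, the momentum buffer tracks the gradient faster than the variable moves, which is the separation of timescales exploited in the analysis.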
This paper proposes Gait Decomposition (G.D), a method for mathematically decomposing snake movements, and Gait Parameter Gradient (GPG), a method for optimizing the decomposed gait parameters. G.D expresses a snake gait mathematically and concisely, from movement generation using a curve function to the motor control commands issued when the snake robot moves. Through this method, the gait of the snake robot can be intuitively encoded in a matrix, and the parameters of the curve function required for gait generation can be adjusted flexibly. This addresses the difficulty of parameter tuning, one of the main obstacles to the practical use of snake robots. With G.D, various gaits can be generated from a small number of parameters, opening snake robots to many fields of application. We also implement the GPG algorithm to optimize the gait curve function as well as to define the gait of the snake robot through G.D.
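To give a sense of how few parameters a curve-function gait needs, here is the classic serpenoid parameterization commonly used for snake robots (this is a generic sketch, not the paper's G.D matrix formulation; the function name and defaults are our own):

```python
import numpy as np

def serpenoid_joint_angles(n_joints, t, amplitude, spatial_freq,
                           temporal_freq, offset=0.0):
    """Joint-angle commands for a snake robot following a serpenoid
    curve. Four scalars (amplitude, spatial frequency, temporal
    frequency, offset) fully determine the gait at time t."""
    i = np.arange(n_joints)
    return amplitude * np.sin(spatial_freq * i + temporal_freq * t) + offset

# One command vector for an 8-joint robot at t = 0.5 s.
angles = serpenoid_joint_angles(8, t=0.5, amplitude=0.6,
                                spatial_freq=np.pi / 4, temporal_freq=2.0)
```

Tuning such parameters by hand is exactly the burden that a gradient-based optimizer like GPG is meant to remove.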
For time-dependent PDEs, numerical schemes can be rendered bound-preserving without losing conservation and accuracy by a post-processing procedure that solves a constrained minimization in each time step. This constrained optimization can be formulated as a nonsmooth convex minimization, which can be solved efficiently by first-order optimization methods, provided the optimal algorithm parameters are used. By analyzing the asymptotic linear convergence rate of the generalized Douglas-Rachford splitting method, the optimal algorithm parameters can be approximately expressed as a simple function of the number of out-of-bounds cells. We demonstrate the efficiency of this simple choice of algorithm parameters by applying such a limiter to the cell averages of a discontinuous Galerkin scheme solving phase field equations in demanding 3D problems. Numerical tests on a sophisticated 3D Cahn-Hilliard-Navier-Stokes system indicate that the limiter is high-order accurate, very efficient, and well-suited for large-scale simulations. For each time step, it takes at most $20$ iterations for the Douglas-Rachford splitting to enforce bounds and conservation up to round-off error, for which the computational cost is at most $80N$ with $N$ being the total number of cells.
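The post-processing step can be sketched as a Douglas-Rachford iteration for the projection-type problem (find the closest cell averages satisfying bounds and conservation). This is a minimal sketch with a fixed iteration count, unit relaxation, and an ad hoc `gamma`, not the optimized parameter choice derived in the paper:

```python
import numpy as np

def dr_bound_limiter(x0, lo, hi, gamma=1.0, iters=1000):
    """Douglas-Rachford splitting for the post-processing problem
       min 0.5||x - x0||^2  s.t.  lo <= x_i <= hi,  sum(x) = sum(x0),
    i.e. the closest cell averages satisfying bounds and conservation.
    Splitting: f = indicator of the box, g = quadratic + sum constraint."""
    n = x0.size
    s = x0.sum()
    proj_box = lambda v: np.clip(v, lo, hi)

    def prox_g(v):
        # prox of g(x) = 0.5||x - x0||^2 + indicator{sum(x) = s}:
        # shrink towards x0, then project onto the hyperplane.
        z = (v + gamma * x0) / (1.0 + gamma)
        return z + (s - z.sum()) / n

    y = x0.copy()
    for _ in range(iters):
        x = proj_box(y)
        z = prox_g(2 * x - y)
        y = y + z - x
    return proj_box(y)

# Cell averages with one under- and one overshoot; bounds [0, 1].
x0 = np.array([-0.2, 0.5, 1.3, 0.4])
x = dr_bound_limiter(x0, lo=0.0, hi=1.0)
```

For this small example the limiter redistributes the clipped mass uniformly among the non-saturated cells, matching the known closed-form projection.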