In this article, a high-order time-stepping scheme based on the cubic interpolation formula is considered to approximate the generalized Caputo fractional derivative (GCFD). Convergence order for this scheme is $(4-\alpha),$ where $\alpha ~(0<\alpha<1)$ is the order of the GCFD. The local truncation error is also provided. Then, we adopt the developed scheme to establish a difference scheme for the solution of generalized fractional advection-diffusion equation with Dirichlet boundary conditions. Furthermore, we discuss about the stability and convergence of the difference scheme. Numerical examples are presented to examine the theoretical claims. The convergence order of the difference scheme is analyzed numerically, which is third-order in time and second-order in space.
We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition but provided theoretical analysis was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on the $L_2$-regularized logistic regression.
Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, among many others. Unfortunately, BMF is computationally hard and heuristic algorithms are used to compute Boolean factorizations. Very recently, the theoretical breakthrough was obtained independently by two research groups. Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite the theoretical importance, the high double-exponential dependence of the running times from the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptional contribution of our work is the following. While EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms could serve as the basis in designing better heuristics. We also use this strategy to develop new algorithms for related $\mathbb{F}_p$-Matrix Factorization. Here, given a matrix $A$ over a finite field GF($p$) where $p$ is a prime, and an integer $r$, our objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ minimizing some norm of $A-B$. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and $\mathbb{F}_p$-Matrix Factorization.
We study \textit{rescaled gradient dynamical systems} in a Hilbert space $\mathcal{H}$, where implicit discretization in a finite-dimensional Euclidean space leads to high-order methods for solving monotone equations (MEs). Our framework can be interpreted as a natural generalization of celebrated dual extrapolation method~\citep{Nesterov-2007-Dual} from first order to high order via appeal to the regularization toolbox of optimization theory~\citep{Nesterov-2021-Implementable, Nesterov-2021-Inexact}. More specifically, we establish the existence and uniqueness of a global solution and analyze the convergence properties of solution trajectories. We also present discrete-time counterparts of our high-order continuous-time methods, and we show that the $p^{th}$-order method achieves an ergodic rate of $O(k^{-(p+1)/2})$ in terms of a restricted merit function and a pointwise rate of $O(k^{-p/2})$ in terms of a residue function. Under regularity conditions, the restarted version of $p^{th}$-order methods achieves local convergence with the order $p \geq 2$. Notably, our methods are \textit{optimal} since they have matched the lower bound established for solving the monotone equation problems under a standard linear span assumption~\citep{Lin-2022-Perseus}.
In this paper, we introduce and analyse numerical schemes for the homogeneous and the kinetic L\'evy-Fokker-Planck equation. The discretizations are designed to preserve the main features of the continuous model such as conservation of mass, heavy-tailed equilibrium and (hypo)coercivity properties. We perform a thorough analysis of the numerical scheme and show exponential stability and convergence of the scheme. Along the way, we introduce new tools of discrete functional analysis, such as discrete nonlocal Poincar\'e and interpolation inequalities adapted to fractional diffusion. Our theoretical findings are illustrated and complemented with numerical simulations.
Decoding algorithms for Reed--Solomon (RS) codes are of great interest for both practical and theoretical reasons. In this paper, an efficient algorithm, called the modular approach (MA), is devised for solving the Welch--Berlekamp (WB) key equation. By taking the MA as the key equation solver, we propose a new decoding algorithm for systematic RS codes. For $(n,k)$ RS codes, where $n$ is the code length and $k$ is the code dimension, the proposed decoding algorithm has both the best asymptotic computational complexity $O(n\log(n-k) + (n-k)\log^2(n-k))$ and the smallest constant factor achieved to date. By comparing the number of field operations required, we show that when decoding practical RS codes, the new algorithm is significantly superior to the existing methods in terms of computational complexity. When decoding the $(4096, 3584)$ RS code defined over $\mathbb{F}_{2^{12}}$, the new algorithm is 10 times faster than a conventional syndrome-based method. Furthermore, the new algorithm has a regular architecture and is thus suitable for hardware implementation.
In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorization of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Our method is applicable to both moderate compression regimes, unlike vector quantization, and extreme compression regimes.
We identify a new class of vulnerabilities in implementations of differential privacy. Specifically, they arise when computing basic statistics such as sums, thanks to discrepancies between the implemented arithmetic using finite data types (namely, ints or floats) and idealized arithmetic over the reals or integers. These discrepancies cause the sensitivity of the implemented statistics (i.e., how much one individual's data can affect the result) to be much higher than the sensitivity we expect. Consequently, essentially all differential privacy libraries fail to introduce enough noise to hide individual-level information as required by differential privacy, and we show that this may be exploited in realistic attacks on differentially private query systems. In addition to presenting these vulnerabilities, we also provide a number of solutions, which modify or constrain the way in which the sum is implemented in order to recover the idealized or near-idealized bounds on sensitivity.
In this work, we introduce a Variational Multi-Scale (VMS) method for the numerical approximation of parabolic problems, where sub-grid scales are approximated from the eigenpairs of associated elliptic operator. The abstract method is particularized to the one-dimensional advection-diffusion equations, for which the sub-grid components are exactly calculated in terms of a spectral expansion when the advection velocity is approximated by piecewise constant velocities on the grid elements. We prove error estimates that in particular imply that when Lagrange finite element discretisations in space are used, the spectral VMS method coincides with the exact solution of the implicit Euler semi-discretisation of the advection-diffusion problem at the Lagrange interpolation nodes. We also build a feasible method to solve the evolutive advection-diffusion problems by means of an offline/online strategy with reduced computational complexity. We perform some numerical tests in good agreement with the theoretical expectations, that show an improved accuracy with respect to several stabilised methods.
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.