We propose a novel method to compute a finite difference stencil for the Riesz derivative with an arbitrary order of convergence. The method is based on applying a pre-filter to the Gr\"unwald-Letnikov type central difference stencil. The filter is obtained by solving for the inverse of a symmetric Vandermonde matrix and exploiting the relationship between the Taylor series coefficients and the fast Fourier transform. The filter costs $O\left(N^{2}\right)$ operations to evaluate for $O\left(h^{N}\right)$ convergence, where $h$ is the sampling distance. The higher convergence rate should more than offset this overhead, since the number of nodal points required for a desired error tolerance is significantly reduced. Because the stencil is obtained by filtering, the Gr\"unwald-Letnikov scheme's benefit of progressive generation of the stencil coefficients, useful for adaptive grid sizes in dynamic problems, is also retained. The higher convergence rate is verified through numerical experiments.
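For reference, a minimal sketch of the Gr\"unwald-Letnikov type centered stencil that such a pre-filter would act on, assuming the standard fractional centered-difference coefficients (generated progressively by a two-term recurrence); the paper's filtered construction itself is not reproduced here:

```python
import numpy as np
from scipy.special import gamma

def gl_centered_coeffs(alpha, K):
    """Fractional centered-difference coefficients g_k (Ortigueira) for the
    Riesz derivative of order alpha, generated progressively by recurrence."""
    g = np.empty(K + 1)
    g[0] = gamma(alpha + 1.0) / gamma(alpha / 2.0 + 1.0) ** 2
    for k in range(K):
        g[k + 1] = g[k] * (k - alpha / 2.0) / (k + 1.0 + alpha / 2.0)
    return g

def riesz_derivative(f, h, alpha, K):
    """Second-order approximation of the Riesz derivative of samples f on a
    uniform grid with spacing h, truncating the symmetric stencil at K terms."""
    g = gl_centered_coeffs(alpha, K)
    n = len(f)
    out = np.zeros(n)
    for k in range(-K, K + 1):
        lo, hi = max(0, k), min(n, n + k)
        out[lo:hi] += g[abs(k)] * f[lo - k:hi - k]
    return -out / h ** alpha
```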
Recent empirical advances show that training deep models with a large learning rate often improves generalization performance. However, theoretical justification for the benefits of a large learning rate is highly limited, due to challenges in analysis. In this paper, we consider using Gradient Descent (GD) with a large learning rate on a homogeneous matrix factorization problem, i.e., $\min_{X, Y} \|A - XY^\top\|_{\sf F}^2$. We prove a convergence theory for constant large learning rates well beyond $2/L$, where $L$ is the largest eigenvalue of the Hessian at initialization. Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that the magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced. Numerical experiments are provided to support our theory.
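A minimal sketch of the setting, assuming simultaneous GD updates on $f(X,Y) = \frac{1}{2}\|A - XY^\top\|_{\sf F}^2$ (the $1/2$ only rescales the step size) and an illustrative, not theory-derived, step size; varying `eta` probes the large-learning-rate regime in which the paper predicts balancing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 2
A = rng.standard_normal((n, n))

# deliberately unbalanced initialization: ||X||_F >> ||Y||_F
X = 2.0 * rng.standard_normal((n, r))
Y = 0.02 * rng.standard_normal((n, r))

eta = 0.01  # illustrative step size; increase it to probe the large-step regime
for _ in range(5000):
    R = X @ Y.T - A                            # residual of the factorization
    X, Y = X - eta * R @ Y, Y - eta * R.T @ X  # simultaneous GD steps

# per the paper, a sufficiently large eta drives these magnitudes together
print(np.linalg.norm(X), np.linalg.norm(Y))
```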
Langevin Monte Carlo (LMC) is a popular Bayesian sampling method. For log-concave distributions, the method converges exponentially fast, up to a controllable discretization error. However, the method requires the evaluation of a full gradient in each iteration, and for a problem on $\mathbb{R}^d$, this amounts to $d$ partial derivative evaluations per iteration. The cost is high when $d\gg1$. In this paper, we investigate how to enhance computational efficiency through the application of RCD (random coordinate descent) to LMC. There are two sides to the theory: (1) by blindly applying RCD to LMC, one replaces the full gradient with a randomly selected directional derivative per iteration. Although the cost per iteration is reduced, the total number of iterations increases to achieve a preset error tolerance, so ultimately there is no computational gain. (2) We then incorporate variance reduction techniques, such as SAGA (stochastic average gradient) and SVRG (stochastic variance reduced gradient), into RCD-LMC. We prove that the cost is reduced compared with the classical LMC, and that in the underdamped case, convergence is achieved with the same number of iterations, while each iteration requires merely one directional derivative. This means we obtain the best possible computational cost in the underdamped-LMC framework.
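The first, variance-reduction-free variant can be sketched as follows; `grad_i` is a hypothetical callback returning a single partial derivative, and the rescaling by $d$ keeps the coordinate surrogate an unbiased estimate of the full gradient:

```python
import numpy as np

def rcd_lmc(grad_i, x0, eta, steps, rng):
    """Overdamped LMC with the full gradient replaced by one randomly
    selected coordinate derivative per iteration, rescaled by d so the
    estimator stays unbiased. grad_i(x, i) returns the i-th partial
    derivative of the negative log-density."""
    x = x0.copy()
    d = len(x)
    for _ in range(steps):
        i = rng.integers(d)
        g = np.zeros(d)
        g[i] = d * grad_i(x, i)  # unbiased surrogate for the full gradient
        x = x - eta * g + np.sqrt(2 * eta) * rng.standard_normal(d)
    return x
```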
We study the MARINA method of Gorbunov et al. (2021) -- the current state-of-the-art distributed non-convex optimization method in terms of theoretical communication complexity. The theoretical superiority of this method can be largely attributed to two sources: the use of a carefully engineered biased stochastic gradient estimator, which leads to a reduction in the number of communication rounds, and the reliance on {\em independent} stochastic communication compression operators, which leads to a reduction in the number of transmitted bits within each communication round. In this paper, we i) extend the theory of MARINA to support a much wider class of potentially {\em correlated} compressors, extending the reach of the method beyond the classical independent compressors setting, ii) show that a new quantity, for which we coin the name {\em Hessian variance}, allows us to significantly refine the original analysis of MARINA without any additional assumptions, and iii) identify a special class of correlated compressors based on the idea of {\em random permutations}, for which we coin the term Perm$K$, whose use leads to an $O(\sqrt{n})$ (resp. $O(1 + d/\sqrt{n})$) improvement in the theoretical communication complexity of MARINA in the low Hessian variance regime when $d\geq n$ (resp. $d \leq n$), where $n$ is the number of workers and $d$ is the number of parameters describing the model we are learning. We corroborate our theoretical results with carefully engineered synthetic experiments on minimizing the average of nonconvex quadratics, and with autoencoder training on the MNIST dataset.
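A sketch of the Perm$K$ idea for the $d \geq n$ case, assuming for simplicity that a shared random permutation splits the coordinates into $n$ nearly equal blocks; the scaling by $n$ keeps the average over workers an unbiased estimate of the mean vector:

```python
import numpy as np

def permk(grads, rng):
    """PermK-style correlated compression for d >= n: a shared random
    permutation splits the d coordinates into n disjoint blocks; worker i
    transmits only its block, scaled by n for unbiasedness of the mean."""
    n, d = grads.shape
    perm = rng.permutation(d)
    blocks = np.array_split(perm, n)
    compressed = np.zeros_like(grads)
    for i, idx in enumerate(blocks):
        compressed[i, idx] = n * grads[i, idx]
    # compressed.mean(axis=0) is, in expectation over the permutation,
    # equal to grads.mean(axis=0)
    return compressed
```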
In this work, we present and analyse a system of coupled partial differential equations, which models tumour growth under the influence of subdiffusion, mechanical effects, nutrient supply, and chemotherapy. The subdiffusion of the system is modelled by a time-fractional derivative in the equation governing the volume fraction of the tumour cells. The mass densities of the nutrients and the chemotherapeutic agents are modelled by reaction-diffusion equations. We prove the existence and uniqueness of a weak solution to the model via the Faedo--Galerkin method and the application of appropriate compactness theorems. Lastly, we propose a fully discretised system and illustrate the effects of the fractional derivative and the influence of the fractional parameter in numerical examples.
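For concreteness, a time-fractional (Caputo) derivative of order $0 < \alpha < 1$ is commonly discretised with the L1 scheme sketched below; this is a standard choice shown for illustration only, not necessarily the discretisation used in the paper:

```python
import numpy as np
from math import gamma

def caputo_l1(u_hist, tau, alpha):
    """Standard L1 approximation of the Caputo derivative D_t^alpha u at t_n,
    0 < alpha < 1, given the history u_hist = [u^0, ..., u^n] with time step
    tau. (A common discretisation; not necessarily the one in the paper.)"""
    n = len(u_hist) - 1
    j = np.arange(n)
    b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)  # L1 weights
    diffs = np.diff(u_hist)                         # increments u^{k+1} - u^k
    # b_0 multiplies the most recent increment, hence the reversal
    return (b[::-1] * diffs).sum() / (gamma(2 - alpha) * tau ** alpha)
```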
We develop a method for generating degree-of-freedom maps for arbitrary order finite element spaces for any cell shape. The approach is based on the composition of permutations and transformations by cell sub-entity. Current approaches to generating degree-of-freedom maps for arbitrary order problems typically rely on a consistent orientation of cell entities that permits the definition of a common local coordinate system on shared edges and faces. However, while orientation of a mesh is straightforward for simplex cells and is a local operation, it is not a strictly local operation for quadrilateral cells, and in the case of hexahedral cells not all meshes are orientable. The permutation and transformation approach is developed for a range of element types, including Lagrange, and divergence- and curl-conforming elements, and for a range of cell shapes. The approach is local and can be applied to cells of any shape, including general polytopes and meshes with mixed cell types. A number of examples are presented, and the developed approach has been implemented in an open-source finite element library.
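The simplest instance of such a map, for the interior edge degrees of freedom of a Lagrange element, is a pure permutation triggered by a local orientation check; the sketch below uses hypothetical names and omits the coefficient transformations that divergence- and curl-conforming elements additionally require:

```python
def edge_dof_permutation(local_vertices, edge_dofs):
    """Orientation correction for the DOFs on one edge: if this cell's
    ordering of the edge's two vertex indices disagrees with the reference
    ordering (ascending global index), reverse the interior edge DOFs.
    For Lagrange elements the transformation is a pure permutation;
    div- and curl-conforming elements also need sign/coefficient changes."""
    v0, v1 = local_vertices
    if v0 > v1:                 # the edge is seen "flipped" by this cell
        return edge_dofs[::-1]
    return edge_dofs
```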
In this work, we explore the unique challenges -- and opportunities -- of unsupervised federated learning (FL). We develop and analyze a one-shot federated clustering scheme, $k$-FED, based on the widely-used Lloyd's method for $k$-means clustering. In contrast to many supervised problems, we show that the issue of statistical heterogeneity in federated networks can in fact benefit our analysis. We analyze $k$-FED under a center separation assumption and compare it to the best known requirements of its centralized counterpart. Our analysis shows that in heterogeneous regimes where the number of clusters per device, $k'$, is smaller than the total number of clusters over the network, $k$ (specifically, $k'\le \sqrt{k}$), we can use heterogeneity to our advantage -- significantly weakening the cluster separation requirements for $k$-FED. From a practical viewpoint, $k$-FED also has many desirable properties: it requires only one round of communication, can run asynchronously, and can handle partial participation or node/network failures. We motivate our analysis with experiments on common FL benchmarks, and highlight the practical utility of one-shot clustering through use-cases in personalized FL and device sampling.
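A minimal sketch of the one-shot structure, with plain $k$-means on the pooled centers standing in for $k$-FED's specific center-aggregation step:

```python
import numpy as np
from sklearn.cluster import KMeans

def k_fed(device_data, k_prime, k, seed=0):
    """One-shot federated clustering sketch: each device runs Lloyd's method
    locally with k' clusters and ships only its centers; the server then
    clusters the pooled centers once. (k-FED uses a specific distance-based
    aggregation step; plain k-means on the centers is a stand-in here.)"""
    local_centers = [
        KMeans(n_clusters=k_prime, n_init=10, random_state=seed)
        .fit(X).cluster_centers_
        for X in device_data
    ]
    pooled = np.vstack(local_centers)   # the only communication round
    return KMeans(n_clusters=k, n_init=10, random_state=seed) \
        .fit(pooled).cluster_centers_
```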
In this paper, cooperative non-orthogonal multiple access (C-NOMA) is considered in short-packet communications with finite blocklength (FBL) codes. The performance of decode-and-forward (DF) relaying along with selection combining (SC) and maximum ratio combining (MRC) strategies at the receiver side is examined. We explore joint user pairing and resource allocation to maximize fair throughput in a downlink (DL) scenario. In each pair, the user with the stronger channel (strong user) acts as a relay for the other one (weak user), and optimal power and blocklength are allocated to achieve max-min throughput. To this end, first, only one pair is considered, and optimal resource allocation is explored. A suboptimal algorithm is also suggested, which converges to a near-optimal solution. Finally, the problem is extended to a general scenario, and a suboptimal C-NOMA-based user pairing is proposed. Numerical results show that the proposed C-NOMA scheme under both SC and MRC strategies significantly improves the users' fair throughput compared to NOMA and OMA. We also show that the proposed C-NOMA-based pairing scheme outperforms the hybrid NOMA/OMA scheme in terms of average throughput, while the fairness index degrades slightly.
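Throughput analyses in the FBL regime typically build on the normal approximation of the achievable rate; a sketch of that standard expression (not the paper's full optimization problem) is:

```python
import numpy as np
from scipy.stats import norm

def fbl_rate(snr, blocklength, eps):
    """Normal approximation of the achievable rate (bits/channel use) at
    finite blocklength m: R ~= C - sqrt(V/m) * Q^{-1}(eps), the standard
    expression underlying FBL throughput analyses."""
    C = np.log2(1.0 + snr)                                    # Shannon capacity
    V = (1.0 - 1.0 / (1.0 + snr) ** 2) * np.log2(np.e) ** 2   # channel dispersion
    return C - np.sqrt(V / blocklength) * norm.isf(eps)       # norm.isf = Q^{-1}
```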
We propose a verified computation method for eigenvalues in a region and the corresponding eigenvectors of generalized Hermitian eigenvalue problems. The proposed method uses complex moments to extract the eigencomponents of interest from a random matrix and uses the Rayleigh--Ritz procedure to project a given eigenvalue problem onto a reduced eigenvalue problem. The complex moments are given by contour integrals and approximated by numerical quadrature. We split the error in the complex moment into the truncation error of the quadrature and rounding errors, and evaluate each. This error evaluation inherits from our previous Hankel matrix approach, but the proposed method requires half as many quadrature points as the previous approach to reduce the truncation error to the same order. Moreover, the Rayleigh--Ritz procedure forms a transformation matrix that enables verification of the eigenvectors. Numerical experiments show that the proposed method is faster than previous methods while maintaining verification performance.
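The complex moments can be sketched as a trapezoidal-rule contour integral on a circle; the snippet below shows only this quadrature step, with `center` and `radius` (hypothetical parameters) defining the region of interest, and omits the Rayleigh--Ritz projection and the rigorous error bounds:

```python
import numpy as np

def complex_moments(A, B, V, center, radius, n_quad, n_moments):
    """Trapezoidal-rule approximation of the complex moments
    S_k = (1/(2*pi*i)) * integral of z^k (zB - A)^{-1} B V dz over a circle,
    which extract the eigencomponents inside the contour."""
    theta = 2 * np.pi * (np.arange(n_quad) + 0.5) / n_quad
    z = center + radius * np.exp(1j * theta)
    S = [np.zeros_like(V, dtype=complex) for _ in range(n_moments)]
    for zj in z:
        Yj = np.linalg.solve(zj * B - A, B @ V)  # one linear solve per node
        w = (zj - center) / n_quad               # quadrature weight incl. dz
        for k in range(n_moments):
            S[k] += w * zj ** k * Yj
    return S
```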
Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy. Nonetheless, nonideal communication links and limited transmission resources may hinder the implementation of fast and accurate FL. In this paper, we study the joint optimization of communications and FL based on analog aggregation transmission in realistic wireless networks. We first derive closed-form expressions for the expected convergence rate of FL over the air, which theoretically quantify the impact of analog aggregation on FL. Based on the analytical results, we develop a joint optimization model for accurate FL implementation, which allows a parameter server to select a subset of workers and determine an appropriate power scaling factor. Since the practical setting of FL over the air involves unobservable parameters, we reformulate the joint optimization of worker selection and power allocation using controlled approximation. Finally, we efficiently solve the resulting mixed-integer programming problem via a simple yet optimal finite-set search method that reduces the search space. Simulation results show that the proposed solutions developed for realistic wireless analog channels outperform a benchmark method and achieve performance comparable to the ideal case in which FL is implemented over noise-free wireless channels.
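An illustrative model of the analog aggregation step, assuming a common power scaling factor and additive receiver noise; the paper's joint worker selection and power allocation are not reproduced:

```python
import numpy as np

def ota_aggregate(updates, selected, power_scale, noise_std, rng):
    """Analog over-the-air aggregation sketch: selected workers transmit
    power-scaled model updates that superpose in the multiple-access
    channel; the server rescales the noisy sum to estimate the average."""
    d = updates.shape[1]
    rx = sum(power_scale * updates[i] for i in selected)  # channel superposition
    rx += noise_std * rng.standard_normal(d)              # receiver noise
    return rx / (power_scale * len(selected))             # post-scaling at server
```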
Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They perform gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems, which has recently been extended to non-convex and stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show convergence using the ODE method. We tested this algorithm on a scheduling problem in which an incoming job is scheduled into one of four queues based on the queue lengths. Experimental results show that the algorithm using Nesterov's acceleration performs significantly better than the algorithm without acceleration. To the best of our knowledge, this is the first time Nesterov's acceleration has been used with an actor-critic algorithm.
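One Nesterov-accelerated actor update can be sketched as follows, with `grad_fn` a hypothetical policy-gradient estimate (e.g., built from the critic's TD error) and `alpha`, `beta` illustrative step-size and momentum parameters; the sign is positive since the actor ascends the expected return:

```python
def nesterov_actor_step(theta, theta_prev, grad_fn, alpha, beta):
    """One Nesterov-accelerated update of the actor (policy) parameters:
    evaluate the policy gradient at a momentum look-ahead point, then step.
    Returns the new parameters and the current ones (for the next step)."""
    lookahead = theta + beta * (theta - theta_prev)      # momentum extrapolation
    theta_next = lookahead + alpha * grad_fn(lookahead)  # ascent on return
    return theta_next, theta
```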