When a physical system is modeled by a nonlinear function, its unknown parameters can be estimated by fitting the model to experimental observations in the least-squares sense. Newton's method and its variants are often used to solve problems of this type. In this paper, we are concerned with computing the minimal-norm solution of an underdetermined nonlinear least-squares problem. We present a Gauss-Newton type method, which relies on two relaxation parameters to ensure convergence and incorporates a procedure to dynamically estimate the two parameters, as well as the rank of the Jacobian matrix, along the iterations. Numerical results are presented.
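To convey the flavor of such schemes, the following minimal sketch (not the paper's exact method) performs a relaxed Gauss-Newton step combined with a null-space correction that pulls the iterate toward the minimal-norm solution; the two relaxation parameters are held fixed here, and NumPy's default pseudoinverse cutoff stands in for the paper's dynamic rank estimate.
\begin{verbatim}
import numpy as np

def gn_minimal_norm(res, jac, x0, alpha=1.0, beta=0.5, tol=1e-10, maxit=100):
    # Relaxed Gauss-Newton for underdetermined nonlinear least squares.
    # alpha damps the Gauss-Newton step; beta shrinks the component of
    # x lying in the null space of the Jacobian, steering the iterate
    # toward the minimal-norm solution. Both are fixed in this sketch.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxit):
        J = jac(x)
        Jp = np.linalg.pinv(J)           # rank decided by pinv's cutoff
        step = alpha * (Jp @ res(x))     # damped Gauss-Newton direction
        null_part = x - Jp @ (J @ x)     # (I - J^+ J) x
        x_new = x - step - beta * null_part
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
\end{verbatim}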
We propose a deterministic Kaczmarz algorithm for solving linear systems $A\x=\b$. Unlike previous Kaczmarz algorithms, ours uses a reflection in each step of the iteration. This generates a sequence of points distributed in a structured pattern on a sphere centered at a solution. First, we prove that taking the average of $O(\eta/\epsilon)$ points yields an effective approximation of the solution up to relative error $\epsilon$, where $\eta$ is a parameter depending on $A$ that can be bounded above by the square of the condition number. We also show how to select these points efficiently. In numerical tests, our Kaczmarz algorithm usually converges faster than the (block) randomized Kaczmarz algorithms. Second, when the linear system is consistent, the Kaczmarz algorithm returns the solution with minimal distance to the initial vector; this gives a method to solve the least-norm problem. Finally, we prove that our Kaczmarz algorithm in fact solves the linear system $A^TW^{-1}A \x = A^TW^{-1} \b$, where $W$ is the lower-triangular matrix such that $W+W^T=2AA^T$, and we study the relationship between this linear system and the original one.
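A minimal sketch of the reflection-and-average idea, assuming cyclic sweeps over the rows and a plain running average (the paper's point-selection rule is more refined): the reflection uses factor 2 where classical Kaczmarz projection uses factor 1.
\begin{verbatim}
import numpy as np

def reflection_kaczmarz(A, b, x0, sweeps=50):
    # Each step reflects the iterate across the hyperplane a_i.x = b_i;
    # the running average of the generated points approximates a
    # solution of A x = b.
    m, n = A.shape
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros(n)
    count = 0
    row_norms = (A * A).sum(axis=1)
    for _ in range(sweeps):
        for i in range(m):
            x += 2.0 * (b[i] - A[i] @ x) / row_norms[i] * A[i]
            avg += x
            count += 1
    return avg / count
\end{verbatim}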
Several decades ago, the Proximal Point Algorithm (PPA) started to attract long-lasting interest from both the abstract operator theory and the numerical optimization communities. Even in modern applications, researchers still use proximal minimization theory to design scalable algorithms that overcome nonsmoothness. Remarkable works such as \cite{Fer:91,Ber:82constrained,Ber:89parallel,Tom:11} established tight relations between the convergence behavior of PPA and the regularity of the objective function. In this manuscript we derive the nonasymptotic iteration complexity of exact and inexact PPA for minimizing convex functions under $\gamma$-Hölderian growth: $\BigO{\log(1/\epsilon)}$ for $\gamma \in [1,2]$ and $\BigO{1/\epsilon^{\gamma - 2}}$ for $\gamma > 2$. In particular, we recover well-known results on PPA: finite convergence for sharp minima and linear convergence for quadratic growth, even in the presence of inexactness. However, without accounting for the computational effort spent on each PPA iteration, any iteration complexity remains abstract and purely informative. Therefore, using an inner (proximal) gradient/subgradient subroutine to compute the inexact PPA iteration, we then establish novel computational complexity bounds on a restarted inexact PPA, which apply when no information on the growth of the objective function is available. Numerical experiments confirm the practical performance and implementability of our framework.
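For concreteness, here is a minimal sketch of an inexact PPA outer loop with a subgradient inner solver; the step size $2\lambda/(t+1)$ exploits the $1/\lambda$-strong convexity of the proximal subproblem, and the restart schedule analyzed in the paper is omitted.
\begin{verbatim}
import numpy as np

def inexact_ppa(subgrad, x0, lam=1.0, outer=50, inner=200):
    # Outer loop: proximal point steps
    #   x_{k+1} ~ argmin f(x) + ||x - x_k||^2 / (2*lam).
    # Inner loop: subgradient method on the (1/lam)-strongly convex
    # proximal subproblem, with the classical diminishing step size.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer):
        y = x.copy()
        for t in range(1, inner + 1):
            g = subgrad(y) + (y - x) / lam   # subgradient of prox objective
            y = y - (2.0 * lam / (t + 1)) * g
        x = y
    return x
\end{verbatim}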
In this work, we propose three novel block-structured multigrid relaxation schemes based on distributive relaxation, Braess-Sarazin relaxation, and Uzawa relaxation for solving the Stokes equations discretized by the marker-and-cell (MAC) scheme. In our earlier work \cite{he2018local}, we discussed these three types of relaxation schemes, where a weighted Jacobi iteration is used to approximate the inverse of the Laplacian involved in the Stokes equations, and we showed that the optimal smoothing factor is $\frac{3}{5}$ for distributive weighted-Jacobi relaxation and inexact Braess-Sarazin relaxation, and $\sqrt{\frac{3}{5}}$ for $\sigma$-Uzawa relaxation. Here, we propose a mass-based approximation within these three relaxations, where the mass matrix $Q$ obtained from the bilinear finite element method is used directly to approximate the inverse of the scalar Laplacian operator instead of a Jacobi iteration. Using local Fourier analysis, we theoretically derive the optimal smoothing factors for the resulting three relaxation schemes: mass-based distributive relaxation, mass-based Braess-Sarazin relaxation, and mass-based $\sigma$-Uzawa relaxation achieve optimal smoothing factors of $\frac{1}{3}$, $\frac{1}{3}$, and $\sqrt{\frac{1}{3}}$, respectively. The mass-based relaxation schemes cost no more than the original Jacobi-based ones, and they do not require computing the inverse of any matrix, which makes these new relaxation schemes appealing.
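The quoted factor $\frac{3}{5}$ coincides with the classical local-Fourier-analysis smoothing factor of weighted Jacobi (damping $\omega = 4/5$) for the 2D five-point Laplacian, which a few lines of code can verify numerically:
\begin{verbatim}
import numpy as np

# Smoothing factor of weighted Jacobi for the 5-point Laplacian:
# symbol 1 - (omega/2)*(2 - cos(t1) - cos(t2)), maximized over the
# high-frequency range max(|t1|,|t2|) in [pi/2, pi].
theta = np.linspace(-np.pi, np.pi, 401)
T1, T2 = np.meshgrid(theta, theta)
high = np.maximum(np.abs(T1), np.abs(T2)) >= np.pi / 2
omega = 4.0 / 5.0                     # optimal damping in 2D
symbol = 1.0 - 0.5 * omega * (2.0 - np.cos(T1) - np.cos(T2))
print(np.abs(symbol[high]).max())     # ~0.6, i.e. 3/5
\end{verbatim}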
Safe exploration is key to applying reinforcement learning (RL) in safety-critical systems. Existing safe exploration methods guarantee safety under regularity assumptions, which makes them difficult to apply to large-scale real-world problems. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between locally available features obtained by sensors and environmental reward/safety using generalized linear function approximations. We provide theoretical guarantees on its safety and optimality. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost, 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees, and 3) comparably sample-efficient and safer than existing advanced deep RL methods with safety constraints.
We consider the problem of structured tensor denoising in the presence of unknown permutations. Such data problems arise commonly in recommender systems, neuroimaging, community detection, and multiway comparison applications. Here, we develop a general family of smooth tensor models up to arbitrary index permutations; the model incorporates the popular tensor block models and Lipschitz hypergraphon models as special cases. We show that a constrained least-squares estimator in the block-wise polynomial family achieves the minimax error bound. A phase transition phenomenon is revealed with respect to the smoothness threshold needed for optimal recovery. In particular, we find that a polynomial of degree up to $(m-2)(m+1)/2$ is sufficient for accurate recovery of order-$m$ tensors, whereas higher degrees exhibit no further benefit. This phenomenon reveals an intrinsic distinction between smooth tensor estimation problems with and without unknown permutations. Furthermore, we provide an efficient polynomial-time Borda count algorithm that provably achieves the optimal rate under monotonicity assumptions. The efficacy of our procedure is demonstrated through both simulations and an analysis of Chicago crime data.
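To convey the idea, here is a minimal sketch of the Borda-count step for the simplest case of an order-2 tensor (a matrix), with plain block averaging in place of the paper's block-wise polynomial fit:
\begin{verbatim}
import numpy as np

def borda_denoise(Y, k):
    # Sort rows/columns by their marginal means (the Borda count),
    # average over k-by-k blocks in the sorted order, then undo the
    # permutations. The paper handles general order-m tensors.
    r = np.argsort(Y.mean(axis=1))
    c = np.argsort(Y.mean(axis=0))
    Z = Y[np.ix_(r, c)].astype(float)
    n1, n2 = Z.shape
    e1 = np.linspace(0, n1, k + 1).astype(int)
    e2 = np.linspace(0, n2, k + 1).astype(int)
    out = np.empty_like(Z)
    for i in range(k):
        for j in range(k):
            blk = Z[e1[i]:e1[i+1], e2[j]:e2[j+1]]
            out[e1[i]:e1[i+1], e2[j]:e2[j+1]] = blk.mean()
    inv_r, inv_c = np.argsort(r), np.argsort(c)
    return out[np.ix_(inv_r, inv_c)]
\end{verbatim}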
In classical statistics, a statistical experiment consisting of $n$ i.i.d. observations from a $d$-dimensional multinomial distribution can be well approximated by a $(d-1)$-dimensional Gaussian distribution. In a quantum version of this result, it has been shown that a collection of $n$ qudits of full rank can be well approximated by a quantum system containing a classical part, which is a $(d-1)$-dimensional Gaussian distribution, and a quantum part containing an ensemble of $d(d-1)/2$ shifted thermal states. In this paper, we generalize this result to the case where the qudits are not of full rank. We show that when the rank of the qudits is $r$, the limiting experiment consists of an $(r-1)$-dimensional Gaussian distribution and an ensemble of both shifted pure and shifted thermal states. We also outline a two-stage procedure for estimating the low-rank qudit, where we obtain an estimator that is sharp minimax optimal. For the estimation of a linear functional of the quantum state, we construct an estimator, analyze its risk, and use quantum LAN to show that our estimator is also optimal in the minimax sense.
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large-scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, {\em i.e.}, minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing that it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients, and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
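A minimal sketch of the minibatch strategy with entropically regularized unbalanced OT, using the standard scaling iterations for KL-relaxed marginals between uniform measures (the parameter names and cost normalization are illustrative, not the paper's exact setup):
\begin{verbatim}
import numpy as np

def sinkhorn_unbalanced(M, eps=0.05, rho=1.0, iters=200):
    # Entropic unbalanced OT with KL marginal relaxation: the usual
    # Sinkhorn updates raised to the exponent rho/(rho+eps).
    n1, n2 = M.shape
    a, b = np.full(n1, 1.0 / n1), np.full(n2, 1.0 / n2)
    K = np.exp(-M / eps)
    u, v = np.ones(n1), np.ones(n2)
    fi = rho / (rho + eps)
    for _ in range(iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    P = u[:, None] * K * v[None, :]
    return (P * M).sum()                # transport cost of the plan

def minibatch_uot(X, Y, batch=64, n_batches=20, seed=0):
    # Average unbalanced OT cost over random minibatch pairs.
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_batches):
        xb = X[rng.choice(len(X), batch, replace=False)]
        yb = Y[rng.choice(len(Y), batch, replace=False)]
        M = ((xb[:, None, :] - yb[None, :, :]) ** 2).sum(-1)
        vals.append(sinkhorn_unbalanced(M / M.max()))  # keep kernel scaled
    return np.mean(vals)
\end{verbatim}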
The main contribution of this paper is a new submap-joining approach for solving large-scale Simultaneous Localization and Mapping (SLAM) problems. Each local submap is built independently from local information by solving a small-scale SLAM problem; joining the submaps mainly involves solving linear least squares and performing nonlinear coordinate transformations. By approximating the local submap information as the state estimate and its corresponding information matrix, judiciously selecting the submap coordinate frames, and approximating the joining of a large number of submaps by joining only two maps at a time, either sequentially or in a more efficient divide-and-conquer manner, the nonlinear optimization process involved in most existing submap joining approaches is avoided. Thus the proposed submap joining algorithm does not require an initial guess or iterations, since linear least-squares problems have closed-form solutions. The proposed Linear SLAM technique is applicable to feature-based SLAM, pose graph SLAM, and D-SLAM, in both two and three dimensions, and does not require any assumption on the character of the covariance matrices. Simulations and experiments are performed to evaluate the proposed Linear SLAM algorithm. Results on publicly available 2D and 3D datasets show that Linear SLAM produces solutions very close to the best that can be obtained by a full nonlinear optimization algorithm started from an accurate initial guess. The C/C++ and MATLAB source code of Linear SLAM is available on OpenSLAM.
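A toy sketch of the linear fusion at the heart of submap joining, assuming the two submaps are already expressed in a common frame (Linear SLAM's nonlinear coordinate transformations between submap frames are omitted here):
\begin{verbatim}
import numpy as np

def join_two_submaps(x1, I1, idx1, x2, I2, idx2, n):
    # Fuse two submap estimates (state, information matrix) into a
    # global state of dimension n via one linear least-squares /
    # information-filter step. idx1, idx2 map the local states into
    # the global state vector.
    A = np.zeros((n, n))
    b = np.zeros(n)
    for (x, I, idx) in ((x1, I1, idx1), (x2, I2, idx2)):
        A[np.ix_(idx, idx)] += I
        b[idx] += I @ x
    return np.linalg.solve(A, b)   # closed form: no initial guess needed
\end{verbatim}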
We propose accelerated randomized coordinate descent algorithms for stochastic optimization and online learning. Our algorithms have significantly lower per-iteration complexity than the known accelerated gradient algorithms. The proposed algorithms for online learning achieve better regret performance than the known randomized online coordinate descent algorithms, while the proposed algorithms for stochastic optimization match the convergence rates of the best known randomized coordinate descent algorithms. We also present simulation results demonstrating the performance of the proposed algorithms.
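As a point of reference, here is a minimal sketch of a standard accelerated randomized coordinate descent step (Nesterov-style, uniform sampling) on a quadratic, illustrating the low per-iteration cost; this is not the proposed stochastic/online variant:
\begin{verbatim}
import numpy as np

def acc_coord_descent(Q, c, iters=5000, seed=0):
    # Accelerated randomized coordinate descent for
    # f(x) = 0.5 x'Qx - c'x with Q positive definite. Each step
    # touches one partial derivative, i.e. one row of Q, rather than
    # a full gradient.
    rng = np.random.default_rng(seed)
    n = len(c)
    L = np.diag(Q).copy()           # coordinate Lipschitz constants
    x, z = np.zeros(n), np.zeros(n)
    theta = 1.0 / n
    for _ in range(iters):
        y = (1 - theta) * x + theta * z
        i = rng.integers(n)
        g = Q[i] @ y - c[i]         # i-th partial derivative at y
        dz = -g / (n * theta * L[i])
        z[i] += dz
        x = y
        x[i] += n * theta * dz      # equals a coordinate step from y
        theta = 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)
    return x
\end{verbatim}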
In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely, when the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improving the condition number.
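To illustrate why the dual approach distributes, the following sketch runs Nesterov's accelerated gradient on the dual of a consensus-constrained quadratic problem; every operation is a multiplication by the graph Laplacian, i.e. one round of neighbor-to-neighbor communication (the problem instance and step size are illustrative, not the paper's general setup):
\begin{verbatim}
import numpy as np

def dual_accelerated_consensus(L, a, iters=300):
    # min_x sum_i 0.5*(x_i - a_i)^2  s.t.  L x = 0  (consensus on a
    # connected graph with Laplacian L). The dual gradient is
    # L @ (a - L @ lam), so each iteration costs one multiplication
    # by L. The spectral norm is computed centrally here only to set
    # the step size for this toy example.
    lam = np.zeros(len(a))
    lam_prev = np.zeros(len(a))
    step = 1.0 / np.linalg.eigvalsh(L).max() ** 2  # dual Lipschitz const.
    for k in range(iters):
        y = lam + (k / (k + 3.0)) * (lam - lam_prev)  # Nesterov momentum
        lam_prev = lam
        lam = y + step * (L @ (a - L @ y))            # ascent on the dual
    return a - L @ lam                                # primal recovery
\end{verbatim}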