We propose and analyze a new dynamical system with a closed-loop control law in a Hilbert space $\mathcal{H}$, aiming to shed light on the acceleration phenomenon for \textit{monotone inclusion} problems, a class that unifies a broad range of optimization, saddle point, and variational inequality (VI) problems under a single framework. Given a maximal monotone operator $A: \mathcal{H} \rightrightarrows \mathcal{H}$, we study a closed-loop control system that is governed by the operator $I - (I + \lambda(t)A)^{-1}$, where the feedback law $\lambda(\cdot)$ is tuned by solving the algebraic equation $\lambda(t)\|(I + \lambda(t)A)^{-1}x(t) - x(t)\|^{p-1} = \theta$ for some $\theta > 0$. Our first contribution is to prove the existence and uniqueness of a global solution via the Cauchy-Lipschitz theorem. We present a simple Lyapunov function that establishes the weak convergence of trajectories via the Opial lemma, together with strong convergence results under additional conditions. We then prove a global ergodic convergence rate of $O(t^{-(p+1)/2})$ in terms of a gap function and a global pointwise convergence rate of $O(t^{-p/2})$ in terms of a residue function. Local linear convergence is established in terms of a distance function under an error bound condition. Further, we provide an algorithmic framework based on the implicit discretization of our system in a Euclidean setting, generalizing the large-step HPE framework. Although the discrete-time analysis simplifies and generalizes existing analyses for the bounded-domain setting, it is largely motivated by the above continuous-time analysis, illustrating the fundamental role that closed-loop control plays in acceleration for monotone inclusion problems. A highlight of our analysis is a new result concerning $p^{th}$-order tensor algorithms for monotone inclusion problems, complementing the recent analysis for saddle point and VI problems.
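To make the feedback law concrete, here is a minimal numerical sketch (in Python, under assumptions not made in the abstract: a linear maximal monotone operator $A(x) = Mx$, a bisection tolerance, and toy data) that solves the algebraic equation $\lambda\|(I + \lambda A)^{-1}x - x\|^{p-1} = \theta$ by bisection, exploiting the fact that the left-hand side is nondecreasing in $\lambda$. The names resolvent and solve_feedback_lambda are ours, not the paper's.
\begin{verbatim}
import numpy as np

def resolvent(M, lam, x):
    # J_lam(x) = (I + lam*M)^{-1} x for the linear monotone operator A(x) = M x
    return np.linalg.solve(np.eye(len(x)) + lam * M, x)

def solve_feedback_lambda(M, x, p=2, theta=1.0, tol=1e-10):
    # Find lam > 0 with lam * ||J_lam(x) - x||^(p-1) = theta by bisection; the
    # left-hand side is nondecreasing in lam (assumes x is not already a zero of A).
    phi = lambda lam: lam * np.linalg.norm(resolvent(M, lam, x) - x) ** (p - 1)
    lo, hi = 0.0, 1.0
    while phi(hi) < theta:        # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < theta else (lo, mid)
    return 0.5 * (lo + hi)

# toy usage: a rotation-plus-shrinkage operator, which is maximal monotone
M = np.array([[1.0, 2.0], [-2.0, 1.0]])
x = np.array([3.0, -1.0])
lam = solve_feedback_lambda(M, x, p=2, theta=0.5)
print(lam, lam * np.linalg.norm(resolvent(M, lam, x) - x))
\end{verbatim}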
We study first-order methods with preconditioning for solving structured nonlinear convex optimization problems. We propose a new family of preconditioners generated by symmetric polynomials. They provide first-order optimization methods with a provable improvement of the condition number, cutting the gaps between the largest eigenvalues, without explicit knowledge of the actual spectrum. We give a stochastic interpretation of this preconditioning in terms of coordinate volume sampling and compare it with other classical approaches, including the Chebyshev polynomials. We show how to incorporate a polynomial preconditioning into the Gradient and Fast Gradient Methods and establish the corresponding global complexity bounds. Finally, we propose a simple adaptive search procedure that automatically chooses the best possible polynomial preconditioning for the Gradient Method, minimizing the objective along a low-dimensional Krylov subspace. Numerical experiments confirm the efficiency of our preconditioning strategies for solving various machine learning problems.
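As an illustration of how a polynomial preconditioner enters the Gradient Method, the sketch below runs the preconditioned iteration $x_{k+1} = x_k - h\,P(A)\nabla f(x_k)$ on a toy quadratic, evaluating $P(A)$ on the gradient by Horner's rule so that only matrix-vector products are needed. The coefficients, step size, and test matrix are illustrative placeholders and do not reproduce the symmetric-polynomial construction or the adaptive search analyzed in the paper.
\begin{verbatim}
import numpy as np

def poly_precond_gradient_method(A, b, coeffs, h, iters):
    # Gradient Method x_{k+1} = x_k - h * P(A) * grad f(x_k) for the quadratic
    # f(x) = 0.5 x^T A x - b^T x, where P(A) = sum_j coeffs[j] * A^j is applied
    # to the gradient via Horner's rule (matrix-vector products only).
    x = np.zeros_like(b)
    for _ in range(iters):
        g = A @ x - b                       # gradient of the quadratic
        pg = coeffs[-1] * g
        for c in reversed(coeffs[:-1]):     # Horner evaluation of P(A) g
            pg = A @ pg + c * g
        x = x - h * pg
    return x

# toy usage on an ill-conditioned quadratic; the coefficients are illustrative only
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((50, 50)))[0]
A = Q @ np.diag(np.linspace(1e-2, 1.0, 50)) @ Q.T
b = rng.standard_normal(50)
x = poly_precond_gradient_method(A, b, coeffs=[3.0, -3.0, 1.0], h=0.5, iters=500)
print(np.linalg.norm(A @ x - b))
\end{verbatim}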
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy with runtime $\widetilde{O}( n^2/\varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm design are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea -- in conjunction with entropy regularization and adaptive learning rates -- to accelerate convergence.
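To convey ingredient (b), the sketch below runs a generic entropic extragradient (mirror-prox) scheme with multiplicative updates on a bilinear saddle point over two probability simplices. It is not the paper's algorithm: the specific minimax reformulation of optimal transport and the adaptive learning rates are omitted, and the cost matrix, step size, and iteration count are arbitrary.
\begin{verbatim}
import numpy as np

def simplex_mirror_prox(C, eta, iters):
    # Entropic extragradient (mirror-prox) for the bilinear saddle point
    # min_{x in simplex} max_{y in simplex} x^T C y, using multiplicative updates.
    m, n = C.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        # extrapolation step at the current point
        xh = x * np.exp(-eta * (C @ y))
        xh /= xh.sum()
        yh = y * np.exp(eta * (C.T @ x))
        yh /= yh.sum()
        # update step using the operator evaluated at the extrapolated point
        x = x * np.exp(-eta * (C @ yh))
        x /= x.sum()
        y = y * np.exp(eta * (C.T @ xh))
        y /= y.sum()
        x_avg += x
        y_avg += y
    return x_avg / iters, y_avg / iters      # ergodic (averaged) iterates

# toy usage with a random cost matrix
C = np.random.default_rng(0).random((30, 40))
x_bar, y_bar = simplex_mirror_prox(C, eta=0.1, iters=2000)
print(x_bar @ C @ y_bar)
\end{verbatim}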
Summation-by-parts (SBP) operators allow us to systematically develop energy-stable and high-order accurate numerical methods for time-dependent differential equations. Until recently, the main idea behind existing SBP operators was that polynomials can accurately approximate the solution, and SBP operators should thus be exact for them. However, polynomials do not provide the best approximation for some problems, and other approximation spaces can be more appropriate. We recently addressed this issue and developed a theory for one-dimensional SBP operators based on general function spaces, coined function-space SBP (FSBP) operators. In this paper, we extend the theory of FSBP operators to multiple dimensions. We focus on their existence, connection to quadratures, construction, and mimetic properties. A more exhaustive numerical demonstration of multi-dimensional FSBP (MFSBP) operators and their application will be provided in future work. Similar to the one-dimensional case, we demonstrate that most of the established results for polynomial-based multi-dimensional SBP (MSBP) operators carry over to the more general class of MFSBP operators. Our findings imply that the concept of SBP operators can be applied to a significantly larger class of methods than is currently done. This can increase the accuracy of the numerical solutions and/or provide stability to the methods.
Summation-by-parts (SBP) operators are popular building blocks for systematically developing stable and high-order accurate numerical methods for time-dependent differential equations. The main idea behind existing SBP operators is that the solution is assumed to be well approximated by polynomials up to a certain degree, and the SBP operator should therefore be exact for them. However, polynomials might not provide the best approximation for some problems, and other approximation spaces may be more appropriate. In this paper, a theory for SBP operators based on general function spaces is developed. We demonstrate that most of the established results for polynomial-based SBP operators carry over to this general class of SBP operators. Our findings imply that the concept of SBP operators can be applied to a significantly larger class of methods than currently known. We exemplify the general theory by considering trigonometric, exponential, and radial basis functions.
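For readers new to the framework, the short check below verifies the defining SBP property $Q + Q^{T} = B$, the exactness of $D = P^{-1}Q$ on the span of $\{1, x\}$, and the induced quadrature for the classical second-order polynomial-based SBP operator. This is only the standard textbook example; FSBP operators replace this polynomial exactness space with a general function space such as the trigonometric, exponential, or radial basis function spaces mentioned above.
\begin{verbatim}
import numpy as np

# Classical second-order polynomial SBP operator on N + 1 equispaced nodes of [0, 1].
N = 10
x = np.linspace(0.0, 1.0, N + 1)
h = x[1] - x[0]
P = h * np.diag([0.5] + [1.0] * (N - 1) + [0.5])   # diagonal norm (quadrature) matrix
Q = 0.5 * (np.diag(np.ones(N), 1) - np.diag(np.ones(N), -1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5                     # boundary closures
D = np.linalg.solve(P, Q)                          # difference operator D = P^{-1} Q

B = np.zeros((N + 1, N + 1))
B[0, 0], B[-1, -1] = -1.0, 1.0
print(np.allclose(Q + Q.T, B))                     # summation-by-parts property
print(np.allclose(D @ np.ones(N + 1), 0.0),        # exact for f(x) = 1
      np.allclose(D @ x, np.ones(N + 1)))          # exact for f(x) = x
print(np.isclose(np.ones(N + 1) @ P @ x, 0.5))     # quadrature: integral of x over [0, 1]
\end{verbatim}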
We consider least squares estimators of the finite-dimensional regression parameter $\alpha$ in the single index regression model $Y=\psi(\alpha^T X)+\epsilon$, where $X$ is a $d$-dimensional random vector, $\mathbb{E}(Y|X)=\psi(\alpha^T X)$, and where $\psi$ is monotone. It has been suggested to estimate $\alpha$ by a profile least squares estimator, minimizing $\sum_{i=1}^n(Y_i-\psi(\alpha^T X_i))^2$ over monotone $\psi$ and $\alpha$ on the boundary $S_{d-1}$ of the unit ball. Although this suggestion has been around for a long time, it is still unknown whether the estimator is $\sqrt{n}$-convergent. We show that a profile least squares estimator, using the same pointwise least squares estimator for fixed $\alpha$, but using a different global sum of squares, is $\sqrt{n}$-convergent and asymptotically normal. We study the difference between the corresponding loss functions and also give a comparison with other methods.
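For intuition, the sketch below computes the pointwise least squares estimate of $\psi$ for fixed $\alpha$ by isotonic regression (via scikit-learn's IsotonicRegression) and profiles the resulting sum of squares over a grid of directions. It illustrates the simple profile criterion discussed above, not the modified global sum of squares proposed in the paper, and the simulated data and grid are illustrative choices.
\begin{verbatim}
import numpy as np
from sklearn.isotonic import IsotonicRegression

def profile_rss(alpha, X, Y):
    # For fixed alpha, the least squares monotone psi is the isotonic regression of
    # Y on the scalar index alpha^T X; return the resulting residual sum of squares.
    index = X @ alpha
    psi_hat = IsotonicRegression(increasing=True).fit_transform(index, Y)
    return np.sum((Y - psi_hat) ** 2)

# toy usage: profile over a coarse grid of directions on the unit circle (d = 2)
rng = np.random.default_rng(1)
n, alpha_true = 500, np.array([0.6, 0.8])
X = rng.standard_normal((n, 2))
Y = (X @ alpha_true) ** 3 + 0.1 * rng.standard_normal(n)   # monotone psi(u) = u^3
angles = np.linspace(0.0, np.pi, 181)
grid = np.c_[np.cos(angles), np.sin(angles)]
alpha_hat = grid[np.argmin([profile_rss(a, X, Y) for a in grid])]
print(alpha_hat)
\end{verbatim}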
Koch, Strassle, and Tan [SODA 2023] show that, under the randomized exponential time hypothesis, there is no distribution-free PAC-learning algorithm that runs in time $n^{\tilde O(\log\log s)}$ for the classes of $n$-variable size-$s$ DNF, size-$s$ Decision Tree, and $\log s$-Junta by DNF (i.e., an algorithm that returns a DNF hypothesis). Assuming a natural conjecture on the hardness of set cover, they give the lower bound $n^{\Omega(\log s)}$. This matches the best known upper bound for $n$-variable size-$s$ Decision Tree and $\log s$-Junta. In this paper, we give the same lower bounds for PAC-learning of $n$-variable size-$s$ Monotone DNF, size-$s$ Monotone Decision Tree, and Monotone $\log s$-Junta by~DNF. This solves the open problem proposed by Koch, Strassle, and Tan and subsumes the above results. The lower bound holds even if the learner knows the distribution, can draw a sample according to the distribution in polynomial time, and can compute the target function on all the points of the support of the distribution in polynomial time.
This paper presents a novel feedback motion planning method for mobile robot navigation in 3D uneven terrains. We take advantage of the \textit{supervoxel} representation of point clouds, which enables a compact connectivity graph of traversable regions on the point cloud maps. Given this graph of traversable areas, our approach navigates the robot to any reachable goal pose using a control Lyapunov function (cLf) and a navigation function. The cLf ensures the kinodynamic feasibility and target convergence of the generated motion plans, while the navigation function optimizes the resulting feedback motion plans. We carried out navigation experiments in real and simulated 3D uneven terrains. In all circumstances, the experimental findings show that our approach performs better than the baselines, demonstrating its efficiency and adaptability for navigating a robot in challenging uneven 3D terrains. The proposed method can also navigate a robot with a particular objective, e.g., the shortest-distance or least-inclined plan. We also compared our approach to well-established sampling-based motion planners, and our method outperformed all of them in terms of execution time and resulting path length. Finally, we provide an open-source implementation of the proposed method to benefit the robotics community.
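As a schematic of the feedback component of such a pipeline, the sketch below steers a single-integrator robot through a sequence of waypoints using the quadratic control Lyapunov function $V(x)=\tfrac{1}{2}\|x - x_{\mathrm{goal}}\|^2$. The robot model, gains, and tolerances are simplifying assumptions; in the paper's setting the waypoints would come from a path over the supervoxel connectivity graph, and the navigation function is not reproduced here.
\begin{verbatim}
import numpy as np

def clf_waypoint_tracking(waypoints, x0, k=1.5, dt=0.05, tol=0.05):
    # Drive a single-integrator robot x_dot = u through a list of waypoints using the
    # control Lyapunov function V(x) = 0.5 * ||x - x_goal||^2 and the control
    # u = -k * (x - x_goal), which gives V_dot = -2 k V <= 0 for each waypoint.
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for goal in waypoints:
        goal = np.asarray(goal, dtype=float)
        while np.linalg.norm(x - goal) > tol:
            x = x + dt * (-k * (x - goal))   # forward-Euler step of the closed loop
            trajectory.append(x.copy())
    return np.array(trajectory)

# toy usage: the waypoints stand in for a path over the traversability graph
path = clf_waypoint_tracking([(1.0, 0.0, 0.2), (2.0, 1.0, 0.4)], x0=(0.0, 0.0, 0.0))
print(path[-1])
\end{verbatim}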
The paper suggests a generalization of the Sign-Perturbed Sums (SPS) finite sample system identification method for the identification of closed-loop observable stochastic linear systems in state space form. The solution builds on the theory of matrix-variate regression and instrumental variable methods to construct distribution-free confidence regions for the state space matrices. Both direct and indirect identification are studied, and the exactness as well as the strong consistency of the construction are proved. Furthermore, a new, computationally efficient ellipsoidal outer-approximation algorithm for the confidence regions is proposed. The new construction results in a semidefinite optimization problem with an order of magnitude fewer constraints than if one applied the ellipsoidal outer-approximation after vectorization. The effectiveness of the approach is also demonstrated empirically via a series of numerical experiments.
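As background, the sketch below implements the original SPS rank test for an open-loop linear regression with symmetric noise, which is the building block being generalized. The closed-loop state-space setting, the instrumental variables, and the ellipsoidal outer-approximation of the paper are not shown, and the parameters m, q and the simulated data are illustrative.
\begin{verbatim}
import numpy as np

def sps_indicator(theta, Phi, y, m=100, q=5, seed=0):
    # Sign-Perturbed Sums rank test for the linear regression y_t = phi_t^T theta + noise_t
    # with symmetric noise: theta is accepted (confidence level 1 - q/m) if the unperturbed
    # sum ||S_0(theta)|| is not among the q largest of the m sums (ties ignored here).
    rng = np.random.default_rng(seed)
    n, _ = Phi.shape
    residuals = y - Phi @ theta
    R_half_inv = np.linalg.inv(np.linalg.cholesky(Phi.T @ Phi / n))

    def score(signs):
        return np.sum((R_half_inv @ (Phi.T @ (signs * residuals)) / n) ** 2)

    s0 = score(np.ones(n))
    perturbed = [score(rng.choice([-1.0, 1.0], size=n)) for _ in range(m - 1)]
    rank_from_top = 1 + sum(s > s0 for s in perturbed)
    return rank_from_top > q               # True: theta is inside the confidence region

# toy usage: open-loop scalar-output regression with symmetric (Laplace) noise
rng = np.random.default_rng(1)
Phi = rng.standard_normal((200, 2))
theta_true = np.array([1.0, -2.0])
y = Phi @ theta_true + rng.laplace(size=200)
print(sps_indicator(theta_true, Phi, y), sps_indicator(theta_true + 1.0, Phi, y))
\end{verbatim}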
High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We describe an algorithm for selecting informative features in high-dimensional data, where each observation comes from one of $K$ different distributions. Our algorithm can be applied in a completely nonparametric setup without any distributional assumptions on the data, and it aims at outputting those features in the data that contribute the most to the overall distributional variation. At the heart of our method is the recursive application of distribution-free graph-based tests on subsets of the feature set, located at different depths of a hierarchical clustering tree constructed from the data. Our algorithm recovers all truly contributing features with high probability, while ensuring optimal control over false discoveries. Finally, we show the superior performance of our method over existing ones on synthetic data, and also demonstrate its utility on two real-life datasets from the domains of climate change and single cell transcriptomics.
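As a building block, the sketch below implements a permutation version of the Friedman-Rafsky edge-count test on a minimum spanning tree of the pooled sample restricted to a candidate feature subset. The hierarchical clustering of the features and the recursive traversal described above are omitted, and the data, subset choices, and number of permutations are illustrative.
\begin{verbatim}
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def edge_count_pvalue(Z, labels, n_perm=500, seed=0):
    # Friedman-Rafsky edge-count two-sample test on a minimum spanning tree of the
    # pooled sample: unusually few between-group edges indicate a distributional
    # difference. The p-value is a Monte Carlo permutation estimate.
    rng = np.random.default_rng(seed)
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    edges = np.c_[mst.row, mst.col]

    def cross_edges(lab):
        return np.sum(lab[edges[:, 0]] != lab[edges[:, 1]])

    observed = cross_edges(labels)
    null = [cross_edges(rng.permutation(labels)) for _ in range(n_perm)]
    return (1 + sum(c <= observed for c in null)) / (1 + n_perm)

# toy usage: only feature 0 carries a distributional difference between the two samples
rng = np.random.default_rng(1)
Z = np.r_[rng.standard_normal((100, 5)), rng.standard_normal((100, 5)) + [2, 0, 0, 0, 0]]
labels = np.r_[np.zeros(100, dtype=int), np.ones(100, dtype=int)]
print(edge_count_pvalue(Z[:, [0, 1]], labels), edge_count_pvalue(Z[:, [2, 3]], labels))
\end{verbatim}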
The central problem in electronic structure theory is the computation of the eigenvalues of the electronic Hamiltonian -- an unbounded, self-adjoint operator acting on a Hilbert space of antisymmetric functions. Coupled cluster (CC) methods, which are based on a non-linear parameterisation of the sought-after eigenfunction and result in non-linear systems of equations, are the method of choice for high accuracy quantum chemical simulations but their numerical analysis is underdeveloped. The existing numerical analysis relies on a local, strong monotonicity property of the CC function that is valid only in a perturbative regime, i.e., when the sought-after ground state CC solution is sufficiently close to zero. In this article, we introduce a new well-posedness analysis for the single reference coupled cluster method based on the invertibility of the CC derivative. Under the minimal assumption that the sought-after eigenfunction is intermediately normalisable and the associated eigenvalue is isolated and non-degenerate, we prove that the continuous (infinite-dimensional) CC equations are always locally well-posed. Under the same minimal assumptions and provided that the discretisation is fine enough, we prove that the discrete Full-CC equations are locally well-posed, and we derive residual-based error estimates with guaranteed positive constants. Preliminary numerical experiments indicate that the constants that appear in our estimates are a significant improvement over those obtained from the local monotonicity approach.