We consider frugal splitting operators for finite sum monotone inclusion problems, i.e., splitting operators that use exactly one direct or resolvent evaluation of each operator of the sum. A novel representation of these operators in terms of what we call a generalized primal-dual resolvent is presented. This representation reveals a number of new results regarding lifting numbers, existence of solution maps, and parallelizability of the forward and backward evaluations. We show that the minimal lifting is $n-1-f$ where $n$ is the number of monotone operators and $f$ is the number of direct evaluations in the splitting. Furthermore, we show that this lifting number is only achievable as long as the first and last evaluations are resolvent evaluations. In the case of frugal resolvent splitting operators, these results are the same as the results of Ryu and Malitsky--Tam. The representation also enables a unified convergence analysis and we present a generally applicable theorem for the convergence and Fej\'er monotonicity of fixed point iterations of frugal splitting operators with cocoercive direct evaluations. We conclude by constructing a new convergent and parallelizable frugal splitting operator with minimal lifting.
We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and obtain several asymptotic results as well as finite sample error bounds. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.
We propose a new policy gradient method, named homotopic policy mirror descent (HPMD), for solving discounted, infinite horizon MDPs with finite state and action spaces. HPMD performs a mirror descent type policy update with an additional diminishing regularization term, and possesses several computational properties that seem to be new in the literature. We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies. Then local superlinear convergence is obtained for both quantities without any assumption. With local acceleration and diminishing regularization, we establish the first result among policy gradient methods on certifying and characterizing the limiting policy, by showing, with a non-asymptotic characterization, that the last-iterate policy converges to the unique optimal policy with the maximal entropy. We then extend all the aforementioned results to HPMD instantiated with a broad class of decomposable Bregman divergences, demonstrating the generality of the these computational properties. As a by product, we discover the finite-time exact convergence for some commonly used Bregman divergences, implying the continuing convergence of HPMD to the limiting policy even if the current policy is already optimal. Finally, we develop a stochastic version of HPMD and establish similar convergence properties. By exploiting the local acceleration, we show that for small optimality gap, a better than $\tilde{\mathcal{O}}(\left|\mathcal{S}\right| \left|\mathcal{A}\right| / \epsilon^2)$ sample complexity holds with high probability, when assuming a generative model for policy evaluation.
In this work we study the asymptotic consistency of the weak-form sparse identification of nonlinear dynamics algorithm (WSINDy) in the identification of differential equations from noisy samples of solutions. We prove that the WSINDy estimator is unconditionally asymptotically consistent for a wide class of models which includes the Navier-Stokes equations and the Kuramoto-Sivashinsky equation. We thus provide a mathematically rigorous explanation for the observed robustness to noise of weak-form equation learning. Conversely, we also show that in general the WSINDy estimator is only conditionally asymptotically consistent, yielding discovery of spurious terms with probability one if the noise level is above some critical threshold and the nonlinearities exhibit sufficiently fast growth. We derive explicit bounds on the critical noise threshold in the case of Gaussian white noise and provide an explicit characterization of these spurious terms in the case of trigonometric and/or polynomial model nonlinearities. However, a silver lining to this negative result is that if the data is suitably denoised (a simple moving average filter is sufficient), then we recover unconditional asymptotic consistency on the class of models with locally-Lipschitz nonlinearities. Altogether, our results reveal several important aspects of weak-form equation learning which may be used to improve future algorithms. We demonstrate our results numerically using the Lorenz system, the cubic oscillator, a viscous Burgers growth model, and a Kuramoto-Sivashinsky-type higher-order PDE.
Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason all standard inference algorithms for spanning trees are suboptimal for inference over dependency trees. Zmigrod et al. (2021b) proposed algorithms for sampling with and without replacement from the dependency tree distribution that incorporate the single-root constraint. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact producing biased samples and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of algorithm for sampling k trees without replacement to O(kn3). These algorithms are both asymptotically and practically more efficient.
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear convergence, unless the stochastic noise is gradually reduced to zero during the iteration, which would lead to a computational blow-up in the per-iteration cost. We propose to address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all the past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme exhibits local $Q$-superlinear convergence with a non-asymptotic rate of $(\Upsilon\sqrt{\log (t)/t}\,)^{t}$, where $\Upsilon$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the method, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit with a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still exhibits a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
We propose and analyze a new dynamical system with a closed-loop control law in a Hilbert space $\mathcal{H}$, aiming to shed light on the acceleration phenomenon for \textit{monotone inclusion} problems, which unifies a broad class of optimization, saddle point and variational inequality (VI) problems under a single framework. Given $A: \mathcal{H} \rightrightarrows \mathcal{H}$ that is maximal monotone, we propose a closed-loop control system that is governed by the operator $I - (I + \lambda(t)A)^{-1}$, where a feedback law $\lambda(\cdot)$ is tuned by the resolution of the algebraic equation $\lambda(t)\|(I + \lambda(t)A)^{-1}x(t) - x(t)\|^{p-1} = \theta$ for some $\theta > 0$. Our first contribution is to prove the existence and uniqueness of a global solution via the Cauchy-Lipschitz theorem. We present a simple Lyapunov function for establishing the weak convergence of trajectories via the Opial lemma and strong convergence results under additional conditions. We then prove a global ergodic convergence rate of $O(t^{-(p+1)/2})$ in terms of a gap function and a global pointwise convergence rate of $O(t^{-p/2})$ in terms of a residue function. Local linear convergence is established in terms of a distance function under an error bound condition. Further, we provide an algorithmic framework based on the implicit discretization of our system in a Euclidean setting, generalizing the large-step HPE framework. Although the discrete-time analysis is a simplification and generalization of existing analyses for a bounded domain, it is largely motivated by the above continuous-time analysis, illustrating the fundamental role that the closed-loop control plays in acceleration in monotone inclusion. A highlight of our analysis is a new result concerning $p^{th}$-order tensor algorithms for monotone inclusion problems, complementing the recent analysis for saddle point and VI problems.
We describe a new, adaptive solver for the two-dimensional Poisson equation in complicated geometries. Using classical potential theory, we represent the solution as the sum of a volume potential and a double layer potential. Rather than evaluating the volume potential over the given domain, we first extend the source data to a geometrically simpler region with high order accuracy. This allows us to accelerate the evaluation of the volume potential using an efficient, geometry-unaware fast multipole-based algorithm. To impose the desired boundary condition, it remains only to solve the Laplace equation with suitably modified boundary data. This is accomplished with existing fast and accurate boundary integral methods. The novelty of our solver is the scheme used for creating the source extension, assuming it is provided on an adaptive quad-tree. For leaf boxes intersected by the boundary, we construct a universal "stencil" and require that the data be provided at the subset of those points that lie within the domain interior. This universality permits us to precompute and store an interpolation matrix which is used to extrapolate the source data to an extended set of leaf nodes with full tensor-product grids on each. We demonstrate the method's speed, robustness and high-order convergence with several examples, including domains with piecewise smooth boundaries.
A fundamental result in the study of graph homomorphisms is Lov\'asz's theorem that two graphs are isomorphic if and only if they admit the same number of homomorphisms from every graph. A line of work extending Lov\'asz's result to more general types of graphs was recently capped by Cai and Govorov, who showed that it holds for graphs with vertex and edge weights from an arbitrary field of characteristic 0. In this work, we generalize from graph homomorphism -- a special case of #CSP with a single binary function -- to general #CSP by showing that two sets $\mathcal{F}$ and $\mathcal{G}$ of arbitrary constraint functions are isomorphic if and only if the partition function of any #CSP instance is unchanged when we replace the functions in $\mathcal{F}$ with those in $\mathcal{G}$. We give two very different proofs of this result. First, we demonstrate the power of the simple Vandermonde interpolation technique of Cai and Govorov by extending it to general #CSP. Second, we give a proof using the intertwiners of the automorphism group of a constraint function set, a concept from the representation theory of compact groups. This proof is a generalization of a classical version of the recent proof of the Lov\'asz-type result by Man\v{c}inska and Roberson relating quantum isomorphism and homomorphisms from planar graphs.
As biometric technology is increasingly deployed, it will be common to replace parts of operational systems with newer designs. The cost and inconvenience of reacquiring enrolled users when a new vendor solution is incorporated makes this approach difficult and many applications will require to deal with information from different sources regularly. These interoperability problems can dramatically affect the performance of biometric systems and thus, they need to be overcome. Here, we describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign, whose aim was to compare fusion algorithms when biometric signals were generated using several biometric devices in mismatched conditions. Quality measures from the raw biometric data are available to allow system adjustment to changing quality conditions due to device changes. This system adjustment is referred to as quality-based conditional processing. The proposed fusion approach is based on linear logistic regression, in which fused scores tend to be log-likelihood-ratios. This allows the easy and efficient combination of matching scores from different devices assuming low dependence among modalities. In our system, quality information is used to switch between different system modules depending on the data source (the sensor in our case) and to reject channels with low quality data during the fusion. We compare our fusion approach to a set of rule-based fusion schemes over normalized scores. Results show that the proposed approach outperforms all the rule-based fusion schemes. We also show that with the quality-based channel rejection scheme, an overall improvement of 25% in the equal error rate is obtained.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.