We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
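To make the randomized-smoothing baseline concrete, here is a minimal NumPy sketch of the standard ball-average smoothing $f_\delta(x)=\mathbb{E}_{u\sim B_\delta}[f(x+u)]$ and a Monte-Carlo estimate of its gradient via the divergence-theorem identity. The test function, radius, and sample count are illustrative assumptions; this is the black-box randomized procedure the abstract contrasts with deterministic ones, not the paper's deterministic smoothing.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative nonsmooth Lipschitz test function (not from the paper).
    return np.abs(x[0]) + 0.5 * np.max(np.abs(x))

def smoothed_grad_estimate(f, x, delta=0.1, n_samples=2000):
    """Estimate grad f_delta(x) for f_delta(x) = E_{u ~ Unif(B_delta)} f(x + u),
    using the divergence-theorem identity
        grad f_delta(x) = (d / delta^2) * E_{v ~ Unif(S_delta)}[ f(x + v) v ],
    with f(x) subtracted as a zero-mean control variate."""
    d = x.size
    fx = f(x)
    acc = np.zeros(d)
    for _ in range(n_samples):
        v = rng.normal(size=d)
        v *= delta / np.linalg.norm(v)      # uniform on the sphere of radius delta
        acc += (f(x + v) - fx) * v
    return (d / delta**2) * acc / n_samples

x = np.array([0.3, -0.2, 0.1])
print(smoothed_grad_estimate(f, x))
```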
We provide an interior point method based on quasi-Newton iterations, which only requires first-order access to a strongly self-concordant barrier function. To achieve this, we extend the techniques of Dunagan-Harvey [STOC '07] to maintain a preconditioner, while using only first-order information. We measure the quality of this preconditioner in terms of its relative eccentricity to the unknown Hessian matrix, and we generalize these techniques to convex functions with a slowly-changing Hessian. We combine this with an interior point method to show that, given first-order access to an appropriate barrier function for a convex set $K$, we can solve well-conditioned linear optimization problems over $K$ to $\varepsilon$ precision in time $\widetilde{O}\left(\left(\mathcal{T}+n^{2}\right)\sqrt{n\nu}\log\left(1/\varepsilon\right)\right)$, where $\nu$ is the self-concordance parameter of the barrier function, and $\mathcal{T}$ is the time required to make a gradient query. As a consequence, we show that:
$\bullet$ Linear optimization over $n$-dimensional convex sets can be solved in time $\widetilde{O}\left(\left(\mathcal{T}n+n^{3}\right)\log\left(1/\varepsilon\right)\right)$. This parallels the running time achieved by state-of-the-art algorithms for cutting plane methods, when replacing separation oracles with first-order oracles for an appropriate barrier function.
$\bullet$ We can solve semidefinite programs involving $m\geq n$ matrices in $\mathbb{R}^{n\times n}$ in time $\widetilde{O}\left(mn^{4}+m^{1.25}n^{3.5}\log\left(1/\varepsilon\right)\right)$, improving over the state-of-the-art algorithms in the case where $m=\Omega\left(n^{\frac{3.5}{\omega-1.25}}\right)$.
Along the way we develop a host of tools allowing us to control the evolution of our potential functions, using techniques from matrix analysis and Schur convexity.
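As a rough illustration of the ingredients named above (first-order access to a log-barrier, quasi-Newton centering steps along a central path), here is a toy Python sketch on a small linear program. The instance, the plain BFGS inverse-Hessian update, and the doubling schedule for $t$ are assumptions for exposition only; they do not reproduce the paper's preconditioner maintenance or its running-time guarantees.

```python
import numpy as np

# Toy instance: minimize c^T x over {x : A x <= b}, using only the gradient of
# the log-barrier phi(x) = -sum_i log(b_i - a_i^T x) (first-order access).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 1.0])
c = np.array([1.0, 2.0])

def obj(x, t):                      # t * c^T x + phi(x), +inf outside the domain
    s = b - A @ x
    return t * (c @ x) - np.sum(np.log(s)) if np.all(s > 0) else np.inf

def grad(x, t):
    return t * c + A.T @ (1.0 / (b - A @ x))

def center(x, t, iters=50):
    """Approximately minimize obj(., t) with a BFGS-style inverse-Hessian
    approximation H built from gradient differences only (a very loose stand-in
    for the preconditioner maintained in the paper)."""
    n = x.size
    H = np.eye(n)
    g = grad(x, t)
    for _ in range(iters):
        p = -H @ g                                  # quasi-Newton direction
        step = 1.0
        while np.any(b - A @ (x + step * p) <= 0) or \
              obj(x + step * p, t) > obj(x, t) + 1e-4 * step * (g @ p):
            step *= 0.5                             # backtracking line search
            if step < 1e-12:
                return x
        x_new = x + step * p
        g_new = grad(x_new, t)
        s_vec, y_vec = x_new - x, g_new - g
        if s_vec @ y_vec > 1e-12:                   # standard BFGS inverse update
            rho = 1.0 / (s_vec @ y_vec)
            V = np.eye(n) - rho * np.outer(s_vec, y_vec)
            H = V @ H @ V.T + rho * np.outer(s_vec, s_vec)
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-8:
            break
    return x

x, t = np.array([0.0, 0.0]), 1.0                    # strictly feasible start
for _ in range(30):                                 # crude path-following schedule
    x = center(x, t)
    t *= 2.0
print("approximate minimizer:", x)                  # the toy LP's optimum is (1, -2)
```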
In this article, we provide a unified and simplified approach to derandomize central results in the area of fault-tolerant graph algorithms. Given a graph $G$, a vertex pair $(s,t) \in V(G)\times V(G)$, and a set of edge faults $F \subseteq E(G)$, a replacement path $P(s,t,F)$ is an $s$-$t$ shortest path in $G \setminus F$. For integer parameters $L,f$, a replacement path covering (RPC) is a collection of subgraphs of $G$, denoted by $\mathcal{G}_{L,f}=\{G_1,\ldots, G_r \}$, such that for every set $F$ of at most $f$ faults (i.e., $|F|\le f$) and every replacement path $P(s,t,F)$ of at most $L$ edges, there exists a subgraph $G_i\in \mathcal{G}_{L,f}$ that contains all the edges of $P(s,t,F)$ and does not contain any of the edges of $F$. The covering value of the RPC $\mathcal{G}_{L,f}$ is then defined to be the number of subgraphs in $\mathcal{G}_{L,f}$. We present efficient deterministic constructions of $(L,f)$-RPCs whose covering values almost match the randomized ones, for a wide range of parameters. Our time and value bounds improve considerably over the previous construction of Parter (DISC 2019). We also provide an almost matching lower bound for the value of these coverings. A key application of our deterministic constructions is the derandomization of the algebraic construction of the distance sensitivity oracle by Weimann and Yuster (FOCS 2010). The preprocessing and query times of our deterministic algorithm nearly match the randomized bounds. This resolves the open problem of Alon, Chechik and Cohen (ICALP 2019).
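The following Python sketch (assuming the networkx package) only illustrates the definitions above on a toy graph: it computes a replacement path $P(s,t,F)$ and checks whether a candidate subgraph of an RPC covers it. It does not implement the paper's deterministic construction.

```python
import networkx as nx

def replacement_path(G, s, t, F):
    """Shortest s-t path in G \\ F, i.e., the replacement path P(s, t, F)."""
    H = G.copy()
    H.remove_edges_from(F)
    try:
        return nx.shortest_path(H, s, t, weight="weight")
    except nx.NetworkXNoPath:
        return None

def covers(G_i, path, F):
    """Does subgraph G_i contain every edge of `path` and no edge of F?"""
    path_edges = list(zip(path, path[1:]))
    return all(G_i.has_edge(u, v) for u, v in path_edges) and \
           not any(G_i.has_edge(u, v) for u, v in F)

# Tiny illustration: a 4-cycle with one chord, and a single fault on edge (0, 1).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 2)])
F = [(0, 1)]
P = replacement_path(G, 0, 2, F)
print("P(0, 2, F) =", P)                     # a shortest 0-2 path avoiding (0, 1)
G_1 = G.copy(); G_1.remove_edges_from(F)     # one candidate subgraph of an RPC
print("G_1 covers P:", covers(G_1, P, F))
```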
Recent work in the matrix completion literature has shown that prior knowledge of a matrix's row and column spaces can be successfully incorporated into reconstruction programs to substantially benefit matrix recovery. This paper proposes a novel methodology that exploits more general forms of known matrix structure in terms of subspaces. The work derives reconstruction error bounds that are informative in practice, providing insight into previous approaches in the literature while introducing novel programs that severely reduce sampling complexity. The main result shows that a family of weighted nuclear norm minimization programs incorporating an $M_1 r$-dimensional subspace of $n\times n$ matrices (where $M_1\geq 1$ conveys structural properties of the subspace) allows accurate approximation of a rank-$r$ matrix aligned with the subspace from a near-optimal number of observed entries (within a logarithmic factor of $M_1 r$). The result is robust: the error is proportional to the measurement noise, the guarantee applies to full-rank matrices, and the bound reflects the degradation incurred when erroneous prior information is utilized. Numerical experiments are presented that validate the theoretical behavior derived for several example weighted programs.
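As a hedged illustration of a weighted nuclear norm minimization program, the sketch below (assuming cvxpy and one common choice of weight matrices built from prior column/row subspaces, with a shrinkage weight $w$) recovers a toy low-rank matrix from partial observations; the exact family of programs and weights analyzed in the paper may differ.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, r, w = 30, 2, 0.3                      # w < 1 down-weights the prior-aligned part

# Ground-truth rank-r matrix whose column/row spaces we pretend to know a priori.
U0, _ = np.linalg.qr(rng.standard_normal((n, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = U0 @ rng.standard_normal((r, r)) @ V0.T

mask = (rng.random((n, n)) < 0.3).astype(float)   # observed entries
# Weight matrices that shrink the known subspaces (one common weighting choice).
W1 = w * U0 @ U0.T + (np.eye(n) - U0 @ U0.T)
W2 = w * V0 @ V0.T + (np.eye(n) - V0 @ V0.T)

X = cp.Variable((n, n))
objective = cp.Minimize(cp.normNuc(W1 @ X @ W2))
constraints = [cp.multiply(mask, X - M) == 0]     # agree with M on observed entries
cp.Problem(objective, constraints).solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```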
Nonsmooth composite optimization with orthogonality constraints has a broad spectrum of applications in statistical learning and data science. However, this problem is generally challenging to solve due to its non-convex and non-smooth nature. Existing solutions are limited by one or more of the following restrictions: (i) they are full gradient methods that require high computational costs in each iteration; (ii) they are not capable of solving general nonsmooth composite problems; (iii) they are infeasible methods and can only achieve feasibility of the solution at the limit point; (iv) they lack rigorous convergence guarantees; (v) they only obtain weak optimality of critical points. In this paper, we propose \textit{\textbf{OBCD}}, a new Block Coordinate Descent method for solving general nonsmooth composite problems under Orthogonality constraints. \textit{\textbf{OBCD}} is a feasible method with a low per-iteration computational footprint. In each iteration, our algorithm updates $k$ rows of the solution matrix ($k\geq2$ is a parameter) so as to preserve the constraints. To do so, it solves a small-sized nonsmooth composite optimization problem under orthogonality constraints either exactly or approximately. We demonstrate that any exact block-$k$ stationary point is always an approximate block-$k$ stationary point, which is equivalent to the critical stationary point. We are particularly interested in the case $k=2$, as the resulting subproblem reduces to a one-dimensional nonconvex problem. We propose a breakpoint searching method and a fifth-order iterative method to solve this problem efficiently and effectively. We also propose two novel greedy strategies for finding a good working set to further accelerate the convergence of \textit{\textbf{OBCD}}. Finally, we conduct extensive experiments on several tasks to demonstrate the superiority of our approach.
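A minimal sketch of the $k=2$ update geometry follows: replacing two rows of $X$ by a $2\times 2$ rotation of themselves preserves $X^\top X = I$, so the subproblem is one-dimensional in the rotation angle. The toy objective and the grid search below are placeholders for the paper's breakpoint-searching and fifth-order solvers, and reflections are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, r)))   # feasible point: X^T X = I_r

def F(X):
    # Toy nonsmooth composite objective (stand-in): smooth part + L1 term.
    return 0.5 * np.sum((X - 1.0) ** 2) + 0.1 * np.sum(np.abs(X))

def obcd_step(X, i, j, n_grid=721):
    """Update rows i and j by a 2x2 rotation G(theta): the two-row block V is
    replaced by G V, and since G^T G = I the constraint X^T X = I is preserved,
    so the subproblem is one-dimensional in theta (rotations only here)."""
    V = X[[i, j], :]
    best_theta, best_val = 0.0, F(X)
    for theta in np.linspace(-np.pi, np.pi, n_grid):
        c, s = np.cos(theta), np.sin(theta)
        Y = X.copy()
        Y[[i, j], :] = np.array([[c, -s], [s, c]]) @ V
        val = F(Y)
        if val < best_val:
            best_theta, best_val = theta, val
    c, s = np.cos(best_theta), np.sin(best_theta)
    X[[i, j], :] = np.array([[c, -s], [s, c]]) @ V
    return X

for it in range(20):                      # cyclic choice of the working set
    i, j = it % n, (it + 1) % n
    X = obcd_step(X, i, j)
print("feasibility:", np.linalg.norm(X.T @ X - np.eye(r)))
print("objective:", F(X))
```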
We explore the notion of history-determinism in the context of timed automata (TA) over infinite timed words. History-deterministic (HD) automata are those in which nondeterminism can be resolved on the fly, based on the run constructed thus far. History-determinism is a robust property that admits different game-based characterisations, and HD specifications allow for game-based verification without an expensive determinization step. We show that the class of timed $\omega$-languages recognised by HD timed automata strictly extends that of deterministic ones, and is strictly included in that recognised by fully non-deterministic TA. For non-deterministic timed automata it is known that universality is already undecidable for B\"uchi TA. For history-deterministic TA with arbitrary parity acceptance, in contrast, we show that timed universality, inclusion, and synthesis all remain decidable and are EXPTIME-complete. For the subclass of TA with safety or reachability acceptance, one can decide (in EXPTIME) whether such an automaton is history-deterministic. If so, it can be effectively determinized without introducing new automaton states.
Minimax problems have recently attracted significant research interest. A few efforts have been made to solve decentralized nonconvex strongly-concave (NCSC) minimax-structured optimization; however, all of them focus on smooth problems with at most a constraint on the maximization variable. In this paper, we make the first attempt at solving composite NCSC minimax problems that can have convex nonsmooth terms on both the minimization and maximization variables. Our algorithm is designed based on a novel reformulation of the decentralized minimax problem that introduces a multiplier to absorb the dual consensus constraint. Removing the dual consensus constraint enables the most aggressive dual update, namely an exact local maximization instead of a single gradient ascent step, which in turn allows a larger primal stepsize and yields better complexity results. In addition, decoupling the nonsmoothness and consensus on the dual variable eases the analysis of a decentralized algorithm; thus our reformulation opens a new way for interested researchers to design new (and possibly more efficient) decentralized methods for solving NCSC minimax problems. We show a global convergence result for the proposed algorithm and an iteration complexity result for producing a (near) stationary point of the reformulation. Moreover, a relation is established between the (near) stationarity of the reformulation and that of the original formulation. With this relation, we show that when the dual regularizer is smooth, our algorithm can achieve lower complexity (with reduced dependence on a condition number) than existing methods for producing a near-stationary point of the original formulation. Numerical experiments are conducted on distributionally robust logistic regression to demonstrate the performance of the proposed algorithm.
We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
Given a set of points $P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$ for some constant $d$ and a supply function $\mu:P\to \mathbb{R}$ such that $\mu(p) > 0~\forall p \in P^+$, $\mu(p) < 0~\forall p \in P^-$, and $\sum_{p\in P}{\mu(p)} = 0$, the geometric transportation problem asks one to find a transportation map $\tau: P^+\times P^-\to \mathbb{R}_{\ge 0}$ such that $\sum_{q\in P^-}{\tau(p, q)} = \mu(p)~\forall p \in P^+$, $\sum_{p\in P^+}{\tau(p, q)} = -\mu(q)~ \forall q \in P^-$, and the weighted sum of Euclidean distances over all pairs, $\sum_{(p,q)\in P^+\times P^-}\tau(p, q)\cdot \|q-p\|_2$, is minimized. We present the first deterministic algorithm that computes, in near-linear time, a transportation map whose cost is within a $(1 + \varepsilon)$ factor of optimal. More precisely, our algorithm runs in $O(n\varepsilon^{-(d+2)}\log^5{n}\log{\log{n}})$ time for any constant $\varepsilon > 0$. Surprisingly, our result not only generalizes a bipartite matching result to arbitrary instances of geometric transportation, but it also reduces the running time for all previously known $(1 + \varepsilon)$-approximation algorithms, randomized or deterministic, even for geometric bipartite matching. In particular, we give the first $(1 + \varepsilon)$-approximate deterministic algorithm for geometric bipartite matching and the first $(1 + \varepsilon)$-approximate deterministic or randomized algorithm for geometric transportation with no dependence on $d$ in the exponent of the running time's polylog. As an additional application of our main ideas, we also give the first randomized near-linear $O(\mathrm{poly}(1/\varepsilon)\, m \log^{O(1)} n)$-time $(1 + \varepsilon)$-approximation algorithm for the uncapacitated minimum cost flow (transshipment) problem in undirected graphs with arbitrary real edge costs.
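For orientation, the sketch below (assuming scipy) solves a tiny transportation instance exactly as a linear program over the map $\tau$, directly encoding the supply and demand constraints above; it is a brute-force reference, not the near-linear-time approximation algorithm.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
# Tiny instance: 3 supply points (mu > 0) and 3 demand points (mu < 0) in R^2.
P_plus = rng.random((3, 2))
P_minus = rng.random((3, 2))
mu_plus = np.array([1.0, 2.0, 1.5])
mu_minus = -np.array([2.0, 1.0, 1.5])          # supplies and demands sum to zero

# Cost of routing one unit from p to q is the Euclidean distance ||q - p||_2.
cost = np.linalg.norm(P_plus[:, None, :] - P_minus[None, :, :], axis=2).ravel()

# Equality constraints: row sums equal mu(p), column sums equal -mu(q).
n_p, n_m = len(mu_plus), len(mu_minus)
A_eq = np.zeros((n_p + n_m, n_p * n_m))
for i in range(n_p):
    A_eq[i, i * n_m:(i + 1) * n_m] = 1.0
for j in range(n_m):
    A_eq[n_p + j, j::n_m] = 1.0
b_eq = np.concatenate([mu_plus, -mu_minus])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
tau = res.x.reshape(n_p, n_m)
print("optimal transportation cost:", res.fun)
print(tau)
```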
A compartmental deterministic model that allows for (1) immunity from two stages of infection and carriage, and (2) disease-induced death, is used to study the dynamics of the meningitis epidemic process in a closed population. It allows for a difference between the rates at which a carrier and an infective transmit the infection to a susceptible. It is generalized to allow a proportion ($\phi$) of infected susceptibles to progress directly to stage-I infectives. Both models are used in this study. The threshold conditions for the spread of carriers and stage-I infectives are derived for the two models. Sensitivity analysis is performed on the reproductive number derived from the next-generation matrix. The case-carrier ratio profiles for various parameters and threshold values are shown, as are graphs of the total number ever infected as influenced by $\epsilon$ and $\phi$. The infection transmission rate ($\beta$), the odds in favor of a carrier, over an infective, in transmitting an infection to a susceptible ($\epsilon$), and the rate ($\phi$) at which carriers convert to stage-I infectives are identified as key parameters that should be the focus of any control intervention strategy. The case-carrier ratio profiles provide evidence of a critical case-carrier ratio that is attained before the number of reported cases grows to an epidemic level. They also provide visual evidence of the epidemiological context, in this case epidemic incidence (in the later part of the dry season) and endemic incidence (during the rainy season). Results for the total proportion ever infected suggest that the model obtained with $\phi=0$ can, in essence, adequately represent the generalized model of this study.
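As a hedged illustration of the next-generation-matrix computation mentioned above, the Python sketch below builds hypothetical new-infection and transition matrices $F$ and $V$ from parameters named after $\beta$, $\epsilon$, and $\phi$, takes $R_0$ as the spectral radius of $FV^{-1}$, and computes normalised sensitivity indices by finite differences. The matrix entries and rate constants are illustrative assumptions, not the paper's model.

```python
import numpy as np

def R0(beta, eps, phi, gamma_c=0.5, gamma_i=0.3):
    """Reproduction number as the spectral radius of the next-generation matrix
    F V^{-1} for a hypothetical (carrier, stage-I infective) linearisation:
    carriers transmit at rate beta*eps, infectives at rate beta, carriers
    convert to infectives at rate phi; gamma_c, gamma_i are removal rates."""
    F = np.array([[beta * eps, beta],          # new carriers produced by C and I
                  [0.0,        0.0]])          # new infections enter via carriage
    V = np.array([[phi + gamma_c, 0.0],        # outflow from carriage
                  [-phi,          gamma_i]])   # conversion C -> I and removal
    return np.max(np.abs(np.linalg.eigvals(F @ np.linalg.inv(V))))

def sensitivity(param, base=dict(beta=0.8, eps=1.5, phi=0.2), h=1e-6):
    """Normalised forward-difference sensitivity index (dR0/dp) * (p / R0)."""
    bumped = dict(base)
    bumped[param] += h
    r, r_h = R0(**base), R0(**bumped)
    return (r_h - r) / h * base[param] / r

for p in ("beta", "eps", "phi"):
    print(p, round(sensitivity(p), 3))
```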
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
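To make the two-stage recipe concrete, here is a self-contained NumPy sketch for one canonical problem, matrix sensing: a spectral initialization from back-projected measurements followed by gradient descent on the factored least-squares loss. The instance size, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, m = 20, 2, 400                       # dimension, rank, number of measurements

# Ground truth M* = U* U*^T and random symmetric Gaussian sensing matrices A_i.
U_star = rng.standard_normal((n, r))
M_star = U_star @ U_star.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2         # symmetrise the sensing matrices
y = np.einsum("mij,ij->m", A, M_star)      # y_i = <A_i, M*>

# Stage 1: spectral initialization from the back-projected measurements.
Y = np.einsum("m,mij->ij", y, A) / m       # (1/m) * sum_i y_i A_i  ~  M*
vals, vecs = np.linalg.eigh(Y)
U = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0))

# Stage 2: gradient descent on the factored least-squares loss
#   f(U) = (1/4m) * sum_i (<A_i, U U^T> - y_i)^2.
eta = 0.2 / np.linalg.norm(M_star, 2)
for _ in range(300):
    resid = np.einsum("mij,ij->m", A, U @ U.T) - y
    grad = np.einsum("m,mij->ij", resid, A) @ U / m
    U -= eta * grad
print("relative error:", np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))
```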