$k$-means clustering is a fundamental problem in various disciplines. This problem is nonconvex, and standard algorithms are only guaranteed to find a local optimum. Leveraging the structure of local solutions characterized in [1], we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution (or the ground truth). This framework consists of alternating between the following two steps iteratively: (i) detect mis-specified clusters in a local solution and (ii) improve the current local solution by non-local operations. We discuss implementation of these steps, and elucidate how the proposed framework unifies variants of $k$-means algorithm in literature from a geometric perspective. In addition, we introduce two natural extensions of the proposed framework, where the initial number of clusters is misspecified. We provide theoretical justification for our approach, which is corroborated with extensive experiments.
The basic goal of survivable network design is to build cheap networks that guarantee the connectivity of certain pairs of nodes despite the failure of a few edges or nodes. A celebrated result by Jain [Combinatorica'01] provides a 2-approximation for a wide class of these problems. However nothing better is known even for very basic special cases, raising the natural question whether any improved approximation factor is possible at all. In this paper we address one of the most basic problems in this family for which 2 is still the best-known approximation factor, the Forest Augmentation Problem (FAP): given an undirected unweighted graph (that w.l.o.g. is a forest) and a collection of extra edges (links), compute a minimum cardinality subset of links whose addition to the graph makes it 2-edge-connected. Several better-than-2 approximation algorithms are known for the special case where the input graph is a tree, a.k.a. the Tree Augmentation Problem (TAP). Recently this was achieved also for the weighted version of TAP, and for the k-edge-connectivity generalization of TAP. These results heavily exploit the fact that the input graph is connected, a condition that does not hold in FAP. In this paper we breach the 2-approximation barrier for FAP. Our result is based on two main ingredients. First, we describe a reduction to the Path Augmentation Problem (PAP), the special case of FAP where the input graph is a collection of disjoint paths. Our reduction is not approximation preserving, however it is sufficiently accurate to improve on a factor 2 approximation. Second, we present a better-than-2 approximation algorithm for PAP, an open problem on its own. Here we exploit a novel notion of implicit credits which might turn out to be helpful in future related work.
In this paper we get error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that considering a time discretization with a positive step size $h$ an error bound of size $h$ can be proved for the difference between the value function (viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, including also a spatial discretization based on elements of size $k$ an error bound of size $O(k/h)$ can be found in the literature for the error between the value functions of the continuous problem and the fully discrete problem. In this paper we revise the error bound of the fully discrete method and prove, under similar assumptions to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$ which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature in which the behaviour $1/h$ from the bound $O(k/h)$ have not been observed.
We introduce a filtering technique for Discontinuous Galerkin approximations of hyperbolic problems. Following an approach already proposed for the Hamilton-Jacobi equations by other authors, we aim at reducing the spurious oscillations that arise in presence of discontinuities when high order spatial discretizations are employed. This goal is achieved using a filter function that keeps the high order scheme when the solution is regular and switches to a monotone low order approximation if it is not. The method has been implemented in the framework of the $deal.II$ numerical library, whose mesh adaptation capabilities are also used to reduce the region in which the low order approximation is used. A number of numerical experiments demonstrate the potential of the proposed filtering technique.
We extend the Deep Galerkin Method (DGM) introduced in Sirignano and Spiliopoulos (2018)} to solve a number of partial differential equations (PDEs) that arise in the context of optimal stochastic control and mean field games. First, we consider PDEs where the function is constrained to be positive and integrate to unity, as is the case with Fokker-Planck equations. Our approach involves reparameterizing the solution as the exponential of a neural network appropriately normalized to ensure both requirements are satisfied. This then gives rise to nonlinear a partial integro-differential equation (PIDE) where the integral appearing in the equation is handled by a novel application of importance sampling. Secondly, we tackle a number of Hamilton-Jacobi-Bellman (HJB) equations that appear in stochastic optimal control problems. The key contribution is that these equations are approached in their unsimplified primal form which includes an optimization problem as part of the equation. We extend the DGM algorithm to solve for the value function and the optimal control \simultaneously by characterizing both as deep neural networks. Training the networks is performed by taking alternating stochastic gradient descent steps for the two functions, a technique inspired by the policy improvement algorithms (PIA).
Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.
Motivated by problems from neuroimaging in which existing approaches make use of "mass univariate" analysis which neglects spatial structure entirely, but the full joint modelling of all quantities of interest is computationally infeasible, a novel method for incorporating spatial dependence within a (potentially large) family of model-selection problems is presented. Spatial dependence is encoded via a Markov random field model for which a variant of the pseudo-marginal Markov chain Monte Carlo algorithm is developed and extended by a further augmentation of the underlying state space. This approach allows the exploitation of existing unbiased marginal likelihood estimators used in settings in which spatial independence is normally assumed thereby facilitating the incorporation of spatial dependence using non-spatial estimates with minimal additional development effort. The proposed algorithm can be realistically used for analysis of %smaller subsets of large image moderately sized data sets such as $2$D slices of whole $3$D dynamic PET brain images or other regions of interest. Principled approximations of the proposed method, together with simple extensions based on the augmented spaces, are investigated and shown to provide similar results to the full pseudo-marginal method. Such approximations and extensions allow the improved performance obtained by incorporating spatial dependence to be obtained at negligible additional cost. An application to measured PET image data shows notable improvements in revealing underlying spatial structure when compared to current methods that assume spatial independence.
Convection-diffusion-reaction equations model the conservation of scalar quantities. From the analytic point of view, solution of these equations satisfy under certain conditions maximum principles, which represent physical bounds of the solution. That the same bounds are respected by numerical approximations of the solution is often of utmost importance in practice. The mathematical formulation of this property, which contributes to the physical consistency of a method, is called Discrete Maximum Principle (DMP). In many applications, convection dominates diffusion by several orders of magnitude. It is well known that standard discretizations typically do not satisfy the DMP in this convection-dominated regime. In fact, in this case, it turns out to be a challenging problem to construct discretizations that, on the one hand, respect the DMP and, on the other hand, compute accurate solutions. This paper presents a survey on finite element methods, with a main focus on the convection-dominated regime, that satisfy a local or a global DMP. The concepts of the underlying numerical analysis are discussed. The survey reveals that for the steady-state problem there are only a few discretizations, all of them nonlinear, that at the same time satisfy the DMP and compute reasonably accurate solutions, e.g., algebraically stabilized schemes. Moreover, most of these discretizations have been developed in recent years, showing the enormous progress that has been achieved lately. Methods based on algebraic stabilization, nonlinear and linear ones, are currently as well the only finite element methods that combine the satisfaction of the global DMP and accurate numerical results for the evolutionary equations in the convection-dominated situation.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provide principled way to build future architectures yet to be invented.