Sum-of-norms clustering is a convex optimization problem whose solution can be used to cluster multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of data points and the localization length of the functional.
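For concreteness, here is a minimal sketch (ours, not the paper's formulation) of sum-of-norms clustering with a localized fusion penalty, assuming "localized" means the pairwise terms only couple points within a localization length `ell`; the weights `lam` and `ell` are illustrative.

```python
# Sketch of localized sum-of-norms (SON) clustering; assumes cvxpy is available.
import numpy as np
import cvxpy as cp

def localized_son(X, lam=1.0, ell=0.5):
    n, d = X.shape
    U = cp.Variable((n, d))                    # one representative u_i per point
    fit = 0.5 * cp.sum_squares(U - X)          # data-fidelity term
    # Assumption: the fusion penalty only couples pairs within distance ell.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if np.linalg.norm(X[i] - X[j]) <= ell]
    fuse = sum(cp.norm(U[i] - U[j]) for (i, j) in pairs)
    cp.Problem(cp.Minimize(fit + lam * fuse)).solve()
    # Points whose representatives coincide (up to tolerance) share a cluster.
    return U.value
```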
We provide the first coreset for clustering points in $\mathbb{R}^d$ that have multiple missing values (coordinates). Previous coreset constructions only allow one missing coordinate. The challenge in this setting is that objective functions, like $k$-Means, are evaluated only on the set of available (non-missing) coordinates, which varies across points. Recall that an $\epsilon$-coreset of a large dataset is a small proxy, usually a reweighted subset of points, that $(1+\epsilon)$-approximates the clustering objective for every possible center set. Our coresets for $k$-Means and $k$-Median clustering have size $(jk)^{O(\min(j,k))} (\epsilon^{-1} d \log n)^2$, where $n$ is the number of data points, $d$ is the dimension and $j$ is the maximum number of missing coordinates for each data point. We further design an algorithm to construct these coresets in near-linear time, and consequently improve a recent quadratic-time PTAS for $k$-Means with missing values [Eiben et al., SODA 2021] to near-linear time. We validate our coreset construction, which is based on importance sampling and is easy to implement, on various real data sets. Our coreset exhibits a flexible tradeoff between coreset size and accuracy, and generally outperforms the uniform-sampling baseline. Furthermore, it significantly speeds up a Lloyd's-style heuristic for $k$-Means with missing values.
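A simplified importance-sampling sketch (not the paper's exact construction, which uses finer sensitivity bounds): sample points with probability proportional to their cost under a rough solution, reweight by inverse probability, and evaluate distances only on available coordinates.

```python
# Toy importance-sampling coreset for k-Means with missing coordinates.
import numpy as np

def masked_sq_dist(x, c, mask):
    # k-means-style cost evaluated only on the available (non-missing) coordinates
    return np.sum((x[mask] - c[mask]) ** 2)

def coreset(X, masks, rough_centers, m):
    n = len(X)
    costs = np.array([min(masked_sq_dist(X[i], c, masks[i])
                          for c in rough_centers) for i in range(n)])
    p = costs / max(costs.sum(), 1e-12)   # illustrative proxy for sensitivities
    p = 0.5 * p + 0.5 / n                 # mix with uniform for stability
    idx = np.random.choice(n, size=m, p=p)
    weights = 1.0 / (m * p[idx])          # unbiased reweighting
    return idx, weights
```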
In constrained convex optimization, existing methods based on the ellipsoid or cutting plane method do not scale well with the dimension of the ambient space. Alternative approaches such as Projected Gradient Descent only provide a computational benefit for simple convex sets such as Euclidean balls, where Euclidean projections can be performed efficiently. For other sets, the cost of the projections can be too high. To circumvent these issues, alternative methods based on the famous Frank-Wolfe algorithm have been studied and used. Such methods use a Linear Optimization Oracle at each iteration instead of Euclidean projections; the former can often be performed efficiently. These methods have also been extended to the online and stochastic optimization settings. However, the Frank-Wolfe algorithm and its variants do not achieve the optimal performance, in terms of regret or rate, for general convex sets. What is more, the Linear Optimization Oracle they use can still be computationally expensive in some cases. In this paper, we move away from Frank-Wolfe style algorithms and present a new reduction that turns any algorithm A defined on a Euclidean ball (where projections are cheap) into an algorithm on a constrained set C contained within the ball, without sacrificing the performance of the original algorithm A by much. Our reduction requires O(T log T) calls to a Membership Oracle on C after T rounds, and no linear optimization on C is needed. Using our reduction, we recover optimal regret bounds [resp. rates], in terms of the number of iterations, in online [resp. stochastic] convex optimization. Our guarantees are also useful in the offline convex optimization setting when the dimension of the ambient space is large.
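As a hedged illustration of the kind of primitive such a reduction can be built on (a hypothetical helper, not the paper's algorithm): the gauge $\gamma_C(x) = \inf\{t > 0 : x/t \in C\}$ of the set C can be approximated by bisection using only membership queries, assuming $rB \subseteq C \subseteq B$ for the unit ball $B$.

```python
# Approximate the gauge of C at x via bisection with a Membership Oracle.
import numpy as np

def gauge(member, x, r, tol=1e-6):
    # Assumes r*B <= C <= B, so gamma_C(x) lies in [||x||, ||x||/r].
    lo, hi = np.linalg.norm(x), np.linalg.norm(x) / r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if member(x / mid):   # x/mid in C means the gauge is at most mid
            hi = mid
        else:
            lo = mid
    return hi

# A point y of the ball can then be pulled into C, e.g. by scaling it down by
# max(1, gauge(y)), at a cost of O(log(1/tol)) membership queries.
```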
Solutions to many partial differential equations satisfy certain bounds or constraints. For example, the density and pressure are positive for equations of fluid dynamics, and in the relativistic case the fluid velocity is upper bounded by the speed of light. It is widely recognized that developing bound-preserving numerical methods, which maintain these intrinsic constraints, is crucial. Exploring provably bound-preserving schemes has attracted much attention and has been actively studied in recent years. This remains, however, a challenging task for many systems, especially those involving nonlinear constraints. Based on some key insights from geometry, we systematically propose an innovative and general framework, referred to as geometric quasilinearization (GQL), which opens a new and effective way to study bound-preserving problems with nonlinear constraints. The essential idea of GQL is to transform all nonlinear constraints equivalently into linear ones by properly introducing free auxiliary variables. We establish the fundamental principle and general theory of GQL via the geometric properties of convex regions, and propose three simple and effective methods for constructing GQL. We apply the GQL approach to a variety of partial differential equations and demonstrate its effectiveness and remarkable advantages for studying bound-preserving schemes, through diverse challenging examples and applications that cannot be easily handled by direct or traditional approaches.
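As a concrete illustration (ours, following the standard compressible-Euler example rather than text from the abstract): with conservative variables $U = (\rho, \mathbf{m}, E)$, the physically admissible set involves a nonlinear internal-energy (pressure) constraint, and a GQL-style reformulation replaces it by a family of constraints that are linear in $U$ for each fixed auxiliary vector $\mathbf{v}$:
$$\Big\{U : \rho > 0,\ E - \frac{|\mathbf{m}|^2}{2\rho} > 0\Big\} \;=\; \Big\{U : \rho > 0,\ E - \mathbf{m}\cdot\mathbf{v} + \frac{\rho\,|\mathbf{v}|^2}{2} > 0 \ \ \forall\, \mathbf{v} \in \mathbb{R}^d\Big\},$$
since minimizing $E - \mathbf{m}\cdot\mathbf{v} + \rho|\mathbf{v}|^2/2$ over the auxiliary vector $\mathbf{v}$ is attained at $\mathbf{v} = \mathbf{m}/\rho$ and recovers the original nonlinear constraint.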
We consider $k$-means clustering in the online no-substitution setting, where one must decide whether to take each data point $x_t$ as a center immediately upon streaming it and cannot remove centers once taken. Our work is focused on the \emph{arbitrary-order} assumption, where there are no restrictions on how the points $X$ are ordered or generated. Algorithms in this setting are evaluated with respect to their approximation ratio compared to the optimal clustering cost, the number of centers they select, and their memory usage. Recently, Bhattacharjee and Moshkovitz (2020) defined a parameter, $Lower_{\alpha, k}(X)$, that governs the minimum number of centers any $\alpha$-approximation clustering algorithm must take given input $X$, no matter how much memory it is allowed. To complement their result, we give the first algorithm that takes $\tilde{O}(Lower_{\alpha,k}(X))$ centers (hiding factors of $k, \log n$) while simultaneously achieving a constant approximation and using $\tilde{O}(k)$ memory in addition to the memory required to save the centers. Our algorithm shows that in the no-substitution setting, it is possible to take an order-optimal number of centers while using little additional memory.
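The no-substitution constraint can be illustrated with a much simpler threshold rule (our toy baseline, not the order-optimal algorithm of the abstract): irrevocably take $x_t$ as a center whenever it is far from every current center, relative to a guess `r` of the optimal cost scale.

```python
# Toy no-substitution center selection; stream yields numpy arrays.
import numpy as np

def no_substitution_stream(stream, r):
    centers = []
    for x in stream:
        # the decision is made immediately and is irrevocable
        if not centers or min(np.sum((x - c) ** 2) for c in centers) > r:
            centers.append(x)
    return centers
```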
Motivated by a question of Farhi et al. [arXiv:1910.08187, 2019], we study the limitations of the Quantum Approximate Optimization Algorithm (QAOA) and show that there exists $\epsilon > 0$ such that $\epsilon\log(n)$ depth QAOA cannot approximate boolean constraint satisfaction problems arbitrarily well, as long as the problem satisfies a combinatorial property from statistical physics called the coupled overlap-gap property (OGP) [Chen et al., Annals of Probability, 47(3), 2019]. We show that the random $k$-XOR problem has this property when $k \geq 4$ is even, by extending the corresponding result for diluted $k$-spin glasses. As a consequence of our techniques, we confirm a conjecture of Brandao et al. [arXiv:1812.04170, 2018] asserting that the landscape independence of QAOA extends to logarithmic depth: in other words, for every fixed choice of QAOA angle parameters, the algorithm at logarithmic depth performs almost equally well on almost all instances. Our results provide a new way to study the power and limits of QAOA through statistical physics methods and combinatorial properties.
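For readers unfamiliar with the algorithm, here is a minimal statevector sketch of depth-$p$ QAOA on a random $k$-XOR instance (our illustration, unrelated to the paper's analysis); reusing one fixed angle schedule `(gammas, betas)` across instances is exactly the regime of the landscape-independence statement.

```python
# Statevector QAOA for small n: alternate a cost-phase layer and an X-mixer layer.
import numpy as np
from functools import reduce

def qaoa_value(n, clauses, gammas, betas):
    dim = 2 ** n
    z = np.arange(dim)
    bits = (z[:, None] >> np.arange(n)) & 1               # bitstrings as rows
    # cost[z] = number of satisfied XOR clauses; a clause is (S, b) with S a
    # list of variable indices and b the target parity.
    cost = sum(((bits[:, S].sum(axis=1) % 2) == b).astype(float)
               for S, b in clauses)
    rx = lambda b: np.array([[np.cos(b), -1j * np.sin(b)],
                             [-1j * np.sin(b), np.cos(b)]])  # e^{-i b X}
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)      # |+>^n start state
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * cost) * psi                   # phase layer e^{-i g C}
        psi = reduce(np.kron, [rx(b)] * n) @ psi             # mixer layer
    return np.real(psi.conj() @ (cost * psi))                # expected #satisfied
```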
We consider the query complexity of finding a local minimum of a function defined on a graph, where at most $k$ rounds of interaction with the oracle are allowed. Rounds model parallel settings, where each query takes resources to complete and is executed on a separate processor. Thus the query complexity in $k$ rounds informs how many processors are needed to achieve a parallel time of $k$. We focus on the $d$-dimensional grid $[n]^d$, where the dimension $d$ is a constant, and consider two regimes for the number of rounds: constant and polynomial in $n$. We give algorithms and lower bounds that characterize the trade-off between the number of rounds of adaptivity and the query complexity of local search. When the number of rounds $k$ is constant, we show that the query complexity of local search in $k$ rounds is $\Theta\bigl(n^{\frac{d^{k+1} - d^k}{d^k - 1}}\bigr)$, for both deterministic and randomized algorithms. When the number of rounds is polynomial, i.e., $k = n^{\alpha}$ for $0 < \alpha < d/2$, the randomized query complexity is $\Theta\left(n^{d-1 - \frac{d-2}{d}\alpha}\right)$ for all $d \geq 5$. For $d=3$ and $d=4$, we show that the same upper bound expression holds and give almost matching lower bounds. The local search analysis also enables us to characterize the query complexity of computing a Brouwer fixed point in rounds. Our proof technique for lower bounding the query complexity in rounds may be of independent interest, as an alternative to the classical relational adversary method of Aaronson from the fully adaptive setting.
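To make the round model concrete, here is a toy baseline (ours; it is far from the optimal trade-off in the abstract) in which each round submits one parallel batch of queries, namely all grid neighbors of the current point, and moves to the best improving one.

```python
# Round-limited greedy local search on the grid [n]^d; f maps point tuples to values.
import numpy as np

def batched_descent(f, start, n, rounds):
    x = np.array(start)
    d = len(x)
    for _ in range(rounds):
        batch = [x + e for i in range(d) for s in (-1, 1)
                 for e in [s * np.eye(d, dtype=int)[i]]
                 if 0 <= x[i] + s < n]                 # one parallel round of queries
        best = min(batch, key=lambda y: f(tuple(y)))
        if f(tuple(best)) >= f(tuple(x)):
            return tuple(x)                            # x is a local minimum
        x = best
    return tuple(x)
```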
We first design an $\mathcal{O}(n^2)$ solution for finding a maximum induced matching (MIM) in permutation graphs, given their permutation models, based on a dynamic programming algorithm aided by the sweep line technique. With the support of the disjoint-set data structure, we improve the complexity to $\mathcal{O}(m + n)$. Consequently, we extend this result to give an $\mathcal{O}(m + n)$ algorithm for the same problem in trapezoid graphs. By combining our algorithms with the current best graph identification algorithms, we can solve the MIM problem in permutation and trapezoid graphs in linear and $\mathcal{O}(n^2)$ time, respectively. Our results substantially improve on the best known $\mathcal{O}(mn)$ algorithm for the maximum induced matching problem in both graph classes, proposed by Habib et al.
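The disjoint-set structure the abstract relies on is standard; a generic implementation with path compression and union by rank is shown below (this is the auxiliary ingredient only, not the full induced-matching algorithm).

```python
# Disjoint-set (union-find) with near-constant amortized operations.
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra                               # attach smaller rank under larger
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
```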
We study the revenue guarantees and approximability of item pricing. Recent work shows that with $n$ heterogeneous items, item pricing guarantees an $O(\log n)$ approximation to the optimal revenue achievable by any (buy-many) mechanism, even when buyers have arbitrarily combinatorial valuations. However, finding good item prices is challenging: it is known that even under unit-demand valuations, it is NP-hard to find item prices that approximate the revenue of the optimal item pricing better than $O(\sqrt{n})$. Our work provides a more fine-grained analysis of the revenue guarantees and computational complexity in terms of the number of item ``categories'', which may be significantly fewer than $n$. We assume the items are partitioned into $k$ categories so that items within a category are totally ordered, and a buyer's value for a bundle depends only on the best item it contains from each category. We show that item pricing guarantees an $O(\log k)$ approximation to the optimal (buy-many) revenue and provide a PTAS for computing the optimal item pricing when $k$ is constant. We also provide a matching lower bound showing that the problem is (strongly) NP-hard even when $k=1$. Our results naturally extend to the case where items are only partially ordered, in which case the revenue guarantees and computational complexity depend on the width of the partial ordering, i.e., the largest set in which no two items are comparable.
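To fix intuition for the category model, consider the simplified demand computation below (our toy, assuming values are additive across categories, which the abstract does not require): since the buyer's value depends only on the best item per category, the buyer effectively chooses at most one item per category.

```python
# Toy buyer demand and revenue under item prices in the k-category model.
def buyer_purchase(values, prices):
    # values[c][i], prices[c][i]: value/price of item i in category c,
    # with items in each category indexed from worst to best.
    revenue, bundle = 0.0, []
    for c in range(len(values)):
        options = [(values[c][i] - prices[c][i], i)
                   for i in range(len(values[c]))]
        best_utility, best_i = max(options)
        if best_utility > 0:                 # buy only at positive utility
            bundle.append((c, best_i))
            revenue += prices[c][best_i]
    return revenue, bundle
```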
It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard. In contrast, some algorithms have been shown empirically to succeed in finding solutions at low density. This phenomenon has been justified numerically by the existence of subdominant and dense connected regions of solutions, which are accessible by simple learning algorithms. In this paper, we establish such a phenomenon formally for both the symmetric and asymmetric binary perceptrons. We show that at low constraint density (equivalently, for overparametrized perceptrons), there indeed exists a subdominant connected cluster of solutions with almost maximal diameter, and that an efficient multiscale majority algorithm can find solutions in such a cluster with high probability; in particular, this settles an open problem posed by Perkins-Xu '21. In addition, even close to the critical threshold, we show that clusters of linear diameter exist for the symmetric perceptron, as well as for the asymmetric perceptron under additional assumptions.
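In coordinates (our illustration of the model, not of the multiscale majority algorithm, which is more involved): $x \in \{-1,+1\}^n$ solves the symmetric binary perceptron iff $|\langle g_a, x\rangle| \leq \kappa\sqrt{n}$ for all $m = \alpha n$ Gaussian constraint vectors $g_a$, and cluster geometry is measured in Hamming distance.

```python
# Symmetric binary perceptron: feasibility check and Hamming distance.
import numpy as np

def is_sbp_solution(G, x, kappa):
    # G has one Gaussian constraint vector per row; x has entries in {-1, +1}.
    return bool(np.all(np.abs(G @ x) <= kappa * np.sqrt(len(x))))

def hamming(x, y):
    # cluster diameter = max pairwise Hamming distance within the cluster
    return int(np.sum(x != y))

# Example instance at constraint density alpha:
#   G = np.random.randn(int(alpha * n), n)
```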
Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some losses, such as $k$-means, its non-linear version kernelized $k$-means (centroid based), and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. Every so often, however, the clustering output under these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model is trained on pairs of example datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model gains the ability to generalize well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets, or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.
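A minimal sketch of the "learning to cluster" idea (our illustration; the paper's architecture may differ): a recurrent model reads a dataset as a sequence of points and emits a cluster label per point, trained on (dataset, cluster-identity) pairs. The task generator `synthetic_tasks()` is hypothetical.

```python
# Recurrent meta-clusterer trained on (dataset, labels) pairs with PyTorch.
import torch
import torch.nn as nn

class MetaClusterer(nn.Module):
    def __init__(self, d, hidden=64, max_clusters=10):
        super().__init__()
        self.rnn = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, max_clusters)

    def forward(self, X):                  # X: (batch, n_points, d)
        h, _ = self.rnn(X)
        return self.head(h)                # per-point cluster logits

model = MetaClusterer(d=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for X, y in synthetic_tasks():             # hypothetical (dataset, labels) generator
    logits = model(X).flatten(0, 1)        # (batch * n_points, max_clusters)
    loss = nn.functional.cross_entropy(logits, y.flatten())
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that cluster identities are only meaningful up to permutation of label names, so a practical training loss must be permutation-invariant; the plain cross-entropy above is the simplest stand-in.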