In this entry point into the subject, combining two elementary proofs, we decrease the gap between the upper and lower bounds by $0.2\%$ in a classical combinatorial number theory problem. We show that the maximum size of a Sidon set of $\{ 1, 2, \ldots, n\}$ is at most $\sqrt{n}+ 0.998n^{1/4}$ for sufficiently large $n$.
We develop a theory to measure the variance and covariance of probability distributions defined on the nodes of a graph, which takes into account the distance between nodes. Our approach generalizes the usual (co)variance to the setting of weighted graphs and retains many of its intuitive and desired properties. Interestingly, we find that a number of famous concepts in graph theory and network science can be reinterpreted in this setting as variances and covariances of particular distributions. As a particular application, we define the maximum variance problem on graphs with respect to the effective resistance distance, and characterize the solutions to this problem both numerically and theoretically. We show how the maximum variance distribution is concentrated on the boundary of the graph, and illustrate this in the case of random geometric graphs. Our theoretical results are supported by a number of experiments on a network of mathematical concepts, where we use the variance and covariance as analytical tools to study the (co-)occurrence of concepts in scientific papers with respect to the (network) relations between these concepts.
A family S of convex sets in the plane defines a hypergraph H = (S, E) as follows. Every subfamily S' of S defines a hyperedge of H if and only if there exists a halfspace h that fully contains S' , and no other set of S is fully contained in h. In this case, we say that h realizes S'. We say a set S is shattered, if all its subsets are realized. The VC-dimension of a hypergraph H is the size of the largest shattered set. We show that the VC-dimension for pairwise disjoint convex sets in the plane is bounded by 3, and this is tight. In contrast, we show the VC-dimension of convex sets in the plane (not necessarily disjoint) is unbounded. We provide a quadratic lower bound in the number of pairs of intersecting sets in a shattered family of convex sets in the plane. We also show that the VC-dimension is unbounded for pairwise disjoint convex sets in R^d , for d > 2. We focus on, possibly intersecting, segments in the plane and determine that the VC-dimension is always at most 5. And this is tight, as we construct a set of five segments that can be shattered. We give two exemplary applications. One for a geometric set cover problem and one for a range-query data structure problem, to motivate our findings.
We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalence forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of cost tensor. The first equivalence form is derived under the assumptions that the total masses of each measure are sufficiently close while the second equivalence form does not require any conditions on these masses but at the price of more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalence forms rely on novel procedures of moving mass in graph theory to push transportation plan into appropriate regions. Finally, based on the equivalence forms, we develop optimization algorithm, named ApproxMPOT algorithm, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^{m}/ \varepsilon^2)$ where $\varepsilon > 0$ stands for the desired tolerance.
Finding the exact integrality gap $\alpha$ for the LP relaxation of the 2-edge-connected spanning multigraph problem (2EC) is closely related to the same problem for the Held-Karp relaxation of the metric traveling salesman problem (TSP). While the former problem seems easier than the latter, since it is less constrained, currently the upper bounds on the respective integrality gaps for the two problems are the same. An approach to proving integrality gaps for both of these problems is to consider fundamental classes of extreme points. For 2EC, better bounds on the integrality gap are known for certain important special cases of these fundamental points. For example, for half-integer square points, the integrality gap is between $\frac{6}{5}$ and $\frac{4}{3}$. Our main result is to improve the approximation factor to $\frac{9}{7}$ for 2EC for these points. Our approach is based on constructing convex combinations and our key tool is the top-down coloring framework for tree augmentation, whose flexibility we employ to exploit beneficial properties in both the initial spanning tree and in the input graph. We also show how these tools can be tailored to the closely related problem of uniform covers for which the proofs of the best-known bounds do not yield polynomial-time algorithms. Another key ingredient is to use a rainbow spanning tree decomposition, which allows us to obtain a convex combination of spanning trees with particular properties
The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to the ordinary least squares in solving linear regressions. It is defined as the minimizer of the $\ell_{\infty}$ objective function \begin{align*} \hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}. \end{align*} The asymptotic distribution of the Chebyshev estimator under fixed number of covariates were recently studied (Knight, 2020), yet finite sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ for a Chebyshev estimator $\hat{\boldsymbol{\beta}}$, in a regression setting with uniformly distributed noise $\varepsilon_i\sim U([-a,a])$ where $a$ is either known or unknown. With relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we can bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. In addition we show that "Chebyshev's LASSO" has advantages over the regular LASSO in high dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
This paper presents a linear prioritized local algorithm that computes large independent sets on a random $d$-regular graph with small and fixed degree $d$. We studied experimentally the independence ratio obtained by the algorithm when $ d \in [3,100]$. For all $d \in [5,100]$, our results are larger than lower bounds calculated by exact methods, thus providing improved estimates of lower bounds.
The $p$-center problem (pCP) is a fundamental problem in location science, where we are given customer demand points and possible facility locations, and we want to choose $p$ of these locations to open a facility such that the maximum distance of any customer demand point to its closest open facility is minimized. State-of-the-art solution approaches of pCP use its connection to the set cover problem to solve pCP in an iterative fashion by repeatedly solving set cover problems. The classical textbook integer programming (IP) formulation of pCP is usually dismissed due to its size and bad linear programming (LP)-relaxation bounds. We present a novel solution approach that works on a new IP formulation that can be obtained by a projection from the classical formulation. The formulation is solved by means of branch-and-cut, where cuts for demand points are iteratively generated. Moreover, the formulation can be strengthened with combinatorial information to obtain a much tighter LP-relaxation. In particular, we present a novel way to use lower bound information to obtain stronger cuts. We show that the LP-relaxation bound of our strengthened formulation has the same strength as the best known bound in literature, which is based on a semi-relaxation. Finally, we also present a computational study on instances from literature with up to more than 700,000 customers and locations. Our solution algorithm is competitive with highly sophisticated set-cover-based solution algorithms, which depend on various components and parameters.
Let $G$ be a graph with $n$ vertices and $m$ edges. One of several hierarchies towards the stability number of $G$ is the exact subgraph hierarchy (ESH). On the first level it computes the Lov\'{a}sz theta function $\vartheta(G)$ as semidefinite program (SDP) with a matrix variable of order $n+1$ and $n+m+1$ constraints. On the $k$-th level it adds all exact subgraph constraints (ESC) for subgraphs of order $k$ to the SDP. An ESC ensures that the submatrix of the matrix variable corresponding to the subgraph is in the correct polytope. By including only some ESCs into the SDP the ESH can be exploited computationally. In this paper we introduce a variant of the ESH that computes $\vartheta(G)$ through an SDP with a matrix variable of order $n$ and $m+1$ constraints. We show that it makes sense to include the ESCs into this SDP and introduce the compressed ESH (CESH) analogously to the ESH. Computationally the CESH seems favorable as the SDP is smaller. However, we prove that the bounds based on the ESH are always at least as good as those of the CESH. In computational experiments sometimes they are significantly better. We also introduce scaled ESCs (SESCs), which are a more natural way to include exactness constraints into the smaller SDP and we prove that including an SESC is equivalent to including an ESC for every subgraph.
Given a set $S$ of $n$ points in $\mathbb{R}^d$, a $k$-set is a subset of $k$ points of $S$ that can be strictly separated by a hyperplane from the remaining $n-k$ points. Similarly, one may consider $k$-facets, which are hyperplanes that pass through $d$ points of $S$ and have $k$ points on one side. A notorious open problem is to determine the asymptotics of the maximum number of $k$-sets. In this paper we study a variation on the $k$-set/$k$-facet problem with hyperplanes replaced by algebraic surfaces. In stark contrast to the original $k$-set/$k$-facet problem, there are some natural families of algebraic curves for which the number of $k$-facets can be counted exactly. For example, we show that the number of halving conic sections for any set of $2n+5$ points in general position in the plane is $2\binom{n+2}{2}^2$. To understand the limits of our argument we study a class of maps we call \emph{generally neighborly embeddings}, which map generic point sets into neighborly position. Additionally, we give a simple argument which improves the best known bound on the number of $k$-sets/$k$-facets for point sets in convex position.
The insertion-deletion codes was motivated to correct the synchronization errors. In this paper we prove several Singleton type upper bounds on the insdel distances of linear insertion-deletion codes, based on the generalized Hamming weights and the formation of minimum Hamming weight codewords. Our bound are stronger than some previous known bounds. Some or our upper bounds are valid for any fixed ordering of coordinate positions. We apply these upper bounds to some binary cyclic codes with any rearrangement of coordinate positions, binary Reed-Muller codes and one algebraic-geometric code from elliptic curves.