Given a subset $A$ of the $n$-dimensional Boolean hypercube $\mathbb{F}_2^n$, the sumset $A+A$ is the set $\{a+a': a, a' \in A\}$ where addition is in $\mathbb{F}_2^n$. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field. The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more detail, our algorithm is given oracle access to (the indicator function of) an arbitrary $A \subseteq \mathbb{F}_2^n$ and an accuracy parameter $\epsilon > 0$, and with high probability it outputs a value $0 \leq v \leq 1$ that is $\pm \epsilon$-close to $\mathrm{Vol}(A' + A')$ for some perturbation $A' \subseteq A$ of $A$ satisfying $\mathrm{Vol}(A \setminus A') \leq \epsilon.$ It is easy to see that without the relaxation of dealing with $A'$ rather than $A$, any algorithm for estimating $\mathrm{Vol}(A+A)$ to any nontrivial accuracy must make $2^{\Omega(n)}$ queries. In contrast, we give an algorithm whose query complexity depends only on $\epsilon$ and is completely independent of the ambient dimension $n$.
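For concreteness, here is a brute-force reference computation of $\mathrm{Vol}(A+A)$ for small $n$, encoding points of $\mathbb{F}_2^n$ as $n$-bit integers so that addition becomes XOR. This only pins down the quantity being estimated; it is emphatically not the sublinear-time algorithm, whose query complexity is independent of $n$.

```python
import random

# Brute-force reference for Vol(A + A) on F_2^n (feasible only for small n).
# Points of F_2^n are n-bit integers, so addition in F_2^n is bitwise XOR.

def sumset_volume(A, n):
    """Return Vol(A + A) = |A + A| / 2^n for a set A of n-bit integers."""
    sumset = {a ^ b for a in A for b in A}
    return len(sumset) / 2 ** n

random.seed(0)
n = 10
A = {x for x in range(2 ** n) if random.random() < 0.125}  # random density-1/8 subset
print(f"Vol(A)     = {len(A) / 2 ** n:.3f}")
print(f"Vol(A + A) = {sumset_volume(A, n):.3f}")  # close to 1 for a random A
```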
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution. We establish its optimal minimax mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $\kappa$, we show that MALA with a warm start mixes in $\tilde O(\kappa \sqrt{d})$ iterations up to logarithmic factors. This improves upon previous work in the dependency on either the condition number $\kappa$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, for which we establish a new concentration bound on the acceptance rate. Second, we prove a spectral-gap-based mixing-time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound to construct a hard distribution for which MALA requires at least $\tilde \Omega (\kappa \sqrt{d})$ steps to mix. This lower bound matches our upper bound in terms of both the condition number and the dimension. Finally, numerical experiments are included to validate our theoretical results.
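As a point of reference, the following is a minimal MALA sketch for a strongly log-concave Gaussian target with condition number $\kappa$; the step size and the diagonal Gaussian target are illustrative choices, not the tuning used in the paper's analysis.

```python
import numpy as np

def mala(grad_log_p, log_p, x0, h, n_iters, rng):
    """Metropolis-adjusted Langevin: Langevin proposal + accept/reject step."""
    x = x0.copy()
    for _ in range(n_iters):
        # Langevin (Euler-Maruyama) proposal: x + h*grad + sqrt(2h)*noise
        y = x + h * grad_log_p(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)
        # log densities of the Gaussian proposals q(y|x) and q(x|y)
        log_q_fwd = -np.sum((y - x - h * grad_log_p(x)) ** 2) / (4 * h)
        log_q_bwd = -np.sum((x - y - h * grad_log_p(y)) ** 2) / (4 * h)
        log_alpha = log_p(y) - log_p(x) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:  # Metropolis accept/reject
            x = y
    return x

d, kappa = 50, 10.0
# Gaussian target with per-coordinate precisions in [1, kappa]: condition number kappa.
prec = np.linspace(1.0, kappa, d)
log_p = lambda x: -0.5 * np.sum(prec * x ** 2)
grad_log_p = lambda x: -prec * x
rng = np.random.default_rng(0)
sample = mala(grad_log_p, log_p, rng.standard_normal(d),
              h=0.1 / kappa, n_iters=2000, rng=rng)
```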
We revisit the $k$-Hessian eigenvalue problem on a smooth, bounded, $(k-1)$-convex domain in $\mathbb R^n$. First, we obtain a spectral characterization of the $k$-Hessian eigenvalue as the infimum of the first eigenvalues of linear second-order elliptic operators whose coefficients belong to the dual of the corresponding G\r{a}rding cone. Second, we introduce a non-degenerate inverse iterative scheme to solve the eigenvalue problem for the $k$-Hessian operator. We show that the scheme converges, with a rate, to the $k$-Hessian eigenvalue for all $k$. When $2\leq k\leq n$, we also prove a local $L^1$ convergence of the Hessian of solutions of the scheme. Hyperbolic polynomials play an important role in our analysis.
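For orientation, a minimal statement of the problem under a standard normalization (following X.-J. Wang's work on the $k$-Hessian eigenvalue; the paper's exact normalization may differ): find $\lambda > 0$ and a negative, $k$-admissible $u$ with
\[
S_k(D^2 u) = \lambda\,(-u)^k \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega,
\]
where $S_k(D^2 u)$ denotes the $k$-th elementary symmetric polynomial of the eigenvalues of $D^2 u$. For $k=1$ this recovers the principal Dirichlet eigenvalue of the Laplacian, and for $k=n$ the Monge-Amp\`ere eigenvalue problem.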
We study the Maximum Independent Set of Rectangles (MISR) problem, where we are given a set of axis-parallel rectangles in the plane and the goal is to select a maximum-cardinality subset of non-overlapping rectangles. In a recent breakthrough, Mitchell [2021] obtained the first constant-factor approximation algorithm for MISR. His algorithm achieves an approximation ratio of 10 and is based on a dynamic program that, intuitively, recursively partitions the input plane into special polygons called corner-clipped rectangles (CCRs), without intersecting certain special horizontal line segments called fences. In this paper, we present a $(2+\epsilon)$-approximation algorithm for MISR which is also based on a recursive partitioning scheme. First, we use a partition into a class of axis-parallel polygons of constant complexity each that are more general than CCRs. This allows for an arguably simpler analysis and already improves the approximation ratio to 6. Then, using a more elaborate charging scheme and a recursive partitioning into general axis-parallel polygons of constant complexity, we improve our approximation ratio to $2+\epsilon$. In particular, we construct a recursive partitioning based on more general fences, which can be sequences of up to $O(1/\epsilon)$ line segments each. This partitioning routine and our other new ideas may be useful for future work towards a PTAS for MISR.
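To fix the problem definition (not the approximation algorithm above), here is a brute-force solver for tiny MISR instances; rectangles are axis-parallel, and "non-overlapping" is taken to mean disjoint interiors, so touching boundaries are allowed.

```python
from itertools import combinations

# A rectangle is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def overlap(r, s):
    """True if the two open rectangles intersect (shared boundary is fine)."""
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]

def max_independent_set(rects):
    """Largest subset of pairwise non-overlapping rectangles (exponential time)."""
    for size in range(len(rects), 0, -1):
        for subset in combinations(rects, size):
            if all(not overlap(r, s) for r, s in combinations(subset, 2)):
                return list(subset)
    return []

rects = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 0, 4, 2), (0, 2, 2, 4)]
# The middle rectangle overlaps the other three, so the optimum has size 3.
print(max_independent_set(rects))  # [(0,0,2,2), (2,0,4,2), (0,2,2,4)]
```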
Motivated by practical applications arising from viral marketing, this paper investigates a novel Budgeted $k$-Submodular Maximization problem, defined as follows: given a finite set $V$, a budget $B$, and a $k$-submodular function $f: (k+1)^V \rightarrow \mathbb{R}_+$, where each element $e \in V$ has a cost $c_i(e)$ of being put into the $i$-th set $S_i$, the problem asks to find a solution $\mathbf{s}=(S_1, S_2, \ldots, S_k)$ whose total cost does not exceed $B$ such that $f(\mathbf{s})$ is maximized. To address this problem, we propose two streaming algorithms with provable approximation guarantees. In particular, for the case where each element $e$ has the same cost across all $k$ sets, we propose a deterministic streaming algorithm which provides an approximation ratio of $\frac{1}{4}-\epsilon$ when $f$ is monotone and $\frac{1}{5}-\epsilon$ when $f$ is non-monotone. For the general case, we propose a randomized streaming algorithm that provides an approximation ratio, in expectation, of $\min\{\frac{\alpha}{2}, \frac{(1-\alpha)k}{(1+\beta)k-\beta} \}-\epsilon$ when $f$ is monotone and $\min\{\frac{\alpha}{2}, \frac{(1-\alpha)k}{(1+2\beta)k-2\beta} \}-\epsilon$ when $f$ is non-monotone, where $\beta=\max_{e\in V,\, i, j \in [k],\, i\neq j} \frac{c_i(e)}{c_j(e)}$ and $\epsilon, \alpha$ are fixed inputs.
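Schematically, streaming algorithms of this kind accept an element only if its marginal-gain-to-cost density clears a threshold. The sketch below conveys that flavor only, with a hypothetical threshold $\tau$ and abstract cost/gain oracles; it makes no claim to the approximation ratios stated above.

```python
# Schematic single-pass streaming greedy for budgeted k-submodular
# maximization. The density threshold tau, cost(e, i), and
# marginal_gain(S, e, i) are hypothetical parameters/oracles supplied
# by the caller; this is a sketch of the general shape, not the
# paper's algorithms.

def stream_greedy(stream, k, B, cost, marginal_gain, tau):
    """stream: iterable of elements; cost(e, i): cost of placing e in S_i;
    marginal_gain(S, e, i): gain of adding e to the i-th set of solution S."""
    S = [set() for _ in range(k)]   # S[i] is the i-th coordinate set
    spent = 0.0
    for e in stream:
        # best coordinate for e by gain-to-cost density
        best = max(range(k), key=lambda i: marginal_gain(S, e, i) / cost(e, i))
        g, c = marginal_gain(S, e, best), cost(e, best)
        if g / c >= tau and spent + c <= B:  # accept only high-density elements
            S[best].add(e)
            spent += c
    return S
```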
This paper establishes the (nearly) optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
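The classic building block behind approximation results of this type is Yarotsky's sawtooth approximation of $x^2$ on $[0,1]$, which achieves error $2^{-2(L+1)}$ with depth $L$; the paper's width-$N$, depth-$L$ construction refines this idea. The sketch below only demonstrates why depth buys exponentially small error.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

def hat(x):
    """Tent map g: [0,1] -> [0,1], written with three ReLUs."""
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def square_approx(x, L):
    """Approximate x^2 as x - sum_{s=1}^{L} g^{(s)}(x) / 4^s (Yarotsky)."""
    out, g = x.copy(), x.copy()
    for s in range(1, L + 1):
        g = hat(g)                 # s-fold composition = 2^s-tooth sawtooth
        out -= g / 4.0 ** s
    return out

x = np.linspace(0, 1, 1001)
for L in (2, 4, 8):
    err = np.max(np.abs(square_approx(x, L) - x ** 2))
    print(f"L = {L}: max error = {err:.2e}")   # decays like 4^{-L}
```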
We consider the control of McKean-Vlasov dynamics whose coefficients have mean field interactions in the state and control. We show that for a class of linear-convex mean field control problems, the unique optimal open-loop control admits the optimal 1/2-H\"{o}lder regularity in time. Consequently, we prove that the value function can be approximated by one with piecewise constant controls and discrete-time state processes arising from Euler-Maruyama time stepping, up to an order 1/2 error, and the optimal control can be approximated up to an order 1/4 error. These results are novel even for the case without mean field interaction.
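A minimal particle-system sketch of Euler-Maruyama time stepping with a piecewise constant control for a McKean-Vlasov dynamic is given below; the linear drift $-x + \mathbb{E}[x] + a$ and the zero control are illustrative placeholders rather than the paper's linear-convex model data.

```python
import numpy as np

def euler_maruyama_mkv(n_particles, n_steps, T, control, rng):
    """Simulate dX = (-X + E[X] + a) dt + dW with the mean field replaced
    by the empirical mean over particles; the control is frozen on each
    time interval [t_j, t_{j+1}) (piecewise constant in time)."""
    dt = T / n_steps
    x = np.zeros(n_particles)                    # all particles start at 0
    for j in range(n_steps):
        a = control(j * dt, x)                   # constant on [t_j, t_{j+1})
        mean_field = x.mean()                    # empirical mean-field term
        drift = -x + mean_field + a
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_particles)
    return x

rng = np.random.default_rng(1)
x_T = euler_maruyama_mkv(10_000, n_steps=50, T=1.0,
                         control=lambda t, x: np.zeros_like(x), rng=rng)
print(x_T.mean(), x_T.var())
```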
We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and solved to global optimality, leading to estimators known to enjoy optimal statistical properties. However, current MIP algorithms for SPCA are unable to scale beyond instances with a thousand features or so. In this paper, we propose a new estimator for SPCA which can be formulated as a MIP. Unlike earlier work, we make use of the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator. We establish statistical guarantees for the proposed estimator in terms of estimation error and support recovery. We propose a custom algorithm to solve the MIP that is significantly more scalable than off-the-shelf solvers, and we demonstrate that our approach can be much more computationally attractive than earlier exact MIP-based approaches for SPCA. Our numerical experiments on synthetic and real datasets show that our algorithms can address problems with up to $20{,}000$ features in minutes, and generally yield favorable statistical properties compared to existing popular approaches for SPCA.
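To make the statistical setup concrete, the snippet below generates spiked covariance data and runs diagonal thresholding (Johnstone and Lu), a classical baseline for support recovery under this model; it is not the MIP-based estimator proposed above.

```python
import numpy as np

# Spiked covariance model: Sigma = beta * v v^T + I with a k-sparse unit spike v.
rng = np.random.default_rng(0)
p, n, k, beta = 500, 200, 10, 3.0

v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)                 # k-sparse unit-norm spike
Sigma = beta * np.outer(v, v) + np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Diagonal thresholding: coordinates on the support have inflated variance
# (1 + beta/k vs. 1), so pick the k largest diagonal entries of the sample
# covariance as the estimated support.
diag = (X ** 2).mean(axis=0)
support_hat = np.argsort(diag)[-k:]
print(sorted(support_hat))               # ideally [0, 1, ..., 9]
```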
This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruptions. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption. More precisely, we show that two classes of algorithms, anytime Hedge with decreasing learning rate and algorithms with second-order regret bounds, achieve $O( \frac{\log N}{\Delta} + \sqrt{ \frac{C \log N }{\Delta} } )$-regret, where $N, \Delta$, and $C$ represent the number of experts, the gap parameter, and the corruption level, respectively. We further provide a matching lower bound, which means that this regret bound is tight up to a constant factor. For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
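A minimal implementation of anytime Hedge with the decreasing learning rate $\eta_t = \sqrt{(\log N)/t}$, one of the two algorithm classes discussed above, is given below; the Bernoulli losses with a corrupted best expert are an illustrative stochastic-with-corruption regime, not the paper's exact corruption model.

```python
import numpy as np

def anytime_hedge(losses):
    """losses: (T, N) array; returns the algorithm's cumulative expected loss."""
    T, N = losses.shape
    cum_loss = np.zeros(N)                 # cumulative loss of each expert
    total = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(N) / t)       # decreasing learning rate
        w = np.exp(-eta * (cum_loss - cum_loss.min()))  # stabilized weights
        p = w / w.sum()
        total += p @ losses[t - 1]
        cum_loss += losses[t - 1]
    return total

rng = np.random.default_rng(0)
T, N, Delta = 10_000, 8, 0.2
means = np.full(N, 0.5)
means[0] = 0.5 - Delta                     # expert 0 is best, with gap Delta
L = rng.binomial(1, means, size=(T, N)).astype(float)
L[:200, 0] = 1.0                           # adversary corrupts the first 200 rounds
regret = anytime_hedge(L) - L[:, 0].sum()
print(f"regret vs. best expert: {regret:.1f}")
```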
In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, recent years have seen extensive development of linear and nonlinear dimension reduction techniques. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$-smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a nonlinear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which introduces distortions into the data, can be avoided altogether.
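Below is a drastically simplified first-order ($m=1$) caricature of such a moving least-squares projection: weighted local PCA gives an approximate tangent, and the query point is projected onto it. The full method iterates this with degree-$m$ local polynomial fits; the Gaussian weights and the bandwidth $h$ below are illustrative choices.

```python
import numpy as np

def mls_project(q, data, d, h):
    """Project query q (shape (n,)) toward a d-dimensional manifold
    sampled noisily by data (shape (N, n)), via weighted local PCA."""
    w = np.exp(-np.sum((data - q) ** 2, axis=1) / h ** 2)   # locality weights
    mu = (w[:, None] * data).sum(0) / w.sum()               # weighted mean
    C = (w[:, None] * (data - mu)).T @ (data - mu)          # weighted covariance
    _, vecs = np.linalg.eigh(C)                             # ascending eigenvalues
    U = vecs[:, -d:]                                        # top-d local tangent
    return mu + U @ (U.T @ (q - mu))                        # affine projection

# Noisy samples of a circle (d = 1) embedded in R^50.
rng = np.random.default_rng(0)
N, n = 2000, 50
theta = rng.uniform(0, 2 * np.pi, N)
clean = np.zeros((N, n))
clean[:, 0], clean[:, 1] = np.cos(theta), np.sin(theta)
data = clean + 0.05 * rng.standard_normal((N, n))
q = data[0]
p = mls_project(q, data, d=1, h=0.3)
# In-plane deviation from the unit circle, before vs. after projection;
# the projected point is typically much closer.
print(abs(np.linalg.norm(q[:2]) - 1.0), abs(np.linalg.norm(p[:2]) - 1.0))
```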
This paper addresses the problem of formally verifying desirable properties of neural networks, i.e., obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded-norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, the network architecture, and the complexity of the properties to be verified. In contrast, our framework applies to a general class of activation functions and of specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking the largest violation of the specification) and solve a Lagrangian relaxation of this problem to obtain an upper bound on the worst-case violation of the specification being verified. Our approach is anytime, i.e., it can be stopped at any time and a valid bound on the maximum violation can still be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.
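For contrast with the Lagrangian relaxation described above, here is interval bound propagation (IBP), a much looser but very cheap relaxation that likewise yields a valid upper bound on the worst-case specification violation; the random two-layer ReLU network and the $\ell_\infty$-ball input set are illustrative.

```python
import numpy as np

def ibp_bounds(lo, hi, weights, biases):
    """Propagate elementwise input bounds [lo, hi] through ReLU layers."""
    for idx, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        new_lo = W_pos @ lo + W_neg @ hi + b   # worst case per sign of W
        new_hi = W_pos @ hi + W_neg @ lo + b
        if idx < len(weights) - 1:             # ReLU on hidden layers only
            new_lo, new_hi = np.maximum(new_lo, 0), np.maximum(new_hi, 0)
        lo, hi = new_lo, new_hi
    return lo, hi

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((20, 10)) / 4, rng.standard_normal(20) / 4
W2, b2 = rng.standard_normal((2, 20)) / 4, rng.standard_normal(2) / 4
x, eps = rng.standard_normal(10), 0.1

lo, hi = ibp_bounds(x - eps, x + eps, [W1, W2], [b1, b2])
# Specification: output_0 - output_1 <= 0 on the eps-ball around x. Its
# worst-case violation is upper-bounded by hi[0] - lo[1]; a nonpositive
# bound certifies the property.
print("upper bound on violation:", hi[0] - lo[1])
```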