We propose a method for estimating a log-concave density on $\mathbb R^d$ from samples, under the assumption that there exists an orthogonal transformation that makes the components of the random vector independent. Although log-concave density estimation is hard both computationally and statistically, the independent components assumption alleviates both issues, while still preserving a large non-parametric class. We prove that under mild conditions, at most $\tilde{\mathcal{O}}(\epsilon^{-4})$ samples (suppressing constants and log factors) suffice for our proposed estimator to be within $\epsilon$ of the original density in squared Hellinger distance. On the computational front, while the usual log-concave maximum likelihood estimate can be obtained via a finite-dimensional convex program, it is slow to compute -- especially in higher dimensions. We demonstrate through numerical experiments that our estimator can be computed efficiently, making it more practical to use.
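As a toy illustration of the independent-components structure (not the estimator analyzed above), the sketch below rotates centered data by an orthogonal matrix obtained from PCA -- a crude stand-in for whatever rotation-recovery step one uses -- and fits a separate one-dimensional density to each rotated coordinate; `gaussian_kde` is only a placeholder for a univariate log-concave maximum likelihood fit.

```python
import numpy as np
from scipy.stats import gaussian_kde

def fit_product_density(X):
    """Toy sketch: orthogonal rotation + per-coordinate 1D density fits.

    PCA supplies the orthogonal matrix W (a stand-in for the rotation
    recovery step); gaussian_kde stands in for a 1D log-concave MLE.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt                       # rows form an orthonormal basis
    Z = Xc @ W.T                 # rotated (hopefully ~independent) coordinates
    marginals = [gaussian_kde(Z[:, j]) for j in range(Z.shape[1])]

    def density(x):
        z = (np.asarray(x) - mean) @ W.T
        # |det W| = 1 for orthogonal W, so no Jacobian correction is needed
        return float(np.prod([m(zj)[0] for m, zj in zip(marginals, z)]))

    return density
```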
We consider the optimization problem associated with fitting two-layer ReLU networks with $d$~inputs, $k$~neurons, and labels generated by a target network. Two types of infinite families of spurious minima, one minimum per $d$, were recently found. The loss at minima of the first type converges to zero as $d$ increases; for the second type, the loss remains bounded away from zero. How, then, may one avoid minima of the latter type? Fortunately, such minima are never detected by standard optimization methods. Motivated by questions concerning the nature of this phenomenon, we develop methods to study distinctive analytic properties of these hidden minima. By existing analyses, the Hessian spectra of the two types agree modulo $O(d^{-1/2})$ terms -- not promising. Our investigation therefore proceeds by studying curves along which the loss is minimized or maximized, generally referred to as tangency arcs. We prove that seemingly far-removed considerations from group representation theory -- concerning the arrangement of subspaces invariant under the action of subgroups of $S_d$, the symmetry group on $d$ symbols, relative to subspaces fixed by the action -- yield a precise description of all finitely many admissible types of tangency arcs. Applying these general results to the loss function reveals that arcs emanating from hidden minima differ, characteristically, in their structure and symmetry, precisely on account of the $O(d^{-1/2})$ eigenvalue terms absent from previous work, indicating in particular the subtlety of the analysis. The theoretical results, stated and proved for o-minimal structures, show that the set of all tangency arcs is topologically tame enough to allow a numerical construction of tangency arcs, and so to compare how minima of both types are positioned relative to adjacent critical points.
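As a concrete point of reference, the sketch below evaluates the population loss commonly studied in this student-teacher setting -- squared loss, standard Gaussian inputs, unit second-layer weights -- using the closed form $\mathbb{E}[\mathrm{ReLU}(u^\top x)\,\mathrm{ReLU}(v^\top x)] = \frac{\|u\|\|v\|}{2\pi}\bigl(\sin\theta + (\pi-\theta)\cos\theta\bigr)$ (the degree-one arc-cosine kernel); whether this matches the paper's exact normalization is an assumption.

```python
import numpy as np

def relu_corr(u, v):
    """E_{x~N(0,I)}[ReLU(u.x) ReLU(v.x)] via the arc-cosine kernel (nonzero u, v)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    cos = np.clip(u @ v / (nu * nv), -1.0, 1.0)
    theta = np.arccos(cos)
    return nu * nv * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

def population_loss(W, V):
    """Half the squared population error between a student with weight rows W (k x d)
    and a teacher with weight rows V (k x d), both with unit output weights."""
    def gram(A, B):
        return np.array([[relu_corr(a, b) for b in B] for a in A])
    return 0.5 * (gram(W, W).sum() - 2 * gram(W, V).sum() + gram(V, V).sum())
```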
Fault-tolerant connectivity labelings are schemes that, given an $n$-vertex graph $G=(V,E)$ and $f\geq 1$, produce succinct yet informative labels for the elements of the graph. Given only the labels of two vertices $u,v$ and of the elements in a faulty-set $F$ with $|F|\leq f$, one can determine if $u,v$ are connected in $G-F$, the surviving graph after removing $F$. For the edge or vertex faults models, i.e., $F\subseteq E$ or $F\subseteq V$, a sequence of recent work established schemes with $poly(f,\log n)$-bit labels. This paper considers the color faults model, recently introduced in the context of spanners [Petruschka, Sapir and Tzalik, ITCS'24], which accounts for known correlations between failures. Here, the edges (or vertices) of the input $G$ are arbitrarily colored, and the faulty elements in $F$ are colors; a failing color causes all edges (vertices) of that color to crash. Our main contribution is settling the label length complexity for connectivity under one color fault ($f=1$). The existing implicit solution, by applying the state-of-the-art scheme for edge faults of [Dory and Parter, PODC'21], might yield labels of $\Omega(n)$ bits. We provide a deterministic scheme with labels of $\tilde{O}(\sqrt{n})$ bits in the worst case, and a matching lower bound. Moreover, our scheme is universally optimal: even schemes tailored to handle only colorings of one specific graph topology cannot produce asymptotically smaller labels. We extend our labeling approach to yield a routing scheme avoiding a single forbidden color. We also consider the centralized setting, and show an $\tilde{O}(n)$-space oracle, answering connectivity queries under one color fault in $\tilde{O}(1)$ time. Turning to $f\geq 2$ color faults, we give a randomized labeling scheme with $\tilde{O}(n^{1-1/2^f})$-bit labels, along with a lower bound of $\Omega(n^{1-1/(f+1)})$ bits.
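For intuition, the snippet below spells out the query semantics that a labeling scheme must answer from the labels alone; it is a naive centralized check, not the $\tilde{O}(\sqrt{n})$-bit scheme of the paper.

```python
def connected_avoiding_color(n, colored_edges, u, v, bad_color):
    """Is u connected to v in G - F, where the faulty set F is the single color
    bad_color?  colored_edges is a list of (a, b, color) triples on vertices 0..n-1.
    A labeling scheme must answer this from the labels of u, v, and bad_color alone."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b, c in colored_edges:
        if c != bad_color:                  # drop every edge of the failing color
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return find(u) == find(v)
```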
We develop a Markovian framework for load balancing that combines classical algorithms such as Power-of-$d$ with auto-scaling mechanisms that allow the net service capacity to scale up or down in response to the current load on the same timescale as job dynamics. Our framework is inspired by serverless platforms, such as Knative, where servers are software functions that can be flexibly instantiated in milliseconds according to scaling rules defined by the users of the serverless platform. The main question is how to design such scaling rules to minimize user-perceived delay performance while ensuring low energy consumption. For the first time, we investigate this problem when the auto-scaling and load balancing processes operate asynchronously (or proactively), as in Knative. In contrast to the synchronous (or reactive) paradigm, asynchronism brings the advantage that jobs do not necessarily need to wait whenever a scale-up decision is taken. In our main result, we identify a general condition on the structure of scaling rules that drives the mean-field dynamics to delay and relative energy optimality, i.e., a situation where both the user-perceived delay and the relative energy waste induced by idle servers vanish in the limit where the network demand grows to infinity in proportion to the nominal service capacity. The identified condition suggests scaling up the current net capacity if and only if the mean demand exceeds the rate at which servers become idle and active. Finally, we propose a family of scaling rules that satisfy our optimality condition. Numerical simulations demonstrate that these rules provide better delay performance than existing synchronous auto-scaling schemes while inducing almost the same power consumption.
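The following toy simulation illustrates how Power-of-$d$ dispatching can interact with a scaling mechanism; the specific scale-up and scale-down rules used here are illustrative placeholders and are not the rules proposed or analyzed in the paper.

```python
import random

def simulate(lam=9.0, mu=1.0, d=2, steps=200_000, seed=0):
    """Toy Power-of-d + auto-scaling simulation (Gillespie-style M/M dynamics).

    Placeholder scale-up rule: if the d sampled servers are all busy at an
    arrival, add one idle server and route the job there.
    Placeholder scale-down rule: a server that empties at a departure is
    removed with probability 1/2.  Returns the time-averaged number of jobs.
    """
    random.seed(seed)
    queues = [0]                 # number of jobs at each active server
    t, area = 0.0, 0.0
    for _ in range(steps):
        busy = sum(q > 0 for q in queues)
        total_rate = lam + mu * busy
        dt = random.expovariate(total_rate)
        area += dt * sum(queues)
        t += dt
        if random.random() < lam / total_rate:            # arrival event
            sample = random.sample(range(len(queues)), min(d, len(queues)))
            target = min(sample, key=lambda i: queues[i])  # Power-of-d choice
            if queues[target] > 0:                         # all sampled are busy
                queues.append(0)                           # scale up
                target = len(queues) - 1
            queues[target] += 1
        else:                                              # departure event
            i = random.choice([j for j, q in enumerate(queues) if q > 0])
            queues[i] -= 1
            if queues[i] == 0 and len(queues) > 1 and random.random() < 0.5:
                queues.pop(i)                              # scale down
    return area / t
```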
We construct explicit pseudorandom generators that fool $n$-variate polynomials of degree at most $d$ over a finite field $\mathbb{F}_q$. The seed length of our generators is $O(d \log n + \log q)$, over fields of size exponential in $d$ and characteristic at least $d(d-1)+1$. Previous constructions such as Bogdanov's (STOC 2005) and Derksen and Viola's (FOCS 2022) had either suboptimal seed length or required the field size to depend on $n$. Our approach follows Bogdanov's paradigm while incorporating techniques from Lecerf's factorization algorithm (J. Symb. Comput. 2007) and insights from the construction of Derksen and Viola regarding the role of indecomposability of polynomials.
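For reference, one common formulation of fooling in this line of work (the paper's exact definition may differ) is the following: a generator $G:\{0,1\}^s\to\mathbb{F}_q^n$ $\varepsilon$-fools polynomials of degree at most $d$ if
$$\Delta\bigl(p(G(U_s)),\; p(U_{\mathbb{F}_q^n})\bigr) \;\le\; \varepsilon \qquad \text{for every } p\in\mathbb{F}_q[x_1,\dots,x_n] \text{ with } \deg p\le d,$$
where $U_s$ and $U_{\mathbb{F}_q^n}$ denote a uniform seed and a uniform input, and $\Delta$ is statistical (total variation) distance.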
We propose an $\widetilde{O}(n + 1/\varepsilon)$-time FPTAS (Fully Polynomial-Time Approximation Scheme) for the classical Partition problem. This is the best possible (up to a logarithmic factor) assuming SETH (Strong Exponential Time Hypothesis) [Abboud, Bringmann, Hermelin, and Shabtay'22]. Prior to our work, the best known FPTAS for Partition runs in $\widetilde{O}(n + 1/\varepsilon^{5/4})$ time [Deng, Jin and Mao'23, Wu and Chen'22]. Our result is obtained by solving a more general problem of weakly approximating Subset Sum.
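To make the problem concrete, here is the textbook trimming-based approximation for Partition (in the spirit of the classical CLRS approx-subset-sum routine); it runs in roughly $\tilde{O}(n^2/\varepsilon)$ time and is shown only to illustrate the task, not the $\widetilde{O}(n + 1/\varepsilon)$ algorithm of this paper.

```python
def approx_partition(items, eps):
    """Classical trimming-based approximation for Partition: approximately find
    the largest subset sum not exceeding sum(items)/2, then return the two sides."""
    target = sum(items) / 2
    delta = eps / (2 * len(items))
    sums = [0]                    # achievable subset sums, kept sparse by trimming
    for x in items:
        merged = sorted(set(sums + [s + x for s in sums if s + x <= target]))
        trimmed, last = [], -1.0
        for s in merged:
            if s > last * (1 + delta):   # keep s only if not too close to last kept value
                trimmed.append(s)
                last = s
        sums = trimmed
    best = max(sums)              # approximately the largest subset sum <= sum/2
    return best, sum(items) - best

print(approx_partition([3, 1, 4, 1, 5, 9, 2, 6], eps=0.1))
```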
We study the fundamental problem of estimating the mean of a $d$-dimensional distribution with covariance $\Sigma \preccurlyeq \sigma^2 I_d$ given $n$ samples. When $d = 1$, \cite{catoni} showed an estimator with error $(1+o(1)) \cdot \sigma \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$, with probability $1 - \delta$, matching the Gaussian error rate. For $d>1$, a natural estimator outputs the center of the minimum enclosing ball of one-dimensional confidence intervals to achieve a $1-\delta$ confidence radius of $\sqrt{\frac{2 d}{d+1}} \cdot \sigma \left(\sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right)$, incurring a $\sqrt{\frac{2d}{d+1}}$-factor loss over the Gaussian rate. When the $\sqrt{\frac{d}{n}}$ term dominates by a $\sqrt{\log \frac{1}{\delta}}$ factor, \cite{lee2022optimal-highdim} showed an improved estimator matching the Gaussian rate. This raises a natural question: Is the $\sqrt{\frac{2 d}{d+1}}$ loss \emph{necessary} when the $\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$ term dominates? We show that the answer is \emph{no} -- we construct an estimator that improves over the above naive estimator by a constant factor. We also consider robust estimation, where an adversary is allowed to corrupt an $\epsilon$-fraction of samples arbitrarily: in this case, we show that the above strategy of combining one-dimensional estimates and incurring the $\sqrt{\frac{2d}{d+1}}$-factor \emph{is} optimal in the infinite-sample limit.
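For orientation, the factor $\sqrt{\tfrac{2d}{d+1}}$ interpolates between $1$ at $d=1$ and $\sqrt{2}$ as $d\to\infty$, so the question above concerns at most a $\sqrt{2}$-factor loss; a quick numerical check:

```python
import math

# The dimension-dependent factor approaches sqrt(2) ~ 1.414 as d grows.
for d in (1, 2, 5, 10, 100, 10**6):
    print(d, math.sqrt(2 * d / (d + 1)))
```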
Motivated by the difficulty of specifying complete ordinal preferences over a large set of $m$ candidates, we study voting rules that are computable by querying voters about $t < m$ candidates. Generalizing prior works that focused on specific instances of this problem, our paper fully characterizes the set of positional scoring rules that can be computed for any $1 \leq t < m$, which notably does not include plurality. We then extend this to show a similar impossibility result for single transferable vote (elimination voting). These negative results are information-theoretic and agnostic to the number of queries. Finally, for scoring rules that are computable with limited-sized queries, we give parameterized upper and lower bounds on the number of such queries a deterministic or randomized algorithm must make to determine the score-maximizing candidate. While there is no gap between our bounds for deterministic algorithms, identifying the exact query complexity for randomized algorithms is a challenging open problem, of which we solve one special case.
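For concreteness, a positional scoring rule with score vector $s$ awards $s_j$ points to the candidate a voter ranks in position $j$ (plurality is $s=(1,0,\dots,0)$, Borda is $s=(m-1,m-2,\dots,0)$); the question above is when the score-maximizing candidate can be recovered from queries involving only $t<m$ candidates at a time. A minimal sketch with full rankings:

```python
def scoring_rule_winner(rankings, s):
    """Positional scoring rule: rankings[v] lists candidates from best to worst;
    the candidate ranked j-th by a voter receives s[j] points."""
    scores = {}
    for ranking in rankings:
        for j, c in enumerate(ranking):
            scores[c] = scores.get(c, 0) + s[j]
    return max(scores, key=scores.get)

# Borda (s = (2, 1, 0)) on three candidates and three voters:
print(scoring_rule_winner([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]], [2, 1, 0]))
```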
We consider composition orderings for linear functions of one variable. Given $n$ linear functions $f_1,\dots,f_n$ and a constant $c$, the objective is to find a permutation $\sigma$ that minimizes/maximizes $f_{\sigma(n)}\circ\dots\circ f_{\sigma(1)}(c)$. The problem was first studied in the area of time-dependent scheduling, and is known to be solvable in $O(n\log n)$ time if all functions are nondecreasing. In this paper, we present a complete characterization of optimal composition orderings for this case, by regarding linear functions as two-dimensional vectors. We also show several interesting properties of optimal composition orderings, such as the equivalence between local and global optimality. Furthermore, by using the characterization above, we provide a fixed-parameter tractable (FPT) algorithm for the composition ordering problem for general linear functions, with respect to the number of decreasing linear functions. We next deal with matrix multiplication orderings as a generalization of composition of linear functions. Given $n$ matrices $M_1,\dots,M_n\in\mathbb{R}^{m\times m}$ and two vectors $w,y\in\mathbb{R}^m$, where $m$ denotes a positive integer, the objective is to find a permutation $\sigma$ that minimizes/maximizes $w^\top M_{\sigma(n)}\dots M_{\sigma(1)} y$. The problem can also be viewed as a generalization of flow shop scheduling via a limiting argument. By this extension, we show that the multiplication ordering problem for $2\times 2$ matrices is solvable in $O(n\log n)$ time if all the matrices are simultaneously triangularizable and have nonnegative determinants, and FPT with respect to the number of matrices with negative determinants, if all the matrices are simultaneously triangularizable. On the negative side, we finally prove that three natural generalizations are NP-hard: 1) when $m=2$, 2) when $m\geq 3$, and 3) the target version of the problem.
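To fix the problem statement, the brute-force sketch below searches all orderings of linear functions $f_i(x)=a_ix+b_i$ for small $n$; it is exponential-time and serves only to illustrate the objective, not the $O(n\log n)$ or FPT algorithms discussed above.

```python
from itertools import permutations

def best_composition(funcs, c, maximize=True):
    """Brute force over orderings of linear functions given as (a_i, b_i) pairs;
    returns the best value of f_{sigma(n)} o ... o f_{sigma(1)}(c)."""
    best = None
    for order in permutations(funcs):
        x = c
        for a, b in order:               # apply f_{sigma(1)} first
            x = a * x + b
        if best is None or (maximize and x > best) or (not maximize and x < best):
            best = x
    return best

# With f1(x) = 2x + 1 and f2(x) = x + 3 from c = 0: f1 then f2 gives 4,
# f2 then f1 gives 7, so the maximizing order applies f2 first.
print(best_composition([(2, 1), (1, 3)], c=0))   # -> 7
```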
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions, while minimizing the number of samples drawn from them in total. Unlike in the usual collaborative learning setup, it is not assumed that there exists a single classifier that is simultaneously accurate for all distributions. We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible. We give a learning algorithm based on Empirical Risk Minimization (ERM) on a natural augmentation of the hypothesis class, and the analysis relies on an upper bound on the VC dimension of this augmented class. In terms of computational efficiency, we show that ERM on the augmented hypothesis class is NP-hard, which gives evidence against the existence of computationally efficient learners in general. On the positive side, for two special cases, we give learners that are both sample-efficient and computationally efficient.
We study the task of efficiently sampling from a Gibbs distribution $d \pi^* = e^{-h} d\,\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Maruyama scheme, assuming $\nabla h$ is Lipschitz and $M$ has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Maruyama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within Wasserstein distance $\epsilon$ of $\pi^*$ after $\tilde{O}(\epsilon^{-2})$ steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where $h$ can be nonconvex and $M$ can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that $\pi^*$ satisfies a $CD(\cdot,\infty)$ condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by $\tilde{O}(\epsilon^{-2})$ as well.
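As a minimal reference point, here is the Euclidean Euler-Maruyama discretization of the Langevin diffusion targeting $\pi^*\propto e^{-h}$; in the geometric scheme above, the additive update $x + v$ is replaced by the exponential map $\exp_x(v)$ in a random Gaussian direction.

```python
import numpy as np

def euclidean_langevin(grad_h, x0, step, n_steps, seed=0):
    """Euler-Maruyama discretization of the Langevin diffusion targeting
    pi* proportional to exp(-h) on R^d -- the flat special case of the
    geometric scheme, where exp_x(v) reduces to x + v."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        v = -step * grad_h(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        x = x + v                     # on a manifold this would be exp_x(v)
    return x

# Example: approximately sample from N(0, I_2), i.e. h(x) = ||x||^2 / 2.
sample = euclidean_langevin(lambda x: x, x0=np.zeros(2), step=0.01, n_steps=5000)
```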