We study Bayesian histograms for distribution estimation on $[0,1]^d$ under the Wasserstein $W_v, 1 \leq v < \infty$ distance in the i.i.d sampling regime. We newly show that when $d < 2v$, histograms possess a special \textit{memory efficiency} property, whereby in reference to the sample size $n$, order $n^{d/2v}$ bins are needed to obtain minimax rate optimality. This result holds for the posterior mean histogram and with respect to posterior contraction: under the class of Borel probability measures and some classes of smooth densities. The attained memory footprint overcomes existing minimax optimal procedures by a polynomial factor in $n$; for example an $n^{1 - d/2v}$ factor reduction in the footprint when compared to the empirical measure, a minimax estimator in the Borel probability measure class. Additionally constructing both the posterior mean histogram and the posterior itself can be done super--linearly in $n$. Due to the popularity of the $W_1,W_2$ metrics and the coverage provided by the $d < 2v$ case, our results are of most practical interest in the $(d=1,v =1,2), (d=2,v=2), (d=3,v=2)$ settings and we provide simulations demonstrating the theory in several of these instances.
We consider the classical problem of learning, with arbitrary accuracy, the natural parameters of a $k$-parameter truncated \textit{minimal} exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a novel loss function and a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family. Further, we show that our estimator can be interpreted as a solution to minimizing a particular Bregman score as well as an instance of minimizing the \textit{surrogate} likelihood. We also provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of $\alpha$ in the parameter estimation with sample complexity $O({\sf poly}(k)/\alpha^2)$. Our method achives the order-optimal sample complexity of $O({\sf log}(k)/\alpha^2)$ when tailored for node-wise-sparse Markov random fields. Finally, we demonstrate the performance of our estimator via numerical experiments.
What is the time complexity of matrix multiplication of sparse integer matrices with $m_{in}$ nonzeros in the input and $m_{out}$ nonzeros in the output? This paper provides improved upper bounds for this question for almost any choice of $m_{in}$ vs. $m_{out}$, and provides evidence that these new bounds might be optimal up to further progress on fast matrix multiplication. Our main contribution is a new algorithm that reduces sparse matrix multiplication to dense (but smaller) rectangular matrix multiplication. Our running time thus depends on the optimal exponent $\omega(a,b,c)$ of multiplying dense $n^a\times n^b$ by $n^b\times n^c$ matrices. We discover that when $m_{out}=\Theta(m_{in}^r)$ the time complexity of sparse matrix multiplication is $O(m_{in}^{\sigma+\epsilon})$, for all $\epsilon > 0$, where $\sigma$ is the solution to the equation $\omega(\sigma-1,2-\sigma,1+r-\sigma)=\sigma$. No matter what $\omega(\cdot,\cdot,\cdot)$ turns out to be, and for all $r\in(0,2)$, the new bound beats the state of the art, and we provide evidence that it is optimal based on the complexity of the all-edge triangle problem. In particular, in terms of the input plus output size $m = m_{in} + m_{out}$ our algorithm runs in time $O(m^{1.3459})$. Even for Boolean matrices, this improves over the previous $m^{\frac{2\omega}{\omega+1}+\epsilon}=O(m^{1.4071})$ bound [Amossen, Pagh; 2009], which was a natural barrier since it coincides with the longstanding bound of all-edge triangle in sparse graphs [Alon, Yuster, Zwick; 1994]. We find it interesting that matrix multiplication can be solved faster than triangle detection in this natural setting. In fact, we establish an equivalence to a special case of the all-edge triangle problem.
We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful geometric lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
This paper analyses the nonconforming Morley type virtual element method to approximate a regular solution to the von K\'{a}rm\'{a}n equations that describes bending of very thin elastic plates. Local existence and uniqueness of a discrete solution to the non-linear problem is discussed. A priori error estimate in the energy norm is established under minimal regularity assumptions on the exact solution. Error estimates in piecewise $H^1$ and $L^2$ norm are also derived. A working procedure to find an approximation for the discrete solution using Newtons method is discussed. Numerical results that justify theoretical estimates are presented.
Given a complex high-dimensional distribution over $\{\pm 1\}^n$, what is the best way to increase the expected number of $+1$'s by controlling the values of only a small number of variables? Such a problem is known as influence maximization and has been widely studied in social networks, biology, and computer science. In this paper, we consider influence maximization on the Ising model which is a prototypical example of undirected graphical models and has wide applications in many real-world problems. We establish a sharp computational phase transition for influence maximization on sparse Ising models under a bounded budget: In the high-temperature regime, we give a linear-time algorithm for finding a small subset of variables and their values which achieve nearly optimal influence; In the low-temperature regime, we show that the influence maximization problem becomes $\mathsf{NP}$-hard under commonly-believed complexity assumption. The critical temperature coincides with the tree uniqueness/non-uniqueness threshold for Ising models which is also a critical point for other computational problems including approximate sampling and counting.
Given a set system $\mathcal{X} = \{\mathcal{U},\mathcal{S}\}$, where $\mathcal{U}$ is a set of elements and $\mathcal{S}$ is a set of subsets of $\mathcal{U}$, an exact hitting set $\mathcal{U}'$ is a subset of $\mathcal{U}$ such that each subset in $\mathcal{S}$ contains exactly one element in $\mathcal{U}'$. We refer to a set system as exactly hittable if it has an exact hitting set. In this paper, we study interval graphs which have intersection models that are exactly hittable. We refer to these interval graphs as exactly hittable interval graphs (EHIG). We present a forbidden structure characterization for EHIG. We also show that the class of proper interval graphs is a strict subclass of EHIG. Finally, we give an algorithm that runs in polynomial time to recognize graphs belonging to the class of EHIG.
The combined universal probability $\mathbf{m}(D)$ of strings $x$ in sets $D$ is close to max $\mathbf{m}(x)$ over $x$ in $D$: their logs differ by at most $D$'s information $\mathbf{I}(D:\mathcal{H})$ about the halting sequence $\mathcal{H}$.
A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns if $x$ is in the set. Some variants also store values associated to the keys such that given a query $x$, the value associated to $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space with constant time per operation in expectation. There has been a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of \[ \log\binom{U}{n}+O(n\log^{(k)} n) \] bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: For $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operational time $\Omega(k)$, in the cell-probe model with word-size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. It is worth noting that this is the first cell-probe lower bound on the trade-off between space and update time for general data structures.
In interactive coding, Alice and Bob wish to compute some function $f$ of their individual private inputs $x$ and $y$. They do this by engaging in an interactive protocol to jointly compute $f(x,y)$. The goal is to do this in an error-resilient way, such that even given some fraction of adversarial corruptions to the protocol, both parties still learn $f(x,y)$. Typically, the error resilient protocols constructed by interactive coding schemes are \emph{non-adaptive}, that is, the length of the protocol as well as the speaker in each round is fixed beforehand. The maximal error resilience obtainable by non-adaptive schemes is now well understood. In order to circumvent known barriers and achieve higher error resilience, the work of Agrawal, Gelles, and Sahai (ISIT 2016) introduced to interactive coding the notion of \emph{adaptive} schemes, where the length of the protocol or the speaker order are no longer necessarily fixed. In this paper, we study the power of \emph{adaptive termination} in the context of the error resilience of interactive coding schemes. In other words, what is the power of schemes where Alice and Bob are allowed to disengage from the protocol early? We study this question in two contexts, both for the task of \emph{message exchange}, where the goal is to learn the other party's input.
Event-based sensors, distinguished by their high temporal resolution of 1$\mathrm{\mu s}$ and a dynamic range of 120$\mathrm{dB}$, stand out as ideal tools for deployment in fast-paced settings like vehicles and drones. Traditional object detection techniques that utilize Artificial Neural Networks (ANNs) face challenges due to the sparse and asynchronous nature of the events these sensors capture. In contrast, Spiking Neural Networks (SNNs) offer a promising alternative, providing a temporal representation that is inherently aligned with event-based data. This paper explores the unique membrane potential dynamics of SNNs and their ability to modulate sparse events. We introduce an innovative spike-triggered adaptive threshold mechanism designed for stable training. Building on these insights, we present a specialized spiking feature pyramid network (SpikeFPN) optimized for automotive event-based object detection. Comprehensive evaluations demonstrate that SpikeFPN surpasses both traditional SNNs and advanced ANNs enhanced with attention mechanisms. Evidently, SpikeFPN achieves a mean Average Precision (mAP) of 0.477 on the {GEN1 Automotive Detection (GAD)} benchmark dataset, marking a significant increase of 9.7\% over the previous best SNN. Moreover, the efficient design of SpikeFPN ensures robust performance while optimizing computational resources, attributed to its innate sparse computation capabilities.