We study the asymptotic discrepancy of $m \times m$ matrices $A_1,\ldots,A_n$ belonging to the Gaussian orthogonal ensemble, which is a class of random symmetric matrices with independent normally distributed entries. In the setting $m^2 = o(n)$, our results show that there exists a signing $x \in \{\pm1\}^n$ such that the spectral norm of $\sum_{i=1}^n x_iA_i$ is $\Theta(\sqrt{nm}4^{-(1 + o(1))n/m^2})$ with high probability. This is best possible and settles a recent conjecture by Kunisky and Zhang.
We consider the problem of releasing a sparse histogram under $(\varepsilon, \delta)$-differential privacy. The stability histogram independently adds noise from a Laplace or Gaussian distribution to the non-zero entries and removes those noisy counts below a threshold. Thereby, the introduction of new non-zero values between neighboring histograms is only revealed with probability at most $\delta$, and typically, the value of the threshold dominates the error of the mechanism. We consider the variant of the stability histogram with Gaussian noise. Recent works ([Joseph and Yu, COLT '24] and [Lebeda, SOSA '25]) reduced the error for private histograms using correlated Gaussian noise. However, these techniques can not be directly applied in the very sparse setting. Instead, we adopt Lebeda's technique and show that adding correlated noise to the non-zero counts only allows us to reduce the magnitude of noise when we have a sparsity bound. This, in turn, allows us to use a lower threshold by up to a factor of $1/2$ compared to the non-correlated noise mechanism. We then extend our mechanism to a setting without a known bound on sparsity. Additionally, we show that correlated noise can give a similar improvement for the more practical discrete Gaussian mechanism.
Let $P$ be a set of $n$ points in $\mathbb{R}^d$, and let $\varepsilon,\psi \in (0,1)$ be parameters. Here, we consider the task of constructing a $(1+\varepsilon)$-spanner for $P$, where every edge might fail (independently) with probability $1-\psi$. For example, for $\psi=0.1$, about $90\%$ of the edges of the graph fail. Nevertheless, we show how to construct a spanner that survives such a catastrophe with near linear number of edges. The measure of reliability of the graph constructed is how many pairs of vertices lose $(1+\varepsilon)$-connectivity. Surprisingly, despite the spanner constructed being of near linear size, the number of failed pairs is close to the number of failed pairs if the underlying graph was a clique. Specifically, we show how to construct such an exact dependable spanner in one dimension of size $O(\tfrac{n}{\psi} \log n)$, which is optimal. Next, we build an $(1+\varepsilon)$-spanners for a set $P \subseteq \mathbb{R}^d$ of $n$ points, of size $O( C n \log n )$, where $C \approx 1/\bigl(\varepsilon^{d} \psi^{4/3}\bigr)$. Surprisingly, these new spanners also have the property that almost all pairs of vertices have a $\leq 4$-hop paths between them realizing this short path.
We study the problem of privately releasing an approximate minimum spanning tree (MST). Given a graph $G = (V, E, \vec{W})$ where $V$ is a set of $n$ vertices, $E$ is a set of $m$ undirected edges, and $ \vec{W} \in \mathbb{R}^{|E|} $ is an edge-weight vector, our goal is to publish an approximate MST under edge-weight differential privacy, as introduced by Sealfon in PODS 2016, where $V$ and $E$ are considered public and the weight vector is private. Our neighboring relation is $\ell_\infty$-distance on weights: for a sensitivity parameter $\Delta_\infty$, graphs $ G = (V, E, \vec{W}) $ and $ G' = (V, E, \vec{W}') $ are neighboring if $\|\vec{W}-\vec{W}'\|_\infty \leq \Delta_\infty$. Existing private MST algorithms face a trade-off, sacrificing either computational efficiency or accuracy. We show that it is possible to get the best of both worlds: With a suitable random perturbation of the input that does not suffice to make the weight vector private, the result of any non-private MST algorithm will be private and achieves a state-of-the-art error guarantee. Furthermore, by establishing a connection to Private Top-k Selection [Steinke and Ullman, FOCS '17], we give the first privacy-utility trade-off lower bound for MST under approximate differential privacy, demonstrating that the error magnitude, $\tilde{O}(n^{3/2})$, is optimal up to logarithmic factors. That is, our approach matches the time complexity of any non-private MST algorithm and at the same time achieves optimal error. We complement our theoretical treatment with experiments that confirm the practicality of our approach.
We consider maximizing an unknown monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ with cardinality constraint under stochastic bandit feedback. At each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum $f(S_*)$ with $|S_*| = k$, obtained through robust greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$. And by trivially treating every set as a unique arm one deduces that $\sqrt{ {n \choose k} T }$ is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like $\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. For a slightly restricted algorithm class, we prove a stronger regret lower bound of $\tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$. Moreover, we propose an algorithm Sub-UCB that achieves regret $\tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T}))$ capable of matching the lower bound on regret for the restricted class up to logarithmic factors.
High-dimensional problems have long been considered the Achilles' heel of Bayesian optimization algorithms. Spurred by the curse of dimensionality, a large collection of algorithms aim to make it more performant in this setting, commonly by imposing various simplifying assumptions on the objective. In this paper, we identify the degeneracies that make vanilla Bayesian optimization poorly suited to high-dimensional tasks, and further show how existing algorithms address these degeneracies through the lens of lowering the model complexity. Moreover, we propose an enhancement to the prior assumptions that are typical to vanilla Bayesian optimization algorithms, which reduces the complexity to manageable levels without imposing structural restrictions on the objective. Our modification - a simple scaling of the Gaussian process lengthscale prior with the dimensionality - reveals that standard Bayesian optimization works drastically better than previously thought in high dimensions, clearly outperforming existing state-of-the-art algorithms on multiple commonly considered real-world high-dimensional tasks.
We consider temporal numeric planning problems $\Pi$ expressed in PDDL2.1 level 3, and show how to produce SMT formulas $(i)$ whose models correspond to valid plans of $\Pi$, and $(ii)$ that extend the recently proposed planning with patterns approach from the numeric to the temporal case. We prove the correctness and completeness of the approach and show that it performs very well on 10 domains with required concurrency.
Martin-L\"{o}f type theory $\mathbf{MLTT}$ was extended by Setzer with the so-called Mahlo universe types. The extension of $\mathbf{MLTT}$ with one Mahlo universe is called $\mathbf{MLM}$ and was introduced to develop a variant of $\mathbf{MLTT}$ equipped with an analogue of a large cardinal. Another instance of constructive systems extended with an analogue of a large set was formulated in the context of Aczel's constructive set theory: $\mathbf{CZF}$. Rathjen, Griffor and Palmgren extended $\mathbf{CZF}$ with inaccessible sets of all transfinite orders. While Rathjen proved that this extended system of $\mathbf{CZF}$ is interpretable in an extension of $\mathbf{MLM}$ with one usual universe type above the Mahlo universe, it is unknown whether it can be interpreted by the Mahlo universe without a universe type above it. We extend $\mathbf{MLM}$ not by a universe type but by the accessibility predicate, and show that $\mathbf{CZF}$ with inaccessible sets can be interpreted in $\mathbf{MLM}$ with the accessibility predicate. Our interpretation of this extension of $\mathbf{CZF}$ is the same as that of Rathjen, Griffor and Palmgren formulated by $\mathbf{MLTT}$ with second-order universe operators, except that we construct the inaccessible sets by using the Mahlo universe and the accessibility predicate. We formalised the main part of our interpretation in the proof assistant Agda.
Not many tests exist for testing the equality for two or more multivariate distributions with compositional data, perhaps due to their constrained sample space. At the moment, there is only one test suggested that relies upon random projections. We propose a novel test termed {\alpha}-Energy Based Test ({\alpha}-EBT) to compare the multivariate distributions of two (or more) compositional data sets. Similar to the aforementioned test, the new test makes no parametric assumptions about the data and, based on simulation studies it exhibits higher power levels.
The Knaster-Tarski theorem, also known as Tarski's theorem, guarantees that every monotone function defined on a complete lattice has a fixed point. We analyze the query complexity of finding such a fixed point on the $k$-dimensional grid of side length $n$ under the $\leq$ relation. Specifically, there is an unknown monotone function $f: \{0,1,\ldots, n-1\}^k \to \{0,1,\ldots, n-1\}^k$ and an algorithm must query a vertex $v$ to learn $f(v)$. A key special case of interest is the Boolean hypercube $\{0,1\}^k$, which is isomorphic to the power set lattice -- the original setting of the Knaster-Tarski theorem. Our lower bound characterizes the randomized and deterministic query complexity of the Tarski search problem on the Boolean hypercube as $\Theta(k)$. More generally, we prove a randomized lower bound of $\Omega\left( k + \frac{k \cdot \log{n}}{\log{k}} \right)$ for the $k$-dimensional grid of side length $n$, which is asymptotically tight in high dimensions when $k$ is large relative to $n$.
This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type A and B) deeply fuses multimodal inputs within the internal layers of the model, whereas the following two types (Type C and D) facilitate early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. On the other hand, Type-C utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, simplification of adding modalities, training objectives, and any-to-any multimodal generation capability.