We prove that a polynomial fraction of the set of $k$-component forests in the $m \times n$ grid graph have equal numbers of vertices in each component. This resolves a conjecture of Charikar, Liu, Liu, and Vuong. It also establishes the first provably polynomial-time algorithm for (exactly or approximately) sampling balanced grid graph partitions according to the spanning tree distribution, which weights each $k$-partition according to the product, across its $k$ pieces, of the number of spanning trees of each piece. Our result has applications to understanding political districtings, where there is an underlying graph of indivisible geographic units that must be partitioned into $k$ population-balanced connected subgraphs. In this setting, tree-weighted partitions have interesting geometric properties, and this has stimulated significant effort to develop methods to sample them.
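The spanning tree weight described above can be illustrated with a minimal sketch. It assumes the Matrix-Tree theorem (the number of spanning trees equals any cofactor of the graph Laplacian) and uses exact rational arithmetic; the function names `grid_edges`, `spanning_tree_count`, and `partition_weight` are hypothetical helpers for illustration, not part of the paper.

```python
from fractions import Fraction
from itertools import product


def grid_edges(m, n):
    """Edges of the m x n grid graph on vertices (row, col)."""
    edges = []
    for r, c in product(range(m), range(n)):
        if r + 1 < m:
            edges.append(((r, c), (r + 1, c)))
        if c + 1 < n:
            edges.append(((r, c), (r, c + 1)))
    return edges


def spanning_tree_count(vertices, edges):
    """Spanning trees of the subgraph induced on `vertices` (0 if disconnected)."""
    verts = sorted(vertices)
    if len(verts) == 1:
        return 1
    index = {v: i for i, v in enumerate(verts)}
    size = len(verts)
    # Build the Laplacian of the induced subgraph.
    lap = [[Fraction(0)] * size for _ in range(size)]
    for u, v in edges:
        if u in index and v in index:
            i, j = index[u], index[v]
            lap[i][i] += 1
            lap[j][j] += 1
            lap[i][j] -= 1
            lap[j][i] -= 1
    # Delete the last row and column, then take the determinant
    # by Gaussian elimination over the rationals.
    minor = [row[:-1] for row in lap[:-1]]
    k = len(minor)
    det = Fraction(1)
    for col in range(k):
        pivot = next((r for r in range(col, k) if minor[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            minor[col], minor[pivot] = minor[pivot], minor[col]
            det = -det
        det *= minor[col][col]
        for r in range(col + 1, k):
            factor = minor[r][col] / minor[col][col]
            for c in range(col, k):
                minor[r][c] -= factor * minor[col][c]
    return int(det)


def partition_weight(parts, edges):
    """Product, over the pieces, of the number of spanning trees of each piece."""
    weight = 1
    for part in parts:
        weight *= spanning_tree_count(part, edges)
    return weight


edges = grid_edges(2, 4)
left = {(0, 0), (0, 1), (1, 0), (1, 1)}
right = {(0, 2), (0, 3), (1, 2), (1, 3)}
weight = partition_weight([left, right], edges)  # two 2 x 2 blocks
```

Each 2 x 2 block is a 4-cycle with 4 spanning trees, so this balanced 2-partition of the 2 x 4 grid has weight 16. Equivalently, 16 of the 2-component forests of the grid induce exactly this partition.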
A sharp, distribution-free, non-asymptotic result is proved for the concentration of a random function around the mean function, when the randomization is generated by a finite sequence of independent data and the random functions satisfy uniform bounded variation assumptions. The specific motivation for the work comes from the need for inference on the distributional impacts of social policy intervention. However, the family of randomized functions that we study is broad enough to cover wide-ranging applications. For example, we provide a Kolmogorov-Smirnov-like test for randomized functions that are almost surely Lipschitz continuous, and novel tools for inference with heterogeneous treatment effects. A Dvoretzky-Kiefer-Wolfowitz-like inequality is also provided for the sum of almost surely monotone random functions, extending the famous non-asymptotic work of Massart for empirical cumulative distribution functions generated by i.i.d. data to settings without micro-clusters proposed by Canay, Santos, and Shaikh. We illustrate the relevance of our theoretical results for applied work via empirical applications. Notably, the proof of our main concentration result relies on a novel stochastic rendition of the fundamental result of Debreu, generally dubbed the "gap lemma," that transforms discontinuous utility representations of preorders into continuous utility representations, and on an envelope theorem for an infinite-dimensional optimisation problem that we carefully construct.
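For reference, the classical inequality that the abstract above extends can be written out. This is the standard i.i.d. form with Massart's constant, not the new result for sums of monotone random functions; the symbols $F$, $F_n$, $n$, and $\varepsilon$ are notation assumed here. For i.i.d. samples $X_1,\dots,X_n$ with cumulative distribution function $F$ and empirical distribution function $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$, every $\varepsilon > 0$ satisfies
\[
\Pr\Bigl( \sup_{x \in \mathbb{R}} \bigl| F_n(x) - F(x) \bigr| > \varepsilon \Bigr) \;\le\; 2 e^{-2 n \varepsilon^{2}} .
\]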
In 2023, Kuznetsov and Speranski introduced infinitary action logic with multiplexing $!^m\nabla \mathrm{ACT}_\omega$ and proved that the derivability problem for it lies between the $\omega$ and $\omega^\omega$ levels of the hyperarithmetical hierarchy. We prove that this problem is $\Delta^0_{\omega^\omega}$-complete under Turing reductions. Namely, we show that it is recursively isomorphic to the satisfaction predicate for computable infinitary formulas of rank less than $\omega^\omega$ in the language of arithmetic. As a consequence we prove that the closure ordinal for $!^m\nabla \mathrm{ACT}_\omega$ equals $\omega^\omega$. We also prove that the fragment of $!^m\nabla \mathrm{ACT}_\omega$ where Kleene star is not allowed to be in the scope of the subexponential is $\Delta^0_{\omega^\omega}$-complete. Finally, we present a family of logics, which are fragments of $!^m\nabla \mathrm{ACT}_\omega$, such that the complexity of the $k$-th logic lies between $\Delta^0_{\omega^k}$ and $\Delta^0_{\omega^{k+1}}$.
We introduce a \emph{gain function} viewpoint of information leakage by proposing \emph{maximal $g$-leakage}, a rich class of operationally meaningful leakage measures that subsumes recently introduced leakage measures -- {maximal leakage} and {maximal $\alpha$-leakage}. In maximal $g$-leakage, the gain of an adversary in guessing an unknown random variable is measured using a {gain function} applied to the probability of correctly guessing. In particular, maximal $g$-leakage captures the multiplicative increase, upon observing $Y$, in the expected gain of an adversary in guessing a randomized function of $X$, maximized over all such randomized functions. We also consider the scenario where an adversary can make multiple attempts to guess the randomized function of interest. We show that maximal leakage is an upper bound on maximal $g$-leakage under multiple guesses, for any non-negative gain function $g$. We obtain a closed-form expression for maximal $g$-leakage under multiple guesses for a class of concave gain functions. We also study the maximal $g$-leakage measure for a specific class of gain functions related to the $\alpha$-loss. In particular, we first completely characterize the minimal expected $\alpha$-loss under multiple guesses and analyze how the corresponding leakage measure is affected by the number of guesses. Finally, we study two variants of maximal $g$-leakage depending on the type of adversary and obtain closed-form expressions for them, which do not depend on the particular gain function considered as long as it satisfies some mild regularity conditions. We do this by developing a variational characterization for the R\'{e}nyi divergence of order infinity which naturally generalizes the definition of pointwise maximal leakage to incorporate arbitrary gain functions.
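For context, maximal leakage, one of the measures subsumed above, has a well-known closed form on finite alphabets. The notation $\mathcal{L}(X \to Y)$, $P_X$, and $P_{Y|X}$ is assumed here and may differ from the paper's own:
\[
\mathcal{L}(X \to Y) \;=\; \log \sum_{y} \; \max_{x \,:\, P_X(x) > 0} P_{Y|X}(y \mid x) .
\]
This is the quantity that the abstract states upper-bounds maximal $g$-leakage under multiple guesses for any non-negative gain function $g$.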
Reshaping, a point operation that alters the characteristics of signals, has been shown capable of improving the compression ratio in video coding practices. Out-of-loop reshaping that directly modifies the input video signal was first adopted as supplemental enhancement information~(SEI) for HEVC/H.265, without the need to alter the core design of the video codec. VVC/H.266 further improves the coding efficiency by adopting in-loop reshaping that modifies the residual signal being processed in the hybrid coding loop. In this paper, we theoretically analyze the rate-distortion performance of in-loop reshaping and use experiments to verify the theoretical result. We prove that in-loop reshaping can improve coding efficiency when the entropy coder adopted in the coding pipeline is suboptimal, which is in line with the practical scenarios that video codecs operate in. We derive the PSNR gain in a closed form and show that the theoretically predicted gain is consistent with that measured in experiments using standard testing video sequences.
The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory and practice \citep{Belkin2018ReconcilingMM}. Additionally, while double descent has been observed in various tasks and architectures, the peak of double descent can sometimes be noticeably absent or diminished, even without explicit regularization, such as weight decay and early stopping. In this paper, we investigate this intriguing phenomenon from the optimization perspective and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent (initialization, normalization, batch size, learning rate, optimization algorithm) are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer can find a sufficiently low-loss minimum. These factors directly affect the condition number of the optimization problem or the optimizer and thus affect the final minimum found by the optimizer, reducing or increasing the height of the double descent peak. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings, demonstrating this optimization-based unified view. Our results suggest the following implication: Double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between weak double descent peaks in practice and strong peaks observable in carefully designed setups.
We investigate algorithms for testing whether an image is connected. Given a proximity parameter $\epsilon\in(0,1)$ and query access to a black-and-white image represented by an $n\times n$ matrix of Boolean pixel values, a (1-sided error) connectedness tester accepts if the image is connected and rejects with probability at least 2/3 if the image is $\epsilon$-far from connected. We show that connectedness can be tested nonadaptively with $O(\frac 1{\epsilon^2})$ queries and adaptively with $O(\frac{1}{\epsilon^{3/2}} \sqrt{\log\frac{1}{\epsilon}})$ queries. The best connectedness tester to date, by Berman, Raskhodnikova, and Yaroslavtsev (STOC 2014), had query complexity $O(\frac 1{\epsilon^2}\log \frac 1{\epsilon})$ and was adaptive. We also prove that every nonadaptive, 1-sided error tester for connectedness must make $\Omega(\frac 1\epsilon\log \frac 1\epsilon)$ queries.
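To make the property being tested concrete, here is a minimal sketch of an exact connectedness check. It is a baseline that reads every pixel, not one of the sublinear testers described above. It assumes that black pixels are the `True` entries, that two black pixels are adjacent when they share a side, and that an image with no black pixels counts as connected; the name `is_connected` is a hypothetical helper.

```python
from collections import deque


def is_connected(image):
    """Return True if all black (truthy) pixels form one 4-connected component."""
    black = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not black:
        return True  # assumption: the empty image is treated as connected
    rows, cols = len(image), len(image[0])
    seen = {black[0]}
    queue = deque([black[0]])
    # Breadth-first search from one black pixel over side-adjacent black pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and image[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(black)


l_shape = [[1, 1], [0, 1]]    # three black pixels joined along sides
diagonal = [[1, 0], [0, 1]]   # two black pixels touching only at a corner
```

Under these assumptions `is_connected(l_shape)` holds while `is_connected(diagonal)` does not. The check costs $n^2$ queries, which is the cost the testers above avoid by tolerating images that are merely close to connected.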
A fundamental question is whether one can maintain a maximum independent set in polylogarithmic update time for a dynamic collection of geometric objects in Euclidean space. Already, for a set of intervals, it is known that no dynamic algorithm can maintain an exact maximum independent set in sublinear update time. Therefore, the typical objective is to explore the trade-off between update time and solution size. Substantial efforts have been made in recent years to understand this question for various families of geometric objects, such as intervals, hypercubes, hyperrectangles, and fat objects. We present the first fully dynamic approximation algorithm for disks of arbitrary radii in the plane that maintains a constant-factor approximate maximum independent set in polylogarithmic expected amortized update time. Moreover, for a fully dynamic set of $n$ disks of unit radius in the plane, we show that a $12$-approximate maximum independent set can be maintained with worst-case update time $O(\log n)$, and optimal output-sensitive reporting. This result generalizes to fat objects of comparable sizes in any fixed dimension $d$, where the approximation ratio depends on the dimension and the fatness parameter. Further, we note that, even for a dynamic set of disks of unit radius in the plane, it is impossible to maintain an $O(1+\varepsilon)$-approximate maximum independent set in truly sublinear update time, under standard complexity assumptions.
We study range spaces, where the ground set consists of either polygonal curves in $\mathbb{R}^d$ or polygonal regions in the plane that may contain holes and the ranges are balls defined by an elastic distance measure, such as the Hausdorff distance, the Fr\'echet distance and the dynamic time warping distance. The range spaces appear in various applications like classification, range counting, density estimation and clustering when the instances are trajectories, time series or polygons. The Vapnik-Chervonenkis dimension (VC-dimension) plays an important role when designing algorithms for these range spaces. We show for the Fr\'echet distance of polygonal curves and the Hausdorff distance of polygonal curves and planar polygonal regions that the VC-dimension is upper-bounded by $O(dk\log(km))$ where $k$ is the complexity of the center of a ball, $m$ is the complexity of the polygonal curve or region in the ground set, and $d$ is the ambient dimension. For $d \geq 4$ this bound is tight in each of the parameters $d, k$ and $m$ separately. For the dynamic time warping distance of polygonal curves, our analysis directly yields an upper bound of $O(\min(dk^2\log(m),dkm\log(k)))$.
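As an illustration of one of the elastic distance measures above, the following is a minimal sketch of the dynamic time warping distance between two polygonal curves given as vertex sequences. It assumes the common formulation that sums unsquared Euclidean distances between matched vertices along a monotone warping path; the paper may use a different variant, and `dtw_distance` is a hypothetical helper name.

```python
import math


def dtw_distance(p, q):
    """DTW distance between vertex sequences p and q (points in R^d)."""
    n, m = len(p), len(q)
    # cost[i][j] = cheapest warping of the first i vertices of p
    # onto the first j vertices of q.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance on p only
                                 cost[i][j - 1],      # advance on q only
                                 cost[i - 1][j - 1])  # advance on both
    return cost[n][m]


curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # complexity m = 3
center = [(0.0, 1.0), (2.0, 1.0)]              # complexity k = 2
distance = dtw_distance(curve, center)
```

Here the optimal warping matches the three vertices of `curve` at costs $1$, $\sqrt{2}$, and $1$, so `distance` is $2 + \sqrt{2}$. A range in the sense above is then the set of ground-set curves whose distance to a fixed center is at most a given radius.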
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training, which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training remain only preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work, are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
Game theory has by now found numerous applications in various fields, including economics, industry, jurisprudence, and artificial intelligence, where each player only cares about its own interest in a noncooperative or cooperative manner, but without obvious malice to other players. However, in many practical applications, such as poker, chess, pursuit-evasion, drug interdiction, coast guard, cyber-security, and national defense, players often have apparently adversarial stances, that is, selfish actions of each player inevitably or intentionally inflict loss or wreak havoc on other players. Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, and zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal strategy seeking techniques, prevailing algorithms, and practical applications. Finally, promising future research directions are also discussed for relevant adversarial games.