A universal partial cycle (or upcycle) for $\mathcal{A}^n$ is a cyclic sequence that covers each word of length $n$ over the alphabet $\mathcal{A}$ exactly once -- like a De Bruijn cycle, except that we also allow a wildcard symbol $\mathord{\diamond}$ that can represent any letter of $\mathcal{A}$. Chen et al. in 2017 and Goeckner et al. in 2018 showed that the existence and structure of upcycles are highly constrained, unlike those of De Bruijn cycles, which exist for any alphabet size and word length. Moreover, it was not known whether any upcycles existed for $n \ge 5$. We present several examples of upcycles over both binary and non-binary alphabets for $n = 8$. We generalize two graph-theoretic representations of De Bruijn cycles to upcycles. We then introduce novel approaches to constructing new upcycles from old ones. Notably, given any upcycle for an alphabet of size $a$, we show how to construct an upcycle for an alphabet of size $ak$ for any $k \in \mathbb{N}$, so each example generates an infinite family of upcycles. We also define folds and lifts of upcycles, which relate upcycles with differing densities of $\mathord{\diamond}$ characters. In particular, we show that every upcycle lifts to a De Bruijn cycle. Our constructions rely on a different generalization of De Bruijn cycles known as perfect necklaces, and we introduce several new examples of perfect necklaces. We extend the definitions of certain pseudorandomness properties to partial words and determine which are satisfied by all upcycles, then draw a conclusion about linear feedback shift registers. Finally, we prove new nonexistence results based on the word length $n$, alphabet size, and $\mathord{\diamond}$ density.
We introduce a \emph{gain function} viewpoint of information leakage by proposing \emph{maximal $g$-leakage}, a rich class of operationally meaningful leakage measures that subsumes recently introduced leakage measures -- {maximal leakage} and {maximal $\alpha$-leakage}. In maximal $g$-leakage, the gain of an adversary in guessing an unknown random variable is measured using a {gain function} applied to the probability of correctly guessing. In particular, maximal $g$-leakage captures the multiplicative increase, upon observing $Y$, in the expected gain of an adversary in guessing a randomized function of $X$, maximized over all such randomized functions. We also consider the scenario where an adversary can make multiple attempts to guess the randomized function of interest. We show that maximal leakage is an upper bound on maximal $g$-leakage under multiple guesses, for any non-negative gain function $g$. We obtain a closed-form expression for maximal $g$-leakage under multiple guesses for a class of concave gain functions. We also study maximal $g$-leakage measure for a specific class of gain functions related to the $\alpha$-loss. In particular, we first completely characterize the minimal expected $\alpha$-loss under multiple guesses and analyze how the corresponding leakage measure is affected with the number of guesses. Finally, we study two variants of maximal $g$-leakage depending on the type of adversary and obtain closed-form expressions for them, which do not depend on the particular gain function considered as long as it satisfies some mild regularity conditions. We do this by developing a variational characterization for the R\'{e}nyi divergence of order infinity which naturally generalizes the definition of pointwise maximal leakage to incorporate arbitrary gain functions.
In 2023, Kuznetsov and Speranski introduced infinitary action logic with multiplexing $!^m\nabla \mathrm{ACT}_\omega$ and proved that the derivability problem for it lies between the $\omega$ and $\omega^\omega$ levels of the hyperarithmetical hierarchy. We prove that this problem is $\Delta^0_{\omega^\omega}$-complete under Turing reductions. Namely, we prove that it is recursively isomorphic to the satisfaction predicate for computable infinitary formulas of rank less than $\omega^\omega$ in the language of arithmetic. We also prove this result for the fragment of $!^m\nabla \mathrm{ACT}_\omega$ where Kleene star is not allowed to be in the scope of the subexponential. Finally, we present a family of logics, which are fragments of $!^m\nabla \mathrm{ACT}_\omega$, such that the complexity of the $k$-th logic is between $\Delta^0_{\omega^k}$ and $\Delta^0_{\omega^{k+1}}$.
We present a performant, general-purpose gradient-guided nested sampling algorithm, ${\tt GGNS}$, combining the state of the art in differentiable programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows ${\tt GGNS}$ to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.
We investigate algorithms for testing whether an image is connected. Given a proximity parameter $\epsilon\in(0,1)$ and query access to a black-and-white image represented by an $n\times n$ matrix of Boolean pixel values, a (1-sided error) connectedness tester accepts if the image is connected and rejects with probability at least 2/3 if the image is $\epsilon$-far from connected. We show that connectedness can be tested nonadaptively with $O(\frac 1{\epsilon^2})$ queries and adaptively with $O(\frac{1}{\epsilon^{3/2}} \sqrt{\log\frac{1}{\epsilon}})$ queries. The best connectedness tester to date, by Berman, Raskhodnikova, and Yaroslavtsev (STOC 2014) had query complexity $O(\frac 1{\epsilon^2}\log \frac 1{\epsilon})$ and was adaptive. We also prove that every nonadaptive, 1-sided error tester for connectedness must make $\Omega(\frac 1\epsilon\log \frac 1\epsilon)$ queries.
A common method to study deep learning systems is to use simplified model representations -- for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space. This approach assumes that the results of these simplified are faithful to the original model. Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model's behavior out of distribution -- the understanding developed from simplified representations may be an illusion. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits. First, we train models on the Dyck balanced-parenthesis languages. We simplify these models using tools like dimensionality reduction and clustering, and then explicitly test how these simplified proxies match the behavior of the original model on various out-of-distribution test sets. We find that the simplified proxies are generally less faithful out of distribution. In cases where the original model generalizes to novel structures or deeper depths, the simplified versions may fail, or generalize better. This finding holds even if the simplified representations do not directly depend on the training distribution. Next, we study a more naturalistic task: predicting the next character in a dataset of computer code. We find similar generalization gaps between the original model and simplified proxies, and conduct further analysis to investigate which aspects of the code completion task are associated with the largest gaps. Together, our results raise questions about the extent to which mechanistic interpretations derived using tools like SVD can reliably predict what a model will do in novel situations.
We study functional dependencies together with two different probabilistic dependency notions: unary marginal identity and unary marginal distribution equivalence. A unary marginal identity states that two variables x and y are identically distributed. A unary marginal distribution equivalence states that the multiset consisting of the marginal probabilities of all the values for variable x is the same as the corresponding multiset for y. We present a sound and complete axiomatization for the class of these dependencies and show that it has Armstrong relations. The axiomatization is infinite, but we show that there can be no finite axiomatization. The implication problem for the subclass that contains only functional dependencies and unary marginal identities can be simulated with functional dependencies and unary inclusion atoms, and therefore the problem is in polynomial-time. This complexity bound also holds in the case of the full class, which we show by constructing a polynomial-time algorithm.
We study range spaces, where the ground set consists of either polygonal curves in $\mathbb{R}^d$ or polygonal regions in the plane that may contain holes and the ranges are balls defined by an elastic distance measure, such as the Hausdorff distance, the Fr\'echet distance and the dynamic time warping distance. The range spaces appear in various applications like classification, range counting, density estimation and clustering when the instances are trajectories, time series or polygons. The Vapnik-Chervonenkis dimension (VC-dimension) plays an important role when designing algorithms for these range spaces. We show for the Fr\'echet distance of polygonal curves and the Hausdorff distance of polygonal curves and planar polygonal regions that the VC-dimension is upper-bounded by $O(dk\log(km))$ where $k$ is the complexity of the center of a ball, $m$ is the complexity of the polygonal curve or region in the ground set, and $d$ is the ambient dimension. For $d \geq 4$ this bound is tight in each of the parameters $d, k$ and $m$ separately. For the dynamic time warping distance of polygonal curves, our analysis directly yields an upper-bound of $O(\min(dk^2\log(m),dkm\log(k)))$.
This paper presents a method for finding a sparse representation of Barron functions. Specifically, given an $L^2$ function $f$, the inverse scale space flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between the Barron function associated to the measure $\mu$ and the function $f$. The convergence properties of this method are analysed in an ideal setting and in the cases of measurement noise and sampling bias. In an ideal setting the objective decreases strictly monotone in time to a minimizer with $\mathcal{O}(1/t)$, and in the case of measurement noise or sampling bias the optimum is achieved up to a multiplicative or additive constant. This convergence is preserved on discretization of the parameter space, and the minimizers on increasingly fine discretizations converge to the optimum on the full parameter space.
Novelty detection is a fundamental task of machine learning which aims to detect abnormal ($\textit{i.e.}$ out-of-distribution (OOD)) samples. Since diffusion models have recently emerged as the de facto standard generative framework with surprising generation results, novelty detection via diffusion models has also gained much attention. Recent methods have mainly utilized the reconstruction property of in-distribution samples. However, they often suffer from detecting OOD samples that share similar background information to the in-distribution data. Based on our observation that diffusion models can \emph{project} any sample to an in-distribution sample with similar background information, we propose \emph{Projection Regret (PR)}, an efficient novelty detection method that mitigates the bias of non-semantic information. To be specific, PR computes the perceptual distance between the test image and its diffusion-based projection to detect abnormality. Since the perceptual distance often fails to capture semantic changes when the background information is dominant, we cancel out the background bias by comparing it against recursive projections. Extensive experiments demonstrate that PR outperforms the prior art of generative-model-based novelty detection methods by a significant margin.
In the literature on algorithms for performing the multi-term addition $s_n=\sum_{i=1}^n x_i$ using floating-point arithmetic it is often shown that a hardware unit that has single normalization and rounding improves precision, area, latency, and power consumption, compared with the use of standard add or fused multiply-add units. However, non-monotonicity can appear when computing sums with a subclass of multi-term addition units, which currently is not explored in the literature. We demonstrate that common techniques for performing multi-term addition with $n\geq 4$, without normalization of intermediate quantities, can result in non-monotonicity -- increasing one of the addends $x_i$ decreases the sum $s_n$. Summation is required in dot product and matrix multiplication operations, operations that have increasingly started appearing in the hardware of supercomputers, thus knowing where monotonicity is preserved can be of interest to the users of these machines. Our results suggest that non-monotonicity of summation, in some of the commercial hardware devices that implement a specific class of multi-term adders, is a feature that may have appeared unintentionally as a consequence of design choices that reduce circuit area and other metrics. To demonstrate our findings, we use formal proofs as well as a numerical simulation of non-monotonic multi-term adders in MATLAB.