Permutation pattern-avoidance is a central concept of both enumerative and extremal combinatorics. In this paper we study the effect of permutation pattern-avoidance on the complexity of optimization problems. In the context of the dynamic optimality conjecture (Sleator, Tarjan, STOC 1983), Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak (FOCS 2015) conjectured that the amortized access cost of an optimal binary search tree (BST) is $O(1)$ whenever the access sequence avoids some fixed pattern. They showed a bound of $2^{\alpha(n)^{O(1)}}$, which was recently improved to $2^{\alpha(n)(1+o(1))}$ by Chalermsook, Pettie, and Yingchareonthawornchai (2023); here $n$ is the BST size and $\alpha(\cdot)$ the inverse-Ackermann function. In this paper we resolve the conjecture, showing a tight $O(1)$ bound. This indicates a barrier to dynamic optimality: any candidate online BST (e.g., splay trees or greedy trees) must match this optimum, but current analysis techniques only give superconstant bounds. More broadly, we argue that the easiness of pattern-avoiding input is a general phenomenon, not limited to BSTs or even to data structures. To illustrate this, we show that when the input avoids an arbitrary, fixed, a priori unknown pattern, one can efficiently compute a $k$-server solution of $n$ requests from a unit interval, with total cost $n^{O(1/\log k)}$, in contrast to the worst-case $\Theta(n/k)$ bound; and a traveling salesman tour of $n$ points from a unit box, of length $O(\log n)$, in contrast to the worst-case $\Theta(\sqrt{n})$ bound; similar results hold for the Euclidean minimum spanning tree, Steiner tree, and nearest-neighbor graphs. We show both results to be tight. Our techniques build on the Marcus-Tardos proof of the Stanley-Wilf conjecture, and on the recently emerging concept of twin-width; we believe our techniques to be more generally applicable.
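As an illustration of the central definition (not code from the paper): a sequence contains a pattern if some subsequence is order-isomorphic to it, and avoids the pattern otherwise. A minimal brute-force Python check might look as follows.

```python
from itertools import combinations

def contains_pattern(perm, pattern):
    """Return True if `perm` contains `pattern`, i.e. some subsequence of
    `perm` is order-isomorphic to `pattern` (brute force, for illustration)."""
    k = len(pattern)
    # Relative order of the pattern, e.g. (2, 1, 3) has argsort (1, 0, 2).
    pat_ranks = tuple(sorted(range(k), key=lambda i: pattern[i]))
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        sub_ranks = tuple(sorted(range(k), key=lambda i: sub[i]))
        if sub_ranks == pat_ranks:
            return True
    return False

def avoids(perm, pattern):
    return not contains_pattern(perm, pattern)

# Example: (3, 1, 4, 2) has no increasing subsequence of length 3, so it
# avoids (1, 2, 3), but it contains (2, 1, 3) via the subsequence (3, 1, 4).
assert avoids((3, 1, 4, 2), (1, 2, 3))
assert contains_pattern((3, 1, 4, 2), (2, 1, 3))
```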
In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing-data contexts relies at least in part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant, and the resulting panels may have suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.
In this article, we consider the problem of approximating a finite set of data (usually huge in applications) by invariant subspaces generated by a small set of smooth functions. The invariance is either under translations by a full-rank lattice or under the action of a crystallographic group. Smoothness is ensured by requiring that the generators belong to a Paley-Wiener space, which is selected in an optimal way based on the characteristics of the given data. To complete our investigation, we analyze the fundamental role played by the lattice in the approximation process.
We introduce PUNQ, a novel quantum programming language with quantum control, which features higher-order programs that can be superposed, enabling quantum control via quantum conditionals. Our language boasts a type system guaranteeing both unitarity and polynomial-time normalization. Unitarity is achieved by using a special modality for superpositions while requiring orthogonality among superposed terms. Polynomial-time normalization is achieved using a linear-logic-based type discipline employing Barber and Plotkin duality along with a specific modality to account for potential duplications. This type discipline also guarantees that derived values have polynomial size. PUNQ seamlessly combines the two modalities: quantum circuit programs uphold unitarity, and all programs are evaluated in polynomial time, ensuring their feasibility.
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This very general proposal is applicable to any convolution-closed distribution, a class that includes the Gaussian, Poisson, negative binomial, gamma, and binomial distributions, among others. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the usual approach of cross-validation via sample splitting, especially in settings in which the latter is not applicable. In simulations and in an application to single-cell RNA-sequencing data, we show that data thinning can be used to validate the results of unsupervised learning approaches, such as k-means clustering and principal components analysis, for which traditional sample splitting is unattractive or unavailable.
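As a concrete special case (a minimal sketch, not the paper's code): a Poisson observation can be thinned by binomial splitting, which yields two independent Poisson parts whose means are known scalings of the original mean and which sum to the original count.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_thin(x, eps=0.5, rng=rng):
    """Split Poisson(lam) observations x into two parts:
    x1 ~ Poisson(eps * lam) and x2 = x - x1 ~ Poisson((1 - eps) * lam),
    with x1 and x2 independent; this follows from the convolution-closedness
    of the Poisson family."""
    x1 = rng.binomial(x, eps)   # thin each count with probability eps
    x2 = x - x1
    return x1, x2

# Example: thin a vector of Poisson counts; fit on one part, evaluate on the other.
lam = 10.0
x = rng.poisson(lam, size=1000)
x_train, x_test = poisson_thin(x, eps=0.5)
print(x_train.mean(), x_test.mean())            # both close to eps * lam = 5.0
print(np.corrcoef(x_train, x_test)[0, 1])       # near 0: the two parts are independent
```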
In this manuscript, we discuss a class of difference-based estimators of the autocovariance structure in a semiparametric regression model where the signal is discontinuous and the errors are serially correlated. The signal in this model consists of the sum of a function with jumps and an identifiable smooth function. A simpler form of this model has been considered earlier under the name of the Nonparametric Jump Regression Model (NJRM). The proposed estimators allow us to bypass the complicated problem of prior estimation of the mean signal in such a model. We provide finite-sample expressions for the biases and variances of the proposed estimators when the errors are Gaussian; Gaussianity is needed only to obtain explicit closed-form expressions for these biases and variances. Moreover, we observe that the mean squared error of the proposed variance estimator depends neither on the unknown smooth function that is part of the mean signal nor on the values of the difference-sequence coefficients. Our approach also suggests sufficient conditions for $\sqrt{n}$-consistency of the proposed estimators.
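As background for the difference-based idea (an illustrative sketch, not the estimators of this paper, which additionally handle jumps and serially correlated errors): the classical first-order difference-based (Rice) estimator of the error variance for a smooth signal with i.i.d. errors shows how differencing removes the unknown mean, so no pre-estimation of the signal is needed.

```python
import numpy as np

def rice_variance(y):
    """Classical first-order difference-based (Rice) estimator of the error
    variance in y_i = f(x_i) + e_i with f smooth and i.i.d. errors e_i:
    sigma^2_hat = sum (y_{i+1} - y_i)^2 / (2 (n - 1))."""
    d = np.diff(y)          # d_i = y_{i+1} - y_i ~ e_{i+1} - e_i for smooth f
    return np.sum(d ** 2) / (2 * (len(y) - 1))

# Example: smooth signal plus Gaussian noise with sigma = 0.5 (variance 0.25).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 2000)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.5, size=x.size)
print(rice_variance(y))   # close to 0.25
```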
Adversarial robustness and generalization are both crucial properties of reliable machine learning models. In this paper, we study these properties in the context of quantum machine learning based on Lipschitz bounds. We derive tailored, parameter-dependent Lipschitz bounds for quantum models with trainable encoding, showing that the norm of the data encoding has a crucial impact on the robustness against perturbations of the input data. Further, we derive a bound on the generalization error which explicitly depends on the parameters of the data encoding. Our theoretical findings give rise to a practical strategy for training robust and generalizable quantum models by regularizing the Lipschitz bound in the cost. Moreover, we show that, for fixed and non-trainable encodings, as frequently employed in quantum machine learning, the Lipschitz bound cannot be influenced by tuning the parameters. Thus, trainable encodings are crucial for systematically adapting robustness and generalization during training. With numerical results, we demonstrate that Lipschitz bound regularization indeed leads to substantially more robust and generalizable quantum models.
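Schematically, the training strategy described above can be sketched as follows (a hypothetical illustration, not the paper's exact bound): the training cost is augmented with a penalty on the norms of the trainable encoding parameters, the quantity that, per the abstract, drives the model's Lipschitz bound.

```python
import numpy as np

def regularized_cost(task_loss, encoding_weights, lam=0.1):
    """Schematic regularized training objective: task loss plus a penalty on
    the norms of the trainable encoding parameters. The penalty here is a
    placeholder for the paper's Lipschitz bound, not its exact expression."""
    lipschitz_penalty = sum(np.linalg.norm(w) ** 2 for w in encoding_weights)
    return task_loss + lam * lipschitz_penalty

# Example with hypothetical encoding weights for two encoding layers.
weights = [np.array([0.8, -0.3]), np.array([1.2, 0.1, -0.5])]
print(regularized_cost(task_loss=0.42, encoding_weights=weights, lam=0.1))
```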
This investigation first focuses on showing that two metric parameters represent the same object in graph theory. That is, we prove that the multiset resolving sets and the ID-colorings of graphs coincide. We also consider some computational and combinatorial problems concerning the multiset dimension, or equivalently, the ID-number, of graphs. We prove that the decision problem of finding the multiset dimension of graphs is NP-complete. We consider the multiset dimension of king grids and prove that it is bounded above by 4. We also characterize those strong product graphs, with one factor a complete graph, whose multiset dimension is not infinite.
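To make the definitions concrete (an illustrative sketch, not code from the paper): a vertex set W is a multiset resolving set if the multisets of distances from the vertices of the graph to W are pairwise distinct, and the multiset dimension is the smallest size of such a set (taken to be infinite if none exists). A brute-force check in Python using networkx:

```python
from itertools import combinations
import networkx as nx

def is_multiset_resolving(G, W):
    """Check whether the multisets of distances from each vertex to W
    are pairwise distinct."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    signatures = [tuple(sorted(dist[v][w] for w in W)) for v in G.nodes]
    return len(set(signatures)) == G.number_of_nodes()

def multiset_dimension(G):
    """Smallest size of a multiset resolving set, or None if no such set
    exists (the multiset dimension is then infinite)."""
    nodes = list(G.nodes)
    for k in range(1, len(nodes) + 1):
        for W in combinations(nodes, k):
            if is_multiset_resolving(G, W):
                return k
    return None

print(multiset_dimension(nx.path_graph(5)))      # 1: a single endpoint resolves a path
print(multiset_dimension(nx.complete_graph(4)))  # None: K_4 has infinite multiset dimension
```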
In this paper, we study the stability of commonly used filtration functions in topological data analysis under small perturbations of the underlying nonrandom point cloud. Relying on these stability results, we then develop a test procedure to detect and determine structural breaks in a sequence of topological data objects obtained from weakly dependent data. The proposed method applies, for instance, to statistics of persistence diagrams of $\mathbb{R}^d$-valued Bernoulli shift systems under the \v{C}ech or Vietoris-Rips filtration.
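For concreteness (a minimal sketch; the use of the GUDHI library is an assumption on our part, not something specified by the paper): the topological data objects in question can be computed as persistence diagrams of a Vietoris-Rips filtration built on a point cloud.

```python
import numpy as np
import gudhi  # GUDHI library for persistent homology

rng = np.random.default_rng(2)
# Sample noisy points near a circle; the dimension-1 persistence diagram
# should exhibit one prominent loop.
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0, 0.05, (200, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
simplex_tree.persistence()
print(simplex_tree.persistence_intervals_in_dimension(1))  # birth/death pairs of loops
```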
Combining ideas coming from Stone duality and Reynolds parametricity, we formulate in a clean and principled way a notion of profinite lambda-term which, we show, generalizes at every type the traditional notion of profinite word coming from automata theory. We start by defining the Stone space of profinite lambda-terms as a projective limit of finite sets of usual lambda-terms, considered modulo a notion of equivalence based on the finite standard model. One main contribution of the paper is to establish that, somewhat surprisingly, the resulting notion of profinite lambda-term coming from Stone duality lives in perfect harmony with the principles of Reynolds parametricity. In addition, we show that the notion of profinite lambda-term is compositional by constructing a cartesian closed category of profinite lambda-terms, and we establish that the embedding from lambda-terms modulo beta-eta-conversion to profinite lambda-terms is faithful using Statman's finite completeness theorem. Finally, we prove that the traditional Church encoding of finite words into lambda-terms can be extended to profinite words, and leads to a homeomorphism between the space of profinite words and the space of profinite lambda-terms of the corresponding Church type.
In this paper we show that, using implicative algebras, one can produce models of set theory generalizing Heyting/Boolean-valued models and realizability models of (I)ZF, in both intuitionistic and classical logic. As a consequence, any topos obtained from a Set-based tripos via the tripos-to-topos construction hosts a model of intuitionistic or classical set theory, provided a large enough strongly inaccessible cardinal exists.