Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be $\mathcal{P}$-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions $\mathcal{P}$. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in $\mathcal{P}$ and in the sample size $n$. This paper fills that gap by deriving a novel $\mathcal{P}$-uniform strong Gaussian approximation theorem. We apply some of these results to obtain an anytime-valid test of conditional independence without the Model-X assumption, as well as a $\mathcal{P}$-uniform law of the iterated logarithm.
This work deals with developing two fast randomized algorithms for computing the generalized tensor singular value decomposition (GTSVD) based on the tubal product (t-product). The random projection method is utilized to compute the important actions of the underlying data tensors and use them to get small sketches of the original data tensors, which are easier to be handled. Due to the small size of the sketch tensors, deterministic approaches are applied to them to compute their GTSVDs. Then, from the GTSVD of the small sketch tensors, the GTSVD of the original large-scale data tensors is recovered. Some experiments are conducted to show the effectiveness of the proposed approach.
Given two matrices $X,B\in \mathbb{R}^{n\times m}$ and a set $\mathcal{A}\subseteq \mathbb{R}^{n\times n}$, a Procrustes problem consists in finding a matrix $A \in \mathcal{A}$ such that the Frobenius norm of $AX-B$ is minimized. When $\mathcal{A}$ is the set of the matrices whose symmetric part is positive semidefinite, we obtain the so-called non-symmetric positive semidefinite Procrustes (NSPDSP) problem. The NSPDSP problem arises in the estimation of compliance or stiffness matrix in solid and elastic structures. If $X$ has rank $r$, Baghel et al. (Lin. Alg. Appl., 2022) proposed a three-step semi-analytical approach: (1) construct a reduced NSPDSP problem in dimension $r\times r$, (2) solve the reduced problem by means of a fast gradient method with a linear rate of convergence, and (3) post-process the solution of the reduced problem to construct a solution of the larger original NSPDSP problem. In this paper, we revisit this approach of Baghel et al. and identify an unnecessary assumption used by the authors leading to cases where their algorithm cannot attain a minimum and produces solutions with unbounded norm. In fact, revising the post-processing phase of their semi-analytical approach, we show that the infimum of the NSPDSP problem is always attained, and we show how to compute a minimum-norm solution. We also prove that the symmetric part of the computed solution has minimum rank bounded by $r$, and that the skew-symmetric part has rank bounded by $2r$. Several numerical examples show the efficiency of this algorithm, both in terms of computational speed and of finding optimal minimum-norm solutions.
For each $d\leq3$, we construct a finite set $F_d$ of multigraphs such that for each graph $H$ of girth at least $5$ obtained from a multigraph $G$ by subdividing each edge at least two times, $H$ has twin-width at most $d$ if and only if $G$ has no minor in $F_d$. This answers a question of Berg\'{e}, Bonnet, and D\'{e}pr\'{e}s asking for the structure of graphs $G$ such that each long subdivision of $G$ has twin-width $4$. As a corollary, we show that the $7\times7$ grid has twin-width $4$, which answers a question of Schidler and Szeider.
It is well known that the independent random variables $X$ and $Y$ are uncorrelated in the sense $E[XY]=E[X]\cdot E[Y]$ and that the implication may be reversed in very specific cases only. This paper proves that under general assumptions the conditional uncorrelation of random variables, where the conditioning takes place over the suitable class of test sets, is equivalent to the independence. It is also shown that the mutual independence of $X_1,\dots,X_n$ is equivalent to the fact that any conditional correlation matrix equals to the identity matrix.
We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data and give asymptotic bounds on the coverage error of the resulting confidence intervals. For a fixed differential privacy parameter $\epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$. We empirically validate the performance of our methods for mean estimation, median estimation, and logistic regression with both real and synthetic data. Our methods achieve similar coverage accuracy to existing methods (and non-private baselines) while providing notably shorter ($\gtrsim 10$ times) confidence intervals than previous approaches.
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree $q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). Prior works showed that gradient-based training of neural networks can learn this target with $n\gtrsim d^{\Theta(p)}$ samples, and such statistical complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary polynomial link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\mathrm{polylog} d$, where constant $C(q)$ only depends on the degree of $\sigma_*$, regardless of information exponent; this dimension dependence matches the information theoretic limit up to polylogarithmic factors. Core to our analysis is the reuse of minibatch in the gradient computation, which gives rise to higher-order information beyond correlational queries.
We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.
We study the eigenvalue distribution and resolvent of a Kronecker-product random matrix model $A \otimes I_{n \times n}+I_{n \times n} \otimes B+\Theta \otimes \Xi \in \mathbb{C}^{n^2 \times n^2}$, where $A,B$ are independent Wigner matrices and $\Theta,\Xi$ are deterministic and diagonal. For fixed spectral arguments, we establish a quantitative approximation for the Stieltjes transform by that of an approximating free operator, and a diagonal deterministic equivalent approximation for the resolvent. We further obtain sharp estimates in operator norm for the $n \times n$ resolvent blocks, and show that off-diagonal resolvent entries fall on two differing scales of $n^{-1/2}$ and $n^{-1}$ depending on their locations in the Kronecker structure. Our study is motivated by consideration of a matrix-valued least-squares optimization problem $\min_{X \in \mathbb{R}^{n \times n}} \frac{1}{2}\|XA+BX\|_F^2+\frac{1}{2}\sum_{ij} \xi_i\theta_j x_{ij}^2$ subject to a linear constraint. For random instances of this problem defined by Wigner inputs $A,B$, our analyses imply an asymptotic characterization of the minimizer $X$ and its associated minimum objective value as $n \to \infty$.
In this article, we propose a new classification of $\Sigma^0_2$ formulas under the realizability interpretation of many-one reducibility (i.e., Levin reducibility). For example, ${\sf Fin}$, the decision of being eventually zero for sequences, is many-one/Levin complete among $\Sigma^0_2$ formulas of the form $\exists n\forall m\geq n.\varphi(m,x)$, where $\varphi$ is decidable. The decision of boundedness for sequences ${\sf BddSeq}$ and for width of posets ${\sf FinWidth}$ are many-one/Levin complete among $\Sigma^0_2$ formulas of the form $\exists n\forall m\geq n\forall k.\varphi(m,k,x)$, where $\varphi$ is decidable. However, unlike the classical many-one reducibility, none of the above is $\Sigma^0_2$-complete. The decision of non-density of linear order ${\sf NonDense}$ is truly $\Sigma^0_2$-complete.
A connected undirected graph is called \emph{geodetic} if for every pair of vertices there is a unique shortest path connecting them. It has been conjectured that for finite groups, the only geodetic Cayley graphs which occur are odd cycles and complete graphs. In this article we present a series of theoretical results which contribute to a computer search verifying this conjecture for all groups of size up to 1024. The conjecture is also verified theoretically for several infinite families of groups including dihedral and some families of nilpotent groups. Two key results which enable the computer search to reach as far as it does are: if the center of a group has even order, then the conjecture holds (this eliminates all 2-groups from our computer search); if a Cayley graph is geodetic then there are bounds relating the size of the group, generating set and center (which cuts down the number of generating sets which must be searched significantly).