In this paper, we study a nonlinear cointegration-type model of the form \(Z_t = f_0(X_t) + W_t\) where \(f_0\) is a monotone function and \(X_t\) is a Harris recurrent Markov chain. We use a nonparametric Least Square Estimator to locally estimate \(f_0\), and under mild conditions, we show its strong consistency and obtain its rate of convergence. New results (of the Glivenko-Cantelli type) for localized null recurrent Markov chains are also proved.
Fourier-Motzkin elimination, a standard method for solving systems of linear inequalities, leads to an elementary, short, and self-contained proof of von Neumann's minimax theorem.
Generative diffusion models apply the concept of Langevin dynamics in physics to machine leaning, attracting a lot of interests from engineering, statistics and physics, but a complete picture about inherent mechanisms is still lacking. In this paper, we provide a transparent physics analysis of diffusion models, formulating the fluctuation theorem, entropy production, equilibrium measure, and Franz-Parisi potential to understand the dynamic process and intrinsic phase transitions. Our analysis is rooted in a path integral representation of both forward and backward dynamics, and in treating the reverse diffusion generative process as a statistical inference, where the time-dependent state variables serve as quenched disorder akin to that in spin glass theory. Our study thus links stochastic thermodynamics, statistical inference and geometry based analysis together to yield a coherent picture about how the generative diffusion models work.
In this study, we explore the effects of including noise predictors and noise observations when fitting linear regression models. We present empirical and theoretical results that show that double descent occurs in both cases, albeit with contradictory implications: the implication for noise predictors is that complex models are often better than simple ones, while the implication for noise observations is that simple models are often better than complex ones. We resolve this contradiction by showing that it is not the model complexity but rather the implicit shrinkage by the inclusion of noise in the model that drives the double descent. Specifically, we show how noise predictors or observations shrink the estimators of the regression coefficients and make the test error asymptote, and then how the asymptotes of the test error and the ``condition number anomaly'' ensure that double descent occurs. We also show that including noise observations in the model makes the (usually unbiased) ordinary least squares estimator biased and indicates that the ridge regression estimator may need a negative ridge parameter to avoid over-shrinkage.
Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further splitting the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after efficiency-oriented model selection. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without invoking additional sample-splitting. We show that our methods yield prediction sets with asymptotically optimal size under certain notion of continuity for the model class. The improved efficiency of the prediction sets constructed by our methods are further demonstrated through applications to synthetic datasets in various settings and a real data example.
In this paper, we propose two algorithms for a hybrid construction of all $n\times n$ MDS and involutory MDS matrices over a finite field $\mathbb{F}_{p^m}$, respectively. The proposed algorithms effectively narrow down the search space to identify $(n-1) \times (n-1)$ MDS matrices, facilitating the generation of all $n \times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. To the best of our knowledge, existing literature lacks methods for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. In our approach, we introduce a representative matrix form for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. The determination of these representative MDS matrices involves searching through all $(n-1)\times (n-1)$ MDS matrices over $\mathbb{F}_{p^m}$. Our contributions extend to proving that the count of all $3\times 3$ MDS matrices over $\mathbb{F}_{2^m}$ is precisely $(2^m-1)^5(2^m-2)(2^m-3)(2^{2m}-9\cdot 2^m+21)$. Furthermore, we explicitly provide the count of all $4\times 4$ MDS and involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=2, 3, 4$.
In this paper, we propose nonlocal diffusion models with Dirichlet boundary. These nonlocal diffusion models preserve the maximum principle and also have corresponding variational form. With these good properties, we can prove the well-posedness and the vanishing nonlocality convergence. Furthermore, by specifically designed weight function, we can get a nonlocal diffusion model with second order convergence which is optimal for nonlocal diffusion models.
A low-rank approximation of a parameter-dependent matrix $A(t)$ is an important task in the computational sciences appearing for example in dynamical systems and compression of a series of images. In this work, we introduce AdaCUR, an efficient algorithm for computing a low-rank approximation of parameter-dependent matrices via CUR decomposition. The key idea for this algorithm is that for nearby parameter values, the column and row indices for the CUR decomposition can often be reused. AdaCUR is rank-adaptive, certifiable and has complexity that compares favourably against existing methods. A faster algorithm which we call FastAdaCUR that prioritizes speed over accuracy is also given, which is rank-adaptive and has complexity which is at most linear in the number of rows or columns, but without certification.
We study upper bounds on the size of optimum locating-total dominating sets in graphs. A set $S$ of vertices of a graph $G$ is a locating-total dominating set if every vertex of $G$ has a neighbor in $S$, and if any two vertices outside $S$ have distinct neighborhoods within $S$. The smallest size of such a set is denoted by $\gamma^L_t(G)$. It has been conjectured that $\gamma^L_t(G)\leq\frac{2n}{3}$ holds for every twin-free graph $G$ of order $n$ without isolated vertices. We prove that the conjecture holds for cobipartite graphs, split graphs, block graphs and subcubic graphs.
Given a finite set of matrices with integer entries, the matrix mortality problem asks if there exists a product of these matrices equal to the zero matrix. We consider a special case of this problem where all entries of the matrices are nonnegative. This case is equivalent to the NFA mortality problem, which, given an NFA, asks for a word $w$ such that the image of every state under $w$ is the empty set. The size of the alphabet of the NFA is then equal to the number of matrices in the set. We study the length of shortest such words depending on the size of the alphabet. We show that for an NFA with $n$ states this length can be at least $2^n - 1$ for an alphabet of size $n$, $2^{(n - 4)/2}$ for an alphabet of size $3$ and $2^{(n - 2)/3}$ for an alphabet of size $2$. We also discuss further open problems related to mortality of NFAs and DFAs.
In this paper we propose an explicit fully discrete scheme to numerically solve the stochastic Allen-Cahn equation. The spatial discretization is done by a spectral Galerkin method, followed by the temporal discretization by a tamed accelerated exponential Euler scheme. Based on the time-independent boundedness of moments of numerical solutions, we present the weak error analysis in an infinite time interval by using Malliavin calculus. This provides a way to numerically approximate the invariant measure for the stochastic Allen-Cahn equation.