In this paper we give a detailed error analysis of two algorithms (denoted $W_1$ and $W_2$) for computing the symplectic factorization of a symmetric positive definite and symplectic matrix $A \in \mathbb R^{2n \times 2n}$ in the form $A=LL^T$, where $L \in \mathbb R^{2n \times 2n}$ is a symplectic block lower triangular matrix. Algorithm $W_1$ is an implementation of the $HH^T$ factorization from [Dopico et al., 2009]. Algorithm $W_2$, proposed in [Bujok et al., 2023], uses both Cholesky and reverse Cholesky decompositions of the symmetric positive definite matrix blocks that appear during the factorization. We prove that Algorithm $W_2$ is numerically stable for a broader class of symmetric positive definite matrices $A \in \mathbb R^{2n \times 2n}$, producing computed factors $\tilde L$ in floating-point arithmetic with machine precision $u$ such that $\|A-\tilde L {\tilde L}^T\|_2= {\cal O}(u \|A\|_2)$. By contrast, Algorithm $W_1$ is unstable in general for symmetric positive definite and symplectic matrices $A$, as confirmed by the numerical experiments in [Bujok et al., 2023]. In this paper we also give corresponding bounds for Algorithm $W_1$; these are weaker, since we show that the factorization error depends on the size of the inverse of the principal submatrix $A_{11}$. Tests performed in MATLAB illustrate that our error bounds for the considered algorithms are reasonably sharp.
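To fix ideas, the following is a minimal NumPy sketch of one block $LL^T$ step that combines a Cholesky factorization of $A_{11}$ with a reverse Cholesky factorization of the Schur complement. This is an illustration of the general idea only, under our own assumptions about the block structure; it is not the authors' Algorithm $W_2$, which additionally maintains symplecticity of the factor.

```python
import numpy as np

def reverse_cholesky(S):
    """Factor S = R @ R.T with R *upper* triangular ("reverse" Cholesky),
    via the standard Cholesky factorization of the row/column-reversed matrix."""
    Lp = np.linalg.cholesky(S[::-1, ::-1])  # lower-triangular factor of PSP
    return Lp[::-1, ::-1]                   # reversing rows/cols gives an upper factor

def block_llt(A, n):
    """One block step of a generic factorization A = L @ L.T for a 2n x 2n
    SPD matrix A, with L block lower triangular: the (1,1) block uses
    Cholesky and the Schur complement uses reverse Cholesky.
    (Illustrative sketch only, not the paper's Algorithm W2.)"""
    A11, A12, A22 = A[:n, :n], A[:n, n:], A[n:, n:]
    L11 = np.linalg.cholesky(A11)           # A11 = L11 @ L11.T, L11 lower
    L21 = np.linalg.solve(L11, A12).T       # enforces L21 @ L11.T = A12.T
    S = A22 - L21 @ L21.T                   # Schur complement, SPD
    L22 = reverse_cholesky(S)               # S = L22 @ L22.T, L22 upper
    L = np.zeros_like(A)
    L[:n, :n], L[n:, :n], L[n:, n:] = L11, L21, L22
    return L
```

Note that "block lower triangular" only requires the off-diagonal $(1,2)$ block of $L$ to vanish; the diagonal blocks need not be triangular themselves, which is why the upper-triangular reverse Cholesky factor is admissible for $L_{22}$.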
Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE-based samples are inferior to SDE-based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, as well as their associated Fokker--Planck equations and the neural network approximations of these Fokker--Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, and link it to an associated Fokker--Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker--Planck residual. We also show numerically, using explicit comparisons, that conventional score-based diffusion models can exhibit significant differences between the ODE- and SDE-induced distributions. Moreover, we show numerically that reducing the Fokker--Planck residual by adding it as an additional regularisation term closes the gap between the ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, though this can come at the cost of degraded SDE sample quality.
The recently proposed fixed-X knockoff is a powerful variable selection procedure that controls the false discovery rate (FDR) in any finite-sample setting, yet its theoretical guarantees are difficult to establish beyond Gaussian linear models. In this paper, we make the first attempt to extend the fixed-X knockoff to partially linear models by using generalized knockoff features, and propose a new stability generalized knockoff (Stab-GKnock) procedure that uses the selection probability as the feature importance score. We provide FDR control and a power guarantee under some regularity conditions. In addition, we propose a two-stage method for the high-dimensional setting by introducing a new joint feature screening procedure with a guaranteed sure screening property. Extensive simulation studies are conducted to evaluate the finite-sample performance of the proposed method. A real data example is also provided for illustration.
We study the computational problem of rigorously describing the asymptotic behaviour of topological dynamical systems up to a finite but arbitrarily small pre-specified error. More precisely, we consider the limit set of a typical orbit, both as a spatial object (attractor set) and as a statistical distribution (physical measure), and prove upper bounds on the computational resources required to compute descriptions of these objects with arbitrary accuracy. We also study how these bounds are affected by different dynamical constraints and provide several examples showing that our bounds are sharp in general. In particular, we exhibit a computable interval map having a unique transitive attractor with Cantor set structure supporting a unique physical measure, such that both the attractor and the measure are non-computable.
We study the long-time behavior of an underdamped mean-field Langevin (MFL) equation, and provide a general convergence result as well as an exponential convergence rate under different conditions. The results on the MFL equation can be applied to study the convergence of the Hamiltonian gradient descent algorithm for overparametrized optimization. We then provide a numerical example in which the algorithm is used to train a generative adversarial network (GAN).
Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with $\ell_2$-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored to sparse model learning problems, aimed at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables. An application to the identification of over-parameterized Linear Parameter-Varying models is also presented.
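The matrix-calculus idea behind per-variable penalties can be sketched in a few lines: with a diagonal regularization matrix $\mathrm{diag}(\lambda)$, differentiating the normal equations gives the gradient of a validation loss in closed form. The sketch below uses our own minimal validation criterion and function names; the paper's cross-validation criterion and parametrization may differ.

```python
import numpy as np

def val_loss_and_grad(lam, Xtr, ytr, Xval, yval):
    """Validation loss J(lam) = ||Xval @ beta(lam) - yval||^2 for weighted ridge
    beta(lam) = argmin_b ||Xtr @ b - ytr||^2 + sum_j lam_j * b_j^2,
    together with the analytic gradient dJ/dlam (illustrative sketch).

    Differentiating (Xtr.T Xtr + diag(lam)) beta = Xtr.T ytr in lam_j gives
    d beta / d lam_j = -A^{-1} e_j beta_j, with A = Xtr.T Xtr + diag(lam),
    hence dJ/dlam_j = -2 * beta_j * (A^{-1} Xval.T r)_j for residual r."""
    A = Xtr.T @ Xtr + np.diag(lam)
    beta = np.linalg.solve(A, Xtr.T @ ytr)
    r = Xval @ beta - yval             # validation residual
    loss = r @ r
    g = np.linalg.solve(A, Xval.T @ r) # A is symmetric, so A^{-T} = A^{-1}
    grad = -2.0 * beta * g
    return loss, grad
```

The gradient vector can then be fed to any first-order optimizer (with a positivity constraint or a log-parametrization of $\lambda$), avoiding the cost of automatic differentiation through the solver.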
We introduce numerical solvers for the steady-state Boltzmann equation based on the symmetric Gauss-Seidel (SGS) method. Due to the quadratic collision operator in the Boltzmann equation, the SGS method requires solving a nonlinear system on each grid cell, and we consider two methods, namely Newton's method and the fixed-point iteration, in our numerical tests. For small Knudsen numbers, our method has an efficiency between the classical source iteration and the modern generalized synthetic iterative scheme, and the complexity of its implementation is closer to the source iteration. A variety of numerical tests are carried out to demonstrate its performance, and it is concluded that the proposed method is suitable for applications with moderate to large Knudsen numbers.
The main reason for the query model's prominence in complexity theory and quantum computing is the presence of concrete lower-bounding techniques: the polynomial and adversary methods. There have been considerable efforts to give lower bounds using these methods, and to compare and relate them with other measures based on the decision tree. We explore the value of these lower bounds on quantum query complexity, and their relation to other decision-tree-based complexity measures, for the class of symmetric functions, arguably one of the most natural and basic sets of Boolean functions. We show an explicit construction for the dual of the positive adversary method, and also of the square root of the private coin certificate game complexity, for any total symmetric function. This shows that the two values cannot be distinguished for any symmetric function. Additionally, we show that the recently introduced measure of spectral sensitivity gives the same value as both the positive adversary method and the approximate degree for every total symmetric Boolean function. Further, we look at the quantum query complexity of Gap Majority, a partial symmetric function that has recently gained importance for understanding the composition of randomized query complexity. We characterize the quantum query complexity of Gap Majority and show a lower bound on noisy randomized query complexity (Ben-David and Blais, FOCS 2020) in terms of quantum query complexity. Finally, we study how large certificate complexity and block sensitivity can be relative to sensitivity for symmetric functions (even up to constant factors). We show tight separations, i.e., we give upper bounds on the possible separations and construct functions achieving them.
We construct and analyze finite element approximations of the Einstein tensor in dimension $N \ge 3$. We focus on the setting where a smooth Riemannian metric tensor $g$ on a polyhedral domain $\Omega \subset \mathbb{R}^N$ has been approximated by a piecewise polynomial metric $g_h$ on a simplicial triangulation $\mathcal{T}$ of $\Omega$ having maximum element diameter $h$. We assume that $g_h$ possesses single-valued tangential-tangential components on every codimension-1 simplex in $\mathcal{T}$. Such a metric is not classically differentiable in general, but it turns out that one can still attribute meaning to its Einstein curvature in a distributional sense. We study the convergence of the distributional Einstein curvature of $g_h$ to the Einstein curvature of $g$ under refinement of the triangulation. We show that in the $H^{-2}(\Omega)$-norm, this convergence takes place at a rate of $O(h^{r+1})$ when $g_h$ is an optimal-order interpolant of $g$ that is piecewise polynomial of degree $r \ge 1$. We provide numerical evidence to support this claim.
This paper develops a general asymptotic theory of local polynomial (LP) regression for spatial data observed at irregularly spaced locations in a sampling region $R_n \subset \mathbb{R}^d$. We adopt a stochastic sampling design that can generate irregularly spaced sampling sites in a flexible manner, including both the pure increasing domain and mixed increasing domain frameworks. We first introduce a nonparametric regression model for spatial data defined on $\mathbb{R}^d$ and then establish the asymptotic normality of LP estimators with general order $p \geq 1$. We also propose methods for constructing confidence intervals and establishing uniform convergence rates of LP estimators. Our dependence structure conditions on the underlying processes cover a wide class of random fields, such as L\'evy-driven continuous autoregressive moving average random fields. As an application of our main results, we discuss a two-sample testing problem for mean functions and their partial derivatives.
The classical Zarankiewicz problem asks for the maximum number of edges in a bipartite graph on $n$ vertices which does not contain the complete bipartite graph $K_{t,t}$. In one of the cornerstones of extremal graph theory, K\H{o}v\'ari, S\'os, and Tur\'an proved an upper bound of $O(n^{2-\frac{1}{t}})$. In a celebrated result, Fox et al. obtained an improved bound of $O(n^{2-\frac{1}{d}})$ for graphs of VC-dimension $d$ (where $d<t$). Basit, Chernikov, Starchenko, Tao and Tran improved the bound for the case of semilinear graphs. At SODA'23, Chan and Har-Peled further improved Basit et al.'s bounds and presented (quasi-)linear upper bounds for several classes of geometrically-defined incidence graphs, including a bound of $O(n \log \log n)$ for the incidence graph of points and pseudo-discs in the plane. In this paper we present a new approach to Zarankiewicz's problem, via $\epsilon$-$t$-nets, a recently introduced generalization of the classical notion of $\epsilon$-nets. We show that the existence of `small'-sized $\epsilon$-$t$-nets implies upper bounds for Zarankiewicz's problem. Using the new approach, we obtain a sharp bound of $O(n)$ for the intersection graph of two families of pseudo-discs, thus both improving and generalizing the result of Chan and Har-Peled from incidence graphs to intersection graphs. We also obtain a short proof of the $O(n^{2-\frac{1}{d}})$ bound of Fox et al., and show improved bounds for several other classes of geometric intersection graphs, including a sharp $O(n\frac{\log n}{\log \log n})$ bound for the intersection graph of two families of axis-parallel rectangles.