Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this paper, we review recent progress on optimal sampling for (weighted) least-squares approximation in arbitrary linear spaces. We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples, then show how it can be used to construct sampling strategies that possess near-optimal sample complexity: namely, the number of samples scales log-linearly in $n$, the dimension of the approximation space. We discuss a series of variations, extensions and further topics, and throughout highlight connections to approximation theory, machine learning, information-based complexity and numerical linear algebra. Finally, motivated by various contemporary applications, we consider a generalization of the classical setting where the samples need not be pointwise samples of a scalar-valued function, and the approximation space need not be linear. We show that even in this significantly more general setting suitable generalizations of the Christoffel function still determine the sample complexity. This provides a unified procedure for designing improved sampling strategies for general recovery problems. This article is largely self-contained, and intended to be accessible to nonspecialists.
Although quantile regression has emerged as a powerful tool for understanding various quantiles of a response variable conditioned on a set of covariates, the development of quantile regression for count responses has received far less attention. This paper proposes a new Bayesian approach to quantile regression for count data, which provides a more flexible and interpretable alternative to the existing approaches. The proposed approach associates the continuous latent variable with the discrete response and nonparametrically estimates the joint distribution of the latent variable and a set of covariates. Then, by regressing the estimated continuous conditional quantile on the covariates, the posterior distributions of the covariate effects on the conditional quantiles are obtained through general Bayesian updating via simple optimization. The simulation study and real data analysis demonstrate that the proposed method overcomes the existing limitations and enhances quantile estimation and interpretation of variable relationships, making it a valuable tool for practitioners handling count data.
Sampling from a target distribution is a fundamental problem. Traditional Markov chain Monte Carlo (MCMC) algorithms, such as the unadjusted Langevin algorithm (ULA), derived from the overdamped Langevin dynamics, have been extensively studied. From an optimization perspective, the Kolmogorov forward equation of the overdamped Langevin dynamics can be treated as the gradient flow of the relative entropy in the space of probability densities embedded with Wassrstein-2 metrics. Several efforts have also been devoted to including momentum-based methods, such as underdamped Langevin dynamics for faster convergence of sampling algorithms. Recent advances in optimizations have demonstrated the effectiveness of primal-dual damping and Hessian-driven damping dynamics for achieving faster convergence in solving optimization problems. Motivated by these developments, we introduce a class of stochastic differential equations (SDEs) called gradient-adjusted underdamped Langevin dynamics (GAUL), which add stochastic perturbations in primal-dual damping dynamics and Hessian-driven damping dynamics from optimization. We prove that GAUL admits the correct stationary distribution, whose marginal is the target distribution. The proposed method outperforms overdamped and underdamped Langevin dynamics regarding convergence speed in the total variation distance for Gaussian target distributions. Moreover, using the Euler-Maruyama discretization, we show that the mixing time towards a biased target distribution only depends on the square root of the condition number of the target covariance matrix. Numerical experiments for non-Gaussian target distributions, such as Bayesian regression problems and Bayesian neural networks, further illustrate the advantages of our approach.
Ridge functions are used to describe and study the lower bound of the approximation done by the neural networks which can be written as a linear combination of activation functions. If the activation functions are also ridge functions, these networks are called explainable neural networks. In this brief paper, we first show that quantum neural networks which are based on variational quantum circuits can be written as a linear combination of ridge functions by following matrix notations. Consequently, we show that the interpretability and explainability of such quantum neural networks can be directly considered and studied as an approximation with the linear combination of ridge functions.
In Bayesian optimization, a black-box function is maximized via the use of a surrogate model. We apply distributed Thompson sampling, using a Gaussian process as a surrogate model, to approach the multi-agent Bayesian optimization problem. In our distributed Thompson sampling implementation, each agent receives sampled points from neighbors, where the communication network is encoded in a graph; each agent utilizes a Gaussian process to model the objective function. We demonstrate a theoretical bound on Bayesian Simple Regret, where the bound depends on the size of the largest complete subgraph of the communication graph. Unlike in batch Bayesian optimization, this bound is applicable in cases where the communication graph amongst agents is constrained. When compared to sequential Thompson sampling, our bound guarantees faster convergence with respect to time as long as there is a fully connected subgraph of at least two agents. We confirm the efficacy of our algorithm with numerical simulations on traditional optimization test functions, illustrating the significance of graph connectivity on improving regret convergence.
The discrete cosine transform (DCT) is a central tool for image and video coding because it can be related to the Karhunen-Lo\`eve transform (KLT), which is the optimal transform in terms of retained transform coefficients and data decorrelation. In this paper, we introduce 16-, 32-, and 64-point low-complexity DCT approximations by minimizing individually the angle between the rows of the exact DCT matrix and the matrix induced by the approximate transforms. According to some classical figures of merit, the proposed transforms outperformed the approximations for the DCT already known in the literature. Fast algorithms were also developed for the low-complexity transforms, asserting a good balance between the performance and its computational cost. Practical applications in image encoding showed the relevance of the transforms in this context. In fact, the experiments showed that the proposed transforms had better results than the known approximations in the literature for the cases of 16, 32, and 64 blocklength.
Principal component analysis (PCA) is one of the most popular dimension reduction techniques in statistics and is especially powerful when a multivariate distribution is concentrated near a lower-dimensional subspace. Multivariate extreme value distributions have turned out to provide challenges for the application of PCA since their constraint support impedes the detection of lower-dimensional structures and heavy-tails can imply that second moments do not exist, thereby preventing the application of classical variance-based techniques for PCA. We adapt PCA to max-stable distributions using a regression setting and employ max-linear maps to project the random vector to a lower-dimensional space while preserving max-stability. We also provide a characterization of those distributions which allow for a perfect reconstruction from the lower-dimensional representation. Finally, we demonstrate how an optimal projection matrix can be consistently estimated and show viability in practice with a simulation study and application to a benchmark dataset.
Motivated by information geometry, a distance function on the space of stochastic matrices is advocated. Starting with sequences of Markov chains the Bhattacharyya angle is advocated as the natural tool for comparing both short and long term Markov chain runs. Bounds on the convergence of the distance and mixing times are derived. Guided by the desire to compare different Markov chain models, especially in the setting of healthcare processes, a new distance function on the space of stochastic matrices is presented. It is a true distance measure which has a closed form and is efficient to implement for numerical evaluation. In the case of ergodic Markov chains, it is shown that considering either the Bhattacharyya angle on Markov sequences or the new stochastic matrix distance leads to the same distance between models.
Matrix perturbation bounds (such as Weyl and Davis-Kahan) are frequently used in many branches of mathematics. Most of the classical results in this area are optimal, in the worst case analysis. However, in modern applications, both the ground and the nose matrices frequently have extra structural properties. For instance, it is often assumed that the ground matrix is essentially low rank, and the nose matrix is random or pseudo-random. We aim to rebuild a part of perturbation theory, adapting to these modern assumptions. We will do this using a contour expansion argument, which enables us to exploit the skewness among the leading eigenvectors of the ground and the noise matrix (which is significant when the two are uncorrelated) to our advantage. In the current paper, we focus on the perturbation of eigenspaces. This helps us to introduce the arguments in the cleanest way, avoiding the more technical consideration of the general case. In applications, this case is also one of the most useful. More general results appear in a subsequent paper. Our method has led to several improvements, which have direct applications in central problems. Among others, we derive a sharp result for perturbation of a low rank matrix with random perturbation, answering an open question in this area. Next, we derive new results concerning the spike model, an important model in statistics, bridging two different directions of current research. Finally, we use our results on the perturbation of eigenspaces to derive new results concerning eigenvalues of deterministic and random matrices. In particular, we obtain new results concerning the outliers in the deformed Wigner model and the least singular value of random matrices with non-zero mean.
The incompressible Euler equations are an important model system in computational fluid dynamics. Fast high-order methods for the solution of this time-dependent system of partial differential equations are of particular interest: due to their exponential convergence in the polynomial degree they can make efficient use of computational resources. To address this challenge we describe a novel timestepping method which combines a hybridised Discontinuous Galerkin method for the spatial discretisation with IMEX timestepping schemes, thus achieving high-order accuracy in both space and time. The computational bottleneck is the solution of a (block-) sparse linear system to compute updates to pressure and velocity at each stage of the IMEX integrator. Following Chorin's projection approach, this update of the velocity and pressure fields is split into two stages. As a result, the hybridised equation for the implicit pressure-velocity problem is reduced to the well-known system which arises in hybridised mixed formulations of the Poisson- or diffusion problem and for which efficient multigrid preconditioners have been developed. Splitting errors can be reduced systematically by embedding this update into a preconditioned Richardson iteration. The accuracy and efficiency of the new method is demonstrated numerically for two time-dependent testcases that have been previously studied in the literature.
This paper proposes a new generalized linear model with fractional binomial distribution. Zero-inflated Poisson/negative binomial distributions are used for count data that has many zeros. To analyze the association of such a count variable with covariates, zero-inflated Poisson/negative binomial regression models are widely used. In this work, we develop a regression model with the fractional binomial distribution that can serve as an additional tool for modeling the count response variable with covariates. Data analysis results show that on some occasions, our model outperforms the existing zero-inflated regression models.