We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime where the gaps between arm means scale as $1/\sqrt{n}$ with the time horizon $n$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and random ODEs, and that, as $n \to \infty$, these dynamics converge weakly to solutions of the corresponding SDEs and random ODEs. (Recently, Wager and Xu (arXiv:2101.09855) independently proposed this regime and developed similar SDE and random ODE approximations for Thompson sampling in the multi-armed bandit setting.) Our weak convergence theory, which covers both multi-armed and linear bandit settings, is developed from first principles using the Continuous Mapping Theorem and can be adapted directly to analyze other sampling-based bandit algorithms, for example, algorithms using the bootstrap for exploration. We also establish an invariance principle for multi-armed bandits with gaps scaling as $1/\sqrt{n}$ -- for Thompson sampling and related algorithms involving posterior approximation or the bootstrap, the weak diffusion limits are in general the same regardless of the specifics of the reward distributions or the choice of prior. In particular, as suggested by the classical Bernstein-von Mises normal approximation for posterior distributions, the weak diffusion limits generally coincide with the limit for normally distributed rewards and priors.
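To make the scaling regime above concrete, here is a minimal simulation sketch (not the paper's code; the two-armed setup, flat Gaussian prior, and function name are illustrative assumptions). It runs Gaussian Thompson sampling over horizon $n$ with arm-mean gap $\Delta/\sqrt{n}$ and records the sample path of arm-allocation fractions, the kind of object whose weak limit is characterized above.

```python
import numpy as np

def thompson_path(n, delta=1.0, sigma=1.0, seed=0):
    """Gaussian Thompson sampling over horizon n with arm means (0, delta/sqrt(n)).

    Returns the path t -> fraction of pulls of arm 1 up to time t."""
    rng = np.random.default_rng(seed)
    means = np.array([0.0, delta / np.sqrt(n)])
    counts = np.zeros(2)   # pulls per arm
    sums = np.zeros(2)     # reward sums per arm
    frac = np.empty(n)
    for t in range(n):
        # Posterior under a flat prior: N(sample mean, sigma^2 / count).
        post_mean = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
        post_std = sigma / np.sqrt(np.maximum(counts, 1))
        # Pull each arm once first, then sample from the posteriors.
        arm = int(np.argmax(rng.normal(post_mean, post_std))) if counts.min() > 0 else t % 2
        counts[arm] += 1
        sums[arm] += rng.normal(means[arm], sigma)
        frac[t] = counts[1] / (t + 1)
    return frac
```

Comparing the paths returned for increasingly large $n$ (after rescaling time to $[0,1]$) gives a quick empirical view of the diffusion-limit behavior.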
Throughput is a central performance objective in communication networks. This paper considers a fundamental maximum throughput routing problem -- the all-or-nothing multicommodity flow (ANF) problem -- in arbitrary directed graphs, in the practically relevant but challenging setting where demands can be (much) larger than the edge capacities. Hence, in addition to assigning requests to valid flows for each routed commodity, an admission control mechanism is required to prevent overloading the network when routing commodities. We make several contributions. On the theoretical side, we obtain substantially improved bi-criteria approximation algorithms for this NP-hard problem. We present two non-trivial linear programming relaxations and show how to convert their fractional solutions into integer solutions via randomized rounding. One is an exponential-size formulation (solvable in polynomial time using a separation oracle) that takes a ``packing'' view and allows a more flexible approach, while the other is a generalization of the compact LP formulation of Liu et al. (INFOCOM'19) that can be solved directly with standard LP solvers. We obtain a polynomial-time randomized algorithm that yields an arbitrarily good approximation of the weighted throughput while violating the edge capacity constraints by only a small multiplicative factor. We also describe a deterministic rounding algorithm obtained by derandomization, using the method of pessimistic estimators. We complement our theoretical results with a proof-of-concept empirical evaluation.
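As a schematic of the rounding step mentioned above (a hedged sketch; the data structures, admission rule, and function name are expository assumptions, not the paper's exact algorithm): given per-path fractional LP weights for each commodity, admit a commodity with probability equal to its total fractional value, and if admitted route its full demand, all-or-nothing, on a single path drawn proportionally to the weights.

```python
import numpy as np

def round_solution(candidates, weights, demand, rng=None):
    """Randomized rounding sketch for all-or-nothing routing.

    candidates[k]: list of candidate paths (tuples of edges) for commodity k.
    weights[k]: fractional LP weights for those paths, summing to y_k <= 1.
    demand[k]: demand of commodity k, routed entirely or not at all."""
    rng = np.random.default_rng(rng)
    routed, load = {}, {}
    for k, (paths, w) in enumerate(zip(candidates, weights)):
        w = np.asarray(w, dtype=float)
        y = w.sum()
        if rng.random() < y:                      # admit commodity k w.p. y_k
            path = paths[rng.choice(len(paths), p=w / y)]
            routed[k] = path
            for e in path:                        # accumulate edge load
                load[e] = load.get(e, 0.0) + demand[k]
    return routed, load
```

Chernoff-type concentration arguments then bound the probability that any edge load exceeds its capacity by more than a small multiplicative factor, and the method of pessimistic estimators derandomizes the choice.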
We study the problem of approximating the eigenspectrum of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $A$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde{O}(\frac{1}{\epsilon^4}) \times \tilde O(\frac{1}{\epsilon^4})$ principal submatrix. Our result can be viewed as a concentration bound on the full eigenspectrum of a random principal submatrix. It significantly extends existing work which shows concentration of just the spectral norm [Tro08]. It also extends work on sublinear time algorithms for testing the presence of large negative eigenvalues in the spectrum [BCJ20]. To complement our theoretical results, we provide numerical simulations, which demonstrate the effectiveness of our algorithm in approximating the eigenvalues of a wide range of matrices.
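A minimal numpy sketch of the sampling scheme just described (illustrative; uniform sampling without replacement, rescaling by $n/k$, and zero-padding the unsampled part of the spectrum):

```python
import numpy as np

def approx_eigenvalues(A, k, rng=None):
    """Estimate all n eigenvalues of a symmetric matrix A with bounded
    entries from a uniformly random k x k principal submatrix."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    idx = rng.choice(n, size=k, replace=False)       # random principal submatrix
    sub_eigs = np.linalg.eigvalsh(A[np.ix_(idx, idx)])
    est = np.zeros(n)                                # middle of the spectrum -> 0
    est[:k] = (n / k) * sub_eigs                     # rescale the sampled spectrum
    return np.sort(est)                              # align: negative, ~0, positive
```

With $k = \tilde{O}(1/\epsilon^4)$ this matches the additive $\pm \epsilon n$ guarantee stated above, while only ever reading a $k \times k$ block of $A$.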
A general adaptive refinement strategy for solving linear elliptic partial differential equations with random data is proposed and analysed herein. The adaptive strategy extends the a posteriori error estimation framework introduced by Guignard and Nobile in 2018 (SIAM J. Numer. Anal., 56, 3121--3143) to cover problems with a nonaffine parametric coefficient dependence. A suboptimal, but nonetheless reliable and convenient, implementation of the strategy involves approximating the decoupled PDE problems with a common finite element approximation space. Computational results obtained using such a single-level strategy are presented in this paper (part I). Results obtained using a potentially more efficient multilevel approximation strategy, where meshes are individually tailored, will be discussed in part II of this work. The codes used to generate the numerical results are available online.
We develop and analyse numerical schemes for approximately solving the backward diffusion-wave problem, which involves a fractional derivative in time of order $\alpha\in(1,2)$. From terminal observations at two time levels, i.e., $u(T_1)$ and $u(T_2)$, we simultaneously recover two initial data $u(0)$ and $u_t(0)$, and hence the solution $u(t)$ for all $t > 0$. First, the existence, uniqueness and Lipschitz stability of the backward diffusion-wave problem are established under suitable conditions on $T_1$ and $T_2$. Moreover, for noisy data, we propose a quasi-boundary value scheme to regularize this "mildly" ill-posed problem, and show the convergence of the regularized solution. Next, to solve the regularized problem numerically, a fully discrete scheme is proposed by applying the finite element method in space and convolution quadrature in time. We establish error bounds for the discrete solution in both the smooth and nonsmooth data cases. The error estimate is very useful in practice, since it indicates how to choose the discretization parameters and the regularization parameter according to the noise level. The theoretical results are supported by numerical experiments.
We consider the question of sequential prediction under the log-loss in terms of cumulative regret. Namely, given a hypothesis class of distributions, the learner sequentially predicts the (distribution of the) next letter in the sequence, and its performance is compared to the baseline of the best constant predictor from the hypothesis class. The well-specified case corresponds to the additional assumption that the data-generating distribution belongs to the hypothesis class as well. Here we present results for the more general misspecified case. Due to special properties of the log-loss, the same problem arises in the context of competitive optimality in density estimation and model selection. For the $d$-dimensional Gaussian location hypothesis class, we show that the cumulative regrets in the well-specified and misspecified cases asymptotically coincide. In other words, we provide an $o(1)$ characterization of the distribution-free (or PAC) regret in this case -- the first such result as far as we know. We recall that the worst-case (or individual-sequence) regret in this case is larger by an additive constant ${d\over 2} + o(1)$. Surprisingly, neither traditional Bayesian estimators nor Shtarkov's normalized maximum likelihood achieves the PAC regret, and our estimator requires a special "robustification" against heavy-tailed data. In addition, we show two general results for the misspecified regret: the existence and uniqueness of the optimal estimator, and a bound sandwiching the misspecified regret between the well-specified regrets of (asymptotically) close hypothesis classes.
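For intuition, the following small sketch (assuming known unit variance, $d = 1$, and a plain plug-in running-mean predictor, not the paper's robustified estimator) computes the cumulative log-loss regret against the best constant Gaussian location hypothesis in hindsight:

```python
import numpy as np

def gaussian_logloss(x, mu, sigma=1.0):
    """Log-loss (negative log density) of predicting N(mu, sigma^2) for x."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

def cumulative_regret(x, sigma=1.0):
    """Regret of the sequential plug-in predictor: at time t it predicts
    N(mean of x[:t], sigma^2), with an initial guess of 0."""
    x = np.asarray(x, dtype=float)
    mu_hat = np.concatenate(([0.0], np.cumsum(x)[:-1] / np.arange(1, x.size)))
    seq_loss = gaussian_logloss(x, mu_hat, sigma).sum()
    best_loss = gaussian_logloss(x, x.mean(), sigma).sum()  # best constant in hindsight
    return seq_loss - best_loss
```

Running this on heavy-tailed inputs gives a feel for the behavior that motivates the "robustification" against heavy-tailed data mentioned above.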
Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst with great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it vanishes under standard maximum likelihood asymptotics, for any choice of test statistic. We analyze our proposed procedure theoretically and apply it in simulation to a number of models of interest, demonstrating its finite-sample Type I error and power.
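To illustrate the CSS baseline that aCSS generalizes, here is a minimal Monte Carlo sketch for the Gaussian location null, where the sample mean is an exact, compact sufficient statistic (this shows plain CSS, not the aCSS extension; the function name and defaults are illustrative):

```python
import numpy as np

def css_pvalue(x, stat, B=999, sigma=1.0, rng=None):
    """Monte Carlo GoF p-value via co-sufficient sampling for the null
    N(theta, sigma^2) with theta unknown: copies are drawn from the
    conditional law of the data given the sample mean, which is free of theta."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    t_obs = stat(x)
    count = 1                                  # include the observed dataset
    for _ in range(B):
        z = rng.normal(0.0, sigma, x.size)
        z = z - z.mean() + x.mean()            # enforce the observed sufficient statistic
        count += int(stat(z) >= t_obs)
    return count / (B + 1)
```

Any test statistic works here, e.g. an empirical skewness stat = lambda v: abs(((v - v.mean()) ** 3).mean()); validity comes from the conditioning, not from the choice of statistic. aCSS replaces this exact conditioning with conditioning on an (appropriately perturbed) efficient estimator when no compact sufficient statistic exists.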
In this work, we study an inverse problem of recovering a space-time dependent diffusion coefficient in the subdiffusion model from the distributed observation, where the mathematical model involves a Djrbashian-Caputo fractional derivative of order $\alpha\in(0,1)$ in time. The main technical challenges of both theoretical and numerical analysis lie in the limited smoothing properties due to the fractional differential operator and the high degree of nonlinearity of the forward map from the unknown diffusion coefficient to the distributed observation. Theoretically, we establish two conditional stability results using a novel test function, which leads to a stability bound in $L^2(0,T;L^2(\Omega))$ under a suitable positivity condition. The positivity condition is verified for a large class of problem data. Numerically, we develop a rigorous procedure for the recovery of the diffusion coefficient based on a regularized least-squares formulation, which is then discretized by the standard Galerkin method with continuous piecewise linear elements in space and backward Euler convolution quadrature in time. We provide a complete error analysis of the fully discrete formulation, by combining several new error estimates for the direct problem (optimal in terms of data regularity), a discrete version of fractional maximal $L^p$ regularity, and a nonstandard energy argument. Under the positivity condition, we obtain a standard $L^2(0,T; L^2(\Omega))$ error estimate consistent with the conditional stability. Further, we illustrate the analysis with some numerical examples.
This paper considers the initial value problem for general nonlinear stochastic fractional integro-differential equations with weakly singular kernels. Particular effort is devoted to establishing fine estimates that cover all cases of Abel-type singular kernels. First, the existence, uniqueness and continuous dependence on the initial value of the true solution are derived in detail under a local Lipschitz condition and a linear growth condition. Second, the Euler--Maruyama method is developed for solving the equation numerically, and its strong convergence is proven under the same conditions as the well-posedness. Moreover, we obtain the precise convergence rate of this method under a global Lipschitz condition and a linear growth condition. In particular, the Euler--Maruyama method achieves strong first-order superconvergence when $\alpha = 1$. Finally, several numerical tests are reported to verify the theoretical findings.
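The following scalar sketch (illustrative; it assumes vectorized drift $f$ and diffusion $g$ and uses left-point quadrature for the kernel) shows the shape of the Euler--Maruyama scheme for a stochastic Volterra equation with Abel-type kernel $(t-s)^{\alpha-1}$, which reduces to the classical scheme when $\alpha = 1$:

```python
import numpy as np

def em_singular(x0, f, g, alpha, T, N, rng=None):
    """Euler--Maruyama sketch for
        X(t) = x0 + int_0^t (t-s)^(alpha-1) f(X(s)) ds
                  + int_0^t (t-s)^(alpha-1) g(X(s)) dW(s),
    on a uniform grid with N steps; f and g must accept numpy arrays."""
    rng = np.random.default_rng(rng)
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    dW = rng.normal(0.0, np.sqrt(h), N)          # Brownian increments
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(1, N + 1):
        ker = (t[n] - t[:n]) ** (alpha - 1)      # kernel evaluated away from the singularity
        X[n] = (x0
                + h * np.dot(ker, f(X[:n]))        # drift: left-point quadrature
                + np.dot(ker, g(X[:n]) * dW[:n]))  # diffusion: Ito increments
    return t, X
```

For example, em_singular(1.0, lambda x: -x, lambda x: 0.1 * np.ones_like(x), 0.75, 1.0, 500) integrates a linear-drift test equation with additive-type noise.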
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic Cox--Ingersoll--Ross (CIR) process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
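The property exploited above is that the CIR diffusion has a known transition law (a scaled noncentral chi-squared), so it can be sampled exactly rather than via an Euler-type scheme. A minimal sketch of that exact transition (parameters fixed for illustration; in the SGMCMC setting they would be driven by stochastic-gradient estimates at each step):

```python
import numpy as np

def cir_exact_step(x, h, a, b, sigma, rng):
    """One exact transition of dx = a(b - x) dt + sigma * sqrt(x) dW
    over a step of length h, with no time-discretisation error."""
    c = sigma**2 * (1.0 - np.exp(-a * h)) / (4.0 * a)
    df = 4.0 * a * b / sigma**2            # degrees of freedom
    nc = x * np.exp(-a * h) / c            # noncentrality parameter
    return c * rng.noncentral_chisquare(df, nc)
```

Iterating cir_exact_step keeps every sample nonnegative by construction, which is precisely what breaks down for discretised Langevin dynamics near the sparse boundary of the simplex.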
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
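A minimal numpy sketch of the sketch-and-solve template described above (the Gaussian test matrices, the oversampling choice $s = 2r + 1$, and the least-squares reconstruction are illustrative assumptions, not necessarily the authors' exact parameter recommendations):

```python
import numpy as np

def sketch_low_rank(A, r, s=None, rng=None):
    """Low-rank approximation A ~ Q @ B (rank <= r) built only from two
    random linear sketches of A."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    s = s if s is not None else 2 * r + 1
    Omega = rng.standard_normal((n, r))          # range test matrix
    Psi = rng.standard_normal((s, m))            # co-range test matrix
    Y = A @ Omega                                # range sketch
    W = Psi @ A                                  # co-range sketch
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range
    B = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]
    return Q, B
```

After the sketches Y and W are formed, A itself is no longer needed, so the reconstruction works in a single pass or streaming setting; the paper's structure-preserving variants refine this basic template, e.g., to guarantee a positive-semidefinite approximation.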