The Bayesian inference is widely used in many scientific and engineering problems, especially in the linear inverse problems in infinite-dimensional setting where the unknowns are functions. In such problems, choosing an appropriate prior distribution is an important task. In particular, when the function to infer has much detail information, such as many sharp jumps, corners, and the discontinuous and nonsmooth oscillation, the so-called total variation-Gaussian (TG) prior is proposed in function space to address it. However, the TG prior is easy to lead the blocky (staircase) effect in numerical results. In this work, we present a fractional order-TG (FTG) hybrid prior to deal with such problems, where the fractional order total variation (FTV) term is used to capture the detail information of the unknowns and simultaneously uses the Gaussian measure to ensure that it results in a well-defined posterior measure. For the numerical implementations of linear inverse problems in function spaces, we also propose an efficient independence sampler based on a transport map, which uses a proposal distribution derived from a diagonal map, and the acceptance probability associated to the proposal is independent of discretization dimensionality. And in order to take full advantage of the transport map, the hierarchical Bayesian framework is applied to flexibly determine the regularization parameter. Finally we provide some numerical examples to demonstrate the performance of the FTG prior and the efficiency and robustness of the proposed independence sampler method.
We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a practically relevant batch-centric scenario of batch learning. That is to say, we provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy. Our main theoretical results show that the impact of batch learning can be measured proportional to the regret of online behavior. Primarily, we study two settings of the problem: instance-independent and instance-dependent. While the upper bound is the same for both settings, the worst-case lower bound is more comprehensive in the former case and more accurate in the latter one. Also, we provide a more robust result for the 2-armed bandit problem as an important insight. Finally, we demonstrate the consistency of theoretical results by conducting empirical experiments and reflect on the optimal batch size choice.
We consider constrained partial differential equations of hyperbolic type with a small parameter $\varepsilon>0$, which turn parabolic in the limit case, i.e., for $\varepsilon=0$. The well-posedness of the resulting systems is discussed and the corresponding solutions are compared in terms of the parameter $\varepsilon$. For the analysis, we consider the system equations as partial differential-algebraic equation based on the variational formulation of the problem. For a particular choice of the initial data, we reach first- and second-order estimates. For general initial data, lower-order estimates are proven and their optimality is shown numerically.
In recent years, large-scale Bayesian learning draws a great deal of attention. However, in big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it is observed that large-scale datasets usually own rich internal structure and is somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior via exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets and construct an \emph{approximate posterior} according to the clustering information. Fortunately, the clustering structure can be efficiently obtained via a particular clustering algorithm. When constructing the approximate posterior, the data points in the same cluster are all replaced by the centroid of the cluster. As a result, the posterior can be significantly simplified. Theoretically, we show that under certain conditions the approximate posterior we construct is close (measured by KL divergence) to the exact posterior. Furthermore, thorough experiments are conducted to validate the fact that the constructed posterior is a good approximation to the true posterior and much easier to sample from.
Within the framework of parameter dependent PDEs, we develop a constructive approach based on Deep Neural Networks for the efficient approximation of the parameter-to-solution map. The research is motivated by the limitations and drawbacks of state-of-the-art algorithms, such as the Reduced Basis method, when addressing problems that show a slow decay in the Kolmogorov n-width. Our work is based on the use of deep autoencoders, which we employ for encoding and decoding a high fidelity approximation of the solution manifold. To provide guidelines for the design of deep autoencoders, we consider a nonlinear version of the Kolmogorov n-width over which we base the concept of a minimal latent dimension. We show that the latter is intimately related to the topological properties of the solution manifold, and we provide theoretical results with particular emphasis on second order elliptic PDEs, characterizing the minimal dimension and the approximation errors of the proposed approach. The theory presented is further supported by numerical experiments, where we compare the proposed approach with classical POD-Galerkin reduced order models. In particular, we consider parametrized advection-diffusion PDEs, and we test the methodology in the presence of strong transport fields, singular terms and stochastic coefficients.
It is often the case in Statistics that one needs to compute sums of infinite series, especially in marginalising over discrete latent variables. This has become more relevant with the popularization of gradient-based techniques (e.g. Hamiltonian Monte Carlo) in the Bayesian inference context, for which discrete latent variables are hard or impossible to deal with. For many commonly used infinite series, custom algorithms have been developed which exploit specific features of each problem. General techniques, suitable for a large class of problems with limited input from the user are less established. We employ basic results from the theory of infinite series to investigate general, problem-agnostic algorithms to truncate infinite sums within an arbitrary tolerance $\varepsilon > 0$ and provide robust computational implementations with provable guarantees. We compare three tentative solutions to estimating the infinite sum of interest: (i) a "naive" approach that sums terms until the terms are below the threshold $\varepsilon$; (ii) a `bounding pair' strategy based on trapping the true value between two partial sums; and (iii) a `batch' strategy that computes the partial sums in regular intervals and stops when their difference is less than $\varepsilon$. We show under which conditions each strategy guarantees the truncated sum is within the required tolerance and compare the error achieved by each approach, as well as the number of function evaluations necessary for each one. A detailed discussion of numerical issues in practical implementations is also provided. The paper provides some theoretical discussion of a variety of statistical applications, including raw and factorial moments and count models with observation error. Finally, detailed illustrations in the form noisy MCMC for Bayesian inference and maximum marginal likelihood estimation are presented.
Assume that we observe i.i.d.~points lying close to some unknown $d$-dimensional $\mathcal{C}^k$ submanifold $M$ in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses ($L_p$, Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on $d$ and the regularity $s\leq k-1$ of the underlying density, but not on the ambient dimension. In particular, we show that the estimator is minimax and matches previous rates in the literature in the case where the manifold $M$ is a $d$-dimensional cube. The related problem of the estimation of the volume measure of $M$ for the Wasserstein loss is also considered, for which a minimax estimator is exhibited.
We study full Bayesian procedures for high-dimensional linear regression. We adopt data-dependent empirical priors introduced in [1]. In their paper, these priors have nice posterior contraction properties and are easy to compute. Our paper extend their theoretical results to the case of unknown error variance . Under proper sparsity assumption, we achieve model selection consistency, posterior contraction rates as well as Bernstein von-Mises theorem by analyzing multivariate t-distribution.
We prove that the stack-number of the strong product of three $n$-vertex paths is $\Theta(n^{1/3})$. The best previously known upper bound was $O(n)$. No non-trivial lower bound was known. This is the first explicit example of a graph family with bounded maximum degree and unbounded stack-number. The main tool used in our proof of the lower bound is the topological overlap theorem of Gromov. We actually prove a stronger result in terms of so-called triangulations of Cartesian products. We conclude that triangulations of three-dimensional Cartesian products of any sufficiently large connected graphs have large stack-number. The upper bound is a special case of a more general construction based on families of permutations derived from Hadamard matrices. The strong product of three paths is also the first example of a bounded degree graph with bounded queue-number and unbounded stack-number. A natural question that follows from our result is to determine the smallest $\Delta_0$ such that there exist a graph family with unbounded stack-number, bounded queue-number and maximum degree $\Delta_0$. We show that $\Delta_0\in \{6,7\}$.
In the absence of governing equations, dimensional analysis is a robust technique for extracting insights and finding symmetries in physical systems. Given measurement variables and parameters, the Buckingham Pi theorem provides a procedure for finding a set of dimensionless groups that spans the solution space, although this set is not unique. We propose an automated approach using the symmetric and self-similar structure of available measurement data to discover the dimensionless groups that best collapse this data to a lower dimensional space according to an optimal fit. We develop three data-driven techniques that use the Buckingham Pi theorem as a constraint: (i) a constrained optimization problem with a non-parametric input-output fitting function, (ii) a deep learning algorithm (BuckiNet) that projects the input parameter space to a lower dimension in the first layer, and (iii) a technique based on sparse identification of nonlinear dynamics (SINDy) to discover dimensionless equations whose coefficients parameterize the dynamics. We explore the accuracy, robustness and computational complexity of these methods as applied to three example problems: a bead on a rotating hoop, a laminar boundary layer, and Rayleigh-B\'enard convection.
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fr\'echet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms, our algorithm is especially efficient when the length of the curves is bounded.