In the present paper, we examine a Crouzeix-Raviart approximation for non-linear partial differential equations having a $(p(\cdot),\delta)$-structure. We establish a medius error estimate, i.e., a best-approximation result, which holds for uniformly continuous exponents and implies a priori error estimates, which apply for H\"older continuous exponents and are optimal for Lipschitz continuous exponents. The theoretical findings are supported by numerical experiments.
Gaussian process (GP) regression is a Bayesian nonparametric method for regression and interpolation, offering a principled way of quantifying the uncertainties of predicted function values. For the quantified uncertainties to be well-calibrated, however, the covariance kernel of the GP prior has to be carefully selected. In this paper, we theoretically compare two methods for choosing the kernel in GP regression: cross-validation and maximum likelihood estimation. Focusing on the scale-parameter estimation of a Brownian motion kernel in the noiseless setting, we prove that cross-validation can yield asymptotically well-calibrated credible intervals for a broader class of ground-truth functions than maximum likelihood estimation, suggesting an advantage of the former over the latter.
We present a combination technique based on mixed differences of both spatial approximations and quadrature formulae for the stochastic variables to solve efficiently a class of Optimal Control Problems (OCPs) constrained by random partial differential equations. The method requires to solve the OCP for several low-fidelity spatial grids and quadrature formulae for the objective functional. All the computed solutions are then linearly combined to get a final approximation which, under suitable regularity assumptions, preserves the same accuracy of fine tensor product approximations, while drastically reducing the computational cost. The combination technique involves only tensor product quadrature formulae, thus the discretized OCPs preserve the convexity of the continuous OCP. Hence, the combination technique avoids the inconveniences of Multilevel Monte Carlo and/or sparse grids approaches, but remains suitable for high dimensional problems. The manuscript presents an a-priori procedure to choose the most important mixed differences and an asymptotic complexity analysis, which states that the asymptotic complexity is exclusively determined by the spatial solver. Numerical experiments validate the results.
Mixtures of factor analysers (MFA) models represent a popular tool for finding structure in data, particularly high-dimensional data. While in most applications the number of clusters, and especially the number of latent factors within clusters, is mostly fixed in advance, in the recent literature models with automatic inference on both the number of clusters and latent factors have been introduced. The automatic inference is usually done by assigning a nonparametric prior and allowing the number of clusters and factors to potentially go to infinity. The MCMC estimation is performed via an adaptive algorithm, in which the parameters associated with the redundant factors are discarded as the chain moves. While this approach has clear advantages, it also bears some significant drawbacks. Running a separate factor-analytical model for each cluster involves matrices of changing dimensions, which can make the model and programming somewhat cumbersome. In addition, discarding the parameters associated with the redundant factors could lead to a bias in estimating cluster covariance matrices. At last, identification remains problematic for infinite factor models. The current work contributes to the MFA literature by providing for the automatic inference on the number of clusters and the number of cluster-specific factors while keeping both cluster and factor dimensions finite. This allows us to avoid many of the aforementioned drawbacks of the infinite models. For the automatic inference on the cluster structure, we employ the dynamic mixture of finite mixtures (MFM) model. Automatic inference on cluster-specific factors is performed by assigning an exchangeable shrinkage process (ESP) prior to the columns of the factor loading matrices. The performance of the model is demonstrated on several benchmark data sets as well as real data applications.
In this paper, we investigate the impact of numerical instability on the reliability of sampling, density evaluation, and evidence lower bound (ELBO) estimation in variational flows. We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map -- which affects sampling -- and the numerical inverse flow map does not accurately recover the initial input -- which affects density and ELBO computations. Surprisingly though, we find that results produced by flows are often accurate enough for applications despite the presence of serious numerical instability. In this work, we treat variational flows as dynamical systems, and leverage shadowing theory to elucidate this behavior via theoretical guarantees on the error of sampling, density evaluation, and ELBO estimation. Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice.
The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas within the realm of potential actions: What if we were to choose a different course of action? These circumstances frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowledge that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy make in order to control the extrapolation. Contrary to existing approaches that use regularization on either the policy or value function, we propose an approach to explicitly bound the amount of out-of-distribution actions during training. Specifically, our method utilizes dynamic programming to decide where to extrapolate and where not to, with an upper bound on the decisions different from behavior policy. It balances between the potential for improvement from taking out-of-distribution actions and the risk of making errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed point solution to our $Q$ updating rules. Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks.
Despite the progress in medical data collection the actual burden of SARS-CoV-2 remains unknown due to under-ascertainment of cases. This was apparent in the acute phase of the pandemic and the use of reported deaths has been pointed out as a more reliable source of information, likely less prone to under-reporting. Since daily deaths occur from past infections weighted by their probability of death, one may infer the total number of infections accounting for their age distribution, using the data on reported deaths. We adopt this framework and assume that the dynamics generating the total number of infections can be described by a continuous time transmission model expressed through a system of non-linear ordinary differential equations where the transmission rate is modelled as a diffusion process allowing to reveal both the effect of control strategies and the changes in individuals behavior. We develop this flexible Bayesian tool in Stan and study 3 pairs of European countries, estimating the time-varying reproduction number($R_t$) as well as the true cumulative number of infected individuals. As we estimate the true number of infections we offer a more accurate estimate of $R_t$. We also provide an estimate of the daily reporting ratio and discuss the effects of changes in mobility and testing on the inferred quantities.
Consider an empirical measure $\mathbb{P}_n$ induced by $n$ iid samples from a $d$-dimensional $K$-subgaussian distribution $\mathbb{P}$ and let $\gamma = \mathcal{N}(0,\sigma^2 I_d)$ be the isotropic Gaussian measure. We study the speed of convergence of the smoothed Wasserstein distance $W_2(\mathbb{P}_n * \gamma, \mathbb{P}*\gamma) = n^{-\alpha + o(1)}$ with $*$ being the convolution of measures. For $K<\sigma$ and in any dimension $d\ge 1$ we show that $\alpha = {1\over2}$. For $K>\sigma$ in dimension $d=1$ we show that the rate is slower and is given by $\alpha = {(\sigma^2 + K^2)^2\over 4 (\sigma^4 + K^4)} < 1/2$. This resolves several open problems in \cite{goldfeld2020convergence}, and in particular precisely identifies the amount of smoothing $\sigma$ needed to obtain a parametric rate. In addition, we also establish that $D_{KL}(\mathbb{P}_n * \gamma \|\mathbb{P}*\gamma)$ has rate $O(1/n)$ for $K<\sigma$ but only slows down to $O({(\log n)^{d+1}\over n})$ for $K>\sigma$. The surprising difference of the behavior of $W_2^2$ and KL implies the failure of $T_{2}$-transportation inequality when $\sigma < K$. Consequently, the requirement $K<\sigma$ is necessary for validity of the log-Sobolev inequality (LSI) for the Gaussian mixture $\mathbb{P} * \mathcal{N}(0, \sigma^{2})$, closing an open problem in \cite{wang2016functional}, who established the LSI under precisely this condition.
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration (FQI), suggest a $O(1/\sqrt{n})$ convergence for regret, empirical behavior exhibits \emph{much} faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for the regret convergence. First, we show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate, thus speeding it up. The level of exponentiation depends on the level of noise in the \emph{decision-making} problem, rather than the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply $O(1/n)$ regret rates in linear cases and $\exp(-\Omega(n))$ regret rates in tabular cases. We extend our findings to general function approximation by extending our results to regret guarantees based on $L_p$-convergence rates for estimating $Q^*$ rather than pointwise rates, where $L_2$ guarantees for nonparametric $Q^*$-estimation can be ensured under mild conditions.
We consider the estimation of factor model-based variance-covariance matrix when the factor loading matrix is assumed sparse. To do so, we rely on a system of penalized estimating functions to account for the identification issue of the factor loading matrix while fostering sparsity in potentially all its entries. We prove the oracle property of the penalized estimator for the factor model when the dimension is fixed. That is, the penalization procedure can recover the true sparse support, and the estimator is asymptotically normally distributed. Consistency and recovery of the true zero entries are established when the number of parameters is diverging. These theoretical results are supported by simulation experiments, and the relevance of the proposed method is illustrated by an application to portfolio allocation.
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.