We provide finite sample bounds on the Normal approximation to the law of the least squares estimator of the projection parameters normalized by the sandwich-based standard errors. Our results hold in the increasing dimension setting and under minimal assumptions on the data generating distribution. In particular, we do not assume a linear regression function and only require the existence of finitely many moments for the response and the covariates. Furthermore, we construct confidence sets for the projection parameters in the form of hyper-rectangles and establish finite sample bounds on their coverage and accuracy. We derive analogous results for partial correlations among the entries of sub-Gaussian vectors.
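As a point of reference for the normalization above, the classical sandwich (HC0) covariance estimator has a simple closed form. The following minimal sketch in Python (numpy only; the function name and data arrays are hypothetical, and this is not the paper's exact construction) computes it:

\begin{verbatim}
import numpy as np

def ols_with_sandwich_se(X, y):
    """OLS estimate of the projection parameters with sandwich (HC0)
    standard errors: V = (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta                     # residuals e_i
    meat = X.T @ (resid[:, None] ** 2 * X)   # X' diag(e_i^2) X
    V = XtX_inv @ meat @ XtX_inv             # sandwich covariance
    return beta, np.sqrt(np.diag(V))         # estimates, standard errors
\end{verbatim}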
Various natural phenomena exhibit spatial extremal dependence at short distances only, while it usually vanishes as the distance between sites increases. However, the models proposed in the literature for spatial extremes, whether based on max-stable or Pareto processes or on comparatively less computationally demanding ``sub-asymptotic'' Gaussian location and/or scale mixtures, generally assume that spatial extremal dependence persists across the entire spatial domain. This is a clear limitation when modeling extremes over large geographical domains, but it has, surprisingly, been mostly overlooked in the literature. In this paper, we develop a more realistic Bayesian framework based on a novel Gaussian scale mixture model, where the Gaussian process component is defined by a stochastic partial differential equation that yields a sparse precision matrix, and the random scale component is modeled as a low-rank Pareto-tailed or Weibull-tailed spatial process determined by compactly supported basis functions. We show that our proposed model is approximately tail-stationary despite its non-stationary construction in terms of basis functions, and we demonstrate that it can capture a wide range of extremal dependence structures as a function of distance. Furthermore, the inherently sparse structure of our spatial model allows fast Bayesian computations, even in high spatial dimensions, based on a customized Markov chain Monte Carlo algorithm that prioritizes calibration in the tail. In our application, we fit our model to heavy monsoon rainfall data from Bangladesh. Our study indicates that the proposed model outperforms some natural alternatives and fits the precipitation extremes satisfactorily. Finally, we use the fitted model to draw inferences on long-term return levels for marginal precipitation at each site and for spatial aggregates.
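Schematically, a spatially varying Gaussian scale mixture of the kind described above can be written as
\[
X(s) \;=\; R(s)\, W(s), \qquad R(s) \;=\; f\!\Big(\sum_{k=1}^{K} \phi_k(s)\, Z_k\Big),
\]
where $W$ is the SPDE-based Gaussian component with sparse precision matrix, $\{\phi_k\}$ are the compactly supported basis functions, and the low-rank variables $Z_k$ together with the link $f$ induce the Pareto- or Weibull-type tails of the scale. This display is only illustrative; the exact specification is given in the paper.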
$k$-center is one of the most popular clustering models. While it admits a simple 2-approximation in polynomial time in general metrics, the Euclidean version is NP-hard to approximate within a factor of 1.93, even in the plane, if one insists that the dependence on $k$ in the running time be polynomial. Without this restriction, a classic algorithm yields a $2^{O((k\log k)/{\epsilon})}dn$-time $(1+\epsilon)$-approximation for Euclidean $k$-center, where $d$ is the dimension. We give a faster algorithm for small dimensions: roughly speaking, an $O^*(2^{O((1/\epsilon)^{O(d)} \cdot k^{1-1/d} \cdot \log k)})$-time $(1+\epsilon)$-approximation. In particular, the running time is roughly $O^*(2^{O((1/\epsilon)^{O(1)}\sqrt{k}\log k)})$ in the plane. We complement our algorithmic result with a matching hardness lower bound. We also consider a well-studied generalization of $k$-center, called Non-uniform $k$-center (NUkC), where clusters may have different radii. NUkC is NP-hard to approximate within any factor, even in the Euclidean case. We design a $2^{O(k\log k)}n^2$-time $3$-approximation for NUkC in general metrics, and a $2^{O((k\log k)/\epsilon)}dn$-time $(1+\epsilon)$-approximation for Euclidean NUkC. The latter time bound matches the bound for $k$-center.
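For context, the simple 2-approximation for general metrics mentioned above is the classical farthest-first traversal of Gonzalez. A minimal sketch (assuming points in Euclidean space for concreteness):

\begin{verbatim}
import numpy as np

def k_center_2approx(points, k):
    """Gonzalez's farthest-first traversal: repeatedly add the point
    farthest from the current centers; the resulting radius is at most
    twice the optimal k-center radius."""
    centers = [0]  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())  # center indices, covering radius
\end{verbatim}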
We give new polynomial lower bounds for a number of dynamic measure problems in computational geometry. These lower bounds hold in the Word-RAM model, conditioned on the hardness of either the 3SUM problem or the Online Matrix-Vector Multiplication problem [Henzinger et al., STOC 2015]. In particular, we get lower bounds in the incremental and fully-dynamic settings for counting maximal or extremal points in $\mathbb{R}^3$, different variants of Klee's Measure Problem, problems related to finding the largest empty disk in a set of points, and querying the size of the $i$-th convex layer in a planar set of points. While many conditional lower bounds for dynamic data structures have been proven since the seminal work of Patrascu [STOC 2010], few of them relate to computational geometry problems. This is the first paper focusing on this topic. The problems we consider can all be solved in $O(n \log n)$ time in the static case, and their dynamic versions have mostly been approached from the perspective of improving known upper bounds. One exception to this is Klee's measure problem in $\mathbb{R}^2$, for which Chan [CGTA 2010] gave an unconditional $\Omega(\sqrt{n})$ lower bound on the worst-case update time. By a similar approach, we show that this also holds for an important special case of Klee's measure problem in $\mathbb{R}^3$ known as the Hypervolume Indicator problem.
The noncentral Wishart distribution has become more mainstream in statistics as the prevalence of applications involving sample covariances with underlying multivariate Gaussian populations has dramatically increased since the advent of computers. Multiple sources in the literature deal with local approximations of the noncentral Wishart distribution with respect to its central counterpart. However, no source has yet developed explicit local approximations for the (central) Wishart distribution in terms of a normal analogue, which is important since Gaussian distributions are at the heart of the asymptotic theory for many statistical methods. In this paper, we prove a precise asymptotic expansion for the ratio of the Wishart density to the symmetric matrix-variate normal density with the same mean and covariances. The result is then used to derive an upper bound on the total variation between the corresponding probability measures and to find the pointwise variance of a new density estimator on the space of positive definite matrices with a Wishart asymmetric kernel. For the sake of completeness, we also find expressions for the pointwise bias of our new estimator and for its pointwise variance as we move towards the boundary of its support, compute the mean squared error and the mean integrated squared error away from the boundary, and prove its asymptotic normality.
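For reference, the (central) Wishart density being compared to its normal analogue is, for a positive definite $S$ and $W_d(n,\Sigma)$ with $n > d - 1$,
\[
f(S) \;=\; \frac{|S|^{(n-d-1)/2} \exp\!\big(-\tfrac12 \operatorname{tr}(\Sigma^{-1} S)\big)}{2^{nd/2}\, |\Sigma|^{n/2}\, \Gamma_d(n/2)},
\]
where $\Gamma_d$ denotes the multivariate gamma function.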
Modern high-dimensional methods often adopt the ``bet on sparsity'' principle, yet in supervised multivariate learning statisticians may face ``dense'' problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. We present new information-theoretic limits that reveal the intrinsic cost of seeking clusters, as well as the blessing of dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed, which performs subspace learning and clustering with guaranteed convergence. The obtained fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. In addition, a new kind of information criterion, as well as its scale-free form, is proposed for cluster and rank selection, and it has rigorous theoretical support without assuming an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.
Empirical likelihood enables a nonparametric, likelihood-driven style of inference without the restrictive assumptions routinely made in parametric models. We develop a framework for applying empirical likelihood to the analysis of experimental designs, addressing issues that arise from blocking and multiple hypothesis testing. In addition to popular designs such as balanced incomplete block designs, our approach allows for highly unbalanced, incomplete block designs. For all of these designs, we derive an asymptotic multivariate chi-square distribution for a set of empirical likelihood test statistics. Further, we propose two single-step multiple testing procedures: asymptotic Monte Carlo and nonparametric bootstrap. Both procedures asymptotically control the generalized family-wise error rate and efficiently construct simultaneous confidence intervals for comparisons of interest without explicitly considering the underlying covariance structure. A simulation study demonstrates that the performance of the procedures is robust to violations of standard assumptions of linear mixed models. Notably, despite the asymptotic nature of empirical likelihood, the nonparametric bootstrap procedure performs well even for small sample sizes. We also present an application to experiments on a pesticide. Supplementary materials for this article are available online.
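For readers less familiar with the methodology, the basic one-sample empirical likelihood ratio in Owen's formulation is
\[
\mathcal{R}(\mu) \;=\; \max\Big\{ \prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu \Big\},
\]
with $-2\log \mathcal{R}(\mu_0)$ asymptotically chi-square under the null; the framework above extends such statistics to (possibly highly unbalanced) block designs, where a vector of them is jointly asymptotically multivariate chi-square.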
In neuroscience, the distribution of a decision time is modelled by means of a one-dimensional Fokker--Planck equation with time-dependent boundaries and space-time-dependent drift. Efficient approximation of the solution to this equation is required, e.g., for model evaluation and parameter fitting. However, the prescribed boundary conditions lead to a strong singularity and thus to slow convergence of numerical approximations. In this article, we demonstrate that the solution can be related to the solution of a parabolic PDE on a rectangular space-time domain with homogeneous initial and boundary conditions by transformation and subtraction of a known function. We verify that the solution of the new PDE is indeed more regular than the solution of the original PDE and proceed to discretize the new PDE using a space-time minimal residual method. We also demonstrate that the solution depends analytically on the parameters determining the boundaries as well as the drift. This justifies the use of a sparse tensor product interpolation method to approximate the PDE solution for various parameter ranges. The predicted convergence rates of the minimal residual method and of the interpolation method are supported by numerical simulations.
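Concretely, the equation in question has the generic form below, writing $\mu$ for the space-time-dependent drift, taking a constant diffusion coefficient $\sigma$ purely for illustration, and imposing absorbing conditions on the moving boundaries $a(t) < b(t)$:
\[
\partial_t p(x,t) \;=\; -\,\partial_x\big(\mu(x,t)\, p(x,t)\big) \;+\; \tfrac{\sigma^2}{2}\, \partial_x^2 p(x,t),
\qquad p\big(a(t),t\big) \;=\; p\big(b(t),t\big) \;=\; 0.
\]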
In this paper, we consider time-inhomogeneous nonlinear time series regression for a general class of locally stationary time series. On the one hand, we propose sieve nonparametric estimators for the time-varying regression functions which achieve the minimax optimal rate. On the other hand, we develop a unified simultaneous inferential theory which can be used to conduct both structural and exact-form tests on the functions. Our proposed statistics are powerful even under locally weak alternatives. We also propose a multiplier bootstrap procedure for practical implementation. Our methodology and theory do not require any structural assumptions on the regression functions, and we allow the functions to be supported on an unbounded domain. We also establish a sieve approximation theory for 2-D functions on unbounded domains and a Gaussian approximation result for affine and quadratic forms of high-dimensional locally stationary time series, both of which may be of independent interest. Numerical simulations and a real financial data analysis are provided to support our results.
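Schematically, writing the model as $y_i = m(t_i/n, x_i) + \epsilon_i$ with a smoothly time-varying regression function $m$, a sieve estimator expands the bivariate function in tensor-product basis functions,
\[
m(t, x) \;\approx\; \sum_{j=1}^{J}\sum_{k=1}^{K} \beta_{jk}\, \varphi_j(t)\, \psi_k(x),
\]
and fits the coefficients $\beta_{jk}$ by least squares. This display is a generic sketch; the specific bases, chosen to handle an unbounded domain in $x$, are given in the paper.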
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm, EB-SSP, that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions, and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it requires no prior knowledge of $B_{\star}$, nor of $T_{\star}$, which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) in which the regret contains only a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
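Schematically, and not with the paper's exact constants, exploration bonuses of this kind typically shrink the empirical cost of a state-action pair at a rate governed by its visit count $n(s,a)$, e.g.
\[
\tilde c(s,a) \;=\; \max\!\big( \hat c(s,a) - b(s,a),\; 0 \big), \qquad b(s,a) \;\asymp\; \sqrt{\frac{\log\!\big(S A\, n(s,a) / \delta\big)}{n(s,a)}},
\]
so that the perturbed model remains optimistic with high probability while the perturbation vanishes as data accumulate.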
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space; it is similar to other well-known methods in the literature, including Q-learning, soft Q-learning, and maximum entropy policy gradient, and is closely related to optimism- and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
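The Boltzmann policy induced by the K-values is straightforward to compute; here is a minimal sketch (the function name, K-value array, and temperature are hypothetical inputs, not the paper's code):

\begin{verbatim}
import numpy as np

def boltzmann_policy(K_values, tau):
    """Softmax (Boltzmann) exploration policy over K-values.
    K_values: array of shape (num_states, num_actions); tau > 0 is the
    temperature, equal to the risk-seeking parameter in K-learning."""
    logits = K_values / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
\end{verbatim}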