We study robust linear regression in high-dimension, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha=n/d$, and study a data model that includes outliers. We provide exact asymptotics for the performances of the empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber loss, which are the standard approach to such problems. We focus on two metrics for the performance: the generalisation error to similar datasets with outliers, and the estimation error of the original, unpolluted function. Our results are compared with the information theoretic Bayes-optimal estimation bound. For the generalization error, we find that optimally-regularised ERM is asymptotically consistent in the large sample complexity limit if one perform a simple calibration, and compute the rates of convergence. For the estimation error however, we show that due to a norm calibration mismatch, the consistency of the estimator requires an oracle estimate of the optimal norm, or the presence of a cross-validation set not corrupted by the outliers. We examine in detail how performance depends on the loss function and on the degree of outlier corruption in the training set and identify a region of parameters where the optimal performance of the Huber loss is identical to that of the $\ell_2$ loss, offering insights into the use cases of different loss functions.
This paper is concerned with goal-oriented a posteriori error estimation for nonlinear functionals in the context of nonlinear variational problems solved with continuous Galerkin finite element discretizations. A two-level, or discrete, adjoint-based approach for error estimation is considered. The traditional method to derive an error estimate in this context requires linearizing both the nonlinear variational form and the nonlinear functional of interest which introduces linearization errors into the error estimate. In this paper, we investigate these linearization errors. In particular, we develop a novel discrete goal-oriented error estimate that accounts for traditionally neglected nonlinear terms at the expense of greater computational cost. We demonstrate how this error estimate can be used to drive mesh adaptivity. We show that accounting for linearization errors in the error estimate can improve its effectivity for several nonlinear model problems and quantities of interest. We also demonstrate that an adaptive strategy based on the newly proposed estimate can lead to more accurate approximations of the nonlinear functional with fewer degrees of freedom when compared to uniform refinement and traditional adjoint-based approaches.
This work presents a non-parametric estimator for the cumulative distribution function (CDF) of the jump-size distribution for a storage system with compound Poisson input. The workload process is observed according to an independent Poisson sampling process. The nonparametric estimator is constructed by first estimating the characteristic function (CF) and then applying an inversion formula. The convergence rate of the CF estimator at $s$ is shown to be of the order of $s^2/n$, where $n$ is the sample size. This convergence rate is leveraged to explore the bias-variance tradeoff of the inversion estimator. It is demonstrated that within a certain class of continuous distributions, the risk, in terms of MSE, is uniformly bounded by $C n^{-\frac{\eta}{1+\eta}}$, where $C$ is a positive constant and the parameter $\eta>0$ depends on the smoothness of the underlying class of distributions. A heuristic method is further developed to address the case of an unknown rate of the compound Poisson input process.
Procedural content generation (PCG) is a growing field, with numerous applications in the video game industry and great potential to help create better games at a fraction of the cost of manual creation. However, much of the work in PCG is focused on generating relatively straightforward levels in simple games, as it is challenging to design an optimisable objective function for complex settings. This limits the applicability of PCG to more complex and modern titles, hindering its adoption in industry. Our work aims to address this limitation by introducing a compositional level generation method that recursively composes simple low-level generators to construct large and complex creations. This approach allows for easily-optimisable objectives and the ability to design a complex structure in an interpretable way by referencing lower-level components. We empirically demonstrate that our method outperforms a non-compositional baseline by more accurately satisfying a designer's functional requirements in several tasks. Finally, we provide a qualitative showcase (in Minecraft) illustrating the large and complex, but still coherent, structures that were generated using simple base generators.
In an earlier paper (//doi.org/10.1137/21M1393315), the Switch Point Algorithm was developed for solving optimal control problems whose solutions are either singular or bang-bang or both singular and bang-bang, and which possess a finite number of jump discontinuities in an optimal control at the points in time where the solution structure changes. The class of control problems that were considered had a given initial condition, but no terminal constraint. The theory is now extended to include problems with both initial and terminal constraints, a structure that often arises in boundary-value problems. Substantial changes to the theory are needed to handle this more general setting. Nonetheless, the derivative of the cost with respect to a switch point is again the jump in the Hamiltonian at the switch point.
Covariate adjustment is desired by both practitioners and regulators of randomized clinical trials because it improves precision for estimating treatment effects. However, covariate adjustment presents a particular challenge in time-to-event analysis. We propose to apply covariate adjusted pseudovalue regression to estimate between-treatment difference in restricted mean survival times (RMST). Our proposed method incorporates a prognostic covariate to increase precision of treatment effect estimate, maintaining strict type I error control without introducing bias. In addition, the amount of increase in precision can be quantified and taken into account in sample size calculation at the study design stage. Consequently, our proposed method provides the ability to design smaller randomized studies at no expense to statistical power.
A new variant of Newton's method for empirical risk minimization is studied, where at each iteration of the optimization algorithm, the gradient and Hessian of the objective function are replaced by robust estimators taken from existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, consequences of the theory in generalized linear models are studied when data are generated from Huber's epsilon-contamination model and/or heavytailed distributions. An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed, which may be more appropriate for high-dimensional settings, and conjectures about the convergence of the resulting algorithm are offered. Compared to robust gradient descent, the proposed algorithm enjoys the faster rates of convergence for successive iterates often achieved by second-order algorithms for convex problems, i.e., quadratic convergence in a neighborhood of the optimum, with a stepsize that may be chosen adaptively via backtracking linesearch.
It is well known that it is impossible to construct useful confidence intervals (CIs) about the mean or median of a response $Y$ conditional on features $X = x$ without making strong assumptions about the joint distribution of $X$ and $Y$. This paper introduces a new framework for reasoning about problems of this kind by casting the conditional problem at different levels of resolution, ranging from coarse to fine localization. In each of these problems, we consider local quantiles defined as the marginal quantiles of $Y$ when $(X,Y)$ is resampled in such a way that samples $X$ near $x$ are up-weighted while the conditional distribution $Y \mid X$ does not change. We then introduce the Weighted Quantile method, which asymptotically produces the uniformly most accurate confidence intervals for these local quantiles no matter the (unknown) underlying distribution. Another method, namely, the Quantile Rejection method, achieves finite sample validity under no assumption whatsoever. We conduct extensive numerical studies demonstrating that both of these methods are valid. In particular, we show that the Weighted Quantile procedure achieves nominal coverage as soon as the effective sample size is in the range of 10 to 20.
In this article we perform an asymptotic analysis of parallel Bayesian logspline density estimators. Such estimators are useful for the analysis of datasets that are partitioned into subsets and stored in separate databases without the capability of accessing the full dataset from a single computer. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the normalized density estimator itself in place of the sampling algorithm. We then derive an error bound for the mean integrated squared error of the full dataset posterior estimator. The error bound depends upon the parameters that arise in logspline density estimation and the numerical approximation procedure. In our analysis, we identify the choices for the parameters that result in the error bound scaling optimally in relation to the number of samples. This provides our method with increased estimation accuracy, while also minimizing the computational cost.
Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both. In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses. For this setting we provide an algorithm which guarantees $R_{T}(u)\le \tilde O(G\|u\|\sqrt{T}+L\|u\|^{2}\sqrt{T})$ regret on any problem where the subgradients satisfy $\|g_{t}\|\le G+L\|w_{t}\|$, and show that this bound is unimprovable without further assumptions. We leverage this algorithm to develop new saddle-point optimization algorithms that converge in duality gap in unbounded domains, even in the absence of meaningful curvature. Finally, we provide the first algorithm achieving non-trivial dynamic regret in an unbounded domain for non-Lipschitz losses, as well as a matching lower bound. The regret of our dynamic regret algorithm automatically improves to a novel $L^{*}$ bound when the losses are smooth.
In the literature on nonlinear cointegration, a long-standing open problem relates to how a (nonlinear) vector autoregression, which provides a unified description of the short- and long-run dynamics of a collection of time series, can generate 'nonlinear cointegration' in the profound sense of those series sharing common nonlinear stochastic trends. We consider this problem in the setting of the censored and kinked structural VAR (CKSVAR), which provides a flexible yet tractable framework within which to model time series that are subject to threshold-type nonlinearities, such as those arising due to occasionally binding constraints, of which the zero lower bound (ZLB) on short-term nominal interest rates provides a leading example. We provide a complete characterisation of how common linear and {nonlinear stochastic trends may be generated in this model, via unit roots and appropriate generalisations of the usual rank conditions, providing the first extension to date of the Granger-Johansen representation theorem to a nonlinearly cointegrated setting, and thereby giving the first successful treatment of the open problem. The limiting common trend processes include regulated, censored and kinked Brownian motions, none of which have previously appeared in the literature on cointegrated VARs. Our results and running examples illustrate that the CKSVAR is capable of supporting a far richer variety of long-run behaviour than is a linear VAR, in ways that may be particularly useful for the identification of structural parameters.