Bernstein estimators are well-known to avoid the boundary bias problem of traditional kernel estimators. The theoretical properties of these estimators have been studied extensively on compact intervals and hypercubes, but never on the simplex, except for the mean squared error of the density estimator in Tenbusch (1994) when $d = 2$. The simplex is an important case as it is the natural domain of compositional data. In this paper, we make an effort to prove several asymptotic results (bias, variance, mean squared error (MSE), mean integrated squared error (MISE), asymptotic normality, uniform strong consistency) for Bernstein estimators of cumulative distribution functions and density functions on the $d$-dimensional simplex. Our results generalize the ones in Leblanc (2012) and Babu et al. (2002), who treated the case $d = 1$, and significantly extend those found in Tenbusch (1994). In particular, our rates of convergence for the MSE and MISE are optimal.
Function values are, in some sense, "almost as good" as general linear information for $L_2$-approximation (optimal recovery, data assimilation) of functions from a reproducing kernel Hilbert space. This was recently proved by new upper bounds on the sampling numbers under the assumption that the singular values of the embedding of this Hilbert space into $L_2$ are square-summable. Here we mainly prove new lower bounds. In particular we prove that the sampling numbers behave worse than the approximation numbers for Sobolev spaces with small smoothness. Hence there can be a logarithmic gap also in the case where the singular numbers of the embedding are square-summable. We first prove new lower bounds for the integration problem, again for rather classical Sobolev spaces of periodic univariate functions.
I consider the estimation of the average treatment effect (ATE), in a population that can be divided into $G$ groups, and such that one has unbiased and uncorrelated estimators of the conditional average treatment effect (CATE) in each group. These conditions are for instance met in stratified randomized experiments. I assume that the outcome is homoscedastic, and that each CATE is bounded in absolute value by $B$ standard deviations of the outcome, for some known constant $B$. I derive, across all linear combinations of the CATEs' estimators, the estimator of the ATE with the lowest worst-case mean-squared error. This estimator assigns a weight equal to group $g$'s share in the population to the most precisely estimated CATEs, and a weight proportional to one over the estimator's variance to the least precisely estimated CATEs. Given $B$, this optimal estimator is feasible: the weights only depend on known quantities. I then allow for positive covariances known up to the outcome's variance between the estimators. This condition is met by differences-in-differences estimators in staggered adoption designs, if potential outcomes are homoscedastic and uncorrelated. Under those assumptions, I show that the minimax estimator is still feasible and can easily be computed. In realistic numerical examples, the minimax estimator can lead to substantial precision and worst-case MSE gains relative to the unbiased estimator.
In this work we study two Riemannian distances between infinite-dimensional positive definite Hilbert-Schmidt operators, namely affine-invariant Riemannian and Log-Hilbert-Schmidt distances, in the context of covariance operators associated with functional stochastic processes, in particular Gaussian processes. Our first main results show that both distances converge in the Hilbert-Schmidt norm. Using concentration results for Hilbert space-valued random variables, we then show that both distances can be consistently and efficiently estimated from (i) sample covariance operators, (ii) finite, normalized covariance matrices, and (iii) finite samples generated by the given processes, all with dimension-independent convergence. Our theoretical analysis exploits extensively the methodology of reproducing kernel Hilbert space (RKHS) covariance and cross-covariance operators. The theoretical formulation is illustrated with numerical experiments on covariance operators of Gaussian processes.
We study approximation of multivariate periodic functions from Besov and Triebel--Lizorkin spaces of dominating mixed smoothness by the Smolyak algorithm constructed using a special class of quasi-interpolation operators of Kantorovich-type. These operators are defined similar to the classical sampling operators by replacing samples with the average values of a function on small intervals (or more generally with sampled values of a convolution of a given function with an appropriate kernel). In this paper, we estimate the rate of convergence of the corresponding Smolyak algorithm in the $L_q$-norm for functions from the Besov spaces $\mathbf{B}_{p,\theta}^s(\mathbb{T}^d)$ and the Triebel--Lizorkin spaces $\mathbf{F}_{p,\theta}^s(\mathbb{T}^d)$ for all $s>0$ and admissible $1\le p,\theta\le \infty$ as well as provide analogues of the Littlewood--Paley-type characterizations of these spaces in terms of families of quasi-interpolation operators.
Functional data analysis is a fast evolving branch of statistics. Estimation procedures for the popular functional linear model either suffer from lack of robustness or are computationally burdensome. To address these shortcomings, a flexible family of penalized lower-rank estimators based on a bounded loss function is proposed. The proposed class of estimators is shown to be consistent and can attain high rates of convergence with respect to prediction error under weak regularity conditions. These results can be generalized to higher dimensions under similar assumptions. The finite-sample performance of the proposed family of estimators is investigated by a Monte-Carlo study which shows that these estimators reach high efficiency while offering protection against outliers. The proposed estimators compare favorably to existing approaches robust as well as non-robust alternatives. The good performance of the method is also illustrated on a complex real dataset.
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover does so using $O(1)$ batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression.
We prove a new Bernstein type inequality in $L^p$ spaces associated with the tangential derivatives on the boundary of a general compact $C^2$-domain. We give two applications: Marcinkiewicz type inequality for discretization of $L^p$ norm and positive cubature formula. Both results are optimal in the sense that the number of function samples used has the order of the dimension of the corresponding space of algebraic polynomials.
We propose local space-time approximation spaces for parabolic problems that are optimal in the sense of Kolmogorov and may be employed in multiscale and domain decomposition methods. The diffusion coefficient can be arbitrarily rough in space and time. To construct local approximation spaces we consider a compact transfer operator that acts on the space of local solutions and covers the full time dimension. The optimal local spaces are then given by the left singular vectors of the transfer operator. To prove compactness of the latter we combine a suitable parabolic Caccioppoli inequality with the compactness theorem of Aubin-Lions. In contrast to the elliptic setting [I. Babu\v{s}ka and R. Lipton, Multiscale Model. Simul., 9 (2011), pp. 373-406] we need an additional regularity result to combine the two results. Furthermore, we employ the generalized finite element method to couple local spaces and construct an approximation of the global solution. Since our approach yields reduced space-time bases, the computation of the global approximation does not require a time stepping method and is thus computationally efficient. Moreover, we derive rigorous local and global a priori error bounds. In detail, we bound the global approximation error in a graph norm by the local errors in the $L^2(H^1)$-norm, noting that the space the transfer operator maps to is equipped with this norm. Numerical experiments demonstrate an exponential decay of the singular values of the transfer operator and the local and global approximation errors for problems with high contrast or multiscale structure regarding space and time.
The main aim of this paper is to solve an inverse source problem for a general nonlinear hyperbolic equation. Combining the quasi-reversibility method and a suitable Carleman weight function, we define a map of which fixed point is the solution to the inverse problem. To find this fixed point, we define a recursive sequence with an arbitrary initial term by the same manner as in the classical proof of the contraction principle. Applying a Carleman estimate, we show that the sequence above converges to the desired solution with the exponential rate. Therefore, our new method can be considered as an analog of the contraction principle. We rigorously study the stability of our method with respect to noise. Numerical examples are presented.
We abstract and study \emph{reachability preservers}, a graph-theoretic primitive that has been implicit in prior work on network design. Given a directed graph $G = (V, E)$ and a set of \emph{demand pairs} $P \subseteq V \times V$, a reachability preserver is a sparse subgraph $H$ that preserves reachability between all demand pairs. Our first contribution is a series of extremal bounds on the size of reachability preservers. Our main result states that, for an $n$-node graph and demand pairs of the form $P \subseteq S \times V$ for a small node subset $S$, there is always a reachability preserver on $O(n+\sqrt{n |P| |S|})$ edges. We additionally give a lower bound construction demonstrating that this upper bound characterizes the settings in which $O(n)$ size reachability preservers are generally possible, in a large range of parameters. The second contribution of this paper is a new connection between extremal graph sparsification results and classical Steiner Network Design problems. Surprisingly, prior to this work, the osmosis of techniques between these two fields had been superficial. This allows us to improve the state of the art approximation algorithms for the most basic Steiner-type problem in directed graphs from the $O(n^{0.6+\varepsilon})$ of Chlamatac, Dinitz, Kortsarz, and Laekhanukit (SODA'17) to $O(n^{4/7+\varepsilon})$.