We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order insertion-only model, then there exist space-efficient algorithms with improved dependencies on the approximation parameter $\varepsilon$. In particular, for any real $p > 2$, we first obtain an algorithm for $F_p$ moment estimation using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of memory. Our techniques also give algorithms for $F_p$ moment estimation with $p>2$ on arbitrary-order insertion-only and turnstile streams, using $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4/p}}\cdot n^{1-2/p}\right)$ bits of space and two passes, which is the first optimal multi-pass $F_p$ estimation algorithm up to $\log n$ factors. Finally, we give an improved lower bound of $\Omega\left(\frac{1}{\varepsilon^2}\cdot n^{1-2/p}\right)$ for one-pass insertion-only streams. Our results separate the complexity of this problem both between random-order and arbitrary-order streams, and between one-pass and multi-pass streams.
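For concreteness, the quantity being estimated can be pinned down with a short sketch. This is only an illustration of the exact, linear-space computation of $F_p$ from an insertion-only stream, not the sublinear-space streaming algorithm the abstract describes:

```python
import numpy as np

def exact_fp(stream, n, p):
    """Exact F_p = sum_i |x_i|^p of the frequency vector defined by a stream
    of coordinate insertions. This uses Theta(n) space; the point of the
    paper is to approximate F_p in roughly n^{1-2/p} space for p > 2."""
    x = np.zeros(n)
    for i in stream:        # insertion-only: each update increments one coordinate
        x[i] += 1
    return np.sum(np.abs(x) ** p)

# Example: a stream of 1000 updates over n = 8 coordinates.
rng = np.random.default_rng(0)
stream = rng.integers(0, 8, size=1000)
print(exact_fp(stream, n=8, p=3.0))
```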
This paper studies the estimation of the conditional density $f(x, \cdot)$ of $Y_i$ given $X_i = x$, from the observation of an i.i.d. sample $(X_i, Y_i) \in \mathbb{R}^d \times \mathbb{R}$, $i = 1, \ldots, n$. We assume that $f$ depends only on $r$ unknown components, with typically $r \ll d$. We provide an adaptive, fully nonparametric strategy based on kernel rules to estimate $f$. To select the bandwidth of our kernel rule, we propose a new fast iterative algorithm inspired by the Rodeo algorithm (Wasserman and Lafferty (2006)) to detect the sparsity structure of $f$. More precisely, in the minimax setting, our pointwise estimator, which is adaptive to both the regularity and the sparsity, achieves the quasi-optimal rate of convergence. Its computational complexity is only $O(dn \log n)$.
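As a point of reference, a fixed-bandwidth kernel rule of the kind the paper starts from can be sketched as follows. The bandwidths h (covariates) and b (response) are hypothetical fixed values here; the paper's contribution is to select them adaptively with its Rodeo-style iteration so as to exploit the sparsity structure:

```python
import numpy as np

def cond_density(x, y, X, Y, h, b):
    """Nadaraya-Watson-type kernel estimate of the conditional density of Y
    given X = x, with a product Gaussian kernel over the d covariates.
    Fixed bandwidths h and b are an illustrative simplification."""
    Kx = np.exp(-0.5 * np.sum(((X - x) / h) ** 2, axis=1))   # covariate weights
    Ky = np.exp(-0.5 * ((Y - y) / b) ** 2) / (b * np.sqrt(2 * np.pi))
    return np.sum(Kx * Ky) / max(np.sum(Kx), 1e-12)

# toy example: Y | X ~ N(X_0, 1); d = 3 but only one coordinate matters (r = 1)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = X[:, 0] + rng.normal(size=500)
print(cond_density(np.zeros(3), 0.0, X, Y, h=0.5, b=0.5))   # compare to N(0,1) at 0
```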
The weighted nonlinear least-squares problem for low-rank signal estimation is considered. The problem of constructing a numerical solution that is stable and fast for long time series is addressed. A modified weighted Gauss-Newton method, which can be implemented through direct variable projection onto a space of low-rank signals, is proposed. For a weight matrix which provides the maximum likelihood estimator of the signal in the presence of autoregressive noise of order $p$, the computational cost of an iteration is $O(N r^2 + N p^2 + r N \log N)$ as $N$ tends to infinity, where $N$ is the time-series length and $r$ is the rank of the approximating time series. Moreover, the proposed method can be applied to data with missing values without increasing the computational cost. The method is compared with state-of-the-art methods based on the variable projection approach in terms of floating-point numerical stability and computational cost.
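A minimal sketch of the unweighted building block may help: projection onto rank-$r$ series via a truncated SVD of the trajectory (Hankel) matrix, followed by anti-diagonal averaging. The window length L is a standard, hypothetical choice, and this is not the paper's weighted Gauss-Newton iteration itself:

```python
import numpy as np
from scipy.linalg import hankel

def lowrank_signal(series, r, L=None):
    """Project a time series onto the set of rank-r series: rank-r truncated
    SVD of the L x (N-L+1) trajectory matrix, then hankelization (averaging
    over anti-diagonals). The paper iterates *weighted* Gauss-Newton steps
    around such projections, with FFT-based structure giving the
    O(N r^2 + N p^2 + r N log N) per-iteration cost."""
    N = len(series)
    L = L or N // 2
    H = hankel(series[:L], series[L - 1:])        # H[i, j] = series[i + j]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Hr = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]    # rank-r approximation
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):                            # average the anti-diagonals
        out[i:i + N - L + 1] += Hr[i]
        counts[i:i + N - L + 1] += 1
    return out / counts

t = np.arange(200)
noisy = np.sin(0.2 * t) + 0.3 * np.random.default_rng(2).normal(size=200)
print(np.round(lowrank_signal(noisy, r=2)[:5], 3))
```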
Estimators for causal quantities sometimes suffer from outliers. We investigate outlier-resistant estimation of the average treatment effect (ATE) under challenging but realistic settings: we assume that the ratio of outliers is not necessarily small and that it can depend on covariates. We propose three types of estimators for the ATE, which combine the well-known inverse probability weighting (IPW) and doubly robust (DR) estimators with the density-power weight. Under heterogeneous contamination, our methods reduce the bias caused by outliers; in particular, under homogeneous contamination, our estimators are approximately consistent for the true ATE. An influence-function-based analysis indicates that the adverse effect of outliers is negligible if the ratio of outliers is small, even under heterogeneous contamination. We also derive the asymptotic properties of our estimators, and we evaluate their performance through Monte Carlo simulations and real data analysis. The comparative methods, which estimate the median of the potential outcomes, do not have enough outlier resistance, and our methods outperform them in the experiments.
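A rough sketch of the idea behind density-power weighting, grafted onto the plain IPW estimator, is below. The pooled KDE weight and the exponent gamma are illustrative simplifications introduced here, not the authors' proposed estimators:

```python
import numpy as np
from scipy.stats import gaussian_kde

def ipw_dpw(Y, T, e, gamma=0.5):
    """IPW estimate of the ATE with a density-power weight f_hat(Y)^gamma
    that downweights outlying outcomes. e = e(X) are propensity scores and
    f_hat is a crude pooled KDE of the outcomes; treat this as an
    illustrative simplification of the density-power idea."""
    w = gaussian_kde(Y)(Y) ** gamma      # density-power weight, small on outliers
    w1 = w * T / e
    w0 = w * (1 - T) / (1 - e)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

rng = np.random.default_rng(3)
X = rng.normal(size=2000)
e = 1 / (1 + np.exp(-X))                       # propensity scores
T = (rng.uniform(size=2000) < e).astype(float)
Y = T + X + rng.normal(size=2000)
Y[rng.uniform(size=2000) < 0.05] += 50         # 5% gross outliers
print(ipw_dpw(Y, T, e))                        # should be near the true ATE = 1
```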
We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. At the heart of our analysis is the exact characterization of the joint empirical distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$. When the distribution of the signal is Gaussian, the Bayes-optimal combination has the form $\theta\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$, and we derive the optimal combination coefficient $\theta$. In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination over the two estimators considered separately.
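A self-contained sketch of the two estimators and their combination, under an assumed ReLU link and identity preprocessing $T(y) = y$ for the spectral matrix, is given below. The paper derives $\theta$ analytically from the limiting joint distribution; this toy tunes it by grid search against the ground truth, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 200, 4000
x = rng.normal(size=d); x /= np.linalg.norm(x)
A = rng.normal(size=(n, d))                           # Gaussian sensing matrix
y = np.maximum(A @ x, 0) + 0.3 * rng.normal(size=n)   # GLM: noisy ReLU link

x_lin = A.T @ y / n                                   # the linear estimator
x_lin /= np.linalg.norm(x_lin)
D = (A * y[:, None]).T @ A / n                        # spectral matrix, T(y) = y
x_spec = np.linalg.eigh(D)[1][:, -1]                  # principal eigenvector
x_spec *= np.sign(x_spec @ x_lin)                     # resolve the global sign

overlap = lambda u: abs(u @ x) / np.linalg.norm(u)    # |cosine| with the signal
# grid search over theta in the combination theta * x_lin + x_spec
theta = max(np.linspace(0, 5, 101), key=lambda t: overlap(t * x_lin + x_spec))
print(overlap(x_lin), overlap(x_spec), overlap(theta * x_lin + x_spec))
```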
This paper establishes the optimal approximation error characterization of deep ReLU networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
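The first step, approximating $x^2$ (and hence, via polarization, products and polynomials) by deep ReLU networks, can be illustrated with Yarotsky's classical sawtooth construction. This is a standard sketch of the building block, not the paper's full width-depth trade-off construction:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0)

def hat(x):
    # the "tooth" function on [0,1], exactly a width-3 ReLU layer
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, m=6):
    """Yarotsky's construction: a depth-O(m) ReLU network computing x^2 on
    [0,1] with uniform error 2^{-2m-2}, via x^2 = x - sum_s g_s(x) / 4^s,
    where g_s is the s-fold composition of the hat function."""
    g, out = x, x.copy()
    for s in range(1, m + 1):
        g = hat(g)
        out -= g / 4.0 ** s
    return out

x = np.linspace(0, 1, 1001)
print(np.max(np.abs(relu_square(x) - x ** 2)))   # about 2^-14 for m = 6
```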
In this paper, we study the estimation of the derivative of a regression function in a standard univariate regression model. The estimators are defined either by differentiating nonparametric least-squares estimators of the regression function or by estimating the projection of the derivative directly. We prove two simple risk bounds that allow us to compare our estimators. More elaborate bounds under a stability assumption are then provided. The bases and spaces on which we illustrate our assumptions and first results are of both compact and non-compact type, and we discuss the rates reached by our estimators; these turn out to be optimal in the compact case. Lastly, we propose a model selection procedure and prove the associated risk bound. Considering bases with non-compact support makes the problem difficult.
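A minimal sketch of the first strategy, differentiating a nonparametric least-squares fit, might look as follows; the Legendre basis on $[-1,1]$ and the fixed dimension m are hypothetical choices for illustration:

```python
import numpy as np
from numpy.polynomial import legendre

def deriv_estimate(X, Y, m, x):
    """Least-squares projection estimator of f' in the regression model
    Y = f(X) + noise on [-1, 1]: fit a degree-m Legendre expansion of f by
    least squares, then differentiate the fitted expansion. The paper's
    alternative strategy estimates the projection of f' directly instead."""
    coef = legendre.legfit(X, Y, m)     # least-squares coefficients of f
    dcoef = legendre.legder(coef)       # coefficients of the derivative
    return legendre.legval(x, dcoef)

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, 400)
Y = np.sin(2 * X) + 0.1 * rng.normal(size=400)
x = np.linspace(-0.8, 0.8, 5)
print(np.round(deriv_estimate(X, Y, m=8, x=x), 2))   # compare to 2*cos(2x)
```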
The problem of solving linear systems is one of the most fundamental problems in computer science, where given a satisfiable linear system $(A,b)$, for $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, we wish to find a vector $x \in \mathbb{R}^n$ such that $Ax = b$. The current best algorithms for solving dense linear systems reduce the problem to matrix multiplication, and run in time $O(n^{\omega})$. We consider the problem of finding $\varepsilon$-approximate solutions to linear systems with respect to the $L_2$-norm, that is, given a satisfiable linear system $(A \in \mathbb{R}^{n \times n}, b \in \mathbb{R}^n)$, find an $x \in \mathbb{R}^n$ such that $\|Ax - b\|_2 \leq \varepsilon\|b\|_2$. Our main result is a fine-grained reduction from computing the rank of a matrix to finding $\varepsilon$-approximate solutions to linear systems. In particular, if the best known $O(n^\omega)$ time algorithm for computing the rank of $n \times O(n)$ matrices is optimal (which we conjecture is true), then finding an $\varepsilon$-approximate solution to a dense linear system also requires $\tilde{\Omega}(n^{\omega})$ time, even for $\varepsilon$ as large as $(1 - 1/\text{poly}(n))$. We also prove (under some modified conjectures for the rank-finding problem) optimal hardness of approximation for sparse linear systems, linear systems over positive semidefinite matrices, well-conditioned linear systems, and approximately solving linear systems with respect to the $L_p$-norm, for $p \geq 1$. At the heart of our results is a novel reduction from the rank problem to a decision version of the approximate linear systems problem. This reduction preserves properties such as matrix sparsity and bit complexity.
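The approximation notion at stake is easy to state in code. The snippet below is only a definition check paired with an off-the-shelf solver, not the fine-grained reduction itself:

```python
import numpy as np

def is_eps_approx(A, b, x, eps):
    """The paper's approximation notion: x is an eps-approximate solution
    of the system (A, b) iff ||Ax - b||_2 <= eps * ||b||_2."""
    return np.linalg.norm(A @ x - b) <= eps * np.linalg.norm(b)

rng = np.random.default_rng(6)
A = rng.normal(size=(50, 50))
b = A @ rng.normal(size=50)                  # satisfiable by construction
x = np.linalg.lstsq(A, b, rcond=None)[0]     # an (essentially exact) solver
print(is_eps_approx(A, b, x, eps=1e-6))
```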
We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $\Omega(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $\Omega(n)$ space. The key technical core is an optimal $q^{-(k-1)}$-inapproximability result for the case where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Prior to our work, no such hardness was known for an approximation factor less than $1/2$ for any CSP. Our work builds on and extends the work of Kapralov and Krachun (Proc. STOC 2019), who showed a linear lower bound on any non-trivial approximation of the max cut in graphs. This corresponds roughly to the case of Max $k$-LIN-$\bmod\; q$ with $k=q=2$. Each of these extensions poses non-trivial technical challenges that we overcome in this work.
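The "trivial approximability" baseline is worth making concrete: for Max-CUT ($k = q = 2$), a uniformly random assignment satisfies each constraint with probability $1/q$, so a streaming algorithm can output $m/q$ while storing essentially nothing beyond a constraint count. A small Monte-Carlo illustration on a hypothetical 10-cycle instance:

```python
import numpy as np

def random_assignment_value(edges, n, q=2, trials=2000, rng=None):
    """Average fraction of Max-CUT constraints satisfied by a uniformly
    random q-ary assignment; it concentrates near 1/q, the trivial
    approximation the paper proves is hard to beat in o(n) space."""
    rng = rng or np.random.default_rng(7)
    vals = []
    for _ in range(trials):
        sigma = rng.integers(0, q, size=n)
        vals.append(sum(sigma[u] != sigma[v] for u, v in edges))
    return np.mean(vals) / len(edges)

edges = [(i, (i + 1) % 10) for i in range(10)]   # a 10-cycle
print(random_assignment_value(edges, n=10))      # about 0.5 = 1/q
```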
Bernstein estimators are well known to avoid the boundary bias problem of traditional kernel estimators. The theoretical properties of these estimators have been studied extensively on compact intervals and hypercubes, but never on the simplex, except for the mean squared error of the density estimator in Tenbusch (1994) when $d = 2$. The simplex is an important case as it is the natural domain of compositional data. In this paper, we make an effort to prove several asymptotic results (bias, variance, mean squared error (MSE), mean integrated squared error (MISE), asymptotic normality, uniform strong consistency) for Bernstein estimators of cumulative distribution functions and density functions on the $d$-dimensional simplex. Our results generalize the ones in Leblanc (2012) and Babu et al. (2002), who treated the case $d = 1$, and significantly extend those found in Tenbusch (1994). In particular, our rates of convergence for the MSE and MISE are optimal.
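For orientation, the classical $d = 1$ Bernstein CDF estimator that the paper generalizes can be sketched in a few lines; the polynomial degree m is a hypothetical fixed choice here:

```python
import numpy as np
from scipy.stats import binom

def bernstein_cdf(x, data, m):
    """Bernstein estimator of a CDF on [0, 1], the classical d = 1 case of
    Babu et al. (2002) / Leblanc (2012): smooth the empirical CDF F_n on
    the grid k/m with Binomial(m, x) weights, i.e.
    F_hat(x) = sum_k F_n(k/m) C(m,k) x^k (1-x)^(m-k)."""
    Fn = np.array([np.mean(data <= k / m) for k in range(m + 1)])
    weights = binom.pmf(np.arange(m + 1), m, x)
    return np.dot(Fn, weights)

rng = np.random.default_rng(8)
data = rng.beta(2, 5, size=500)
print(bernstein_cdf(0.3, data, m=40))   # compare to the Beta(2, 5) CDF at 0.3
```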
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, for which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
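Of the numerical properties covered by the framework, the stable rank is the easiest to make concrete. The following is a plain, non-query-efficient computation under the bounded-entry normalization, not the testing algorithm itself:

```python
import numpy as np

def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2, one of the numerical properties
    tested in the bounded-entry model (entries of A bounded by 1 in absolute
    value). Unlike the exact rank, it is robust to small entry-wise
    perturbations, which makes a property-testing treatment natural."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** 2) / s[0] ** 2

rng = np.random.default_rng(9)
A = rng.uniform(-1, 1, size=(100, 5)) @ rng.uniform(-1, 1, size=(5, 100))
A /= np.max(np.abs(A))                  # bounded-entry normalization
print(stable_rank(A))                   # at most the exact rank, here 5
```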