Let $f$ be analytic on $[0,1]$ with $|f^{(k)}(1/2)|\leq A\alpha^kk!$ for some constant $A$ and some $\alpha<2$. We show that the median estimate of $\mu=\int_0^1f(x)\,\mathrm{d}x$ under random linear scrambling with $n=2^m$ points converges at the rate $O(n^{-c\log(n)})$ for any $c< 3\log(2)/\pi^2\approx 0.21$. We also obtain a super-polynomial convergence rate for the sample median of $2k-1$ random linearly scrambled estimates when $k=\Omega(m)$. When $f$ has a $p$th derivative that satisfies a $\lambda$-H\"older condition, the median-of-means has error $O( n^{-(p+\lambda)+\epsilon})$ for any $\epsilon>0$, provided $k\to\infty$ as $m\to\infty$.
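A minimal numerical sketch of the median estimator described above, assuming that scipy's scrambled Sobol' points (a linear matrix scramble plus a digital shift) stand in for the paper's random linear scrambling and taking $f(x)=e^x$ as a concrete analytic integrand:

import numpy as np
from scipy.stats import qmc

def median_qmc_estimate(f, m=10, k=15, seed=0):
    """Median of 2k-1 independent scrambled-net estimates with n = 2^m points."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(2 * k - 1):
        sampler = qmc.Sobol(d=1, scramble=True, seed=rng)
        x = sampler.random_base2(m=m)        # n = 2^m scrambled points in [0, 1)
        estimates.append(f(x[:, 0]).mean())  # one randomized QMC estimate of mu
    return np.median(estimates)

f = np.exp
est = median_qmc_estimate(f)
print(est, abs(est - (np.e - 1)))            # error is tiny for this analytic f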
In $d$ dimensions, approximating an arbitrary function oscillating with frequency $\lesssim k$ requires $\sim k^d$ degrees of freedom. A numerical method for solving the Helmholtz equation (with wavenumber $k$) suffers from the pollution effect if, as $k\to \infty$, the total number of degrees of freedom needed to maintain accuracy grows faster than this natural threshold. While the $h$-version of the finite element method (FEM) (where accuracy is increased by decreasing the meshwidth $h$ and keeping the polynomial degree $p$ fixed) suffers from the pollution effect, the celebrated papers [Melenk, Sauter 2010], [Melenk, Sauter 2011], [Esterhazy, Melenk 2012], and [Melenk, Parsania, Sauter 2013] showed that the $hp$-FEM (where accuracy is increased by decreasing the meshwidth $h$ and increasing the polynomial degree $p$) applied to a variety of constant-coefficient Helmholtz problems does not suffer from the pollution effect. The heart of the proofs of these results is a PDE result splitting the solution of the Helmholtz equation into "high" and "low" frequency components. In this expository paper we prove this splitting for the constant-coefficient Helmholtz equation in full space (i.e., in $\mathbb{R}^d$) using only integration by parts and elementary properties of the Fourier transform; this is in contrast to the proof for this set-up in [Melenk, Sauter 2010] which uses somewhat-involved bounds on Bessel and Hankel functions. The proof in this paper is motivated by the recent proof in [Lafontaine, Spence, Wunsch 2020] of this splitting for the variable-coefficient Helmholtz equation in full space; indeed, the proof in [Lafontaine, Spence, Wunsch 2020] uses more-sophisticated tools that reduce to the elementary ones above for constant coefficients.
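For orientation only, one common way to write such a frequency splitting (the cutoff constant $\mu>1$ below is a placeholder and not necessarily the one used in the papers cited above) is
\[
u = u_{\mathrm{low}} + u_{\mathrm{high}}, \qquad
u_{\mathrm{low}} := \mathcal{F}^{-1}\bigl(\mathbf{1}_{\{|\xi|\le \mu k\}}\,\mathcal{F}u\bigr), \qquad
u_{\mathrm{high}} := \mathcal{F}^{-1}\bigl(\mathbf{1}_{\{|\xi|> \mu k\}}\,\mathcal{F}u\bigr),
\]
where $\mathcal{F}$ denotes the Fourier transform on $\mathbb{R}^d$; roughly speaking, the low-frequency piece is the analytic component with $k$-explicit bounds, while the high-frequency piece satisfies bounds with improved $k$-dependence.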
In addition to the average treatment effect (ATE) for all randomized patients, it is sometimes important to understand the ATE for a principal stratum, a subset of patients defined by one or more post-baseline variables. For example, what is the ATE for those patients who could be compliant with the experimental treatment? Commonly used assumptions include monotonicity, principal ignorability, and cross-world assumptions of principal ignorability and principal strata independence. Most of these assumptions cannot be evaluated in clinical trials with parallel treatment arms. In this article, we evaluate these assumptions through a $2\times 2$ cross-over study in which the potential outcomes under both treatments can be observed, provided there are no carry-over or study-period effects. In this example, the monotonicity assumption and the within-treatment principal ignorability assumptions did not appear to hold well. On the other hand, the assumptions of cross-world principal ignorability and cross-world principal stratum independence conditional on baseline covariates appeared to hold well. Under the latter assumptions, we estimated the ATE for principal strata, defined by whether the blood glucose standard deviation increased in each treatment period, without relying on the cross-over feature. These estimates were very close to the ATE estimate obtained by exploiting the cross-over feature of the trial. To the best of our knowledge, this article is the first attempt to evaluate the plausibility of commonly used assumptions for estimating the ATE for principal strata using the setting of a cross-over trial.
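A toy sketch of the cross-over benchmark mentioned above: when there are no carry-over or study-period effects, each patient's outcomes under both treatments are observed, so the principal-stratum ATE reduces to a subgroup mean of $Y(1)-Y(0)$ (the column names below are hypothetical, not those of the trial):

import pandas as pd

def stratum_ate(df, y1="y_treat", y0="y_control", stratum="sd_increased_on_treat"):
    # ATE of (y1 - y0) within the principal stratum flagged by `stratum`
    sub = df[df[stratum]]
    return (sub[y1] - sub[y0]).mean()

example = pd.DataFrame({
    "y_treat":   [1.2, 0.8, 1.5, 0.3],
    "y_control": [1.0, 0.9, 1.1, 0.4],
    "sd_increased_on_treat": [True, False, True, True],
})
print(stratum_ate(example))   # mean of Y(1) - Y(0) within the stratum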
We consider the facility location problem in two dimensions. In particular, we consider a setting where agents have Euclidean preferences, defined by their ideal points, for a facility to be located in $\mathbb{R}^2$. For the minisum objective and an odd number of agents, we show that the coordinate-wise median mechanism (CM) has a worst-case approximation ratio (AR) of $\sqrt{2}\frac{\sqrt{n^2+1}}{n+1}$. Further, we show that CM has the lowest AR for this objective in the class of deterministic, anonymous, and strategyproof mechanisms. For the $p$-norm social welfare objective, we find that the AR for CM is bounded above by $2^{\frac{3}{2}-\frac{2}{p}}$ for $p\geq 2$. Since any deterministic strategyproof mechanism must have AR at least $2^{1-\frac{1}{p}}$ (\citet{feigenbaum_approximately_2017}), our upper bound suggests that CM is (at worst) very nearly optimal. We conjecture that the approximation ratio of the coordinate-wise median is in fact equal to the lower bound $2^{1-\frac{1}{p}}$ (as is the case for $p=2$ and $p=\infty$) for any $p\geq 2$.
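A minimal sketch of the CM mechanism and an empirical check of its minisum approximation ratio on random instances; the geometric median (the minisum optimum) is computed here by a basic Weiszfeld iteration, which is an implementation choice of this sketch rather than part of the paper:

import numpy as np

def coordinate_wise_median(points):
    return np.median(points, axis=0)

def minisum_cost(facility, points):
    return np.linalg.norm(points - facility, axis=1).sum()

def geometric_median(points, iters=200):
    y = points.mean(axis=0)
    for _ in range(iters):                    # Weiszfeld iteration
        d = np.maximum(np.linalg.norm(points - y, axis=1), 1e-12)
        y = (points / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return y

rng = np.random.default_rng(0)
n, worst = 7, 0.0                             # odd number of agents
for _ in range(1000):
    pts = rng.normal(size=(n, 2))
    ratio = minisum_cost(coordinate_wise_median(pts), pts) / minisum_cost(geometric_median(pts), pts)
    worst = max(worst, ratio)
print(worst, np.sqrt(2) * np.sqrt(n**2 + 1) / (n + 1))   # empirical worst case vs. the bound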
Consider a set $P$ of $n$ points in $\mathbb{R}^d$. In the discrete median line segment problem, the objective is to find a line segment bounded by a pair of points in $P$ such that the sum of the Euclidean distances from $P$ to the line segment is minimized. In the continuous median line segment problem, a real number $\ell>0$ is given, and the goal is to locate a line segment of length $\ell$ in $\mathbb{R}^d$ such that the sum of the Euclidean distances between $P$ and the line segment is minimized. We show how to compute $(1+\epsilon\Delta)$- and $(1+\epsilon)$-approximations to a discrete median line segment in time $O(n\epsilon^{-2d}\log n)$ and $O(n^2\epsilon^{-d})$, respectively, where $\Delta$ is the spread of the line segments spanned by pairs of points. While developing our algorithms, using the principle of pair decomposition, we derive new data structures that allow us to quickly approximate the sum of the distances from a set of points to a given line segment or point. To our knowledge, our use of pair decompositions for solving minsum facility location problems is the first of its kind; it is versatile and easily implementable. We prove that it is impossible to construct a continuous median line segment for $n\geq3$ non-collinear points in the plane using only a ruler and compass. In view of this, we present an $O(n^d\epsilon^{-d})$-time algorithm for approximating a continuous median line segment in $\mathbb{R}^d$ within a factor of $1+\epsilon$. The algorithm is based upon generalizing the point-segment pair decomposition from the discrete to the continuous domain. Last but not least, we give a $(1+\epsilon)$-approximation algorithm, whose time complexity is sub-quadratic in $n$, for solving the constrained median line segment problem in $\mathbb{R}^2$, where an endpoint or the slope of the median line segment is given as input.
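A baseline sketch for the discrete problem above: exact point-to-segment distances and a brute-force search over all $O(n^2)$ candidate segments. The paper's approximation algorithms replace the inner sum with fast pair-decomposition-based estimates; this sketch only fixes the objective being approximated:

import numpy as np
from itertools import combinations

def dist_point_segment(p, a, b):
    # Euclidean distance from point p to the segment [a, b]
    ab, ap = b - a, p - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip(ap @ ab / denom, 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def discrete_median_segment(points):
    # segment spanned by a pair of input points minimizing the sum of distances
    best, best_cost = None, np.inf
    for i, j in combinations(range(len(points)), 2):
        a, b = points[i], points[j]
        cost = sum(dist_point_segment(p, a, b) for p in points)
        if cost < best_cost:
            best, best_cost = (i, j), cost
    return best, best_cost

pts = np.random.default_rng(1).normal(size=(30, 2))
print(discrete_median_segment(pts))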
When the regression function belongs to the standard smooth classes, consisting of univariate functions with derivatives up to the $(\gamma+1)$th order bounded in absolute value by a common constant everywhere or a.e., it is well known that the minimax optimal rate of convergence in mean squared error (MSE) is $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ when $\gamma$ is finite and the sample size $n\rightarrow\infty$. From a nonasymptotic viewpoint that does not take $n$ to infinity, this paper shows that, for the standard H\"older and Sobolev classes, the minimax optimal rate is $\frac{\sigma^{2}\left(\gamma+1\right)}{n}$ ($\succsim\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$) when $\frac{n}{\sigma^{2}}\precsim\left(\gamma+1\right)^{2\gamma+3}$, and $\left(\frac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ ($\succsim\frac{\sigma^{2}\left(\gamma+1\right)}{n}$) when $\frac{n}{\sigma^{2}}\succsim\left(\gamma+1\right)^{2\gamma+3}$. To establish these results, we derive upper and lower bounds on the covering and packing numbers for the generalized H\"older class, where the absolute value of the $k$th ($k=0,\ldots,\gamma$) derivative is bounded by a parameter $R_{k}$ and the $\gamma$th derivative is $R_{\gamma+1}$-Lipschitz (and also for the generalized ellipsoid class of smooth functions). Our bounds sharpen the classical metric entropy results for the standard classes and give the general dependence on $\gamma$ and the $R_{k}$. By deriving the minimax optimal MSE rates under various well-motivated choices of the $R_{k}$ with the help of our new entropy bounds, we show several interesting results that cannot be shown with the existing entropy bounds in the literature.
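Displayed side by side, the two regimes above read (with $\mathcal{F}$ the H\"older or Sobolev class in question and $\|\cdot\|$ the norm under which the MSE is taken)
\[
\inf_{\hat{f}}\sup_{f\in\mathcal{F}}\;\mathbb{E}\,\|\hat{f}-f\|^{2}
\;\asymp\;
\begin{cases}
\dfrac{\sigma^{2}\left(\gamma+1\right)}{n}, & \dfrac{n}{\sigma^{2}}\precsim\left(\gamma+1\right)^{2\gamma+3},\\[2ex]
\left(\dfrac{\sigma^{2}}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}, & \dfrac{n}{\sigma^{2}}\succsim\left(\gamma+1\right)^{2\gamma+3},
\end{cases}
\]
so the nonasymptotic rate is the larger of the parametric-looking term $\sigma^{2}(\gamma+1)/n$ and the classical asymptotic term $\left(\sigma^{2}/n\right)^{\frac{2\gamma+2}{2\gamma+3}}$.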
In this paper we prove new results concerning pseudo-polynomial time algorithms for the classical scheduling problem of minimizing the weighted number of tardy jobs on a single machine, the so-called $1 \mid \mid \Sigma w_j U_j$ problem. The previously best known pseudo-polynomial algorithm for this problem, due to Lawler and Moore [Management Science'69], dates back to the late 60s and has running time $O(d_{\max}n)$ or $O(wn)$, where $d_{\max}$ and $w$ are the maximum due date and the total weight of the job set, respectively. Using the recently introduced "prediction technique" of Bateni et al. [STOC'19], we present an algorithm for the problem running in $\widetilde{O}(d_{\#}(n +dw_{\max}))$ time, where $d_{\#}$ is the number of different due dates in the instance, $d$ is the total sum of the $d_{\#}$ different due dates, and $w_{\max}$ is the maximum weight of any job. This algorithm outperforms the algorithm of Lawler and Moore for certain ranges of the above parameters, and provides the first such improvement in over 50 years. We complement this result by showing that $1 \mid \mid \Sigma w_j U_j$ admits no $\widetilde{O}(n +w^{1-\varepsilon}_{\max}n)$- nor $\widetilde{O}(n +w_{\max}n^{1-\varepsilon})$-time algorithm under the $\forall \exists$-SETH conjecture, a recently introduced variant of the well-known Strong Exponential Time Hypothesis (SETH).
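For orientation, a compact sketch of the classical Lawler--Moore dynamic program referenced above, which runs in the quoted $O(d_{\max}n)$ time: jobs are processed in earliest-due-date order, and dp[t] records the maximum total weight of on-time jobs whose processing occupies exactly t time units.

def lawler_moore(jobs):
    # jobs: list of (processing_time p, due_date d, weight w);
    # returns the minimum total weight of tardy jobs.
    jobs = sorted(jobs, key=lambda j: j[1])           # EDD order
    d_max = max(d for _, d, _ in jobs)
    total_w = sum(w for _, _, w in jobs)
    NEG = float("-inf")
    dp = [0] + [NEG] * d_max                          # dp[t]: best on-time weight using exactly t time units
    for p, d, w in jobs:
        for t in range(d, p - 1, -1):                 # schedule the job on time, finishing at t <= d
            if dp[t - p] != NEG:
                dp[t] = max(dp[t], dp[t - p] + w)
    return total_w - max(dp)

print(lawler_moore([(2, 3, 4), (2, 4, 3), (3, 5, 5)]))   # -> 3 (only the weight-3 job is tardy)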
We revisit the outlier hypothesis testing framework of Li \emph{et al.} (TIT 2014) and derive fundamental limits for the optimal test under the generalized Neyman-Pearson criterion. In outlier hypothesis testing, one is given multiple observed sequences, where most sequences are generated i.i.d. from a nominal distribution. The task is to discern the set of outlying sequences that are generated from anomalous distributions. The nominal and anomalous distributions are \emph{unknown}. We study the tradeoff among the probabilities of misclassification error, false alarm and false reject for tests that satisfy weak conditions on the rate of decrease of these error probabilities as a function of sequence length. Specifically, we propose a threshold-based test that ensures exponential decay of misclassification error and false alarm probabilities. We study two constraints on the false reject probability, with one constraint being that it is a non-vanishing constant and the other being that it has an exponential decay rate. For both cases, we characterize bounds on the false reject probability, as a function of the threshold, for each pair of nominal and anomalous distributions and demonstrate the optimality of our test under the generalized Neyman-Pearson criterion. We first consider the case of at most one outlying sequence and then generalize our results to the case of multiple outlying sequences where the number of outlying sequences is unknown and each outlying sequence can follow a different anomalous distribution.
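A toy instance of such a threshold-based test, with the caveat that the divergence score below is chosen only for concreteness and is not the paper's exact statistic: each sequence's empirical distribution is compared with the pooled empirical distribution of the remaining sequences, and the sequences whose scores exceed the threshold are declared outlying (an empty result corresponds to declaring no outliers).

import numpy as np

def empirical_pmf(seq, alphabet_size, smooth=1e-9):
    counts = np.bincount(seq, minlength=alphabet_size) + smooth
    return counts / counts.sum()

def threshold_test(sequences, alphabet_size, threshold):
    # return the indices declared outlying by the toy threshold test
    scores = []
    for i, s in enumerate(sequences):
        rest = np.concatenate([t for j, t in enumerate(sequences) if j != i])
        p = empirical_pmf(s, alphabet_size)
        q = empirical_pmf(rest, alphabet_size)
        scores.append(np.sum(p * np.log(p / q)))      # KL divergence to the pooled rest
    return [i for i, sc in enumerate(scores) if sc > threshold]

rng = np.random.default_rng(0)
nominal = [rng.choice(4, size=2000, p=[0.25] * 4) for _ in range(5)]
outlier = [rng.choice(4, size=2000, p=[0.55, 0.15, 0.15, 0.15])]
print(threshold_test(nominal + outlier, alphabet_size=4, threshold=0.05))   # flags index 5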
The theory of reinforcement learning currently suffers from a mismatch between its empirical performance and the theoretical characterization of its performance, with consequences for, e.g., the understanding of sample efficiency, safety, and robustness. The linear quadratic regulator with unknown dynamics is a fundamental reinforcement learning setting with significant structure in its dynamics and cost function, yet even in this setting there is a gap between the best known regret lower bound of $\Omega_p(\sqrt{T})$ and the best known upper bound of $O_p(\sqrt{T}\,\text{polylog}(T))$. The contribution of this paper is to close that gap by establishing a novel regret upper bound of $O_p(\sqrt{T})$. Our proof is constructive in that it analyzes the regret of a concrete algorithm, and it simultaneously establishes an estimation error bound on the dynamics of $O_p(T^{-1/4})$, which is also the first to match the rate of a known lower bound. The two keys to our improved proof technique are (1) more precise upper and lower bounds on the system Gram matrix and (2) a self-bounding argument for the expected estimation error of the optimal controller.
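As a purely illustrative sketch of the kind of concrete algorithm analysed in this line of work (not necessarily the algorithm of this paper), the standard certainty-equivalence loop estimates $(A,B)$ by least squares from observed transitions and then plays the optimal controller for the estimated dynamics:

import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    P = solve_discrete_are(A, B, Q, R)                      # Riccati solution
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # control is u = -K x

def estimate_dynamics(X, U, Xnext):
    # least-squares estimate of [A B] from transitions x' = A x + B u + w
    Z = np.hstack([X, U])
    Theta, *_ = np.linalg.lstsq(Z, Xnext, rcond=None)
    n = X.shape[1]
    return Theta.T[:, :n], Theta.T[:, n:]

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
print(lqr_gain(A, B, np.eye(2), np.eye(1)))                 # gain for the true dynamics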
We prove that the stack-number of the strong product of three $n$-vertex paths is $\Theta(n^{1/3})$. The best previously known upper bound was $O(n)$. No non-trivial lower bound was known. This is the first explicit example of a graph family with bounded maximum degree and unbounded stack-number. The main tool used in our proof of the lower bound is the topological overlap theorem of Gromov. We actually prove a stronger result in terms of so-called triangulations of Cartesian products. We conclude that triangulations of three-dimensional Cartesian products of any sufficiently large connected graphs have large stack-number. The upper bound is a special case of a more general construction based on families of permutations derived from Hadamard matrices. The strong product of three paths is also the first example of a bounded degree graph with bounded queue-number and unbounded stack-number. A natural question that follows from our result is to determine the smallest $\Delta_0$ such that there exists a graph family with unbounded stack-number, bounded queue-number, and maximum degree $\Delta_0$. We show that $\Delta_0\in \{6,7\}$.
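A small sketch constructing the graph family in question, the strong product of three $n$-vertex paths, and confirming that its maximum degree is bounded (26 at interior vertices), using networkx:

import networkx as nx

def strong_product_of_three_paths(n):
    P = nx.path_graph(n)
    return nx.strong_product(nx.strong_product(P, P), P)

G = strong_product_of_three_paths(5)
print(G.number_of_nodes(), max(d for _, d in G.degree()))   # 125 vertices, maximum degree 26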
Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
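A simplified sketch of the spectral-filtering idea behind these robust mean estimators, in which the thresholding rule and stopping condition are placeholders rather than the paper's exact algorithm: while the empirical covariance has an unusually large eigenvalue, remove the points that are most extreme along its top eigenvector, then return the mean of what remains.

import numpy as np

def filtered_mean(X, eps, sigma_bound=1.0, max_iter=50):
    X = X.copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        if eigvals[-1] <= sigma_bound * (1 + 10 * eps):   # covariance looks clean: stop
            return mu
        v = eigvecs[:, -1]                                # top eigenvector
        scores = np.abs((X - mu) @ v)
        X = X[scores <= np.quantile(scores, 1 - eps)]     # drop the most extreme eps-tail
    return X.mean(axis=0)

rng = np.random.default_rng(0)
d, n, eps = 20, 5000, 0.1
X = np.vstack([rng.normal(size=(n, d)),                       # inliers ~ N(0, I)
               rng.normal(loc=5.0, size=(int(eps * n), d))])  # corrupted cluster
print(np.linalg.norm(filtered_mean(X, eps)),               # robust estimate stays near 0
      np.linalg.norm(X.mean(axis=0)))                      # naive mean is pulled far away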