Motivated by applications in combinatorial geometry, we consider the following question: Let $\lambda=(\lambda_1,\lambda_2,\ldots,\lambda_m)$ be an $m$-partition of a positive integer $n$, let $S_i \subseteq \mathbb{C}^{\lambda_i}$ be finite sets, and let $S:=S_1 \times S_2 \times \ldots \times S_m \subset \mathbb{C}^n$ be the multi-grid defined by the $S_i$. Suppose $p$ is an $n$-variate polynomial of degree $d$. How many zeros does $p$ have on $S$? We first develop a multivariate generalization of the Combinatorial Nullstellensatz that certifies the existence of a point $t \in S$ with $p(t) \neq 0$. Then we show that a natural multivariate generalization of the DeMillo-Lipton-Schwartz-Zippel lemma holds, except for a special family of polynomials that we call $\lambda$-reducible. This yields a simultaneous generalization of the Szemer\'edi-Trotter theorem and the Schwartz-Zippel lemma to higher dimensions, and has applications in incidence geometry. Finally, we develop a symbolic algorithm that identifies certain $\lambda$-reducible polynomials. More precisely, our symbolic algorithm detects polynomials whose zero set contains a Cartesian product of hypersurfaces. It is likely that, using Chow forms, the algorithm can be generalized to handle arbitrary $\lambda$-reducible polynomials, which we leave as an open problem.
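For orientation, the two classical statements being generalized here can be recalled in their standard textbook forms (these are the one-grid versions, not the paper's refined multivariate statements). Combinatorial Nullstellensatz: if $p \in \mathbb{F}[x_1,\ldots,x_n]$ has degree $\deg p = t_1+\cdots+t_n$ and the coefficient of $x_1^{t_1}\cdots x_n^{t_n}$ in $p$ is nonzero, then for any sets $S_1,\ldots,S_n \subseteq \mathbb{F}$ with $|S_i| > t_i$ there is a point $t \in S_1\times\cdots\times S_n$ with $p(t)\neq 0$. DeMillo-Lipton-Schwartz-Zippel: if $p \neq 0$ has degree $d$ and $S \subseteq \mathbb{F}$ is finite, then
\[
\bigl|\{\, t \in S^n : p(t) = 0 \,\}\bigr| \;\le\; d\,|S|^{\,n-1}.
\]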
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. The purpose of this paper is to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
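A minimal numerical sketch of the construction described above, assuming unit-Pareto standardization via ranks and an $L_1$ radius; the function name and these conventions are illustrative choices, not the paper's exact definitions.
\begin{verbatim}
import numpy as np

def empirical_angular_measure(X, k):
    """Rank-standardize margins and return the angular components of the k
    most extreme observations (illustrative conventions, not the paper's)."""
    n = X.shape[0]
    # Rank transformation: standardize each margin to (approximately) unit Pareto.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1   # ranks in 1..n per column
    V = n / (n + 1.0 - ranks)                               # V ~ 1 / (1 - F_hat)
    # Radial-angular decomposition with the L1 norm.
    R = V.sum(axis=1)
    Theta = V / R[:, None]
    # Keep the k largest radii; the empirical angular measure places mass
    # (up to normalization) on each retained angle.
    idx = np.argsort(R)[-k:]
    return Theta[idx]

# Example: empirical mass of the set {theta_1 > 1/2} among extreme angles.
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(5000, 2))
angles = empirical_angular_measure(X, k=200)
print((angles[:, 0] > 0.5).mean())
\end{verbatim}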
In network analysis, how to estimate the number of communities $K$ is a fundamental problem. We consider a broad setting where we allow severe degree heterogeneity and a wide range of sparsity levels, and propose Stepwise Goodness-of-Fit (StGoF) as a new approach. This is a stepwise algorithm, where for $m = 1, 2, \ldots$, we alternately use a community detection step and a goodness-of-fit (GoF) step. We adapt SCORE \cite{SCORE} for community detection, and propose a new GoF metric. We show that at step $m$, the GoF metric diverges to $\infty$ in probability for all $m < K$ and converges in distribution to $N(0,1)$ if $m = K$. This gives rise to a consistent estimate of $K$. Also, we identify the right way to define the signal-to-noise ratio (SNR) for our problem and show that consistent estimates of $K$ do not exist if $\mathrm{SNR} \to 0$, while StGoF is uniformly consistent for $K$ if $\mathrm{SNR} \to \infty$. Therefore, StGoF achieves the optimal phase transition. Similar stepwise methods (e.g., \cite{wang2017likelihood, ma2018determining}) are known to face analytical challenges. We overcome these challenges by using a different stepwise scheme in StGoF and by deriving sharp results that were not previously available. The key to our analysis is to show that SCORE has the {\it Non-Splitting Property (NSP)}. Primarily due to a non-tractable rotation of eigenvectors dictated by the Davis-Kahan $\sin(\theta)$ theorem, the NSP is non-trivial to prove and requires new techniques that we develop.
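The control flow of such a stepwise scheme can be sketched as follows; here \texttt{detect} and \texttt{gof} are hypothetical stand-ins for SCORE-based community detection and the paper's GoF metric, and the stopping rule based on an $N(0,1)$ quantile is only illustrative.
\begin{verbatim}
from scipy.stats import norm

def stepwise_goodness_of_fit(A, m_max, detect, gof, alpha=0.01):
    """Stepwise scheme sketched in the abstract: for m = 1, 2, ..., fit m
    communities and test goodness of fit; stop at the first m whose GoF
    statistic is consistent with a N(0,1) draw.  `detect` and `gof` are
    hypothetical stand-ins, not the paper's exact procedures."""
    z = norm.ppf(1 - alpha / 2)          # two-sided N(0,1) threshold
    for m in range(1, m_max + 1):
        labels = detect(A, m)            # community detection with m communities
        psi = gof(A, labels, m)          # GoF statistic; diverges when m < K
        if abs(psi) <= z:                # looks like N(0,1)  =>  estimate K = m
            return m
    return m_max
\end{verbatim}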
We develop a novel procedure for estimating the optimizer of general convex stochastic optimization problems of the form $\min_{x\in\mathcal{X}} \mathbb{E}[F(x,\xi)]$, when the given data is a finite independent sample drawn according to $\xi$. The procedure is based on a median-of-means tournament and is the first procedure that exhibits the optimal statistical performance in heavy-tailed situations: we recover the asymptotic rates dictated by the central limit theorem in a non-asymptotic manner once the sample size exceeds some explicitly computable threshold. Additionally, our results apply in the high-dimensional setup, as the threshold sample size exhibits the optimal dependence on the dimension (up to a logarithmic factor). The general setting allows us to recover recent results on multivariate mean estimation and linear regression in heavy-tailed situations and to prove the first sharp, non-asymptotic results for the portfolio optimization problem.
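As a point of reference, the basic median-of-means primitive underlying such tournaments is sketched below; the tournament itself, which roughly speaking compares candidate optimizers block-wise, is not reproduced here.
\begin{verbatim}
import numpy as np

def median_of_means(values, n_blocks):
    """Median-of-means estimate of a mean: split the sample into blocks,
    average within blocks, and take the median of the block means.  This is
    the basic primitive behind median-of-means tournaments."""
    values = np.asarray(values)
    blocks = np.array_split(values, n_blocks)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed example: the estimate is stable despite infinite variance.
rng = np.random.default_rng(1)
sample = rng.pareto(a=1.5, size=10_000)
print(median_of_means(sample, n_blocks=50))
\end{verbatim}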
In this paper we propose a flexible nested error regression small area model with a high-dimensional parameter that incorporates heterogeneity in regression coefficients and variance components. We develop a new robust small-area-specific estimating equations method that allows appropriate pooling of a large number of areas in estimating small-area-specific model parameters. We propose a parametric bootstrap and jackknife method to estimate not only the mean squared errors but also other commonly used uncertainty measures such as standard errors and coefficients of variation. We conduct both model-based and design-based simulation experiments and a real-life data analysis to evaluate the proposed methodology.
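For orientation, a classical nested error regression small area model has the form
\[
y_{ij} = x_{ij}^{\top}\beta + v_i + e_{ij},\qquad v_i \sim N(0,\sigma_v^2),\quad e_{ij}\sim N(0,\sigma_e^2),\qquad j=1,\ldots,n_i,\ i=1,\ldots,m,
\]
and the heterogeneity described above corresponds to letting the regression coefficients and variance components depend on the area $i$; the exact specification of the paper's flexible model is not reproduced here.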
The decision time of an infinite time algorithm is the supremum of its halting times over all real inputs. The decision time of a set of reals is the least decision time of an algorithm that decides the set; semidecision times of semidecidable sets are defined similarly. It is not hard to see that $\omega_1$ is the maximal decision time of sets of reals. Our main results determine the supremum of countable decision times as $\sigma$ and that of countable semidecision times as $\tau$, where $\sigma$ and $\tau$ denote the suprema of $\Sigma_1$- and $\Sigma_2$-definable ordinals, respectively, over $L_{\omega_1}$. We further compute analogous suprema for singletons.
We consider point sources in hyperbolic equations discretized by finite differences. If the source is stationary, appropriate source discretization has been shown to preserve the accuracy of the finite difference method. Moving point sources, however, pose two challenges that do not appear in the stationary case. First, the discrete source must not excite modes that propagate with the source velocity. Second, the discrete source spectrum amplitude must be independent of the source position. We derive a source discretization that meets these requirements and prove design-order convergence of the numerical solution for the one-dimensional advection equation. Numerical experiments indicate design-order convergence also for the acoustic wave equation in two dimensions. The source discretization covers on the order of $\sqrt{N}$ grid points on an $N$-point grid and is applicable for source trajectories that do not touch domain boundaries.
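A generic model problem of this type, written here only to fix ideas (the paper's precise formulation and source discretization are not reproduced), is the advection equation forced by a moving point source,
\[
u_t + a\,u_x = g(t)\,\delta\bigl(x - x_s(t)\bigr),
\]
where $x_s(t)$ is the prescribed source trajectory and $g(t)$ the source time function.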
The Extended Randomized Kaczmarz method is a well-known iterative scheme which can find the Moore-Penrose inverse solution of a possibly inconsistent linear system and requires only one additional column of the system matrix in each iteration compared with the standard randomized Kaczmarz method. Also, the Sparse Randomized Kaczmarz method has been shown to converge linearly to a sparse solution of a consistent linear system. Here, we combine both ideas and propose an Extended Sparse Randomized Kaczmarz method. We show linear expected convergence to a sparse least squares solution in the sense that an extended variant of the regularized basis pursuit problem is solved. Moreover, we generalize the additional step in the method and prove convergence to a more abstract optimization problem. We demonstrate numerically that our method can find sparse least squares solutions of real and complex systems if the noise is concentrated in the complement of the range of the system matrix and that our generalization can handle impulsive noise.
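A sketch of how the two ideas might be combined, under the plausible reading that a random column step drives an auxiliary vector toward the part of $b$ outside the range of the matrix while a soft-thresholded random row step promotes sparsity; uniform sampling and the step details are illustrative assumptions, not the paper's exact algorithm.
\begin{verbatim}
import numpy as np

def soft_threshold(v, lam):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def extended_sparse_kaczmarz(A, b, lam=1.0, iters=100_000, rng=None):
    """Sketch of an extended sparse randomized Kaczmarz iteration: a random
    column step pushes z toward the component of b outside range(A); a random
    row step against b - z, followed by soft thresholding, pushes x toward a
    sparse least squares solution.  Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x_dual = np.zeros(n)                  # pre-threshold (dual-like) variable
    x = np.zeros(n)
    z = b.astype(float).copy()
    row_norms = (A ** 2).sum(axis=1)
    col_norms = (A ** 2).sum(axis=0)
    for _ in range(iters):
        j = rng.integers(n)               # column step
        z -= (A[:, j] @ z) / col_norms[j] * A[:, j]
        i = rng.integers(m)               # row step with shrinkage
        resid = A[i] @ x - (b[i] - z[i])
        x_dual -= resid / row_norms[i] * A[i]
        x = soft_threshold(x_dual, lam)
    return x
\end{verbatim}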
We exhibit the analog of the entropy map for multivariate Gaussian distributions on local fields. As in the real case, the image of this map lies in the supermodular cone and it determines the distribution of the valuation vector. In general, this map can be defined for non-Archimedean valued fields whose valuation group is an additive subgroup of the real line, and the image remains supermodular. We also explicitly compute the image of this map in dimension 3.
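For reference, in the usual terminology a set function $f:2^{[n]}\to\mathbb{R}$ is supermodular (i.e., lies in the supermodular cone) if
\[
f(S\cup T) + f(S\cap T) \;\ge\; f(S) + f(T)\qquad\text{for all } S,T\subseteq[n].
\]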
Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or $\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve the polynomial speedup in $\log n$ compared to the perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
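A sketch of the gradient-only ingredient mentioned above: Hessian-vector products are approximated by finite differences of gradients, and a shifted power iteration extracts a negative curvature direction. The shift, finite-difference step, and iteration count here are illustrative choices, not the paper's tuned parameters or its full algorithm.
\begin{verbatim}
import numpy as np

def negative_curvature_direction(grad, x, n_iters=50, r=1e-4, shift=1.0, rng=None):
    """Hessian power method using only gradients: H v is approximated by
    (grad(x + r v) - grad(x)) / r, and power iteration on (shift * I - H)
    aligns v with the most negative curvature direction of H.  The shift
    should upper-bound the Hessian norm; values here are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    g0 = grad(x)
    v = rng.standard_normal(x.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        hv = (grad(x + r * v) - g0) / r       # H v via a gradient difference
        v = shift * v - hv                    # power step on (shift * I - H)
        v /= np.linalg.norm(v)
    curvature = v @ (grad(x + r * v) - g0) / r   # Rayleigh quotient v^T H v
    return v, curvature

# Example at the saddle point of f(x) = x0^2 - x1^2 (Hessian diag(2, -2)):
f_grad = lambda x: np.array([2 * x[0], -2 * x[1]])
v, curv = negative_curvature_direction(f_grad, np.zeros(2), shift=3.0)
print(v, curv)   # v close to the x1 axis, curvature near -2
\end{verbatim}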
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
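The type of convex surrogate referred to above can be illustrated, informally and up to constants and regularity conditions, by a $\phi$-divergence-ball formulation and its variance expansion:
\[
\sup_{P:\,D_{\phi}(P\,\|\,\widehat{P}_n)\le \rho/n}\mathbb{E}_{P}\bigl[\ell(\theta;Z)\bigr]
\;\approx\;
\mathbb{E}_{\widehat{P}_n}\bigl[\ell(\theta;Z)\bigr]
+\sqrt{\tfrac{2\rho}{n}\,\mathrm{Var}_{\widehat{P}_n}\bigl(\ell(\theta;Z)\bigr)},
\]
where the left-hand side remains convex in $\theta$ whenever $\ell(\cdot;Z)$ is, even though the variance-penalized right-hand side need not be.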