Let $P$ be a bounded polyhedron defined as the intersection of the non-negative orthant ${\Bbb R}^n_+$ and an affine subspace of codimension $m$ in ${\Bbb R}^n$. We show that a simple and computationally efficient formula approximates the volume of $P$ within a factor of $\gamma^m$, where $\gamma >0$ is an absolute constant. The formula provides the best known estimate for the volume of transportation polytopes from a wide family.
Background and Objective: The contrast of cryo-EM images varies from one image to another, primarily due to the uneven thickness of the ice layer. This contrast variation can affect the quality of 2-D class averaging, 3-D ab-initio modeling, and 3-D heterogeneity analysis. Contrast estimation is currently performed during 3-D iterative refinement; as a result, the estimates are not available for class averaging and ab-initio modeling, even though these methods require good initial estimates of the 3-D volumes and 3-D rotations of the molecules. This paper aims to solve the contrast estimation problem in the ab-initio stage, without estimating the 3-D volume. Methods: The key observation underlying our analysis is that the 2-D covariance matrix of the raw images is related to the covariance of the underlying clean images, the noise variance, and the contrast variability between images. We show that the contrast variability can be derived from the 2-D covariance matrix, and we use the existing Covariance Wiener Filtering (CWF) framework to estimate it. We also demonstrate a modification of CWF that estimates the contrast of individual images. Results: Our method improves contrast estimation by a large margin compared to the previous CWF method. Its estimation accuracy is often comparable to that of an oracle that knows the ground-truth covariance of the clean images. The more accurate contrast estimates also improve the quality of image denoising, as demonstrated on both synthetic and experimental datasets. Conclusions: This paper proposes an effective method for estimating contrast directly from noisy images, without using any 3-D volume information. It enables contrast correction at an earlier stage of single particle analysis and may improve the accuracy of downstream processing.
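The key observation can be sketched in a toy simulation. This is our own illustration, not the paper's CWF pipeline: we model raw images as $y_i = c_i x_i + \varepsilon_i$ with per-image contrast $c_i$, and we assume zero-mean clean images with identity covariance purely for simplicity.

```python
import numpy as np

# Toy sketch of the key observation, NOT the paper's CWF pipeline: raw images
# are modeled as y_i = c_i * x_i + noise with per-image contrast c_i. We assume
# zero-mean clean images with identity covariance purely for illustration.
rng = np.random.default_rng(0)
n, p, sigma = 2000, 64, 0.5
x = rng.standard_normal((n, p))              # "clean images" (Cov = I here)
c = rng.uniform(0.5, 1.5, size=n)            # unknown per-image contrasts
y = c[:, None] * x + sigma * rng.standard_normal((n, p))

# Cov(y) = E[c^2] * Cov(x) + sigma^2 * I, so the second moment of the contrast
# can be read off the 2-D covariance of the raw images, without any 3-D volume.
cov_y = y.T @ y / n
Ec2_hat = np.trace(cov_y) / p - sigma**2
print(Ec2_hat, np.mean(c**2))                # estimate vs. sample truth
```

In the realistic setting the clean-image covariance is unknown and must itself be estimated (which is what CWF provides), but the algebraic relation above is the same.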
$\newcommand{\NP}{\mathsf{NP}}\newcommand{\GapSVP}{\textrm{GapSVP}}$We give a simple proof that the (approximate, decisional) Shortest Vector Problem is $\NP$-hard under a randomized reduction. Specifically, we show that for any $p \geq 1$ and any constant $\gamma < 2^{1/p}$, the $\gamma$-approximate problem in the $\ell_p$ norm ($\gamma$-$\GapSVP_p$) is not in $\mathsf{RP}$ unless $\NP \subseteq \mathsf{RP}$. Our proof follows an approach pioneered by Ajtai (STOC 1998), and strengthened by Micciancio (FOCS 1998 and SICOMP 2000), for showing hardness of $\gamma$-$\GapSVP_p$ using locally dense lattices. We construct such lattices simply by applying "Construction A" to Reed-Solomon codes with suitable parameters, and prove their local density via an elementary argument originally used in the context of Craig lattices. As in all known $\NP$-hardness results for $\GapSVP_p$ with $p < \infty$, our reduction uses randomness. Indeed, it is a notorious open problem to prove $\NP$-hardness via a deterministic reduction. To this end, we additionally discuss potential directions and associated challenges for derandomizing our reduction. In particular, we show that a close deterministic analogue of our local density construction would improve on the state-of-the-art explicit Reed-Solomon list-decoding lower bounds of Guruswami and Rudra (STOC 2005 and IEEE Trans. Inf. Theory 2006). As a related contribution of independent interest, we also give a polynomial-time algorithm for decoding $n$-dimensional "Construction A Reed-Solomon lattices" (with different parameters than those used in our hardness proof) to a distance within an $O(\sqrt{\log n})$ factor of Minkowski's bound. This asymptotically matches the best known distance for decoding near Minkowski's bound, due to Mook and Peikert (IEEE Trans. Inf. Theory 2022), whose work we build on with a somewhat simpler construction and analysis.
We study the problem of robustly estimating the parameter $p$ of an Erd\H{o}s-R\'enyi random graph on $n$ nodes, where a $\gamma$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a computationally-efficient spectral algorithm which estimates $p$ up to accuracy $\tilde O(\sqrt{p(1-p)}/n + \gamma\sqrt{p(1-p)} /\sqrt{n}+ \gamma/n)$ for $\gamma < 1/60$. Furthermore, we give an inefficient algorithm with similar accuracy for all $\gamma <1/2$, the information-theoretic limit. Finally, we prove a nearly-matching statistical lower bound, showing that the error of our algorithms is optimal up to logarithmic factors.
Consider a set $P$ of $n$ points in $\mathbb{R}^d$. In the discrete median line segment problem, the objective is to find a line segment bounded by a pair of points in $P$ such that the sum of the Euclidean distances from $P$ to the line segment is minimized. In the continuous median line segment problem, a real number $\ell>0$ is given, and the goal is to locate a line segment of length $\ell$ in $\mathbb{R}^d$ such that the sum of the Euclidean distances between $P$ and the line segment is minimized. We show how to compute $(1+\epsilon\Delta)$- and $(1+\epsilon)$-approximations to a discrete median line segment in time $O(n\epsilon^{-2d}\log n)$ and $O(n^2\epsilon^{-d})$, respectively, where $\Delta$ is the spread of the line segments spanned by pairs of points. While developing our algorithms, by using the principle of pair decomposition, we derive new data structures that allow us to quickly approximate the sum of the distances from a set of points to a given line segment or point. To our knowledge, our use of pair decompositions for solving minsum facility location problems is the first of its kind; it is versatile and easily implementable. We prove that it is impossible to construct a continuous median line segment for $n\geq3$ non-collinear points in the plane using only a ruler and compass. In view of this, we present an $O(n^d\epsilon^{-d})$-time algorithm for approximating a continuous median line segment in $\mathbb{R}^d$ within a factor of $1+\epsilon$. The algorithm is based on generalizing the point-segment pair decomposition from the discrete to the continuous domain. Last but not least, we give a $(1+\epsilon)$-approximation algorithm, whose time complexity is sub-quadratic in $n$, for the constrained median line segment problem in $\mathbb{R}^2$, where an endpoint or the slope of the median line segment is given as input.
In $d$ dimensions, approximating an arbitrary function oscillating with frequency $\lesssim k$ requires $\sim k^d$ degrees of freedom. A numerical method for solving the Helmholtz equation (with wavenumber $k$) suffers from the pollution effect if, as $k\to \infty$, the total number of degrees of freedom needed to maintain accuracy grows faster than this natural threshold. While the $h$-version of the finite element method (FEM) (where accuracy is increased by decreasing the meshwidth $h$ and keeping the polynomial degree $p$ fixed) suffers from the pollution effect, the celebrated papers [Melenk, Sauter 2010], [Melenk, Sauter 2011], [Esterhazy, Melenk 2012], and [Melenk, Parsania, Sauter 2013] showed that the $hp$-FEM (where accuracy is increased by decreasing the meshwidth $h$ and increasing the polynomial degree $p$) applied to a variety of constant-coefficient Helmholtz problems does not suffer from the pollution effect. The heart of the proofs of these results is a PDE result splitting the solution of the Helmholtz equation into "high" and "low" frequency components. The main novelty of the present paper is that we prove this splitting for the constant-coefficient Helmholtz equation in full-space (i.e., in $\mathbb{R}^d$) using only integration by parts and elementary properties of the Fourier transform (this is in contrast to the proof for this set-up in [Melenk, Sauter 2010], which uses somewhat-involved bounds on Bessel and Hankel functions). We combine this splitting with (i) standard arguments about convergence of the FEM applied to the Helmholtz equation (the so-called "Schatz argument", which we reproduce here) and (ii) polynomial-approximation results (which we quote from the literature without proof) to give a simple proof that the $hp$-FEM does not suffer from the pollution effect for the constant-coefficient full-space Helmholtz equation.
Many applications, such as system identification, classification of time series, direct and inverse problems in partial differential equations, and uncertainty quantification, lead to the question of approximating a non-linear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$. We study the problem of determining the degree of approximation of such operators on a compact subset $K_\mathfrak{X}\subset \mathfrak{X}$ using a finite amount of information. If $\mathcal{F}: K_\mathfrak{X}\to K_\mathfrak{Y}$, a well-established strategy to approximate $\mathcal{F}(F)$ for some $F\in K_\mathfrak{X}$ is to encode $F$ (respectively, $\mathcal{F}(F)$) in terms of a finite number $d$ (respectively, $m$) of real numbers. Together with appropriate reconstruction algorithms (decoders), the problem reduces to the approximation of $m$ functions on a compact subset of a high-dimensional Euclidean space $\mathbb{R}^d$, equivalently, the unit sphere $\mathbb{S}^d$ embedded in $\mathbb{R}^{d+1}$. The problem is challenging because $d$, $m$, and the complexity of the approximation on $\mathbb{S}^d$ are all large, and it is necessary to estimate the accuracy while keeping track of the inter-dependence of all the approximations involved. In this paper, we establish constructive methods to do this efficiently, i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$. We study different smoothness classes for the operators, and also propose a method for approximating $\mathcal{F}(F)$ using only information in a small neighborhood of $F$, resulting in an effective reduction in the number of parameters involved. To further mitigate the problem of a large number of parameters, we propose prefabricated networks, resulting in a substantially smaller number of effective parameters.
Assume that we observe i.i.d.~points lying close to some unknown $d$-dimensional $\mathcal{C}^k$ submanifold $M$ in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses ($L_p$, Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on $d$ and the regularity $s\leq k-1$ of the underlying density, but not on the ambient dimension. In particular, we show that the estimator is minimax and matches previous rates in the literature in the case where the manifold $M$ is a $d$-dimensional cube. The related problem of the estimation of the volume measure of $M$ for the Wasserstein loss is also considered, for which a minimax estimator is exhibited.
We develop a novel a posteriori error estimator for the $L^2$ error committed by the finite element discretization of the solution of the fractional Laplacian. Our a posteriori error estimator takes advantage of the semi-discretization scheme using a rational approximation, which allows us to reformulate the fractional problem as a family of non-fractional parametric problems. The estimator is obtained by applying the implicit Bank-Weiser error estimation strategy to each non-fractional parametric problem and reconstructing the fractional error through the same rational approximation used to compute the solution of the original fractional problem. We provide several numerical examples in both two and three dimensions demonstrating the effectivity of our estimator for varying fractional powers and its ability to drive an adaptive mesh refinement strategy.
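The structure that the estimator exploits can be illustrated on a matrix analogue. The sketch below is our own toy (a sinc-type quadrature in the spirit of rational approximations of fractional powers, not necessarily the paper's scheme): the fractional solve $A^{-s}f$ is replaced by a weighted family of ordinary parametric solves, so any quantity computed per parametric problem can be recombined with the same weights.

```python
import numpy as np

# Matrix toy of the rational-approximation idea (our own sketch, not
# necessarily the paper's scheme): A^{-s} f is computed as a weighted sum of
# ordinary shifted solves, using the quadrature of
#   A^{-s} = (sin(pi s)/pi) * int_0^inf mu^{-s} (mu I + A)^{-1} dmu
# after the substitution mu = exp(2y).
def frac_solve(A, f, s, k=0.3, N=60):
    n = A.shape[0]
    u = np.zeros(n)
    for l in range(-N, N + 1):
        y = l * k
        w = (2.0 * k * np.sin(np.pi * s) / np.pi) * np.exp(2.0 * (1.0 - s) * y)
        u += w * np.linalg.solve(np.exp(2.0 * y) * np.eye(n) + A, f)  # parametric solve
    return u

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)                 # SPD test matrix
f = rng.standard_normal(5)
lam, V = np.linalg.eigh(A)
exact = V @ ((V.T @ f) / lam**0.5)            # reference A^{-1/2} f
approx = frac_solve(A, f, s=0.5)
print(np.linalg.norm(approx - exact))
```

In the same spirit, a per-problem error indicator computed for each shifted solve can be recombined through the identical weights, which is the reconstruction step described in the abstract.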
Solutions of the Helmholtz equation are known to be well approximated by superpositions of propagative plane waves. This observation is the foundation of successful Trefftz methods. However, when too many plane waves are used, the computation of the expansion is known to be numerically unstable. We explain how this effect is due to the presence of exponentially large coefficients in the expansion and can drastically limit the efficiency of the approach. In this work, we show that the Helmholtz solutions on a disk can be exactly represented by a continuous superposition of evanescent plane waves, generalizing the standard Herglotz representation. Here, by evanescent plane waves, we mean exponential plane waves with complex-valued propagation vector, whose absolute value decays exponentially in one direction. In addition, the density in this representation is proved to be uniformly bounded in a suitable weighted Lebesgue norm, hence overcoming the instability observed with propagative plane waves and paving the way for stable discrete expansions. In view of practical implementations, discretization strategies are investigated. We construct suitable finite-dimensional sets of evanescent plane waves using sampling strategies in a parametric domain. Provided one uses sufficient oversampling and regularization, the resulting approximations are shown to be both controllably accurate and numerically stable, as supported by numerical evidence.
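As a toy illustration of the oversampled, regularized least-squares strategy (our own example, not the paper's construction), one can fit a circular-wave Helmholtz solution on the unit disk with a modest number of propagative plane waves; truncated-SVD regularization via `rcond` plays the role of the regularized fit.

```python
import numpy as np

# Toy illustration (our own example, not the paper's construction): fit the
# Helmholtz solution u = J_m(kr) e^{i m theta} on the unit disk with N
# propagative plane waves via oversampled, regularized least squares.
k, m, N = 10.0, 5, 40

def bessel_j(m, z):
    # J_m via its integral representation, trapezoid rule (avoids SciPy).
    tau = np.linspace(0.0, np.pi, 513)
    w = np.full(tau.size, np.pi / 512); w[0] = w[-1] = np.pi / 1024
    return np.cos(m * tau - np.asarray(z)[:, None] * np.sin(tau)) @ w / np.pi

rng = np.random.default_rng(2)
r = np.sqrt(rng.uniform(0, 1, 800))                  # 800 points, uniform in disk
th = rng.uniform(0, 2 * np.pi, 800)
pts = np.column_stack([r * np.cos(th), r * np.sin(th)])

phis = 2 * np.pi * np.arange(N) / N                  # equispaced directions
dirs = np.column_stack([np.cos(phis), np.sin(phis)])
A = np.exp(1j * k * pts @ dirs.T)                    # plane-wave evaluations
b = bessel_j(m, k * r) * np.exp(1j * m * th)

# Oversampling (800 samples vs 40 unknowns) plus truncated-SVD regularization
# (rcond) keeps the fit stable despite the ill-conditioned plane-wave basis.
coef, *_ = np.linalg.lstsq(A, b, rcond=1e-8)
err = np.linalg.norm(A @ coef - b) / np.linalg.norm(b)
print(err)
```

For this well-resolved target the fit is accurate and the coefficients remain small; the instability discussed in the abstract appears when many more plane waves are used without regularization, and evanescent waves are what restore stable approximation of solutions with higher Fourier content.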
As a traditional and widely adopted mortality rate projection technique, the Lee-Carter model, which represents the log mortality rate as a simple bilinear form $\log(m_{x,t})=a_x+b_xk_t$, has been extensively studied throughout the past 30 years. However, little attention has been paid to the performance of the model in the presence of outliers, particularly for the parameter estimation of $b_x$. In this paper, we propose a robust estimation method for the Lee-Carter model by formulating it as a probabilistic principal component analysis (PPCA) with multivariate $t$-distributions, together with an efficient expectation-maximization (EM) algorithm for its implementation. The advantages of the method are threefold: it yields significantly more robust estimates of both $b_x$ and $k_t$, preserves the fundamental interpretation of $b_x$ as the first principal component, as in the traditional approach, and is flexible enough to be integrated into other existing time series models for $k_t$. The parameter uncertainties are examined by adopting a standard residual bootstrap. A simulation study based on the Human Mortality Database shows the superior performance of the proposed model compared to other conventional approaches.
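For reference, the classical (non-robust) Lee-Carter fit that the robust PPCA formulation generalizes can be sketched via the SVD; this is the standard baseline on synthetic data (not HMD), with the usual identification constraints $\sum_x b_x = 1$ and $\sum_t k_t = 0$.

```python
import numpy as np

# Classical (non-robust) Lee-Carter fit via SVD -- the baseline whose
# "b_x = first principal component" interpretation the robust PPCA method
# preserves. Synthetic data (not HMD); identification: sum(b_x)=1, mean(k_t)=0.
rng = np.random.default_rng(3)
ages, years = 20, 40
a = np.linspace(-6.0, -2.0, ages)                 # a_x: average log-mortality level
b = np.full(ages, 1.0 / ages)                     # b_x: age sensitivities
kt = np.cumsum(rng.normal(-0.3, 0.2, years))      # k_t: random walk with drift
logm = a[:, None] + np.outer(b, kt) + 0.01 * rng.standard_normal((ages, years))

a_hat = logm.mean(axis=1)                         # absorbs mean(k_t) into a_x
U, S, Vt = np.linalg.svd(logm - a_hat[:, None])   # first PC gives b_x, k_t
b_hat = U[:, 0] / U[:, 0].sum()                   # rescale so sum(b_x) = 1
k_hat = S[0] * Vt[0] * U[:, 0].sum()
print(np.max(np.abs(np.outer(b_hat, k_hat) - np.outer(b, kt - kt.mean()))))
```

A single grossly contaminated cell in `logm` can visibly distort `b_hat` in this SVD fit, which is the failure mode the $t$-distributed PPCA formulation is designed to address.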