The Sinc approximation applied to double-exponentially decaying functions is referred to as the DE-Sinc approximation. Because of its high efficiency, this method has been used in various applications. In the Sinc approximation, the mesh size and truncation numbers must be selected appropriately to attain the best performance. However, the standard selection formula is only ``near-optimal,'' because the optimal mesh size cannot be expressed in terms of elementary functions of the truncation numbers. In this study, we propose two improved selection formulas. The first is based on a concept from earlier research that yielded a better selection formula for the double-exponential formula; it performs slightly better than the standard formula, but is still not optimal. For the second, we introduce a new parameter to obtain a truly optimal selection formula. We provide explicit error bounds for both selection formulas. Numerical comparisons show that the first formula gives a better error bound than the standard formula, and the second gives a much better error bound than both.
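A minimal sketch of the DE-Sinc approximation on $(-1,1)$, assuming the standard double-exponential transformation $\psi(t) = \tanh(\frac{\pi}{2}\sinh t)$; the mesh size $h$ is deliberately left as a free parameter, since its (near-)optimal selection as a function of the truncation number is precisely what the abstract addresses:

```python
import numpy as np

def de_sinc_approx(f, N, h, x):
    """Approximate f on (-1, 1) by the DE-Sinc formula
       f(x) ~ sum_{k=-N}^{N} f(psi(k h)) * S(k, h)(psi^{-1}(x)),
    where psi(t) = tanh((pi/2) sinh t) is the DE transformation."""
    t = np.arcsinh(2.0 * np.arctanh(x) / np.pi)    # psi^{-1}(x)
    k = np.arange(-N, N + 1)
    nodes = np.tanh(0.5 * np.pi * np.sinh(k * h))  # sampling points psi(k h)
    # np.sinc(z) = sin(pi z)/(pi z), so S(k, h)(t) = np.sinc(t/h - k)
    return np.sinc(t[:, None] / h - k[None, :]) @ f(nodes)

# Example: f(x) = 1 - x^2 vanishes at the endpoints, so f(psi(t)) decays
# double-exponentially. The mesh size h = 0.3 is hand-picked, not optimal.
x = np.linspace(-0.9, 0.9, 5)
approx = de_sinc_approx(lambda s: 1.0 - s**2, N=20, h=0.3, x=x)
print(np.max(np.abs(approx - (1.0 - x**2))))
```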
Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, the convergence speed has been observed to slow down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
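A minimal sketch of one Gaussianization layer (marginal Gaussianization followed by a rotation); the empirical-CDF marginal estimator and the random orthogonal rotation are generic choices for illustration, not necessarily the variants analyzed in the paper:

```python
import numpy as np
from scipy.stats import norm

def gaussianize_layer(X, rng):
    """One Gaussianization layer: marginal Gaussianization (empirical CDF
    followed by the Gaussian quantile function), then a random rotation
    that mixes dimensions so later layers can attack the dependencies."""
    n, d = X.shape
    ranks = X.argsort(axis=0).argsort(axis=0) + 1.0
    Z = norm.ppf(ranks / (n + 1.0))                 # each marginal ~ N(0, 1)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    return Z @ Q

# Toy run on correlated Gaussian input -- the case for which the abstract
# shows the number of required layers grows linearly with the dimension.
rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
X = rng.standard_normal((2000, d)) @ A.T
for _ in range(10):
    X = gaussianize_layer(X, rng)
# Off-diagonal covariance entries shrink over layers (slowly, per the analysis).
print(np.round(np.cov(X.T)[:3, :3], 2))
```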
This paper characterizes the proximal operator of the piecewise exponential function $1\!-\!e^{-|x|/\sigma}$ with a given shape parameter $\sigma\!>\!0$, a popular nonconvex surrogate of the $\ell_0$-norm in support vector machines, zero-one programming, and compressed sensing. Although Malek-Mohammadi et al. [IEEE Transactions on Signal Processing, 64(21):5657--5671, 2016] worked on this problem, the expressions they derived were inaccurate: one case was missing. Using the Lambert W function and an extensive study of the piecewise exponential function, we rectify the formulation of its proximal operator in light of their work and undertake a thorough analysis of the operator. Finally, as an application in compressed sensing, an iterative shrinkage and thresholding algorithm (ISTA) for the piecewise exponential regularization problem is developed and fully investigated. A comparative study of ISTA with nine popular nonconvex penalties in compressed sensing demonstrates the advantage of the piecewise exponential penalty.
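A hedged sketch of ISTA for this penalty. To stay agnostic to the paper's corrected case analysis (whose closed form uses the Lambert W function), the proximal operator is evaluated numerically here rather than by the closed-form expression:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_pexp(v, lam, sigma):
    """Numerical prox of x -> lam * (1 - exp(-|x|/sigma)) at v, i.e.
    argmin_x 0.5*(x - v)^2 + lam*(1 - exp(-|x|/sigma)). Since the penalty
    is symmetric and nondecreasing in |x|, the minimizer lies between 0
    and v; we search that interval and compare against the branch x = 0."""
    def obj(x):
        return 0.5 * (x - v) ** 2 + lam * (1.0 - np.exp(-abs(x) / sigma))
    bounds = sorted((0.0, v)) if v != 0.0 else (-1e-12, 1e-12)
    res = minimize_scalar(obj, bounds=bounds, method="bounded")
    return 0.0 if obj(0.0) <= res.fun else res.x

def ista(A, b, lam, sigma, steps=200):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam * sum_i (1 - exp(-|x_i|/sigma))."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = A.T @ (A @ x - b)              # gradient of the smooth part
        v = x - g / L
        x = np.array([prox_pexp(vi, lam / L, sigma) for vi in v])
    return x
```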
We consider nonlinear delay differential and renewal equations with infinite delay. We extend the work of Gyllenberg et al. [Appl. Math. Comput. (2018)] by introducing a unifying abstract framework, and we derive a finite-dimensional approximating system via pseudospectral discretization. For renewal equations, we consider a reformulation, obtained via integration, in a space of absolutely continuous functions, which ensures that point evaluation is well defined. We prove the one-to-one correspondence of equilibria between the original equation and its approximation, and that linearization and discretization commute. Our most important result is the proof of convergence of the characteristic roots of the pseudospectral approximation of the linear(ized) equations, which ensures that the finite-dimensional system correctly reproduces the stability properties of the original linear equation if the dimension of the approximation is large enough. We illustrate this result with several numerical tests, which also demonstrate the effectiveness of the approach for the bifurcation analysis of equilibria of nonlinear equations.
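For illustration, a finite-delay prototype of the pseudospectral idea (in the style of Breda et al.): collocating the history of the scalar equation $x'(t) = a\,x(t) + b\,x(t-\tau)$ at Chebyshev nodes turns the infinitesimal generator into a matrix whose eigenvalues approximate the characteristic roots. The paper's contribution is the extension of this program to infinite delay and renewal equations:

```python
import numpy as np

def cheb(N):
    """Chebyshev differentiation matrix and nodes on [-1, 1]
    (Trefethen, 'Spectral Methods in MATLAB')."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    D = np.outer(c, 1.0 / c) / (X - X.T + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))
    return D, x

def dde_char_roots(a, b, tau, N=30):
    """Pseudospectral approximation of the characteristic roots of
    x'(t) = a x(t) + b x(t - tau): differentiate the history on [-tau, 0]
    at interior nodes and impose the equation itself at theta = 0. The
    eigenvalues approximate the roots of lambda = a + b exp(-lambda tau)."""
    D, x = cheb(N)            # x[0] = 1 <-> theta = 0, x[N] = -1 <-> theta = -tau
    M = (2.0 / tau) * D       # chain rule for theta = (tau / 2) * (x - 1)
    M[0, :] = 0.0
    M[0, 0], M[0, N] = a, b   # boundary row: x'(0) = a x(0) + b x(-tau)
    return np.linalg.eigvals(M)

roots = dde_char_roots(a=-1.0, b=-2.0, tau=1.0)
lam = roots[np.argmax(roots.real)]                  # rightmost root
print(lam, lam - (-1.0 - 2.0 * np.exp(-lam)))       # residual should be tiny
```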
Causal effect estimation from observational data alone has been studied by many researchers. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers have developed polynomial programs that yield tight bounds on the causal effect, but these programs are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive upper and lower bounds on the causal effect. We show that our bounds are consistent in the sense that, as the entropy of the unobserved confounders goes to zero, the gap between the upper and lower bounds vanishes. Finally, we conduct synthetic and real-data simulations comparing our bounds with those obtained by existing work, which cannot incorporate such entropy constraints, and show that our bounds are tighter in settings with weak confounders.
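For context, a tiny worked example of the classical no-assumption bounds that entropy constraints are designed to tighten; this is not the paper's entropy-constrained linear program, and the numbers are hypothetical:

```python
import numpy as np

# Observational joint distribution p(x, y) for binary X, Y (hypothetical).
p = np.array([[0.30, 0.10],   # p(x=0, y=0), p(x=0, y=1)
              [0.15, 0.45]])  # p(x=1, y=0), p(x=1, y=1)

# With no assumption on the confounder, P(Y=1 | do(X=1)) is only known to
# lie in [p(x=1, y=1), p(x=1, y=1) + p(x=0)]: the unobserved part of the
# population (those with X=0) could have either potential outcome.
lo = p[1, 1]
hi = p[1, 1] + p[0].sum()
print(f"P(Y=1 | do(X=1)) in [{lo:.2f}, {hi:.2f}]")  # -> [0.45, 0.85]
```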
In this work, we present a generic step-size choice for ADMM-type proximal algorithms. It admits a closed-form expression and is theoretically optimal with respect to a worst-case convergence rate bound: it is simply the ratio of the Euclidean norms of the dual and primal solutions, i.e., $||\lambda^\star|| / ||x^\star||$. Numerical tests show that its practical performance is near-optimal in general. The only challenge is that this ratio is not known a priori, and we provide two strategies to address that. The derivation of our step-size choice is based on studying the fixed-point structure of ADMM via the proximal operator. However, we demonstrate that the classical definition of the proximal operator contains an input-scaling issue, which leads to a scaled step-size optimization problem with a false solution. This issue is naturally avoided by our proposed new definition of the proximal operator, for which we establish a series of properties.
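A minimal sketch on the lasso, assuming scaled-form ADMM; the warm-up-run strategy for estimating the unknown ratio $||\lambda^\star||/||x^\star||$ is one plausible approach for illustration, not necessarily either of the paper's two strategies:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, mu, rho, steps=300):
    """Scaled-form ADMM for min_x 0.5*||Ax - b||^2 + mu*||x||_1
    (split as x = z; the dual variable is lambda = rho * u)."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    Q = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cache the x-update solve
    for _ in range(steps):
        x = Q @ (A.T @ b + rho * (z - u))
        z = soft(x + u, mu / rho)
        u = u + x - z
    return x, rho * u                              # primal and dual solutions

rng = np.random.default_rng(1)
A, b, mu = rng.standard_normal((40, 60)), rng.standard_normal(40), 0.5

# Warm-up run with an arbitrary rho to estimate ||lambda*||/||x*||, then re-solve
# with the step size set to that ratio.
x0, lam0 = admm_lasso(A, b, mu, rho=1.0, steps=50)
rho_star = np.linalg.norm(lam0) / np.linalg.norm(x0)
x, _ = admm_lasso(A, b, mu, rho=rho_star)
```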
In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the search algorithm and the geometry of the objective function. A matching upper bound is obtained by solving for the optimal energy spectrum. Second, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotically optimal sample complexity for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, enabled by a truncation method.
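A baseline two-point zeroth-order method on a noisy quadratic, for illustration only (the paper's optimal algorithm and its Hessian-dependent guarantees are not reproduced here); the slow progress on an ill-conditioned Hessian is exactly the geometry dependence the paper quantifies:

```python
import numpy as np

rng = np.random.default_rng(2)

def zo_minimize(f_noisy, d, steps=20000, delta=1e-2, eta=1e-3):
    """Baseline two-point zeroth-order method: estimate the directional
    derivative from two noisy evaluations along a random Gaussian direction,
    then take a gradient-like step."""
    x = np.ones(d)
    for _ in range(steps):
        v = rng.standard_normal(d)
        g = (f_noisy(x + delta * v) - f_noisy(x - delta * v)) / (2.0 * delta) * v
        x -= eta * g
    return x

# Noisy quadratic f(x) = 0.5 x^T H x + noise, with an ill-conditioned H.
d = 10
H = np.diag(np.logspace(0, 2, d))
f = lambda x: 0.5 * x @ H @ x + 0.01 * rng.standard_normal()
print(np.linalg.norm(zo_minimize(f, d)))  # should end up near 0
```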
The approximate stabilizer rank of a quantum state is the minimum number of terms in any approximate decomposition of that state into stabilizer states. Bravyi and Gosset showed that the approximate stabilizer rank of a so-called "magic" state like $|T\rangle^{\otimes n}$ is, up to polynomial factors, an upper bound on the number of classical operations required to simulate an arbitrary quantum circuit with Clifford gates and $n$ $T$ gates. As a result, an exponential lower bound on this quantity seems inevitable. Despite this intuition, several attempts using various techniques could not yield a better-than-linear lower bound on the "exact" rank of $|T\rangle^{\otimes n}$, i.e., the minimum size of a decomposition that exactly produces the state. For the "approximate" rank, which is more realistically related to the cost of simulating quantum circuits, no lower bound better than $\tilde \Omega(\sqrt n)$ has been known. In this paper, we improve the lower bound on the approximate rank to $\tilde \Omega(n^2)$ for a wide range of approximation parameters. An immediate corollary of our result is the existence of polynomial-time computable functions that require a super-linear number of terms in any decomposition into exponentials of quadratic forms over $\mathbb{F}_2$, resolving a question in [Wil18]. Our approach is based on a strong lower bound on the approximate rank of a quantum state sampled from the Haar measure, a step-by-step analysis of the approximate rank of a magic-state teleportation protocol used to sample from the Haar measure, and a result of [LKS18] on trading Clifford operations for $T$ gates.
Quantiles and expectiles, two important concepts and tools in tail risk measurement, can be regarded as extensions of the median and the mean, respectively. Both tail risk measures can be embedded in a common framework of $L_p$ optimization, with the absolute loss function ($p=1$) and the quadratic loss function ($p=2$), respectively. Since the 0-1 loss function is frequently used in statistics, machine learning, and decision theory, this paper introduces a 0-1 loss based $L_0$ optimization problem for tail risk measurement and names its solution the modile, which can be regarded as an extension of the mode. The mode, as another measure of central tendency, is more robust to outliers than expectiles and easier to compute than quantiles; a mode-based extension for tail risk measurement, however, is new. This paper shows that the proposed modiles are not only more conservative than quantiles and expectiles for skewed and heavy-tailed distributions, but also provide a unique interpretation of these measures. Further, modiles can be regarded as a type of generalized quantile and doubly truncated tail measure, both of which have recently attracted considerable attention in the literature. We provide the asymptotic properties of the corresponding sample-based estimators of modiles, which, together with numerical results, show that the proposed modiles are promising for tail measurement.
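A small sketch of the shared $L_p$ framework for $p = 1, 2$: quantiles and expectiles arise as minimizers of asymmetric empirical losses. The modile's exact 0-1 loss formulation is the paper's contribution and is not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.lognormal(size=5000)          # skewed, heavy-tailed sample
tau = 0.9

def asym_loss(theta, p):
    """Asymmetric L_p loss: weight tau above theta, (1 - tau) below.
    p = 1 yields the tau-quantile, p = 2 the tau-expectile; the modile
    plays the analogous role for the 0-1 style p = 0 case."""
    u = x - theta
    w = np.where(u >= 0, tau, 1.0 - tau)
    return np.mean(w * np.abs(u) ** p)

for p in (1, 2):
    theta = minimize_scalar(lambda t: asym_loss(t, p),
                            bounds=(x.min(), x.max()), method="bounded").x
    print(p, theta)
print("np.quantile check:", np.quantile(x, tau))  # matches the p = 1 result
```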
We study extensions of Fr\'{e}chet means for random objects in the space ${\rm Sym}^+(p)$ of $p \times p$ symmetric positive-definite matrices using the scaling-rotation geometric framework introduced by Jung et al. [\textit{SIAM J. Matrix Anal. Appl.} \textbf{36} (2015) 1180-1201]. The scaling-rotation framework is designed to provide a clearer interpretation of changes in random ellipsoids in terms of scaling and rotation. In this work, we formally define the \emph{scaling-rotation (SR) mean set} to be the set of Fr\'{e}chet means in ${\rm Sym}^+(p)$ with respect to the scaling-rotation distance. Since computing such means requires a difficult optimization, we also define the \emph{partial scaling-rotation (PSR) mean set}, lying in the space of eigen-decompositions, as a proxy for the SR mean set. The PSR mean set is easier to compute, and its projection to ${\rm Sym}^+(p)$ often coincides with the SR mean set. Only minimal conditions are required to ensure that the mean sets are non-empty. Because eigen-decompositions are never unique, neither are PSR means, but we give sufficient conditions for the sample PSR mean to be unique up to the action of a certain finite group. We also establish strong consistency of the sample PSR means as estimators of the population PSR mean set, and a central limit theorem. In an application to multivariate tensor-based morphometry, we demonstrate that a two-group test based on the proposed PSR means can have greater power than one based on the usual affine-invariant geometric framework for symmetric positive-definite matrices.
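For contrast with the proposed SR/PSR means, a minimal sketch of the usual affine-invariant Fr\'{e}chet (Karcher) mean that serves as the baseline in the abstract's final comparison, computed by the standard fixed-point iteration:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def karcher_mean_spd(mats, iters=50):
    """Frechet mean of SPD matrices under the affine-invariant metric,
    via the standard fixed-point / Riemannian gradient iteration."""
    M = np.mean(mats, axis=0)                       # arithmetic-mean initializer
    for _ in range(iters):
        R = np.real(sqrtm(M))
        Ri = np.linalg.inv(R)
        # Average the log-maps of the data at the current estimate M.
        T = np.mean([np.real(logm(Ri @ S @ Ri)) for S in mats], axis=0)
        M = R @ expm(T) @ R
    return M

rng = np.random.default_rng(4)
def rand_spd(p):
    A = rng.standard_normal((p, p))
    return A @ A.T + p * np.eye(p)

mats = [rand_spd(3) for _ in range(20)]
print(np.round(karcher_mean_spd(mats), 2))
```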
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
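A minimal sketch of the re-weighting scheme implied by the effective-number formula, with hypothetical class counts; normalizing the weights to sum to the number of classes is a common convention:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples
    E_n = (1 - beta^n) / (1 - beta), normalized so the weights
    sum to the number of classes."""
    eff_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * len(counts) / w.sum()

# Long-tailed class histogram (hypothetical counts).
counts = np.array([5000, 2000, 500, 100, 20])
print(np.round(class_balanced_weights(counts, beta=0.999), 3))
# The weights then rescale any per-class loss, e.g. softmax cross-entropy:
#   loss = weights[y] * cross_entropy(logits, y)
```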