We consider inverse problems where the conditional distribution of the observation ${\bf y}$ given the latent variable of interest ${\bf x}$ (also known as the forward model) is known, and we have access to a data set in which multiple instances of ${\bf x}$ and ${\bf y}$ are both observed. In this context, algorithm unrolling has become a very popular approach for designing state-of-the-art deep neural network architectures that effectively exploit the forward model. We analyze the statistical complexity of the gradient descent network (GDN), an algorithm unrolling architecture driven by proximal gradient descent. We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is the sample size and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. We also show that when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$. Our results thus suggest that algorithm unrolling models are prone to overfitting as the unrolling depth $D'$ increases. We provide several examples to illustrate these results.
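To make the construction concrete, here is a minimal sketch of a GDN forward pass, assuming a linear forward model ${\bf y} = A{\bf x} + \text{noise}$ and a Laplace prior on ${\bf x}$ (whose negative log-density has the soft-thresholding proximal operator); the fixed step size and threshold are illustrative, whereas a trained GDN would instead learn per-layer parameters:

```python
import numpy as np

def soft_threshold(v, lam):
    # Proximal operator of lam * ||.||_1 (negative log-density of a Laplace prior).
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def gdn_forward(y, A, depth, step, lam):
    """Unrolled proximal gradient descent on 0.5*||y - Ax||^2 + lam*||x||_1.
    Each loop iteration is one 'layer'; a trained GDN would learn per-layer
    step sizes, thresholds, or weight matrices instead of fixing them."""
    x = np.zeros(A.shape[1])
    for _ in range(depth):              # depth D' = number of unrolled layers
        grad = A.T @ (A @ x - y)        # gradient of the data-fidelity term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# The iterates contract at some rate rho < 1, so a depth of order
# log(n)/log(1/rho) suffices to drive the optimization error below the
# statistical error level, matching the depth scaling in the abstract.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100); x_true[:5] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = gdn_forward(y, A, depth=30, step=0.1, lam=0.05)
```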
The causal inference literature frequently focuses on estimating the mean of the potential outcome, whereas the quantiles of the potential outcome may carry important additional information. We propose a universal approach, based on inverse estimating equations, to generalize a wide class of causal inference solutions from estimating the mean of the potential outcome to its quantiles. We assume that an identifying moment function is available to identify the mean of the threshold-transformed potential outcome, based on which we propose a convenient construction of estimating equations for the quantiles of the potential outcome. In addition, we give a general construction of the efficient influence functions of the mean and quantiles of potential outcomes, and identify their connection. We motivate estimators for the quantile estimands with the efficient influence function, and develop their asymptotic properties when either parametric models or data-adaptive machine learners are used to estimate the nuisance functions. A broad implication of our results is that one can rework existing results for mean causal estimands to facilitate causal inference on quantiles, rather than starting from scratch. Our results are illustrated by several examples.
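To see the thresholding idea in the simplest case, here is a hedged sketch using an inverse-probability-weighted moment function for the mean of $1\{Y(1) \le q\}$ (one common choice of identifying moment function, not the paper's general construction); the quantile then solves the empirical moment equation in $q$:

```python
import numpy as np
from scipy.optimize import brentq

def ipw_quantile(Y, A, pscore, tau):
    """Solve for the tau-quantile q of Y(1) from the IPW moment that
    identifies the mean of the threshold-transformed outcome 1{Y(1) <= q}:
        E[ A * 1{Y <= q} / pscore ] - tau = 0.
    brentq locates the sign change of this monotone empirical moment."""
    def moment(q):
        return np.mean(A * (Y <= q) / pscore) - tau
    return brentq(moment, Y.min() - 1.0, Y.max() + 1.0)

# Illustration with a known propensity score.
rng = np.random.default_rng(0)
X = rng.uniform(size=2000)
ps = 0.2 + 0.6 * X                     # true propensity score
A = rng.binomial(1, ps)
Y = X + rng.standard_normal(2000)      # observed outcome (equals Y(1) when A = 1)
print(ipw_quantile(Y, A, ps, tau=0.5))
```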
We compute explicitly the MTW tensor (or cross curvature) for the optimal transport problem on $\mathbb{R}^n$ with a cost function of the form $\mathsf{c}(x, y) = \mathsf{u}(x^{\mathfrak{t}}y)$, where $\mathsf{u}$ is a scalar function with inverse $\mathsf{s}$, and $x^{\mathfrak{t}}y$ is a nondegenerate bilinear pairing of vectors $x, y$ belonging to an open subset of $\mathbb{R}^n$. The condition that the MTW tensor vanishes on null vectors under the Kim-McCann metric is a fourth-order nonlinear ODE, which can be reduced to a linear ODE of the form $\mathsf{s}^{(2)} - S\mathsf{s}^{(1)} + P\mathsf{s} = 0$ with constant coefficients $P$ and $S$. The resulting inverse functions include {\it Lambert} and {\it generalized inverse hyperbolic\slash trigonometric} functions. The squared Euclidean cost and $\log$-type costs are equivalent to instances of these solutions. The optimal map for the family is also explicit. For cost functions of a similar form on a hyperboloid model of the hyperbolic space and on the unit sphere, we also express this tensor in terms of algebraic expressions in derivatives of $\mathsf{s}$ using the Gauss-Codazzi equation, obtaining new families of strictly regular costs for these manifolds, including new families of {\it power function costs}. We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-convex functions and divergences.
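To illustrate how solutions of the linear ODE recover familiar costs, consider two hedged instances with illustrative constants (not necessarily the parametrization used in the paper). For $S = P = 0$, $\mathsf{s}(w) = \alpha w + \beta$ is affine, so $\mathsf{u} = \mathsf{s}^{-1}$ is affine and $\mathsf{c}(x,y) = \mathsf{u}(x^{\mathfrak{t}}y)$ is affine in the pairing; for the standard pairing this is equivalent to the squared Euclidean cost, since $\tfrac{1}{2}|x-y|^2 = \tfrac{1}{2}|x|^2 + \tfrac{1}{2}|y|^2 - x^{\mathfrak{t}}y$. For $S = 1$, $P = 0$, $\mathsf{s}(w) = \alpha e^{w} + \beta$ solves the ODE for any $\alpha, \beta$, giving $\mathsf{u}(v) = \log\big((v-\beta)/\alpha\big)$ and hence a $\log$-type cost ($\mathsf{c}(x,y) = \log(x^{\mathfrak{t}}y)$ when $\alpha = 1$, $\beta = 0$).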
The generalized inverse Gaussian, denoted $\mathrm{GIG}(p, a, b)$, is a flexible family of distributions that includes the gamma, inverse gamma, and inverse Gaussian distributions as special cases. In this article, we derive two novel mixture representations for the $\mathrm{GIG}(p, a, b)$: one that expresses the distribution as a continuous mixture of inverse Gaussians and another one that expresses it as a continuous mixture of truncated exponentials. Beyond their conceptual interest, these representations are useful for random number generation. We use the first representation to derive a geometrically ergodic Gibbs sampler whose stationary distribution is $\mathrm{GIG}(p, a, b)$, and the second one to define a recursive algorithm to generate exact independent draws from the distribution for half-integer $p$. Additionally, the second representation gives rise to a recursive algorithm for evaluating the cumulative distribution function of the $\mathrm{GIG}(p, a, b)$ for half-integer $p$. The algorithms are simple and can be easily implemented in standard programming languages.
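As a practical reference point (not the paper's samplers), SciPy's geninvgauss already implements the standardized two-parameter form, and the three-parameter $\mathrm{GIG}(p, a, b)$ with density proportional to $x^{p-1}e^{-(ax + b/x)/2}$ follows by rescaling; a minimal sketch of this mapping, useful for validating an implementation of the mixture-based algorithms:

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gig(p, a, b, size, rng=None):
    """Draw from GIG(p, a, b) with density proportional to
    x**(p-1) * exp(-(a*x + b/x)/2),  x > 0.

    SciPy's geninvgauss(p, chi) has density ~ x**(p-1) * exp(-chi*(x+1/x)/2);
    setting scale = sqrt(b/a) and chi = sqrt(a*b) recovers GIG(p, a, b).
    """
    scale = np.sqrt(b / a)
    chi = np.sqrt(a * b)
    return geninvgauss.rvs(p, chi, scale=scale, size=size, random_state=rng)

# Reference draws against which a Gibbs or recursive sampler can be checked.
draws = sample_gig(p=0.5, a=2.0, b=3.0, size=10_000, rng=np.random.default_rng(1))
```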
We explore new interactions between finite model theory and a number of classical streams of universal algebra and semigroup theory. A key result is an example of a finite algebra whose variety is not finitely axiomatisable in first order logic, but which has a first order definable finite membership problem. This algebra witnesses the simultaneous failure of the {\L}o\'s-Tarski Theorem, the SP-preservation theorem and Birkhoff's HSP-preservation theorem at the finite level, as well as providing a negative solution to a first order formulation of the long-standing Eilenberg-Sch\"utzenberger problem. The example also shows that a pseudovariety without any finite pseudo-identity basis may be finitely axiomatisable in first order logic. Other results include the undecidability of first order definability of the pseudovariety of a finite algebra, and a mapping from any fixed template constraint satisfaction problem to a first order equivalent variety membership problem, thereby providing examples of variety membership problems complete in each of the classes $\texttt{L}$, $\texttt{NL}$, $\texttt{Mod}_p(\texttt{L})$, $\texttt{P}$, and infinitely many others (depending on complexity-theoretic assumptions).
Fix a positive integer $n$, a real number $p\in (0,1]$, and a (perhaps random) hypergraph $\mathcal{H}$ on $[n]$. We introduce and investigate the following random multigraph model, which we denote $\mathbb{G}(n,p\, ; \,\mathcal{H})$: begin with an empty graph on $n$ vertices, labelled by the set $[n]$. For every $H\in \mathcal{H}$, choose, independently of all previous choices, a doubleton $D = \{i,j\} \subset H$ uniformly at random, and then introduce an edge between the vertices $i$ and $j$ with probability $p$, each edge being introduced independently of all other edges.
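The model description translates directly into a sampler; a minimal sketch in Python (the hyperedge family in the example is illustrative):

```python
import numpy as np
from itertools import combinations

def sample_multigraph(n, p, hyperedges, rng=None):
    """Sample from G(n, p; H): for each hyperedge H, pick a uniformly random
    doubleton {i, j} inside H, then keep that edge with probability p.

    Returns a list of edges with repeats allowed, hence a multigraph.
    """
    rng = rng or np.random.default_rng()
    edges = []
    for H in hyperedges:
        pairs = list(combinations(sorted(H), 2))
        i, j = pairs[rng.integers(len(pairs))]   # uniform doubleton from H
        if rng.random() < p:                     # kept independently of all others
            edges.append((i, j))
    return edges

# Example: H consists of all triples {k, k+1, k+2} on [n].
n, p = 10, 0.5
H = [{k, k + 1, k + 2} for k in range(1, n - 1)]
print(sample_multigraph(n, p, H, np.random.default_rng(0)))
```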
Group equivariant non-expansive operators have recently been proposed as basic components in topological data analysis and deep learning. In this paper we study some geometric properties of the spaces of group equivariant operators and show how a space $\mathcal{F}$ of group equivariant non-expansive operators can be endowed with the structure of a Riemannian manifold, thus making gradient descent methods available for the minimization of cost functions on $\mathcal{F}$. As an application of this approach, we also describe a procedure to select a finite set of representative group equivariant non-expansive operators in the considered manifold.
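As a toy illustration of optimizing over a space of such operators (under strong simplifying assumptions, and not the paper's Riemannian construction): convex combinations of GENEOs are again GENEOs, so one can run projected gradient descent on the mixing weights of a fixed finite family; all operators and names below are illustrative.

```python
import numpy as np

# A fixed family of non-expansive, translation-equivariant operators on
# circular signals; the probability simplex of weights parametrizes a small
# space of GENEOs we can optimize over.
def op_identity(x): return x
def op_shift(x):    return np.roll(x, 1)
def op_avg(x):      return 0.5 * (x + np.roll(x, 1))

OPS = [op_identity, op_shift, op_avg]

def combo(weights, x):
    return sum(w * op(x) for w, op in zip(weights, OPS))

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def fit(signals, targets, steps=200, lr=0.1):
    """Projected gradient descent on the weights of the convex combination,
    minimizing mean squared error between operator outputs and targets."""
    w = np.ones(len(OPS)) / len(OPS)
    for _ in range(steps):
        grad = np.zeros_like(w)
        for x, t in zip(signals, targets):
            r = combo(w, x) - t
            grad += np.array([r @ op(x) for op in OPS])
        w = project_simplex(w - lr * grad / len(signals))
    return w

rng = np.random.default_rng(0)
xs = [rng.standard_normal(32) for _ in range(8)]
ts = [op_avg(x) for x in xs]             # target operator to imitate
print(fit(xs, ts))                       # weights should concentrate on op_avg
```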
In fitting continuous bounded data, the generalized beta (and several variants of this distribution) and the two-parameter Kumaraswamy (KW) distributions are two of the most prominent univariate continuous distributions that come to mind. These two rival probability models share several common features, and selecting between them in a practical situation is of great interest. Consequently, in this paper we discuss various methods of selection between the generalized beta proposed by Libby and Novick (1982) (LNGB) and the KW distribution, such as criteria based on the probability of correct selection, which improves on the likelihood ratio statistic approach, as well as criteria based on pseudo-distance measures. We obtain an approximation for the probability of correct selection under the hypotheses $H_{\mathrm{LNGB}}$ and $H_{\mathrm{KW}}$, and select the model that maximizes it. Our proposal is all the more appealing because the LNGB distribution subsumes both the classical beta and exponentiated generators (see, for details, Cordeiro et al. 2014; Libby and Novick 1982), making it a natural competitor of the two-parameter KW distribution in an appropriate scenario.
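For intuition about the baseline these criteria improve on, here is a minimal likelihood-ratio-style selection sketch, using the classical two-parameter beta as a stand-in for the LNGB model (an LNGB fit would require its three-parameter density):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def kw_loglik(params, x):
    """Log-likelihood of the two-parameter Kumaraswamy distribution,
    f(x) = a*b * x**(a-1) * (1 - x**a)**(b-1) on (0, 1)."""
    a, b = params
    if a <= 0 or b <= 0:
        return -np.inf
    return np.sum(np.log(a) + np.log(b) + (a - 1) * np.log(x)
                  + (b - 1) * np.log1p(-x**a))

def select_beta_vs_kw(x):
    """Pick the model with the larger maximized log-likelihood."""
    # Beta fit with location/scale fixed to the unit interval.
    a_hat, b_hat, _, _ = stats.beta.fit(x, floc=0, fscale=1)
    ll_beta = np.sum(stats.beta.logpdf(x, a_hat, b_hat))
    res = minimize(lambda th: -kw_loglik(th, x), x0=[1.0, 1.0],
                   method="Nelder-Mead")
    ll_kw = -res.fun
    return ("beta", ll_beta) if ll_beta >= ll_kw else ("KW", ll_kw)

x = stats.beta.rvs(2.0, 5.0, size=500, random_state=0)
print(select_beta_vs_kw(x))
```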
For a matrix $A$ which satisfies Crouzeix's conjecture, we construct several classes of matrices from $A$ for which the conjecture also holds. We discover a new link between cyclicity and Crouzeix's conjecture, which shows that the conjecture holds in full generality if and only if it holds for the differentiation operator on a class of analytic functions. We pose several open questions which, if resolved affirmatively, would prove Crouzeix's conjecture. We also begin an investigation into Crouzeix's conjecture for symmetric matrices, and in the case of $3 \times 3$ matrices, we show that the conjecture holds for symmetric matrices if and only if it holds for analytic truncated Toeplitz operators.
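Crouzeix's conjecture asserts that $\|p(A)\| \le 2 \max_{z \in W(A)} |p(z)|$ for every polynomial $p$, where $W(A)$ is the numerical range of $A$. A minimal numerical sanity check of this ratio, sampling the boundary of $W(A)$ by the standard rotation trick:

```python
import numpy as np

def numerical_range_boundary(A, num_angles=360):
    """Sample the boundary of W(A) = {v* A v : |v| = 1}, using the fact that
    the top eigenvector of the Hermitian part of e^{i*theta} A yields a
    boundary point for each angle theta."""
    pts = []
    for theta in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False):
        B = np.exp(1j * theta) * A
        w, V = np.linalg.eigh((B + B.conj().T) / 2)
        v = V[:, -1]                       # eigenvector of the largest eigenvalue
        pts.append(v.conj() @ A @ v)       # boundary point of W(A)
    return np.array(pts)

def polyval_matrix(coeffs, A):
    """Evaluate p(A) by Horner's scheme (coeffs listed highest degree first)."""
    P = np.zeros_like(A, dtype=complex)
    I = np.eye(A.shape[0])
    for c in coeffs:
        P = P @ A + c * I
    return P

def crouzeix_ratio(A, coeffs):
    """||p(A)||_2 / max_{z in W(A)} |p(z)|; the conjecture asserts <= 2.
    The max of |p| over the convex compact W(A) is attained on its boundary."""
    num = np.linalg.norm(polyval_matrix(coeffs, A), 2)
    den = np.abs(np.polyval(coeffs, numerical_range_boundary(A))).max()
    return num / den

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
print(crouzeix_ratio(A, [1.0, 0.0, -2.0]))   # p(z) = z^2 - 2
```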
We propose a method for computing the Lyapunov exponents of renewal equations (delay equations of Volterra type) and of coupled systems of renewal and delay differential equations. The method consists in reformulating the delay equation as an abstract differential equation, reducing the latter to a system of ordinary differential equations via pseudospectral collocation, and applying the standard discrete QR method. The effectiveness of the method is shown experimentally and a MATLAB implementation is provided.
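The final stage of the method, the standard discrete QR iteration, applies to any ODE system such as the one produced by the pseudospectral collocation; a minimal sketch of that stage alone (the Lorenz example is illustrative, not a delay equation):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lyapunov_exponents_qr(f, jac, x0, T, dt, k=None):
    """Standard discrete QR method for x' = f(t, x): integrate the variational
    equation Y' = J(t, x(t)) Y along the trajectory, re-orthonormalize Y by a
    QR factorization every dt time units, and average the logs of diag(R)."""
    n = len(x0)
    k = k or n
    Q = np.eye(n, k)
    logs = np.zeros(k)
    t, x = 0.0, np.asarray(x0, dtype=float)
    while t < T:
        def rhs(s, z):
            xs, Ys = z[:n], z[n:].reshape(n, k)
            return np.concatenate([f(s, xs), (jac(s, xs) @ Ys).ravel()])
        sol = solve_ivp(rhs, (t, t + dt), np.concatenate([x, Q.ravel()]),
                        rtol=1e-9, atol=1e-12)
        x, Y = sol.y[:n, -1], sol.y[n:, -1].reshape(n, k)
        Q, R = np.linalg.qr(Y)
        s = np.sign(np.diag(R))          # make diag(R) positive, keeping QR = Y
        Q, R = Q * s, (R.T * s).T
        logs += np.log(np.diag(R))
        t += dt
    return logs / t

# Example: the Lorenz system (largest exponent is approximately 0.9).
sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
f = lambda t, x: np.array([sigma * (x[1] - x[0]),
                           x[0] * (rho - x[2]) - x[1],
                           x[0] * x[1] - beta * x[2]])
jac = lambda t, x: np.array([[-sigma, sigma, 0.0],
                             [rho - x[2], -1.0, -x[0]],
                             [x[1], x[0], -beta]])
print(lyapunov_exponents_qr(f, jac, [1.0, 1.0, 1.0], T=50.0, dt=0.5))
```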
We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ input-output pairs, with relative error $O(\Xi_\epsilon^{-1}\epsilon)$ in the operator norm as $\epsilon\to0$, with high probability. Here, $\Psi_\epsilon$ quantifies the presence of degenerate singular values of the solution operator, and $\Xi_\epsilon$ measures the quality of the training data. Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak, given that hyperbolic PDEs do not have the ``instantaneous smoothing effect'' of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.
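The randomized SVD component works purely from input-output pairs, which is what makes the operator-recovery setting possible; a minimal sketch of that building block (Halko, Martinsson, and Tropp, 2011), with an explicit low-rank matrix standing in for a local piece of the Green's function:

```python
import numpy as np

def randomized_svd(apply_op, apply_op_T, n, rank, oversample=10, rng=None):
    """Randomized SVD from input-output pairs only: apply_op(X) applies the
    hidden operator to the columns of X, and apply_op_T applies its adjoint;
    the matrix itself is never formed."""
    rng = rng or np.random.default_rng()
    k = rank + oversample
    Y = apply_op(rng.standard_normal((n, k)))    # random probes sketch the range
    Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range
    B = apply_op_T(Q).T                          # B = Q^T A, computed via A^T Q
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Toy check: an explicit rank-5 matrix standing in for a smooth piece of a
# Green's function; the hierarchical scheme applies this on subdomains.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
U, s, Vt = randomized_svd(lambda X: A @ X, lambda X: A.T @ X,
                          n=A.shape[1], rank=5, rng=rng)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A))
```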