The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.
Hierarchical models with gamma hyperpriors provide a flexible, sparse-promoting framework to bridge $L^1$ and $L^2$ regularizations in Bayesian formulations to inverse problems. Despite the Bayesian motivation for these models, existing methodologies are limited to \textit{maximum a posteriori} estimation. The potential to perform uncertainty quantification has not yet been realized. This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors. The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement. In addition, it lends itself naturally to conduct model selection for the choice of hyperparameters. We illustrate the performance of our methodology in several computed examples, including a deconvolution problem and sparse identification of dynamical systems from time series data.
We study a new class of NP search problems, those which can be proved total using standard combinatorial reasoning based on approximate counting. Our model for this kind of reasoning is the bounded arithmetic theory $\mathrm{APC}_2$ of [Je\v{r}\'abek 2009]. In particular, the Ramsey and weak pigeonhole search problems lie in the new class. We give a purely computational characterization of this class and show that, relative to an oracle, it does not contain the problem CPLS, a strengthening of PLS. As CPLS is provably total in the theory $T^2_2$, this shows that $\mathrm{APC}_2$ does not prove every $\forall \Sigma^b_1$ sentence which is provable in bounded arithmetic. This answers the question posed in [Buss, Ko{\l}odziejczyk, Thapen 2014] and represents some progress in the programme of separating the levels of the bounded arithmetic hierarchy by low-complexity sentences. Our main technical tool is an extension of the "fixing lemma" from [Pudl\'ak, Thapen 2017], a form of switching lemma, which we use to show that a random partial oracle from a certain distribution will, with high probability, determine an entire computation of a $\textrm{P}^{\textrm{NP}}$ oracle machine. The introduction to the paper is intended to make the statements and context of the results accessible to someone unfamiliar with NP search problems or with bounded arithmetic.
We are interested in privatizing an approximate posterior inference algorithm called Expectation Propagation (EP). EP approximates the posterior by iteratively refining approximations to the local likelihoods, and is known to provide better posterior uncertainties than those by variational inference (VI). However, using EP for large-scale datasets imposes a challenge in terms of memory requirements as it needs to maintain each of the local approximates in memory. To overcome this problem, stochastic expectation propagation (SEP) was proposed, which only considers a unique local factor that captures the average effect of each likelihood term to the posterior and refines it in a way analogous to EP. In terms of privacy, SEP is more tractable than EP because at each refining step of a factor, the remaining factors are fixed to the same value and do not depend on other datapoints as in EP, which makes the sensitivity analysis tractable. We provide a theoretical analysis of the privacy-accuracy trade-off in the posterior estimates under differentially private stochastic expectation propagation (DP-SEP). Furthermore, we demonstrate the performance of our DP-SEP algorithm evaluated on both synthetic and real-world datasets in terms of the quality of posterior estimates at different levels of guaranteed privacy.
Large-scale optimization problems that seek sparse solutions have become ubiquitous. They are routinely solved with various specialized first-order methods. Although such methods are often fast, they usually struggle with not-so-well conditioned problems. In this paper, specialized variants of an interior point-proximal method of multipliers are proposed and analyzed for problems of this class. Computational experience on a variety of problems, namely, multi-period portfolio optimization, classification of data coming from functional Magnetic Resonance Imaging, restoration of images corrupted by Poisson noise, and classification via regularized logistic regression, provides substantial evidence that interior point methods, equipped with suitable linear algebra, can offer a noticeable advantage over first-order approaches.
This study debuts a new spline dimensional decomposition (SDD) for uncertainty quantification analysis of high-dimensional functions, including those endowed with high nonlinearity and nonsmoothness, if they exist, in a proficient manner. The decomposition creates an hierarchical expansion for an output random variable of interest with respect to measure-consistent orthonormalized basis splines (B-splines) in independent input random variables. A dimensionwise decomposition of a spline space into orthogonal subspaces, each spanned by a reduced set of such orthonormal splines, results in SDD. Exploiting the modulus of smoothness, the SDD approximation is shown to converge in mean-square to the correct limit. The computational complexity of the SDD method is polynomial, as opposed to exponential, thus alleviating the curse of dimensionality to the extent possible. Analytical formulae are proposed to calculate the second-moment properties of a truncated SDD approximation for a general output random variable in terms of the expansion coefficients involved. Numerical results indicate that a low-order SDD approximation of nonsmooth functions calculates the probabilistic characteristics of an output variable with an accuracy matching or surpassing those obtained by high-order approximations from several existing methods. Finally, a 34-dimensional random eigenvalue analysis demonstrates the utility of SDD in solving practical problems.
We propose a topology optimisation of acoustic devices that work in a certain bandwidth. To achieve this, we define the objective function as the frequency-averaged sound intensity at given observation points, which is represented by a frequency integral over a given frequency band. It is, however, prohibitively expensive to evaluate such an integral naively by a quadrature. We thus estimate the frequency response by the Pad\'{e} approximation and integrate the approximated function to obtain the objective function. The corresponding topological derivative is derived with the help of the adjoint variable method and chain rule. It is shown that the objective and its sensitivity can be evaluated semi-analytically. We present efficient numerical procedures to compute them and incorporate them into a topology optimisation based on the level-set method. We confirm the validity and effectiveness of the present method through some numerical examples.
In order to avoid the curse of dimensionality, frequently encountered in Big Data analysis, there was a vast development in the field of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data is lying on a lower dimensional manifold, thus the high dimensionality problem can be overcome by learning the lower dimensionality behavior. However, in real life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$ a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$) based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower dimensional manifold and suggest a non-linear moving least-squares projection on an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.
Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great successes. Unfortunately, the understanding on how it works remains unclear. It has the central importance to lay down the theoretic foundation for deep learning. In this work, we give a geometric view to understand deep learning: we show that the fundamental principle attributing to the success is the manifold structure in data, namely natural high dimensional data concentrates close to a low-dimensional manifold, deep learning learns the manifold and the probability distribution on it. We further introduce the concepts of rectified linear complexity for deep neural network measuring its learning capability, rectified linear complexity of an embedding manifold describing the difficulty to be learned. Then we show for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.