Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
We propose new linear combinations of compositions of a basic second-order scheme with appropriately chosen coefficients to construct higher order numerical integrators for differential equations. They can be considered as a generalization of extrapolation methods and multi-product expansions. A general analysis is provided and new methods up to order 8 are built and tested. The new approach is shown to reduce the latency problem when implemented in a parallel environment and leads to schemes that are significantly more efficient than standard extrapolation when the linear combination is delayed by a number of steps.
Heterogeneous computing and exploiting integrated CPU-GPU architectures has become a clear current trend since the flattening of Moore's Law. In this work, we propose a numerical and algorithmic re-design of a p-adaptive quadrature-free discontinuous Galerkin method (DG) for the shallow water equations (SWE). Our new approach separates the computations of the non-adaptive (lower-order) and adaptive (higher-order) parts of the discretization form each other. Thereby, we can overlap computations of the lower-order and the higher-order DG solution components. Furthermore, we investigate execution times of main computational kernels and use automatic code generation to optimize their distribution between the CPU and GPU. Several setups, including a prototype of a tsunami simulation in a tide-driven flow scenario, are investigated, and the results show that significant performance improvements can be achieved in suitable setups.
A new exploratory technique called biarchetype analysis is defined. We extend archetype analysis to find the archetypes of both observations and features simultaneously. The idea of this new unsupervised machine learning tool is to represent observations and features by instances of pure types (biarchetypes) that can be easily interpreted as they are mixtures of observations and features. Furthermore, the observations and features are expressed as mixtures of the biarchetypes, which also helps understand the structure of the data. We propose an algorithm to solve biarchetype analysis. We show that biarchetype analysis offers advantages over biclustering, especially in terms of interpretability. This is because byarchetypes are extreme instances as opposed to the centroids returned by biclustering, which favors human understanding. Biarchetype analysis is applied to several machine learning problems to illustrate its usefulness.
Linear logic has provided new perspectives on proof-theory, denotational semantics and the study of programming languages. One of its main successes are proof-nets, canonical representations of proofs that lie at the intersection between logic and graph theory. In the case of the minimalist proof-system of multiplicative linear logic without units (MLL), these two aspects are completely fused: proof-nets for this system are graphs satisfying a correctness criterion that can be fully expressed in the language of graphs. For more expressive logical systems (containing logical constants, quantifiers and exponential modalities), this is not completely the case. The purely graphical approach of proof-nets deprives them of any sequential structure that is crucial to represent the order in which arguments are presented, which is necessary for these extensions. Rebuilding this order of presentation - sequentializing the graph - is thus a requirement for a graph to be logical. Presentations and study of the artifacts ensuring that sequentialization can be done, such as boxes or jumps, are an integral part of researches on linear logic. Jumps, extensively studied by Faggian and di Giamberardino, can express intermediate degrees of sequentialization between a sequent calculus proof and a fully desequentialized proof-net. We propose to analyze the logical strength of jumps by internalizing them in an extention of MLL where axioms on a specific formula, the jumping formula, introduce constrains on the possible sequentializations. The jumping formula needs to be treated non-linearly, which we do either axiomatically, or by embedding it in a very controlled fragment of multiplicative-exponential linear logic, uncovering the exponential logic of sequentialization.
Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions e.g. permutation or rigid transformation. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (or SFGI functions in short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general neural network with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Very importantly, the required model complexity is independent of the sizes of input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparatively or better than other models (including a SOTA Siamese Autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
Numerical simulations of kinetic problems can become prohibitively expensive due to their large memory footprint and computational costs. A method that has proven to successfully reduce these costs is the dynamical low-rank approximation (DLRA). One key question when using DLRA methods is the construction of robust time integrators that preserve the invariances and associated conservation laws of the original problem. In this work, we demonstrate that the augmented basis update & Galerkin integrator (BUG) preserves solution invariances and the associated conservation laws when using a conservative truncation step and an appropriate time and space discretization. We present numerical comparisons to existing conservative integrators and discuss advantages and disadvantages
There are many numerical methods for solving partial different equations (PDEs) on manifolds such as classical implicit, finite difference, finite element, and isogeometric analysis methods which aim at improving the interoperability between finite element method and computer aided design (CAD) software. However, these approaches have difficulty when the domain has singularities since the solution at the singularity may be multivalued. This paper develops a novel numerical approach to solve elliptic PDEs on real, closed, connected, orientable, and almost smooth algebraic curves and surfaces. Our method integrates numerical algebraic geometry, differential geometry, and a finite difference scheme which is demonstrated on several examples.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.