This investigation is motivated by PDE-constrained optimization problems arising in connection with electrocardiograms (ECGs) and electroencephalography (EEG). Standard sparsity regularization does not necessarily produce adequate results for these applications because only boundary data/observations are available for identifying the unknown source, which may be interior. We therefore study a weighted $\ell^1$-regularization technique for solving inverse problems whose forward operator has a significant null space. In particular, we prove that a sparse source, regardless of whether it is interior or located at the boundary, can be exactly recovered with this weighting procedure as the regularization parameter $\alpha$ tends to zero. Our analysis is supported by numerical experiments with one and with several local sources. The theory is developed in a general Euclidean-space setting and therefore applies to a broad class of inverse problems.
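As a notational anchor, a minimal sketch of the kind of weighted $\ell^1$ Tikhonov functional involved (the paper's exact choice of weights may differ): with forward operator $A$ and data $b$,

\[
\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\,\|Ax - b\|_2^2 + \alpha \sum_{i=1}^{n} w_i\,|x_i|,
\]

where one common choice is $w_i = \|A e_i\|_2$, so that components the operator barely sees (those nearly in the null space) are not penalized disproportionately.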
We propose a general framework for obtaining probabilistic solutions to PDE-based inverse problems. Bayesian methods are attractive for uncertainty quantification but assume knowledge of the likelihood model or data generation process. This assumption is difficult to justify in many inverse problems, where the specification of the data generation process is not obvious. We adopt a Gibbs posterior framework that directly posits a regularized variational problem on the space of probability distributions of the parameter. We then introduce a novel model comparison framework that evaluates the optimality of a given loss based on its ``predictive performance''. We provide cross-validation procedures to calibrate the regularization parameter of the variational objective and to compare multiple loss functions. Some novel theoretical properties of Gibbs posteriors are also presented. We illustrate the utility of our framework via a simulated example, motivated by dispersion-based wave models used to characterize arterial vessels in ultrasound vibrometry.
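For orientation, a minimal sketch in generic notation (the paper's scaling and calibration conventions may differ): the Gibbs posterior solves the variational problem

\[
\rho^\ast = \arg\min_{\rho} \; \eta\, \mathbb{E}_{\theta \sim \rho}\!\left[ n\,\ell_n(\theta) \right] + \mathrm{KL}(\rho \,\|\, \pi), \qquad \text{i.e.,} \quad \rho^\ast(d\theta) \propto \exp\{-\eta\, n\, \ell_n(\theta)\}\, \pi(d\theta),
\]

where $\ell_n$ is an empirical loss (used in place of a likelihood), $\pi$ is a prior, and $\eta > 0$ is the regularization (learning-rate) parameter calibrated by cross-validation.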
In this work, we propose a simple yet generic preconditioned Krylov subspace method for a large class of nonsymmetric block Toeplitz all-at-once systems arising from discretizing evolutionary partial differential equations. Our main result is two novel symmetric positive definite preconditioners, which can be efficiently diagonalized by the discrete sine transform matrix. More specifically, our approach is to first permute the original linear system to obtain a symmetric one, and subsequently to develop the desired preconditioners based on the spectral symbol of the modified matrix. We then show that the eigenvalues of the preconditioned matrix sequences are clustered around $\pm 1$, which entails rapid convergence when the minimal residual method is employed. Alternatively, when the conjugate gradient method on the normal equations is used, we show that our preconditioner is effective in the sense that the eigenvalues of the preconditioned matrix sequence are clustered around unity. An extension of our proposed method to high-order backward difference time discretization schemes is given, which can be applied to a wide range of time-dependent equations. Numerical examples, including the variable-coefficient setting, demonstrate the effectiveness of our proposed preconditioners, which consistently outperform an existing block circulant preconditioner discussed in the relevant literature.
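To illustrate the permutation step, a minimal sketch (with a hypothetical scalar Toeplitz stand-in for the block Toeplitz all-at-once matrix, and without the paper's DST-diagonalizable preconditioners): multiplying any Toeplitz matrix by the anti-identity yields a symmetric matrix, which makes MINRES applicable.

```python
# Minimal sketch: for any Toeplitz matrix A, the anti-identity Y makes
# Y @ A symmetric, so A x = b can be solved via MINRES on (Y A) x = Y b.
# Real applications use one block per time step; here A is scalar Toeplitz.
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import minres

n = 64
col = np.zeros(n); col[0], col[1] = 1.0, -0.9   # first column (backward-Euler-like)
row = np.zeros(n); row[0] = 1.0                  # first row
A = toeplitz(col, row)                           # lower bidiagonal Toeplitz

Y = np.flipud(np.eye(n))                         # anti-identity permutation
YA = Y @ A
assert np.allclose(YA, YA.T)                     # Toeplitz structure => Y A symmetric

b = np.ones(n)
x, info = minres(YA, Y @ b)                      # unpreconditioned for brevity
print("converged:", info == 0,
      "relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The paper's contribution is to pair this symmetrized system with symmetric positive definite preconditioners diagonalized by the discrete sine transform; the sketch above omits preconditioning entirely.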
This paper introduces a novel framework for assessing risk and decision-making under uncertainty, the \emph{$\varphi$-Divergence Quadrangle}. This approach expands upon the traditional Risk Quadrangle, a model that quantifies uncertainty through four key components: \emph{risk, deviation, regret}, and \emph{error}. The $\varphi$-Divergence Quadrangle incorporates the $\varphi$-divergence as a measure of the difference between probability distributions, thereby providing a more nuanced understanding of risk. Importantly, the $\varphi$-Divergence Quadrangle is closely connected with distributionally robust optimization based on $\varphi$-divergence through the duality theory of convex functionals. To illustrate its practicality and versatility, several examples of the $\varphi$-Divergence Quadrangle are provided, including the Quantile Quadrangle. The final portion of the paper presents a case study implementing regression with the Entropic Value-at-Risk Quadrangle. The proposed $\varphi$-Divergence Quadrangle offers a refined methodology for understanding and managing risk, contributing to the ongoing development of risk assessment and management strategies.
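For reference, a minimal sketch of the two ingredients named above (conventions may differ from the paper's): the $\varphi$-divergence between $Q$ and $P$, and the distributionally robust form to which the quadrangle is connected by duality,

\[
D_\varphi(Q \,\|\, P) = \int \varphi\!\left(\frac{dQ}{dP}\right) dP, \qquad \mathcal{R}(X) = \sup_{Q:\; D_\varphi(Q \,\|\, P) \le \beta} \mathbb{E}_Q[X],
\]

with $\varphi$ convex and $\varphi(1) = 0$; for instance, $\varphi(t) = t \log t$ gives the Kullback--Leibler divergence, whose associated risk is the Entropic Value-at-Risk appearing in the case study.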
We consider the problem of Imitation Learning (IL) with active queries to a noisy expert for feedback. While imitation learning has been empirically successful, much of the prior work assumes access to noiseless expert feedback, which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to succeed. In contrast, in this work we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation and relies on an online regression oracle w.r.t.~the given model class to predict actions and to decide whether to query the expert for its label. On the theoretical side, the regret of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound demonstrating that our results are tight. We then extend our selective sampling algorithm to IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds depend only on the number of times the optimal policy (and not the noisy expert or the learner) visits states that have a small margin.
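To convey the flavor of the query rule, a minimal linear toy sketch (an illustrative setup of ours; the paper handles general function classes and multiple actions via an online regression oracle): query the noisy expert only when the predicted margin falls within the estimate's uncertainty.

```python
# Toy margin-vs-uncertainty selective sampling with an online ridge
# regression "oracle"; the linear setup and all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, T, flip = 5, 2000, 0.1
w_star = rng.normal(size=d)                  # hidden scorer behind the expert

A = np.eye(d)                                # regularized second-moment matrix
bvec = np.zeros(d)                           # accumulated label-weighted features
queries, mistakes = 0, 0
for t in range(T):
    x = rng.normal(size=d); x /= np.linalg.norm(x)
    w = np.linalg.solve(A, bvec)             # current online ridge estimate
    margin = abs(w @ x)
    width = np.sqrt(x @ np.linalg.solve(A, x))   # confidence width at x
    mistakes += (np.sign(w @ x) != np.sign(w_star @ x))
    if margin <= width:                      # uncertain round: query the expert
        queries += 1
        y = np.sign(w_star @ x)
        if rng.random() < flip:              # expert noise
            y = -y
        A += np.outer(x, x); bvec += y * x   # update the regression oracle
print(f"queries: {queries}/{T}, mistakes: {mistakes}")
```

The point of the sketch is the query condition: rounds with a large margin relative to the uncertainty are predicted without bothering the expert, which is what keeps the query complexity tied to small-margin states.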
The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), in which the objective is to identify the $k$ features with the highest SHAP values. While any method that computes SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem from the multi-armed bandits (MAB) literature. This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping condition that identifies when PAC (Probably Approximately Correct) guarantees have been met, and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods, we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, obtaining an average improvement of $5\times$ in sample efficiency and runtime across common credit-related datasets.
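To make the Explore-m view concrete, a minimal sketch of the two MAB ingredients on a toy problem (`draw_shap_sample` is a hypothetical stand-in for a single KernelSHAP/SamplingSHAP sample; the actual methods differ in detail):

```python
# Confidence intervals per feature, greedy sampling at the top-k boundary,
# and a PAC stopping condition -- an LUCB-style sketch of TkIP.
import numpy as np

rng = np.random.default_rng(0)
true_shap = np.array([0.9, 0.7, 0.5, 0.2, 0.1])    # toy ground truth

def draw_shap_sample(i):                            # noisy single-sample estimate
    return true_shap[i] + rng.normal(scale=0.5)

n_feat, k, eps, delta = len(true_shap), 2, 0.05, 0.05
counts = np.ones(n_feat)
sums = np.array([draw_shap_sample(i) for i in range(n_feat)])

while True:
    means = sums / counts
    # Anytime Hoeffding-style radius (heuristic for this toy's noise).
    rad = np.sqrt(np.log(4 * n_feat * counts**2 / delta) / (2 * counts))
    order = np.argsort(-means)
    top, rest = order[:k], order[k:]
    # Stopping condition: the k-th lower bound clears the (k+1)-th upper
    # bound up to slack eps => PAC guarantee met.
    if (means - rad)[top].min() >= (means + rad)[rest].max() - eps:
        break
    # Greedy sampling: refine only the two features straddling the boundary.
    i = top[np.argmin((means - rad)[top])]
    j = rest[np.argmax((means + rad)[rest])]
    for f in (i, j):
        sums[f] += draw_shap_sample(f); counts[f] += 1

print("estimated top-k features:", sorted(order[:k].tolist()))
```

Sampling only at the ambiguous boundary is what buys the efficiency: features whose intervals are already well separated from the top-k cutoff receive no further samples.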
Nishikawa (2007) proposed to reformulate the classical Poisson equation as the steady state problem of a linear hyperbolic system. This results in optimal error estimates for both the solution of the elliptic equation and its gradient, but it prevents the application of well-known solvers for elliptic problems. We show connections to a discontinuous Galerkin (DG) method analyzed by Cockburn, Guzm\'an, and Wang (2009) that is very difficult to implement in general. Next, we demonstrate how this method can be implemented efficiently using summation by parts (SBP) operators, in particular in the context of SBP DG methods such as the DG spectral element method (DGSEM). The resulting scheme combines desirable properties of both the hyperbolic and the elliptic points of view, in particular a high order of convergence of the gradients: one order higher than what is usually expected from DG methods for elliptic problems.
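For concreteness, the first-order reformulation reads (up to sign and scaling conventions)

\[
\partial_t u = \nabla \cdot q + f, \qquad T_r\, \partial_t q = \nabla u - q,
\]

whose steady state enforces $q = \nabla u$ and hence recovers the Poisson equation $-\Delta u = f$; the relaxation time $T_r > 0$ is a free parameter of the reformulation, and the auxiliary variable $q$ is what yields the gradient approximation directly.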
Directed acyclic graphs (DAGs) are directed graphs that contain no directed cycle, i.e., no nonempty path from a vertex to itself. DAGs are an omnipresent data structure in computer science, and the problems of counting the DAGs on a given number of vertices and of sampling them uniformly at random were solved in the 1970s and the 2000s, respectively. In this paper, we propose to explore a new variation of this model in which DAGs are endowed with an independent ordering of the out-edges of each vertex, thus making it possible to model a wide range of existing data structures. We provide efficient algorithms for sampling objects of this new class, both with and without control on the number of edges, and obtain an asymptotic equivalent of their number. We also show the applicability of our method by providing an effective algorithm for the random generation of classical labelled DAGs with a prescribed number of vertices and edges, based on a similar approach. This is the first known algorithm for sampling labelled DAGs with full control on the number of edges, and it meets a need, already acknowledged in the literature, in terms of applications.
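As context for the 1970s counting result mentioned above, a minimal sketch of Robinson's recurrence for the number $a_n$ of labelled DAGs on $n$ vertices (the ordered-out-edge class studied here requires a different recurrence, developed in the paper):

```python
# Robinson's recurrence for counting labelled DAGs:
#   a_0 = 1,  a_n = sum_{k=1}^{n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a_{n-k},
# obtained by inclusion-exclusion over the k vertices of out-degree... wait:
# over the sets of "source" vertices (vertices with no incoming edge).
from math import comb

def labelled_dags(n_max):
    a = [1]                                  # a_0 = 1 (the empty DAG)
    for n in range(1, n_max + 1):
        a.append(sum((-1) ** (k + 1) * comb(n, k)
                     * 2 ** (k * (n - k)) * a[n - k]
                     for k in range(1, n + 1)))
    return a

print(labelled_dags(5))   # [1, 1, 3, 25, 543, 29281]
```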
We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011, 2015), who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences whose length equals the Littlestone dimension. We argue that the best expert has regret at most the Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.
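A rough back-of-the-envelope for how the pieces combine (only a sketch; constants and the exact expert construction follow the paper): if each expert is specified by at most $d$ time steps together with a label guess at each, where $d$ is the Littlestone dimension and $k$ the number of labels, then the number of experts satisfies $N \le \sum_{i=0}^{d} \binom{T}{i} k^i \le (Tk)^d$, and the standard multiplicative weights guarantee combined with the best expert's regret of at most $d$ gives

\[
\mathrm{Regret}(T) \;\le\; d + \sqrt{\tfrac{T}{2}\,\ln N} \;=\; O\!\left(d + \sqrt{T\, d\, \ln (Tk)}\right).
\]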
Positive linear programs (LPs) model many graph and operations research problems. One can compute a $(1+\epsilon)$-approximation for positive LPs, for any chosen $\epsilon$, in polylogarithmic depth and near-linear work via variations of the multiplicative weight update (MWU) method. Despite extensive theoretical work on these algorithms through the decades, their empirical performance is not well understood. In this work, we implement and test an efficient parallel algorithm for solving positive LP relaxations, and apply it to graph problems such as densest subgraph, bipartite matching, vertex cover, and dominating set. We accelerate the algorithm via a new step-size search heuristic. Our implementation uses sparse linear algebra optimization techniques such as fusion of vector operations and the use of sparse formats. Furthermore, we devise an implicit representation for graph incidence constraints. We demonstrate parallel scalability using OpenMP threading and MPI on the Stampede2 supercomputer. We compare this implementation with exact solvers and specialized libraries for the above problems in order to evaluate MWU's practical standing, in both accuracy and performance, relative to other methods. Our results show that this implementation is faster than general-purpose LP solvers (IBM CPLEX, Gurobi) in all of our experiments and, in some instances, outperforms state-of-the-art specialized parallel graph algorithms.
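To make the MWU template concrete, a toy width-dependent sketch on a packing feasibility problem of our own construction (the paper's implementation is an optimized near-linear-work parallel variant with a step-size heuristic): decide whether some $x$ in the simplex satisfies $Cx \le 1$.

```python
# Textbook MWU for packing feasibility: maintain a weight per constraint,
# let an oracle pick the cheapest simplex vertex under the weighted
# constraints, and average the plays.
import numpy as np

rng = np.random.default_rng(0)
m, n, T, eta = 8, 12, 5000, 0.05
A = rng.uniform(size=(m, n))
b = 1.05 * (A @ np.full(n, 1.0 / n))   # uniform x is feasible by construction
C = A / b[:, None]                     # normalized constraints: want C x <= 1
rho = C.max()                          # width of the oracle's responses

w = np.ones(m)                         # one weight per constraint
x_avg = np.zeros(n)
for _ in range(T):
    p = w / w.sum()                    # current constraint "prices"
    j = int(np.argmin(p @ C))          # oracle: cheapest coordinate
    if (p @ C)[j] > 1.0:               # even the best move violates on average
        raise SystemExit("infeasible: dual certificate found")
    x_avg[j] += 1.0 / T                # accumulate the average play
    w *= np.exp(eta * (C[:, j] - 1.0) / rho)   # losses scaled into [-1, 1]

# Standard analysis: max violation of the average is 1 + O(rho*(eta + ln(m)/(eta*T))).
print("max normalized constraint:", (C @ x_avg).max())
```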
In 1954, Alston S. Householder published \emph{Principles of Numerical Analysis}, one of the first modern treatments of matrix decomposition, favoring the (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to the concepts and mathematical tools of numerical linear algebra and matrix analysis, in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. We realize, however, that we cannot cover all the useful and interesting results concerning matrix decomposition within this limited scope; for example, we omit a separate analysis of Euclidean spaces, Hermitian spaces, Hilbert spaces, and the complex domain. We refer the reader to the literature on linear algebra for a more detailed introduction to these related fields.
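As a pointer to the flavor of what follows, a minimal sketch of the unblocked, no-pivoting LU factorization named above (production code should pivot, e.g., via scipy.linalg.lu):

```python
# Outer-product LU factorization: A = L U with unit lower-triangular L
# and upper-triangular U, assuming all pivots A[k, k] are nonzero.
import numpy as np

def lu_nopivot(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:]                    # k-th row of U
        L[k+1:, k] = A[k+1:, k] / A[k, k]      # multipliers below the pivot
        A[k+1:, k+1:] -= np.outer(L[k+1:, k], U[k, k+1:])   # Schur complement
    return L, U

A = np.array([[4., 3., 2.], [6., 3., 1.], [8., 5., 9.]])
L, U = lu_nopivot(A)
assert np.allclose(L @ U, A)
```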