Transportation of probability measures underlies many core tasks in statistics and machine learning, from variational inference to generative modeling. A typical goal is to represent a target probability measure of interest as the push-forward of a tractable source measure through a learned map. We present a new construction of such a transport map, given the ability to evaluate the score of the target distribution. Specifically, we characterize the map as a zero of an infinite-dimensional score-residual operator and derive a Newton-type method for iteratively constructing such a zero. We prove convergence of these iterations by invoking classical elliptic regularity theory for partial differential equations (PDE) and show that this construction enjoys rapid convergence, under smoothness assumptions on the target score. A key element of our approach is a generalization of the elementary Newton method to infinite-dimensional operators, other forms of which have appeared in nonlinear PDE and in dynamical systems. Our Newton construction, while developed in a functional setting, also suggests new iterative algorithms for approximating transport maps.
Regression analysis based on many covariates is becoming increasingly common. However, when the number of covariates $p$ is of the same order as the number of observations $n$, statistical protocols like maximum likelihood estimation of regression and nuisance parameters become unreliable due to overfitting. Overfitting typically leads to systematic estimation biases, and to increased estimator variances. It is crucial to be able to correctly quantify these effects, for inference and prediction purposes. In literature, several methods have been proposed to overcome overfitting bias or adjust estimates. The vast majority of these focus on the regression parameters only, either via empirical regularization methods or by expansion for small ratios $p/n$. This failure to correctly estimate also the nuisance parameters may lead to significant errors in outcome predictions. In this paper we use the leave one out method to derive the compact set of non-linear equations for the overfitting biases of maximum likelihood (ML) estimators in parametric regression models, as obtained previously using the replica method. We show that these equations enable one to correct regression and nuisance parameter estimators, and make them asymptotically unbiased. To illustrate the theory we performed simulation studies for multiple regression models. In all cases we find excellent agreement between theory and simulations.
Conductivity reconstruction in an inverse eddy current problem is considered in the present paper. With the electric field measurement on part of domain boundary, we formulate the reconstruction problem to a constrained optimization problem with total variation regularization. Existence and stability are proved for the solution to the optimization problem. The finite element method is employed to discretize the optimization problem. The gradient Lipschitz properties of the objective functional are established for the the discrete optimization problems. We propose the alternating direction method of multipliers to solve the discrete problem. Based on the the gradient Lipschitz property, we prove the convergence by extending the admissible set to the whole finite element space. Finally, we show some numerical experiments to illustrate the efficiency of the proposed methods.
We propose an accurate and energy-stable parametric finite element method for solving the sharp-interface continuum model of solid-state dewetting in three-dimensional space. The model describes the motion of the film\slash vapor interface with contact line migration and is governed by the surface diffusion equation with proper boundary conditions at the contact line. We present a new weak formulation for the problem, in which the interface and its contact line are evolved simultaneously. By using piecewise linear elements in space and backward Euler in time, we then discretize the weak formulation to obtain a fully discretized parametric finite element approximation. The resulting numerical method is shown to be well-posed and unconditionally energy-stable. Furthermore, the numerical method is extended for solving the sharp interface model of solid-state dewetting with anisotropic surface energies in the Riemmanian metric form. Numerical results are reported to show the convergence and efficiency of the proposed numerical method as well as the anisotropic effects on the morphological evolution of thin films in solid-state dewetting.
We consider a general class of nonsmooth optimal control problems with partial differential equation (PDE) constraints, which are very challenging due to its nonsmooth objective functionals and the resulting high-dimensional and ill-conditioned systems after discretization. We focus on the application of a primal-dual method, with which different types of variables can be treated individually and thus its main computation at each iteration only requires solving two PDEs. Our target is to accelerate the primal-dual method with either larger step sizes or operator learning techniques. For the accelerated primal-dual method with larger step sizes, its convergence can be still proved rigorously while it numerically accelerates the original primal-dual method in a simple and universal way. For the operator learning acceleration, we construct deep neural network surrogate models for the involved PDEs. Once a neural operator is learned, solving a PDE requires only a forward pass of the neural network, and the computational cost is thus substantially reduced. The accelerated primal-dual method with operator learning is mesh-free, numerically efficient, and scalable to different types of PDEs. The acceleration effectiveness of these two techniques is promisingly validated by some preliminary numerical results.
Longitudinal studies with binary or ordinal responses are widely encountered in various disciplines, where the primary focus is on the temporal evolution of the probability of each response category. Traditional approaches build from the generalized mixed effects modeling framework. Even amplified with nonparametric priors placed on the fixed or random effects, such models are restrictive due to the implied assumptions on the marginal expectation and covariance structure of the responses. We tackle the problem from a functional data analysis perspective, treating the observations for each subject as realizations from subject-specific stochastic processes at the measured times. We develop the methodology focusing initially on binary responses, for which we assume the stochastic processes have Binomial marginal distributions. Leveraging the logits representation, we model the discrete space processes through sequences of continuous space processes. We utilize a hierarchical framework to model the mean and covariance kernel of the continuous space processes nonparametrically and simultaneously through a Gaussian process prior and an Inverse-Wishart process prior, respectively. The prior structure results in flexible inference for the evolution and correlation of binary responses, while allowing for borrowing of strength across all subjects. The modeling approach can be naturally extended to ordinal responses. Here, the continuation-ratio logits factorization of the multinomial distribution is key for efficient modeling and inference, including a practical way of dealing with unbalanced longitudinal data. The methodology is illustrated with synthetic data examples and an analysis of college students' mental health status data.
Variable selection is a procedure to attain the truly important predictors from inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data. In addition, real-world applications have increased demands for interpretability of the selection process. A pragmatic approach should not only attain the most predictive covariates, but also provide ample and easy-to-understand grounds for removing certain covariates. In view of these requirements, this paper puts forward an approach for transparent and nonlinear variable selection. In order to transparently decouple information within the input predictors, a three-step heuristic search is designed, via which the input predictors are grouped into four subsets: the relevant to be selected, and the uninformative, redundant, and conditionally independent to be removed. A nonlinear partial correlation coefficient is introduced to better identify the predictors which have nonlinear functional dependence with the response. The proposed method is model-free and the selected subset can be competent input for commonly used predictive models. Experiments demonstrate the superior performance of the proposed method against the state-of-the-art baselines in terms of prediction accuracy and model interpretability.
Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, the dynamic time warping (DTW) is commonly used for dealing with temporal delays, scaling, transformation, and many other kinds of temporal misalignments issues. However, the DTW suffers overfitting or information loss due to its discrete nature in aligning time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions for facilitating continuous temporal warping. The integration of the proposed operator and the dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated through ten sets of public datasets in comparing with various benchmark methods.
This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling -- a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance~(or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback--Leibler divergence. Specialized rates for approximations of the popular triangular Kn{\"o}the-Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
Graph Neural Networks (GNNs) have been successfully used in many problems involving graph-structured data, achieving state-of-the-art performance. GNNs typically employ a message-passing scheme, in which every node aggregates information from its neighbors using a permutation-invariant aggregation function. Standard well-examined choices such as the mean or sum aggregation functions have limited capabilities, as they are not able to capture interactions among neighbors. In this work, we formalize these interactions using an information-theoretic framework that notably includes synergistic information. Driven by this definition, we introduce the Graph Ordering Attention (GOAT) layer, a novel GNN component that captures interactions between nodes in a neighborhood. This is achieved by learning local node orderings via an attention mechanism and processing the ordered representations using a recurrent neural network aggregator. This design allows us to make use of a permutation-sensitive aggregator while maintaining the permutation-equivariance of the proposed GOAT layer. The GOAT model demonstrates its increased performance in modeling graph metrics that capture complex information, such as the betweenness centrality and the effective size of a node. In practical use-cases, its superior modeling capability is confirmed through its success in several real-world node classification benchmarks.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.