Most common Optimal Transport (OT) solvers are currently based on an approximation of the underlying measures by discrete measures. However, it is sometimes relevant to work only with moments of measures instead of the measures themselves, and many common OT problems can be formulated as moment problems (the most relevant examples being $L^p$-Wasserstein distances, barycenters, and Gromov-Wasserstein discrepancies on Euclidean spaces). We leverage this fact to develop a generalized moment formulation that covers these classes of OT problems. The transport plan is represented through its moments on a given basis, and the marginal constraints are expressed in terms of moment constraints. A practical computation then truncates the involved moment sequences at a given order and applies the polynomial sums-of-squares hierarchy for measures supported on semi-algebraic sets. We prove that the strategy converges to the solution of the OT problem as the order increases. We also show how to approximate linear quantities of interest, and how to estimate the support of the optimal transport map from the computed moments using Christoffel-Darboux kernels. Numerical experiments illustrate the good behavior of the approach.
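As a concrete illustration of the last ingredient, the short sketch below (a toy example, not the authors' implementation) shows how a truncated moment sequence of a one-dimensional measure yields a Christoffel-Darboux kernel whose diagonal can be thresholded to estimate the support; the moments, basis, and threshold are all illustrative choices.

```python
# Illustrative sketch: estimating the support of a 1D measure from its truncated moments
# via the Christoffel-Darboux kernel K_n(x, x) = v_n(x)^T M_n^{-1} v_n(x).
import numpy as np

def moment_matrix(moments, n):
    """Hankel moment matrix M[i, j] = m_{i+j} for the monomial basis 1, x, ..., x^n."""
    return np.array([[moments[i + j] for j in range(n + 1)] for i in range(n + 1)])

def christoffel(moments, n, xs):
    """Evaluate K_n(x, x) on a grid of points xs."""
    M = moment_matrix(moments, n)
    Minv = np.linalg.inv(M + 1e-10 * np.eye(n + 1))   # mild regularization
    V = np.vander(xs, n + 1, increasing=True)         # rows are v_n(x)
    return np.einsum("ij,jk,ik->i", V, Minv, V)

# Moments of the uniform measure on [0, 1]: m_k = 1 / (k + 1).
n = 6
moments = [1.0 / (k + 1) for k in range(2 * n + 1)]
xs = np.linspace(-0.5, 1.5, 201)
K = christoffel(moments, n, xs)

# K_n(x, x) stays moderate on the support and blows up outside it, so thresholding
# (heuristic threshold here) gives a crude support estimate.
support_estimate = xs[K <= 2.0 * (n + 1)]
print(support_estimate.min(), support_estimate.max())
```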
Nonparametric estimation for semilinear SPDEs, namely stochastic reaction-diffusion equations in one space dimension, is studied. We consider observations of the solution field on a discrete grid in time and space with infill asymptotics in both coordinates. Firstly, we derive a nonparametric estimator for the reaction function of the underlying equation. The estimate is chosen from a finite-dimensional function space based on a least squares criterion. Oracle inequalities provide conditions for the estimator to achieve the usual nonparametric rate of convergence. Adaptivity is provided via model selection. Secondly, we show that the asymptotic properties of realized quadratic variation based estimators for the diffusivity and volatility carry over from linear SPDEs. In particular, we obtain a rate-optimal joint estimator of the two parameters. The result relies on our precise analysis of the H\"older regularity of the solution process and its nonlinear component, which may be of independent interest. Both steps of the calibration can be carried out simultaneously without prior knowledge of the parameters.
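The core of the first step is a least-squares projection onto a finite-dimensional function space with a data-driven choice of its dimension. The sketch below illustrates that principle on generic regression pairs $(x_k, y_k)$; constructing such pairs from the space-time increments of the observed solution field, and the precise penalty used for model selection, are specific to the paper and are not reproduced here.

```python
# Schematic sketch: least-squares estimation of an unknown function f from pairs
# (x_k, y_k) with y_k ~ f(x_k) + noise, with a heuristic model-selection criterion.
import numpy as np

def fit_least_squares(x, y, degrees):
    """Fit f on polynomial bases of the candidate degrees and pick one by a simple
    penalized least-squares criterion (a heuristic stand-in for model selection)."""
    n, best = len(x), None
    for m in degrees:
        B = np.vander(x, m + 1, increasing=True)           # basis evaluations
        coef = np.linalg.lstsq(B, y, rcond=None)[0]
        rss = np.sum((y - B @ coef) ** 2)
        crit = rss / n + 2.0 * (m + 1) / n * np.var(y)     # heuristic complexity penalty
        if best is None or crit < best[0]:
            best = (crit, m, coef)
    return best[1], best[2]

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(500)     # toy stand-in for increment data
m_hat, coef = fit_least_squares(x, y, degrees=range(1, 10))
print("selected basis dimension:", m_hat + 1)
```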
We introduce a new stochastic algorithm for solving entropic optimal transport (EOT) between two absolutely continuous probability measures $\mu$ and $\nu$. Our work is motivated by the specific setting of Monge-Kantorovich quantiles where the source measure $\mu$ is either the uniform distribution on the unit hypercube or the spherical uniform distribution. Using the knowledge of the source measure, we propose to parametrize a Kantorovich dual potential by its Fourier coefficients. In this way, each iteration of our stochastic algorithm reduces to two Fourier transforms, which enables us to use the Fast Fourier Transform (FFT) to implement a fast numerical method for solving EOT. We study the almost sure convergence of our stochastic algorithm, which takes its values in an infinite-dimensional Banach space. Then, using numerical experiments, we illustrate the performance of our approach on the computation of regularized Monge-Kantorovich quantiles. In particular, we investigate the potential benefits of entropic regularization for the smooth estimation of multivariate quantiles using data sampled from the target measure $\nu$.
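To fix ideas, here is a simplified one-dimensional sketch in the same spirit (not the authors' algorithm): stochastic ascent on the semi-dual EOT objective with the source-side dual potential parametrized by a truncated cosine basis, the source measure taken uniform on $[0,1]$, and updates driven by samples from the target measure. The FFT acceleration is not reproduced; the potential is simply evaluated on a fixed grid, and the target sampler is a placeholder.

```python
# Simplified 1D sketch: stochastic semi-dual entropic OT with a Fourier-parametrized
# dual potential on the source side; mu is uniform on [0, 1], nu is sampled.
import numpy as np

rng = np.random.default_rng(0)
eps, K, n_grid, lr = 0.05, 16, 256, 0.5
xs = (np.arange(n_grid) + 0.5) / n_grid                  # quadrature grid for mu
basis = np.ones((n_grid, K))
for k in range(1, K):
    basis[:, k] = np.sqrt(2.0) * np.cos(k * np.pi * xs)  # cosine basis b_k(x)
theta = np.zeros(K)                                      # Fourier coefficients of phi

mean_basis = basis.mean(axis=0)                          # E_{x~mu}[b(x)]
target_sample = lambda: rng.beta(2.0, 5.0)               # placeholder sampler for nu

for t in range(1, 5001):
    y = target_sample()
    phi = basis @ theta                                  # phi(x_i) on the grid
    logits = (phi - 0.5 * (xs - y) ** 2) / eps           # quadratic cost c(x, y)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                         # softmax weights on the grid
    grad = mean_basis - w @ basis                        # stochastic semi-dual gradient
    theta += (lr / np.sqrt(t)) * grad                    # Robbins-Monro step

print("first Fourier coefficients of the dual potential:", theta[:4].round(3))
```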
This paper is devoted to the construction and analysis of immersed finite element (IFE) methods in three dimensions. Different from the 2D case, the points of intersection of the interface and the edges of a tetrahedron are usually not coplanar, which makes the extension of the original 2D IFE methods based on a piecewise linear approximation of the interface to the 3D case not straightforward. We address this coplanarity issue by an approach where the interface is approximated via discrete level set functions. This approach is very convenient from a computational point of view since in many practical applications the exact interface is often unknown, and only a discrete level set function is available. As this approach has not been considered in the 2D IFE methods either, in this paper we present a unified framework for both 2D and 3D cases. We consider an IFE method based on the traditional Crouzeix-Raviart element using integral values on faces as degrees of freedom. The novelty of the proposed IFE is the unisolvence of basis functions on arbitrary triangles/tetrahedra without any angle restrictions even for anisotropic interface problems, which is advantageous over the IFE using nodal values as degrees of freedom. The optimal bounds for the IFE interpolation errors are proved on shape-regular triangulations. For the IFE method, optimal a priori error and condition number estimates are derived with constants independent of the location of the interface with respect to the unfitted mesh. The extension to anisotropic interface problems with tensor coefficients is also discussed. Numerical examples supporting the theoretical results are provided.
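The coplanarity issue and the level-set remedy can be seen in a few lines: within a tetrahedron, the zero set of the linear interpolant of a discrete level-set function is a plane, so the edge intersection points it produces are automatically coplanar, unlike the intersections of the exact interface with the edges. The snippet below is only an illustration of this geometric fact, not the paper's code.

```python
# Minimal geometric illustration: edge cuts produced by the zero set of the linear
# interpolant of a discrete level-set function on a tetrahedron are coplanar by construction.
import numpy as np

def interface_cut(vertices, phi):
    """Edge intersection points of the zero set of the linear interpolant of phi,
    given the 4 vertices of a tetrahedron and the level-set values phi at them."""
    points = []
    for i in range(4):
        for j in range(i + 1, 4):
            if phi[i] * phi[j] < 0:                    # sign change along edge (i, j)
                t = phi[i] / (phi[i] - phi[j])         # linear interpolation parameter
                points.append((1 - t) * vertices[i] + t * vertices[j])
    return np.array(points)

verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
phi = np.array([-0.2, 0.5, 0.3, 0.4])                  # discrete level-set values
print(interface_cut(verts, phi))                       # 3 coplanar points cutting the tet
```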
A Deep Neural Network (DNN) is a composite function of vector-valued functions, and in order to train a DNN, it is necessary to calculate the gradient of the loss function with respect to all parameters. This calculation can be a non-trivial task because the loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters. The Backpropagation (BP) algorithm leverages the composite structure of the DNN to efficiently compute the gradient. As a result, the number of layers in the network does not significantly impact the complexity of the calculation. The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator. This can be achieved by considering the total derivative of each layer with respect to its parameters and expressing it as a Jacobian matrix. The gradient can then be represented as the matrix product of these Jacobian matrices. This approach is valid because the chain rule can be applied to a composition of vector-valued functions, and the use of Jacobian matrices allows for the incorporation of multiple inputs and outputs. By providing concise mathematical justifications, the results can be made understandable and useful to a broad audience from various disciplines.
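The following toy example (a sketch, not taken from the paper) spells out this viewpoint for a two-layer network: the gradient of a squared-error loss is assembled as a product of Jacobian matrices, one per layer, and checked against a finite-difference approximation.

```python
# Gradient of the loss as a product of Jacobian matrices, verified by finite differences.
import numpy as np

rng = np.random.default_rng(0)
d0, d1, d2 = 3, 4, 2
W1, W2 = rng.standard_normal((d1, d0)), rng.standard_normal((d2, d1))
x, y = rng.standard_normal(d0), rng.standard_normal(d2)

def forward(W1, W2):
    z1 = W1 @ x
    h = np.tanh(z1)
    y_hat = W2 @ h
    return 0.5 * np.sum((y_hat - y) ** 2), z1, h, y_hat

loss, z1, h, y_hat = forward(W1, W2)

# Jacobians of each layer, chained by matrix multiplication (row-major vec of weights).
dL_dyhat = (y_hat - y).reshape(1, -1)                   # 1 x d2
J_yhat_h = W2                                           # d2 x d1
J_h_z1 = np.diag(1.0 - np.tanh(z1) ** 2)                # d1 x d1
J_z1_W1 = np.kron(np.eye(d1), x.reshape(1, -1))         # d1 x (d1*d0)
J_yhat_W2 = np.kron(np.eye(d2), h.reshape(1, -1))       # d2 x (d2*d1)

grad_W1 = (dL_dyhat @ J_yhat_h @ J_h_z1 @ J_z1_W1).reshape(d1, d0)
grad_W2 = (dL_dyhat @ J_yhat_W2).reshape(d2, d1)

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
fd = (forward(W1p, W2)[0] - loss) / eps
print(abs(fd - grad_W1[0, 0]) < 1e-4)                   # True
```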
We consider estimation of generalized additive models using basis expansions with Bayesian model selection. Although Bayesian model selection is an intuitively appealing tool for regression splines by virtue of the flexible knot placement and model-averaged function estimates, its use has traditionally been limited to Gaussian additive regression, as posterior search of the model space requires a tractable form of the marginal model likelihood. We introduce an extension of the method to the exponential family of distributions using the Laplace approximation to the likelihood. Although the Laplace approximation provides a closed-form expression of the marginal likelihood for all Gaussian-type prior distributions, there is no broad consensus on the best prior distribution to use for nonparametric regression via model selection. We observe that the classical unit information prior distribution for variable selection may not be suitable for nonparametric regression using basis expansions. Instead, our study reveals that mixtures of g-priors are more suitable. A large family of mixtures of g-priors is considered for a detailed examination of how various mixture priors perform in estimating generalized additive models. Furthermore, we compare several priors on the knots for model selection-based spline approaches to determine the most practically effective scheme. The model selection-based estimation methods are also compared with other Bayesian approaches to function estimation. Extensive simulation studies demonstrate the validity of the model selection-based approaches. We provide an R package for the proposed method.
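To make the central computation concrete, the sketch below (assumptions flagged in the comments) applies the Laplace approximation to the marginal likelihood of a logistic model with a basis expansion and a simple g-prior, the quantity that drives posterior model search; the paper's priors, bases, and search strategy are richer than this.

```python
# Minimal sketch: Laplace approximation of the marginal likelihood (evidence) of a
# logistic model with basis matrix B and a simple g-prior beta ~ N(0, g * (B^T B)^{-1}).
import numpy as np

def log_evidence_laplace(B, y, g=100.0, n_iter=50):
    n, p = B.shape
    prior_prec = (B.T @ B) / g                            # prior precision matrix
    beta = np.zeros(p)
    for _ in range(n_iter):                               # Newton ascent to the posterior mode
        mu = 1.0 / (1.0 + np.exp(-(B @ beta)))
        grad = B.T @ (y - mu) - prior_prec @ beta
        H = B.T @ ((mu * (1 - mu))[:, None] * B) + prior_prec   # negative Hessian
        beta += np.linalg.solve(H, grad)
    eta = B @ beta
    log_lik = np.sum(y * eta - np.log1p(np.exp(eta)))
    log_prior = (0.5 * np.linalg.slogdet(prior_prec)[1]
                 - 0.5 * p * np.log(2 * np.pi)
                 - 0.5 * beta @ prior_prec @ beta)
    # Laplace: log m(y) ~= log p(y|beta_hat) + log p(beta_hat) + p/2 log(2*pi) - 1/2 log|H|
    return log_lik + log_prior + 0.5 * p * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(H)[1]

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = rng.binomial(1, 1 / (1 + np.exp(-2 * np.sin(2 * np.pi * x))))
for K in (2, 4, 8):                                       # compare basis dimensions
    B = np.column_stack([np.cos(k * np.pi * x) for k in range(K)] +
                        [np.sin(k * np.pi * x) for k in range(1, K)])
    print(K, round(log_evidence_laplace(B, y), 2))
```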
We study the problem of optimal sampling in an edge-based video analytics system (VAS), where sensor samples collected at a terminal device are offloaded to a back-end server that processes them and generates feedback for a user. Sampling the system with the maximum allowed frequency results in the timely detection of relevant events with minimum delay. However, it incurs high energy costs and causes unnecessary usage of network and compute resources via communication and processing of redundant samples. On the other hand, infrequent sampling results in a higher delay in detecting the relevant event, thus increasing the idle energy usage and degrading the quality of experience in terms of responsiveness of the system. We quantify this sampling frequency trade-off as a weighted function of the number of samples and the responsiveness. We propose an energy-optimal aperiodic sampling policy that improves over the state-of-the-art optimal periodic sampling policy. Numerically, we show the proposed policy provides a consistent improvement of more than 10\% over the state-of-the-art.
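For intuition on the trade-off (a toy calculation, not the system model of the paper): if a single relevant event arrives uniformly at random over a horizon $H$ and sampling is periodic with period $T$, the policy uses $H/T$ samples and incurs an expected detection delay of $T/2$, so a weighted cost $\alpha H/T + \beta T/2$ is minimized at $T^\ast = \sqrt{2\alpha H/\beta}$; the snippet below checks this against a grid search. An aperiodic policy, as proposed in the paper, can improve on this baseline.

```python
# Toy illustration of the weighted sampling-cost vs. responsiveness trade-off for
# periodic sampling, with the closed-form optimal period compared to a grid search.
import numpy as np

alpha, beta, H = 1.0, 4.0, 100.0          # weight per sample, weight per unit delay, horizon
T_grid = np.linspace(0.5, 20.0, 400)
cost = alpha * H / T_grid + beta * T_grid / 2.0

T_star = np.sqrt(2.0 * alpha * H / beta)  # closed-form minimizer of the weighted cost
print("grid-search optimum:", round(T_grid[np.argmin(cost)], 2))
print("analytic optimum  :", round(T_star, 2))
```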
In consumer theory, ranking available objects by means of preference relations yields the most common description of individual choices. However, preference-based models assume that individuals: (1) give their preferences only between pairs of objects; (2) are always able to pick the most preferred object. In many situations, they may instead be choosing out of a set with more than two elements and, because of a lack of information and/or incomparability (objects with contradictory characteristics), they may not be able to select a single most preferred object. To address these situations, we need a choice model that allows an individual to express a set-valued choice. Choice functions provide such a mathematical framework. We propose a Gaussian Process model to learn choice functions from choice data. The proposed model assumes a multiple utility representation of a choice function based on the concept of Pareto rationalization, and derives a strategy to learn both the number and the values of these latent multiple utilities. Simulation experiments demonstrate that the proposed model outperforms the state-of-the-art methods.
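The Pareto-rationalization idea admits a compact illustration: with several latent utilities, the choice from a set is the subset of objects not dominated on every utility. The snippet below (fixed utility vectors standing in for the latent Gaussian-process utilities learned in the paper) computes such a set-valued choice.

```python
# A choice function rationalized by multiple latent utilities via Pareto dominance:
# the chosen set contains the non-dominated objects.
import numpy as np

def pareto_choice(U):
    """U[i] is the vector of latent utilities of object i; return indices of the
    objects not Pareto-dominated by any other object."""
    chosen = []
    for i, ui in enumerate(U):
        dominated = any(np.all(uj >= ui) and np.any(uj > ui)
                        for j, uj in enumerate(U) if j != i)
        if not dominated:
            chosen.append(i)
    return chosen

U = np.array([[1.0, 0.2],    # object 0: good on utility 1 only
              [0.3, 0.9],    # object 1: good on utility 2 only
              [0.2, 0.1]])   # object 2: dominated by both others
print(pareto_choice(U))      # [0, 1]: a set-valued choice, no single best object
```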
Predictive models -- as with machine learning -- can underpin causal inference, to estimate the effects of an intervention at the population or individual level. This opens the door to a plethora of models, useful to match the increasing complexity of health data, but also the Pandora's box of model selection: which of these models yield the most valid causal estimates? Classic machine-learning cross-validation procedures are not directly applicable. Indeed, an appropriate selection procedure for causal inference should equally weight both outcome errors for each individual, treated or not treated, whereas one outcome may be seldom observed for a sub-population. We study how more elaborate risks benefit causal model selection. We show theoretically that simple risks are brittle to weak overlap between treated and non-treated individuals as well as to heterogeneous errors between populations. Rather, a more elaborate metric, the R-risk, appears as a proxy for the oracle error on causal estimates, observable at the cost of an overlap re-weighting. As the R-risk is defined not only from model predictions but also using the conditional mean outcome and the treatment probability, using it for model selection requires adapting cross-validation. Extensive experiments show that the resulting procedure gives the best causal model selection.
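A minimal sketch of the quantity at stake: given nuisance estimates $\hat m(x) \approx E[Y \mid X]$ and $\hat e(x) \approx P(A = 1 \mid X)$ (cross-fitted in practice), the empirical R-risk of a candidate effect model $\hat\tau$ can be computed and used to rank candidates, as below; the synthetic data and candidate models are purely illustrative.

```python
# Empirical R-risk for causal model selection: mean of ((Y - m(X)) - (A - e(X)) * tau(X))^2.
import numpy as np

def r_risk(y, a, m_hat, e_hat, tau_hat):
    """Empirical R-risk of a candidate effect estimate tau_hat."""
    return np.mean(((y - m_hat) - (a - e_hat) * tau_hat) ** 2)

def select_model(y, a, m_hat, e_hat, candidates):
    """Pick the candidate tau_hat (dict name -> array) with the smallest R-risk."""
    risks = {name: r_risk(y, a, m_hat, e_hat, tau) for name, tau in candidates.items()}
    return min(risks, key=risks.get), risks

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
e = 1 / (1 + np.exp(-2 * x)); a = rng.binomial(1, e)
tau_true = 1 + x
y = x + a * tau_true + rng.standard_normal(n)
m = x + e * tau_true                                   # true E[Y | X] for the illustration
cands = {"constant": np.ones(n), "linear": tau_true}
print(select_model(y, a, m, e, cands)[0])              # "linear"
```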
The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.
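For contrast with the single-step proposal, here is a sketch of one common form of the traditional two-step construction the abstract refers to: separately estimated study-selection probabilities and treatment propensities are combined by multiplying the inverse selection odds with the inverse treatment probability, and the target average treatment effect is then estimated by a weighted difference of means; function and variable names are illustrative.

```python
# Traditional two-step transport weights (one common form) and a weighted effect estimate.
import numpy as np

def two_step_weights(a, p_select, e_treat):
    """a: treatment indicator in the study sample; p_select: estimated P(S=1 | X);
    e_treat: estimated P(A=1 | X, S=1). Returns transport weights for study units."""
    inv_odds = (1.0 - p_select) / p_select           # inverse odds of study selection
    inv_prob = np.where(a == 1, 1.0 / e_treat, 1.0 / (1.0 - e_treat))
    return inv_odds * inv_prob

def weighted_ate(y, a, w):
    """Hajek-style weighted difference of means for the target average treatment effect."""
    t, c = (a == 1), (a == 0)
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])
```

The paper's one-step estimator replaces the separate estimation and multiplication of these two nuisance models with a single weighting step; the two-step version above is only the baseline it is compared against.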
Regression models that ignore measurement error in predictors may produce highly biased estimates leading to erroneous inferences. It is well known that it is extremely difficult to take measurement error into account in Gaussian nonparametric regression. This problem becomes tremendously more difficult when considering other families such as logistic, Poisson, and negative binomial regression. For the first time, we present a method that corrects for measurement error when flexibly estimating regression functions, covering virtually all distributions and link functions regularly considered in generalized linear models. The approach relies on approximating the first and second moments of the response after integrating out the true unobserved predictors in a semiparametric generalized linear model. Unlike previous methods, this method is not restricted to truncated splines and can utilize various basis functions. Through extensive simulation studies, we examine the performance of our method under many scenarios.
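The moment calculation at the heart of such corrections can be sketched numerically (this is only an illustration, not the paper's semiparametric estimator): for a Poisson model with classical Gaussian measurement error and a Gaussian true predictor, the first and second moments of the response given the observed surrogate are obtained by integrating the conditional mean over the distribution of the true predictor.

```python
# Numerical sketch of E[Y | W] and E[Y^2 | W] after integrating out the true predictor X,
# for Y | X ~ Poisson(exp(f(X))), W = X + U with U ~ N(0, su2), and X ~ N(0, sx2).
import numpy as np

def response_moments(w, f, sx2=1.0, su2=0.25, n_quad=40):
    """Return (E[Y | W=w], E[Y^2 | W=w]) by Gauss-Hermite quadrature over X | W=w."""
    rho = sx2 / (sx2 + su2)
    m, v = rho * w, (1.0 - rho) * sx2                # Gaussian moments of X | W = w
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    xs = m + np.sqrt(v) * nodes                      # quadrature points for N(m, v)
    lam = np.exp(f(xs))                              # Poisson mean at the true predictor
    pw = weights / weights.sum()
    return np.sum(pw * lam), np.sum(pw * (lam + lam ** 2))

f = lambda x: 0.5 + np.sin(x)                        # toy stand-in for the regression function
print(response_moments(1.0, f))
```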