Given a $d$-dimensional continuous (resp. discrete) probability distribution $\mu$ and a discrete distribution $\nu$, the semi-discrete (resp. discrete) Optimal Transport (OT) problem asks for computing a minimum-cost plan to transport mass from $\mu$ to $\nu$; we assume $n$ to be the size of the support of the discrete distributions, and we assume we have access to an oracle outputting the mass of $\mu$ inside a constant-complexity region in $O(1)$ time. In this paper, we present three approximation algorithms for the OT problem. (i) Semi-discrete additive approximation: For any $\epsilon>0$, we present an algorithm that computes a semi-discrete transport plan with $\epsilon$-additive error in $n^{O(d)}\log\frac{C_{\max}}{\epsilon}$ time; here, $C_{\max}$ is the diameter of the supports of $\mu$ and $\nu$. (ii) Semi-discrete relative approximation: For any $\epsilon>0$, we present an algorithm that computes a $(1+\epsilon)$-approximate semi-discrete transport plan in $n\epsilon^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume the ground distance is any $L_p$ norm. (iii) Discrete relative approximation: For any $\epsilon>0$, we present a Monte-Carlo $(1+\epsilon)$-approximation algorithm that computes a transport plan under any $L_p$ norm in $n\epsilon^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume that the spread of the supports of $\mu$ and $\nu$ is polynomially bounded.
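To make the notion of a transport plan and its cost concrete, here is a minimal sketch of the discrete OT problem in the special case $d=1$ with ground distance $|x-y|$. It is not one of the three approximation algorithms announced above. It relies on the standard fact that, on the real line with a convex cost, the monotone (sorted, greedy) coupling is optimal. The function name `monotone_transport_1d` and its arguments are hypothetical and chosen only for illustration.

```python
def monotone_transport_1d(xs, mu, ys, nu, tol=1e-12):
    """Monotone coupling of two discrete distributions on the real line.

    xs, ys: support points; mu, nu: non-negative masses with equal totals.
    Returns (plan, cost), where plan is a list of (x, y, mass) triples and
    cost is the total of mass * |x - y| (illustrative 1-D special case only).
    """
    src = sorted(zip(xs, mu))  # process both supports from left to right
    dst = sorted(zip(ys, nu))
    plan, cost = [], 0.0
    i = j = 0
    a, b = src[0][1], dst[0][1]  # mass still to ship / still to receive
    while i < len(src) and j < len(dst):
        m = min(a, b)  # ship as much as both endpoints allow
        if m > 0:
            plan.append((src[i][0], dst[j][0], m))
            cost += m * abs(src[i][0] - dst[j][0])
        a -= m
        b -= m
        if a <= tol:  # source point exhausted: move to the next one
            i += 1
            if i < len(src):
                a = src[i][1]
        if b <= tol:  # target point full: move to the next one
            j += 1
            if j < len(dst):
                b = dst[j][1]
    return plan, cost


plan, cost = monotone_transport_1d([0, 1], [0.5, 0.5], [0, 2], [0.25, 0.75])
print(plan, cost)
```

In this small instance the plan keeps a quarter of the mass at 0, moves a quarter from 0 to 2, and moves half from 1 to 2. In higher dimensions no such sorting argument is available, which is where approximation algorithms of the kind described above become relevant.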
Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces on which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a covariance kernel. In this paper we provide necessary and sufficient conditions on the covariance kernel for the sample paths of the corresponding GP to attain a given regularity. We use the framework of H\"older regularity as it grants us particularly straightforward conditions, which simplify further in the cases of stationary and isotropic GPs. We then demonstrate that our results allow for novel and unusually tight characterisations of the sample path regularities of the GPs commonly used in machine learning applications, such as the Mat\'ern GPs.
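As an illustration of specifying a GP through its covariance kernel, the following sketch evaluates the isotropic Mat\'ern kernel for the three half-integer smoothness values $\nu \in \{1/2, 3/2, 5/2\}$, for which it has well-known closed forms. The function name `matern` and the parameter names `lengthscale` and `variance` are assumptions made for this example. The sketch only evaluates the kernel and does not implement the regularity conditions of the paper.

```python
import math


def matern(r, nu, lengthscale=1.0, variance=1.0):
    """Isotropic Matern kernel k(r) at distance r >= 0 (closed forms only).

    Supports nu in {0.5, 1.5, 2.5}; other values raise ValueError because
    the general case needs a modified Bessel function, omitted here.
    """
    if r < 0:
        raise ValueError("distance r must be non-negative")
    s = r / lengthscale
    if nu == 0.5:  # exponential kernel
        return variance * math.exp(-s)
    if nu == 1.5:
        t = math.sqrt(3.0) * s
        return variance * (1.0 + t) * math.exp(-t)
    if nu == 2.5:
        t = math.sqrt(5.0) * s
        return variance * (1.0 + t + t * t / 3.0) * math.exp(-t)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is supported in this sketch")


for nu in (0.5, 1.5, 2.5):
    print(nu, [round(matern(r, nu), 4) for r in (0.0, 0.5, 1.0, 2.0)])
```

Larger $\nu$ makes the kernel flatter near $r = 0$, and it is the behaviour of the kernel near the diagonal that conditions of the kind described above are concerned with.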
We explore some connections between association schemes and the analyses of the semidefinite programming (SDP) based convex relaxations of combinatorial optimization problems in the Lov\'{a}sz--Schrijver lift-and-project hierarchy. Our analysis of the relaxations of the stable set polytope leads to bounds on the clique and stability numbers of some regular graphs reminiscent of classical bounds by Delsarte and Hoffman, as well as the notion of deeply vertex-transitive graphs -- highly symmetric graphs that we show arise naturally from some association schemes. We also study relaxations of the hypergraph matching problem, and determine exactly or provide bounds on the lift-and-project ranks of these relaxations. Our proofs for these results also inspire the study of the general hypermatching pseudo-scheme, which is an association scheme except it is generally non-commutative. We then illustrate the usefulness of obtaining commutative subschemes from non-commutative pseudo-schemes via contraction in this context.
Classical mathematical statistics deals with models that are parametrized by a Euclidean, i.e. finite dimensional, parameter. Quite often such models have been and still are chosen in practical situations for their mathematical simplicity and tractability. However, these models are typically inappropriate since the implied distributional assumptions cannot be supported by hard evidence. It is natural then to relax these assumptions. This leads to the class of semiparametric models. These models have been studied in a local asymptotic setting, in which the Convolution Theorem yields bounds on the performance of regular estimators. Alternatively, local asymptotics can be based on the Local Asymptotic Minimax Theorem and on the Local Asymptotic Spread Theorem, both valid for any sequence of estimators. This Local Asymptotic Spread Theorem is a straightforward consequence of a Finite Sample Spread Inequality, which has some intrinsic value for estimation theory in general. We will discuss both the Finite Sample and Local Asymptotic Spread Theorem, as well as the Convolution Theorem.
A new $H(\textrm{divdiv})$-conforming finite element is presented, which avoids the need for super-smoothness by redistributing the degrees of freedom to edges and faces. This leads to a hybridizable mixed method with superconvergence for the biharmonic equation. Moreover, new finite element divdiv complexes are established. Finally, new weak Galerkin and $C^0$ discontinuous Galerkin methods for the biharmonic equation are derived.
A Gr\"obner basis computation for the Weyl algebra with respect to a tropical term order, carried out via a homogenization-dehomogenization technique, is rather sluggish. A significant number of reductions to zero occur. To improve the computation, a tropical F5 algorithm is developed for this context. As a member of the family of signature-based algorithms, this algorithm keeps track of where Weyl algebra elements come from in order to anticipate reductions to zero. The total order for ordering module monomials or signatures in this paper is designed to be as close as possible to the definition of the tropical term order. As in Vaccon et al. (2021), this total order is not compatible with the tropical term order.
We explore the concept of separating systems of vertex sets of graphs. A separating system of a set $X$ is a collection of subsets of $X$ such that for any pair of distinct elements in $X$, there exists a set in the separating system that contains exactly one of the two elements. A separating system of the vertex set of a graph $G$ is called a vertex-separating path (tree) system of $G$ if the elements of the separating system are paths (trees) in the graph $G$. In this paper, we focus on the size of the smallest vertex-separating path (tree) system for different types of graphs, including trees, grids, and maximal outerplanar graphs.
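The definitions above translate directly into a small checker. The sketch below tests whether a family of vertex sequences is a vertex-separating path system of a graph: each sequence must be a path in $G$, and every pair of distinct vertices must be separated by some path. The function names are hypothetical, and treating a single vertex as a (trivial) path is an assumption of this example.

```python
from itertools import combinations


def is_separating_system(ground_set, family):
    """True if every pair of distinct elements is separated by some set,
    i.e. some set in the family contains exactly one of the two."""
    sets = [frozenset(s) for s in family]
    return all(
        any((x in s) != (y in s) for s in sets)
        for x, y in combinations(list(ground_set), 2)
    )


def is_path(vertex_sequence, edges):
    """True if the sequence visits distinct vertices along edges of the graph.
    A single vertex counts as a trivial path (assumption of this sketch)."""
    adjacent = {frozenset(e) for e in edges}
    distinct = len(set(vertex_sequence)) == len(vertex_sequence)
    return len(vertex_sequence) >= 1 and distinct and all(
        frozenset((u, v)) in adjacent
        for u, v in zip(vertex_sequence, vertex_sequence[1:])
    )


def is_vertex_separating_path_system(vertices, edges, paths):
    """Combine both checks: all members are paths, and they separate V(G)."""
    return all(is_path(p, edges) for p in paths) and is_separating_system(vertices, paths)


# Path graph on four vertices: 0 - 1 - 2 - 3
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]
print(is_vertex_separating_path_system(V, E, [[0, 1], [1, 2]]))
print(is_vertex_separating_path_system(V, E, [[0, 1, 2, 3]]))
```

In the first call the two paths give the four vertices the distinct membership patterns (in, out), (in, in), (out, in), and (out, out), so every pair is separated. In the second call the single path contains all vertices and separates none of them.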
Natural data observed in $\mathbb{R}^n$ is often constrained to an $m$-dimensional manifold $\mathcal{M}$, where $m < n$. This work focuses on the task of building theoretically principled generative models for such data. Current generative models learn $\mathcal{M}$ by mapping an $m$-dimensional latent variable through a neural network $f_\theta: \mathbb{R}^m \to \mathbb{R}^n$. These procedures, which we call pushforward models, incur a straightforward limitation: manifolds cannot in general be represented with a single parameterization, meaning that attempts to do so will incur either computational instability or the inability to learn probability densities within the manifold. To remedy this problem, we propose to model $\mathcal{M}$ as a neural implicit manifold: the set of zeros of a neural network. We then learn the probability density within $\mathcal{M}$ with a constrained energy-based model, which employs a constrained variant of Langevin dynamics to train and sample from the learned manifold. In experiments on synthetic and natural data, we show that our model can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
The fusion of causal models with deep learning, which introduces increasingly intricate data sets such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere that we infer from the progression of data forms. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, and present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
Existing recommender systems extract user preferences by learning correlations in data, such as behavioral correlation in collaborative filtering, or feature-feature and feature-behavior correlation in click-through rate prediction. However, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, a recommender system can recommend a battery charger to a user who has just bought a phone, where the latter can serve as the cause of the former, and such a causal relation cannot be reversed. Recently, to address this, researchers in recommender systems have begun to utilize causal inference to extract causality, enhancing the recommender system. In this survey, we comprehensively review the literature on causal inference-based recommendation. First, we present the fundamental concepts of both recommendation and causal inference as the basis of later content. We then raise the typical issues that non-causal recommendation faces. Afterward, we comprehensively review the existing work on causal inference-based recommendation, based on a taxonomy of what kind of problem causal inference addresses. Last, we discuss the open problems in this important research area, along with interesting directions for future work.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.