We study the well-posedness and numerical approximation of multidimensional stochastic differential equations (SDEs) with distributional drift, driven by a fractional Brownian motion. First, we prove weak existence for such SDEs. This holds under a condition that relates the Hurst parameter $H$ of the noise to the Besov regularity of the drift. Then under a stronger condition, we study the error between a solution $X$ of the SDE with drift $b$ and its tamed Euler scheme with mollified drift $b^n$. We obtain a rate of convergence in $L^m(\Omega)$ for this error, which depends on the Besov regularity of the drift. This result covers the critical case of the regime of strong existence and pathwise uniqueness. When the Besov regularity increases and the drift becomes a bounded measurable function, we recover the (almost) optimal rate of convergence $1/2-\varepsilon$. As a byproduct of this convergence, we deduce that pathwise uniqueness holds in a class of H\"older continuous solutions and that any such solution is strong. The proofs rely on stochastic sewing techniques, especially to deduce new regularising properties of the discrete-time fractional Brownian motion. We also present several examples and numerical simulations that illustrate our results.
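For orientation, the objects named in this abstract can be written out as follows. This is a hedged sketch rather than the authors' exact formulation: the step size $h$, the grid index $k$ and the time horizon $T$ are notation assumed here, and for a genuinely distributional drift $b$ the integral in the first line is only formal. The word "tamed" presumably refers to coupling the mollification level $n$ to the step size, which is not spelled out in the abstract.

```latex
% SDE with drift b, driven by a fractional Brownian motion B with Hurst parameter H
X_t = X_0 + \int_0^t b(X_s)\,\mathrm{d}s + B_t, \qquad t \in [0,T].

% Euler scheme with mollified drift b^n on a grid of (assumed) step size h
X^{h,n}_{(k+1)h} = X^{h,n}_{kh} + h\, b^n\bigl(X^{h,n}_{kh}\bigr) + B_{(k+1)h} - B_{kh},
\qquad k = 0, 1, \dots
```

The error studied in the abstract is then the $L^m(\Omega)$ distance between $X$ and $X^{h,n}$.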
We propose a spectral collocation method to approximate the exact boundary control of the wave equation in a square domain. The idea is to introduce a suitable approximate control problem that we solve in the finite-dimensional space of polynomials of degree N in space. We prove that we can choose a sequence of discrete controls depending on the parameter N associated with the approximate control problem in such a way that they converge, as N goes to infinity, to a control of the continuous wave equation. Unlike other numerical approximations tried in the literature, this one does not require regularization techniques and can be easily adapted to other equations and systems where the controllability of the continuous model is known. The method is illustrated with several examples in 1-d and 2-d in a square domain. We also give numerical evidence of the highly accurate approximation inherent to spectral methods.
In this paper, we present a comprehensive convergence analysis of Laguerre spectral approximations for analytic functions. By exploiting contour integral techniques from complex analysis, we prove rigorously that Laguerre projection and interpolation methods of degree $n$ converge at the root-exponential rate $O(\exp(-2\rho\sqrt{n}))$ with $\rho>0$ when the underlying function is analytic inside and on a parabola with focus at the origin and vertex at $z=-\rho^2$. Extensions to several important applications are also discussed, including Laguerre spectral differentiation, Gauss-Laguerre quadrature rules, and the Weeks method for the inversion of the Laplace transform, and some sharp convergence rate estimates are derived. Numerical experiments are presented to verify the theoretical results.
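As a small illustration of one of the applications listed above, the following sketch applies Gauss-Laguerre quadrature using NumPy's `laggauss`. The helper name `gauss_laguerre` and the test integrand $1/(1+x)$ are choices made here, not taken from the paper. The integrand has a pole at $x=-1$, so it is analytic only inside parabolas with $\rho<1$, and the error should decay with $n$ but not at a purely exponential rate.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss


def gauss_laguerre(f, n):
    """Approximate the integral of exp(-x) * f(x) over [0, inf) with n nodes."""
    nodes, weights = laggauss(n)
    return float(np.sum(weights * f(nodes)))


# An n-point rule is exact for polynomials of degree <= 2n - 1:
# the integral of exp(-x) * x^2 over [0, inf) equals 2.
poly_value = gauss_laguerre(lambda x: x**2, 2)

# Analytic but non-polynomial integrand: the integral of exp(-x) / (1 + x)
# over [0, inf) equals e * E_1(1), approximately 0.5963473623231941.
reference = 0.5963473623231941
errors = [
    abs(gauss_laguerre(lambda x: 1.0 / (1.0 + x), n) - reference)
    for n in (2, 8, 32)
]
print(poly_value, errors)
```

Plotting the logarithm of the error against $\sqrt{n}$ is one way to check a root-exponential rate empirically.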
In this work we propose a weighted hybridizable discontinuous Galerkin method (W-HDG) for drift-diffusion problems. By using specific exponential weights when computing the $L^2$ product in each cell of the discretization, we are able to mimic the behavior of the Slotboom variables and eliminate the drift term from the local matrix contributions, while still solving the problem for the primal variables. We show that the proposed numerical scheme is well-posed, and validate numerically that it has the same properties as classical HDG methods, including optimal convergence and superconvergence of postprocessed solutions. For polynomial degree zero, dimension one, and vanishing HDG stabilization parameter, W-HDG coincides with the Scharfetter-Gummel finite volume scheme (i.e., it produces the same system matrix). The use of local exponential weights generalizes the Scharfetter-Gummel scheme (the state of the art for finite volume discretization of transport-dominated problems) to arbitrarily high-order approximations.
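For readers unfamiliar with the Scharfetter-Gummel scheme mentioned above, here is a minimal one-dimensional sketch of its edge flux. It assumes the flux convention $J = v\,u - D\,u'$ with constant velocity $v$ and diffusion $D$ on an edge of length $h$; the function names and the sign convention are assumptions made here and may differ from the paper's.

```python
import math


def bernoulli(x):
    """Bernoulli function B(x) = x / (exp(x) - 1), with B(0) = 1."""
    if abs(x) < 1e-10:
        # First-order Taylor expansion avoids 0/0 near the origin.
        return 1.0 - 0.5 * x
    return x / math.expm1(x)


def sg_flux(u_left, u_right, velocity, diffusion, h):
    """Scharfetter-Gummel flux for J = velocity * u - diffusion * u'.

    The flux is the constant J for which the two-point boundary value
    problem on the edge has the exact solution matching u_left and u_right.
    """
    peclet = velocity * h / diffusion  # local Peclet number
    return diffusion / h * (
        bernoulli(-peclet) * u_left - bernoulli(peclet) * u_right
    )


# Pure diffusion: reduces to the centred difference D * (u_left - u_right) / h.
print(sg_flux(1.0, 0.0, 0.0, 1.0, 0.5))
# Equilibrium profile u(x) = exp(v x / D): the flux vanishes up to round-off.
print(sg_flux(1.0, math.exp(0.4), 2.0, 0.5, 0.1))
```

In the drift-dominated limit the flux tends to the upwind value `velocity * u_left`, which is why the scheme stays stable without added artificial diffusion.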
Knowledge graph embedding (KGE) is an increasingly popular technique that aims to represent entities and relations of knowledge graphs in low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. In particular, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.
We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle the varying visual representations. In this way, we provide high-quality task-related content for facilitating recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. Thus, we can employ a Gaussian distribution to model them effectively and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it achieves a relative improvement of 9.1% in the average result compared to human-crafted prompts.
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision. This problem first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting issues during adaptation.
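The distributionally robust optimization (DRO) framework referred to above can be summarised generically as follows. This is the textbook form, not necessarily the thesis's exact formulation: the loss $\ell_\theta$, the training distribution $p$, the uncertainty set $\mathcal{Q}$ and the adversary parameters $\psi$ are notation assumed here, and the last line is only one plausible reading of "parametric reformulation".

```latex
% Standard training: minimise the expected loss under the training distribution p
\min_{\theta}\; \mathbb{E}_{(x,y)\sim p}\bigl[\ell_\theta(x,y)\bigr]

% DRO: minimise the worst-case expected loss over an uncertainty set \mathcal{Q}
\min_{\theta}\; \max_{q \in \mathcal{Q}}\; \mathbb{E}_{(x,y)\sim q}\bigl[\ell_\theta(x,y)\bigr]

% Parametric variant (assumed form): the adversary is itself a model q_\psi
\min_{\theta}\; \max_{\psi \,:\, q_\psi \in \mathcal{Q}}\;
  \mathbb{E}_{(x,y)\sim q_\psi}\bigl[\ell_\theta(x,y)\bigr]
```

The stationarity assumption criticised in the abstract corresponds to the first line, where training and test data share the single distribution $p$.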
Autonomous driving is regarded as one of the most promising remedies to shield human beings from severe crashes. To this end, 3D object detection serves as the core basis of such a perception system, especially for path planning, motion prediction, collision avoidance, etc. Generally, stereo or monocular images with corresponding 3D point clouds are already the standard setup for 3D object detection, and point clouds are increasingly prevalent because they provide accurate depth information. Despite existing efforts, 3D object detection on point clouds is still in its infancy due to the inherently high sparseness and irregularity of point clouds, the misalignment between the camera view and the LiDAR bird's-eye view for modality synergies, occlusions and scale variations at long distances, etc. Recently, profound progress has been made in 3D object detection, with a large body of literature addressing this vision task. As such, we present a comprehensive review of the latest progress in this field, covering all the main topics including sensors, fundamentals, and the recent state-of-the-art detection methods with their pros and cons. Furthermore, we introduce metrics and provide quantitative comparisons on popular public datasets. Avenues for future work are identified after an in-depth analysis of the surveyed works. Finally, we conclude this paper.
The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually studied independently, although they are actually highly correlated. We propose a probabilistic generative model called vGraph to learn community membership and node representation collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distribution are parameterized by the low-dimensional representations of the nodes and communities. We design an effective variational inference algorithm which regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is very effective in both community detection and node representation learning, outperforming many competitive baselines in both tasks. We show that the framework of vGraph is quite flexible and can be easily extended to detect hierarchical communities.
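The generative assumptions described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the softmax parameterisation, the sharing of one node embedding table across both distributions, the random embeddings and all sizes and names are assumptions made here. The abstract only states that both distributions are parameterised by low-dimensional representations.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
num_nodes, num_communities, dim = 6, 2, 4  # hypothetical sizes

node_emb = rng.normal(size=(num_nodes, dim))             # node representations
community_emb = rng.normal(size=(num_communities, dim))  # community representations

# p(z | w): each node w is a mixture over communities z (rows sum to 1).
node_to_community = softmax(node_emb @ community_emb.T, axis=1)

# p(c | z): each community z is a multinomial over nodes c (rows sum to 1).
community_to_node = softmax(community_emb @ node_emb.T, axis=1)

# p(c | w) = sum_z p(z | w) * p(c | z): probability of generating neighbour c from w.
neighbor_prob = node_to_community @ community_to_node

# A hard community assignment can be read off the mixing coefficients.
assignment = node_to_community.argmax(axis=1)
print(neighbor_prob.shape, assignment)
```

In a trained model the embeddings would be fitted by the variational inference procedure mentioned in the abstract rather than drawn at random.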