In this paper, we develop a fast and accurate pseudospectral method to approximate numerically the fractional Laplacian $(-\Delta)^{\alpha/2}$ of a function on $\mathbb{R}$ for $\alpha=1$; this case, commonly referred to as the half Laplacian, is equivalent to the Hilbert transform of the derivative of the function. The main ideas are as follows. Given a twice continuously differentiable bounded function $u\in \mathit{C}_b^2(\mathbb{R})$, we apply the change of variable $x=L\cot(s)$, with $L>0$ and $s\in[0,\pi]$, which maps $\mathbb{R}$ into $[0,\pi]$, and denote $(-\Delta)_s^{1/2}u(x(s)) \equiv (-\Delta)^{1/2}u(x)$. Then, by performing a Fourier series expansion of $u(x(s))$, the problem is reduced to computing $(-\Delta)_s^{1/2}e^{iks} \equiv (-\Delta)^{1/2}(x + i)^k/(1+x^2)^{k/2}$. In a previous work, we considered the case with $k$ even and $\alpha\in(0,2)$, so we focus now on the case with $k$ odd. More precisely, we express $(-\Delta)_s^{1/2}e^{iks}$ for $k$ odd in terms of the Gaussian hypergeometric function ${}_2F_1$, and also as a well-conditioned finite sum. Then, we use a fast convolution result that enables us to compute very efficiently $\sum_{l = 0}^Ma_l(-\Delta)_s^{1/2}e^{i(2l+1)s}$, for extremely large values of $M$. This allows us to approximate $(-\Delta)_s^{1/2}u(x(s))$ in a fast and accurate way, especially when $u(x(s))$ is not periodic of period $\pi$.
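As a small illustration of the setup (a minimal sketch with $L=1$ and an assumed grid size), the snippet below checks numerically the identity $e^{iks} = (x+i)^k/(1+x^2)^{k/2}$ under the change of variable $x=\cot(s)$; the Fourier coefficients $a_l$ of $u(x(s))$ would then be obtained by an FFT on such a grid, while applying the half Laplacian to each mode relies on the hypergeometric and finite-sum expressions derived in the paper, which are not reproduced here.

```python
import numpy as np

# Minimal sketch (illustrative parameters): the change of variable
# x = L*cot(s) with L = 1, and a numerical check of the identity
# e^{i k s} = (x + i)^k / (1 + x^2)^{k/2} quoted in the abstract.
L = 1.0
N = 64
s = (np.arange(1, N + 1) - 0.5) * np.pi / N   # nodes in (0, pi)
x = L / np.tan(s)                             # x = L*cot(s) maps (0, pi) to R

k = 5                                         # an odd mode, as in the paper
lhs = np.exp(1j * k * s)
rhs = (x + 1j) ** k / (1.0 + x ** 2) ** (k / 2)
print(np.max(np.abs(lhs - rhs)))              # small (rounding error only)

# The Fourier coefficients a_l of u(x(s)) would be computed by an FFT on
# such a grid, and the half Laplacian applied mode by mode using the
# closed-form expressions (hypergeometric / finite-sum) from the paper.
```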
Variable selection or importance measurement of input variables to a machine learning model has become the focus of much research. It is no longer enough to have a good model; one must also explain its decisions. This is why so many intelligibility algorithms are available today. Among them, Shapley value estimation algorithms are intelligibility methods based on cooperative game theory. In the case of the naive Bayes classifier, and to our knowledge, there is no ``analytical'' formulation of Shapley values. This article proposes an exact analytic expression of Shapley values in the special case of the naive Bayes classifier. We analytically compare this Shapley proposal to another frequently used indicator, the Weight of Evidence (WoE), and provide an empirical comparison of our proposal with (i) the WoE and (ii) KernelShap results on real-world datasets, discussing similar and dissimilar results. The results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity, so that it can be used on very large datasets with extremely low computation time.
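For reference, the snippet below computes Shapley values for a naive Bayes classifier from the plain game-theoretic definition, marginalizing absent features over a background sample. This is the brute-force baseline that KernelShap approximates, not the closed-form expression proposed in the article; the data, model, and value function are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.naive_bayes import GaussianNB

# Brute-force Shapley values for a naive Bayes classifier, using the
# model's log-odds as the value function and marginalizing absent
# features over a background sample (generic baseline, NOT the paper's
# analytic formula).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = GaussianNB().fit(X, y)

def value(x, subset, background):
    """Mean log-odds when the features in `subset` are fixed to x."""
    Z = background.copy()
    Z[:, list(subset)] = x[list(subset)]
    p = np.clip(model.predict_proba(Z)[:, 1], 1e-12, 1 - 1e-12)
    return np.mean(np.log(p / (1 - p)))

def shapley(x, background):
    d = x.size
    phi = np.zeros(d)
    for j in range(d):
        rest = [i for i in range(d) if i != j]
        for r in range(d):
            for S in combinations(rest, r):
                w = factorial(r) * factorial(d - r - 1) / factorial(d)
                phi[j] += w * (value(x, S + (j,), background)
                               - value(x, S, background))
    return phi

background = X[:100]
print(shapley(X[0], background))
```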
We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm, or the iterative proportional fitting procedure (IPFP), on joint densities converges to an absolutely continuous curve on the $2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (under additional technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals, and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Amp\`{e}re PDE, whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence of this limiting flow. We also construct a McKean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.
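A minimal discrete sketch of the Sinkhorn/IPFP iterations whose marginals the paper studies is given below; the cost, grids, marginals, and the value of $\varepsilon$ are illustrative choices, and the number of iterations is scaled as $1/\varepsilon$ following the regime described above.

```python
import numpy as np

# Minimal discrete Sinkhorn / IPFP sketch: alternate marginal fittings of
# the coupling pi = diag(u) K diag(v), with Gibbs kernel K = exp(-C/eps).
# eps, the grids and the marginals are illustrative choices.
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
a = np.exp(-(x - 0.3) ** 2 / 0.02); a /= a.sum()   # first marginal
b = np.exp(-(y - 0.7) ** 2 / 0.05); b /= b.sum()   # second marginal

eps = 0.05
C = (x[:, None] - y[None, :]) ** 2 / 2.0
K = np.exp(-C / eps)

u = np.ones(n)
v = np.ones(n)
for _ in range(int(1.0 / eps)):          # number of iterations scaled ~ 1/eps
    u = a / (K @ v)                      # fit the first marginal
    v = b / (K.T @ u)                    # fit the second marginal

pi = u[:, None] * K * v[None, :]
# after the v-update the column marginals match b exactly; the row
# marginals approach a along the iterations
print(np.allclose(pi.sum(axis=0), b), np.abs(pi.sum(axis=1) - a).max())
```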
This note addresses the question of optimally estimating a linear functional of an object acquired through linear observations corrupted by random noise, where optimality pertains to a worst-case setting tied to a symmetric, convex, and closed model set containing the object. It complements the article "Statistical Estimation and Optimal Recovery" published in the Annals of Statistics in 1994. There, Donoho showed (among other things) that, for Gaussian noise, linear maps provide near-optimal estimation schemes relative to a performance measure relevant in Statistical Estimation. Here, we advocate for a different performance measure that is arguably more relevant in Optimal Recovery. We show that, relative to this new measure, linear maps still provide near-optimal estimation schemes even if the noise is merely log-concave. Our arguments, which make a connection to the deterministic noise situation and bypass properties specific to the Gaussian case, offer an alternative to parts of Donoho's proof.
Moving average processes driven by exponential-tailed L\'evy noise are important extensions of their Gaussian counterparts in order to capture deviations from Gaussianity, more flexible dependence structures, and sample paths with jumps. Popular examples include non-Gaussian Ornstein--Uhlenbeck processes and type G Mat\'ern stochastic partial differential equation random fields. This paper is concerned with the open problem of determining their extremal dependence structure. We leverage the fact that such processes admit approximations on grids or triangulations that are used in practice for efficient simulations and inference. These approximations can be expressed as special cases of a class of linear transformations of independent, exponential-tailed random variables that bridges asymptotic dependence and independence in a novel, tractable way. This result is of independent interest since models that can capture both extremal dependence regimes are scarce and the construction of such flexible models is an active area of research. This new fundamental result allows us to show that the integral approximation of general moving average processes with exponential-tailed L\'evy noise is asymptotically independent when the mesh is fine enough. Under mild assumptions on the kernel function, we also derive the limiting residual tail dependence function. For the popular exponential-tailed Ornstein--Uhlenbeck process, we prove that it is asymptotically independent, but with a different residual tail dependence function than its Gaussian counterpart. Our results are illustrated through simulation studies.
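The following sketch illustrates, under assumed kernel, mesh, and noise choices, the kind of grid approximation referred to above: a discrete convolution of an exponential (Ornstein--Uhlenbeck-type) kernel with independent exponential-tailed (here Laplace) increments.

```python
import numpy as np

# Sketch of a grid approximation of a moving-average process
# X(t) = int g(t - s) dL(s) with an exponential (OU-type) kernel and
# exponential-tailed (here Laplace) noise increments on a fine grid.
# Kernel, mesh and noise scale are illustrative choices.
rng = np.random.default_rng(2)
h = 0.01                       # mesh width
t = np.arange(0.0, 20.0, h)
kappa = 1.0                    # kernel g(r) = exp(-kappa * r), r >= 0
g = np.exp(-kappa * t)

# iid Laplace increments scaled so their variance is h
dL = rng.laplace(scale=np.sqrt(h / 2.0), size=t.size)

# X on the grid as a discrete convolution of the kernel with the increments
X = np.convolve(dL, g)[: t.size]
print(X[:5])
```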
We investigate the approximation of functions $f$ on a bounded domain $\Omega\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $\Omega$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes. The present paper is concerned with the definition of these novel model classes on domains $\Omega$. The current definition of these model classes does not depend on the domain $\Omega$. A new and more appropriate definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
The stochastic partial differential equation (SPDE) approach is widely used for modeling large spatial datasets. It is based on representing a Gaussian random field $u$ on $\mathbb{R}^d$ as the solution of an elliptic SPDE $L^\beta u = \mathcal{W}$, where $L$ is a second-order differential operator, $2\beta \in \mathbb{N}$ is a parameter that controls the smoothness of $u$, and $\mathcal{W}$ is Gaussian white noise. A few approaches have been suggested in the literature to extend the approach to allow for any smoothness parameter satisfying $\beta>d/4$. Even though those approaches work well for simulating SPDEs with general smoothness, they are less suitable for Bayesian inference since they do not provide approximations which are Gaussian Markov random fields (GMRFs), as in the original SPDE approach. We address this issue by proposing a new method based on approximating the covariance operator $L^{-2\beta}$ of the Gaussian field $u$ by a finite element method combined with a rational approximation of the fractional power. This results in a numerically stable GMRF approximation which can be combined with the integrated nested Laplace approximation (INLA) method for fast Bayesian inference. A rigorous convergence analysis of the method is performed and the accuracy of the method is investigated with simulated data. Finally, we illustrate the approach and the corresponding implementation in the R package rSPDE via an application to precipitation data, which is analyzed by combining the rSPDE package with the R-INLA software for full Bayesian inference.
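To convey the main computational idea (not the rSPDE construction itself), the sketch below applies a fractional power $L^{-\beta}$, $0<\beta<1$, to a vector through a rational approximation built from resolvent solves, using the integral representation $L^{-\beta} = \frac{\sin(\pi\beta)}{\pi}\int_0^\infty t^{-\beta}(tI+L)^{-1}\,dt$ discretized by a simple quadrature; the operator, quadrature nodes, and weights are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative sketch: apply L^{-beta} (0 < beta < 1) to a vector f via a
# rational approximation made of resolvent solves, obtained by discretizing
#   L^{-beta} = (sin(pi*beta)/pi) * int_0^inf t^{-beta} (t I + L)^{-1} dt
# with a simple quadrature (substitution t = exp(y), equal-weight rule).
n = 200
h = 1.0 / (n + 1)
# 1D finite-difference Laplacian as a stand-in for the FEM operator L
main = 2.0 * np.ones(n) / h ** 2
off = -1.0 * np.ones(n - 1) / h ** 2
L = sp.diags([off, main, off], [-1, 0, 1], format="csc")

beta = 0.6
f = np.ones(n)

y = np.linspace(-15.0, 15.0, 300)          # assumed quadrature nodes
dy = y[1] - y[0]
u = np.zeros(n)
for yi in y:
    t = np.exp(yi)
    u += t ** (1.0 - beta) * spla.spsolve(L + t * sp.identity(n, format="csc"), f) * dy
u *= np.sin(np.pi * beta) / np.pi

# dense spectral reference for this small test problem
w_eig, V = np.linalg.eigh(L.toarray())
ref = (V * w_eig ** (-beta)) @ (V.T @ f)
print(np.max(np.abs(u - ref)))             # small for this illustrative quadrature
```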
Let $k \geq 1$. A graph $G$ is $\mathbf{W_k}$ if for any $k$ pairwise disjoint independent vertex subsets $A_1, \dots, A_k$ in $G$, there exist $k$ pairwise disjoint maximum independent sets $S_1, \dots, S_k$ in $G$ such that $A_i \subseteq S_i$ for $i \in [k]$. Recognizing $\mathbf{W_1}$ graphs is co-NP-hard, as shown by Chv\'atal and Hartnell (1993) and, independently, by Sankaranarayana and Stewart (1992). Extending this result and answering a recent question of Levit and Tankus, we show that recognizing $\mathbf{W_k}$ graphs is co-NP-hard for $k \geq 2$. On the positive side, we show that recognizing $\mathbf{W_k}$ graphs is, for each $k\geq 2$, FPT parameterized by clique-width and by tree-width. Finally, we construct graphs $G$ that are not $\mathbf{W_2}$ such that, for every vertex $v$ in $G$ and every maximal independent set $S$ in $G - N[v]$, the largest independent set in $N(v) \setminus S$ consists of a single vertex, thereby refuting a conjecture of Levit and Tankus.
The majority of fault-tolerant distributed algorithms are designed assuming a nominal corruption model, in which at most a fraction $f_n$ of parties can be corrupted by the adversary. However, due to the infamous Sybil attack, nominal models are not sufficient to express the trust assumptions in open (i.e., permissionless) settings. Instead, permissionless systems typically operate in a weighted model, where each participant is associated with a weight and the adversary can corrupt a set of parties holding at most a fraction $f_w$ of total weight. In this paper, we suggest a simple way to transform a large class of protocols designed for the nominal model into the weighted model. To this end, we formalize and solve three novel optimization problems, which we collectively call the weight reduction problems, that allow us to map large real weights into small integer weights while preserving the properties necessary for the correctness of the protocols. In all cases, we manage to keep the sum of the integer weights to be at most linear in the number of parties, resulting in extremely efficient protocols for the weighted model. Moreover, we demonstrate that, on weight distributions that emerge in practice, the sum of the integer weights tends to be far from the theoretical worst case and is often even smaller than the number of participants. While, for some protocols, our transformation requires an arbitrarily small reduction in resilience (i.e., $f_w = f_n - \epsilon$), surprisingly, for many important problems we manage to obtain weighted solutions with the same resilience ($f_w = f_n$) as nominal ones. Notable examples include asynchronous consensus, verifiable secret sharing, erasure-coded distributed storage and broadcast protocols.
The vertex cover problem is a fundamental and widely studied combinatorial optimization problem. It is known that its standard linear programming relaxation is integral for bipartite graphs and half-integral for general graphs. As a consequence, the natural rounding algorithm based on this relaxation computes an optimal solution for bipartite graphs and a $2$-approximation for general graphs. This raises the question of whether one can interpolate the rounding curve of the standard linear programming relaxation in a beyond-worst-case manner, depending on how close the graph is to being bipartite. In this paper, we consider a simple rounding algorithm that exploits the knowledge of an induced bipartite subgraph to attain improved approximation ratios. Equivalently, we suppose that we work with a pair $(G, S)$, consisting of a graph with an odd cycle transversal. If $S$ is a stable set, we prove a tight approximation ratio of $1 + 1/\rho$, where $2\rho -1$ denotes the odd girth (i.e., the length of the shortest odd cycle) of the contracted graph $\tilde{G} := G /S$ and satisfies $\rho \in [2,\infty]$. If $S$ is an arbitrary set, we prove a tight approximation ratio of $\left(1+1/\rho \right) (1 - \alpha) + 2 \alpha$, where $\alpha \in [0,1]$ is a natural parameter measuring the quality of the set $S$. The technique used to prove tight improved approximation ratios relies on a structural analysis of the contracted graph $\tilde{G}$. Tightness is shown by constructing classes of weight functions matching the obtained upper bounds. As a byproduct of the structural analysis, we obtain improved tight bounds on the integrality gap and the fractional chromatic number of 3-colorable graphs. We also discuss algorithmic applications in order to find good odd cycle transversals and show optimality of the analysis.
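For context, the standard rounding algorithm mentioned above (the $2$-approximation baseline, without the odd cycle transversal refinement proposed in the paper) can be sketched as follows; the graph and weights are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Standard LP-relaxation rounding for weighted vertex cover:
# solve min w^T x s.t. x_u + x_v >= 1 on every edge, 0 <= x <= 1,
# then take every vertex with x_v >= 1/2 (the 2-approximation baseline).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # a 5-cycle (odd girth 5)
w = np.ones(5)

A = np.zeros((len(edges), 5))
for i, (u, v) in enumerate(edges):
    A[i, u] = A[i, v] = -1.0            # -(x_u + x_v) <= -1
b = -np.ones(len(edges))

res = linprog(w, A_ub=A, b_ub=b, bounds=[(0, 1)] * 5, method="highs")
cover = [v for v in range(5) if res.x[v] >= 0.5 - 1e-9]
print(res.x, cover)                     # half-integral LP solution and rounded cover
```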
The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of the NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both of those features lies in the definition of objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.
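A minimal sketch of such a scalar-to-function network is given below, under assumed basis, architecture, and toy data (not the paper's setup): an MLP outputs basis coefficients, the loss compares the reconstructed curve with the observed response on its grid, and a second-difference roughness penalty controls smoothness.

```python
import numpy as np
import torch
import torch.nn as nn

# Sketch of a scalar-to-function network: an MLP maps scalar predictors to
# basis coefficients, the loss is computed on the reconstructed curve, and
# an optional roughness penalty controls smoothness of the prediction.
torch.manual_seed(0)
n, p, K, T = 256, 5, 7, 50                    # samples, predictors, basis size, grid
t = torch.linspace(0, 1, T)
basis = torch.stack([torch.ones(T)] +
                    [torch.sin(np.pi * k * t) for k in range(1, K)], dim=1)  # T x K

X = torch.randn(n, p)
Y = torch.sin(2 * np.pi * t)[None, :] * X[:, :1] + 0.1 * torch.randn(n, T)   # toy responses

net = nn.Sequential(nn.Linear(p, 64), nn.ReLU(), nn.Linear(64, K))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
lam = 1e-3                                     # roughness penalty weight

for epoch in range(200):
    coef = net(X)                              # n x K basis coefficients
    curves = coef @ basis.T                    # n x T reconstructed curves
    mse = ((curves - Y) ** 2).mean()
    rough = ((curves[:, 2:] - 2 * curves[:, 1:-1] + curves[:, :-2]) ** 2).mean()
    loss = mse + lam * rough
    opt.zero_grad(); loss.backward(); opt.step()

print(float(mse))
```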