Recently, Steinbach et al. introduced a novel operator $\mathcal{H}_T: L^2(0,T) \to L^2(0,T)$, known as the modified Hilbert transform. This operator has proved significant in space-time formulations of the heat and wave equations. In this paper, we establish a direct connection between the modified Hilbert transform $\mathcal{H}_T$ and the canonical Hilbert transform $\mathcal{H}$. Specifically, we prove the relationship $\mathcal{H}_T \varphi = -\mathcal{H} \tilde{\varphi}$, where $\varphi \in L^2(0,T)$ and $\tilde{\varphi}$ is a suitable extension of $\varphi$ to the whole real line $\mathbb{R}$. Leveraging this result, we derive several properties of $\mathcal{H}_T$, including a new inversion formula, as immediate consequences of well-established results for $\mathcal{H}$.
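For orientation, the canonical Hilbert transform $\mathcal{H}$ entering this identity is the principal-value singular integral (stated here in one common normalization; sign conventions vary),
$$ (\mathcal{H} f)(t) \;=\; \frac{1}{\pi}\,\mathrm{p.v.}\int_{\mathbb{R}} \frac{f(s)}{t-s}\,ds, \qquad f \in L^2(\mathbb{R}), $$
so the relationship $\mathcal{H}_T \varphi = -\mathcal{H} \tilde{\varphi}$ allows the mapping and inversion properties of this classical operator to be transferred to $\mathcal{H}_T$.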
A generalized unbalanced optimal transport distance ${\rm WB}_{\Lambda}$ on matrix-valued measures $\mathcal{M}(\Omega,\mathbb{S}_+^n)$ was defined in [arXiv:2011.05845] à la Benamou-Brenier, extending the Kantorovich-Bures and the Wasserstein-Fisher-Rao distances. In this work, we investigate the convergence properties of the discrete transport problems associated with ${\rm WB}_{\Lambda}$. We first present a convergence framework for abstract discretizations. Then, we propose a specific discretization scheme that fits this framework, under the assumption that the initial and final distributions are absolutely continuous with respect to the Lebesgue measure. Moreover, thanks to the static formulation, we show that this assumption can be removed for the Wasserstein-Fisher-Rao distance.
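For context, the scalar, balanced prototype of this dynamic formulation is the classical Benamou-Brenier characterization of the Wasserstein-2 distance,
$$ W_2^2(\rho_0,\rho_1) \;=\; \min_{(\rho_t,v_t)} \int_0^1\!\!\int_\Omega |v_t(x)|^2 \, d\rho_t(x)\, dt \quad \text{subject to} \quad \partial_t \rho_t + \nabla\!\cdot(\rho_t v_t) = 0, \quad \rho_{t=0}=\rho_0, \ \rho_{t=1}=\rho_1; $$
roughly speaking, ${\rm WB}_{\Lambda}$ arises from an analogous action minimization over matrix-valued measures with additional source terms, which is what makes the transport unbalanced.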
We give a quantum approximation scheme (i.e., a $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $\mathrm{cost}(V, C) \leq (1+\varepsilon) \cdot \mathrm{cost}(V, C_{\mathrm{OPT}})$. Here $C_{\mathrm{OPT}}$ denotes an optimal set of $k$ centers, $\mathrm{cost}(\cdot)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distances of points to their closest center), and $\eta$ is the aspect ratio (i.e., the ratio of the maximum to the minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable $(1+\varepsilon)$-approximation guarantee for the $k$-means problem. Moreover, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines, and its running time is independent of parameters (e.g., the condition number) that appear in such procedures.
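For reference, the classical quantity $\mathrm{cost}(V,C)$ that the quantum algorithm approximates can be computed by the following short classical sketch (illustrative only; it is not part of the quantum routine):

    import numpy as np

    def kmeans_cost(V, C):
        """Standard k-means cost: sum of squared distances of each point in V
        to its closest center in C."""
        V = np.asarray(V, dtype=float)   # shape (N, d)
        C = np.asarray(C, dtype=float)   # shape (k, d)
        d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (N, k)
        return d2.min(axis=1).sum()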
Many flexible families of positive random variables do not admit closed-form density and distribution functions, a feature often considered unappealing for modelling purposes. However, such families are frequently characterized by a simple expression for the corresponding Laplace transform. Relying on the Laplace transform, we propose to carry out parameter estimation and goodness-of-fit testing for a general class of non-standard laws. We suggest a novel data-driven inferential technique, providing parameter estimators and goodness-of-fit tests whose large-sample properties are derived. The implementation of the method is considered in detail for the positive stable and Tweedie distributions. A Monte Carlo study shows good finite-sample performance of the proposed technique for these laws.
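To illustrate the general idea (not the paper's exact statistic), a minimal sketch of a Laplace-transform-based goodness-of-fit statistic could compare the empirical Laplace transform with the model one in a weighted $L^2$ sense; the weight function, grid, and names below are illustrative assumptions:

    import numpy as np

    def empirical_laplace(x, t):
        """Empirical Laplace transform L_n(t) = (1/n) * sum_i exp(-t * x_i)."""
        x = np.asarray(x, dtype=float)
        return np.exp(-np.outer(t, x)).mean(axis=1)

    def laplace_gof_statistic(x, model_laplace, t_max=10.0, n_grid=400):
        """Weighted-L2 distance between empirical and model Laplace transforms,
        approximated by a Riemann sum on (0, t_max] with weight exp(-t).
        Purely illustrative; the paper's statistic and weight may differ."""
        t = np.linspace(1e-3, t_max, n_grid)
        diff2 = (empirical_laplace(x, t) - model_laplace(t)) ** 2
        return len(x) * np.sum(np.exp(-t) * diff2) * (t[1] - t[0])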
In this article, we combine Sweedler's classic theory of measuring coalgebras -- by which $k$-algebras are enriched in $k$-coalgebras for $k$ a field -- with the theory of W-types -- by which the categorical semantics of inductive data types in functional programming languages are understood. In our main theorem, we show that, under some hypotheses, algebras of an endofunctor are enriched in coalgebras of the same endofunctor, and we find that polynomial endofunctors provide many interesting examples of this phenomenon. We then generalize the notion of the initial algebra of an endofunctor using this enrichment, thus generalizing the notion of W-type. This article is an extended version of arXiv:2303.16793; it adds expository introductions to the original theories of measuring coalgebras and W-types, along with some improvements to the main theory and many explicitly worked examples.
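As a reminder of the classical notion being generalized: to a map of sets $f \colon B \to A$ one associates the polynomial endofunctor and its W-type
$$ P_f(X) \;=\; \coprod_{a \in A} X^{f^{-1}(a)}, \qquad W_f \;=\; \text{the initial } P_f\text{-algebra}, $$
so $W_f$ consists of the well-founded trees whose nodes are labelled by elements $a \in A$ and branch over the fibre $f^{-1}(a)$; the enrichment of algebras in coalgebras described above is what permits the notion of initial algebra, and hence of W-type, to be generalized.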
We prove lower bounds for the randomized approximation of the embedding $\ell_1^m \rightarrow \ell_\infty^m$ based on algorithms that use arbitrary linear (hence non-adaptive) information provided by a (randomized) measurement matrix $N \in \mathbb{R}^{n \times m}$. These lower bounds reflect the increasing difficulty of the problem as $m \to \infty$, namely, a term $\sqrt{\log m}$ in the complexity $n$. This result implies that non-compact operators between arbitrary Banach spaces are not approximable using non-adaptive Monte Carlo methods. We also compare these lower bounds for non-adaptive methods with upper bounds based on adaptive, randomized recovery methods, for which the complexity $n$ exhibits only a $(\log\log m)$-dependence. In doing so, we give an example of a linear problem for which the error of adaptive versus non-adaptive Monte Carlo methods exhibits a gap of order $n^{1/2} ( \log n)^{-1/2}$.
In the Euclidean $k$-means problem, we are given as input a set $P$ of $n$ points in $\mathbb{R}^d$, and the goal is to find a set of $k$ points $C\subseteq \mathbb{R}^d$ so as to minimize the sum of the squared Euclidean distances from each point in $P$ to its closest center in $C$. In this paper, we formally explore connections between the $k$-coloring problem on graphs and the Euclidean $k$-means problem. Our results are as follows: $\bullet$ For all $k\ge 3$, we provide a simple reduction from the $k$-coloring problem on regular graphs to the Euclidean $k$-means problem. Moreover, our technique extends to a reduction from a structured max-cut problem (which may be viewed as a partial 2-coloring problem) to the Euclidean $2$-means problem. This yields a simple, alternative proof of the NP-hardness of the Euclidean 2-means problem. $\bullet$ In the other direction, we mimic the $O(1.7297^n)$ time algorithm of Williams [TCS'05] for the max-cut problem on $n$ vertices to obtain an algorithm for the Euclidean 2-means problem with the same runtime, improving on the naive exhaustive search running in $2^n\cdot \text{poly}(n,d)$ time. $\bullet$ We prove similar results and connections for the Euclidean $k$-min-sum problem.
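For concreteness, the $2^n\cdot \text{poly}(n,d)$ exhaustive-search baseline mentioned in the second bullet can be sketched as follows (a plain reference implementation, not the improved algorithm): try every bipartition of $P$, place each center at the centroid of its part, and keep the cheapest solution.

    import numpy as np
    from itertools import product

    def naive_two_means(P):
        """Exhaustive-search baseline for Euclidean 2-means: enumerate all
        bipartitions, use cluster centroids as centers, keep the best.
        Runs in 2^n * poly(n, d) time."""
        P = np.asarray(P, dtype=float)
        n = len(P)
        best_cost, best_centers = np.inf, None
        for labels in product((0, 1), repeat=n):
            labels = np.array(labels)
            if labels.min() == labels.max():   # skip bipartitions with an empty part
                continue
            centers = np.stack([P[labels == j].mean(axis=0) for j in (0, 1)])
            cost = ((P - centers[labels]) ** 2).sum()
            if cost < best_cost:
                best_cost, best_centers = cost, centers
        return best_cost, best_centers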
Given an unconditional diffusion model $\pi(x, y)$, using it to perform conditional simulation from $\pi(x \mid y)$ remains largely an open problem and is typically achieved by learning conditional drifts for the denoising SDE after the fact. In this work, we express conditional simulation as an inference problem on an augmented space corresponding to a partial SDE bridge. This perspective allows us to implement efficient and principled particle Gibbs and pseudo-marginal samplers that marginally target the conditional distribution $\pi(x \mid y)$. Contrary to existing methodology, our methods do not introduce any additional approximation to the unconditional diffusion model aside from the Monte Carlo error. We showcase the benefits and drawbacks of our approach on a series of synthetic and real-data examples.
We provide a framework to analyze the convergence of discretized kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-convex potentials. Our approach gives convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration schemes which are popular in the molecular dynamics and machine learning communities. Further, we introduce the property ``$\gamma$-limit convergent'' (GLC) to characterize underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and whose stepsize restrictions are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement. Finally, we provide asymptotic bias estimates for the BAOAB scheme, obtained by comparison with a modified stochastic dynamics that preserves the invariant measure, which remain accurate in the high-friction limit.
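For concreteness, one step of the BAOAB scheme analyzed above, in its standard unit-mass, unit-temperature form (the paper's parameterization of the friction $\gamma$ and stepsize $h$ may differ), reads:

    import numpy as np

    def baoab_step(q, p, grad_U, h, gamma, rng):
        """One BAOAB step for kinetic Langevin dynamics
           dq = p dt,  dp = -grad U(q) dt - gamma p dt + sqrt(2 gamma) dW,
        with unit mass and unit temperature."""
        p = p - 0.5 * h * grad_U(q)                      # B: half kick
        q = q + 0.5 * h * p                              # A: half drift
        c = np.exp(-gamma * h)                           # O: exact OU update of p
        p = c * p + np.sqrt(1.0 - c**2) * rng.standard_normal(np.shape(p))
        q = q + 0.5 * h * p                              # A: half drift
        p = p - 0.5 * h * grad_U(q)                      # B: half kick
        return q, p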
We consider a wide class of generalized Radon transforms $\mathcal R$, which act in $\mathbb{R}^n$ for any $n\ge 2$ and integrate over submanifolds of any codimension $N$, $1\le N\le n-1$. Also, we allow for a fairly general reconstruction operator $\mathcal A$. The main requirement is that $\mathcal A$ be a Fourier integral operator with a phase function, which is linear in the phase variable. We consider the task of image reconstruction from discrete data $g_{j,k} = (\mathcal R f)_{j,k} + \eta_{j,k}$. We show that the reconstruction error $N_\epsilon^{\text{rec}}=\mathcal A \eta_{j,k}$ satisfies $N^{\text{rec}}(\check x;x_0)=\lim_{\epsilon\to0}N_\epsilon^{\text{rec}}(x_0+\epsilon\check x)$, $\check x\in D$. Here $x_0$ is a fixed point, $D\subset\mathbb{R}^n$ is a bounded domain, and $\eta_{j,k}$ are independent, but not necessarily identically distributed, random variables. $N^{\text{rec}}$ and $N_\epsilon^{\text{rec}}$ are viewed as continuous random functions of the argument $\check x$ (random fields), and the limit is understood in the sense of probability distributions. Under some conditions on the first three moments of $\eta_{j,k}$ (and some other not very restrictive conditions on $x_0$ and $\mathcal A$), we prove that $N^{\text{rec}}$ is a zero mean Gaussian random field and explicitly compute its covariance. We also present a numerical experiment with a cone beam transform in $\mathbb{R}^3$, which shows an excellent match between theoretical predictions and simulated reconstructions.
Optimal transport and the Wasserstein distance $\mathcal{W}_p$ have recently seen a number of applications in statistics, machine learning, data science, and the physical sciences. These applications are, however, severely restricted by the curse of dimensionality: the number of data points needed to estimate such quantities accurately grows exponentially in the dimension. To alleviate this problem, a number of variants of $\mathcal{W}_p$ have been introduced. We focus here on one of these variants, namely the max-sliced Wasserstein metric $\overline{\mathcal{W}}_p$. This metric reduces the high-dimensional minimization problem given by $\mathcal{W}_p$ to a maximum over one-dimensional projections, in an effort to overcome the curse of dimensionality. In this note we derive concentration results and upper bounds on the expectation of $\overline{\mathcal{W}}_p$ between the true and the empirical measure on unbounded reproducing kernel Hilbert spaces. We show that, under quite generic assumptions, probability measures concentrate uniformly fast in one-dimensional subspaces, at (nearly) parametric rates. Our results rely on an improvement of currently known bounds for $\overline{\mathcal{W}}_p$ in the finite-dimensional case.
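For reference, in $\mathbb{R}^d$ the max-sliced Wasserstein metric is obtained by maximizing the one-dimensional Wasserstein distance between projected measures over all directions,
$$ \overline{\mathcal{W}}_p(\mu,\nu) \;=\; \sup_{\theta \in \mathbb{S}^{d-1}} \mathcal{W}_p\big((\pi_\theta)_{\#}\mu,\,(\pi_\theta)_{\#}\nu\big), \qquad \pi_\theta(x) = \langle x, \theta\rangle, $$
with the analogous supremum over unit-norm directions in the Hilbert-space setting considered here.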