We prove that a variant of the classical Sobolev space of first-order dominating mixed smoothness is equivalent (under a certain condition) to the unanchored ANOVA space on $\mathbb{R}^d$, for $d \geq 1$. Both spaces are Hilbert spaces involving weight functions, which determine the behaviour as different variables tend to $\pm \infty$, and weight parameters, which represent the influence of different subsets of variables. The unanchored ANOVA space on $\mathbb{R}^d$ was initially introduced by Nichols & Kuo in 2014 to analyse the error of quasi-Monte Carlo (QMC) approximations for integrals on unbounded domains; whereas the classical Sobolev space of dominating mixed smoothness was used as the setting in a series of papers by Griebel, Kuo & Sloan on the smoothing effect of integration, in an effort to develop a rigorous theory of why QMC methods work so well for certain non-smooth integrands with kinks or jumps coming from option pricing problems. In this same setting, Griewank, Kuo, Le\"ovey & Sloan in 2018 subsequently extended these ideas by developing a practical smoothing-by-preintegration technique to approximate integrals of such functions with kinks or jumps. We first prove the equivalence in one dimension (itself a non-trivial task), before following a similar, but more complicated, strategy to prove the equivalence for general dimensions. As a consequence of this equivalence, we analyse the application of QMC combined with a preintegration step to approximate the fair price of an Asian option, and prove that the error of such an approximation using $N$ points converges at a rate close to $1/N$.
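For intuition about the QMC component, the following is a minimal sketch (under our own assumptions: a placeholder generating vector rather than a CBC-optimized one, and no preintegration step) of a randomly shifted rank-1 lattice rule for an expectation over $\mathbb{R}^d$, mapped from the unit cube by the inverse normal CDF:

```python
# Minimal sketch of randomized rank-1 lattice QMC for E[f(Z)], Z ~ N(0, I_d).
# Illustrative only: the generating vector `z` is a placeholder, and the
# preintegration step of the paper is not performed.
import numpy as np
from scipy.stats import norm

def qmc_normal_expectation(f, d, N=2**10, n_shifts=8, seed=0):
    rng = np.random.default_rng(seed)
    z = np.array([1, 182667, 469891, 498753, 110745][:d])  # placeholder vector
    pts = (np.outer(np.arange(N), z) / N) % 1.0            # rank-1 lattice in [0,1)^d
    ests = []
    for _ in range(n_shifts):                              # random shifts -> unbiased
        shifted = (pts + rng.random(d)) % 1.0
        x = norm.ppf(shifted)                              # map to R^d via inverse CDF
        ests.append(f(x).mean())
    return np.mean(ests), np.std(ests) / np.sqrt(n_shifts)

# Example: a kink integrand max(x1 + x2, 0), typical of option-pricing payoffs.
mean, err = qmc_normal_expectation(lambda x: np.maximum(x[:, 0] + x[:, 1], 0.0), d=2)
```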
Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(1/T)$ convergence rate, and that it achieves linear convergence under a Polyak-\L{}ojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
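To make the framework concrete, here is a deterministic sketch on a toy quadratic instance of our own devising (not the paper's code): the inner variable $z$, the linear-system variable $v$, and the outer variable $x$ all move simultaneously, and each direction is a plain sum of smooth terms, so SABA would replace it by a SAGA-style unbiased estimate:

```python
# Single-loop coupled dynamics for a toy quadratic bilevel problem:
#   inner: g(z, x) = 0.5 z'Az - z'Bx      (so z*(x) = A^{-1} B x)
#   outer: f(z, x) = 0.5 ||z - z_target||^2 + 0.5 lam ||x||^2
import numpy as np

rng = np.random.default_rng(0)
dz, dx = 5, 3
A = np.eye(dz) + 0.1 * rng.standard_normal((dz, dz)); A = A @ A.T  # SPD inner Hessian
B = rng.standard_normal((dz, dx))
z_target = rng.standard_normal(dz)
lam = 0.1

z, v, x = np.zeros(dz), np.zeros(dz), np.zeros(dx)
rho, eta = 0.1, 0.1
for t in range(2000):
    Dz = A @ z - B @ x                 # gradient of the inner objective in z
    Dv = A @ v - (z - z_target)        # residual of the linear system A v = grad_z f
    Dx = lam * x + B.T @ v             # hypergradient: grad_x f - grad_xz g @ v
    z, v, x = z - rho * Dz, v - rho * Dv, x - eta * Dx
```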
We analyse a class of time discretizations for solving the Gross-Pitaevskii equation at low regularity on an arbitrary Lipschitz domain $\Omega \subset \mathbb{R}^d$, $d \le 3$, with a non-smooth potential. We show that these schemes, together with their optimal local error structure, allow for convergence under lower regularity assumptions on both the solution and the potential than are required by classical methods, such as splitting or exponential integrator methods. Moreover, in the case of periodic boundary conditions, we show convergence in any fractional Sobolev space $H^{r}$, $r \ge 0$, beyond the more typical $L^2$-error analysis.
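For reference (a standard form; sign conventions and the precise assumptions on the potential vary), the Gross-Pitaevskii equation in question reads
$$ i\partial_t u = -\Delta u + V u + |u|^2 u, \qquad u(0,\cdot) = u_0, \quad x \in \Omega, $$
with a potential $V$ that here is allowed to be non-smooth.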
Past work shows that one can associate a notion of Shannon entropy to a Dirichlet polynomial, regarded as an empirical distribution. Indeed, entropy can be extracted from any $d\in\mathsf{Dir}$ by a two-step process, where the first step is a rig homomorphism out of $\mathsf{Dir}$, the \emph{set} of Dirichlet polynomials, with rig structure given by standard addition and multiplication. In this short note, we show that this rig homomorphism can be upgraded to a rig \emph{functor}, when we replace the set of Dirichlet polynomials by the \emph{category} of ordinary (Cartesian) polynomials. In the Cartesian case, the process has three steps. The first step is a rig functor $\mathbf{Poly}^{\mathbf{Cart}}\to\mathbf{Poly}$ sending a polynomial $p$ to $\dot{p}\mathcal{y}$, where $\dot{p}$ is the derivative of $p$. The second is a rig functor $\mathbf{Poly}\to\mathbf{Set}\times\mathbf{Set}^{\text{op}}$, sending a polynomial $q$ to the pair $(q(1),\Gamma(q))$, where $\Gamma(q)=\mathbf{Poly}(q,\mathcal{y})$ can be interpreted as the global sections of $q$ viewed as a bundle, and $q(1)$ as its base. To make this precise we define what appears to be a new distributive monoidal structure on $\mathbf{Set}\times\mathbf{Set}^{\text{op}}$, which can be understood geometrically in terms of rectangles. The last step, as for Dirichlet polynomials, is simply to extract the entropy as a real number from a pair of sets $(A,B)$; it is given by $\log A-\log \sqrt[A]{B}$ and can be thought of as the log aspect ratio of the rectangle.
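As a sanity check on the final step (under the illustrative identification of $A$ and $B$ with the cardinalities arising from an empirical distribution with integer counts $a_1,\dots,a_k$, namely $A=\sum_i a_i$ and $B=\prod_i a_i^{a_i}$), the quantity $\log A - \log\sqrt[A]{B}$ coincides with the Shannon entropy of $p_i = a_i/A$:

```python
# Sanity check (illustrative identification, not the paper's formal construction):
# with A = sum(a) and B = prod(a_i ** a_i), log A - (1/A) log B is Shannon entropy.
import math

a = [1, 2, 3, 4]                            # counts of an empirical distribution
A = sum(a)
logB = sum(ai * math.log(ai) for ai in a)   # log of prod(a_i ** a_i)
aspect = math.log(A) - logB / A             # log A - log(B ** (1/A))
shannon = -sum((ai / A) * math.log(ai / A) for ai in a)
assert abs(aspect - shannon) < 1e-12
```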
Matrix sparsification is a well-known approach in the design of efficient algorithms, where one approximates a matrix $A$ with a sparse matrix $A'$. Achlioptas and McSherry [2007] initiated a long line of work on spectral-norm sparsification, which aims to guarantee that $\|A'-A\|\leq \epsilon \|A\|$ for error parameter $\epsilon>0$. Various forms of matrix approximation motivate considering this problem with a guarantee according to the Schatten $p$-norm for general $p$, which includes the spectral norm as the special case $p=\infty$. We investigate the relation between fixed but different $p\neq q$, that is, whether sparsification in the Schatten $p$-norm implies (existentially and/or algorithmically) sparsification in the Schatten $q$-norm with similar sparsity. An affirmative answer could be tremendously useful, as it would identify which value of $p$ to focus on. Our main finding is a surprising contrast between this question and the analogous case of $\ell_p$-norm sparsification for vectors: for vectors, the answer is affirmative for $p<q$ and negative for $p>q$, but for matrices we answer negatively for almost all $p\neq q$.
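For background, the basic sparsification primitive can be sketched as entrywise sampling in the spirit of Achlioptas and McSherry; this illustrates the object being studied, not the results of the present paper:

```python
# Entrywise sampling sketch in the spirit of Achlioptas-McSherry (2007):
# keep A_ij with probability p_ij and rescale by 1/p_ij, so E[A'] = A.
import numpy as np

def sparsify(A, s, seed=0):
    """Expected number of kept entries ~ s; probabilities proportional to A_ij^2."""
    rng = np.random.default_rng(seed)
    p = np.minimum(1.0, s * A**2 / np.sum(A**2))
    keep = rng.random(A.shape) < p
    out = np.zeros_like(A)
    out[keep] = A[keep] / p[keep]          # unbiased rescaling of kept entries
    return out
```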
The link between Gaussian random fields and Markov random fields is well established based on a stochastic partial differential equation in Euclidean spaces, where the Mat\'ern covariance functions are essential. However, the Mat\'ern covariance functions are not always positive definite on circles and spheres. In this manuscript, we focus on the extension of this link to circles, and show that the link between Gaussian random fields and Markov random fields on circles is valid based on the circular Mat\'ern covariance function instead. First, we show that this circular Mat\'ern function is the covariance of the stationary solution to the stochastic differential equation on the circle with a formally defined white noise space measure. Then, for the corresponding conditional autoregressive model, we derive a closed-form formula for its covariance function. Together with a closed-form formula for the circular Mat\'ern covariance function, the link between these two random fields can be established explicitly. Additionally, since it is known that the estimator of the mean is not consistent on circles, we provide an equivalent Gaussian measure explanation for this non-ergodicity issue.
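For orientation, the Euclidean version of this link (Lindgren, Rue & Lindstr\"om) identifies the stationary solution of
$$ (\kappa^2 - \Delta)^{\alpha/2}\, X(s) = \mathcal{W}(s), \qquad s \in \mathbb{R}^d, $$
with a Gaussian field having Mat\'ern covariance of smoothness $\nu = \alpha - d/2$; the present work develops the analogous statement on the circle, with the circular Mat\'ern covariance in place of the Euclidean one.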
We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror-ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant step and batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.
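The boundedness contrast can be seen directly from the score functions (a standalone numerical illustration, not the authors' algorithm): for a Gaussian policy the score in the mean parameter grows linearly in the action, while for a heavy-tailed Cauchy policy it is bounded by $1/\gamma$:

```python
# Score functions d/d(mu) log pi(a): unbounded for Gaussian, bounded for Cauchy.
import numpy as np

def gaussian_score(a, mu, sigma):
    return (a - mu) / sigma**2                       # grows linearly in |a - mu|

def cauchy_score(a, mu, gamma):
    return 2 * (a - mu) / (gamma**2 + (a - mu)**2)   # bounded by 1/gamma

a = np.linspace(-100, 100, 5)
print(gaussian_score(a, 0.0, 1.0))    # blows up with |a|
print(cauchy_score(a, 0.0, 1.0))      # stays in [-1, 1]
```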
A kernel method for estimating a probability density function (pdf) from an i.i.d. sample drawn from that density is presented. Our estimator is a linear combination of kernel functions, the coefficients of which are determined by a linear equation. An error analysis for the mean integrated squared error is established in a general reproducing kernel Hilbert space setting. The theory developed is then applied to estimate pdfs belonging to weighted Korobov spaces, for which a dimension-independent convergence rate is established. Under a suitable smoothness assumption, our method attains a rate arbitrarily close to the optimal rate. Numerical results support our theory.
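One plausible shape for such an estimator (a sketch under our own assumptions; the paper's actual linear system may differ) expands the estimate in kernel translates at $m$ fixed centres and determines the coefficients from a regularized Gram system that matches the standard Parzen estimate at those centres:

```python
# Sketch (our own illustrative variant, not necessarily the paper's system):
# coefficients c solve (G + lam I) c = b, where b is the empirical kernel mean
# (i.e. the Parzen estimate) evaluated at the centres.
import numpy as np

def rkhs_density(sample, centres, bandwidth=0.3, lam=1e-6):
    k = lambda u, v: np.exp(-0.5 * ((u[:, None] - v[None, :]) / bandwidth) ** 2) \
        / (bandwidth * np.sqrt(2 * np.pi))
    G = k(centres, centres)                      # Gram matrix of the centres
    b = k(centres, sample).mean(axis=1)          # Parzen estimate at the centres
    c = np.linalg.solve(G + lam * np.eye(len(centres)), b)  # linear system
    return lambda x: k(x, centres) @ c           # pdf estimate (may need renormalizing)

rng = np.random.default_rng(1)
pdf_hat = rkhs_density(rng.standard_normal(500), np.linspace(-4, 4, 25))
```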
Overdetermined systems of first-kind integral equations appear in many applications. When the right-hand side is discretized, the resulting finite-data problem is ill-posed and admits infinitely many solutions. We propose a numerical method to compute the minimal-norm solution in the presence of boundary constraints. The algorithm stems from the Riesz representation theorem and operates in a reproducing kernel Hilbert space. Since the resulting linear system is strongly ill-conditioned, we construct a regularization method depending on a discrete parameter. It is based on the expansion of the minimal-norm solution in terms of the singular functions of the integral operator defining the problem. Two estimation techniques are tested for the automatic determination of the regularization parameter, namely the discrepancy principle and the L-curve method. Numerical results concerning two artificial test problems demonstrate the excellent performance of the proposed method. Finally, a particular model typical of geophysical applications, which reproduces the readings of a frequency-domain electromagnetic induction device, is investigated. The results show that the new method is extremely effective when the sought solution is smooth, but gives significant information on the solution even for non-smooth solutions.
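The regularization idea, indexed by a discrete truncation parameter and paired with the discrepancy principle, can be sketched as follows (a generic truncated-SVD illustration, not the paper's RKHS-specific construction):

```python
# Truncated SVD of an ill-conditioned system, with the truncation index k
# chosen by the discrepancy principle: the smallest k whose residual falls
# below the noise level.  Illustrative only.
import numpy as np

def tsvd_discrepancy(A, b, noise_level):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    for k in range(1, len(s) + 1):
        x = Vt[:k].T @ (beta[:k] / s[:k])          # minimal-norm TSVD solution
        if np.linalg.norm(A @ x - b) <= noise_level:
            return x, k
    return x, len(s)
```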
In recent work (Maierhofer & Huybrechs, 2022, Adv. Comput. Math.), the authors showed that least-squares oversampling can improve the convergence properties of collocation methods for boundary integral equations involving operators of certain pseudo-differential form. The underlying principle is that the discrete method approximates a Bubnov-Galerkin method in a suitable sense. In the present work, we extend this analysis to the case when the integral operator is perturbed by a compact operator $\mathcal{K}$ which is continuous as a map on Sobolev spaces on the boundary, $\mathcal{K}:H^{p}\rightarrow H^{q}$ for all $p,q\in\mathbb{R}$. This study is complicated by the fact that both the test and trial functions in the discrete Bubnov-Galerkin orthogonality conditions are modified over the unperturbed setting. Our analysis guarantees that previous results concerning optimal convergence rates and sufficient rates of oversampling are preserved in the more general case. Indeed, for the first time, this analysis provides a complete explanation of the advantages of least-squares oversampled collocation for boundary integral formulations of the Laplace equation on arbitrary smooth Jordan curves in 2D. Our theoretical results are shown to be in very good agreement with numerical experiments.
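Schematically, oversampled collocation replaces the usual square collocation system by a rectangular one solved in the least-squares sense; the sketch below shows the basic mechanics for $M = 2N$, with a generic smooth kernel standing in for the actual perturbed boundary integral operator (an assumption on our part):

```python
# Oversampled collocation: N trigonometric trial functions, M > N collocation
# points on the periodic boundary, solved in the least-squares sense.
import numpy as np

N, M = 16, 32                                    # oversampling ratio M/N = 2
t = np.linspace(0, 2 * np.pi, M, endpoint=False)          # collocation points
trial = lambda j, x: np.cos(j * x) if j >= 0 else np.sin(-j * x)
js = range(-(N // 2), N - N // 2)

K = lambda x, y: np.cos(x - y)                   # smooth compact-perturbation kernel
cols = []
for j in js:
    # column: (I + K) applied to trial function j, evaluated at collocation points
    op_phi = trial(j, t) + (2 * np.pi / M) * np.array(
        [np.sum(K(ti, t) * trial(j, t)) for ti in t])      # trapezoidal quadrature
    cols.append(op_phi)
A = np.column_stack(cols)                        # M x N rectangular system
f = np.exp(np.cos(t))                            # right-hand side data
coef, *_ = np.linalg.lstsq(A, f, rcond=None)     # least-squares oversampled solve
```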
Structural identifiability is a property of a differential model with parameters that allows the parameters to be determined from the model equations in the absence of noise. The method of input-output equations is one method for verifying structural identifiability. This method stands out in its importance because the additional insights it provides can be used to analyze and improve models. However, its complete theoretical grounds and applicability are still to be established. A subtle point, and the key to this method working correctly, is knowing whether the coefficients of these equations are themselves identifiable. In this paper, to address this, we prove identifiability of the coefficients of input-output equations for types of differential models that often appear in practice, such as linear models with one output and linear compartment models in which, from each compartment, one can reach either a leak or an input. This shows that checking identifiability via input-output equations for these models is legitimate and, as we prove, that the field of identifiable functions is generated by the coefficients of the input-output equations. Finally, we exploit a connection between input-output equations and the transfer function matrix to show that, for a linear compartment model with an input and a strongly connected graph, the field of all identifiable functions is generated by the coefficients of the transfer function matrix even if the initial conditions are generic.
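As a small illustration of the last point (a hypothetical two-compartment example of our own, not one of the paper's models), one can compute the transfer function $C(sI-A)^{-1}B$ symbolically and read off its coefficients as the candidate generators of the field of identifiable functions:

```python
# Two-compartment model with input and output in compartment 1 and a leak a01:
# the coefficients of the transfer function are the candidate identifiable
# functions of the parameters.
import sympy as sp

a01, a21, a12, s = sp.symbols('a01 a21 a12 s', positive=True)
A = sp.Matrix([[-(a01 + a21), a12],
               [a21,         -a12]])             # leak a01 from compartment 1
B = sp.Matrix([1, 0])                            # input into compartment 1
C = sp.Matrix([[1, 0]])                          # observe compartment 1
H = sp.simplify((C * (s * sp.eye(2) - A).inv() * B)[0, 0])
num, den = sp.fraction(sp.cancel(H))
print(sp.Poly(num, s).all_coeffs())   # [1, a12]
print(sp.Poly(den, s).all_coeffs())   # [1, a01 + a21 + a12, a01*a12]
```

Here the coefficients $a_{12}$, $a_{01}+a_{21}+a_{12}$, and $a_{01}a_{12}$ generate the field in question, so $a_{12}$, $a_{01}$, and $a_{21}$ are each identifiable for this toy model.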