This paper deals with a Skorokhod-integral-based least-squares-type estimator $\widehat\theta_N$ of the drift parameter $\theta_0$, computed from $N\in\mathbb N^*$ (possibly dependent) copies $X^1,\dots,X^N$ of the solution $X$ of $dX_t =\theta_0b(X_t)dt +\sigma dB_t$, where $B$ is a fractional Brownian motion of Hurst index $H\in (1/3,1)$. On the one hand, convergence results are established for $\widehat\theta_N$ when $H = 1/2$. On the other hand, when $H\neq 1/2$, Skorokhod-integral-based estimators such as $\widehat\theta_N$ cannot be computed from data; in that case, convergence results are established for a computable approximation of $\widehat\theta_N$.
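For intuition in the Brownian case $H = 1/2$, where the Skorokhod integral coincides with the Itô integral, a least-squares-type drift estimator can be illustrated on simulated paths. The Python sketch below uses an illustrative drift $b(x) = -x$ and an Euler--Maruyama discretization; it is a toy discretized analogue of such an estimator under these assumptions, not the paper's exact construction.

```python
import numpy as np

# Illustrative sketch (assumptions: b(x) = -x, H = 1/2, so B is standard
# Brownian motion and the Skorokhod integral reduces to the Ito integral).
rng = np.random.default_rng(0)
theta0, sigma = 1.5, 0.5          # true drift parameter and noise level
N, n, dt = 200, 1000, 1e-3        # number of copies, steps per path, step size

b = lambda x: -x
X = np.zeros((N, n + 1))
for t in range(n):                # Euler-Maruyama for dX = theta0*b(X)dt + sigma*dB
    dB = rng.normal(0.0, np.sqrt(dt), size=N)
    X[:, t + 1] = X[:, t] + theta0 * b(X[:, t]) * dt + sigma * dB

# Least-squares-type estimator: ratio of discretized integrals
num = np.sum(b(X[:, :-1]) * np.diff(X, axis=1))   # sum_i int b(X) dX
den = np.sum(b(X[:, :-1]) ** 2) * dt              # sum_i int b(X)^2 dt
theta_hat = num / den
print(f"theta_hat = {theta_hat:.3f} (true value {theta0})")
```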
When translating a term calculus into a graphical formalism, many inessential details are abstracted away. In the case of the $\lambda$-calculus translated to proof-nets, these inessential details are captured by a notion of equivalence on $\lambda$-terms known as $\simeq_\sigma$-equivalence, in both the intuitionistic (due to Regnier) and classical (due to Laurent) cases. The purpose of this paper is to uncover a strong bisimulation behind $\simeq_\sigma$-equivalence, as formulated by Laurent for Parigot's $\lambda\mu$-calculus. This is achieved by introducing a relation $\simeq$, defined over a revised presentation of the $\lambda\mu$-calculus that we dub $\Lambda M$. More precisely, we first identify the reasons why Laurent's $\simeq_\sigma$-equivalence on $\lambda\mu$-terms fails to be a strong bisimulation. Inspired by Laurent's \emph{Polarized Proof-Nets}, this leads us to distinguish multiplicative and exponential reduction steps on terms. Second, we enrich the syntax of $\lambda\mu$ so as to track the exponential operations. These technical ingredients pave the way towards a strong bisimulation for the classical case. We show that $\simeq$ is a strong bisimulation with respect to reduction in $\Lambda M$, i.e., two $\simeq$-equivalent terms have the exact same reduction semantics; this result fails for Regnier's $\simeq_\sigma$-equivalence in the $\lambda$-calculus as well as for Laurent's $\simeq_\sigma$-equivalence in $\lambda\mu$. Although $\simeq$ is formulated over an enriched syntax, and hence is not strictly included in Laurent's $\simeq_\sigma$, we show how it can be seen as a restriction of it.
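For background, Regnier's $\simeq_\sigma$ on pure $\lambda$-terms is generated by redex permutations such as the following two, stated here from the standard literature with the usual freshness side conditions (this is contextual background, not the paper's $\Lambda M$ relation):

```latex
% Generating permutations of Regnier's sigma-equivalence on lambda-terms;
% both sides translate to the same proof-net.
\begin{align*}
(\lambda x.\lambda y.M)\,N &\simeq_\sigma \lambda y.\,(\lambda x.M)\,N
  && y \notin \mathrm{fv}(N)\\
((\lambda x.M)\,N)\,P &\simeq_\sigma (\lambda x.\,M\,P)\,N
  && x \notin \mathrm{fv}(P)
\end{align*}
```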
For positive integers $d$ and $p$ such that $d \ge p$, we obtain complete asymptotic expansions, for large $d$, of the normalizing constants for the matrix Bingham and matrix Langevin distributions on Stiefel manifolds. The accuracy of each truncated expansion is strictly increasing in $d$; also, for sufficiently large $d$, the accuracy is strictly increasing in $m$, the number of terms in the truncated expansion. We apply these results to obtain the rate of convergence of these asymptotic expansions as both $d, p \to \infty$. Using values of $d$ and $p$ arising in various data sets, we illustrate the rate of convergence of the truncated approximations as $d$ or $m$ increases. These results extend our recent work on asymptotic expansions for the normalizing constants of high-dimensional Bingham distributions.
Let $(X, d)$ be a metric space and let $C \subseteq 2^X$ be a collection of special objects. In the $(X,d,C)$-chasing problem, an online player receives a sequence of online requests $\{B_t\}_{t=1}^T \subseteq C$ and responds with a trajectory $\{x_t\}_{t=1}^T$ such that $x_t \in B_t$. This response incurs a movement cost $\sum_{t=1}^T d(x_t, x_{t-1})$, and the online player strives to minimize the competitive ratio: the worst-case ratio, over all input sequences, between the online movement cost and the optimal movement cost in hindsight. Under this setup, we call the $(X,d,C)$-chasing problem $\textit{chaseable}$ if there exists an online algorithm with finite competitive ratio. In the case of Convex Body Chasing (CBC) over real normed vector spaces, (Bubeck et al. 2019) proved the chaseability of the problem. Furthermore, in the vector space setting, the dimension of the ambient space appears to be the factor controlling the size of the competitive ratio. Indeed, (Sellke 2020) recently provided a $d$-competitive online algorithm over arbitrary real normed vector spaces $(\mathbb{R}^d, \|\cdot\|)$, and we present a general strategy for obtaining novel lower bounds of the form $\Omega(d^c)$, $c > 0$, for CBC in the same setting. In this paper, we also prove that the $\textit{doubling}$ and $\textit{Assouad}$ dimensions of a metric space exert no control on the hardness of ball chasing over the said metric space. More specifically, we show that for any large enough $\rho \in \mathbb{R}$, there exists a metric space $(X,d)$ of doubling dimension $\Theta(\rho)$ and Assouad dimension $\rho$ such that no online selector can achieve a finite competitive ratio in the general ball chasing regime.
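To make the setup concrete, the following Python toy runs a naive greedy player on Euclidean ball requests and tallies the movement cost. Greedy projection is well known not to be competitive in general, so this only illustrates the request/response/cost accounting; the dimension, radius, and request distribution are arbitrary choices for the example.

```python
import numpy as np

# Toy illustration of the chasing setup (NOT a competitive algorithm):
# the greedy player moves to the nearest point of each requested Euclidean
# ball and we accumulate the movement cost sum_t d(x_t, x_{t-1}).
rng = np.random.default_rng(1)
d = 3
x = np.zeros(d)                      # starting point x_0
cost = 0.0
for _ in range(50):
    center = rng.normal(size=d)      # request B_t = ball(center, radius)
    radius = 0.5
    gap = np.linalg.norm(x - center)
    if gap > radius:                 # project x onto B_t
        x = center + radius * (x - center) / gap
        cost += gap - radius
print(f"greedy movement cost: {cost:.2f}")
```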
Let $\alpha$ and $\beta$ belong to the same quadratic field. We show that the inhomogeneous Beatty sequence $(\lfloor n \alpha + \beta \rfloor)_{n \geq 1}$ is synchronized, in the sense that there is a finite automaton that takes as input the Ostrowski representations of $n$ and $y$ in parallel, and accepts if and only if $y = \lfloor n \alpha + \beta \rfloor$. Since it is already known that the addition relation is computable for Ostrowski representations based on a quadratic number, a consequence is a new and rather simple proof that the first-order logical theory of these sequences with addition is decidable. The decision procedure is easily implemented in the free software Walnut. As an application, we show that for each $r \geq 1$ it is decidable whether the set $\{ \lfloor n \alpha + \beta \rfloor \, : \, n \geq 1 \}$ forms an additive basis (or asymptotic additive basis) of order $r$. Using our techniques, we also solve some open problems of Reble and Kimberling, and give an explicit characterization of a sequence of Hildebrand et al.
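The sequences in question are easy to generate exactly. As an illustration (with the hypothetical choices $\alpha = (1+\sqrt 5)/2$ and $\beta = 1/2$), the Python sketch below computes $\lfloor n\alpha + \beta\rfloor$ using only integer arithmetic, avoiding floating-point rounding issues:

```python
from math import isqrt

# Inhomogeneous Beatty sequence floor(n*alpha + beta) for the illustrative
# choice alpha = (1 + sqrt(5))/2, beta = 1/2.  Since n*alpha + beta =
# (n + 1 + n*sqrt(5))/2 and n*sqrt(5) is irrational for n >= 1, we have
# floor((A + n*sqrt(5))/2) = (A + isqrt(5*n*n)) // 2 for any integer A.
def beatty(n: int) -> int:
    return (n + 1 + isqrt(5 * n * n)) // 2

print([beatty(n) for n in range(1, 13)])
```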
The Kneser graph $K(n,k)$ is defined for integers $n$ and $k$ with $n \geq 2k$ as the graph whose vertices are all the $k$-subsets of $[n]=\{1,2,\ldots,n\}$, where two such sets are adjacent if they are disjoint. The Schrijver graph $S(n,k)$ is defined as the subgraph of $K(n,k)$ induced by the collection of all $k$-subsets of $[n]$ that do not include two consecutive elements modulo $n$. It is known that the chromatic number of both $K(n,k)$ and $S(n,k)$ is $n-2k+2$. In the computational Kneser and Schrijver problems, we are given access to a coloring with $n-2k+1$ colors of the vertices of $K(n,k)$ and $S(n,k)$ respectively, and the goal is to find a monochromatic edge. We prove that these problems admit randomized algorithms with running time $n^{O(1)} \cdot k^{O(k)}$; hence they are fixed-parameter tractable with respect to the parameter $k$. The analysis involves structural results on intersecting families and on induced subgraphs of Kneser and Schrijver graphs. We also study the Agreeable-Set problem of assigning a small subset of a set of $m$ items to a group of $\ell$ agents, so that all agents value the subset at least as much as its complement. As an application of our algorithm for the Kneser problem, we obtain a randomized polynomial-time algorithm for the Agreeable-Set problem for instances with $\ell \geq m - O(\frac{\log m}{\log \log m})$. We further show that the Agreeable-Set problem is at least as hard as a variant of the Kneser problem with an extended access to the input coloring.
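A brute-force baseline for the computational Kneser problem is easy to state (the coloring below is an arbitrary example, not one from the paper):

```python
from itertools import combinations

# Brute-force illustration of the computational Kneser problem: given a
# coloring of the k-subsets of [n] with n - 2k + 1 colors, a monochromatic
# edge (two disjoint sets of equal color) must exist, since the chromatic
# number of K(n,k) is n - 2k + 2.
n, k = 8, 3
num_colors = n - 2 * k + 1          # one fewer color than the chromatic number

def color(S):                       # arbitrary example coloring
    return min(S) % num_colors

subsets = list(combinations(range(1, n + 1), k))
for A, B in combinations(subsets, 2):
    if set(A).isdisjoint(B) and color(A) == color(B):
        print("monochromatic edge:", A, B, "color", color(A))
        break
```

Note that this exhaustive search enumerates all $\binom{n}{k}^2$ pairs, so it is exponential in $k$ for $n \approx 2k$; the point of the paper's $n^{O(1)} \cdot k^{O(k)}$ algorithms is to do far better.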
Given a set $P$ of $n$ points in the plane, in general position, denote by $N_\Delta(P)$ the number of empty triangles with vertices in $P$. In this paper we investigate by how much $N_\Delta(P)$ changes if a point $x$ is removed from $P$. By constructing a graph $G_P(x)$ based on the arrangement of the empty triangles incident to $x$, we transform this geometric problem into the problem of counting triangles in the graph $G_P(x)$. We study properties of the graph $G_P(x)$ and, in particular, show that it is kite-free. This relates the growth rate of the number of empty triangles to the famous Ruzsa-Szemer\'edi problem.
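For a concrete sense of the quantity being studied, the following Python sketch computes $N_\Delta(P)$ by brute force and the change when a point $x$ is removed; the point set is a made-up example in general position.

```python
from itertools import combinations

# Brute-force count of empty triangles of a planar point set in general
# position, and the change in the count when point x is removed.
def sign(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def in_triangle(p, a, b, c):
    s1, s2, s3 = sign(a, b, p), sign(b, c, p), sign(c, a, p)
    return (s1 > 0) == (s2 > 0) == (s3 > 0)   # strictly inside (general position)

def empty_triangles(P):
    count = 0
    for a, b, c in combinations(P, 3):
        if not any(in_triangle(p, a, b, c) for p in P if p not in (a, b, c)):
            count += 1
    return count

P = [(0, 0), (4, 0), (2, 3), (1, 1), (3, 1), (2, 2)]
x = (2, 2)
print(empty_triangles(P) - empty_triangles([p for p in P if p != x]))
```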
In the maximum independent set of convex polygons problem, we are given a set of $n$ convex polygons in the plane with the objective of selecting a maximum cardinality subset of non-overlapping polygons. Here we study a special case of the problem where the edges of the polygons can take at most $d$ fixed directions. We present an $8d/3$-approximation algorithm for this problem running in time $O((nd)^{O(d4^d)})$. The previous-best polynomial-time approximation (for constant $d$) was a classical $n^\varepsilon$-approximation by Fox and Pach [SODA'11] that has recently been improved to an $OPT^{\varepsilon}$-approximation algorithm by Cslovjecsek, Pilipczuk and W\k{e}grzycki [SODA '24], which also extends to an arbitrary set of convex polygons. Our result builds on, and generalizes, the recent constant-factor approximation algorithms for the maximum independent set of axis-parallel rectangles problem (which is a special case of our problem with $d=2$) by Mitchell [FOCS'21] and G\'{a}lvez, Khan, Mari, M\"{o}mke, Reddy, and Wiese [SODA'22].
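To illustrate the objective in the $d=2$ special case (axis-parallel rectangles), here is a toy exact solver in Python; it searches subsets exhaustively, so it is exponential-time and serves only to pin down the notion of "non-overlapping" used above.

```python
from itertools import combinations

# Toy exact solver for maximum independent set of axis-parallel rectangles
# (the d = 2 special case).  A rectangle is (x1, y1, x2, y2); rectangles
# sharing only a boundary are treated as non-overlapping.
def overlap(r, s):
    return not (r[2] <= s[0] or s[2] <= r[0] or r[3] <= s[1] or s[3] <= r[1])

def max_independent_set(rects):
    for size in range(len(rects), 0, -1):       # try largest subsets first
        for subset in combinations(rects, size):
            if all(not overlap(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []

rects = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 0, 4, 2), (0, 3, 2, 5)]
print(max_independent_set(rects))   # -> three pairwise disjoint rectangles
```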
We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in $[0,1]^{d}$, assuming exponentially bounded weights. This addresses an open problem posed in \citet{safran2019depth}, and proves that the curse of dimensionality manifests in depth 2 approximation, even in cases where the target function can be represented efficiently using depth 3. Previously, lower bounds that were used to separate depth 2 from depth 3 required that at least one of the Lipschitz parameter, the target accuracy, or (some measure of) the size of the domain of approximation scale polynomially with the input dimension, whereas we fix the former two and restrict our domain to the unit hypercube. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of an average- to worst-case random self-reducibility argument to reduce the problem to threshold circuit lower bounds.
We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\mathrm{X}^\star \in \mathbb{R}^{d\times d}$ of rank $r$, parameterized as $\mathrm{U}\mathrm{U}^{\top}$, from only a subset of its observed entries. We show that vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\mathrm{X}^\star$ without requiring any explicit regularization. This convergence result holds even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. Existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point satisfying $\|\mathrm{U}_0\| \leq \epsilon$ converges near-linearly to an $\epsilon$-neighborhood of $\mathrm{X}^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$; both are governed only by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|\mathrm{U}_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible with the classical leave-one-out analysis.
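The algorithm being analyzed is plain factored gradient descent, which is short to write down. The following NumPy sketch runs GD with a small random initialization in the over-parameterized regime; the dimensions, sampling rate, step size, and iteration count are illustrative choices, not the paper's exact setting.

```python
import numpy as np

# Sketch: vanilla GD with small initialization for symmetric matrix
# completion, minimizing f(U) = 0.25 * ||P_Omega(U U^T - X*)||_F^2.
rng = np.random.default_rng(0)
d, r, r_prime = 50, 2, 10                # true rank r, over-estimated rank r'
U_star = rng.normal(size=(d, r))
X_star = U_star @ U_star.T               # ground-truth PSD matrix of rank r
mask = rng.random((d, d)) < 0.5
mask = mask | mask.T                     # symmetric set of observed entries

eps, eta = 1e-6, 0.01                    # small init scale, step size
U = eps * rng.normal(size=(d, r_prime))
for _ in range(3000):
    R = mask * (U @ U.T - X_star)        # residual on observed entries
    U -= eta * (R + R.T) @ U / 2         # gradient step on f(U)
print("relative error:",
      np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```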
Semantic segmentation is a key prerequisite to robust image understanding for applications in \acrlong{ai} and Robotics. \acrlong{fss}, in particular, concerns the extension and optimization of traditional segmentation methods in challenging conditions where limited training examples are available. A predominant approach in \acrlong{fss} is to rely on a single backbone for visual feature extraction. Choosing which backbone to leverage is a deciding factor contributing to the overall performance. In this work, we investigate whether fusing features from different backbones can improve the ability of \acrlong{fss} models to capture richer visual features. To tackle this question, we propose and compare two ensembling techniques: Independent Voting and Feature Fusion. Among the available \acrlong{fss} methods, we implement the proposed ensembling techniques on PANet. The module dedicated to predicting segmentation masks from the backbone embeddings in PANet contains no trainable parameters, creating a controlled `in vitro' setting for isolating the impact of different ensembling strategies. Leveraging the complementary strengths of different backbones, our approach outperforms the original single-backbone PANet across standard benchmarks, even in challenging one-shot learning scenarios. Specifically, it achieves a performance improvement of +7.37\% on PASCAL-5\textsuperscript{i} and +10.68\% on COCO-20\textsuperscript{i} in the top-performing scenario where three backbones are combined. These results, together with a qualitative inspection of the predicted subject masks, suggest that relying on multiple backbones in PANet leads to a more comprehensive feature representation, thus facilitating the successful application of \acrlong{fss} methods in challenging, data-scarce environments.
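A minimal sketch of the Feature Fusion idea is shown below. It assumes two stock torchvision backbones as stand-ins for the paper's backbones, and reduces PANet's parameter-free prototype matching to cosine similarity against a mean-pooled support prototype; it illustrates channel-wise fusion, not the paper's full pipeline.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Feature Fusion sketch: concatenate feature maps from two backbones
# (hypothetical stand-ins), then score a query against a support prototype.
resnet = nn.Sequential(*list(models.resnet50(weights=None).children())[:-2])
vgg = models.vgg16(weights=None).features

def fused_features(img):                  # img: (B, 3, H, W)
    f1 = resnet(img)                      # (B, 2048, h, w)
    f2 = vgg(img)                         # (B, 512, h', w')
    f2 = nn.functional.interpolate(f2, size=f1.shape[-2:], mode="bilinear")
    return torch.cat([f1, f2], dim=1)     # channel-wise fusion

support, query = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
proto = fused_features(support).mean(dim=(2, 3))      # (1, C) class prototype
q = fused_features(query)                             # (1, C, h, w)
score = nn.functional.cosine_similarity(q, proto[:, :, None, None], dim=1)
print(score.shape)                        # (1, h, w) similarity map
```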