We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are described by a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that most models written in terms of elementary functions cannot achieve the learnability demonstrated in this theorem.
Let $P$ be a $k$-colored set of $n$ points in the plane, $4 \leq k \leq n$. We study the problem of deciding if $P$ contains a subset of four points of different colors such that its Rectilinear Convex Hull has positive area. We show this problem to be equivalent to deciding if there exists a point $c$ in the plane such that each of the open quadrants defined by $c$ contains a point of $P$, each of them having a different color. We provide an $O(n \log n)$-time algorithm for this problem, where the hidden constant does not depend on $k$; then, we prove that this problem has time complexity $\Omega(n \log n)$ in the algebraic computation tree model. No general position assumptions for $P$ are required.
An edge $e$ of a graph $G$ is called deletable for some orientation $o$ if the restriction of $o$ to $G-e$ is a strong orientation. Inspired by a problem of Frank, in 2021 H\"orsch and Szigeti proposed a new parameter for $3$-edge-connected graphs, called the Frank number, which refines $k$-edge-connectivity. The Frank number is defined as the minimum number of orientations of $G$ for which every edge of $G$ is deletable in at least one of them. They showed that every $3$-edge-connected graph has Frank number at most $7$ and that in case these graphs are also $3$-edge-colourable the parameter is at most $3$. Here we strengthen both results by showing that every $3$-edge-connected graph has Frank number at most $4$ and that every graph which is $3$-edge-connected and $3$-edge-colourable has Frank number $2$. The latter also confirms a conjecture by Bar\'at and Bl\'azsik. Furthermore, we prove two sufficient conditions for cubic graphs to have Frank number $2$ and use them in an algorithm to computationally show that the Petersen graph is the only cyclically $4$-edge-connected cubic graph up to $36$ vertices having Frank number greater than $2$.
Accelerated failure time (AFT) models are frequently used to model survival data, providing a direct quantification of the relationship between event times and covariates. These models allow for the acceleration or deceleration of failure times through a multiplicative factor that accounts for the effect of covariates. While existing literature provides numerous methods for fitting AFT models with time-fixed covariates, adapting these approaches to scenarios involving both time-varying covariates and partly interval-censored data remains challenging. Motivated by a randomised clinical trial dataset on advanced melanoma patients, we propose a maximum penalised likelihood approach for fitting a semiparametric AFT model to survival data with partly interval-censored failure times. This method also accommodates both time-fixed and time-varying covariates. We utilise Gaussian basis functions to construct a smooth approximation of the non-parametric baseline hazard and fit the model using a constrained optimisation approach. The effectiveness of our method is demonstrated through extensive simulations. Finally, we illustrate the relevance of our approach by applying it to a dataset from a randomised clinical trial involving patients with advanced melanoma.
We study the behavior of a label propagation algorithm (LPA) on the Erd\H{o}s-R\'enyi random graph $\mathcal{G}(n,p)$. Initially, given a network, each vertex starts with a random label in the interval $[0,1]$. Then, in each round of LPA, every vertex switches its label to the majority label in its neighborhood (including its own label). At the first round, ties are broken towards smaller labels, while at each of the next rounds, ties are broken uniformly at random. The algorithm terminates once all labels stay the same in two consecutive iterations. LPA is successfully used in practice for detecting communities in networks (corresponding to vertex sets with the same label after termination of the algorithm). Perhaps surprisingly, LPA's performance on dense random graphs is hard to analyze, and so far convergence to consensus was known only when $np\ge n^{3/4+\varepsilon}$, where LPA converges in three rounds. By defining an alternative label attribution procedure which converges to the label propagation algorithm after three rounds, a careful multi-stage exposure of the edges allows us to break the $n^{3/4+\varepsilon}$ barrier and show that, when $np \ge n^{5/8+\varepsilon}$, a.a.s.\ the algorithm terminates with a single label. Moreover, we show that, if $np\gg n^{2/3}$, a.a.s.\ this label is the smallest one, whereas if $n^{5/8+\varepsilon}\le np\ll n^{2/3}$, the surviving label is a.a.s.\ not the smallest one. En passant, we show a presumably new monotonicity lemma for Binomial random variables that might be of independent interest.
We address the regression problem for a general function $f:[-1,1]^d\to \mathbb R$ when the learner selects the training points $\{x_i\}_{i=1}^n$ to achieve a uniform error bound across the entire domain. In this setting, known historically as nonparametric regression, we aim to establish a sample complexity bound that depends solely on the function's degree of smoothness. Assuming periodicity at the domain boundaries, we introduce PADUA, an algorithm that, with high probability, provides performance guarantees optimal up to constant or logarithmic factors across all problem parameters. Notably, PADUA is the first parametric algorithm with optimal sample complexity for this setting. Due to this feature, we prove that, differently from the non-parametric state of the art, PADUA enjoys optimal space complexity in the prediction phase. To validate these results, we perform numerical experiments over functions coming from real audio data, where PADUA shows comparable performance to state-of-the-art methods, while requiring only a fraction of the computational time.
The Hartman-Watson distribution with density $f_r(t)$ is a probability distribution defined on $t \geq 0$ which appears in several problems of applied probability. The density of this distribution is expressed in terms of an integral $\theta(r,t)$ which is difficult to evaluate numerically for small $t\to 0$. Using saddle point methods, we obtain the first two terms of the $t\to 0$ expansion of $\theta(\rho/t,t)$ at fixed $\rho >0$. An error bound is obtained by numerical estimates of the integrand, which is furthermore uniform in $\rho$. As an application we obtain the leading asymptotics of the density of the time average of the geometric Brownian motion as $t\to 0$. This has the form $\mathbb{P}(\frac{1}{t} \int_0^t e^{2(B_s+\mu s)} ds \in da) \sim (2\pi t)^{-1/2} g(a,\mu) e^{-\frac{1}{t} J(a)} da/a$, with an exponent $J(a)$ which reproduces the known result obtained previously using Large Deviations theory.
We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the "vanilla" PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.
A statistical network model with overlapping communities can be generated as a superposition of mutually independent random graphs of varying size. The model is parameterized by the number of nodes, the number of communities, and the joint distribution of the community size and the edge probability. This model admits sparse parameter regimes with power-law limiting degree distributions and non-vanishing clustering coefficients. This article presents large-scale approximations of clique and cycle frequencies for graph samples generated by the model, which are valid for regimes with unbounded numbers of overlapping communities. Our results reveal the growth rates of these subgraph frequencies and show that their theoretical densities can be reliably estimated from data.
We prove, for stably computably enumerable formal systems, direct analogues of the first and second incompleteness theorems of G\"odel. A typical stably computably enumerable set is the set of Diophantine equations with no integer solutions, and in particular such sets are generally not computably enumerable. And so this gives the first extension of the second incompleteness theorem to non classically computable formal systems. Let's motivate this with a somewhat physical application. Let $\mathcal{H} $ be the suitable infinite time limit (stabilization in the sense of the paper) of the mathematical output of humanity, specializing to first order sentences in the language of arithmetic (for simplicity), and understood as a formal system. Suppose that all the relevant physical processes in the formation of $\mathcal{H} $ are Turing computable. Then as defined $\mathcal{H} $ may \emph{not} be computably enumerable, but it is stably computably enumerable. Thus, the classical G\"odel disjunction applied to $\mathcal{H} $ is meaningless, but applying our incompleteness theorems to $\mathcal{H} $ we then get a sharper version of G\"odel's disjunction: assume $\mathcal{H} \vdash PA$ then either $\mathcal{H} $ is not stably computably enumerable or $\mathcal{H} $ is not 1-consistent (in particular is not sound) or $\mathcal{H} $ cannot prove a certain true statement of arithmetic (and cannot disprove it if in addition $\mathcal{H} $ is 2-consistent).
We consider the quantum query complexity of local search as a function of graph geometry. Given a graph $G = (V,E)$ with $n$ vertices and black box access to a function $f : V \to \mathbb{R}$, the goal is find a vertex $v$ that is a local minimum, i.e. with $f(v) \leq f(u)$ for all $(u,v) \in E$, using as few oracle queries as possible. We show that the quantum query complexity of local search on $G$ is $\Omega\bigl( \frac{n^{\frac{3}{4}}}{\sqrt{g}} \bigr)$, where $g$ is the vertex congestion of the graph. For a $\beta$-expander with maximum degree $\Delta$, this implies a lower bound of $ \Omega\bigl(\frac{\sqrt{\beta} \; n^{\frac{1}{4}}}{\sqrt{\Delta} \; \log{n}} \bigr)$. We obtain these bounds by applying the strong weighted adversary method to a construction by Br\^anzei, Choo, and Recker (2024). As a corollary, on constant degree expanders, we derive a lower bound of $\Omega\bigl(\frac{n^{\frac{1}{4}}}{ \sqrt{\log{n}}} \bigr)$. This improves upon the best prior quantum lower bound of $\Omega\bigl( \frac{n^{\frac{1}{8}}}{\log{n}}\bigr) $ by Santha and Szegedy (2004). In contrast to the classical setting, a gap remains in the quantum case between our lower bound and the best-known upper bound of $O\bigl( n^{\frac{1}{3}} \bigr)$ for such graphs.