Let $G=(V,E)$ be an $n$-vertex connected graph of maximum degree $\Delta$. Given access to $V$ and an oracle that given two vertices $u,v\in V$, returns the shortest path distance between $u$ and $v$, how many queries are needed to reconstruct $E$? We give a simple deterministic algorithm to reconstruct trees using $\Delta n\log_\Delta n+(\Delta+2)n$ distance queries and show that even randomised algorithms need to use at least $\frac1{200} \Delta n\log_\Delta n$ queries in expectation. The best previous lower bound was an information-theoretic lower bound of $\Omega(n\log n/\log \log n)$. Our lower bound also extends to related query models including distance queries for phylogenetic trees, membership queries for learning partitions and path queries in directed trees. We extend our deterministic algorithm to reconstruct graphs without induced cycles of length at least $k$ using $O_{\Delta,k}(n\log n)$ queries, which includes various graph classes of interest such as chordal graphs, permutation graphs and AT-free graphs. Since the previously best known randomised algorithm for chordal graphs uses $O_{\Delta}(n\log^2 n)$ queries in expectation, we both get rid of the randomness and obtain the optimal dependence on $n$ for chordal graphs and various other graph classes. Finally, we build on an algorithm of Kannan, Mathieu, and Zhou [ICALP, 2015] to give a randomised algorithm for reconstructing graphs of treelength $k$ using $O_{\Delta,k}(n\log^2n)$ queries in expectation.
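The query model can be made concrete with a naive baseline: root the tree, query the depth of every vertex, and find each vertex's parent by scanning the level above. This sketch uses $O(n^2)$ queries in the worst case, far more than the $\Delta n\log_\Delta n$ of the paper's algorithm; `dist` stands for the assumed distance oracle.

```python
def reconstruct_tree(vertices, dist):
    """Naive baseline for tree reconstruction from a distance oracle.

    Pick an arbitrary root, query d(root, v) for every vertex, and
    recover each vertex's parent as a vertex one level up at distance 1.
    Worst-case O(n^2) oracle calls (the paper's algorithm needs only
    about Delta * n * log_Delta(n)).
    """
    root = vertices[0]
    depth = {v: dist(root, v) for v in vertices}       # n - 1 queries
    by_depth = {}
    for v in vertices:
        by_depth.setdefault(depth[v], []).append(v)
    edges = set()
    for v in vertices:
        if v == root:
            continue
        # the parent of v is the unique vertex at depth(v) - 1 adjacent to v
        for u in by_depth[depth[v] - 1]:
            if dist(u, v) == 1:
                edges.add(frozenset((u, v)))
                break
    return edges
```

For instance, on the path $0-1-2-\dots$ the oracle is simply $|u-v|$, and the baseline recovers all path edges.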
In this paper, we propose a fully discrete soft thresholding trigonometric polynomial approximation on $[-\pi,\pi]$, named Lasso trigonometric interpolation. This approximation is an $\ell_1$-regularized discrete least squares approximation under the same conditions as classical trigonometric interpolation on an equidistant grid. Lasso trigonometric interpolation is sparse and, at the same time, an efficient tool for dealing with noisy data. We theoretically analyze Lasso trigonometric interpolation for continuous periodic functions. The principal results show that the $L_2$ error bound of Lasso trigonometric interpolation is less than that of classical trigonometric interpolation, which improves the robustness of trigonometric interpolation. This paper also presents numerical results on Lasso trigonometric interpolation on $[-\pi,\pi]$, with and without the presence of data errors.
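Because the trigonometric basis is orthogonal on the equidistant grid, the $\ell_1$-regularized least squares problem decouples and is solved by soft thresholding of the Fourier coefficients. A minimal numpy sketch, assuming this soft-thresholding form (`lam` is a hypothetical regularisation weight; the paper's exact scaling may differ):

```python
import numpy as np

def lasso_trig_interp(y, lam):
    """Soft-threshold the DFT coefficients of samples y taken on the
    equidistant grid x_j = -pi + 2*pi*j/n. With lam = 0 this reduces to
    classical trigonometric interpolation; lam > 0 zeroes out small
    coefficients, giving a sparse approximant.
    """
    n = len(y)
    c = np.fft.fft(y) / n                       # discrete Fourier coefficients
    mag = np.abs(c)
    shrunk = np.maximum(mag - lam, 0.0)         # soft thresholding of |c_k|
    return shrunk * c / np.maximum(mag, 1e-300)

def evaluate(c, x):
    """Evaluate the trigonometric polynomial with coefficients c at points x."""
    n = len(c)
    k = np.fft.fftfreq(n, d=1.0 / n)            # integer frequencies
    return np.real(sum(ck * np.exp(1j * kk * (x + np.pi)) for ck, kk in zip(c, k)))
```

With `lam = 0` the grid values are interpolated exactly; with `lam > 0` small (e.g. noise-induced) coefficients are removed.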
We show that under minimal assumptions on a random vector $X\in\mathbb{R}^d$ and with high probability, given $m$ independent copies of $X$, the coordinate distribution of each vector $(\langle X_i,\theta \rangle)_{i=1}^m$ is dictated by the distribution of the true marginal $\langle X,\theta \rangle$. Specifically, we show that with high probability, \[\sup_{\theta \in S^{d-1}} \left( \frac{1}{m}\sum_{i=1}^m \left|\langle X_i,\theta \rangle^\sharp - \lambda^\theta_i \right|^2 \right)^{1/2} \leq c \left( \frac{d}{m} \right)^{1/4},\] where $\lambda^{\theta}_i = m\int_{(\frac{i-1}{m}, \frac{i}{m}]} F_{ \langle X,\theta \rangle }^{-1}(u)\,du$ and $a^\sharp$ denotes the monotone non-decreasing rearrangement of $a$. Moreover, this estimate is optimal. The proof follows from a sharp estimate on the worst Wasserstein distance between a marginal of $X$ and its empirical counterpart, $\frac{1}{m} \sum_{i=1}^m \delta_{\langle X_i, \theta \rangle}$.
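The quantities in the display can be checked empirically. A small numpy experiment in the Gaussian case (an assumption made purely for this sketch, so that every marginal $\langle X,\theta\rangle$ is standard normal and $\lambda^\theta_i$ is well approximated by midpoint values of the normal quantile function):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
m, d = 2000, 5
X = rng.standard_normal((m, d))
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                 # a point of S^{d-1}

proj_sorted = np.sort(X @ theta)               # monotone rearrangement of <X_i, theta>
u = (np.arange(m) + 0.5) / m                   # midpoints of the cells ((i-1)/m, i/m]
lam = np.array([NormalDist().inv_cdf(t) for t in u])

# L2-type discrepancy between the rearranged sample and lambda^theta;
# the theorem bounds it by c * (d/m)^{1/4}, uniformly over theta
err = np.sqrt(np.mean((proj_sorted - lam) ** 2))
```

For $m = 2000$, $d = 5$ the observed `err` sits well below $(d/m)^{1/4} \approx 0.22$, consistent with the stated rate (for a single $\theta$; the theorem is uniform over the sphere).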
We study the maximum-average submatrix problem, in which given an $N \times N$ matrix $J$ one needs to find the $k \times k$ submatrix with the largest average of entries. We study the problem for random matrices $J$ whose entries are i.i.d. random variables by mapping it to a variant of the Sherrington-Kirkpatrick spin-glass model at fixed magnetization. We characterize analytically the phase diagram of the model as a function of the submatrix average and the size of the submatrix $k$ in the limit $N\to\infty$. We consider submatrices of size $k = m N$ with $0 < m < 1$. We find a rich phase diagram, including dynamical one-step replica symmetry breaking (1-RSB), static 1-RSB, and full replica symmetry breaking phases. In the limit of $m \to 0$, we find a simpler phase diagram featuring a frozen 1-RSB phase, where the Gibbs measure is composed of exponentially many pure states, each with zero entropy. We discover an interesting phenomenon, reminiscent of the phenomenology of the binary perceptron: there exist efficient algorithms that provably work in the frozen 1-RSB phase.
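The objective itself is easy to state in code. A brute-force sketch (exponential in $N$, so only usable on tiny instances; it is not the efficient algorithm alluded to above):

```python
import itertools
import numpy as np

def max_avg_submatrix(J, k):
    """Exhaustive search for the k x k submatrix (a choice of k rows and
    k columns) of J with the largest average entry.
    """
    N = J.shape[0]
    best, best_rc = -np.inf, None
    for rows in itertools.combinations(range(N), k):
        sub_rows = J[list(rows), :]
        for cols in itertools.combinations(range(N), k):
            avg = sub_rows[:, list(cols)].mean()
            if avg > best:
                best, best_rc = avg, (rows, cols)
    return best, best_rc
```

On random i.i.d. input this exhaustive optimum is the quantity whose large-$N$ behaviour the spin-glass mapping characterizes.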
A set of vertices of a graph $G$ is said to be decycling if its removal leaves an acyclic subgraph. The size of a smallest decycling set is the decycling number of $G$. Generally, at least $\lceil(n+2)/4\rceil$ vertices have to be removed in order to decycle a cubic graph on $n$ vertices. In 1979, Payan and Sakarovitch proved that the decycling number of a cyclically $4$-edge-connected cubic graph of order $n$ equals $\lceil (n+2)/4\rceil$. In addition, they characterised the structure of minimum decycling sets and their complements. If $n\equiv 2\pmod4$, then $G$ has a decycling set which is independent and its complement induces a tree. If $n\equiv 0\pmod4$, then one of two possibilities occurs: either $G$ has an independent decycling set whose complement induces a forest of two trees, or the decycling set is near-independent (which means that it induces a single edge) and its complement induces a tree. In this paper we strengthen the result of Payan and Sakarovitch by proving that the latter possibility (a near-independent set and a tree) can always be guaranteed. Moreover, we relax the assumption of cyclic $4$-edge-connectivity to a significantly weaker condition expressed through the canonical decomposition of 3-connected cubic graphs into cyclically $4$-edge-connected ones. Our methods substantially use a surprising and seemingly distant relationship between the decycling number and the maximum genus of a cubic graph.
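The decycling number can be computed by brute force on small graphs, which makes the bound $\lceil (n+2)/4\rceil$ concrete. A sketch using a union-find acyclicity test (exponential time, for illustration only):

```python
import itertools

def is_acyclic(vertices, edges):
    """Forest test via union-find: adding an edge inside an existing
    component creates a cycle."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def decycling_number(n, edges):
    """Smallest s such that removing some s vertices leaves a forest."""
    for s in range(n + 1):
        for S in itertools.combinations(range(n), s):
            rem = set(range(n)) - set(S)
            surviving = [(u, v) for u, v in edges if u in rem and v in rem]
            if is_acyclic(rem, surviving):
                return s
    return n
```

For the cubic graph $K_4$ ($n=4$) this returns $2 = \lceil (4+2)/4\rceil$, matching the Payan-Sakarovitch value.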
In root finding and optimization, there are many cases where there is a closed set $A$ to which one does not want the sequence constructed by one's favourite method to converge (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots and one chooses initial points in the basin of attraction of one root $x^*$ (a fact which one may not know beforehand), then one will always end up at that root. In this case, one would like to have a mechanism to avoid the point $x^*$ in the next runs of one's algorithm. In this paper, we propose a new method aiming to achieve this: we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function in one variable. We first explain the heuristic for this method in the case where the minimum of the cost function is exactly 0, and then explain how to proceed if the minimum is non-zero (allowing both positive and negative values). The method is very suitable for iterative algorithms which have the descent property. Based on this, we also propose an algorithm to escape the basin of attraction of a component of positive dimension and reach another component. Along the way, we compare with the main existing relevant methods in the current literature. We provide several examples to illustrate the usefulness of the new approach.
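The one-variable heuristic is easy to demonstrate. A 1-D sketch (not the paper's algorithm; `power`, `lr` and `steps` are illustrative tuning choices, and we assume the minimum of $f$ is exactly 0):

```python
def deflated_descent(f, known_roots, x0, lr=0.1, power=2, steps=2000):
    """Gradient descent on g(x) = f(x) / prod_r |x - r|^power: dividing
    the cost by a power of the distance to already-found roots makes g
    blow up near them and steers the iterates elsewhere.
    """
    def g(x):
        d = 1.0
        for r in known_roots:
            d *= abs(x - r) ** power
        return f(x) / d

    x, h = x0, 1e-6
    for _ in range(steps):
        dg = (g(x + h) - g(x - h)) / (2 * h)   # central-difference derivative
        x -= lr * dg
    return x
```

For $f(x) = (x^2-1)^2$, plain descent from $x_0 = 0.5$ converges to the nearby root $1$; after dividing by $|x-1|^2$ the deflated cost is $(x+1)^2$ and the same start converges to the other root $-1$.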
We study how to verify specific frequency distributions when we observe a stream of $N$ data items taken from a universe of $n$ distinct items. We introduce the \emph{relative Fr\'echet distance} to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertion-only and sliding windows. We present a tester for a certain class of functions which, when $f$ is given and $g$ is defined by a stream, decides with high probability whether $f$ is close to $g$ or far from $g$. If $f$ is uniform, we show an $\Omega(n)$ space lower bound. If $f$ decreases fast enough, we only use space $O(\log^2 n\cdot \log\log n)$. The analysis relies on the SpaceSaving algorithm \cite{MAE2005,Z22} and on sampling the stream.
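The SpaceSaving algorithm that underlies the analysis maintains at most $k$ counters and, on overflow, evicts the smallest counter while recording its value as an error bound. A minimal insertion-only sketch:

```python
def space_saving(stream, k):
    """SpaceSaving (Metwally, Agrawal, El Abbadi 2005): keep at most k
    counters. For every monitored item, its true frequency f satisfies
    count[item] - err[item] <= f <= count[item], and the counters always
    sum to the number of items processed.
    """
    count, err = {}, {}
    for x in stream:
        if x in count:
            count[x] += 1
        elif len(count) < k:
            count[x], err[x] = 1, 0
        else:
            # evict the item with the minimum counter and inherit its count
            victim = min(count, key=count.get)
            c = count.pop(victim)
            err.pop(victim)
            count[x], err[x] = c + 1, c
    return count, err
```

Heavy items are never underestimated, which is what makes the summary usable for testing a claimed frequency function against the stream.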
We prove that for any graph $G$ of maximum degree at most $\Delta$, the zeros of its chromatic polynomial $\chi_G(x)$ (in $\mathbb{C}$) lie inside the disc of radius $5.94 \Delta$ centered at $0$. This improves on the previously best known bound of approximately $6.91\Delta$. We also obtain improved bounds for graphs of high girth. We prove that for every $g$ there is a constant $K_g$ such that for any graph $G$ of maximum degree at most $\Delta$ and girth at least $g$, the zeros of its chromatic polynomial $\chi_G(x)$ lie inside the disc of radius $K_g \Delta$ centered at $0$, where $K_g$ is the solution to a certain optimization problem. In particular, $K_g < 5$ when $g \geq 5$, $K_g < 4$ when $g \geq 25$, and $K_g$ tends to approximately $3.86$ as $g \to \infty$. Key to the proof is a classical theorem of Whitney which allows us to relate the chromatic polynomial of a graph $G$ to the generating function of so-called broken-circuit-free forests in $G$. We also establish a zero-free disc for the generating function of all forests in $G$ (also known as the partition function of the arboreal gas), which may be of independent interest.
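The zero-free disc can be observed directly on small graphs by computing $\chi_G$ via the standard deletion-contraction recurrence $\chi_G = \chi_{G-e} - \chi_{G/e}$ and examining the roots numerically. A sketch (exponential time, small graphs only):

```python
import numpy as np

def chromatic_poly(n, edges):
    """Chromatic polynomial coefficients (highest degree first) of a
    simple graph on vertices 0..n-1, via deletion-contraction.
    """
    edges = [tuple(e) for e in edges]
    if not edges:
        p = np.zeros(n + 1)
        p[0] = 1.0                               # x^n for the empty graph
        return p
    (u, v), rest = edges[0], edges[1:]
    deleted = chromatic_poly(n, rest)            # chi_{G - e}
    # contract v into u: relabel v -> u, drop loops and parallel edges,
    # and shift labels above v down by one
    def relabel(w):
        if w == v:
            w = u
        return w - 1 if w > v else w
    contracted_edges = set()
    for a, b in rest:
        a, b = relabel(a), relabel(b)
        if a != b:
            contracted_edges.add((min(a, b), max(a, b)))
    contracted = chromatic_poly(n - 1, sorted(contracted_edges))
    return deleted - np.concatenate(([0.0], contracted))
```

For the triangle ($\Delta = 2$) this gives $x(x-1)(x-2)$, whose zeros have modulus at most $2$, well inside the radius $5.94\Delta$.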
Let $X$ be a $p$-variate random vector and $\widetilde{X}$ a knockoff copy of $X$ (in the sense of \cite{CFJL18}). A new approach for constructing $\widetilde{X}$ (henceforth, NA) has been introduced in \cite{JSPI}. NA has essentially three advantages: (i) Building $\widetilde{X}$ is straightforward; (ii) The joint distribution of $(X,\widetilde{X})$ can be written in closed form; (iii) $\widetilde{X}$ is often optimal under various criteria. However, for NA to apply, $X_1,\ldots, X_p$ should be conditionally independent given some random element $Z$. Our first result is that any probability measure $\mu$ on $\mathbb{R}^p$ can be approximated by a probability measure $\mu_0$ of the form $$\mu_0\bigl(A_1\times\ldots\times A_p\bigr)=E\Bigl\{\prod_{i=1}^p P(X_i\in A_i\mid Z)\Bigr\}.$$ The approximation is in total variation distance when $\mu$ is absolutely continuous, and an explicit formula for $\mu_0$ is provided. If $X\sim\mu_0$, then $X_1,\ldots,X_p$ are conditionally independent. Hence, with a negligible error, one can assume $X\sim\mu_0$ and build $\widetilde{X}$ through NA. Our second result is a characterization of the knockoffs $\widetilde{X}$ obtained via NA. It is shown that $\widetilde{X}$ is of this type if and only if the pair $(X,\widetilde{X})$ can be extended to an infinite sequence so as to satisfy certain invariance conditions. The basic tool for proving this fact is de Finetti's theorem for partially exchangeable sequences. In addition to the quoted results, an explicit formula for the conditional distribution of $\widetilde{X}$ given $X$ is obtained in a few cases. In one such case, it is assumed that $X_i\in\{0,1\}$ for all $i$.
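The conditional-independence setting that NA requires is easy to simulate. A sketch with Gaussian conditionals (chosen purely for illustration, not taken from the paper): draw a latent $Z$, draw $X_1,\ldots,X_p$ independently from the conditional law given $Z$, then draw the knockoff the same way from the same $Z$, so that each pair $(X_i,\widetilde{X}_i)$ is exchangeable:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 100_000
Z = rng.standard_normal(n)
X = Z[:, None] + rng.standard_normal((n, p))    # X_i | Z ~ N(Z, 1), independent over i
Xt = Z[:, None] + rng.standard_normal((n, p))   # knockoff: fresh draw from the SAME Z

# Knockoff sanity checks on the joint covariance: marginal variances of
# X_i and Xt_i agree, and Cov(Xt_i, X_j) = Cov(X_i, X_j) for i != j.
C = np.cov(np.hstack([X, Xt]).T)
```

Here the joint law of $(X,\widetilde{X})$ is available in closed form (a Gaussian with all cross-covariances equal to $\mathrm{Var}(Z)$), illustrating advantage (ii).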
For a functor $Q$ from a category $C$ to the category $Pos$ of ordered sets and order-preserving functions, we study liftings of various kinds of structures from the base category $C$ to the total (or Grothendieck) category $\int Q$. That lifting a monoidal structure corresponds to giving some lax natural transformation making $Q$ almost monoidal might be part of the folklore of category theory. We rely on and generalize the tools supporting this correspondence so as to provide exact conditions for lifting symmetric monoidal closed and star-autonomous structures. A corollary of these characterizations is that, if $Q$ factors as a monoidal functor through $SLatt$, the category of complete lattices and sup-preserving functions, then $\int Q$ is always symmetric monoidal closed. In this case, we also provide a method, based on the double negation nucleus from quantale theory, to turn $\int Q$ into a star-autonomous category. The theory developed, originally motivated by the categories $P$-$Set$ of Schalk and de Paiva, yields a wide generalization of Hyland and Schalk's construction of star-autonomous categories by means of orthogonality structures.
We study the behavior of a label propagation algorithm (LPA) on the Erd\H{o}s-R\'enyi random graph $\mathcal{G}(n,p)$. Initially, given a network, each vertex starts with a random label in the interval $[0,1]$. Then, in each round of LPA, every vertex switches its label to the majority label in its neighborhood (including its own label). At the first round, ties are broken towards smaller labels, while at each of the next rounds, ties are broken uniformly at random. The algorithm terminates once all labels stay the same in two consecutive iterations. LPA is successfully used in practice for detecting communities in networks (corresponding to vertex sets with the same label after termination of the algorithm). Perhaps surprisingly, LPA's performance on dense random graphs is hard to analyze, and so far convergence to consensus was known only when $np\ge n^{3/4+\varepsilon}$. By a very careful multi-stage exposure of the edges, we break this barrier and show that, when $np \ge n^{5/8+\varepsilon}$, a.a.s. the algorithm terminates with a single label. Moreover, we show that, if $np\gg n^{2/3}$, a.a.s. this label is the smallest one, whereas if $n^{5/8+\varepsilon}\le np\ll n^{2/3}$, the surviving label is a.a.s. not the smallest one.
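The LPA variant described above is short to simulate. A toy sketch on an adjacency-list graph, using distinct integer labels as a stand-in for the i.i.d. $[0,1]$ draws (this illustrates the dynamics only, not the paper's analysis):

```python
import random

def label_propagation(adj, seed=0):
    """Run the LPA variant from the abstract: each vertex adopts the
    majority label in its closed neighborhood; round 1 breaks ties
    towards the smaller label, later rounds uniformly at random; stop
    when no label changes between consecutive rounds.
    """
    rng = random.Random(seed)
    n = len(adj)
    labels = list(range(n))          # distinct initial labels
    first_round = True
    while True:
        new = labels[:]
        for v in range(n):
            counts = {}
            for lab in [labels[v]] + [labels[u] for u in adj[v]]:
                counts[lab] = counts.get(lab, 0) + 1
            best = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == best]
            new[v] = min(tied) if first_round else rng.choice(tied)
        if new == labels:
            return labels
        labels, first_round = new, False
```

On the complete graph every first-round tie is broken towards the smallest label, so the run ends in consensus on label 0, a miniature of the $np \gg n^{2/3}$ regime in which the smallest label survives.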