Suppose a codeword $x$, belonging to an $(\ell-1)$-deletion-correcting code of length $n$, is transmitted over a $t$-deletion channel for some $1\le \ell\le t<n$. In 2001, Levenshtein proposed the problem of determining $N(n,\ell,t)+1$, the minimum number of distinct channel outputs required to uniquely reconstruct $x$. Prior to this work, $N(n,\ell,t)$ was known only for $\ell\in\{1,2\}$. Here, we provide an asymptotically exact solution for all values of $\ell$ and $t$. Specifically, we show that $N(n,\ell,t)=\frac{\binom{2\ell}{\ell}}{(t-\ell)!}\, n^{t-\ell} - O(n^{t-\ell-1})$, and in the special instance where $\ell=t$, we show that $N(n,\ell,\ell)=\binom{2\ell}{\ell}$. We also provide a conjecture on the exact value of $N(n,\ell,t)$ for all values of $n$, $\ell$, and $t$.
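To make the reconstruction quantity concrete, the sketch below (a minimal illustration, not the paper's method) enumerates the distinct outputs of a $t$-deletion channel for a given word; roughly speaking, the number of such outputs shared by two distinct codewords is what governs $N(n,\ell,t)$. The example strings and $t$ are illustrative choices.

```python
from itertools import combinations

def deletion_ball(x, t):
    """All distinct words obtainable from x by deleting exactly t symbols."""
    n = len(x)
    return {"".join(x[i] for i in range(n) if i not in drop)
            for drop in combinations(range(n), t)}

# Illustrative parameters (not taken from the paper):
x, y, t = "010101", "101010", 2
common = deletion_ball(x, t) & deletion_ball(y, t)
print(len(deletion_ball(x, t)), len(common))
```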
We show that any $n$-bit string can be recovered with high probability from $\exp(\widetilde{O}(n^{1/5}))$ independent random subsequences.
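For intuition only, such random subsequences are commonly modeled by passing the string through an i.i.d. deletion channel; the sketch below generates traces of this kind (the deletion probability is an illustrative assumption, not a parameter taken from the abstract).

```python
import random

def random_trace(x, q=0.5):
    """Keep each bit of x independently with probability 1 - q (i.i.d. deletion channel)."""
    return "".join(b for b in x if random.random() > q)

x = "0110100110010110"
print([random_trace(x) for _ in range(5)])  # five independent random subsequences of x
```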
In this paper, we discuss two-stage encoding algorithms capable of correcting a fraction of asymmetric errors. Suppose that the encoder transmits $n$ binary symbols $(x_1,\ldots,x_n)$ one by one over the Z-channel, in which a 1 is received only if a 1 is transmitted. At some designated moment, say $n_1$, the encoder uses noiseless feedback and adjusts its further encoding strategy based on the partial channel output $(y_1,\ldots,y_{n_1})$. The goal is to transmit as much information as possible error-free, under the assumption that the total number of errors inflicted by the Z-channel is at most $\tau n$, $0<\tau<1$. We propose an encoding strategy that uses a list-decodable code at the first stage and a high-error low-rate code at the second stage. This strategy, together with our converse result, shows that two-stage encoding strategies exhibit a sharp transition from positive rate to zero rate at $\tau=\max\limits_{0<w<1}\frac{w + w^3}{1+4w^3}\approx 0.44$. As side results, we derive bounds on the size of list-decodable codes for the Z-channel and prove that, for a fraction $1/4+\epsilon$ of asymmetric errors, an error-correcting code contains at most $O(\epsilon^{-3/2})$ codewords.
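The threshold value quoted above can be checked numerically; the snippet below is a quick sanity check of the stated formula on a grid (it plays no role in the paper's proof).

```python
import numpy as np

# Maximize f(w) = (w + w^3) / (1 + 4 w^3) over 0 < w < 1 on a fine grid.
w = np.linspace(1e-6, 1 - 1e-6, 1_000_000)
f = (w + w**3) / (1 + 4 * w**3)
i = np.argmax(f)
print(f"max ~ {f[i]:.4f} near w ~ {w[i]:.3f}")  # prints a maximum of about 0.44
```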
This paper tackles two problems that are relevant to coding for insertions and deletions. These problems are motivated by several applications, among them reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels, and the goal of the decoder is to output the transmitted word or a close approximation of it. The first part of this paper studies the deletion channel, which deletes a symbol with some fixed probability $p$, while focusing on two instances of this channel. Since operating the maximum likelihood (ML) decoder in this case is computationally infeasible, we study a slightly degraded version of this decoder for two channels and analyze its expected normalized distance. We identify the dominant error patterns, and based on these observations we derive that the expected normalized distance of the degraded ML decoder is roughly $\frac{3q-1}{q-1}p^2$, when the transmitted word is any $q$-ary sequence and $p$ is the channel's deletion probability. We also study the cases when the transmitted word belongs to the Varshamov-Tenengolts (VT) code or the shifted VT code. Additionally, the insertion channel is studied, as well as the case of two insertion channels. These theoretical results are verified by corresponding simulations. The second part of the paper studies optimal decoding for a special case of the deletion channel, the $k$-deletion channel, which deletes exactly $k$ symbols of the transmitted word uniformly at random. In this part, the goal is to understand how an optimal decoder operates in order to minimize the expected normalized distance. A full characterization of an efficient optimal decoder for this setup, referred to as the maximum likelihood* (ML*) decoder, is given for a channel that deletes one or two symbols.
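As background on the codes mentioned above (standard material, unrelated to the decoders analyzed in the paper), the binary Varshamov-Tenengolts code $VT_a(n)=\{x\in\{0,1\}^n:\sum_{i=1}^{n} i\,x_i\equiv a\pmod{n+1}\}$ corrects a single deletion; a minimal membership check:

```python
from itertools import product

def in_vt_code(x, a=0):
    """Binary Varshamov-Tenengolts check: sum_i i*x_i = a (mod n+1), positions 1..n."""
    n = len(x)
    return sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1) == a

codewords = [x for x in product((0, 1), repeat=6) if in_vt_code(x)]
print(len(codewords))  # number of length-6 words in VT_0(6)
```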
Consider a random graph process with $n$ vertices corresponding to points $v_{i} \sim \mathrm{Unif}[0,1]$ embedded randomly in the interval, and where edges are inserted between $v_{i}, v_{j}$ independently with probability given by the graphon $w(v_{i},v_{j}) \in [0,1]$. Following Chuangpishit et al. (2015), we call a graphon $w$ diagonally increasing if, for each $x$, $w(x,y)$ decreases as $y$ moves away from $x$. We call a permutation $\sigma \in S_{n}$ an ordering of these vertices if $v_{\sigma(i)} < v_{\sigma(j)}$ for all $i < j$, and ask: how can we accurately estimate $\sigma$ from an observed graph? We present a randomized algorithm with output $\hat{\sigma}$ that, for a large class of graphons, achieves error $\max_{1 \leq i \leq n} | \sigma(i) - \hat{\sigma}(i)| = O^{*}(\sqrt{n})$ with high probability; we also show that this is the best possible convergence rate for a large class of algorithms and proof strategies. Under an additional assumption that is satisfied by some popular graphon models, we break this "barrier" at $\sqrt{n}$ and obtain the vastly better rate $O^{*}(n^{\epsilon})$ for any $\epsilon > 0$. These improved seriation bounds can be combined with previous work to give more efficient and accurate algorithms for related tasks, including: estimating diagonally increasing graphons, and testing whether a graphon is diagonally increasing.
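The random graph process in the first sentence is easy to simulate; the sketch below samples one such graph from a diagonally increasing graphon (the particular kernel $w(x,y)=e^{-5|x-y|}$ and the graph size are illustrative choices, not ones singled out by the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
v = rng.uniform(0, 1, size=n)                        # latent positions v_i ~ Unif[0,1]
w = np.exp(-5 * np.abs(v[:, None] - v[None, :]))     # a diagonally increasing graphon
A = (rng.uniform(size=(n, n)) < w).astype(int)       # edge i~j with probability w(v_i, v_j)
A = np.triu(A, 1); A = A + A.T                       # symmetrize, no self-loops
print(A.sum() // 2)                                  # number of edges in the sampled graph
```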
There are many applications of max flow with capacities that depend on one or more parameters. Many of these applications fall into the "Source-Sink Monotone" framework, a special case of Topkis's monotonic optimization framework, which implies that the parametric min cuts are nested. When there is a single parameter, this property implies that the number of distinct min cuts is linear in the number of nodes, which is quite useful for constructing algorithms to identify all possible min cuts. When there are multiple Source-Sink Monotone parameters and the parameter vectors are ordered in the usual componentwise sense, the resulting min cuts are still nested. However, the number of distinct min cuts was an open question. We show that even with only two parameters, the number of distinct min cuts can be exponential in the number of nodes.
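A toy single-parameter instance illustrates the nesting phenomenon described above (the graph, capacities, and parameterization are illustrative assumptions): as $\lambda$ grows, source-adjacent capacities increase and sink-adjacent capacities decrease, and the source side of the min cut only expands.

```python
import networkx as nx

def min_cut_source_side(lam):
    """Min cut of a small Source-Sink Monotone instance for a given parameter lam."""
    G = nx.DiGraph()
    G.add_edge("s", "a", capacity=1.0 + lam)   # source-adjacent: nondecreasing in lam
    G.add_edge("s", "b", capacity=1.2 + lam)
    G.add_edge("a", "b", capacity=0.7)
    G.add_edge("a", "t", capacity=3.1 - lam)   # sink-adjacent: nonincreasing in lam
    G.add_edge("b", "t", capacity=2.6 - lam)
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return sorted(source_side)

for lam in (0.0, 1.0, 2.0):
    print(lam, min_cut_source_side(lam))   # source sides: {s} ⊆ {s,b} ⊆ {s,a,b}
```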
The aim of this thesis is to develop a theoretical framework to study parameter estimation of quantum channels. We study the task of estimating unknown parameters encoded in a channel in the sequential setting. A sequential strategy is the most general way to use a channel multiple times. Our goal is to establish lower bounds (called Cramér-Rao bounds) on the estimation error. The bounds we develop are universally applicable; i.e., they apply to all permissible quantum dynamics. We consider the use of catalysts to enhance the power of a channel estimation strategy. This is termed amortization. The power of a channel for parameter estimation is determined by its Fisher information. Thus, we study how much a catalyst quantum state can enhance the Fisher information of a channel by defining the amortized Fisher information. We establish our bounds by proving that for certain Fisher information quantities, catalyst states do not improve the performance of a sequential estimation protocol compared to a parallel one. The technical term for this is an amortization collapse. We use this to establish bounds when estimating one parameter, or multiple parameters simultaneously. Our bounds apply universally and we also cast them as optimization problems. For the single parameter case, we establish bounds for general quantum channels using both the symmetric logarithmic derivative (SLD) Fisher information and the right logarithmic derivative (RLD) Fisher information. The task of estimating multiple parameters simultaneously is more involved than the single parameter case, because the Cramér-Rao bounds take the form of matrix inequalities. We establish a scalar Cramér-Rao bound for multiparameter channel estimation using the RLD Fisher information. For both single and multiparameter estimation, we provide a no-go condition for the so-called Heisenberg scaling using our RLD-based bound.
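As standard background (not a result of the thesis), the single-parameter quantum Cramér-Rao bound based on the SLD Fisher information reads as follows for $n$ independent probes and an unbiased estimator $\hat{\theta}$:

```latex
% Single-parameter SLD Cramér-Rao bound (standard background material):
\[
  \operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{n\, F_Q(\theta)},
  \qquad
  F_Q(\theta) = \operatorname{Tr}\!\left[\rho_\theta L_\theta^{2}\right],
  \qquad
  \partial_\theta \rho_\theta = \tfrac{1}{2}\left(L_\theta \rho_\theta + \rho_\theta L_\theta\right),
\]
% where L_\theta is the symmetric logarithmic derivative of the family \rho_\theta.
```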
For a set of points $P \subseteq \mathbb{R}^2$ and a family of regions $\mathcal{F}$, a \emph{local $t$-spanner} of $P$ is a sparse graph $G$ over $P$ such that, for any region $\mathsf{r} \in \mathcal{F}$, the subgraph restricted to $\mathsf{r}$, denoted by $G \cap \mathsf{r} = G_{P \cap \mathsf{r}}$, is a $t$-spanner for all the points of $\mathsf{r} \cap P$. We present algorithms for the construction of local spanners with respect to several families of regions, such as homothets of a convex region. Unfortunately, the number of edges in the resulting graph depends logarithmically on the spread of the input point set. We prove that this dependency cannot be removed, thus settling an open problem raised by Abam and Borouny. We also show improved constructions (with no dependency on the spread) of local spanners for fat triangles and regular $k$-gons. In particular, this improves over the known construction for axis-parallel squares. We also study a somewhat weaker notion of local spanner, where one is allowed to shrink the region a "bit". Any spanner is a weak local spanner if the shrinking is proportional to the diameter. Surprisingly, we show a near-linear-size construction of a weak spanner for axis-parallel rectangles, where the shrinkage is \emph{multiplicative}.
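For context, the classical path-greedy construction of a (global) $t$-spanner of a point set is sketched below; it is standard textbook material and not one of the local-spanner constructions of this paper. The point set and stretch factor are illustrative.

```python
import math
import networkx as nx

def greedy_spanner(points, t=1.5):
    """Path-greedy t-spanner: scan pairs by distance; add an edge only if the
    current graph distance between its endpoints exceeds t times the Euclidean distance."""
    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    for d, i, j in pairs:
        try:
            cur = nx.dijkstra_path_length(G, i, j, weight="weight")
        except nx.NetworkXNoPath:
            cur = math.inf
        if cur > t * d:
            G.add_edge(i, j, weight=d)
    return G

pts = [(0, 0), (1, 0), (2, 1), (0, 2), (3, 3)]
print(greedy_spanner(pts).number_of_edges())
```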
In this paper, we investigate the graphs in which all balls are convex and the groups acting on them geometrically (which we call CB-graphs and CB-groups). These graphs have been introduced and characterized by Soltan and Chepoi (1983) and Farber and Jamison (1987). CB-graphs and CB-groups generalize systolic (alias bridged) and weakly systolic graphs and groups, which play an important role in geometric group theory. We present metric and local-to-global characterizations of CB-graphs. Namely, we characterize CB-graphs $G$ as graphs whose triangle-pentagonal complexes $X(G)$ are simply connected and whose balls of radius at most $3$ are convex. Similarly to systolic and weakly systolic graphs, we prove a dismantlability result for CB-graphs $G$: we show that their squares $G^2$ are dismantlable. This implies that the Rips complexes of CB-graphs are contractible. Finally, we adapt and extend the approach of Januszkiewicz and Swiatkowski (2006) for systolic groups and of Chalopin et al. (2020) for Helly groups, to show that the CB-groups are biautomatic.
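To make the defining property concrete, the following brute-force check (purely illustrative, and far from the structural machinery of the paper) tests whether every ball of a finite connected graph is convex, i.e., whether the metric interval between any two vertices of a ball stays inside that ball.

```python
import networkx as nx

def all_balls_convex(G):
    """True iff every ball B_r(v) of the connected graph G is convex: for all
    x, y in B_r(v), every vertex z on a shortest x-y path lies in B_r(v)."""
    d = dict(nx.all_pairs_shortest_path_length(G))
    V = list(G)
    for v in V:
        for r in range(nx.eccentricity(G, v) + 1):
            ball = {u for u in V if d[v][u] <= r}
            for x in ball:
                for y in ball:
                    interval = {z for z in V if d[x][z] + d[z][y] == d[x][y]}
                    if not interval <= ball:
                        return False
    return True

print(all_balls_convex(nx.cycle_graph(4)))  # False: in C4, balls of radius 1 are not convex
```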
Given a sequence $S$ of length $n$, a letter-duplicated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i\in\Sigma$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. A linear-time algorithm for computing the longest letter-duplicated subsequence (LLDS) of $S$ can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version in which $\Sigma$ is unbounded, each letter appears in $S$ at least 6 times, and all the letters in $\Sigma$ must appear in the solution. We show that the problem is NP-hard (and a further twist of the reduction shows that it does not admit any polynomial-time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, $(\leq 2,1,\leq 3)$-SAT, where each variable appears at most twice positively and exactly once negatively, and each clause contains at most three literals, with some clauses required to contain exactly two literals. (We hope that this technique will serve as a general tool for proving NP-hardness of other tricky sequence problems involving only one sequence, which are much harder than those with at least two input sequences; we apply it successfully at the end of the paper to some further variants of the LLDS problem.) We then show that when each letter appears in $S$ at most 3 times, the problem admits a factor $1.5-O(\frac{1}{n})$ approximation. Finally, we consider the weighted version, where the weight of a block $x_i^{d_i}$ ($d_i\geq 2$) can be any positive function that need not grow with $d_i$. We give a non-trivial $O(n^2)$-time dynamic programming algorithm for this version, i.e., for computing an LD-subsequence of $S$ whose weight is maximized.
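For clarity on the definition, a short checker (illustrative only, not one of the algorithms in the paper) verifies whether a string is letter-duplicated, i.e., decomposes into blocks $x_i^{d_i}$ with $d_i \ge 2$ and adjacent blocks using different letters.

```python
from itertools import groupby

def is_letter_duplicated(s):
    """True iff s equals x1^d1 x2^d2 ... xk^dk with every d_i >= 2; adjacent maximal
    runs automatically use distinct letters, so only the run lengths need checking."""
    runs = [(ch, len(list(grp))) for ch, grp in groupby(s)]
    return len(runs) > 0 and all(d >= 2 for _, d in runs)

print(is_letter_duplicated("aabbbcc"))  # True
print(is_letter_duplicated("aabcc"))    # False: the block of 'b' has length 1
```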
In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been extensive development of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lie on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions in the data, can be avoided altogether.
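As a one-dimensional illustration of the moving least-squares idea underlying the projection (applied here to a noisy function rather than a manifold; the Gaussian weight, bandwidth, and degree are illustrative choices):

```python
import numpy as np

def mls_value(x0, xs, ys, degree=2, h=0.3):
    """Moving least-squares estimate at x0: fit a local polynomial with Gaussian
    weights centered at x0 (weights on squared residuals), then evaluate at x0."""
    w = np.exp(-((xs - x0) / h) ** 2)
    coeffs = np.polyfit(xs, ys, degree, w=np.sqrt(w))
    return np.polyval(coeffs, x0)

rng = np.random.default_rng(1)
xs = np.linspace(0, 1, 200)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(xs.size)  # noisy samples
print(mls_value(0.25, xs, ys))  # close to sin(pi/2) = 1
```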