For an alphabet of size $q$, a Homopolymer Free code (HF code) is an $(n, M, d)_q$ code of length $n$, size $M$ and minimum Hamming distance $d$ in which every codeword is a homopolymer free sequence. For any alphabet, this work derives upper and lower bounds on the maximum size of an HF code using the Sphere Packing bound and the Gilbert-Varshamov bound. Upper and lower bounds on the maximum size of HF codes are also computed for various HF code families. As a special case, upper and lower bounds are obtained on the maximum size of homopolymer free DNA codes.
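The counting behind bounds of this kind can be sketched for small parameters. The snippet below is a minimal illustration under stated assumptions, not the paper's derivation. It assumes a sequence is homopolymer free when no two adjacent symbols are equal, and the helper names are hypothetical. The lower bound is the greedy Gilbert-Varshamov argument restricted to HF words. The upper bound is the generic Hamming (sphere packing) bound, which holds for every $q$-ary code and therefore also for HF codes, although the paper's bounds may be sharper.

```python
from itertools import product
from math import comb

def is_homopolymer_free(seq):
    # Assumption: "homopolymer free" means no two adjacent symbols are equal.
    return all(a != b for a, b in zip(seq, seq[1:]))

def hf_count(n, q):
    # Under the assumption above: q choices for the first symbol, then q - 1
    # choices for each later symbol (anything except its predecessor).
    return q * (q - 1) ** (n - 1) if n >= 1 else 1

def hamming_ball(n, q, r):
    # Number of length-n q-ary words within Hamming distance r of a fixed word.
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def gv_lower_bound(n, d, q):
    # Greedy argument: each chosen HF codeword rules out at most
    # hamming_ball(n, q, d - 1) HF words, so a greedy HF code has at least
    # ceil(|HF| / ball) codewords.
    return -(-hf_count(n, q) // hamming_ball(n, q, d - 1))

def sphere_packing_upper_bound(n, d, q):
    # Generic Hamming bound q^n / V(n, t) with t = floor((d - 1) / 2); it holds
    # for any q-ary code, hence for HF codes too.
    t = (d - 1) // 2
    return q ** n // hamming_ball(n, q, t)
```

For example, with $q=4$ (the DNA alphabet), $n=6$ and $d=3$, the sketch gives a lower bound of 7 and an upper bound of 215. The brute-force `is_homopolymer_free` can be used to confirm `hf_count` on small cases.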
The Independent Cutset problem asks whether a given graph has a set of vertices that is both independent and a cutset. The problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present an $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem, and we show how to compute a minimum independent cutset (if one exists) within the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results concern structural parameterizations of the problem by parameters that are not bounded by a function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem parameterized by each of the following: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify further fixed-parameter tractable and polynomial-time solvable cases.
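The following naive brute-force sketch makes the definitions concrete. It assumes the graph is given as an adjacency dictionary of symmetric neighbour sets, that the input is connected, and that a cutset is a vertex set whose removal leaves at least two vertices in different components. The sketch enumerates candidate sets by increasing size, so it is far slower than the $\mathcal{O}^*(1.4423^{n})$ algorithm and is intended only for checking small instances. The function names are hypothetical.

```python
from itertools import combinations

def is_independent(graph, s):
    # No two vertices of s are adjacent.
    return all(v not in graph[u] for u, v in combinations(s, 2))

def is_disconnected_after_removal(graph, s):
    # Depth-first search in G - s; disconnected if some remaining vertex is unreached.
    remaining = [v for v in graph if v not in s]
    if len(remaining) < 2:
        return False
    start = remaining[0]
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in graph[u]:
            if w not in s and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) < len(remaining)

def is_independent_cutset(graph, s):
    s = set(s)
    return is_independent(graph, s) and is_disconnected_after_removal(graph, s)

def min_independent_cutset(graph):
    # Exhaustive search by increasing size: exponential, for illustration only.
    vertices = list(graph)
    for k in range(1, len(vertices) + 1):
        for s in combinations(vertices, k):
            if is_independent_cutset(graph, s):
                return set(s)
    return None
```

For instance, the middle vertex of a path on three vertices is a minimum independent cutset, whereas a triangle has no independent cutset at all.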
A binary code of blocklength $n$ and codebook size $M$ is called an $(n,M)$ code. We study such codes over memoryless binary symmetric channels (BSCs) under maximum likelihood (ML) decoding. For any $n \geq 2$, some optimal codes among the linear $(n,4)$ codes were explicitly characterized in previous work, but it was unknown whether the optimal linear codes are better than all nonlinear codes. In this paper, we first show that for any $n\geq 2$, there exists an optimal code (among all $(n,4)$ codes) that is either linear or belongs to a subset of nonlinear codes called Class-I codes. We identify all the optimal codes among the linear $(n,4)$ codes for each blocklength $n\geq 2$, including some not previously given in the literature. For every $n$ from $2$ to $300$, all the optimal $(n,4)$ codes are identified. Except for $n=3$, all of them are equivalent to linear codes, and there exist optimal $(3,4)$ codes that are not equivalent to linear codes. Furthermore, we derive a subset of nonlinear codes called Class-II codes and show that for any $n >300$, the set of linear, Class-I and Class-II codes, together with their equivalent codes, contains all the optimal $(n,4)$ codes. Both Class-I and Class-II codes are close to linear codes in the sense that they involve only one type of column that does not appear in linear codes. Our results rely on a new technique for comparing the ML decoding performance of two codes, based on a partition of the entire range of the channel output.
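For small $n$, the ML decoding performance of a code can be computed exactly by summing over all $2^n$ channel outputs. With equiprobable codewords, the probability of correct ML decoding equals $\frac{1}{M}\sum_{y}\max_{m} P(y \mid x_m)$. The sketch below assumes uniform codeword priors and a crossover probability $0<p<1/2$. It is a brute-force check, not the partition-based comparison technique of the paper, and `ml_success_probability` is a hypothetical name.

```python
from itertools import product

def ml_success_probability(code, p):
    # Exact probability of correct ML decoding over a BSC with crossover p,
    # assuming equiprobable codewords. Ties between codewords do not affect
    # the result, because only the maximum likelihood enters the sum.
    n = len(code[0])
    total = 0.0
    for y in product((0, 1), repeat=n):
        best = 0.0
        for x in code:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            best = max(best, p ** d * (1 - p) ** (n - d))
        total += best
    return total / len(code)
```

Evaluating this function for two candidate $(n,4)$ codes at the same $p$ gives a direct, if exponential-time, way to compare them for small blocklengths.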
The Graph Exploration problem asks a searcher to explore an unknown environment. The environment is modeled as a graph, and the searcher, starting at some vertex $s$, needs to visit every vertex. Treasure Hunt problems are a variation of Graph Exploration in which the searcher needs to find a hidden treasure located at some vertex $t$. In these online problems, every online algorithm performs poorly because it has too little knowledge about the instance to react adequately to the requests of the adversary. The impact of a priori knowledge is therefore of interest. In graph problems, one form of a priori knowledge is a map of the graph. We study the graph exploration and treasure hunt problems when the searcher is given an unlabeled map, that is, an isomorphic copy of the graph. We formulate decision variants of both problems by interpreting the online problems as a game between the online algorithm (the searcher) and the adversary; the map, however, cannot be controlled by the adversary. The question is whether the searcher can fully explore the graph, or find the treasure, for all possible decisions of the adversary. We prove that these games are PSPACE-complete, analyzing both the variants that ask only for the existence of a tour through the graph or a path to the treasure and the variants that include costs. Additionally, we analyze the complexity of related problems that relax the path constraint, allowing multiple visits to vertices or edges, or that add constraints, such as requiring specific edges to be visited.
The dichromatic number $\vec{\chi}(G)$ of a digraph $G$ is the least integer $k$ such that the vertex set of $G$ can be partitioned into $k$ sets, each inducing an acyclic digraph. A digraph is $k$-dicritical if $\vec{\chi}(G) = k$ and every proper subgraph $H$ of $G$ satisfies $\vec{\chi}(H) \leq k-1$. We prove various bounds on the minimum number of arcs in a $k$-dicritical digraph, a structural result on $k$-dicritical digraphs, and a result on list-dicolouring. We characterise $3$-dicritical digraphs $G$ with $2|V(G)| + 1$ arcs. For $k \geq 4$, we characterise $k$-dicritical digraphs $G$ on at least $k+1$ vertices and with $(k-1)|V(G)| + k-3$ arcs, generalising a result of Dirac. We prove that, for $k \geq 5$, every $k$-dicritical digraph $G$ has at least $(k-1/2 - 1/(k-1)) |V(G)| - k(1/2 - 1/(k-1))$ arcs, which is the best known lower bound. We prove that the number of connected components induced by the vertices of degree $2(k-1)$ of a $k$-dicritical digraph is at most the number of connected components in the rest of the digraph, generalising a result of Stiebitz. Finally, we generalise a theorem of Thomassen on the list-chromatic number of undirected graphs to the list-dichromatic number of digraphs.
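For very small digraphs, both definitions can be checked directly. The sketch below uses hypothetical function names, and it assumes vertices are hashable labels and arcs are loopless ordered pairs. It computes $\vec{\chi}$ by trying every colouring with $k$ colours and testing each colour class for acyclicity with Kahn's algorithm. It tests $k$-dicriticality by deleting single arcs and single vertices. The running time is exponential, so the code is only a sanity check.

```python
from itertools import product

def is_acyclic(vertices, arcs):
    # Kahn's algorithm on the subdigraph induced by `vertices`.
    vs = set(vertices)
    indeg = {v: 0 for v in vs}
    out = {v: [] for v in vs}
    for u, w in arcs:
        if u in vs and w in vs:
            out[u].append(w)
            indeg[w] += 1
    queue = [v for v in vs if indeg[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for w in out[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(vs)

def dichromatic_number(vertices, arcs):
    # Smallest k such that some k-colouring has only acyclic colour classes.
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for colouring in product(range(k), repeat=len(vertices)):
            classes = [[v for v, c in zip(vertices, colouring) if c == i] for i in range(k)]
            if all(is_acyclic(cls, arcs) for cls in classes):
                return k
    return 0  # digraph with no vertices

def is_dicritical(vertices, arcs, k):
    vertices, arcs = list(vertices), list(arcs)
    if dichromatic_number(vertices, arcs) != k:
        return False
    # Deleting one arc must drop the dichromatic number below k.
    for i in range(len(arcs)):
        if dichromatic_number(vertices, arcs[:i] + arcs[i + 1:]) > k - 1:
            return False
    # Deleting one vertex must also drop it below k.
    for v in vertices:
        sub = [x for x in vertices if x != v]
        sub_arcs = [(a, b) for a, b in arcs if a != v and b != v]
        if dichromatic_number(sub, sub_arcs) > k - 1:
            return False
    return True
```

It suffices to delete single arcs and single vertices, because every proper subgraph of $G$ is contained in some $G-a$ or $G-v$, and $\vec{\chi}$ cannot increase when passing to a subgraph.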
We consider spin systems on general $n$-vertex graphs of unbounded degree and explore the effects of spectral independence on the rate of convergence to equilibrium of global Markov chains. Spectral independence is a novel way of quantifying the decay of correlations in spin system models, which has significantly advanced the study of Markov chains for spin systems. We prove that whenever spectral independence holds, the popular Swendsen--Wang dynamics for the $q$-state ferromagnetic Potts model on graphs of maximum degree $\Delta$, where $\Delta$ is allowed to grow with $n$, converges in $O((\Delta \log n)^c)$ steps where $c > 0$ is a constant independent of $\Delta$ and $n$. We also show a similar mixing time bound for the block dynamics of general spin systems, again assuming that spectral independence holds. Finally, for monotone spin systems such as the Ising model and the hardcore model on bipartite graphs, we show that spectral independence implies that the mixing time of the systematic scan dynamics is $O(\Delta^c \log n)$ for a constant $c>0$ independent of $\Delta$ and $n$. Systematic scan dynamics are widely popular but are notoriously difficult to analyze. Our result implies optimal $O(\log n)$ mixing time bounds for any systematic scan dynamics of the ferromagnetic Ising model on general graphs up to the tree uniqueness threshold. Our main technical contribution is an improved factorization of the entropy functional: this is the common starting point for all our proofs. Specifically, we establish the so-called $k$-partite factorization of entropy with a constant that depends polynomially on the maximum degree of the graph.
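A systematic scan updates the vertices in a fixed order, resampling each spin from its conditional distribution given its neighbours (a heat-bath update). For the ferromagnetic Ising model without external field at inverse temperature $\beta$, this conditional probability is $P(\sigma_v = +1 \mid \sigma_{N(v)}) = 1/(1+e^{-2\beta S_v})$, where $S_v$ is the sum of the neighbouring spins. Below is a minimal sketch of this update rule. The function name, the adjacency-dictionary input and the random initial state are assumptions made for illustration. The code only simulates the chain and says nothing about its mixing time.

```python
import math
import random

def systematic_scan_ising(graph, beta, sweeps, order=None, seed=0):
    # graph: dict mapping each vertex to its set of neighbours.
    rng = random.Random(seed)
    order = list(graph) if order is None else list(order)
    spins = {v: rng.choice((-1, 1)) for v in graph}  # random initial state
    for _ in range(sweeps):
        for v in order:  # one sweep visits every vertex in the fixed order
            s = sum(spins[u] for u in graph[v])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * s))  # heat-bath probability
            spins[v] = 1 if rng.random() < p_plus else -1
    return spins
```

Passing a different `order` gives a different systematic scan. The result quoted above applies to any such fixed order, provided the model satisfies the stated assumptions.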
This paper is a collection of results on combinatorial properties of codes for the Z-channel. A Z-channel with error fraction $\tau$ takes as input a length-$n$ binary codeword and adversarially injects up to $n\tau$ asymmetric errors, i.e., errors that can only flip $1$'s to $0$'s and never $0$'s to $1$'s. It is known that the largest $(L-1)$-list-decodable code for the Z-channel with error fraction $\tau$ has size exponential in $n$ if $\tau$ is below a critical value, which we call the $(L-1)$-list-decoding Plotkin point, and has constant size if $\tau$ is above it. The $(L-1)$-list-decoding Plotkin point is known to be $ L^{-\frac{1}{L-1}} - L^{-\frac{L}{L-1}} $, which equals $1/4$ for unique decoding ($L-1=1$). In this paper, we derive various results on the size of the largest codes above and below the list-decoding Plotkin point. In particular, we show that for any $L-1\ge1$ and any sufficiently small constant $\epsilon>0$, the largest $(L-1)$-list-decodable code $\epsilon$-above the Plotkin point has size $\Theta_L(\epsilon^{-3/2})$. We also derive upper and lower bounds on the exponential size of codes below the list-decoding Plotkin point.
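As a quick check of the formula, note that $L^{-L/(L-1)} = L^{-1/(L-1)}\cdot L^{-1}$, so the Plotkin point factors as $L^{-1/(L-1)}(1-1/L)$. For $L=2$ this gives $\tfrac12\cdot\tfrac12 = \tfrac14$, matching the unique-decoding value. The small helper below evaluates the expression and has a hypothetical name.

```python
def list_decoding_plotkin_point(L):
    # tau* = L^{-1/(L-1)} - L^{-L/(L-1)} for (L-1)-list decoding, L >= 2;
    # equivalently L^{-1/(L-1)} * (1 - 1/L).
    if L < 2:
        raise ValueError("L must be at least 2")
    return L ** (-1 / (L - 1)) - L ** (-L / (L - 1))
```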
The chain graph model admits both undirected and directed edges in one graph, where symmetric conditional dependencies are encoded via undirected edges and asymmetric causal relations are encoded via directed edges. Though frequently encountered in practice, the chain graph model has been largely under-investigated in the literature, possibly due to the lack of identifiability conditions between undirected and directed edges. In this paper, we first establish a set of novel identifiability conditions for the Gaussian chain graph model, exploiting a low rank plus sparse decomposition of the precision matrix. Further, an efficient learning algorithm is built upon the identifiability conditions to fully recover the chain graph structure. A theoretical analysis of the proposed method establishes its asymptotic consistency in recovering the exact chain graph structure. The advantage of the proposed method is also supported by numerical experiments on simulated examples and by a real application to Standard & Poor's 500 index data.
In this paper, we first revisit the graphs with permutation-representation number at most two. While characterizing the class of graphs with permutation-representation number at most three remains an open problem, we show that trees and even cycles belong to this class. In this connection, we give polynomial-time algorithms for obtaining words that represent trees and even cycles permutationally.
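In word-representability, vertices $x$ and $y$ are adjacent exactly when they alternate in the word. A word represents a graph permutationally when it is a concatenation of permutations of the vertex set. The sketch below checks both properties, using hypothetical helper names and assuming single-character vertex labels. For example, `acbcab` is the concatenation of the permutations `acb` and `cab`, and it represents the path with edges $ab$ and $bc$. So this path has permutation-representation number at most two.

```python
from itertools import combinations

def alternate(word, x, y):
    # x and y alternate if, after deleting all other letters, no letter repeats consecutively.
    sub = [c for c in word if c in (x, y)]
    return all(a != b for a, b in zip(sub, sub[1:]))

def represented_graph(word):
    # Edge set of the graph represented by `word`.
    letters = sorted(set(word))
    return {frozenset((x, y)) for x, y in combinations(letters, 2) if alternate(word, x, y)}

def is_permutational(word, vertices):
    # True if `word` splits into consecutive blocks that are permutations of `vertices`.
    n = len(vertices)
    if n == 0 or len(word) % n != 0:
        return False
    return all(set(word[i:i + n]) == set(vertices) for i in range(0, len(word), n))
```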
Consistent weighted least squares estimators are proposed for a wide class of nonparametric regression models with a random regression function, where this real-valued random function of $k$ arguments is assumed to be continuous with probability 1. We obtain explicit upper bounds on the rate of uniform convergence in probability of the new estimators to the unobservable random regression function for both fixed and random designs. In contrast to previous results, these convergence bounds are insensitive to the correlation structure of the $k$-variate design points. As an application, we study the problem of estimating the mean and covariance functions of random fields with additive noise under dense data conditions. The theoretical results are illustrated by simulation examples showing that, in some cases, the new estimators are more accurate than Nadaraya--Watson estimators. An example of processing real data on earthquakes in Japan in 2012--2021 is included.
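For reference, the Nadaraya--Watson estimator used as the comparison baseline is a kernel-weighted average of the responses. The sketch below shows it for a one-dimensional design with a Gaussian kernel and bandwidth `h`. These choices are assumptions made for illustration. This is the baseline only, not the weighted least squares estimator proposed in the paper.

```python
import math

def nadaraya_watson(x0, xs, ys, h):
    # Gaussian-kernel weights centred at the evaluation point x0.
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    # Weighted average of the responses.
    return sum(w * y for w, y in zip(weights, ys)) / total
```

For a $k$-variate design, a product kernel over the $k$ coordinates is a common choice.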
Quantifying treatment effect heterogeneity is a crucial task in many areas of causal inference, e.g., optimal treatment allocation and estimation of subgroup effects. We study the problem of estimating the level sets of the conditional average treatment effect (CATE), identified under the no-unmeasured-confounders assumption. Given a user-specified threshold, the goal is to estimate the set of all units for whom the treatment effect exceeds that threshold. For example, if the cutoff is zero, the estimand is the set of all units who would benefit from receiving treatment. Assigning treatment only to this set is the optimal treatment rule that maximises the mean population outcome. Similarly, cutoffs greater than zero correspond to optimal rules under resource constraints. The level set estimator that we study follows the plug-in principle and consists of simply thresholding a good estimator of the CATE. While many CATE estimators have recently been proposed and analysed, it remains unclear how their properties relate to those of the corresponding level set estimators. Our first goal is thus to fill this gap by deriving the asymptotic properties of level set estimators depending on which estimator of the CATE is used. Next, we identify a minimax optimal estimator in a model where the CATE, the propensity score and the outcome model are Hölder-smooth of varying orders. We consider data generating processes that satisfy a margin condition governing the probability of observing units for whom the CATE is close to the threshold. We investigate the performance of the estimators in simulations and illustrate our methods on a dataset used to study the effects on mortality of laparoscopic vs. open surgery in the treatment of various conditions of the colon.
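The plug-in principle can be sketched in a few lines. The example below assumes a discrete covariate and no unmeasured confounding. Under these assumptions, the CATE in each stratum can be estimated by the difference between the treated and control means. The level set estimate is then every stratum whose estimated CATE exceeds the threshold. In a real application, a flexible CATE estimator would replace the stratified means. The function names are hypothetical.

```python
from collections import defaultdict

def stratified_cate(data):
    # data: iterable of (x, a, y) with discrete covariate x, treatment a in {0, 1}, outcome y.
    # Per stratum: [treated sum, treated count, control sum, control count].
    stats = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for x, a, y in data:
        s = stats[x]
        if a == 1:
            s[0] += y
            s[1] += 1
        else:
            s[2] += y
            s[3] += 1
    # Difference in means, only for strata containing both treated and control units.
    return {x: s[0] / s[1] - s[2] / s[3] for x, s in stats.items() if s[1] and s[3]}

def plug_in_level_set(cate_hat, threshold):
    # Plug-in estimate of {x : CATE(x) > threshold}.
    return {x for x, tau in cate_hat.items() if tau > threshold}
```

With a threshold of zero, the result approximates the set of units who would benefit from treatment.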