A detection system, modeled in a graph, is composed of "detectors" positioned at a subset of vertices in order to uniquely locate an ``intruder" at any vertex. \emph{Identifying codes} use detectors that can sense the presence or absence of an intruder within distance one. We introduce a fault-tolerant identifying code called a \emph{redundant identifying code}, which allows at most one detector to go offline or be removed without disrupting the detection system. In real-world applications, this would be a necessary feature, as it would allow for maintenance on individual components without disabling the entire system. Specifically, we prove that the problem of determining the lowest cardinality of an identifying code for an arbitrary graph is NP-complete, we determine the bounds on the lowest cardinality for special classes of graphs, including trees, cylinders, and cubic graphs.
The main problem in the area of property testing is to understand which graph properties are \emph{testable}, which means that with constantly many queries to any input graph $G$, a tester can decide with good probability whether $G$ satisfies the property, or is far from satisfying the property. Testable properties are well understood in the dense model and in the bounded degree model, but little is known in sparse graph classes when graphs are allowed to have unbounded degree. This is the setting of the \emph{sparse model}. We prove that for any proper minor-closed class $\mathcal{G}$, any monotone property (i.e., any property that is closed under taking subgraphs) is testable for graphs from $\mathcal{G}$ in the sparse model. This extends a result of Czumaj and Sohler (FOCS'19), who proved it for monotone properties with finitely many obstructions. Our result implies for instance that for any integers $k$ and $t$, $k$-colorability of $K_t$-minor free graphs is testable in the sparse model. Elek recently proved that monotone properties of bounded degree graphs from minor-closed classes that are closed under disjoint union can be verified by an approximate proof labeling scheme in constant time. We show again that the assumption of bounded degree can be omitted in his result.
We consider the problem of estimating high-dimensional covariance matrices of $K$-populations or classes in the setting where the sample sizes are comparable to the data dimension. We propose estimating each class covariance matrix as a distinct linear combination of all class sample covariance matrices. This approach is shown to reduce the estimation error when the sample sizes are limited, and the true class covariance matrices share a somewhat similar structure. We develop an effective method for estimating the coefficients in the linear combination that minimize the mean squared error under the general assumption that the samples are drawn from (unspecified) elliptically symmetric distributions possessing finite fourth-order moments. To this end, we utilize the spatial sign covariance matrix, which we show (under rather general conditions) to be an asymptotically unbiased estimator of the normalized covariance matrix as the dimension grows to infinity. We also show how the proposed method can be used in choosing the regularization parameters for multiple target matrices in a single class covariance matrix estimation problem. We assess the proposed method via numerical simulation studies including an application in global minimum variance portfolio optimization using real stock data.
The infinite tumbling block graph is a bipartite graph, where each vertex in one partite set is of degree 3 and each vertex in the other partite set is of degree 6. It is a 2-dimensional array of blocks of seven vertices and nine edges, a planar graph that has 3-D looks. This paper introduces tumbling block graphs and considers various graphical parameters for different classes of infinite and finite tumbling blocks.
In this paper we study the number $r_{bwt}$ of equal-letter runs produced by the Burrows-Wheeler transform ($BWT$) when it is applied to purely morphic finite words, which are words generated by iterating prolongable morphisms. Such a parameter $r_{bwt}$ is very significant since it provides a measure of the performances of the $BWT$, in terms of both compressibility and indexing. In particular, we prove that, when $BWT$ is applied to any purely morphic finite word on a binary alphabet, $r_{bwt}$ is $\mathcal{O}(\log n)$, where $n$ is the length of the word. Moreover, we prove that $r_{bwt}$ is $\Theta(\log n)$ for the binary words generated by a large class of prolongable binary morphisms. These bounds are proved by providing some new structural properties of the \emph{bispecial circular factors} of such words.
The ongoing NIST standardization process has shown that Proof of Knowledge (PoK) based signatures have become an important type of possible post-quantum signatures. Regarding code-based cryptography, the original approach for PoK based signatures is the Stern protocol which allows to prove the knowledge of a small weight vector solving a given instance of the Syndrome Decoding (SD) problem over F2. It features a soundness error equal to 2/3. This protocol was improved a few years later by V\'eron who proposed a variation of the scheme based on the General Syndrome Decoding (GSD) problem which leads to better results in term of communication. A few years later, the AGS protocol introduced a variation of the V\'eron protocol based on Quasi-Cyclic (QC) matrices. The AGS protocol permits to obtain an asymptotic soundness error of 1/2 and an improvement in term of communications. In the present paper, we introduce the Quasi-Cyclic Stern PoK which constitutes an adaptation of the AGS scheme in a SD context, as well as several new optimizations for code-based PoK. Our main optimization on the size of the signature can't be applied to GSD based protocols such as AGS and therefore motivated the design of our new protocol. In addition, we also provide a special soundness proof that is compatible with the use of the Fiat-Shamir transform for 5-round protocols. This approach is valid for our protocol but also for the AGS protocol which was lacking such a proof. We compare our results with existing signatures including the recent code-based signatures based on PoK leveraging the MPC in the head paradigm. In practice, our new protocol is as fast as AGS while reducing its associated signature length by 20%. As a consequence, it constitutes an interesting trade-off between signature length and execution time for the design of a code-based signature relying only on the difficulty of the SD problem.
How can we approximate sparse graphs and sequences of sparse graphs (with unbounded average degree)? We consider convergence in the first $k$ moments of the graph spectrum (equivalent to the numbers of closed $k$-walks) appropriately normalized. We introduce a simple, easy to sample, random graph model that captures the limiting spectra of many sequences of interest, including the sequence of hypercube graphs. The Random Overlapping Communities (ROC) model is specified by a distribution on pairs $(s,q)$, $s \in \mathbb{Z}_+, q \in (0,1]$. A graph on $n$ vertices with average degree $d$ is generated by repeatedly picking pairs $(s,q)$ from the distribution, adding an Erd\H{o}s-R\'{e}nyi random graph of edge density $q$ on a subset of vertices chosen by including each vertex with probability $s/n$, and repeating this process so that the expected degree is $d$. Our proof of convergence to a ROC random graph is based on the Stieltjes moment condition. We also show that the model is an effective approximation for individual graphs. For almost all possible triangle-to-edge and four-cycle-to-edge ratios, there exists a pair $(s,q)$ such that the ROC model with this single community type produces graphs with both desired ratios, a property that cannot be achieved by stochastic block models of bounded description size. Moreover, ROC graphs exhibit an inverse relationship between degree and clustering coefficient, a characteristic of many real-world networks.
Traditional Byzantine Fault Tolerance (BFT) state machine replication protocols assume a partial synchrony model, leading to a design where a leader replica drives the protocol and is replaced after a timeout. Recently, we witnessed a surge of asynchronous BFT protocols that use randomization to remove the assumptions of bounds on message delivery times, making them more resilient to adverse network conditions. However, these protocols still fall short of being practical across a broad range of scenarios due to their cubic communication costs, use of expensive primitives, and overall protocol complexity. In this paper, we present Alea-BFT, the first asynchronous BFT protocol to achieve quadratic communication complexity, allowing it to scale to large networks. Alea-BFT brings the key design insight from classical protocols of concentrating part of the work on a single designated replica, and incorporates this principle in a two stage pipelined design, with an efficient broadcast led by the designated replica followed by an inexpensive binary agreement. We evaluated our prototype implementation across 10 sites in 4 continents, and our results show significant scalability gains from the proposed design.
A class of models that have been widely used are the exponential random graph (ERG) models, which form a comprehensive family of models that include independent and dyadic edge models, Markov random graphs, and many other graph distributions, in addition to allow the inclusion of covariates that can lead to a better fit of the model. Another increasingly popular class of models in statistical network analysis are stochastic block models (SBMs). They can be used for the purpose of grouping nodes into communities or discovering and analyzing a latent structure of a network. The stochastic block model is a generative model for random graphs that tends to produce graphs containing subsets of nodes characterized by being connected to each other, called communities. Many researchers from various areas have been using computational tools to adjust these models without, however, analyzing their suitability for the data of the networks they are studying. The complexity involved in the estimation process and in the goodness-of-fit verification methodologies for these models can be factors that make the analysis of adequacy difficult and a possible discard of one model in favor of another. And it is clear that the results obtained through an inappropriate model can lead the researcher to very wrong conclusions about the phenomenon studied. The purpose of this work is to present a simple methodology, based on Hypothesis Tests, to verify if there is a model specification error for these two cases widely used in the literature to represent complex networks: the ERGM and the SBM. We believe that this tool can be very useful for those who want to use these models in a more careful way, verifying beforehand if the models are suitable for the data under study.
A homomorphism from a graph $G$ to a graph $H$ is an edge-preserving mapping from $V(G)$ to $V(H)$. For a fixed graph $H$, in the list homomorphism problem, denoted by LHom($H$), we are given a graph $G$, whose every vertex $v$ is equipped with a list $L(v) \subseteq V(H)$. We ask if there exists a homomorphism $f$ from $G$ to $H$, in which $f(v) \in L(v)$ for every $v \in V(G)$. Feder, Hell, and Huang [JGT~2003] proved that LHom($H$) is polynomial time-solvable if $H$ is a bi-arc-graph, and NP-complete otherwise. We are interested in the complexity of the LHom($H$) problem in graphs excluding a copy of some fixed graph $F$ as an induced subgraph. It is known that if $F$ is connected and is not a path nor a subdivided claw, then for every non-bi-arc graph the LHom($H$) problem is NP-complete and cannot be solved in subexponential time, unless the ETH fails. We consider the remaining cases for connected graphs $F$. If $F$ is a path, we exhibit a full dichotomy. We define a class called predacious graphs and show that if $H$ is not predacious, then for every fixed $t$ the LHom($H$) problem can be solved in quasi-polynomial time in $P_t$-free graphs. On the other hand, if $H$ is predacious, then there exists $t$, such that LHom($H$) cannot be solved in subexponential time in $P_t$-free graphs. If $F$ is a subdivided claw, we show a full dichotomy in two important cases: for $H$ being irreflexive (i.e., with no loops), and for $H$ being reflexive (i.e., where every vertex has a loop). Unless the ETH fails, for irreflexive $H$ the LHom($H$) problem can be solved in subexponential time in graphs excluding a fixed subdivided claw if and only if $H$ is non-predacious and triangle-free. If $H$ is reflexive, then LHom($H$) cannot be solved in subexponential time whenever $H$ is not a bi-arc graph.
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.