The length function $\ell_q(r,R)$ is the smallest length of a $q$-ary linear code with codimension (redundancy) $r$ and covering radius $R$. In this work, new upper bounds on $\ell_q(tR+1,R)$ are obtained in the following forms: \begin{equation*} \begin{split} &(a)~\ell_q(r,R)\le cq^{(r-R)/R}\cdot\sqrt[R]{\ln q},\quad R\ge3,~r=tR+1,~t\ge1,\\ &\phantom{(a)~} q\text{ is an arbitrary prime power},~c\text{ is independent of }q. \end{split} \end{equation*} \begin{equation*} \begin{split} &(b)~\ell_q(r,R)< 3.43Rq^{(r-R)/R}\cdot\sqrt[R]{\ln q},\quad R\ge3,~r=tR+1,~t\ge1,\\ &\phantom{(b)~} q\text{ is an arbitrary prime power that is large enough}. \end{split} \end{equation*} In the literature, smaller upper bounds are known for $q=(q')^R$ with $q'$ a prime power; when $q$ is an arbitrary prime power, however, the bounds of this paper improve on the known ones. For $t=1$, we use a one-to-one correspondence between $[n,n-(R+1)]_qR$ codes and $(R-1)$-saturating $n$-sets in the projective space $\mathrm{PG}(R,q)$, and propose a new construction of such saturating sets that yields sets of small size. The $[n,n-(R+1)]_qR$ codes obtained by these geometrical methods are then used as starting codes in the lift-constructions (so-called "$q^m$-concatenating constructions") for covering codes, giving infinite families of codes with growing codimension $r=tR+1$, $t\ge1$.
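To illustrate the arithmetic of bound (b) (and only the arithmetic: (b) is guaranteed only for $q$ large enough, so the particular value of $q$ below is purely illustrative), evaluating it at $q=16$, $R=3$, $t=1$ (so $r=4$) gives \begin{equation*} 3.43Rq^{(r-R)/R}\cdot\sqrt[R]{\ln q}=3.43\cdot3\cdot16^{1/3}\cdot\sqrt[3]{\ln 16}\approx 10.29\cdot2.52\cdot1.40\approx36.4. \end{equation*}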
When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This paper answers this question by offering a new asymptotic theory of inference with missing data that is more general than existing theories. Using more powerful tools from real analysis and probability theory than those used in previous research, it proves that as the sample size increases and the extent of missingness decreases, the average log-likelihood function generated by partial data, ignoring the missingness mechanism, will almost surely converge uniformly to the one that would have been generated by complete data; and if the data are Missing at Random, this convergence depends only on the sample size. Thus, inferences from partial data, such as posterior modes, uncertainty estimates, confidence intervals, likelihood ratios, test statistics, and indeed all quantities or features derived from the partial-data log-likelihood function, are consistently estimated: they approximate their complete-data analogues. This adds to previous research, which has only proved the consistency and asymptotic normality of the posterior mode and has developed separate theories for Direct-Likelihood, Bayesian, and Frequentist inference. Practical implications of this result are discussed, and the theory is verified using a previous study of International Human Rights Law.
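As a toy numerical illustration of the convergence claim (a sketch of the phenomenon, not the paper's construction: the normal model, its parameters, and the missingness rate are all assumptions made here for concreteness), the following compares the average log-likelihood computed from MCAR-thinned data with its complete-data analogue:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.0

for n in [100, 1000, 10000]:
    x = rng.normal(mu, sigma, size=n)
    observed = rng.random(n) > 0.3               # MCAR: each value kept independently
    full = norm.logpdf(x, mu, sigma).mean()      # complete-data average log-likelihood
    partial = norm.logpdf(x[observed], mu, sigma).mean()  # partial-data analogue
    print(n, full, partial, abs(full - partial))  # gap shrinks as n grows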
In this paper, we provide a framework for constructing entanglement-assisted quantum error-correcting codes (EAQECCs) from classical additive codes over a finite commutative local Frobenius ring. We derive a formula for the minimum number of entanglement qudits required to construct an EAQECC from a linear code over the ring $\mathbb{Z}_{p^s}$. This is used to obtain an exact expression for the minimum number of entanglement qudits required to construct an EAQECC from an additive code over a Galois ring, which significantly extends known results for EAQECCs over finite fields.
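For intuition about the quantity being generalized: in the binary finite-field case, a well-known formula of this type (due to Wilde and Brun) gives the number of required ebits for a classical code with parity-check matrix $H$ as $\mathrm{rank}(HH^{T})$ over $\mathbb{F}_2$. The sketch below computes it for a small illustrative matrix (the example matrix is an assumption, not taken from the paper):

import numpy as np

def gf2_rank(M):
    # Rank over GF(2) by Gaussian elimination.
    M = M.copy() % 2
    r = 0
    rows, cols = M.shape
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c]), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] ^= M[r]
        r += 1
    return r

H = np.array([[1, 0, 1, 1], [0, 1, 1, 0]], dtype=np.uint8)  # toy parity-check matrix
print(gf2_rank((H @ H.T) % 2))  # ebits needed in the binary field case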
A long line of research on fixed parameter tractability of integer programming culminated in showing that integer programs with n variables and a constraint matrix with dual tree-depth d and largest entry D are solvable in time g(d,D)poly(n) for some function g. However, the dual tree-depth of a constraint matrix is not preserved by row operations: a given integer program can be equivalent to another whose matrix has smaller dual tree-depth, so the parameter does not reflect the geometric structure of the program. We prove that the minimum dual tree-depth of a row-equivalent matrix is equal to the branch-depth of the matroid defined by the columns of the matrix. We design a fixed parameter algorithm for computing the branch-depth of matroids represented over a finite field and a fixed parameter algorithm for computing a row-equivalent matrix with minimum dual tree-depth. Finally, we use these results to obtain an algorithm for integer programming running in time g(d*,D)poly(n), where d* is the branch-depth of the constraint matrix; the branch-depth cannot be replaced by the more permissive notion of branch-width.
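To fix ideas, the dual tree-depth of a matrix is the tree-depth of its dual graph: rows are vertices, and two rows are adjacent when some column is nonzero in both. The brute-force sketch below computes it for tiny matrices; it is exponential-time and purely illustrative, quite unlike the fixed parameter algorithms of the paper:

import numpy as np
from itertools import combinations

def dual_graph(A):
    # Vertices = rows of A; edge when two rows share a nonzero column.
    m = A.shape[0]
    adj = {i: set() for i in range(m)}
    for i, j in combinations(range(m), 2):
        if np.any((A[i] != 0) & (A[j] != 0)):
            adj[i].add(j); adj[j].add(i)
    return adj

def components(vertices, adj):
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend((adj[u] & vertices) - comp)
        seen |= comp; comps.append(comp)
    return comps

def tree_depth(vertices, adj):
    # td(empty) = 0; max over components; else 1 + best vertex removal.
    if not vertices:
        return 0
    comps = components(vertices, adj)
    if len(comps) > 1:
        return max(tree_depth(c, adj) for c in comps)
    return 1 + min(tree_depth(vertices - {v}, adj) for v in vertices)

A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])  # dual graph is a triangle
print(tree_depth(set(range(A.shape[0])), dual_graph(A)))  # 3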
We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines and a coordinator communicates with these machines to compute the output clustering. Our algorithm guarantees a cost approximation factor and a number of communication rounds that depend only on the computational capacity of the coordinator. Moreover, the algorithm includes a built-in stopping mechanism, which allows it to use fewer communication rounds whenever possible. We show both theoretically and empirically that in many natural cases 1-4 rounds indeed suffice. In comparison with the popular k-means|| algorithm, our approach allows exploiting a larger coordinator capacity to obtain a smaller number of rounds. Our experiments show that the k-means cost obtained by the proposed algorithm is usually better than the cost obtained by k-means||, even when the latter is allowed a larger number of rounds. Moreover, the machine running time in our approach is considerably smaller than that of k-means||. Code for running the algorithm and experiments is available at https://github.com/selotape/distributed_k_means.
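For context, a minimal one-round baseline in the same coordinator model looks as follows: each machine sends a small weighted summary of its shard, and the coordinator clusters the union of summaries. This is a generic sketch, not the paper's adaptive multi-round algorithm; summary_size and the use of scikit-learn's KMeans are assumptions made here:

import numpy as np
from sklearn.cluster import KMeans

def one_round_distributed_kmeans(shards, k, summary_size):
    # Each machine summarizes its shard by local centers with point counts;
    # the coordinator then runs weighted k-means on the pooled summaries.
    centers, weights = [], []
    for X in shards:
        km = KMeans(n_clusters=summary_size, n_init=3).fit(X)
        centers.append(km.cluster_centers_)
        weights.append(np.bincount(km.labels_, minlength=summary_size))
    C, w = np.vstack(centers), np.concatenate(weights)
    return KMeans(n_clusters=k, n_init=3).fit(C, sample_weight=w).cluster_centers_

rng = np.random.default_rng(0)
shards = [rng.normal(size=(500, 2)) + rng.normal(scale=5, size=2) for _ in range(4)]
print(one_round_distributed_kmeans(shards, k=4, summary_size=20))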
A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAGs) is that, using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures that satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.
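The basic mechanism behind atomic interventions is that intervening on a vertex v reveals the orientation of every edge incident to v. The toy sketch below applies only this direct rule (Meek-rule propagation and the paper's CBSP machinery are omitted; the example graph is an assumption made here for illustration):

import networkx as nx

def directly_oriented(true_dag, interventions):
    # Toy model: an atomic intervention on v reveals the direction of
    # every edge incident to v; orientation propagation is omitted.
    oriented = set()
    for v in interventions:
        oriented |= {(u, v) for u in true_dag.predecessors(v)}
        oriented |= {(v, u) for u in true_dag.successors(v)}
    return oriented

G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "c")])  # a 3-clique, no v-structures
print(directly_oriented(G, ["b"]))  # orients a-b and b-c, but leaves a-c undirected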
Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made in linking graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate the subject-level network structures, and recover both the population-level and subject-level gene networks. Our framework encourages sparsity of covariate effects on both the mean and the precision matrix. In particular, for the precision matrix we stipulate simultaneous sparsity, i.e., group sparsity on effective covariates and element-wise sparsity on their effects on network edges. We analyze first the case with known mean parameters and then the more challenging case with unknown means depending on external covariates, and in both cases establish the $\ell_2$ convergence rates and the selection consistency of the estimated precision parameters. The utility and efficacy of our proposed method are demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.
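One standard way to impose such simultaneous sparsity is a sparse-group penalty, whose proximal operator composes element-wise soft-thresholding with a group-wise shrinkage. The sketch below shows this operator for a single covariate's vector of edge effects; it illustrates the penalty structure only and is not necessarily the authors' exact estimator:

import numpy as np

def sparse_group_prox(beta, lam1, lam2):
    # Element-wise soft-thresholding: zeroes out individual edge effects.
    b = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    # Group shrinkage: can zero out the covariate's entire effect vector.
    nrm = np.linalg.norm(b)
    return b * max(0.0, 1.0 - lam2 / nrm) if nrm > 0 else b

beta = np.array([0.05, -0.8, 0.3])  # one covariate's effects on three edges
print(sparse_group_prox(beta, lam1=0.1, lam2=0.2))  # first effect is zeroed out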
A toric code, introduced by Hansen to extend the Reed-Solomon code as a $k$-dimensional subspace of $\mathbb{F}_q^n$, is determined by a toric variety or its associated integral convex polytope $P \subseteq [0,q-2]^n$, where $k=|P \cap \mathbb{Z}^n|$ (the number of integer lattice points of $P$). Two parameters determine the quality of a code: the information rate, which measures how much information is contained in a single bit of each codeword, and the relative minimum distance, which measures how many errors can be corrected relative to the number of bits in each codeword. Soprunov and Soprunova defined a good infinite family of codes to be a sequence of codes of unbounded polytope dimension such that neither the corresponding information rates nor the relative minimum distances tend to 0 in the limit. We examine different ways of constructing families of codes by considering polytope operations such as the join and direct sum. In doing so, we give conditions under which no good family can exist, and provide strong evidence that no such good family of codes exists.
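For orientation, the standard one-dimensional instance of this correspondence (a classical example, not a result of the paper) takes $P=[0,k-1]\subseteq[0,q-2]$ and recovers the Reed-Solomon code, with \begin{equation*} n=q-1,\qquad k=|P\cap\mathbb{Z}|,\qquad d=n-k+1,\qquad \text{rate}=\tfrac{k}{n},\qquad \delta=\tfrac{d}{n}. \end{equation*}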
Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best score. Our encoder-decoder model takes observed data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy, and our final output is the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has improved search ability but also allows a flexible score function under the acyclicity constraint.
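One widely used differentiable acyclicity characterization that can serve as such a penalty is $h(A)=\operatorname{tr}(e^{A\circ A})-d$, which vanishes exactly when the weighted adjacency matrix $A$ describes a DAG. Whether this is the paper's exact penalty is not stated here, so treat the sketch as a generic illustration:

import numpy as np
from scipy.linalg import expm

def acyclicity(A):
    # h(A) = tr(exp(A ∘ A)) - d; zero iff the weighted digraph A is acyclic.
    d = A.shape[0]
    return np.trace(expm(A * A)) - d

A_dag = np.array([[0.0, 1.0], [0.0, 0.0]])  # acyclic: h = 0
A_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])  # 2-cycle: h > 0
print(acyclicity(A_dag), acyclicity(A_cyc))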
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
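The core ingredient of such smoothing-based methods is the Gaussian-smoothed surrogate $f_\gamma(x)=\mathbb{E}[f(x+\gamma Z)]$, whose gradient can be estimated from function values alone. Below is a generic single-machine sketch of this estimator (not the full distributed DRS algorithm); the sample count and test function are illustrative choices:

import numpy as np

def smoothed_grad(f, x, gamma, n_samples, rng):
    # Monte Carlo estimate of the gradient of f_gamma(x) = E[f(x + gamma Z)],
    # Z ~ N(0, I), using only evaluations of the possibly non-smooth f.
    g = np.zeros_like(x)
    for _ in range(n_samples):
        z = rng.normal(size=x.shape)
        g += (f(x + gamma * z) - f(x)) / gamma * z
    return g / n_samples

rng = np.random.default_rng(0)
f = lambda x: np.abs(x).sum()          # non-smooth test objective
x = np.array([1.0, -2.0])
print(smoothed_grad(f, x, gamma=0.1, n_samples=5000, rng=rng))  # ~ sign(x)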
In this paper, we study optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(x) \triangleq \sum_{i=1}^{m}f_i(x)$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
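For reference, the centralized building block is Nesterov's accelerated gradient method; roughly speaking, the distributed execution computes dual gradients from local gradients plus multiplications by the interaction matrix, i.e., communication rounds. Here is a minimal generic implementation of the acceleration itself (not of the distributed execution; the quadratic example is an assumption made for illustration):

import numpy as np

def nesterov_agd(grad, x0, L, n_iters):
    # Nesterov's accelerated gradient descent for an L-smooth convex objective.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_next = y - grad(y) / L                        # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + (t - 1.0) / t_next * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: minimize 0.5 * ||A x - b||^2, whose smoothness constant is the
# largest eigenvalue of A^T A.
A = np.array([[2.0, 0.0], [0.0, 1.0]]); b = np.array([1.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()
print(nesterov_agd(grad, np.zeros(2), L, 100))  # ~ [0.5, 1.0]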