Recently, Armstrong, Guzm\'an, and Sing Long (2021) presented an optimal $O(n^2)$ time algorithm for strict circular seriation (also called the recognition of strict quasi-circular Robinson spaces). In this paper, we give a very simple $O(n\log n)$ time algorithm for computing a compatible circular order for strict circular seriation. When the input space is not known to be strict quasi-circular Robinson, our algorithm is complemented by an $O(n^2)$ time verification of the compatibility of the returned order. This algorithm also works for the recognition of other types of strict circular Robinson spaces known in the literature. We also prove that the circular Robinson dissimilarities (which are defined by the existence of compatible orders on one of the two arcs between each pair of points) are exactly the pre-circular Robinson dissimilarities (which are defined by a four-point condition).
The theory of mixed finite element methods for solving different types of elliptic partial differential equations in saddle-point formulation has been well established for many decades. However, this topic has mostly been studied for variational formulations in which the shape pair and the test pair of primal variable and multiplier belong to the same finite-element product space. Whenever these two product spaces are different, the saddle-point problem is asymmetric. It turns out that the conditions on the finite-element product spaces stipulated in the few works on this case may be of limited use in practice. The purpose of this paper is to provide an in-depth analysis of the well-posedness and the uniform stability of asymmetric approximate saddle-point problems, based on the theory of continuous linear operators on Hilbert spaces. Our approach leads to necessary and sufficient conditions for these properties to hold, expressed in a readily exploitable form with fine constants. In particular, standard interpolation theory suffices to estimate the error of a conforming method.
Range Avoidance (AVOID) is a total search problem where, given a Boolean circuit $C\colon\{0,1\}^n\to\{0,1\}^m$, $m>n$, the task is to find a $y\in\{0,1\}^m$ outside the range of $C$. For an integer $k\geq 2$, $\mathrm{NC}^0_k$-AVOID is a special case of AVOID where each output bit of $C$ depends on at most $k$ input bits. While there is a very natural randomized algorithm for AVOID, a deterministic algorithm for the problem would have many interesting consequences. Ren, Santhanam, and Wang (FOCS 2022) and Guruswami, Lyu, and Wang (RANDOM 2022) proved that explicit constructions of functions of high formula complexity, rigid matrices, and optimal linear codes, reduce to $\mathrm{NC}^0_4$-AVOID, thus establishing conditional hardness of the $\mathrm{NC}^0_4$-AVOID problem. On the other hand, $\mathrm{NC}^0_2$-AVOID admits polynomial-time algorithms, leaving the question about the complexity of $\mathrm{NC}^0_3$-AVOID open. We give the first reduction of an explicit construction question to $\mathrm{NC}^0_3$-AVOID. Specifically, we prove that a polynomial-time algorithm (with an $\mathrm{NP}$ oracle) for $\mathrm{NC}^0_3$-AVOID for the case of $m=n+n^{2/3}$ would imply an explicit construction of a rigid matrix, and, thus, a super-linear lower bound on the size of log-depth circuits. We also give deterministic polynomial-time algorithms for all $\mathrm{NC}^0_k$-AVOID problems for $m\geq n^{k-1}/\log(n)$. Prior work required an $\mathrm{NP}$ oracle, and required larger stretch, $m \geq n^{k-1}$.
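The natural randomized algorithm mentioned above can be sketched as follows. Since the range of $C$ contains at most $2^n \leq 2^m/2$ strings, a uniformly random $y\in\{0,1\}^m$ lies outside it with probability at least $1/2$. This is only a minimal illustration under stated assumptions: the circuit is modelled as a Python function on bit tuples, the names `circuit_range` and `random_avoid` are hypothetical, and the membership check is done by brute force over all $2^n$ inputs, which is feasible only for tiny $n$.

```python
import itertools
import random


def circuit_range(C, n):
    """Enumerate the range of C by brute force (exponential in n; illustration only)."""
    return {tuple(C(x)) for x in itertools.product((0, 1), repeat=n)}


def random_avoid(C, n, m, rng=None, max_tries=64):
    """Sample random m-bit strings until one lies outside the range of C.

    Each sample succeeds with probability at least 1/2 when m > n.
    Returns None if all max_tries samples happen to land in the range.
    """
    if rng is None:
        rng = random.Random()
    image = circuit_range(C, n)  # stand-in for a membership test; not efficient
    for _ in range(max_tries):
        y = tuple(rng.randint(0, 1) for _ in range(m))
        if y not in image:
            return y
    return None


# Toy circuit with n = 2, m = 3; every output bit depends on at most 2 input bits.
def toy_circuit(x):
    return (x[0], x[1], x[0] ^ x[1])


y = random_avoid(toy_circuit, n=2, m=3)
```

The difficulty addressed in the abstract is removing the randomness: the sketch above neither certifies its answer efficiently nor runs deterministically.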
We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number $T$ of updates. The multiplicative error is almost the same as in the non-private setting. To do so, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.
For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\varepsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
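For reference, the straightforward $O(dn^2)$-time brute force computation of $\text{CH}(A,B)$ mentioned above can be sketched as follows. This is the naive baseline only, not the near-linear algorithm of the paper; the function names `chamfer_distance` and `manhattan` are hypothetical, and points are assumed to be given as sequences of coordinates.

```python
import math


def manhattan(a, b):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chamfer_distance(A, B, dist=math.dist):
    """Brute-force CH(A, B) = sum over a in A of min over b in B of dist(a, b).

    Evaluates all |A| * |B| pairwise distances, hence O(d n^2) time.
    The default metric is the Euclidean distance (math.dist, Python 3.8+).
    """
    return sum(min(dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (3.0, 0.0)]
ch_euclidean = chamfer_distance(A, B)             # 1 + sqrt(2)
ch_manhattan = chamfer_distance(A, B, manhattan)  # 1 + 2 = 3
```

Note that the definition is not symmetric: in this example `chamfer_distance(B, A)` differs from `chamfer_distance(A, B)` under the Euclidean metric.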
This paper examines the approximation of the log-determinant of large-scale symmetric positive definite matrices. Inspired by the variance reduction technique, we split the approximation of $\log\det(A)$ into two parts. The first part is the trace of the projection of $\log(A)$ onto a suboptimal subspace, while the second is the trace of the projection onto the corresponding orthogonal complement. The stochastic Lanczos quadrature method is used for both approximations. Furthermore, in the construction of the suboptimal subspace, we utilize a projection-cost-preserving sketch to bound the size of the Gaussian random matrix and the dimension of the suboptimal subspace. We provide a rigorous error analysis for our proposed method and explicit lower bounds for its design parameters, offering guidance for practitioners. We conduct numerical experiments to demonstrate our method's effectiveness and illustrate the quality of the derived bounds.
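One way to write the two-part splitting described above, as a hedged illustration in notation that is assumed here rather than taken from the paper: let $Q\in\mathbb{R}^{n\times k}$ have orthonormal columns spanning the chosen subspace and let $P=QQ^{\top}$ be the corresponding orthogonal projector. For symmetric positive definite $A$ we have $\log\det(A)=\operatorname{tr}(\log(A))$, and since $P^2=P$ and the trace is cyclic,
\[
\log\det(A)=\operatorname{tr}\bigl(Q^{\top}\log(A)\,Q\bigr)+\operatorname{tr}\bigl((I-P)\log(A)\,(I-P)\bigr).
\]
Each of the two traces on the right-hand side can then be estimated separately.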
The Independent Cutset problem asks whether there is a set of vertices in a given graph that is both independent and a cutset. This problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present an $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem. We also show how to compute a minimum independent cutset (if any) in the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results are concerned with structural parameterizations for the problem considering parameters that are not bounded by a function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem considering the following parameters: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify more fixed-parameter tractable and polynomial-time solvable cases.
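To make the problem definition concrete, the following minimal sketch checks whether a given vertex set is an independent cutset. It is a verifier only, not one of the algorithms of the paper. It assumes a simple undirected graph stored as a dictionary from each vertex to the set of its neighbours, and it takes a cutset to be a set whose removal leaves a non-empty disconnected graph; the name `is_independent_cutset` is hypothetical.

```python
from collections import deque


def is_independent_cutset(adj, S):
    """Return True if S is independent in the graph adj and G - S is disconnected.

    adj: dict mapping each vertex to the set of its neighbours
         (simple undirected graph, no self-loops).
    """
    S = set(S)
    # Independence: no vertex of S has a neighbour inside S.
    if any(adj[u] & S for u in S):
        return False
    rest = set(adj) - S
    if not rest:
        return False  # removing every vertex does not disconnect anything
    # Breadth-first search in G - S from an arbitrary remaining vertex.
    start = next(iter(rest))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] - S:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # S is a cutset exactly when some remaining vertex was not reached.
    return seen != rest


# The 4-cycle 0-1-2-3-0: the two opposite vertices {0, 2} form an independent cutset.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
has_cut = is_independent_cutset(cycle4, {0, 2})
```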
Under a nonlinear regression model with univariate response, an algorithm for the generation of sequential adaptive designs is studied. At each stage, the current design is augmented by adding $p$ design points, where $p$ is the dimension of the parameter of the model. The augmenting $p$ points are such that, at the current parameter estimate, they constitute the locally D-optimal design within the set of all saturated designs. We focus on two relevant subclasses of nonlinear regression models, which were considered in previous work of the authors on the adaptive Wynn algorithm: firstly, regression models satisfying the `saturated identifiability condition' and, secondly, generalized linear models. Adaptive least squares estimators and adaptive maximum likelihood estimators in the algorithm are shown to be strongly consistent and asymptotically normal, under appropriate assumptions. For both model classes, if a condition of `saturated D-optimality' is satisfied, the almost sure asymptotic D-optimality of the generated design sequence is implied by the strong consistency of the adaptive estimators employed by the algorithm. The condition states that there is a saturated design which is locally D-optimal at the true parameter point (in the class of all designs).
For a graph class $\mathcal{G}$, we define the $\mathcal{G}$-modular cardinality of a graph $G$ as the minimum size of a vertex partition of $G$ into modules that each induce a graph in $\mathcal{G}$. This generalizes other module-based graph parameters such as neighborhood diversity and iterated type partition. Moreover, if $\mathcal{G}$ has bounded modular-width, the W[1]-hardness of a problem in $\mathcal{G}$-modular cardinality implies hardness on modular-width, clique-width, and other related parameters. On the other hand, fixed-parameter tractable (FPT) algorithms in $\mathcal{G}$-modular cardinality may provide new ideas for algorithms using such parameters. Several FPT algorithms based on modular partitions compute a solution table in each module, then combine the tables into a global solution. This works well when each table has a succinct representation, but as we argue, when no such representation exists, the problem is typically W[1]-hard. We illustrate these ideas on the generic $(\alpha, \beta)$-domination problem, which asks for a set of vertices that contains at least a fraction $\alpha$ of the adjacent vertices of each unchosen vertex, plus some (possibly negative) amount $\beta$. This generalizes known domination problems such as Bounded Degree Deletion, $k$-Domination, and $\alpha$-Domination. We show that for graph classes $\mathcal{G}$ that require arbitrarily large solution tables, these problems are W[1]-hard in the $\mathcal{G}$-modular cardinality, whereas they are fixed-parameter tractable when they admit succinct solution tables. This leads to several new positive and negative results for many domination problems parameterized by known and novel structural graph parameters such as clique-width, modular-width, and $cluster$-modular cardinality.
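The $(\alpha,\beta)$-domination condition can be made concrete with a small checker. The sketch below reads the definition above as requiring $|N(v)\cap S| \geq \alpha\,|N(v)| + \beta$ for every vertex $v\notin S$; this reading is an assumption based on the wording of the abstract, and the function name `is_alpha_beta_dominating` and the adjacency-dictionary representation are hypothetical. Under this reading, $\alpha=0,\ \beta=k$ gives $k$-Domination, $\beta=0$ gives $\alpha$-Domination, and $\alpha=1,\ \beta=-d$ gives Bounded Degree Deletion with degree bound $d$.

```python
def is_alpha_beta_dominating(adj, S, alpha, beta):
    """Check |N(v) & S| >= alpha * |N(v)| + beta for every vertex v outside S.

    adj: dict mapping each vertex to the set of its neighbours
         (simple undirected graph).
    """
    S = set(S)
    return all(
        len(adj[v] & S) >= alpha * len(adj[v]) + beta
        for v in adj
        if v not in S
    )


# Star with centre "c" and three leaves.
star = {"c": {1, 2, 3}, 1: {"c"}, 2: {"c"}, 3: {"c"}}

# k-Domination with k = 1 (alpha = 0, beta = 1): the centre alone suffices.
one_dominating = is_alpha_beta_dominating(star, {"c"}, alpha=0, beta=1)

# Bounded Degree Deletion with d = 0 (alpha = 1, beta = 0): deleting the
# centre leaves an edgeless graph.
edgeless_after_deletion = is_alpha_beta_dominating(star, {"c"}, alpha=1, beta=0)
```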
A binary code of blocklength $n$ and codebook size $M$ is called an $(n,M)$ code. We study such codes over memoryless binary symmetric channels (BSCs) with maximum likelihood (ML) decoding. For any $n \geq 2$, some optimal codes among the linear $(n,4)$ codes have been explicitly characterized in a previous study, but it is unknown whether the optimal codes among the linear codes are better than all the nonlinear codes. In this paper, we first show that for any $n\geq 2$, there exists an optimal code (among all the $(n,4)$ codes) that is either linear or in a subset of nonlinear codes, called Class-I codes. We identify all the optimal codes among the linear $(n,4)$ codes for each blocklength $n\geq 2$, and find some that were not given in the literature. For any $n$ from $2$ to $300$, all the optimal $(n,4)$ codes are identified; except for $n=3$, all the optimal $(n,4)$ codes are equivalent to linear codes. There exist optimal $(3,4)$ codes that are not equivalent to linear codes. Furthermore, we derive a subset of nonlinear codes called Class-II codes and justify that for any $n >300$, the set composed of linear, Class-I, and Class-II codes and their equivalent codes contains all the optimal $(n,4)$ codes. Both Class-I and Class-II codes are close to linear codes in the sense that they involve only one type of column that is not included in linear codes. Our results are obtained using a new technique to compare the ML decoding performance of two codes, which is based on a partition of the entire range of the channel output.
In this paper, we study the problems of detection and recovery of hidden submatrices with elevated means inside a large Gaussian random matrix. We consider two different structures for the planted submatrices. In the first model, the planted matrices are disjoint, and their row and column indices can be arbitrary. Inspired by scientific applications, the second model restricts the row and column indices to be consecutive. In the detection problem, under the null hypothesis, the observed matrix is a realization of independent and identically distributed standard normal entries. Under the alternative, there exists a set of hidden submatrices with elevated means inside the same standard normal matrix. Recovery refers to the task of locating the hidden submatrices. For both problems, and for both models, we characterize the statistical and computational barriers by deriving information-theoretic lower bounds, designing and analyzing algorithms matching those bounds, and proving computational lower bounds based on the low-degree polynomials conjecture. In particular, we show that the space of the model parameters (i.e., number of planted submatrices, their dimensions, and elevated mean) can be partitioned into three regions: the impossible regime, where all algorithms fail; the hard regime, where detection or recovery is statistically possible but we give some evidence that polynomial-time algorithms do not exist; and finally the easy regime, where polynomial-time algorithms exist.