Regular functions from infinite words to infinite words can be equivalently specified by MSO-transducers, streaming $\omega$-string transducers, and deterministic two-way transducers with look-ahead. In their one-way restriction, the latter transducers define the class of rational functions. Although regular functions are robustly characterised by several finite-state devices, even the subclass of rational functions contains functions which are not computable (by a Turing machine with infinite input). This paper proposes a decision procedure for the following synthesis problem: given a regular function $f$ (specified by one of the aforementioned transducer models), decide whether $f$ is computable and, if it is, synthesise a Turing machine computing it. For regular functions, we show that computability is equivalent to continuity, and therefore the problem boils down to deciding continuity. We establish a generic characterisation of continuity for functions preserving regular languages under inverse image (such as regular functions). We exploit this characterisation to show the decidability of continuity (and hence computability) of rational and regular functions. For rational functions, we show that this can be done in \textsc{NLogSpace} (it was already known to be in \textsc{PTime} by a result of Prieur). In a similar fashion, we also effectively characterise uniform continuity of regular functions, and relate it to the notion of uniform computability, which offers stronger efficiency guarantees.
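As a concrete illustration (a standard example in this line of work, not taken verbatim from the paper), consider the rational function mapping $w$ to $a^\omega$ if $w$ contains infinitely many $a$'s and to $b^\omega$ otherwise: it is not continuous, hence not computable, since no finite input prefix determines even the first output letter. A minimal Python sketch of the argument on ultimately periodic inputs:
\begin{verbatim}
def f_lasso(u: str, v: str) -> str:
    """f on the ultimately periodic word u . v^omega over {a, b}:
    a^omega iff the input contains infinitely many a's."""
    return "a^omega" if "a" in v else "b^omega"

# Every finite prefix extends to inputs with incompatible outputs,
# so a machine reading w left to right can never emit an output letter.
for prefix in ["", "b", "ba", "baab"]:
    print(repr(prefix), "->", f_lasso(prefix, "a"), "/", f_lasso(prefix, "b"))
\end{verbatim}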
We show that the problem of determining the feasibility of quadratic systems over $\mathbb{C}$, $\mathbb{R}$, and $\mathbb{Z}$ requires exponential time. This separates P and NP over these fields/rings in the BCSS model of computation.
We introduce the problem of finding a satisfying assignment to a CNF formula that must further belong to a prescribed input subspace. Equivalent formulations of the problem include finding a point outside a union of subspaces (the Union-of-Subspace Avoidance (USA) problem) and finding a common zero of a system of polynomials over $\mathbb{F}_2$, each of which is a product of affine forms. We focus on the case of $k$-CNF formulas (the $k$-SUB-SAT problem). Clearly, it is no easier than $k$-SAT, and might be harder. Indeed, via simple reductions we show NP-hardness for $k=2$ and W[1]-hardness parameterized by the co-dimension of the subspace. We also prove that the optimization version Max-2-SUB-SAT is NP-hard to approximate better than the trivial $3/4$ ratio, even on satisfiable instances. On the algorithmic front, we investigate fast exponential algorithms which give non-trivial savings over brute-force algorithms. We give a simple branching algorithm with running time $O^*(1.5^r)$ for 2-SUB-SAT, where $r$ is the dimension of the subspace, and an $O^*(1.4312^n)$-time algorithm, where $n$ is the number of variables. For $k > 2$, while known algorithms for solving a system of degree-$k$ polynomial equations already imply a solution with running time $2^{r(1-1/(2k))}$, we explore a more combinatorial approach. For instance, based on the notion of critical variables, we give an algorithm with running time ${n\choose {\le t}} 2^{n-n/k}$, where $n$ is the number of variables and $t$ is the co-dimension of the subspace. This improves upon the running time of the polynomial-equations approach for small co-dimension. Our algorithm also achieves polynomial space, in contrast to the algebraic approach, which uses exponential space.
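For concreteness, the trivial $O^*(2^r)$ baseline that these algorithms improve upon can be written in a few lines: enumerate the $2^r$ points of the input affine subspace over $\mathbb{F}_2$ and check the formula on each. A sketch, with an illustrative encoding of clauses and basis vectors (not from the paper):
\begin{verbatim}
from itertools import product

def sub_sat_bruteforce(clauses, basis, shift):
    """Find x in the affine subspace shift + span(basis) over F_2 satisfying
    the CNF `clauses` (literal +i / -i = variable i true / false).
    Time 2^r * poly, where r = len(basis)."""
    def satisfies(x):
        return all(any((x[abs(l) - 1] == 1) == (l > 0) for l in c)
                   for c in clauses)
    for coeffs in product([0, 1], repeat=len(basis)):
        x = list(shift)
        for c, b in zip(coeffs, basis):
            if c:
                x = [xi ^ bi for xi, bi in zip(x, b)]
        if satisfies(x):
            return x
    return None

# (x1 or x2) and (not x1 or x3); subspace spanned by (1,1,0) and (0,0,1).
print(sub_sat_bruteforce([[1, 2], [-1, 3]], [[1, 1, 0], [0, 0, 1]], [0, 0, 0]))
\end{verbatim}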
We consider a stationary linear AR($p$) model with unknown mean. The autoregression parameters as well as the distribution function (d.f.) $G$ of the innovations are unknown. The observations contain gross errors (outliers). The distribution of the outliers is unknown and arbitrary, and their intensity is $\gamma n^{-1/2}$ with an unknown $\gamma$, where $n$ is the sample size. The essential problem in such a situation is to test the normality of the innovations. Normality, as is known, ensures the optimality properties of the widely used least squares procedures. To construct and study a Pearson chi-square type test for normality, we estimate the unknown mean and the autoregression parameters. Then, using these estimates, we find the residuals of the autoregression. Based on them, we construct a residual empirical distribution function (r.e.d.f.), which is a counterpart of the (inaccessible) e.d.f. of the autoregression innovations. Our Pearson statistic is a functional of the r.e.d.f. Its asymptotic distributions under the hypothesis and under the local alternatives are determined by the asymptotic behavior of the r.e.d.f. In the present work, we find and substantiate in detail the stochastic expansions of the r.e.d.f. in two situations. In the first, the d.f. $G(x)$ of the innovations does not depend on $n$. We need this result to investigate the test statistic under the hypothesis. In the second situation, $G(x)$ depends on $n$ and has the form of the mixture $G(x) = A_n(x) = (1 - n^{-1/2}) G_0(x) + n^{-1/2} H(x)$. We need this result to study the power of the test under the local alternatives.
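To make the object of study concrete, here is a minimal numerical sketch (illustrative only, not the paper's exact estimators): fit the AR($p$) parameters by least squares after centring, compute the residuals, and form the r.e.d.f. whose asymptotics the paper develops.
\begin{verbatim}
import numpy as np

def residual_edf(x, p):
    x = np.asarray(x, float)
    y = x - x.mean()                           # estimate of the unknown mean
    # Least-squares AR(p): y_t ~ sum_j beta_j * y_{t-j}
    X = np.column_stack([y[p - j - 1 : len(y) - j - 1] for j in range(p)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    resid = y[p:] - X @ beta                   # counterparts of the innovations
    return lambda t: np.mean(resid <= t), resid

rng = np.random.default_rng(0)
eps = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):                       # AR(1), then shift to mean 5
    x[t] = 0.6 * x[t - 1] + eps[t]
x += 5
F_hat, resid = residual_edf(x, p=1)
print(F_hat(0.0))                              # ~ Phi(0) = 0.5 under normality
\end{verbatim}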
In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-$N$ networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter $N$, gradient descent with early stopping achieves minimax-optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window, which leads to more stable gradient paths for sparse recovery.
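A minimal sketch of the setting (illustrative hyperparameters, not the paper's prescriptions): parametrize $w = u^N - v^N$ with a balanced initialization of scale $\alpha$, run gradient descent on the squared loss, and track the recovery error, whose minimum over time is what early stopping aims to capture.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, d, s, N = 100, 200, 5, 3
X = rng.normal(size=(n, d))
w_star = np.zeros(d); w_star[:s] = 1.0
y = X @ w_star + 0.1 * rng.normal(size=n)

alpha, lr, T = 0.1, 1e-3, 20000        # init scale, step size, budget
u = alpha * np.ones(d); v = alpha * np.ones(d)   # w = u^N - v^N starts at 0
best = None
for t in range(T):
    w = u**N - v**N
    grad = X.T @ (X @ w - y) / n       # gradient of the squared loss in w
    u -= lr * grad * N * u**(N - 1)    # chain rule through the parametrization
    v += lr * grad * N * v**(N - 1)
    # Early stopping on held-out data would go here; we just track the error.
    err = np.linalg.norm(w - w_star)
    if best is None or err < best[0]:
        best = (err, t)
print("best recovery error %.3f at iteration %d" % best)
\end{verbatim}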
A graph $G$ is $k$-vertex-critical if $G$ has chromatic number $k$ but every proper induced subgraph of $G$ has chromatic number less than $k$. The study of $k$-vertex-critical graphs for graph classes is an important topic in algorithmic graph theory because if the number of such graphs that are in a given hereditary graph class is finite, then there is a polynomial-time algorithm to decide if a graph in the class is $(k-1)$-colorable. In this paper, we prove that for every fixed integer $k\ge 1$, there are only finitely many $k$-vertex-critical ($P_5$,gem)-free graphs and $(P_5,\overline{P_3+P_2})$-free graphs. To prove the results we use a known structure theorem for ($P_5$,gem)-free graphs combined with properties of $k$-vertex-critical graphs. Moreover, we characterize all $k$-vertex-critical ($P_5$,gem)-free graphs and $(P_5,\overline{P_3+P_2})$-free graphs for $k \in \{4,5\}$ using a computer generation algorithm.
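The definition itself can be checked by brute force on small graphs, in the spirit of the computer generation used for $k \in \{4,5\}$. A naive sketch (exponential time, for illustration only):
\begin{verbatim}
from itertools import product

def chromatic_number(vertices, edges):
    for k in range(1, len(vertices) + 1):
        for col in product(range(k), repeat=len(vertices)):
            c = dict(zip(vertices, col))
            if all(c[u] != c[v] for u, v in edges):
                return k

def is_vertex_critical(vertices, edges, k):
    """G is k-vertex-critical iff chi(G) = k and chi(G - v) < k for all v."""
    if chromatic_number(vertices, edges) != k:
        return False
    for v in vertices:
        sub_v = [u for u in vertices if u != v]
        sub_e = [e for e in edges if v not in e]
        if chromatic_number(sub_v, sub_e) >= k:
            return False
    return True

k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(is_vertex_critical(list(range(4)), k4, 4))    # K4: True
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(is_vertex_critical(list(range(5)), c5, 3))    # odd hole C5: True
\end{verbatim}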
In this short note, we prove an asymptotic expansion for the ratio of the Dirichlet density to the multivariate normal density with the same mean and covariance matrix. The expansion is then used to derive an upper bound on the total variation between the corresponding probability measures and rederive the asymptotic variance of the Dirichlet kernel estimators introduced by Aitchison & Lauder (1985) and studied theoretically in Ouimet (2020). Another potential application related to the asymptotic equivalence between the Gaussian variance regression problem and the Gaussian white noise problem is briefly mentioned but left open for future research.
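As a quick illustration of the moment matching involved (a numerical check, not part of the paper's proofs), the approximating normal shares the Dirichlet mean $\mathbb{E}[X_i] = \alpha_i/\alpha_0$ and covariance $\mathrm{Cov}(X_i, X_j) = (\delta_{ij}\alpha_i\alpha_0 - \alpha_i\alpha_j)/(\alpha_0^2(\alpha_0+1))$, where $\alpha_0 = \sum_i \alpha_i$:
\begin{verbatim}
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])
a0 = alpha.sum()
mean = alpha / a0
cov = (np.diag(alpha) * a0 - np.outer(alpha, alpha)) / (a0**2 * (a0 + 1))

rng = np.random.default_rng(0)
X = rng.dirichlet(alpha, size=200_000)
print(np.abs(X.mean(axis=0) - mean).max())   # ~1e-3 Monte Carlo error
print(np.abs(np.cov(X.T) - cov).max())       # ~1e-3 Monte Carlo error
\end{verbatim}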
An automaton is unambiguous if for every input it has at most one accepting computation. An automaton is $k$-ambiguous (for $k > 0$) if for every input it has at most $k$ accepting computations. An automaton is boundedly ambiguous if it is $k$-ambiguous for some $k \in \mathbb{N}$. An automaton is finitely (respectively, countably) ambiguous if for every input it has at most finitely (respectively, countably) many accepting computations. The degree of ambiguity of a regular language is defined in a natural way. A language is $k$-ambiguous (respectively, boundedly, finitely, countably ambiguous) if it is accepted by a $k$-ambiguous (respectively, boundedly, finitely, countably ambiguous) automaton. Over finite words, every regular language is accepted by a deterministic automaton. Over finite trees, every regular language is accepted by an unambiguous automaton. Over $\omega$-words, every regular language is accepted by an unambiguous B\"uchi automaton and by a deterministic parity automaton. Over infinite trees, Carayol et al. showed that there are ambiguous languages. We show that over infinite trees there is a hierarchy of degrees of ambiguity: for every $k > 1$ there are $k$-ambiguous languages that are not $(k-1)$-ambiguous; and there are finitely (respectively, countably, uncountably) ambiguous languages that are not boundedly (respectively, finitely, countably) ambiguous.
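Over finite words, the degree of ambiguity is easy to compute per input: the number of accepting runs of an NFA obeys a simple dynamic program, and an automaton is $k$-ambiguous iff this count never exceeds $k$. A small sketch (finite words only; the paper's subtleties arise over infinite trees):
\begin{verbatim}
def count_accepting_runs(delta, initial, accepting, word):
    """delta: dict (state, letter) -> set of successor states."""
    runs = {q: 1 for q in initial}            # number of runs ending in q
    for a in word:
        nxt = {}
        for q, c in runs.items():
            for q2 in delta.get((q, a), ()):
                nxt[q2] = nxt.get(q2, 0) + c
        runs = nxt
    return sum(c for q, c in runs.items() if q in accepting)

# A 2-ambiguous NFA: it guesses one of two accepting paths on input "ab".
delta = {("s", "a"): {"p", "q"}, ("p", "b"): {"f"}, ("q", "b"): {"f"}}
print(count_accepting_runs(delta, {"s"}, {"f"}, "ab"))   # 2
\end{verbatim}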
Decision trees have long been recognized as models of choice in sensitive applications where interpretability is of paramount importance. In this paper, we examine the computational ability of Boolean decision trees in deriving, minimizing, and counting sufficient reasons and contrastive explanations. We prove that the set of all sufficient reasons of minimal size for an instance given a decision tree can be exponentially larger than the size of the input (the instance and the decision tree). Therefore, generating the full set of sufficient reasons can be out of reach. In addition, computing a single sufficient reason is not informative enough in general; indeed, two sufficient reasons for the same instance may differ on many features. To deal with this issue and generate synthetic views of the set of all sufficient reasons, we introduce the notions of relevant features and of necessary features, which characterize the (possibly negated) features appearing in at least one or in every sufficient reason, and we show that they can be computed in polynomial time. We also introduce the notion of explanatory importance, which indicates how frequent each (possibly negated) feature is in the set of all sufficient reasons. We show how the explanatory importance of a feature and the number of sufficient reasons can be obtained via a model counting operation, which turns out to be practical in many cases. We also explain how to enumerate sufficient reasons of minimal size. We finally show that, unlike sufficient reasons, the set of all contrastive explanations for an instance given a decision tree can be derived, minimized, and counted in polynomial time.
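For intuition, sufficient reasons of a Boolean classifier can be enumerated by brute force on tiny examples (exponential time, purely illustrative; the paper's methods are far more refined): a sufficient reason for $x$ is a subset-minimal set of features whose values in $x$ force the classifier's output.
\begin{verbatim}
from itertools import chain, combinations, product

def is_sufficient(f, x, S, n):
    """Do the values of x on S force f's output for all completions?"""
    free = [i for i in range(n) if i not in S]
    for bits in product([0, 1], repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        if f(y) != f(x):
            return False
    return True

def sufficient_reasons(f, x, n):
    subsets = chain.from_iterable(combinations(range(n), r)
                                  for r in range(n + 1))
    suff = [set(S) for S in subsets if is_sufficient(f, x, set(S), n)]
    return [S for S in suff if not any(T < S for T in suff)]  # minimal ones

# Decision-tree classifier f(x) = x0 AND (x1 OR x2), instance x = (1, 1, 1).
f = lambda y: y[0] and (y[1] or y[2])
print(sufficient_reasons(f, (1, 1, 1), 3))    # [{0, 1}, {0, 2}]
\end{verbatim}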
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks are typically represented in Euclidean domains. Nevertheless, there is an increasing number of applications in power systems where data are collected from non-Euclidean domains and represented as graph-structured data with high-dimensional features and interdependency among nodes. The complexity of graph-structured data has brought significant challenges to the existing deep neural networks defined in Euclidean domains. Recently, many studies on extending deep neural networks to graph-structured data in power systems have emerged. In this paper, a comprehensive overview of graph neural networks (GNNs) in power systems is presented. Specifically, several classical paradigms of GNN architectures (e.g., graph convolutional networks, graph recurrent neural networks, graph attention networks, graph generative networks, spatial-temporal graph convolutional networks, and hybrid forms of GNNs) are summarized, and key applications in power systems such as fault diagnosis, power prediction, power flow calculation, and data generation are reviewed in detail. Furthermore, the main challenges and research trends concerning the application of GNNs in power systems are discussed.
We consider the task of learning the parameters of a {\em single} component of a mixture model, in the case where we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared to existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
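As a toy illustration of the notion of side information (not the paper's matrix-based algorithm): if a noisy hint about the target component's mean is available, that single Gaussian component can be localized without fitting the full mixture.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
means = np.array([[-4.0, 0.0], [0.0, 4.0], [5.0, 1.0]])
X = np.vstack([m + rng.normal(size=(500, 2)) for m in means])
m_side = means[2] + rng.normal(scale=1.0, size=2)  # noisy hint: component 2

# Keep the samples nearest the hint and re-estimate the component's mean.
d = np.linalg.norm(X - m_side, axis=1)
estimate = X[d < np.quantile(d, 1 / 3)].mean(axis=0)
print(estimate)        # close to means[2], without learning components 0, 1
\end{verbatim}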