The forcing number of a graph with a perfect matching $M$ is the minimum number of edges in $M$ whose endpoints must be deleted so that the remaining graph has a unique perfect matching. This number is of great interest in theoretical chemistry, since it conveys information about the structural properties of several interesting molecules. On the other hand, in bipartite graphs the forcing number corresponds to the famous feedback vertex set problem in digraphs. Determining the complexity of finding the smallest forcing number of a given planar graph remains a wide-open and important question in this area, originally proposed by Afshani, Hatami, and Mahmoodian in 2004. We take a first step towards the resolution of this question by providing an algorithm that determines the set of all possible forcing numbers of an outerplanar graph in polynomial time. This is the first polynomial-time algorithm for this problem on a class of graphs of comparable or greater generality.
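To make the definition concrete, here is a brute-force sketch (an illustration only, not the paper's polynomial-time algorithm): it enumerates all perfect matchings of a small graph and finds the smallest subset of $M$ contained in no other perfect matching, which is exactly the forcing number of $M$.

```python
from itertools import combinations

def perfect_matchings(vertices, edges):
    """Enumerate all perfect matchings of a small graph by brute force."""
    vertices = sorted(vertices)
    if not vertices:
        yield frozenset()
        return
    v = vertices[0]
    for e in edges:
        if v in e:
            rest = [w for w in vertices if w not in e]
            rest_edges = [f for f in edges if not (f & e)]
            for m in perfect_matchings(rest, rest_edges):
                yield m | {e}

def forcing_number(vertices, edges, M):
    """Smallest k such that some k-subset S of M lies in no other perfect matching."""
    pms = list(perfect_matchings(vertices, edges))
    for k in range(len(M) + 1):
        for S in combinations(M, k):
            if sum(set(S) <= pm for pm in pms) == 1:
                return k

# 4-cycle: two perfect matchings; deleting the endpoints of a single edge of M
# leaves a path with a unique perfect matching, so the forcing number is 1.
edges = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]]
M = frozenset({frozenset({1, 2}), frozenset({3, 4})})
print(forcing_number({1, 2, 3, 4}, edges, M))  # → 1
```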
Testing the equality of mean vectors across $g$ different groups plays an important role in many scientific fields. In regular frameworks, likelihood-based statistics under the normality assumption offer a general solution to this task. However, the accuracy of standard asymptotic results is not reliable when the dimension $p$ of the data is large relative to the sample size $n_i$ of each group. We propose here an exact directional test for the equality of $g$ normal mean vectors with identical unknown covariance matrix, provided that $\sum_{i=1}^g n_i \ge p+g+1$. In the case of two groups ($g=2$), the directional test is equivalent to Hotelling's $T^2$ test. In the more general situation where the $g$ independent groups may have different unknown covariance matrices, although exactness does not hold, simulation studies show that the directional test is more accurate than most commonly used likelihood-based solutions. Robustness of the directional approach and its competitors under deviations from multivariate normality is also numerically investigated.
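For context, the two-group benchmark mentioned above is the classical Hotelling $T^2$ statistic. A minimal pure-Python sketch of that statistic for $p = 2$, with a pooled covariance estimate and a hand-coded $2 \times 2$ inverse (this illustrates the classical benchmark, not the directional test itself; the data here are synthetic):

```python
import random

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def hotelling_T2(X, Y):
    """Two-sample Hotelling T² statistic for p = 2 variables,
    assuming a common covariance matrix (pooled estimate)."""
    n1, n2 = len(X), len(Y)
    m1, m2 = mean(X), mean(Y)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    S = [[0.0, 0.0], [0.0, 0.0]]  # pooled covariance accumulator
    for rows, m in ((X, m1), (Y, m2)):
        for r in rows:
            for a in range(2):
                for b in range(2):
                    S[a][b] += (r[a] - m[a]) * (r[b] - m[b])
    S = [[S[a][b] / (n1 + n2 - 2) for b in range(2)] for a in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]
    quad = sum(d[a] * Sinv[a][b] * d[b] for a in range(2) for b in range(2))
    return n1 * n2 / (n1 + n2) * quad

rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
Y = [[rng.gauss(1, 1), rng.gauss(0, 1)] for _ in range(25)]  # first mean shifted by 1
print(hotelling_T2(X, Y))  # well above typical null values, since the means differ
```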
Entropy comparison inequalities are obtained for the differential entropy $h(X+Y)$ of the sum of two independent random vectors $X,Y$, when one is replaced by a Gaussian. For identically distributed random vectors $X,Y$, these are closely related to bounds on the entropic doubling constant, which quantifies the entropy increase when adding an independent copy of a random vector to itself. Consequences of both large and small doubling are explored. For the former, lower bounds are deduced on the entropy increase when adding an independent Gaussian, while for the latter, a qualitative stability result for the entropy power inequality is obtained. In the more general case of non-identically distributed random vectors $X,Y$, a Gaussian comparison inequality with interesting implications for channel coding is established: For additive-noise channels with a power constraint, Gaussian codebooks come within a $\frac{{\sf snr}}{3{\sf snr}+2}$ factor of capacity. In the low-SNR regime this improves the half-a-bit additive bound of Zamir and Erez (2004). Analogous results are obtained for additive-noise multiple access channels, and for linear, additive-noise MIMO channels.
In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.
Let $T$ be a tree on $t$ vertices. We prove that for every positive integer $k$ and every graph $G$, either $G$ contains $k$ pairwise vertex-disjoint subgraphs each having a $T$ minor, or there exists a set $X$ of at most $t(k-1)$ vertices of $G$ such that $G-X$ has no $T$ minor. The bound on the size of $X$ is best possible and improves on an earlier $f(t)k$ bound proved by Fiorini, Joret, and Wood (2013) with some very fast growing function $f(t)$. Moreover, our proof is very short and simple.
We consider Gibbs distributions, which are families of probability distributions over a discrete space $\Omega$ with probability mass function of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H( \omega ) \in \{0 \} \cup [1, n]$. The partition function is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters of these distributions are the log partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the counts $c_x = |H^{-1}(x)|$. These are correlated with system parameters in a number of physical applications and sampling algorithms. Our first main result is an algorithm that estimates the counts $c_x$ using roughly $\tilde O( \frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{\varepsilon^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we also develop algorithms to compute the partition function $Z$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and using $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions.
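A toy illustration of sampling-based partition-ratio estimation, using the standard importance-sampling identity $\mathbb{E}_{\mu_{\beta_1}}[e^{(\beta_2-\beta_1)H}] = Z(\beta_2)/Z(\beta_1)$ (a generic idea behind such algorithms, not the paper's algorithm). For the toy model $H(\omega) = |\omega|$ on subsets of an $n$-element set, $Z(\beta) = (1+e^\beta)^n$ in closed form, so the estimate can be checked:

```python
import math
import random

n = 10  # Ω = subsets of an n-element set, H(ω) = |ω| (integer-valued)

def Z(beta):
    # For this toy model, Z(β) = Σ_k C(n,k) e^{βk} = (1 + e^β)^n exactly.
    return (1.0 + math.exp(beta)) ** n

def sample_H(beta, rng):
    """Exact sample of H under μ_β: include each element independently
    with probability e^β / (1 + e^β)."""
    p = math.exp(beta) / (1.0 + math.exp(beta))
    return sum(rng.random() < p for _ in range(n))

def ratio_estimate(b1, b2, num_samples, rng):
    """Monte Carlo estimate of Z(b2)/Z(b1) from samples at β = b1."""
    return sum(math.exp((b2 - b1) * sample_H(b1, rng))
               for _ in range(num_samples)) / num_samples

rng = random.Random(0)
est = ratio_estimate(0.0, 0.2, 200_000, rng)
exact = Z(0.2) / Z(0.0)
print(est, exact)  # the two values agree to within a few percent
```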
The sparsity-ranked lasso (SRL) has been developed for model selection and estimation in the presence of interactions and polynomials. The main tenet of the SRL is that an algorithm should be more skeptical of higher-order polynomials and interactions *a priori* compared to main effects, and hence the inclusion of these more complex terms should require a higher level of evidence. In time series, the same idea of ranked prior skepticism can be applied to the possibly seasonal autoregressive (AR) structure of the series during the model fitting process, becoming especially useful in settings with uncertain or multiple modes of seasonality. The SRL can naturally incorporate exogenous variables, with streamlined options for inference and/or feature selection. The fitting process is quick even for large series with a high-dimensional feature set. In this work, we discuss both the formulation of this procedure and the software we have developed for its implementation via the **fastTS** R package. We explore the performance of our SRL-based approach in a novel application involving the autoregressive modeling of hourly emergency room arrivals at the University of Iowa Hospitals and Clinics. We find that the SRL is considerably faster than its competitors, while producing more accurate predictions.
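The ranked-penalty idea can be sketched with a weighted lasso fitted by coordinate descent. This is a generic illustration in Python with synthetic data (the fastTS package itself is in R, and all names below are hypothetical): the interaction term carries a larger penalty weight, so it needs more evidence to enter the model than the main effects.

```python
import random

def soft(z, t):
    """Soft-thresholding operator."""
    return z - t if z > t else z + t if z < -t else 0.0

def weighted_lasso(X, y, lam, w, iters=200):
    """Coordinate descent for ½||y - Xβ||² + lam · Σ_j w_j |β_j|.
    A larger weight w_j encodes more prior skepticism toward feature j."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual leaving feature j out
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            zj = sum(X[i][j] * r[i] for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft(zj, lam * w[j]) / norm
    return beta

rng = random.Random(0)
x1 = [rng.random() for _ in range(20)]
x2 = [rng.random() for _ in range(20)]
X = [[x1[i], x2[i], x1[i] * x2[i]] for i in range(20)]  # main effects + interaction
y = [2 * x1[i] + 0.1 * rng.gauss(0, 1) for i in range(20)]
beta = weighted_lasso(X, y, lam=0.2, w=[1.0, 1.0, 2.0])  # interaction penalized twice as hard
print(beta)  # the x1 main effect survives; the heavily penalized interaction shrinks toward 0
```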
We continue the study of balanceable graphs, defined by Caro, Hansberg, and Montejano in 2021 as graphs $G$ such that any $2$-coloring of the edges of a sufficiently large complete graph containing sufficiently many edges of each color contains a balanced copy of $G$. While the problem of recognizing balanceable graphs was conjectured to be NP-complete by Dailly, Hansberg, and Ventura in 2021, balanceable graphs admit an elegant combinatorial characterization: a graph is balanceable if and only if there exist two vertex subsets, one containing half of all the graph's edges and another one such that the corresponding cut contains half of all the graph's edges. We consider a special case of this property, namely when one of the two sets is a vertex cover, and call the corresponding graphs simply balanceable. We prove a number of results on balanceable and simply balanceable regular graphs. First, we characterize simply balanceable regular graphs via a condition involving the independence number of the graph. Second, we address a question of Dailly, Hansberg, and Ventura from 2021 and show that every cubic graph is balanceable. Third, using Brooks' theorem, we show that every $4$-regular graph with order divisible by $4$ is balanceable. Finally, we show that it is NP-complete to determine if a $9$-regular graph is simply balanceable.
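The characterization can be checked by brute force on a small example. In this sketch, "containing half of all the graph's edges" is read as the induced subgraph having exactly $m/2$ edges, and the check is restricted to graphs with an even number of edges (both are assumptions of this illustration):

```python
from itertools import combinations

def balance_sets_exist(vertices, edges):
    """Check the characterization for a graph with an even number m of edges:
    some vertex set must induce exactly m/2 edges, and some vertex set must
    have a cut of exactly m/2 edges."""
    m = len(edges)
    if m % 2:
        return False
    half = m // 2
    subsets = [set(c) for k in range(len(vertices) + 1)
               for c in combinations(sorted(vertices), k)]
    induced_ok = any(sum(u in S and v in S for u, v in edges) == half
                     for S in subsets)
    cut_ok = any(sum((u in S) != (v in S) for u, v in edges) == half
                 for S in subsets)
    return induced_ok and cut_ok

# K4 is cubic with 6 edges: a triangle induces 3 edges, and a single vertex
# gives a cut of 3 edges, consistent with every cubic graph being balanceable.
K4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(balance_sets_exist({1, 2, 3, 4}, K4))  # → True
```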
To date, most methods for simulating conditioned diffusions are limited to the Euclidean setting. The conditioned process can be constructed using a change of measure known as Doob's $h$-transform. The specific type of conditioning depends on a function $h$ which is typically unknown in closed form. To resolve this, we extend the notion of guided processes to a manifold $M$, where one replaces $h$ by a function based on the heat kernel on $M$. We consider the case of a Brownian motion with drift, constructed using the frame bundle of $M$, conditioned to hit a point $x_T$ at time $T$. We prove equivalence of the laws of the conditioned process and the guided process with a tractable Radon-Nikodym derivative. Subsequently, we show how one can obtain guided processes on any manifold $N$ that is diffeomorphic to $M$ without assuming knowledge of the heat kernel on $N$. We illustrate our results with numerical simulations and an example of parameter estimation where a diffusion process on the torus is observed discretely in time.
We investigate the ratio $\avM(G)$ of the average size of a maximal matching to the size of a maximum matching in a graph $G$. If many maximal matchings have a size close to $\maxM(G)$, this graph invariant has a value close to 1. Conversely, if many maximal matchings have a small size, $\avM(G)$ approaches $\frac{1}{2}$. We propose a general technique to determine the asymptotic behavior of $\avM(G)$ for various classes of graphs. To illustrate the use of this technique, we first show how it makes it possible to find known asymptotic values of $\avM(G)$ which were typically obtained using generating functions, and we then determine the asymptotic value of $\avM(G)$ for other families of graphs, highlighting the spectrum of possible values of this graph invariant between $\frac{1}{2}$ and $1$.
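For small graphs, this invariant can be computed exactly by enumerating all maximal matchings (a brute-force illustration only; the asymptotic technique of the paper is not reproduced here):

```python
from itertools import combinations

def maximal_matchings(edge_list):
    """All maximal (non-extendable) matchings of a small graph, by brute force."""
    edges = [frozenset(e) for e in edge_list]
    found = []
    for k in range(len(edges) + 1):
        for cand in combinations(edges, k):
            used = set().union(*cand)
            if len(used) != 2 * k:  # edges not pairwise disjoint: not a matching
                continue
            if any(not (e & used) for e in edges if e not in cand):
                continue            # an edge could still be added: not maximal
            found.append(cand)
    return found

def avM_ratio(edge_list):
    """Average maximal matching size divided by the maximum matching size."""
    sizes = [len(m) for m in maximal_matchings(edge_list)]
    return (sum(sizes) / len(sizes)) / max(sizes)

# Path on 4 vertices: the maximal matchings are {12, 34} (size 2) and {23}
# (size 1), so the ratio is ((2 + 1) / 2) / 2 = 0.75.
print(avM_ratio([(1, 2), (2, 3), (3, 4)]))  # → 0.75
```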
Given a square pencil $A+ \lambda B$, where $A$ and $B$ are $n\times n$ complex (resp. real) matrices, we consider the problem of finding the singular complex (resp. real) pencil nearest to it in the Frobenius distance. This problem is known to be very difficult, and the few algorithms available in the literature can only deal efficiently with pencils of very small size. We show that the problem is equivalent to minimizing a certain objective function $f$ over the Riemannian manifold $SU(n) \times SU(n)$ (resp. $SO(n) \times SO(n)$ if the nearest real singular pencil is sought), where $SU(n)$ denotes the special unitary group (resp. $SO(n)$ denotes the special orthogonal group). This novel perspective is based on the generalized Schur form of pencils, and yields competitive numerical methods, by pairing it with algorithms capable of doing optimization on Riemannian manifolds. We propose one algorithm that directly minimizes the (almost everywhere, but not everywhere, differentiable) function $f$, as well as a smoothed alternative and a third algorithm that is smooth and can also solve the problem of finding a nearest singular pencil with a specified minimal index. We provide numerical experiments that show that the resulting methods allow us to deal with pencils of much larger size than alternative techniques, yielding candidate minimizers of comparable or better quality. In the course of our analysis, we also obtain a number of new theoretical results related to the generalized Schur form of a (regular or singular) square pencil and to the minimal index of a singular square pencil whose nullity is $1$.