We treat the problem of evaluating the Frobenius distance from a given matrix $ A \in \mathbb R^{n\times n} $ with distinct eigenvalues to the manifold of matrices with multiple eigenvalues. Restricting our considerations to rank-$1$ real perturbation matrices, we prove that the distance in question equals $ \sqrt{z_{\ast}} $, where $ z_{\ast} $ is a positive (generically, the least positive) zero of the algebraic equation $$ \mathcal F(z) = 0, \ \mbox{where} \ \mathcal F(z):= \mathcal D_{\lambda} \left( \det \left[ (\lambda I - A)(\lambda I - A^{\top})-z I_n \right] \right)/z^n $$ and $ \mathcal D_{\lambda} $ stands for the discriminant of the polynomial treated with respect to $\lambda $. Within this framework we also provide a procedure for finding the matrix with a multiple eigenvalue that is nearest to $ A $. A generalization of the problem to the case of complex perturbations is also discussed. Several examples are presented to clarify the computational aspects of the approach.
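For illustration, the formula above can be evaluated directly with a computer algebra system. The following is a minimal sketch (not taken from the paper) that, assuming the least positive real root of $\mathcal F$ is the relevant one, computes $\mathcal F(z)$ for a small arbitrary example matrix and reports the candidate distance $\sqrt{z_\ast}$.

```python
# Sketch only: evaluate the distance formula from the abstract for a small example matrix.
import sympy as sp

lam, z = sp.symbols('lambda z')
A = sp.Matrix([[1, 2], [0, 3]])           # arbitrary example with distinct eigenvalues
n = A.shape[0]
I = sp.eye(n)

M = (lam * I - A) * (lam * I - A.T) - z * I   # (lambda I - A)(lambda I - A^T) - z I
disc = sp.discriminant(M.det(), lam)          # discriminant with respect to lambda
F = sp.cancel(disc / z**n)                    # F(z) as defined above

roots = sp.Poly(F, z).nroots()
z_star = min(r for r in roots if r.is_real and r > 0)
print(sp.sqrt(z_star))                        # candidate Frobenius distance
```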
The majorizing measure theorem of Fernique and Talagrand is a fundamental result in the theory of random processes. It relates the boundedness of random processes indexed by elements of a metric space to complexity measures arising from certain multiscale combinatorial structures, such as packing and covering trees. This paper builds on the ideas first outlined in a little-noticed preprint of Andreas Maurer to present an information-theoretic perspective on the majorizing measure theorem, according to which the boundedness of random processes is phrased in terms of the existence of efficient variable-length codes for the elements of the indexing metric space.
The Taylor expansion, which stems from Linear Logic and its differential extensions, is an approximation framework for the $\lambda$-calculus (and many of its variants). The reduction of the approximants of a $\lambda$-term induces a reduction on the $\lambda$-term itself, which enjoys a simulation property: whenever a term reduces to another, the approximants reduce accordingly. In recent work, we extended this result to an infinitary $\lambda$-calculus (namely, $\Lambda_{\infty}^{001}$). This short paper settles the question of whether the converse property also holds: if the approximants of some term reduce to the approximants of another term, is there a $\beta$-reduction between these terms? As we show, this is true for the $\lambda$-calculus, but our proof fails in the infinitary case. We exhibit a counter-example refuting conservativity for $\Lambda_{\infty}^{001}$.
As Machine Learning models are considered for autonomous decisions with significant social impact, the need to understand how these models work is rising rapidly. Explainable Artificial Intelligence (XAI) aims to provide interpretations for the predictions made by Machine Learning models, in order to make the models more trustworthy and transparent for the user. For example, selecting the input variables relevant to the problem directly impacts the model's ability to learn and make accurate predictions, so obtaining information about input importance plays a crucial role when training the model. One of the main XAI techniques for obtaining input variable importance is sensitivity analysis based on partial derivatives. However, the existing literature on this method provides no justification for the aggregation metrics used to retrieve information from the partial derivatives. In this paper, a theoretical framework is proposed to study the sensitivities of ML models using metric techniques. From this metric interpretation, a complete family of new quantitative metrics called $\alpha$-curves is extracted. These $\alpha$-curves provide more detailed information on the importance of the input variables of a machine learning model than existing XAI methods in the literature. We demonstrate the effectiveness of the $\alpha$-curves using synthetic and real datasets, comparing the results against other XAI methods for variable importance and validating the analysis results against the ground truth or information from the literature.
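To make the sensitivity-based viewpoint concrete, here is a minimal numerical sketch. It assumes, purely as an illustration and not as the paper's definition, that a per-variable summary indexed by $\alpha$ is an $\alpha$-generalized mean of the absolute partial derivatives over a data sample; the toy model `f`, the sample `X`, and the helper `alpha_curve` are all hypothetical.

```python
# Illustrative sketch, not the paper's implementation of alpha-curves.
import numpy as np

def f(X):                                    # toy model: strong effect of x1, weak effect of x2
    return X[:, 0] ** 2 + 0.1 * X[:, 1]

def partial_derivatives(f, X, eps=1e-5):     # central finite differences as a stand-in for autodiff
    grads = np.zeros_like(X)
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grads[:, j] = (f(Xp) - f(Xm)) / (2 * eps)
    return grads

def alpha_curve(abs_grads, alphas):          # alpha-generalized mean over the sample, per variable
    return np.stack([np.mean(abs_grads ** a, axis=0) ** (1 / a) for a in alphas])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
curves = alpha_curve(np.abs(partial_derivatives(f, X)), alphas=np.linspace(1, 10, 10))
print(curves)    # one column per input variable, one row per alpha
```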
We present the conditional determinantal point process (DPP) approach to obtain new (mostly Fredholm determinantal) expressions for various eigenvalue statistics in random matrix theory. It is well known that many (especially $\beta=2$) eigenvalue $n$-point correlation functions are given in terms of $n\times n$ determinants, i.e., they are continuous DPPs. We exploit a derived kernel of the conditional DPP which gives the $n$-point correlation function conditioned on the event that some eigenvalues already exist at fixed locations. Using such kernels we obtain new determinantal expressions for the joint densities of the $k$ largest eigenvalues, the probability density function of the $k^\text{th}$ largest eigenvalue, the density of the first eigenvalue spacing, and more. Our formulae are highly amenable to numerical computation and we provide various numerical experiments. Several numerical values that previously required hours of computing time can now be computed in seconds with our expressions, which demonstrates the effectiveness of our approach. We also show that our technique can be applied to efficient sampling of DR paths of the Aztec diamond domino tiling. Further extending the conditional DPP sampling technique, we sample Airy processes from the extended Airy kernel. Additionally, we propose a sampling method for non-Hermitian projection DPPs.
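As background for the conditioning idea, the discrete analogue is standard: a DPP with correlation kernel $K$, conditioned on a given set of items being present, is again determinantal with a Schur-complement kernel. The sketch below illustrates this for a small projection kernel; it is generic DPP background, not the derived kernels of the paper.

```python
# Standard DPP conditioning (reduced Palm kernel) via a Schur complement; illustrative only.
import numpy as np

def conditional_kernel(K, fixed):
    """Correlation kernel of the DPP conditioned on the items in `fixed` being present."""
    fixed = np.asarray(fixed)
    rest = np.setdiff1d(np.arange(K.shape[0]), fixed)
    K_ff = K[np.ix_(fixed, fixed)]
    K_rf = K[np.ix_(rest, fixed)]
    K_fr = K[np.ix_(fixed, rest)]
    K_rr = K[np.ix_(rest, rest)]
    return K_rr - K_rf @ np.linalg.solve(K_ff, K_fr), rest

# Example: a small projection kernel, conditioned on item 0 being present.
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((5, 2)))
K = Q @ Q.T
K_cond, remaining = conditional_kernel(K, [0])
print(remaining, np.diag(K_cond))   # conditional inclusion probabilities of the remaining items
```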
Courcelle's theorem and its adaptations to cliquewidth have shaped the field of exact parameterized algorithms and are widely considered the archetype of algorithmic meta-theorems. In the past decade, there has been growing interest in developing parameterized approximation algorithms for problems which are not captured by Courcelle's theorem and, in particular, are considered not fixed-parameter tractable under the associated widths. We develop a generalization of Courcelle's theorem that yields efficient approximation schemes for any problem that can be captured by an expanded logic we call Blocked CMSO, capable of making logical statements about the sizes of set variables via so-called weight comparisons. The logic controls weight comparisons via the quantifier-alternation depth of the involved variables, allowing full comparisons for zero-alternation variables and limited comparisons for one-alternation variables. We show that the developed framework threads the very needle of tractability: on one hand it can describe a broad range of approximable problems, while on the other hand we show that the restrictions of our logic cannot be relaxed under well-established complexity assumptions. The running time of our approximation scheme is polynomial in $1/\varepsilon$, allowing us to fully interpolate between faster approximate algorithms and slower exact algorithms. This provides a unified framework to explain the tractability landscape of graph problems parameterized by treewidth and cliquewidth, as well as classical non-graph problems such as Subset Sum and Knapsack.
An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
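The linear non-Gaussian case can be illustrated with off-the-shelf ICA. The sketch below, which uses scikit-learn's FastICA and is not the authors' estimator, mixes two Laplacian sources and recovers the mixing directions up to permutation and scale, something that is impossible with Gaussian sources.

```python
# Illustration of identifiability from non-Gaussianity in the linear case (ICA).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))           # non-Gaussian independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])    # mixing matrix
X = S @ A.T                               # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
# Up to permutation and scaling of its columns, ica.mixing_ should match A.
print(np.round(ica.mixing_, 2))
```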
This article introduces new multiplicative updates for nonnegative matrix factorization with the $\beta$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation. Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach leverages a reparametrization of the original problem into the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they can be applied to any $\beta$-divergence (i.e., any value of $\beta$) and that they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods using various datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods obtain solutions of similar quality at convergence (similar objective values) but with significantly reduced CPU times.
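For context, the kind of update such majorization-minimization schemes produce can be sketched with the classical multiplicative update for the activation matrix under the $\beta$-divergence with an $\ell_1$ penalty. This is textbook background (written here for $\beta\in[1,2]$, where no exponent correction is needed), not the scale-invariant reparametrized algorithm proposed in the paper.

```python
# Classical multiplicative update for H in beta-divergence NMF with an l1 penalty; sketch only.
import numpy as np

def mu_update_H(V, W, H, beta=1.0, lambda_=0.1, eps=1e-12):
    WH = W @ H + eps
    numerator = W.T @ (WH ** (beta - 2) * V)
    denominator = W.T @ (WH ** (beta - 1)) + lambda_
    return H * numerator / (denominator + eps)

rng = np.random.default_rng(0)
V = rng.random((20, 30))                       # nonnegative data
W, H = rng.random((20, 5)), rng.random((5, 30))
for _ in range(100):
    H = mu_update_H(V, W, H)                   # dictionary W kept fixed in this toy loop
```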
Many reinforcement learning (RL) applications have combinatorial action spaces, where each action is a composition of sub-actions. A standard RL approach ignores this inherent factorization structure, resulting in a potential failure to make meaningful inferences about rarely observed sub-action combinations; this is particularly problematic for offline settings, where data may be limited. In this work, we propose a form of linear Q-function decomposition induced by factored action spaces. We study the theoretical properties of our approach, identifying scenarios where it is guaranteed to lead to zero bias when used to approximate the Q-function. Outside the regimes with theoretical guarantees, we show that our approach can still be useful because it leads to better sample efficiency without necessarily sacrificing policy optimality, allowing us to achieve a better bias-variance trade-off. Across several offline RL problems using simulators and real-world datasets motivated by healthcare, we demonstrate that incorporating factored action spaces into value-based RL can result in better-performing policies. Our approach can help an agent make more accurate inferences within underexplored regions of the state-action space when applying RL to observational datasets.
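A minimal sketch of the kind of decomposition described above follows; the names and sizes are illustrative. The Q-function over a factored action $a=(a_1,\dots,a_D)$ is approximated by a sum of per-sub-action components, so only $|A_1|+\dots+|A_D|$ values per state are stored instead of $|A_1|\cdots|A_D|$, and greedy maximization becomes separable across sub-actions.

```python
# Illustrative tabular version of a linear Q-function decomposition over factored actions.
import numpy as np

n_states, sub_action_sizes = 10, [3, 4, 2]          # joint action space of size 3*4*2 = 24
Q_components = [np.zeros((n_states, k)) for k in sub_action_sizes]

def q_value(state, action):
    """Factored Q: sum of the per-dimension components for the chosen sub-actions."""
    return sum(Q_d[state, a_d] for Q_d, a_d in zip(Q_components, action))

def greedy_action(state):
    """The linear decomposition makes maximization separable across sub-actions."""
    return tuple(int(np.argmax(Q_d[state])) for Q_d in Q_components)

print(q_value(0, (1, 2, 0)), greedy_action(0))
```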
Optimal model reduction for large-scale linear dynamical systems is studied. In contrast to most existing works, the systems under consideration are not required to be stable, in either discrete or continuous time. As a consequence, the underlying rational transfer functions are allowed to have poles in general domains of the complex plane. In particular, this covers specific conservative partial differential equations such as the linear Schr\"odinger equation and the undamped linear wave equation, whose spectra lie on the imaginary axis. By an appropriate modification of the classical continuous-time Hardy space $\mathcal{H}_2$, a new $\mathcal{H}_2$-like optimal model reduction problem is introduced and first-order optimality conditions are derived. As in the classical $\mathcal{H}_2$ case, these conditions exhibit a rational Hermite interpolation structure, for which an iterative model reduction algorithm is proposed. Numerical examples demonstrate the effectiveness of the new method.
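For orientation, the sketch below shows the classical interpolatory projection step that iterative $\mathcal{H}_2$-type methods build upon; it is generic background, not the modified-Hardy-space algorithm of the paper. A two-sided projection onto rational Krylov subspaces yields a reduced model that interpolates (in fact Hermite-interpolates) the transfer function $H(s)=c^{\top}(sI-A)^{-1}b$ at the chosen points.

```python
# Generic interpolatory model reduction step (illustrative; system and points are arbitrary).
import numpy as np

def interpolatory_rom(A, b, c, sigmas):
    """Two-sided projection interpolating H(s) = c^T (sI - A)^{-1} b at the points in sigmas."""
    n = A.shape[0]
    V = np.column_stack([np.linalg.solve(s * np.eye(n) - A, b) for s in sigmas])
    W = np.column_stack([np.linalg.solve(s * np.eye(n) - A.T, c) for s in sigmas])
    V, _ = np.linalg.qr(V)
    W, _ = np.linalg.qr(W)
    E = W.T @ V
    Ar = np.linalg.solve(E, W.T @ A @ V)
    br = np.linalg.solve(E, W.T @ b)
    cr = V.T @ c
    return Ar, br, cr    # reduced transfer function: cr^T (sI - Ar)^{-1} br

# Toy "undamped" system: a real skew-symmetric A has purely imaginary eigenvalues.
rng = np.random.default_rng(0)
S = rng.standard_normal((50, 50))
A, b, c = S - S.T, rng.standard_normal((50, 1)), rng.standard_normal((50, 1))
Ar, br, cr = interpolatory_rom(A, b, c, sigmas=[0.5, 2.0])

s = 0.5   # check value matching at an interpolation point (agreement up to rounding)
print((c.T @ np.linalg.solve(s * np.eye(50) - A, b)).item(),
      (cr.T @ np.linalg.solve(s * np.eye(2) - Ar, br)).item())
```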
A modification of Newton's method for solving systems of $n$ nonlinear equations is presented. The new matrix-free method relies on a given decomposition of the invertible Jacobian of the residual into invertible sparse local Jacobians according to the chain rule of differentiation. It is motivated in the context of local Jacobians with bandwidth $2m+1$ for $m\ll n$. A reduction of the computational cost by a factor of $\mathcal{O}(\frac{n}{m})$ can be observed. Supporting run time measurements are presented for the tridiagonal case, showing a reduction of the computational cost by a factor of $\mathcal{O}(n)$. Generalization yields the combinatorial Matrix-Free Newton Step problem. We prove its NP-completeness and present several algorithmic components for building methods for its approximate solution. Inspired by adjoint Algorithmic Differentiation, the new method shares several challenges with the latter, including the DAG Reversal problem. Further challenges are due to combinatorial problems in sparse linear algebra such as Bandwidth or Directed Elimination Ordering.
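As background for the structure being exploited, the following sketch (not the paper's matrix-free decomposition method) shows that when the Jacobian of the residual is tridiagonal, each Newton step reduces to a banded solve of $\mathcal{O}(n)$ cost rather than a dense $\mathcal{O}(n^3)$ factorization; the example system is hypothetical.

```python
# Newton's method with a tridiagonal Jacobian solved in O(n) per step; illustrative only.
import numpy as np
from scipy.linalg import solve_banded

def residual(x):                        # a simple tridiagonally coupled nonlinear system
    F = x**3 - 1.0
    F[:-1] += 0.1 * x[1:]
    F[1:] += 0.1 * x[:-1]
    return F

def jacobian_banded(x):                 # Jacobian stored in (3, n) banded form for solve_banded
    n = x.size
    ab = np.zeros((3, n))
    ab[0, 1:] = 0.1                     # superdiagonal
    ab[1, :] = 3.0 * x**2               # main diagonal
    ab[2, :-1] = 0.1                    # subdiagonal
    return ab

x = np.full(1000, 2.0)
for _ in range(20):
    x -= solve_banded((1, 1), jacobian_banded(x), residual(x))
print(np.max(np.abs(residual(x))))      # should be close to zero after convergence
```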