Observability is a fundamental structural property of any dynamic system: it describes the possibility of reconstructing the state that characterizes the system from observations of its inputs and outputs. Despite the considerable effort devoted to studying this property and to introducing analytical criteria for checking whether a dynamic system satisfies it, there is no general analytical criterion for automatically checking state observability when the dynamics are also driven by unknown inputs. This paper provides the general analytical solution of this fundamental problem, often called the unknown input observability problem. Namely, it provides a systematic procedure, based on automatic computation (differentiation and matrix rank determination), that allows us to automatically check state observability even in the presence of unknown inputs (Algorithm 6.1). A first solution of this problem was presented in the second part of the book "Observability: A New Theory Based on the Group of Invariance" [45]. The solution presented in this paper completes that previous solution. In particular, the new solution exhaustively accounts for the systems that do not belong to the category of systems that are "canonic with respect to their unknown inputs". The analytical derivations largely exploit several new concepts and analytical results introduced in [45]. Finally, as a simple consequence of the results obtained here, we also provide the answer to the problem of unknown input reconstruction, which is intimately related to the problem of state observability. We illustrate the implementation of the new algorithm by studying the observability properties of a nonlinear system in the framework of visual-inertial sensor fusion, whose dynamics are driven by two unknown inputs and one known input.
Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, convergence has been observed to slow down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
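A single layer of such a model can be sketched as follows. This is a minimal illustration, assuming the common rotation-based construction (Gaussianize each marginal through its empirical CDF, then rotate to mix coordinates). The function names, the restriction to 2D data, and the random choice of rotation angle are assumptions made for illustration and are not taken from the source.

```python
import math
import random
from statistics import NormalDist


def marginal_gaussianize(column):
    """Map one coordinate to approximately N(0, 1) using its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n lies strictly inside (0, 1), so inv_cdf is finite.
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


def rotate_2d(points, angle):
    """Rotate 2D points by `angle` radians (an orthogonal, invertible map)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def gaussianization_layer(points, angle):
    """One layer: Gaussianize each marginal, then rotate to mix coordinates."""
    xs = marginal_gaussianize([p[0] for p in points])
    ys = marginal_gaussianize([p[1] for p in points])
    return rotate_2d(list(zip(xs, ys)), angle)


def run_layers(points, n_layers, seed=0):
    """Apply several layers with randomly drawn rotation angles (assumed choice)."""
    rng = random.Random(seed)
    for _ in range(n_layers):
        points = gaussianization_layer(points, rng.uniform(0.0, math.pi))
    return points
```

No gradient is needed anywhere in the layer: each step is fitted directly from sorted samples, which is why training without backpropagation is possible. Each marginal step acts on one coordinate at a time, so dependencies between coordinates are only addressed indirectly through the rotations.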
We consider nonlinear delay differential and renewal equations with infinite delay. We extend the work of Gyllenberg et al., Appl. Math. Comput. (2018) by introducing a unifying abstract framework, and we derive a finite-dimensional approximating system via pseudospectral discretization. For renewal equations, we consider a reformulation, obtained via integration, in a space of absolutely continuous functions, which ensures that point evaluation is well defined. We prove the one-to-one correspondence of equilibria between the original equation and its approximation, and that linearization and discretization commute. Our most important result is the proof of convergence of the characteristic roots of the pseudospectral approximation of the linear(ized) equations, which ensures that the finite-dimensional system correctly reproduces the stability properties of the original linear equation if the dimension of the approximation is large enough. This result is illustrated with several numerical tests, which also demonstrate the effectiveness of the approach for the bifurcation analysis of equilibria of nonlinear equations.
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered, and catastrophic overfitting (see Mallinar et al. 2022).
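The definition above can be written in symbols. The notation here is assumed for illustration and is not taken from the source: $\hat f_{\delta}$ denotes the KRR predictor with ridge parameter $\delta \ge 0$ trained on $n$ samples, and $R(\cdot)$ denotes test error.

```latex
\[
  C_n \;=\; \frac{R(\hat f_{0})}{\inf_{\delta \ge 0} R(\hat f_{\delta})} \;\ge\; 1 .
\]
```

The lower bound holds because the ridgeless choice $\delta = 0$ is itself one of the candidates in the infimum, so the optimally tuned error can never exceed the interpolating one.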
Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance performance degradation against sample/computational complexity. In particular, we partition the action space into multiple groups based on the similarity in transition distribution and reward function, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a \emph{surprising and counter-intuitive result}: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the number of samples or the computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, whose computational complexity is independent of the size of the action space.
Many multivariate data sets exhibit a form of positive dependence, which can either appear globally between all variables or only locally within particular subgroups. A popular notion of positive dependence that allows for localized positivity is positive association. In this work we introduce the notion of extremal positive association for multivariate extremes from threshold exceedances. Via a sufficient condition for extremal association, we show that extremal association generalizes extremal tree models. For H\"usler--Reiss distributions the sufficient condition permits a parametric description that we call the metric property. As the parameter of a H\"usler--Reiss distribution is a Euclidean distance matrix, the metric property relates to research in electrical network theory and Euclidean geometry. We show that the metric property can be localized with respect to a graph and study surrogate likelihood inference. This gives rise to a two-step estimation procedure for locally metrical H\"usler--Reiss graphical models. The second step allows for a simple dual problem, which is implemented via a gradient descent algorithm. Finally, we demonstrate our results on simulated and real data.
A well-known boundary observability inequality for the elasticity system establishes that the energy of the system can be estimated from the solution on a sufficiently large part of the boundary for a sufficiently large time. This inequality is relevant in several contexts, such as exact boundary controllability, boundary stabilization, and some inverse source problems. Here we show that a corresponding boundary observability inequality also holds for the spectral collocation approximation of the linear elasticity system in a d-dimensional cube, uniformly with respect to the discretization parameter. This property is essential to prove that natural numerical approaches to the above problems, based on replacing the elasticity system by its collocation discretization, will give successful approximations of the continuous counterparts.
A Low-rank Spectral Optimization Problem (LSOP) minimizes a linear objective subject to multiple two-sided linear matrix inequalities intersected with a low-rank and spectral constrained domain set. Although solving LSOP is, in general, NP-hard, its partial convexification (i.e., replacing the domain set by its convex hull) termed "LSOP-R," is often tractable and yields a high-quality solution. This motivates us to study the strength of LSOP-R. Specifically, we derive rank bounds for any extreme point of the feasible set of LSOP-R and prove their tightness for the domain sets with different matrix spaces. The proposed rank bounds recover two well-known results in the literature from a fresh angle and also allow us to derive sufficient conditions under which the relaxation LSOP-R is equivalent to the original LSOP. To effectively solve LSOP-R, we develop a column generation algorithm with a vector-based convex pricing oracle, coupled with a rank-reduction algorithm, which ensures the output solution satisfies the theoretical rank bound. Finally, we numerically verify the strength of the LSOP-R and the efficacy of the proposed algorithms.
For safety and robustness of AI systems, we introduce topological parallax as a theoretical and computational tool that compares a trained model to a reference dataset to determine whether they have similar multiscale geometric structure. Our proofs and examples show that this geometric similarity between dataset and model is essential to trustworthy interpolation and perturbation, and we conjecture that this new concept will add value to the current debate regarding the unclear relationship between overfitting and generalization in applications of deep learning. In typical DNN applications, an explicit geometric description of the model is impossible, but parallax can estimate topological features (components, cycles, voids, etc.) in the model by examining the effect on the Rips complex of geodesic distortions using the reference dataset. Thus, parallax indicates whether the model shares similar multiscale geometric features with the dataset. In theoretical terms, parallax is formulated via topological data analysis (TDA) as a bi-filtered persistence module, and the key properties of this module are stable under perturbation of the reference dataset.
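To make the simplest of the topological features mentioned above concrete, the sketch below counts connected components of a Rips complex built on a point set at a single scale. It is not the parallax construction itself, which additionally involves geodesic distortions induced by the model. It assumes Euclidean distance and the convention that two points are joined when their distance is at most the scale; the function name `rips_components` is hypothetical.

```python
import math


def rips_components(points, scale):
    """Count connected components of the Rips complex at a given scale.

    Two points are joined by an edge when their Euclidean distance is at
    most `scale`. Components depend only on these edges (the 1-skeleton),
    so higher-dimensional simplices are not needed for this count.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Two well-separated clusters, examined at three scales.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
for scale in (0.5, 1.5, 15.0):
    print(scale, rips_components(cloud, scale))
```

On this example the count drops from 5 to 2 to 1 as the scale grows, which is the kind of multiscale behavior that persistence-based tools track across all scales at once.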
Partial differential equations (PDEs) are ubiquitous in science and engineering. Prior quantum algorithms for solving the system of linear algebraic equations obtained from discretizing a PDE have a computational complexity that scales at least linearly with the condition number $\kappa$ of the matrices involved in the computation. For many practical applications, $\kappa$ scales polynomially with the size $N$ of the matrices, resulting in a polynomial-in-$N$ complexity for these algorithms. Here we present a quantum algorithm with a complexity that is polylogarithmic in $N$ but is independent of $\kappa$ for a large class of PDEs. Our algorithm generates a quantum state that enables extracting features of the solution. Central to our methodology is using a wavelet basis as an auxiliary system of coordinates in which the condition number of the associated matrices is made independent of $N$ by a simple diagonal preconditioner. We present numerical simulations showing the effect of the wavelet preconditioner for several differential equations. Our work could provide a practical way to boost the performance of quantum-simulation algorithms where standard methods are used for discretization.
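The polynomial growth of $\kappa$ with $N$ can be seen in a standard textbook case. The sketch below is an illustration chosen here, not an example from the source: it takes the $N \times N$ finite-difference matrix tridiag$(-1, 2, -1)$ of the 1D Poisson problem with Dirichlet boundary conditions, whose eigenvalues are known in closed form, and evaluates the condition number without any preconditioning. The function name is hypothetical.

```python
import math


def laplacian_condition_number(n):
    """Condition number of the n x n matrix tridiag(-1, 2, -1).

    Its eigenvalues are 4 * sin(k * pi / (2 * (n + 1)))**2 for k = 1..n,
    so the ratio of the largest to the smallest is available in closed form.
    """
    theta = math.pi / (2 * (n + 1))
    lam_min = 4 * math.sin(theta) ** 2
    lam_max = 4 * math.sin(n * theta) ** 2
    return lam_max / lam_min


# Print n, kappa, and kappa / n^2 to expose the quadratic growth.
for n in (16, 64, 256, 1024):
    kappa = laplacian_condition_number(n)
    print(n, round(kappa, 1), round(kappa / n**2, 4))
```

For this matrix $\kappa = \cot^2\!\big(\pi / (2(N+1))\big) \approx 4(N+1)^2/\pi^2$, so the last printed column approaches $4/\pi^2$ as $N$ grows. Any algorithm whose cost is at least linear in $\kappa$ therefore pays a cost at least quadratic in $N$ on this problem unless a preconditioner removes the dependence.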
While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems in many domains, it still faces the key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that brings the merits of causality to bear on these challenges. As such, it is both necessary and significant to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate what causality can contribute to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), the Partially Observed Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL.