Quadratization of polynomial and nonpolynomial systems of ordinary differential equations is advantageous in a variety of disciplines, such as systems theory, fluid mechanics, chemical reaction modeling and mathematical analysis. A quadratization reveals new variables and structures of a model, which may be easier to analyze, simulate, control, and provides a convenient parametrization for learning. This paper presents novel theory, algorithms and software capabilities for quadratization of non-autonomous ODEs. We provide existence results, depending on the regularity of the input function, for cases when a quadratic-bilinear system can be obtained through quadratization. We further develop existence results and an algorithm that generalizes the process of quadratization for systems with arbitrary dimension that retain the nonlinear structure when the dimension grows. For such systems, we provide dimension-agnostic quadratization. An example is semi-discretized PDEs, where the nonlinear terms remain symbolically identical when the discretization size increases. As an important aspect for practical adoption of this research, we extended the capabilities of the QBee software towards both non-autonomous systems of ODEs and ODEs with arbitrary dimension. We present several examples of ODEs that were previously reported in the literature, and where our new algorithms find quadratized ODE systems with lower dimension than the previously reported lifting transformations. We further highlight an important area of quadratization: reduced-order model learning. This area can benefit significantly from working in the optimal lifting variables, where quadratic models provide a direct parametrization of the model that also avoids additional hyperreduction for the nonlinear terms. A solar wind example highlights these advantages.
Causal operators (CO), such as various solution operators to stochastic differential equations, play a central role in contemporary stochastic analysis; however, there is still no canonical framework for designing Deep Learning (DL) models capable of approximating COs. This paper proposes a "geometry-aware'" solution to this open problem by introducing a DL model-design framework that takes suitable infinite-dimensional linear metric spaces as inputs and returns a universal sequential DL model adapted to these linear geometries. We call these models Causal Neural Operators (CNOs). Our main result states that the models produced by our framework can uniformly approximate on compact sets and across arbitrarily finite-time horizons H\"older or smooth trace class operators, which causally map sequences between given linear metric spaces. Our analysis uncovers new quantitative relationships on the latent state-space dimension of CNOs which even have new implications for (classical) finite-dimensional Recurrent Neural Networks (RNNs). We find that a linear increase of the CNO's (or RNN's) latent parameter space's dimension and of its width, and a logarithmic increase of its depth imply an exponential increase in the number of time steps for which its approximation remains valid. A direct consequence of our analysis shows that RNNs can approximate causal functions using exponentially fewer parameters than ReLU networks.
Quantitative notions of bisimulation are well-known tools for the minimization of dynamical models such as Markov chains and ordinary differential equations (ODEs). In \emph{forward bisimulations}, each state in the quotient model represents an equivalence class and the dynamical evolution gives the overall sum of its members in the original model. Here we introduce generalized forward bisimulation (GFB) for dynamical systems over commutative monoids and develop a partition refinement algorithm to compute the coarsest one. When the monoid is $(\mathbb{R}, +)$, we recover probabilistic bisimulation for Markov chains and more recent forward bisimulations for nonlinear ODEs. Using $(\mathbb{R}, \cdot)$ we get nonlinear reductions for discrete-time dynamical systems and ODEs where each variable in the quotient model represents the product of original variables in the equivalence class. When the domain is a finite set such as the Booleans $\mathbb{B}$, we can apply GFB to Boolean networks (BN), a widely used dynamical model in computational biology. Using a prototype implementation of our minimization algorithm for GFB, we find disjunction- and conjunction-preserving reductions on 60 BN from two well-known repositories, and demonstrate the obtained analysis speed-ups. We also provide the biological interpretation of the reduction obtained for two selected BN, and we show how GFB enables the analysis of a large one that could not be analyzed otherwise. Using a randomized version of our algorithm we find product-preserving (therefore non-linear) reductions on 21 dynamical weighted networks from the literature that could not be handled by the exact algorithm.
In this work, we study the deep signature algorithms for path-dependent options. We extend the backward scheme in [Hur\'e-Pham-Warin. Mathematics of Computation 89, no. 324 (2020)] for state-dependent FBSDEs with reflections to path-dependent FBSDEs with reflections, by adding the signature layer to the backward scheme. Our algorithm applies to both European and American type option pricing problems while the payoff function depends on the whole paths of the underlying forward stock process. We prove the convergence analysis of our numerical algorithm with explicit dependence on the truncation order of the signature and the neural network approximation errors. Numerical examples for the algorithm are provided including: Amerasian option under the Black-Scholes model, American option with a path-dependent geometric mean payoff function, and the Shiryaev's optimal stopping problem.
A least-squares neural network (LSNN) method was introduced for solving scalar linear and nonlinear hyperbolic conservation laws (HCLs) in [7, 6]. This method is based on an equivalent least-squares (LS) formulation and uses ReLU neural network as approximating functions, making it ideal for approximating discontinuous functions with unknown interface location. In the design of the LSNN method for HCLs, the numerical approximation of differential operators is a critical factor, and standard numerical or automatic differentiation along coordinate directions can often lead to a failed NN-based method. To overcome this challenge, this paper rewrites HCLs in their divergence form of space and time and introduces a new discrete divergence operator. As a result, the proposed LSNN method is free of penalization of artificial viscosity. Theoretically, the accuracy of the discrete divergence operator is estimated even for discontinuous solutions. Numerically, the LSNN method with the new discrete divergence operator was tested for several benchmark problems with both convex and non-convex fluxes, and was able to compute the correct physical solution for problems with rarefaction, shock or compound waves. The method is capable of capturing the shock of the underlying problem without oscillation or smearing, even without any penalization of the entropy condition, total variation, and/or artificial viscosity.
Treatment effect estimation under unconfoundedness is a fundamental task in causal inference. In response to the challenge of analyzing high-dimensional datasets collected in substantive fields such as epidemiology, genetics, economics, and social sciences, many methods for treatment effect estimation with high-dimensional nuisance parameters (the outcome regression and the propensity score) have been developed in recent years. However, it is still unclear what is the necessary and sufficient sparsity condition on the nuisance parameters for the treatment effect to be $\sqrt{n}$-estimable. In this paper, we propose a new Double-Calibration strategy that corrects the estimation bias of the nuisance parameter estimates computed by regularized high-dimensional techniques and demonstrate that the corresponding Doubly-Calibrated estimator achieves $1 / \sqrt{n}$-rate as long as one of the nuisance parameters is sparse with sparsity below $\sqrt{n} / \log p$, where $p$ denotes the ambient dimension of the covariates, whereas the other nuisance parameter can be arbitrarily complex and completely misspecified. The Double-Calibration strategy can also be applied to settings other than treatment effect estimation, e.g. regression coefficient estimation in the presence of diverging number of controls in a semiparametric partially linear model.
Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
Risk-limiting audits (RLAs) are a significant tool in increasing confidence in the accuracy of elections. They consist of randomized algorithms which check that an election's vote tally, as reported by a vote tabulation system, corresponds to the correct candidates winning. If an initial vote count leads to the wrong election winner, an RLA guarantees to identify the error with high probability over its own randomness. These audits operate by sequentially sampling and examining ballots until they can either confirm the reported winner or identify the true winner. The first part of this work suggests a new generic method, called ``Batchcomp", for converting classical (ballot-level) RLAs into ones that operate on batches. As a concrete application of the suggested method, we develop the first ballot-level RLA for the Israeli Knesset elections, and convert it to one which operates on batches. We ran the suggested ``Batchcomp" procedure on the results of 22nd, 23rd and 24th Knesset elections, both with and without errors. The second part of this work suggests a new use-case for RLAs: verifying that a population census leads to the correct allocation of political power to a nation's districts or federal-states. We present an adaptation of ALPHA, an existing RLA method, to a method which applies to censuses. Our census-RLA is applicable in nations where parliament seats are allocated to geographical regions in proportion to their population according to a certain class of functions (highest averages). It relies on data from both the census and from an additional procedure which is already conducted in many countries today, called a post-enumeration survey.
The aim of this paper is to study the shape optimization method for solving the Bernoulli free boundary problem, a well-known ill-posed problem that seeks the unknown free boundary through Cauchy data. Different formulations have been proposed in the literature that differ in the choice of the objective functional. Specifically, it was shown respectively in [14] and [16] that tracking Neumann data is well-posed but tracking Dirichlet data is not. In this paper we propose a new well-posed objective functional that tracks Dirichlet data at the free boundary. By calculating the Euler derivative and the shape Hessian of the objective functional we show that the new formulation is well-posed, i.e., the shape Hessian is coercive at the minimizers. The coercivity of the shape Hessian may ensure the existence of optimal solutions for the nonlinear Ritz-Galerkin approximation method and its convergence, thus is crucial for the formulation. As a summary, we conclude that tracking Dirichlet or Neumann data in its energy norm is not sufficient, but tracking it in a half an order higher norm will be well-posed. To support our theoretical results we carry out extensive numerical experiments.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.