Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for this task, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.
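As a simple illustration of the sample-based polynomial learning problem discussed above, the sketch below fits a hyperbolic-cross Chebyshev expansion to a smooth function of several variables by least squares from random pointwise samples. The target function, dimension, and index set are illustrative choices, not one of the survey's benchmark settings.

```python
# Minimal sketch: least-squares polynomial approximation of a smooth
# multivariate function from random pointwise samples, using a
# hyperbolic-cross set of tensorized Chebyshev polynomials.
import itertools
import numpy as np

d, order = 4, 6                                            # dimension and hyperbolic-cross parameter
f = lambda y: 1.0 / (1.0 + 0.5 * np.sum(y**2, axis=-1))    # smooth target (illustrative)

# Hyperbolic-cross multi-index set: prod(nu_j + 1) <= order + 1
indices = [nu for nu in itertools.product(range(order + 1), repeat=d)
           if np.prod([n + 1 for n in nu]) <= order + 1]

def design_matrix(Y):
    # Tensorized Chebyshev polynomials T_nu(y) evaluated at sample points Y in [-1,1]^d
    cheb = np.cos(np.arange(order + 1)[:, None, None] * np.arccos(Y.T[None]))  # (order+1, d, m)
    return np.stack([np.prod([cheb[nu_j, j] for j, nu_j in enumerate(nu)], axis=0)
                     for nu in indices], axis=1)            # (m, |indices|)

rng = np.random.default_rng(0)
m = 5 * len(indices)                                        # oversample the basis
Y_train = rng.uniform(-1, 1, size=(m, d))
coef, *_ = np.linalg.lstsq(design_matrix(Y_train), f(Y_train), rcond=None)

Y_test = rng.uniform(-1, 1, size=(1000, d))
err = np.max(np.abs(design_matrix(Y_test) @ coef - f(Y_test)))
print(f"max error on test points: {err:.2e}")
```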
Generative diffusion models apply the concept of Langevin dynamics from physics to machine learning, attracting considerable interest from industrial applications, but a complete picture of their inherent mechanisms is still lacking. In this paper, we provide a transparent physics analysis of diffusion models, deriving the fluctuation theorem, entropy production, and Franz-Parisi potential to understand the intrinsic phase transitions discovered recently. Our analysis is rooted in non-equilibrium physics and in concepts from equilibrium physics, i.e., treating both the forward and backward dynamics as Langevin dynamics, and treating the reverse diffusion generative process as statistical inference, where the time-dependent state variables serve as the quenched disorder studied in spin glass theory. This unified principle is expected to guide machine learning practitioners in designing better algorithms and theoretical physicists in linking machine learning to non-equilibrium thermodynamics.
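As a concrete, self-contained illustration of treating both the forward (noising) and reverse (generative) processes as Langevin dynamics, the sketch below runs both for a one-dimensional Gaussian mixture whose noised score is available in closed form, so no network needs to be trained. This is not the paper's model; all parameter choices are illustrative.

```python
# Forward and reverse Langevin dynamics of a toy diffusion model for a
# 1D two-mode Gaussian mixture with an analytically known score.
import numpy as np

T, dt, sigma = 4.0, 0.01, 1.0
means, rng = np.array([-2.0, 2.0]), np.random.default_rng(1)

def score(x, t):
    # d/dx log p_t(x) for x0 ~ 0.5 N(-2, sigma^2) + 0.5 N(2, sigma^2)
    # under the OU forward process dx = -x dt + sqrt(2) dW.
    m_t = means * np.exp(-t)
    v_t = sigma**2 * np.exp(-2 * t) + (1 - np.exp(-2 * t))
    w = np.exp(-(x[:, None] - m_t) ** 2 / (2 * v_t))
    w /= w.sum(axis=1, keepdims=True)
    return (w * (m_t - x[:, None]) / v_t).sum(axis=1)

# Forward (noising) Langevin dynamics: data -> approximately N(0, 1)
x = means[rng.integers(0, 2, size=5000)] + sigma * rng.normal(size=5000)
for _ in range(int(T / dt)):
    x += -x * dt + np.sqrt(2 * dt) * rng.normal(size=x.shape)

# Reverse (generative) Langevin dynamics: noise -> data, driven by the score
y = rng.normal(size=5000)
for k in range(int(T / dt)):
    t = T - k * dt
    y += (y + 2 * score(y, t)) * dt + np.sqrt(2 * dt) * rng.normal(size=y.shape)

print("generated sample mean / abs-mean:", y.mean().round(2), np.abs(y).mean().round(2))
```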
The paper concerns problems of the recovery of linear operators defined on sets of functions from information about these functions given with stochastic errors. The optimal recovery methods constructed here, in general, do not use all of the available information. As a consequence, optimal methods are obtained for recovering derivatives of functions from Sobolev classes from their Fourier transforms given with stochastic errors. A similar problem is considered for solutions of the heat equation.
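For orientation, such problems are commonly formalized as follows (the notation is illustrative and not necessarily that of the paper):
\[
e(\Lambda, W, \delta) \;=\; \inf_{\varphi}\; \sup_{f \in W}\; \bigl(\mathbb{E}\,\|\Lambda f - \varphi(y)\|^2\bigr)^{1/2},
\qquad y = I f + \xi,
\]
where $W$ is the function class (e.g., a Sobolev class), $\Lambda$ is the operator to be recovered (e.g., a derivative), $I$ is the available information (e.g., the Fourier transform), $\xi$ is a stochastic error of prescribed intensity $\delta$, and the infimum runs over all recovery methods $\varphi$. A method attaining the infimum is optimal; the statement that optimal methods do not use all the available information means that such a $\varphi$ may depend on $y$ only through part of $If$.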
We propose a local, past-oriented fragment of propositional dynamic logic to reason about concurrent scenarios modelled as Mazurkiewicz traces, and prove it to be expressively complete with respect to regular trace languages. Because of locality, specifications in this logic are efficiently translated into asynchronous automata, in a way that reflects the structure of formulas. In particular, we obtain a new proof of Zielonka's fundamental theorem and we prove that any regular trace language can be implemented by a cascade product of localized asynchronous automata, which essentially operate on a single process. These results refine earlier results by Adsul et al., which involved a larger fragment of past propositional dynamic logic and used Mukund and Sohoni's gossip automaton. Our new results avoid using this automaton and Zielonka's timestamping mechanism; in particular, they show how to implement a gossip automaton as a cascade product.
A new model is presented to predict hydrogen-assisted fatigue. The model combines a phase field description of fracture and fatigue, stress-assisted hydrogen diffusion, and a toughness degradation formulation with cyclic and hydrogen contributions. Hydrogen-assisted fatigue crack growth predictions exhibit excellent agreement with experiments over all the scenarios considered, spanning multiple load ratios, H$_2$ pressures, and loading frequencies. These predictions are obtained without any calibration against hydrogen-assisted fatigue data, taking as input only mechanical and hydrogen transport material properties, the material's fatigue characteristics (from a single test in air), and the sensitivity of fracture toughness to hydrogen content. Furthermore, the model is used to determine: (i) suitable test loading frequencies for obtaining conservative data, and (ii) the degree of underestimation incurred when samples are not pre-charged. The model can handle both laboratory specimens and large-scale engineering components, enabling the Virtual Testing paradigm in infrastructure exposed to hydrogen environments and cyclic loading.
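For orientation, the display below sketches one representative set of governing equations used in phase field models of hydrogen-assisted fracture and fatigue; the notation is illustrative and the paper's precise formulation may differ.
\[
\Pi = \int_\Omega \Bigl[ (1-\phi)^2 \psi_e(\boldsymbol{\varepsilon}) + f(\bar{\alpha})\, G_c(\theta) \Bigl( \frac{\phi^2}{2\ell} + \frac{\ell}{2} |\nabla \phi|^2 \Bigr) \Bigr] \mathrm{d}V,
\qquad
\frac{\partial C}{\partial t} = \nabla \cdot \bigl( D \nabla C \bigr) - \nabla \cdot \Bigl( \frac{D C \bar{V}_H}{R T} \nabla \sigma_h \Bigr),
\]
\[
G_c(\theta) = G_c(0)\,(1 - \chi\, \theta), \qquad \theta = \frac{C/C_0}{C/C_0 + \exp\!\bigl(-\Delta g_b^0/(R T)\bigr)},
\]
where $\phi$ is the phase field damage variable, $\psi_e$ the elastic strain energy density, $\ell$ a length scale, $f(\bar{\alpha})$ a fatigue degradation function of an accumulated history variable $\bar{\alpha}$, $C$ the hydrogen concentration, $D$ the diffusivity, $\sigma_h$ the hydrostatic stress, $\bar{V}_H$ the partial molar volume of hydrogen, and $\theta$ the hydrogen surface coverage that degrades the toughness $G_c$.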
Reduced order models based on the transport of a lower dimensional manifold representation of the thermochemical state, such as Principal Component (PC) transport and Machine Learning (ML) techniques, have been developed to reduce the computational cost associated with Direct Numerical Simulations (DNS) of reactive flows. Both PC transport and ML normally require an abundance of data to exhibit sufficient predictive accuracy, which might not be available due to the prohibitive cost of DNS or experimental data acquisition. To alleviate such difficulties, similar data from an existing dataset or domain (source domain) can be used to train ML models, potentially resulting in adequate predictions in the domain of interest (target domain). This study presents a novel probabilistic transfer learning (TL) framework to enhance the trust in ML models in correctly predicting the thermochemical state in a lower dimensional manifold and a sparse data setting. The framework uses Bayesian neural networks and autoencoders to reduce the dimensionality of the state space and to transfer knowledge from the source to the target domain. The new framework is applied to one-dimensional freely-propagating flame solutions under different data sparsity scenarios. The results reveal that there is an optimal amount of knowledge to be transferred, which depends on the amount of data available in the target domain and the similarity between the domains. TL can reduce the reconstruction error by one order of magnitude for cases with large sparsity. The new framework requires 10 times less target-domain data to reproduce the same error as in the abundant data scenario. Furthermore, comparisons with a state-of-the-art deterministic TL strategy show that the probabilistic method can require four times less data to achieve the same reconstruction error.
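The schematic sketch below mirrors the overall workflow (dimensionality reduction with an autoencoder, transfer from an abundant source domain to a sparse target domain, and a rough uncertainty estimate). It uses MC dropout as a cheap stand-in for a full Bayesian neural network, and all layer sizes, learning rates, and data are placeholders rather than the paper's settings.

```python
# Schematic transfer-learning sketch: pretrain an autoencoder on the source
# domain, freeze the encoder, fine-tune on sparse target data, and use
# MC dropout as a rough uncertainty proxy.
import torch, torch.nn as nn

def make_autoencoder(n_state, n_latent, p_drop=0.1):
    enc = nn.Sequential(nn.Linear(n_state, 32), nn.ReLU(), nn.Dropout(p_drop),
                        nn.Linear(32, n_latent))
    dec = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(), nn.Dropout(p_drop),
                        nn.Linear(32, n_state))
    return nn.Sequential(enc, dec)

def train(model, X, epochs=500, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)   # reconstruction loss
        loss.backward(); opt.step()
    return loss.item()

n_state, n_latent = 10, 2
X_source = torch.randn(2000, n_state)          # abundant source-domain states (placeholder data)
X_target = torch.randn(50, n_state)            # sparse target-domain states (placeholder data)

model = make_autoencoder(n_state, n_latent)
train(model, X_source)                         # learn the low-dimensional manifold on the source
for p in model[0].parameters():                # freeze the encoder: transfer the manifold ...
    p.requires_grad = False
train(model, X_target, epochs=200, lr=1e-4)    # ... and adapt the decoder to the sparse target data

# MC-dropout predictive spread as a rough uncertainty estimate on the target
model.train()                                  # keep dropout active at prediction time
with torch.no_grad():
    samples = torch.stack([model(X_target) for _ in range(50)])
print("mean predictive std:", samples.std(dim=0).mean().item())
```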
We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co-authorship datasets. Our model generalizes the stochastic blockmodel for graphs: it assumes the existence of latent node groups, and hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation-Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our R package HyperSBM, we compare it with other node clustering methods on synthetic data generated from the model, as well as on data from a line clustering experiment and a co-authorship dataset.
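Schematically, a model of this kind can be written as follows (the notation is illustrative; see the paper for the precise formulation). With $Q$ latent groups and hyperedges of size $s$ up to some maximum $s_{\max}$:
\[
Z_i \overset{\text{iid}}{\sim} \mathcal{M}(1; \pi_1,\dots,\pi_Q), \qquad
A_e \mid Z \overset{\text{ind}}{\sim} \mathcal{B}\bigl(B^{(s)}_{Z_{i_1},\dots,Z_{i_s}}\bigr)
\quad \text{for each } e = \{i_1,\dots,i_s\},\; 2 \le s \le s_{\max},
\]
where $Z_i$ is the latent group of node $i$, $A_e$ indicates whether hyperedge $e$ is present, and the conditional independence of the $A_e$ given $Z$ is the hypergraph analogue of the stochastic blockmodel assumption.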
Chemical reaction networks (CRNs) model systems where molecules interact according to a finite set of reactions such as $A + B \to C$, representing that if a molecule of $A$ and a molecule of $B$ collide, they disappear and a molecule of $C$ is produced. CRNs can compute Boolean-valued predicates $\phi:\mathbb{N}^d \to \{0,1\}$ and integer-valued functions $f:\mathbb{N}^d \to \mathbb{N}$; for instance, $X_1 + X_2 \to Y$ computes the function $\min(x_1,x_2)$. We study the computational power of execution bounded CRNs, in which only a finite number of reactions can occur from the initial configuration (e.g., ruling out reversible reactions such as $A \rightleftharpoons B$). The power and composability of such CRNs depend crucially on certain modeling choices that do not affect the computational power of CRNs with unbounded executions, namely whether an initial leader is present, and whether (for predicates) all species are required to "vote" for the Boolean output. If the CRN starts with an initial leader, and is allowed to have only the leader vote, then all semilinear predicates and functions can be stably computed in $O(n \log n)$ parallel time by execution bounded CRNs. However, if no initial leader is allowed, all species vote, and the CRN is "noncollapsing" (does not shrink from initially large to final $O(1)$ size configurations), then execution bounded CRNs are severely limited, able to compute only eventually constant predicates. A key tool is a characterization of execution bounded CRNs as precisely those with a nonnegative linear potential function that is strictly decreased by every reaction, a result that may be of independent interest.
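As a concrete illustration, the minimal sketch below simulates the CRN $X_1 + X_2 \to Y$ mentioned above, which stably computes $\min(x_1, x_2)$. The linear potential $\Phi = \#X_1 + \#X_2$ is nonnegative and strictly decreases with every reaction, which certifies execution boundedness in the sense of the characterization described.

```python
# Simulation of the execution bounded CRN  X1 + X2 -> Y,  which computes min(x1, x2).
# The potential Phi = #X1 + #X2 strictly decreases (by 2) with every reaction,
# so only finitely many reactions can occur from any initial configuration.
def simulate_min(x1, x2):
    counts = {"X1": x1, "X2": x2, "Y": 0}
    while counts["X1"] > 0 and counts["X2"] > 0:    # the single reaction is applicable
        counts["X1"] -= 1
        counts["X2"] -= 1
        counts["Y"] += 1
    return counts["Y"]

print(simulate_min(7, 4))   # -> 4 == min(7, 4)
```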
For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem of determining the actuation parameters that realize a target shape. However, the free-form surfaces obtained from simulators always differ from the physically deformed shapes, due to errors introduced by the hardware and the simplifications adopted in physical simulation. To close this gap, we propose a novel deformation-function-based sim-to-real learning method that maps the geometric shape of a simulated model to the corresponding shape of the physical model. Unlike existing sim-to-real learning methods that rely on completely acquired dense markers, our method accommodates sparsely distributed markers and can resiliently use all captured frames, even those with missing markers. To demonstrate its effectiveness, our sim-to-real method has been integrated into a neural network-based computational pipeline designed to tackle the inverse kinematic problem on a pneumatically actuated deformable mannequin.
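As a generic illustration of how sparsely distributed and partially missing markers can still supervise training, the sketch below fits a plain regression network from simulated to captured marker positions with a per-marker mask in the loss. It is not the paper's deformation-function model, and all marker counts and data are placeholders.

```python
# Generic sim-to-real regression with a masked loss: frames with missing
# markers still contribute, the missing markers are simply excluded.
import torch, torch.nn as nn

n_markers = 20
net = nn.Sequential(nn.Linear(3 * n_markers, 128), nn.ReLU(),
                    nn.Linear(128, 3 * n_markers))

sim = torch.randn(500, 3 * n_markers)                     # simulated marker positions per frame
real = sim + 0.05 * torch.randn_like(sim)                 # captured positions (placeholder)
mask = (torch.rand(500, n_markers) > 0.2).float()         # 0 where a marker is missing
mask3 = mask.repeat_interleave(3, dim=1)                  # expand mask to x, y, z components

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    pred = net(sim)
    # masked MSE: only observed marker coordinates enter the average
    loss = ((pred - real) ** 2 * mask3).sum() / mask3.sum()
    loss.backward(); opt.step()
print("masked training loss:", loss.item())
```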
Deflation techniques are typically used to shift isolated clusters of small eigenvalues in order to obtain a tighter distribution and a smaller condition number. Such changes have a positive effect on the convergence behavior of Krylov subspace methods, which are among the most popular iterative solvers for large sparse linear systems. We develop a deflation strategy for symmetric saddle point matrices by taking advantage of their underlying block structure. The vectors used for deflation come from an elliptic singular value decomposition relying on the generalized Golub-Kahan bidiagonalization process. The block targeted by deflation is the off-diagonal one, since it features a problematic singular value distribution for certain applications. One example is the Stokes flow in elongated channels, where the off-diagonal block has several small, isolated singular values, depending on the length of the channel. Applying deflation to specific parts of the saddle point system is important when using solvers such as CRAIG, which operates on individual blocks rather than on the whole system. The theory is developed by extending the existing framework for deflating square matrices before applying a Krylov subspace method such as MINRES. Numerical experiments confirm the merits of our strategy and lead to interesting questions about using approximate vectors for deflation.
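The toy sketch below illustrates deflating a symmetric saddle point system before MINRES. It is not the paper's algorithm: for simplicity the deflation space is built from ordinary singular vectors of the off-diagonal block (embedded into block vectors) rather than from the elliptic SVD produced by the generalized Golub-Kahan bidiagonalization, and all sizes are illustrative.

```python
# Toy deflated MINRES for a symmetric saddle point matrix K = [[M, B], [B^T, 0]]:
# solve P K xhat = P b with P = I - K W E^{-1} W^T, E = W^T K W, then recover
# x = W E^{-1} W^T b + P^T xhat.
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(0)
n, m, k = 120, 40, 4
R = rng.normal(size=(n, n))
M = np.eye(n) + 0.1 * R @ R.T                             # SPD leading block
B = rng.normal(size=(n, m)) @ np.diag(np.logspace(0, -3, m))   # off-diagonal block with small singular values
K = np.block([[M, B], [B.T, np.zeros((m, m))]])
b = rng.normal(size=n + m)

# Deflation vectors: singular vectors of B for its k smallest singular values
U, s, Vt = np.linalg.svd(B, full_matrices=False)
W = np.zeros((n + m, 2 * k))
W[:n, :k], W[n:, k:] = U[:, -k:], Vt[-k:, :].T

AW = K @ W
Einv = np.linalg.inv(W.T @ AW)
P = lambda v: v - AW @ (Einv @ (W.T @ v))                 # deflation projector
PK = LinearOperator(K.shape, matvec=lambda v: P(K @ v), dtype=float)

xhat, info = minres(PK, P(b))                             # deflated (singular but consistent) system
x = W @ (Einv @ (W.T @ b)) + xhat - W @ (Einv @ (AW.T @ xhat))   # x = W E^{-1} W^T b + P^T xhat
print("MINRES flag:", info, " residual norm:", np.linalg.norm(K @ x - b))
```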
We establish central limit theorems for principal eigenvalues and eigenvectors under a large factor model setting, and develop two-sample tests of both principal eigenvalues and principal eigenvectors. One important application is to detect structural breaks in large factor models. Compared with existing methods for detecting structural breaks, our tests provide unique insights into the source of structural breaks because they can distinguish between individual principal eigenvalues and/or eigenvectors. We demonstrate the application by comparing the principal eigenvalues and principal eigenvectors of S\&P500 Index constituents' daily returns over different years.
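For illustration only, the sketch below computes the quantities such tests compare, namely the leading eigenvalues and eigenvectors of the sample covariance in two periods, on synthetic factor-model data; it does not implement the paper's test statistics or their limiting distributions.

```python
# Compare leading eigenvalues and the angle between leading eigenvectors of
# sample covariance matrices from two periods of (synthetic) returns.
import numpy as np

def principal_parts(returns, k=1):
    # returns: (T, N) matrix of daily returns for one period
    S = np.cov(returns, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    return vals[-k:][::-1], vecs[:, -k:][:, ::-1]         # top-k eigenvalues / eigenvectors

rng = np.random.default_rng(0)
N, T = 100, 250
period1 = rng.normal(size=(T, 1)) @ rng.normal(size=(1, N)) \
          + 0.5 * rng.normal(size=(T, N))                 # synthetic "year 1" (one-factor model)
period2 = 1.5 * rng.normal(size=(T, 1)) @ rng.normal(size=(1, N)) \
          + 0.5 * rng.normal(size=(T, N))                 # synthetic "year 2" (stronger factor)

(l1, v1), (l2, v2) = principal_parts(period1), principal_parts(period2)
print("leading eigenvalues:", l1[0].round(2), l2[0].round(2))
print("|cos angle| between leading eigenvectors:", abs(v1[:, 0] @ v2[:, 0]).round(3))
```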