Multiple scattering theory (MST) is one of the most widely used methods in electronic structure calculations. It features a clean separation between atomic configurations and site potentials, and hence provides an efficient way to simulate systems with defects and disorder. This work studies MST methods from a numerical point of view and establishes their convergence with respect to the truncation of the angular momentum summations, which is a fundamental approximation parameter for all MST methods. We provide both rigorous analysis and numerical experiments to illustrate the efficiency of MST methods within the angular momentum representation.
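As a self-contained illustration of angular momentum truncation (not taken from the paper), the classical Rayleigh expansion of a plane wave into spherical waves, $e^{ikr\cos\theta} = \sum_{l\ge 0} (2l+1)\, i^l\, j_l(kr)\, P_l(\cos\theta)$, can be cut off at a maximal angular momentum $l_{\max}$; the truncation error then decays rapidly with $l_{\max}$, as the following Python sketch shows.

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

# Rayleigh expansion: exp(i*k*r*cos(theta)) =
#   sum_{l>=0} (2l+1) * i^l * j_l(k*r) * P_l(cos(theta)).
# Truncating the sum at l_max mimics the angular momentum cutoff in MST.
k, r, theta = 1.0, 2.0, 0.7
exact = np.exp(1j * k * r * np.cos(theta))
for l_max in (2, 4, 8, 16):
    ls = np.arange(l_max + 1)
    partial = np.sum((2 * ls + 1) * 1j**ls
                     * spherical_jn(ls, k * r)
                     * eval_legendre(ls, np.cos(theta)))
    print(l_max, abs(partial - exact))  # error decays rapidly with l_max
```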
A numerical algorithm is presented to solve a benchmark problem proposed by Hemker. The algorithm incorporates asymptotic information into the design of appropriate piecewise-uniform Shishkin meshes. Moreover, different coordinate systems are utilized to handle the different geometries and associated layer structures involved in this problem. Numerical results are presented to demonstrate the effectiveness of the proposed numerical algorithm.
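For concreteness, below is a minimal sketch of a one-dimensional piecewise-uniform Shishkin mesh; the transition point $\tau = \min(1/2, (\sigma\varepsilon/\beta)\ln N)$ and the parameters $\sigma$, $\beta$ follow the standard textbook construction and are illustrative assumptions, not the specific meshes designed for the Hemker problem.

```python
import numpy as np

def shishkin_mesh(N, eps, beta=1.0, sigma=2.0):
    """Piecewise-uniform Shishkin mesh on [0, 1] for a boundary
    layer at x = 1 (illustrative; N must be even).

    Half the mesh points are placed uniformly in the layer region
    [1 - tau, 1], the rest uniformly in [0, 1 - tau].
    """
    # Standard transition point: tau = min(1/2, (sigma*eps/beta)*ln N)
    tau = min(0.5, sigma * eps / beta * np.log(N))
    coarse = np.linspace(0.0, 1.0 - tau, N // 2 + 1)
    fine = np.linspace(1.0 - tau, 1.0, N // 2 + 1)
    return np.concatenate([coarse, fine[1:]])

mesh = shishkin_mesh(N=64, eps=1e-4)
print(mesh[:3], mesh[-3:])  # coarse spacing near 0, fine spacing near 1
```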
The modeling framework of port-Hamiltonian descriptor systems and their use in numerical simulation and control are discussed. The structure is ideal for automated network-based modeling since it is invariant under power-conserving interconnection, congruence transformations, and Galerkin projection. Moreover, stability and passivity properties are easily shown. Condensed forms under orthogonal transformations provide convenient tools for analyzing existence, uniqueness, and regularity, together with numerical methods to check these properties. After recalling the concepts for general linear and nonlinear descriptor systems, we demonstrate that many difficulties arising in general descriptor systems can be easily overcome within the port-Hamiltonian framework. The properties of port-Hamiltonian descriptor systems are analyzed, and time-discretization and numerical linear algebra techniques are discussed. Structure-preserving regularization procedures for descriptor systems are presented to make them suitable for simulation and control. Model reduction techniques that preserve the structure are discussed, along with stabilization and optimal control techniques. The properties of port-Hamiltonian descriptor systems and their use in modeling, simulation, and control methods are illustrated with several examples from different physical domains. The survey concludes with open problems and research topics that deserve further attention.
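As a toy illustration of the structure (not an example from the survey), the following sketch integrates a linear port-Hamiltonian system in the ODE special case $E = I$ with the implicit midpoint rule; for this linear system the discrete Hamiltonian of the unforced dynamics decays, mirroring the dissipation inequality $\dot H \le u^\top y$.

```python
import numpy as np

# Minimal linear port-Hamiltonian system (illustrative, E = I):
#   x' = (J - R) x + B u,   y = B^T x,   H(x) = 0.5 * x^T x,
# with J skew-symmetric and R symmetric positive semidefinite, so the
# dissipation inequality dH/dt <= u^T y holds along solutions.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric
R = np.diag([0.1, 0.0])                   # positive semidefinite dissipation
B = np.array([[0.0], [1.0]])
A = J - R

# Implicit midpoint rule: a structure-preserving one-step method that
# inherits the discrete dissipation inequality for this linear system.
dt, x = 0.05, np.array([1.0, 0.0])
I = np.eye(2)
for n in range(200):
    u = np.array([0.0])                   # unforced: H must decay
    rhs = (I + 0.5 * dt * A) @ x + dt * (B @ u)
    x = np.linalg.solve(I - 0.5 * dt * A, rhs)
print("H(x_end) =", 0.5 * x @ x)          # <= H(x_0) = 0.5
```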
Vibration and dissipation in vibro-acoustic systems can be assessed using frequency response analysis. Evaluating a frequency sweep on a full-order model can be very costly, so model order reduction methods are employed to compute cheap-to-evaluate surrogates. This work compares structure-preserving model reduction methods based on rational interpolation and balanced truncation, with a specific focus on their applicability to vibro-acoustic systems. Such models typically exhibit a second-order structure, and their material properties as well as their excitation may depend on the driving frequency. We show and compare the effectiveness of all considered methods by applying them to numerical models of vibro-acoustic systems representing structural vibration, sound transmission, acoustic scattering, and poroelastic problems.
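To make the cost of a frequency sweep concrete, here is a minimal sketch, using hypothetical toy matrices rather than an actual vibro-acoustic model, that evaluates the frequency response $H(i\omega) = C(-\omega^2 M + i\omega D + K)^{-1}B$ of a second-order system; a reduced-order surrogate would replace $M, D, K, B, C$ by small projected counterparts such as $V^\top M V$.

```python
import numpy as np

# Frequency response of a second-order model
#   (s^2 M + s D + K) x(s) = B u(s),  y = C x(s),  s = i*omega.
# Toy matrices for illustration; each frequency point costs one
# large sparse solve on the full-order model.
n = 50
rng = np.random.default_rng(1)
M = np.eye(n)
K = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
D = 0.02 * K                      # Rayleigh-type damping (assumption)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

for omega in (0.1, 1.0, 10.0):
    s = 1j * omega
    H = C @ np.linalg.solve(s**2 * M + s * D + K, B)
    print(omega, abs(H[0, 0]))
```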
Stochastic PDE eigenvalue problems often arise in the field of uncertainty quantification, whereby one seeks to quantify the uncertainty in an eigenvalue, or its eigenfunction. In this paper we present an efficient multilevel quasi-Monte Carlo (MLQMC) algorithm for computing the expectation of the smallest eigenvalue of an elliptic eigenvalue problem with stochastic coefficients. Each sample evaluation requires the solution of a PDE eigenvalue problem, and so tackling this problem in practice is notoriously computationally difficult. We speed up the approximation of this expectation in four ways: we use a multilevel variance reduction scheme to spread the work over a hierarchy of FE meshes and truncation dimensions; we use QMC methods to efficiently compute the expectations on each level; we exploit the smoothness in parameter space and reuse the eigenvector from a nearby QMC point to reduce the number of iterations of the eigensolver; and we utilise a two-grid discretisation scheme to obtain the eigenvalue on the fine mesh with a single linear solve. The full error analysis of a basic MLQMC algorithm is given in the companion paper [Gilbert and Scheichl, 2022], and so in this paper we focus on how to further improve the efficiency and provide theoretical justification for using nearby QMC points and two-grid methods. Numerical results are presented that show the efficiency of our algorithm, and also show that the four strategies we employ are complementary.
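The telescoping structure of such an estimator can be sketched as follows; this is an illustrative skeleton only, in which eig_on_level is a hypothetical placeholder for the FE eigenvalue solve at a given mesh level, independently scrambled Sobol' replicates (via scipy.stats.qmc) stand in for the randomised QMC rules, and the nearby-point eigenvector reuse and two-grid acceleration described above are omitted.

```python
import numpy as np
from scipy.stats import qmc

def mlqmc_expectation(eig_on_level, L, n_samples, dim, n_reps=8, seed=0):
    """Multilevel QMC estimate of E[lambda] via the telescoping sum
    E[lambda_L] = E[lambda_0] + sum_{l=1}^{L} E[lambda_l - lambda_{l-1}].

    eig_on_level(level, y) is a placeholder for the FE eigenvalue
    solve at a given mesh level and parameter point y in [0,1]^dim;
    n_samples[level] is the number of QMC points on each level.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for level in range(L + 1):
        rep_means = []
        for _ in range(n_reps):  # independent scramblings -> error estimate
            sobol = qmc.Sobol(d=dim, scramble=True,
                              seed=int(rng.integers(2**31)))
            y = sobol.random(n_samples[level])
            if level == 0:
                diffs = [eig_on_level(0, yi) for yi in y]
            else:
                diffs = [eig_on_level(level, yi)
                         - eig_on_level(level - 1, yi) for yi in y]
            rep_means.append(np.mean(diffs))
        total += np.mean(rep_means)
    return total
```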
We investigate the nonlinear dynamics of a moving interface in a Hele-Shaw cell subject to an in-plane applied electric field. We develop a spectrally accurate boundary integral method in which a coupled integral equation system is formulated. Although the stiffness due to the high-order spatial derivatives can be removed, long-time simulation remains expensive because the velocity of the evolving interface drops dramatically as the interface expands. We remove this physically imposed stiffness by employing a rescaling scheme, which accelerates the slow dynamics and reduces the computational cost. Our nonlinear results reveal that positive currents restrain finger ramification and promote overall stabilization of patterns. On the other hand, negative currents make the interface more unstable and lead to the formation of thin tail structures connecting the fingers and a small inner region. When no flux is injected and a negative current is applied, the interface tends to approach the origin and break up into several drops. We investigate the temporal evolution of the smallest distance between the interface and the origin and find that it obeys an algebraic law $(t_*-t)^b$, where $t_*$ is the estimated pinch-off time and $b$ is a fitted exponent.
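As an illustration of how such an exponent can be extracted, the sketch below fits $d(t) \approx C\,(t_*-t)^b$ by linear least squares in log-log coordinates; the data here are synthetic stand-ins, not output of the boundary integral simulation.

```python
import numpy as np

# Hypothetical measured data: times t and minimum interface-origin
# distance d(t); t_star is the estimated pinch-off time.
t = np.linspace(0.0, 0.99, 200)
t_star = 1.0
d = 0.5 * (t_star - t) ** (1.0 / 3.0)   # synthetic data with b = 1/3

# Fit log d = log C + b * log(t_star - t) by linear least squares.
b, logC = np.polyfit(np.log(t_star - t), np.log(d), deg=1)
print(f"fitted exponent b = {b:.3f}")   # ~0.333 for this synthetic data
```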
We consider the deformation of a geological structure with non-intersecting faults that can be represented by a layered system of viscoelastic bodies satisfying rate- and state-dependent friction conditions along the common interfaces. We derive a mathematical model that contains classical Dieterich- and Ruina-type friction as special cases and accounts for possibly large tangential displacements. Semi-discretization in time by a Newmark scheme leads to a coupled system of non-smooth, convex minimization problems for rate and state to be solved in each time step. Additional spatial discretization by a mortar method and piecewise constant finite elements allows for the decoupling of rate and state by a fixed-point iteration and efficient algebraic solution of the rate problem by truncated non-smooth Newton methods. Numerical experiments with a spring slider and a layered multiscale system illustrate the behavior of our model as well as the efficiency and reliability of the numerical solver.
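A schematic of the rate-state decoupling might look as follows; solve_rate_problem and update_state are hypothetical placeholders for the spatial subproblems solved within one Newmark time step (the former would internally use a truncated non-smooth Newton method), and the stopping test is a simple illustration.

```python
import numpy as np

def rate_state_fixed_point(solve_rate_problem, update_state,
                           rate0, state0, tol=1e-10, max_iter=100):
    """Decoupled fixed-point iteration for one Newmark time step:
    alternately solve the (non-smooth, convex) rate problem for a
    frozen state and update the state for the new rate, until the
    iterates stagnate. Both callbacks are illustrative placeholders.
    """
    rate, state = rate0, state0
    for k in range(max_iter):
        new_rate = solve_rate_problem(state)   # e.g. truncated non-smooth Newton
        new_state = update_state(new_rate)
        if np.linalg.norm(new_rate - rate) < tol * (1 + np.linalg.norm(rate)):
            return new_rate, new_state, k + 1
        rate, state = new_rate, new_state
    raise RuntimeError("fixed-point iteration did not converge")
```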
This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models equipped with causal assumptions that allow for inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effect estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. Furthermore, we propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.
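For reference, the classical K-class estimator, which interpolates between OLS ($\kappa = 0$) and two-stage least squares ($\kappa = 1$), can be written in a few lines; this is the standard econometric formula, not the thesis's novel regularized estimators.

```python
import numpy as np

def k_class(X, Z, y, kappa):
    """K-class estimator: kappa = 0 gives OLS, kappa = 1 gives TSLS.

    beta = (X^T (I - kappa*M_Z) X)^{-1} X^T (I - kappa*M_Z) y,
    where M_Z = I - Z (Z^T Z)^{-1} Z^T is the residual-maker of the
    instrument matrix Z.
    """
    n = len(y)
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)   # projection onto col(Z)
    W = np.eye(n) - kappa * (np.eye(n) - P_Z)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```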
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers. On the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.
Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, and speech recognition. It has outperformed conventional methods in various fields and achieved great success. Unfortunately, an understanding of how it works remains elusive, and laying down a theoretical foundation for deep learning is of central importance. In this work, we give a geometric view of deep learning: we show that a fundamental principle behind its success is the manifold structure of data, namely that natural high-dimensional data concentrate close to a low-dimensional manifold, and deep learning learns the manifold and the probability distribution on it. We further introduce two concepts: the rectified linear complexity of a deep neural network, which measures its learning capability, and the rectified linear complexity of an embedded manifold, which describes the difficulty of learning it. We then show that for any deep neural network with fixed architecture, there exists a manifold that the network cannot learn. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.