In this paper we present a new Eulerian finite element method for the discretization of scalar partial differential equations on evolving surfaces. In this method we use the restriction of standard space-time finite element spaces on a fixed bulk mesh to the space-time surface. The structure of the method is such that it naturally fits a level set representation of the evolving surface. The higher order version of the method is based on a space-time variant of a mesh deformation technique that has been developed in the literature for stationary surfaces. The discretization method that we present is of (optimal) higher order accuracy for smoothly varying surfaces with sufficiently smooth solutions. Without any modifications the method can be used for the discretization of problems with topological singularities. A numerical study demonstrates both the higher order accuracy for smooth cases and the robustness with respect to topological singularities.
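For concreteness, a typical scalar model problem that Eulerian surface finite element methods of this kind target is a transport-diffusion equation posed on the evolving surface $\Gamma(t)$ (the notation below is assumed here for illustration, not quoted from the abstract):
\[
\dot{u} + u\, \operatorname{div}_{\Gamma} \mathbf{w} - \nu \Delta_{\Gamma} u = f \quad \text{on } \Gamma(t),\ t \in (0,T],
\]
where $\dot{u}$ denotes the material derivative along the velocity field $\mathbf{w}$ of the surface, $\operatorname{div}_{\Gamma}$ and $\Delta_{\Gamma}$ are the surface divergence and Laplace-Beltrami operators, and $\nu > 0$ is a diffusion coefficient. The space-time trace approach described above discretizes such an equation on the space-time manifold swept out by $\Gamma(t)$, using traces of finite element functions defined on the fixed bulk mesh.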
Let $V_* : \mathbb{R}^d \to \mathbb{R}$ be some (possibly non-convex) potential function, and consider the probability measure $\pi \propto e^{-V_*}$. When $\pi$ exhibits multiple modes, it is known that sampling techniques based on Wasserstein gradient flows of the Kullback-Leibler (KL) divergence (e.g. Langevin Monte Carlo) can suffer from poor rates of convergence, as the dynamics are unable to easily traverse between modes. In stark contrast, the work of Lu et al. (2019; 2022) has shown that the gradient flow of the KL divergence with respect to the Fisher-Rao (FR) geometry exhibits a convergence rate to $\pi$ that is \textit{independent} of the potential function. In this short note, we complement these existing results in the literature by providing an explicit expansion of $\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$, where $(\rho_t^{\text{FR}})_{t\geq 0}$ is the FR gradient flow of the KL divergence. In turn, we are able to provide a clean asymptotic convergence rate, where the burn-in time is guaranteed to be finite. Our proof is based on observing a similarity between FR gradient flows and simulated annealing with linear scaling, together with facts about cumulant generating functions. We conclude with simple synthetic experiments demonstrating that our theoretical findings are indeed tight. Based on our numerics, we conjecture that the asymptotic rates of convergence for Wasserstein-Fisher-Rao gradient flows are possibly related to this expansion in some cases.
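As an informal sketch of how powers of $e^{-t}$ can enter (under standard conventions for the FR gradient flow; the symbols $W$, $K_\pi$, $\beta_t$ below are notation assumed here), the FR gradient flow of $\rho \mapsto \mathrm{KL}(\rho\|\pi)$ admits a closed-form solution given by a geometric interpolation between $\rho_0$ and $\pi$:
\[
\partial_t \rho_t = -\rho_t\Big(\log\tfrac{\rho_t}{\pi} - \mathrm{KL}(\rho_t\|\pi)\Big),
\qquad
\rho_t \propto \rho_0^{\,e^{-t}}\, \pi^{\,1-e^{-t}} .
\]
Writing $W = \log(\rho_0/\pi)$ and $K_\pi(\beta) = \log \mathbb{E}_\pi\big[e^{\beta W}\big]$ for its cumulant generating function under $\pi$, a direct computation gives
\[
\mathrm{KL}(\rho_t\|\pi) = \beta_t K_\pi'(\beta_t) - K_\pi(\beta_t), \qquad \beta_t = e^{-t},
\]
which, expanded around $\beta = 0$, behaves like $\tfrac{1}{2}\mathrm{Var}_\pi(W)\, e^{-2t}$ to leading order. This is only meant to illustrate the mechanism; the precise expansion and its range of validity are those stated in the note.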
This work introduces, analyzes and demonstrates an efficient and theoretically sound filtering strategy to ensure the good conditioning of the least-squares problem solved at each iteration of Anderson acceleration. The filtering strategy consists of two steps: the first controls the length disparity between columns of the least-squares matrix, and the second enforces a lower bound on the angles between subspaces spanned by the columns of that matrix. The combined strategy is shown to control the condition number of the least-squares matrix at each iteration. The method is shown to be effective on a range of problems based on discretizations of partial differential equations. It is shown to be particularly effective for problems where the initial iterate may lie far from the solution and which progress through distinct preasymptotic and asymptotic phases.
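To make the setting concrete, the following is a minimal Python/NumPy sketch of Anderson acceleration with a two-stage column filter in the spirit described above. The specific tests and tolerance values are hypothetical illustrations, not the criteria or bounds analyzed in the paper.
\begin{verbatim}
import numpy as np

def filter_columns(F, X, len_ratio=1e6, angle_tol=1e-10):
    # Hypothetical two-stage filter (not the paper's exact criteria):
    # (1) drop columns whose norm is tiny relative to the largest column;
    # (2) greedily drop columns that make too small an angle with the
    #     span of the columns already kept.
    norms = np.linalg.norm(F, axis=0)
    keep = norms >= norms.max() / len_ratio
    F, X = F[:, keep], X[:, keep]
    kept, Q = [], np.zeros((F.shape[0], 0))
    for j in range(F.shape[1]):
        v = F[:, j] - Q @ (Q.T @ F[:, j])   # part orthogonal to current span
        nv = np.linalg.norm(v)
        if nv > angle_tol * np.linalg.norm(F[:, j]):
            Q = np.column_stack([Q, v / nv])
            kept.append(j)
    return F[:, kept], X[:, kept]

def anderson(g, x0, m=5, tol=1e-10, maxit=200):
    # Anderson acceleration for the fixed-point problem x = g(x).
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                              # residual
    dX, dF = [], []                           # histories of differences
    for _ in range(maxit):
        if np.linalg.norm(f) < tol:
            break
        if dF:
            F = np.column_stack(dF[-m:])
            X = np.column_stack(dX[-m:])
            F, X = filter_columns(F, X)
            if F.shape[1]:
                gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
                x_new = x + f - (X + F) @ gamma
            else:
                x_new = x + f
        else:
            x_new = x + f                     # plain fixed-point step
        f_new = g(x_new) - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        x, f = x_new, f_new
    return x

# usage: the accelerated iteration converges to the root of x = cos(x)
x_star = anderson(np.cos, np.array([0.0]))
\end{verbatim}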
Prophet inequalities for rewards maximization are fundamental to optimal stopping theory with extensive applications to mechanism design and online optimization. We study the \emph{cost minimization} counterpart of the classical prophet inequality: a decision maker is facing a sequence of costs $X_1, X_2, \dots, X_n$ drawn from known distributions in an online manner and \emph{must} ``stop'' at some point and take the last cost seen. The goal is to compete with a ``prophet'' who can see the realizations of all $X_i$'s upfront and always select the minimum, obtaining a cost of $\mathbb{E}[\min_i X_i]$. If the $X_i$'s are not identically distributed, no strategy can achieve a bounded approximation, even for random arrival order and $n = 2$. This leads us to consider the case where the $X_i$'s are independent and identically distributed (i.i.d.). For the i.i.d. case, we show that if the distribution satisfies a mild condition, the optimal stopping strategy achieves a (distribution-dependent) constant-factor approximation to the prophet's cost. Moreover, for MHR distributions, this constant is at most $2$. All our results are tight. We also demonstrate an example distribution that does not satisfy the condition and for which the competitive ratio of any algorithm is infinite. Turning our attention to single-threshold strategies, we design a threshold that achieves a $O\left(\mathrm{polylog}\, n\right)$-factor approximation, where the exponent in the logarithmic factor is a distribution-dependent constant, and we show a matching lower bound. Finally, we note that our results can be used to design approximately optimal posted-price-style mechanisms for procurement auctions, which may be of independent interest. Our techniques utilize the \emph{hazard rate} of the distribution in a novel way, allowing for a fine-grained analysis which could find further applications in prophet inequalities.
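As a small illustration of the single-threshold setting (a toy Monte Carlo experiment in Python; the threshold used below is a hypothetical choice, not the one constructed in the paper), one can compare a fixed-threshold stopping rule against the prophet's cost $\mathbb{E}[\min_i X_i]$ for i.i.d. exponential costs, which are MHR:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 200_000
X = rng.exponential(scale=1.0, size=(trials, n))  # i.i.d. Exp(1) costs (MHR)

def single_threshold_cost(samples, tau):
    # Accept the first cost <= tau; if none qualifies, take the last cost.
    hit = (samples <= tau).any(axis=1)
    first = np.argmax(samples <= tau, axis=1)       # index of first acceptance
    taken = np.where(hit,
                     samples[np.arange(len(samples)), first],
                     samples[:, -1])
    return taken.mean()

prophet = X.min(axis=1).mean()                      # E[min_i X_i], about 1/n
alg = single_threshold_cost(X, tau=1.0 / n)         # hypothetical threshold
print(prophet, alg, alg / prophet)                  # empirical competitive ratio
\end{verbatim}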
This paper investigates the inverse source problem with a single propagating mode at multiple frequencies in an acoustic waveguide. The goal is to provide both theoretical justifications and efficient algorithms for imaging extended sources using sampling methods. In contrast to the existing far/near field operators in sampling methods, which are based on integrals over the space variable, a multi-frequency far-field operator is introduced based on an integral over the frequency variable. This far-field operator is defined so as to incorporate the possibly non-linear dispersion relation, a unique feature of waveguides. The factorization method is deployed to establish a rigorous characterization of the range support, which is the support of the source in the direction of wave propagation. A related factorization-based sampling method is also discussed. These sampling methods are shown to be capable of imaging the range support of the source. Numerical examples are provided to illustrate the performance of the sampling methods, including an example that images a complete sound-soft block.
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application, so that error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting the solution of linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of a widely applied theorem for conditioning GPs on direct measurements to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers that includes collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.
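To fix ideas, a conditioning result of the kind referred to above typically takes the following form (the notation here is assumed for illustration): if $u \sim \mathcal{GP}(m, k)$ and one observes $y = L[u] + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \Lambda)$ for a bounded linear operator $L$ mapping into $\mathbb{R}^n$, then the conditional process is again Gaussian,
\[
u \mid y \sim \mathcal{GP}\big(m', k'\big), \qquad
\begin{aligned}
m'(x) &= m(x) + \big(L k(\cdot, x)\big)^{\top} \big(L k L^{*} + \Lambda\big)^{-1} \big(y - L[m]\big),\\
k'(x, x') &= k(x, x') - \big(L k(\cdot, x)\big)^{\top} \big(L k L^{*} + \Lambda\big)^{-1} \big(L k(\cdot, x')\big),
\end{aligned}
\]
where $L$ acts on the indicated argument of the kernel and $L k L^{*} \in \mathbb{R}^{n \times n}$ is the Gram matrix obtained by applying $L$ to both kernel arguments. Choosing $L_i[u] = (\mathcal{D}u)(x_i)$ for a linear differential operator $\mathcal{D}$ and observing $y_i = f(x_i)$ then yields a collocation-style solver for $\mathcal{D}u = f$ as a special case of such GP conditioning.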
This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on the consideration of groupings of the covariance matrices according to a criterion, such as sharing Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the basis of the Parsimonious Model. The classification can be achieved with simple modifications of the CEM (Classification Expectation Maximization) algorithm, using in the M step suitable estimation methods known for parsimonious models. This approach leads us to propose Gaussian Mixture Models for model-based clustering and discriminant analysis in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions verifying suitable size, shape and orientation constraints, and applying them to both simulated and real data examples.
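For context (standard notation for the parsimonious family, assumed here rather than quoted from the abstract), the fourteen parsimonious models are built on the spectral decomposition of each component covariance matrix,
\[
\Sigma_k = \lambda_k\, D_k A_k D_k^{\top},
\]
where $\lambda_k > 0$ controls the volume, the orthogonal matrix $D_k$ the orientation (principal directions), and the diagonal matrix $A_k$ (with unit determinant) the shape of the $k$-th component. The refinement described above groups components so that some of these factors, e.g. the orientation matrices $D_k$, are shared within groups of components rather than being either common to all components or fully component-specific.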
Motivated by demand prediction for the custodial prison population in England and Wales, this paper describes an approach to the study of service systems using infinite-server queues, where the system has a non-empty initial state and the elapsed times of individuals initially present are not known. By separating the population into initial content and new arrivals, we can apply several techniques, either separately or jointly, to these sub-populations, to enable both short-term queue length predictions and longer-term considerations such as managing congestion and analysing the impact of potential interventions. The focus of the paper is the transient behaviour of the $M_t/G/\infty$ queue with a non-homogeneous Poisson arrival process, and our analysis considers various possible simplifications, including approximation. We illustrate the approach in that domain using publicly available data within a Bayesian framework to perform model inference.
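To sketch the kind of decomposition involved (standard $M_t/G/\infty$ facts; the notation is assumed here), suppose arrivals occur according to a non-homogeneous Poisson process with rate $\lambda(\cdot)$ and service times have complementary distribution function $\bar{G}$. The number of new arrivals still in service at time $t$ is Poisson distributed,
\[
N_{\mathrm{new}}(t) \sim \mathrm{Poisson}\big(m(t)\big), \qquad
m(t) = \int_0^t \lambda(u)\, \bar{G}(t - u)\, \mathrm{d}u,
\]
while each of the $n_0$ individuals initially present remains in the system at time $t$ with a survival probability depending on its (unknown) elapsed time, so the initial content contributes an independent sum of Bernoulli variables. The total queue length is the sum of these two independent contributions, which is what makes the separation into initial content and new arrivals convenient for both prediction and inference.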
Templates have emerged as an effective approach to simplifying the visualization design and programming process. For example, they enable users to quickly generate multiple visualization designs even when using complex toolkits like D3. However, these templates are often treated as rigid artifacts that respond poorly to changes made outside of the template's established parameters, limiting user creativity. Preserving the user's creative flow requires a more dynamic approach to template-based visualization design, where tools can respond gracefully to users' edits when they modify templates in unexpected ways. In this paper, we leverage the structural similarities revealed by templates to design resilient support features for prototyping D3 visualizations: recommendations that suggest complementary interactions for a user's D3 program, and code augmentation that implements recommended interactions with a single click, even when users deviate from pre-defined templates. We demonstrate the utility of these features in Mirny, a design-focused prototyping environment for D3. In a user study with 20 D3 users, we find that these automated features enable participants to prototype their design ideas with significantly fewer programming iterations. We also characterize key modification strategies used by participants to customize D3 templates. Informed by our findings and participants' feedback, we discuss the key implications of the use of templates for interleaving visualization programming and design.
Semi-Lagrangian (SL) schemes are a major class of numerical methods for solving transport equations, offering many advantages, and have been widely deployed in fields such as computational fluid dynamics, plasma physics modeling, and numerical weather prediction. In this work, we develop a novel machine learning-assisted approach to accelerate conventional SL finite volume (FV) schemes. The proposed scheme avoids the expensive tracking of upstream cells and instead attempts to learn the SL discretization from data by incorporating specific inductive biases into the neural network, significantly simplifying the algorithm implementation and leading to improved efficiency. In addition, the method delivers sharp shock transitions and a level of accuracy that would typically require a much finer grid with traditional transport solvers. Numerical tests demonstrate the effectiveness and efficiency of the proposed method.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.