In this work, we systematically investigate linear multi-step methods for differential equations with memory, focusing in particular on their numerical stability. Based on this investigation, we give sufficient conditions for the stability and convergence of several common multi-step methods and, accordingly, propose a notion of A-stability for differential equations with memory. Finally, we demonstrate the computational performance of our theory through numerical examples.
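For concreteness, a generic k-step linear multi-step method applied to a model problem with memory (here a Volterra integro-differential equation, chosen purely as an illustration; the paper's own test equations may differ) reads
\[
\sum_{j=0}^{k} \alpha_j\, y_{n+j} \;=\; h \sum_{j=0}^{k} \beta_j\, F(t_{n+j}, y_{n+j}),
\qquad
F(t, y(t)) \;=\; f\!\Big(t,\, y(t),\, \int_{0}^{t} K(t,s,y(s))\,\mathrm{d}s\Big),
\]
where the memory term must itself be discretised by a quadrature rule of compatible order. The notion of A-stability proposed above then concerns the behaviour of such schemes on a suitable linear test equation with memory.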
A novel numerical strategy is introduced for computing approximations of solutions to a Cahn-Hilliard model with degenerate mobilities. This model has recently been introduced as a second-order phase-field approximation for surface diffusion flows. Its numerical discretization is challenging due to the degeneracy of the mobilities, which generally requires an implicit treatment to avoid stability issues, at the price of increased computational cost. To mitigate this drawback, we consider new first- and second-order Scalar Auxiliary Variable (SAV) schemes that, unlike existing approaches, relax the mobility rather than the Cahn-Hilliard energy. These schemes are introduced and analysed theoretically in the general context of gradient flows and then specialised to the Cahn-Hilliard equation with degenerate mobilities. Various numerical experiments are conducted to highlight the advantages of these new schemes in terms of accuracy, effectiveness and computational cost.
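For reference (standard background, not taken from the abstract), the Cahn-Hilliard equation with a concentration-dependent mobility can be written as
\[
\partial_t \phi \;=\; \nabla \cdot \big( M(\phi)\, \nabla \mu \big),
\qquad
\mu \;=\; -\varepsilon^2 \Delta \phi + W'(\phi),
\]
where the mobility is degenerate when it vanishes at the pure phases, e.g. \(M(\phi) = (1-\phi^2)_+\) or \(M(\phi) = \phi^2(1-\phi)^2\) depending on the convention. Classical SAV schemes introduce an auxiliary scalar tied to the nonlinear part of the energy, \(r(t) = \sqrt{E_1(\phi) + C}\); the schemes above instead apply the relaxation to \(M\).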
While several inferential methods exist for analyzing functional data in factorial designs, there is a lack of statistical tests that (i) are valid in general designs, (ii) hold under non-restrictive assumptions on the data generating process and (iii) allow for coherent post-hoc analyses. In particular, most existing methods assume Gaussianity or equal covariance functions across groups (homoscedasticity) and are only applicable to specific study designs that do not allow for the evaluation of interactions. Moreover, all available strategies are designed only for testing global hypotheses and do not directly allow a more in-depth analysis of multiple local hypotheses. To address the first two problems (i)-(ii), we propose flexible integral-type test statistics that are applicable in general factorial designs under minimal assumptions on the data generating process. In particular, we postulate neither homoscedasticity nor Gaussianity. To approximate the statistics' null distribution, we adopt a resampling approach and validate it methodologically. Finally, we use our flexible testing framework to (iii) infer several local null hypotheses simultaneously. To allow for powerful data analysis, we thereby take the complex dependencies of the different local test statistics into account. In extensive simulations we confirm that the new methods are flexibly applicable. Two illustrative data analyses complete our study. The new testing procedures are implemented in the R package multiFANOVA, which will be available on CRAN soon.
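A minimal sketch, with notation chosen here rather than taken from the paper: writing \(\hat{\eta}(t)\) for the vector of group-wise sample mean functions and \(H\) for a contrast matrix encoding a null hypothesis \(H_0\colon H\eta = 0\) (e.g. no main effect or no interaction), an unscaled integral-type statistic has the form
\[
T_n \;=\; \int_{\mathcal{T}} n \,\big(H \hat{\eta}(t)\big)^{\!\top} \big(H \hat{\eta}(t)\big)\, \mathrm{d}t,
\]
whose null distribution is then approximated by resampling rather than by Gaussian or homoscedastic limit theory.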
Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and robustness to imperfect tuning. Their stochastic accelerated variants, however, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that, under proper hyperparameter tuning, SPPAM enjoys faster linear convergence to a neighborhood than the stochastic proximal point algorithm (SPPA), with a better contraction factor. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.
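For orientation, with step size \(\gamma\), momentum \(\beta\), and a stochastic sample \(f_{\xi_k}\) of the objective, the two methods compared here are commonly written as (the paper's exact formulations may differ)
\[
\text{SGDM:}\quad x_{k+1} = x_k - \gamma \nabla f_{\xi_k}(x_k) + \beta (x_k - x_{k-1}),
\qquad
\text{SPPAM:}\quad x_{k+1} = \operatorname{prox}_{\gamma f_{\xi_k}}\!\big(x_k + \beta (x_k - x_{k-1})\big),
\]
where \(\operatorname{prox}_{\gamma f}(z) = \arg\min_x \big\{ f(x) + \tfrac{1}{2\gamma}\|x - z\|^2 \big\}\). The implicit proximal step is what gives the method its robustness to step-size misspecification.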
The hierarchical prior used in Latent Gaussian models (LGMs) induces a posterior geometry prone to frustrate inference algorithms. Marginalizing out the latent Gaussian variable using an integrated Laplace approximation removes the offending geometry, allowing us to do efficient inference on the hyperparameters. To use gradient-based inference, we need to compute the approximate marginal likelihood and its gradient. The adjoint-differentiated Laplace approximation differentiates the marginal likelihood and scales well with the dimension of the hyperparameters. While this method can be applied to LGMs with any prior covariance, it only works for likelihoods with a diagonal Hessian. Furthermore, the algorithm requires methods that compute the first three derivatives of the likelihood, with current implementations relying on analytical derivatives. I propose a generalization which is applicable to a broader class of likelihoods and does not require analytical derivatives of the likelihood. Numerical experiments suggest the added flexibility comes at no computational cost: on a standard LGM, the new method is in fact slightly faster than the existing adjoint-differentiated Laplace approximation. I also apply the general method to an LGM with an unconventional likelihood. This example highlights the algorithm's potential, as well as persistent challenges.
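For context, in an LGM with latent Gaussian \(x \sim \mathcal{N}(0, K(\theta))\) and likelihood \(p(y \mid x)\), the Laplace approximation to the marginal likelihood takes the standard form
\[
\log p(y \mid \theta) \;\approx\; \log p(y \mid \hat{x}) \;-\; \tfrac{1}{2}\, \hat{x}^{\top} K(\theta)^{-1} \hat{x} \;-\; \tfrac{1}{2} \log \det\!\big( I + K(\theta) W \big),
\]
where \(\hat{x}\) maximises \(p(y \mid x)\, p(x \mid \theta)\) and \(W = -\nabla_x^2 \log p(y \mid x)\big|_{x = \hat{x}}\). The adjoint method propagates sensitivities of this quantity with respect to \(\theta\), and the diagonal-Hessian restriction on \(W\) is precisely what the proposed generalization removes.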
Prototype-based meta-learning has emerged as a powerful technique for addressing few-shot learning challenges. However, estimating a deterministic prototype by simply averaging a limited number of examples remains a fragile process. To overcome this limitation, we introduce ProtoDiff, a novel framework that leverages a task-guided diffusion model during the meta-training phase to gradually generate prototypes, thereby providing efficient class representations. Specifically, a set of prototypes is optimized to achieve per-task prototype overfitting, so that accurate overfitted prototypes can be obtained for individual tasks. Furthermore, we introduce a task-guided diffusion process within the prototype space, enabling the meta-learning of a generative process that transitions from a vanilla prototype to an overfitted prototype. During the meta-test stage, ProtoDiff gradually generates task-specific prototypes from random noise, conditioned on the limited samples available for the new task. In addition, to expedite training and enhance ProtoDiff's performance, we propose residual prototype learning, which leverages the sparsity of the residual prototype. We conduct thorough ablation studies to demonstrate ProtoDiff's ability to accurately capture the underlying prototype distribution and enhance generalization. The new state-of-the-art performance on within-domain, cross-domain, and few-task few-shot classification further substantiates the benefits of ProtoDiff.
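In standard prototypical-network notation (chosen here for illustration; the paper's own notation may differ), the vanilla prototype of class \(c\) is the mean embedding of its support examples, and the residual prototype can plausibly be read as the gap to the per-task overfitted prototype:
\[
\bar{z}_c \;=\; \frac{1}{|S_c|} \sum_{x \in S_c} f_\phi(x),
\qquad
\Delta_c \;=\; z_c^{\text{overfit}} - \bar{z}_c,
\]
where \(f_\phi\) is the feature extractor. The sparsity of \(\Delta_c\) is what residual prototype learning exploits, with the task-guided diffusion model bridging vanilla and overfitted prototypes.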
Discontinuities and delayed terms are encountered in the governing equations of a large class of problems, ranging from physics and engineering to medicine and economics. These systems cannot be properly modelled or simulated with standard Ordinary Differential Equations (ODEs), or with any data-driven approximation of them, including Neural Ordinary Differential Equations (NODEs). To circumvent this issue, latent variables are typically introduced to solve the dynamics of the system in a higher-dimensional space and obtain the solution as a projection onto the original space. However, this solution lacks physical interpretability. In contrast, Delay Differential Equations (DDEs) and their data-driven, approximate counterparts naturally appear as good candidates to characterize such complicated systems. In this work we revisit the recently proposed Neural DDE by introducing the Neural State-Dependent DDE (SDDDE), a general and flexible framework featuring multiple and state-dependent delays. The developed framework is auto-differentiable and runs efficiently on multiple backends. We show that our method is competitive and outperforms other continuous-class models on a wide variety of delayed dynamical systems.
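Concretely, a state-dependent DDE with multiple delays (generic form; the notation is chosen here) reads
\[
\dot{x}(t) \;=\; f\big(t,\, x(t),\, x(t - \tau_1(t, x(t))),\, \dots,\, x(t - \tau_k(t, x(t)))\big),
\qquad
x(t) = \varphi(t) \ \text{for } t \le t_0,
\]
where the delays \(\tau_i\) may depend on both time and the current state, and \(\varphi\) is a prescribed history function. In the neural setting, \(f\) (and possibly the \(\tau_i\)) are parameterised by neural networks trained by differentiating through the solver.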
Recently, Meta-Auto-Decoder (MAD) was proposed as a novel reduced order model (ROM) for solving parametric partial differential equations (PDEs), and the best possible performance of this method can be quantified by the decoder width. This paper aims to provide a theoretical analysis of the decoder width. The solution sets of several parametric PDEs are examined, and upper bounds on the corresponding decoder widths are estimated. In addition to elliptic and parabolic equations on a fixed domain, we investigate advection equations, which present challenges for classical linear ROMs, as well as elliptic equations in which the shape of the computational domain is a variable PDE parameter. The resulting fast decay rates of the decoder widths indicate the promising potential of MAD in addressing these problems.
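For comparison, the classical benchmark for linear ROMs is the Kolmogorov n-width of the solution set \(S\) (standard definition, given here only as context for the nonlinear decoder width):
\[
d_n(S) \;=\; \inf_{\dim V_n = n} \ \sup_{u \in S} \ \inf_{v \in V_n} \| u - v \|,
\]
which is known to decay slowly for advection-dominated problems. The decoder width replaces the linear subspace \(V_n\) by the range of a nonlinear decoder, which is how faster decay rates become possible for the transport-type problems considered here.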
This paper proposes novel computational multiscale methods for linear second-order elliptic partial differential equations in nondivergence-form with heterogeneous coefficients satisfying a Cordes condition. The construction follows the methodology of localized orthogonal decomposition (LOD) and provides operator-adapted coarse spaces by solving localized cell problems on a fine scale in the spirit of numerical homogenization. The degrees of freedom of the coarse spaces are related to nonconforming and mixed finite element methods for homogeneous problems. The rigorous error analysis of one exemplary approach shows that the favorable properties of the LOD methodology known from divergence-form PDEs, i.e., its applicability and accuracy beyond scale separation and periodicity, remain valid for problems in nondivergence-form.
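The problem class in question is, in its standard formulation,
\[
A(x) : D^2 u \;=\; \sum_{i,j=1}^{d} a_{ij}(x)\, \partial_{ij} u \;=\; f \ \text{in } \Omega,
\qquad
u = 0 \ \text{on } \partial\Omega,
\]
and the Cordes condition requires that, for some \(\varepsilon \in (0,1]\),
\[
\frac{|A(x)|^2}{(\operatorname{tr} A(x))^2} \;\le\; \frac{1}{d - 1 + \varepsilon} \quad \text{for a.e. } x \in \Omega,
\]
with \(|A|\) the Frobenius norm. This condition yields the well-posedness (in the sense of strong solutions) on which the fine-scale and localized cell problems rely.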
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will appeal to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
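Schematically, the three model classes surveyed can each be written in one line:
\[
\text{neural ODE:}\ \ \dot{y}(t) = f_\theta(t, y(t)),
\qquad
\text{neural CDE:}\ \ \mathrm{d}y(t) = f_\theta(y(t))\, \mathrm{d}X(t),
\qquad
\text{neural SDE:}\ \ \mathrm{d}y(t) = \mu_\theta(t, y(t))\, \mathrm{d}t + \sigma_\theta(t, y(t))\, \mathrm{d}W(t),
\]
where \(f_\theta\), \(\mu_\theta\), \(\sigma_\theta\) are neural networks, \(X\) is a control path constructed from the observed data, and \(W\) is a Brownian motion.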
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.