This article designs and analyzes efficient first-order strong schemes for a generalized A\"{i}t-Sahalia type model arising in mathematical finance and evolving in the positive domain $(0, \infty)$, which possesses a diffusion term with superlinear growth and a highly nonlinear drift that blows up at the origin. Such a complicated structure unavoidably causes substantial difficulties in the construction and convergence analysis of time discretizations. By incorporating implicitness in the term $\alpha_{-1} x^{-1}$ and a corrective mapping $\Phi_h$ in the recursion, we develop a novel class of explicit, unconditionally positivity-preserving (i.e., for any step size $h>0$) Milstein-type schemes for the underlying model. In both the non-critical and the general critical case, we introduce a new approach to establish mean-square error bounds for the proposed schemes without relying on a priori high-order moment bounds of the numerical approximations, and the expected order-one mean-square convergence is attained. This theoretical guarantee can in turn be used to justify the optimal complexity of the multilevel Monte Carlo method. Numerical experiments are finally provided to verify the theoretical findings.
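To make the positivity mechanism concrete, the following minimal Python sketch treats only the $\alpha_{-1} x^{-1}$ term implicitly in an otherwise explicit Milstein step for an A\"{i}t-Sahalia-type SDE of the form $dX_t = (\alpha_{-1}X_t^{-1} - \alpha_0 + \alpha_1 X_t - \alpha_2 X_t^r)\,dt + \sigma X_t^{\rho}\,dW_t$; the resulting scalar quadratic has a closed-form positive root, so every iterate stays in $(0,\infty)$ for any $h>0$. The coefficient values are purely illustrative and the corrective mapping $\Phi_h$ is not reproduced, so this is a sketch of the underlying idea rather than the authors' scheme.

```python
import numpy as np

# Illustrative (not from the paper) coefficients for an Ait-Sahalia-type SDE
#   dX_t = (a_m1/X_t - a0 + a1*X_t - a2*X_t**r) dt + sigma*X_t**rho dW_t.
a_m1, a0, a1, a2, r, sigma, rho = 1.0, 0.5, 0.3, 0.2, 2.0, 0.25, 1.5

def milstein_step_implicit_inverse(x, h, dW):
    """One Milstein-type step that treats only the a_m1/x term implicitly.

    All explicit contributions are collected in b, and the update solves
        y = b + h*a_m1/y   <=>   y**2 - b*y - h*a_m1 = 0,
    whose positive root keeps the iterate in (0, inf) for every h > 0.
    (The corrective mapping Phi_h of the paper is NOT included here.)
    """
    drift_expl = -a0 + a1 * x - a2 * x**r                  # explicit drift part
    diff = sigma * x**rho                                  # diffusion coefficient
    milstein = 0.5 * diff * sigma * rho * x**(rho - 1.0) * (dW**2 - h)
    b = x + h * drift_expl + diff * dW + milstein
    return 0.5 * (b + np.sqrt(b**2 + 4.0 * h * a_m1))

rng = np.random.default_rng(0)
h, X = 1e-2, 1.0
for _ in range(1000):
    X = milstein_step_implicit_inverse(X, h, np.sqrt(h) * rng.standard_normal())
print(X)  # strictly positive for any step size h
```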
This paper contributes to the study of optimal experimental design for Bayesian inverse problems governed by partial differential equations (PDEs). We derive estimates for the parametric regularity of multivariate double integration problems over high-dimensional parameter and data domains arising in Bayesian optimal design problems. We provide a detailed analysis for these double integration problems using two approaches: a full tensor product and a sparse tensor product combination of quasi-Monte Carlo (QMC) cubature rules over the parameter and data domains. Specifically, we show that the latter approach significantly improves the convergence rate, exhibiting performance comparable to that of QMC integration of a single high-dimensional integral. Furthermore, we numerically verify the predicted convergence rates for an elliptic PDE problem with an unknown diffusion coefficient in two spatial dimensions, offering empirical evidence supporting the theoretical results and highlighting practical applicability.
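As a rough illustration of the two cubature strategies compared above, the Python sketch below estimates a double integral over the parameter and data domains with (i) a full tensor product of two QMC rules and (ii) the standard two-dimensional combination technique as a stand-in for the sparse tensor product combination. Scrambled Sobol' points replace the specific QMC rules analysed in the paper, and the integrand `f` is a hypothetical placeholder, so this conveys only the cost structure, not the paper's construction.

```python
import numpy as np
from scipy.stats import qmc

def qmc_points(dim, level, seed):
    """2**level scrambled Sobol' points in [0, 1)^dim (stand-in for the QMC
    rules analysed in the paper); nested across levels via a fixed seed."""
    return qmc.Sobol(d=dim, scramble=True, seed=seed).random(2**level)

def f(y, z):
    # Hypothetical smooth integrand coupling parameter variables y and data variables z.
    return 1.0 / (1.0 + y @ y + z @ z)

def tensor_rule(ly, lz, dy, dz):
    """Full tensor product of a level-ly rule (parameters) and a level-lz rule (data)."""
    Y, Z = qmc_points(dy, ly, seed=1), qmc_points(dz, lz, seed=2)
    return np.mean([f(y, z) for y in Y for z in Z])

def full_tensor(L, dy, dz):
    return tensor_rule(L, L, dy, dz)                         # 4**L evaluations

def sparse_tensor(L, dy, dz):
    """Two-dimensional combination technique, used here as a stand-in for the
    sparse tensor product combination: sum the anti-diagonal tensor rules at
    levels L and L-1 with coefficients +1 and -1."""
    val = sum(tensor_rule(l, L - l, dy, dz) for l in range(L + 1))
    val -= sum(tensor_rule(l, L - 1 - l, dy, dz) for l in range(L))
    return val                                               # O(L * 2**L) evaluations

print(full_tensor(6, 3, 2), sparse_tensor(6, 3, 2))
```

At level $L$ the full tensor rule needs $4^L$ integrand evaluations, whereas the sparse combination needs on the order of $L\,2^L$, which is the source of the improved convergence-versus-cost trade-off described above.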
We propose and analyze two convection-quasi-robust and pressure-robust finite element methods for a fully nonlinear time-dependent magnetohydrodynamics problem. The schemes employ suitable upwind and CIP stabilizations to handle the fluid and magnetic convective terms. The derived error estimates are uniform in both diffusion parameters and optimal with respect to the diffusive norm; furthermore, for the second (more complex) method we are able to show a faster error-reduction rate in convection-dominated regimes. A set of numerical tests supports our theoretical findings.
We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short, compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. It is then fine-tuned on instruction data to produce desirable responses to various prompts. Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves $4\times$ context compression based on Llama, offering advantages of improved latency and reduced GPU memory cost during inference, while providing interesting insight into memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications for addressing the long context problem and suggesting further research in LLM context management. Our data, code and models are available at //github.com/getao/icae.
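A toy sketch of the memory-slot mechanism follows; the class name `ToyICAE`, the tiny transformer encoder, and all sizes are hypothetical, and the released ICAE (which builds on Llama and is pretrained with autoencoding and language-modeling objectives) is not reproduced here. The sketch only shows the core idea: learnable slot embeddings are appended to the context, and their final hidden states serve as the compressed memory a decoder could condition on.

```python
import torch
import torch.nn as nn

class ToyICAE(nn.Module):
    """Toy sketch of the memory-slot idea (NOT the released ICAE): learnable slot
    embeddings are appended to the context, and their final hidden states act as
    the compressed memory a decoder could condition on."""

    def __init__(self, vocab=1000, d=64, n_slots=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.slots = nn.Parameter(0.02 * torch.randn(n_slots, d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):                      # ids: (batch, ctx_len)
        x = self.embed(ids)                      # (batch, ctx_len, d)
        slots = self.slots.expand(ids.size(0), -1, -1)
        h = self.encoder(torch.cat([x, slots], dim=1))
        return h[:, -self.slots.size(0):, :]     # (batch, n_slots, d) memory

mem = ToyICAE()(torch.randint(0, 1000, (2, 128)))
print(mem.shape)                                 # 128 context tokens -> 4 memory slots
```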
In the smoothed particle hydrodynamics (SPH) method, particle-based approximations are implemented via kernel functions, and the evaluation of performance involves two key criteria: numerical accuracy and computational efficiency. In the SPH community, the Wendland kernel is the prevailing choice owing to its good accuracy and reasonable computational efficiency. Nevertheless, there is a pressing need to enhance the computational efficiency of numerical methods while upholding accuracy. In this paper, we employ a truncation approach to limit the compact support of the Wendland kernel to 1.6h. This decision is based on the observation that particles within the range 1.6h to 2h make negligible contributions, practically approaching zero, to the SPH approximation. To address the integration errors stemming from the truncation, we adopt the Laguerre-Gauss kernel for particle relaxation, since this kernel has been shown to produce particle distributions with reduced residue and integration errors \cite{wang2023fourth}. Furthermore, we introduce a kernel gradient correction to rectify the numerical errors arising from the SPH approximation of the kernel gradient and from the truncated compact support. A comprehensive set of numerical examples, including fluid dynamics in the Eulerian formulation and solid dynamics in the total Lagrangian formulation, demonstrates that the truncated and standard Wendland kernels achieve the same level of accuracy, while the former significantly improves computational efficiency.
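For reference, a short Python sketch of the kernel modification being described: the normalization constant is the usual 3-D value for unit smoothing length, the cutoff 1.6 is taken from the text, and the Laguerre-Gauss relaxation kernel and the kernel gradient correction are not shown.

```python
import numpy as np

ALPHA_3D = 21.0 / (16.0 * np.pi)   # Wendland C2 normalization in 3-D for h = 1

def wendland_c2(q):
    """Standard Wendland C2 kernel, compact support q = r/h in [0, 2]."""
    return np.where(q < 2.0, ALPHA_3D * (1.0 - 0.5 * q)**4 * (2.0 * q + 1.0), 0.0)

def wendland_c2_truncated(q, cutoff=1.6):
    """Same kernel with the support truncated at r = 1.6h: the contribution for
    1.6h < r < 2h is practically negligible, while the neighbour-search radius
    (and hence the cost of the SPH summations) shrinks accordingly."""
    return np.where(q < cutoff, wendland_c2(q), 0.0)

q = np.linspace(0.0, 2.0, 9)
print(np.c_[q, wendland_c2(q), wendland_c2_truncated(q)])
```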
This paper analyzes the properties of two hybrid discretization methods for Gaussian derivatives, based on convolution with either the normalized sampled Gaussian kernel or the integrated Gaussian kernel, followed by central differences. The motivation for studying these discretization methods is that, when multiple spatial derivatives of different orders are needed at the same scale level, they can be computed significantly more efficiently than with more direct derivative approximations based on explicit convolutions with either sampled Gaussian kernels or integrated Gaussian kernels. While these computational benefits also hold for the genuinely discrete approach for computing discrete analogues of Gaussian derivatives, based on convolution with the discrete analogue of the Gaussian kernel followed by central differences, the underlying mathematical primitives of that approach, in terms of modified Bessel functions of integer order, may not be available in certain frameworks for image processing, such as deep learning based on scale-parameterized filters expressed in terms of Gaussian derivatives, with learning of the scale levels. We characterize the properties of these hybrid discretization methods in terms of quantitative performance measures concerning the amount of spatial smoothing that they imply, as well as the relative consistency of scale estimates obtained from scale-invariant feature detectors with automatic scale selection. The emphasis is on the behaviour for very small values of the scale parameter, which may differ significantly both from the corresponding results obtained from the fully continuous scale-space theory and between the different types of discretization methods.
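The hybrid discretization itself is straightforward to state in code. The sketch below covers the sampled-Gaussian variant only; the truncation radius of four standard deviations and the helper names are arbitrary choices made for illustration. It smooths a 1-D signal once with the normalized sampled Gaussian kernel and then applies central-difference operators for the requested derivative order, so that several derivative orders at the same scale can reuse a single smoothing pass.

```python
import numpy as np

def sampled_gaussian(sigma):
    """Normalized sampled Gaussian kernel, truncated at 4*sigma (arbitrary choice)."""
    radius = int(np.ceil(4.0 * sigma))
    n = np.arange(-radius, radius + 1)
    g = np.exp(-n**2 / (2.0 * sigma**2))
    return g / g.sum()

def hybrid_gaussian_derivative(f, sigma, order=1):
    """Smooth once with the sampled Gaussian, then apply central differences.

    Derivatives are taken with respect to the sample index (unit grid spacing);
    several derivative orders at the same scale reuse the single smoothed signal.
    """
    smoothed = np.convolve(f, sampled_gaussian(sigma), mode="same")
    d1 = np.array([0.5, 0.0, -0.5])            # first-order central difference
    d2 = np.array([1.0, -2.0, 1.0])            # second-order central difference
    out = smoothed
    for _ in range(order // 2):
        out = np.convolve(out, d2, mode="same")
    if order % 2:
        out = np.convolve(out, d1, mode="same")
    return out

x = np.linspace(0.0, 2.0 * np.pi, 200)
print(hybrid_gaussian_derivative(np.sin(x), sigma=3.0, order=1)[90:95])
```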
An additive Runge-Kutta method is used for the time stepping, which integrates the linear stiff terms with an explicit singly diagonally implicit Runge-Kutta (ESDIRK) method and the nonlinear terms with an explicit Runge-Kutta (ERK) method. In each time step, the implicit solve is performed by the recently developed Hierarchical Poincar\'e-Steklov (HPS) method, a fast direct solver for elliptic equations that decomposes the spatial domain into a hierarchical tree of subdomains and builds spectral collocation solvers locally on the subdomains. These ideas combine naturally in the presented method, since the singly diagonal coefficient in ESDIRK together with a fixed time step ensures that the coefficient matrix in the implicit solve of HPS remains the same for all stages. The precomputed inverse can therefore be reused efficiently, leading to a scheme with complexity (in two dimensions) $\mathcal{O}(N^{1.5})$ for the precomputation, in which the solution operator for the elliptic problems is built, and $\mathcal{O}(N \log N)$ for the solve in each time step. The stability of the method is proved for first order in time and any order in space, and numerical evidence substantiates a claim of stability for a much broader class of time discretization methods. Numerical experiments supporting the accuracy and efficiency of the method in one and two dimensions are presented.
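The reuse argument can be illustrated independently of HPS. In the Python sketch below, a first-order IMEX Euler step on a 1-D heat equation with a hypothetical cubic nonlinearity stands in for the ESDIRK/ERK pair and the HPS solution operator: the implicit matrix $I - \gamma \Delta t L$ is factorized once and the factorization is reused at every solve, which is exactly what the single diagonal coefficient and the fixed time step buy in the full method.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Stiff linear part: 1-D Laplacian on n interior points, homogeneous Dirichlet BCs.
n, dx, dt = 200, 1.0 / 201, 1e-4
L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2

def N(u):
    # Hypothetical non-stiff nonlinear term handled explicitly.
    return -u**3

# Factor (I - gamma*dt*L) ONCE. Because all implicit stages of an ESDIRK method
# share the single diagonal coefficient gamma and the step size is fixed, every
# implicit solve uses this same matrix, so the factorization (LU here, the HPS
# solution operator in the paper) is built once and reused.
gamma = 1.0  # plain first-order IMEX Euler stage, for brevity
lu = lu_factor(np.eye(n) - gamma * dt * L)

u = np.sin(np.pi * dx * np.arange(1, n + 1))
for _ in range(1000):
    u = lu_solve(lu, u + dt * N(u))  # implicit in L, explicit in N
print(float(u.max()))                # decays smoothly toward zero
```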
Semi-algebraic priors are ubiquitous in signal processing and machine learning. Prevalent examples include a) linear models where the signal lies in a low-dimensional subspace; b) sparse models where the signal can be represented by only a few coefficients under a suitable basis; and c) a large family of neural network generative models. In this paper, we prove a transversality theorem for semi-algebraic sets in orthogonal or unitary representations of groups: under a suitable dimension bound, a generic translate of any semi-algebraic set is transverse to the orbits of the group action. This, in turn, implies that if a signal lies in a low-dimensional semi-algebraic set, then it can be recovered uniquely from measurements that separate orbits. As an application, we consider the implications of the transversality theorem for the problem of recovering signals that are translated by random group actions from their second moment. As a special case, we discuss cryo-EM, a leading technology for reconstructing the spatial structure of biological molecules, which serves as our prime motivation. In particular, we derive explicit bounds for recovering a molecular structure from the second moment under a semi-algebraic prior and deduce information-theoretic implications. We also obtain information-theoretic bounds for three additional applications: factoring Gram matrices, multi-reference alignment, and phase retrieval. Finally, we deduce bounds for designing permutation-invariant separators in machine learning.
This manuscript derives locally weighted ensemble Kalman methods from the point of view of ensemble-based function approximation. This is done by using pointwise evaluations to build up a local linear or quadratic approximation of a function, tapering off the effect of distant particles via local weighting. This yields a candidate method (the locally weighted ensemble Kalman method for inversion) motivated by combining some of the strengths of the particle filter (the ability to cope with nonlinear maps and non-Gaussian distributions) and of the ensemble Kalman filter (no filter degeneracy).
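A generic sketch of one locally weighted ensemble-Kalman-inversion-style update follows (not necessarily the authors' exact formulation; the Gaussian taper, the localization scale, and the toy linear forward map are assumptions made for illustration): each particle forms its own weighted empirical covariances, with distant particles tapered off, before applying a Kalman-type correction.

```python
import numpy as np

def lw_eki_step(U, G, y, Gamma, loc_scale=3.0):
    """One locally weighted ensemble-Kalman-inversion-style update (generic sketch).

    For each particle U[i], distant particles are down-weighted by a Gaussian
    taper before forming the empirical covariances, so the implicit linear
    regression behind the Kalman gain is only trusted locally.
    """
    U_new = np.empty_like(U)
    for i in range(U.shape[0]):
        w = np.exp(-0.5 * np.sum((U - U[i])**2, axis=1) / loc_scale**2)
        w /= w.sum()
        um, gm = w @ U, w @ G                      # locally weighted means
        Cug = (U - um).T * w @ (G - gm)            # weighted cross-covariance
        Cgg = (G - gm).T * w @ (G - gm)            # weighted output covariance
        K = Cug @ np.linalg.inv(Cgg + Gamma)       # local Kalman gain
        U_new[i] = U[i] + K @ (y - G[i])
    return U_new

# Toy usage: recover u from noiseless data y = A @ u_true with a linear forward map.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
u_true = np.array([1.0, -2.0])
y = A @ u_true
U = rng.standard_normal((50, 2))                   # initial ensemble
for _ in range(30):
    U = lw_eki_step(U, U @ A.T, y, Gamma=0.01 * np.eye(3))
print(U.mean(axis=0), u_true)
```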
We suggest new, closely related methods for the numerical inversion of the $Z$-transform and for Wiener-Hopf factorization of functions on the unit circle, based on sinh-deformations of the contours of integration, corresponding changes of variables, and the simplified trapezoid rule. As applications, we consider the evaluation of high moments of probability distributions and the construction of causal filters. Matlab programs running on a Mac with modest specifications achieve precision E-14 in several dozen microseconds and E-11 in several milliseconds, respectively.
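For orientation, the baseline quadrature being accelerated is the simplified trapezoid rule on the unit circle, shown in the Python sketch below with a hypothetical rational $Z$-transform as the test function. The sinh-deformation of the contour and the accompanying change of variables, which are the actual contribution and the source of the reported E-14 accuracy with few nodes, are not reproduced here.

```python
import numpy as np

def inverse_z_trapezoid(F, n, N=256):
    """Recover f_n from F(z) = sum_k f_k z**(-k) with the plain trapezoid rule
    on the unit circle:  f_n = (1/2pi) * int_0^{2pi} F(e^{it}) e^{i*n*t} dt."""
    t = 2.0 * np.pi * np.arange(N) / N
    z = np.exp(1j * t)
    return np.real(np.mean(F(z) * z**n))

# Hypothetical test: F(z) = 1/(1 - a/z) has coefficients f_n = a**n.
a = 0.5
F = lambda z: 1.0 / (1.0 - a / z)
print(inverse_z_trapezoid(F, 5), a**5)
```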
We present a novel data-driven strategy for choosing the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the choice of the hyperparameter as an iterative procedure (over $k$) and propose a strategy, easily implemented in practice, based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal under the fixed-design assumption on the covariates, over some smoothness function classes, for instance the class of Lipschitz functions on a bounded domain. The new method often improves statistical performance on artificial and real-world data sets compared with other model selection strategies, such as the hold-out method, 5-fold cross-validation, and the AIC criterion. The novelty of the strategy lies in reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. More precisely, given a sample of size $n$, if one needs to choose $k$ among $\left\{ 1, \ldots, n \right\}$ and $\left\{ f^1, \ldots, f^n \right\}$ are the corresponding estimators of the regression function, the minimum discrepancy principle requires computing only a fraction of these estimators, which is not the case for generalized cross-validation, Akaike's AIC criterion, or the Lepskii principle.
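A minimal sketch of this kind of stopping rule is given below, assuming a known noise level $\sigma^2$, a one-dimensional fixed design, and in-sample residuals; the function names are hypothetical and the details of the paper's rule may differ. The idea it illustrates is the computational one: increase $k$ and stop as soon as the empirical residual reaches the noise level, so only a small fraction of the $n$ candidate estimators is ever computed.

```python
import numpy as np

def knn_insample(X, Y, k):
    """In-sample k-NN regression predictions at the design points (1-D design)."""
    d = np.abs(X[:, None] - X[None, :])        # pairwise distances
    idx = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours (incl. self)
    return Y[idx].mean(axis=1)

def mdp_choose_k(X, Y, sigma2):
    """Minimum-discrepancy-style early stopping: increase k and stop as soon as
    the empirical residual reaches the noise level sigma2, so only a small
    fraction of the n candidate estimators f^1, ..., f^n is ever computed."""
    n = len(Y)
    for k in range(1, n + 1):
        if np.mean((Y - knn_insample(X, Y, k))**2) >= sigma2:
            return k
    return n

rng = np.random.default_rng(0)
n, sigma = 200, 0.3
X = np.sort(rng.uniform(0.0, 1.0, n))
Y = np.sin(2.0 * np.pi * X) + sigma * rng.standard_normal(n)
print(mdp_choose_k(X, Y, sigma**2))            # stops after a handful of k values
```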