The dot product attention mechanism, originally designed for natural language processing (NLP) tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum many-body systems on a lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys should be, in principle, omitted from the attention mechanism when studying large systems. Interestingly, the same arguments can be extended to the NLP domain, in the limit of long input sentences.
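The contrast between the two attention variants can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sizes `n` and `d`, the random weights, and the softmax normalization of the position-only weights are all assumptions made for illustration. The point it shows is that standard attention weights change with the input, whereas position-only weights are the same for every input.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def standard_attention(x, w_q, w_k, w_v):
    """Dot-product attention: the weights depend on x through queries and keys."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def positional_attention(x, pos_logits, w_v):
    """Simplified variant without queries and keys: the weights come from a
    learned table indexed by positions (i, j) only. Normalizing the table with
    a softmax is an assumption of this sketch."""
    weights = [softmax(row) for row in pos_logits]
    return matmul(weights, matmul(x, w_v)), weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 4, 3  # hypothetical sequence length and embedding dimension
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))
pos_logits = random_matrix(n, n, rng)
x1, x2 = random_matrix(n, d, rng), random_matrix(n, d, rng)

_, a1 = standard_attention(x1, w_q, w_k, w_v)
_, a2 = standard_attention(x2, w_q, w_k, w_v)
_, p1 = positional_attention(x1, pos_logits, w_v)
_, p2 = positional_attention(x2, pos_logits, w_v)

print("standard weights depend on input:", a1 != a2)
print("positional weights depend on input:", p1 != p2)
```

In this sketch the position-only variant drops the two `d`-by-`d` query and key matrices and the `n`-by-`n` score computation per input, at the price of one `n`-by-`n` table of learned parameters.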
This paper studies a family of convolution quadratures, a numerical technique for efficient evaluation of convolution integrals. We employ the block generalized Adams method to discretize the underlying initial value problem, departing from the well-established approaches that rely on linear multistep formulas or Runge-Kutta methods. The convergence order of the proposed convolution quadrature can be dynamically controlled without requiring grid point adjustments, enhancing flexibility. Through strategic selection of the local interpolation polynomial and block size, the method achieves high-order convergence for the calculation of convolution integrals with hyperbolic kernels. We provide a rigorous convergence analysis for the proposed convolution quadrature and numerically validate our theoretical findings for various convolution integrals.
We formulate and analyze a multiscale method for an elliptic problem with an oscillatory coefficient based on a skeletal (hybrid) formulation. More precisely, we employ hybrid discontinuous Galerkin approaches and combine them with the localized orthogonal decomposition methodology to obtain a coarse-scale skeletal method that effectively includes fine-scale information. This work is the first step in reliably merging hybrid skeletal formulations and localized orthogonal decomposition to unite the advantages of both strategies. Numerical experiments are presented to illustrate the theoretical findings.
We consider a high-dimensional sparse normal means model where the goal is to estimate the mean vector assuming the proportion of non-zero means is unknown. We model the mean vector by a one-group global-local shrinkage prior belonging to a broad class of such priors that includes the horseshoe prior. We address some questions related to asymptotic properties of the resulting posterior distribution of the mean vector for the said class of priors. We consider two ways to model the global parameter. In the first approach, we treat it as an unknown fixed parameter and replace it by an empirical Bayes estimate. In the second approach, we adopt a hierarchical Bayes treatment by assigning a suitable non-degenerate prior distribution to it. We first show that for the class of priors under study, the posterior distribution of the mean vector contracts around the true parameter at a near minimax rate when the empirical Bayes approach is used. Next, we prove that in the hierarchical Bayes approach, the corresponding Bayes estimate attains the minimax risk asymptotically under the squared error loss function. We also show that the posterior contracts around the true parameter at a near minimax rate. These results generalize those of van der Pas et al. (2014) \cite{van2014horseshoe}, (2017) \cite{van2017adaptive}, proved for the horseshoe prior. We also study the asymptotic Bayes optimality of global-local shrinkage priors when the number of non-null hypotheses is unknown. Here our target is to propose conditions on the prior density of the global parameter such that the Bayes risk induced by the decision rule attains the optimal Bayes risk, up to a multiplicative constant. Using our proposed conditions, under the asymptotic framework of Bogdan et al. (2011) \cite{bogdan2011asymptotic}, we are able to answer this question in the affirmative.
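For concreteness, the setting of this abstract can be written out as follows. This is a sketch in notation that is assumed here rather than taken from the abstract: unit noise variance, local scales $\lambda_i$, a global scale $\tau$, and a generic prior density $\pi$ on the local scales.

\[
X_i = \theta_i + \epsilon_i, \quad \epsilon_i \overset{\text{iid}}{\sim} N(0,1), \qquad \theta_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2 \tau^2), \qquad \lambda_i^2 \sim \pi(\lambda_i^2), \qquad i = 1, \dots, n.
\]

Conditionally on the scales, the posterior mean shrinks each observation linearly,

\[
E(\theta_i \mid X_i, \lambda_i, \tau) = (1 - \kappa_i) X_i, \qquad \kappa_i = \frac{1}{1 + \lambda_i^2 \tau^2},
\]

so a small $\tau$ pulls most coordinates towards zero, while heavy-tailed local scales leave large signals nearly unshrunk. The horseshoe prior corresponds to a standard half-Cauchy prior on $\lambda_i$. In the empirical Bayes approach $\tau$ is replaced by a data-driven estimate, and in the hierarchical Bayes approach it receives its own prior.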
Shape-restricted inferences have exhibited empirical success in various applications with survival data. However, certain works fall short in providing a rigorous theoretical justification and an easy-to-use variance estimator with theoretical guarantee. Motivated by Deng et al. (2023), this paper delves into an additive and shape-restricted partially linear Cox model for right-censored data, where each additive component satisfies a specific shape restriction, encompassing monotonic increasing/decreasing and convexity/concavity. We systematically investigate the consistencies and convergence rates of the shape-restricted maximum partial likelihood estimator (SMPLE) of all the underlying parameters. We further establish the asymptotic normality and semiparametric efficiency of the SMPLE for the linear covariate shift. To estimate the asymptotic variance, we propose an innovative data-splitting variance estimation method that boasts exceptional versatility and broad applicability. Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated or the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates based on the SMPLE and its competitors.
We review some recent developments in the theory of spatial extremes related to Pareto processes and the modeling of threshold exceedances. We provide theoretical background, methodology for modeling, simulation and inference, as well as an illustration on wave height modeling. This preprint is an author version of a chapter to appear in a collaborative book.
Although Bayesian skew-normal models are useful for flexibly modeling spatio-temporal processes, they still suffer from high computational cost and from poor interpretability of their mean and variance parameters, including regression coefficients. To address these problems, this study proposes a spatio-temporal model that incorporates skewness while maintaining mean and variance, by applying the flexible subclass of the closed skew-normal distribution. An efficient sampling method is introduced, leveraging the autoregressive representation of the model. Additionally, the model's symmetry concerning spatial order is demonstrated, and Mardia's skewness and kurtosis are derived, showing independence from the mean and variance. Simulation studies compare the estimation performance of the proposed model with that of the Gaussian model. The results confirm its superiority in high skewness and low observation noise scenarios. The identification of Cobb-Douglas production functions across US states is examined as an application to real data, revealing that the proposed model excels in both goodness-of-fit and predictive performance.
This paper shows how to use the shooting method, a classical numerical algorithm for solving boundary value problems, to compute the Riemannian distance on the Stiefel manifold $ \mathrm{St}(n,p) $, the set of $ n \times p $ matrices with orthonormal columns. The proposed method is a shooting method in the sense of the classical shooting methods for solving boundary value problems; see, e.g., Stoer and Bulirsch, 1991. The main feature is that we provide an approximate formula for the Fr\'{e}chet derivative of the geodesic involved in our shooting method. Numerical experiments demonstrate the algorithms' accuracy and performance. Comparisons with existing state-of-the-art algorithms for solving the same problem show that our method is competitive and even beats several algorithms in many cases.
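The classical shooting idea referred to here can be sketched on a scalar toy problem. This is not the Stiefel algorithm of the paper: the pendulum equation, the function names `integrate` and `shoot`, the RK4 integrator, and the finite-difference derivative (used in place of the paper's approximate Fréchet derivative formula) are all assumptions chosen for illustration. The unknown initial slope plays the role of the initial tangent vector of the geodesic, and the prescribed end value plays the role of the target point on the manifold.

```python
import math


def integrate(s, t_end=1.0, steps=200):
    """Integrate y'' = -sin(y), y(0) = 0, y'(0) = s with classical RK4
    and return y(t_end)."""
    h = t_end / steps
    y, v = 0.0, s

    def f(y, v):
        # First-order system: (y, v)' = (v, -sin(y)).
        return v, -math.sin(y)

    for _ in range(steps):
        k1y, k1v = f(y, v)
        k2y, k2v = f(y + 0.5 * h * k1y, v + 0.5 * h * k1v)
        k3y, k3v = f(y + 0.5 * h * k2y, v + 0.5 * h * k2v)
        k4y, k4v = f(y + h * k3y, v + h * k3v)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y


def shoot(target, s0=0.0, tol=1e-10, max_iter=50, eps=1e-6):
    """Newton iteration on the initial slope s so that y(1; s) = target.
    The derivative of the end point with respect to s is approximated by a
    forward finite difference."""
    s = s0
    for _ in range(max_iter):
        end = integrate(s)
        residual = end - target
        if abs(residual) < tol:
            return s
        slope = (integrate(s + eps) - end) / eps
        s -= residual / slope
    raise RuntimeError("shooting iteration did not converge")


s_star = shoot(0.5)
print("initial slope:", s_star)
print("end-point residual:", integrate(s_star) - 0.5)
```

Each Newton step costs two forward integrations here. An analytic or approximate derivative formula, as proposed in the paper for the geodesic, avoids the extra integration and the choice of the step `eps`.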
We propose a novel projection method to treat near-incompressibility and volumetric locking in small- and large-deformation elasticity and plasticity within the context of higher order material point methods. The material point method is well known to exhibit volumetric locking due to the presence of large numbers of material points per element that are used to decrease the quadrature error. Although there has been considerable research on the treatment of near-incompressibility in the traditional material point method, the issue has not been studied in depth for higher order material point methods. Using the Bbar and Fbar methods as our point of departure we develop an appropriate projection technique for material point methods that use higher order shape functions for the background discretization. The approach is based on the projection of the dilatational part of the appropriate strain rate measure onto a lower dimensional approximation space, according to the traditional Bbar and Fbar techniques, but tailored to the material point method. The presented numerical examples exhibit reduced stress oscillations and are free of volumetric locking and hourglassing phenomena.
This paper proposes a second-order accurate direct Eulerian generalized Riemann problem (GRP) scheme for the ten-moment Gaussian closure equations with source terms. The generalized Riemann invariants associated with the rarefaction waves, the contact discontinuity and the shear waves are given, and the 1D exact Riemann solver is obtained. After that, the generalized Riemann invariants and the Rankine-Hugoniot jump conditions are directly used to resolve the left and right nonlinear waves (rarefaction wave and shock wave) of the local GRP in Eulerian formulation, and then the 1D direct Eulerian GRP scheme is derived. These derivations are much more complicated, technical and nontrivial due to the larger number of physical variables and elementary waves. Some 1D and 2D numerical experiments are presented to check the accuracy and high resolution of the proposed GRP schemes, where the 2D direct Eulerian GRP scheme is given by using the Strang splitting method for simplicity. It should be emphasized that several examples of 2D Riemann problems are constructed for the first time.
We study weighted basic parallel processes (WBPP), a nonlinear recursive generalisation of weighted finite automata inspired by process algebra and Petri net theory. Our main result is an algorithm of 2-EXPSPACE complexity for the WBPP equivalence problem. While (unweighted) BPP language equivalence is undecidable, we can use this algorithm to decide multiplicity equivalence of BPP and language equivalence of unambiguous BPP, with the same complexity. These are long-standing open problems for the related model of weighted context-free grammars. Our second contribution is a connection between WBPP, power series solutions of systems of polynomial differential equations, and combinatorial enumeration. To this end we consider constructible differentially finite power series (CDF), a class of multivariate differentially algebraic series introduced by Bergeron and Reutenauer in order to provide a combinatorial interpretation to differential equations. CDF series generalise rational, algebraic, and a large class of D-finite (holonomic) series, for which decidability of equivalence was an open problem. We show that CDF series correspond to commutative WBPP series. As a consequence of our result on WBPP and commutativity, we show that equivalence of CDF power series can be decided with 2-EXPTIME complexity. The complexity analysis is based on effective bounds from algebraic geometry, namely on the length of chains of polynomial ideals constructed by repeated application of finitely many, not necessarily commuting derivations of a multivariate polynomial ring. This is obtained by generalising a result of Novikov and Yakovenko in the case of a single derivation, which is noteworthy since generic bounds on ideal chains are non-primitive recursive in general. On the way, we develop the theory of WBPP series and CDF power series, exposing several of their appealing properties.