This paper considers Bayesian inference for the partially linear model. Our approach exploits a parametrization of the regression function that is tailored toward estimating a low-dimensional parameter of interest. The key property of the parametrization is that it generates a Neyman orthogonal moment condition meaning that the low-dimensional parameter is less sensitive to the estimation of nuisance parameters. Our large sample analysis supports this claim. In particular, we derive sufficient conditions under which the posterior for the low-dimensional parameter contracts around the truth at the parametric rate and is asymptotically normal with a variance that coincides with the semiparametric efficiency bound. These conditions allow for a larger class of nuisance parameters relative to the original parametrization of the regression model. Overall, we conclude that a parametrization that embeds Neyman orthogonality can be a useful device for debiasing posterior distributions in semiparametric models.
In this paper, we propose a method for estimating model parameters using Small-Angle Scattering (SAS) data based on the Bayesian inference. Conventional SAS data analyses involve processes of manual parameter adjustment by analysts or optimization using gradient methods. These analysis processes tend to involve heuristic approaches and may lead to local solutions.Furthermore, it is difficult to evaluate the reliability of the results obtained by conventional analysis methods. Our method solves these problems by estimating model parameters as probability distributions from SAS data using the framework of the Bayesian inference. We evaluate the performance of our method through numerical experiments using artificial data of representative measurement target models.From the results of the numerical experiments, we show that our method provides not only high accuracy and reliability of estimation, but also perspectives on the transition point of estimability with respect to the measurement time and the lower bound of the angular domain of the measured data.
We consider estimation and inference with data collected from episodic reinforcement learning (RL) algorithms; i.e. adaptive experimentation algorithms that at each period (aka episode) interact multiple times in a sequential manner with a single treated unit. Our goal is to be able to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e.g. what was the effect of the first period action on the final outcome). Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to $Z$-estimation approaches in the case of static data. However, such estimators fail to be asymptotically normal in the case of adaptive data collection. We propose a re-weighted $Z$-estimation approach with carefully designed adaptive weights to stabilize the episode-varying estimation variance, which results from the nonstationary policy that typical episodic RL algorithms invoke. We identify proper weighting schemes to restore the consistency and asymptotic normality of the re-weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions for target parameters of interest. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
The aim of this article is to propose a new reduced-order modelling approach for parametric eigenvalue problems arising in electronic structure calculations. Namely, we develop nonlinear reduced basis techniques for the approximation of parametric eigenvalue problems inspired from quantum chemistry applications. More precisely, we consider here a one-dimensional model which is a toy model for the computation of the electronic ground state wavefunction of a system of electrons within a molecule, solution to the many-body electronic Schr\"odinger equation, where the varying parameters are the positions of the nuclei in the molecule. We estimate the decay rate of the Kolmogorov n-width of the set of solutions for this parametric problem in several settings, including the standard L2-norm as well as with distances based on optimal transport. The fact that the latter decays much faster than in the traditional L2-norm setting motivates us to propose a practical nonlinear reduced basis method, which is based on an offline greedy algorithm, and an efficient stochastic energy minimization in the online phase. We finally provide numerical results illustrating the capabilities of the method and good approximation properties, both in the offline and the online phase.
Recently developed reduced-order modeling techniques aim to approximate nonlinear dynamical systems on low-dimensional manifolds learned from data. This is an effective approach for modeling dynamics in a post-transient regime where the effects of initial conditions and other disturbances have decayed. However, modeling transient dynamics near an underlying manifold, as needed for real-time control and forecasting applications, is complicated by the effects of fast dynamics and nonnormal sensitivity mechanisms. To begin to address these issues, we introduce a parametric class of nonlinear projections described by constrained autoencoder neural networks in which both the manifold and the projection fibers are learned from data. Our architecture uses invertible activation functions and biorthogonal weight matrices to ensure that the encoder is a left inverse of the decoder. We also introduce new dynamics-aware cost functions that promote learning of oblique projection fibers that account for fast dynamics and nonnormality. To demonstrate these methods and the specific challenges they address, we provide a detailed case study of a three-state model of vortex shedding in the wake of a bluff body immersed in a fluid, which has a two-dimensional slow manifold that can be computed analytically. In anticipation of future applications to high-dimensional systems, we also propose several techniques for constructing computationally efficient reduced-order models using our proposed nonlinear projection framework. This includes a novel sparsity-promoting penalty for the encoder that avoids detrimental weight matrix shrinkage via computation on the Grassmann manifold.
Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to increase the awareness of statisticians to this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters in weakly-identified and highly-parameterised models with relative ease. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanism, tensor normalization, inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that accelerates linear attention by more than twice in runtime and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism to smooth training and a new tensor normalization scheme to accelerate the model, resulting in an impressive acceleration of over 20%. Furthermore, we have developed a robust inference algorithm that ensures numerical stability and consistent inference speed, regardless of the sequence length, showcasing superior efficiency during both training and inference stages. Scalability is at the heart of our model's design, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, all while maintaining outstanding performance metrics. Rigorous validation of our model design is achieved through a series of comprehensive experiments on our self-collected corpus, boasting a size exceeding 6TB and containing over 2 trillion tokens. To ensure data quality and relevance, we implement a new self-cleaning strategy to filter our collected data. Our pre-trained models will be released to foster community advancements in efficient LLMs.
Turbulence parametrizations will remain a necessary building block in kilometer-scale Earth system models. In convective boundary layers, where the mean vertical gradients of conserved properties such as potential temperature and moisture are approximately zero, the standard ansatz which relates turbulent fluxes to mean vertical gradients via an eddy diffusivity has to be extended by mass flux parametrizations for the typically asymmetric up- and downdrafts in the atmospheric boundary layer. In this work, we present a parametrization for a dry convective boundary layer based on a generative adversarial network. The model incorporates the physics of self-similar layer growth following from the classical mixed layer theory by Deardorff. This enhances the training data base of the generative machine learning algorithm and thus significantly improves the predicted statistics of the synthetically generated turbulence fields at different heights inside the boundary layer. The algorithm training is based on fully three-dimensional direct numerical simulation data. Differently to stochastic parametrizations, our model is able to predict the highly non-Gaussian transient statistics of buoyancy fluctuations, vertical velocity, and buoyancy flux at different heights thus also capturing the fastest thermals penetrating into the stabilized top region. The results of our generative algorithm agree with standard two-equation or multi-plume stochastic mass-flux schemes. The present parametrization provides additionally the granule-type horizontal organization of the turbulent convection which cannot be obtained in any of the other model closures. Our work paves the way to efficient data-driven convective parametrizations in other natural flows, such as moist convection, upper ocean mixing, or convection in stellar interiors.
Splines over triangulations and splines over quadrangulations (tensor product splines) are two common ways to extend bivariate polynomials to splines. However, combination of both approaches leads to splines defined over mixed triangle and quadrilateral meshes using the isogeometric approach. Mixed meshes are especially useful for representing complicated geometries obtained e.g. from trimming. As (bi-)linearly parameterized mesh elements are not flexible enough to cover smooth domains, we focus in this work on the case of planar mixed meshes parameterized by (bi-)quadratic geometry mappings. In particular we study in detail the space of $C^1$-smooth isogeometric spline functions of general polynomial degree over two such mixed mesh elements. We present the theoretical framework to analyze the smoothness conditions over the common interface for all possible configurations of mesh elements. This comprises the investigation of the dimension as well as the construction of a basis of the corresponding $C^1$-smooth isogeometric spline space over the domain described by two elements. Several examples of interest are presented in detail.
We present a scheme for finding all roots of an analytic function in a square domain in the complex plane. The scheme can be viewed as a generalization of the classical approach to finding roots of a function on the real line, by first approximating it by a polynomial in the Chebyshev basis, followed by diagonalizing the so-called ''colleague matrices''. Our extension of the classical approach is based on several observations that enable the construction of polynomial bases in compact domains that satisfy three-term recurrences and are reasonably well-conditioned. This class of polynomial bases gives rise to ''generalized colleague matrices'', whose eigenvalues are roots of functions expressed in these bases. In this paper, we also introduce a special-purpose QR algorithm for finding the eigenvalues of generalized colleague matrices, which is a straightforward extension of the recently introduced componentwise stable QR algorithm for the classical cases (See [Serkh]). The performance of the schemes is illustrated with several numerical examples.
The history of ternary adders goes back to more than six decades ago. Since then, a multitude of ternary full adders (TFAs) have been presented in the literature. This paper conducts a review of TFAs so that one can be familiar with the utilized design methodologies and their prevalence. Moreover, despite numerous TFAs, almost none of them are in their simplest form. A large number of transistors could have been eliminated by considering a partial TFA instead of a complete one. According to our investigation, only 28.6% of the previous designs are partial TFAs. Also, they could have been simplified even further by assuming a partial TFA with an output carry voltage of 0V or VDD. This way, in a single-VDD design, voltage division inside the Carry generator part would have been eliminated and less power dissipated. As far as we have searched, there are only three partial TFAs with this favorable condition in the literature. Additionally, most of the simulation setups in the previous articles are not realistic enough. Therefore, the simulation results reported in these papers are neither comparable nor entirely valid. Therefore, we got motivated to conduct a survey, elaborate on this issue, and enhance some of the previous designs. Among 84 papers, 10 different TFAs (from 11 papers) are selected, simplified, and simulated in this paper. Simulation results by HSPICE and 32nm CNFET technology reveal that the simplified partial TFAs outperform their original versions in terms of delay, power, and transistor count.