
We explicitly construct zero-loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The training-data configurations considered are (i) sufficiently small, well-separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.


This work focuses on the analysis of fully connected feedforward ReLU neural networks as they approximate a given smooth function. In contrast to conventionally studied universal approximation properties under increasing architectures, e.g., in terms of network width or depth, we are concerned with the asymptotic growth of the parameters of approximating networks. Such results are of interest, e.g., for error analysis or consistency results for neural network training. The main result of our work is that, for a ReLU architecture with state-of-the-art approximation error, the realizing parameters grow at most polynomially. The obtained rate with respect to a normalized network size is compared to existing results and is shown to be superior in most cases, in particular for high-dimensional input.

Deep learning with physics-informed neural networks (PINNs) has emerged as a highly popular and effective approach for solving partial differential equations (PDEs). In this paper, we first investigate the extrapolation capability of the PINN method for time-dependent PDEs. Taking advantage of this extrapolation property, we can generalize the training result obtained on a time subinterval to a larger interval by adding a correction term to the network parameters of the subinterval. The correction term is determined by further training with the sample points in the added subinterval. Secondly, by designing an extrapolation control function with special characteristics and combining it with the correction term, we construct a new neural network architecture whose network parameters are coupled with the time variable, which we call the extrapolation-driven network architecture. Based on this architecture, using a single neural network, we can obtain the overall PINN solution on the whole domain with the following two characteristics: (1) it completely inherits the local solution of the interval obtained from the previous training; (2) at the interval node, it strictly maintains the continuity and smoothness that the true solution has. The extrapolation-driven network architecture allows us to divide a large time domain into multiple subintervals and solve the time-dependent PDEs one by one in chronological order. This training scheme respects the causality principle and effectively overcomes the difficulties of the conventional PINN method in solving the evolution equation on a large time domain. Numerical experiments verify the performance of our proposed method.

Popular artificial neural networks (ANNs) optimize parameters for unidirectional value propagation, assuming some arbitrary parametrization type such as Multi-Layer Perceptron (MLP) or Kolmogorov-Arnold Network (KAN). In contrast, for biological neurons, e.g., "it is not uncommon for axonal propagation of action potentials to happen in both directions"~\cite{axon}, suggesting they are optimized to continuously operate in a multidirectional way. Additionally, the statistical dependencies a single neuron could model are not just (expected) value dependence, but entire joint distributions, including higher moments. Such a more agnostic joint-distribution neuron would allow for multidirectional propagation (of distributions or values), e.g., $\rho(x|y,z)$ or $\rho(y,z|x)$, by substituting into $\rho(x,y,z)$ and normalizing. We discuss Hierarchical Correlation Reconstruction (HCR) as such a neuron model: assuming a $\rho(x,y,z)=\sum_{ijk} a_{ijk} f_i(x) f_j(y) f_k(z)$ type parametrization of the joint distribution in a polynomial basis $f_i$, which allows for flexible, inexpensive processing, including nonlinearities, direct model estimation and update, trained through standard backpropagation or novel ways suited to such a structure, up to tensor decomposition or an information-bottleneck approach. Using only pairwise (input-output) dependencies, its expected-value prediction becomes KAN-like, with trained activation functions as polynomials; it can be extended by adding higher-order dependencies through included products, in a conscious, interpretable way, allowing for multidirectional propagation of both values and probability densities.
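The pairwise (input-output) case admits a compact numerical sketch, which is our own toy illustration and not the authors' code: coefficients $a_{ij}$ are estimated as sample means of basis-function products in an orthonormal polynomial basis on $[0,1]$, and the conditional expectation follows by integrating the modeled conditional density. The data-generating process and all names below are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# First three orthonormal polynomials on [0, 1]: f_0 = 1, f_1, f_2.
def f(i, x):
    if i == 0:
        return np.ones_like(x)
    if i == 1:
        return np.sqrt(3.0) * (2 * x - 1)
    return np.sqrt(5.0) * (6 * x**2 - 6 * x + 1)

# Toy data with a monotone dependence, kept inside [0, 1] (our choice).
x = rng.random(5000)
y = np.clip(0.7 * x + 0.1 * rng.normal(size=x.size), 0.0, 1.0)

# Direct HCR-style estimation: a_ij is the sample mean of f_i(x) f_j(y).
a = np.array([[np.mean(f(i, x) * f(j, y)) for j in range(3)]
              for i in range(3)])

# E[y | x0] from the modeled conditional density rho(y | x0):
# numerator uses m_j = integral of y f_j(y) dy on [0, 1],
# denominator uses integral of f_j(y) dy = delta_{j0}.
m = np.array([0.5, np.sqrt(3.0) / 6.0, 0.0])

def predict(x0):
    num = sum(a[i, j] * f(i, x0) * m[j] for i in range(3) for j in range(3))
    den = sum(a[i, 0] * f(i, x0) for i in range(3))
    return num / den
```

Since the density model is a truncated polynomial expansion, it can locally go negative; practical HCR use involves calibration or clipping, which this sketch omits.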

Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme that is based on a random linear combination of the gradient. If $d$ denotes the number of parameters and $k$ the number of samples, we prove that the mean squared error of this method converges for $k\gtrsim d^2\log(d)$ with rate $d^2\log(d)/k$. Compared to the dimension dependence $d$ for stochastic gradient descent, an additional factor $d\log(d)$ occurs.
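The weight-perturbed forward gradient scheme can be sketched in a few lines: the per-sample gradient is replaced by its projection onto a random Gaussian direction $v$, which is unbiased since $\mathbb{E}[vv^\top]=I$. The toy problem, step size, and sample sizes below are our own illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 10, 2000                       # parameters, samples (illustrative)
theta_true = rng.normal(size=d)
X = rng.normal(size=(k, d))           # random design
y = X @ theta_true + 0.1 * rng.normal(size=k)

theta = np.zeros(d)
lr = 1e-3
for i in range(k):                    # one pass, one sample per step
    x_i, y_i = X[i], y[i]
    grad = 2.0 * (x_i @ theta - y_i) * x_i   # true per-sample gradient
    v = rng.normal(size=d)                   # random perturbation direction
    fwd_grad = (grad @ v) * v                # forward-gradient estimate
    theta -= lr * fwd_grad

mse = np.mean((theta - theta_true) ** 2)
```

Note that `fwd_grad` only requires the directional derivative `grad @ v`, which is what makes the scheme gradient-free in the backpropagation sense: it can be obtained from a single forward-mode evaluation.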

We study the numerical approximation of advection-diffusion equations with highly oscillatory coefficients and possibly dominant advection terms by means of the Multiscale Finite Element Method. The latter is a now-classical, finite-element-type method that performs a Galerkin approximation on a problem-dependent basis set, itself pre-computed in an offline stage. The approach is implemented here using basis functions that locally resolve both the diffusion and the advection terms. Variants with additional bubble functions and possibly weak inter-element continuity are proposed. Some theoretical arguments and a comprehensive set of numerical experiments allow us to investigate and compare the stability and the accuracy of the approaches. The best approach constructed is shown to be adequate for both the diffusion- and advection-dominated regimes, and does not rely on an auxiliary stabilization parameter that would have to be properly adjusted.

We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field approximation to the nuisance coordinates and carefully modelling the conditional distribution of the target given the nuisance. This requires only a preprocessing step and preserves the computational advantages of mean-field variational Bayes, while ensuring accurate and reliable inference for the target parameter, including for uncertainty quantification. We investigate the numerical performance of our algorithm, showing that it performs competitively with existing methods. We further establish accompanying theoretical guarantees for estimation and uncertainty quantification in the form of a Bernstein--von Mises theorem.

Making inference with spatial extremal dependence models can be computationally burdensome since they involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that use data augmentation techniques to encode censoring information in the neural network input. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference methods for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to make inference with popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, precluding the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme particulate matter 2.5 microns or less in diameter (${\rm PM}_{2.5}$) concentration over the whole of Saudi Arabia.

We use the law of total variance to generate multiple expressions for the posterior predictive variance in Bayesian hierarchical models. These expressions are sums of terms involving conditional expectations and conditional variances. Since the posterior predictive variance is fixed given the hierarchical model, it represents a constant quantity that is conserved over the various expressions for it. The terms in the expressions can be assessed in absolute or relative terms to understand the main contributors to the length of prediction intervals. Also, sometimes these terms can be interpreted in the context of the hierarchical model. We show several examples, closed form and computational, to illustrate the uses of this approach in model assessment.
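The identity underlying this approach, $\mathrm{Var}(Y)=\mathbb{E}[\mathrm{Var}(Y\mid\theta)]+\mathrm{Var}(\mathbb{E}[Y\mid\theta])$, can be checked by Monte Carlo in a minimal two-level model. The model and constants below are our own toy example, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma = 0.5                            # within-group sd (illustrative choice)

theta = rng.normal(0.0, 1.0, size=n)   # top level: theta ~ N(0, 1)
y = rng.normal(theta, sigma)           # bottom level: y | theta ~ N(theta, sigma^2)

total_var = y.var()                    # Var(Y)
expected_cond_var = sigma**2           # E[Var(Y | theta)], constant here
var_cond_mean = theta.var()            # Var(E[Y | theta]) = Var(theta)

# Law of total variance: the two terms sum to Var(Y) = 1 + sigma^2 = 1.25.
```

Comparing `expected_cond_var` against `var_cond_mean` shows which level of the hierarchy dominates predictive uncertainty, which is the kind of term-by-term assessment the approach describes.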

We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and (weakly) scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks having inputs and outputs of the same dimension, and these conditions are weaker than those recently obtained using arguments in convex analysis. Furthermore, we prove that the shape of the fixed point set of nonnegative neural networks with nonnegative weights and biases is an interval, which under mild conditions degenerates to a point. These results are then used to obtain the existence of fixed points of more general nonnegative neural networks. From a practical perspective, our results contribute to the understanding of the behavior of autoencoders, and we also offer valuable mathematical machinery for future developments in deep equilibrium models.
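Plain fixed-point iteration on a toy nonnegative network illustrates the setting. This is our own construction: the weights are deliberately scaled to make the map contractive, a stronger assumption than the monotonicity/scalability conditions discussed above, chosen only so the iteration provably converges in a short script.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Nonnegative weights and biases: the map sends nonnegative vectors to
# nonnegative vectors. The 0.3 scaling (our choice) makes it a contraction.
W1 = 0.3 * rng.random((n, n))
b1 = rng.random(n)
W2 = 0.3 * rng.random((n, n))
b2 = rng.random(n)

def net(x):
    # One-hidden-layer ReLU network with same input/output dimension.
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = np.zeros(n)
for _ in range(200):                  # fixed-point iteration x <- net(x)
    x = net(x)

residual = np.linalg.norm(net(x) - x)  # ~0 at a fixed point
```

In the paper's setting, existence follows from nonlinear Perron-Frobenius arguments rather than contractivity, and the fixed-point set can be an interval rather than a single point; the sketch only shows the iteration mechanics relevant to, e.g., deep equilibrium models.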

We study the dynamical properties of Rabi oscillations driven by an alternating Rashba field applied to a two-dimensional (2D) harmonic confinement system. We solve the time-dependent (TD) Schr\"{o}dinger equation numerically and map the resulting TD wavefunction onto the Bloch sphere (BS) using the two BS parameters of the zenith ($\theta_B$) and azimuthal ($\phi_B$) angles, extracting the phase information $\phi_B$ as well as the mixing ratio $\theta_B$ between the two BS-pole states. We employ a two-state rotating wave (TSRW) approach and study the fundamental features of $\theta_B$ and $\phi_B$ over time. The TSRW approach reveals a triangular wave formation in $\theta_B$. Moreover, at each apex of the triangular wave, the TD wavefunction passes through the BS pole, and the state is completely replaced by the opposite spin state. The TSRW approach also elucidates a linear change in $\phi_B$: the slope of $\phi_B$ versus time is equal to the difference between the dynamical terms arising from the confinement potential in the harmonic system. The TSRW approach further demonstrates a jump of $\pi$ in the phase difference when the wavefunction passes through the BS pole. The alternating Rashba field causes multiple successive Rabi transitions in the 2D harmonic system. We then introduce the effective BS (EBS) and transform these complicated transitions into an equivalent "single" Rabi transition. Consequently, the EBS parameters $\theta_B^{\mathrm{eff}}$ and $\phi_B^{\mathrm{eff}}$ exhibit the mixing and phase difference between the two spin states $\alpha$ and $\beta$, leading to a deep understanding of the TD features of multi-Rabi oscillations. Furthermore, the combination of the BS representation with the TSRW approach successfully reveals the dynamical properties of the Rabi oscillation, even beyond the TSRW approximation.
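For reference, the BS angles of a normalized two-component state $(\alpha,\beta)$ in the pole basis follow the standard parametrization $|\psi\rangle=\cos(\theta_B/2)|{\uparrow}\rangle+e^{i\phi_B}\sin(\theta_B/2)|{\downarrow}\rangle$; a minimal sketch (function name is ours, not from the paper):

```python
import numpy as np

def bloch_angles(alpha, beta):
    """Zenith and azimuthal BS angles of the state (alpha, beta)."""
    theta_b = 2.0 * np.arctan2(abs(beta), abs(alpha))  # mixing ratio
    phi_b = np.angle(beta) - np.angle(alpha)           # phase difference
    return theta_b, phi_b

# Pole state (1, 0) gives theta_B = 0; an equal superposition with a
# relative phase of pi/2 gives theta_B = pi/2, phi_B = pi/2.
```

Tracking these two angles over the numerically propagated wavefunction is what produces the triangular wave in $\theta_B$ and the linear drift in $\phi_B$ described above.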
