亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the approximation capacity of some variation spaces corresponding to shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth functions are contained in these spaces with finite variation norms. For functions with less smoothness, the approximation rates in terms of the variation norm are established. Using these results, we are able to prove the optimal approximation rates in terms of the number of neurons for shallow ReLU$^k$ neural networks. It is also shown how these results can be used to derive approximation bounds for deep neural networks and convolutional neural networks (CNNs). As applications, we study convergence rates for nonparametric regression using three ReLU neural network models: shallow neural network, over-parameterized neural network, and CNN. In particular, we show that shallow neural networks can achieve the minimax optimal rates for learning H\"older functions, which complements recent results for deep neural networks. It is also proven that over-parameterized (deep or shallow) neural networks can achieve nearly optimal rates for nonparametric regression.

相關內容

神經網絡(Neural Networks)是世界上三個最古老的神經建模學會的檔案期刊:國際神經網絡學會(INNS)、歐洲神經網絡學會(ENNS)和日本神經網絡學會(JNNS)。神經網絡提供了一個論壇,以發展和培育一個國際社會的學者和實踐者感興趣的所有方面的神經網絡和相關方法的計算智能。神經網絡歡迎高質量論文的提交,有助于全面的神經網絡研究,從行為和大腦建模,學習算法,通過數學和計算分析,系統的工程和技術應用,大量使用神經網絡的概念和技術。這一獨特而廣泛的范圍促進了生物和技術研究之間的思想交流,并有助于促進對生物啟發的計算智能感興趣的跨學科社區的發展。因此,神經網絡編委會代表的專家領域包括心理學,神經生物學,計算機科學,工程,數學,物理。該雜志發表文章、信件和評論以及給編輯的信件、社論、時事、軟件調查和專利信息。文章發表在五個部分之一:認知科學,神經科學,學習系統,數學和計算分析、工程和應用。 官網地址:

Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and assess the putative influence of covariates on it. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used to answer research questions related to non-linear changes in cognitive aging: the linear mixed model (LMM) with a quadratic term, LMM with splines, the functional mixed model, the piecewise linear mixed model, and the sigmoidal mixed model. We first theoretically describe the models. Next, using data from two prospective cohorts with annual cognitive testing, we compared the interpretation of the models by investigating associations of education on cognitive change before death. Lastly, we performed a simulation study to empirically evaluate the models and provide practical recommendations. Except for the LMM-quadratic, the fit of all models was generally adequate to capture non-linearity of cognitive change and models were relatively robust. Although spline-based models have no interpretable nonlinearity parameters, their convergence was easier to achieve, and they allow graphical interpretation. In contrast, piecewise and sigmoidal models, with interpretable non-linear parameters, may require more data to achieve convergence.

Program synthesis with Genetic Programming searches for a correct program that satisfies the input specification, which is usually provided as input-output examples. One particular challenge is how to effectively handle loops and recursion avoiding programs that never terminate. A helpful abstraction that can alleviate this problem is the employment of Recursion Schemes that generalize the combination of data production and consumption. Recursion Schemes are very powerful as they allow the construction of programs that can summarize data, create sequences, and perform advanced calculations. The main advantage of writing a program using Recursion Schemes is that the programs are composed of well defined templates with only a few parts that need to be synthesized. In this paper we make an initial study of the benefits of using program synthesis with fold and unfold templates, and outline some preliminary experimental results. To highlight the advantages and disadvantages of this approach, we manually solved the entire GPSB benchmark using recursion schemes, highlighting the parts that should be evolved compared to alternative implementations. We noticed that, once the choice of which recursion scheme is made, the synthesis process can be simplified as each of the missing parts of the template are reduced to simpler functions, which are further constrained by their own input and output types.

We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (with $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $\sigma : {\mathbb R} \to {\mathbb R}^+$ denotes an activation function and for a vector $z \in {\mathbb R}^n$, $\sigma(z) \in {\mathbb R}^n$ has to be interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $D A$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $\chi\subseteq \mathbb [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \chi$. We propose a two-level nested methodology: an inner level where, for a given $\chi$, we compute an optimizer $D^\star(\chi)$ by a gradient system approach, and an outer level where we tune $\chi$ so that the value $c$ is reached by $\mu(D^\star(\chi)A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we propose several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.

We study strong approximation of scalar additive noise driven stochastic differential equations (SDEs) at time point $1$ in the case that the drift coefficient is bounded and has Sobolev regularity $s\in(0,1)$. Recently, it has been shown in [arXiv:2101.12185v2 (2022)] that for such SDEs the equidistant Euler approximation achieves an $L^2$-error rate of at least $(1+s)/2$, up to an arbitrary small $\varepsilon$, in terms of the number of evaluations of the driving Brownian motion $W$. In the present article we prove a matching lower error bound for $s\in(1/2,1)$. More precisely we show that, for every $s\in(1/2,1)$, the $L^2$-error rate $(1+s)/2$ can, up to a logarithmic term, not be improved in general by no numerical method based on finitely many evaluations of $W$ at fixed time points. Up to now, this result was known in the literature only for the cases $s=1/2-$ and $s=1-$. For the proof we employ the coupling of noise technique recently introduced in [arXiv:2010.00915 (2020)] to bound the $L^2$-error of an arbitrary approximation from below by the $L^2$-distance of two occupation time functionals provided by a specifically chosen drift coefficient with Sobolev regularity $s$ and two solutions of the corresponding SDE with coupled driving Brownian motions. For the analysis of the latter distance we employ a transformation of the original SDE to overcome the problem of correlated increments of the difference of the two coupled solutions, occupation time estimates to cope with the lack of regularity of the chosen drift coefficient around the point $0$ and scaling properties of the drift coefficient.

The scale function holds significant importance within the fluctuation theory of Levy processes, particularly in addressing exit problems. However, its definition is established through the Laplace transform, thereby lacking explicit representations in general. This paper introduces a novel series representation for this scale function, employing Laguerre polynomials to construct a uniformly convergent approximate sequence. Additionally, we derive statistical inference based on specific discrete observations, presenting estimators of scale functions that are asymptotically normal.

We extend generalized functional linear models under independence to a situation in which a functional covariate is related to a scalar response variable that exhibits spatial dependence. For estimation, we apply basis expansion and truncation for dimension reduction of the covariate process followed by a composite likelihood estimating equation to handle the spatial dependency. We develop asymptotic results for the proposed model under a repeating lattice asymptotic context, allowing us to construct a confidence interval for the spatial dependence parameter and a confidence band for the parameter function. A binary conditionals model is presented as a concrete illustration and is used in simulation studies to verify the applicability of the asymptotic inferential results.

We introduce a fine-grained framework for uncertainty quantification of predictive models under distributional shifts. This framework distinguishes the shift in covariate distributions from that in the conditional relationship between the outcome (Y) and the covariates (X). We propose to reweight the training samples to adjust for an identifiable covariate shift while protecting against worst-case conditional distribution shift bounded in an $f$-divergence ball. Based on ideas from conformal inference and distributionally robust learning, we present an algorithm that outputs (approximately) valid and efficient prediction intervals in the presence of distributional shifts. As a use case, we apply the framework to sensitivity analysis of individual treatment effects with hidden confounding. The proposed methods are evaluated in simulation studies and three real data applications, demonstrating superior robustness and efficiency compared with existing benchmarks.

We develop a sparse hierarchical $hp$-finite element method ($hp$-FEM) for the Helmholtz equation with rotationally invariant variable coefficients posed on a two-dimensional disk or annulus. The mesh is an inner disk cell (omitted if on an annulus domain) and concentric annuli cells. The discretization preserves the Fourier mode decoupling of rotationally invariant operators, such as the Laplacian, which manifests as block diagonal mass and stiffness matrices. Moreover, the matrices have a sparsity pattern independent of the order of the discretization and admit an optimal complexity factorization. The sparse $hp$-FEM can handle radial discontinuities in the right-hand side and in rotationally invariant Helmholtz coefficients. We consider examples such as a high-frequency Helmholtz equation with radial discontinuities, the time-dependent Schr\"odinger equation, and an extension to a three-dimensional cylinder domain, with a quasi-optimal solve, via the Alternating Direction Implicit (ADI) algorithm.

In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity could exhibit a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights -- in particular their effective rank -- influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.

北京阿比特科技有限公司