亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time ``edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H $, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a ``sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.

相關內容

神(shen)(shen)經(jing)網(wang)絡(luo)(luo)(Neural Networks)是(shi)世界(jie)上三個最(zui)古老的(de)(de)神(shen)(shen)經(jing)建(jian)模(mo)學(xue)(xue)(xue)會(hui)(hui)的(de)(de)檔案期刊:國(guo)際(ji)神(shen)(shen)經(jing)網(wang)絡(luo)(luo)學(xue)(xue)(xue)會(hui)(hui)(INNS)、歐(ou)洲神(shen)(shen)經(jing)網(wang)絡(luo)(luo)學(xue)(xue)(xue)會(hui)(hui)(ENNS)和(he)(he)(he)日本神(shen)(shen)經(jing)網(wang)絡(luo)(luo)學(xue)(xue)(xue)會(hui)(hui)(JNNS)。神(shen)(shen)經(jing)網(wang)絡(luo)(luo)提(ti)(ti)供了一(yi)個論(lun)壇,以(yi)發(fa)展和(he)(he)(he)培育一(yi)個國(guo)際(ji)社會(hui)(hui)的(de)(de)學(xue)(xue)(xue)者和(he)(he)(he)實踐者感興趣的(de)(de)所有(you)方面(mian)(mian)的(de)(de)神(shen)(shen)經(jing)網(wang)絡(luo)(luo)和(he)(he)(he)相關方法(fa)的(de)(de)計(ji)算(suan)(suan)智能。神(shen)(shen)經(jing)網(wang)絡(luo)(luo)歡(huan)迎高質量(liang)論(lun)文(wen)的(de)(de)提(ti)(ti)交,有(you)助于(yu)全面(mian)(mian)的(de)(de)神(shen)(shen)經(jing)網(wang)絡(luo)(luo)研究(jiu),從行為和(he)(he)(he)大腦建(jian)模(mo),學(xue)(xue)(xue)習算(suan)(suan)法(fa),通(tong)過數(shu)學(xue)(xue)(xue)和(he)(he)(he)計(ji)算(suan)(suan)分析,系統的(de)(de)工(gong)程(cheng)(cheng)和(he)(he)(he)技術應用,大量(liang)使用神(shen)(shen)經(jing)網(wang)絡(luo)(luo)的(de)(de)概念和(he)(he)(he)技術。這一(yi)獨特而廣泛(fan)的(de)(de)范圍(wei)促(cu)進(jin)了生物和(he)(he)(he)技術研究(jiu)之間的(de)(de)思想交流,并有(you)助于(yu)促(cu)進(jin)對生物啟發(fa)的(de)(de)計(ji)算(suan)(suan)智能感興趣的(de)(de)跨學(xue)(xue)(xue)科(ke)社區的(de)(de)發(fa)展。因此,神(shen)(shen)經(jing)網(wang)絡(luo)(luo)編委會(hui)(hui)代表的(de)(de)專家領域包括心理學(xue)(xue)(xue),神(shen)(shen)經(jing)生物學(xue)(xue)(xue),計(ji)算(suan)(suan)機(ji)科(ke)學(xue)(xue)(xue),工(gong)程(cheng)(cheng),數(shu)學(xue)(xue)(xue),物理。該(gai)雜志發(fa)表文(wen)章、信(xin)(xin)件和(he)(he)(he)評(ping)論(lun)以(yi)及(ji)給編輯的(de)(de)信(xin)(xin)件、社論(lun)、時事、軟件調查和(he)(he)(he)專利信(xin)(xin)息。文(wen)章發(fa)表在五個部(bu)分之一(yi):認知科(ke)學(xue)(xue)(xue),神(shen)(shen)經(jing)科(ke)學(xue)(xue)(xue),學(xue)(xue)(xue)習系統,數(shu)學(xue)(xue)(xue)和(he)(he)(he)計(ji)算(suan)(suan)分析、工(gong)程(cheng)(cheng)和(he)(he)(he)應用。 官網(wang)地(di)址:

Human cognition operates on a "Global-first" cognitive mechanism, prioritizing information processing based on coarse-grained details. This mechanism inherently possesses an adaptive multi-granularity description capacity, resulting in computational traits such as efficiency, robustness, and interpretability. The analysis pattern reliance on the finest granularity and single-granularity makes most existing computational methods less efficient, robust, and interpretable, which is an important reason for the current lack of interpretability in neural networks. Multi-granularity granular-ball computing employs granular-balls of varying sizes to daptively represent and envelop the sample space, facilitating learning based on these granular-balls. Given that the number of coarse-grained "granular-balls" is fewer than sample points, granular-ball computing proves more efficient. Moreover, the inherent coarse-grained nature of granular-balls reduces susceptibility to fine-grained sample disturbances, enhancing robustness. The multi-granularity construct of granular-balls generates topological structures and coarse-grained descriptions, naturally augmenting interpretability. Granular-ball computing has successfully ventured into diverse AI domains, fostering the development of innovative theoretical methods, including granular-ball classifiers, clustering techniques, neural networks, rough sets, and evolutionary computing. This has notably ameliorated the efficiency, noise robustness, and interpretability of traditional methods. Overall, granular-ball computing is a rare and innovative theoretical approach in AI that can adaptively and simultaneously enhance efficiency, robustness, and interpretability. This article delves into the main application landscapes for granular-ball computing, aiming to equip future researchers with references and insights to refine and expand this promising theory.

This study examines the varying coefficient model in tail index regression. The varying coefficient model is an efficient semiparametric model that avoids the curse of dimensionality when including large covariates in the model. In fact, the varying coefficient model is useful in mean, quantile, and other regressions. The tail index regression is not an exception. However, the varying coefficient model is flexible, but leaner and simpler models are preferred for applications. Therefore, it is important to evaluate whether the estimated coefficient function varies significantly with covariates. If the effect of the non-linearity of the model is weak, the varying coefficient structure is reduced to a simpler model, such as a constant or zero. Accordingly, the hypothesis test for model assessment in the varying coefficient model has been discussed in mean and quantile regression. However, there are no results in tail index regression. In this study, we investigate the asymptotic properties of an estimator and provide a hypothesis testing method for varying coefficient models for tail index regression.

We adopt the integral definition of the fractional Laplace operator and study an optimal control problem on Lipschitz domains that involves a fractional elliptic partial differential equation (PDE) as state equation and a control variable that enters the state equation as a coefficient; pointwise constraints on the control variable are considered as well. We establish the existence of optimal solutions and analyze first and, necessary and sufficient, second order optimality conditions. Regularity estimates for optimal variables are also analyzed. We develop two finite element discretization strategies: a semidiscrete scheme in which the control variable is not discretized, and a fully discrete scheme in which the control variable is discretized with piecewise constant functions. For both schemes, we analyze the convergence properties of discretizations and derive error estimates.

We present an approach for analyzing message passing graph neural networks (MPNNs) based on an extension of graphon analysis to a so called graphon-signal analysis. A MPNN is a function that takes a graph and a signal on the graph (a graph-signal) and returns some value. Since the input space of MPNNs is non-Euclidean, i.e., graphs can be of any size and topology, properties such as generalization are less well understood for MPNNs than for Euclidean neural networks. We claim that one important missing ingredient in past work is a meaningful notion of graph-signal similarity measure, that endows the space of inputs to MPNNs with a regular structure. We present such a similarity measure, called the graphon-signal cut distance, which makes the space of all graph-signals a dense subset of a compact metric space -- the graphon-signal space. Informally, two deterministic graph-signals are close in cut distance if they ``look like'' they were sampled from the same random graph-signal model. Hence, our cut distance is a natural notion of graph-signal similarity, which allows comparing any pair of graph-signals of any size and topology. We prove that MPNNs are Lipschitz continuous functions over the graphon-signal metric space. We then give two applications of this result: 1) a generalization bound for MPNNs, and, 2) the stability of MPNNs to subsampling of graph-signals. Our results apply to any regular enough MPNN on any distribution of graph-signals, making the analysis rather universal.

Deep learning methods are emerging as popular computational tools for solving forward and inverse problems in traffic flow. In this paper, we study a neural operator framework for learning solutions to nonlinear hyperbolic partial differential equations with applications in macroscopic traffic flow models. In this framework, an operator is trained to map heterogeneous and sparse traffic input data to the complete macroscopic traffic state in a supervised learning setting. We chose a physics-informed Fourier neural operator ($\pi$-FNO) as the operator, where an additional physics loss based on a discrete conservation law regularizes the problem during training to improve the shock predictions. We also propose to use training data generated from random piecewise constant input data to systematically capture the shock and rarefied solutions. From experiments using the LWR traffic flow model, we found superior accuracy in predicting the density dynamics of a ring-road network and urban signalized road. We also found that the operator can be trained using simple traffic density dynamics, e.g., consisting of $2-3$ vehicle queues and $1-2$ traffic signal cycles, and it can predict density dynamics for heterogeneous vehicle queue distributions and multiple traffic signal cycles $(\geq 2)$ with an acceptable error. The extrapolation error grew sub-linearly with input complexity for a proper choice of the model architecture and training data. Adding a physics regularizer aided in learning long-term traffic density dynamics, especially for problems with periodic boundary data.

As a surrogate for computationally intensive meso-scale simulation of woven composites, this article presents Recurrent Neural Network (RNN) models. Leveraging the power of transfer learning, the initialization challenges and sparse data issues inherent in cyclic shear strain loads are addressed in the RNN models. A mean-field model generates a comprehensive data set representing elasto-plastic behavior. In simulations, arbitrary six-dimensional strain histories are used to predict stresses under random walking as the source task and cyclic loading conditions as the target task. Incorporating sub-scale properties enhances RNN versatility. In order to achieve accurate predictions, the model uses a grid search method to tune network architecture and hyper-parameter configurations. The results of this study demonstrate that transfer learning can be used to effectively adapt the RNN to varying strain conditions, which establishes its potential as a useful tool for modeling path-dependent responses in woven composites.

We define and study a fully-convolutional neural network stochastic model, NN-Turb, which generates a 1-dimensional field with some turbulent velocity statistics. In particular, the generated process satisfies the Kolmogorov 2/3 law for second order structure function. It also presents negative skewness across scales (i.e. Kolmogorov 4/5 law) and exhibits intermittency as characterized by skewness and flatness. Furthermore, our model is never in contact with turbulent data and only needs the desired statistical behavior of the structure functions across scales for training.

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.

北京阿比特科技有限公司