
Physics-informed neural networks (PINNs) have been proposed to learn the solution of partial differential equations (PDEs). In PINNs, the residual form of the PDE of interest and its boundary conditions are lumped into a composite objective function, which is minimized as an unconstrained optimization problem to train a deep feed-forward neural network. Here, we show that this specific way of formulating the objective function is the source of severe limitations in the PINN approach when applied to different kinds of PDEs. To address these limitations, we propose a versatile framework that can tackle both inverse and forward problems. The framework is adept at multi-fidelity data fusion and can seamlessly constrain the governing physics equations with proper initial and boundary conditions. The backbone of the proposed framework is a nonlinear, equality-constrained optimization problem formulation aimed at minimizing a loss functional, where an augmented Lagrangian method (ALM) is used to formally convert the constrained optimization problem into an unconstrained one. We implement the ALM within a stochastic, gradient-descent-type training algorithm in a way that scrupulously focuses on meeting the constraints without sacrificing other loss terms. Additionally, as a modification of the original residual layers, we propose lean residual layers in our neural network architecture to address the so-called vanishing-gradient problem. We demonstrate the efficacy and versatility of our physics- and equality-constrained deep-learning framework by applying it to learn the solutions of various multi-dimensional PDEs, including a nonlinear inverse problem from the hydrology field with multi-fidelity data fusion. The results produced with our proposed model match exact solutions very closely for all the cases considered.
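The core mechanism of the augmented Lagrangian method can be illustrated on a toy constrained problem. This is an illustrative sketch under toy assumptions (scalar variable, hand-picked penalty and step size), not the paper's PINN implementation: the constrained problem min f(x) s.t. h(x) = 0 is converted into repeated unconstrained minimizations of L(x; λ, μ) = f(x) + λ h(x) + (μ/2) h(x)², with a first-order multiplier update after each inner solve.

```python
# Augmented Lagrangian method (ALM) on a toy problem:
#   minimize f(x) = (x - 2)^2   subject to   h(x) = x - 1 = 0.
# Inner loop: gradient descent on L(x; lam, mu) = f + lam*h + (mu/2)*h^2.
# Outer loop: first-order multiplier update lam += mu * h(x).

def alm_solve(x=0.0, lam=0.0, mu=10.0, lr=0.05, outer=10, inner=100):
    for _ in range(outer):
        for _ in range(inner):
            h = x - 1.0
            grad = 2.0 * (x - 2.0) + lam + mu * h  # d/dx of L
            x -= lr * grad
        lam += mu * (x - 1.0)  # drive the multiplier toward its optimal value
    return x, lam

x_opt, lam_opt = alm_solve()
print(x_opt)  # close to 1.0: the constraint is satisfied in the limit
```

Unlike a fixed-penalty formulation, the multiplier update lets the constraint be met exactly without sending μ to infinity, which is the property the paper exploits to enforce boundary conditions without sacrificing other loss terms.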

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website address:

We propose a numerical method based on physics-informed Random Projection Neural Networks for the solution of Initial Value Problems (IVPs) of Ordinary Differential Equations (ODEs) with a focus on stiff problems. We employ an Extreme Learning Machine with a single hidden layer of radial basis functions whose widths are uniformly distributed random variables, while the weights between the input and the hidden layer are all set equal to one. The numerical solution of the IVPs is obtained by constructing a system of nonlinear algebraic equations, which is solved with respect to the output weights by the Gauss-Newton method, using a simple adaptive scheme for adjusting the time interval of integration. To assess its performance, we apply the proposed method to the solution of four benchmark stiff IVPs, namely the Prothero-Robinson, van der Pol, ROBER and HIRES problems. Our method is compared with an adaptive Runge-Kutta method based on the Dormand-Prince pair, and a variable-step variable-order multistep solver based on numerical differentiation formulas, as implemented in the \texttt{ode45} and \texttt{ode15s} MATLAB functions, respectively. We show that the proposed scheme yields good approximation accuracy, thus outperforming \texttt{ode45} and \texttt{ode15s}, especially in the cases where steep gradients arise. Furthermore, the computational times of our approach are comparable with those of the two MATLAB solvers for practical purposes.
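The extreme-learning-machine idea underlying this method can be sketched in miniature: with a random hidden layer of radial basis functions, only the output weights are unknown, so fitting data reduces to a single linear solve. This is an illustrative sketch (plain interpolation of a known function, not the paper's Gauss-Newton ODE solver), with hypothetical centers and width ranges:

```python
import math, random

# ELM-style random-projection network: Gaussian RBF hidden layer with random
# widths; fitting the targets at the RBF centers is one linear solve for the
# output weights (no backpropagation).

random.seed(0)
centers = [i / 7.0 for i in range(8)]
widths = [0.1 + 0.1 * random.random() for _ in centers]  # random widths

def phi(t, c, s):
    return math.exp(-((t - c) / s) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

target = lambda t: t * t
A = [[phi(t, c, s) for c, s in zip(centers, widths)] for t in centers]
w = solve(A, [target(t) for t in centers])  # output weights in closed form

def predict(t):
    return sum(wi * phi(t, c, s) for wi, c, s in zip(w, centers, widths))
```

For an ODE, the same output weights would instead enter a nonlinear collocation system, which the paper solves with Gauss-Newton.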

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on the step sizes gradient descent converges to a critical point of the loss function, i.e., the square loss in this article. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
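The simplest instance of the setting studied here is a two-layer scalar "deep linear network", where gradient descent on the square loss of the factorization reaches the global minimum from generic initializations. A minimal sketch (toy scalar case, hand-picked step size):

```python
# Gradient descent on a two-layer linear network, i.e. a scalar matrix
# factorization: minimize L(w1, w2) = (w2 * w1 - a)^2.
# For almost all starts, GD reaches the global minimum w1 * w2 = a.

def train(a=3.0, w1=1.0, w2=1.0, lr=0.05, steps=500):
    for _ in range(steps):
        r = w2 * w1 - a                   # residual of the factorization
        g1, g2 = 2 * r * w2, 2 * r * w1   # chain rule through both layers
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2

w1, w2 = train()
print(w1 * w2)  # close to 3.0, the global minimum of the square loss
```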

With the recent demand for deploying neural network models on mobile and edge devices, it is desirable to improve the model's generalizability on unseen testing data, as well as enhance the model's robustness under fixed-point quantization for efficient deployment. Minimizing the training loss, however, provides few guarantees on the generalization and quantization performance. In this work, we fulfill the need to improve generalization and quantization performance simultaneously by theoretically unifying them under the framework of improving the model's robustness against bounded weight perturbation and minimizing the eigenvalues of the Hessian matrix with respect to model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method, to minimize the Hessian eigenvalues through a gradient-based training process, simultaneously improving the generalization and quantization performance. HERO enables up to a 3.8% gain on test accuracy, up to 30% higher accuracy under 80% training label perturbation, and the best post-training quantization accuracy across a wide range of precision, including a >10% accuracy improvement over SGD-trained models for common model architectures on various datasets.
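The key computational primitive behind penalizing Hessian eigenvalues during training is estimating the top eigenvalue without forming the Hessian, via power iteration on Hessian-vector products. An illustrative sketch on a toy quadratic (not HERO's actual training loop; the objective and finite-difference HVP are stand-ins):

```python
# Top Hessian eigenvalue via power iteration on Hessian-vector products
# (HVPs). Toy objective f(x) = 2*x0^2 + 0.5*x1^2, whose Hessian is
# diag(4, 1), so the top eigenvalue is 4.

def grad(x):
    return [4.0 * x[0], 1.0 * x[1]]

def hvp(x, v, eps=1e-4):
    """Central finite difference of the gradient: H v ~ (g(x+ev) - g(x-ev)) / 2e."""
    gp = grad([xi + eps * vi for xi, vi in zip(x, v)])
    gm = grad([xi - eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def top_eigenvalue(x, iters=50):
    v = [1.0, 1.0]
    for _ in range(iters):
        hv = hvp(x, v)
        norm = sum(c * c for c in hv) ** 0.5
        v = [c / norm for c in hv]          # power iteration step
    hv = hvp(x, v)
    return sum(vi * hi for vi, hi in zip(v, hv))  # Rayleigh quotient

lam = top_eigenvalue([0.3, -0.7])
print(lam)  # close to 4.0
```

In a deep-learning framework the HVP would come from automatic differentiation rather than finite differences, and the estimated eigenvalue (or a surrogate) would be added to the training loss.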

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.
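The two forces in the LGD update can be shown on a toy single-machine simulation. This is an illustrative sketch with a shared scalar objective and hand-picked coefficients, not the distributed implementation: each worker takes a regular gradient step plus a corrective pull toward the currently best-performing worker.

```python
# Leader gradient descent (LGD), toy simulation: every worker moves by a
# gradient step on f(x) = (x - 5)^2 plus a pull toward the current leader
# (the worker with the lowest loss).

def f(x):
    return (x - 5.0) ** 2

def lgd(positions, lr=0.05, pull=0.1, steps=300):
    xs = list(positions)
    for _ in range(steps):
        leader = min(xs, key=f)              # best-performing worker
        xs = [x - lr * 2 * (x - 5.0)         # regular gradient step
                + pull * (leader - x)        # corrective pull to the leader
              for x in xs]
    return xs

workers = lgd([0.0, 10.0, -3.0])
print(workers)  # all workers near the minimizer x = 5
```

Note that only the leader's parameters are needed by every worker, which is the source of the communication savings relative to parameter averaging.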

We introduce a compositional physics-aware neural network (FINN) for learning spatiotemporal advection-diffusion processes. FINN implements a new way of combining the learning abilities of artificial neural networks with physical and structural knowledge from numerical simulation by modeling the constituents of partial differential equations (PDEs) in a compositional manner. Results on both one- and two-dimensional PDEs (Burger's, diffusion-sorption, diffusion-reaction, Allen-Cahn) demonstrate FINN's superior modeling accuracy and excellent out-of-distribution generalization ability beyond initial and boundary conditions. With only one tenth of the number of parameters on average, FINN outperforms pure machine learning and other state-of-the-art physics-aware models in all cases -- often even by multiple orders of magnitude. Moreover, FINN outperforms a calibrated physical model when approximating sparse real-world data in a diffusion-sorption scenario, confirming its generalization abilities and showing explanatory potential by revealing the unknown retardation factor of the observed process.

Neural network-based methods for solving differential equations have been gaining traction. They work by iteratively reducing the differential-equation residuals of a neural network on a sample of points. However, most of them employ standard sampling schemes like uniform sampling or perturbing equally spaced points. We present a novel sampling scheme which samples points adversarially to maximize the loss of the current solution estimate. A sampler architecture is described along with the loss terms used for training. Finally, we demonstrate that this scheme outperforms pre-existing schemes by comparing them on a number of problems.
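The intuition behind adversarial sampling can be sketched without the sampler network: concentrate collocation points where the current solution estimate has the largest residual. This is an illustrative sketch with a hypothetical stand-in residual function, not the paper's learned sampler:

```python
# Residual-driven point selection: rather than sampling collocation points
# uniformly, keep the candidates where the current solution estimate has the
# largest PDE residual. The stand-in residual peaks sharply near x = 0.8.

def residual(x):
    # hypothetical |PDE residual| of the current estimate (sharp feature at 0.8)
    return 1.0 / (0.01 + (x - 0.8) ** 2)

candidates = [i / 99.0 for i in range(100)]
selected = sorted(candidates, key=residual, reverse=True)[:8]
print(sorted(selected))  # points cluster around the sharp feature at x = 0.8
```

The paper replaces this fixed candidate pool and argmax selection with a trained sampler that generates loss-maximizing points directly.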

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs), a structured network with MLP modules, have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently diverse. Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.
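The first result, that ReLU MLPs become linear along any direction away from the origin, can be verified directly on a small fixed network. This is an illustrative sketch with hand-chosen weights (not from the paper): far enough along a ray, every unit's active/inactive state freezes, so the output is affine in the scale and equally spaced scales give equal output increments.

```python
# A one-hidden-layer ReLU network evaluated far along a fixed ray: once the
# activation pattern freezes, the output is affine in the scale c, so the
# increments between equally spaced scales are equal.

W = [(0.5, -0.2), (-0.3, 0.8), (0.7, 0.1), (-0.6, -0.4)]  # hidden weights
b = [0.3, -0.5, 0.1, 0.2]                                 # hidden biases
a = [1.0, -1.0, 0.5, 2.0]                                 # output weights

def mlp(x):
    return sum(ai * max(0.0, wi[0] * x[0] + wi[1] * x[1] + bi)
               for ai, wi, bi in zip(a, W, b))

d = (1.0, 1.0)  # a fixed direction from the origin
f = [mlp((c * d[0], c * d[1])) for c in (10.0, 20.0, 30.0)]
print(f[2] - f[1], f[1] - f[0])  # equal increments: affine in the scale c
```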

To deploy a deep learning model in production, it needs to be both accurate and compact to meet the latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretical guarantee. Our method is motivated by model compression. It consists of three stages. In the first stage, we sufficiently widen the deep thin network and train it until convergence. In the second stage, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by letting the thin network imitate the intermediate outputs of the wide network from layer to layer. In the last stage, we further fine-tune this well-initialized deep thin network. The theoretical guarantee is established by using mean field analysis, which shows the advantage of layerwise imitation over the traditional approach of training deep thin networks from scratch by backpropagation. We also conduct large-scale empirical experiments to validate our approach. By training with our method, ResNet50 can outperform ResNet101, and BERT_BASE can be comparable with BERT_LARGE, where both the latter models are trained via the standard training procedures as in the literature.
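The second stage, layer-wise imitation, can be sketched in its smallest form. This is an illustrative sketch under toy assumptions (a scalar "thin layer" fit to a hypothetical wider layer's outputs by closed-form least squares), not the paper's procedure for full networks:

```python
# Layer-wise imitation warm-up: a thin layer (here a single scalar weight)
# is initialized to imitate the output of a wider "well-trained" layer on
# sample inputs, before any fine-tuning of the thin network.

def wide_layer(x):
    # hypothetical wider trained layer: two ReLU units feeding one output
    return 0.8 * max(0.0, 1.2 * x) + 0.5 * max(0.0, -0.4 * x)

xs = [-2.0 + 0.1 * i for i in range(41)]   # sample inputs
ys = [wide_layer(x) for x in xs]           # teacher's intermediate outputs

# closed-form least squares for the thin student layer y ~ w * x
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

In the actual method this matching is done layer by layer through the depth of the network, giving the thin network a much better starting point than random initialization.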

Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in approximate dynamic programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameters compared with existing methods. We show that the algorithm is competitive against state-of-the-art first- and second-order methods. Our work opens up new avenues for principled algorithmic design built upon optimal control theory.

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
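The continuous-depth idea can be shown in miniature: the hidden state evolves as dh/dt = f(h, t) and the "network output" is the ODE solution at the final time. This is an illustrative sketch with a fixed linear dynamics function and a plain forward-Euler solver, in place of the learned dynamics and black-box adaptive solver of the paper:

```python
# Continuous-depth model in miniature: instead of stacking discrete layers,
# integrate dh/dt = f(h, t) from t = 0 to t = 1 and read out h(1).
# Here f(h, t) = -h, so the exact answer is h(1) = exp(-1) ~ 0.3679.

def odeint_euler(f, h0, t0=0.0, t1=1.0, steps=1000):
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h += dt * f(h, t)  # one explicit Euler step
        t += dt
    return h

h1 = odeint_euler(lambda h, t: -h, 1.0)
print(h1)  # close to exp(-1) ~ 0.3679
```

In the paper, f is a neural network and gradients are obtained by the adjoint method, so training never needs to differentiate through the solver's internal operations.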
