
Nonlinear parametric systems are widely used to model nonlinear dynamics in science and engineering. Bifurcation analysis of these nonlinear systems over the parameter space is commonly used to study the solution structure, such as the number of solutions and their stability. In this paper, we develop a new machine learning approach to compute bifurcations via so-called equation-driven neural networks (EDNNs). The EDNNs consist of a two-step optimization: the first step approximates the solution function of the parameter by training on empirical solution data; the second step computes bifurcations using the approximated neural network obtained in the first step. Both a theoretical convergence analysis and numerical implementations on several examples are presented to demonstrate the feasibility of the proposed method.
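
To make the two-step idea concrete, here is a minimal, hypothetical sketch on the toy system f(x, p) = x^2 - p (fold bifurcation at p = 0), using a small PyTorch network as the surrogate solution function; it illustrates the workflow only and is not the paper's implementation.

```python
# Minimal EDNN-style sketch on a hypothetical toy problem, not the paper's code.
# Step 1: fit a network x(p) to empirical solutions of f(x, p) = x^2 - p = 0.
# Step 2: scan the surrogate for parameters where df/dx becomes singular
#         (here df/dx = 2x), signalling a candidate fold bifurcation (true location: p = 0).
import torch
import torch.nn as nn

# Step 1: train on empirical solution data from one branch, x(p) = sqrt(p).
p_train = torch.linspace(0.01, 1.0, 256).unsqueeze(1)
x_train = torch.sqrt(p_train)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(p_train) - x_train) ** 2).mean()
    loss.backward()
    opt.step()

# Step 2: locate parameters where the Jacobian of f in x vanishes along the learned branch.
p_grid = torch.linspace(0.0, 1.0, 1001).unsqueeze(1)
with torch.no_grad():
    x_grid = net(p_grid)
f_x = 2.0 * x_grid                         # df/dx for the toy system
p_star = p_grid[f_x.abs().argmin()]
print(f"candidate bifurcation near p = {p_star.item():.3f} (exact value: 0.0)")
```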

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum to develop and nurture an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analysis, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications.

Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE). More precisely, we give nonasymptotic expected $l^2$-distance bounds between the LSE and the true regression function, where the expectation is evaluated on a fresh, counterfactual trajectory. We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypothesis classes in terms of supremum norm metric entropy and a subgaussian parameter. Next, we relate this subgaussian parameter to the stability of the underlying process using notions from dynamical systems theory. When combined, these developments lead to rate-optimal error bounds that scale as $T^{-1/(2+q)}$ for suitably stable processes and hypothesis classes with metric entropy growth of order $\delta^{-q}$. Here, $T$ is the length of the observed trajectory, $\delta \in \mathbb{R}_+$ is the packing granularity, and $q\in (0,2)$ is a complexity term. Finally, we specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).
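
As a concrete illustration (not the paper's setting), the following sketch fits a least squares estimator over a simple polynomial dictionary from a single trajectory of a toy stable scalar system and evaluates its error on a fresh trajectory; the paper's analysis covers much richer hypothesis classes such as RKHS balls.

```python
# Toy sketch: LSE from one trajectory of a stable scalar system x_{t+1} = f*(x_t) + noise.
import numpy as np

rng = np.random.default_rng(0)
T = 2000
f_star = lambda u: 0.8 * np.tanh(u)                 # stable ground-truth dynamics

x = np.zeros(T + 1)                                 # a single observed trajectory
for t in range(T):
    x[t + 1] = f_star(x[t]) + 0.1 * rng.standard_normal()

def phi(u, degree=5):
    # Polynomial dictionary standing in for a richer hypothesis class.
    return np.vander(u, degree + 1, increasing=True)

# LSE: minimize sum_t (x_{t+1} - <theta, phi(x_t)>)^2.
theta, *_ = np.linalg.lstsq(phi(x[:-1]), x[1:], rcond=None)

# Evaluate on a fresh, counterfactual trajectory, as in the expected l^2 risk.
x_new = np.zeros(T + 1)
for t in range(T):
    x_new[t + 1] = f_star(x_new[t]) + 0.1 * rng.standard_normal()
err = np.mean((phi(x_new[:-1]) @ theta - f_star(x_new[:-1])) ** 2)
print(f"squared error of the LSE against f* on a fresh trajectory: {err:.5f}")
```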

As they have a vital effect on social decision-making, AI algorithms should be not only accurate but also fair. Among the various approaches to fair AI, learning fair representations (LFR), whose goal is to find a representation that is fair with respect to sensitive variables such as gender and race, has received much attention. For LFR, an adversarial training scheme is typically employed, as in generative adversarial network (GAN)-type algorithms. The choice of discriminator, however, is made heuristically and without justification. In this paper, we propose a new adversarial training scheme for LFR in which the integral probability metric (IPM) with a specific parametric family of discriminators is used. The most notable result for the proposed LFR algorithm is a theoretical guarantee on the fairness of the final prediction model, which has not been established before. That is, we derive theoretical relations between the fairness of the representation and the fairness of the prediction model built on top of the representation (i.e., using the representation as the input). Moreover, numerical experiments show that our proposed LFR algorithm is computationally lighter and more stable, and that the final prediction model is competitive with or superior to those of other LFR algorithms that use more complex discriminators.
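
The following sketch illustrates IPM-regularized fair representation learning in the simplest hypothetical case where the discriminator family consists of linear functions with unit norm, for which the IPM between the two sensitive groups reduces to the distance between their mean representations; the discriminator family and guarantees in the paper are more general.

```python
# Toy sketch of IPM-regularized fair representation learning (unit-norm linear critics).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 10)                                  # features
y = (X[:, 0] > 0).float().unsqueeze(1)                    # labels
s = (X[:, 1] + 0.3 * torch.randn(512)) > 0                # binary sensitive attribute

encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 8))
classifier = nn.Linear(8, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)

lam = 1.0                                                 # fairness/accuracy trade-off
for _ in range(500):
    opt.zero_grad()
    z = encoder(X)
    pred_loss = nn.functional.binary_cross_entropy_with_logits(classifier(z), y)
    # IPM over unit-norm linear discriminators = distance between group means of z.
    ipm = (z[s].mean(dim=0) - z[~s].mean(dim=0)).norm()
    (pred_loss + lam * ipm).backward()
    opt.step()
```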

Despite their overwhelming capacity to overfit, deep neural networks trained by specific optimization algorithms tend to generalize well to unseen data. Recently, researchers have explained this by investigating the implicit regularization effect of optimization algorithms. A notable advance is the work of Lyu & Li (2019), which proves that gradient descent (GD) maximizes the margin of homogeneous deep neural networks. Beyond GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their rapid training, yet theoretical guarantees for the generalization of adaptive optimization algorithms are still lacking. In this paper, we study the implicit regularization of adaptive optimization algorithms when they optimize the logistic loss on homogeneous deep neural networks. We prove that adaptive algorithms that adopt an exponential moving average strategy in the conditioner (such as Adam and RMSProp) can maximize the margin of the neural network, while AdaGrad, which directly sums historical squared gradients in the conditioner, cannot. This indicates an advantage in generalization of the exponential moving average strategy in the design of the conditioner. Technically, we provide a unified framework for analyzing the convergent direction of adaptive optimization algorithms by constructing a novel adaptive gradient flow and surrogate margin. Our experiments support the theoretical findings on the convergent direction of adaptive optimization algorithms.
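
A small illustration of the quantity at stake: for a bias-free ReLU network (positively homogeneous of order L = 2 in its parameters) trained with Adam on the logistic loss, one can monitor the normalized margin min_i y_i f(x_i) / ||θ||^L over training. This toy sketch is not the paper's experiment, only a way to make the notion concrete.

```python
# Toy sketch: normalized margin of a bias-free (homogeneous) ReLU net trained with Adam.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(128, 2)
y = torch.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)          # +/-1 labels

net = nn.Sequential(nn.Linear(2, 64, bias=False), nn.ReLU(), nn.Linear(64, 1, bias=False))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
L = 2                                                      # order of homogeneity (two layers)

for step in range(5001):
    opt.zero_grad()
    out = net(X).squeeze(1)
    loss = nn.functional.softplus(-y * out).mean()         # logistic loss
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            norm = torch.sqrt(sum((p ** 2).sum() for p in net.parameters()))
            margin = (y * net(X).squeeze(1)).min() / norm ** L
        print(f"step {step:5d}   normalized margin {margin.item():.4f}")
```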

Since deep neural networks were developed, they have made huge contributions to everyday life, and machine learning can now offer more consistent advice than humans in many aspects of daily life. Despite this achievement, however, the design and training of neural networks remain challenging and unpredictable procedures. To lower the technical threshold for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. The review then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. It next reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with the problems that arise when HPO is applied to deep learning, a comparison of optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
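
For readers new to the topic, the sketch below shows the simplest HPO strategy, random search over a small hypothetical search space with a placeholder objective; the surveyed toolkits implement far more sophisticated searchers (Bayesian optimization, Hyperband, and so on).

```python
# Random search over a small hypothetical search space with a placeholder objective.
import random

random.seed(0)
search_space = {
    "lr": lambda: 10 ** random.uniform(-5, -1),            # log-uniform learning rate
    "batch_size": lambda: random.choice([32, 64, 128, 256]),
    "num_layers": lambda: random.randint(2, 6),
}

def train_and_evaluate(config):
    # Placeholder standing in for a real training run; returns a validation score.
    return -abs(config["lr"] - 1e-3) - 0.01 * config["num_layers"]

best_score, best_config = float("-inf"), None
for _ in range(50):                                        # trial budget
    config = {name: sample() for name, sample in search_space.items()}
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config
print(best_config)
```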

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
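
As a small illustration of the initialization issue discussed above (not taken from the article), the following sketch compares activations after many ReLU layers under a naive 1/n weight variance versus the 2/n (He) scaling; the former shrinks the signal toward zero while the latter keeps its scale roughly constant.

```python
# Signal propagation through deep ReLU stacks under two initialization scales.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 30
h_naive = rng.standard_normal((256, width))
h_he = h_naive.copy()

for _ in range(depth):
    W_naive = rng.standard_normal((width, width)) / np.sqrt(width)      # Var = 1/n
    W_he = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)   # Var = 2/n (He)
    h_naive = np.maximum(h_naive @ W_naive, 0.0)
    h_he = np.maximum(h_he @ W_he, 0.0)

print(f"activation std after {depth} ReLU layers: naive {h_naive.std():.2e}, He {h_he.std():.2e}")
```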

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete, or might not be available at all. In this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
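
The sketch below conveys the core idea in a simplified, hypothetical form: a Bernoulli distribution over every possible edge is relaxed so that sampled adjacency matrices are differentiable, and the edge logits are trained jointly with a single GCN layer on one objective; the paper instead solves a bilevel program, which this single-level toy omits.

```python
# Toy single-level sketch: relaxed Bernoulli edge sampling + one GCN layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, n_feat, n_class = 30, 16, 3
X = torch.randn(n_nodes, n_feat)
y = torch.randint(0, n_class, (n_nodes,))

edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))   # one Bernoulli per candidate edge
W = nn.Parameter(0.1 * torch.randn(n_feat, n_class))        # GCN layer weights
opt = torch.optim.Adam([edge_logits, W], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    A = torch.distributions.RelaxedBernoulli(torch.tensor(0.5), logits=edge_logits).rsample()
    A = (A + A.t()) / 2 + torch.eye(n_nodes)                 # symmetrize, add self-loops
    d_inv_sqrt = A.sum(dim=1).rsqrt()
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]    # normalized adjacency
    loss = nn.functional.cross_entropy(A_hat @ X @ W, y)     # one graph convolution
    loss.backward()
    opt.step()
```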

Few-shot learning aims to learn classifiers for new classes with only a few training examples per class. Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with varying numbers of labels. Meta-learning approaches train a meta learner to predict the weights of homogeneous-structured task-specific networks, which requires a uniform number of classes across tasks. Metric-learning approaches learn a single task-invariant metric for all tasks, and they fail when the tasks diverge. We propose to address these limitations with meta metric learning. Our meta metric learning approach consists of task-specific learners, which exploit metric learning to handle flexible label sets, and a meta learner, which discovers good parameters and gradient descent updates to specify the metrics in the task-specific learners. The proposed model is thus able to handle unbalanced classes as well as to generate task-specific metrics. We test our approach in the `$k$-shot $N$-way' few-shot learning setting used in previous work and in a new, realistic few-shot setting with diverse multi-domain tasks and flexible label numbers. Experiments show that our approach attains superior performance in both settings.
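
A minimal sketch of one ingredient, a task-specific metric learner, is given below: a learned linear transform L defines the metric d(x, x') = ||Lx - Lx'|| and queries are assigned to the nearest class prototype under it, with the number of classes free to vary per task. The meta learner that ties such learners together across tasks is not shown; the code is a hypothetical illustration rather than the paper's implementation.

```python
# Toy task-specific metric learner: linear metric + nearest class prototype.
import torch
import torch.nn as nn

def fit_task_metric(support_x, support_y, n_classes, steps=200):
    L = nn.Parameter(torch.eye(support_x.shape[1]))          # metric transform, d = ||Lx - Lx'||
    opt = torch.optim.Adam([L], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        z = support_x @ L.t()
        protos = torch.stack([z[support_y == c].mean(dim=0) for c in range(n_classes)])
        logits = -torch.cdist(z, protos)                      # closer prototype => larger logit
        nn.functional.cross_entropy(logits, support_y).backward()
        opt.step()
    return L.detach()

def predict(L, support_x, support_y, query_x, n_classes):
    z_s, z_q = support_x @ L.t(), query_x @ L.t()
    protos = torch.stack([z_s[support_y == c].mean(dim=0) for c in range(n_classes)])
    return torch.cdist(z_q, protos).argmin(dim=1)             # nearest prototype under the metric
```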

Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, and speech recognition. It has outperformed conventional methods in various fields and achieved great success. Unfortunately, the understanding of how it works remains unclear, and laying down a theoretical foundation for deep learning is of central importance. In this work, we give a geometric view of deep learning: we argue that the fundamental principle behind its success is the manifold structure in data, namely that natural high-dimensional data concentrate close to a low-dimensional manifold, and that deep learning learns the manifold and the probability distribution on it. We further introduce the rectified linear complexity of a deep neural network, which measures its learning capability, and the rectified linear complexity of an embedding manifold, which describes the difficulty of learning it. We then show that for any deep neural network with a fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.
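
As a toy illustration of controlling the latent distribution with optimal transport (not the paper's construction), the one-dimensional case is convenient because the optimal transport map is simply the monotone rearrangement obtained by sorting:

```python
# 1-D optimal transport of latent codes onto a target distribution via sorting.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=1000)                # latent codes from an encoder (toy stand-in)
target = rng.uniform(-1, 1, size=1000)        # desired latent distribution

order = np.argsort(latent)                    # monotone rearrangement = 1-D OT map
transported = np.empty_like(latent)
transported[order] = np.sort(target)
print(f"transported latent codes span [{transported.min():.2f}, {transported.max():.2f}]")
```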

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space via a feature transformation (e.g., a Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn just one metric space (feature transformation). Can we learn a better metric space from the learnt metric space? In other words, can we learn metrics progressively and nonlinearly, as in deep learning, using only existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, follow it with a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys some nice properties: it can indeed learn metrics progressively, and it performs well on several datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
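
A minimal sketch of the stacking idea, under the assumption of a simple contrastive-style pair loss rather than the paper's online update rules, looks as follows: linear metric layers separated by ReLU nonlinearities, each refining the metric space produced by the previous one.

```python
# Toy ODML-style stack: metric layer -> ReLU -> metric layer -> ReLU -> metric layer.
import torch
import torch.nn as nn

class MetricLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.L = nn.Linear(dim, dim, bias=False)   # linear (Mahalanobis-like) metric transform
    def forward(self, x):
        return self.L(x)

dim = 16
model = nn.Sequential(MetricLayer(dim), nn.ReLU(), MetricLayer(dim), nn.ReLU(), MetricLayer(dim))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def pair_loss(z1, z2, same):
    d = (z1 - z2).pow(2).sum(dim=1)
    return torch.where(same, d, (1.0 - d).clamp(min=0.0)).mean()   # pull similar, push dissimilar

# One online step on a toy batch of pairs.
x1, x2 = torch.randn(32, dim), torch.randn(32, dim)
same = torch.rand(32) > 0.5
opt.zero_grad()
pair_loss(model(x1), model(x2), same).backward()
opt.step()
```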

In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and attains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
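
To make the dual approach concrete, the following sketch runs plain (unaccelerated) gradient ascent on the dual of a consensus problem with quadratic local objectives on a small path graph; each dual update only uses differences across an edge, so it can be executed in a distributed manner. The paper additionally applies Nesterov acceleration and establishes the optimal rates.

```python
# Dual gradient ascent for consensus: f_i(x) = 0.5 * (x - a_i)^2 on a path graph.
import numpy as np

a = np.array([1.0, 3.0, 5.0, 7.0])             # local data; the consensus optimum is mean(a) = 4
edges = [(0, 1), (1, 2), (2, 3)]               # communication graph
B = np.zeros((len(edges), len(a)))             # edge-node incidence matrix
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = 1.0, -1.0

lam = np.zeros(len(edges))                     # one dual variable per edge
eta = 0.25                                     # dual step size
for _ in range(500):
    x = a - B.T @ lam                          # local minimizers of the Lagrangian (closed form)
    lam = lam + eta * (B @ x)                  # ascent step uses only differences across edges
print(x)                                       # all entries approach the consensus value 4.0
```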
