
Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.
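To make the topological statement in (iii) concrete, the sketch below (our own illustration, not the authors' code) computes pairwise commute times from the pseudoinverse of the graph Laplacian via the standard spectral identity $C(u,v)=2|E|\,(L^{+}_{uu}+L^{+}_{vv}-2L^{+}_{uv})$; node pairs with large commute time are exactly the pairs between which over-squashing is predicted to be most severe.

```python
# Minimal sketch: commute times of a connected undirected graph from its Laplacian.
import numpy as np

def commute_times(A):
    """Pairwise commute times given the adjacency matrix A of a connected graph."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # graph Laplacian
    L_pinv = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse
    two_E = d.sum()                         # 2|E| for an undirected graph
    diag = np.diag(L_pinv)
    # C(u, v) = 2|E| * (L+_uu + L+_vv - 2 L+_uv)
    return two_E * (diag[:, None] + diag[None, :] - 2.0 * L_pinv)

# Toy example: a path graph on 6 nodes; the two endpoints are the most "squashed" pair.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
C = commute_times(A)
print(C[0, n - 1])   # the largest commute time in this graph
```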

Related Content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science; Neuroscience; Learning Systems; Mathematics and Computational Analysis; Engineering and Applications. Official website:

We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\epsilon$ and their distance from initialization is $g(\epsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2 /T)$ and generalization error $O(g(1/T)^2 /n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where the data are separable by the model's neural tangent kernel. For this setting and logistic-loss minimization, we prove that the training loss decays at a rate of $\tilde O(1/ T)$ given a polylogarithmic number of neurons $m=\Omega(\log^4 (T))$. Moreover, with $m=\Omega(\log^{4} (n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization results based on the algorithmic-stability framework, which necessitate polynomial width and yield suboptimal generalization rates. Central to our analysis is a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.
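As a toy illustration of the setting (our own sketch under simplifying assumptions, not the paper's experiments), the snippet below runs plain gradient descent with logistic loss on a shallow ReLU network with a fixed output layer and linearly separable data; under the paper's result one expects the training loss to decay roughly like $\tilde O(1/T)$ in such an NTK-separable regime.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, m = 200, 10, 256                                   # samples, input dim, hidden width
X = torch.randn(n, d)
y = torch.where(X[:, 0] > 0, torch.tensor(1.0), torch.tensor(-1.0))  # separable toy labels

W = torch.randn(m, d) / d ** 0.5                         # trained hidden layer
W.requires_grad_(True)
a = (torch.randint(0, 2, (m,)).float() * 2 - 1) / m ** 0.5  # fixed +-1 output layer

lr = 0.5
for t in range(1, 2001):
    out = torch.relu(X @ W.t()) @ a                      # shallow network f(x) = a^T relu(Wx)
    loss = F.softplus(-y * out).mean()                   # logistic loss log(1 + exp(-y f(x)))
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        W.grad.zero_()
    if t % 500 == 0:
        print(t, loss.item())                            # expect roughly 1/t decay
```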

Beeping models are models for networks of weak devices, such as sensor networks or biological networks. In these networks, nodes are allowed to communicate only via emitting beeps: unary pulses of energy. Listening nodes have only the capability of {\it carrier sensing}: they can distinguish between the presence or absence of a beep, but receive no other information. The noisy beeping model further assumes listening nodes may be disrupted by random noise. Despite this extremely restrictive communication model, it transpires that complex distributed tasks can still be performed by such networks. In this paper we provide an optimal procedure for simulating general message passing in the beeping and noisy beeping models. We show that a round of \textsf{Broadcast CONGEST} can be simulated in $O(\Delta\log n)$ rounds of the noisy (or noiseless) beeping model, and a round of \textsf{CONGEST} can be simulated in $O(\Delta^2\log n)$ rounds (where $\Delta$ is the maximum degree of the network). We also prove lower bounds demonstrating that no simulation can use asymptotically fewer rounds. This allows a host of graph algorithms to be efficiently implemented in beeping models. As an example, we present an $O(\log n)$-round \textsf{Broadcast CONGEST} algorithm for maximal matching, which, when simulated using our method, immediately implies a near-optimal $O(\Delta \log^2 n)$-round maximal matching algorithm in the noisy beeping model.
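For intuition on why such simulations are possible at all, here is a hedged sketch (our own baseline, not the paper's construction): schedule nodes by a distance-2 colouring so that in each colour window at most one neighbour of any listener beeps, and spell out each $O(\log n)$-bit message one beep/silence slot per bit. This naive scheme already emulates a broadcast round in $O(\Delta^2 \log n)$ noiseless beeping slots; the paper's $O(\Delta \log n)$ Broadcast CONGEST simulation and its handling of noise are strictly sharper.

```python
from itertools import count

def distance2_coloring(adj):
    """Greedy colouring in which nodes at distance <= 2 receive distinct colours."""
    color = {}
    for v in adj:
        near = set(adj[v]) | {w for u in adj[v] for w in adj[u]}
        used = {color[u] for u in near if u in color}
        color[v] = next(c for c in count() if c not in used)
    return color

def simulate_broadcast_round(adj, messages, bits=8):
    """Each node delivers a `bits`-bit message to all neighbours using only beeps."""
    color = distance2_coloring(adj)
    received = {v: {} for v in adj}
    for c in range(max(color.values()) + 1):          # one window per colour class
        for b in range(bits):                         # one beep slot per message bit
            beeping = {v for v in adj if color[v] == c and (messages[v] >> b) & 1}
            for v in adj:                             # listeners can only carrier-sense
                if any(u in beeping for u in adj[v]):
                    # at most one neighbour of v has colour c, so the beep is attributable
                    u = next(u for u in adj[v] if color[u] == c)
                    received[v][u] = received[v].get(u, 0) | (1 << b)
    return received

# Toy usage on a 4-cycle: every node learns both neighbours' messages.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(simulate_broadcast_round(adj, {0: 5, 1: 9, 2: 3, 3: 12}))
```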

The success of deep learning models depends on the size and quality of the dataset used to solve a given task. Here, we explore how far generated data can go in aiding real data to improve the performance of Neural Networks. In this work, we consider facial expression recognition (FER), since it requires challenging data generation at the level of local regions such as the mouth and eyebrows, rather than simple augmentation. Generative Adversarial Networks (GANs) provide an alternative method for generating such local deformations, but they need further validation. To answer our question, we consider classifiers based on non-complex Convolutional Neural Networks (CNNs) for recognizing Ekman emotions. For the data generation process, we consider generating facial expressions (FEs) by relying on two GANs: the first generates a random identity, while the second imposes facial deformations on top of it. We train the CNN classifier using FEs from real faces, from GAN-generated faces, and finally from a combination of real and GAN-generated faces. We determine an upper bound on the quantity of generated data that, when mixed with the real data, contributes the most to enhancing FER accuracy. In our experiments, we find that adding five times more synthetic data to the real FE dataset increases accuracy by 16%.
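The data mixture the study converges on is straightforward to reproduce schematically; the sketch below (hypothetical tensors and loaders, not the paper's pipeline) combines a real facial-expression set with five times as many GAN-generated samples before training the CNN classifier.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def mixed_training_set(real_imgs, real_labels, fake_imgs, fake_labels, ratio=5):
    """Combine the real set with `ratio` times as many synthetic samples."""
    n_fake = min(len(fake_imgs), ratio * len(real_imgs))
    real = TensorDataset(real_imgs, real_labels)
    fake = TensorDataset(fake_imgs[:n_fake], fake_labels[:n_fake])
    return ConcatDataset([real, fake])

# Toy usage with random tensors standing in for face crops and 7 Ekman-style classes.
real_x, real_y = torch.randn(100, 1, 48, 48), torch.randint(0, 7, (100,))
fake_x, fake_y = torch.randn(1000, 1, 48, 48), torch.randint(0, 7, (1000,))
loader = DataLoader(mixed_training_set(real_x, real_y, fake_x, fake_y),
                    batch_size=32, shuffle=True)
```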

A recent line of works, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is some information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a learning rate is typically considered ``slow", compared to a ``fast rate" of $O(\lambda/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast-rate result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the critical conditions needed for a fast-rate generalization error, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a fast convergence rate for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, several analytical examples are given to show the effectiveness of the bounds.
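To see why the distinction matters, the following display (an illustrative calculation, not taken from the paper) contrasts the two rates for a fixed information measure $\lambda$ and sample size $n$:

```latex
\[
  \underbrace{O\!\left(\sqrt{\lambda/n}\right)}_{\text{slow rate}}
  \quad\text{vs.}\quad
  \underbrace{O\!\left(\lambda/n\right)}_{\text{fast rate}},
  \qquad\text{e.g. } \lambda = 10,\; n = 10^{4}:\;
  \sqrt{\lambda/n} \approx 3.2\times 10^{-2},\quad
  \lambda/n = 10^{-3}.
\]
```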

With more and more data being collected, data-driven modeling methods have been gaining popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.
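The sketch below is a schematic of the general PCNN idea under our own simplifying assumptions (not the authors' exact architecture or their three extensions): an unconstrained NN captures unmodeled dynamics from exogenous inputs, while a separate module with sign-constrained parameters handles heating power and heat losses to the ambient, so these inputs always affect the predicted temperature with the physically correct sign.

```python
import torch
import torch.nn as nn

class ToyPCNN(nn.Module):
    def __init__(self, n_exo, hidden=32):
        super().__init__()
        self.black_box = nn.Sequential(                # data-driven part
            nn.Linear(n_exo, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.log_a = nn.Parameter(torch.tensor(-3.0))  # heat-loss coefficient, > 0 via exp
        self.log_b = nn.Parameter(torch.tensor(-3.0))  # heating-gain coefficient, > 0 via exp

    def forward(self, T_prev, T_out, power, exo):
        a, b = torch.exp(self.log_a), torch.exp(self.log_b)
        physics = -a * (T_prev - T_out) + b * power    # heating warms, losses cool, by construction
        return T_prev + physics + self.black_box(exo).squeeze(-1)

# Toy usage: one prediction step for a batch of 4 zones.
model = ToyPCNN(n_exo=3)
T_next = model(torch.full((4,), 21.0), torch.full((4,), 5.0),
               torch.rand(4), torch.randn(4, 3))
```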

Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
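For concreteness, here is a minimal sketch of the kind of model analyzed (our own illustration, not the paper's code): a linearized GNN whose layers aggregate over a normalized adjacency matrix and add a learned skip connection, the ingredient the analysis ties to faster training.

```python
import torch
import torch.nn as nn

class LinearGNNWithSkips(nn.Module):
    def __init__(self, in_dim, hidden, out_dim, depth):
        super().__init__()
        dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
        self.layers = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])
        self.skips = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])

    def forward(self, A_hat, X):
        # A_hat: (n, n) symmetrically normalized adjacency; X: (n, in_dim) node features.
        H = X
        for layer, skip in zip(self.layers, self.skips):
            H = layer(A_hat @ H) + skip(H)    # no nonlinearity: a linearized GNN
        return H

# Toy usage on a random 5-node graph with self-loops and symmetric normalization.
A = torch.eye(5) + (torch.rand(5, 5) > 0.7).float()
A = ((A + A.t()) > 0).float()
D_inv_sqrt = torch.diag(A.sum(1).pow(-0.5))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt
out = LinearGNNWithSkips(in_dim=8, hidden=16, out_dim=3, depth=4)(A_hat, torch.randn(5, 8))
```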

This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), which have arguably been the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to the latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since different methods adopt different viewpoints, we categorize them into groups for ease of comparison and discussion. We hope our discussion and overview inspire new research in the community aimed at improving the performance of DNNs, and we also point to directions where the large margin principle could be verified to provide theoretical evidence for why certain regularizations of DNNs work well in practice. We have kept the paper concise so that the crucial spirit of large margin learning and related methods is better emphasized.
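A hedged illustration of the basic quantity such surveys build on: the output-space classification margin $f_y(x) - \max_{j \neq y} f_j(x)$, which is positive exactly when the sample is classified correctly. The snippet below computes it per sample (our own example, not code from the paper).

```python
import torch

def classification_margins(logits, labels):
    """logits: (batch, classes); labels: (batch,). Returns the per-sample margin."""
    correct = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    runner_up = logits.clone()
    runner_up.scatter_(1, labels.unsqueeze(1), float("-inf"))  # mask the true class
    return correct - runner_up.max(dim=1).values

# Toy usage
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.4, 0.3]])
print(classification_margins(logits, torch.tensor([0, 2])))   # tensor([1.5000, -0.1000])
```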

Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to the iterative parameter updates. However, comparison of individual methods is treated in isolation from real-world application and is typically judged by monitoring accumulated test set performance. The closed-world assumption remains predominant: it is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge, as neural networks are well known to provide overconfident false predictions on unknown instances and to break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning, and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but also highlights the natural synergies in a common framework. We empirically demonstrate improvements in alleviating catastrophic forgetting, querying data in active learning, and selecting task orders, while exhibiting robust open-world application where previously proposed methods fail.
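As a minimal baseline for the open-set ingredient of this view (our own sketch, not the proposed method), one can refuse to trust the softmax output on every input and instead reject low-confidence inputs as unknown:

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model, x, threshold=0.9):
    """Return class indices, or -1 where the classifier is not confident enough."""
    probs = F.softmax(model(x), dim=1)
    conf, pred = probs.max(dim=1)
    return torch.where(conf >= threshold, pred, torch.full_like(pred, -1))

# Toy usage with an untrained classifier: random inputs are typically rejected.
model = torch.nn.Linear(10, 4)
print(predict_with_rejection(model, torch.randn(6, 10)))
```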

Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they are uncertain and unreliable when faced with well-designed inputs, i.e., adversarial examples. Accordingly, various studies have emerged on both attack and defense in different graph analysis tasks, leading to an arms race in graph adversarial learning. For instance, attackers employ poisoning and evasion attacks, while the defense side correspondingly uses preprocessing-based and adversarial-based methods. Despite the booming body of work, a unified problem definition and a comprehensive review are still lacking. To bridge this gap, we investigate and summarize the existing works on graph adversarial learning tasks systematically. Specifically, we survey and unify the existing works with respect to attack and defense in graph analysis tasks, and give proper definitions and taxonomies at the same time. Besides, we emphasize the importance of related evaluation metrics and investigate and summarize them comprehensively. We hope our work can serve as a reference for relevant researchers and thus assist them in their studies. More details of our work are available at //github.com/gitgiter/Graph-Adversarial-Learning.

Machine learning techniques have become deeply rooted in our everyday lives. However, since pursuing good learning performance is knowledge- and labor-intensive, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and to reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic of both industrial and academic interest. In this paper, we provide an up-to-date survey on AutoML. First, we introduce and define the AutoML problem, drawing inspiration from both the automation and machine learning communities. Then, we propose a general AutoML framework that not only covers most existing approaches to date but can also guide the design of new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underlying their successful applications. We hope this survey can serve not only as an insightful guideline for AutoML beginners but also as an inspiration for future research.
